Hi Eduardo,

Yep, those machines look pretty well synchronized at this point. Just wanted to throw that out there and eliminate it as a possible source of confusion.
Good luck on continuing the debugging!

Andrew

On Sat, May 3, 2014 at 11:59 AM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:

> Hi TD,
>
> I did a test with 8 workers and 1 word source; the time gap was 27 sec, as you can see in the log files (attached).
>
> Hi Andrew,
> I configured NTP; all the machines are synchronized:
>
> root@computer8:/opt/unibs_test/spark-1.0RC3# for num in {1,8,10,11,13,15,16,18,22}; do ssh computer$num date; done
> Sat May 3 20:57:41 CEST 2014
> Sat May 3 20:57:41 CEST 2014
> Sat May 3 20:57:41 CEST 2014
> Sat May 3 20:57:42 CEST 2014
> Sat May 3 20:57:42 CEST 2014
> Sat May 3 20:57:42 CEST 2014
> Sat May 3 20:57:42 CEST 2014
> Sat May 3 20:57:42 CEST 2014
> Sat May 3 20:57:42 CEST 2014
>
> Informativa sulla Privacy: http://www.unibs.it/node/8155
>
> On May 3, 2014, at 15:46, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:
>
> Hi TD, thanks for the reply.
> I ran this last experiment on a single computer, in local mode, but I think the time gap grows when I add more computers. I will run it again now with 8 workers and 1 word source and see what is going on. I will also check the clocks, as Andrew suggested.
>
> On May 3, 2014, at 1:19, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>
> From the logs, I see that the print() starts printing stuff 10 seconds after the context is started. Those 10 seconds are taken by the initial empty job (50 map + 20 reduce tasks) that Spark Streaming runs to ensure all the executors have started. Somehow the first empty task takes 7-8 seconds to complete. See if this can be reproduced by running a simple, empty job in the Spark shell (on the same cluster) and checking whether the first task takes 7-8 seconds.
>
> Either way, I didn't see a 30-second gap, but a 10-second gap. And that does not seem to be a persistent problem, as after those 10 seconds the data is being received and processed.
>
> TD
>
> On Fri, May 2, 2014 at 2:14 PM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:
>
>> Hi TD,
>>
>> I gathered more information today using Spark 1.0 RC3, and the situation remains the same:
>> <PastedGraphic-1.png>
>>
>> The lines begin to appear after 17 sec:
>>
>> 14/05/02 21:52:25 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140502215225-0005/0 on hostPort computer8.ant-net:57229 with 2 cores, 2.0 GB RAM
>> 14/05/02 21:52:25 INFO AppClient$ClientActor: Executor updated: app-20140502215225-0005/0 is now RUNNING
>> 14/05/02 21:52:25 INFO ReceiverTracker: ReceiverTracker started
>> 14/05/02 21:52:26 INFO ForEachDStream: metadataCleanupDelay = -1
>> 14/05/02 21:52:26 INFO SocketInputDStream: metadataCleanupDelay = -1
>> 14/05/02 21:52:26 INFO SocketInputDStream: Slide time = 1000 ms
>> 14/05/02 21:52:26 INFO SocketInputDStream: Storage level = StorageLevel(false, false, false, false, 1)
>> 14/05/02 21:52:26 INFO SocketInputDStream: Checkpoint interval = null
>> 14/05/02 21:52:26 INFO SocketInputDStream: Remember duration = 1000 ms
>> 14/05/02 21:52:26 INFO SocketInputDStream: Initialized and validated org.apache.spark.streaming.dstream.SocketInputDStream@5433868e
>> 14/05/02 21:52:26 INFO ForEachDStream: Slide time = 1000 ms
>> 14/05/02 21:52:26 INFO ForEachDStream: Storage level = StorageLevel(false, false, false, false, 1)
>> 14/05/02 21:52:26 INFO ForEachDStream: Checkpoint interval = null
>> 14/05/02 21:52:26 INFO ForEachDStream: Remember duration = 1000 ms
>> 14/05/02 21:52:26 INFO ForEachDStream: Initialized and validated org.apache.spark.streaming.dstream.ForEachDStream@1ebdbc05
>> 14/05/02 21:52:26 INFO SparkContext: Starting job: collect at ReceiverTracker.scala:270
>> 14/05/02 21:52:26 INFO RecurringTimer: Started timer for JobGenerator at time 1399060346000
>> 14/05/02 21:52:26 INFO JobGenerator: Started JobGenerator at 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Started JobScheduler
>> 14/05/02 21:52:26 INFO DAGScheduler: Registering RDD 3 (reduceByKey at ReceiverTracker.scala:270)
>> 14/05/02 21:52:26 INFO ReceiverTracker: Stream 0 received 0 blocks
>> 14/05/02 21:52:26 INFO DAGScheduler: Got job 0 (collect at ReceiverTracker.scala:270) with 20 output partitions (allowLocal=false)
>> 14/05/02 21:52:26 INFO DAGScheduler: Final stage: Stage 0 (collect at ReceiverTracker.scala:270)
>> 14/05/02 21:52:26 INFO DAGScheduler: Parents of final stage: List(Stage 1)
>> 14/05/02 21:52:26 INFO JobScheduler: Added jobs for time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Starting job streaming job 1399060346000 ms.0 from job set of time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobGenerator: Checkpointing graph for time 1399060346000 ms
>> 14/05/02 21:52:26 INFO DStreamGraph: Updating checkpoint data for time 1399060346000 ms
>> -------------------------------------------
>> Time: 1399060346000 ms
>> -------------------------------------------
>>
>> 14/05/02 21:52:26 INFO JobScheduler: Finished job streaming job 1399060346000 ms.0 from job set of time 1399060346000 ms
>> 14/05/02 21:52:26 INFO JobScheduler: Total delay: 0.325 s for time 1399060346000 ms (execution: 0.024 s)
>>
>>
>>
>> 14/05/02 21:52:42 INFO JobScheduler: Added jobs for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO JobGenerator: Checkpointing graph for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Updating checkpoint data for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Updated checkpoint data for time 1399060362000 ms
>> 14/05/02 21:52:42 INFO JobScheduler: Starting job streaming job 1399060362000 ms.0 from job set of time 1399060362000 ms
>> 14/05/02 21:52:42 INFO SparkContext: Starting job: take at DStream.scala:593
>> 14/05/02 21:52:42 INFO DAGScheduler: Got job 2 (take at DStream.scala:593) with 1 output partitions (allowLocal=true)
>> 14/05/02 21:52:42 INFO DAGScheduler: Final stage: Stage 3 (take at DStream.scala:593)
>> 14/05/02 21:52:42 INFO DAGScheduler: Parents of final stage: List()
>> 14/05/02 21:52:42 INFO DAGScheduler: Missing parents: List()
>> 14/05/02 21:52:42 INFO DAGScheduler: Computing the requested partition locally
>> 14/05/02 21:52:42 INFO BlockManager: Found block input-0-1399060360400 locally
>> 14/05/02 21:52:42 INFO CheckpointWriter: Checkpoint for time 1399060361000 ms saved to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1399060361000', took 2457 bytes and 507 ms
>> 14/05/02 21:52:42 INFO CheckpointWriter: Saving checkpoint for time 1399060362000 ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1399060362000'
>> 14/05/02 21:52:42 INFO DStreamGraph: Clearing checkpoint data for time 1399060361000 ms
>> 14/05/02 21:52:42 INFO DStreamGraph: Cleared checkpoint data for time 1399060361000 ms
>> 14/05/02 21:52:42 INFO BlockManagerInfo: Added input-0-1399060360800 in memory on computer8.ant-net:50052 (size: 238.8 KB, free: 1177.0 MB)
>> 14/05/02 21:52:42 INFO SparkContext: Job finished: take at DStream.scala:593, took 0.107033025 s
>> -------------------------------------------
>> Time: 1399060362000 ms
>> -------------------------------------------
>> The Project Gutenberg EBook of Don Quixote by Miguel de Cervantes This eBook is
>> for the use of anyone anywhere at no cost and with almost no restrictions
>> whatsoever You may copy it give it away or re use it under the terms of the
>> Project Gutenberg License included with this eBook or online at www gutenberg
>> net Title Don Quixote Author Miguel de Cervantes Saavedra Release Date July 27
>> 2004 EBook 996 Language English START OF THIS PROJECT GUTENBERG EBOOK DON
>> QUIXOTE Produced by David Widger DON QUIXOTE Complete by Miguel de Cervantes
>> Saavedra Translated by John Ormsby CONTENTS Volume I CHAPTER I WHICH TREATS OF
>> THE CHARACTER AND PURSUITS OF THE FAMOUS GENTLEMAN DON QUIXOTE OF LA MANCHA
>> CHAPTER II WHICH TREATS OF THE FIRST SALLY THE INGENIOUS DON QUIXOTE MADE FROM
>> ...
>>
>> 14/05/02 21:52:42 INFO JobScheduler: Finished job streaming job 1399060362000 ms.0 from job set of time 1399060362000 ms
>>
>> On Apr 30, 2014, at 0:56, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>
>> Strange! Can you just do lines.print() to print the raw data instead of doing the word count? Beyond that, we can do two things.
>>
>> 1. Can you check the Spark stage UI to see whether there are stages running during the 30-second period you referred to?
>> 2. If you upgrade to the Spark master branch (or Spark 1.0 RC3, see the separate thread by Patrick), it has a streaming UI, which shows the number of records received, the state of the receiver, etc. That may be more useful in debugging what's going on.
>>
>> TD
>>
>> On Tue, Apr 29, 2014 at 3:31 PM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:
>>
>>> Hi TD,
>>> We are not using the streaming context with master "local"; we have 1 master, 8 workers, and 1 word source. The command line we are using is:
>>> bin/run-example org.apache.spark.streaming.examples.JavaNetworkWordCount spark://192.168.0.13:7077
>>>
>>> On Apr 30, 2014, at 0:09, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>>
>>> Is your batch size 30 seconds by any chance?
>>>
>>> Assuming not, please check whether you are creating the streaming context with master "local[n]" where n > 2. With "local" or "local[1]", the system has only one processing slot, which is occupied by the receiver, leaving no room for processing the received data. It could be that after 30 seconds the server disconnects and the receiver terminates, releasing the single slot for the processing to proceed.
>>>
>>> TD
>>>
>>> On Tue, Apr 29, 2014 at 2:28 PM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:
>>>
>>>> Hi TD,
>>>>
>>>> In my tests with Spark Streaming, I'm using the JavaNetworkWordCount (modified) code and a program that I wrote that sends words to the Spark worker, using TCP as transport. I verified that after starting Spark, it connects to my source, which actually starts sending, but the first word count is reported approximately 30 seconds after the context creation. So I'm wondering where the 30 seconds of data already sent by the source is stored. Is this normal Spark behaviour? I saw the same behaviour using the shipped JavaNetworkWordCount application.
>>>>
>>>> Many thanks.
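[Editor's note] TD's "local[n]" point is the crux of the thread: a long-lived receiver pins a task slot, and with only one slot the batch-processing jobs can never be scheduled. This is not Spark code, but the same scheduling effect can be sketched with a plain fixed-size thread pool (a hedged illustration; the `demo` function, pool sizes, and sleep interval are my own, not from the thread):

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def demo(slots):
    """Run a blocking 'receiver' plus one short 'processing' task on a
    pool of `slots` workers; report whether processing managed to run
    while the receiver still held its slot."""
    stop = threading.Event()

    def receiver():
        # Stands in for the streaming receiver: it occupies its task
        # slot for as long as the source stays connected.
        stop.wait()

    def process_batch():
        # Stands in for processing one received batch.
        return "batch done"

    with ThreadPoolExecutor(max_workers=slots) as pool:
        pool.submit(receiver)
        fut = pool.submit(process_batch)
        time.sleep(0.2)                 # give the pool time to schedule
        ran_concurrently = fut.done()   # did processing ever get a slot?
        stop.set()                      # "server disconnects": receiver exits
    return ran_concurrently

print("local[1]-like pool:", demo(1))  # receiver starves processing
print("local[2]-like pool:", demo(2))  # spare slot, processing runs
```

With one slot the processing future only completes after the "receiver" is released, which mirrors the word counts appearing only once the word source disconnects.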