Re: Serial batching with Spark Streaming

2015-06-20 Thread Michal Čizmazia
    …UUID.randomUUID())) {
        rdd.collect().forEach(s -> out.println(s));
    }
    return null;
    });

    final Java…

Re: Serial batching with Spark Streaming

2015-06-20 Thread Tathagata Das
    …tream words =
        lines.flatMap(x -> Arrays.asList(x.split(" ")));
    final JavaPairDStream pairs =
        words.mapToPair(s -> new Tuple2(s, 1));
    final JavaPairDStream wordCounts = …

Re: Serial batching with Spark Streaming

2015-06-19 Thread Michal Čizmazia
    …final JavaPairDStream wordCounts2 =
        JavaPairDStream.fromJavaDStream(
            wordCounts.transform((rdd) -> {
                sleep.add(1);
                Thread.sleep(sleep.localValue() < 6 ? 2 : 5000);
                …

Re: Serial batching with Spark Streaming

2015-06-19 Thread Tathagata Das
    …for (final Integer value : values) {
        newSum += value;
    }
    return Optional.of(newSum);
    };

    final List> tuples = ImmutableList.> of();
    final Ja…

Re: Serial batching with Spark Streaming

2015-06-19 Thread Michal Čizmazia
    …defaultParallelism()),
        initialRDD);

    wordCountsState.print();

    final Accumulator rddIndex =
        context.sparkContext().accumulator(0);
    wordCountsState.foreachRDD( rdd -> {
        rddIndex.add(1);
        final St…

Re: Serial batching with Spark Streaming

2015-06-18 Thread Michal Čizmazia
    …HashPartitioner(context.sparkContext().defaultParallelism()),
        initialRDD);

    wordCountsState.print();

    final Accumulator rddIndex =
        context.sparkContext().accumulator(0);
    wordCountsState.foreachRDD( rdd -> {
        …

Re: Serial batching with Spark Streaming

2015-06-18 Thread Binh Nguyen Van
    …try (final PrintStream out = new PrintStream(prefix + UUID.randomUUID())) {
        rdd.collect().forEach(s -> out.println(s));
    }
    return null;
    });

    context.start();
    context.awaitTermination();
    }

On 17 J…
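The try-with-resources output pattern in the fragment above can be sketched without Spark; here `writeBatch` is a hypothetical helper standing in for the `foreachRDD` body, and the list of strings stands in for `rdd.collect()`:

```java
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.UUID;

public class BatchOutput {
    // Writes one batch of records to a uniquely named file,
    // mirroring: new PrintStream(prefix + UUID.randomUUID())
    public static Path writeBatch(String prefix, List<String> records) throws Exception {
        Path file = Path.of(prefix + UUID.randomUUID());
        // try-with-resources closes the stream even if println throws
        try (PrintStream out = new PrintStream(file.toFile())) {
            records.forEach(out::println);
        }
        return file;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("batches");
        Path file = writeBatch(dir + "/batch-", List.of("a", "b"));
        System.out.println(Files.readAllLines(file)); // [a, b]
    }
}
```

The random UUID per batch keeps each batch's output in its own file, so slow or retried batches never overwrite one another.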

Re: Serial batching with Spark Streaming

2015-06-18 Thread Michal Čizmazia
…default behavior should be that batch X + 1 starts processing only after batch X completes. If you are using Spark 1.4.0, could you show us a screenshot of the streaming tab, especially the list of batches? And could you also tell us if you are setting any SparkConf configurations?
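The serial default described in this reply is governed by a single setting. A minimal configuration sketch (assumptions: Spark's Java streaming API; `spark.streaming.concurrentJobs` is an internal, largely undocumented property whose default of 1 gives exactly the batch-after-batch behavior described above):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf conf = new SparkConf()
    .setMaster("local[2]")
    .setAppName("serial-batching");
// Default is 1: batch X + 1 starts only after batch X completes.
// Raising it lets batch jobs run concurrently, which breaks serial ordering.
conf.set("spark.streaming.concurrentJobs", "1");

// 3-second batch interval, as in the original question
JavaStreamingContext context =
    new JavaStreamingContext(conf, Durations.seconds(3));
```

So if batches appear to overlap, checking whether this property was overridden is a reasonable first step.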

Re: Serial batching with Spark Streaming

2015-06-17 Thread Tathagata Das
…, Jun 17, 2015 at 12:22 PM, Michal Čizmazia wrote:

> Is it possible to achieve serial batching with Spark Streaming?
>
> Example:
>
> I configure the Streaming Context for creating a batch every 3 seconds.
>
> Processing of the batch #2 takes longer than 3 seconds and creates a…

Serial batching with Spark Streaming

2015-06-17 Thread Michal Čizmazia
Is it possible to achieve serial batching with Spark Streaming?

Example:

I configure the Streaming Context for creating a batch every 3 seconds.

Processing of the batch #2 takes longer than 3 seconds and creates a backlog of batches:

batch #1 takes 2s
batch #2 takes 10s
batch #3 takes 2s
batch…
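The backlog described in the question can be modeled without Spark: a single-threaded executor plays the role of Spark's serial batch scheduler, so a slow batch delays every batch queued behind it. This is an illustrative model (timings scaled down to milliseconds), not Spark code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerialBatches {
    // Runs each batch duration on one worker thread and returns each
    // batch's start offset (ms) from the beginning of the run.
    public static List<Long> run(long... batchMillis) throws Exception {
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        List<Long> starts = new ArrayList<>();
        long t0 = System.nanoTime();
        for (long millis : batchMillis) {
            scheduler.submit(() -> {
                synchronized (starts) {
                    starts.add((System.nanoTime() - t0) / 1_000_000);
                }
                try {
                    Thread.sleep(millis); // simulated batch processing time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        scheduler.shutdown();
        scheduler.awaitTermination(1, TimeUnit.MINUTES);
        return starts;
    }

    public static void main(String[] args) throws Exception {
        // Scaled version of the example: batch #1 2s, #2 10s, #3 2s
        List<Long> starts = run(20, 100, 20);
        // Batch #3 cannot start before #1 and #2 finish (>= 120 ms in)
        System.out.println(starts.get(2) >= 120); // true
    }
}
```

With a single scheduler thread, batch #3 starts only at the 120 ms mark even though it arrived at the 60 ms mark, which is exactly the backlog the question describes.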