+
>>>>>> UUID.randomUUID())) {
>>>>>> rdd.collect().forEach(s -> out.println(s));
>>>>>> }
>>>>>> return null;
>>>>>> });
>>>>>>
>>>>>> final Java
tream words =
>>>>> lines.flatMap(x -> Arrays.asList(x.split(" ")));
>>>>> final JavaPairDStream pairs =
>>>>> words.mapToPair(s -> new Tuple2(s, 1));
>>>>> final JavaPairDStream wordCounts =
>>>> final JavaPairDStream wordCounts2 =
>>>> JavaPairDStream.fromJavaDStream(
>>>> wordCounts.transform( (rdd) -> {
>>>> sleep.add(1);
>>>> Thread.sleep(sleep.localValue() < 6 ? 2 : 5000);
>>&g
for (final Integer value : values) {
>>> newSum += value;
>>> }
>>> return Optional.of(newSum);
>>> };
>>>
>>> final List> tuples =
>>> ImmutableList.> of();
>>> final Ja
defaultParallelism()),
>> initialRDD);
>>
>> wordCountsState.print();
>>
>> final Accumulator rddIndex =
>> context.sparkContext().accumulator(0);
>> wordCountsState.foreachRDD( rdd -> {
>> rddIndex.add(1);
>> final St
;> HashPartitioner(context.sparkContext().defaultParallelism()),
>> initialRDD);
>>
>> wordCountsState.print();
>>
>> final Accumulator rddIndex =
>> context.sparkContext().accumulator(0);
>> wordCountsState.foreachRDD( rdd -> {
>>
try (final PrintStream out = new PrintStream(prefix +
> UUID.randomUUID())) {
> rdd.collect().forEach(s -> out.println(s));
> }
> return null;
> });
>
> context.start();
> context.awaitTermination();
> }
>
>
> On 17 J
efault behavior should be that batch X + 1 starts processing only after
> batch X completes. If you are using Spark 1.4.0, could you show us a
> screenshot of the streaming tab, especially the list of batches? And could
> you also tell us if you are setting any SparkConf configurations?
>
, Jun 17, 2015 at 12:22 PM, Michal Čizmazia wrote:
> Is it possible to achieve serial batching with Spark Streaming?
>
> Example:
>
> I configure the Streaming Context for creating a batch every 3 seconds.
>
> Processing of the batch #2 takes longer than 3 seconds and creates a
Is it possible to achieve serial batching with Spark Streaming?
Example:
I configure the Streaming Context for creating a batch every 3 seconds.
Processing of the batch #2 takes longer than 3 seconds and creates a
backlog of batches:
batch #1 takes 2s
batch #2 takes 10s
batch #3 takes 2s
batch
10 matches
Mail list logo