Is it that in Scala a derived class is allowed to declare a more specific return type for an overridden method?
And the streaming jar is originally written in Scala, so its DirectKafkaInputDStream is allowed to declare compute as returning Option[KafkaRDD[K, V, U, T, R]]?

On Tue, Aug 18, 2015 at 8:36 PM, Shushant Arora <shushantaror...@gmail.com> wrote:

> Looking at the source code of org.apache.spark.streaming.kafka.DirectKafkaInputDStream:
>
>   override def compute(validTime: Time): Option[KafkaRDD[K, V, U, T, R]] = {
>     val untilOffsets = clamp(latestLeaderOffsets(maxRetries))
>     val rdd = KafkaRDD[K, V, U, T, R](
>       context.sparkContext, kafkaParams, currentOffsets, untilOffsets, messageHandler)
>
>     currentOffsets = untilOffsets.map(kv => kv._1 -> kv._2.offset)
>     Some(rdd)
>   }
>
> But in DStream it is def compute(validTime: Time): Option[RDD[T]].
>
> So what should the return type be for a custom DStream that extends
> DirectKafkaInputDStream? I want the behaviour to be the same as
> DirectKafkaInputDStream in normal scenarios, and to return None in one specific scenario.
>
> And why did the same error not come up when DirectKafkaInputDStream itself extends
> InputDStream? Its return type Option[KafkaRDD[K, V, U, T, R]] is not a subclass of
> Option[RDD[T]], so shouldn't that have failed as well?
>
> On Tue, Aug 18, 2015 at 7:28 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> The superclass method in DStream is defined as returning an Option[RDD[T]].
>>
>> On Tue, Aug 18, 2015 at 8:07 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> I am getting a compilation error while overriding the compute method of
>>> DirectKafkaInputDStream:
>>>
>>> [ERROR] CustomDirectKafkaInputDstream.java:[51,83]
>>> compute(org.apache.spark.streaming.Time) in CustomDirectKafkaInputDstream
>>> cannot override compute(org.apache.spark.streaming.Time) in
>>> org.apache.spark.streaming.dstream.DStream; attempting to use incompatible
>>> return type
>>> [ERROR] found   : scala.Option<org.apache.spark.streaming.kafka.KafkaRDD<byte[],byte[],kafka.serializer.DefaultDecoder,kafka.serializer.DefaultDecoder,byte[][]>>
>>> [ERROR] required: scala.Option<org.apache.spark.rdd.RDD<byte[][]>>
>>>
>>> The class:
>>>
>>> public class CustomDirectKafkaInputDstream extends
>>>     DirectKafkaInputDStream<byte[], byte[], kafka.serializer.DefaultDecoder,
>>>         kafka.serializer.DefaultDecoder, byte[][]> {
>>>
>>>   @Override
>>>   public Option<KafkaRDD<byte[], byte[], DefaultDecoder, DefaultDecoder, byte[][]>> compute(
>>>       Time validTime) {
>>>     int processed = processedCounter.value();
>>>     int failed = failedProcessingsCounter.value();
>>>     if (processed == failed) {
>>>       System.out.println("backing off since its 100 % failure");
>>>       return Option.empty();
>>>     } else {
>>>       System.out.println("starting the stream ");
>>>       return super.compute(validTime);
>>>     }
>>>   }
>>> }
>>>
>>> What should the return type of the compute method be? The superclass returns
>>> Option<KafkaRDD<byte[], byte[], DefaultDecoder, DefaultDecoder, byte[][]>>, but the
>>> compiler expects scala.Option<org.apache.spark.rdd.RDD<byte[][]>> from the derived
>>> class. Is there something wrong with the code?
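(My current guess at how to make that override compile from Java, completely unverified: declare the return type exactly as DStream does, Option<RDD<byte[][]>>, and cast the result of super.compute(), since Java generics are invariant even though Scala's Option is covariant. A rough sketch of the whole class follows; the backOff flag is just a placeholder for the accumulator check in my earlier code:)

import kafka.common.TopicAndPartition;
import kafka.message.MessageAndMetadata;
import kafka.serializer.DefaultDecoder;
import org.apache.spark.rdd.RDD;
import org.apache.spark.streaming.StreamingContext;
import org.apache.spark.streaming.Time;
import org.apache.spark.streaming.kafka.DirectKafkaInputDStream;
import scala.Function1;
import scala.Option;
import scala.reflect.ClassTag;

public class CustomDirectKafkaInputDstream
    extends DirectKafkaInputDStream<byte[], byte[], DefaultDecoder, DefaultDecoder, byte[][]> {

  // Placeholder for the "100% failure" accumulator check from my earlier code.
  private volatile boolean backOff = false;

  public CustomDirectKafkaInputDstream(
      StreamingContext ssc,
      scala.collection.immutable.Map<String, String> kafkaParams,
      scala.collection.immutable.Map<TopicAndPartition, Object> fromOffsets,
      Function1<MessageAndMetadata<byte[], byte[]>, byte[][]> messageHandler,
      ClassTag<byte[]> keyTag, ClassTag<byte[]> valueTag,
      ClassTag<DefaultDecoder> keyDecoderTag, ClassTag<DefaultDecoder> valueDecoderTag,
      ClassTag<byte[][]> recordTag) {
    super(ssc, kafkaParams, fromOffsets, messageHandler,
        keyTag, valueTag, keyDecoderTag, valueDecoderTag, recordTag);
  }

  // Declared return type matches DStream's compute, not the narrowed one in
  // DirectKafkaInputDStream.
  @SuppressWarnings("unchecked")
  @Override
  public Option<RDD<byte[][]>> compute(Time validTime) {
    if (backOff) {
      return Option.empty();   // consume nothing in this batch
    }
    // In Scala, DirectKafkaInputDStream.compute returns Option[KafkaRDD[...]], which already
    // is an Option[RDD[byte[][]]] there because Option is covariant. Java generics are
    // invariant, so from Java the result has to be cast explicitly (KafkaRDD still extends RDD).
    return (Option<RDD<byte[][]>>) (Option<?>) super.compute(validTime);
  }
}

Does that look right, or is there a cleaner way to do it from Java?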
>>>
>>> On Mon, Aug 17, 2015 at 7:08 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>>> Look at the definitions of the java-specific KafkaUtils.createDirectStream methods
>>>> (the ones that take a JavaStreamingContext).
>>>>
>>>> On Mon, Aug 17, 2015 at 5:13 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>>
>>>>> How do I create a ClassTag in Java? Also, the constructor of
>>>>> DirectKafkaInputDStream takes a Function1, not a Function, whereas
>>>>> KafkaUtils.createDirectStream accepts a Function.
>>>>>
>>>>> I have the following as the overridden DirectKafkaInputDStream:
>>>>>
>>>>> public class CustomDirectKafkaInputDstream extends
>>>>>     DirectKafkaInputDStream<byte[], byte[], kafka.serializer.DefaultDecoder,
>>>>>         kafka.serializer.DefaultDecoder, byte[][]> {
>>>>>
>>>>>   public CustomDirectKafkaInputDstream(
>>>>>       StreamingContext ssc_,
>>>>>       Map<String, String> kafkaParams,
>>>>>       Map<TopicAndPartition, Object> fromOffsets,
>>>>>       Function1<MessageAndMetadata<byte[], byte[]>, byte[][]> messageHandler,
>>>>>       ClassTag<byte[]> evidence$1, ClassTag<byte[]> evidence$2,
>>>>>       ClassTag<DefaultDecoder> evidence$3,
>>>>>       ClassTag<DefaultDecoder> evidence$4, ClassTag<byte[][]> evidence$5) {
>>>>>     super(ssc_, kafkaParams, fromOffsets, messageHandler, evidence$1, evidence$2,
>>>>>         evidence$3, evidence$4, evidence$5);
>>>>>   }
>>>>>
>>>>>   @Override
>>>>>   public Option<KafkaRDD<byte[], byte[], DefaultDecoder, DefaultDecoder, byte[][]>> compute(
>>>>>       Time validTime) {
>>>>>     int processed = processedCounter.value();
>>>>>     int failed = failedProcessingsCounter.value();
>>>>>     if (processed == failed) {
>>>>>       System.out.println("backing off since its 100 % failure");
>>>>>       return Option.empty();
>>>>>     } else {
>>>>>       System.out.println("starting the stream ");
>>>>>       return super.compute(validTime);
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> To create this stream I am using:
>>>>>
>>>>> scala.collection.immutable.Map<String, String> scalakafkaParams =
>>>>>     JavaConverters.mapAsScalaMapConverter(kafkaParams).asScala()
>>>>>         .toMap(Predef.<Tuple2<String, String>>conforms());
>>>>> scala.collection.immutable.Map<TopicAndPartition, Long> scalaktopicOffsetMap =
>>>>>     JavaConverters.mapAsScalaMapConverter(topicOffsetMap).asScala()
>>>>>         .toMap(Predef.<Tuple2<TopicAndPartition, Long>>conforms());
>>>>>
>>>>> scala.Function1<MessageAndMetadata<byte[], byte[]>, byte[][]> handler =
>>>>>     new Function<MessageAndMetadata<byte[], byte[]>, byte[][]>() { .. });
>>>>>
>>>>> JavaDStream<byte[][]> directKafkaStream = new CustomDirectKafkaInputDstream(
>>>>>     jssc, scalakafkaParams, scalaktopicOffsetMap, handler,
>>>>>     byte[].class, byte[].class, kafka.serializer.DefaultDecoder.class,
>>>>>     kafka.serializer.DefaultDecoder.class, byte[][].class);
>>>>>
>>>>> How do I pass a ClassTag to the constructor of CustomDirectKafkaInputDstream?
>>>>> And how do I use a Function instead of a Function1?
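(For the ClassTag / Function1 part above, what I am planning to try, again just my own guess at mimicking what the java-specific KafkaUtils.createDirectStream overloads do internally: build the ClassTags from plain Class objects via the ClassTag companion object, write the message handler as a Serializable subclass of scala.runtime.AbstractFunction1, and wrap the custom stream in a JavaDStream myself. ByteArrayMessageHandler and CustomStreamFactory are names I made up; the constructor is the one from my CustomDirectKafkaInputDstream sketch above:)

import java.io.Serializable;
import kafka.common.TopicAndPartition;
import kafka.message.MessageAndMetadata;
import kafka.serializer.DefaultDecoder;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

// A scala.Function1 that is also Serializable, since Spark ships the message handler to executors.
class ByteArrayMessageHandler
    extends AbstractFunction1<MessageAndMetadata<byte[], byte[]>, byte[][]>
    implements Serializable {
  @Override
  public byte[][] apply(MessageAndMetadata<byte[], byte[]> mmd) {
    return new byte[][] { mmd.key(), mmd.message() };
  }
}

public final class CustomStreamFactory {
  public static JavaDStream<byte[][]> create(
      JavaStreamingContext jssc,
      scala.collection.immutable.Map<String, String> scalaKafkaParams,
      scala.collection.immutable.Map<TopicAndPartition, Object> scalaTopicOffsetMap) {

    // ClassTags built from plain Class objects through the companion object.
    ClassTag<byte[]> bytesTag = ClassTag$.MODULE$.apply(byte[].class);
    ClassTag<DefaultDecoder> decoderTag = ClassTag$.MODULE$.apply(DefaultDecoder.class);
    ClassTag<byte[][]> recordTag = ClassTag$.MODULE$.apply(byte[][].class);

    // jssc.ssc() exposes the underlying Scala StreamingContext that the constructor wants.
    CustomDirectKafkaInputDstream stream = new CustomDirectKafkaInputDstream(
        jssc.ssc(), scalaKafkaParams, scalaTopicOffsetMap,
        new ByteArrayMessageHandler(),
        bytesTag, bytesTag, decoderTag, decoderTag, recordTag);

    // Wrap it for the Java API; JavaDStream's constructor also takes the element ClassTag.
    return new JavaDStream<byte[][]>(stream, recordTag);
  }
}

I went with AbstractFunction1 rather than implementing Function1 directly so I don't have to provide compose/andThen by hand. Please correct me if this wiring is wrong.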
>>>>>
>>>>> On Thu, Aug 13, 2015 at 12:16 AM, Cody Koeninger <c...@koeninger.org> wrote:
>>>>>
>>>>>> I'm not aware of an existing api per se, but you could create your own subclass of
>>>>>> the DStream that returns None for compute() under certain conditions.
>>>>>>
>>>>>> On Wed, Aug 12, 2015 at 1:03 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Cody
>>>>>>>
>>>>>>> Can you help here: does streaming 1.3 have any api for not consuming any messages
>>>>>>> in the next few runs?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> ---------- Forwarded message ----------
>>>>>>> From: Shushant Arora <shushantaror...@gmail.com>
>>>>>>> Date: Wed, Aug 12, 2015 at 11:23 PM
>>>>>>> Subject: spark streaming 1.3 doubts(force it to not consume anything)
>>>>>>> To: user <user@spark.apache.org>
>>>>>>>
>>>>>>> I can't make my streaming application's batch interval change at run time. It is
>>>>>>> always fixed: it always creates jobs at the specified batch interval and enqueues
>>>>>>> them if the earlier batch is not finished.
>>>>>>>
>>>>>>> My requirement is to process the events and post them to an external server, and
>>>>>>> if the external server is down I want to increase the batch time. Since that is
>>>>>>> not possible, can I instead make it not consume any messages in, say, the next 5
>>>>>>> successive runs?
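(Coming back to that original requirement: the rough idea I have now, combining Cody's suggestion with the compute() sketch earlier in this mail, is to keep a simple skip counter inside the custom DStream. Completely untested; skipNextBatches() would have to be called from driver-side code, e.g. inside a foreachRDD block, when the external server is detected to be down:)

// Additional members for the CustomDirectKafkaInputDstream sketch above,
// with compute() reworked to skip a fixed number of batches.

private int batchesToSkip = 0;

// Called from driver-side code when the external server is down.
public void skipNextBatches(int n) {
  batchesToSkip = n;
}

@SuppressWarnings("unchecked")
@Override
public Option<RDD<byte[][]>> compute(Time validTime) {
  if (batchesToSkip > 0) {
    batchesToSkip--;
    // Returning None means nothing is read from Kafka for this interval. currentOffsets
    // only advance inside super.compute(), so the skipped data should be picked up once
    // the stream resumes.
    return Option.empty();
  }
  return (Option<RDD<byte[][]>>) (Option<?>) super.compute(validTime);
}

Is that roughly what you meant by returning None from compute() under certain conditions?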