Yes, look at KafkaUtils.createRDD.
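For example, a minimal sketch of re-reading specific offsets as a plain batch job with the Spark 1.3 Kafka API (the topic name, broker list, and offset values below are placeholders, not from this thread):

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    object ReplayFailedPosts {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("replay-failed-posts"))

        // Direct connection to the brokers; hostnames are placeholders.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

        // One OffsetRange per (topic, partition, fromOffset, untilOffset) to
        // re-read; here these would be built from the offsets you stored
        // when a post failed.
        val offsetRanges = Array(
          OffsetRange("events", 0, 1100L, 1101L),
          OffsetRange("events", 3, 2042L, 2043L))

        // A non-streaming read of exactly those offsets; no StreamingContext needed.
        val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
          sc, kafkaParams, offsetRanges)

        rdd.foreach { case (_, value) =>
          // Retry the failed post with value here.
          println(value)
        }
      }
    }
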
On Wed, Jul 22, 2015 at 11:17 AM, Shushant Arora <shushantaror...@gmail.com> wrote:

> Thanks!
>
> I am using Spark Streaming 1.3, and if some post fails for any reason, I
> will store the offset of that message in another Kafka topic. I want to
> read those offsets in another Spark job and then fetch the original Kafka
> topic's messages based on them. So is it possible in a Spark job to get
> Kafka messages based on arbitrary offsets? Or is there a better
> alternative for handling the failure of a post request?
>
> On Wed, Jul 22, 2015 at 11:31 AM, Tathagata Das <t...@databricks.com> wrote:
>
>> Yes, you could unroll from the iterator in batches of 100-200 and then
>> post them in multiple rounds.
>> If you are using the Kafka receiver-based approach (not Direct), then the
>> raw Kafka data is stored in executor memory. If you are using Direct
>> Kafka, then it is read from Kafka directly at the time of filtering.
>>
>> TD
>>
>> On Tue, Jul 21, 2015 at 9:34 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>
>>> I can post multiple items at a time.
>>>
>>> Data is read from Kafka and filtered, and after that it is posted. Does
>>> foreachPartition load the complete partition into memory, or does it use
>>> an iterator under the hood? If the complete partition is not loaded, will
>>> posting in custom-sized batches of 100-200 requests help, instead of
>>> posting the whole partition in one request?
>>>
>>> On Wed, Jul 22, 2015 at 12:18 AM, Tathagata Das <t...@databricks.com> wrote:
>>>
>>>> If you can post multiple items at a time, then use foreachPartition to
>>>> post the whole partition in a single request.
>>>>
>>>> On Tue, Jul 21, 2015 at 9:35 AM, Richard Marscher <rmarsc...@localytics.com> wrote:
>>>>
>>>>> You can certainly create threads in a map transformation. We do this
>>>>> to do concurrent DB lookups during one stage, for example. I would
>>>>> recommend, however, that you switch from map to mapPartitions, as this
>>>>> allows you to create a fixed-size thread pool shared across the items
>>>>> of a partition, as opposed to spawning a future per record in the RDD.
>>>>>
>>>>> On Tue, Jul 21, 2015 at 4:11 AM, Shushant Arora <shushantaror...@gmail.com> wrote:
>>>>>
>>>>>> Hi
>>>>>>
>>>>>> Can I create user threads in executors?
>>>>>> I have a streaming app where, after processing, I have a requirement
>>>>>> to push events to an external system. Each post request costs ~90-100 ms.
>>>>>>
>>>>>> To make the posts parallel, I cannot use the same thread, because that
>>>>>> is limited by the number of cores available in the system. Can I use
>>>>>> user threads in a Spark app? I tried creating 2 threads in a map task
>>>>>> and it worked.
>>>>>>
>>>>>> Is there any upper limit on the number of user threads in a Spark
>>>>>> executor? Is it a good idea to create user threads in a Spark map task?
>>>>>>
>>>>>> Thanks
>>>>>
>>>>> --
>>>>> *Richard Marscher*
>>>>> Software Engineer
>>>>> Localytics
>>>>> Localytics.com <http://localytics.com/> | Our Blog
>>>>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics>
>>>>> | Facebook <http://facebook.com/localytics> | LinkedIn
>>>>> <http://www.linkedin.com/company/1148792?trk=tyah>
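As a footnote to the thread: TD's suggestion to unroll the iterator in batches might look roughly like the sketch below. postBatch and the batch size of 200 are placeholders for the real HTTP call and tuning, not anything from the thread:

    import org.apache.spark.rdd.RDD

    object BatchedPoster {
      // Placeholder for the real HTTP client call: one POST carrying a whole batch.
      def postBatch(records: Seq[String]): Unit = { /* httpClient.post(records) */ }

      // filtered is the filtered RDD (or the RDD of each streaming batch
      // inside foreachRDD).
      def postInBatches(filtered: RDD[String]): Unit = {
        filtered.foreachPartition { iter =>
          // Iterator#grouped pulls records lazily, 200 at a time, so the
          // whole partition is never materialized in memory at once.
          iter.grouped(200).foreach(postBatch)
        }
      }
    }

With the direct (receiver-less) Kafka approach this keeps memory bounded, since, as TD notes above, records are only read from Kafka as the iterator is consumed.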
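And Richard's mapPartitions-with-a-thread-pool pattern, again only as a sketch: the pool size of 8, the 30-second timeout, and postOne are assumptions for illustration.

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._
    import org.apache.spark.rdd.RDD

    object ConcurrentPoster {
      // Placeholder for the blocking ~90-100 ms POST of one event;
      // returns an HTTP status code.
      def postOne(event: String): Int = { /* httpClient.post(event) */ 200 }

      def postConcurrently(filtered: RDD[String]): RDD[Int] =
        filtered.mapPartitions { iter =>
          // One fixed-size pool per partition, shared by all of its records,
          // rather than one new thread (or future) per record.
          val pool = Executors.newFixedThreadPool(8)
          implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
          val statuses =
            try {
              // Launch the posts in parallel, then wait for all of them.
              val futures = iter.map(e => Future(postOne(e))).toList
              futures.map(f => Await.result(f, 30.seconds))
            } finally pool.shutdown()
          statuses.iterator
        }
    }

Note that .toList launches a future for every record in the partition up front; in a real job you would likely cap the number of in-flight requests (for example by combining this with grouped, as in the previous sketch) and reuse one pool per executor JVM rather than creating one per partition.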