Re: user threads in executors

2015-07-22 Thread Shushant Arora
Thanks !

I am using Spark Streaming 1.3. If a post fails for any reason, I will store
the offset of that message in another Kafka topic. I want to read these
offsets in another Spark job and fetch the original Kafka topic's messages
based on them.
So is it possible in a Spark job to get Kafka messages at arbitrary
offsets? Or is there a better alternative for handling failure of a post
request?


Re: user threads in executors

2015-07-22 Thread Cody Koeninger
Yes, look at KafkaUtils.createRDD
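
Offsets stored as (topic, partition, offset) records can be turned into the half-open ranges that KafkaUtils.createRDD's OffsetRange expects: re-reading a single message means the range [offset, offset + 1). A minimal sketch in plain Python, with the actual Spark call shown only as a comment; `to_ranges` and the stored-record shape are illustrative assumptions, not code from this thread:

```python
# A stored failure record is assumed to be a (topic, partition, offset) tuple.
def to_ranges(failed):
    # Each stored offset becomes a half-open range [offset, offset + 1),
    # the shape KafkaUtils.createRDD's OffsetRange expects.
    return [(topic, part, off, off + 1) for (topic, part, off) in failed]

ranges = to_ranges([("events", 0, 42), ("events", 1, 7)])
# -> [("events", 0, 42, 43), ("events", 1, 7, 8)]

# With spark-streaming-kafka on the classpath this would become (Scala sketch):
#   val offsetRanges = ranges.map { case (t, p, from, until) =>
#     OffsetRange.create(t, p, from, until) }.toArray
#   val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
#     sc, kafkaParams, offsetRanges)
```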


Re: user threads in executors

2015-07-22 Thread Tathagata Das
Yes, you could unroll from the iterator in batches of 100-200 and then post
them in multiple rounds.
If you are using the Kafka receiver-based approach (not Direct), then the
raw Kafka data is stored in executor memory. If you are using Direct
Kafka, then it is read from Kafka directly at the time of filtering.

TD
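
The unrolling TD describes can be sketched without Spark: consume the partition iterator in rounds of a fixed batch size, so only one small batch is materialized in memory at a time. `post_batch` is a hypothetical stand-in for the HTTP post; in a real job this function would be the body passed to foreachPartition:

```python
from itertools import islice

requests = []

# Stub standing in for one batched HTTP post request.
def post_batch(batch):
    requests.append(len(batch))

# foreachPartition body: unroll the iterator in rounds of batch_size so
# only one batch of records is ever materialized at a time.
def post_in_batches(records, batch_size=200):
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        post_batch(batch)

post_in_batches(("e%d" % i for i in range(450)), batch_size=200)
# 450 events in rounds of 200 -> three requests of size 200, 200, 50
```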


Re: user threads in executors

2015-07-21 Thread Richard Marscher
You can certainly create threads in a map transformation; we do this to run
concurrent DB lookups during one stage, for example. I would recommend,
however, that you switch from map to mapPartitions, as that allows you to
create a fixed-size thread pool shared across the items of a partition, as
opposed to spawning a future per record in the RDD.
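
Richard's pattern, sketched in plain Python: one bounded pool per partition, shared across all records, rather than one thread per record. `post` and the pool size of 8 are assumptions for illustration; in a real job the function below would be passed to rdd.mapPartitions:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a ~90-100 ms blocking HTTP post.
def post(event):
    return "posted:" + event

# mapPartitions body: a fixed-size pool per partition, shared across all
# records in the iterator, instead of spawning a thread per record.
def post_partition(records):
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map runs posts concurrently but preserves input order
        return list(pool.map(post, records))

# In a real job (requires Spark): rdd.mapPartitions(post_partition)
out = post_partition(iter(["a", "b", "c"]))
# -> ["posted:a", "posted:b", "posted:c"]
```

The pool size, not the executor's core count, now bounds the concurrency of the blocking posts, which is the point of the pattern.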




-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com http://localytics.com/ | Our Blog
http://localytics.com/blog | Twitter http://twitter.com/localytics |
Facebook http://facebook.com/localytics | LinkedIn
http://www.linkedin.com/company/1148792?trk=tyah


Re: user threads in executors

2015-07-21 Thread Shushant Arora
I can post multiple items at a time.

Data is being read from Kafka and filtered, and after that it's posted.
Does foreachPartition load the complete partition in memory, or does it use
an iterator of batches under the hood? If the complete batch is not loaded,
will batching 100-200 records per post request help, instead of posting the
whole partition at once?


Re: user threads in executors

2015-07-21 Thread Tathagata Das
If you can post multiple items at a time, then use foreachPartition to post
the whole partition in a single request.
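
As a sketch (plain Python, no Spark; `post_all` is a hypothetical stand-in for the HTTP client), the foreachPartition pattern issues one request per partition rather than one per record:

```python
request_count = 0

# Stub standing in for one HTTP request carrying many events.
def post_all(events):
    global request_count
    request_count += 1

# foreachPartition body: one request per partition instead of one per record.
def post_whole_partition(records):
    post_all(list(records))  # materializes the partition; fine if partitions are small

# In a real job: rdd.foreachPartition(post_whole_partition)
for part in [iter(["a", "b"]), iter(["c", "d", "e"])]:
    post_whole_partition(part)
# five records across two partitions -> two requests
```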




user threads in executors

2015-07-21 Thread Shushant Arora
Hi

Can I create user threads in executors?
I have a streaming app where, after processing, I have a requirement to push
events to an external system. Each post request costs ~90-100 ms.

To make the posts parallel, I cannot use the same thread, because that is
limited by the number of cores available in the system. Can I use user
threads in a Spark app? I tried to create 2 threads in a map task and it
worked.

Is there any upper limit on the number of user threads in a Spark executor?
Is it a good idea to create user threads in a Spark map task?

Thanks