Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
…at 7:26 PM, Akhil Das wrote:
> Hmm, for a single-core VM you will have to run it in local mode (specifying master=local[4]). The flag is available in all versions of Spark, I guess.
> On Aug 22, 2015 5:04 AM, "Sateesh Kavuri" wrote:
>> Thanks Akhi…

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
> …look at spark.streaming.concurrentJobs: by default it runs a single job. If you set it to 2, it can run 2 jobs in parallel. It's an experimental flag, but go ahead and give it a try.
> On Aug 21, 2015 3:36 AM, "Sateesh Kavuri" wrote:
>> Hi,
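A toy model of the effect described above, in plain Python rather than Spark. Our reading (an assumption, not something stated in the thread) is that spark.streaming.concurrentJobs sizes the thread pool that Spark Streaming's job scheduler uses to run batch jobs, so the sketch caps concurrency with a fixed-size `ThreadPoolExecutor`. `io_bound_batch` and `run_batches` are hypothetical names, and `time.sleep` stands in for disk or database I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_bound_batch(batch_id: int, io_seconds: float) -> int:
    """Hypothetical streaming batch job that spends its time blocked on I/O."""
    time.sleep(io_seconds)  # stands in for a disk or database call; the CPU is idle here
    return batch_id


def run_batches(num_batches: int, concurrent_jobs: int, io_seconds: float = 0.2) -> float:
    """Run num_batches jobs with at most concurrent_jobs in flight; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_jobs) as pool:
        # Fixed-size pool: a new job starts only when one of the workers is free.
        list(pool.map(lambda b: io_bound_batch(b, io_seconds), range(num_batches)))
    return time.perf_counter() - start


if __name__ == "__main__":
    for jobs in (1, 2):
        print(f"concurrent_jobs={jobs}: {run_batches(4, jobs):.2f}s")
```

With four batches of 0.2 s each, one job at a time should take roughly 0.8 s and two at a time roughly 0.4 s. The gain comes only from the jobs waiting on I/O; CPU-bound jobs on an already busy machine would gain little.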

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
> …d you not start the disk I/O in a separate thread, so that the scheduler can go ahead and assign other tasks?
> On 21 Aug 2015 16:06, "Sateesh Kavuri" wrote:
>> Hi,
>> My scenario goes like this:
>> I have an algorithm running in Spark Streaming mode on a…
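A minimal sketch of that suggestion in plain Python (not a Spark API): the blocking write is handed to a background thread, the caller carries on with CPU work, and it waits for the write before returning. `write_to_disk`, `cpu_work` and `process_partition` are hypothetical names. If a pattern like this runs inside a Spark task, the task should still wait for its outstanding I/O before finishing; otherwise it may be reported as done before the data is written.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def write_to_disk(records):
    """Hypothetical blocking sink; the sleep stands in for disk or database latency."""
    time.sleep(0.3)
    return len(records)


def cpu_work(records):
    """Hypothetical CPU-bound step that can run while the write is still pending."""
    return sum(r * r for r in records)


def process_partition(records):
    """Start the I/O in a background thread, compute meanwhile, then wait for the I/O."""
    with ThreadPoolExecutor(max_workers=1) as io_thread:
        pending = io_thread.submit(write_to_disk, records)  # returns immediately
        checksum = cpu_work(records)                       # overlaps with the write
        written = pending.result()                         # block until the write is done
    return written, checksum


if __name__ == "__main__":
    print(process_partition(list(range(10))))  # (10, 285)
```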

Spark streaming multi-tasking during I/O

2015-08-21 Thread Sateesh Kavuri
Hi,
My scenario goes like this: I have an algorithm running in Spark Streaming mode on a 4-core virtual machine. Most of the time, the algorithm is doing disk I/O and database I/O. The question is: during the I/O, when the CPU is not heavily loaded, is it possible to run any other task/thread so…

Re: Spark or Storm

2015-06-16 Thread Sateesh Kavuri
I am probably overloading the question a bit. In Storm, bolts can be triggered by events. Is that kind of functionality possible with Spark Streaming? During each phase of the data processing, the transformed data is stored to the database, and this transformed data should the…

Re: Spark ML decision list

2015-06-05 Thread Sateesh Kavuri
> …Jun 4, 2015 at 2:14 AM, Sateesh Kavuri wrote:
>> Hi,
>> I have used the Weka machine learning library to generate a model for my training set. I have used the PART algorithm (decision lists) from Weka.
>> Now, I would like to use Spark ML for t…

Spark ML decision list

2015-06-04 Thread Sateesh Kavuri
Hi,
I have used the Weka machine learning library to generate a model for my training set, using the PART algorithm (decision lists) from Weka. Now I would like to use Spark ML to run PART on my training set, but I could not find an equivalent. Could anyone point out the correspond…
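We are not aware of a PART learner in Spark ML, so there may be no direct counterpart to find. As an illustration only, this sketch shows what a learned decision list is: an ordered list of rules where the first rule whose conditions all hold decides the class, with a default class when none fires. It does not train anything, it is not Weka's or Spark's API, and the rules, feature names and labels are made up for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    conditions: List[Callable[[Record], bool]]  # a rule fires only if all conditions hold
    label: str


def predict(rules: List[Rule], default: str, record: Record) -> str:
    """Ordered decision list: the first rule whose conditions all hold decides the class."""
    for rule in rules:
        if all(cond(record) for cond in rule.conditions):
            return rule.label
    return default  # no rule fired: fall back to the default class


# Hypothetical rules, listed in the order a PART model printout would be read.
RULES = [
    Rule([lambda r: r["outlook"] == "sunny", lambda r: r["humidity"] > 75], "no"),
    Rule([lambda r: r["outlook"] == "overcast"], "yes"),
    Rule([lambda r: r["windy"]], "no"),
]

if __name__ == "__main__":
    # Overcast and windy: rule 2 fires before rule 3 is ever checked.
    print(predict(RULES, "yes", {"outlook": "overcast", "humidity": 80, "windy": True}))  # yes
```

Order is the essential property: a record may satisfy several rules, and only the earliest one counts.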

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
> …this would have the overall effect of decreasing performance if your required number of connections outstrips the database's resources.
> On Fri, Apr 3, 2015 at 12:22 AM Sateesh Kavuri wrote:
>> But this basically means that the pool is confined to the job (of a sin…

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
> …pache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
> On Thu, Apr 2, 2015 at 7:52 AM, Sateesh Kavuri wrote:
>> Right, I am aware of how to use connection pooling with Oracle, but the specific question is how to use it in the context of Spark job ex…

Re: Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
> …'t seem to be Spark specific, btw.
> On Apr 2, 2015, at 4:45 AM, Sateesh Kavuri wrote:
> > Hi,
> > We have a case where we will have to run concurrent jobs (for the same algorithm) on different data sets. And these jobs can ru…

Connection pooling in spark jobs

2015-04-02 Thread Sateesh Kavuri
Hi,
We have a case where we will have to run concurrent jobs (for the same algorithm) on different data sets. These jobs can run in parallel, and each of them would fetch its data from the database. We would like to optimize the database connections by using connection pooling.
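One common way to apply pooling in Spark roughly follows the foreachRDD design pattern in the Spark Streaming programming guide: a static, lazily initialized pool that lives once per worker process, with each partition borrowing a connection (rather than opening one per record) and returning it afterwards. The sketch below mimics that in plain Python so it runs without a cluster or database. `FakeConnection`, `ConnectionPool`, `get_pool` and `write_partition` are all hypothetical, and the thread pool in the main block stands in for tasks running concurrently in one process.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a real database connection."""
    opened = 0
    _count_lock = threading.Lock()

    def __init__(self):
        with FakeConnection._count_lock:
            FakeConnection.opened += 1
        self.rows = []

    def insert(self, row):
        self.rows.append(row)


class ConnectionPool:
    """At most `size` connections, opened only when needed and reused afterwards."""

    def __init__(self, size):
        self._idle = queue.Queue()
        self._slots = threading.BoundedSemaphore(size)

    def acquire(self):
        self._slots.acquire()  # blocks while `size` connections are checked out
        try:
            return self._idle.get_nowait()  # reuse an idle connection if there is one
        except queue.Empty:
            pass
        try:
            return FakeConnection()  # otherwise open a new one (still within `size`)
        except Exception:
            self._slots.release()  # do not leak the slot if opening fails
            raise

    def release(self, conn):
        self._idle.put(conn)
        self._slots.release()


_pool = None
_pool_lock = threading.Lock()


def get_pool(size=2):
    """Lazily create one pool per process; every task in that process shares it."""
    global _pool
    with _pool_lock:
        if _pool is None:
            _pool = ConnectionPool(size)
        return _pool


def write_partition(records):
    """Borrow one connection per partition (not per record) and always give it back."""
    pool = get_pool()
    conn = pool.acquire()
    try:
        for record in records:
            conn.insert(record)
    finally:
        pool.release(conn)


if __name__ == "__main__":
    partitions = [list(range(i * 10, i * 10 + 10)) for i in range(20)]
    with ThreadPoolExecutor(max_workers=8) as tasks:  # 8 concurrent "tasks"
        list(tasks.map(write_partition, partitions))
    print("connections opened:", FakeConnection.opened)  # at most 2
```

In PySpark the same function could be passed to `rdd.foreachPartition(write_partition)`, or to `dstream.foreachRDD(lambda rdd: rdd.foreachPartition(write_partition))` for a stream; the module-level pool would then exist once per Python worker process. Because every process holds its own pool, the database can see up to (number of processes) × (pool size) connections, which is the concern raised earlier in this thread about outstripping the database's resources.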