Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
> …tically spawn another receiver on another machine or on the same machine.
>
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang wrote:
>
>> Dibyendu,
>>
>> Thanks for the reply.
>>
>> I am reading your project homepage now.

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
>> can enable log rotation etc.) and if you are doing a groupBy, join, etc.
>> type of operations, then there will be a lot of shuffle data. So you need
>> to check in the worker logs and see what happened (whether DISK full etc.).
>> We have streaming pipelines running for wee…
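The log rotation mentioned in the reply can be turned on through Spark's executor log-rolling properties (available since Spark 1.1), so a long-running streaming job does not fill the worker disk with one unbounded stderr file. A minimal `spark-defaults.conf` sketch with illustrative values:

```properties
# Roll executor stdout/stderr logs by time so long-running streaming
# jobs do not accumulate a single unbounded log file on each worker.
spark.executor.logs.rolling.strategy          time
spark.executor.logs.rolling.time.interval     daily
spark.executor.logs.rolling.maxRetainedFiles  7
```

Note that this only bounds executor logs; shuffle spill files from groupBy/join are cleaned separately and still need disk headroom.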

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang wrote:
>
>> Guys,
>>
>> We have a project which builds upon Spark Streaming.
>>
>> We use Kafka as the input stream, and create 5 receivers.
>>
>> When this applicat…

Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Guys,

We have a project which builds upon Spark Streaming. We use Kafka as the input stream and create 5 receivers. When this application had run for around 90 hours, all 5 receivers failed for some unknown reason. In my understanding, it is not guaranteed that a Spark Streaming receiver will d…
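For context, the 5-receiver layout described above is typically built by creating several receiver-based Kafka streams and unioning them into one DStream. A minimal Scala sketch against the Spark 1.x receiver API — the ZooKeeper quorum, consumer group, and topic name are placeholders, not values from the thread:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MultiReceiverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-multi-receiver")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Hypothetical connection details -- replace with your own.
    val zkQuorum = "zk1:2181,zk2:2181"
    val group    = "my-consumer-group"
    val topics   = Map("events" -> 1)  // topic -> threads per receiver

    // Create 5 parallel receivers and union them into a single DStream.
    val streams = (1 to 5).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, group, topics)
    }
    val unified = ssc.union(streams)

    unified.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each `createStream` call occupies one executor core for its receiver, so the cluster needs more cores than receivers or processing will starve.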

Is It Feasible for Spark 1.1 Broadcast to Fully Utilize the Ethernet Card Throughput?

2015-01-09 Thread Jun Yang
Guys,

I have a question regarding the Spark 1.1 broadcast implementation. In our pipeline, we have a large multi-class LR model, about 1 GiB in size. To exploit Spark's parallelism, the natural approach is to broadcast this model file to the worker nodes. However, it looks like bro…
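A broadcast variable is the usual mechanism here: the model is fetched by each executor once, rather than being shipped inside every task closure. A minimal sketch — the tiny stand-in array replaces the real ~1 GiB weight vector, which would in practice be loaded from HDFS or similar:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-model"))

    // Stand-in for the ~1 GiB LR weight vector (loaded from storage in reality).
    val model: Array[Double] = Array.fill(1000)(0.5)

    // Broadcast once; each executor keeps a single read-only copy.
    val bcModel = sc.broadcast(model)

    val scores = sc.parallelize(1 to 100).map { i =>
      // Tasks read the broadcast value instead of serializing `model` per task.
      bcModel.value(i % bcModel.value.length) * i
    }
    println(scores.sum())
    sc.stop()
  }
}
```

On the bandwidth question: as far as I recall, Spark 1.1 made TorrentBroadcast the default, which distributes blocks between executors BitTorrent-style, so the driver's NIC is no longer the sole uplink and aggregate throughput scales with the number of executors.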

Re: k-means clustering

2014-11-20 Thread Jun Yang
Guys,

As to the question of pre-processing, you can simply migrate your logic to Spark before running k-means. I have only used Scala on Spark, not the Python binding, but I think the basic steps must be the same. BTW, if your data set is big, with huge sparse-dimension feature vector…
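MLlib's k-means accepts `Vector` inputs, so high-dimensional sparse features can be passed directly as `Vectors.sparse` rather than materializing dense arrays. A minimal Scala sketch — the dimensionality and values are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object SparseKMeansSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sparse-kmeans"))

    // Toy sparse vectors in a 10,000-dimensional feature space.
    val data = sc.parallelize(Seq(
      Vectors.sparse(10000, Seq((3, 1.0), (205, 2.0))),
      Vectors.sparse(10000, Seq((3, 1.1), (207, 1.9))),
      Vectors.sparse(10000, Seq((9001, 5.0)))
    )).cache()  // k-means iterates over the data, so caching matters

    val model = KMeans.train(data, k = 2, maxIterations = 20)
    model.clusterCenters.foreach(println)
    sc.stop()
  }
}
```

The same pattern exists in PySpark via `pyspark.mllib.clustering.KMeans` and `SparseVector`, consistent with the remark that the steps are the same across bindings.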

Questions Regarding to MPI Program Migration to Spark

2014-11-16 Thread Jun Yang
Guys,

Recently we have been migrating our backend pipeline to Spark. In our pipeline, we have an MPI-based HAC (hierarchical agglomerative clustering) implementation; to ensure the results stay consistent across the migration, we also want to port this MPI code to Spark. However, during the migration process, I found that there are so…
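As a hypothetical illustration of why HAC maps awkwardly onto Spark: each agglomeration step needs the globally closest pair of clusters, which in a naive RDD formulation becomes an O(n²) `cartesian` followed by a global reduce — far chattier than an MPI all-to-all. A sketch of one such merge-selection step (points and the Euclidean metric are made up, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ClosestPairSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hac-step"))

    // Toy points, keyed by cluster id.
    val pts = sc.parallelize(Seq(
      (0L, Array(0.0, 0.0)),
      (1L, Array(1.0, 0.0)),
      (2L, Array(5.0, 5.0))
    ))

    // Naive O(n^2) search for the closest pair -- one HAC merge step.
    val ((i, j), d) = pts.cartesian(pts)
      .filter { case ((a, _), (b, _)) => a < b }   // consider each pair once
      .map { case ((a, va), (b, vb)) =>
        val dist = math.sqrt(
          va.zip(vb).map { case (x, y) => (x - y) * (x - y) }.sum)
        ((a, b), dist)
      }
      .min()(Ordering.by(_._2))

    println(s"merge clusters $i and $j (distance $d)")
    sc.stop()
  }
}
```

Repeating this per merge gives O(n³) total work, which is why MPI HAC codes that keep a distance matrix in memory rarely port directly; an approximate or batched formulation is the usual compromise.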