Guys,
We have a project which builds upon Spark Streaming.
We use Kafka as the input stream, and create 5 receivers.
After this application had been running for around 90 hours, all 5
receivers failed for unknown reasons.
In my understanding, it is not guaranteed that a Spark Streaming receiver
will be restarted automatically after it fails.
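For reference, the receivers are created roughly like this (sketch only,
against the Spark 1.x receiver-based Kafka API; the ZooKeeper quorum,
consumer group, and topic name below are placeholders, not our real
settings):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-streaming")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder connection settings.
    val zkQuorum = "zk1:2181,zk2:2181"
    val group = "consumer-group"
    val topics = Map("events" -> 1)

    // Create 5 receiver-based Kafka streams and union them, so the rest
    // of the job sees a single DStream.
    val streams = (1 to 5).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, group, topics,
        StorageLevel.MEMORY_AND_DISK_SER)
    }
    val unified = ssc.union(streams)

    unified.count().print()

    ssc.start()
    ssc.awaitTermination()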
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang wrote:
>
>> Guys,
>>
>> We have a project which builds upon Spark Streaming.
>>
>> We use Kafka as the input stream, and create 5 receivers.
>>
>> After this application had been running for around 90 hours, all 5
>> receivers failed for unknown reasons.
>> can enable log rotation etc.), and if you are doing groupBy, join, etc.
>> type of operations, then there will be a lot of shuffle data. So you need
>> to check the worker logs and see what happened (whether the disk filled up etc.).
>> We have streaming pipelines running for weeks.
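On the log rotation point: if I remember correctly, Spark 1.1+ can roll
executor logs by itself via the spark.executor.logs.rolling.* properties,
e.g.:

    import org.apache.spark.SparkConf

    // Roll executor stdout/stderr daily and keep only the last few files,
    // so a weeks-long streaming job does not slowly fill the disk.
    val conf = new SparkConf()
      .setAppName("long-running-streaming")
      .set("spark.executor.logs.rolling.strategy", "time")
      .set("spark.executor.logs.rolling.time.interval", "daily")
      .set("spark.executor.logs.rolling.maxRetainedFiles", "3")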
> If a receiver fails, Spark Streaming will automatically
> spawn another receiver on another machine or on the same machine.
>
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang wrote:
>
>> Dibyendu,
>>
>> Thanks for the reply.
>>
>> I am reading your project homepage now.
Guys,
Recently we are migrating our backend pipeline to Spark.
In our pipeline we have an MPI-based HAC (hierarchical agglomerative
clustering) implementation, and to ensure result consistency across the
migration, we also want to port this MPI code to Spark.
However, during the migration process, I found that there are so
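One concrete difficulty: every HAC merge step needs the closest pair over
all clusters, which on an RDD is naturally a cartesian product and
therefore quadratic. A rough sketch of a single merge step (the Cluster
type and centroid representation here are simplified stand-ins, not our
real code):

    import org.apache.spark.rdd.RDD

    // Simplified cluster representation: an id plus a centroid.
    case class Cluster(id: Long, centroid: Array[Double])

    def squaredDistance(a: Array[Double], b: Array[Double]): Double =
      a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

    // One HAC merge step: find the pair of clusters whose centroids are
    // closest. cartesian() makes this O(n^2) in the number of clusters,
    // which is the part that is hard to make fast on Spark.
    def closestPair(clusters: RDD[Cluster]): (Long, Long, Double) =
      clusters.cartesian(clusters)
        .filter { case (a, b) => a.id < b.id } // each unordered pair once
        .map { case (a, b) =>
          (a.id, b.id, squaredDistance(a.centroid, b.centroid))
        }
        .reduce((p, q) => if (p._3 <= q._3) p else q)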
Guys,
As to the pre-processing questions, you could just migrate your logic to
Spark before running K-means.
I have only used Scala on Spark, not the Python bindings, but I think the
basic steps must be the same.
BTW, if your data set is big, with huge high-dimensional sparse feature vectors
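In Scala the basic flow is only a few lines. A rough sketch, assuming an
existing SparkContext sc, a known feature dimensionality, and a made-up
input format of space-separated index:value pairs:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    val dim = 1000000 // assumed feature dimensionality

    // Pre-processing: parse "index:value" pairs into MLlib sparse
    // vectors before clustering (indices must be sorted ascending).
    val data = sc.textFile("hdfs:///path/to/features")
      .map { line =>
        val pairs = line.split(" ").map(_.split(":"))
        val indices = pairs.map(_(0).toInt)
        val values = pairs.map(_(1).toDouble)
        Vectors.sparse(dim, indices, values)
      }
      .cache()

    // k = 10 clusters, 20 iterations (both just example values).
    val model = KMeans.train(data, 10, 20)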
Guys,
I have a question regarding the broadcast implementation in Spark 1.1.
In our pipeline, we have a large multi-class LR (logistic regression)
model, which is about 1GiB in size.
To exploit Spark's parallelism, the natural approach is to
broadcast this model to the worker nodes.
However, it looks that bro
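For context, the pattern we are trying is the standard one below (sketch
only; Model, loadModel, score, and the paths are made-up stand-ins for our
real 1GiB model). If I remember correctly, Spark 1.1 switched the default
broadcast implementation to TorrentBroadcast, which distributes the value
in chunks instead of having every executor fetch the whole thing from the
driver:

    import org.apache.spark.{SparkConf, SparkContext}

    // Made-up stand-ins for the real 1GiB multi-class LR model.
    case class Model(weights: Array[Array[Double]])
    def loadModel(path: String): Model = ???
    def score(m: Model, features: Array[Double]): Int = ???

    val sc = new SparkContext(new SparkConf().setAppName("lr-broadcast"))

    // Load the model once on the driver and broadcast it, so each
    // executor JVM fetches one copy instead of one copy per task.
    val model = sc.broadcast(loadModel("hdfs:///path/to/model"))

    val predictions = sc.textFile("hdfs:///path/to/input")
      .map(_.split(" ").map(_.toDouble))
      .map(features => score(model.value, features))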