> ...automatically spawn another receiver on another machine or on the same machine.
>
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang wrote:
>
>> Dibyendu,
>>
>> Thanks for the reply.
>>
>> I am reading your project homepage now.
>> can enable log rotation etc.), and if you are doing groupBy, join, etc.
>> types of operations, then there will be a lot of shuffle data. So you need
>> to check the worker logs and see what happened (whether the disk is full, etc.).
>> We have streaming pipelines running for weeks...
> Thanks
> Best Regards
>
> On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang wrote:
>
Guys,
We have a project which builds upon Spark streaming.
We use Kafka as the input stream, and create 5 receivers.
When this application had been running for around 90 hours, all 5 receivers failed
for some unknown reason.
In my understanding, it is not guaranteed that a Spark streaming receiver
will d...
Guys,
I have a question regarding the Spark 1.1 broadcast implementation.
In our pipeline, we have a large multi-class LR model, which is about 1 GiB
in size.
To exploit the benefits of Spark's parallelism, a natural approach is to
broadcast this model file to the worker nodes.
However, it looks like bro...
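For context on why a ~1 GiB broadcast is heavy: Spark's torrent-style broadcast serializes the value and splits it into fixed-size blocks that executors fetch and re-assemble. Below is a minimal stdlib-only sketch of that split/re-assemble idea — purely illustrative, not Spark's actual code; the 4 MB block size mirrors the `spark.broadcast.blockSize` default, and the 10 MB payload is a made-up stand-in for a serialized model.

```python
# Illustrative only: mimics the block-splitting a torrent-style broadcast
# applies to a serialized value. Not Spark's actual implementation.
BLOCK_SIZE = 4 * 1024 * 1024  # mirrors the spark.broadcast.blockSize default (4 MB)

def to_blocks(payload: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a serialized value into fixed-size blocks for distribution."""
    return [payload[i:i + block_size] for i in range(0, len(payload), block_size)]

def from_blocks(blocks: list) -> bytes:
    """Re-assemble the value on an executor after all blocks are fetched."""
    return b"".join(blocks)

payload = b"x" * (10 * 1024 * 1024)  # stand-in for a serialized 10 MB model
blocks = to_blocks(payload)
assert len(blocks) == 3              # 4 MB + 4 MB + 2 MB
assert from_blocks(blocks) == payload
```

The point of the block structure is that executors can fetch different blocks from different peers, so the driver is not the sole source of all 1 GiB.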
Guys,
As to the question of pre-processing, you can just migrate your logic to
Spark before using K-means.
I have only used Scala on Spark, not the Python binding, but I
think the basic steps should be the same.
BTW, if your data set is big, with high-dimensional sparse feature vectors...
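On the sparse-feature point: the usual trick, in Scala or Python alike, is to store only the non-zero (index, value) pairs rather than dense arrays — the same idea behind MLlib's SparseVector. A stdlib-only sketch of that representation (the vocabulary and tokens here are made-up examples, not from the original thread):

```python
# Stdlib-only illustration of a sparse (index -> value) feature vector,
# the same idea as MLlib's SparseVector. Vocabulary/tokens are made up.
from collections import Counter

vocab = {"spark": 0, "kafka": 1, "streaming": 2, "kmeans": 3}

def to_sparse(tokens: list) -> dict:
    """Term counts keyed by feature index; absent indices are implicit zeros."""
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: float(c) for t, c in counts.items()}

vec = to_sparse(["spark", "spark", "kafka"])
assert vec == {0: 2.0, 1: 1.0}  # only non-zero entries are stored
```

With millions of dimensions but only a handful of non-zeros per row, this keeps both memory and shuffle volume proportional to the non-zero count.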
Guys,
Recently we have been migrating our backend pipeline to Spark.
In our pipeline, we have an MPI-based HAC implementation; to ensure the
results stay consistent through the migration, we also want to port this
MPI-implemented code to Spark.
However, during the migration process, I found that there are so...
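When validating such a port, it helps to have a tiny single-machine reference to compare results against. Below is a stdlib-only sketch of naive average-linkage HAC on 1-D points; the linkage choice and the stop-at-k-clusters rule are assumptions for illustration, not details taken from the MPI code in the thread.

```python
# Naive single-machine average-linkage HAC, usable as a reference when
# checking result consistency of a distributed port. Linkage and the
# stop-at-k rule are assumptions, not details from the original code.
def hac(points: list, k: int) -> list:
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]

    def dist(a, b):
        # Average pairwise distance between two clusters (average linkage).
        return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

result = hac([1.0, 1.1, 5.0, 5.2, 9.0], k=3)
assert sorted(map(sorted, result)) == [[1.0, 1.1], [5.0, 5.2], [9.0]]
```

It is O(n^3) and only fit for small validation samples, but running the same sample through both implementations is a cheap consistency check.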