Hello,

On Wed, Jun 8, 2016 at 12:59 AM, Mich Talebzadeh wrote:
>
> one thing you may consider is using something like Flume to store
> data on HDFS. [...]

Thank you for your sensible suggestions.

> Have you thought of other tools besides Spark?

No, at least not seriously yet. Flume does look like a good candidate,
but distributed key-value stores (Cassandra, HBase, Redis Cluster)
would fit the bill too, I guess. Of course, the lighter the better.

Other than that, does anyone have any comments on how to proceed with
a Spark job? I understand this may not be the best solution, but out
of curiosity I'd like to know whether there is an efficient way to
solve my problem with the RDD API.

Thanks,

--
Jeroen
