Hello,

On Wed, Jun 8, 2016 at 12:59 AM, Mich Talebzadeh wrote:
>
> one thing you may consider is using something like flume to store
> data on hfs. [...]

Thank you for your sensible suggestions.

> Have you thought of other tools besides Spark?

No, at least not seriously yet. Flume does look like a good candidate, but distributed key-value stores (Cassandra, HBase, Redis Cluster) would also fit the bill, I guess. Of course, the lighter the better.

Other than that, does anyone have any comments on how to proceed with a Spark job? I understand this may not be the best solution, but, out of curiosity, I'd like to know whether there is an efficient way to solve my problem with the RDD API.

Thanks,

--
Jeroen