Re: Reading from multiple Input path into single Resilient Distributed dataset?

Gary Malouf Mon, 16 Dec 2013 06:20:14 -0800

Hi Archit,

There are a lot of nice functions for joining key-value RDDs.
http://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.PairRDDFunctions

A common strategy I use is to take my two data sets, factor out a common
key, and then run one of the join functions to get the results I am looking
for. It is hard to advise further on your specific case because I do not
know your data structure, but I hope this helps.
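
To make the keying step concrete, here is a minimal sketch of that strategy.
The record types (User, Order), the helper joinById, and the sample values
are all made up for illustration and are not taken from your data. It also
assumes a Spark version that ships SparkConf and picks up the pair-RDD
functions automatically; on older releases you would additionally need
`import org.apache.spark.SparkContext._`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinSketch {
  // Hypothetical record types, invented purely for this example.
  case class User(id: Int, name: String)
  case class Order(userId: Int, amount: Double)

  // Factor out the common key on both sides, then join on it.
  def joinById(users: RDD[User], orders: RDD[Order]): RDD[(Int, (User, Order))] = {
    val usersById = users.map(u => (u.id, u))          // RDD[(Int, User)]
    val ordersByUser = orders.map(o => (o.userId, o))  // RDD[(Int, Order)]
    // Inner join: only keys present in both RDDs survive.
    usersById.join(ordersByUser)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val users = sc.parallelize(Seq(User(1, "ann"), User(2, "bob")))
      val orders = sc.parallelize(Seq(Order(1, 9.5), Order(1, 3.0), Order(3, 7.25)))

      joinById(users, orders).collect().foreach { case (id, (user, order)) =>
        println(s"$id ${user.name} ${order.amount}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

PairRDDFunctions also offers leftOuterJoin, rightOuterJoin and cogroup if you
need to keep keys that appear on only one side.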

Gary

On Mon, Dec 16, 2013 at 6:00 AM, Archit Thakur <[email protected]> wrote:

> Hi,
>
> I want to read multiple paths into a single RDD.
>
> I know I can do it this way:
> sc.sequenceFile("/data/new_rdd_/*", -, -, -)
>
> What if the paths belong to different directories, or maybe to different
> machines?
>
> Is the only way to join two RDDs, that is, to read the different paths
> into different RDDs and then join them all?
>
>
> But my real requirement is not to join all the RDDs but to MERGE them,
> i.e. append the 2nd to the 1st and so on.
>
> What is the best way for this?
>
> Thanks and Regards,
> Archit Thakur.
>
