Re: can't union two rdds

2015-03-31 Thread ankurjain.nitrr
RDD union will result in:

  1 2
  3 4
  5 6
  7 8
  9 10
  11 12

What you are trying to do is a join.
There must be some logic or key on which to perform the join operation.

I think in your case you want the order (index) to be the joining key.
An RDD is a distributed data structure and is not well suited to your case.

If the amount of data is small, you can call rdd.collect on both RDDs, iterate
over the two lists together, and produce the desired result.
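
To make the difference concrete, here is a minimal sketch of the collect-and-iterate idea. It uses plain Python lists in place of the collected RDD contents, so it is not Spark code, and the row values and the names rows_a and rows_b are assumed for illustration only.

```python
# Plain Python lists stand in for the results of rdd.collect() on each RDD.
# The values below are assumed for illustration; they are not from the thread.
rows_a = [[1, 2], [3, 4], [5, 6]]
rows_b = [[7, 8], [9, 10], [11, 12]]

# A union only concatenates: same row width, more rows.
unioned = rows_a + rows_b

# Pairing by position uses the index as the join key, so both lists
# must have the same length.
if len(rows_a) != len(rows_b):
    raise ValueError("both lists must have the same length to pair by index")
paired = [a + b for a, b in zip(rows_a, rows_b)]

print(unioned)
print(paired)
```

This only makes sense when both collected lists fit comfortably in driver memory.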



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/can-t-union-two-rdds-tp22320p22323.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: can't union two rdds

2015-03-31 Thread roy
use zip
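
A note on what zip expects: as far as I know, RDD zip pairs elements by position and assumes both RDDs have the same number of partitions and the same number of elements in each partition. The sketch below models that constraint with plain Python lists of lists (one inner list per partition). The helper zip_partitions_model is hypothetical and is not part of Spark.

```python
from typing import List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_partitions_model(
    left: List[List[A]], right: List[List[B]]
) -> List[List[Tuple[A, B]]]:
    """Hypothetical model of a positional zip over partitioned data."""
    if len(left) != len(right):
        raise ValueError("partition counts differ")
    result = []
    for i, (lp, rp) in enumerate(zip(left, right)):
        # Each pair of partitions must hold the same number of elements.
        if len(lp) != len(rp):
            raise ValueError(f"partition {i} sizes differ: {len(lp)} vs {len(rp)}")
        result.append(list(zip(lp, rp)))
    return result


# Assumed sample data: two partitions on each side.
left = [["1 2", "3 4"], ["5 6"]]
right = [["7 8", "9 10"], ["11 12"]]
zipped = zip_partitions_model(left, right)
print(zipped)
```

If the two sides are partitioned differently, the positional pairing is not well defined, which is why the model raises an error instead of guessing.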



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/can-t-union-two-rdds-tp22320p22321.html



Re: UNION two RDDs

2014-12-22 Thread Jerry Lam
Hi Sean and Madhu,

Thank you for the explanation. I really appreciate it.

Best Regards,

Jerry


On Fri, Dec 19, 2014 at 4:50 AM, Sean Owen so...@cloudera.com wrote:

 coalesce actually changes the number of partitions. Unless the
 original RDD had just 1 partition, coalesce(1) will make an RDD with 1
 partition that is larger than the original partitions, of course.

 I don't think the question is about ordering of things within an
 element of the RDD?

 If the original RDD was sorted, and so has a defined ordering, then it
 will be preserved. Otherwise I believe you do not have any guarantees
 about ordering. In practice, you may find that you still encounter the
 elements in the same order after coalesce(1), although I am not sure
 that is even true.

 union() is the same story; unless the RDDs are sorted I don't think
 there are guarantees. However I'm almost certain that in practice, as
 it happens now, A's elements would come before B's after a union, if
 you did traverse them.

 On Fri, Dec 19, 2014 at 5:41 AM, madhu phatak phatak@gmail.com
 wrote:
  Hi,
  coalesce is an operation that changes the number of records in a partition.
  It will not touch the ordering within a row, AFAIK.
 
  On Fri, Dec 19, 2014 at 2:22 AM, Jerry Lam chiling...@gmail.com wrote:
 
  Hi Spark users,
 
  I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
  RDDA before records in RDDB.
 
  Also, will resultRDD.coalesce(1) change this ordering?
 
  Best Regards,
 
  Jerry
 
 
 
  --
  Regards,
  Madhukara Phatak
  http://www.madhukaraphatak.com



Re: UNION two RDDs

2014-12-19 Thread Sean Owen
coalesce actually changes the number of partitions. Unless the
original RDD had just 1 partition, coalesce(1) will make an RDD with 1
partition that is larger than the original partitions, of course.

I don't think the question is about ordering of things within an
element of the RDD?

If the original RDD was sorted, and so has a defined ordering, then it
will be preserved. Otherwise I believe you do not have any guarantees
about ordering. In practice, you may find that you still encounter the
elements in the same order after coalesce(1), although I am not sure
that is even true.

union() is the same story; unless the RDDs are sorted I don't think
there are guarantees. However I'm almost certain that in practice, as
it happens now, A's elements would come before B's after a union, if
you did traverse them.
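
As a rough illustration of the "in practice" behaviour described above, here is a small model using plain Python lists of lists, one inner list per partition. It is not Spark code and it does not demonstrate any guarantee. The names union_model and coalesce_to_one_model are hypothetical, and the model simply assumes that partitions are kept in index order.

```python
from itertools import chain

# Assumed sample data: each inner list is one partition.
rdd_a = [["a1", "a2"], ["a3"]]
rdd_b = [["b1"], ["b2", "b3"]]


def union_model(left, right):
    # Model: the partitions of `left` followed by the partitions of `right`,
    # with no partitions merged.
    return left + right


def coalesce_to_one_model(partitions):
    # Model: concatenate all partitions, in index order, into one partition.
    return [list(chain.from_iterable(partitions))]


result = union_model(rdd_a, rdd_b)
single = coalesce_to_one_model(result)
print(result)
print(single)
```

Under these assumptions A's elements come before B's, and the number of partitions drops from four to one while the single partition grows. If the order matters, sorting explicitly is the safer route.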

On Fri, Dec 19, 2014 at 5:41 AM, madhu phatak phatak@gmail.com wrote:
 Hi,
 coalesce is an operation that changes the number of records in a partition.
 It will not touch the ordering within a row, AFAIK.

 On Fri, Dec 19, 2014 at 2:22 AM, Jerry Lam chiling...@gmail.com wrote:

 Hi Spark users,

 I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
 RDDA before records in RDDB.

 Also, will resultRDD.coalesce(1) change this ordering?

 Best Regards,

 Jerry



 --
 Regards,
 Madhukara Phatak
 http://www.madhukaraphatak.com




UNION two RDDs

2014-12-18 Thread Jerry Lam
Hi Spark users,

I wonder if val resultRDD = RDDA.union(RDDB) will always have records in
RDDA before records in RDDB.

Also, will resultRDD.coalesce(1) change this ordering?

Best Regards,

Jerry