Re: Joining by values

2015-01-03 Thread Dilip Movva
Thanks Sanjay. I will give it a try.

Thanks
Dilip

On Sat, Jan 3, 2015 at 11:25 PM, Sanjay Subramanian <
sanjaysubraman...@yahoo.com> wrote:

> so I changed the code to
>
> rdd1InvIndex.join(rdd2Pair).map(str => str._2).groupByKey().map(str => 
> (str._1,str._2.toList)).collect().foreach(println)
>
> Now it prints. Don't worry, I will work on this to not output as List(...),
> but I hope this answers the JOIN question that @Dilip asked :-)
>
> (2,List(1001,1000,1002,1003, 1004,1001,1006,1007))
> (3,List(1011,1012,1013,1010, 1007,1009,1005,1008))
> (1,List(1001,1000,1002,1003, 1011,1012,1013,1010, 1004,1001,1006,1007,
> 1007,1009,1005,1008))
>
>   --
>  *From:* Shixiong Zhu 
> *To:* Sanjay Subramanian 
> *Cc:* dcmovva ; "user@spark.apache.org" <
> user@spark.apache.org>
> *Sent:* Saturday, January 3, 2015 8:15 PM
>
> *Subject:* Re: Joining by values
>
> call `map(_.toList)` to convert `CompactBuffer` to `List`
>
> Best Regards,
> Shixiong Zhu
>
> 2015-01-04 12:08 GMT+08:00 Sanjay Subramanian <
> sanjaysubraman...@yahoo.com.invalid>:
>
>
> hi
> Take a look at the code here I wrote
>
> https://raw.githubusercontent.com/sanjaysubramanian/msfx_scala/master/src/main/scala/org/medicalsidefx/common/utils/PairRddJoin.scala
>
> /*rdd1.txt
>
> 1~4,5,6,7
> 2~4,5
> 3~6,7
>
> rdd2.txt
>
> 4~1001,1000,1002,1003
> 5~1004,1001,1006,1007
> 6~1007,1009,1005,1008
> 7~1011,1012,1013,1010
>
> */
> val sconf = new 
> SparkConf().setMaster("local").setAppName("MedicalSideFx-PairRddJoin")
> val sc = new SparkContext(sconf)
>
>
> val rdd1 = "/path/to/rdd1.txt"
> val rdd2 = "/path/to/rdd2.txt"
>
> val rdd1InvIndex = sc.textFile(rdd1).map(x => (x.split('~')(0), 
> x.split('~')(1))).flatMapValues(str => str.split(',')).map(str => (str._2, 
> str._1))
> val rdd2Pair = sc.textFile(rdd2).map(str => (str.split('~')(0), 
> str.split('~')(1)))
> rdd1InvIndex.join(rdd2Pair).map(str => 
> str._2).groupByKey().collect().foreach(println)
>
>
> This outputs the following. I think this may be essentially what you are
> looking for.
>
> (I have to understand how to NOT print as CompactBuffer)
>
> (2,CompactBuffer(1001,1000,1002,1003, 1004,1001,1006,1007))
> (3,CompactBuffer(1011,1012,1013,1010, 1007,1009,1005,1008))
> (1,CompactBuffer(1001,1000,1002,1003, 1011,1012,1013,1010, 
> 1004,1001,1006,1007, 1007,1009,1005,1008))
>
>
>   --
>  *From:* Sanjay Subramanian 
> *To:* dcmovva ; "user@spark.apache.org" <
> user@spark.apache.org>
> *Sent:* Saturday, January 3, 2015 12:19 PM
> *Subject:* Re: Joining by values
>
> This is my design. Now let me try and code it in Spark.
>
> rdd1.txt
> =
> 1~4,5,6,7
> 2~4,5
> 3~6,7
>
> rdd2.txt
> 
> 4~1001,1000,1002,1003
> 5~1004,1001,1006,1007
> 6~1007,1009,1005,1008
> 7~1011,1012,1013,1010
>
> TRANSFORM 1
> ===
> map each value to key (like an inverted index)
> 4~1
> 5~1
> 6~1
> 7~1
> 5~2
> 4~2
> 6~3
> 7~3
>
> TRANSFORM 2
> ===
> Join keys in transform 1 and rdd2
> 4~1,1001,1000,1002,1003
> 4~2,1001,1000,1002,1003
> 5~1,1004,1001,1006,1007
> 5~2,1004,1001,1006,1007
> 6~1,1007,1009,1005,1008
> 6~3,1007,1009,1005,1008
> 7~1,1011,1012,1013,1010
> 7~3,1011,1012,1013,1010
>
> TRANSFORM 3
> ===
> Split key in transform 2 with "~" and keep key(1) i.e. 1,2,3
> 1~1001,1000,1002,1003
> 2~1001,1000,1002,1003
> 1~1004,1001,1006,1007
> 2~1004,1001,1006,1007
> 1~1007,1009,1005,1008
> 3~1007,1009,1005,1008
> 1~1011,1012,1013,1010
> 3~1011,1012,1013,1010
>
> TRANSFORM 4
> ===
> join by key
>
> 1~1001,1000,1002,1003,1004,1001,1006,1007,1007,1009,1005,1008,1011,1012,1013,1010
> 2~1001,1000,1002,1003,1004,1001,1006,1007
> 3~1007,1009,1005,1008,1011,1012,1013,1010
>
>
>
>
>  --
>  *From:* dcmovva 
> *To:* user@spark.apache.org
> *Sent:* Saturday, January 3, 2015 10:10 AM
> *Subject:* Joining by values
>
> I have two pair RDDs in Spark like this
>
> rdd1 = (1 -> [4,5,6,7])
>   (2 -> [4,5])
>   (3 -> [6,7])
>
>
> rdd2 = (4 -> [1001,1000,1002,1003])
>   (5 -> [1004,1001,1006,1007])
>   (6 -> [1007,1009,1005,1008])
>   (7 -> [1011,1012,1013,1010])
> I would like to combine them to look like this.
>
> joinedRdd = (1 ->
> [1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
> (2 -> [1000,1001,1002,1003,1004,1006,1007])
> (3 -> [1005,1007,1008,1009,1010,1011,1012,1013])
>
>
> Can someone suggest how to do this?
>
> Thanks Dilip
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-values-tp20954.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>


Re: Joining by values

2015-01-03 Thread Sanjay Subramanian
so I changed the code to

rdd1InvIndex.join(rdd2Pair).map(str => str._2).groupByKey().map(str => (str._1,str._2.toList)).collect().foreach(println)

Now it prints. Don't worry, I will work on this to not output as List(...), but I hope this answers the JOIN question that @Dilip asked :-)

(2,List(1001,1000,1002,1003, 1004,1001,1006,1007))
(3,List(1011,1012,1013,1010, 1007,1009,1005,1008))
(1,List(1001,1000,1002,1003, 1011,1012,1013,1010, 1004,1001,1006,1007, 1007,1009,1005,1008))
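For reference, here is a minimal end-to-end sketch of the same approach (the file paths, object name and the final mapValues step are assumptions for illustration, not part of the code above; that extra step splits the comma-separated strings, drops duplicates and sorts, so the values come out the way Dilip originally asked for):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed on older Spark releases)

object PairRddJoinSketch {
  def main(args: Array[String]): Unit = {
    val sconf = new SparkConf().setMaster("local").setAppName("PairRddJoinSketch")
    val sc = new SparkContext(sconf)

    // invert rdd1 so each value points back at its key: (4,1), (5,1), ...
    val rdd1InvIndex = sc.textFile("/path/to/rdd1.txt")
      .map(x => (x.split('~')(0), x.split('~')(1)))
      .flatMapValues(_.split(','))
      .map { case (k, v) => (v, k) }

    // rdd2 keyed on the same field: (4,"1001,1000,1002,1003"), ...
    val rdd2Pair = sc.textFile("/path/to/rdd2.txt")
      .map(str => (str.split('~')(0), str.split('~')(1)))

    rdd1InvIndex.join(rdd2Pair)   // (4,(1,"1001,1000,1002,1003")), ...
      .map(_._2)                  // keep (originalKey, valueString)
      .groupByKey()
      // assumed extra step: split the strings, de-duplicate and sort the numbers
      .mapValues(_.flatMap(_.split(',')).map(_.trim.toInt).toList.distinct.sorted)
      .collect()
      .foreach(println)

    sc.stop()
  }
}

With that extra step key 2, for example, prints as (2,List(1000, 1001, 1002, 1003, 1004, 1006, 1007)), which matches the joinedRdd Dilip described.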
From: Shixiong Zhu
To: Sanjay Subramanian
Cc: dcmovva; "user@spark.apache.org"
Sent: Saturday, January 3, 2015 8:15 PM
Subject: Re: Joining by values
call `map(_.toList)` to convert `CompactBuffer` to `List`
Best Regards,
Shixiong Zhu
2015-01-04 12:08 GMT+08:00 Sanjay Subramanian:



hi
Take a look at the code here I wrote:
https://raw.githubusercontent.com/sanjaysubramanian/msfx_scala/master/src/main/scala/org/medicalsidefx/common/utils/PairRddJoin.scala

/*rdd1.txt

1~4,5,6,7
2~4,5
3~6,7

rdd2.txt

4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010

*/
val sconf = new 
SparkConf().setMaster("local").setAppName("MedicalSideFx-PairRddJoin")
val sc = new SparkContext(sconf)


val rdd1 = "/path/to/rdd1.txt"
val rdd2 = "/path/to/rdd2.txt"

val rdd1InvIndex = sc.textFile(rdd1).map(x => (x.split('~')(0), 
x.split('~')(1))).flatMapValues(str => str.split(',')).map(str => (str._2, 
str._1))
val rdd2Pair = sc.textFile(rdd2).map(str => (str.split('~')(0), 
str.split('~')(1)))
rdd1InvIndex.join(rdd2Pair).map(str => 
str._2).groupByKey().collect().foreach(println)

This outputs the following. I think this may be essentially what you are looking for.

(I have to understand how to NOT print as CompactBuffer)

(2,CompactBuffer(1001,1000,1002,1003, 1004,1001,1006,1007))
(3,CompactBuffer(1011,1012,1013,1010, 1007,1009,1005,1008))
(1,CompactBuffer(1001,1000,1002,1003, 1011,1012,1013,1010, 1004,1001,1006,1007, 1007,1009,1005,1008))

From: Sanjay Subramanian
To: dcmovva; "user@spark.apache.org"
Sent: Saturday, January 3, 2015 12:19 PM
Subject: Re: Joining by values
This is my design. Now let me try and code it in Spark.
rdd1.txt
=
1~4,5,6,7
2~4,5
3~6,7

rdd2.txt

4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010

TRANSFORM 1
===
map each value to key (like an inverted index)
4~1
5~1
6~1
7~1
5~2
4~2
6~3
7~3

TRANSFORM 2
===
Join keys in transform 1 and rdd2
4~1,1001,1000,1002,1003
4~2,1001,1000,1002,1003
5~1,1004,1001,1006,1007
5~2,1004,1001,1006,1007
6~1,1007,1009,1005,1008
6~3,1007,1009,1005,1008
7~1,1011,1012,1013,1010
7~3,1011,1012,1013,1010

TRANSFORM 3
===
Split key in transform 2 with "~" and keep key(1) i.e. 1,2,3
1~1001,1000,1002,1003
2~1001,1000,1002,1003
1~1004,1001,1006,1007
2~1004,1001,1006,1007
1~1007,1009,1005,1008
3~1007,1009,1005,1008
1~1011,1012,1013,1010
3~1011,1012,1013,1010

TRANSFORM 4
===
join by key
1~1001,1000,1002,1003,1004,1001,1006,1007,1007,1009,1005,1008,1011,1012,1013,1010
2~1001,1000,1002,1003,1004,1001,1006,1007
3~1007,1009,1005,1008,1011,1012,1013,1010

 

From: dcmovva
To: user@spark.apache.org
Sent: Saturday, January 3, 2015 10:10 AM
Subject: Joining by values
I have two pair RDDs in Spark like this

rdd1 = (1 -> [4,5,6,7])
  (2 -> [4,5])
  (3 -> [6,7])


rdd2 = (4 -> [1001,1000,1002,1003])
  (5 -> [1004,1001,1006,1007])
  (6 -> [1007,1009,1005,1008])
  (7 -> [1011,1012,1013,1010])
I would like to combine them to look like this.

joinedRdd = (1 ->
[1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
        (2 -> [1000,1001,1002,1003,1004,1006,1007])
        (3 -> [1005,1007,1008,1009,1010,1011,1012,1013])


Can someone suggest how to do this?

Thanks Dilip



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-values-tp20954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



   

   



  

Re: Joining by values

2015-01-03 Thread Shixiong Zhu
call `map(_.toList)` to convert `CompactBuffer` to `List`
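A minimal local sketch of what that looks like (the sample data below is made up for illustration; mapValues(_.toList) here is the same idea as the map(str => (str._1, str._2.toList)) that Sanjay ended up using above):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed on older Spark releases)

object ToListSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("ToListSketch"))

    // groupByKey returns an Iterable per key, which prints as CompactBuffer(...)
    val grouped = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c"))).groupByKey()
    grouped.collect().foreach(println)                      // (1,CompactBuffer(a, b)) ...

    // converting each Iterable to a List only changes how the values print
    grouped.mapValues(_.toList).collect().foreach(println)  // (1,List(a, b)) ...

    sc.stop()
  }
}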

Best Regards,
Shixiong Zhu

2015-01-04 12:08 GMT+08:00 Sanjay Subramanian <
sanjaysubraman...@yahoo.com.invalid>:

> hi
> Take a look at the code here I wrote
>
> https://raw.githubusercontent.com/sanjaysubramanian/msfx_scala/master/src/main/scala/org/medicalsidefx/common/utils/PairRddJoin.scala
>
> /*rdd1.txt
>
> 1~4,5,6,7
> 2~4,5
> 3~6,7
>
> rdd2.txt
>
> 4~1001,1000,1002,1003
> 5~1004,1001,1006,1007
> 6~1007,1009,1005,1008
> 7~1011,1012,1013,1010
>
> */
> val sconf = new 
> SparkConf().setMaster("local").setAppName("MedicalSideFx-PairRddJoin")
> val sc = new SparkContext(sconf)
>
>
> val rdd1 = "/path/to/rdd1.txt"
> val rdd2 = "/path/to/rdd2.txt"
>
> val rdd1InvIndex = sc.textFile(rdd1).map(x => (x.split('~')(0), 
> x.split('~')(1))).flatMapValues(str => str.split(',')).map(str => (str._2, 
> str._1))
> val rdd2Pair = sc.textFile(rdd2).map(str => (str.split('~')(0), 
> str.split('~')(1)))
> rdd1InvIndex.join(rdd2Pair).map(str => 
> str._2).groupByKey().collect().foreach(println)
>
>
> This outputs the following. I think this may be essentially what you are
> looking for.
>
> (I have to understand how to NOT print as CompactBuffer)
>
> (2,CompactBuffer(1001,1000,1002,1003, 1004,1001,1006,1007))
> (3,CompactBuffer(1011,1012,1013,1010, 1007,1009,1005,1008))
> (1,CompactBuffer(1001,1000,1002,1003, 1011,1012,1013,1010, 
> 1004,1001,1006,1007, 1007,1009,1005,1008))
>
>
>   --
>  *From:* Sanjay Subramanian 
> *To:* dcmovva ; "user@spark.apache.org" <
> user@spark.apache.org>
> *Sent:* Saturday, January 3, 2015 12:19 PM
> *Subject:* Re: Joining by values
>
> This is my design. Now let me try and code it in Spark.
>
> rdd1.txt
> =
> 1~4,5,6,7
> 2~4,5
> 3~6,7
>
> rdd2.txt
> 
> 4~1001,1000,1002,1003
> 5~1004,1001,1006,1007
> 6~1007,1009,1005,1008
> 7~1011,1012,1013,1010
>
> TRANSFORM 1
> ===
> map each value to key (like an inverted index)
> 4~1
> 5~1
> 6~1
> 7~1
> 5~2
> 4~2
> 6~3
> 7~3
>
> TRANSFORM 2
> ===
> Join keys in transform 1 and rdd2
> 4~1,1001,1000,1002,1003
> 4~2,1001,1000,1002,1003
> 5~1,1004,1001,1006,1007
> 5~2,1004,1001,1006,1007
> 6~1,1007,1009,1005,1008
> 6~3,1007,1009,1005,1008
> 7~1,1011,1012,1013,1010
> 7~3,1011,1012,1013,1010
>
> TRANSFORM 3
> ===
> Split key in transform 2 with "~" and keep key(1) i.e. 1,2,3
> 1~1001,1000,1002,1003
> 2~1001,1000,1002,1003
> 1~1004,1001,1006,1007
> 2~1004,1001,1006,1007
> 1~1007,1009,1005,1008
> 3~1007,1009,1005,1008
> 1~1011,1012,1013,1010
> 3~1011,1012,1013,1010
>
> TRANSFORM 4
> ===
> join by key
>
> 1~1001,1000,1002,1003,1004,1001,1006,1007,1007,1009,1005,1008,1011,1012,1013,1010
> 2~1001,1000,1002,1003,1004,1001,1006,1007
> 3~1007,1009,1005,1008,1011,1012,1013,1010
>
>
>
>
>  --
>  *From:* dcmovva 
> *To:* user@spark.apache.org
> *Sent:* Saturday, January 3, 2015 10:10 AM
> *Subject:* Joining by values
>
> I have two pair RDDs in Spark like this
>
> rdd1 = (1 -> [4,5,6,7])
>   (2 -> [4,5])
>   (3 -> [6,7])
>
>
> rdd2 = (4 -> [1001,1000,1002,1003])
>   (5 -> [1004,1001,1006,1007])
>   (6 -> [1007,1009,1005,1008])
>   (7 -> [1011,1012,1013,1010])
> I would like to combine them to look like this.
>
> joinedRdd = (1 ->
> [1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
> (2 -> [1000,1001,1002,1003,1004,1006,1007])
> (3 -> [1005,1007,1008,1009,1010,1011,1012,1013])
>
>
> Can someone suggest how to do this?
>
> Thanks Dilip
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-values-tp20954.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>


Re: Joining by values

2015-01-03 Thread Sanjay Subramanian
hi
Take a look at the code here I wrote:
https://raw.githubusercontent.com/sanjaysubramanian/msfx_scala/master/src/main/scala/org/medicalsidefx/common/utils/PairRddJoin.scala

/*rdd1.txt

1~4,5,6,7
2~4,5
3~6,7

rdd2.txt

4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010

*/
val sconf = new 
SparkConf().setMaster("local").setAppName("MedicalSideFx-PairRddJoin")
val sc = new SparkContext(sconf)


val rdd1 = "/path/to/rdd1.txt"
val rdd2 = "/path/to/rdd2.txt"

val rdd1InvIndex = sc.textFile(rdd1).map(x => (x.split('~')(0), 
x.split('~')(1))).flatMapValues(str => str.split(',')).map(str => (str._2, 
str._1))
val rdd2Pair = sc.textFile(rdd2).map(str => (str.split('~')(0), 
str.split('~')(1)))
rdd1InvIndex.join(rdd2Pair).map(str => 
str._2).groupByKey().collect().foreach(println)

This outputs the following. I think this may be essentially what you are looking for.

(I have to understand how to NOT print as CompactBuffer)

(2,CompactBuffer(1001,1000,1002,1003, 1004,1001,1006,1007))
(3,CompactBuffer(1011,1012,1013,1010, 1007,1009,1005,1008))
(1,CompactBuffer(1001,1000,1002,1003, 1011,1012,1013,1010, 1004,1001,1006,1007, 1007,1009,1005,1008))

From: Sanjay Subramanian
To: dcmovva; "user@spark.apache.org"
Sent: Saturday, January 3, 2015 12:19 PM
Subject: Re: Joining by values
This is my design. Now let me try and code it in Spark.
rdd1.txt
=
1~4,5,6,7
2~4,5
3~6,7

rdd2.txt

4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010

TRANSFORM 1
===
map each value to key (like an inverted index)
4~1
5~1
6~1
7~1
5~2
4~2
6~3
7~3

TRANSFORM 2
===
Join keys in transform 1 and rdd2
4~1,1001,1000,1002,1003
4~2,1001,1000,1002,1003
5~1,1004,1001,1006,1007
5~2,1004,1001,1006,1007
6~1,1007,1009,1005,1008
6~3,1007,1009,1005,1008
7~1,1011,1012,1013,1010
7~3,1011,1012,1013,1010

TRANSFORM 3
===
Split key in transform 2 with "~" and keep key(1) i.e. 1,2,3
1~1001,1000,1002,1003
2~1001,1000,1002,1003
1~1004,1001,1006,1007
2~1004,1001,1006,1007
1~1007,1009,1005,1008
3~1007,1009,1005,1008
1~1011,1012,1013,1010
3~1011,1012,1013,1010

TRANSFORM 4
===
join by key
1~1001,1000,1002,1003,1004,1001,1006,1007,1007,1009,1005,1008,1011,1012,1013,1010
2~1001,1000,1002,1003,1004,1001,1006,1007
3~1007,1009,1005,1008,1011,1012,1013,1010

 

From: dcmovva
To: user@spark.apache.org
Sent: Saturday, January 3, 2015 10:10 AM
Subject: Joining by values
I have two pair RDDs in Spark like this

rdd1 = (1 -> [4,5,6,7])
  (2 -> [4,5])
  (3 -> [6,7])


rdd2 = (4 -> [1001,1000,1002,1003])
  (5 -> [1004,1001,1006,1007])
  (6 -> [1007,1009,1005,1008])
  (7 -> [1011,1012,1013,1010])
I would like to combine them to look like this.

joinedRdd = (1 ->
[1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
        (2 -> [1000,1001,1002,1003,1004,1006,1007])
        (3 -> [1005,1007,1008,1009,1010,1011,1012,1013])


Can someone suggest how to do this?

Thanks Dilip



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-values-tp20954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



   

  

Re: Joining by values

2015-01-03 Thread Sanjay Subramanian
This is my design. Now let me try and code it in Spark.
rdd1.txt
=
1~4,5,6,7
2~4,5
3~6,7

rdd2.txt

4~1001,1000,1002,1003
5~1004,1001,1006,1007
6~1007,1009,1005,1008
7~1011,1012,1013,1010

TRANSFORM 1
===
map each value to key (like an inverted index)
4~1
5~1
6~1
7~1
5~2
4~2
6~3
7~3

TRANSFORM 2
===
Join keys in transform 1 and rdd2
4~1,1001,1000,1002,1003
4~2,1001,1000,1002,1003
5~1,1004,1001,1006,1007
5~2,1004,1001,1006,1007
6~1,1007,1009,1005,1008
6~3,1007,1009,1005,1008
7~1,1011,1012,1013,1010
7~3,1011,1012,1013,1010

TRANSFORM 3
===
Split key in transform 2 with "~" and keep key(1) i.e. 1,2,3
1~1001,1000,1002,1003
2~1001,1000,1002,1003
1~1004,1001,1006,1007
2~1004,1001,1006,1007
1~1007,1009,1005,1008
3~1007,1009,1005,1008
1~1011,1012,1013,1010
3~1011,1012,1013,1010

TRANSFORM 4
===
join by key
1~1001,1000,1002,1003,1004,1001,1006,1007,1007,1009,1005,1008,1011,1012,1013,1010
2~1001,1000,1002,1003,1004,1001,1006,1007
3~1007,1009,1005,1008,1011,1012,1013,1010
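A sketch of how that design could translate into Spark operations (the file paths, object name and the '~'/',' parsing are assumptions chosen to match the sample data; the code posted elsewhere in this thread is the version Sanjay actually ran):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits (needed on older Spark releases)

object TransformDesignSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("TransformDesignSketch"))

    // TRANSFORM 1: invert rdd1 so each value maps back to its key -> (4,1), (5,1), ...
    val inverted = sc.textFile("/path/to/rdd1.txt")
      .map(line => (line.split('~')(0), line.split('~')(1)))
      .flatMapValues(_.split(','))
      .map { case (k, v) => (v, k) }

    val rdd2Pair = sc.textFile("/path/to/rdd2.txt")
      .map(line => (line.split('~')(0), line.split('~')(1)))   // (4,"1001,1000,1002,1003"), ...

    // TRANSFORM 2: join on the shared key -> (4,(1,"1001,1000,1002,1003")), ...
    val joined = inverted.join(rdd2Pair)

    // TRANSFORM 3: drop the join key, keep (originalKey, values) -> (1,"1001,1000,1002,1003"), ...
    val rekeyed = joined.map { case (_, kv) => kv }

    // TRANSFORM 4: gather all value strings under each original key
    rekeyed.groupByKey().collect().foreach(println)

    sc.stop()
  }
}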

From: dcmovva
To: user@spark.apache.org
Sent: Saturday, January 3, 2015 10:10 AM
Subject: Joining by values
I have two pair RDDs in Spark like this

rdd1 = (1 -> [4,5,6,7])
  (2 -> [4,5])
  (3 -> [6,7])


rdd2 = (4 -> [1001,1000,1002,1003])
  (5 -> [1004,1001,1006,1007])
  (6 -> [1007,1009,1005,1008])
  (7 -> [1011,1012,1013,1010])
I would like to combine them to look like this.

joinedRdd = (1 ->
[1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
        (2 -> [1000,1001,1002,1003,1004,1006,1007])
        (3 -> [1005,1007,1008,1009,1010,1011,1012,1013])


Can someone suggest how to do this?

Thanks Dilip



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-values-tp20954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



  

Joining by values

2015-01-03 Thread dcmovva
I have two pair RDDs in Spark like this

rdd1 = (1 -> [4,5,6,7])
   (2 -> [4,5])
   (3 -> [6,7])


rdd2 = (4 -> [1001,1000,1002,1003])
   (5 -> [1004,1001,1006,1007])
   (6 -> [1007,1009,1005,1008])
   (7 -> [1011,1012,1013,1010])
I would like to combine them to look like this.

joinedRdd = (1 ->
[1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013])
(2 -> [1000,1001,1002,1003,1004,1006,1007])
(3 -> [1005,1007,1008,1009,1010,1011,1012,1013])


Can someone suggest how to do this?

Thanks Dilip



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Joining-by-values-tp20954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org