Re: how to filter value in spark

2014-09-01 Thread Matthew Farrellee
you could join; it'll give you the intersection along with the labels under
which each value was found.

> a.join(b).collect
Array[(String, (String, String))] = Array((4,(a,b)), (3,(a,b)))
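
for example, to get the (3,"a") and (4,"a") asked for in the quoted message
below, you can drop b's label after the join. a minimal sketch, assuming the
same inputs as below and a SparkContext named sc:

val a = sc.textFile("/sparktest/1/").map((_, "a"))
val b = sc.textFile("/sparktest/2/").map((_, "b"))

// join keeps only keys present in both RDDs; dropping b's label
// leaves the requested (key, "a") pairs
a.join(b).map { case (k, (labelA, _)) => (k, labelA) }.collect()
// => Array((4,a), (3,a)) -- ordering is not guaranteed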

best,


matt

On 08/31/2014 09:23 PM, Liu, Raymond wrote:
> You could use cogroup to combine the RDDs into one RDD for cross-reference
> processing.
> 
> e.g.
> 
> a.cogroup(b)
>   .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
>   .map { case (k, (l, r)) => (k, l) }
> 
> Best Regards,
> Raymond Liu
> 
> -----Original Message-----
> From: marylucy [mailto:qaz163wsx_...@hotmail.com]
> Sent: Friday, August 29, 2014 9:26 PM
> To: Matthew Farrellee
> Cc: user@spark.apache.org
> Subject: Re: how to filter value in spark
> 
> I see, it works well, thank you!!!
> 
> But how do I do it in the following situation?
> 
> var a = sc.textFile("/sparktest/1/").map((_,"a"))
> var b = sc.textFile("/sparktest/2/").map((_,"b"))
> How do I get (3,"a") and (4,"a")?
> 
> 
> On Aug 28, 2014, at 19:54, "Matthew Farrellee"  wrote:
> 
>> On 08/28/2014 07:20 AM, marylucy wrote:
>>> fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
>>> fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
>>> I want to get 3 and 4.
>>>
>>> var a = sc.textFile("/sparktest/1/").map((_,1))
>>> var b = sc.textFile("/sparktest/2/").map((_,1))
>>>
>>> a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)
>>>
>>> An error is thrown:
>>> scala.MatchError: null
>>> PairRDDFunctions.lookup...
>>
>> the issue is nesting the b RDD inside a transformation of the a RDD; RDD operations can only be invoked from the driver, not from inside another RDD's transformations
>>
>> consider using intersection; it's more idiomatic
>>
>> a.intersection(b).foreach(println)
>>
>> but note that intersection will remove duplicates
>>
>> best,
>>
>>
>> matt
>>
> 





RE: how to filter value in spark

2014-08-31 Thread Liu, Raymond
You could use cogroup to combine the RDDs into one RDD for cross-reference processing.

e.g.

a.cogroup(b)
  .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
  .map { case (k, (l, r)) => (k, l) }
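
A minimal end-to-end sketch of that approach; the inline data, app name, and
local-mode setup are illustrative assumptions, and the final flatMap flattens
the grouped values into plain (key, label) pairs:

import org.apache.spark.{SparkConf, SparkContext}

object CogroupFilter {
  def main(args: Array[String]): Unit = {
    // local-mode setup, assumed for illustration
    val sc = new SparkContext(
      new SparkConf().setAppName("cogroup-filter").setMaster("local[*]"))

    val a = sc.parallelize(Seq("1", "2", "3", "4")).map((_, "a"))
    val b = sc.parallelize(Seq("3", "4", "5", "6")).map((_, "b"))

    // cogroup yields (key, (values-from-a, values-from-b)); keeping only
    // keys with values on both sides gives the intersection of keys
    val common = a.cogroup(b)
      .filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }
      .flatMap { case (k, (l, _)) => l.map(v => (k, v)) }

    common.collect().foreach(println) // prints (3,a) and (4,a), order not guaranteed
    sc.stop()
  }
}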

Best Regards,
Raymond Liu

-----Original Message-----
From: marylucy [mailto:qaz163wsx_...@hotmail.com] 
Sent: Friday, August 29, 2014 9:26 PM
To: Matthew Farrellee
Cc: user@spark.apache.org
Subject: Re: how to filter value in spark

I see, it works well, thank you!!!

But how do I do it in the following situation?

var a = sc.textFile("/sparktest/1/").map((_,"a"))
var b = sc.textFile("/sparktest/2/").map((_,"b"))
How do I get (3,"a") and (4,"a")?


On Aug 28, 2014, at 19:54, "Matthew Farrellee"  wrote:

> On 08/28/2014 07:20 AM, marylucy wrote:
>> fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
>> fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
>> I want to get 3 and 4.
>> 
>> var a = sc.textFile("/sparktest/1/").map((_,1))
>> var b = sc.textFile("/sparktest/2/").map((_,1))
>> 
>> a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)
>> 
>> An error is thrown:
>> scala.MatchError: null
>> PairRDDFunctions.lookup...
> 
> the issue is nesting the b RDD inside a transformation of the a RDD; RDD operations can only be invoked from the driver, not from inside another RDD's transformations
> 
> consider using intersection; it's more idiomatic
> 
> a.intersection(b).foreach(println)
> 
> but note that intersection will remove duplicates
> 
> best,
> 
> 
> matt
> 


Re: how to filter value in spark

2014-08-29 Thread marylucy
I see, it works well, thank you!!!

But how do I do it in the following situation?

var a = sc.textFile("/sparktest/1/").map((_,"a"))
var b = sc.textFile("/sparktest/2/").map((_,"b"))
How do I get (3,"a") and (4,"a")?


On Aug 28, 2014, at 19:54, "Matthew Farrellee"  wrote:

> On 08/28/2014 07:20 AM, marylucy wrote:
>> fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
>> fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
>> I want to get 3 and 4.
>> 
>> var a = sc.textFile("/sparktest/1/").map((_,1))
>> var b = sc.textFile("/sparktest/2/").map((_,1))
>> 
>> a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)
>> 
>> An error is thrown:
>> scala.MatchError: null
>> PairRDDFunctions.lookup...
> 
> the issue is nesting the b RDD inside a transformation of the a RDD; RDD operations can only be invoked from the driver, not from inside another RDD's transformations
> 
> consider using intersection; it's more idiomatic
> 
> a.intersection(b).foreach(println)
> 
> but note that intersection will remove duplicates
> 
> best,
> 
> 
> matt
> 


Re: how to filter value in spark

2014-08-28 Thread Matthew Farrellee

On 08/28/2014 07:20 AM, marylucy wrote:

fileA = 1 2 3 4, one number per line, saved in /sparktest/1/
fileB = 3 4 5 6, one number per line, saved in /sparktest/2/
I want to get 3 and 4.

var a = sc.textFile("/sparktest/1/").map((_,1))
var b = sc.textFile("/sparktest/2/").map((_,1))

a.filter(param => b.lookup(param._1).length > 0).map(_._1).foreach(println)

An error is thrown:
scala.MatchError: null
PairRDDFunctions.lookup...


the issue is nesting the b RDD inside a transformation of the a RDD; RDD operations can only be invoked from the driver, not from inside another RDD's transformations
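
if you really need a per-element membership test, one workaround is to
broadcast b's keys; this is a sketch, assuming b is small enough to collect
on the driver:

// collect b's keys on the driver and ship them to the executors once
val bKeys = sc.broadcast(b.keys.collect().toSet)

// the closure captures only the broadcast value, not the b RDD itself
a.filter { case (k, _) => bKeys.value.contains(k) }
  .foreach(println) // prints (3,1) and (4,1) for the inputs above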

consider using intersection; it's more idiomatic

a.intersection(b).foreach(println)

but note that intersection will remove duplicates
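
a quick illustration of that caveat, with made-up inline data rather than the
files above:

val x = sc.parallelize(Seq(1, 2, 2, 3))
val y = sc.parallelize(Seq(2, 2, 3, 4))

// the repeated 2 appears only once in the result
x.intersection(y).collect() // => Array(2, 3), order not guaranteed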

best,


matt
