Re: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread Chanh Le
ike '%sell%', then you can just try left semi join, for which Spark will use SortMerge join in this case, I guess. Yong From: Yong Zhang Sent: Tuesday, February 21, 2017 1:17 PM To: Sidney Feiner; Chanh Le; user @spar

Re: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread Yong Zhang
From: Yong Zhang Sent: Tuesday, February 21, 2017 1:17 PM To: Sidney Feiner; Chanh Le; user @spark Subject: Re: How to query a query with not contain, not start_with, not end_with condition effective? Sorry, I didn't pay attention to the original requirement. Did you try the

Re: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread Yong Zhang
_id from data where url like '%sell%')").explain(true) Yong From: Sidney Feiner Sent: Tuesday, February 21, 2017 10:46 AM To: Yong Zhang; Chanh Le; user @spark Subject: RE: How to query a query with not contain, not start_with, not end_with co

RE: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread Sidney Feiner
From: Yong Zhang Sent: Tuesday, February 21, 2017 4:10 PM To: Chanh Le; user @spark Subject: Re: How to query a query with not contain, not start_with, not end_with condition effective? Not sure if I m

Re: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread Yong Zhang
Not sure if I misunderstand your question, but what's wrong with doing it this way?

scala> spark.version
res6: String = 2.0.2

scala> val df = Seq((1,"lao.com/sell"), (2, "lao.com/buy")).toDF("user_id", "url")
df: org.apache.spark.sql.DataFrame = [user_id: int, url: string]

scala> df.registerTempTabl

Re: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread ayan guha
The first thing I would do is add DISTINCT to both the inner and outer queries.

On Tue, 21 Feb 2017 at 8:56 pm, Chanh Le wrote:
> Hi everyone,
>
> I am working on a dataset like this
> user_id  url
> 1        lao.com/buy
> 2        bao.com/sell
> 2        cao.com/market
> 1        lao.

Re: How to query a query with not contain, not start_with, not end_with condition effective?

2017-02-21 Thread Chanh Le
I tried a new way, using a JOIN:

select a.user_id
from data a
left join (select user_id from data where url like '%sell%') b
  on a.user_id = b.user_id
where b.user_id is NULL

It's faster, and it seems that Spark optimizes a JOIN better than a subquery.

Regards,
Chanh

> On Feb 21, 2017, at 4:56 PM, Cha
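
The two query shapes discussed in this thread can be checked against each other on a small table. The sketch below is only an illustration, with several assumptions: it uses Python's built-in sqlite3 module as a stand-in for Spark SQL, so it says nothing about Spark's query plans or speed. The subquery form is assumed to be a NOT IN filter, since the snippets above are truncated. The first two sample rows come from the spark-shell example in the thread; the other two are made up. DISTINCT is added to both queries, following the suggestion made earlier in the thread.

```python
import sqlite3

# Sample rows: the first two appear in the thread's spark-shell example,
# the last two are hypothetical and only there to make the result less trivial.
rows = [
    (1, "lao.com/sell"),
    (2, "lao.com/buy"),
    (2, "cao.com/market"),
    (3, "bao.com/buy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (user_id INTEGER, url TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)

# Subquery form (assumed shape): keep users that never visited a '%sell%' URL.
NOT_IN_QUERY = """
SELECT DISTINCT user_id
FROM data
WHERE user_id NOT IN (SELECT user_id FROM data WHERE url LIKE '%sell%')
ORDER BY user_id
"""

# Join form: left join against the '%sell%' users and keep the unmatched rows.
ANTI_JOIN_QUERY = """
SELECT DISTINCT a.user_id
FROM data a
LEFT JOIN (SELECT DISTINCT user_id FROM data WHERE url LIKE '%sell%') b
  ON a.user_id = b.user_id
WHERE b.user_id IS NULL
ORDER BY a.user_id
"""


def run(query):
    """Run a query and return the user_id column as a list."""
    return [row[0] for row in conn.execute(query)]


not_in_users = run(NOT_IN_QUERY)
anti_join_users = run(ANTI_JOIN_QUERY)

print(not_in_users)
print(anti_join_users)
```

Both queries should return the same users here. Note that NOT IN behaves differently from the join form when the subquery can return NULL user_id values, so the two are only interchangeable when user_id is never NULL.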