Re: Spark SQL DSL for joins?

2014-12-21 Thread Cheng Lian

On 12/17/14 1:43 PM, Jerry Raj wrote:


Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I 
have two tables (backed by Parquet files) and I need to do a join 
across them using a common field (user_id). This works fine using 
standard SQL, but not using the language-integrated DSL. Neither


t1.join(t2, on = 't1.user_id == t2.user_id)


Two issues with this line:

1. Use === instead of ==
2. Add a single quote before t2
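Putting both fixes together, the call would look roughly like this. This is only a sketch against the Spark 1.1-era SchemaRDD DSL, not verified to compile against that release; t1 and t2 stand for the poster's Parquet-backed SchemaRDDs.

```scala
// Sketch: corrected join under the Spark 1.1 SchemaRDD DSL.
// Both fixes applied: === builds a Catalyst equality Expression
// (Scala's == would just compare plain values), and the leading
// single quote makes t2.user_id a DSL attribute reference rather
// than an unresolved Scala identifier.
val joined = t1.join(t2, on = Some('t1.user_id === 't2.user_id))
```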



nor

t1.join(t2, on = Some('t1.user_id == t2.user_id))

works, or even compiles. I could not find any examples of how to perform 
a join using the DSL. Any pointers will be appreciated :)


Thanks
-Jerry

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org





Spark SQL DSL for joins?

2014-12-16 Thread Jerry Raj

Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I 
have two tables (backed by Parquet files) and I need to do a join across 
them using a common field (user_id). This works fine using standard SQL 
but not using the language-integrated DSL. Neither


t1.join(t2, on = 't1.user_id == t2.user_id)

nor

t1.join(t2, on = Some('t1.user_id == t2.user_id))

works, or even compiles. I could not find any examples of how to perform a 
join using the DSL. Any pointers will be appreciated :)


Thanks
-Jerry




Re: Spark SQL DSL for joins?

2014-12-16 Thread Jerry Raj

Another problem with the DSL:

t1.where('term == "dmin").count() returns zero. But
sqlCtx.sql("select * from t1 where term = 'dmin'").count() returns 700, 
which I know is correct from the data. Is there something wrong with how 
I'm using the DSL?


Thanks


On 17/12/14 11:13 am, Jerry Raj wrote:

Hi,
I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I
have two tables (backed by Parquet files) and I need to do a join across
them using a common field (user_id). This works fine using standard SQL
but not using the language-integrated DSL. Neither

t1.join(t2, on = 't1.user_id == t2.user_id)

nor

t1.join(t2, on = Some('t1.user_id == t2.user_id))

works, or even compiles. I could not find any examples of how to perform a
join using the DSL. Any pointers will be appreciated :)

Thanks
-Jerry







Re: Spark SQL DSL for joins?

2014-12-16 Thread Tobias Pfeiffer
Jerry,

On Wed, Dec 17, 2014 at 3:35 PM, Jerry Raj jerry@gmail.com wrote:

 Another problem with the DSL:

 t1.where('term == "dmin").count() returns zero.


Looks like you need ===:
https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD
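To spell out why: == compares the Symbol 'term against a plain Scala value and produces an ordinary Boolean, whereas === builds a Catalyst equality expression that is evaluated per row. A sketch of the corrected filter, again assuming the Spark 1.1 SchemaRDD DSL and untested here:

```scala
// 'term == "dmin" is plain Scala (Symbol vs String, always false),
// so the filter matches nothing and count() returns zero.
// 'term === "dmin" builds a Catalyst EqualTo expression instead,
// matching the SQL query: select * from t1 where term = 'dmin'
val count = t1.where('term === "dmin").count()
```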

Tobias