RE: Performance Spark SQL vs Dataframe API faster

2015-09-22 Thread Cheng, Hao

Performance Spark SQL vs Dataframe API faster

2015-09-22 Thread sanderg
Is there a difference in performance between writing a Spark job using only SQL statements and writing it using the DataFrame API, or do both translate to the same thing under the hood?

Re: SQL vs. DataFrame API

2015-06-23 Thread Davies Liu
If you change it to ```val numbers2 = numbers```, then it has the same problem. On Tue, Jun 23, 2015 at 2:54 PM, Ignacio Blasco wrote: > It seems that it doesn't happen in Scala API. Not exactly the same as in > python, but pretty close. > > https://gist.github.com/elnopintan/675968d2e4be68958df8 >

Re: SQL vs. DataFrame API

2015-06-23 Thread Ignacio Blasco
It seems that it doesn't happen in Scala API. Not exactly the same as in python, but pretty close. https://gist.github.com/elnopintan/675968d2e4be68958df8 2015-06-23 23:11 GMT+02:00 Davies Liu : > I think it also happens in DataFrames API of all languages. > > On Tue, Jun 23, 2015 at 9:16 AM, Ig

Re: SQL vs. DataFrame API

2015-06-23 Thread Davies Liu
I think it also happens in DataFrames API of all languages. On Tue, Jun 23, 2015 at 9:16 AM, Ignacio Blasco wrote: > That issue happens only in python dsl? > > On 23/6/2015 5:05 p.m., "Bob Corsaro" wrote: >> >> Thanks! The solution: >> >> https://gist.github.com/dokipen/018a1deeab668efdf455

Re: SQL vs. DataFrame API

2015-06-23 Thread Bob Corsaro
I've only tried it in Python. On Tue, Jun 23, 2015 at 12:16 PM Ignacio Blasco wrote: > That issue happens only in python dsl? > On 23/6/2015 5:05 p.m., "Bob Corsaro" wrote: > >> Thanks! The solution: >> >> https://gist.github.com/dokipen/018a1deeab668efdf455 >> >> On Mon, Jun 22, 2015 at 4:3

Re: SQL vs. DataFrame API

2015-06-23 Thread Ignacio Blasco
That issue happens only in python dsl? On 23/6/2015 5:05 p.m., "Bob Corsaro" wrote: > Thanks! The solution: > > https://gist.github.com/dokipen/018a1deeab668efdf455 > > On Mon, Jun 22, 2015 at 4:33 PM Davies Liu wrote: > >> Right now, we can not figure out which column you referenced in >> `

Re: SQL vs. DataFrame API

2015-06-23 Thread Bob Corsaro
Thanks! The solution: https://gist.github.com/dokipen/018a1deeab668efdf455 On Mon, Jun 22, 2015 at 4:33 PM Davies Liu wrote: > Right now, we can not figure out which column you referenced in > `select`, if there are multiple columns with the same name in the joined > DataFrame (for example, two `va

Re: SQL vs. DataFrame API

2015-06-22 Thread Davies Liu
Right now, we can not figure out which column you referenced in `select`, if there are multiple columns with the same name in the joined DataFrame (for example, two `value`). A workaround could be: numbers2 = numbers.select(df.name, df.value.alias('other')) rows = numbers.join(numbers2,
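The workaround above renames one side's `value` column before the self-join, so the joined result no longer carries two columns with the same name. Below is a minimal sketch of that idea in plain Python, not Spark: rows are modelled as dicts, and `alias_column` and `join` are hypothetical helpers written only for this illustration, not PySpark functions. The sample data and the choice of `name` as the join key are assumptions too, since the join condition in the message is cut off.

```python
# Toy model of a self-join in plain Python (no Spark involved).
# alias_column and join are hypothetical helpers for illustration only.

def alias_column(rows, old, new):
    """Return copies of the rows with column `old` renamed to `new`."""
    return [{(new if key == old else key): val for key, val in row.items()}
            for row in rows]

def join(left, right, on):
    """Inner-join two lists of row dicts on the column `on`.

    Raises ValueError if a non-key column name occurs on both sides,
    because the merged row could not tell the two apart.
    """
    out = []
    for l_row in left:
        for r_row in right:
            if l_row[on] == r_row[on]:
                clash = (set(l_row) & set(r_row)) - {on}
                if clash:
                    raise ValueError(f"ambiguous columns: {sorted(clash)}")
                out.append({**l_row, **r_row})
    return out

# Assumed sample data: two rows share the same name.
numbers = [
    {"name": "a", "value": 1},
    {"name": "a", "value": 2},
    {"name": "b", "value": 3},
]

# Joining `numbers` with itself directly would leave two `value` columns.
# Renaming one side first keeps every column name unique in the result.
numbers2 = alias_column(numbers, "value", "other")
rows = join(numbers, numbers2, on="name")
```

In this toy model, calling `join(numbers, numbers, on="name")` without the rename raises `ValueError`, which mirrors the ambiguity described in the message: after the join there is no way to tell which `value` a later `select` refers to.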

Re: SQL vs. DataFrame API

2015-06-22 Thread Ignacio Blasco
Sorry, I thought it was Scala/Spark. On 22/6/2015 9:49 p.m., "Bob Corsaro" wrote: > That's invalid syntax. I'm pretty sure pyspark is using a DSL to create a > query here and not actually doing an equality operation. > > On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco > wrote: > >> Probably you s

Re: SQL vs. DataFrame API

2015-06-22 Thread Bob Corsaro
That's invalid syntax. I'm pretty sure pyspark is using a DSL to create a query here and not actually doing an equality operation. On Mon, Jun 22, 2015 at 3:43 PM Ignacio Blasco wrote: > Probably you should use === instead of == and !== instead of != > Can anyone explain why the dataframe API do

Re: SQL vs. DataFrame API

2015-06-22 Thread Ignacio Blasco
Probably you should use === instead of == and !== instead of != Can anyone explain why the dataframe API doesn't work as I expect it to here? It seems like the column identifiers are getting confused. https://gist.github.com/dokipen/4b324a7365ae87b7b0e5

SQL vs. DataFrame API

2015-06-22 Thread Bob Corsaro
Can anyone explain why the dataframe API doesn't work as I expect it to here? It seems like the column identifiers are getting confused. https://gist.github.com/dokipen/4b324a7365ae87b7b0e5