Re: Spark SQL API taking longer time than DF API.
Please share the output of df.explain() [1] for both cases. That should give us some clues about what the differences are.

[1] https://github.com/apache/spark/blob/e1c90d66bbea5b4cb97226610701b0389b734651/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L499
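To make the explain() output easy to paste into a reply, it can help to capture it as a string rather than copy it from the console. The following is only a sketch: `capture_plan` is a hypothetical helper name, and it assumes that explain() writes its output through Python's stdout, which is how PySpark's DataFrame.explain behaves in the versions I am aware of.

```python
import io
from contextlib import redirect_stdout


def capture_plan(df, extended=True):
    """Return whatever df.explain(extended) prints, as a string.

    Assumption: explain() prints via Python's stdout (as PySpark's
    DataFrame.explain does), so redirect_stdout can capture it.
    """
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        df.explain(extended)
    return buffer.getvalue()


# Hypothetical usage with the DataFrame from Case 2 (needs a Spark session):
#   plan_df_api = capture_plan(df3)
#   print(plan_df_api)
```

For the CREATE TABLE ... AS statement in Case 1, calling spark.sql() on it runs the statement, as far as I know, so one option for seeing its plan without executing it is to prefix the statement with EXPLAIN EXTENDED and read the single returned row.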
Re: Spark SQL API taking longer time than DF API.
Hi,

Without more information it's very difficult to work out what's going on. If possible, could you do the following and share the results with us?

1) For each query, call explain() and post the output.
2) Run each query, then go to the SQL tab in the Spark UI. For each query, show us the plan.
3) For each query, post some screenshots of the tasks page in the Spark UI.

(In all of the above, make sure to redact any sensitive information!)

You are right in thinking that the queries should be identical. My hunch is that something subtle is making them not quite identical, and the above information should allow us to figure out what.

Thanks,
Chris

> On 8 Apr 2019, at 09:21, neeraj bhadani wrote:
>
> Hi All,
> Can anyone help me here with my query?
>
> Regards,
> Neeraj
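Once the explain() text from step 1 is available for both cases, a line-by-line diff is one way to spot the "something subtle". The sketch below uses only the Python standard library. `diff_plans` and the labels `sql_api` / `df_api` are hypothetical names, and the masking of `#<digits>` is an assumption based on the expression IDs (for example `id#12`) that Spark plans usually contain and that tend to differ between runs.

```python
import difflib
import re


def diff_plans(plan_a, plan_b, name_a="sql_api", name_b="df_api"):
    """Return a unified diff of two plan strings ("" if they match).

    Expression IDs such as "#12" are masked first, on the assumption
    that they differ between runs without changing the plan's meaning.
    """
    def normalize(plan):
        return re.sub(r"#\d+", "#N", plan).splitlines()

    lines = difflib.unified_diff(
        normalize(plan_a),
        normalize(plan_b),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",
    )
    return "\n".join(lines)


# Made-up plan fragments, purely for illustration:
print(diff_plans("Union\nScan t1 [id#1]", "Union\nScan t2 [id#9]"))
```

An empty result suggests the plans are the same apart from expression IDs, in which case the difference is more likely to be in the write step or in the cluster state than in the query itself.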
Re: Spark SQL API taking longer time than DF API.
Hi All,

Can anyone help me here with my query?

Regards,
Neeraj

On Mon, Apr 1, 2019 at 9:44 AM neeraj bhadani wrote:

> In both cases, I am trying to create a Hive table based on a union of the same two queries.
>
> I am not sure how the Hive table creation process would differ internally.
>
> Regards,
> Neeraj
Re: Spark SQL API taking longer time than DF API.
In both cases, I am trying to create a Hive table based on a union of the same two queries.

I am not sure how the Hive table creation process would differ internally.

Regards,
Neeraj

On Sun, Mar 31, 2019 at 1:29 PM Jörn Franke wrote:

> Is the select taking longer, or the saving to a file? You seem to save to a file only in the second case.
Re: Spark SQL API taking longer time than DF API.
Is the select taking longer, or the saving to a file? You seem to save to a file only in the second case.

> On 29.03.2019 at 15:10, neeraj bhadani wrote:
>
> Hi Team,
> I am executing the same Spark code using the Spark SQL API and the DataFrame API; however, Spark SQL is taking longer than expected.
> [...]
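One way to answer the question of whether the select or the save dominates is to time the two steps separately. The helper below is a minimal sketch using only the standard library; `timed` is a hypothetical name, and the usage lines in the comments are assumptions about how it might be applied to the DataFrames from Case 2.

```python
import time


def timed(label, action):
    """Call action(), print the elapsed wall-clock time, and return
    a (result, seconds) tuple."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


# Hypothetical usage (needs a Spark session and the Case 2 DataFrames):
#   timed("query only", lambda: df3.count())
#   timed("query + write", lambda: df3.write.saveAsTable("some_table"))
```

Note that an action such as count() forces execution but may let Spark skip work that a full write would do, so the first number is only a rough lower bound for the select.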
Re: Spark SQL API taking longer time than DF API.
qry_1 and qry_2 are simple SELECT queries with a GROUP BY clause.

Are there any specific queries that behave differently between the Spark SQL API and the DataFrame API?

Regards,
Neeraj

On Sat, Mar 30, 2019 at 7:27 PM Jason Nerothin wrote:

> Can you please quantify the difference and provide the query code?
Re: Spark SQL API taking longer time than DF API.
Can you please quantify the difference and provide the query code?

On Fri, Mar 29, 2019 at 9:11 AM neeraj bhadani wrote:

> Hi Team,
> I am executing the same Spark code using the Spark SQL API and the DataFrame API; however, Spark SQL is taking longer than expected.
> [...]

--
Thanks,
Jason
Spark SQL API taking longer time than DF API.
Hi Team,

I am executing the same Spark code using the Spark SQL API and the DataFrame API; however, Spark SQL is taking longer than expected.

Please find the pseudocode below.

---
Case 1 : Spark SQL
---
%sql
CREATE TABLE
AS

WITH AS (

)
, AS (

)

SELECT * FROM
UNION ALL
SELECT * FROM

---
Case 2 : DataFrame API
---

df1 = spark.sql()
df2 = spark.sql()
df3 = df1.union(df2)
df3.write.saveAsTable()
---

As per my understanding, both the Spark SQL API and the DataFrame API generate the same code under the hood, so the execution times should be similar.

Regards,
Neeraj