qry_1 and qry_2 are simple SELECT queries with a GROUP BY clause.

Are there any specific queries that behave differently in Spark SQL
and the DataFrame API?
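
To quantify the difference, one option is to print the plans for both variants and time the same action on each. The following is only a minimal sketch under stated assumptions: `spark` is an already-running SparkSession, `qry1` and `qry2` are the two query strings, and the helper names `time_action` and `compare_variants` as well as the CTE names `t1` and `t2` are made up for illustration. It times `count()` rather than the table write, so it isolates the query part only.

```python
import time


def time_action(label, action, runs=3):
    """Call a zero-argument callable `runs` times; return the best wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    best = min(timings)
    print(f"{label}: best of {runs} runs = {best:.3f}s")
    return best


def compare_variants(spark, qry1, qry2, runs=3):
    """Assumes `spark` is an active SparkSession and qry1/qry2 are SQL strings."""
    # Variant A: a single SQL statement with CTEs and UNION ALL
    # (t1 and t2 are placeholder CTE names).
    sql_df = spark.sql(
        f"WITH t1 AS ({qry1}), t2 AS ({qry2}) "
        "SELECT * FROM t1 UNION ALL SELECT * FROM t2"
    )
    # Variant B: two DataFrames combined with union().
    api_df = spark.sql(qry1).union(spark.sql(qry2))

    # Print the parsed, analyzed, optimized and physical plans of each variant.
    sql_df.explain(True)
    api_df.explain(True)

    # count() triggers a job for each variant; the table write is not measured here.
    sql_time = time_action("Spark SQL", sql_df.count, runs)
    api_time = time_action("DataFrame API", api_df.count, runs)
    return sql_time, api_time
```

If the two physical plans printed by `explain(True)` match, a remaining gap is more likely to come from the write step or from the environment (caching, cluster load) than from the choice of API.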

Regards,
Neeraj

On Sat, Mar 30, 2019 at 7:27 PM Jason Nerothin <jasonnerot...@gmail.com>
wrote:

> Can you please quantify the difference and provide the query code?
>
> On Fri, Mar 29, 2019 at 9:11 AM neeraj bhadani <
> bhadani.neeraj...@gmail.com> wrote:
>
>> Hi Team,
>>    I am executing the same Spark code using the Spark SQL API and the
>> DataFrame API; however, Spark SQL is taking longer than expected.
>>
>> Please find the pseudocode below.
>>
>> -----------------------------------------------------------------------------------------------
>>
>> Case 1 : Spark SQL
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>> %sql
>>
>> CREATE TABLE <tbl_name>
>> AS
>> WITH <table_1> AS (
>>     <qry1>
>> ),
>> <table_2> AS (
>>     <qry2>
>> )
>> SELECT * FROM <table_1>
>> UNION ALL
>> SELECT * FROM <table_2>
>>
>> -----------------------------------------------------------------------------------------------
>>
>> Case 2 : DataFrame API
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>>
>> df1 = spark.sql(<qry1>)
>> df2 = spark.sql(<qry2>)
>> df3 = df1.union(df2)
>> df3.write.saveAsTable(<table_name>)
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>>
>> As per my understanding, both Spark SQL and the DataFrame API generate the
>> same code under the hood, so the execution times should be similar.
>>
>>
>> Regards,
>>
>> Neeraj
>>
>>
>>
>
> --
> Thanks,
> Jason
>
