Hi Kant,

Based on my understanding, the only difference is the overhead of
selecting/creating the SQLContext for the query you pass in. Since the
table/view is already registered and available, sparkSession.sql("your query")
should be simple and good enough.

The following uses the session/context that is created and available by
default:

    sparkSession.sql("select value from table")

while the following would look up or create one and then run the query
(which I believe is extra overhead):

    df.sqlContext().sql("select value from table")
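To make the comparison concrete, here is a minimal sketch of both entry
points side by side. This is only an illustration, assuming a local Spark
2.2 session and a toy DataFrame in place of your Kafka stream; the view
name "table" and the query are just placeholders taken from your snippet:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlEntryPoints {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("sql-entry-points")
                .getOrCreate();

        // Toy stand-in for your Kafka source: a single "value" column.
        Dataset<Row> df = spark.range(3).toDF("value");
        df.createOrReplaceTempView("table");

        // Entry point 1: query directly through the session.
        Dataset<Row> viaSession = spark.sql("select value from table");

        // Entry point 2: query through the Dataset's SQLContext.
        Dataset<Row> viaContext = df.sqlContext().sql("select value from table");

        viaSession.show();
        viaContext.show();

        spark.stop();
    }
}
```

As far as I can tell, the SQLContext returned by df.sqlContext() wraps the
same underlying session, so both run against the same registered view; the
session call is just the more direct route.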

Regards
Raj



On Wed, Dec 6, 2017 at 6:07 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> I have the following snippets of the code and I wonder what is the
> difference between these two and which one should I use? I am using spark
> 2.2.
>
> Dataset<Row> df = sparkSession.readStream()
>     .format("kafka")
>     .load();
>
> df.createOrReplaceTempView("table");
> df.printSchema();
>
> Dataset<Row> resultSet = df.sqlContext().sql(
>         "select value from table"); // sparkSession.sql(this.query);
>
> StreamingQuery streamingQuery = resultSet
>         .writeStream()
>         .trigger(Trigger.ProcessingTime(1000))
>         .format("console")
>         .start();
>
>
> vs
>
>
> Dataset<Row> df = sparkSession.readStream()
>     .format("kafka")
>     .load();
>
> df.createOrReplaceTempView("table");
>
> Dataset<Row> resultSet = sparkSession.sql(
>         "select value from table");
>
> StreamingQuery streamingQuery = resultSet
>         .writeStream()
>         .trigger(Trigger.ProcessingTime(1000))
>         .format("console")
>         .start();
>
>
> Thanks!
>
>