Hi Kant,

Based on my understanding, the only difference is the overhead of selecting/creating the SQLContext for the query you pass. Since the table/view is already registered and available, sparkSession.sql("your query") is simple and good enough.
The following uses the session/context that is created and available by default:

    sparkSession.sql("select value from table")

while the following would look up or create one and then run the query (which I believe is extra overhead):

    df.sqlContext().sql("select value from table")

Regards,
Raj

On Wed, Dec 6, 2017 at 6:07 PM, kant kodali <kanth...@gmail.com> wrote:

> Hi All,
>
> I have the following snippets of code and I wonder what the difference
> between these two is, and which one I should use? I am using Spark 2.2.
>
> Dataset<Row> df = sparkSession.readStream()
>         .format("kafka")
>         .load();
>
> df.createOrReplaceTempView("table");
> df.printSchema();
>
> Dataset<Row> resultSet = df.sqlContext().sql(
>         "select value from table"); // sparkSession.sql(this.query);
> StreamingQuery streamingQuery = resultSet
>         .writeStream()
>         .trigger(Trigger.ProcessingTime(1000))
>         .format("console")
>         .start();
>
> vs
>
> Dataset<Row> df = sparkSession.readStream()
>         .format("kafka")
>         .load();
>
> df.createOrReplaceTempView("table");
>
> Dataset<Row> resultSet = sparkSession.sql(
>         "select value from table"); // sparkSession.sql(this.query);
> StreamingQuery streamingQuery = resultSet
>         .writeStream()
>         .trigger(Trigger.ProcessingTime(1000))
>         .format("console")
>         .start();
>
> Thanks!
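For what it's worth, here is a minimal batch sketch showing that both call paths resolve the same registered temp view (assumptions: Spark 2.x on the classpath, a local[*] master, and a tiny in-memory DataFrame standing in for the Kafka streaming source; class and app names are mine, not from the thread):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlContextVsSession {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("sqlContext-vs-session")
                .getOrCreate();

        // Small batch DataFrame instead of readStream().format("kafka"),
        // just to illustrate view resolution.
        Dataset<Row> df = spark.range(3).toDF("value");
        df.createOrReplaceTempView("table");

        // Path 1: go through the session directly.
        Dataset<Row> viaSession = spark.sql("select value from table");

        // Path 2: go through the Dataset's SQLContext.
        Dataset<Row> viaContext = df.sqlContext().sql("select value from table");

        // Both resolve "table" through the same catalog, so the
        // results agree; any difference is wrapper overhead, not semantics.
        long a = viaSession.count();
        long b = viaContext.count();
        System.out.println(a == b);

        spark.stop();
    }
}
```

So for the streaming case in your snippet, either form should produce the same query; the session form is just the more direct one.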