RE: How to create dataframe from SQL Server SQL query
One more thing: for better maintainability, consider creating a DB view and then using the view in Spark. This avoids burying complicated SQL queries within application code.

On 8 Dec 2015 05:55, "Wang, Ningjun (LNG-NPV)" wrote:
> This is a very helpful article. Thanks for the help.
>
> Ningjun
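The view-based approach can be sketched as follows. The view name (`dbo.recent_documents`) and the connection URL are illustrative placeholders, not from this thread; the view is created once on the SQL Server side, and Spark then reads it like any other table:

```scala
// One-time DDL on the SQL Server side (T-SQL, held in a string here so
// this sketch stays self-contained; the view name is illustrative):
val createView: String =
  """CREATE VIEW dbo.recent_documents AS
    |SELECT docid, title, docText
    |FROM dbo.document
    |WHERE docid BETWEEN 10 AND 1000""".stripMargin

// Spark then treats the view exactly like a table, which keeps the
// SQL out of application code:
val options: Map[String, String] = Map(
  "url"     -> "jdbc:sqlserver://dbserver;databaseName=mydb", // placeholder
  "dbtable" -> "dbo.recent_documents"
)
// val jdbcDF = sqlContext.read.format("jdbc").options(options).load()
```

Schema changes can then be absorbed in the view definition without touching the Spark job.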
RE: How to create dataframe from SQL Server SQL query
This is a very helpful article. Thanks for the help.

Ningjun

From: Sujit Pal [mailto:sujitatgt...@gmail.com]
Sent: Monday, December 07, 2015 12:42 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to create dataframe from SQL Server SQL query
Re: How to create dataframe from SQL Server SQL query
Hi Ningjun,

Haven't done this myself, but I saw your question, was curious about the answer, and found this article which you might find useful:

http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/

According to this article, you can pass your SQL statement in the "dbtable" mapping, i.e., something like:

    val jdbcDF = sqlContext.read.format("jdbc")
      .options(
        Map("url" -> "jdbc:postgresql:dbserver",
            "dbtable" -> "(select docid, title, docText from dbo.document where docid between 10 and 1000)"
      )).load()

-sujit

On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:

> How can I create an RDD from a SQL query against a SQL Server database? Here
> is the example of a dataframe:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
>
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:dbserver",
>       "dbtable" -> "schema.tablename")).load()
>
> This code creates a dataframe from a table. How can I create a dataframe from a
> query, e.g. "select docid, title, docText from dbo.document where docid
> between 10 and 1000"?
>
> Ningjun
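A note on the subquery trick above: most databases, including SQL Server, require a derived table in the FROM position to carry an alias, so in practice the parenthesized query usually needs an "AS alias" suffix. A minimal sketch, using a hypothetical helper (`subqueryAsTable` is not a Spark API, just an illustration):

```scala
// Wrap an arbitrary SQL query so it can be passed as the "dbtable"
// option of Spark's JDBC data source. The alias is required by most
// databases (SQL Server, MySQL, ...) for derived tables.
// Hypothetical helper, not part of Spark's API:
def subqueryAsTable(query: String, alias: String = "subq"): String =
  s"($query) AS $alias"

// Usage against SQL Server (URL and driver are placeholders):
//
// val jdbcDF = sqlContext.read.format("jdbc")
//   .options(Map(
//     "url"     -> "jdbc:sqlserver://dbserver;databaseName=mydb",
//     "driver"  -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
//     "dbtable" -> subqueryAsTable(
//       "select docid, title, docText from dbo.document where docid between 10 and 1000")
//   ))
//   .load()
```

If the database rejects the bare parenthesized query with a syntax error near ")", a missing alias is the usual cause.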