RE: How to create dataframe from SQL Server SQL query

2015-12-07 Thread ayan guha
One more thing: for better maintainability, I would create a DB view
and then use the view in Spark. This avoids burying complicated SQL
queries within application code.
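A minimal sketch of the view approach, assuming SQL Server and Spark 1.4 or later. The view name `dbo.document_subset`, the connection URL and the object name `DocumentViewOptions` are hypothetical placeholders. The view DDL appears only as a comment because creating the view is a one-off task on the database side. Spark then refers to the view by name, exactly as it would to a table, so the filtering SQL stays out of the application.

```scala
object DocumentViewOptions {
  // Hypothetical view, created once on the SQL Server side:
  //   CREATE VIEW dbo.document_subset AS
  //   SELECT docid, title, docText
  //   FROM dbo.document
  //   WHERE docid BETWEEN 10 AND 1000;

  // Placeholder connection URL: replace host, port and database name.
  val url: String = "jdbc:sqlserver://dbhost:1433;databaseName=docs"

  // JDBC options that reference the view by name, just like a plain table.
  val options: Map[String, String] = Map(
    "url" -> url,
    "driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "dbtable" -> "dbo.document_subset")

  // With a Spark 1.4+ SQLContext in scope, the DataFrame would be:
  //   val jdbcDF = sqlContext.read.format("jdbc").options(options).load()
}
```

If the filter changes later, only the view definition needs to be altered; the Spark job keeps the same options.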
On 8 Dec 2015 05:55, "Wang, Ningjun (LNG-NPV)" wrote:

> This is a very helpful article. Thanks for the help.
>
> Ningjun
>
> From: Sujit Pal [mailto:sujitatgt...@gmail.com]
> Sent: Monday, December 07, 2015 12:42 PM
> To: Wang, Ningjun (LNG-NPV)
> Cc: user@spark.apache.org
> Subject: Re: How to create dataframe from SQL Server SQL query
>
> Hi Ningjun,
>
> Haven't done this myself, but I saw your question, was curious about the
> answer, and found this article, which you might find useful:
>
> http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/
>
> According to this article, you can pass your SQL statement in the
> "dbtable" option, i.e., something like:
>
> val jdbcDF = sqlContext.read.format("jdbc")
>   .options(
>     Map("url" -> "jdbc:postgresql:dbserver",
>       // the subquery needs parentheses and an alias (here "doc")
>       "dbtable" -> ("(select docid, title, docText from dbo.document " +
>         "where docid between 10 and 1000) AS doc")
>     )).load()
>
> -sujit
>
> On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:
>
> How can I create an RDD from a SQL query against a SQL Server database? Here
> is the DataFrame example from the Spark SQL programming guide:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
>
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:dbserver",
>   "dbtable" -> "schema.tablename")).load()
>
> This code creates a DataFrame from a table. How can I create a DataFrame from
> a query, e.g. “select docid, title, docText from dbo.document where docid
> between 10 and 1000”?
>
> Ningjun


RE: How to create dataframe from SQL Server SQL query

2015-12-07 Thread Wang, Ningjun (LNG-NPV)
This is a very helpful article. Thanks for the help.

Ningjun

From: Sujit Pal [mailto:sujitatgt...@gmail.com]
Sent: Monday, December 07, 2015 12:42 PM
To: Wang, Ningjun (LNG-NPV)
Cc: user@spark.apache.org
Subject: Re: How to create dataframe from SQL Server SQL query

Hi Ningjun,

Haven't done this myself, but I saw your question, was curious about the answer,
and found this article, which you might find useful:
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/

According to this article, you can pass your SQL statement in the "dbtable"
option, i.e., something like:

val jdbcDF = sqlContext.read.format("jdbc")
  .options(
    Map("url" -> "jdbc:postgresql:dbserver",
      // the subquery needs parentheses and an alias (here "doc")
      "dbtable" -> ("(select docid, title, docText from dbo.document " +
        "where docid between 10 and 1000) AS doc")
    )).load()

-sujit

On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV)
<ningjun.w...@lexisnexis.com> wrote:
How can I create an RDD from a SQL query against a SQL Server database? Here is the
DataFrame example from the Spark SQL programming guide:

http://spark.apache.org/docs/latest/sql-programming-guide.html#overview

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:postgresql:dbserver",
  "dbtable" -> "schema.tablename")).load()

This code creates a DataFrame from a table. How can I create a DataFrame from a
query, e.g. “select docid, title, docText from dbo.document where docid between
10 and 1000”?

Ningjun




Re: How to create dataframe from SQL Server SQL query

2015-12-07 Thread Sujit Pal
Hi Ningjun,

Haven't done this myself, but I saw your question, was curious about the
answer, and found this article, which you might find useful:
http://www.sparkexpert.com/2015/03/28/loading-database-data-into-spark-using-data-sources-api/

According to this article, you can pass your SQL statement in the "dbtable"
option, i.e., something like:

val jdbcDF = sqlContext.read.format("jdbc")
  .options(
    Map("url" -> "jdbc:postgresql:dbserver",
      // the subquery needs parentheses and an alias (here "doc")
      "dbtable" -> ("(select docid, title, docText from dbo.document " +
        "where docid between 10 and 1000) AS doc")
    )).load()

-sujit
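For SQL Server specifically, a minimal sketch of the same idea, assuming Spark 1.4 or later and Microsoft's JDBC driver (class `com.microsoft.sqlserver.jdbc.SQLServerDriver`) on the classpath. Spark's JDBC source places the `dbtable` value into its own `SELECT ... FROM <dbtable>` statements, so an inline query has to be written as a parenthesized derived table, and SQL Server rejects a derived table without an alias. The helper names `asDerivedTable` and `sqlServerOptions`, the alias `doc` and the connection URL are hypothetical placeholders.

```scala
object SqlServerQueryOptions {
  // Microsoft's SQL Server JDBC driver class (must be on the classpath).
  val SqlServerDriver: String = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

  // Wraps a SELECT statement as an aliased derived table, so it can stand
  // wherever Spark expects a table name in "SELECT ... FROM <dbtable>".
  def asDerivedTable(query: String, alias: String): String =
    s"($query) AS $alias"

  // Builds the options map for sqlContext.read.format("jdbc").options(...).
  def sqlServerOptions(url: String, query: String, alias: String): Map[String, String] =
    Map(
      "url" -> url,
      "driver" -> SqlServerDriver,
      "dbtable" -> asDerivedTable(query, alias))

  def main(args: Array[String]): Unit = {
    // Placeholder URL: replace host, port and database name.
    val url = "jdbc:sqlserver://dbhost:1433;databaseName=docs"
    val query =
      "select docid, title, docText from dbo.document where docid between 10 and 1000"
    val opts = sqlServerOptions(url, query, "doc")
    println(opts("dbtable"))
    // With a Spark 1.4+ SQLContext in scope, the DataFrame would be:
    //   val jdbcDF = sqlContext.read.format("jdbc").options(opts).load()
  }
}
```

Setting `driver` explicitly tells Spark which class to load; in some deployments it can be omitted if the driver is already registered with `java.sql.DriverManager`.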

On Mon, Dec 7, 2015 at 8:26 AM, Wang, Ningjun (LNG-NPV) <
ningjun.w...@lexisnexis.com> wrote:

> How can I create an RDD from a SQL query against a SQL Server database? Here
> is the DataFrame example from the Spark SQL programming guide:
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#overview
>
> val jdbcDF = sqlContext.read.format("jdbc").options(
>   Map("url" -> "jdbc:postgresql:dbserver",
>   "dbtable" -> "schema.tablename")).load()
>
> This code creates a DataFrame from a table. How can I create a DataFrame from
> a query, e.g. “select docid, title, docText from dbo.document where docid
> between 10 and 1000”?
>
> Ningjun