[ 
https://issues.apache.org/jira/browse/SPARK-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teng Qiu updated SPARK-2710:
----------------------------

    Description: 
Spark SQL can take Parquet files or JSON files as a table directly (without 
giving a case class to define the schema).

As a component named SQL, it should also be able to take a ResultSet from an 
RDBMS easily.

I found that there is a JdbcRDD in core: 
core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala

So I want to make a small change in this file to allow SQLContext to read 
the MetaData from the PreparedStatement (reading the metadata does not require 
actually executing the query).

Then, in Spark SQL, SQLContext can create a SchemaRDD from a JdbcRDD and its 
MetaData.
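
As a minimal sketch of the idea: on most JDBC drivers, PreparedStatement.getMetaData is available before the query is executed, so column names and types can be read up front. The object name JdbcSchema, the helper jdbcTypeToCatalyst, and the exact type mapping below are my own illustrative assumptions, not Spark's actual API:

```scala
import java.sql.Types

object JdbcSchema {
  // Map a java.sql.Types code to a (hypothetical) Spark SQL type name.
  // The mapping shown is a plausible subset, not Spark's real one.
  def jdbcTypeToCatalyst(sqlType: Int): String = sqlType match {
    case Types.INTEGER                 => "IntegerType"
    case Types.BIGINT                  => "LongType"
    case Types.FLOAT | Types.REAL      => "FloatType"
    case Types.DOUBLE                  => "DoubleType"
    case Types.DECIMAL | Types.NUMERIC => "DecimalType"
    case Types.VARCHAR | Types.CHAR    => "StringType"
    case Types.BOOLEAN | Types.BIT     => "BooleanType"
    case Types.DATE                    => "DateType"
    case Types.TIMESTAMP               => "TimestampType"
    case _                             => "StringType" // conservative fallback
  }

  // Read (columnName, typeName) pairs from a prepared statement's
  // metadata WITHOUT executing the query.
  def schemaOf(stmt: java.sql.PreparedStatement): Seq[(String, String)] = {
    val md = stmt.getMetaData
    (1 to md.getColumnCount).map { i =>
      (md.getColumnLabel(i), jdbcTypeToCatalyst(md.getColumnType(i)))
    }
  }
}
```

SQLContext could then build the SchemaRDD's row schema from these pairs instead of requiring a user-supplied case class.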

Going further, maybe we can add a feature to sql-shell so that users can use 
spark-thrift-server to join tables from different sources,

such as:
{code}
CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password" 
"initQuery" "bound" ...
CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
SELECT parquet_files.colX, jdbc_tbl1.colY
  FROM parquet_files
  JOIN jdbc_tbl1
    ON (parquet_files.id = jdbc_tbl1.id)
{code}

I think such a feature would be useful, similar to what Facebook's Presto engine does.


Oh, and there is a small bug in JdbcRDD:

in compute(), the close() method has
{code}
if (null != conn && ! stmt.isClosed()) conn.close()
{code}
should be
{code}
if (null != conn && ! conn.isClosed()) conn.close()
{code}

Just a small typo :)
but as written, close() will never actually close conn...
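
To make the fix concrete, here is a minimal sketch of the corrected close() logic, where each resource is guarded by its own isClosed check. The trait Closeable2 and class FakeResource are stand-ins for illustration only, not real java.sql types:

```scala
// Stand-in for the isClosed/close shape shared by Statement and Connection.
trait Closeable2 {
  def isClosed: Boolean
  def close(): Unit
}

object SafeClose {
  // Corrected pattern: each resource's close is guarded by ITS OWN
  // isClosed check; the bug was testing stmt.isClosed() before conn.close().
  def close(rs: Closeable2, stmt: Closeable2, conn: Closeable2): Unit = {
    try { if (rs != null) rs.close() } catch { case _: Exception => () }
    try { if (stmt != null && !stmt.isClosed) stmt.close() } catch { case _: Exception => () }
    try { if (conn != null && !conn.isClosed) conn.close() } catch { case _: Exception => () }
  }
}

// A tiny test double recording whether close() was called.
class FakeResource extends Closeable2 {
  var closed = false
  def isClosed: Boolean = closed
  def close(): Unit = { closed = true }
}
```

With the buggy version, once the statement is already closed the connection is never closed; with the corrected check, the connection is closed regardless of the statement's state.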


  was:
Spark SQL can take Parquet files or JSON files as a table directly (without 
given a case class to define the schema)

as a component named SQL, it should also be able to take a ResultSet from RDBMS 
easily.

i find that there is a JdbcRDD in core: 
core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala

so i want to make some small change in this file to allow SQLContext to read 
the MetaData from the PreparedStatement (read metadata do not need to execute 
the query really).

and there is a small bug in JdbcRDD

in compute(), method close()
{code}
if (null != conn && ! stmt.isClosed()) conn.close()
{code}
should be
{code}
if (null != conn && ! conn.isClosed()) conn.close()
{code}

just a small write error :)

Then, in Spark SQL, SQLContext can create SchemaRDD with JdbcRDD and his 
MetaData.

In the further, maybe we can add a feature in sql-shell, so that user can using 
spark-thrift-server join tables from different sources

such as:
{code}
CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password" 
"initQuery" "bound" ...
CREATE TABLE parquet_files AS JDBC "hdfs://tmp/parquet_table/"
SELECT parquet_files.colX, jdbc_tbl1.colY
  FROM parquet_files
  JOIN jdbc_tbl1
    ON (parquet_files.id = jdbc_tbl1.id)
{code}

I think such a feature will be useful, like facebook Presto engine does.


> Build SchemaRDD from a JdbcRDD with MetaData (no hard code case class)
> ----------------------------------------------------------------------
>
>                 Key: SPARK-2710
>                 URL: https://issues.apache.org/jira/browse/SPARK-2710
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core, SQL
>            Reporter: Teng Qiu
>
> Spark SQL can take Parquet files or JSON files as a table directly (without 
> given a case class to define the schema)
> as a component named SQL, it should also be able to take a ResultSet from 
> RDBMS easily.
> i find that there is a JdbcRDD in core: 
> core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
> so i want to make some small change in this file to allow SQLContext to read 
> the MetaData from the PreparedStatement (read metadata do not need to execute 
> the query really).
> Then, in Spark SQL, SQLContext can create SchemaRDD with JdbcRDD and his 
> MetaData.
> In the further, maybe we can add a feature in sql-shell, so that user can 
> using spark-thrift-server join tables from different sources
> such as:
> {code}
> CREATE TABLE jdbc_tbl1 AS JDBC "connectionString" "username" "password" 
> "initQuery" "bound" ...
> CREATE TABLE parquet_files AS PARQUET "hdfs://tmp/parquet_table/"
> SELECT parquet_files.colX, jdbc_tbl1.colY
>   FROM parquet_files
>   JOIN jdbc_tbl1
>     ON (parquet_files.id = jdbc_tbl1.id)
> {code}
> I think such a feature will be useful, like facebook Presto engine does.
> oh, and there is a small bug in JdbcRDD
> in compute(), method close()
> {code}
> if (null != conn && ! stmt.isClosed()) conn.close()
> {code}
> should be
> {code}
> if (null != conn && ! conn.isClosed()) conn.close()
> {code}
> just a small write error :)
> but such a close method will never be able to close conn...



--
This message was sent by Atlassian JIRA
(v6.2#6252)
