GitHub user sureshthalamati opened a pull request:
https://github.com/apache/spark/pull/16209
[WIP][SPARK-10849][SQL] Adds option to the JDBC data source for user to
specify database column type for the create table
## What changes were proposed in this pull request?
Currently, the JDBC data source creates tables in the target database using the
default type mapping and the JDBC dialect mechanism. If users want to specify a
different database data type for only some of the columns, there is no option
available. In scenarios where the default mapping does not work, users are
forced to create tables on the target database before writing. This workaround
is probably not acceptable from a usability point of view. This PR provides a
user-defined type mapping for specific columns.
The solution is to allow users to specify the database column data types used
in the create table statement through a JDBC data source option
(createTableColumnTypes) on write. Data type information is specified as
key (column name) / value (data type) pairs in JSON
(e.g. {"name":"varchar(128)", "comments":"clob(20k)"}). Users can use
org.apache.spark.sql.types.MetadataBuilder to build the metadata and generate
the JSON string required for this option.
Example:
```Scala
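import org.apache.spark.sql.types.MetadataBuilder

// Map column names to the database types to use in CREATE TABLE.
val mdb = new MetadataBuilder()
mdb.putString("name", "VARCHAR(128)")
mdb.putString("comments", "CLOB(20K)")
// Serialize the mapping to the JSON string the option expects.
val createTableColTypes = mdb.build().json
df.write
  .option("createTableColumnTypes", createTableColTypes)
  .jdbc(url, "TEST.DBCOLTYPETEST", properties)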
```
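To illustrate the intended effect of the option, here is a minimal sketch in
plain Scala (no Spark dependency) of how user-specified types might override
the default mapping when the create table column list is built. The object
`ColumnTypeOverride`, the method `columnDefinitions`, and the sample default
types are hypothetical names chosen for illustration; this is not the code in
this PR.

```scala
// Hypothetical helper for illustration only; not the PR's implementation.
object ColumnTypeOverride {
  // defaults:  (column name, type picked by the default mapping / dialect)
  // overrides: column name -> database type supplied by the user
  def columnDefinitions(
      defaults: Seq[(String, String)],
      overrides: Map[String, String]): String =
    defaults
      .map { case (name, defaultType) =>
        // Use the user-specified type when present, else the default.
        s"$name ${overrides.getOrElse(name, defaultType)}"
      }
      .mkString(", ")
}

// Assumed default types, for illustration.
val defaults = Seq("id" -> "INTEGER", "name" -> "TEXT", "comments" -> "TEXT")
val overrides = Map("name" -> "VARCHAR(128)", "comments" -> "CLOB(20K)")
val ddl = ColumnTypeOverride.columnDefinitions(defaults, overrides)
```

Columns without an entry in the user mapping (`id` here) keep their default
type, which matches the goal of overriding only some of the columns.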
An alternative approach is to add a new column metadata property to the JDBC
data source so that users can specify the database column type through the
column metadata.
TODO: Case-insensitive column name lookup based on the
spark.sql.caseSensitive property value.
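One possible shape for the lookup in the TODO, sketched in plain Scala under
the assumption that the user mapping is available as a `Map[String, String]`
and that the value of spark.sql.caseSensitive is passed in as a Boolean. The
function name `lookupColumnType` is hypothetical and not part of this PR.

```scala
// Hypothetical sketch of the TODO; not the PR's implementation.
// userTypes:     column name -> user-specified database type
// caseSensitive: assumed to carry the spark.sql.caseSensitive setting
def lookupColumnType(
    userTypes: Map[String, String],
    column: String,
    caseSensitive: Boolean): Option[String] =
  if (caseSensitive) {
    // Exact match only.
    userTypes.get(column)
  } else {
    // First entry whose key matches when case is ignored.
    userTypes.collectFirst {
      case (name, dbType) if name.equalsIgnoreCase(column) => dbType
    }
  }

val userTypes = Map("Name" -> "VARCHAR(128)")
val insensitive = lookupColumnType(userTypes, "name", caseSensitive = false)
val sensitive = lookupColumnType(userTypes, "name", caseSensitive = true)
```

With case sensitivity off, `name` resolves to the entry keyed `Name`; with it
on, the lookup finds nothing and the default mapping would apply.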
## How was this patch tested?
Added a new test case to JDBCWriteSuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sureshthalamati/spark jdbc_custom_dbtype_option_json-spark-10849
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16209.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16209
----
commit 6eec6ca63c5641d1bbbbc9958bdd300ac079d5cf
Author: sureshthalamati <[email protected]>
Date: 2016-12-02T23:22:17Z
Adding new option to the jdbc to allow users to specify create table column
types when table is created on write
----