Github user OopsOutOfMemory commented on the pull request:
https://github.com/apache/spark/pull/3847#issuecomment-68420808
Hi, @marmbrus
The reason I want to make this change is the `OPTIONS` keyword.
__Currently an `OPTIONS` key is an `ident`, not a `stringLit`. I don't
want to break the design here (you can still pass the `path` parameter to
the avro source), so I chose to add another one to make the properties more
flexible.__
External data sources have many parameters, and they all follow a similar
naming format.
I referred to Hive's HBase integration:
```
CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
```
If I use `OPTIONS` to pass k/v pairs, a key looks like `hbase_table_name`, or
any other name you define, but you cannot define `hbase.table.name`, which
is separated by `.`. For example:
```
val hbaseDDL = s"""
|CREATE TEMPORARY TABLE hbase_people
|USING com.shengli.spark.hbase
|OPTIONS (
| sparksql_table_schema '(row_key string, name string, age int, job string)',
| hbase_table_name 'people',
| hbase_table_schema '(:key , profile:name , profile:age , career:job )'
|)""".stripMargin
```
A format like `sparksql_table_schema`, separated by `_`, is a little
ugly to me.
__Recommended Format__
I prefer the options format like this:
```
stringLit ~ "=" ~ stringLit
```
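To make the idea concrete, here is a minimal sketch of what accepting `stringLit = stringLit` pairs means. It is not Spark's DDL parser: `OptionsSketch` and `parsePairs` are hypothetical names, it uses a plain regex rather than parser combinators, and it assumes quoted literals without escape sequences.

```scala
object OptionsSketch {
  // One quoted literal, either '...' or "..." (no escape handling in this sketch).
  private val Lit = """(?:'([^']*)'|"([^"]*)")"""

  // One pair in the proposed format: stringLit = stringLit
  private val Pair = (Lit + """\s*=\s*""" + Lit).r

  // Collects every key/value pair found in the text into a Map.
  // Anything that is not a well-formed pair is simply skipped here;
  // a real parser would report an error instead.
  def parsePairs(body: String): Map[String, String] =
    Pair.findAllMatchIn(body).map { m =>
      // Groups 1/2 hold the key, groups 3/4 the value, depending on quote style.
      val key = Option(m.group(1)).getOrElse(m.group(2))
      val value = Option(m.group(3)).getOrElse(m.group(4))
      key -> value
    }.toMap
}
```

Because the key is a quoted string, a dotted name such as `hbase.table.name` needs no special treatment.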
Also, I think users are used to properties separated by `.`, so `stringLit`
makes more sense than `ident`.
```
hbase.zookeeper.session.timeout
hbase.zookeeper.property.clientPort
hbase.table.name
hbase.master
```
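For illustration, with the recommended format the HBase DDL from earlier might be written as below. This is only a sketch of the proposed syntax, not something the current parser accepts, and the dotted key names (`sparksql.table.schema`, `hbase.table.schema`) are hypothetical renamings of the underscore keys.

```scala
object ProposedDdl {
  // Hypothetical DDL using quoted, dot-separated keys with "=".
  val hbaseDDL: String = """
    |CREATE TEMPORARY TABLE hbase_people
    |USING com.shengli.spark.hbase
    |OPTIONS (
    |  'sparksql.table.schema' = '(row_key string, name string, age int, job string)',
    |  'hbase.table.name' = 'people',
    |  'hbase.table.schema' = '(:key , profile:name , profile:age , career:job )'
    |)""".stripMargin
}
```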
Here I list some common parameter formats:
Hive Cassandra:
```
hive> CREATE EXTERNAL TABLE MyHiveTable
( key int, data string )
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
TBLPROPERTIES ( "cassandra.ks.name" = "cassandra_keyspace" ,
"cassandra.cf.name" = "exampletable" );
```
Hive Elasticsearch:
```
CREATE EXTERNAL TABLE artists (...)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radio/artists',
'es.index.auto.create' = 'false') ;
```
2. Yes, `SerDe` is not in the API; we can do it later if there is a need. But
we can put parameters into `TBLPROPERTIES` first, like Cassandra.
Any suggestions? :)