Github user OopsOutOfMemory commented on the pull request:

    https://github.com/apache/spark/pull/3847#issuecomment-68420808
  
    Hi, @marmbrus 
    The reason I want to make this change originally comes from the `OPTIONS`
    keyword.
    __Currently an `OPTIONS` key is an `ident`, not a `stringLit`. I don't want
    to break the design here, so you can still pass the `path` parameter to the
    avro source; I chose to add another one to make the properties more
    flexible.__
    
    External data sources take many parameters, and they all follow a common
    naming format.
    I referred to the Hive HBase integration:
    ```
    CREATE TABLE hbase_table_1(key int, value string) 
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
    TBLPROPERTIES ("hbase.table.name" = "xyz");
    ```
    If I use `OPTIONS` to pass k/v pairs, a key looks like `hbase_table_name`
    (or any other name you define), but you cannot define a key such as
    `hbase.table.name`, which is separated by `.`. For example:
    ```
      val hbaseDDL = s"""
            |CREATE TEMPORARY TABLE hbase_people
            |USING com.shengli.spark.hbase
            |OPTIONS (
            |  sparksql_table_schema   '(row_key string, name string, age int, job string)',
            |  hbase_table_name    'people',
            |  hbase_table_schema '(:key , profile:name , profile:age , career:job )'
            |)""".stripMargin
    ```
    A key like `sparksql_table_schema`, separated by `_`, looks a little ugly
    to me.
    __Recommended Format__
    I prefer an options format like this:
    ```
       stringLit ~ "=" ~ stringLit
    ```
    Also, I think users are used to properties separated by `.`, so `stringLit`
    makes more sense than `ident`:
    ```
    hbase.zookeeper.session.timeout
    hbase.zookeeper.property.clientPort
    hbase.table.name
    hbase.master
    ```
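    To make the difference concrete, here is a minimal sketch using plain
    regular expressions. It is not the parser code from this PR: the object
    `OptionKeySketch` and the names `identKey`, `stringLitPair`, `isIdent` and
    `parsePair` are hypothetical, and the identifier rule is assumed to allow
    only letters, digits and underscores.
    ```scala
    // Sketch only: a regex-based stand-in for the two key rules discussed above.
    // All names here are hypothetical; this is not Spark's DDL parser.
    object OptionKeySketch {
      // Assumed identifier rule: letters, digits and underscores, so no dots.
      private val identKey = """[A-Za-z_][A-Za-z0-9_]*""".r

      // A quoted key and a quoted value joined by "=",
      // mirroring stringLit ~ "=" ~ stringLit. Single or double quotes are accepted.
      private val stringLitPair = """\s*(['"])(.*?)\1\s*=\s*(['"])(.*?)\3\s*""".r

      // True when the whole key matches the identifier rule.
      def isIdent(key: String): Boolean = identKey.pattern.matcher(key).matches()

      // Returns the (key, value) pair when both sides are quoted string literals.
      def parsePair(text: String): Option[(String, String)] = text match {
        case stringLitPair(_, k, _, v) => Some((k, v))
        case _                         => None
      }

      def main(args: Array[String]): Unit = {
        println(isIdent("hbase_table_name"))                 // true
        println(isIdent("hbase.table.name"))                 // false
        println(parsePair("'hbase.table.name' = 'people'"))  // Some((hbase.table.name,people))
      }
    }
    ```
    With the identifier rule a dotted key is rejected, while the quoted form
    accepts it unchanged, which is the flexibility argued for above.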
    Here I list some common parameter formats:
    
    Hive Cassandra:
    ```
    hive> CREATE EXTERNAL TABLE MyHiveTable 
            ( key int,  data string ) 
            STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler' 
            TBLPROPERTIES ( "cassandra.ks.name" = "cassandra_keyspace" , 
              "cassandra.cf.name" = "exampletable" );
    ```
    
    Hive Elasticsearch:
    ```
    CREATE EXTERNAL TABLE artists (...)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource' = 'radio/artists',
                  'es.index.auto.create' = 'false') ;
    ```
    
    2. Yes, `SerDe` is not in the API; we can add it later if there is a need.
    But we can put parameters into `TBLPROPERTIES` first, as in the Cassandra
    example.
    
    Any suggestions? :)


