[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/1777




[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87968979
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
 For now users can skip implementing the `result` method if it doesn't need to transform the accumulated value,
 but this behavior is subject to change, so providing `result` is recommended.
 
-Please note that users should use `--jars` or `--artifacts` while running Storm SQL runner to make sure UDFs and/or UDAFs are available in classpath. 
\ No newline at end of file
+Please note that users should use `--jars` or `--artifacts` when running the Storm SQL runner to make sure UDFs and/or UDAFs are available in the classpath.
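+
+For example, a sketch of launching the runner with a connector artifact (the SQL file name, topology name, and artifact version here are assumptions for illustration):
+
+```
+storm sql my_queries.sql my_topology --artifacts "org.apache.storm:storm-sql-kafka:2.0.0"
+```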
+
+## External Data Sources
+
+### Specifying External Data Sources
+
+In StormSQL, data is represented by external tables. Users can specify data sources using the `CREATE EXTERNAL TABLE` statement. The syntax of `CREATE EXTERNAL TABLE` closely follows the one defined in [Hive Data Definition Language](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL):
+
+```
+CREATE EXTERNAL TABLE table_name field_list
+[ STORED AS
+  INPUTFORMAT input_format_classname
+  OUTPUTFORMAT output_format_classname
+]
+LOCATION location
+[ TBLPROPERTIES tbl_properties ]
+[ AS select_stmt ]
+```
+
+The default input and output formats are JSON. The supported formats are described in the Supported Formats section below.
+
+For example, the following statement specifies a Kafka spout and sink:
+
+```
+CREATE EXTERNAL TABLE FOO (ID INT PRIMARY KEY) LOCATION 'kafka://localhost:2181/brokers?topic=test' TBLPROPERTIES '{"producer":{"bootstrap.servers":"localhost:9092","acks":"1","key.serializer":"org.apache.storm.kafka.IntSerializer","value.serializer":"org.apache.storm.kafka.ByteBufferSerializer"}}'
+```
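+
+Once declared, such tables can be referenced from Storm SQL statements. A sketch in the style of the Storm SQL examples (the `ORDERS` source table and its fields are assumptions for illustration):
+
+```
+-- Reads from the assumed ORDERS stream and writes qualifying IDs to FOO.
+INSERT INTO FOO
+SELECT ID FROM ORDERS WHERE UNIT_PRICE * QUANTITY > 50
+```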
+
+### Plugging in External Data Sources
+
+Users plug in external data sources by implementing the `ISqlTridentDataSource` interface and registering the implementations via the mechanisms of Java's service loader. The external data source will be chosen based on the scheme of the table's location URI. Please refer to the implementation of `storm-sql-kafka` for more details.
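+
+As a minimal sketch of the service-loader wiring (the interface and class names below are modeled on the layout of `storm-sql-kafka` and should be treated as assumptions rather than a verified SPI), a plugin jar registers its provider in a `META-INF/services` resource file named after the provider interface, listing one fully-qualified implementation class per line:
+
+```
+# Assumed resource path inside the plugin jar:
+#   META-INF/services/org.apache.storm.sql.runtime.DataSourcesProvider
+# Assumed content for the Kafka connector:
+org.apache.storm.sql.kafka.KafkaDataSourcesProvider
+```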
+
+### Supported Formats
+
+| Format | Input format class | Output format class | Requires properties
+|:--- |:--- |:--- |:---
+| JSON | org.apache.storm.sql.runtime.serde.json.JsonScheme | org.apache.storm.sql.runtime.serde.json.JsonSerializer | No
+| Avro | org.apache.storm.sql.runtime.serde.avro.AvroScheme | org.apache.storm.sql.runtime.serde.avro.AvroSerializer | Yes
+| CSV  | org.apache.storm.sql.runtime.serde.csv.CsvScheme | org.apache.storm.sql.runtime.serde.csv.CsvSerializer | No
+| TSV  | org.apache.storm.sql.runtime.serde.tsv.TsvScheme | org.apache.storm.sql.runtime.serde.tsv.TsvSerializer | No
+
+#### Avro
+
+Avro requires users to describe the record schema for both input and output. The schema should be provided in `TBLPROPERTIES`:
+the input schema under the key `input.avro.schema` and the output schema under the key `output.avro.schema`.
+The schema string should be escaped JSON so that `TBLPROPERTIES` remains valid JSON.
+
+Example schema descriptions:
+
+`"input.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+`"output.avro.schema": "{\"type\": \"record\", \"name\": \"large_orders\", 
\"fields\" : [ {\"name\": \"ID\", \"type\": \"int\"}, {\"name\": \"TOTAL\", 
\"type\": \"int\"} ]}"`
+
+#### CSV
+
+CSV uses the standard [RFC 4180](https://tools.ietf.org/html/rfc4180) CSV parser and doesn't need any other properties.
+
+#### TSV
+
+By default TSV uses `\t` as the delimiter, but users can set another delimiter via `input.tsv.delimiter` and/or `output.tsv.delimiter`.
+Please note that only a single-character delimiter is supported.
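+
+For instance, a sketch of overriding the delimiter (the table name, field, and topic are assumptions; the TSV classes come from the table above):
+
+```
+CREATE EXTERNAL TABLE BAR (ID INT PRIMARY KEY)
+STORED AS
+INPUTFORMAT 'org.apache.storm.sql.runtime.serde.tsv.TsvScheme'
+OUTPUTFORMAT 'org.apache.storm.sql.runtime.serde.tsv.TsvSerializer'
+LOCATION 'kafka://localhost:2181/brokers?topic=bar'
+TBLPROPERTIES '{"input.tsv.delimiter": "|", "output.tsv.delimiter": "|"}'
+```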
+
+### Supported Data Sources
+
+| Data Source | Artifact Name | Location prefix | Supports input data source | Supports output data source | Requires properties
+|:--- |:--- |:--- |:--- |:--- |:---
+| Kafka | org.apache.storm:storm-sql-kafka | `kafka://zkhost:port/broker_path?topic=topic` | Yes | Yes | Yes
+| Redis | org.apache.storm:storm-sql-redis | `redis://:[password]@host:port/[dbIdx]` | No | Yes | Yes
+| MongoDB | org.apache.storm:storm-sql-mongodb | 

[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87968697
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
[snip: same diff as quoted above]
+#### CSV
+
+It uses `Standard RFC4180 CSV Parser` and doesn't need any other properties.
--- End diff --

Yes, that would be a good idea. Will address.




[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread vesense
Github user vesense commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87967670
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
[snip: same diff as quoted above]
+#### CSV
+
+It uses `Standard RFC4180 CSV Parser` and doesn't need any other properties.
--- End diff --

Minor: how about adding a link to RFC 4180? It would be convenient for users who want to look it up.




[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-15 Thread vesense
Github user vesense commented on a diff in the pull request:

https://github.com/apache/storm/pull/1777#discussion_r87968161
  
--- Diff: docs/storm-sql-reference.md ---
@@ -1203,4 +1203,103 @@ and class for aggregate function is here:
[snip: same diff as quoted above]

[GitHub] storm pull request #1777: STORM-2202 [Storm SQL] Document how to use support...

2016-11-14 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/storm/pull/1777

STORM-2202 [Storm SQL] Document how to use supported connectors and formats

Copies the "setting up external data sources" content to the reference page, and adds descriptions of the data sources (connectors) and formats.

@vesense Since you authored many of them, I'd be happy if you could take a look.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/storm STORM-2202

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/1777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1777


commit 102bb134d2e5fc91c5130612b556bcab5dc58ea6
Author: Jungtaek Lim 
Date:   2016-11-15T06:50:29Z

STORM-2202 [Storm SQL] Document how to use supported connectors and formats



