[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/1961
[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1961#issuecomment-216895081

Thanks @yjshen. Integration with SQL would be very nice! I'll then go ahead and merge this PR.
[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...
Github user yjshen commented on the pull request: https://github.com/apache/flink/pull/1961#issuecomment-216877262

Hi @fhueske, thanks for the explanation. If I understand correctly, the current `toSink` API is a general one that allows writing table content to a large variety of `Sink`s without blowing up the dependencies of the `flink-table` module. The design seems quite reasonable to me now.

BTW, if we are going to support some **native** output formats, the register & reflection approach seems feasible. By doing this, we could not only do

```scala
t.toSink("csv").option("path", "/foo").option("fieldDelim", "|")
```

in the Table API but also

```sql
insert overwrite into parquet_table_a select * from table_b
```

in SQL.
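[A hypothetical sketch of the shared-registry idea in this comment: if sink factories are registered under a format name, both the Table API's `toSink("csv")` and a SQL `INSERT` statement could resolve the same factory. The names `SinkFactory` and `SinkRegistry` are invented for illustration and are not Flink API.]

```scala
import scala.collection.mutable

// A factory produces a sink instance from the string options collected by
// the fluent API (or parsed out of a SQL statement).
trait SinkFactory {
  def create(options: Map[String, String]): AnyRef // would return a TableSink
}

object SinkRegistry {
  private val factories = mutable.Map.empty[String, SinkFactory]

  def register(format: String, factory: SinkFactory): Unit =
    factories(format) = factory

  // Both t.toSink("csv") and a SQL INSERT into a "csv" table would end up here.
  def lookup(format: String): SinkFactory =
    factories.getOrElse(
      format,
      throw new IllegalArgumentException(s"no sink registered for format '$format'"))
}
```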
[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1961#issuecomment-216588444

Thanks for the feedback @yjshen. The motivation of the `TableSink` interface is to support very different storage systems (JDBC, Cassandra, Kafka, HBase, ...) and formats (CSV, Parquet, Avro, etc.). The idea is to reuse existing OutputFormats (DataSet) and SinkFunctions (DataStream) as much as possible. The configuration of the `TableSink` with field names and types happens internally and is not user-facing.

While the goal is to support many different systems, we do not want to blow up the dependencies of the flink-table module. With the current design we can add TableSinks to the respective modules in `flink-batch-connectors` and `flink-streaming-connectors` and don't have to add all external dependencies to the Table API. We also want to give users the option to define their own table sinks.

I am not sure about configuring the output type and parameters with untyped Strings. IMO, this makes it hard to identify and look up relevant parameters and options. But maybe we can add a registration of TableSinks to the TableEnvironment and do something like:

```scala
tEnv.registerSinkType("csv", classOf[CsvTableSink])

val t: Table = ...
t.toSink("csv").option("path", "/foo").option("fieldDelim", "|")
```

We would need to find a way to pass the options to the TableSink constructor, maybe via reflection...
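[To make the reflection idea in this comment concrete, here is one hypothetical way it could work, assuming each registered sink class exposes a constructor that takes the collected options as a `Map[String, String]`. `registerSinkType` matches the snippet above; `CsvTableSink` and `SinkTypeRegistry` are stand-ins for this sketch, not code from the PR.]

```scala
import scala.collection.mutable

// Stand-in sink whose constructor consumes the collected options.
class CsvTableSink(options: Map[String, String]) {
  val path: String = options("path")
  val fieldDelim: String = options.getOrElse("fieldDelim", ",")
}

class SinkTypeRegistry {
  private val sinkTypes = mutable.Map.empty[String, Class[_]]

  def registerSinkType(name: String, clazz: Class[_]): Unit =
    sinkTypes(name) = clazz

  // Reflectively invoke the Map-typed constructor with the collected options.
  def createSink(name: String, options: Map[String, String]): AnyRef = {
    val ctor = sinkTypes(name).getConstructor(classOf[Map[String, String]])
    ctor.newInstance(options).asInstanceOf[AnyRef]
  }
}

// Usage:
//   val reg = new SinkTypeRegistry
//   reg.registerSinkType("csv", classOf[CsvTableSink])
//   val sink = reg.createSink("csv", Map("path" -> "/foo", "fieldDelim" -> "|"))
```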
[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...
Github user yjshen commented on the pull request: https://github.com/apache/flink/pull/1961#issuecomment-216575391

Hi @fhueske, I've read through this PR and find the current API design a little weird. Please correct me if I got something wrong: since we are outputting `Table`s, the schema is known at runtime, so why should we first create a type-agnostic `TableSink` and then configure it with specific names and types? What about

```scala
val t: Table = ...
t.write().format("csv").option("delim", "|").option("path", "/path/to/file")
env.execute()
```

and constructing the `TableSink` when we are about to `execute()`? :)
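[For illustration, a hypothetical shape of the deferred variant proposed in this comment: `write()` only records the format and options, and the environment would build the actual sink during `execute()`, once the plan's schema is fixed. `TableWriter` and `SinkSpec` are invented names for this sketch.]

```scala
import scala.collection.mutable

// The recorded intent: which format to use and how it is parameterized.
case class SinkSpec(format: String, options: Map[String, String])

class TableWriter {
  private var fmt: String = ""
  private val opts = mutable.Map.empty[String, String]

  def format(f: String): TableWriter = { fmt = f; this }
  def option(key: String, value: String): TableWriter = { opts(key) = value; this }

  // The environment would call this from execute(), after the table's field
  // names and types are known, and only then construct the concrete sink.
  def spec: SinkSpec = SinkSpec(fmt, opts.toMap)
}

// Usage mirroring the snippet above (Table.write() would return a TableWriter):
//   t.write().format("csv").option("delim", "|").option("path", "/path/to/file")
```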
[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...
GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/1961

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external storage.

Thanks for contributing to Apache Flink. Before you open your pull request, please take the following check list into consideration. If your changes take all of the items into account, feel free to open your pull request. For more information and/or questions please refer to the [How To Contribute guide](http://flink.apache.org/how-to-contribute.html). In addition to going through the list, please provide a meaningful description of your changes.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the JIRA id)
- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added
- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis build has passed

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink tableSink

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1961.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1961

commit ffae8feddf67c3988a2422b227a1a22190b0e69e
Author: Fabian Hueske
Date: 2016-04-30T19:11:40Z

    [FLINK-1996] [tableApi] Add TableSink interface to emit tables to external storage.
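[For readers skimming the thread, a minimal Scala sketch of the kind of contract this PR introduces, as described in the discussion above: the sink starts type-agnostic and is configured internally with the table's field names and types before emitting. The trait and method names here are an approximation based on the discussion, not a verbatim copy of the merged code.]

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.DataSet

// A sink is created unconfigured; the Table API injects the schema later.
trait TableSinkSketch[T] {
  // Called internally once the emitted table's schema is known; returns a
  // configured copy of the sink.
  def configure(fieldNames: Array[String],
                fieldTypes: Array[TypeInformation[_]]): TableSinkSketch[T]

  // The record type handed to the underlying OutputFormat / SinkFunction.
  def getOutputType: TypeInformation[T]
}

// A batch-side sink bridges to an existing DataSet output path, so existing
// OutputFormats can be reused without adding dependencies to flink-table.
trait BatchTableSinkSketch[T] extends TableSinkSketch[T] {
  def emitDataSet(dataSet: DataSet[T]): Unit
}
```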