[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/1961




[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-04 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216895081
  
Thanks @yjshen. Integration with SQL would be very nice! 
I'll then go ahead and merge this PR.




[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-04 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216877262
  
Hi @fhueske , thanks for the explanation. If I understand correctly, the 
current `toSink` API is a general one that allows writing table content to a 
wide variety of `Sink`s without blowing up the dependencies of the 
`flink-table` module. The design seems quite reasonable to me now.

BTW, if we are going to support some **native** output formats, the 
`register` & `reflection` approach seems feasible. By doing this, we can not 
only do
``` scala
t.toSink("csv").option("path", "/foo").option("fieldDelim", "|")
``` 
in the Table API but also

``` sql
insert overwrite into parquet_table_a select * from table_b
```
in SQL.
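
A rough sketch of how a single registry could back both entry points (all 
names here are hypothetical, not part of this PR):

``` scala
// Hypothetical sketch: one registry of sink factories, keyed by format name,
// that both Table.toSink("csv") and the SQL INSERT path could resolve against.
object TableSinkRegistry {

  private val factories =
    scala.collection.mutable.Map.empty[String, Map[String, String] => TableSink[_]]

  def register(format: String,
               factory: Map[String, String] => TableSink[_]): Unit =
    factories(format) = factory

  def create(format: String, options: Map[String, String]): TableSink[_] = {
    val factory = factories.getOrElse(format,
      throw new IllegalArgumentException(s"Unknown sink format: $format"))
    factory(options)
  }
}
```

With such a registry, `t.toSink("csv")` and an `INSERT` against a registered 
table would both end up in `TableSinkRegistry.create`, so the two APIs would 
stay consistent.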




[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-03 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216588444
  
Thanks for the feedback @yjshen. 

The motivation of the `TableSink` interface is to support very different 
storage systems (JDBC, Cassandra, Kafka, HBase, ...) and formats (CSV, Parquet, 
Avro, etc.). The idea is to reuse existing OutputFormats (DataSet) and 
SinkFunctions (DataStream) as much as possible. The configuration of the 
`TableSink` with field names and types happens internally and is not 
user-facing. 
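
A minimal sketch of what such an interface could look like (method names and 
signatures here are illustrative, not necessarily the exact ones in this PR):

``` scala
import org.apache.flink.api.common.typeinfo.TypeInformation

// Sketch only: a TableSink declares the type it emits; the Table API
// configures it internally with the schema of the table being written.
trait TableSink[T] {

  /** The type of the records handed to the underlying
    * OutputFormat (DataSet) or SinkFunction (DataStream). */
  def getOutputType: TypeInformation[T]

  /** Called internally by the Table API, not by users: returns a copy of
    * this sink configured with the field names and types of the table. */
  def configure(
      fieldNames: Array[String],
      fieldTypes: Array[TypeInformation[_]]): TableSink[T]
}
```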

While the goal is to support many different systems, we do not want to blow 
up the dependencies of the flink-table module. With the current design we 
can add TableSinks to the respective modules in `flink-batch-connectors` and 
`flink-streaming-connectors` and don't have to add all external dependencies to 
the Table API. Also, we want to give users the option to define their own table 
sinks.

I am not sure about configuring the output type and parameters with untyped 
Strings. IMO, this makes it hard to identify and look up relevant parameters 
and options. 
But maybe we can add a registration of TableSinks to the TableEnvironment 
and do something like:

``` scala
tEnv.registerSinkType("csv", classOf[CsvTableSink])

val t: Table = ...
t.toSink("csv").option("path", "/foo").option("fieldDelim", "|")
```
We would need to find a way to pass the options to the TableSink 
constructor, maybe via reflection... 
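
One possible shape of that reflection step, as a sketch (the generic 
`setOption` method is hypothetical, not an existing API):

``` scala
// Hypothetical sketch: instantiate the registered TableSink class via its
// no-arg constructor and pass options through a generic setter, rather than
// trying to match constructor signatures by reflection.
def createSink(clazz: Class[_ <: TableSink[_]],
               options: Map[String, String]): TableSink[_] = {
  val sink = clazz.newInstance()
  val setOption = clazz.getMethod("setOption", classOf[String], classOf[String])
  options.foreach { case (key, value) => setOption.invoke(sink, key, value) }
  sink
}
```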




[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-03 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/flink/pull/1961#issuecomment-216575391
  
Hi @fhueske , I've read through this PR and find the current API design a 
little weird.

Please correct me if I've got something wrong: since we are outputting 
`Table`s, the schema is known at runtime. Why should we first create a 
type-agnostic `TableSink` and then configure it with specific field names 
and types? What about
``` scala
val t: Table = ...
t.write().format("csv").option("delim", "|").option("path", "/path/to/file")
env.execute()
```
and construct the `TableSink` when we are about to `execute()`? :)
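
A sketch of how such a deferred builder could look, assuming a 
`CsvTableSink(path, fieldDelim)` constructor (all other names hypothetical):

``` scala
// Hypothetical sketch: the writer only records format and options; the
// concrete TableSink is constructed lazily, once execute() runs and the
// final schema of the table is available.
class TableWriter(table: Table) {
  private var fmt: Option[String] = None
  private val opts = scala.collection.mutable.Map.empty[String, String]

  def format(f: String): TableWriter = { fmt = Some(f); this }
  def option(key: String, value: String): TableWriter = { opts(key) = value; this }

  /** Invoked during execute(): build the sink from the recorded settings. */
  private[flink] def buildSink(): TableSink[_] = fmt match {
    case Some("csv") =>
      new CsvTableSink(opts("path"), opts.getOrElse("delim", ","))
    case other =>
      throw new IllegalArgumentException(s"Unsupported format: $other")
  }
}
```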




[GitHub] flink pull request: [FLINK-1996] [tableApi] Add TableSink interfac...

2016-05-03 Thread fhueske
GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/1961

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external 
storage.

Thanks for contributing to Apache Flink. Before you open your pull request, 
please take the following checklist into consideration.
If your changes take all of the items into account, feel free to open your 
pull request. For more information and/or questions please refer to the [How To 
Contribute guide](http://flink.apache.org/how-to-contribute.html).
In addition to going through the list, please provide a meaningful 
description of your changes.

- [X] General
  - The pull request references the related JIRA issue ("[FLINK-XXX] Jira 
title text")
  - The pull request addresses only one issue
  - Each commit in the PR has a meaningful commit message (including the 
JIRA id)

- [X] Documentation
  - Documentation has been added for new functionality
  - Old documentation affected by the pull request has been updated
  - JavaDoc for public methods has been added

- [X] Tests & Build
  - Functionality added by the pull request is covered by tests
  - `mvn clean verify` has been executed successfully locally or a Travis 
build has passed



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink tableSink

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1961


commit ffae8feddf67c3988a2422b227a1a22190b0e69e
Author: Fabian Hueske 
Date:   2016-04-30T19:11:40Z

[FLINK-1996] [tableApi] Add TableSink interface to emit tables to external 
storage.



