[jira] [Created] (FLINK-17392) enable configuring minicluster in Flink SQL in IDE

2020-04-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-17392:


 Summary: enable configuring minicluster in Flink SQL in IDE
 Key: FLINK-17392
 URL: https://issues.apache.org/jira/browse/FLINK-17392
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Affects Versions: 1.11.0
Reporter: Bowen Li
Assignee: Kurt Young
 Fix For: 1.11.0


It's a very common case that users who want to learn and test Flink SQL will try 
to run a SQL job in an IDE like IntelliJ, with the Flink minicluster. Currently it's 
fine to do so with a simple job requiring only one task slot, which is the 
default resource config of the minicluster.

However, users cannot run even a slightly more complicated job, since they 
cannot configure the minicluster's task slots through Flink SQL, e.g. a single 
parallelism job that requires a shuffle. This limitation has been very frustrating 
to new users.

There are two solutions to this problem:
- in the minicluster, if it is a single-parallelism job, chain all operators 
together
- enable configuring the minicluster in Flink SQL in the IDE.

The latter feels more appropriate.

Expected: users can configure minicluster resources via either SQL ("set 
...=...") or TableEnvironment ("tEnv.setMiniclusterResources(..., ...)"). 

[~jark] [~lzljs3620320]






[jira] [Created] (FLINK-17333) add doc for "create ddl"

2020-04-22 Thread Bowen Li (Jira)
Bowen Li created FLINK-17333:


 Summary: add doc for "create ddl"
 Key: FLINK-17333
 URL: https://issues.apache.org/jira/browse/FLINK-17333
 Project: Flink
  Issue Type: Improvement
  Components: Documentation, Table SQL / API
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-17175) StringUtils.arrayToString() should consider Object[] lastly

2020-04-15 Thread Bowen Li (Jira)
Bowen Li created FLINK-17175:


 Summary: StringUtils.arrayToString() should consider Object[] 
lastly
 Key: FLINK-17175
 URL: https://issues.apache.org/jira/browse/FLINK-17175
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.11.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0
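
A minimal, hypothetical sketch of the ordering concern in the summary (illustrative only, 
not Flink's actual implementation): Object[] matches every reference-type array, so checking 
it before more specific branches such as String[] shadows those branches.

{code:java}
import java.util.Arrays;

public class ArrayToStringOrderingSketch {
    static String arrayToString(Object array) {
        if (array instanceof int[]) {
            return Arrays.toString((int[]) array);
        }
        if (array instanceof String[]) {
            return Arrays.toString((String[]) array);
        }
        // Object[] matches every reference-type array (String[], byte[][], Timestamp[], ...),
        // so it must be checked last, otherwise the branches above are never reached
        if (array instanceof Object[]) {
            return Arrays.deepToString((Object[]) array);
        }
        throw new IllegalArgumentException("not an array");
    }
}
{code}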








[jira] [Created] (FLINK-17037) add e2e tests for reading array data types from postgres with JDBCTableSource and PostgresCatalog

2020-04-07 Thread Bowen Li (Jira)
Bowen Li created FLINK-17037:


 Summary: add e2e tests for reading array data types from postgres 
with JDBCTableSource and PostgresCatalog
 Key: FLINK-17037
 URL: https://issues.apache.org/jira/browse/FLINK-17037
 Project: Flink
  Issue Type: Sub-task
Reporter: Bowen Li








[jira] [Created] (FLINK-16820) support reading array of timestamp, date, and time in JDBCTableSource

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16820:


 Summary: support reading array of timestamp, date, and time in 
JDBCTableSource
 Key: FLINK-16820
 URL: https://issues.apache.org/jira/browse/FLINK-16820
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16817) StringUtils.arrayToString() doesn't convert byte[][] correctly

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16817:


 Summary: StringUtils.arrayToString() doesn't convert byte[][] 
correctly
 Key: FLINK-16817
 URL: https://issues.apache.org/jira/browse/FLINK-16817
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16816) planner doesn't parse timestamp array correctly

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16816:


 Summary: planner doesn't parse timestamp array correctly
 Key: FLINK-16816
 URL: https://issues.apache.org/jira/browse/FLINK-16816
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner, Table SQL / Runtime
Reporter: Bowen Li
Assignee: Kurt Young
 Fix For: 1.11.0


The planner doesn't parse timestamp arrays correctly.

Repro:

In the {{nextRecord(Row)}} API of an input format (like JDBCInputFormat):
 # when setting a timestamp datum as java.sql.Timestamp, it works fine
 # when setting an array of timestamp datums as java.sql.Timestamp[], it breaks, 
and below is the stack trace

 
{code:java}
Caused by: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to 
java.time.LocalDateTime
at 
org.apache.flink.table.dataformat.DataFormatConverters$LocalDateTimeConverter.toInternalImpl(DataFormatConverters.java:748)
at 
org.apache.flink.table.dataformat.DataFormatConverters$ObjectArrayConverter.toBinaryArray(DataFormatConverters.java:1110)
at 
org.apache.flink.table.dataformat.DataFormatConverters$ObjectArrayConverter.toInternalImpl(DataFormatConverters.java:1093)
at 
org.apache.flink.table.dataformat.DataFormatConverters$ObjectArrayConverter.toInternalImpl(DataFormatConverters.java:1068)
at 
org.apache.flink.table.dataformat.DataFormatConverters$DataFormatConverter.toInternal(DataFormatConverters.java:344)
at 
org.apache.flink.table.dataformat.DataFormatConverters$RowConverter.toInternalImpl(DataFormatConverters.java:1377)
at 
org.apache.flink.table.dataformat.DataFormatConverters$RowConverter.toInternalImpl(DataFormatConverters.java:1365)
at 
org.apache.flink.table.dataformat.DataFormatConverters$DataFormatConverter.toInternal(DataFormatConverters.java:344)
at SourceConversion$1.processElement(Unknown Source)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:714)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:689)
at 
org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:669)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:52)
at 
org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:30)
at 
org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
at 
org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:93)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:208)
{code}

It seems that the planner runtime handles java.sql.Timestamp differently in 
these two cases.
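
A minimal sketch of the repro described above (field layout and values are made up; this 
is not the actual connector code):

{code:java}
import java.sql.Timestamp;
import org.apache.flink.types.Row;

public class TimestampArrayRepro {
    // the kind of row an input format like JDBCInputFormat would emit from nextRecord(Row)
    public static Row buildRow() {
        Row row = new Row(2);
        // 1) a single java.sql.Timestamp field works fine
        row.setField(0, Timestamp.valueOf("2020-03-26 10:00:00"));
        // 2) a java.sql.Timestamp[] field triggers the ClassCastException above,
        //    since the array converter expects java.time.LocalDateTime elements internally
        row.setField(1, new Timestamp[] {Timestamp.valueOf("2020-03-26 10:00:00")});
        return row;
    }
}
{code}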





[jira] [Created] (FLINK-16815) add e2e tests for reading from postgres with JDBCTableSource and PostgresCatalog

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16815:


 Summary: add e2e tests for reading from postgres with 
JDBCTableSource and PostgresCatalog
 Key: FLINK-16815
 URL: https://issues.apache.org/jira/browse/FLINK-16815
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16814) StringUtils.arrayToString() doesn't convert byte[] correctly

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16814:


 Summary: StringUtils.arrayToString() doesn't convert byte[] 
correctly
 Key: FLINK-16814
 URL: https://issues.apache.org/jira/browse/FLINK-16814
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.10.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0


StringUtils.arrayToString() doesn't convert byte[] correctly. It uses 
Arrays.toString(), but it should instead build a new String from the byte[].
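
A hedged sketch of the difference (assuming UTF-8 encoded bytes; the charset choice is an 
assumption, not part of this ticket):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayToStringSketch {
    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        // current behavior described above: prints "[104, 101, 108, 108, 111]"
        System.out.println(Arrays.toString(bytes));
        // intended behavior: build a String from the bytes, prints "hello"
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
{code}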





[jira] [Created] (FLINK-16813) JDBCInputFormat doesn't correctly map Short

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16813:


 Summary:  JDBCInputFormat doesn't correctly map Short
 Key: FLINK-16813
 URL: https://issues.apache.org/jira/browse/FLINK-16813
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Affects Versions: 1.10.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0


Currently, when JDBCInputFormat converts a JDBC result set row to a Flink Row, it 
doesn't check the type returned by the JDBC result set. The problem is that the object 
from the JDBC result set doesn't always match the corresponding type in the relational 
db, e.g. a short column in Postgres actually returns an Integer via JDBC.
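
A hedged sketch of the defensive conversion this implies (method and parameter names are 
made up; the real fix would live in the connector's row conversion):

{code:java}
import java.sql.ResultSet;
import java.sql.SQLException;

public class ShortMappingSketch {
    static Short readSmallint(ResultSet rs, int pos) throws SQLException {
        Object value = rs.getObject(pos);
        if (value == null) {
            return null;
        }
        // Postgres hands back an Integer for a SMALLINT column via JDBC,
        // so a blind cast to Short would fail at runtime
        return ((Number) value).shortValue();
    }
}
{code}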

 





[jira] [Created] (FLINK-16812) introduce PostgresRowConverter

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16812:


 Summary: introduce PostgresRowConverter
 Key: FLINK-16812
 URL: https://issues.apache.org/jira/browse/FLINK-16812
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0


per https://issues.apache.org/jira/browse/FLINK-16811





[jira] [Created] (FLINK-16811) introduce JDBCRowConverter

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16811:


 Summary: introduce JDBCRowConverter
 Key: FLINK-16811
 URL: https://issues.apache.org/jira/browse/FLINK-16811
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0


Currently, when JDBCInputFormat converts a JDBC result set row to a Flink Row, it 
doesn't check the type returned by the JDBC result set. The problem is that the object 
from the JDBC result set doesn't always match the corresponding type in the relational 
db, e.g. a short column in Postgres actually returns an Integer via JDBC, and 
such mismatches can be db-dependent.

Thus, we introduce a JDBCRowConverter interface to convert a db-specific row from 
JDBC into a Flink Row. Each db should implement its own row converter.
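
A hypothetical sketch of what such an interface could look like (the shape and names below 
are illustrative, not the final API):

{code:java}
import java.io.Serializable;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.flink.types.Row;

// each database dialect supplies its own implementation that bridges
// driver-specific JDBC types to the Flink types declared in the table schema
public interface JDBCRowConverter extends Serializable {
    Row toFlinkRow(ResultSet resultSet, Row reuse) throws SQLException;
}
{code}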

 

 





[jira] [Created] (FLINK-16810) add back PostgresCatalogITCase

2020-03-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-16810:


 Summary: add back PostgresCatalogITCase
 Key: FLINK-16810
 URL: https://issues.apache.org/jira/browse/FLINK-16810
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16781) add built-in cache mechanism for LookupableTableSource in lookup join

2020-03-25 Thread Bowen Li (Jira)
Bowen Li created FLINK-16781:


 Summary: add built-in cache mechanism for LookupableTableSource in 
lookup join
 Key: FLINK-16781
 URL: https://issues.apache.org/jira/browse/FLINK-16781
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Bowen Li


Currently there's no built-in cache mechanism for LookupableTableSource. 
Developers have to either build their own cache or give up caching and put up 
with poor lookup performance.

Flink should provide a generic caching layer for all LookupableTableSources 
to take advantage of.
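
A hedged sketch of the kind of cache developers currently hand-roll inside a lookup 
function (illustrative only; a built-in cache would also need TTL, metrics and configuration):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// simple size-bounded LRU cache keyed by the lookup key
public class LookupCacheSketch<K, V> {
    private final Map<K, V> cache;

    public LookupCacheSketch(final int maxRows) {
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxRows;
            }
        };
    }

    public V get(K key) {
        return cache.get(key);
    }

    public void put(K key, V value) {
        cache.put(key, value);
    }
}
{code}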

cc [~ykt836] [~jark] [~lzljs3620320]





[jira] [Created] (FLINK-16780) improve Flink lookup join

2020-03-25 Thread Bowen Li (Jira)
Bowen Li created FLINK-16780:


 Summary: improve Flink lookup join 
 Key: FLINK-16780
 URL: https://issues.apache.org/jira/browse/FLINK-16780
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API, Table SQL / Planner
Reporter: Bowen Li


this is an umbrella ticket to group all the improvements related to lookup join 
in Flink





Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-24 Thread Bowen Li
.
> > > >
> > > > DDL:
> > > > CREATE TABLE blackhole_table (
> > > > ...
> > > > ) WITH (
> > > > 'connector.type' = 'blackhole'
> > > > )
> > > >
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Mon, Mar 23, 2020 at 12:04 PM Dian Fu 
> > wrote:
> > > >
> > > > > Thanks Jingsong for bringing up this discussion. +1 to this
> > proposal. I
> > > > > think Bowen's proposal makes much sense to me.
> > > > >
> > > > > This is also a painful problem for PyFlink users. Currently there
> is
> > no
> > > > > built-in easy-to-use table source/sink and it requires users to
> > write a
> > > > lot
> > > > > of code to trying out PyFlink. This is especially painful for new
> > users
> > > > who
> > > > > are not familiar with PyFlink/Flink. I have also encountered the
> > > tedious
> > > > > process Bowen encountered, e.g. writing random source connector,
> > print
> > > > sink
> > > > > and also blackhole print sink as there are no built-in ones to use.
> > > > >
> > > > > Regards,
> > > > > Dian
> > > > >
> > > > > > 在 2020年3月22日,上午11:24,Jark Wu  写道:
> > > > > >
> > > > > > +1 to Bowen's proposal. I also saw many requirements on such
> > built-in
> > > > > > connectors.
> > > > > >
> > > > > > I will leave some my thoughts here:
> > > > > >
> > > > > >> 1. datagen source (random source)
> > > > > > I think we can merge the functionality of sequence-source into
> > random
> > > > > source
> > > > > > to allow users to custom their data values.
> > > > > > Flink can generate random data according to the field types,
> users
> > > > > > can customize their values to be more domain specific, e.g.
> > > > > > 'field.user'='User_[1-9]{0,1}'
> > > > > > This will be similar to kafka-datagen-connect[1].
> > > > > >
> > > > > >> 2. console sink (print sink)
> > > > > > This will be very useful in production debugging, to easily
> output
> > an
> > > > > > intermediate view or result view to a `.out` file.
> > > > > > So that we can look into the data representation, or check dirty
> > > data.
> > > > > > This should be out-of-box without manually DDL registration.
> > > > > >
> > > > > >> 3. blackhole sink (no output sink)
> > > > > > This is very useful for high performance testing of Flink, to
> > > measure
> > > > > the
> > > > > > throughput of the whole pipeline without sink.
> > > > > > Presto also provides this as a built-in connector [2].
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification
> > > > > > [2]: https://prestodb.io/docs/current/connector/blackhole.html
> > > > > >
> > > > > >
> > > > > > On Sat, 21 Mar 2020 at 12:31, Bowen Li 
> > wrote:
> > > > > >
> > > > > >> +1.
> > > > > >>
> > > > > >> I would suggest to take a step even further and see what users
> > > really
> > > > > need
> > > > > >> to test/try/play with table API and Flink SQL. Besides this one,
> > > > here're
> > > > > >> some more sources and sinks that I have developed or used
> > previously
> > > > to
> > > > > >> facilitate building Flink table/SQL pipelines.
> > > > > >>
> > > > > >>
> > > > > >>   1. random input data source
> > > > > >>  - should generate random data at a specified rate according
> > to
> > > > > schema
> > > > > >>  - purposes
> > > > > >> - test Flink pipeline and data can 

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-20 Thread Bowen Li
+1.

I would suggest taking a step even further and seeing what users really need
to test/try/play with the Table API and Flink SQL. Besides this one, here are
some more sources and sinks that I have developed or used previously to
facilitate building Flink Table/SQL pipelines.


   1. random input data source
  - should generate random data at a specified rate according to schema
  - purposes
 - test Flink pipeline and data can end up in external storage
 correctly
 - stress test Flink sink as well as tuning up external storage
  2. print data sink
  - should print data in row format in console
  - purposes
 - make it easier to test Flink SQL job e2e in IDE
 - test Flink pipeline and ensure output data format/value is
 correct
  3. no output data sink
  - just swallow output data without doing anything
  - purpose
 - evaluate and tune performance of Flink source and the whole
 pipeline. Users don't need to worry about sink back pressure

These may be taken into consideration all together as an effort to lower
the barrier to running Flink SQL and the Table API, and to facilitate users'
daily work.
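
As a rough, hedged sketch, below is the kind of throwaway source users currently write by
hand just to try out a Table/SQL pipeline; a built-in random/datagen source would replace
exactly this boilerplate (field layout and rate are made up):

import java.util.Random;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.types.Row;

public class RandomRowSource implements SourceFunction<Row> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Row> ctx) throws Exception {
        Random random = new Random();
        while (running) {
            Row row = new Row(2);
            row.setField(0, random.nextInt(100));          // e.g. a user id
            row.setField(1, "User_" + random.nextInt(10)); // e.g. a user name
            ctx.collect(row);
            Thread.sleep(100); // crude fixed-rate throttling
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}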

Cheers,
Bowen


On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li  wrote:

> Hi all,
>
> I heard some users complain that table is difficult to test. Now with SQL
> client, users are more and more inclined to use it to test rather than
> program.
> The most common example is Kafka source. If users need to test their SQL
> output and checkpoint, they need to:
>
> - 1.Launch a Kafka standalone, create a Kafka topic .
> - 2.Write a program, mock input records, and produce records to Kafka
> topic.
> - 3.Then test in Flink.
>
> The step 1 and 2 are annoying, although this test is E2E.
>
> Then I found StatefulSequenceSource, it is very good because it has dealt
> with checkpoint things, so it fits the checkpoint mechanism well. Usually,
> users have checkpointing turned on in production.
>
> With computed columns, users can easily create a sequence source DDL similar
> to a Kafka DDL. Then they can test inside Flink, without needing to launch other
> things.
>
> Have you considered this? What do you think?
>
> CC: @Aljoscha Krettek  the author
> of StatefulSequenceSource.
>
> Best,
> Jingsong Lee
>


[jira] [Created] (FLINK-16702) develop JDBCCatalogFactory for service discovery

2020-03-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-16702:


 Summary: develop JDBCCatalogFactory for service discovery
 Key: FLINK-16702
 URL: https://issues.apache.org/jira/browse/FLINK-16702
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








Re: FLIP-117: HBase catalog

2020-03-16 Thread Bowen Li
Hi,

I think the core of the JIRA right now is to investigate whether catalogs of
schemaless systems like HBase and Elasticsearch bring practical value to
users. I haven't used these SQL connectors before, and thus don't have much
to say in this case. Can anyone describe how it would work? Maybe @Yu
or @Zheng can chime in.

W.r.t. unsupported operation exceptions, they should be thrown in targeted
getters (e.g. getView(), getFunction()). General listing APIs like
listView(), listFunction() should not throw them but just return empty
results, for the sake of not breaking the user SQL experience. To dedup code,
such common implementations can be moved to AbstractCatalog to make the APIs
look cleaner. I recall that there was an intention to refactor the catalog API
signatures, but I haven't kept up with it.
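
To illustrate the convention I mean, here is a minimal, hypothetical sketch (class and
method shapes are simplified and are not the actual Catalog interface):

import java.util.Collections;
import java.util.List;

public class SchemalessCatalogSketch {
    // targeted getters surface the limitation explicitly
    public Object getView(String databaseName, String viewName) {
        throw new UnsupportedOperationException("views are not supported by this catalog");
    }

    // listing APIs stay silent so generic SQL tooling keeps working
    public List<String> listViews(String databaseName) {
        return Collections.emptyList();
    }
}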

Bowen

On Sun, Mar 15, 2020 at 10:19 PM Jingsong Li  wrote:

> Thanks Flavio for driving. Personally I am +1 for integrating HBase tables.
>
> I start a new topic for discussion. It is related but not the core of this
> FLIP.
> In the FLIP, I can see:
> - Does HBase support the concept of partitions..? I don't think so..
> - Does HBase support functions? I don't think so..
> - Does HBase support statistics? I don't think so..
> - Does HBase support views? I don't think so..
>
> And in JDBC catalog [1]. There are lots of UnsupportedOperationExceptions
> too.
> And maybe for confluent catalog, UnsupportedOperationExceptions come again.
> Lots of UnsupportedOperationExceptions looks unhappy to this catalog api...
> So can we do some refactor to catalog api? I can see a lot of catalogs
> just need provide table information without partitions, functions,
> statistics, views...
>
> CC: @Dawid Wysakowicz  @Bowen Li
> 
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog
>
> Best,
> Jingsong Lee
>
> On Sat, Mar 14, 2020 at 7:36 AM Flavio Pompermaier 
> wrote:
>
>> Hello everybody,
>> I started a new FLIP to discuss about an HBaseCatalog implementation[1]
>> after the opening of the relative issue by Bowen [2].
>> I drafted a very simple version of the FLIP just to discuss about the
>> critical points (in red) in order to decide how to proceed.
>>
>> Best,
>> Flavio
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-117%3A+HBase+catalog
>> [2] https://issues.apache.org/jira/browse/FLINK-16575
>>
>
>
> --
> Best, Jingsong Lee
>


[jira] [Created] (FLINK-16575) develop HBaseCatalog to integrate HBase metadata into Flink

2020-03-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-16575:


 Summary: develop HBaseCatalog to integrate HBase metadata into 
Flink
 Key: FLINK-16575
 URL: https://issues.apache.org/jira/browse/FLINK-16575
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / HBase
Reporter: Bowen Li


develop HBaseCatalog to integrate HBase metadata into Flink





Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-03-11 Thread Bowen Li
iant by using the WITH
> >> clause. Nevertheless, we aim for not diverging too much and the LIKE
> >> clause is an example of that. It will solve things like overwriting
> >> WATERMARKs, add additional/modifying properties and inherit schema.
> >>
> >> Bowen is right that Flink's DDL is mixing 3 types definition together.
> >> We are not the first ones that try to solve this. There is also the SQL
> >> MED standard [1] that tried to tackle this problem. I think it was not
> >> considered when designing the current DDL.
> >>
> >> Currently, I see 3 options for handling Kafka offsets. I will give some
> >> examples and look forward to feedback here:
> >>
> >> *Option 1* Runtime and semantic params as part of the query
> >>
> >> `SELECT * FROM MyTable('offset'=123)`
> >>
> >> Pros:
> >> - Easy to add
> >> - Parameters are part of the main query
> >> - No complicated hinting syntax
> >>
> >> Cons:
> >> - Not SQL compliant
> >>
> >> *Option 2* Use metadata in query
> >>
> >> `CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`
> >>
> >> `SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`
> >>
> >> Pros:
> >> - SQL compliant in the query
> >> - Access of metadata in the DDL which is required anyway
> >> - Regular pushdown rules apply
> >>
> >> Cons:
> >> - Users need to add an additional column in the DDL
> >>
> >> *Option 3*: Use hints for properties
> >>
> >> `
> >> SELECT *
> >> FROM MyTable /*+ PROPERTIES('offset'=123) */
> >> `
> >>
> >> Pros:
> >> - Easy to add
> >>
> >> Cons:
> >> - Parameters are not part of the main query
> >> - Cryptic syntax for new users
> >> - Not standard compliant.
> >>
> >> If we go with this option, I would suggest to make it available in a
> >> separate map and don't mix it with statically defined properties. Such
> >> that the factory can decide which properties have the right to be
> >> overwritten by the hints:
> >> TableSourceFactory.Context.getQueryHints(): ReadableConfig
> >>
> >> Regards,
> >> Timo
> >>
> >> [1] https://en.wikipedia.org/wiki/SQL/MED
> >>
> >> Currently I see 3 options as a
> >>
> >>
> >> On 11.03.20 07:21, Danny Chan wrote:
> >>> Thanks Bowen ~
> >>>
> >>> I agree we should somehow categorize our connector parameters.
> >>>
> >>> For type1, I’m already preparing a solution like the Confluent schema
> registry + Avro schema inference thing, so this may not be a problem in the
> near future.
> >>>
> >>> For type3, I have some questions:
> >>>
> >>>> "SELECT * FROM mykafka WHERE offset > 12pm yesterday”
> >>>
> >>> Where does the offset column come from, a virtual column from the
> table schema, you said that
> >>>
> >>>> They change
> >>> almost every time a query starts and have nothing to do with metadata,
> thus
> >>> should not be part of table definition/DDL
> >>>
> >>> But why you can reference it in the query, I’m confused for that, can
> you elaborate a little ?
> >>>
> >>> Best,
> >>> Danny Chan
> >>> 在 2020年3月11日 +0800 PM12:52,Bowen Li ,写道:
> >>>> Thanks Danny for kicking off the effort
> >>>>
> >>>> The root cause of too much manual work is Flink DDL has mixed 3 types
> of
> >>>> params together and doesn't handle each of them very well. Below are
> how I
> >>>> categorize them and corresponding solutions in my mind:
> >>>>
> >>>> - type 1: Metadata of external data, like external endpoint/url,
> >>>> username/pwd, schemas, formats.
> >>>>
> >>>> Such metadata are mostly already accessible in external system as
> long as
> >>>> endpoints and credentials are provided. Flink can get it thru
> catalogs, but
> >>>> we haven't had many catalogs yet and thus Flink just hasn't been able
> to
> >>>> leverage that. So the solution should be building more catalogs. Such
> >>>> params should be part of a Flink table DDL/definition, and not
> overridable
> >>>> in any means.
> >>>>
> >>>>
>

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-03-10 Thread Bowen Li
Thanks Danny for kicking off the effort

The root cause of too much manual work is that Flink DDL has mixed 3 types of
params together and doesn't handle each of them very well. Below is how I
categorize them, and the corresponding solutions in my mind:

- type 1: Metadata of external data, like external endpoint/url,
username/pwd, schemas, formats.

Such metadata is mostly already accessible in the external system as long as
endpoints and credentials are provided. Flink can get it through catalogs, but
we haven't had many catalogs yet and thus Flink just hasn't been able to
leverage that. So the solution should be building more catalogs. Such
params should be part of a Flink table DDL/definition, and not overridable
by any means.


- type 2: Runtime params, like jdbc connector's fetch size, elasticsearch
connector's bulk flush size.

Such params don't affect query results, but affect how results are produced
(e.g. fast or slow, aka performance) - they are essentially execution and
implementation details. They change often in exploration or development
stages, but not very frequently in well-defined long-running pipelines.
They should always have default values and can be missing in a query. They
can be part of a table DDL/definition, but should also be replaceable in a
query - *this is what table "hints" in FLIP-113 should cover*.


- type 3: Semantic params, like kafka connector's start offset.

Such params affect query results - the semantics. They'd better be
filter conditions in the WHERE clause that can be pushed down. They change
almost every time a query starts and have nothing to do with metadata, thus
they should not be part of the table definition/DDL, nor be persisted in catalogs.
If they must be, users should create views to keep such params around (note
this is different from variable substitution).


Take Flink-Kafka as an example. Once we get these params right, here're the
steps users need to do to develop and run a Flink job:
- configure a Flink ConfluentSchemaRegistry with url, username, and password
- run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified
timestamp) in SQL CLI, Flink automatically retrieves all metadata of
schema, file format, etc and start the job
- users want to make the job read Kafka topic faster, so it goes as "SELECT
* FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
- done and satisfied, users submit it to production


Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2), I think it's a
nice-to-have feature, but not a strategically critical, long-term solution,
because
1) It may seem promising at the current stage to solve the
too-much-manual-work problem, but that's only because Flink hasn't
leveraged catalogs well and handled the 3 types of params above properly.
Once we get the params types right, the LIKE syntax won't be that
important, and will be just an easier way to create tables without retyping
long fields like username and pwd.
2) Note that only some rare type of catalog can store k-v property pair, so
table created this way often cannot be persisted. In the foreseeable
future, such catalog will only be HiveCatalog, and not everyone has a Hive
metastore. To be honest, without persistence, recreating tables every time
this way is still a lot of keyboard typing.

Cheers,
Bowen

On Tue, Mar 10, 2020 at 8:07 PM Kurt Young  wrote:

> If a specific connector wants to have such a parameter and read it out of
> configuration, then that's fine.
> If we are talking about a configuration for all kinds of sources, I would
> be super careful about that.
> It's true it can solve maybe 80% cases, but it will also make the left 20%
> feels weird.
>
> Best,
> Kurt
>
>
> On Wed, Mar 11, 2020 at 11:00 AM Jark Wu  wrote:
>
> > Hi Kurt,
> >
> > #3 Regarding to global offset:
> > I'm not saying to use the global configuration to override connector
> > properties by the planner.
> > But the connector should take this configuration and translate into their
> > client API.
> > AFAIK, almost all the message queues support earliest and latest and a
> > timestamp value as a start point.
> > So we can support 3 options for this configuration: "earliest", "latest"
> > and a timestamp string value.
> > Of course, this can't solve 100% of cases, but I guess it can solve 80% or 90%
> > of cases.
> > And the remaining cases can be resolved by LIKE syntax which I guess is
> not
> > very common cases.
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 11 Mar 2020 at 10:33, Kurt Young  wrote:
> >
> > > Good to have such lovely discussions. I also want to share some of my
> > > opinions.
> > >
> > > #1 Regarding to error handling: I also think ignore invalid hints would
> > be
> > > dangerous, maybe
> > > the simplest solution is just throw an exception.
> > >
> > > #2 Regarding to property replacement: I don't think we should
> constraint
> > > ourself to
> > > the meaning of the word "hint", and forbidden it modifying any
> properties
> > > which can effect
> > > query results. IMO `PROPERTIES` is one 

[jira] [Created] (FLINK-16498) make Postgres table work end-2-end in Flink SQL with PostgresJDBCCatalog

2020-03-08 Thread Bowen Li (Jira)
Bowen Li created FLINK-16498:


 Summary: make Postgres table work end-2-end in Flink SQL with 
PostgresJDBCCatalog
 Key: FLINK-16498
 URL: https://issues.apache.org/jira/browse/FLINK-16498
 Project: Flink
  Issue Type: Sub-task
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16474) develop OracleJDBCCatalog to connect Flink with Oracle databases and ecosystem

2020-03-06 Thread Bowen Li (Jira)
Bowen Li created FLINK-16474:


 Summary: develop OracleJDBCCatalog to connect Flink with Oracle 
databases and ecosystem
 Key: FLINK-16474
 URL: https://issues.apache.org/jira/browse/FLINK-16474
 Project: Flink
  Issue Type: New Feature
Reporter: Bowen Li








[jira] [Created] (FLINK-16473) add documentation for PostgresJDBCCatalog

2020-03-06 Thread Bowen Li (Jira)
Bowen Li created FLINK-16473:


 Summary: add documentation for PostgresJDBCCatalog
 Key: FLINK-16473
 URL: https://issues.apache.org/jira/browse/FLINK-16473
 Project: Flink
  Issue Type: Sub-task
  Components: Documentation
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16472) support precision of timestamp and time data types

2020-03-06 Thread Bowen Li (Jira)
Bowen Li created FLINK-16472:


 Summary: support precision of timestamp and time data types
 Key: FLINK-16472
 URL: https://issues.apache.org/jira/browse/FLINK-16472
 Project: Flink
  Issue Type: Sub-task
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16471) develop PostgresJDBCCatalog

2020-03-06 Thread Bowen Li (Jira)
Bowen Li created FLINK-16471:


 Summary: develop PostgresJDBCCatalog
 Key: FLINK-16471
 URL: https://issues.apache.org/jira/browse/FLINK-16471
 Project: Flink
  Issue Type: Sub-task
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Bowen Li
Hi Jingsong,

I think I misunderstood you. So your argument is that, to support hive
1.0.0 - 1.2.2, we are actually using Hive 1.2.2 and thus we name the flink
module as "flink-connector-hive-1.2", right? It makes sense to me now.

+1 for this change.

Cheers,
Bowen

On Thu, Mar 5, 2020 at 6:53 PM Jingsong Li  wrote:

> Hi Bowen,
>
> My idea is to directly provide the really dependent version, such as hive
> 1.2.2, our jar name is hive 1.2.2, so that users can directly and clearly
> know the version. As for which metastore is supported, we can guide it in
> the document, otherwise, write 1.0, and the result version is indeed 1.2.2,
> which will make users have wrong expectations.
>
> Another, maybe 2.3.6 can support 2.0-2.2 after some efforts.
>
> Best,
> Jingsong Lee
>
> On Fri, Mar 6, 2020 at 1:00 AM Bowen Li  wrote:
>
> > > I have some hesitation, because the actual version number can better
> > reflect the actual dependency. For example, if the user also knows the
> > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > name, or he may have the wrong expectation for the hive built-in
> functions.
> >
> > Sorry, I'm not sure if my proposal is understood correctly.
> >
> > What I'm saying is, in your original proposal, taking an example,
> suggested
> > naming the module as "flink-connector-hive-1.2" to support hive 1.0.0 -
> > 1.2.2, a name including the highest Hive version it supports. I'm
> > suggesting to name it "flink-connector-hive-1.0", a name including the
> > lowest Hive version it supports.
> >
> > What do you think?
> >
> >
> >
> > On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li 
> > wrote:
> >
> > > Hi Bowen, thanks for your reply.
> > >
> > > > will there be a base module like "flink-connector-hive-base" which
> > holds
> > > all the common logic of these proposed modules
> > >
> > > Maybe we don't need, their implementation is only "pom.xml". Different
> > > versions have different dependencies.
> > >
> > > > it's more common to set the version in module name to be the lowest
> > > version that this module supports
> > >
> > > I have some hesitation, because the actual version number can better
> > > reflect the actual dependency. For example, if the user also knows the
> > > field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> > > name, or he may have the wrong expectation for the hive built-in
> > functions.
> > >
> > > [1] https://github.com/apache/flink/pull/11304
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
> > >
> > > > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> > > >
> > > > According to your description, I think it makes sense to incorporate
> > > > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of
> > ranges
> > > to
> > > > 4.
> > > >
> > > > A couple minor followup questions:
> > > > 1) will there be a base module like "flink-connector-hive-base" which
> > > holds
> > > > all the common logic of these proposed modules and is compiled into
> the
> > > > uber jar of "flink-connector-hive-xxx"?
> > > > 2) according to my observation, it's more common to set the version
> in
> > > > module name to be the lowest version that this module supports, e.g.
> > for
> > > > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > > > rather than "flink-connector-hive-1.2"
> > > >
> > > >
> > > > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > > > wrote:
> > > >
> > > > > Thanks Bowen for involving.
> > > > >
> > > > > > why you proposed segregating hive versions into the 5 ranges
> > above? &
> > > > > what different Hive features are supported in the 5 ranges?
> > > > >
> > > > > For only higher client dependencies version support lower hive
> > > metastore
> > > > > versions:
> > > > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column
> > stats,
> > > > we
> > > > > can throw exception for the unsupported feature.
> > > > > - Hive 2.0 and Hive 2.1, primary key support and alt

[jira] [Created] (FLINK-16448) add documentation for Hive table sink parallelism setting strategy

2020-03-05 Thread Bowen Li (Jira)
Bowen Li created FLINK-16448:


 Summary: add documentation for Hive table sink parallelism setting 
strategy
 Key: FLINK-16448
 URL: https://issues.apache.org/jira/browse/FLINK-16448
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0


Per a user-zh mailing list question, it would be beneficial to add documentation for 
the Hive table sink parallelism setting strategy.





Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-05 Thread Bowen Li
> I have some hesitation, because the actual version number can better
reflect the actual dependency. For example, if the user also knows the
field hiveVersion[1]. He may enter the wrong hiveVersion because of the
name, or he may have the wrong expectation for the hive built-in functions.

Sorry, I'm not sure my proposal was understood correctly.

What I'm saying is: your original proposal, to take an example, suggested
naming the module "flink-connector-hive-1.2" to support hive 1.0.0 -
1.2.2, i.e. a name including the highest Hive version it supports. I'm
suggesting naming it "flink-connector-hive-1.0", a name including the
lowest Hive version it supports.

What do you think?



On Wed, Mar 4, 2020 at 11:14 PM Jingsong Li  wrote:

> Hi Bowen, thanks for your reply.
>
> > will there be a base module like "flink-connector-hive-base" which holds
> all the common logic of these proposed modules
>
> Maybe we don't need, their implementation is only "pom.xml". Different
> versions have different dependencies.
>
> > it's more common to set the version in module name to be the lowest
> version that this module supports
>
> I have some hesitation, because the actual version number can better
> reflect the actual dependency. For example, if the user also knows the
> field hiveVersion[1]. He may enter the wrong hiveVersion because of the
> name, or he may have the wrong expectation for the hive built-in functions.
>
> [1] https://github.com/apache/flink/pull/11304
>
> Best,
> Jingsong Lee
>
> On Thu, Mar 5, 2020 at 2:34 PM Bowen Li  wrote:
>
> > Thanks Jingsong for your explanation! I'm +1 for this initiative.
> >
> > According to your description, I think it makes sense to incorporate
> > support of Hive 2.2 to that of 2.0/2.1 and reducing the number of ranges
> to
> > 4.
> >
> > A couple minor followup questions:
> > 1) will there be a base module like "flink-connector-hive-base" which
> holds
> > all the common logic of these proposed modules and is compiled into the
> > uber jar of "flink-connector-hive-xxx"?
> > 2) according to my observation, it's more common to set the version in
> > module name to be the lowest version that this module supports, e.g. for
> > Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
> > rather than "flink-connector-hive-1.2"
> >
> >
> > On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li 
> > wrote:
> >
> > > Thanks Bowen for involving.
> > >
> > > > why you proposed segregating hive versions into the 5 ranges above? &
> > > what different Hive features are supported in the 5 ranges?
> > >
> > > For only higher client dependencies version support lower hive
> metastore
> > > versions:
> > > - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column stats,
> > we
> > > can throw exception for the unsupported feature.
> > > - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> > > change.
> > > - Hive 2.2 no thrift change.
> > > - Hive 2.3 change many things, lots of thrift change.
> > > - Hive 3+, not null. unique, timestamp, so many things.
> > >
> > > All these things can be found in hive_metastore.thrift.
> > >
> > > I think I can try do more effort in implementation to use Hive 2.2 to
> > > support Hive 2.0. So the range size will be 4.
> > >
> > > > have you tested that whether the proposed corresponding Flink module
> > will
> > > be fully compatible with each Hive version range?
> > >
> > > Yes, I have done some tests, not really for "fully", but it is a
> > technical
> > > judgment.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Thu, Mar 5, 2020 at 1:17 PM Bowen Li  wrote:
> > >
> > > > Thanks, Jingsong, for bringing this up. We've received lots of
> > feedbacks
> > > in
> > > > the past few months that the complexity involved in different Hive
> > > versions
> > > > has been quite painful for users to start with. So it's great to step
> > > > forward and deal with such issue.
> > > >
> > > > Before getting on a decision, can you please explain:
> > > >
> > > > 1) why you proposed segregating hive versions into the 5 ranges
> above?
> > > > 2) what different Hive features are supported in the 5 ranges?
> > > > 3) have you tested that whether the proposed corresponding Flink
> module
> > >

Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-04 Thread Bowen Li
Thanks Jingsong for your explanation! I'm +1 for this initiative.

According to your description, I think it makes sense to incorporate
support of Hive 2.2 into that of 2.0/2.1 and reduce the number of ranges to
4.

A couple of minor follow-up questions:
1) will there be a base module like "flink-connector-hive-base" which holds
all the common logic of these proposed modules and is compiled into the
uber jar of "flink-connector-hive-xxx"?
2) according to my observation, it's more common to set the version in
module name to be the lowest version that this module supports, e.g. for
Hive 1.0.0 - 1.2.2, the module name can be "flink-connector-hive-1.0"
rather than "flink-connector-hive-1.2"


On Wed, Mar 4, 2020 at 10:20 PM Jingsong Li  wrote:

> Thanks Bowen for involving.
>
> > why you proposed segregating hive versions into the 5 ranges above? &
> what different Hive features are supported in the 5 ranges?
>
> For only higher client dependencies version support lower hive metastore
> versions:
> - Hive 1.0.0 - 1.2.2, thrift change is OK, only hive date column stats, we
> can throw exception for the unsupported feature.
> - Hive 2.0 and Hive 2.1, primary key support and alter_partition api
> change.
> - Hive 2.2 no thrift change.
> - Hive 2.3 change many things, lots of thrift change.
> - Hive 3+, not null. unique, timestamp, so many things.
>
> All these things can be found in hive_metastore.thrift.
>
> I think I can try do more effort in implementation to use Hive 2.2 to
> support Hive 2.0. So the range size will be 4.
>
> > have you tested that whether the proposed corresponding Flink module will
> be fully compatible with each Hive version range?
>
> Yes, I have done some tests, not really for "fully", but it is a technical
> judgment.
>
> Best,
> Jingsong Lee
>
> On Thu, Mar 5, 2020 at 1:17 PM Bowen Li  wrote:
>
> > Thanks, Jingsong, for bringing this up. We've received lots of feedbacks
> in
> > the past few months that the complexity involved in different Hive
> versions
> > has been quite painful for users to start with. So it's great to step
> > forward and deal with such issue.
> >
> > Before getting on a decision, can you please explain:
> >
> > 1) why you proposed segregating hive versions into the 5 ranges above?
> > 2) what different Hive features are supported in the 5 ranges?
> > 3) have you tested that whether the proposed corresponding Flink module
> > will be fully compatible with each Hive version range?
> >
> > Thanks,
> > Bowen
> >
> >
> >
> > On Wed, Mar 4, 2020 at 1:00 AM Jingsong Lee 
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to propose introduce flink-connector-hive-xx modules.
> > >
> > > We have documented the dependencies detailed information[2]. But still
> > has
> > > some inconvenient:
> > > - Too many versions, users need to pick one version from 8 versions.
> > > - Too many versions, It's not friendly to our developers either,
> because
> > > there's a problem/exception, we need to look at eight different
> versions
> > of
> > > hive client code, which are often various.
> > > - Too many jars, for example, users need to download 4+ jars for Hive
> 1.x
> > > from various places.
> > >
> > > We have discussed in [1] and [2], but unfortunately, we can not achieve
> > an
> > > agreement.
> > >
> > > For improving this, I'd like to introduce few flink-connector-hive-xx
> > > modules in flink-connectors, module contains all the dependencies
> related
> > > to hive. And only support lower hive metastore versions:
> > > - "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2
> > > - "flink-connector-hive-2.0" to support hive 2.0.0 - 2.0.1
> > > - "flink-connector-hive-2.2" to support hive 2.1.0 - 2.2.0
> > > - "flink-connector-hive-2.3" to support hive 2.3.0 - 2.3.6
> > > - "flink-connector-hive-3.1" to support hive 3.0.0 - 3.1.2
> > >
> > > Users can choose one and download to flink/lib. It includes all hive
> > > things.
> > >
> > > I try to use a single module to deploy multiple versions, but I can not
> > > find a suitable way, because different modules require different
> versions
> > > and different dependencies.
> > >
> > > What do you think?
> > >
> > > [1]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html
> > > [2]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-109-Improve-Hive-dependencies-out-of-box-experience-td38290.html
> > >
> > > Best,
> > > Jingsong Lee
> > >
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] FLIP-93: JDBC catalog and Postgres catalog

2020-03-04 Thread Bowen Li
I'm glad to announce that the vote on FLIP-93 has passed, with 7 +1 votes (3
binding: Jingsong, Kurt, Jark; 4 non-binding: Benchao, zoudan, Terry,
Leonard) and no -1.

Thanks everyone for participating!

Cheers,
Bowen

On Mon, Mar 2, 2020 at 7:33 AM Leonard Xu  wrote:

> +1 (non-binding).
>
>  Very useful feature especially for ETL, It will make  connecting to
> existed DB systems easier.
>
> Best,
> Leonard
>
> > 在 2020年3月2日,21:58,Jark Wu  写道:
> >
> > +1 from my side.
> >
> > Best,
> > Jark
> >
> > On Mon, 2 Mar 2020 at 21:40, Kurt Young  wrote:
> >
> >> +1
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Mon, Mar 2, 2020 at 5:32 PM Jingsong Lee 
> >> wrote:
> >>
> >>> +1 from my side.
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Mon, Mar 2, 2020 at 11:06 AM Terry Wang  wrote:
> >>>
> >>>> +1 (non-binding).
> >>>> With this feature, we can more easily interact traditional database in
> >>>> flink.
> >>>>
> >>>> Best,
> >>>> Terry Wang
> >>>>
> >>>>
> >>>>
> >>>>> 2020年3月1日 18:33,zoudan  写道:
> >>>>>
> >>>>> +1 (non-binding)
> >>>>>
> >>>>> Best,
> >>>>> Dan Zou
> >>>>>
> >>>>>
> >>>>>> 在 2020年2月28日,02:38,Bowen Li  写道:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'd like to kick off the vote for FLIP-93 [1] to add JDBC catalog
> >> and
> >>>>>> Postgres catalog.
> >>>>>>
> >>>>>> The vote will last for at least 72 hours, following the consensus
> >>> voting
> >>>>>> protocol.
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog
> >>>>>>
> >>>>>> Discussion thread:
> >>>>>>
> >>>>
> >>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-92-JDBC-catalog-and-Postgres-catalog-td36505.html
> >>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>>
> >>
>
>


Re: [DISCUSS] Introduce flink-connector-hive-xx modules

2020-03-04 Thread Bowen Li
Thanks, Jingsong, for bringing this up. We've received lots of feedback in
the past few months that the complexity involved in different Hive versions
has been quite painful for users to start with. So it's great to step
forward and deal with this issue.

Before getting to a decision, can you please explain:

1) why you proposed segregating hive versions into the 5 ranges above?
2) what different Hive features are supported in the 5 ranges?
3) have you tested that whether the proposed corresponding Flink module
will be fully compatible with each Hive version range?

Thanks,
Bowen



On Wed, Mar 4, 2020 at 1:00 AM Jingsong Lee  wrote:

> Hi all,
>
> I'd like to propose introduce flink-connector-hive-xx modules.
>
> We have documented the dependencies detailed information[2]. But still has
> some inconvenient:
> - Too many versions, users need to pick one version from 8 versions.
> - Too many versions, It's not friendly to our developers either, because
> there's a problem/exception, we need to look at eight different versions of
> hive client code, which are often various.
> - Too many jars, for example, users need to download 4+ jars for Hive 1.x
> from various places.
>
> We have discussed in [1] and [2], but unfortunately, we can not achieve an
> agreement.
>
> For improving this, I'd like to introduce few flink-connector-hive-xx
> modules in flink-connectors, module contains all the dependencies related
> to hive. And only support lower hive metastore versions:
> - "flink-connector-hive-1.2" to support hive 1.0.0 - 1.2.2
> - "flink-connector-hive-2.0" to support hive 2.0.0 - 2.0.1
> - "flink-connector-hive-2.2" to support hive 2.1.0 - 2.2.0
> - "flink-connector-hive-2.3" to support hive 2.3.0 - 2.3.6
> - "flink-connector-hive-3.1" to support hive 3.0.0 - 3.1.2
>
> Users can choose one and download to flink/lib. It includes all hive
> things.
>
> I try to use a single module to deploy multiple versions, but I can not
> find a suitable way, because different modules require different versions
> and different dependencies.
>
> What do you think?
>
> [1]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-have-separate-Flink-distributions-with-built-in-Hive-dependencies-td35918.html
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-109-Improve-Hive-dependencies-out-of-box-experience-td38290.html
>
> Best,
> Jingsong Lee
>


Re: Creating TemporalTable based on Catalog table in SQL Client

2020-03-04 Thread Bowen Li
You would need to reference the table with a fully qualified name that
includes the catalog and database.
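
Outside of the SQL Client yaml, the same idea in the Table API looks roughly like the
hedged sketch below (catalog, database, table and column names are made up):

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.TemporalTableFunction;

public class TemporalTableOverCatalogTable {
    public static void register(StreamTableEnvironment tEnv) {
        // fully qualified: catalog.database.table
        Table rates = tEnv.sqlQuery("SELECT * FROM myhive.mydb.rates_history");
        TemporalTableFunction ratesFunction =
                rates.createTemporalTableFunction("update_time", "currency");
        tEnv.registerFunction("Rates", ratesFunction);
    }
}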

On Wed, Mar 4, 2020 at 02:17 Gyula Fóra  wrote:

> I guess it will only work now if you specify the catalog name too when
> referencing the table.
>
>
> On Wed, Mar 4, 2020 at 11:15 AM Gyula Fóra  wrote:
>
> > You are right but still if the default catalog is something else and
> > that's the one containing the table then it still wont work currently.
> >
> > Gyula
> >
> > On Wed, Mar 4, 2020 at 5:08 AM Bowen Li  wrote:
> >
> >> Hi Gyula,
> >>
> >> What line 622 (the link you shared) does is not registering catalogs,
> but
> >> setting an already registered catalog as the current one. As you can see
> >> from the method and its comment, catalogs are loaded first before any
> >> tables in yaml are registered, so you should be able to achieve what you
> >> described.
> >>
> >> Bowen
> >>
> >> On Tue, Mar 3, 2020 at 5:16 AM Gyula Fóra  wrote:
> >>
> >> > Hi all!
> >> >
> >> > I was testing the TemporalTable functionality in the SQL client while
> >> using
> >> > the Hive Catalog and I ran into the following problem.
> >> >
> >> > I have a table created in the Hive catalog and I want to create a
> >> temporal
> >> > table over it.
> >> >
> >> > As we cannot create temporal tables in SQL directly I have to define
> it
> >> in
> >> > the environment yaml file. Unfortunately it seems to be impossible to
> >> > reference a table only present in the catalog (not in the yaml) as
> >> catalogs
> >> > are loaded only after creating the temporal table (see
> >> >
> >> >
> >>
> https://github.com/apache/flink/blob/master/flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/ExecutionContext.java#L622
> >> > )
> >> >
> >> > I am wondering if it would make sense to set the catalogs before all
> >> else
> >> > or if that would cause some other problems.
> >> >
> >> > What do you think?
> >> > Gyula
> >> >
> >>
> >
>


Re: Creating TemporalTable based on Catalog table in SQL Client

2020-03-03 Thread Bowen Li
Hi Gyula,

What line 622 (the link you shared) does is not registering catalogs, but
setting an already registered catalog as the current one. As you can see
from the method and its comment, catalogs are loaded before any
tables in the yaml are registered, so you should be able to achieve what you
described.

Bowen

On Tue, Mar 3, 2020 at 5:16 AM Gyula Fóra  wrote:

> Hi all!
>
> I was testing the TemporalTable functionality in the SQL client while using
> the Hive Catalog and I ran into the following problem.
>
> I have a table created in the Hive catalog and I want to create a temporal
> table over it.
>
> As we cannot create temporal tables in SQL directly I have to define it in
> the environment yaml file. Unfortunately it seems to be impossible to
> reference a table only present in the catalog (not in the yaml) as catalogs
> are loaded only after creating the temporal table (see
>
> https://github.com/apache/flink/blob/master/flink-table/flink-sql-client/src/main/java/org/apache/flink/table/client/gateway/local/ExecutionContext.java#L622
> )
>
> I am wondering if it would make sense to set the catalogs before all else
> or if that would cause some other problems.
>
> What do you think?
> Gyula
>


[VOTE] FLIP-93: JDBC catalog and Postgres catalog

2020-02-27 Thread Bowen Li
Hi all,

I'd like to kick off the vote for FLIP-93 [1] to add JDBC catalog and
Postgres catalog.

The vote will last for at least 72 hours, following the consensus voting
protocol.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

Discussion thread:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-92-JDBC-catalog-and-Postgres-catalog-td36505.html


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-21 Thread Bowen Li
Congrats, Jingsong!

On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann  wrote:

> Congratulations Jingsong!
>
> Cheers,
> Till
>
> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao  wrote:
>
>>   Congratulations Jingsong!
>>
>>Best,
>>Yun
>>
>> --
>> From:Jingsong Li 
>> Send Time:2020 Feb. 21 (Fri.) 21:42
>> To:Hequn Cheng 
>> Cc:Yang Wang ; Zhijiang <
>> wangzhijiang...@aliyun.com>; Zhenghua Gao ; godfrey he
>> ; dev ; user <
>> u...@flink.apache.org>
>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>>
>> Thanks everyone~
>>
>> It's my pleasure to be part of the community. I hope I can make a better
>> contribution in future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng  wrote:
>> Congratulations Jingsong! Well deserved.
>>
>> Best,
>> Hequn
>>
>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang  wrote:
>> Congratulations!Jingsong. Well deserved.
>>
>>
>> Best,
>> Yang
>>
>> Zhijiang  于2020年2月21日周五 下午1:18写道:
>> Congrats Jingsong! Welcome on board!
>>
>> Best,
>> Zhijiang
>>
>> --
>> From:Zhenghua Gao 
>> Send Time:2020 Feb. 21 (Fri.) 12:49
>> To:godfrey he 
>> Cc:dev ; user 
>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>>
>> Congrats Jingsong!
>>
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he  wrote:
>> Congrats Jingsong! Well deserved.
>>
>> Best,
>> godfrey
>>
>> Jeff Zhang  于2020年2月21日周五 上午11:49写道:
>> Congratulations!Jingsong. You deserve it
>>
>> wenlong.lwl  于2020年2月21日周五 上午11:43写道:
>> Congrats Jingsong!
>>
>> On Fri, 21 Feb 2020 at 11:41, Dian Fu  wrote:
>>
>> > Congrats Jingsong!
>> >
>> > > 在 2020年2月21日,上午11:39,Jark Wu  写道:
>> > >
>> > > Congratulations Jingsong! Well deserved.
>> > >
>> > > Best,
>> > > Jark
>> > >
>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>> > >
>> > >> Congratulations! Jingsong
>> > >>
>> > >>
>> > >> Best,
>> > >> Dan Zou
>> > >>
>> >
>> >
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>>
>>


Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-02-17 Thread Bowen Li
Hi all,

If there are no more comments, I would like to kick off a vote for this FLIP
[1].

FYI, the FLIP number has changed to 93, since there was a race condition in
taking 92.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

On Wed, Jan 22, 2020 at 11:05 AM Bowen Li  wrote:

> Hi Flavio,
>
> First, this is a generic question on how flink-jdbc is set up, not
> specific to jdbc catalog, thus is better to be on its own thread.
>
> But to just quickly answer your question, you need to see where the
> incompatibility is. There may be incompatibility on 1) jdbc drivers and 2)
> the databases. 1) is fairly stable and back-compatible. 2) normally has
> things to do with your queries, not the driver.
>
>
>
> On Tue, Jan 21, 2020 at 3:21 PM Flavio Pompermaier 
> wrote:
>
>> Hi all,
>> I'm happy to see a lot of interest in easing the integration with JDBC
>> data
>> sources. Maybe this could be a rare situation (not in my experience
>> however..) but what if I have to connect to the same type of source (e.g.
>> Mysql) with 2 incompatible version...? How can I load the 2 (or more)
>> connectors jars without causing conflicts?
>>
>> Il Mar 14 Gen 2020, 23:32 Bowen Li  ha scritto:
>>
>> > Hi devs,
>> >
>> > I've updated the wiki according to feedbacks. Please take another look.
>> >
>> > Thanks!
>> >
>> >
>> > On Fri, Jan 10, 2020 at 2:24 PM Bowen Li  wrote:
>> >
>> > > Thanks everyone for the prompt feedback. Please see my response below.
>> > >
>> > > > In Postgress, the TIME/TIMESTAMP WITH TIME ZONE has the
>> > > java.time.Instant semantic, and should be mapped to Flink's
>> > TIME/TIMESTAMP
>> > > WITH LOCAL TIME ZONE
>> > >
>> > > Zhenghua, you are right that pg's 'timestamp with timezone' should be
>> > > translated into flink's 'timestamp with local timezone'. I don't find
>> > 'time
>> > > with (local) timezone' though, so we may not support that type from
>> pg in
>> > > Flink.
>> > >
>> > > > I suggest that the parameters can be completely consistent with the
>> > > JDBCTableSource / JDBCTableSink. If you take a look to JDBC api:
>> > > "DriverManager.getConnection".
>> > > That allow "default db, username, pwd" things optional. They can
>> included
>> > > in URL. Of course JDBC api also allows establishing connections to
>> > > different databases in a db instance. So I think we don't need
>> provide a
>> > > "base_url", we can just provide a real "url". To be consistent with
>> JDBC
>> > > api.
>> > >
>> > > Jingsong, what I'm saying is a builder can be added on demand later if
>> > > there's enough user requesting it, and doesn't need to be a core part
>> of
>> > > the FLIP.
>> > >
>> > > Besides, unfortunately Postgres doesn't allow changing databases via
>> > JDBC.
>> > >
>> > > JDBC provides different connecting options as you mentioned, but I'd
>> like
>> > > to keep our design and API simple and having to handle extra parsing
>> > logic.
>> > > And it doesn't shut the door for what you proposed as a future effort.
>> > >
>> > > > Since the PostgreSQL does not have catalog but schema under
>> database,
>> > > why not mapping the PG-database to Flink catalog and PG-schema to
>> Flink
>> > > database
>> > >
>> > > Danny, because 1) there are frequent use cases where users want to
>> switch
>> > > databases or referencing objects across databases in a pg instance 2)
>> > > schema is an optional namespace layer in pg, it always has a default
>> > value
>> > > ("public") and can be invisible to users if they'd like to as shown in
>> > the
>> > > FLIP 3) as you mentioned it is specific to postgres, and I don't feel
>> > it's
>> > > necessary to map Postgres substantially different than others DBMSs
>> with
>> > > additional complexity
>> > >
>> > > >'base_url' configuration: We are following the configuration format
>> > > guideline [1] which suggest to use dash (-) instead of underline (_).
>> And
>> > > I'm a little confused the meaning of "base_url" at the first glance,
>> > > another idea is split it into several configurations: 'dr

[jira] [Created] (FLINK-16107) github link on statefun.io should point to https://github.com/apache/flink-statefun

2020-02-16 Thread Bowen Li (Jira)
Bowen Li created FLINK-16107:


 Summary: github link on statefun.io should point to 
https://github.com/apache/flink-statefun
 Key: FLINK-16107
 URL: https://issues.apache.org/jira/browse/FLINK-16107
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Bowen Li
Assignee: Tzu-Li (Gordon) Tai


github link on statefun.io website should point to 
[https://github.com/apache/flink-statefun] rather than 
[https://github.com/ververica/stateful-functions]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16028) hbase connector's 'connector.table-name' property should be optional rather than required

2020-02-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-16028:


 Summary: hbase connector's 'connector.table-name' property should 
be optional rather than required
 Key: FLINK-16028
 URL: https://issues.apache.org/jira/browse/FLINK-16028
 Project: Flink
  Issue Type: Improvement
Reporter: Bowen Li


 

cc [~lzljs3620320] [~jark]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16027) kafka connector's 'connector.topic' property should be optional rather than required

2020-02-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-16027:


 Summary: kafka connector's 'connector.topic' property should be 
optional rather than required
 Key: FLINK-16027
 URL: https://issues.apache.org/jira/browse/FLINK-16027
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Kafka
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16024) support filter pushdown in jdbc connector

2020-02-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-16024:


 Summary: support filter pushdown in jdbc connector
 Key: FLINK-16024
 URL: https://issues.apache.org/jira/browse/FLINK-16024
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16023) jdbc connector's 'connector.table' property should be optional rather than required

2020-02-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-16023:


 Summary: jdbc connector's 'connector.table' property should be 
optional rather than required
 Key: FLINK-16023
 URL: https://issues.apache.org/jira/browse/FLINK-16023
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0


jdbc connector's 'connector.table' property should be optional rather than 
required.

 

The connector should assume the table name in the DBMS is the same as that in Flink 
when this property is not present.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15986) support setting or changing session properties in Flink SQL

2020-02-10 Thread Bowen Li (Jira)
Bowen Li created FLINK-15986:


 Summary: support setting or changing session properties in Flink 
SQL
 Key: FLINK-15986
 URL: https://issues.apache.org/jira/browse/FLINK-15986
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API, Table SQL / Ecosystem
Reporter: Bowen Li
Assignee: Kurt Young
 Fix For: 1.11.0


As Flink SQL becomes more and more critical for users running batch jobs, 
experiments, and OLAP exploration, it's more important than ever to support setting 
and changing session properties in Flink SQL.

 

Use cases include switching SQL dialects at runtime, switching job mode between 
"streaming" and "batch", and changing other params defined in flink-conf.yaml and 
default-sql-client.yaml.
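
For comparison, below is a minimal sketch of what users can already do programmatically from the Table API, which this issue asks to also expose through SQL. The configuration key is only an illustrative assumption, not a recommendation.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SessionConfigSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // change a session/config property programmatically;
        // the key below is an illustrative assumption
        tEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.enabled", "true");
    }
}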



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15985) offload runtime params from DDL to table hints in DML/queries

2020-02-10 Thread Bowen Li (Jira)
Bowen Li created FLINK-15985:


 Summary: offload runtime params from DDL to table hints in 
DML/queries
 Key: FLINK-15985
 URL: https://issues.apache.org/jira/browse/FLINK-15985
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Bowen Li
Assignee: Danny Chen
 Fix For: 1.11.0


background:

Currently Flink DDL mixes three types of params all together: 
 * External data’s metadata: defines what the data looks like (schema), where it is (location/url), how it should be accessed (username/pwd)
 * Source/sink runtime params: defines how and usually how fast Flink source/sink reads/writes data, not affecting the results
 ** Kafka “sink-partitioner”
 ** Elastic “bulk-flush.interval/max-size/...”
 * Semantics params: defines aspects like how much data Flink reads/writes, how the result will look like
 ** Kafka “startup-mode”, “offset”
 ** Watermark, timestamp column

 

Problems of the current mix-up: Flink cannot leverage catalogs and external 
system metadata alone to run queries with all the non-metadata params involved 
in DDL. E.g. when we add a catalog for Confluent Schema Registry, the expected 
user experience should be that Flink users just configure the catalog with url 
and usr/pwd, and should be able to run queries immediately; however, that’s not 
the case right now because users still have to use DDL to redundantly define a bunch 
of params like “startup-mode”, “offset”, timestamp column, etc., along with the schema. 
We’ve heard many user complaints about this.
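
To make the direction concrete, here is a rough sketch of what the separation could look like in the Table API: the DDL keeps only the external data's metadata, while runtime/semantics params are supplied per query through a table hint. The OPTIONS-style hint syntax and the connector property keys are illustrative assumptions, not an existing Flink feature as of this writing.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class TableHintSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // metadata-only DDL: schema, location, access info (property keys are placeholders)
        tEnv.sqlUpdate(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  amount DOUBLE" +
                ") WITH (" +
                "  'connector.type' = 'kafka'," +
                "  'connector.topic' = 'orders'," +
                "  'connector.properties.bootstrap.servers' = 'kafka:9092'," +
                "  'format.type' = 'json'" +
                ")");

        // runtime/semantics params moved out of the DDL into a hypothetical per-query hint
        Table t = tEnv.sqlQuery(
                "SELECT * FROM orders /*+ OPTIONS('startup-mode' = 'earliest-offset') */");
    }
}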

 

cc [~ykt836] [~lirui] [~lzljs3620320] [~jark] [~twalthr] [~dwysakowicz]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15984) support hive stream table sink

2020-02-10 Thread Bowen Li (Jira)
Bowen Li created FLINK-15984:


 Summary: support hive stream table sink
 Key: FLINK-15984
 URL: https://issues.apache.org/jira/browse/FLINK-15984
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Rui Li
 Fix For: 1.11.0


support hive stream table sink for stream processing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15983) add native reader for Hive parquet files

2020-02-10 Thread Bowen Li (Jira)
Bowen Li created FLINK-15983:


 Summary: add native reader for Hive parquet files
 Key: FLINK-15983
 URL: https://issues.apache.org/jira/browse/FLINK-15983
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15960) support creating Hive tables, views, functions within Flink

2020-02-09 Thread Bowen Li (Jira)
Bowen Li created FLINK-15960:


 Summary: support creating Hive tables, views, functions within 
Flink
 Key: FLINK-15960
 URL: https://issues.apache.org/jira/browse/FLINK-15960
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0


Support creating Hive tables, views, and functions within Flink, to achieve higher 
interoperability between Flink and Hive and to avoid requiring users to switch 
between the Flink and Hive CLIs.

We have heard such requests from multiple Flink-Hive users.

 

cc [~ykt836] [~lirui]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15933) update content of how generic table schema is stored in hive via HiveCatalog

2020-02-05 Thread Bowen Li (Jira)
Bowen Li created FLINK-15933:


 Summary: update content of how generic table schema is stored in 
hive via HiveCatalog
 Key: FLINK-15933
 URL: https://issues.apache.org/jira/browse/FLINK-15933
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.10.0
Reporter: Bowen Li
Assignee: Rui Li


FLINK-15858 updated how generic table schemas are stored in the Hive metastore; we need 
to go through the documentation and update related content, such as

[https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/hive_catalog.html#step-4-start-sql-client-and-create-a-kafka-table-with-flink-sql-ddl]

 

cc [~lzljs3620320]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Improve TableFactory to add Context

2020-02-05 Thread Bowen Li
+1, LGTM

On Tue, Feb 4, 2020 at 11:28 PM Jark Wu  wrote:

> +1 form my side.
> Thanks for driving this.
>
> Btw, could you also attach a JIRA issue with the changes described in it,
> so that users can find the issue through the mailing list in the future.
>
> Best,
> Jark
>
> On Wed, 5 Feb 2020 at 13:38, Kurt Young  wrote:
>
> > +1 from my side.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Feb 5, 2020 at 10:59 AM Jingsong Li 
> > wrote:
> >
> > > Hi all,
> > >
> > > Interface updated.
> > > Please re-vote.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Tue, Feb 4, 2020 at 1:28 PM Jingsong Li 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for the improve of
> > > > TableFactory, which is discussed and
> > > > reached a consensus in the discussion thread[2].
> > > >
> > > > The vote will be open for at least 72 hours. I'll try to close it
> > > > unless there is an objection or not enough votes.
> > > >
> > > > [1]
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
>


[jira] [Created] (FLINK-15809) component stack page needs to be updated for blink planner

2020-01-29 Thread Bowen Li (Jira)
Bowen Li created FLINK-15809:


 Summary: component stack page needs to be updated for blink planner
 Key: FLINK-15809
 URL: https://issues.apache.org/jira/browse/FLINK-15809
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.10.0


[https://ci.apache.org/projects/flink/flink-docs-master/internals/components.html]

needs to be updated to reflect latest stack components

 

cc [~ykt836] [~jark]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Yu Li became a Flink committer

2020-01-23 Thread Bowen Li
congrats!

On Thu, Jan 23, 2020 at 07:49 Kostas Kloudas  wrote:

> Congratulations Yu and welcome!
>
> On Thu, Jan 23, 2020 at 2:28 PM Till Rohrmann 
> wrote:
> >
> > Congrats Yu :-)
> >
> > On Thu, Jan 23, 2020 at 2:02 PM Yang Wang  wrote:
> >
> > > Congratulations, Yu.
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > tison  于2020年1月23日周四 下午7:07写道:
> > >
> > > > Congratulations!
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Dian Fu  于2020年1月23日周四 下午7:06写道:
> > > >
> > > > > Congrats Yu!
> > > > >
> > > > > > 在 2020年1月23日,下午6:47,Hequn Cheng  写道:
> > > > > >
> > > > > > Congratulations Yu!
> > > > > > Thanks a lot for being the release manager of the big 1.10
> release.
> > > You
> > > > > are
> > > > > > doing a very good job!
> > > > > >
> > > > > > Best, Hequn
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 23, 2020 at 6:29 PM Jingsong Li <
> jingsongl...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> Congratulations Yu, well deserved!
> > > > > >>
> > > > > >> And thanks for your great contribution to the 1.10 release.
> > > > > >>
> > > > > >> Best,
> > > > > >> Jingsong Lee
> > > > > >>
> > > > > >> On Thu, Jan 23, 2020 at 6:14 PM Fabian Hueske <
> fhue...@gmail.com>
> > > > > wrote:
> > > > > >>
> > > > > >>> Congrats Yu!
> > > > > >>> Good to have you on board!
> > > > > >>>
> > > > > >>> Cheers, Fabian
> > > > > >>>
> > > > > >>> Am Do., 23. Jan. 2020 um 11:13 Uhr schrieb Piotr Nowojski <
> > > > > >>> pi...@ververica.com>:
> > > > > >>>
> > > > >  Congratulations! :)
> > > > > 
> > > > > > On 23 Jan 2020, at 10:48, Tzu-Li (Gordon) Tai <
> > > tzuli...@apache.org
> > > > >
> > > > >  wrote:
> > > > > >
> > > > > > Congratulations :)
> > > > > >
> > > > > > On Thu, Jan 23, 2020, 5:07 PM Yadong Xie <
> vthink...@gmail.com>
> > > > > >> wrote:
> > > > > >
> > > > > >> Well deserved!
> > > > > >>
> > > > > >> Yangze Guo  于2020年1月23日周四 下午5:06写道:
> > > > > >>
> > > > > >>> Congratulations!
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Yangze Guo
> > > > > >>>
> > > > > >>>
> > > > > >>> On Thu, Jan 23, 2020 at 4:59 PM Stephan Ewen <
> se...@apache.org
> > > >
> > > > > >>> wrote:
> > > > > 
> > > > >  Hi all!
> > > > > 
> > > > >  We are announcing that Yu Li has joined the rank of Flink
> > > > > >>> committers.
> > > > > 
> > > > >  Yu joined already in late December, but the announcement
> got
> > > > lost
> > > > > >> because
> > > > >  of the Christmas and New Years season, so here is a
> belated
> > > > proper
> > > > >  announcement.
> > > > > 
> > > > >  Yu is one of the main contributors to the state backend
> > > > components
> > > > > >>> in
> > > > > >> the
> > > > >  recent year, working on various improvements, for example
> the
> > > > > >>> RocksDB
> > > > >  memory management for 1.10.
> > > > >  He has also been one of the release managers for the big
> 1.10
> > > > > >>> release.
> > > > > 
> > > > >  Congrats for joining us, Yu!
> > > > > 
> > > > >  Best,
> > > > >  Stephan
> > > > > >>>
> > > > > >>
> > > > > 
> > > > > 
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Best, Jingsong Lee
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
>


Re: [DISCUSS] Releasing Flink 1.9.2

2020-01-22 Thread Bowen Li
+1. Thanks Hequn

On Wed, Jan 22, 2020 at 8:39 AM Till Rohrmann  wrote:

> Thanks for resuming the discussion Hequn. +1 for starting with the RC
> creation. Thanks for driving the release process!
>
> Cheers,
> Till
>
> On Tue, Jan 21, 2020 at 11:02 PM jincheng sun 
> wrote:
>
> > Cool, looking forward the first RC of release 1.9.2.
> >
> > Best,
> > Jincheng
> >
> >
> > Hequn Cheng  于2020年1月21日周二 下午10:02写道:
> >
> > > Hi everyone,
> > >
> > > Considering that we are on the finishing line for release-1.10.0 and
> that
> > > there are no blockers for release-1.9.2, I'm proposing to resume this
> > > discussion.
> > > If there are no more concerns or any critical issues for releasing
> > 1.9.2, I
> > > would like to create the first RC soon :)
> > >
> > > Best,
> > > Hequn
> > >
> > > On Wed, Dec 18, 2019 at 7:10 PM Hequn Cheng 
> > wrote:
> > >
> > > > Hi Till,
> > > >
> > > > I agree with your concerns and thanks a lot for your feedback!
> > > >
> > > > In fact, I also have those concerns. The reasons I started the
> DISCUSS
> > > are:
> > > > - Considering the low capacities, I want to start the discussion
> > earlier
> > > > so that we can have more time to collect information and planning for
> > the
> > > > release. Usually, it takes a couple of weeks to address the blockers.
> > > > - The blockers of 1.9.2 are probably also blockers for 1.10.0, take
> > > > FLINK-15266 as an example. I can help to drive to solve the problem
> so
> > > that
> > > > 1.10.0 can also benefit from it.
> > > >
> > > > However, to ease the load, we can vote after Christmas if there are
> no
> > > > more blockers at that time.
> > > > There is a chance that the 1.10 blocker resolution has calmed down a
> > bit.
> > > >
> > > > Best,
> > > > Hequn
> > > >
> > > > On Wed, Dec 18, 2019 at 7:05 PM Hequn Cheng 
> > > wrote:
> > > >
> > > >> Hi Jincheng,
> > > >>
> > > >> Yes, your help would be very helpful. Thanks a lot!
> > > >>
> > > >> Best, Hequn
> > > >>
> > > >> On Wed, Dec 18, 2019 at 4:52 PM Till Rohrmann  >
> > > >> wrote:
> > > >>
> > > >>> Hi Hequn,
> > > >>>
> > > >>> thanks for starting this discussion. In general I think it is a
> good
> > > idea
> > > >>> to release often. Hence, I also believe it is time for another bug
> > fix
> > > >>> release for 1.9.
> > > >>>
> > > >>> The thing I'm wondering is whether we are stretching our resources
> a
> > > bit
> > > >>> too much if we now start with a 1.9.2 release vote because of the
> > > ongoing
> > > >>> 1.10 release testing work. My concern is that we don't have the
> > > >>> capacities
> > > >>> to properly test the 1.9 RC. Additionally, many people will be on
> > > >>> Christmas
> > > >>> vacations during the next 2 weeks. If this is it not a problem and
> we
> > > >>> have
> > > >>> enough resources for a parallel release then we can start it.
> > Otherwise
> > > >>> I'd
> > > >>> say that we wait until the 1.10 blocker resolution has calmed down
> a
> > > bit.
> > > >>>
> > > >>> Cheers,
> > > >>> Till
> > > >>>
> > > >>> On Wed, Dec 18, 2019 at 8:47 AM jincheng sun <
> > sunjincheng...@gmail.com
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> > Thanks for bring up the discussion Hequn. I would like to give
> you
> > a
> > > >>> hand
> > > >>> > at the last stage when the RC is finished.(If you need)  :)
> > > >>> >
> > > >>> > Best,
> > > >>> > Jincheng
> > > >>> >
> > > >>> > Best,
> > > >>> > Jincheng
> > > >>> > @sunjincheng121 
> > > >>> >
> > > >>> >
> > > >>> > Hequn Cheng  于2019年12月18日周三 下午3:44写道:
> > > >>> >
> > > >>> > > Hi everyone,
> > > >>> > >
> > > >>> > > It has already been two months since we released the Flink
> 1.9.1.
> > > >>> > > We already have many important bug fixes from which our users
> can
> > > >>> benefit
> > > >>> > > in the release-1.9 branch (85 resolved issues).
> > > >>> > > Therefore, I propose to create the next bug fix release for
> Flink
> > > >>> 1.9.
> > > >>> > >
> > > >>> > > Most notable fixes are:
> > > >>> > >
> > > >>> > > - [FLINK-14074] MesosResourceManager can't create new
> > taskmanagers
> > > in
> > > >>> > > Session Cluster Mode.
> > > >>> > > - [FLINK-14995] Kinesis NOTICE is incorrect
> > > >>> > > - [FLINK-15013] Flink (on YARN) sometimes needs too many slots
> > > >>> > > - [FLINK-15036] Container startup error will be handled out
> side
> > of
> > > >>> the
> > > >>> > > YarnResourceManager's main thread
> > > >>> > > - [FLINK-14315] NPE with JobMaster.disconnectTaskManager
> > > >>> > >
> > > >>> > > Furthermore, there is one issue marked as blocker for 1.9.2
> which
> > > >>> should
> > > >>> > be
> > > >>> > > merged before 1.9.2 release:
> > > >>> > >
> > > >>> > > - [FLINK-15266] NPE in blink planner code gen (reviewing)
> > > >>> > >
> > > >>> > > If there are any other blocker issues need to be fixed in
> 1.9.2,
> > > >>> please
> > > >>> > let
> > > >>> > > me know.
> > > >>> > > I will kick off the release process once blocker issues have
> been
> > > >>> 

Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-01-22 Thread Bowen Li
Hi Flavio,

First, this is a generic question about how flink-jdbc is set up, not specific
to the jdbc catalog, so it is better discussed on its own thread.

But to quickly answer your question: you need to see where the
incompatibility is. There may be incompatibility in 1) the jdbc drivers and 2)
the databases. 1) is fairly stable and backward-compatible. 2) normally has
to do with your queries, not the driver.



On Tue, Jan 21, 2020 at 3:21 PM Flavio Pompermaier 
wrote:

> Hi all,
> I'm happy to see a lot of interest in easing the integration with JDBC data
> sources. Maybe this could be a rare situation (not in my experience
> however..) but what if I have to connect to the same type of source (e.g.
> Mysql) with 2 incompatible version...? How can I load the 2 (or more)
> connectors jars without causing conflicts?
>
> Il Mar 14 Gen 2020, 23:32 Bowen Li  ha scritto:
>
> > Hi devs,
> >
> > I've updated the wiki according to feedbacks. Please take another look.
> >
> > Thanks!
> >
> >
> > On Fri, Jan 10, 2020 at 2:24 PM Bowen Li  wrote:
> >
> > > Thanks everyone for the prompt feedback. Please see my response below.
> > >
> > > > In Postgress, the TIME/TIMESTAMP WITH TIME ZONE has the
> > > java.time.Instant semantic, and should be mapped to Flink's
> > TIME/TIMESTAMP
> > > WITH LOCAL TIME ZONE
> > >
> > > Zhenghua, you are right that pg's 'timestamp with timezone' should be
> > > translated into flink's 'timestamp with local timezone'. I don't find
> > 'time
> > > with (local) timezone' though, so we may not support that type from pg
> in
> > > Flink.
> > >
> > > > I suggest that the parameters can be completely consistent with the
> > > JDBCTableSource / JDBCTableSink. If you take a look to JDBC api:
> > > "DriverManager.getConnection".
> > > That allow "default db, username, pwd" things optional. They can
> included
> > > in URL. Of course JDBC api also allows establishing connections to
> > > different databases in a db instance. So I think we don't need provide
> a
> > > "base_url", we can just provide a real "url". To be consistent with
> JDBC
> > > api.
> > >
> > > Jingsong, what I'm saying is a builder can be added on demand later if
> > > there's enough user requesting it, and doesn't need to be a core part
> of
> > > the FLIP.
> > >
> > > Besides, unfortunately Postgres doesn't allow changing databases via
> > JDBC.
> > >
> > > JDBC provides different connecting options as you mentioned, but I'd
> like
> > > to keep our design and API simple and having to handle extra parsing
> > logic.
> > > And it doesn't shut the door for what you proposed as a future effort.
> > >
> > > > Since the PostgreSQL does not have catalog but schema under database,
> > > why not mapping the PG-database to Flink catalog and PG-schema to Flink
> > > database
> > >
> > > Danny, because 1) there are frequent use cases where users want to
> switch
> > > databases or referencing objects across databases in a pg instance 2)
> > > schema is an optional namespace layer in pg, it always has a default
> > value
> > > ("public") and can be invisible to users if they'd like to as shown in
> > the
> > > FLIP 3) as you mentioned it is specific to postgres, and I don't feel
> > it's
> > > necessary to map Postgres substantially different than others DBMSs
> with
> > > additional complexity
> > >
> > > >'base_url' configuration: We are following the configuration format
> > > guideline [1] which suggest to use dash (-) instead of underline (_).
> And
> > > I'm a little confused the meaning of "base_url" at the first glance,
> > > another idea is split it into several configurations: 'driver',
> > 'hostname',
> > > 'port'.
> > >
> > > Jark, I agreed we should use "base-url" in yaml config.
> > >
> > > I'm not sure about having hostname and port separately because you can
> > > specify multiple hosts with ports in jdbc, like
> > > "jdbc:dbms/host1:port1,host2:port2/", for connection failovers.
> > Separating
> > > them would make configurations harder.
> > >
> > > I will add clear doc and example to avoid any possible confusion.
> > >
> > > > 'default-database' is optional, then which database will be used or
> > what
> > > is the behavior when the 

[jira] [Created] (FLINK-15645) enable COPY TO/FROM in Postgres JDBC source/sink for faster batch processing

2020-01-18 Thread Bowen Li (Jira)
Bowen Li created FLINK-15645:


 Summary: enable COPY TO/FROM in Postgres JDBC source/sink for 
faster batch processing
 Key: FLINK-15645
 URL: https://issues.apache.org/jira/browse/FLINK-15645
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0


Postgres has its own SQL extension, COPY FROM/TO, for faster bulk loading/reading via 
JDBC: [https://www.postgresql.org/docs/12/sql-copy.html]

Flink should be able to support that for batch use cases.
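
For reference, the Postgres JDBC driver already exposes COPY through its CopyManager API, which a Flink batch source/sink could build on. Below is a minimal sketch of the raw driver usage; the connection url, credentials, and the table "t" are placeholders for illustration.

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class PgCopyExample {
    public static void main(String[] args) throws Exception {
        // plain JDBC connection to a Postgres instance (url/credentials are placeholders)
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "postgres", "postgres")) {

            CopyManager copyManager = new CopyManager((BaseConnection) conn);

            // bulk-load two rows into an existing table "t(id INT, name TEXT)" via COPY FROM STDIN
            long rowsCopied = copyManager.copyIn(
                    "COPY t FROM STDIN WITH (FORMAT csv)",
                    new StringReader("1,foo\n2,bar\n"));
            System.out.println("rows copied: " + rowsCopied);
        }
    }
}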



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Bowen Li
Congrats!

On Thu, Jan 16, 2020 at 13:45 Peter Huang 
wrote:

> Congratulations, Dian!
>
>
> Best Regards
> Peter Huang
>
> On Thu, Jan 16, 2020 at 11:04 AM Yun Tang  wrote:
>
>> Congratulations, Dian!
>>
>> Best
>> Yun Tang
>> --
>> *From:* Benchao Li 
>> *Sent:* Thursday, January 16, 2020 22:27
>> *To:* Congxian Qiu 
>> *Cc:* dev@flink.apache.org ; Jingsong Li <
>> jingsongl...@gmail.com>; jincheng sun ; Shuo
>> Cheng ; Xingbo Huang ; Wei Zhong
>> ; Hequn Cheng ; Leonard Xu
>> ; Jeff Zhang ; user <
>> u...@flink.apache.org>; user-zh 
>> *Subject:* Re: [ANNOUNCE] Dian Fu becomes a Flink committer
>>
>> Congratulations Dian.
>>
>> Congxian Qiu  于2020年1月16日周四 下午10:15写道:
>>
>> > Congratulations Dian Fu
>> >
>> > Best,
>> > Congxian
>> >
>> >
>> > Jark Wu  于2020年1月16日周四 下午7:44写道:
>> >
>> >> Congratulations Dian and welcome on board!
>> >>
>> >> Best,
>> >> Jark
>> >>
>> >> On Thu, 16 Jan 2020 at 19:32, Jingsong Li 
>> wrote:
>> >>
>> >> > Congratulations Dian Fu. Well deserved!
>> >> >
>> >> > Best,
>> >> > Jingsong Lee
>> >> >
>> >> > On Thu, Jan 16, 2020 at 6:26 PM jincheng sun <
>> sunjincheng...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Congrats Dian Fu and welcome on board!
>> >> >>
>> >> >> Best,
>> >> >> Jincheng
>> >> >>
>> >> >> Shuo Cheng  于2020年1月16日周四 下午6:22写道:
>> >> >>
>> >> >>> Congratulations!  Dian Fu
>> >> >>>
>> >> >>> > Xingbo Wei Zhong  于2020年1月16日周四 下午6:13写道:  jincheng sun
>> >> >>> 于2020年1月16日周四 下午5:58写道:
>> >> >>>
>> >> >>
>> >> >
>> >> > --
>> >> > Best, Jingsong Lee
>> >> >
>> >>
>> >
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>


Re: [DISCUSS] Improve TableFactory

2020-01-15 Thread Bowen Li
Hi Jingsong,

The 1st and 2nd pain points you described are very valid, as I'm more
familiar with them. I agree these are shortcomings of the current Flink SQL
design.

A couple comments on your 1st proposal:

1. Is it better to have explicit APIs like "createBatchTableSource(...)"
and "createStreamingTableSource(...)" in TableSourceFactory (similarly for
the sink factory) to let the planner decide which mode (streaming vs
batch) of source should be instantiated? That way connector developers don't
always have to handle an if-else on isStreamingMode (a rough sketch follows below).
2. I'm not sure of the benefits of having a CatalogTableContext class. The
path, table, and config are fairly independent of each other, so why not
pass the config in as a 3rd parameter, as `createXxxTableSource(path,
catalogTable, tableConfig)`?
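
A rough sketch of what point 1 could look like; the method names and parameters are assumptions for discussion, not an agreed-upon API:

public interface TableSourceFactory extends TableFactory {

   // called by the planner when the query runs in batch mode
   TableSource createBatchTableSource(ObjectPath tablePath, CatalogTable table, ReadableConfig config);

   // called by the planner when the query runs in streaming mode
   TableSource createStreamingTableSource(ObjectPath tablePath, CatalogTable table, ReadableConfig config);

   ..
}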


On Tue, Jan 14, 2020 at 7:03 PM Jingsong Li  wrote:

> Hi dev,
>
> I'd like to kick off a discussion on the improvement of TableSourceFactory
> and TableSinkFactory.
>
> Motivation:
> Now the main needs and problems are:
> 1.Connector can't get TableConfig [1], and some behaviors really need to be
> controlled by the user's table configuration. In the era of catalog, we
> can't put these config in connector properties, which is too inconvenient.
> 2.Connector can't know if this is batch or stream execution mode. But the
> sink implementation of batch and stream is totally different. I understand
> there is an update mode property now, but it splits the batch and stream in
> the catalog dimension. In fact, this information can be obtained through
> the current TableEnvironment.
> 3.No interface to call validation. Now our validation is more util classes.
> It depends on whether or not the connector calls. Now we have some new
> validations to add, such as [2], which is really confuse uses, even
> developers. Another problem is that our SQL update (DDL) does not have
> validation [3]. It is better to report an error when executing DDL,
> otherwise it will confuse the user.
>
> Proposed change draft for 1 and 2:
>
> interface CatalogTableContext {
>ObjectPath getTablePath();
>CatalogTable getTable();
>ReadableConfig getTableConfig();
>boolean isStreamingMode();
> }
>
> public interface TableSourceFactory extends TableFactory {
>
>default TableSource createTableSource(CatalogTableContext context) {
>   return createTableSource(context.getTablePath(), context.getTable());
>}
>
>..
> }
>
> Proposed change draft for 3:
>
> public interface TableFactory {
>
>TableValidators validators();
>
>interface TableValidators {
>   ConnectorDescriptorValidator connectorValidator();
>   TableSchemaValidator schemaValidator();
>   FormatDescriptorValidator formatValidator();
>}
> }
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/FLINK-15290
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-A-mechanism-to-validate-the-precision-of-columns-for-connectors-td36552.html#a36556
> [3] https://issues.apache.org/jira/browse/FLINK-15509
>
> Best,
> Jingsong Lee
>


[jira] [Created] (FLINK-15607) throw exception when users trying to use Hive aggregate functions in streaming mode

2020-01-15 Thread Bowen Li (Jira)
Bowen Li created FLINK-15607:


 Summary: throw exception when users trying to use Hive aggregate 
functions in streaming mode
 Key: FLINK-15607
 URL: https://issues.apache.org/jira/browse/FLINK-15607
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive, Table SQL / API
Reporter: Bowen Li


It seems we need to distinguish the execution mode in FunctionCatalogOperatorTable, which 
is not achievable yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15593) add doc to remind users not using Hive aggregate functions in streaming mode

2020-01-14 Thread Bowen Li (Jira)
Bowen Li created FLINK-15593:


 Summary: add doc to remind users not using Hive aggregate 
functions in streaming mode
 Key: FLINK-15593
 URL: https://issues.apache.org/jira/browse/FLINK-15593
 Project: Flink
  Issue Type: Task
  Components: Documentation
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0


Note that Hive scalar and table functions implementing the UDF, GenericUDF, and 
GenericUDTF interfaces should be fine to run in both 
streaming and batch mode in Flink.

Because Hive functions are all built for batch processing, aggregate 
functions in Hive that implement the UDAF and GenericUDAFResolver2 
interfaces may have unpredictable behaviors when used in streaming mode in 
Flink. We advise users to only use Hive aggregate functions in batch mode.
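
A minimal sketch of the recommended setup, i.e. calling a Hive aggregate function from a batch TableEnvironment via HiveModule. The Hive version string and the "orders" table are assumptions for illustration.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.module.hive.HiveModule;

public class HiveUdafInBatchExample {
    public static void main(String[] args) {
        // batch mode with the Blink planner -- Hive aggregate functions are only advised here
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // load Hive built-in functions (version string is an assumption)
        tEnv.loadModule("hive", new HiveModule("2.3.4"));

        // "orders" is a placeholder table assumed to be registered already;
        // percentile_approx() is a Hive built-in aggregate function
        tEnv.sqlQuery("SELECT percentile_approx(amount, 0.5) FROM orders");
    }
}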



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15591) support CREATE TEMPORARY TABLE/VIEW in DDL

2020-01-14 Thread Bowen Li (Jira)
Bowen Li created FLINK-15591:


 Summary: support CREATE TEMPORARY TABLE/VIEW in DDL
 Key: FLINK-15591
 URL: https://issues.apache.org/jira/browse/FLINK-15591
 Project: Flink
  Issue Type: Task
  Components: Table SQL / API
Reporter: Bowen Li
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15590) add section for current catalog and current database

2020-01-14 Thread Bowen Li (Jira)
Bowen Li created FLINK-15590:


 Summary: add section for current catalog and current database
 Key: FLINK-15590
 URL: https://issues.apache.org/jira/browse/FLINK-15590
 Project: Flink
  Issue Type: Task
  Components: Deployment / Docker
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15589) remove beta tag from catalog and hive doc

2020-01-14 Thread Bowen Li (Jira)
Bowen Li created FLINK-15589:


 Summary: remove beta tag from catalog and hive doc
 Key: FLINK-15589
 URL: https://issues.apache.org/jira/browse/FLINK-15589
 Project: Flink
  Issue Type: Task
  Components: Documentation
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-01-14 Thread Bowen Li
Hi devs,

I've updated the wiki according to feedbacks. Please take another look.

Thanks!


On Fri, Jan 10, 2020 at 2:24 PM Bowen Li  wrote:

> Thanks everyone for the prompt feedback. Please see my response below.
>
> > In Postgress, the TIME/TIMESTAMP WITH TIME ZONE has the
> java.time.Instant semantic, and should be mapped to Flink's TIME/TIMESTAMP
> WITH LOCAL TIME ZONE
>
> Zhenghua, you are right that pg's 'timestamp with timezone' should be
> translated into flink's 'timestamp with local timezone'. I don't find 'time
> with (local) timezone' though, so we may not support that type from pg in
> Flink.
>
> > I suggest that the parameters can be completely consistent with the
> JDBCTableSource / JDBCTableSink. If you take a look to JDBC api:
> "DriverManager.getConnection".
> That allow "default db, username, pwd" things optional. They can included
> in URL. Of course JDBC api also allows establishing connections to
> different databases in a db instance. So I think we don't need provide a
> "base_url", we can just provide a real "url". To be consistent with JDBC
> api.
>
> Jingsong, what I'm saying is a builder can be added on demand later if
> there's enough user requesting it, and doesn't need to be a core part of
> the FLIP.
>
> Besides, unfortunately Postgres doesn't allow changing databases via JDBC.
>
> JDBC provides different connecting options as you mentioned, but I'd like
> to keep our design and API simple and avoid having to handle extra parsing logic.
> And it doesn't shut the door for what you proposed as a future effort.
>
> > Since the PostgreSQL does not have catalog but schema under database,
> why not mapping the PG-database to Flink catalog and PG-schema to Flink
> database
>
> Danny, because 1) there are frequent use cases where users want to switch
> databases or referencing objects across databases in a pg instance 2)
> schema is an optional namespace layer in pg, it always has a default value
> ("public") and can be invisible to users if they'd like to as shown in the
> FLIP 3) as you mentioned it is specific to postgres, and I don't feel it's
> necessary to map Postgres substantially different than others DBMSs with
> additional complexity
>
> >'base_url' configuration: We are following the configuration format
> guideline [1] which suggest to use dash (-) instead of underline (_). And
> I'm a little confused the meaning of "base_url" at the first glance,
> another idea is split it into several configurations: 'driver', 'hostname',
> 'port'.
>
> Jark, I agreed we should use "base-url" in yaml config.
>
> I'm not sure about having hostname and port separately because you can
> specify multiple hosts with ports in jdbc, like
> "jdbc:dbms/host1:port1,host2:port2/", for connection failovers. Separating
> them would make configurations harder.
>
> I will add clear doc and example to avoid any possible confusion.
>
> > 'default-database' is optional, then which database will be used or what
> is the behavior when the default database is not selected.
>
> This should be DBMS specific. For postgres, it will be the 
> database.
>
>
> On Thu, Jan 9, 2020 at 9:48 PM Zhenghua Gao  wrote:
>
>> Hi Bowen, Thanks for driving this.
>> I think it would be very convenience to use tables in external DBs with
>> JDBC Catalog.
>>
>> I have one concern about "Flink-Postgres Data Type Mapping" part:
>>
>> In Postgress, the TIME/TIMESTAMP WITH TIME ZONE has the java.time.Instant
>> semantic,
>> and should be mapped to Flink's TIME/TIMESTAMP WITH LOCAL TIME ZONE
>>
>> *Best Regards,*
>> *Zhenghua Gao*
>>
>>
>> On Fri, Jan 10, 2020 at 11:09 AM Jingsong Li 
>> wrote:
>>
>> > Hi Bowen, thanks for reply and updating.
>> >
>> > > I don't see much value in providing a builder for jdbc catalogs, as
>> they
>> > only have 4 or 5 required params, no optional ones. I prefer users just
>> > provide a base url without default db, usrname, pwd so we don't need to
>> > parse url all around, as I mentioned jdbc catalog may need to establish
>> > connections to different databases in a db instance,
>> >
>> > I suggest that the parameters can be completely consistent with the
>> > JDBCTableSource / JDBCTableSink.
>> > If you take a look to JDBC api: "DriverManager.getConnection".
>> > That allow "default db, username, pwd" things optional. They can
>> included
>> > in URL. Of course JDBC api also allows establishing connections to
>> > different databases in a db inst

[jira] [Created] (FLINK-15588) check registered udf via catalog API cannot be a scala inner class

2020-01-14 Thread Bowen Li (Jira)
Bowen Li created FLINK-15588:


 Summary: check registered udf via catalog API cannot be a scala 
inner class
 Key: FLINK-15588
 URL: https://issues.apache.org/jira/browse/FLINK-15588
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Bowen Li
 Fix For: 1.11.0


Scala inner classes cannot be instantiated directly via reflection, thus they 
cannot be catalog functions stored by their full class name. 

We should check for that in the catalog API to make sure we remind users of it with 
proper error messages.
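
The underlying limitation, illustrated with an analogous Java (non-static) inner class; the class names are made up for illustration:

public class ReflectionInnerClassExample {

    // non-static inner class: its constructor implicitly requires an enclosing instance
    public class MyScalarFunction {
    }

    public static void main(String[] args) throws Exception {
        Class<?> clazz = Class.forName("ReflectionInnerClassExample$MyScalarFunction");

        // this fails: there is no no-arg constructor, because the real constructor
        // signature is MyScalarFunction(ReflectionInnerClassExample outer)
        try {
            clazz.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            System.out.println("cannot instantiate inner class by class name alone: " + e);
        }
    }
}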



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15576) remove isTemporary property from CatalogFunction API

2020-01-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15576:


 Summary: remove isTemporary property from CatalogFunction API
 Key: FLINK-15576
 URL: https://issues.apache.org/jira/browse/FLINK-15576
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.10.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0


According to FLIP-79, CatalogFunction shouldn't have an "isTemporary" property. 
Move it from CatalogFunction to Create/AlterCatalogFunctionOperation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] A mechanism to validate the precision of columns for connectors

2020-01-10 Thread Bowen Li
Hi Zhenghua,

For external systems with schemas, I think the schema information is
available most of the time and should be the single source of truth for
programmatically mapping column precision via Flink catalogs, to minimize
users' effort in creating schemas redundantly and to avoid human errors.
They will be a subset of the system's supported types and precisions, so
you don't need to validate the 1st category of "the ability of the
external system". This would apply to most schema storage systems, like
relational DBMSs, the Hive metastore, and Avro schemas in the Confluent
Schema Registry for Kafka.

From my observation, the real problem right now is that Flink cannot truly
leverage external system schemas via Flink Catalogs, as documented in [1].

I'm not sure there are any unsolvable network or authorization problems,
as most systems nowadays can be read with a simple access id/key pair via
VPC, intranet, or the internet. What problems have you run into?

For schemaless systems, we'd have to rely on full testing coverage in Flink.

[1] https://issues.apache.org/jira/browse/FLINK-15545

On Fri, Jan 10, 2020 at 1:12 AM Zhenghua Gao  wrote:

> Hi Jingsong Lee
>
> You are right that the connectors don't validate data types either now.
> We seems lack a mechanism to validate with properties[1], data types, etc
> for CREATE TABLE.
>
> [1] https://issues.apache.org/jira/browse/FLINK-15509
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Fri, Jan 10, 2020 at 2:59 PM Jingsong Li 
> wrote:
>
> > Hi Zhenghua,
> >
> > I think it's not just about precision of type. Connectors not validate
> the
> > types either.
> > Now there is "SchemaValidator", this validator is just used to validate
> > type properties. But not for connector type support.
> > I think we can have something like "DataTypeValidator" to help connectors
> > validating their type support.
> >
> > Consider current validator design, validator is called by connector
> itself.
> > it's more like a util class than a mechanism.
> >
> > Best,
> > Jingsong Lee
> >
> > On Fri, Jan 10, 2020 at 11:47 AM Zhenghua Gao  wrote:
> >
> > > Hi dev,
> > >
> > > I'd like to kick off a discussion on a mechanism to validate the
> > precision
> > > of columns for some connectors.
> > >
> > > We come to an agreement that the user should be informed if the
> connector
> > > does not support the desired precision. And from the connector
> > developer's
> > > view, there are 3-levels information to be considered:
> > >
> > >-  the ability of external systems (e.g. Apache Derby support
> > >TIMESTAMP(9), Mysql support TIMESTAMP(6), etc)
> > >
> > > Connector developers should use this information to validate user's DDL
> > and
> > > make sure throw an exception if concrete column is out of range.
> > >
> > >
> > >- schema of referenced tables in external systems
> > >
> > > If the schema information of referenced tables is available in Compile
> > > Time, connector developers could use it to find the mismatch between
> DDL.
> > > But in most cases, the schema information is unavailable because of
> > network
> > > isolation or authority management. We should use it with caution.
> > >
> > >
> > >- schema-less external systems (e.g. HBase)
> > >
> > > If the external systems is schema-less like HBase, the connector
> > developer
> > > should make sure the connector doesn't cause precision loss (e.g.
> > > flink-hbase serializes java.sql.Timestamp to long in bytes which only
> > keep
> > > millisecond's precision.)
> > >
> > > To make it more specific, some scenarios of JDBC Connector are list as
> > > following:
> > >
> > >- The underlying DB supports DECIMAL(65, 30), which is out of the
> > range
> > >of Flink's Decimal
> > >- The underlying DB supports TIMESTAMP(6), and user want to define a
> > >table with TIMESTAMP(9) in Flink
> > >- User defines a table with DECIMAL(10, 4) in underlying DB, and
> want
> > to
> > >define a table with DECIMAL(5, 2) in Flink
> > >- The precision of the underlying DB varies between different
> versions
> > >
> > >
> > > What do you think about this? any feedback are appreciates.
> > >
> > > *Best Regards,*
> > > *Zhenghua Gao*
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>


Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-01-10 Thread Bowen Li
> > > 1) 'base_url' configuration: We are following the configuration format
> > > guideline [1] which suggest to use dash (-) instead of underline (_).
> > >  And I'm a little confused the meaning of "base_url" at the first
> > > glance, another idea is split it into several configurations: 'driver',
> > > 'hostname', 'port'.
> > >
> > > 2) 'default-database' is optional, then which database will be used or
> > what
> > > is the behavior when the default database is not selected.
> > >
> > > 3) a builder for jdbc catalogs: I agree with Jingsong to provide a
> > builder.
> > > Because there is optional configuration here (the default database),
> > >and providind Builder as the API will be easier for evolution, I'm
> not
> > > sure we won't add/modify parameters in the future.
> > >
> > > [1]:
> > >
> > >
> >
> https://flink.apache.org/contributing/code-style-and-quality-components.html#configuration-changes
> > >
> > > On Fri, 10 Jan 2020 at 04:52, Bowen Li  wrote:
> > >
> > > > Hi Jark and Jingsong,
> > > >
> > > > Thanks for your review. Please see my reply in line.
> > > >
> > > > > why introducing a `PostgresJDBCCatalog`, not a generic
> `JDBCCatalog`
> > > > (catalog.type = 'postgres' vs 'jdbc') ?
> > > >
> > > > Thanks for the reminding and I looked at JDBCDialect. A generic,
> > > > user-facing JDBCCatalog with catalog.type = jdbc and find specific db
> > > > implementations (pg v.s. mysql v.s. ...) is more aligned with how
> jdbc
> > > > sink/source is handled, indeed. However, the catalogs would also need
> > to
> > > > execute the query and parse query results in a db-dependent way. E.g.
> > > jdbc
> > > > catalog needs to establish connections to different databases within
> a
> > db
> > > > instance on demand. So just having JDBCDialect won't be enough.
> > > >
> > > > I think we can do the following:
> > > >   - provide a user-facing JDBCCatalog, composing a db-specific impl
> > like
> > > > PostgresJDBCCatalog and MySQLJDBCCatalog. Users still specify "jdbc"
> as
> > > > type in both Table API and SQL CLI, internally it will create a
> > > db-specific
> > > > impl depending on jdbc base url.
> > > >   - some statements can reside in JDBCDialect. Query execution and
> > result
> > > > parsing logic would be located in db-specific impls.
> > > >
> > > > - We can provide a Builder for Catalog, In my opinion,
> defaultDatabase,
> > > > username, pwd can be included in JDBC DB url.
> > > >
> > > > I don't see much value in providing a builder for jdbc catalogs, as
> > they
> > > > only have 4 or 5 required params, no optional ones. I prefer users
> just
> > > > provide a base url without default db, usrname, pwd so we don't need
> to
> > > > parse url all around, as I mentioned jdbc catalog may need to
> establish
> > > > connections to different databases in a db instance,
> > > >
> > > > - About timestamp and time, write down the specific Flink precision
> of
> > > > Postgres?
> > > >
> > > > I've documented that. It's 0-6
> > > >
> > > > - I think there is a part missing in your document, that is how to
> use
> > > this
> > > > catalog. If you can write a complete example, I think it will be much
> > > > clearer.
> > > >
> > > > I added some examples in both table api and SQL Cli. It will be no
> > > > different from existing catalogs.
> > > >
> > > > - So a thing is what TableFactory will this catalog use? For example,
> > > > JDBCTableSourceSinkFactory has different parameters for source or
> sink?
> > > How
> > > > do you think about it?
> > > >
> > > > This catalog will directly call JDBCTableSourceSinkFactory without
> > going
> > > > thru service discovery because we are sure it's a jdbc table. I added
> > it
> > > to
> > > > the doc.
> > > >
> > > > For the different params besides schema, as we discussed offline,
> > > > unfortunately we can't do anything right now until Flink DDL/DML are
> > able
> > > > to distinguish 3 types of params - external data's metada,
> source/sink

[jira] [Created] (FLINK-15545) Separate runtime params and semantics params from Flink DDL for easier integration with catalogs and better user experience

2020-01-09 Thread Bowen Li (Jira)
Bowen Li created FLINK-15545:


 Summary: Separate runtime params and semantics params from Flink 
DDL for easier integration with catalogs and better user experience
 Key: FLINK-15545
 URL: https://issues.apache.org/jira/browse/FLINK-15545
 Project: Flink
  Issue Type: Improvement
Reporter: Bowen Li
Assignee: Bowen Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-01-09 Thread Bowen Li
Hi Jark and Jingsong,

Thanks for your review. Please see my reply in line.

> why introducing a `PostgresJDBCCatalog`, not a generic `JDBCCatalog`
(catalog.type = 'postgres' vs 'jdbc') ?

Thanks for the reminder; I looked at JDBCDialect. A generic,
user-facing JDBCCatalog with catalog.type = 'jdbc' that finds specific db
implementations (pg vs. mysql vs. ...) is indeed more aligned with how the jdbc
sink/source is handled. However, the catalogs would also need to
execute queries and parse query results in a db-dependent way. E.g. the jdbc
catalog needs to establish connections to different databases within a db
instance on demand. So just having JDBCDialect won't be enough.

I think we can do the following:
  - provide a user-facing JDBCCatalog, composing a db-specific impl like
PostgresJDBCCatalog and MySQLJDBCCatalog. Users still specify "jdbc" as
type in both Table API and SQL CLI, internally it will create a db-specific
impl depending on jdbc base url.
  - some statements can reside in JDBCDialect. Query execution and result
parsing logic would be located in db-specific impls.
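
A very rough sketch of that composition; the class names follow this discussion, and the url-based routing and method bodies are assumptions for illustration only:

public class JDBCCatalog extends AbstractCatalog {

    // db-specific impl chosen from the jdbc base url
    private final Catalog internal;

    public JDBCCatalog(String name, String defaultDatabase, String username, String pwd, String baseUrl) {
        super(name, defaultDatabase);
        if (baseUrl.startsWith("jdbc:postgresql:")) {
            internal = new PostgresJDBCCatalog(name, defaultDatabase, username, pwd, baseUrl);
        } else {
            throw new UnsupportedOperationException("unsupported jdbc url: " + baseUrl);
        }
    }

    @Override
    public List<String> listDatabases() throws CatalogException {
        // delegate to the db-specific impl, which executes the query and parses results
        return internal.listDatabases();
    }

    // ... other Catalog methods delegate the same way
}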

- We can provide a Builder for Catalog, In my opinion, defaultDatabase,
username, pwd can be included in JDBC DB url.

I don't see much value in providing a builder for jdbc catalogs, as they
only have 4 or 5 required params and no optional ones. I prefer that users just
provide a base url without the default db, username, and pwd so we don't need to
parse the url all around; as I mentioned, the jdbc catalog may need to establish
connections to different databases in a db instance.

- About timestamp and time, write down the specific Flink precision of
Postgres?

I've documented that. It's 0-6

- I think there is a part missing in your document, that is how to use this
catalog. If you can write a complete example, I think it will be much
clearer.

I added some examples in both table api and SQL Cli. It will be no
different from existing catalogs.

- So a thing is what TableFactory will this catalog use? For example,
JDBCTableSourceSinkFactory has different parameters for source or sink? How
do you think about it?

This catalog will directly call JDBCTableSourceSinkFactory without going
thru service discovery because we are sure it's a jdbc table. I added it to
the doc.

For the different params besides schema, as we discussed offline,
unfortunately we can't do anything right now until Flink DDL/DML are able
to distinguish 3 types of params - external data's metadata, source/sink
runtime params, and Flink semantics params. The latter two can't be
provided by catalogs. The problem is actually general to all catalogs, not
just JDBCCatalog. I'm pushing for such an effort to solve it. At this
moment we can only use some default params for some cases, and the other
cases cannot take advantage of the JDBC catalog and users still have to
write DDL manually.

Thanks,
Bowen

On Wed, Jan 8, 2020 at 7:46 PM Jingsong Li  wrote:

> Thanks Bowen for driving this,
>
> +1 for this, The DDL schema definition is a headache for users, and catalog
> is a solution to this problem.
>
> I have some questions and suggestions:
>
> - We can provide a Builder for Catalog, In my opinion, defaultDatabase,
> username, pwd can be included in JDBC DB url.
>
> - About timestamp and time, write down the specific Flink precision of
> Postgres?
>
> - I think there is a part missing in your document, that is how to use this
> catalog. If you can write a complete example, I think it will be much
> clearer.
>
> - So a thing is what TableFactory will this catalog use? For example,
> JDBCTableSourceSinkFactory has different parameters for source or sink? How
> do you think about it?
>
> Best,
> Jingsong Lee
>
> On Thu, Jan 9, 2020 at 11:33 AM Jark Wu  wrote:
>
> > Thanks Bowen for driving this.
> >
> > +1 to this feature.
> >
> > My concern is that why introducing a `PostgresJDBCCatalog`, not a generic
> > `JDBCCatalog` (catalog.type = 'postgres' vs 'jdbc') ?
> > From my understanding, JDBC catalog is similar to JDBC source/sink. For
> > JDBC source/sink, we have a generic
> > implementation for JDBC and delegate operations to JDBCDialect. Different
> > driver may have different implementation of
> > JDBCDialect, e.g `quoteIdentifier()`.
> >
> > For JDBC catalog, I guess maybe we can do it in the same way, i.e. a
> > generic JDBCCatalog implementation and delegate
> > operations to JDBCDialect, and we will have `listDataBase()`,
> > `listTables()` interfaces in JDBCDialect. The benefit is that:
> > 0) reuse the existing `JDBCDialect`, I guess JDBCCatalog also need to
> quote
> > identifiers.
> > 1) we can easily to support a new database catalog (e.g. mysql) by
> > implementing new dialects (e.g. MySQLDialect).
> > 2) this can keep the same behavior as JDBC source/sink, i.e.
> 

Re: [DISCUSS] FLIP-72: Introduce Pulsar Connector

2020-01-08 Thread Bowen Li
Hi Yijie,

There's just one more concern on the yaml configs. Otherwise, I think we
should be good to go.

Can you update your PR and ensure all tests pass? I can help review and
merge in the next couple weeks.

Thanks,
Bowen


On Mon, Dec 23, 2019 at 7:03 PM Yijie Shen 
wrote:

> Hi Bowen,
>
> I've done updated the design doc, PTAL.
> Btw the PR for catalog is https://github.com/apache/flink/pull/10455,
> could
> you please take a look?
>
> Best,
> Yijie
>
> On Mon, Dec 9, 2019 at 8:44 AM Bowen Li  wrote:
>
> > Hi Yijie,
> >
> > I took a look at the design doc. LGTM overall, left a few questions.
> >
> > On Tue, Dec 3, 2019 at 10:39 PM Becket Qin  wrote:
> >
> > > Yes, you are absolutely right. Cannot believe I posted in the wrong
> > > thread...
> > >
> > > On Wed, Dec 4, 2019 at 1:46 PM Jark Wu  wrote:
> > >
> > >> Thanks Becket the the updating,
> > >>
> > >> But shouldn't this message be posted in FLIP-27 discussion thread[1]?
> > >>
> > >>
> > >> Best,
> > >> Jark
> > >>
> > >> [1]:
> > >>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952.html
> > >>
> > >> On Wed, 4 Dec 2019 at 12:12, Becket Qin  wrote:
> > >>
> > >> > Hi all,
> > >> >
> > >> > Sorry for the long belated update. I have updated FLIP-27 wiki page
> > with
> > >> > the latest proposals. Some noticeable changes include:
> > >> > 1. A new generic communication mechanism between SplitEnumerator and
> > >> > SourceReader.
> > >> > 2. Some detail API method signature changes.
> > >> >
> > >> > We left a few things out of this FLIP and will address them in
> > separate
> > >> > FLIPs. Including:
> > >> > 1. Per split event time.
> > >> > 2. Event time alignment.
> > >> > 3. Fine grained failover for SplitEnumerator failure.
> > >> >
> > >> > Please let us know if you have any question.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Jiangjie (Becket) Qin
> > >> >
> > >> > On Tue, Nov 19, 2019 at 10:28 AM Yijie Shen <
> > henry.yijies...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Hi everyone,
> > >> > >
> > >> > > I've put the catalog part design in a separate doc with more details
> > >> > > for easier communication.
> > >> > >
> > >> > >
> > >> > >
> > >> >
> > >>
> >
> https://docs.google.com/document/d/1LMnABtXn-wQedsmWv8hopvx-B-jbdr8-jHbIiDhdsoE/edit?usp=sharing
> > >> > >
> > >> > > I would love to hear your thoughts on this.
> > >> > >
> > >> > > Best,
> > >> > > Yijie
> > >> > >
> > >> > > On Mon, Oct 21, 2019 at 11:15 AM Yijie Shen <
> > >> henry.yijies...@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > Hi everyone,
> > >> > > >
> > >> > > > Glad to receive your valuable feedbacks.
> > >> > > >
> > >> > > > I'd first separate the Pulsar catalog as another doc and show
> more
> > >> > design
> > >> > > > and implementation details there.
> > >> > > >
> > >> > > > For the current FLIP-72, I would separate out the sink part for
> > >> > > > current work and keep the source part as future work until FLIP-27
> > >> > > > is finalized.
> > >> > > >
> > >> > > > I also replied to some of the comments in the design doc. I will
> > >> > > > rewrite the catalog part according to Bowen's advice in both the
> > >> > > > email and the comments.
> > >> > > >
> > >> > > > Thanks for the help again.
> > >> > > >
> > >> > > > Best,
> > >> > > > Yijie
> > >> > > >
> > >> > > > On Fri, Oct 18, 2019 at 12:40 AM Rong Rong  >
> > >> > wrote:
> >

[DISCUSS] FLIP-92: JDBC catalog and Postgres catalog

2020-01-08 Thread Bowen Li
Hi dev,

I'd like to kick off a discussion on adding JDBC catalogs, specifically
Postgres catalog in Flink [1].

Currently users have to manually create schemas in Flink sources/sinks that
mirror tables in their relational databases, in use cases like JDBC
read/write and consuming CDC. Many users have complained about this
unnecessary, redundant, manual work. Any mismatch can lead to a failing
Flink job at runtime instead of at compile time. All of this has been quite
unpleasant, resulting in a broken user experience.

We want to provide a JDBC catalog interface and a Postgres implementation
for Flink as a start to connect to all kinds of relational databases,
enabling Flink SQL to 1) retrieve table schemas automatically without
requiring users to write duplicated DDL, and 2) check for schema errors at
compile time. It will greatly streamline the user experience when using
Flink with popular relational databases like Postgres, MySQL, MariaDB, AWS
Aurora, etc.
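
To make the before/after concrete, here is a minimal sketch (in Java, against the 1.10-era Table API and roughly the old JDBC connector properties) of the duplicated DDL users write today versus how registering the proposed catalog could look. The PostgresJDBCCatalog constructor in the comments is hypothetical, not a settled API:

{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class Flip92MotivationSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Today: the Postgres schema must be re-declared by hand, and any mismatch
        // with the actual table only surfaces when the job runs.
        tEnv.sqlUpdate(
                "CREATE TABLE orders (id BIGINT, amount DECIMAL(10, 2)) WITH (" +
                "  'connector.type' = 'jdbc'," +
                "  'connector.url' = 'jdbc:postgresql://localhost:5432/mydb'," +
                "  'connector.table' = 'orders'," +
                "  'connector.username' = 'user'," +
                "  'connector.password' = 'pwd')");

        // With the proposed catalog, a one-time registration would replace
        // per-table DDL (class name and constructor below are hypothetical):
        // Catalog pg = new PostgresJDBCCatalog("mypg", "mydb", "user", "pwd",
        //         "jdbc:postgresql://localhost:5432/");
        // tEnv.registerCatalog("mypg", pg);
        // tEnv.useCatalog("mypg");
        // tEnv.sqlQuery("SELECT * FROM orders");  // schema read from Postgres
    }
}
{code}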

Note that the problem and solution are actually very general to Flink when
connecting to all kinds of external systems. We just focus on solving that
for relational databases in this FLIP.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-92%3A+JDBC+catalog+and+Postgres+catalog


Re: [DISCUSS] FLIP-91 - Support SQL Client Gateway

2020-01-04 Thread Bowen Li
+1. It will improve user experience quite a bit.


On Thu, Jan 2, 2020 at 22:07 Yangze Guo  wrote:

> Thanks for driving this, Xiaoling!
>
> +1 for supporting SQL client gateway.
>
> Best,
> Yangze Guo
>
>
> On Thu, Jan 2, 2020 at 9:58 AM 贺小令  wrote:
> >
> > Hey everyone,
> > FLIP-24
> > proposes the overall concept and architecture of the SQL Client. The embedded
> > mode has been supported since release-1.5, which is helpful for
> > debugging/demo purposes.
> > Many users ask how to submit a Flink job to an online environment without
> > programming against the Flink API. To solve this, we created FLIP-91 [0], which
> > supports a SQL client gateway mode, so that users can submit a job through the
> > CLI client, REST API, or JDBC.
> >
> > I'd be glad to get more feedback on FLIP-91.
> >
> > Best,
> > godfreyhe
> >
> > [0]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>


[jira] [Created] (FLINK-15411) HiveCatalog can't prune partition on DATE/TIMESTAMP columns

2019-12-26 Thread Bowen Li (Jira)
Bowen Li created FLINK-15411:


 Summary: HiveCatalog can't prune partition on DATE/TIMESTAMP 
columns
 Key: FLINK-15411
 URL: https://issues.apache.org/jira/browse/FLINK-15411
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.10.0, 1.11.0
Reporter: Bowen Li
Assignee: Rui Li
 Fix For: 1.10.0, 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15376) support "CREATE TABLE AS" in Flink SQL

2019-12-23 Thread Bowen Li (Jira)
Bowen Li created FLINK-15376:


 Summary: support "CREATE TABLE AS" in Flink SQL
 Key: FLINK-15376
 URL: https://issues.apache.org/jira/browse/FLINK-15376
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Bowen Li
Assignee: Kurt Young






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15351) develop PostgresJDBCCatalog

2019-12-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-15351:


 Summary: develop PostgresJDBCCatalog
 Key: FLINK-15351
 URL: https://issues.apache.org/jira/browse/FLINK-15351
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15353) develop AbstractJDBCCatalog

2019-12-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-15353:


 Summary: develop AbstractJDBCCatalog
 Key: FLINK-15353
 URL: https://issues.apache.org/jira/browse/FLINK-15353
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15352) develop MySQLJDBCCatalog

2019-12-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-15352:


 Summary: develop MySQLJDBCCatalog
 Key: FLINK-15352
 URL: https://issues.apache.org/jira/browse/FLINK-15352
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15350) develop JDBCCatalog to connect to relational databases

2019-12-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-15350:


 Summary: develop JDBCCatalog to connect to relational databases
 Key: FLINK-15350
 URL: https://issues.apache.org/jira/browse/FLINK-15350
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0


Introduce AbstractJDBCCatalog and a set of JDBC catalog implementations to 
connect Flink to all relational databases.

Class hierarchy:

{code:java}
Catalog API 
|
AbstractJDBCCatalog
|
PostgresJDBCCatalog, MySqlJDBCCatalog, OracleJDBCCatalog, ...
 
{code}
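
A minimal standalone sketch of this hierarchy (class names suffixed with "Sketch" to make clear they are illustrative and not tied to Flink's Catalog interface): shared JDBC plumbing sits in the abstract base class, while database-specific metadata queries live in the subclasses.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

abstract class AbstractJDBCCatalogSketch {
    protected final String baseUrl;   // e.g. "jdbc:postgresql://localhost:5432/"
    protected final String username;
    protected final String pwd;

    protected AbstractJDBCCatalogSketch(String baseUrl, String username, String pwd) {
        this.baseUrl = baseUrl;
        this.username = username;
        this.pwd = pwd;
    }

    // Shared JDBC plumbing lives in the base class ...
    protected Connection connect(String database) throws SQLException {
        return DriverManager.getConnection(baseUrl + database, username, pwd);
    }

    // ... while database-specific metadata queries are left to the subclasses.
    public abstract List<String> listDatabases() throws SQLException;
}

class PostgresJDBCCatalogSketch extends AbstractJDBCCatalogSketch {
    PostgresJDBCCatalogSketch(String baseUrl, String username, String pwd) {
        super(baseUrl, username, pwd);
    }

    @Override
    public List<String> listDatabases() throws SQLException {
        List<String> databases = new ArrayList<>();
        // pg_database is Postgres-specific; a MySQL or Oracle subclass would
        // query its own system tables here instead.
        try (Connection conn = connect("postgres");
             ResultSet rs = conn.createStatement().executeQuery(
                     "SELECT datname FROM pg_database WHERE datistemplate = false")) {
            while (rs.next()) {
                databases.add(rs.getString(1));
            }
        }
        return databases;
    }
}
{code}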




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15349) add Catalog DDL support in FLIP-69

2019-12-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-15349:


 Summary: add Catalog DDL support in FLIP-69
 Key: FLINK-15349
 URL: https://issues.apache.org/jira/browse/FLINK-15349
 Project: Flink
  Issue Type: New Feature
  Components: Table SQL / API
Reporter: Bowen Li
Assignee: Terry Wang
 Fix For: 1.11.0


https://cwiki.apache.org/confluence/display/FLINK/FLIP+69+-+Flink+SQL+DDL+Enhancement

Some customers who run internal streaming platforms requested this feature, as 
it is currently not possible for such a platform to load catalogs dynamically at runtime.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15348) Fix orc optimization for version less than 2.3 by introducing orc shim

2019-12-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-15348:


 Summary: Fix orc optimization for version less than 2.3 by 
introducing orc shim
 Key: FLINK-15348
 URL: https://issues.apache.org/jira/browse/FLINK-15348
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.10.0, 1.11.0
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.10.0, 1.11.0


Reading ORC table from Hive 2.0.1 fails with:
{noformat}
Caused by: java.lang.NoSuchMethodError: 
org.apache.orc.OrcFile.createReader(Lorg/apache/hadoop/fs/Path;Lorg/apache/orc/OrcFile$ReaderOptions;)Lorg/apache/orc/Reader;
at org.apache.flink.orc.OrcSplitReader.(OrcSplitReader.java:78)
at 
org.apache.flink.orc.OrcColumnarRowSplitReader.(OrcColumnarRowSplitReader.java:53)
at 
org.apache.flink.orc.OrcSplitReaderUtil.genPartColumnarRowReader(OrcSplitReaderUtil.java:93)
at 
org.apache.flink.connectors.hive.read.HiveVectorizedOrcSplitReader.(HiveVectorizedOrcSplitReader.java:64)
at 
org.apache.flink.connectors.hive.read.HiveTableInputFormat.open(HiveTableInputFormat.java:117)
at 
org.apache.flink.connectors.hive.read.HiveTableInputFormat.open(HiveTableInputFormat.java:57)
at 
org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:85)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:196)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Flink docs vendor table

2019-12-19 Thread Bowen Li
Really cool. I especially like the list of tags on "Ververica Platform"!

BTW, why is "Ververica Platform" placed last? I wouldn't mind if we moved it
to the top.

On Thu, Dec 19, 2019 at 5:56 PM Seth Wiesman  wrote:

> I'm not sure; I think almost all the options other than EMR abstract that
> component away.
>
> I've also opened a ticket; could a committer please assign it to my Jira handle:
> sjwiesman
>
> https://issues.apache.org/jira/browse/FLINK-15337
>
> On Wed, Dec 18, 2019 at 10:29 AM Robert Metzger 
> wrote:
>
> > I was actually referring to "YARN", "Kubernetes", "Mesos".
> > If people know that AWS EMR is using YARN, they know which documentation
> to
> > look for in Flink.
> >
> >
> > On Wed, Dec 18, 2019 at 4:26 PM Konstantin Knauf <
> konstan...@ververica.com
> > >
> > wrote:
> >
> > > +1 This gives a better overview of the deployment targets and shows our
> > > prospective users that they can rely on a broad set of vendors, if help
> > is
> > > needed.
> > >
> > > I guess, Robert means if the vendor offers a managed service (like AWS
> > > Kinesis Analytics), or licenses software (like Ververica Platform).
> This
> > > would be beneficial, but on the other hand the categories/terms
> (managed,
> > > hosted, "serverless", self-managed) are not so well-defined in my
> > > experience.
> > >
> > > On Tue, Dec 17, 2019 at 10:46 PM Seth Wiesman 
> > wrote:
> > >
> > > > Happy to see there seems to be a consensus.
> > > >
> > > > Robert, can you elaborate on what you mean by "deployment model"?
> > > >
> > > > Seth
> > > >
> > > > On Tue, Dec 17, 2019 at 12:19 PM Robert Metzger  >
> > > > wrote:
> > > >
> > > > > +1 to the general idea
> > > > >
> > > > > Maybe we could add "Deployment Model" in addition to "Supported
> > > > > Environments" as properties for the vendors.
> > > > > I'd say Cloudera, Eventador and Huawei [1] are missing from this
> page
> > > > >
> > > > > [1]https://www.huaweicloud.com/en-us/product/cs.html
> > > > >
> > > > > On Tue, Dec 17, 2019 at 5:05 PM Stephan Ewen 
> > wrote:
> > > > >
> > > > > > +1 for your proposed solution, Seth!
> > > > > >
> > > > > > On Tue, Dec 17, 2019 at 3:05 PM Till Rohrmann <
> > trohrm...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for continuing this discussion Seth. I like the mockup
> > and I
> > > > > think
> > > > > > > this is a good improvement. Modulo the completeness check, +1
> for
> > > > > > offering
> > > > > > > links to 3rd party integrations.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > > > On Mon, Dec 16, 2019 at 6:04 PM Seth Wiesman <
> > sjwies...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > This discussion is a follow up to the previous thread on
> > dropping
> > > > > > > > vendor-specific documentation[1].
> > > > > > > >
> > > > > > > > The conversation ended unresolved on the question of what we
> > > should
> > > > > > > provide
> > > > > > > > on the Apache Flink docs. The consensus seemed to be moving
> > > towards
> > > > > > > > offering a table with links to 3rd parties. After an offline
> > > > > > conversation
> > > > > > > > with Robert, I have drafted a mock-up of what that might look
> > > > > like[2].
> > > > > > > > Please note that I included a few vendors that I could think
> of
> > > off
> > > > > the
> > > > > > > top
> > > > > > > > of my head, the list in this picture is not complete but that
> > is
> > > > not
> > > > > > the
> > > > > > > > conversation we are having here.
> > > > > > > >
> > > > > > > > There are three competing goals that we are trying to achieve
> > > here.
> > > > > > > >
> > > > > > > > 1) Provide information to users that vendor support is
> > available
> > > as
> > > > > it
> > > > > > > can
> > > > > > > > be important in growing adoption within enterprises
> > > > > > > > 2) Be maintainable by the open-source Flink community
> > > > > > > > 3) Remain neutral
> > > > > > > >
> > > > > > > > Please let me know what you think
> > > > > > > >
> > > > > > > > Seth
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-vendor-specific-deployment-documentation-td35457.html
> > > > > > > > [2]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://gist.githubusercontent.com/sjwiesman/bb90f0765148c15051bcc91092367851/raw/42c0a1e9240f1c5808a053f8ff5965828cca96d5/mockup.png
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > >
> > > Konstantin Knauf | Solutions Architect
> > >
> > > +49 160 91394525
> > >
> > >
> > > Follow us @VervericaData Ververica 
> > >
> > >
> > > --
> > >
> > > Join Flink Forward  - The Apache Flink
> > > Conference
> > >
> > > Stream Processing | Event Driven | Real Time
> > >
> > > --
> > >
> > > Ververica GmbH | 

Re: [DISCUSS] What parts of the Python API should we focus on next ?

2019-12-19 Thread Bowen Li
- integrate PyFlink with Jupyter notebook
   - Description: users should be able to run PyFlink seamlessly in Jupyter
   - Benefits: Jupyter is the industry-standard notebook for data
scientists. I've talked to a few companies in North America; they think
Jupyter is the #1 way to empower internal data scientists with Flink


On Wed, Dec 18, 2019 at 19:05 jincheng sun  wrote:

> Also CC user-zh.
>
> Best,
> Jincheng
>
>
> On Thu, Dec 19, 2019 at 10:20 AM, jincheng sun wrote:
>
>> Hi folks,
>>
>> As release-1.10 is under feature-freeze(The stateless Python UDF is
>> already supported), it is time for us to plan the features of PyFlink for
>> the next release.
>>
>> To make sure the features supported in PyFlink are the most demanded
>> by the community, we'd like to get more people involved, i.e., it would be
>> better if all of the devs and users join the discussion of which kinds of
>> features are most important and urgent.
>>
>> We have already listed some features from different aspects, which you can
>> find below; however, it is not the final plan. We appreciate any
>> suggestions from the community, either on functionality or
>> performance improvements, etc. It would be great to have the following
>> information if you want to suggest adding some features:
>>
>> -
>> - Feature description: 
>> - Benefits of the feature: 
>> - Use cases (optional): 
>> --
>>
>> Features in my mind
>>
>> 1. Integration with most popular Python libraries
>> - fromPandas/toPandas API
>>Description:
>>   Support to convert between Table and pandas.DataFrame.
>>Benefits:
>>   Users could switch between Flink and Pandas API, for example,
>> do some analysis using Flink and then perform analysis using the Pandas API
>> if the result data is small and could fit into the memory, and vice versa.
>>
>> - Support Scalar Pandas UDF
>>Description:
>>   Support scalar Pandas UDF in Python Table API & SQL. Both the
>> input and output of the UDF is pandas.Series.
>>Benefits:
>>   1) Scalar Pandas UDF performs better than row-at-a-time UDF,
>> ranging from 3x to over 100x (from pyspark)
>>   2) Users could use Pandas/Numpy API in the Python UDF
>> implementation if the input/output data type is pandas.Series
>>
>> - Support Pandas UDAF in batch GroupBy aggregation
>>Description:
>>Support Pandas UDAF in batch GroupBy aggregation of Python
>> Table API & SQL. Both the input and output of the UDF is pandas.DataFrame.
>>Benefits:
>>   1) Pandas UDAF performs better than row-at-a-time UDAF more
>> than 10x in certain scenarios
>>   2) Users could use Pandas/Numpy API in the Python UDAF
>> implementation if the input/output data type is pandas.DataFrame
>>
>> 2. Fully support  all kinds of Python UDF
>> - Support Python UDAF(stateful) in GroupBy aggregation (NOTE: Please
>> give us some use case if you want this feature to be contained in the next
>> release)
>>   Description:
>> Support UDAF in GroupBy aggregation.
>>   Benefits:
>> Users could define and use Python UDAF and use it in GroupBy
>> aggregation. Without it, users have to use Java/Scala UDAF.
>>
>> - Support Python UDTF
>>   Description:
>>Support  Python UDTF in Python Table API & SQL
>>   Benefits:
>> Users could define and use Python UDTF in Python Table API & SQL.
>> Without it, users have to use Java/Scala UDTF.
>>
>> 3. Debugging and Monitoring of Python UDF
>>- Support User-Defined Metrics
>>  Description:
>>Allow users to define user-defined metrics and global job
>> parameters with Python UDFs.
>>  Benefits:
>>UDF needs metrics to monitor some business or technical
>> indicators, which is also a requirement for UDFs.
>>
>>- Make the log level configurable
>>  Description:
>>Allow users to config the log level of Python UDF.
>>  Benefits:
>>Users could configure different log levels when debugging and
>> deploying.
>>
>> 4. Enrich the Python execution environment
>>- Docker Mode Support
>>  Description:
>>  Support running python UDF in docker workers.
>>  Benefits:
>>   Support various deployments to meet more users' requirements.
>>
>> 5. Expand the usage scope of Python UDF
>>- Support to use Python UDF via SQL client
>>  Description:
>>  Support to register and use Python UDF via SQL client
>>  Benefits:
>>  SQL client is a very important interface for SQL users. This
>> feature allows SQL users to use Python UDFs via SQL client.
>>
>>- Integrate Python UDF with Notebooks
>>  Description:
>>  Such as Zeppelin, etc (Especially Python dependencies)
>>
>>- Support to register Python UDF into catalog
>>   Description:
>>   Support to register Python UDF into catalog
>>   Benefits:
>>   1)Catalog is the centralized 

[jira] [Created] (FLINK-15303) support predicate pushdown for sources in hive connector

2019-12-17 Thread Bowen Li (Jira)
Bowen Li created FLINK-15303:


 Summary: support predicate pushdown for sources in hive connector 
 Key: FLINK-15303
 URL: https://issues.apache.org/jira/browse/FLINK-15303
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Jingsong Lee
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15302) properties in create table DDL need to be backward compatible

2019-12-17 Thread Bowen Li (Jira)
Bowen Li created FLINK-15302:


 Summary: properties in create table DDL need to be backward 
compatible 
 Key: FLINK-15302
 URL: https://issues.apache.org/jira/browse/FLINK-15302
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Bowen Li
Assignee: Jark Wu


Since we have a persistent HiveCatalog now, properties in CREATE TABLE DDL need 
to be backward compatible for at least 2 major versions, like state. Otherwise, 
e.g., a table created via DDL in 1.10 may not be readable in 1.11.





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-17 Thread Bowen Li
I'm not sure providing an uber jar would be possible.

Different from kafka and elasticsearch connector who have dependencies for
a specific kafka/elastic version, or the kafka universal connector that
provides good compatibilities, hive connector needs to deal with hive jars
in all 1.x, 2.x, 3.x versions (let alone all the HDP/CDH distributions)
with incompatibility even between minor versions, different versioned
hadoop and other extra dependency jars for each hive version.

Besides, users usually need to be able to easily see which individual jars
are required, which is invisible from an uber jar. Hive users already have
their own Hive deployments. They usually have to use their own Hive jars
because, unlike the Hive jars on Maven, their own jars contain changes made
in-house or by vendors. They need to easily tell which jars Flink requires
for the corresponding open-source Hive version, and copy the in-house jars
over from their Hive deployments as replacements.

Providing a script to download all the individual jars for a specified hive
version can be an alternative.

The goal is that we need to provide a *product*, not a technology, to make it
less of a hassle for Hive users. After all, it's Flink embracing the Hive
community and ecosystem, not the other way around. I'd argue the Hive
connector can be treated differently because its community/ecosystem/user
base is much larger than those of the other connectors, and it's way more
important than other connectors to Flink's mission of becoming a unified
batch/streaming engine and getting Flink more widely adopted.


On Sun, Dec 15, 2019 at 10:03 PM Danny Chan  wrote:

> Also -1 on separate builds.
>
> After looking at how some other big data engines handle distributions [1], I
> didn't find a strong need to publish a separate build
> for just a separate Hive version; there are indeed builds for different
> Hadoop versions.
>
> Just like Seth and Aljoscha said, we could push a
> flink-hive-version-uber.jar to use as a lib of SQL-CLI or other use cases.
>
> [1] https://spark.apache.org/downloads.html
> [2] https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
>
> Best,
> Danny Chan
> On Dec 14, 2019 at 3:03 AM +0800, dev@flink.apache.org wrote:
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
>


[jira] [Created] (FLINK-15263) add dedicated page for HiveCatalog

2019-12-15 Thread Bowen Li (Jira)
Bowen Li created FLINK-15263:


 Summary: add dedicated page for HiveCatalog
 Key: FLINK-15263
 URL: https://issues.apache.org/jira/browse/FLINK-15263
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive, Documentation
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.9.2, 1.10.0, 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15262) kafka connector doesn't read from beginning immediately when 'connector.startup-mode' = 'earliest-offset'

2019-12-15 Thread Bowen Li (Jira)
Bowen Li created FLINK-15262:


 Summary: kafka connector doesn't read from beginning immediately 
when 'connector.startup-mode' = 'earliest-offset' 
 Key: FLINK-15262
 URL: https://issues.apache.org/jira/browse/FLINK-15262
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.10.0
Reporter: Bowen Li
Assignee: Jiangjie Qin
 Fix For: 1.10.0, 1.11.0


I created a Kafka table in Flink to read from my Kafka topic (which already has 
messages in it) with the earliest offset, but the `select * from test` query in Flink 
doesn't start to read until a new message comes. If no new message arrives, the 
query just sits there and never produces a result.

What I expect is that the query should immediately produce results from all existing 
messages without having to wait for a new message to "trigger" data processing.

DDL that I used according to DDL document at 
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#kafka-connector

{code:java}
create table test(name String, age Int) with (
   'connector.type' = 'kafka',
   'connector.version' = 'universal',
   'connector.topic' = 'test',
   'connector.properties.zookeeper.connect' = 'localhost:2181',
   'connector.properties.bootstrap.servers' = 'localhost:9092',
   'connector.startup-mode' = 'earliest-offset',
   'format.type' = 'csv',
   'update-mode' = 'append'
);
{code}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15261) add dedicated documentation for blink planner

2019-12-14 Thread Bowen Li (Jira)
Bowen Li created FLINK-15261:


 Summary: add dedicated documentation for blink planner 
 Key: FLINK-15261
 URL: https://issues.apache.org/jira/browse/FLINK-15261
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 1.9.0
Reporter: Bowen Li
Assignee: Kurt Young
 Fix For: 1.10.0, 1.11.0, 1.9.0


We are missing a dedicated page under the `Table API and SQL` section to describe 
in detail what the advantages of the blink planner are, and why users should use it 
over the legacy one.

I was trying to reference a blink planner page in Flink's Hive documentation, and 
realized there isn't even one yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15259) HiveInspector.toInspectors() should convert Flink constant to Hive constant

2019-12-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15259:


 Summary: HiveInspector.toInspectors() should convert Flink 
constant to Hive constant 
 Key: FLINK-15259
 URL: https://issues.apache.org/jira/browse/FLINK-15259
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0, 1.11.0


repro test: 

{code:java}
public class HiveModuleITCase {
@Test
public void test() {
TableEnvironment tEnv = 
HiveTestUtils.createTableEnvWithBlinkPlannerBatchMode();

tEnv.unloadModule("core");
tEnv.loadModule("hive", new HiveModule("2.3.4"));

tEnv.sqlQuery("select concat('an', 'bn')");
}
}
{code}

It seems that HiveInspector.toInspectors() currently doesn't convert Flink constants 
to Hive constants before calling hiveShim.getObjectInspectorForConstant.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15258) HiveModuleFactory doesn't take hive-version

2019-12-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15258:


 Summary: HiveModuleFactory doesn't take hive-version
 Key: FLINK-15258
 URL: https://issues.apache.org/jira/browse/FLINK-15258
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.10.0
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0, 1.11.0


HiveModuleFactory should have hive-version as supported property



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Zhu Zhu becomes a Flink committer

2019-12-13 Thread Bowen Li
Congrats!

On Fri, Dec 13, 2019 at 10:42 AM Xuefu Z  wrote:

> Congratulations, Zhu Zhu!
>
> On Fri, Dec 13, 2019 at 10:37 AM Peter Huang 
> wrote:
>
> > Congratulations!:)
> >
> > On Fri, Dec 13, 2019 at 9:45 AM Piotr Nowojski 
> > wrote:
> >
> > > Congratulations! :)
> > >
> > > > On 13 Dec 2019, at 18:05, Fabian Hueske  wrote:
> > > >
> > > > Congrats Zhu Zhu and welcome on board!
> > > >
> > > > Best, Fabian
> > > >
> > > > Am Fr., 13. Dez. 2019 um 17:51 Uhr schrieb Till Rohrmann <
> > > > trohrm...@apache.org>:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I'm very happy to announce that Zhu Zhu accepted the offer of the
> > Flink
> > > PMC
> > > >> to become a committer of the Flink project.
> > > >>
> > > >> Zhu Zhu has been an active community member for more than a year
> now.
> > > Zhu
> > > >> Zhu played an essential role in the scheduler refactoring, helped
> > > >> implementing fine grained recovery, drives FLIP-53 and fixed various
> > > bugs
> > > >> in the scheduler and runtime. Zhu Zhu also helped the community by
> > > >> reporting issues, answering user mails and being active on the dev
> > > mailing
> > > >> list.
> > > >>
> > > >> Congratulations Zhu Zhu!
> > > >>
> > > >> Best, Till
> > > >> (on behalf of the Flink PMC)
> > > >>
> > >
> > >
> >
>
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>


[jira] [Created] (FLINK-15257) convert HiveCatalogITCase.testCsvTableViaAPI() to use blink planner

2019-12-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15257:


 Summary: convert HiveCatalogITCase.testCsvTableViaAPI() to use 
blink planner
 Key: FLINK-15257
 URL: https://issues.apache.org/jira/browse/FLINK-15257
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive, Tests
Reporter: Bowen Li
Assignee: Terry Wang
 Fix For: 1.10.0, 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15256) unable to drop table in HiveCatalogITCase

2019-12-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15256:


 Summary: unable to drop table in HiveCatalogITCase
 Key: FLINK-15256
 URL: https://issues.apache.org/jira/browse/FLINK-15256
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Terry Wang
 Fix For: 1.10.0, 1.11.0



{code:java}
@Test
public void testCsvTableViaSQL() throws Exception {
EnvironmentSettings settings = 
EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

tableEnv.registerCatalog("myhive", hiveCatalog);
tableEnv.useCatalog("myhive");

String path = 
this.getClass().getResource("/csv/test.csv").getPath();

tableEnv.sqlUpdate("create table test2 (name String, age Int) 
with (\n" +
"   'connector.type' = 'filesystem',\n" +
"   'connector.path' = 'file://" + path + "',\n" +
"   'format.type' = 'csv'\n" +
")");

Table t = tableEnv.sqlQuery("SELECT * FROM 
myhive.`default`.test2");

List<Row> result = TableUtils.collectToList(t);

// assert query result
assertEquals(
new HashSet<>(Arrays.asList(
Row.of("1", 1),
Row.of("2", 2),
Row.of("3", 3))),
new HashSet<>(result)
);

tableEnv.sqlUpdate("drop table myhive.`default`.tests2");
}
{code}

The last drop table statement reports the following error:


{code:java}
org.apache.flink.table.api.ValidationException: Could not execute DropTable in 
path `myhive`.`default`.`tests2`

at 
org.apache.flink.table.catalog.CatalogManager.execute(CatalogManager.java:568)
at 
org.apache.flink.table.catalog.CatalogManager.dropTable(CatalogManager.java:543)
at 
org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlUpdate(TableEnvironmentImpl.java:519)
at 
org.apache.flink.table.catalog.hive.HiveCatalogITCase.testCsvTableViaSQL(HiveCatalogITCase.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at 
org.apache.flink.connectors.hive.FlinkStandaloneHiveRunner.runTestMethod(FlinkStandaloneHiveRunner.java:169)
at 
org.apache.flink.connectors.hive.FlinkStandaloneHiveRunner.runChild(FlinkStandaloneHiveRunner.java:154)
at 
org.apache.flink.connectors.hive.FlinkStandaloneHiveRunner.runChild(FlinkStandaloneHiveRunner.java:92)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.jun

[jira] [Created] (FLINK-15255) document how to create Hive table from java API and DDL

2019-12-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15255:


 Summary: document how to create Hive table from java API and DDL
 Key: FLINK-15255
 URL: https://issues.apache.org/jira/browse/FLINK-15255
 Project: Flink
  Issue Type: Task
  Components: Documentation
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0, 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15254) hive module cannot be named "hive"

2019-12-13 Thread Bowen Li (Jira)
Bowen Li created FLINK-15254:


 Summary: hive module cannot be named "hive"
 Key: FLINK-15254
 URL: https://issues.apache.org/jira/browse/FLINK-15254
 Project: Flink
  Issue Type: Test
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0, 1.11.0


There seems to be a problem when a Hive module is named "hive": the module 
cannot be loaded/used properly. Reported by [~Terry1897].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Bowen Li
cc user ML in case anyone want to chime in

On Fri, Dec 13, 2019 at 00:44 Bowen Li  wrote:

> Hi all,
>
> I want to propose to have a couple separate Flink distributions with Hive
> dependencies on specific Hive versions (2.3.4 and 1.2.1). The distributions
> will be provided to users on Flink download page [1].
>
> A few reasons to do this:
>
> 1) Flink-Hive integration is important to many many Flink and Hive users
> in two dimensions:
>  a) for Flink metadata: HiveCatalog is the only persistent catalog to
> manage Flink tables. With Flink 1.10 supporting more DDL, the persistent
> catalog would be playing even more critical role in users' workflow
>  b) for Flink data: Hive data connector (source/sink) helps both Flink
> and Hive users to unlock new use cases in streaming, near-realtime/realtime
> data warehouse, backfill, etc.
>
> 2) currently users have to go thru a *really* tedious process to get
> started, because it requires lots of extra jars (see [2]) that are absent
> in Flink's lean distribution. We've had so many users from public mailing
> list, private email, DingTalk groups who got frustrated on spending lots of
> time figuring out the jars themselves. They would rather have a more "right
> out of box" quickstart experience, and play with the catalog and
> source/sink without hassle.
>
> 3) it's easier for users to replace those Hive dependencies for their own
> Hive versions - just replace those jars with the right versions and no need
> to find the doc.
>
> * Hive 2.3.4 and 1.2.1 are two versions that represent lots of user base
> out there, and that's why we are using them as examples for dependencies in
> [1] even though we've supported almost all Hive versions [3] now.
>
> I want to hear what the community think about this, and how to achieve it
> if we believe that's the way to go.
>
> Cheers,
> Bowen
>
> [1] https://flink.apache.org/downloads.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
> [3]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions
>


[DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Bowen Li
Hi all,

I want to propose to have a couple separate Flink distributions with Hive
dependencies on specific Hive versions (2.3.4 and 1.2.1). The distributions
will be provided to users on Flink download page [1].

A few reasons to do this:

1) Flink-Hive integration is important to many many Flink and Hive users in
two dimensions:
 a) for Flink metadata: HiveCatalog is the only persistent catalog to
manage Flink tables. With Flink 1.10 supporting more DDL, the persistent
catalog would be playing even more critical role in users' workflow
 b) for Flink data: Hive data connector (source/sink) helps both Flink
and Hive users to unlock new use cases in streaming, near-realtime/realtime
data warehouse, backfill, etc.

2) currently users have to go thru a *really* tedious process to get
started, because it requires lots of extra jars (see [2]) that are absent
in Flink's lean distribution. We've had so many users from public mailing
list, private email, DingTalk groups who got frustrated on spending lots of
time figuring out the jars themselves. They would rather have a more "right
out of box" quickstart experience, and play with the catalog and
source/sink without hassle.

3) it's easier for users to replace those Hive dependencies for their own
Hive versions - just replace those jars with the right versions and no need
to find the doc.

* Hive 2.3.4 and 1.2.1 are two versions that represent lots of user base
out there, and that's why we are using them as examples for dependencies in
[1] even though we've supported almost all Hive versions [3] now.

I want to hear what the community think about this, and how to achieve it
if we believe that's the way to go.

Cheers,
Bowen

[1] https://flink.apache.org/downloads.html
[2]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
[3]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions


[jira] [Created] (FLINK-15240) is_generic key is missing for Flink table stored in HiveCatalog

2019-12-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-15240:


 Summary: is_generic key is missing for Flink table stored in 
HiveCatalog
 Key: FLINK-15240
 URL: https://issues.apache.org/jira/browse/FLINK-15240
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0, 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15235) create a Flink distribution for hive that includes all Hive dependencies

2019-12-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-15235:


 Summary: create a Flink distribution for hive that includes all 
Hive dependencies 
 Key: FLINK-15235
 URL: https://issues.apache.org/jira/browse/FLINK-15235
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive, Release System
Reporter: Bowen Li


Consider creating a Flink distribution for Hive that includes all Hive 
dependencies, in addition to the existing Flink-only distribution, to improve the 
quickstart experience for Hive users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15234) hive table created from flink catalog table cannot have null properties in parameters

2019-12-12 Thread Bowen Li (Jira)
Bowen Li created FLINK-15234:


 Summary: hive table created from flink catalog table cannot have 
null properties in parameters
 Key: FLINK-15234
 URL: https://issues.apache.org/jira/browse/FLINK-15234
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

