[jira] [Created] (FLINK-17487) Do not delete old checkpoints when stop the job.

2020-04-30 Thread nobleyd (Jira)
nobleyd created FLINK-17487:
---

 Summary: Do not delete old checkpoints when stop the job.
 Key: FLINK-17487
 URL: https://issues.apache.org/jira/browse/FLINK-17487
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Reporter: nobleyd


When stopping a Flink job using 'flink stop jobId', the checkpoint data is deleted.

When the stop action does not succeed or fails because of some unknown error,
sometimes the job resumes from the latest checkpoint, while sometimes it just
fails, and the checkpoint data is gone.

You may ask why I need these checkpoints, since I stop the job and a savepoint
will be generated. For example, my job uses a Kafka source, the Kafka topic
missed some data, and I want to stop the job and resume it from an old
checkpoint. In any case, sometimes the stop action fails and the checkpoint
data is still deleted, which is not good.

This behavior is different from the case 'flink cancel jobId' or 'flink
savepoint jobId', which won't delete the checkpoint data.
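
For reference, retention of checkpoint data on cancellation is configured roughly as in the sketch below (illustrative only, not part of the original report); the point of this ticket is that 'flink stop' does not offer a comparable way to keep the checkpoint data:

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RetainedCheckpointsSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds
        // Keeps checkpoint data when the job is cancelled ('flink cancel'),
        // but, per this report, not when it is stopped with 'flink stop'.
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        // ... build the rest of the pipeline and call env.execute() ...
    }
}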

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-shaded] piyushnarang commented on pull request #85: [FLINK-16955] Bump Zookeeper 3.4.X to 3.4.14

2020-04-30 Thread GitBox


piyushnarang commented on pull request #85:
URL: https://github.com/apache/flink-shaded/pull/85#issuecomment-622144494


   @zentol added the updates you requested. I did some basic sanity checking /
testing after excluding the spotbugs and jsr305 dependencies, and it seems to work ok.
Do you know if there's a way to trigger the full Flink CI run for flink-shaded changes?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-17486) ClassCastException when copying AVRO SpecificRecord containing a decimal field

2020-04-30 Thread Lorenzo Nicora (Jira)
Lorenzo Nicora created FLINK-17486:
--

 Summary: ClassCastException when copying AVRO SpecificRecord 
containing a decimal field
 Key: FLINK-17486
 URL: https://issues.apache.org/jira/browse/FLINK-17486
 Project: Flink
  Issue Type: Bug
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.10.0
 Environment: Flink 1.10.0

AVRO 1.9.2

Java 1.8.0 (but also Java 14)

Scala binary 2.11
Reporter: Lorenzo Nicora


When consuming from a Kafka source an AVRO SpecificRecord containing a decimal
(logical type) field, copying the record fails with:

java.lang.ClassCastException: class java.math.BigDecimal cannot be cast to
class java.nio.ByteBuffer

 

This code reproduces the problem:

AvroSerializer<Sample> serializer = new AvroSerializer<>(Sample.class);

Sample s1 = Sample.newBuilder()
    .setPrice(BigDecimal.valueOf(42.32))
    .setId("A12345")
    .build();

Sample s2 = serializer.copy(s1);

 

The AVRO SpecificRecord is generated using avro-maven-plugin from this IDL:

@namespace("example.avro")
protocol SampleProtocol {
  record Sample {
    string id;
    decimal(9,2) price;
    timestamp_ms eventTime;
  }
}

 

The deepCopy of the record happens behind the scenes when attaching an
AssignerWithPeriodicWatermarks to a Kafka source consuming AVRO SpecificRecord
and using the Confluent Schema Registry. The assigner extracts the event time
from the record and enables watermarking (not sure whether this is related).

A simplified version of the application is here:
https://github.com/nicusX/flink-avro-bug
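
To show where the copy is triggered, here is a minimal sketch of the setup described above; the topic name, registry URL, consumer properties and the 5-second bound are illustrative assumptions and are not taken from the linked repository:

import java.util.Properties;

import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

import example.avro.Sample;

public class WatermarkCopySketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "sample-group");

        FlinkKafkaConsumer<Sample> consumer = new FlinkKafkaConsumer<>(
                "samples",
                ConfluentRegistryAvroDeserializationSchema.forSpecific(Sample.class, "http://localhost:8081"),
                props);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(consumer)
                // Per the description above, attaching the assigner is what triggers
                // the AvroSerializer.copy() that fails on the decimal field.
                .assignTimestampsAndWatermarks(
                        new BoundedOutOfOrdernessTimestampExtractor<Sample>(Time.seconds(5)) {
                            @Override
                            public long extractTimestamp(Sample sample) {
                                // accessor/return type depends on how the Avro class is generated
                                return sample.getEventTime().toEpochMilli();
                            }
                        })
                .print();

        env.execute("avro-decimal-repro");
    }
}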

 

The problem looks similar to AVRO-1895 but that issue has been fixed since AVRO 
1.8.2 (I'm using AVRO 1.9.2)

In fact, the following code, which does a deepCopy relying only on AVRO, does work:

 

Sample s1 = Sample.newBuilder()
    .setPrice(BigDecimal.valueOf(42.32))
    .setId("A12345")
    .build();
Sample s2 = Sample.newBuilder(s1).build();

 

A simplified version of the Flink application causing the problem is here:
https://github.com/nicusX/flink-avro-bug/blob/master/src/main/java/example/StreamJob.java

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-04-30 Thread Konstantin Knauf
Hi Jark,

my gut feeling is 1), because of its consistency with other connectors
(does not add two secret keywords) although it is more verbose.

Best,

Konstantin



On Thu, Apr 30, 2020 at 5:01 PM Jark Wu  wrote:

> Hi Konstantin,
>
> Thanks for the link of Java Faker. It's an intereting project and
> could benefit to a comprehensive datagen source.
>
> What the discarding and printing sink look like in your thought?
> 1) manually create a table with a `blackhole` or `print` connector, e.g.
>
> CREATE TABLE my_sink (
>   a INT,
>   b STRNG,
>   c DOUBLE
> ) WITH (
>   'connector' = 'print'
> );
> INSERT INTO my_sink SELECT a, b, c FROM my_source;
>
> 2) a system built-in table named `blackhole` and `print` without manually
> schema work, e.g.
> INSERT INTO print SELECT a, b, c, d FROM my_source;
>
> Best,
> Jark
>
>
>
> On Thu, 30 Apr 2020 at 21:19, Konstantin Knauf  wrote:
>
> > Hi everyone,
> >
> > sorry for reviving this thread at this point in time. Generally, I think,
> > this is a very valuable effort. Have we considered only providing a very
> > basic data generator (+ discarding and printing sink tables) in Apache
> > Flink and moving a more comprehensive data generating table source to an
> > ecosystem project promoted on flink-packages.org. I think this has a lot
> > of
> > potential (e.g. in combination with Java Faker [1]), but it would
> probably
> > be better served in a small separately maintained repository.
> >
> > Cheers,
> >
> > Konstantin
> >
> > [1] https://github.com/DiUS/java-faker
> >
> >
> > On Tue, Mar 24, 2020 at 9:10 AM Jingsong Li 
> > wrote:
> >
> > > Hi all,
> > >
> > > I created https://issues.apache.org/jira/browse/FLINK-16743 for
> > follow-up
> > > discussion. FYI.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Tue, Mar 24, 2020 at 2:20 PM Bowen Li  wrote:
> > >
> > > > I agree with Jingsong that sink schema inference and system tables
> can
> > be
> > > > considered later. I wouldn’t recommend to tackle them for the sake of
> > > > simplifying user experience to the extreme. Providing the above handy
> > > > source and sink implementations already offer users a ton of
> immediate
> > > > value.
> > > >
> > > >
> > > > On Mon, Mar 23, 2020 at 20:20 Jingsong Li 
> > > wrote:
> > > >
> > > > > Hi Benchao,
> > > > >
> > > > > > do you think we need to add more columns with various types?
> > > > >
> > > > > I didn't list all types, but we should support primitive types,
> > > varchar,
> > > > > Decimal, Timestamp and etc...
> > > > > This can be done continuously.
> > > > >
> > > > > Hi Benchao, Jark,
> > > > > About console and blackhole, yes, they can have no schema, the
> schema
> > > can
> > > > > be inferred by upstream node.
> > > > > - But now we don't have this mechanism to do these configurable
> sink
> > > > > things.
> > > > > - If we want to support, we need a single way to support these two
> > > sinks.
> > > > > - And uses can use "create table like" and others way to simplify
> > DDL.
> > > > >
> > > > > And for providing system/registered tables (`console` and
> > `blackhole`):
> > > > > - I have no strong opinion on these system tables. In SQL, will be
> > > > "insert
> > > > > into blackhole select a /*int*/, b /*string*/ from tableA", "insert
> > > into
> > > > > blackhole select a /*double*/, b /*Map*/, c /*string*/ from
> tableB".
> > It
> > > > > seems that Blackhole is a universal thing, which makes me feel bad
> > > > > intuitively.
> > > > > - Can user override these tables? If can, we need ensure it can be
> > > > > overwrite by catalog tables.
> > > > >
> > > > > So I think we can leave these system tables to future too.
> > > > > What do you think?
> > > > >
> > > > > Best,
> > > > > Jingsong Lee
> > > > >
> > > > > On Mon, Mar 23, 2020 at 4:44 PM Jark Wu  wrote:
> > > > >
> > > > > > Hi Jingsong,
> > > > > >
> > > > > > Regarding (2) and (3), I was thinking to ignore manually DDL
> work,
> > so
> > > > > users
> > > > > > can use them directly:
> > > > > >
> > > > > > # this will log results to `.out` files
> > > > > > INSERT INTO console
> > > > > > SELECT ...
> > > > > >
> > > > > > # this will drop all received records
> > > > > > INSERT INTO blackhole
> > > > > > SELECT ...
> > > > > >
> > > > > > Here `console` and `blackhole` are system sinks which is similar
> to
> > > > > system
> > > > > > functions.
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > On Mon, 23 Mar 2020 at 16:33, Benchao Li 
> > > wrote:
> > > > > >
> > > > > > > Hi Jingsong,
> > > > > > >
> > > > > > > Thanks for bring this up. Generally, it's a very good proposal.
> > > > > > >
> > > > > > > About data gen source, do you think we need to add more columns
> > > with
> > > > > > > various types?
> > > > > > >
> > > > > > > About print sink, do we need to specify the schema?
> > > > > > >
> > > > > > > Jingsong Li  于2020年3月23日周一 下午1:51写道:
> > > > > > >
> > > > > > > > Thanks Bowen, Jark and Dian for your feedback and
> suggestions.
> > > > > > 

Re: [DISCUSS] Introduce a new module 'flink-hadoop-utils'

2020-04-30 Thread Sivaprasanna
Bump.

Please let me know if someone is interested in reviewing this one. I am
willing to start working on this. BTW, a small new addition to the
list: with FLINK-10114 merged, OrcBulkWriterFactory can also reuse
`SerializableHadoopConfiguration`, along with SequenceFileWriterFactory and
CompressWriterFactory.
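
For anyone not familiar with it, the wrapper is essentially the usual "serialize Hadoop's Configuration by hand" pattern. The sketch below is simplified and illustrative, not the exact class from flink-sequence-file:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.hadoop.conf.Configuration;

public class SerializableHadoopConfigurationSketch implements Serializable {

    // Hadoop's Configuration is not Serializable, hence the transient field
    // plus the custom writeObject/readObject below.
    private transient Configuration configuration;

    public SerializableHadoopConfigurationSketch(Configuration configuration) {
        this.configuration = configuration;
    }

    public Configuration get() {
        return configuration;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        configuration.write(out); // Hadoop's own Writable wire format
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        configuration = new Configuration();
        configuration.readFields(in); // restore the non-serializable config
    }
}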

CC - Kostas Kloudas, since he has a better understanding of
`SerializableHadoopConfiguration`.

Cheers,
Sivaprasanna

On Mon, Mar 30, 2020 at 3:17 PM Chesnay Schepler  wrote:

> I would recommend to wait until a committer has signed up for reviewing
> your changes before preparing any PR.
> Otherwise the chances are high that you invest a lot of time but the
> changes never get in.
>
> On 30/03/2020 11:42, Sivaprasanna wrote:
> > Hello Till,
> >
> > I agree with having the scope limited and more concentrated. I can file a
> > Jira and get started with the code changes, as and when someone has some
> > bandwidth, the review can also be done. What do you think?
> >
> > Cheers,
> > Sivaprasanna
> >
> > On Mon, Mar 30, 2020 at 3:00 PM Till Rohrmann 
> wrote:
> >
> >> Hi Sivaprasanna,
> >>
> >> thanks for starting this discussion. In general I like the idea to
> remove
> >> duplications and move common code to a shared module. As a
> recommendation,
> >> I would exclude the whole part about Flink's Hadoop compatibility
> modules
> >> because they are legacy code and hardly used anymore. This would also
> have
> >> the benefit of making the scope of the proposal a bit smaller.
> >>
> >> What we now need is a committer who wants to help with this effort. It
> >> might be that this takes a bit of time as many of the committers are
> quite
> >> busy.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Thu, Mar 19, 2020 at 2:15 PM Sivaprasanna  >
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> Continuing on an earlier discussion[1] regarding having a separate
> module
> >>> for Hadoop related utility components, I have gone through our project
> >>> briefly and found the following components which I feel could be moved
> >> to a
> >>> separate module for reusability, and better module structure.
> >>>
> >>> Module Name Class Name Used at / Remarks
> >>>
> >>> flink-hadoop-fs
> >>> flink.runtime.util.HadoopUtils
> >>> flink-runtime => HadoopModule & HadoopModuleFactory
> >>> flink-swift-fs-hadoop => SwiftFileSystemFactory
> >>> flink-yarn => Utils, YarnClusterDescriptor
> >>>
> >>> flink-hadoop-compatability
> >>> api.java.hadoop.mapred.utils.HadoopUtils
> >>> Both belong to the same module but with different packages
> >>> (api.java.hadoop.mapred and api.java.hadoop.mapreduce)
> >>> api.java.hadoop.mapreduce.utils.HadoopUtils
> >>> flink-sequeunce-file
> >>> formats.sequeuncefile.SerializableHadoopConfiguration Currently,
> >>> it is used at formats.sequencefile.SequenceFileWriterFactory but can
> also
> >>> be used at HadoopCompressionBulkWriter, a potential OrcBulkWriter and
> >>> pretty much everywhere to avoid NotSerializableException.
> >>>
> >>> *Proposal*
> >>> To summarise, I believe we can create a new module (flink-hadoop-utils
> ?)
> >>> and move these reusable components to this new module which will have
> an
> >>> optional/provided dependency on flink-shaded-hadoop-2.
> >>>
> >>> *Structure*
> >>> In the present form, I think we will have two classes with the
> packaging
> >>> structure being *org.apache.flink.hadoop.[utils/serialization]*
> >>> 1. HadoopUtils with all static methods ( after combining and
> eliminating
> >>> the duplicate code fragments from the three HadoopUtils classes
> mentioned
> >>> above)
> >>> 2. Move the existing SerializableHadoopConfiguration from the
> >>> flink-sequence-file to this new module .
> >>>
> >>> *Justification*
> >>> * With this change, we would be stripping away the dependency on
> >>> flink-hadoop-fs from flink-runtime as I don't see any other classes
> from
> >>> flink-hadoop-fs is being used anywhere in flink-runtime module.
> >>> * We will have a common place where all the utilities related to Hadoop
> >> can
> >>> go which can be reused easily without leading to jar hell.
> >>>
> >>> In addition to this, if you are aware of any other classes that fit in
> >> this
> >>> approach, please share the details here.
> >>>
> >>> *Note*
> >>> I don't have a complete understanding here but I did see two
> >>> implementations of the following classes under two different packages
> >>> *.mapred and *.mapreduce.
> >>> * HadoopInputFormat
> >>> * HadoopInputFormatBase
> >>> * HadoopOutputFormat
> >>> * HadoopOutputFormatBase
> >>>
> >>> Can we somehow figure and have them in this new module?
> >>>
> >>> Thanks,
> >>> Sivaprasanna
> >>>
> >>> [1]
> >>>
> >>>
> >>
> https://lists.apache.org/thread.html/r198f09496ba46885adbcc41fe778a7a34ad1cd685eeae8beb71e6fbb%40%3Cdev.flink.apache.org%3E
>
>
>


Re: [DISCUSS] Hierarchies in ConfigOption

2020-04-30 Thread Jark Wu
Hi Dawid,

I just want to mention one of your responses:

> What you described with
> 'format' = 'csv',
> 'csv.allow-comments' = 'true',
> 'csv.ignore-parse-errors' = 'true'
> would not work though as the `format` prefix is mandatory in the sources
as only the properties with format
>  will be passed to the format factory in majority of cases. We already
have some implicit contracts.

IIUC, in FLIP-95 and FLIP-122 the property key style is entirely decided
by the connectors, not by the framework.
So a custom connector can define the above properties and extract the value
of 'format', i.e. 'csv', to find the format factory.
It can then extract the properties with the `csv.` prefix, remove the prefix,
and pass the resulting properties (e.g. 'allow-comments' = 'true')
into the format factory to create the format.
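
As a plain-Java illustration of that contract (this is not an actual Flink interface, just the string handling a custom factory could do):

import java.util.HashMap;
import java.util.Map;

public class FormatOptionExtractionSketch {
    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("format", "csv");
        options.put("csv.allow-comments", "true");
        options.put("csv.ignore-parse-errors", "true");

        String format = options.get("format"); // "csv" -> used to look up the format factory
        String prefix = format + ".";

        Map<String, String> formatOptions = new HashMap<>();
        for (Map.Entry<String, String> entry : options.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                // "csv.allow-comments" -> "allow-comments"
                formatOptions.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        // prints the stripped keys, e.g. {allow-comments=true, ignore-parse-errors=true}
        System.out.println(formatOptions);
    }
}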

So there is no strict guarantee of having "nested JSON style" properties.
Users can still develop a custom connector with non-hierarchical properties
like the following, and it works well:

'format' = 'json',
'format.fail-on-missing-field' = 'false'

Best,
Jark


On Thu, 30 Apr 2020 at 14:29, Dawid Wysakowicz 
wrote:

> Hi all,
>
> I'd like to start with a comment that I am ok with the current state of
> the FLIP-122 if there is a strong preference for it. Nevertheless I still
> like the idea of adding `type` to the `format` to have it as `format.type`
> = `json`.
>
> I wanted to clarify a few things though:
>
> @Jingsong As far as I see it most of the users copy/paste the properties
> from the documentation to the SQL, so I don't think additional four
> characters are too cumbersome. Plus if you force the additional suffix onto
> all the options of a format you introduce way more boilerplate than if we
> added the `type/kind/name`
>
> @Kurt I agree that we cannot force it, but I think it is more of a
> question to set standards/implicit contracts on the properties. What you
> described with
> 'format' = 'csv',
> 'csv.allow-comments' = 'true',
> 'csv.ignore-parse-errors' = 'true'
>
> would not work though as the `format` prefix is mandatory in the sources
> as only the properties with format will be passed to the format factory in
> majority of cases. We already have some implicit contracts.
>
> @Forward I did not necessarily get the example. Aren't json and bson two
> separate formats? Do you mean you can have those two at the same time? Why
> do you need to differentiate the options for each? The way I see it is:
>
> ‘format(.name)' = 'json',
> ‘format.fail-on-missing-field' = 'false'
>
> or
>
> ‘format(.name)' = 'bson',
> ‘format.fail-on-missing-field' = 'false'
>
> @Benchao I'd be fine with any of name, kind, type(this we already had in
> the past)
>
> Best,
> Dawid
>
> On 30/04/2020 04:17, Forward Xu wrote:
>
> Here I have a little doubt. At present, our json only supports the
> conventional json format. If we need to implement json with bson, json with
> avro, etc., how should we express it?
> Do you need like the following:
>
> ‘format.name' = 'json',
>
> ‘format.json.fail-on-missing-field' = 'false'
>
>
> ‘format.name' = 'bson',
>
> ‘format.bson.fail-on-missing-field' = ‘false'
>
>
> Best,
>
> Forward
>
> Benchao Li   于2020年4月30日周四 上午9:58写道:
>
>
> Thanks Timo for staring the discussion.
>
> Generally I like the idea to keep the config align with a standard like
> json/yaml.
>
> From the user's perspective, I don't use table configs from a config file
> like yaml or json for now,
> And it's ok to change it to yaml like style. Actually we didn't know that
> this could be a yaml like
> configuration hierarchy. If it has a hierarchy, we maybe consider that in
> the future to load the
> config from a yaml/json file.
>
> Regarding the name,
> 'format.kind' looks fine to me. However there is another name from the top
> of my head:
> 'format.name', WDYT?
>
> Dawid Wysakowicz   
> 于2020年4月29日周三 下午11:56写道:
>
>
> Hi all,
>
> I also wanted to share my opinion.
>
> When talking about a ConfigOption hierarchy we use for configuring Flink
> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
> compatible style. Those options are primarily read from a file and thus
> should at least try to follow common practices for nested formats if we
> ever decide to switch to one.
>
> Here the question is about the properties we use in SQL statements. The
> origin/destination of these usually will be external catalog, usually in
>
> a
>
> flattened(key/value) representation so I agree it is not as important as
>
> in
>
> the aforementioned case. Nevertheless having a yaml based catalog or
>
> being
>
> able to have e.g. yaml based snapshots of a catalog in my opinion is
> appealing. At the same time cost of being able to have a nice
> yaml/hocon/json representation is just adding a single suffix to a
> single(at most 2 key + value) property. The question is between `format`
>
> =
>
> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
> doing it.
>
> Just to have a full picture. Both cases can be represented in yaml, but
> the 

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-30 Thread godfrey he
Hi Fabian,

the broken example is:

create table MyTable (
  f0 BIGINT NOT NULL,
  f1 ROW<`q1` STRING, `q2` TIMESTAMP(3)>,
  f2 VARCHAR<256>,
  f3 AS f0 + 1,
  PRIMARY KEY (f0),
  UNIQUE (f3, f2),
  WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
) with (...)


name | type                                | key | compute column | watermark
---- | ----------------------------------- | --- | -------------- | ----------------------------------------
f0   | BIGINT NOT NULL                     | PRI | (NULL)         |
f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
f2   | VARCHAR<256>                        |     | (NULL)         | NULL
f3   | BIGINT NOT NULL                     | UNQ | f0 + 1         |


or we add a column to represent nullability.

name | type                                | null  | key | compute column | watermark
---- | ----------------------------------- | ----- | --- | -------------- | ----------------------------------------
f0   | BIGINT                              | false | PRI | (NULL)         |
f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
f2   | VARCHAR<256>                        | true  |     | (NULL)         | NULL
f3   | BIGINT                              | false | UNQ | f0 + 1         |




Hi Jark,
If we can limit watermarks to being defined only on top-level columns,
this will become much simpler.

Best,
Godfrey

On Thu, 30 Apr 2020 at 23:38, Jark Wu  wrote:

> Hi,
>
> I'm in favor of Fabian's proposal.
> First, watermark is not a column, but a metadata just like primary key, so
> shouldn't stand with columns.
> Second, AFAIK, primary key can only be defined on top-level columns.
> Third, I think watermark can also follow primary key than only allow to
> define on top-level columns.
>
> I have to admit that in FLIP-66, watermark can define on nested fields.
> However, during implementation, I found that it's too complicated to do
> that. We have refactor time-based physical nodes,
> we have to use code generation to access event-time, we have to refactor
> FlinkTypeFactory to support a complex nested rowtime.
> There is not much value of this feature, but introduce a lot of complexity
> in code base.
> So I think we can force watermark define on top-level columns. If user want
> to define on nested columns,
> he/she can use computed column to be a top-level column.
>
> Best,
> Jark
>
>
> On Thu, 30 Apr 2020 at 17:55, Fabian Hueske  wrote:
>
> > Hi Godfrey,
> >
> > The formatting of your example seems to be broken.
> > Could you send them again please?
> >
> > Regarding your points
> > > because watermark express can be a sub-column, just like `f1.q2` in
> above
> > example I give.
> >
> > I would put the watermark information in the row of the top-level field
> and
> > indicate to which nested field the watermark refers.
> > Don't we have to solve the same issue for primary keys that are defined
> on
> > a nested field?
> >
> > > A boolean flag can't represent such info. and I do know whether we will
> > support complex watermark expression involving multiple columns in the
> > future. such as: "WATERMARK FOR ts as ts + f1 + interval '1' second"
> >
> > You are right, a simple binary flag is definitely not sufficient to
> display
> > the watermark information.
> > I would put the expression string into the field, i.e., "ts + f1 +
> interval
> > '1' second"
> >
> >
> > For me the most important point of why to not show the watermark as a row
> > in the table is that it is not field that can be queried but meta
> > information on an existing field.
> > For the user it is important to know that a certain field has a
> watermark.
> > Otherwise, certain queries cannot be correctly specified.
> > Also there might be support for multiple watermarks that are defined of
> > different fields at some point. Would those be printed in multiple rows?
> >
> > Best,
> > Fabian
> >
> >
> > Am Do., 30. Apr. 2020 um 11:25 Uhr schrieb godfrey he <
> godfre...@gmail.com
> > >:
> >
> > > Hi Fabian, Aljoscha
> > >
> > > Thanks for the feedback.
> > >
> > > Agree with you that we can deal with primary key as you mentioned.
> > > now, the type column has contained the nullability attribute, e.g.
> BIGINT
> > > NOT NULL.
> > > (I'm also ok that we use two columns to represent type just like mysql)
> > >
> > > >Why I treat `watermark` as a special row ?
> > > because watermark express can be a sub-column, just like `f1.q2` in
> above
> > > example I give.
> > > A boolean flag can't represent such info. and I do know whether we will
> > > support complex
> > > watermark expression involving multiple columns in the future. such as:
> > > "WATERMARK FOR ts as ts + f1 + interval '1' second"
> > >
> > > If we do not support complex watermark expression, we can add a
> watermark
> > > column.
> > >
> > > for example:
> > >
> > > create table MyTable (
> > >
> > > f0 BIGINT NOT NULL,
> > >
> > > f1 ROW,
> > >
> > > f2 VARCHAR<256>,
> > >
> > > f3 AS f0 + 1,
> > >
> > > PRIMARY KEY (f0),
> > >
> > > UNIQUE (f3, f2),
> > >
> > > WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> > >
> > > ) with (...)
> > >
> > >
> > > name
> > >
> > > type
> > >
> > > key
> > >
> > > compute column
> > >
> > > watermark
> > >
> > > f0
> > >
> > > BIGINT NOT NULL
> > >
> > > PRI
> > >
> > > (NULL)
> > >
> > > f1
> > >
> > > ROW<`q1` STRING, `q2` TIMESTAMP(3)>
> > >
> > > UNQ
> > >
> > > (NULL)
> > >
> > > f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> > >
> > > 

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-30 Thread Jark Wu
Hi,

I'm in favor of Fabian's proposal.
First, a watermark is not a column but metadata, just like a primary key, so
it shouldn't be listed together with the columns.
Second, AFAIK, a primary key can only be defined on top-level columns.
Third, I think watermarks can follow primary keys in only being allowed on
top-level columns.

I have to admit that in FLIP-66 a watermark can be defined on nested fields.
However, during implementation I found that it's too complicated to do that:
we have to refactor the time-based physical nodes, use code generation to
access the event time, and refactor FlinkTypeFactory to support a complex
nested rowtime. There is not much value in this feature, but it introduces a
lot of complexity into the code base.
So I think we can force watermarks to be defined on top-level columns. If a
user wants to define one on a nested column, he/she can use a computed column
to lift it to a top-level column.

Best,
Jark


On Thu, 30 Apr 2020 at 17:55, Fabian Hueske  wrote:

> Hi Godfrey,
>
> The formatting of your example seems to be broken.
> Could you send them again please?
>
> Regarding your points
> > because watermark express can be a sub-column, just like `f1.q2` in above
> example I give.
>
> I would put the watermark information in the row of the top-level field and
> indicate to which nested field the watermark refers.
> Don't we have to solve the same issue for primary keys that are defined on
> a nested field?
>
> > A boolean flag can't represent such info. and I do know whether we will
> support complex watermark expression involving multiple columns in the
> future. such as: "WATERMARK FOR ts as ts + f1 + interval '1' second"
>
> You are right, a simple binary flag is definitely not sufficient to display
> the watermark information.
> I would put the expression string into the field, i.e., "ts + f1 + interval
> '1' second"
>
>
> For me the most important point of why to not show the watermark as a row
> in the table is that it is not field that can be queried but meta
> information on an existing field.
> For the user it is important to know that a certain field has a watermark.
> Otherwise, certain queries cannot be correctly specified.
> Also there might be support for multiple watermarks that are defined of
> different fields at some point. Would those be printed in multiple rows?
>
> Best,
> Fabian
>
>
> Am Do., 30. Apr. 2020 um 11:25 Uhr schrieb godfrey he  >:
>
> > Hi Fabian, Aljoscha
> >
> > Thanks for the feedback.
> >
> > Agree with you that we can deal with primary key as you mentioned.
> > now, the type column has contained the nullability attribute, e.g. BIGINT
> > NOT NULL.
> > (I'm also ok that we use two columns to represent type just like mysql)
> >
> > >Why I treat `watermark` as a special row ?
> > because watermark express can be a sub-column, just like `f1.q2` in above
> > example I give.
> > A boolean flag can't represent such info. and I do know whether we will
> > support complex
> > watermark expression involving multiple columns in the future. such as:
> > "WATERMARK FOR ts as ts + f1 + interval '1' second"
> >
> > If we do not support complex watermark expression, we can add a watermark
> > column.
> >
> > for example:
> >
> > create table MyTable (
> >
> > f0 BIGINT NOT NULL,
> >
> > f1 ROW,
> >
> > f2 VARCHAR<256>,
> >
> > f3 AS f0 + 1,
> >
> > PRIMARY KEY (f0),
> >
> > UNIQUE (f3, f2),
> >
> > WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> >
> > ) with (...)
> >
> >
> > name
> >
> > type
> >
> > key
> >
> > compute column
> >
> > watermark
> >
> > f0
> >
> > BIGINT NOT NULL
> >
> > PRI
> >
> > (NULL)
> >
> > f1
> >
> > ROW<`q1` STRING, `q2` TIMESTAMP(3)>
> >
> > UNQ
> >
> > (NULL)
> >
> > f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> >
> > f2
> >
> > VARCHAR<256>
> >
> > (NULL)
> >
> > NULL
> >
> > f3
> >
> > BIGINT NOT NULL
> >
> > UNQ
> >
> > f0 + 1
> >
> >
> > or we add a column to represent nullability.
> >
> > name
> >
> > type
> >
> > null
> >
> > key
> >
> > compute column
> >
> > watermark
> >
> > f0
> >
> > BIGINT
> >
> > false
> >
> > PRI
> >
> > (NULL)
> >
> > f1
> >
> > ROW<`q1` STRING, `q2` TIMESTAMP(3)>
> >
> > true
> >
> > UNQ
> >
> > (NULL)
> >
> > f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
> >
> > f2
> >
> > VARCHAR<256>
> >
> > true
> >
> > (NULL)
> >
> > NULL
> >
> > f3
> >
> > BIGINT
> >
> > false
> >
> > UNQ
> >
> > f0 + 1
> >
> >
> > Personally, I like the second one. (we need do some changes on
> LogicalType
> > to get type name without nullability)
> >
> >
> > Best,
> > Godfrey
> >
> >
> > Aljoscha Krettek  于2020年4月29日周三 下午5:47写道:
> >
> > > +1 I like the general idea of printing the results as a table.
> > >
> > > On the specifics I don't know enough but Fabians suggestions seems to
> > > make sense to me.
> > >
> > > Aljoscha
> > >
> > > On 29.04.20 10:56, Fabian Hueske wrote:
> > > > Hi Godfrey,
> > > >
> > > > Thanks for starting this discussion!
> > > >
> > > > In my mind, WATERMARK is a property (or constraint) of a field, just
> > like
> > > 

[jira] [Created] (FLINK-17485) Add a thread dump REST API

2020-04-30 Thread Xingxing Di (Jira)
Xingxing Di created FLINK-17485:
---

 Summary: Add a thread dump REST API
 Key: FLINK-17485
 URL: https://issues.apache.org/jira/browse/FLINK-17485
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / REST
Reporter: Xingxing Di


My team has built a streaming computing platform based on Flink for internal
use in our company.

As the number of jobs and users grows, we spend a lot of time helping users
with troubleshooting.

Currently we must log on to the server running the task manager, find the right
process through netstat -anp | grep "the flink data port", and then run the jstack
command.

We think it would be very convenient if Flink provided a REST API for thread
dumps; web UI support would be even better.
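
For context, what such an endpoint boils down to on the JVM side is a standard ThreadMXBean dump; the sketch below uses only JDK APIs and is illustrative, not a proposed Flink implementation:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    // Collecting monitor/synchronizer info (the two 'true' flags) is part of what
    // makes a full thread dump comparatively expensive.
    public static String dumpAllThreads() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        for (ThreadInfo threadInfo : threadMXBean.dumpAllThreads(true, true)) {
            dump.append(threadInfo.toString());
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}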

So we want to know:
 * whether the community is already working on this
 * whether this would be an appropriate feature (adding a REST API to dump threads), because,
on the other hand, a thread dump may be "expensive"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-04-30 Thread Austin Cawley-Edwards
Hey all + thanks Konstantin,

As mentioned, we also run into issues with the RMQ source's inflexibility. I
think Aljoscha's idea of supporting both would be a nice way to incorporate new
changes without breaking the current API.

We'd definitely benefit from the changes proposed here but have another issue 
with the Correlation ID. When a message gets in the queue without a correlation 
ID, the source errors and the job cannot recover, requiring (painful) manual 
intervention. It would be nice to be able to dead-letter these inputs from the 
source, but I don't think that's possible with the current source interface 
(don't know too much about the source specifics). We might be able to work 
around this with a custom Correlation ID extractor, as proposed by Karim.
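
Just to make the idea concrete, here is a purely hypothetical sketch of such an RMQ-specific deserialization hook (mirroring how the Kafka connector exposes the ConsumerRecord); none of these names exist in Flink today:

import java.io.IOException;
import java.io.Serializable;

import org.apache.flink.util.Collector;

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Envelope;

// Hypothetical interface, for illustration only.
public interface RMQRecordDeserializer<T> extends Serializable {

    // Access to the envelope and properties (correlation ID, headers, ...) would let a
    // job skip or dead-letter malformed messages instead of failing the source, and
    // would let users plug in their own correlation ID extraction.
    void deserialize(
            Envelope envelope,
            AMQP.BasicProperties properties,
            byte[] body,
            Collector<T> out) throws IOException;
}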

Also, if there are other tickets in the RMQ integration that have gone
unmaintained, I'm happy to chip in and help maintain them!

Best,
Austin

From: Konstantin Knauf 
Sent: Thursday, April 30, 2020 6:14 AM
To: dev 
Cc: Austin Cawley-Edwards 
Subject: Re: [DISCUSS] flink-connector-rabbitmq api changes

Hi everyone,

just looping in Austin as he mentioned that they also ran into issues due to 
the inflexibility of the RabiitMQSourcce to me yesterday.

Cheers,

Konstantin

On Thu, Apr 30, 2020 at 11:23 AM seneg...@gmail.com wrote:
Hello Guys,

Thanks for all the responses. I want to stress that I didn't feel
ignored; I just thought that I had forgotten an important step or something.

Since I am a newbie I would follow whatever route you guys suggest :)
and I agree that the RMQ connector still needs a lot of love, which I would
be happy to contribute gradually.

As for the code, I have it here in the PR:
https://github.com/senegalo/flink/pull/1. It's not that much of a change in
terms of logic, but more about what is exposed.

Let me know how you want me to proceed.

Thanks again,
Karim Mansour

On Thu, Apr 30, 2020 at 10:40 AM Aljoscha Krettek  wrote:

> Hi,
>
> I think it's good to contribute the changes to Flink directly since we
> already have the RMQ connector in the respository.
>
> I would propose something similar to the Kafka connector, which takes
> both the generic DeserializationSchema and a KafkaDeserializationSchema
> that is specific to Kafka and allows access to the ConsumerRecord and
> therefore all the Kafka features. What do you think about that?
>
> Best,
> Aljoscha
>
> On 30.04.20 10:26, Robert Metzger wrote:
> > Hey Karim,
> >
> > I'm sorry that you had such a bad experience contributing to Flink, even
> > though you are nicely following the rules.
> >
> > You mentioned that you've implemented the proposed change already. Could
> > you share a link to a branch here so that we can take a look? I can
> assess
> > the API changes easier if I see them :)
> >
> > Thanks a lot!
> >
> >
> > Best,
> > Robert
> >
> > On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz 
> > mailto:dwysakow...@apache.org>
> >
> > wrote:
> >
> >> Hi Karim,
> >>
> >> Sorry you did not have the best first time experience. You certainly did
> >> everything right which I definitely appreciate.
> >>
> >> The problem in that particular case, as I see it, is that RabbitMQ is
> >> not very actively maintained and therefore it is not easy too find a
> >> committer willing to take on this topic. The point of connectors not
> >> being properly maintained was raised a few times in the past on the ML.
> >> One of the ideas how to improve the situation there was to start a
> >> https://flink-packages.org/ page. The idea is to ask active users of
> >> certain connectors to maintain those connectors outside of the core
> >> project, while giving them a platform within the community where they
> >> can make their modules visible. That way it is possible to overcome the
> >> lack of capabilities within the core committers without loosing much on
> >> the visibility.
> >>
> >> I would kindly ask you to consider that path, if you are interested. You
> >> can of course also wait/reach out to more committers if you feel strong
> >> about contributing those changes back to the Flink repository itself.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 30/04/2020 07:29, seneg...@gmail.com wrote:
> >>> Hello,
> >>>
> >>> I am new to the mailing list and to contributing in Big opensource
> >> projects
> >>> in general and i don't know if i did something wrong or should be more
> >>> patient :)
> >>>
> >>> I put a topic for discussion as per the contribution guide "
> >>> https://flink.apache.org/contributing/how-to-contribute.html; almost a
> >> week
> >>> ago and since what i propose is not backward compatible it needs to be
> >>> discussed here before opening a ticket and moving forward.
> >>>
> >>> So my question is. Will someone pick the discussion up ? or at least
> >>> someone would say that this is not the way to go ? or should i assume
> 

Re: [DISCUSS]Refactor flink-jdbc connector structure

2020-04-30 Thread Jark Wu
Big +1 from my side.
The new structure and class names look nicer now.

Regarding the compatibility problem, I have looked into the public APIs in
flink-jdbc; there are 3 kinds of APIs now:
1) new introduced JdbcSink for DataStream users in 1.11
2) JDBCAppendTableSink, JDBCUpsertTableSink, JDBCTableSource are introduced
since 1.9
3) very ancient JDBCOutputFormat and JDBCInputFormat

For (1), as it's an unreleased API, I think it's safe to move it to the new
package. cc @Khachatryan Roman, who
contributed this.
For (2), because the TableSource and TableSink are not designed to be accessed
by users since 1.11, I think it's fine to move them.
For (3), I'm not sure how many users are still using these out-of-date
classes.
But I think it's fine to keep them for one more version, and drop them in
the next version.


Best,
Jark

On Thu, 30 Apr 2020 at 22:57, Flavio Pompermaier 
wrote:

> Very big +1 from me
>
> Best,
> Flavio
>
> On Thu, Apr 30, 2020 at 4:47 PM David Anderson 
> wrote:
>
> > I'm very happy to see the jdbc connector being normalized in this way. +1
> > from me.
> >
> > David
> >
> > On Thu, Apr 30, 2020 at 2:14 PM Timo Walther  wrote:
> >
> > > Hi Leonard,
> > >
> > > this sounds like a nice refactoring for consistency. +1 from my side.
> > >
> > > However, I'm not sure how much backwards compatibility is required.
> > > Maybe others can comment on this.
> > >
> > > Thanks,
> > > Timo
> > >
> > > On 30.04.20 14:09, Leonard Xu wrote:
> > > > Hi, dear community
> > > >
> > > > Recently, I’m thinking to refactor the flink-jdbc connector structure
> > > before release 1.11.
> > > > After the refactor, in the future,  we can easily introduce unified
> > > pluggable JDBC dialect for Table and DataStream, and we can have a
> better
> > > module organization and implementations.
> > > >
> > > > So, I propose following changes:
> > > > 1) Use `Jdbc` instead of `JDBC` in the new public API and interface
> > > name. The Datastream API `JdbcSink` which imported in this version has
> > > followed this standard.
> > > >
> > > > 2) Move all interface and classes from `org.apache.flink.java.io
> > .jdbc`(old
> > > package) to `org.apache.flink.connector.jdbc`(new package) to follow
> the
> > > base connector path in FLIP-27.
> > > > I think we can move JDBC TableSource, TableSink and factory from old
> > > package to new package because TableEnvironment#registerTableSource、
> > > TableEnvironment#registerTableSink  will be removed in 1.11 ans these
> > > classes are not exposed to users[1].
> > > > We can move Datastream API JdbcSink from old package to new package
> > > because it’s  introduced in this version.
> > > > We will still keep `JDBCInputFormat` and `JDBCOoutoutFormat` in old
> > > package and deprecate them.
> > > > Other classes/interfaces are internal used and we can move to new
> > > package without breaking compatibility.
> > > > 3) Rename `flink-jdbc` to `flink-connector-jdbc`. well, this is a
> > > compatibility broken change but in order to comply with other
> connectors
> > > and it’s real a connector rather than a flink-jdc-driver[2] we’d better
> > > decide do it ASAP.
> > > >
> > > >
> > > > What do you think? Any feedback is appreciate.
> > > >
> > > >
> > > > Best,
> > > > Leonard Xu
> > > >
> > > > [1]
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
> > > <
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
> > > >
> > > > [2]https://github.com/ververica/flink-jdbc-driver <
> > > https://github.com/ververica/flink-jdbc-driver>
> > > >
> > > >
> > >
> > >
> >
>


Re: [PROPOSAL] Google Season of Docs 2020.

2020-04-30 Thread Marta Paes Moreira
The application to Season of Docs 2020 is close to being finalized. I've
created a PR with the application announcement for the Flink blog [1] (as
required by Google OSS).

Thanks a lot to everyone who pitched in — and special thanks to Aljoscha
and Seth for volunteering as mentors!

I'll send an update to this thread once the results are out (May 11th).

[1] https://github.com/apache/flink-web/pull/332

On Mon, Apr 27, 2020 at 9:28 PM Seth Wiesman  wrote:

> Hi Marta,
>
> I think this is a great idea, I'd be happy to help mentor a table
> documentation project.
>
> Seth
>
> On Thu, Apr 23, 2020 at 8:38 AM Marta Paes Moreira 
> wrote:
>
> > Thanks for the feedback!
> >
> > So far, the projects on the table are:
> >
> >1. Improving the Table API/SQL documentation.
> >2. Improving the documentation about Deployments.
> >3. Restructuring and standardizing the documentation about Connectors.
> >4. Finishing the Chinese translation.
> >
> > I think 2. would require a lot of technical knowledge about Flink, which
> > might not be a good fit for GSoD (as discussed last year).
> >
> > As for mentors, we have:
> >
> >- Aljoscha (Table API/SQL)
> >- Till (Deployments)
> >- Stephan also said he'd be happy to participate as a mentor if
> needed.
> >
> > For the translation project, I'm pulling in the people involved in last
> > year's thread (Jark and Jincheng), as we would need two chinese-speaking
> > mentors.
> >
> > I'll follow up with a draft proposal early next week, once we reach a
> > consensus and have enough mentors (2 per project). Thanks again!
> >
> > Marta
> >
> >
> > On Fri, Apr 17, 2020 at 2:53 PM Till Rohrmann 
> > wrote:
> >
> > > Thanks for driving this effort Marta.
> > >
> > > I'd be up for mentoring improvements for the deployment section as
> > > described in FLIP-42.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Apr 17, 2020 at 11:20 AM Aljoscha Krettek  >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > first, excellent that you're driving this, Marta!
> > > >
> > > > By now I have made quite some progress on the FLIP-42 restructuring
> so
> > > > that is not a good effort for someone to join now. Plus there is also
> > > > [1], which is about incorporating the existing Flink Training
> material
> > > > into the concepts section of the Flink doc.
> > > >
> > > > What I think would be very good is working on the Table API/SQL
> > > > documentation [2]. We don't necessarily have to take the FLIP as a
> > basis
> > > > but we can, or we can start from a blank slate. I think the current
> > > > structure as well as the content is sub-optimal (not good, really).
> It
> > > > would be ideal to have someone get to now the system and then write
> > > > documentation for that part of Flink that has both good structure and
> > > > content and nicely guides new users.
> > > >
> > > > I would be very happy to mentor that effort.
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/rea9cbfffd9a1c0ca6f78f7e8c8497d81f32ed4a7b9e7a05d59f3b2e9%40%3Cdev.flink.apache.org%3E
> > > >
> > > > [2]
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685
> > > >
> > > > On 17.04.20 09:21, Robert Metzger wrote:
> > > > > Thanks a lot for volunteering to drive an application for the Flink
> > > > project!
> > > > >
> > > > > Last year, we discussed finishing the chinese translation as a
> > > potential
> > > > > project. I believe there's still a need for this.
> > > > > Since the work on the project starts pretty far in the future
> > > > (September),
> > > > > the translation project is a good fit as well (there's currently no
> > > major
> > > > > effort on the translation, rather a constant flow of PRs, but I
> don't
> > > > think
> > > > > that is enough to finish the translation).
> > > > >
> > > > >
> > > > > On Fri, Apr 17, 2020 at 9:15 AM Konstantin Knauf <
> kna...@apache.org>
> > > > wrote:
> > > > >
> > > > >> Hi Marta,
> > > > >>
> > > > >> Thanks for kicking off the discussion. Aljoscha has recently
> revived
> > > the
> > > > >> implementation of the FLIP-42 and has already moved things around
> > > quite
> > > > a
> > > > >> bit. [1]
> > > > >>
> > > > >> There are a lot of areas that can be improved of course, but a lot
> > of
> > > > them
> > > > >> require very deep knowledge about the system (e.g. the
> "Deployment"
> > or
> > > > >> "Concepts" section). One area that I could imagine working well in
> > > such
> > > > a
> > > > >> format is to work on the "Connectors" section. Aljoscha has
> already
> > > > moved
> > > > >> this to the top-level, but it besides that it has not been touched
> > yet
> > > > in
> > > > >> the course of FLIP-42. The documentation project could be around
> > > > >> restructuring, standardization and generally improving the
> > > > documentation of
> > > > >> our connectors for both Datastream as well as Table API/SQL.
> > > > >>
> > 

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-04-30 Thread Jark Wu
Hi Konstantin,

Thanks for the link to Java Faker. It's an interesting project and
could benefit a comprehensive datagen source.
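
For readers who haven't seen it, a tiny illustration of the kind of values Java Faker can produce (assumes the com.github.javafaker:javafaker dependency; the field choices are just examples):

import com.github.javafaker.Faker;

public class FakerSample {
    public static void main(String[] args) {
        Faker faker = new Faker();
        // A Faker-backed datagen source could fill columns with realistic-looking values.
        System.out.println(faker.name().fullName());
        System.out.println(faker.internet().emailAddress());
        System.out.println(faker.number().numberBetween(18, 99));
    }
}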

What would the discarding and printing sinks look like, in your view?
1) manually create a table with a `blackhole` or `print` connector, e.g.

CREATE TABLE my_sink (
  a INT,
  b STRING,
  c DOUBLE
) WITH (
  'connector' = 'print'
);
INSERT INTO my_sink SELECT a, b, c FROM my_source;

2) system built-in tables named `blackhole` and `print`, without manual
schema work, e.g.
INSERT INTO print SELECT a, b, c, d FROM my_source;

Best,
Jark



On Thu, 30 Apr 2020 at 21:19, Konstantin Knauf  wrote:

> Hi everyone,
>
> sorry for reviving this thread at this point in time. Generally, I think,
> this is a very valuable effort. Have we considered only providing a very
> basic data generator (+ discarding and printing sink tables) in Apache
> Flink and moving a more comprehensive data generating table source to an
> ecosystem project promoted on flink-packages.org. I think this has a lot
> of
> potential (e.g. in combination with Java Faker [1]), but it would probably
> be better served in a small separately maintained repository.
>
> Cheers,
>
> Konstantin
>
> [1] https://github.com/DiUS/java-faker
>
>
> On Tue, Mar 24, 2020 at 9:10 AM Jingsong Li 
> wrote:
>
> > Hi all,
> >
> > I created https://issues.apache.org/jira/browse/FLINK-16743 for
> follow-up
> > discussion. FYI.
> >
> > Best,
> > Jingsong Lee
> >
> > On Tue, Mar 24, 2020 at 2:20 PM Bowen Li  wrote:
> >
> > > I agree with Jingsong that sink schema inference and system tables can
> be
> > > considered later. I wouldn’t recommend to tackle them for the sake of
> > > simplifying user experience to the extreme. Providing the above handy
> > > source and sink implementations already offer users a ton of immediate
> > > value.
> > >
> > >
> > > On Mon, Mar 23, 2020 at 20:20 Jingsong Li 
> > wrote:
> > >
> > > > Hi Benchao,
> > > >
> > > > > do you think we need to add more columns with various types?
> > > >
> > > > I didn't list all types, but we should support primitive types,
> > varchar,
> > > > Decimal, Timestamp and etc...
> > > > This can be done continuously.
> > > >
> > > > Hi Benchao, Jark,
> > > > About console and blackhole, yes, they can have no schema, the schema
> > can
> > > > be inferred by upstream node.
> > > > - But now we don't have this mechanism to do these configurable sink
> > > > things.
> > > > - If we want to support, we need a single way to support these two
> > sinks.
> > > > - And uses can use "create table like" and others way to simplify
> DDL.
> > > >
> > > > And for providing system/registered tables (`console` and
> `blackhole`):
> > > > - I have no strong opinion on these system tables. In SQL, will be
> > > "insert
> > > > into blackhole select a /*int*/, b /*string*/ from tableA", "insert
> > into
> > > > blackhole select a /*double*/, b /*Map*/, c /*string*/ from tableB".
> It
> > > > seems that Blackhole is a universal thing, which makes me feel bad
> > > > intuitively.
> > > > - Can user override these tables? If can, we need ensure it can be
> > > > overwrite by catalog tables.
> > > >
> > > > So I think we can leave these system tables to future too.
> > > > What do you think?
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Mon, Mar 23, 2020 at 4:44 PM Jark Wu  wrote:
> > > >
> > > > > Hi Jingsong,
> > > > >
> > > > > Regarding (2) and (3), I was thinking to ignore manually DDL work,
> so
> > > > users
> > > > > can use them directly:
> > > > >
> > > > > # this will log results to `.out` files
> > > > > INSERT INTO console
> > > > > SELECT ...
> > > > >
> > > > > # this will drop all received records
> > > > > INSERT INTO blackhole
> > > > > SELECT ...
> > > > >
> > > > > Here `console` and `blackhole` are system sinks which is similar to
> > > > system
> > > > > functions.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > On Mon, 23 Mar 2020 at 16:33, Benchao Li 
> > wrote:
> > > > >
> > > > > > Hi Jingsong,
> > > > > >
> > > > > > Thanks for bring this up. Generally, it's a very good proposal.
> > > > > >
> > > > > > About data gen source, do you think we need to add more columns
> > with
> > > > > > various types?
> > > > > >
> > > > > > About print sink, do we need to specify the schema?
> > > > > >
> > > > > > Jingsong Li  于2020年3月23日周一 下午1:51写道:
> > > > > >
> > > > > > > Thanks Bowen, Jark and Dian for your feedback and suggestions.
> > > > > > >
> > > > > > > I reorganize with your suggestions, and try to expose DDLs:
> > > > > > >
> > > > > > > 1.datagen source:
> > > > > > > - easy startup/test for streaming job
> > > > > > > - performance testing
> > > > > > >
> > > > > > > DDL:
> > > > > > > CREATE TABLE user (
> > > > > > > id BIGINT,
> > > > > > > age INT,
> > > > > > > description STRING
> > > > > > > ) WITH (
> > > > > > > 'connector.type' = 'datagen',
> > > > > > > 'connector.rows-per-second'='100',
> > > > > > > 

Re: [DISCUSS]Refactor flink-jdbc connector structure

2020-04-30 Thread Flavio Pompermaier
Very big +1 from me

Best,
Flavio

On Thu, Apr 30, 2020 at 4:47 PM David Anderson 
wrote:

> I'm very happy to see the jdbc connector being normalized in this way. +1
> from me.
>
> David
>
> On Thu, Apr 30, 2020 at 2:14 PM Timo Walther  wrote:
>
> > Hi Leonard,
> >
> > this sounds like a nice refactoring for consistency. +1 from my side.
> >
> > However, I'm not sure how much backwards compatibility is required.
> > Maybe others can comment on this.
> >
> > Thanks,
> > Timo
> >
> > On 30.04.20 14:09, Leonard Xu wrote:
> > > Hi, dear community
> > >
> > > Recently, I’m thinking to refactor the flink-jdbc connector structure
> > before release 1.11.
> > > After the refactor, in the future,  we can easily introduce unified
> > pluggable JDBC dialect for Table and DataStream, and we can have a better
> > module organization and implementations.
> > >
> > > So, I propose following changes:
> > > 1) Use `Jdbc` instead of `JDBC` in the new public API and interface
> > name. The Datastream API `JdbcSink` which imported in this version has
> > followed this standard.
> > >
> > > 2) Move all interface and classes from `org.apache.flink.java.io
> .jdbc`(old
> > package) to `org.apache.flink.connector.jdbc`(new package) to follow the
> > base connector path in FLIP-27.
> > > I think we can move JDBC TableSource, TableSink and factory from old
> > package to new package because TableEnvironment#registerTableSource、
> > TableEnvironment#registerTableSink  will be removed in 1.11 ans these
> > classes are not exposed to users[1].
> > > We can move Datastream API JdbcSink from old package to new package
> > because it’s  introduced in this version.
> > > We will still keep `JDBCInputFormat` and `JDBCOoutoutFormat` in old
> > package and deprecate them.
> > > Other classes/interfaces are internal used and we can move to new
> > package without breaking compatibility.
> > > 3) Rename `flink-jdbc` to `flink-connector-jdbc`. well, this is a
> > compatibility broken change but in order to comply with other connectors
> > and it’s real a connector rather than a flink-jdc-driver[2] we’d better
> > decide do it ASAP.
> > >
> > >
> > > What do you think? Any feedback is appreciate.
> > >
> > >
> > > Best,
> > > Leonard Xu
> > >
> > > [1]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
> > <
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
> > >
> > > [2]https://github.com/ververica/flink-jdbc-driver <
> > https://github.com/ververica/flink-jdbc-driver>
> > >
> > >
> >
> >
>


Re: [DISCUSS]Refactor flink-jdbc connector structure

2020-04-30 Thread David Anderson
I'm very happy to see the jdbc connector being normalized in this way. +1
from me.

David

On Thu, Apr 30, 2020 at 2:14 PM Timo Walther  wrote:

> Hi Leonard,
>
> this sounds like a nice refactoring for consistency. +1 from my side.
>
> However, I'm not sure how much backwards compatibility is required.
> Maybe others can comment on this.
>
> Thanks,
> Timo
>
> On 30.04.20 14:09, Leonard Xu wrote:
> > Hi, dear community
> >
> > Recently, I’m thinking to refactor the flink-jdbc connector structure
> before release 1.11.
> > After the refactor, in the future,  we can easily introduce unified
> pluggable JDBC dialect for Table and DataStream, and we can have a better
> module organization and implementations.
> >
> > So, I propose following changes:
> > 1) Use `Jdbc` instead of `JDBC` in the new public API and interface
> name. The Datastream API `JdbcSink` which imported in this version has
> followed this standard.
> >
> > 2) Move all interface and classes from `org.apache.flink.java.io.jdbc`(old
> package) to `org.apache.flink.connector.jdbc`(new package) to follow the
> base connector path in FLIP-27.
> > I think we can move JDBC TableSource, TableSink and factory from old
> package to new package because TableEnvironment#registerTableSource、
> TableEnvironment#registerTableSink  will be removed in 1.11 ans these
> classes are not exposed to users[1].
> > We can move Datastream API JdbcSink from old package to new package
> because it’s  introduced in this version.
> > We will still keep `JDBCInputFormat` and `JDBCOoutoutFormat` in old
> package and deprecate them.
> > Other classes/interfaces are internal used and we can move to new
> package without breaking compatibility.
> > 3) Rename `flink-jdbc` to `flink-connector-jdbc`. well, this is a
> compatibility broken change but in order to comply with other connectors
> and it’s real a connector rather than a flink-jdc-driver[2] we’d better
> decide do it ASAP.
> >
> >
> > What do you think? Any feedback is appreciate.
> >
> >
> > Best,
> > Leonard Xu
> >
> > [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
> <
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
> >
> > [2]https://github.com/ververica/flink-jdbc-driver <
> https://github.com/ververica/flink-jdbc-driver>
> >
> >
>
>


"[VOTE] FLIP-108: edit the Public API"

2020-04-30 Thread Yangze Guo
Hi, there.

The "FLIP-108: Add GPU support in Flink"[1] is now working in
progress. However, we met problems regarding class loader and
dependency. For more details, you could look at the discussion[2]. The
discussion thread is now converged and the solution is changing the
RuntimeContext#getExternalResourceInfos, let it return
ExternalResourceInfo and adding methods to ExternalResourceInfo
interface.

Since the solution involves changes to the Public API, we'd like to
start a voting thread for it.

The proposed change is:

```
public interface RuntimeContext {
/**
 * Get the specific external resource information by the resourceName.
 */
Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);
}
```

```
public interface ExternalResourceInfo {
  String getProperty(String key);
  Collection<String> getKeys();
}
```
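
For illustration, here is a sketch of how a user function might consume the proposed API. The resource name "gpu" and the property key "index" are assumptions for the example (the actual keys come from the configured external resource driver), not part of the vote; the import of ExternalResourceInfo is omitted since its final package is part of the FLIP.

```
import java.util.Set;

import org.apache.flink.api.common.functions.RichMapFunction;

public class GpuMapper extends RichMapFunction<Long, Long> {

    @Override
    public Long map(Long value) {
        // Proposed API: look up the infos published for the "gpu" resource.
        Set<ExternalResourceInfo> gpuInfos =
                getRuntimeContext().getExternalResourceInfos("gpu");
        for (ExternalResourceInfo info : gpuInfos) {
            String index = info.getProperty("index");
            // ... pass the GPU index on to the user code ...
        }
        return value;
    }
}
```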

The vote will be open for at least 72 hours. Unless there is an objection,
I will try to close it by May 4, 2020 14:00 UTC if we have received
sufficient votes.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-108-Problems-regarding-the-class-loader-and-dependency-td40893.html

Best,
Yangze Guo


[GitHub] [flink-shaded] piyushnarang commented on pull request #85: [FLINK-16955] Bump Zookeeper 3.4.X to 3.4.14

2020-04-30 Thread GitBox


piyushnarang commented on pull request #85:
URL: https://github.com/apache/flink-shaded/pull/85#issuecomment-621874740


   Yeah, let me double check if things continue to work after the excludes.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-shaded] piyushnarang commented on a change in pull request #85: [FLINK-16955] Bump Zookeeper 3.4.X to 3.4.14

2020-04-30 Thread GitBox


piyushnarang commented on a change in pull request #85:
URL: https://github.com/apache/flink-shaded/pull/85#discussion_r418035841



##
File path: flink-shaded-zookeeper-parent/flink-shaded-zookeeper-34/pom.xml
##
@@ -128,4 +128,4 @@ under the License.
 
 
 
-

Review comment:
   yes, will do. Missed this when I put it up





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-17484) Enable type coercion

2020-04-30 Thread Timo Walther (Jira)
Timo Walther created FLINK-17484:


 Summary: Enable type coercion
 Key: FLINK-17484
 URL: https://issues.apache.org/jira/browse/FLINK-17484
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / API, Table SQL / Planner
Reporter: Timo Walther
Assignee: Timo Walther


Calcite supports better inference for untyped NULL literals by enabling type 
coercion. This is currently still disabled in Flink SQL. We should think about 
enabling it. Initial research around this topic has shown that this feature is 
still experimental. Maybe we should only support a custom type coercion. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-36 - Support Interactive Programming in Flink Table API

2020-04-30 Thread Kurt Young
+1 to what Timo has said.

One more comment about the relation of this FLIP and FLIP-84: in FLIP-84 we
start to deprecate all APIs which buffer table operations or plans.
You can think of APIs like `sqlUpdate`
and `insertInto` as some kind of buffer operation, and all buffered
operations will be executed by TableEnv.execute().

This causes ambiguous API behavior since we have other operations which will
be executed eagerly, like passing
a DDL statement to `sqlUpdate`.

From the example of this FLIP, I think it still follows the buffered kind of API
design, which I think needs more discussion.
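To make that concrete, here is a minimal sketch against the pre-FLIP-84 API
(Flink 1.10 method names); the connector properties and paths are placeholders,
chosen only so the example is self-contained:

```
// Sketch of the buffered-vs-eager behaviour described above (pre-FLIP-84 API).
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class BufferedVsEager {
    public static void main(String[] args) throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // DDL passed to sqlUpdate takes effect immediately (eager):
        tEnv.sqlUpdate("CREATE TABLE src (x INT) WITH ("
                + "'connector.type' = 'filesystem', 'connector.path' = '/tmp/src.csv',"
                + "'format.type' = 'csv', 'format.derive-schema' = 'true')");
        tEnv.sqlUpdate("CREATE TABLE snk (x INT) WITH ("
                + "'connector.type' = 'filesystem', 'connector.path' = '/tmp/snk.csv',"
                + "'format.type' = 'csv', 'format.derive-schema' = 'true')");

        // DML passed to sqlUpdate, and insertInto, are only buffered at this point:
        tEnv.sqlUpdate("INSERT INTO snk SELECT x FROM src");
        Table t = tEnv.sqlQuery("SELECT x FROM src");
        tEnv.insertInto("snk", t);

        // Nothing is submitted until execute() flushes the buffered operations:
        tEnv.execute("buffered-vs-eager");
    }
}
```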

Best,
Kurt


On Thu, Apr 30, 2020 at 6:57 PM Timo Walther  wrote:

> Hi Xuannan,
>
> sorry for not entering the discussion earlier. Could you please update
> the FLIP to how it would look after FLIP-84? I think your proposal makes
> sense to me and aligns well with the other efforts from an API
> perspective. However, here are some thoughts from my side:
>
> It would be nice to already think about how we can support this feature
> in a multi-statement SQL file. Would we use planner hints (as discussed
> in FLIP-113) or rather introduce some custom DCL on top of views?
>
> Actually, the semantics of a cached table are very similar to
> materialized views. Maybe we should think about unifying these concepts?
>
> CREATE MATERIALIZED VIEW x SELECT * FROM t;
>
> table.materialize();
>
> table.unmaterialize();
>
>  From a runtime perspective, even for streaming a user could define a
> caching service that is backed by a regular table source/table sink.
>
> Currently, people are busy with feature freeze of Flink 1.11. Maybe we
> could postpone the discussion after May 15th. I guess this FLIP is
> targeted for Flink 1.12 anyways.
>
> Regards,
> Timo
>
> On 30.04.20 09:00, Xuannan Su wrote:
> > Hi all,
> >
> > I'd like to start the vote for FLIP-36[1], which has been discussed in
> > thread[2].
> >
> > The vote will be open for 72h, until May 3, 2020, 07:00 AM UTC, unless
> > there's an objection.
> >
> > Best,
> > Xuannan
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> > [2]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-36-Support-Interactive-Programming-in-Flink-Table-API-td40592.html
> >
>
>


[jira] [Created] (FLINK-17483) Missing NOTICE file in flink-sql-connector-elasticsearch7 to reflect bundled dependencies

2020-04-30 Thread Yu Li (Jira)
Yu Li created FLINK-17483:
-

 Summary: Missing NOTICE file in flink-sql-connector-elasticsearch7 
to reflect bundled dependencies
 Key: FLINK-17483
 URL: https://issues.apache.org/jira/browse/FLINK-17483
 Project: Flink
  Issue Type: Bug
  Components: Connectors / ElasticSearch
Affects Versions: 1.10.0, 1.11.0
Reporter: Yu Li
 Fix For: 1.10.1, 1.11.0


This issue was found during the 1.10.1 RC1 check by Robert; for more details please 
refer to the [ML discussion 
thread|http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-10-1-release-candidate-1-tp40724p40894.html].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS][FLIP-108] Problems regarding the class loader and dependency

2020-04-30 Thread Yangze Guo
Thanks for the feedback, @Aljoscha and @Till!

Glad to see that we reached a consensus on the third proposal.

Regarding the details of the `ExternalResourceInfo`, I think Till's
proposal might be good enough. It seems these two methods already
fulfill our requirements, and using "Properties" itself might limit the
flexibility.

Best,
Yangze Guo

On Thu, Apr 30, 2020 at 7:41 PM Aljoscha Krettek  wrote:
>
> I agree with Till and Xintong, if the ExternalResourceInfo is only a
> holder of properties that doesn't have any subclasses it can just become
> the "properties" itself.
>
> Aljoscha
>
> On 30.04.20 12:49, Till Rohrmann wrote:
> > Thanks for the clarification.
> >
> > I think you are right that the typed approach does not work with the plugin
> > mechanism because even if we had the specific ExternalResourceInfo subtype
> > available one could not cast it into this type because the actual instance
> > has been loaded by a different class loader.
> >
> > I also think that the plugin approach is indeed best in order to avoid
> > dependency conflicts. Hence, I believe that the third proposal is a good
> > solution. I agree with Xintong, that we should not return a Properties
> > instance though. Maybe
> >
> > public interface ExternalResourceInfo {
> >String getProperty(String key);
> >Collection<String> getKeys();
> > }
> >
> > would be good enough.
> >
> >
> > Cheers,
> > Till
> >
> > On Wed, Apr 29, 2020 at 2:17 PM Yang Wang  wrote:
> >
> >> I am also in favor of the option3. Since the Flink FileSystem has the very
> >> similar implementation via plugin mechanism. It has a map "FS_FACTORIES"
> >> to store the plugin-loaded specific FileSystem(e.g. S3, AzureFS, OSS,
> >> etc.).
> >> And provide some common interfaces.
> >>
> >>
> >> Best,
> >> Yang
> >>
> >> Yangze Guo  wrote on Wed, Apr 29, 2020 at 3:54 PM:
> >>
> >>> For your convenience, I modified the Tokenizer in "WordCount"[1] case
> >>> to show how UDF leverages GPU info and how we found that problem.
> >>>
> >>> [1]
> >>>
> >> https://github.com/KarmaGYZ/flink/blob/7c5596e43f6d14c65063ab0917f3c0d4bc0211ed/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
> >>>
> >>> Best,
> >>> Yangze Guo
> >>>
> >>> On Wed, Apr 29, 2020 at 3:25 PM Xintong Song 
> >>> wrote:
> 
> >
> > Will she ask for some properties and then pass them to another
> >>> component?
> 
>  Yes. Take GPU as an example, the property needed is "GPU index", and
> >> the
>  index will be used to tell the OS which GPU should be used for the
>  computing workload.
> 
> 
> > Where does this component come from?
> 
>  The component could be either the UDF/operator itself, or some AI
> >>> libraries
>  used by the operator. For 1.11, we do not have plan for introducing new
> >>> GPU
>  aware operators in Flink. So all the usages of the GPU resources should
>  come from UDF. Please correct me if I am wrong, @Becket.
> 
>  Thank you~
> 
>  Xintong Song
> 
> 
> 
>  On Wed, Apr 29, 2020 at 3:14 PM Till Rohrmann 
> >>> wrote:
> 
> > Thanks for bringing this up Yangze and Xintong. I see the problem.
> >>> Help me
> > to understand how the ExternalResourceInfo is intended to be used by
> >>> the
> > user. Will she ask for some properties and then pass them to another
> > component? Where does this component come from?
> >
> > Cheers,
> > Till
> >
> > On Wed, Apr 29, 2020 at 9:05 AM Xintong Song 
> > wrote:
> >
> >> Thanks for kicking off this discussion, Yangze.
> >>
> >> First, let me try to explain a bit more about this problem. Since
> >> we
> >> decided to make the `ExternalResourceDriver` a plugin whose
> > implementation
> >> could be provided by user, we think it makes sense to leverage
> >>> Flink’s
> >> plugin mechanism and load the drivers in separated class loaders to
> >>> avoid
> >> potential risk of dependency conflicts. However, that means
> >> `RuntimeContext` and user codes do not naturally have access to
> >>> classes
> >> defined in the plugin. In the current design,
> >> `RuntimeContext#getExternalResourceInfos` takes the concrete
> >> `ExternalResourceInfo` implementation class as an argument. This
> >> will
> > cause
> >> problem when user codes try to pass in the argument, and when
> >> `RuntimeContext` tries to do the type check/cast.
> >>
> >>
> >> To my understanding, the root problem is probably that we should
> >> not
> > depend
> >> on a specific implementation of the `ExternalResourceInfo`
> >> interface
> >>> from
> >> outside the plugin (user codes & runtime context). To that end,
> > regardless
> >> the detailed interface design, I'm in favor of the direction of the
> >>> 3rd
> >> approach. I think it makes sense to add some general
> >>> information/property
> >> accessing interfaces in 

Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-04-30 Thread Konstantin Knauf
Hi everyone,

sorry for reviving this thread at this point in time. Generally, I think,
this is a very valuable effort. Have we considered only providing a very
basic data generator (+ discarding and printing sink tables) in Apache
Flink and moving a more comprehensive data generating table source to an
ecosystem project promoted on flink-packages.org? I think this has a lot of
potential (e.g. in combination with Java Faker [1]), but it would probably
be better served in a small separately maintained repository.

Cheers,

Konstantin

[1] https://github.com/DiUS/java-faker
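For what it's worth, here is a minimal sketch of the kind of generator such an
ecosystem project could ship, combining a plain SourceFunction with Java Faker;
the class name, fields and rate limiting below are purely illustrative:

```
// Illustrative only: a tiny DataStream source that fabricates user records with Java Faker.
// Assumes the com.github.javafaker:javafaker dependency is on the classpath.
import com.github.javafaker.Faker;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class FakeUserSource implements SourceFunction<Tuple2<String, Integer>> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Tuple2<String, Integer>> ctx) throws Exception {
        Faker faker = new Faker();
        while (running) {
            // fullName() and numberBetween() come from java-faker
            ctx.collect(Tuple2.of(faker.name().fullName(), faker.number().numberBetween(18, 99)));
            Thread.sleep(10); // crude rate limiting
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```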


On Tue, Mar 24, 2020 at 9:10 AM Jingsong Li  wrote:

> Hi all,
>
> I created https://issues.apache.org/jira/browse/FLINK-16743 for follow-up
> discussion. FYI.
>
> Best,
> Jingsong Lee
>
> On Tue, Mar 24, 2020 at 2:20 PM Bowen Li  wrote:
>
> > I agree with Jingsong that sink schema inference and system tables can be
> > considered later. I wouldn’t recommend to tackle them for the sake of
> > simplifying user experience to the extreme. Providing the above handy
> > source and sink implementations already offer users a ton of immediate
> > value.
> >
> >
> > On Mon, Mar 23, 2020 at 20:20 Jingsong Li 
> wrote:
> >
> > > Hi Benchao,
> > >
> > > > do you think we need to add more columns with various types?
> > >
> > > I didn't list all types, but we should support primitive types,
> varchar,
> > > Decimal, Timestamp and etc...
> > > This can be done continuously.
> > >
> > > Hi Benchao, Jark,
> > > About console and blackhole, yes, they can have no schema, the schema
> can
> > > be inferred by upstream node.
> > > - But now we don't have this mechanism to do these configurable sink
> > > things.
> > > - If we want to support them, we need a single way to support these two
> sinks.
> > > - And users can use "create table like" and other ways to simplify DDL.
> > >
> > > And for providing system/registered tables (`console` and `blackhole`):
> > > - I have no strong opinion on these system tables. In SQL, will be
> > "insert
> > > into blackhole select a /*int*/, b /*string*/ from tableA", "insert
> into
> > > blackhole select a /*double*/, b /*Map*/, c /*string*/ from tableB". It
> > > seems that Blackhole is a universal thing, which makes me feel bad
> > > intuitively.
> > > - Can users override these tables? If so, we need to ensure they can be
> > > overwritten by catalog tables.
> > >
> > > So I think we can leave these system tables to future too.
> > > What do you think?
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Mon, Mar 23, 2020 at 4:44 PM Jark Wu  wrote:
> > >
> > > > Hi Jingsong,
> > > >
> > > > Regarding (2) and (3), I was thinking to skip the manual DDL work, so
> > > users
> > > > can use them directly:
> > > >
> > > > # this will log results to `.out` files
> > > > INSERT INTO console
> > > > SELECT ...
> > > >
> > > > # this will drop all received records
> > > > INSERT INTO blackhole
> > > > SELECT ...
> > > >
> > > > Here `console` and `blackhole` are system sinks which is similar to
> > > system
> > > > functions.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Mon, 23 Mar 2020 at 16:33, Benchao Li 
> wrote:
> > > >
> > > > > Hi Jingsong,
> > > > >
> > > > > Thanks for bringing this up. Generally, it's a very good proposal.
> > > > >
> > > > > About data gen source, do you think we need to add more columns
> with
> > > > > various types?
> > > > >
> > > > > About print sink, do we need to specify the schema?
> > > > >
> > > > > Jingsong Li  wrote on Mon, Mar 23, 2020 at 1:51 PM:
> > > > >
> > > > > > Thanks Bowen, Jark and Dian for your feedback and suggestions.
> > > > > >
> > > > > > I reorganize with your suggestions, and try to expose DDLs:
> > > > > >
> > > > > > 1.datagen source:
> > > > > > - easy startup/test for streaming job
> > > > > > - performance testing
> > > > > >
> > > > > > DDL:
> > > > > > CREATE TABLE user (
> > > > > > id BIGINT,
> > > > > > age INT,
> > > > > > description STRING
> > > > > > ) WITH (
> > > > > > 'connector.type' = 'datagen',
> > > > > > 'connector.rows-per-second'='100',
> > > > > > 'connector.total-records'='100',
> > > > > >
> > > > > > 'schema.id.generator' = 'sequence',
> > > > > > 'schema.id.generator.start' = '1',
> > > > > >
> > > > > > 'schema.age.generator' = 'random',
> > > > > > 'schema.age.generator.min' = '0',
> > > > > > 'schema.age.generator.max' = '100',
> > > > > >
> > > > > > 'schema.description.generator' = 'random',
> > > > > > 'schema.description.generator.length' = '100'
> > > > > > )
> > > > > >
> > > > > > Default is random generator.
> > > > > > Hi Jark, I don't want to bring complicated regularities, because
> it
> > > can
> > > > > be
> > > > > > done through computed columns. And it is hard to define
> > > > > > standard regularities, I think we can leave it to the future.
> > > > > >
> > > > > > 2.print sink:
> > > > > > - easy test for streaming job
> > > > > > - be very useful in production debugging
> > > > > >
> 

[jira] [Created] (FLINK-17482) KafkaITCase.testMultipleSourcesOnePartition unstable

2020-04-30 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17482:
--

 Summary: KafkaITCase.testMultipleSourcesOnePartition unstable
 Key: FLINK-17482
 URL: https://issues.apache.org/jira/browse/FLINK-17482
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Tests
Affects Versions: 1.11.0
Reporter: Robert Metzger


CI run: 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=454=logs=c5f0071e-1851-543e-9a45-9ac140befc32=684b1416-4c17-504e-d5ab-97ee44e08a20

{code}
07:29:40,472 [main] INFO  
org.apache.flink.streaming.connectors.kafka.KafkaTestBase[] - 
-
[ERROR] Tests run: 22, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
152.018 s <<< FAILURE! - in 
org.apache.flink.streaming.connectors.kafka.KafkaITCase
[ERROR] 
testMultipleSourcesOnePartition(org.apache.flink.streaming.connectors.kafka.KafkaITCase)
  Time elapsed: 4.257 s  <<< FAILURE!
java.lang.AssertionError: Test failed: Job execution failed.
at org.junit.Assert.fail(Assert.java:88)
at org.apache.flink.test.util.TestUtils.tryExecute(TestUtils.java:45)
at 
org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runMultipleSourcesOnePartitionExactlyOnceTest(KafkaConsumerTestBase.java:963)
at 
org.apache.flink.streaming.connectors.kafka.KafkaITCase.testMultipleSourcesOnePartition(KafkaITCase.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)

{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17481) Cannot set LocalDateTime column as rowtime when converting DataStream to Table

2020-04-30 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-17481:
--

 Summary: Cannot set LocalDateTime column as rowtime when 
converting DataStream to Table
 Key: FLINK-17481
 URL: https://issues.apache.org/jira/browse/FLINK-17481
 Project: Flink
  Issue Type: Sub-task
  Components: Table SQL / Planner
Reporter: Gyula Fora


I am trying to convert an embedded LocalDateTime timestamp into a rowtime 
column while converting from DataStream to table.
{code:java}
DataStream<Tuple1<LocalDateTime>> in = 
env.fromElements(Tuple1.of(LocalDateTime.now()))
 .returns(new 
TupleTypeInfo<>(LocalTimeTypeInfo.getInfoFor(LocalDateTime.class)));

tableEnv.sqlQuery("select * FROM " + tableEnv.fromDataStream(in, 
"f0.rowtime"));{code}

Unfortunately this leads to the following error:
{noformat}
org.apache.flink.table.api.ValidationException: The rowtime attribute can only 
replace a field with a valid time type, such as Timestamp or Long. But was: 
LocalDateTime{noformat}
It seems that only java.sql.Timestamp classes are supported for rowtime 
conversion now.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS]Refactor flink-jdbc connector structure

2020-04-30 Thread Timo Walther

Hi Leonard,

this sounds like a nice refactoring for consistency. +1 from my side.

However, I'm not sure how much backwards compatibility is required. 
Maybe others can comment on this.


Thanks,
Timo

On 30.04.20 14:09, Leonard Xu wrote:

Hi, dear community

Recently, I’m thinking to refactor the flink-jdbc connector structure before 
release 1.11.
After the refactor, in the future,  we can easily introduce unified  pluggable 
JDBC dialect for Table and DataStream, and we can have a better module 
organization and implementations.

So, I propose following changes:
1) Use `Jdbc` instead of `JDBC` in the new public API and interface name. The 
Datastream API `JdbcSink`, which was introduced in this version, has followed this 
standard.

2) Move all interfaces and classes from `org.apache.flink.api.java.io.jdbc` (old 
package) to `org.apache.flink.connector.jdbc` (new package) to follow the base 
connector path in FLIP-27.
I think we can move JDBC TableSource, TableSink and factory from old package to 
new package because TableEnvironment#registerTableSource and 
TableEnvironment#registerTableSink will be removed in 1.11 and these classes 
are not exposed to users[1].
We can move Datastream API JdbcSink from old package to new package because 
it’s  introduced in this version.
We will still keep `JDBCInputFormat` and `JDBCOutputFormat` in old package and 
deprecate them.
Other classes/interfaces are internally used and we can move them to new package 
without breaking compatibility.
3) Rename `flink-jdbc` to `flink-connector-jdbc`. Well, this is a 
compatibility-breaking change, but in order to comply with other connectors, and because 
it’s really a connector rather than a flink-jdbc-driver[2], we’d better decide to do it ASAP.


What do you think? Any feedback is appreciated.


Best,
Leonard Xu

[1]http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
 

[2]https://github.com/ververica/flink-jdbc-driver 







[DISCUSS]Refactor flink-jdbc connector structure

2020-04-30 Thread Leonard Xu
Hi, dear community

Recently, I’m thinking to refactor the flink-jdbc connector structure before 
release 1.11. 
After the refactor, in the future,  we can easily introduce unified  pluggable 
JDBC dialect for Table and DataStream, and we can have a better module 
organization and implementations.

So, I propose following changes: 
1) Use `Jdbc` instead of `JDBC` in the new public API and interface name. The 
Datastream API `JdbcSink`, which was introduced in this version, has followed this 
standard. 

2) Move all interfaces and classes from `org.apache.flink.api.java.io.jdbc` (old 
package) to `org.apache.flink.connector.jdbc` (new package) to follow the base 
connector path in FLIP-27.
I think we can move JDBC TableSource, TableSink and factory from old package to 
new package because TableEnvironment#registerTableSource and 
TableEnvironment#registerTableSink will be removed in 1.11 and these classes 
are not exposed to users[1].
We can move Datastream API JdbcSink from old package to new package because 
it’s  introduced in this version.
We will still keep `JDBCInputFormat` and `JDBCOutputFormat` in old package and 
deprecate them.
Other classes/interfaces are internally used and we can move them to new package 
without breaking compatibility.
3) Rename `flink-jdbc` to `flink-connector-jdbc`. Well, this is a 
compatibility-breaking change, but in order to comply with other connectors, and because 
it’s really a connector rather than a flink-jdbc-driver[2], we’d better decide to do it ASAP.


What do you think? Any feedback is appreciated.


Best,
Leonard Xu

[1]http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-registration-of-TableSource-TableSink-in-Table-Env-and-ConnectTableDescriptor-td37270.html
 

[2]https://github.com/ververica/flink-jdbc-driver 




[GitHub] [flink-web] morsapaes opened a new pull request #332: [blog] Flink's application to Google Season of Docs.

2020-04-30 Thread GitBox


morsapaes opened a new pull request #332:
URL: https://github.com/apache/flink-web/pull/332


   Adding a blogpost to announce Flink's application to [Google Season of 
Docs](https://developers.google.com/season-of-docs).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] klion26 commented on a change in pull request #245: [FLINK-13678] Translate "Code Style - Preamble" page into Chinese

2020-04-30 Thread GitBox


klion26 commented on a change in pull request #245:
URL: https://github.com/apache/flink-web/pull/245#discussion_r417948740



##
File path: contributing/code-style-and-quality-preamble.zh.md
##
@@ -1,25 +1,25 @@
 ---
-title:  "Apache Flink Code Style and Quality Guide — Preamble"
+title:  "Apache Flink 代码样式与质量指南 — 序言"
 ---
 
 {% include code-style-navbar.zh.md %}
 
 
-This is an attempt to capture the code and quality standard that we want to 
maintain.
+本文旨在确立我们要维护的代码样式与质量标准。
 
-A code contribution (or any piece of code) can be evaluated in various ways: 
One set of properties is whether the code is correct and efficient. This 
requires solving the _logical or algorithmic problem_ correctly and well.
+评估代码贡献(或任何代码片段)有多种方式:一组指标是代码是否正确和高效。这需要正确地解决逻辑或算法问题。
 
-Another set of properties is whether the code follows an intuitive design and 
architecture, whether it is well structured with right separation of concerns, 
and whether the code is easily understandable and makes its assumptions 
explicit. That set of properties requires solving the _software engineering 
problem_ well. A good solution implies that the code is easily testable, 
maintainable also by other people than the original authors (because it is 
harder to accidentally break), and efficient to evolve.
+另一组指标是代码的设计和架构是否直观、结构是否良好、关注点是否正确、代码是否易于理解以及假设是否明确。这需要很好地解决软件工程问题。好的解决方案意味着代码容易测试,可以由原作者之外的其他人维护(代码不容易被意外破坏),并且可持续优化。
 
-While the first set of properties has rather objective approval criteria, the 
second set of properties is much harder to assess, but is of high importance 
for an open source project like Apache Flink. To make the code base inviting to 
many contributors, to make contributions easy to understand for developers that 
did not write the original code, and to make the code robust in the face of 
many contributions, well engineered code is crucial.[^1] For well engineered 
code, it is easier to keep it correct and fast over time.
+第一组指标具有比较客观的评价标准,第二组指标较难于评估,然而对于 Apache Flink 
这样的开源项目,第二组指标更加重要。为了能够邀请更多的贡献者,为了使非原始开发人员容易上手参与贡献,为了使大量贡献者协作开发的代码保持健壮,对代码进行精心地设计至关重要。[^1]
 随着时间的推移,精心设计的代码更容易保持正确和高效。
 
 
-This is of course not a full guide on how to write well engineered code. There 
is a world of big books that try to capture that. This guide is meant as a 
checklist of best practices, patterns, anti-patterns, and common mistakes that 
we observed in the context of developing Flink.
+本文当然不是代码设计的完全指南。有海量的书籍研究和讨论相关课题。本指南旨在作为一份清单,列举出我们在开发 Flink 
过程中所观察到的最佳实践、模式、反模式和常见错误。
 
-A big part of high-quality open source contributions is about helping the 
reviewer to understand the contribution and double-check the implications, so 
an important part of this guide is about how to structure a pull request for 
review.
+高质量开源贡献的很大一部分是帮助审阅者理解贡献的内容进而对内容进行细致地检查,因此本指南的一个重要部分是如何构建便于代码审查的拉取请求。

Review comment:
   Hmm, I think leaving it untranslated is also an option; it reads more like a proper noun.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [DISCUSS][FLIP-108] Problems regarding the class loader and dependency

2020-04-30 Thread Aljoscha Krettek
I agree with Till and Xintong, if the ExternalResourceInfo is only a 
holder of properties that doesn't have any subclasses it can just become 
the "properties" itself.


Aljoscha

On 30.04.20 12:49, Till Rohrmann wrote:

Thanks for the clarification.

I think you are right that the typed approach does not work with the plugin
mechanism because even if we had the specific ExternalResourceInfo subtype
available one could not cast it into this type because the actual instance
has been loaded by a different class loader.

I also think that the plugin approach is indeed best in order to avoid
dependency conflicts. Hence, I believe that the third proposal is a good
solution. I agree with Xintong, that we should not return a Properties
instance though. Maybe

public interface ExternalResourceInfo {
   String getProperty(String key);
   Collection<String> getKeys();
}

would be good enough.


Cheers,
Till

On Wed, Apr 29, 2020 at 2:17 PM Yang Wang  wrote:


I am also in favor of the option3. Since the Flink FileSystem has the very
similar implementation via plugin mechanism. It has a map "FS_FACTORIES"
to store the plugin-loaded specific FileSystem(e.g. S3, AzureFS, OSS,
etc.).
And provide some common interfaces.


Best,
Yang

Yangze Guo  wrote on Wed, Apr 29, 2020 at 3:54 PM:


For your convenience, I modified the Tokenizer in "WordCount"[1] case
to show how UDF leverages GPU info and how we found that problem.

[1]


https://github.com/KarmaGYZ/flink/blob/7c5596e43f6d14c65063ab0917f3c0d4bc0211ed/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java


Best,
Yangze Guo

On Wed, Apr 29, 2020 at 3:25 PM Xintong Song 
wrote:




Will she ask for some properties and then pass them to another

component?


Yes. Take GPU as an example, the property needed is "GPU index", and

the

index will be used to tell the OS which GPU should be used for the
computing workload.



Where does this component come from?


The component could be either the UDF/operator itself, or some AI

libraries

used by the operator. For 1.11, we do not have plan for introducing new

GPU

aware operators in Flink. So all the usages of the GPU resources should
come from UDF. Please correct me if I am wrong, @Becket.

Thank you~

Xintong Song



On Wed, Apr 29, 2020 at 3:14 PM Till Rohrmann 

wrote:



Thanks for bringing this up Yangze and Xintong. I see the problem.

Help me

to understand how the ExternalResourceInfo is intended to be used by

the

user. Will she ask for some properties and then pass them to another
component? Where does this component come from?

Cheers,
Till

On Wed, Apr 29, 2020 at 9:05 AM Xintong Song 
wrote:


Thanks for kicking off this discussion, Yangze.

First, let me try to explain a bit more about this problem. Since

we

decided to make the `ExternalResourceDriver` a plugin whose

implementation

could be provided by user, we think it makes sense to leverage

Flink’s

plugin mechanism and load the drivers in separated class loaders to

avoid

potential risk of dependency conflicts. However, that means
`RuntimeContext` and user codes do not naturally have access to

classes

defined in the plugin. In the current design,
`RuntimeContext#getExternalResourceInfos` takes the concrete
`ExternalResourceInfo` implementation class as an argument. This

will

cause

problem when user codes try to pass in the argument, and when
`RuntimeContext` tries to do the type check/cast.


To my understanding, the root problem is probably that we should

not

depend

on a specific implementation of the `ExternalResourceInfo`

interface

from

outside the plugin (user codes & runtime context). To that end,

regardless

the detailed interface design, I'm in favor of the direction of the

3rd

approach. I think it makes sense to add some general

information/property

accessing interfaces in `ExternalResourceInfo` (e.g., a key-value

property

map), so that in most cases users do not need to cast the
`ExternalResourceInfo` into concrete subclasses.


Regarding the detailed interface design, I'm not sure about using
`Properties`. I think the information contained in a

`ExternalResourceInfo`

can be considered as a unmodifiable map. So maybe something like

the

following?


public interface ExternalResourceInfo {

 String getProperty(String key);
 Map<String, String> getProperties();
}



WDYT?


Thank you~

Xintong Song



On Wed, Apr 29, 2020 at 2:40 PM Yangze Guo 

wrote:



Hi, there:

The "FLIP-108: Add GPU support in Flink"[1] is now working in
progress. However, we met a problem with
"RuntimeContext#getExternalResourceInfos" if we want to leverage

the

Plugin[2] mechanism in Flink.
The interface is:
The problem is now:
public interface RuntimeContext {
 /**
  * Get the specific external resource information by the

resourceName.

  */
  <T extends ExternalResourceInfo> Set<T>
getExternalResourceInfos(String resourceName, Class<T>
externalResourceType);
}
The problem is that the mainClassLoader does not recognize the
subclasses 

[jira] [Created] (FLINK-17480) Support PyFlink on native Kubernetes setup

2020-04-30 Thread Canbin Zheng (Jira)
Canbin Zheng created FLINK-17480:


 Summary: Support PyFlink on native Kubernetes setup
 Key: FLINK-17480
 URL: https://issues.apache.org/jira/browse/FLINK-17480
 Project: Flink
  Issue Type: New Feature
  Components: Deployment / Kubernetes
Reporter: Canbin Zheng


This is the umbrella issue for all PyFlink related tasks with relation to Flink 
on Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] FLIP-36 - Support Interactive Programming in Flink Table API

2020-04-30 Thread Timo Walther

Hi Xuannan,

sorry for not entering the discussion earlier. Could you please update 
the FLIP to how it would look after FLIP-84? I think your proposal makes 
sense to me and aligns well with the other efforts from an API 
perspective. However, here are some thoughts from my side:


It would be nice to already think about how we can support this feature 
in a multi-statement SQL file. Would we use planner hints (as discussed 
in FLIP-113) or rather introduce some custom DCL on top of views?


Actually, the semantics of a cached table are very similar to 
materialized views. Maybe we should think about unifying these concepts?


CREATE MATERIALIZED VIEW x SELECT * FROM t;

table.materialize();

table.unmaterialize();

 From a runtime perspective, even for streaming a user could define a 
caching service that is backed by a regular table source/table sink.


Currently, people are busy with feature freeze of Flink 1.11. Maybe we 
could postpone the discussion after May 15th. I guess this FLIP is 
targeted for Flink 1.12 anyways.


Regards,
Timo

On 30.04.20 09:00, Xuannan Su wrote:

Hi all,

I'd like to start the vote for FLIP-36[1], which has been discussed in
thread[2].

The vote will be open for 72h, until May 3, 2020, 07:00 AM UTC, unless
there's an objection.

Best,
Xuannan

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-36-Support-Interactive-Programming-in-Flink-Table-API-td40592.html





[jira] [Created] (FLINK-17479) Occasional checkpoint failure due to null pointer exception in Flink version 1.10

2020-04-30 Thread nobleyd (Jira)
nobleyd created FLINK-17479:
---

 Summary: Occasional checkpoint failure due to null pointer 
exception in Flink version 1.10
 Key: FLINK-17479
 URL: https://issues.apache.org/jira/browse/FLINK-17479
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.10.0
 Environment: Flink1.10.0

jdk1.8.0_60
Reporter: nobleyd
 Attachments: image-2020-04-30-18-44-21-630.png, 
image-2020-04-30-18-55-53-779.png

I upgraded the standalone cluster (3 machines) from Flink 1.9 to the latest Flink 
1.10.0. My jobs ran normally on Flink 1.9 for about half a year, while now I get 
some jobs failing due to a null pointer exception when checkpointing in Flink 1.10.0.

Below is the exception log:

!image-2020-04-30-18-55-53-779.png!

I have checked StreamTask (line 882), which is shown below. I think the only case 
that can lead to a null pointer exception is that checkpointMetaData is null.

!image-2020-04-30-18-44-21-630.png!

I do not know why; is there anyone who can help me? The problem only occurs in 
Flink 1.10.0 for now; it works well in Flink 1.9. I also give some conf info below 
(some values differ from the defaults), in case it is caused by a configuration 
mistake.

some conf of my flink1.10.0:

 
{code:java}
taskmanager.memory.flink.size: 71680m
taskmanager.memory.framework.heap.size: 512m
taskmanager.memory.framework.off-heap.size: 512m
taskmanager.memory.task.off-heap.size: 17920m
taskmanager.memory.managed.size: 512m
taskmanager.memory.jvm-metaspace.size: 512m

taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 1024mb
taskmanager.memory.network.max: 1536mb
taskmanager.memory.segment-size: 128kb

rest.port: 8682
historyserver.web.port: 8782
high-availability.jobmanager.port: 13141,13142,13143,13144
blob.server.port: 13146,13147,13148,13149
taskmanager.rpc.port: 13151,13152,13153,13154
taskmanager.data.port: 13156
metrics.internal.query-service.port: 13161,13162,13163,13164,13166,13167,13168,13169
env.java.home: /usr/java/jdk1.8.0_60/bin/java
env.pid.dir: /home/work/flink-1.10.0
{code}
 

Hope someone can help me solve it.

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS][FLIP-108] Problems regarding the class loader and dependency

2020-04-30 Thread Till Rohrmann
Thanks for the clarification.

I think you are right that the typed approach does not work with the plugin
mechanism because even if we had the specific ExternalResourceInfo subtype
available one could not cast it into this type because the actual instance
has been loaded by a different class loader.

I also think that the plugin approach is indeed best in order to avoid
dependency conflicts. Hence, I believe that the third proposal is a good
solution. I agree with Xintong, that we should not return a Properties
instance though. Maybe

public interface ExternalResourceInfo {
  String getProperty(String key);
  Collection<String> getKeys();
}

would be good enough.


Cheers,
Till

On Wed, Apr 29, 2020 at 2:17 PM Yang Wang  wrote:

> I am also in favor of the option3. Since the Flink FileSystem has the very
> similar implementation via plugin mechanism. It has a map "FS_FACTORIES"
> to store the plugin-loaded specific FileSystem(e.g. S3, AzureFS, OSS,
> etc.).
> And provide some common interfaces.
>
>
> Best,
> Yang
>
> Yangze Guo  wrote on Wed, Apr 29, 2020 at 3:54 PM:
>
> > For your convenience, I modified the Tokenizer in "WordCount"[1] case
> > to show how UDF leverages GPU info and how we found that problem.
> >
> > [1]
> >
> https://github.com/KarmaGYZ/flink/blob/7c5596e43f6d14c65063ab0917f3c0d4bc0211ed/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java
> >
> > Best,
> > Yangze Guo
> >
> > On Wed, Apr 29, 2020 at 3:25 PM Xintong Song 
> > wrote:
> > >
> > > >
> > > > Will she ask for some properties and then pass them to another
> > component?
> > >
> > > Yes. Take GPU as an example, the property needed is "GPU index", and
> the
> > > index will be used to tell the OS which GPU should be used for the
> > > computing workload.
> > >
> > >
> > > > Where does this component come from?
> > >
> > > The component could be either the UDF/operator itself, or some AI
> > libraries
> > > used by the operator. For 1.11, we do not have plan for introducing new
> > GPU
> > > aware operators in Flink. So all the usages of the GPU resources should
> > > come from UDF. Please correct me if I am wrong, @Becket.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Wed, Apr 29, 2020 at 3:14 PM Till Rohrmann 
> > wrote:
> > >
> > > > Thanks for bringing this up Yangze and Xintong. I see the problem.
> > Help me
> > > > to understand how the ExternalResourceInfo is intended to be used by
> > the
> > > > user. Will she ask for some properties and then pass them to another
> > > > component? Where does this component come from?
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Wed, Apr 29, 2020 at 9:05 AM Xintong Song 
> > > > wrote:
> > > >
> > > > > Thanks for kicking off this discussion, Yangze.
> > > > >
> > > > > First, let me try to explain a bit more about this problem. Since
> we
> > > > > decided to make the `ExternalResourceDriver` a plugin whose
> > > > implementation
> > > > > could be provided by user, we think it makes sense to leverage
> > Flink’s
> > > > > plugin mechanism and load the drivers in separated class loaders to
> > avoid
> > > > > potential risk of dependency conflicts. However, that means
> > > > > `RuntimeContext` and user codes do not naturally have access to
> > classes
> > > > > defined in the plugin. In the current design,
> > > > > `RuntimeContext#getExternalResourceInfos` takes the concrete
> > > > > `ExternalResourceInfo` implementation class as an argument. This
> will
> > > > cause
> > > > > problem when user codes try to pass in the argument, and when
> > > > > `RuntimeContext` tries to do the type check/cast.
> > > > >
> > > > >
> > > > > To my understanding, the root problem is probably that we should
> not
> > > > depend
> > > > > on a specific implementation of the `ExternalResourceInfo`
> interface
> > from
> > > > > outside the plugin (user codes & runtime context). To that end,
> > > > regardless
> > > > > the detailed interface design, I'm in favor of the direction of the
> > 3rd
> > > > > approach. I think it makes sense to add some general
> > information/property
> > > > > accessing interfaces in `ExternalResourceInfo` (e.g., a key-value
> > > > property
> > > > > map), so that in most cases users do not need to cast the
> > > > > `ExternalResourceInfo` into concrete subclasses.
> > > > >
> > > > >
> > > > > Regarding the detailed interface design, I'm not sure about using
> > > > > `Properties`. I think the information contained in a
> > > > `ExternalResourceInfo`
> > > > > can be considered as a unmodifiable map. So maybe something like
> the
> > > > > following?
> > > > >
> > > > >
> > > > > public interface ExternalResourceInfo {
> > > > > > String getProperty(String key);
> > > > > > Map<String, String> getProperties();
> > > > > > }
> > > > >
> > > > >
> > > > > WDYT?
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Apr 29, 2020 at 2:40 

[jira] [Created] (FLINK-17478) Avro format logical type conversions do not work due to type mismatch

2020-04-30 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-17478:
--

 Summary: Avro format logical type conversions do not work due to 
type mismatch
 Key: FLINK-17478
 URL: https://issues.apache.org/jira/browse/FLINK-17478
 Project: Flink
  Issue Type: Sub-task
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
SQL / Planner
Affects Versions: 1.10.0
Reporter: Gyula Fora


We hit the following issue when trying to use avro logical timestamp types:

 
{code:java}
CREATE TABLE source_table (
 int_field INT,
 timestamp_field TIMESTAMP(3)
) WITH (
 'connector.type' = 'kafka',
 'connector.version' = 'universal',
 'connector.topic' = 'avro_tset',
 'connector.properties.bootstrap.servers' = '<...>',
 'format.type' = 'avro',
 'format.avro-schema' =
 '{
 "type": "record",
 "name": "test",
 "fields" : [
 {"name": "int_field", "type": "int"},
 {"name": "timestamp_field", "type": {"type":"long", "logicalType": 
"timestamp-millis"}}
 ]
 }'
) 
 
INSERT INTO source_table VALUES (12, TIMESTAMP '1999-11-11 11:11:11'); 
{code}
 

And the error: 
{noformat}
Caused by: java.lang.ClassCastException: java.time.LocalDateTime cannot be cast 
to java.lang.Long at 
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131)
 at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72) at 
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:166)
 at 
org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:90)
 at 
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:156)
 at 
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:118)
 at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:75) at 
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:62) at 
org.apache.flink.formats.avro.AvroRowSerializationSchema.serialize(AvroRowSerializationSchema.java:143){noformat}

Dawid's analysis from the ML discussion:
It seems that the information about the bridging class (java.sql.Timestamp in 
this case) is lost in the stack. Because this information is lost/not respected, 
the planner produces LocalDateTime instead of a proper java.sql.Timestamp. 
The AvroRowSerializationSchema expects java.sql.Timestamp for a column of 
TIMESTAMP type and thus it fails for LocalDateTime. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-04-30 Thread Konstantin Knauf
Hi everyone,

just looping in Austin, as he mentioned to me yesterday that they also ran into
issues due to the inflexibility of the RabbitMQ source.

Cheers,

Konstantin

On Thu, Apr 30, 2020 at 11:23 AM seneg...@gmail.com 
wrote:

> Hello Guys,
>
> Thanks for all the responses. I want to stress that I didn't feel
> ignored; I just thought that I had forgotten an important step or something.
>
> Since i am a newbie i would follow whatever route you guys would suggest :)
> and i agree that the RMQ connector needs a lot of love still "which i would
> be happy to submit gradually"
>
> as for the code i have it here in the PR:
> https://github.com/senegalo/flink/pull/1 it's not that much of a change in
> terms of logic but more of what is exposed.
>
> Let me know how you want me to proceed.
>
> Thanks again,
> Karim Mansour
>
> On Thu, Apr 30, 2020 at 10:40 AM Aljoscha Krettek 
> wrote:
>
> > Hi,
> >
> > I think it's good to contribute the changes to Flink directly since we
> > already have the RMQ connector in the repository.
> >
> > I would propose something similar to the Kafka connector, which takes
> > both the generic DeserializationSchema and a KafkaDeserializationSchema
> > that is specific to Kafka and allows access to the ConsumerRecord and
> > therefore all the Kafka features. What do you think about that?
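For illustration, a rough sketch of what such an RMQ-specific schema could look
like, by analogy with KafkaDeserializationSchema; the interface name and the
signatures below are assumptions for the discussion, not an agreed design:

```
// Sketch only: an RMQ-flavoured counterpart to KafkaDeserializationSchema.
// The RabbitMQ client types (Envelope, AMQP.BasicProperties) are the ones the
// connector already receives internally; everything else here is hypothetical.
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Envelope;
import java.io.IOException;
import java.io.Serializable;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.util.Collector;

public interface RMQDeserializationSchema<T> extends Serializable, ResultTypeQueryable<T> {

    /**
     * Called per message with full access to the envelope and properties
     * (correlation id, headers, ...), not just the body bytes.
     */
    void deserialize(Envelope envelope, AMQP.BasicProperties properties, byte[] body, Collector<T> out)
            throws IOException;

    /** Mirrors DeserializationSchema#isEndOfStream. */
    boolean isEndOfStream(T nextElement);
}
```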
> >
> > Best,
> > Aljoscha
> >
> > On 30.04.20 10:26, Robert Metzger wrote:
> > > Hey Karim,
> > >
> > > I'm sorry that you had such a bad experience contributing to Flink,
> even
> > > though you are nicely following the rules.
> > >
> > > You mentioned that you've implemented the proposed change already.
> Could
> > > you share a link to a branch here so that we can take a look? I can
> > assess
> > > the API changes easier if I see them :)
> > >
> > > Thanks a lot!
> > >
> > >
> > > Best,
> > > Robert
> > >
> > > On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz <
> dwysakow...@apache.org
> > >
> > > wrote:
> > >
> > >> Hi Karim,
> > >>
> > >> Sorry you did not have the best first time experience. You certainly
> did
> > >> everything right which I definitely appreciate.
> > >>
> > >> The problem in that particular case, as I see it, is that RabbitMQ is
> > >> not very actively maintained and therefore it is not easy to find a
> > >> committer willing to take on this topic. The point of connectors not
> > >> being properly maintained was raised a few times in the past on the
> ML.
> > >> One of the ideas how to improve the situation there was to start a
> > >> https://flink-packages.org/ page. The idea is to ask active users of
> > >> certain connectors to maintain those connectors outside of the core
> > >> project, while giving them a platform within the community where they
> > >> can make their modules visible. That way it is possible to overcome
> the
> > >> lack of capabilities within the core committers without losing much
> on
> > >> the visibility.
> > >>
> > >> I would kindly ask you to consider that path, if you are interested.
> You
> > >> can of course also wait/reach out to more committers if you feel
> strong
> > >> about contributing those changes back to the Flink repository itself.
> > >>
> > >> Best,
> > >>
> > >> Dawid
> > >>
> > >> On 30/04/2020 07:29, seneg...@gmail.com wrote:
> > >>> Hello,
> > >>>
> > >>> I am new to the mailing list and to contributing in Big opensource
> > >> projects
> > >>> in general and i don't know if i did something wrong or should be
> more
> > >>> patient :)
> > >>>
> > >>> I put a topic for discussion as per the contribution guide "
> > >>> https://flink.apache.org/contributing/how-to-contribute.html;
> almost a
> > >> week
> > >>> ago and since what i propose is not backward compatible it needs to
> be
> > >>> discussed here before opening a ticket and moving forward.
> > >>>
> > >>> So my question is. Will someone pick the discussion up ? or at least
> > >>> someone would say that this is not the way to go ? or should i assume
> > >> from
> > >>> the silence that it's not important / relevant to the project ?
> Should
> > i
> > >>> track the author of the connector and send him directly ?
> > >>>
> > >>> Thank you for your time.
> > >>>
> > >>> Regards,
> > >>> Karim Mansour
> > >>>
> > >>> On Fri, Apr 24, 2020 at 11:17 AM seneg...@gmail.com <
> > seneg...@gmail.com>
> > >>> wrote:
> > >>>
> >  Dear All,
> > 
> >  I want to propose a change to the current RabbitMQ connector.
> > 
> >  Currently the RMQSource is extracting the body of the message which
> > is a
> >  byte array and passes it to an instance of a user implementation of
> > the
> >  DeserializationSchema class to deserialize the body of the message.
> It
> >  also uses the correlation id from the message properties to
> > deduplicate
> > >> the
> >  message.
> > 
> >  What i want to propose is instead of taking an implementation of a
> >  DeserializationSchema in the RMQSource constructor, actually have
> the
> >  user implement 

Re:Re: chinese-translation for FLINK-16091

2020-04-30 Thread flinker
Thank you.
Best,
Marshal
At 2020-04-30 17:49:07, "Jark Wu"  wrote:
>Hi,
>
>Welcome to the community!
>There is no contributor permission now, you can just comment under the JIRA
>issue.
>And a committer will assign the issue to you if no one is working on it.
>
>Best,
>Jark
>
>
>On Thu, 30 Apr 2020 at 17:36, flinker  wrote:
>
>> Hi,
>>
>> I want to contribute to Apache Flink.
>> Would you please give me the contributor permission?
>> My JIRA ID is FLINK-16091 ;
>> https://issues.apache.org/jira/browse/FLINK-16091.Thank you.


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-30 Thread Fabian Hueske
Hi Godfrey,

The formatting of your example seems to be broken.
Could you send them again please?

Regarding your points
> because the watermark expression can be on a sub-column, just like `f1.q2` in the above
example I gave.

I would put the watermark information in the row of the top-level field and
indicate to which nested field the watermark refers.
Don't we have to solve the same issue for primary keys that are defined on
a nested field?

> A boolean flag can't represent such info, and I do not know whether we will
support complex watermark expressions involving multiple columns in the
future, such as: "WATERMARK FOR ts as ts + f1 + interval '1' second"

You are right, a simple binary flag is definitely not sufficient to display
the watermark information.
I would put the expression string into the field, i.e., "ts + f1 + interval
'1' second"
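For reference, the expression string is already available per rowtime attribute;
here is a small sketch of where a DESCRIBE implementation could take it from,
assuming the TableSchema/WatermarkSpec accessors that exist in 1.10 (the
rendering itself is only a suggestion):

```
// Sketch: reading the per-field watermark expression for DESCRIBE output.
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.api.WatermarkSpec;

final class DescribeWatermarkColumn {
    static void print(TableSchema schema) {
        for (WatermarkSpec spec : schema.getWatermarkSpecs()) {
            // e.g. "ts" -> "ts + f1 + INTERVAL '1' SECOND"
            System.out.println(spec.getRowtimeAttribute() + " -> " + spec.getWatermarkExpr());
        }
    }
}
```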


For me the most important point of why to not show the watermark as a row
in the table is that it is not a field that can be queried but meta
information on an existing field.
For the user it is important to know that a certain field has a watermark.
Otherwise, certain queries cannot be correctly specified.
Also there might be support for multiple watermarks that are defined on
different fields at some point. Would those be printed in multiple rows?

Best,
Fabian


On Thu, Apr 30, 2020 at 11:25 AM godfrey he  wrote:

> Hi Fabian, Aljoscha
>
> Thanks for the feedback.
>
> Agree with you that we can deal with primary key as you mentioned.
> now, the type column has contained the nullability attribute, e.g. BIGINT
> NOT NULL.
> (I'm also ok that we use two columns to represent type just like mysql)
>
> >Why do I treat `watermark` as a special row?
> Because the watermark expression can be on a sub-column, just like `f1.q2` in the above
> example I gave.
> A boolean flag can't represent such info, and I do not know whether we will
> support complex
> watermark expressions involving multiple columns in the future, such as:
> "WATERMARK FOR ts as ts + f1 + interval '1' second"
>
> If we do not support complex watermark expression, we can add a watermark
> column.
>
> for example:
>
> create table MyTable (
>
> f0 BIGINT NOT NULL,
>
> f1 ROW<`q1` STRING, `q2` TIMESTAMP(3)>,
>
> f2 VARCHAR<256>,
>
> f3 AS f0 + 1,
>
> PRIMARY KEY (f0),
>
> UNIQUE (f3, f2),
>
> WATERMARK f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
>
> ) with (...)
>
>
> name
>
> type
>
> key
>
> compute column
>
> watermark
>
> f0
>
> BIGINT NOT NULL
>
> PRI
>
> (NULL)
>
> f1
>
> ROW<`q1` STRING, `q2` TIMESTAMP(3)>
>
> UNQ
>
> (NULL)
>
> f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
>
> f2
>
> VARCHAR<256>
>
> (NULL)
>
> NULL
>
> f3
>
> BIGINT NOT NULL
>
> UNQ
>
> f0 + 1
>
>
> or we add a column to represent nullability.
>
> name
>
> type
>
> null
>
> key
>
> compute column
>
> watermark
>
> f0
>
> BIGINT
>
> false
>
> PRI
>
> (NULL)
>
> f1
>
> ROW<`q1` STRING, `q2` TIMESTAMP(3)>
>
> true
>
> UNQ
>
> (NULL)
>
> f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
>
> f2
>
> VARCHAR<256>
>
> true
>
> (NULL)
>
> NULL
>
> f3
>
> BIGINT
>
> false
>
> UNQ
>
> f0 + 1
>
>
> Personally, I like the second one. (we need to do some changes on LogicalType
> to get type name without nullability)
>
>
> Best,
> Godfrey
>
>
Aljoscha Krettek  wrote on Wed, Apr 29, 2020 at 5:47 PM:
>
> > +1 I like the general idea of printing the results as a table.
> >
> > On the specifics I don't know enough but Fabians suggestions seems to
> > make sense to me.
> >
> > Aljoscha
> >
> > On 29.04.20 10:56, Fabian Hueske wrote:
> > > Hi Godfrey,
> > >
> > > Thanks for starting this discussion!
> > >
> > > In my mind, WATERMARK is a property (or constraint) of a field, just
> like
> > > PRIMARY KEY.
> > > Take this example from MySQL:
> > >
> > > mysql> CREATE TABLE people (id INT NOT NULL, name VARCHAR(128) NOT
> NULL,
> > > age INT, PRIMARY KEY (id));
> > > Query OK, 0 rows affected (0.06 sec)
> > >
> > > mysql> describe people;
> > > +---+--+--+-+-+---+
> > > | Field | Type | Null | Key | Default | Extra |
> > > +---+--+--+-+-+---+
> > > | id| int  | NO   | PRI | NULL|   |
> > > | name  | varchar(128) | NO   | | NULL|   |
> > > | age   | int  | YES  | | NULL|   |
> > > +---+--+--+-+-+---+
> > > 3 rows in set (0.01 sec)
> > >
> > > Here, PRIMARY KEY is marked in the Key column of the id field.
> > > We could do the same for watermarks by adding a Watermark column.
> > >
> > > Best, Fabian
> > >
> > >
> > > On Wed, Apr 29, 2020 at 10:43 AM godfrey he <
> > godfre...@gmail.com>:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to bring up a discussion about the result type of
> describe
> > >> statement,
> > >> which is introduced in FLIP-84[1].
> > >> In previous version, we define the result type of `describe` statement
> > is a
> > >> single column as following
> > >>
> > >> Statement
> > >>
> > >> Result Schema
> > >>
> > >> Result 

Re: chinese-translation for FLINK-16091

2020-04-30 Thread Jark Wu
Hi,

Welcome to the community!
There is no contributor permission now, you can just comment under the JIRA
issue.
And a committer will assign the issue to you if no one is working on it.

Best,
Jark


On Thu, 30 Apr 2020 at 17:36, flinker  wrote:

> Hi,
>
> I want to contribute to Apache Flink.
> Would you please give me the contributor permission?
> My JIRA ID is FLINK-16091 ;
> https://issues.apache.org/jira/browse/FLINK-16091.Thank you.


[jira] [Created] (FLINK-17477) resumeConsumption call should happen as quickly as possible to minimise latency

2020-04-30 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-17477:
--

 Summary: resumeConsumption call should happen as quickly as 
possible to minimise latency
 Key: FLINK-17477
 URL: https://issues.apache.org/jira/browse/FLINK-17477
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing, Runtime / Network
Reporter: Piotr Nowojski
 Fix For: 1.11.0


We should be calling {{InputGate#resumeConsumption()}} as soon as possible (to 
avoid any unnecessary delay/latency when the task is idling). Currently I think 
it’s mostly fine - the important bit is that on the happy path, we always 
{{resumeConsumption}} before trying to complete the checkpoint, so that netty 
threads will start resuming the network traffic while the task thread is doing 
the synchronous part of the checkpoint and starting the asynchronous part. But I 
think in two places we are first aborting the checkpoint and only then resuming 
consumption (in CheckpointBarrierAligner):
{code}
// let the task know we are not completing this
notifyAbort(currentCheckpointId,
    new CheckpointException(
        "Barrier id: " + barrierId,
        CheckpointFailureReason.CHECKPOINT_DECLINED_SUBSUMED));
// abort the current checkpoint
releaseBlocksAndResetBarriers();
{code}

{code}
// let the task know we skip a checkpoint
notifyAbort(currentCheckpointId,
    new CheckpointException(CheckpointFailureReason.CHECKPOINT_DECLINED_INPUT_END_OF_STREAM));
// no chance to complete this checkpoint
releaseBlocksAndResetBarriers();
{code}
It’s not a big deal, as those are rare conditions, but it would be better to 
be consistent everywhere: first release blocks and resume consumption, before 
anything else happens. 
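A sketch of the suggested ordering, reusing the calls from the snippets above 
(illustrative only, not a tested patch):
{code}
// release blocks / resume consumption first ...
releaseBlocksAndResetBarriers();
// ... and only then tell the task that this checkpoint is not completing
notifyAbort(currentCheckpointId,
    new CheckpointException(
        "Barrier id: " + barrierId,
        CheckpointFailureReason.CHECKPOINT_DECLINED_SUBSUMED));
{code}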



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


chinese-translation for FLINK-16091

2020-04-30 Thread flinker
Hi,

I want to contribute to Apache Flink.
Would you please give me the contributor permission?
My JIRA ID is FLINK-16091 ;  
https://issues.apache.org/jira/browse/FLINK-16091.Thank you.

Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-30 Thread godfrey he
Hi Fabian, Aljoscha

Thanks for the feedback.

Agree with you that we can deal with the primary key as you mentioned.
Currently, the type column already contains the nullability attribute, e.g. BIGINT
NOT NULL.
(I'm also ok with using two columns to represent the type, just like MySQL.)

> Why do I treat `watermark` as a special row?
Because the watermark expression can be defined on a sub-column, just like `f1.q2`
in the example I gave above.
A boolean flag can't represent such info, and I don't know whether we will support
complex watermark expressions involving multiple columns in the future, such as:
"WATERMARK FOR ts AS ts + f1 + INTERVAL '1' SECOND"

If we do not support complex watermark expressions, we can add a watermark
column.

for example:

create table MyTable (
  f0 BIGINT NOT NULL,
  f1 ROW<`q1` STRING, `q2` TIMESTAMP(3)>,
  f2 VARCHAR<256>,
  f3 AS f0 + 1,
  PRIMARY KEY (f0),
  UNIQUE (f3, f2),
  WATERMARK FOR f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND)
) with (...)


+------+-------------------------------------+-----+----------------+------------------------------------------+
| name | type                                | key | compute column | watermark                                |
+------+-------------------------------------+-----+----------------+------------------------------------------+
| f0   | BIGINT NOT NULL                     | PRI | (NULL)         |                                          |
| f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | UNQ | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
| f2   | VARCHAR<256>                        |     | (NULL)         | NULL                                     |
| f3   | BIGINT NOT NULL                     | UNQ | f0 + 1         |                                          |
+------+-------------------------------------+-----+----------------+------------------------------------------+


or we add a column to represent nullability.

+------+-------------------------------------+-------+-----+----------------+------------------------------------------+
| name | type                                | null  | key | compute column | watermark                                |
+------+-------------------------------------+-------+-----+----------------+------------------------------------------+
| f0   | BIGINT                              | false | PRI | (NULL)         |                                          |
| f1   | ROW<`q1` STRING, `q2` TIMESTAMP(3)> | true  | UNQ | (NULL)         | f1.q2 AS (`f1.q2` - INTERVAL '3' SECOND) |
| f2   | VARCHAR<256>                        | true  |     | (NULL)         | NULL                                     |
| f3   | BIGINT                              | false | UNQ | f0 + 1         |                                          |
+------+-------------------------------------+-------+-----+----------------+------------------------------------------+


Personally, I like the second one. (We need to make some changes to LogicalType
to get the type name without nullability.)
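
A minimal sketch of that change (illustrative only; it relies on the existing
LogicalType#copy(boolean), #asSummaryString() and #isNullable() methods, while the
helper itself is hypothetical):

// Given the LogicalType of a column, render the "type" and "null" cells separately.
static String[] typeAndNullability(org.apache.flink.table.types.logical.LogicalType t) {
    String typeName = t.copy(true).asSummaryString(); // e.g. "BIGINT" instead of "BIGINT NOT NULL"
    String nullable = String.valueOf(t.isNullable()); // e.g. "false"
    return new String[] {typeName, nullable};
}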


Best,
Godfrey


Aljoscha Krettek  于2020年4月29日周三 下午5:47写道:

> +1 I like the general idea of printing the results as a table.
>
> On the specifics I don't know enough but Fabians suggestions seems to
> make sense to me.
>
> Aljoscha
>
> On 29.04.20 10:56, Fabian Hueske wrote:
> > Hi Godfrey,
> >
> > Thanks for starting this discussion!
> >
> > In my mind, WATERMARK is a property (or constraint) of a field, just like
> > PRIMARY KEY.
> > Take this example from MySQL:
> >
> > mysql> CREATE TABLE people (id INT NOT NULL, name VARCHAR(128) NOT NULL,
> > age INT, PRIMARY KEY (id));
> > Query OK, 0 rows affected (0.06 sec)
> >
> > mysql> describe people;
> > +---+--+--+-+-+---+
> > | Field | Type | Null | Key | Default | Extra |
> > +---+--+--+-+-+---+
> > | id| int  | NO   | PRI | NULL|   |
> > | name  | varchar(128) | NO   | | NULL|   |
> > | age   | int  | YES  | | NULL|   |
> > +---+--+--+-+-+---+
> > 3 rows in set (0.01 sec)
> >
> > Here, PRIMARY KEY is marked in the Key column of the id field.
> > We could do the same for watermarks by adding a Watermark column.
> >
> > Best, Fabian
> >
> >
> > Am Mi., 29. Apr. 2020 um 10:43 Uhr schrieb godfrey he <
> godfre...@gmail.com>:
> >
> >> Hi everyone,
> >>
> >> I would like to bring up a discussion about the result type of describe
> >> statement,
> >> which is introduced in FLIP-84[1].
> >> In previous version, we define the result type of `describe` statement
> is a
> >> single column as following
> >>
> >> Statement
> >>
> >> Result Schema
> >>
> >> Result Value
> >>
> >> Result Kind
> >>
> >> Examples
> >>
> >> DESCRIBE xx
> >>
> >> field name: result
> >>
> >> field type: VARCHAR(n)
> >>
> >> (n is the max length of values)
> >>
> >> describe the detail of an object
> >>
> >> (single row)
> >>
> >> SUCCESS_WITH_CONTENT
> >>
> >> DESCRIBE table_name
> >>
> >> for "describe table_name", the result value is the `toString` value of
> >> `TableSchema`, which is an unstructured data.
> >> It's hard to for user to use this info.
> >>
> >> for example:
> >>
> >> TableSchema schema = TableSchema.builder()
> >> .field("f0", DataTypes.BIGINT())
> >> .field("f1", DataTypes.ROW(
> >>DataTypes.FIELD("q1", DataTypes.STRING()),
> >>DataTypes.FIELD("q2", DataTypes.TIMESTAMP(3
> >> .field("f2", DataTypes.STRING())
> >> .field("f3", DataTypes.BIGINT(), "f0 + 1")
> >> .watermark("f1.q2", WATERMARK_EXPRESSION, WATERMARK_DATATYPE)
> >> .build();
> >>
> >> its `toString` value is:
> >> root
> >>   |-- f0: BIGINT
> >>   |-- f1: ROW<`q1` STRING, `q2` TIMESTAMP(3)>
> >>   |-- f2: STRING
> >>   |-- f3: BIGINT AS f0 + 1
> >>   |-- WATERMARK FOR f1.q2 AS now()
> >>
> >> For hive, MySQL, etc., the describe result is table form including field
> >> names and field types.
> >> which is more familiar with users.
> >> TableSchema[2] has watermark expression and compute column, we should
> also
> >> put them into the table:
> >> for compute column, it's a column level, we add a new column named
> `expr`.
> >>   for watermark expression, it's a table level, we add a special row
> named
> >> `WATERMARK` to represent it.
> >>
> >> The result will look like about above example:
> >>
> >> name
> >>
> >> type
> >>
> >> expr
> >>
> >> 

Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-04-30 Thread seneg...@gmail.com
Hello Guys,

Thanks for all the responses. I want to stress that I didn't feel ignored; I
just thought that I had forgotten an important step or something.

Since I am a newbie I will follow whatever route you guys suggest :) and I agree
that the RMQ connector still needs a lot of love (which I would be happy to
contribute gradually).

As for the code, I have it here in the PR:
https://github.com/senegalo/flink/pull/1 - it's not that much of a change in
terms of logic, but more about what is exposed.

Let me know how you want me to proceed.

Thanks again,
Karim Mansour

On Thu, Apr 30, 2020 at 10:40 AM Aljoscha Krettek 
wrote:

> Hi,
>
> I think it's good to contribute the changes to Flink directly since we
> already have the RMQ connector in the respository.
>
> I would propose something similar to the Kafka connector, which takes
> both the generic DeserializationSchema and a KafkaDeserializationSchema
> that is specific to Kafka and allows access to the ConsumerRecord and
> therefore all the Kafka features. What do you think about that?
>
> Best,
> Aljoscha
>
> On 30.04.20 10:26, Robert Metzger wrote:
> > Hey Karim,
> >
> > I'm sorry that you had such a bad experience contributing to Flink, even
> > though you are nicely following the rules.
> >
> > You mentioned that you've implemented the proposed change already. Could
> > you share a link to a branch here so that we can take a look? I can
> assess
> > the API changes easier if I see them :)
> >
> > Thanks a lot!
> >
> >
> > Best,
> > Robert
> >
> > On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz  >
> > wrote:
> >
> >> Hi Karim,
> >>
> >> Sorry you did not have the best first time experience. You certainly did
> >> everything right which I definitely appreciate.
> >>
> >> The problem in that particular case, as I see it, is that RabbitMQ is
> >> not very actively maintained and therefore it is not easy too find a
> >> committer willing to take on this topic. The point of connectors not
> >> being properly maintained was raised a few times in the past on the ML.
> >> One of the ideas how to improve the situation there was to start a
> >> https://flink-packages.org/ page. The idea is to ask active users of
> >> certain connectors to maintain those connectors outside of the core
> >> project, while giving them a platform within the community where they
> >> can make their modules visible. That way it is possible to overcome the
> >> lack of capabilities within the core committers without loosing much on
> >> the visibility.
> >>
> >> I would kindly ask you to consider that path, if you are interested. You
> >> can of course also wait/reach out to more committers if you feel strong
> >> about contributing those changes back to the Flink repository itself.
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> On 30/04/2020 07:29, seneg...@gmail.com wrote:
> >>> Hello,
> >>>
> >>> I am new to the mailing list and to contributing in Big opensource
> >> projects
> >>> in general and i don't know if i did something wrong or should be more
> >>> patient :)
> >>>
> >>> I put a topic for discussion as per the contribution guide "
> >>> https://flink.apache.org/contributing/how-to-contribute.html; almost a
> >> week
> >>> ago and since what i propose is not backward compatible it needs to be
> >>> discussed here before opening a ticket and moving forward.
> >>>
> >>> So my question is. Will someone pick the discussion up ? or at least
> >>> someone would say that this is not the way to go ? or should i assume
> >> from
> >>> the silence that it's not important / relevant to the project ? Should
> i
> >>> track the author of the connector and send him directly ?
> >>>
> >>> Thank you for your time.
> >>>
> >>> Regards,
> >>> Karim Mansour
> >>>
> >>> On Fri, Apr 24, 2020 at 11:17 AM seneg...@gmail.com <
> seneg...@gmail.com>
> >>> wrote:
> >>>
>  Dear All,
> 
>  I want to propose a change to the current RabbitMQ connector.
> 
>  Currently the RMQSource is extracting the body of the message which
> is a
>  byte array and pass it to a an instance of a user implementation of
> the
>  DeserializationSchema class to deserialize the body of the message. It
>  also uses the correlation id from the message properties to
> deduplicate
> >> the
>  message.
> 
>  What i want to propose is instead of taking a implementation of a
>  DeserializationSchema in the RMQSource constructor, actually have the
>  user implement an interface that would have methods both the output
> for
> >> the
>  RMQSource and the correlation id used not only from the body of the
> >> message
>  but also to it's metadata and properties thus giving the connector
> much
>  more power and flexibility.
> 
>  This of course would mean a breaking API change for the RMQSource
> since
> >> it
>  will no longer take a DeserializationSchema but an implementation of a
>  predefined interface that has the methods to extract both the output
> of
> >> the

[GitHub] [flink-shaded] zentol commented on a change in pull request #85: [FLINK-16955] Bump Zookeeper 3.4.X to 3.4.14

2020-04-30 Thread GitBox


zentol commented on a change in pull request #85:
URL: https://github.com/apache/flink-shaded/pull/85#discussion_r417859654



##
File path: flink-shaded-zookeeper-parent/flink-shaded-zookeeper-34/pom.xml
##
@@ -128,4 +128,4 @@ under the License.
 
 
 
-

Review comment:
   could you revert this tiny change?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-17476) Add tests to check recovery from snapshot created with different UC mode

2020-04-30 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17476:
-

 Summary: Add tests to check recovery from snapshot created with 
different UC mode
 Key: FLINK-17476
 URL: https://issues.apache.org/jira/browse/FLINK-17476
 Project: Flink
  Issue Type: Test
  Components: Runtime / Checkpointing, Runtime / Task
Affects Versions: 1.11.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17475) Flink Multiple buffers, IllegalStateException

2020-04-30 Thread chun111111 (Jira)
chun11 created FLINK-17475:
--

 Summary: Flink Multiple buffers, IllegalStateException
 Key: FLINK-17475
 URL: https://issues.apache.org/jira/browse/FLINK-17475
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.10.0
Reporter: chun11
 Attachments: image-2020-04-30-16-48-14-074.png

!image-2020-04-30-16-48-14-074.png!


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17474) Test and correct case insensitive for parquet and orc in hive

2020-04-30 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-17474:


 Summary: Test and correct case insensitive for parquet and orc in 
hive
 Key: FLINK-17474
 URL: https://issues.apache.org/jira/browse/FLINK-17474
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: Jingsong Lee
 Fix For: 1.11.0


Orc and parquet readers should treat field names as case-insensitive to be compatible with Hive.

This applies to both the Hive mapred reader and the vectorized reader.
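
For illustration, a minimal sketch of the kind of case-insensitive field lookup this
implies (the helper class and method are hypothetical, not existing Flink code):
{code}
// Hypothetical helper: locate a requested field in the file schema ignoring case,
// so that Hive's lower-cased column names still match the names stored in the files.
public class CaseInsensitiveFieldLookup {
    public static int findFieldIndex(String requestedName, java.util.List<String> fileFieldNames) {
        for (int i = 0; i < fileFieldNames.size(); i++) {
            if (fileFieldNames.get(i).equalsIgnoreCase(requestedName)) {
                return i;
            }
        }
        return -1; // not present in the file schema
    }
}
{code}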



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-04-30 Thread Aljoscha Krettek

Hi,

I think it's good to contribute the changes to Flink directly since we 
already have the RMQ connector in the repository.


I would propose something similar to the Kafka connector, which takes 
both the generic DeserializationSchema and a KafkaDeserializationSchema 
that is specific to Kafka and allows access to the ConsumerRecord and 
therefore all the Kafka features. What do you think about that?
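
To make that concrete, a rough sketch of what such an RMQ-specific schema could look
like (every name below is hypothetical; only the RabbitMQ client types Envelope and
AMQP.BasicProperties and Flink's Collector/ResultTypeQueryable are existing classes):

import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Envelope;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
import org.apache.flink.util.Collector;

import java.io.IOException;
import java.io.Serializable;

// Hypothetical interface, analogous to KafkaDeserializationSchema.
public interface RMQDeserializationSchema<T> extends Serializable, ResultTypeQueryable<T> {

    // Turn the full delivery (envelope + properties + body) into zero or more records.
    void deserialize(Envelope envelope, AMQP.BasicProperties properties, byte[] body,
                     Collector<T> out) throws IOException;

    // Id the source uses for deduplication; defaults to the AMQP correlation id.
    default String getMessageIdentifier(Envelope envelope, AMQP.BasicProperties properties) {
        return properties.getCorrelationId();
    }
}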


Best,
Aljoscha

On 30.04.20 10:26, Robert Metzger wrote:

Hey Karim,

I'm sorry that you had such a bad experience contributing to Flink, even
though you are nicely following the rules.

You mentioned that you've implemented the proposed change already. Could
you share a link to a branch here so that we can take a look? I can assess
the API changes easier if I see them :)

Thanks a lot!


Best,
Robert

On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz 
wrote:


Hi Karim,

Sorry you did not have the best first time experience. You certainly did
everything right which I definitely appreciate.

The problem in that particular case, as I see it, is that RabbitMQ is
not very actively maintained and therefore it is not easy too find a
committer willing to take on this topic. The point of connectors not
being properly maintained was raised a few times in the past on the ML.
One of the ideas how to improve the situation there was to start a
https://flink-packages.org/ page. The idea is to ask active users of
certain connectors to maintain those connectors outside of the core
project, while giving them a platform within the community where they
can make their modules visible. That way it is possible to overcome the
lack of capabilities within the core committers without loosing much on
the visibility.

I would kindly ask you to consider that path, if you are interested. You
can of course also wait/reach out to more committers if you feel strong
about contributing those changes back to the Flink repository itself.

Best,

Dawid

On 30/04/2020 07:29, seneg...@gmail.com wrote:

Hello,

I am new to the mailing list and to contributing in Big opensource

projects

in general and i don't know if i did something wrong or should be more
patient :)

I put a topic for discussion as per the contribution guide "
https://flink.apache.org/contributing/how-to-contribute.html; almost a

week

ago and since what i propose is not backward compatible it needs to be
discussed here before opening a ticket and moving forward.

So my question is. Will someone pick the discussion up ? or at least
someone would say that this is not the way to go ? or should i assume

from

the silence that it's not important / relevant to the project ? Should i
track the author of the connector and send him directly ?

Thank you for your time.

Regards,
Karim Mansour

On Fri, Apr 24, 2020 at 11:17 AM seneg...@gmail.com 
wrote:


Dear All,

I want to propose a change to the current RabbitMQ connector.

Currently the RMQSource is extracting the body of the message which is a
byte array and pass it to a an instance of a user implementation of the
DeserializationSchema class to deserialize the body of the message. It
also uses the correlation id from the message properties to deduplicate

the

message.

What i want to propose is instead of taking a implementation of a
DeserializationSchema in the RMQSource constructor, actually have the
user implement an interface that would have methods both the output for

the

RMQSource and the correlation id used not only from the body of the

message

but also to it's metadata and properties thus giving the connector much
more power and flexibility.

This of course would mean a breaking API change for the RMQSource since

it

will no longer take a DeserializationSchema but an implementation of a
predefined interface that has the methods to extract both the output of

the

RMQSource and the to extract the unique message id as well.

The reason behind that is that in my company we were relaying on another
property the message id for deduplication of the messages and i also

needed

that information further down the pipeline and there was absolutely no

way

of getting it other than modifying the RMQSource.

I already have code written but as the rules dictates i have to run it

by

you guys first before i attempt to create a Jira ticket :)

Let me know what you think.

Regards,
Karim Mansour










[jira] [Created] (FLINK-17473) Remove unused classes ArchivedExecutionVertexBuilder and ArchivedExecutionJobVertexBuilder

2020-04-30 Thread Gary Yao (Jira)
Gary Yao created FLINK-17473:


 Summary: Remove unused classes ArchivedExecutionVertexBuilder and 
ArchivedExecutionJobVertexBuilder
 Key: FLINK-17473
 URL: https://issues.apache.org/jira/browse/FLINK-17473
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination, Tests
Reporter: Gary Yao
Assignee: Gary Yao
 Fix For: 1.11.0


Remove unused classes {{ArchivedExecutionVertexBuilder}} and 
{{ArchivedExecutionJobVertexBuilder}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-04-30 Thread Robert Metzger
Hey Karim,

I'm sorry that you had such a bad experience contributing to Flink, even
though you are nicely following the rules.

You mentioned that you've implemented the proposed change already. Could
you share a link to a branch here so that we can take a look? I can assess
the API changes easier if I see them :)

Thanks a lot!


Best,
Robert

On Thu, Apr 30, 2020 at 8:09 AM Dawid Wysakowicz 
wrote:

> Hi Karim,
>
> Sorry you did not have the best first time experience. You certainly did
> everything right which I definitely appreciate.
>
> The problem in that particular case, as I see it, is that RabbitMQ is
> not very actively maintained and therefore it is not easy too find a
> committer willing to take on this topic. The point of connectors not
> being properly maintained was raised a few times in the past on the ML.
> One of the ideas how to improve the situation there was to start a
> https://flink-packages.org/ page. The idea is to ask active users of
> certain connectors to maintain those connectors outside of the core
> project, while giving them a platform within the community where they
> can make their modules visible. That way it is possible to overcome the
> lack of capabilities within the core committers without loosing much on
> the visibility.
>
> I would kindly ask you to consider that path, if you are interested. You
> can of course also wait/reach out to more committers if you feel strong
> about contributing those changes back to the Flink repository itself.
>
> Best,
>
> Dawid
>
> On 30/04/2020 07:29, seneg...@gmail.com wrote:
> > Hello,
> >
> > I am new to the mailing list and to contributing in Big opensource
> projects
> > in general and i don't know if i did something wrong or should be more
> > patient :)
> >
> > I put a topic for discussion as per the contribution guide "
> > https://flink.apache.org/contributing/how-to-contribute.html; almost a
> week
> > ago and since what i propose is not backward compatible it needs to be
> > discussed here before opening a ticket and moving forward.
> >
> > So my question is. Will someone pick the discussion up ? or at least
> > someone would say that this is not the way to go ? or should i assume
> from
> > the silence that it's not important / relevant to the project ? Should i
> > track the author of the connector and send him directly ?
> >
> > Thank you for your time.
> >
> > Regards,
> > Karim Mansour
> >
> > On Fri, Apr 24, 2020 at 11:17 AM seneg...@gmail.com 
> > wrote:
> >
> >> Dear All,
> >>
> >> I want to propose a change to the current RabbitMQ connector.
> >>
> >> Currently the RMQSource is extracting the body of the message which is a
> >> byte array and pass it to a an instance of a user implementation of the
> >> DeserializationSchema class to deserialize the body of the message. It
> >> also uses the correlation id from the message properties to deduplicate
> the
> >> message.
> >>
> >> What i want to propose is instead of taking a implementation of a
> >> DeserializationSchema in the RMQSource constructor, actually have the
> >> user implement an interface that would have methods both the output for
> the
> >> RMQSource and the correlation id used not only from the body of the
> message
> >> but also to it's metadata and properties thus giving the connector much
> >> more power and flexibility.
> >>
> >> This of course would mean a breaking API change for the RMQSource since
> it
> >> will no longer take a DeserializationSchema but an implementation of a
> >> predefined interface that has the methods to extract both the output of
> the
> >> RMQSource and the to extract the unique message id as well.
> >>
> >> The reason behind that is that in my company we were relaying on another
> >> property the message id for deduplication of the messages and i also
> needed
> >> that information further down the pipeline and there was absolutely no
> way
> >> of getting it other than modifying the RMQSource.
> >>
> >> I already have code written but as the rules dictates i have to run it
> by
> >> you guys first before i attempt to create a Jira ticket :)
> >>
> >> Let me know what you think.
> >>
> >> Regards,
> >> Karim Mansour
> >>
>
>


Re: [GitHub] [flink-web] klion26 commented on pull request #247: [FLINK-13683] Translate "Code Style - Component Guide" page into Chinese

2020-04-30 Thread Yun Tang
Does anyone know why this thread shows up in the development-related discussions
at Flink?

From: GitBox 
Sent: Thursday, April 30, 2020 15:55
To: dev@flink.apache.org 
Subject: [GitHub] [flink-web] klion26 commented on pull request #247: 
[FLINK-13683] Translate "Code Style - Component Guide" page into Chinese


klion26 commented on pull request #247:
URL: https://github.com/apache/flink-web/pull/247#issuecomment-621676247


   @chaojianok thanks for your contribution. could you please get rid of the 
`git merge` commit in the history. you can use `git rebase` or the other git 
command to achieve it.




This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink-web] klion26 commented on pull request #247: [FLINK-13683] Translate "Code Style - Component Guide" page into Chinese

2020-04-30 Thread GitBox


klion26 commented on pull request #247:
URL: https://github.com/apache/flink-web/pull/247#issuecomment-621676247


   @chaojianok thanks for your contribution. could you please get rid of the 
`git merge` commit in the history. you can use `git rebase` or the other git 
command to achieve it.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [VOTE] Release 1.10.1, release candidate #1

2020-04-30 Thread Jark Wu
Thanks for the tip! I checked it and you are right :)

On Thu, 30 Apr 2020 at 15:08, Chesnay Schepler  wrote:

> flink-sql-connector-elasticsearch6 isn't bundling com.carrotsearch:hppc,
> nor does it have dependencies on org.elasticsearch:elasticsearch-geo,
> org.elasticsearch.plugin:lang-mustache-client nor
> com.github.spullara.mustache.java:compiler (and thus is also not bundling
> them).
>
> You can check this yourself by packaging the connector and comparing the
> shade-plugin output with the NOTICE file.
>
> On 30/04/2020 08:55, Jark Wu wrote:
>
> Hi Chesnay,
>
> I mean `flink-sql-connector-elasticsearch6`.
> Because this dependency change on elasticserch7 [1] is totally following
> how elasticsearch6 does. And they have the almost same dependencies.
>
> Best,
> Jark
>
> [1]:
> https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38
>
> On Thu, 30 Apr 2020 at 14:44, Chesnay Schepler  wrote:
>
>> ES6 isn't bundling these dependencies.
>>
>> On 29/04/2020 17:29, Jark Wu wrote:
>> > Looks like the ES NOTICE problem is a long-standing problem, because the
>> > ES6 sql connector NOTICE also misses these dependencies.
>> >
>> > Best,
>> > Jark
>> >
>> > On Wed, 29 Apr 2020 at 17:26, Robert Metzger 
>> wrote:
>> >
>> >> Thanks for taking a look Chesnay. Then let me officially cancel the
>> >> release:
>> >>
>> >> -1 (binding)
>> >>
>> >>
>> >> Another question that I had while checking the release was the
>> >> "apache-flink-1.10.1.tar.gz" binary, which I suppose is the python
>> >> distribution.
>> >> It does not contain a LICENSE and NOTICE file at the root level (which
>> is
>> >> okay [1] for binary releases), but in the "pyflink/" directory. There
>> is
>> >> also a "deps/" directory, which contains a full distribution of Flink,
>> >> without any license files.
>> >> I believe it would be a little bit nicer to have the LICENSE and NOTICE
>> >> file in the root directory (if the python wheels format permits) to
>> make
>> >> sure it is obvious that all binary release contents are covered by
>> these
>> >> files.
>> >>
>> >>
>> >> [1]
>> >>
>> http://www.apache.org/legal/release-policy.html#licensing-documentation
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, Apr 29, 2020 at 11:10 AM Congxian Qiu 
>> >> wrote:
>> >>
>> >>> Thanks a lot for creating a release candidate for 1.10.1!
>> >>>
>> >>> +1 from my side
>> >>>
>> >>> checked
>> >>> - md5/gpg, ok
>> >>> - source does not contain any binaries, ok
>> >>> - pom points to the same version 1.10.1, ok
>> >>> - README file does not contain anything unexpected, ok
>> >>> - maven clean package -DskipTests, ok
>> >>> - maven clean verify, encounter a test timeout exception, but I think
>> it
>> >>> does not block the RC(have created an issue[1] to track it),
>> >>> - run demos on a stand-alone cluster, ok
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/FLINK-17458
>> >>> Best,
>> >>> Congxian
>> >>>
>> >>>
>> >>> Robert Metzger  于2020年4月29日周三 下午2:54写道:
>> >>>
>>  Thanks a lot for creating a release candidate for 1.10.1!
>> 
>>  I'm not sure, but I think found a potential issue in the release
>> while
>>  checking dependency changes on the ElasticSearch7 connector:
>> 
>> 
>> >>
>> https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38
>>  In this change, "com.carrotsearch:hppc" has been added to the shaded
>> >> jar
>> >>> (
>> 
>> >>
>> https://repository.apache.org/content/repositories/orgapacheflink-1362/org/apache/flink/flink-sql-connector-elasticsearch7_2.11/1.10.1/flink-sql-connector-elasticsearch7_2.11-1.10.1.jar
>>  ),
>>  without including proper mention of that dependency in
>> >> "META-INF/NOTICE".
>> 
>>  My checking notes:
>> 
>>  - checked the diff for dependency changes:
>> 
>> >>
>> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1
>>  (w/o
>>  <
>> >>
>> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1(w/o
>>  release commit:
>> 
>> 
>> >>
>> https://github.com/apache/flink/compare/release-1.10.0...0e2b520ec60cc11dce210bc38e574a05fa5a7734
>>  )
>> - flink-connector-hive sets the derby version for test-scoped
>>  dependencies:
>> 
>> 
>> >>
>> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-f4dbf40e8457457eb01ae22b53baa3ec
>>    - no NOTICE file found, but this module does not forward
>> binaries.
>> - kafka 0.10 minor version upgrade:
>> 
>> 
>> >>
>> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-0287a3f3c37b454c583b6b56de1392e4
>> - NOTICE change found
>>  - ES7 changes shading:
>> 
>> 
>> >>
>> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-bd2211176ab6e7fa83ffeaa89481ff38
>>    - problem found
>>    

Re: [VOTE] Release 1.10.1, release candidate #1

2020-04-30 Thread Chesnay Schepler
flink-sql-connector-elasticsearch6 isn't bundling com.carrotsearch:hppc, 
nor does it have dependencies on org.elasticsearch:elasticsearch-geo, 
org.elasticsearch.plugin:lang-mustache-client nor 
com.github.spullara.mustache.java:compiler (and thus is also not 
bundling them).


You can check this yourself by packaging the connector and comparing the 
shade-plugin output with the NOTICE file.
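
As an illustrative alternative (plain JDK only; the jar file name below is just an
example), a small program that lists the packages bundled in the produced jar, to
compare against META-INF/NOTICE:

import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ListBundledPackages {
    public static void main(String[] args) throws Exception {
        // Point this at the shaded artifact you just built.
        try (JarFile jar = new JarFile("flink-sql-connector-elasticsearch7_2.11-1.10.1.jar")) {
            jar.stream()
               .map(JarEntry::getName)
               .filter(n -> n.endsWith(".class"))
               .map(n -> n.contains("/") ? n.substring(0, n.lastIndexOf('/')) : "(default package)")
               .distinct()
               .sorted()
               .forEach(System.out::println); // bundled packages, one per line
        }
    }
}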


On 30/04/2020 08:55, Jark Wu wrote:

Hi Chesnay,

I mean `flink-sql-connector-elasticsearch6`.
Because this dependency change on elasticserch7 [1] is totally 
following how elasticsearch6 does. And they have the almost same 
dependencies.


Best,
Jark

[1]: 
https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38


On Thu, 30 Apr 2020 at 14:44, Chesnay Schepler > wrote:


ES6 isn't bundling these dependencies.

On 29/04/2020 17:29, Jark Wu wrote:
> Looks like the ES NOTICE problem is a long-standing problem,
because the
> ES6 sql connector NOTICE also misses these dependencies.
>
> Best,
> Jark
>
> On Wed, 29 Apr 2020 at 17:26, Robert Metzger
mailto:rmetz...@apache.org>> wrote:
>
>> Thanks for taking a look Chesnay. Then let me officially cancel the
>> release:
>>
>> -1 (binding)
>>
>>
>> Another question that I had while checking the release was the
>> "apache-flink-1.10.1.tar.gz" binary, which I suppose is the python
>> distribution.
>> It does not contain a LICENSE and NOTICE file at the root level
(which is
>> okay [1] for binary releases), but in the "pyflink/" directory.
There is
>> also a "deps/" directory, which contains a full distribution of
Flink,
>> without any license files.
>> I believe it would be a little bit nicer to have the LICENSE
and NOTICE
>> file in the root directory (if the python wheels format
permits) to make
>> sure it is obvious that all binary release contents are covered
by these
>> files.
>>
>>
>> [1]
>>
http://www.apache.org/legal/release-policy.html#licensing-documentation
>>
>>
>>
>>
>> On Wed, Apr 29, 2020 at 11:10 AM Congxian Qiu
mailto:qcx978132...@gmail.com>>
>> wrote:
>>
>>> Thanks a lot for creating a release candidate for 1.10.1!
>>>
>>> +1 from my side
>>>
>>> checked
>>> - md5/gpg, ok
>>> - source does not contain any binaries, ok
>>> - pom points to the same version 1.10.1, ok
>>> - README file does not contain anything unexpected, ok
>>> - maven clean package -DskipTests, ok
>>> - maven clean verify, encounter a test timeout exception, but
I think it
>>> does not block the RC(have created an issue[1] to track it),
>>> - run demos on a stand-alone cluster, ok
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-17458
>>> Best,
>>> Congxian
>>>
>>>
>>> Robert Metzger mailto:rmetz...@apache.org>> 于2020年4月29日周三 下午2:54写道:
>>>
 Thanks a lot for creating a release candidate for 1.10.1!

 I'm not sure, but I think found a potential issue in the
release while
 checking dependency changes on the ElasticSearch7 connector:


>>

https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38
 In this change, "com.carrotsearch:hppc" has been added to the
shaded
>> jar
>>> (

>>

https://repository.apache.org/content/repositories/orgapacheflink-1362/org/apache/flink/flink-sql-connector-elasticsearch7_2.11/1.10.1/flink-sql-connector-elasticsearch7_2.11-1.10.1.jar
 ),
 without including proper mention of that dependency in
>> "META-INF/NOTICE".

 My checking notes:

 - checked the diff for dependency changes:

>>
https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1
 (w/o
 <
>>

https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1(w/o
 release commit:


>>

https://github.com/apache/flink/compare/release-1.10.0...0e2b520ec60cc11dce210bc38e574a05fa5a7734
 )
    - flink-connector-hive sets the derby version for test-scoped
 dependencies:


>>

https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-f4dbf40e8457457eb01ae22b53baa3ec
       - no NOTICE file found, but this module does not
forward binaries.
    - kafka 0.10 minor version upgrade:


>>

https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-0287a3f3c37b454c583b6b56de1392e4
        - NOTICE change found
     - ES7 changes shading:


>>


[VOTE] FLIP-36 - Support Interactive Programming in Flink Table API

2020-04-30 Thread Xuannan Su
Hi all,

I'd like to start the vote for FLIP-36[1], which has been discussed in
thread[2].

The vote will be open for 72h, until May 3, 2020, 07:00 AM UTC, unless
there's an objection.

Best,
Xuannan

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-36-Support-Interactive-Programming-in-Flink-Table-API-td40592.html


Re: [VOTE] Release 1.10.1, release candidate #1

2020-04-30 Thread Jark Wu
Hi Chesnay,

I mean `flink-sql-connector-elasticsearch6`.
Because this dependency change on elasticsearch7 [1] totally follows what
elasticsearch6 does, and they have almost the same dependencies.

Best,
Jark

[1]:
https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38

On Thu, 30 Apr 2020 at 14:44, Chesnay Schepler  wrote:

> ES6 isn't bundling these dependencies.
>
> On 29/04/2020 17:29, Jark Wu wrote:
> > Looks like the ES NOTICE problem is a long-standing problem, because the
> > ES6 sql connector NOTICE also misses these dependencies.
> >
> > Best,
> > Jark
> >
> > On Wed, 29 Apr 2020 at 17:26, Robert Metzger 
> wrote:
> >
> >> Thanks for taking a look Chesnay. Then let me officially cancel the
> >> release:
> >>
> >> -1 (binding)
> >>
> >>
> >> Another question that I had while checking the release was the
> >> "apache-flink-1.10.1.tar.gz" binary, which I suppose is the python
> >> distribution.
> >> It does not contain a LICENSE and NOTICE file at the root level (which
> is
> >> okay [1] for binary releases), but in the "pyflink/" directory. There is
> >> also a "deps/" directory, which contains a full distribution of Flink,
> >> without any license files.
> >> I believe it would be a little bit nicer to have the LICENSE and NOTICE
> >> file in the root directory (if the python wheels format permits) to make
> >> sure it is obvious that all binary release contents are covered by these
> >> files.
> >>
> >>
> >> [1]
> >> http://www.apache.org/legal/release-policy.html#licensing-documentation
> >>
> >>
> >>
> >>
> >> On Wed, Apr 29, 2020 at 11:10 AM Congxian Qiu 
> >> wrote:
> >>
> >>> Thanks a lot for creating a release candidate for 1.10.1!
> >>>
> >>> +1 from my side
> >>>
> >>> checked
> >>> - md5/gpg, ok
> >>> - source does not contain any binaries, ok
> >>> - pom points to the same version 1.10.1, ok
> >>> - README file does not contain anything unexpected, ok
> >>> - maven clean package -DskipTests, ok
> >>> - maven clean verify, encounter a test timeout exception, but I think
> it
> >>> does not block the RC(have created an issue[1] to track it),
> >>> - run demos on a stand-alone cluster, ok
> >>>
> >>> [1] https://issues.apache.org/jira/browse/FLINK-17458
> >>> Best,
> >>> Congxian
> >>>
> >>>
> >>> Robert Metzger  于2020年4月29日周三 下午2:54写道:
> >>>
>  Thanks a lot for creating a release candidate for 1.10.1!
> 
>  I'm not sure, but I think found a potential issue in the release while
>  checking dependency changes on the ElasticSearch7 connector:
> 
> 
> >>
> https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38
>  In this change, "com.carrotsearch:hppc" has been added to the shaded
> >> jar
> >>> (
> 
> >>
> https://repository.apache.org/content/repositories/orgapacheflink-1362/org/apache/flink/flink-sql-connector-elasticsearch7_2.11/1.10.1/flink-sql-connector-elasticsearch7_2.11-1.10.1.jar
>  ),
>  without including proper mention of that dependency in
> >> "META-INF/NOTICE".
> 
>  My checking notes:
> 
>  - checked the diff for dependency changes:
> 
> >>
> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1
>  (w/o
>  <
> >>
> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1(w/o
>  release commit:
> 
> 
> >>
> https://github.com/apache/flink/compare/release-1.10.0...0e2b520ec60cc11dce210bc38e574a05fa5a7734
>  )
> - flink-connector-hive sets the derby version for test-scoped
>  dependencies:
> 
> 
> >>
> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-f4dbf40e8457457eb01ae22b53baa3ec
>    - no NOTICE file found, but this module does not forward
> binaries.
> - kafka 0.10 minor version upgrade:
> 
> 
> >>
> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-0287a3f3c37b454c583b6b56de1392e4
> - NOTICE change found
>  - ES7 changes shading:
> 
> 
> >>
> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-bd2211176ab6e7fa83ffeaa89481ff38
>    - problem found
> - Influxdb version change
> 
> 
> >>
> https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-0d2cce4875b2804ab89c3343a7de1ca6
>    - NOTICE change found
> 
> 
> 
>  On Fri, Apr 24, 2020 at 8:10 PM Yu Li  wrote:
> 
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for version
> >> 1.10.1,
> >>> as
> > follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> >
> > The complete staging area is available for your review, which
> >> includes:
> > * JIRA release notes [1],
> > * the official Apache source release 

[jira] [Created] (FLINK-17472) StreamExecutionEnvironment and ExecutionEnvironment in Yarn mode

2020-04-30 Thread RocMarshal (Jira)
RocMarshal created FLINK-17472:
--

 Summary: StreamExecutionEnvironment and ExecutionEnvironment in 
Yarn mode
 Key: FLINK-17472
 URL: https://issues.apache.org/jira/browse/FLINK-17472
 Project: Flink
  Issue Type: New Feature
  Components: Client / Job Submission, Deployment / YARN
Affects Versions: 1.10.0
Reporter: RocMarshal


I would like to have such a submission mode: build the job directly in the 
Environment and then submit it in YARN mode. Just like RemoteStreamEnvironment, 
as long as you specify the parameters of the YARN cluster (host, port), or the 
YARN configuration directory and HADOOP_USER_NAME, you can submit the topology 
built by the Env.

This submission method should minimize the transmission of the resources YARN 
needs to start the JobManager and TaskManager runners, so that Flink can deploy 
tasks on the YARN cluster as quickly as possible.

A simple demo is shown in the attachment. The parameter named 'env' contains all 
the operators of the job, like sources, maps, etc.
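
For illustration only, a sketch of the kind of API this asks for; createRemoteEnvironment 
exists today, while the YARN variant below is purely hypothetical and does not exist in 
Flink 1.10:
{code}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteSubmitDemo {
    public static void main(String[] args) throws Exception {
        // Existing remote submission, for comparison:
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host", 8081, "/path/to/userJob.jar");

        // Hypothetical YARN analogue this ticket suggests (does NOT exist today):
        // StreamExecutionEnvironment env = YarnStreamEnvironment.create(
        //         "/etc/hadoop/conf", "hadoopUser", "/path/to/userJob.jar");

        env.fromElements(1, 2, 3).print();
        env.execute("job built in the environment and submitted remotely");
    }
}
{code}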

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.10.1, release candidate #1

2020-04-30 Thread Chesnay Schepler

ES6 isn't bundling these dependencies.

On 29/04/2020 17:29, Jark Wu wrote:

Looks like the ES NOTICE problem is a long-standing problem, because the
ES6 sql connector NOTICE also misses these dependencies.

Best,
Jark

On Wed, 29 Apr 2020 at 17:26, Robert Metzger  wrote:


Thanks for taking a look Chesnay. Then let me officially cancel the
release:

-1 (binding)


Another question that I had while checking the release was the
"apache-flink-1.10.1.tar.gz" binary, which I suppose is the python
distribution.
It does not contain a LICENSE and NOTICE file at the root level (which is
okay [1] for binary releases), but in the "pyflink/" directory. There is
also a "deps/" directory, which contains a full distribution of Flink,
without any license files.
I believe it would be a little bit nicer to have the LICENSE and NOTICE
file in the root directory (if the python wheels format permits) to make
sure it is obvious that all binary release contents are covered by these
files.


[1]
http://www.apache.org/legal/release-policy.html#licensing-documentation




On Wed, Apr 29, 2020 at 11:10 AM Congxian Qiu 
wrote:


Thanks a lot for creating a release candidate for 1.10.1!

+1 from my side

checked
- md5/gpg, ok
- source does not contain any binaries, ok
- pom points to the same version 1.10.1, ok
- README file does not contain anything unexpected, ok
- maven clean package -DskipTests, ok
- maven clean verify, encounter a test timeout exception, but I think it
does not block the RC(have created an issue[1] to track it),
- run demos on a stand-alone cluster, ok

[1] https://issues.apache.org/jira/browse/FLINK-17458
Best,
Congxian


Robert Metzger  于2020年4月29日周三 下午2:54写道:


Thanks a lot for creating a release candidate for 1.10.1!

I'm not sure, but I think found a potential issue in the release while
checking dependency changes on the ElasticSearch7 connector:



https://github.com/apache/flink/commit/1827e4dddfbac75a533ff2aea2f3e690777a3e5e#diff-bd2211176ab6e7fa83ffeaa89481ff38

In this change, "com.carrotsearch:hppc" has been added to the shaded

jar

(



https://repository.apache.org/content/repositories/orgapacheflink-1362/org/apache/flink/flink-sql-connector-elasticsearch7_2.11/1.10.1/flink-sql-connector-elasticsearch7_2.11-1.10.1.jar

),
without including proper mention of that dependency in

"META-INF/NOTICE".


My checking notes:

- checked the diff for dependency changes:


https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1

(w/o
<

https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1(w/o

release commit:



https://github.com/apache/flink/compare/release-1.10.0...0e2b520ec60cc11dce210bc38e574a05fa5a7734

)
   - flink-connector-hive sets the derby version for test-scoped
dependencies:



https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-f4dbf40e8457457eb01ae22b53baa3ec

  - no NOTICE file found, but this module does not forward binaries.
   - kafka 0.10 minor version upgrade:



https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-0287a3f3c37b454c583b6b56de1392e4

   - NOTICE change found
- ES7 changes shading:



https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-bd2211176ab6e7fa83ffeaa89481ff38

  - problem found
   - Influxdb version change



https://github.com/apache/flink/compare/release-1.10.0...release-1.10.1-rc1#diff-0d2cce4875b2804ab89c3343a7de1ca6

  - NOTICE change found



On Fri, Apr 24, 2020 at 8:10 PM Yu Li  wrote:


Hi everyone,

Please review and vote on the release candidate #1 for version

1.10.1,

as

follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which

includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases

to

be

deployed to dist.apache.org [2], which are signed with the key with
fingerprint D8D3D42E84C753CA5F170BDF93C07902771AB743 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.10.1-rc1" [5],
* website pull request listing the new release and adding

announcement

blog

post [6].

The vote will be open for at least 72 hours. It is adopted by

majority

approval, with at least 3 PMC affirmative votes.

Thanks,
Yu

[1]



https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346891

<


https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12346891

[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.1-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4]


https://repository.apache.org/content/repositories/orgapacheflink-1362/

[5]



https://github.com/apache/flink/commit/84b74cc0e21981bf6feceb74b48d7a9d3e215dc5

[6] https://github.com/apache/flink-web/pull/330





Re: [DISCUSS] Hierarchies in ConfigOption

2020-04-30 Thread Dawid Wysakowicz
Hi all,

I'd like to start with a comment that I am ok with the current state of
the FLIP-122 if there is a strong preference for it. Nevertheless I
still like the idea of adding `type` to the `format` to have it as
`format.type` = `json`.

I wanted to clarify a few things though:

@Jingsong As far as I see it most of the users copy/paste the properties
from the documentation to the SQL, so I don't think additional four
characters are too cumbersome. Plus if you force the additional suffix
onto all the options of a format you introduce way more boilerplate than
if we added the `type/kind/name`

@Kurt I agree that we cannot force it, but I think it is more of a
question to set standards/implicit contracts on the properties. What you
described with

'format' = 'csv',
'csv.allow-comments' = 'true',
'csv.ignore-parse-errors' = 'true'

would not work though as the `format` prefix is mandatory in the sources
as only the properties with the `format` prefix will be passed to the format factory
in the majority of cases. We already have some implicit contracts.
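
For illustration only (a plain-JDK sketch of that implicit contract, not the actual
factory logic): the source keeps just the keys under the `format` prefix before
handing them to the format factory:

import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

class FormatPrefixExample {
    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("connector", "kafka");
        options.put("format", "csv");
        options.put("format.allow-comments", "true");
        options.put("csv.allow-comments", "true"); // would never reach the format factory

        Map<String, String> formatOptions = options.entrySet().stream()
                .filter(e -> e.getKey().startsWith("format"))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
        // keeps "format" and "format.allow-comments"; "csv.allow-comments" is dropped
        System.out.println(formatOptions);
    }
}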

@Forward I did not necessarily get the example. Aren't json and bson two
separate formats? Do you mean you can have those two at the same time?
Why do you need to differentiate the options for each? The way I see it is:

‘format(.name)' = 'json',
‘format.fail-on-missing-field' = 'false'

or

‘format(.name)' = 'bson',
‘format.fail-on-missing-field' = 'false'

@Benchao I'd be fine with any of name, kind, type(this we already had in
the past)

Best,
Dawid

On 30/04/2020 04:17, Forward Xu wrote:
> Here I have a little doubt. At present, our json only supports the
> conventional json format. If we need to implement json with bson, json with
> avro, etc., how should we express it?
> Do you need like the following:
>
> ‘format.name' = 'json',
>
> ‘format.json.fail-on-missing-field' = 'false'
>
>
> ‘format.name' = 'bson',
>
> ‘format.bson.fail-on-missing-field' = ‘false'
>
>
> Best,
>
> Forward
>
> Benchao Li  于2020年4月30日周四 上午9:58写道:
>
>> Thanks Timo for staring the discussion.
>>
>> Generally I like the idea to keep the config align with a standard like
>> json/yaml.
>>
>> From the user's perspective, I don't use table configs from a config file
>> like yaml or json for now,
>> And it's ok to change it to yaml like style. Actually we didn't know that
>> this could be a yaml like
>> configuration hierarchy. If it has a hierarchy, we maybe consider that in
>> the future to load the
>> config from a yaml/json file.
>>
>> Regarding the name,
>> 'format.kind' looks fine to me. However there is another name from the top
>> of my head:
>> 'format.name', WDYT?
>>
>> Dawid Wysakowicz  于2020年4月29日周三 下午11:56写道:
>>
>>> Hi all,
>>>
>>> I also wanted to share my opinion.
>>>
>>> When talking about a ConfigOption hierarchy we use for configuring Flink
>>> cluster I would be a strong advocate for keeping a yaml/hocon/json/...
>>> compatible style. Those options are primarily read from a file and thus
>>> should at least try to follow common practices for nested formats if we
>>> ever decide to switch to one.
>>>
>>> Here the question is about the properties we use in SQL statements. The
>>> origin/destination of these usually will be external catalog, usually in
>> a
>>> flattened(key/value) representation so I agree it is not as important as
>> in
>>> the aforementioned case. Nevertheless having a yaml based catalog or
>> being
>>> able to have e.g. yaml based snapshots of a catalog in my opinion is
>>> appealing. At the same time cost of being able to have a nice
>>> yaml/hocon/json representation is just adding a single suffix to a
>>> single(at most 2 key + value) property. The question is between `format`
>> =
>>> `json` vs `format.kind` = `json`. That said I'd be slighty in favor of
>>> doing it.
>>>
>>> Just to have a full picture. Both cases can be represented in yaml, but
>>> the difference is significant:
>>> format: 'json'
>>> format.option: 'value'
>>>
>>> vs
>>> format:
>>> kind: 'json'
>>>
>>> option: 'value'
>>>
>>> Best,
>>> Dawid
>>>
>>> On 29/04/2020 17:13, Flavio Pompermaier wrote:
>>>
>>> Personally I don't have any preference here.  Compliance wih standard
>> YAML
>>> parser is probably more important
>>>
>>> On Wed, Apr 29, 2020 at 5:10 PM Jark Wu  wrote:
>>>
 From a user's perspective, I prefer the shorter one "format=json",
>> because
 it's more concise and straightforward. The "kind" is redundant for
>> users.
 Is there a real case requires to represent the configuration in JSON
 style?
 As far as I can see, I don't see such requirement, and everything works
 fine by now.

 So I'm in favor of "format=json". But if the community insist to follow
 code style on this, I'm also fine with the longer one.

 Btw, I also CC user mailing list to listen more user's feedback.
>> Because I
 think this is relative to usability.

 Best,
 Jark

 On Wed, 29 Apr 2020 at 22:09, Chesnay Schepler 
 wrote:


Re: [DISCUSS] flink-connector-rabbitmq api changes

2020-04-30 Thread Dawid Wysakowicz
Hi Karim,

Sorry you did not have the best first time experience. You certainly did
everything right which I definitely appreciate.

The problem in that particular case, as I see it, is that RabbitMQ is
not very actively maintained and therefore it is not easy to find a
committer willing to take on this topic. The point of connectors not
being properly maintained was raised a few times in the past on the ML.
One of the ideas how to improve the situation there was to start a
https://flink-packages.org/ page. The idea is to ask active users of
certain connectors to maintain those connectors outside of the core
project, while giving them a platform within the community where they
can make their modules visible. That way it is possible to overcome the
lack of capabilities within the core committers without losing much on
the visibility.

I would kindly ask you to consider that path, if you are interested. You
can of course also wait/reach out to more committers if you feel strong
about contributing those changes back to the Flink repository itself.

Best,

Dawid

On 30/04/2020 07:29, seneg...@gmail.com wrote:
> Hello,
>
> I am new to the mailing list and to contributing in Big opensource projects
> in general and i don't know if i did something wrong or should be more
> patient :)
>
> I put a topic for discussion as per the contribution guide "
> https://flink.apache.org/contributing/how-to-contribute.html; almost a week
> ago and since what i propose is not backward compatible it needs to be
> discussed here before opening a ticket and moving forward.
>
> So my question is. Will someone pick the discussion up ? or at least
> someone would say that this is not the way to go ? or should i assume from
> the silence that it's not important / relevant to the project ? Should i
> track the author of the connector and send him directly ?
>
> Thank you for your time.
>
> Regards,
> Karim Mansour
>
> On Fri, Apr 24, 2020 at 11:17 AM seneg...@gmail.com 
> wrote:
>
>> Dear All,
>>
>> I want to propose a change to the current RabbitMQ connector.
>>
>> Currently the RMQSource is extracting the body of the message which is a
>> byte array and pass it to a an instance of a user implementation of the
>> DeserializationSchema class to deserialize the body of the message. It
>> also uses the correlation id from the message properties to deduplicate the
>> message.
>>
>> What i want to propose is instead of taking a implementation of a
>> DeserializationSchema in the RMQSource constructor, actually have the
>> user implement an interface that would have methods both the output for the
>> RMQSource and the correlation id used not only from the body of the message
>> but also to it's metadata and properties thus giving the connector much
>> more power and flexibility.
>>
>> This of course would mean a breaking API change for the RMQSource since it
>> will no longer take a DeserializationSchema but an implementation of a
>> predefined interface that has the methods to extract both the output of the
>> RMQSource and the to extract the unique message id as well.
>>
>> The reason behind that is that in my company we were relaying on another
>> property the message id for deduplication of the messages and i also needed
>> that information further down the pipeline and there was absolutely no way
>> of getting it other than modifying the RMQSource.
>>
>> I already have code written but as the rules dictates i have to run it by
>> you guys first before i attempt to create a Jira ticket :)
>>
>> Let me know what you think.
>>
>> Regards,
>> Karim Mansour
>>



signature.asc
Description: OpenPGP digital signature