Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Xintong Song
Thanks for driving the discussion, @Jark. The conclusion LGTM. @Yun, Since the streaming operators did not use managed memory previously, I don't think it's possible for any use cases with managed memory streaming operators to align with the previous behaviors. No matter how the consumer weights

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Yun Tang
+1 (binding) Best Yun Tang From: Wei Zhong Sent: Wednesday, January 6, 2021 14:07 To: dev Subject: Re: [VOTE] FLIP-153: Support state access in Python DataStream API +1 (non-binding) Best, Wei > 在 2021年1月6日,14:05,Xingbo Huang 写道: > > +1 (non-binding) > >

[jira] [Created] (FLINK-20861) Provide an option for serializing DECIMALs in JSON as plain number instead of scientific notation

2021-01-05 Thread Q Kang (Jira)
Q Kang created FLINK-20861: -- Summary: Provide an option for serializing DECIMALs in JSON as plain number instead of scientific notation Key: FLINK-20861 URL: https://issues.apache.org/jira/browse/FLINK-20861

Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Yun Tang
The design looks good to me now. +1 to start the vote if there are no more comments.. Best Yun Tang From: Dian Fu Sent: Tuesday, January 5, 2021 13:32 To: dev@flink.apache.org Subject: Re: [DISCUSS] FLIP-153: Support state access in Python DataStream API

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Yun Tang
I think using managed memory within streaming operator is a good idea and I just have a question over last conclusion: If both OPERATOR and STATE_BACKEND set as 70 to align with previous behavior, what will happen if one slot has both consumers of managed streaming operator and state backend?

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
> I think filter expressions and grouping sets are semantic arguments instead of utilities. If we want to push them into sources, the connector developers should be aware of them.Wrapping them in a context implicitly is error-prone that the existing connector will produce wrong results when

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
I think filter expressions and grouping sets are semantic arguments instead of utilities. If we want to push them into sources, the connector developers should be aware of them. Wrapping them in a context implicitly is error-prone that the existing connector will produce wrong results when

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Wei Zhong
+1 (non-binding) Best, Wei > 在 2021年1月6日,14:05,Xingbo Huang 写道: > > +1 (non-binding) > > Best, > Xingbo > > Dian Fu 于2021年1月6日周三 下午1:38写道: > >> +1 (binding) >> >>> 在 2021年1月6日,下午1:12,Shuiqiang Chen 写道: >>> >>> Hi devs, >>> >>> The discussion of the FLIP-153 [1] seems has reached a

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Xingbo Huang
+1 (non-binding) Best, Xingbo Dian Fu 于2021年1月6日周三 下午1:38写道: > +1 (binding) > > > 在 2021年1月6日,下午1:12,Shuiqiang Chen 写道: > > > > Hi devs, > > > > The discussion of the FLIP-153 [1] seems has reached a consensus through > > the mailing thread [2]. I would like to start a vote for it. > > > >

Task scheduling of Flink

2021-01-05 Thread penguin.
Hello! Do you know how to modify the task scheduling method of Flink?

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi, I'm also curious about aggregate with filter (COUNT(1) FILTER(WHERE d > 1)). Can we push it down? I'm not sure that a single call expression can express it, and how we should embody it and convey it to users. Best, Jingsong On Wed, Jan 6, 2021 at 1:36 PM Jingsong Li wrote: > Hi Jark, > >

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Jark Wu
Thanks all for the discussion. I have created an issue FLINK-20860 [1] to support this. In conclusion, we will extend the configuration `taskmanager.memory.managed.consumer-weights` to have 2 more consumer kinds: OPERATOR and STATE_BACKEND, the available consumer kinds will be : * `OPERATOR`

[jira] [Created] (FLINK-20860) Allow streaming operators to use managed memory

2021-01-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-20860: --- Summary: Allow streaming operators to use managed memory Key: FLINK-20860 URL: https://issues.apache.org/jira/browse/FLINK-20860 Project: Flink Issue Type: Sub-task

Re: [VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Dian Fu
+1 (binding) > 在 2021年1月6日,下午1:12,Shuiqiang Chen 写道: > > Hi devs, > > The discussion of the FLIP-153 [1] seems has reached a consensus through > the mailing thread [2]. I would like to start a vote for it. > > The vote will be opened until 11th January (72h), unless there is an > objection or

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi Jark, I don't want to limit this interface to LocalAgg Push down. Actually, sometimes, we can push whole aggregation to source too. So, this rule can do something more advanced. For example, we can push down group sets to source too, for the SQL: "GROUP BY GROUPING SETS (f1, f2)". Then, we

[VOTE] FLIP-153: Support state access in Python DataStream API

2021-01-05 Thread Shuiqiang Chen
Hi devs, The discussion of the FLIP-153 [1] seems has reached a consensus through the mailing thread [2]. I would like to start a vote for it. The vote will be opened until 11th January (72h), unless there is an objection or no enough votes. Best, Shuiqiang [1]:

Re: [DISCUSS] FLIP-155: Introduce a few convenient operations in Table API

2021-01-05 Thread Dian Fu
Hi all, I have updated the FLIP about temporal join, sql hints and window TVF. Regards, Dian > 在 2021年1月5日,上午11:58,Dian Fu 写道: > > Thanks a lot for your comments! > > Regarding to Python Table API examples: I thought it should be > straightforward about how to use these operations in Python

[jira] [Created] (FLINK-20859) java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroParquetWriter

2021-01-05 Thread jack sun (Jira)
jack sun created FLINK-20859: Summary: java.lang.NoClassDefFoundError: org/apache/parquet/avro/AvroParquetWriter Key: FLINK-20859 URL: https://issues.apache.org/jira/browse/FLINK-20859 Project: Flink

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
I think this may be over designed. We should have confidence in the interface we design, the interface should be stable. Wrapping things in a big context has a cost of losing user convenience. Foremost, we don't see any parameters to add in the future. Do you know any potential parameters? Best,

[jira] [Created] (FLINK-20857) Separate the implementation of batch window aggregate nodes

2021-01-05 Thread godfrey he (Jira)
godfrey he created FLINK-20857: -- Summary: Separate the implementation of batch window aggregate nodes Key: FLINK-20857 URL: https://issues.apache.org/jira/browse/FLINK-20857 Project: Flink

[jira] [Created] (FLINK-20858) Port StreamExecPythonGroupWindowAggregate and BatchExecPythonGroupWindowAggregate to Java

2021-01-05 Thread godfrey he (Jira)
godfrey he created FLINK-20858: -- Summary: Port StreamExecPythonGroupWindowAggregate and BatchExecPythonGroupWindowAggregate to Java Key: FLINK-20858 URL: https://issues.apache.org/jira/browse/FLINK-20858

[jira] [Created] (FLINK-20856) Separate the implementation of stream window aggregate nodes

2021-01-05 Thread godfrey he (Jira)
godfrey he created FLINK-20856: -- Summary: Separate the implementation of stream window aggregate nodes Key: FLINK-20856 URL: https://issues.apache.org/jira/browse/FLINK-20856 Project: Flink

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jingsong Li
Hi Sebastian, Well, I mean: `boolean applyAggregates(int[] groupingFields, List aggregateExpressions, DataType producedDataType);` VS ``` boolean applyAggregates(Aggregation agg); interface Aggregation { int[] groupingFields(); List aggregateExpressions(); DataType producedDataType(); }

[jira] [Created] (FLINK-20855) Calculating numBuckets exceeds the maximum value of int and got a negative number

2021-01-05 Thread JieFang.He (Jira)
JieFang.He created FLINK-20855: -- Summary: Calculating numBuckets exceeds the maximum value of int and got a negative number Key: FLINK-20855 URL: https://issues.apache.org/jira/browse/FLINK-20855

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Sebastian Liu
Hi Jinsong, Thx a lot for your suggestion. These points really need to be clear in the proposal. For the semantic problem, I think the main point is the different returned data types for the target aggregate function and the row format returned by the underlying storage. That's why we provide

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Arvid Heise
For 2) the race condition, I was more thinking of still injecting the barrier at the source in all cases, but having some kind of short-cut to immediately execute the RPC inside the respective taskmanager. However, that may prove hard in case of dynamic scale-ins. Nevertheless, because of this

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Yun Gao
Hi Aljoscha, Very thanks for the feedbacks! For the second issue, I'm indeed thinking the race condition between deciding to trigger and operator get finished. And for this point, > One thought here is this: will there ever be intermediate operators that > should be

Re: [DISCUSS] Drop Scala 2.11

2021-01-05 Thread Jeff Zhang
Glad to see someone in community would like to drive this effort. If scala 2.13 can do whatever scala 2.11 can do in flink (such as support scala-shell, scala lambda udf and etc), then I would be 100% support of dropping scala 2.11. Aljoscha Krettek 于2021年1月5日周二 下午11:01写道: > There is some new

[jira] [Created] (FLINK-20854) Introduce BytesMultiMap to support buffering records

2021-01-05 Thread Jark Wu (Jira)
Jark Wu created FLINK-20854: --- Summary: Introduce BytesMultiMap to support buffering records Key: FLINK-20854 URL: https://issues.apache.org/jira/browse/FLINK-20854 Project: Flink Issue Type:

Re: [DISCUSS] Drop Scala 2.11

2021-01-05 Thread Aljoscha Krettek
There is some new enthusiasm for bringing Scala 2.13 support to Flink: https://issues.apache.org/jira/browse/FLINK-13414. One of the assumed prerequisites for this is dropping support for Scala 2.11 because it will be too hard (impossible) to try and support three Scala versions at the same

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Yun Gao
Hi Avrid, Very thanks for the feedbacks! For the second issue, sorry I think I might not make it very clear, I'm initially thinking the case that for example for a job with graph A -> B -> C, when we compute which tasks to trigger, A is still running, so we trigger A

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Aljoscha Krettek
On 2021/01/05 10:16, Arvid Heise wrote: 1. I'd think that this is an orthogonal issue, which I'd solve separately. My gut feeling says that this is something we should only address for new sinks where we decouple the semantics of commits and checkpoints anyways. @Aljoscha Krettek any idea on

[jira] [Created] (FLINK-20853) Add reader schema null check for AvroDeserializationSchema when recordClazz is GenericRecord

2021-01-05 Thread hailong wang (Jira)
hailong wang created FLINK-20853: Summary: Add reader schema null check for AvroDeserializationSchema when recordClazz is GenericRecord Key: FLINK-20853 URL: https://issues.apache.org/jira/browse/FLINK-20853

[jira] [Created] (FLINK-20852) Enrich back pressure stats per subtask in the WebUI

2021-01-05 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-20852: -- Summary: Enrich back pressure stats per subtask in the WebUI Key: FLINK-20852 URL: https://issues.apache.org/jira/browse/FLINK-20852 Project: Flink

[jira] [Created] (FLINK-20851) flink datagen produce NULL value

2021-01-05 Thread appleyuchi (Jira)
appleyuchi created FLINK-20851: -- Summary: flink datagen produce NULL value Key: FLINK-20851 URL: https://issues.apache.org/jira/browse/FLINK-20851 Project: Flink Issue Type: Bug

Re: Apache Pinot Sink

2021-01-05 Thread Poerschke, Mats
Just as a short addition: We plan to contribute the sink to Apache Bahir. Best regards Mats Pörschke > On 5. Jan 2021, at 13:21, Poerschke, Mats > wrote: > > Hi all, > > we want to contribute a sink connector for Apache Pinot. The following > briefly describes the planned control flow.

Apache Pinot Sink

2021-01-05 Thread Poerschke, Mats
Hi all, we want to contribute a sink connector for Apache Pinot. The following briefly describes the planned control flow. Please feel free to comment on any of its aspects. Background Apache Pinot is a large-scale real-time data ingestion engine working on data segments internally. The

[jira] [Created] (FLINK-20850) Analyze whether CoLocationConstraints and CoLocationGroup can be removed

2021-01-05 Thread Matthias (Jira)
Matthias created FLINK-20850: Summary: Analyze whether CoLocationConstraints and CoLocationGroup can be removed Key: FLINK-20850 URL: https://issues.apache.org/jira/browse/FLINK-20850 Project: Flink

[CVE-2020-17518] Apache Flink directory traversal attack: remote file writing through the REST API

2021-01-05 Thread Robert Metzger
CVE-2020-17518: Apache Flink directory traversal attack: remote file writing through the REST API Vendor: The Apache Software Foundation Versions Affected: 1.5.1 to 1.11.2 Description: Flink 1.5.1 introduced a REST handler that allows you to write an uploaded file to an arbitrary location on

[CVE-2020-17519] Apache Flink directory traversal attack: reading remote files through the REST API

2021-01-05 Thread Robert Metzger
CVE-2020-17519: Apache Flink directory traversal attack: reading remote files through the REST API Vendor: The Apache Software Foundation Versions Affected: 1.11.0, 1.11.1, 1.11.2 Description: A change introduced in Apache Flink 1.11.0 (and released in 1.11.1 and 1.11.2 as well) allows

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Xintong Song
> > Would the default weight for OPERATOR and STATE_BACKEND be the same value? > I would say yes, to align with previous behaviors. Thank you~ Xintong Song On Tue, Jan 5, 2021 at 5:51 PM Till Rohrmann wrote: > +1 for Jark's and Xintong's proposal. > > Would the default weight for OPERATOR

Re: [PSA] Configure "Save Actions" only for Java files

2021-01-05 Thread Till Rohrmann
This is very helpful. Thanks a lot Aljoscha! Cheers, Till On Tue, Jan 5, 2021 at 10:59 AM Aljoscha Krettek wrote: > If you're using "Save Actions" to auto-format your Java code, as > recommended in [1], you should add a regex in the settings to make sure > that this only formats Java code.

[PSA] Configure "Save Actions" only for Java files

2021-01-05 Thread Aljoscha Krettek
If you're using "Save Actions" to auto-format your Java code, as recommended in [1], you should add a regex in the settings to make sure that this only formats Java code. Otherwise you will get weird results when IntelliJ also formats XML, Markdown or Scala files for you. Best, Aljoscha [1]

Re: [DISCUSS] Allow streaming operators to use managed memory

2021-01-05 Thread Till Rohrmann
+1 for Jark's and Xintong's proposal. Would the default weight for OPERATOR and STATE_BACKEND be the same value? Cheers, Till On Tue, Jan 5, 2021 at 6:39 AM Jingsong Li wrote: > +1 for allowing streaming operators to use managed memory. > > The memory use of streams requires some hierarchy,

Re: [DISCUSS] FLIP-147: Support Checkpoints After Tasks Finished

2021-01-05 Thread Arvid Heise
Hi Yun, 1. I'd think that this is an orthogonal issue, which I'd solve separately. My gut feeling says that this is something we should only address for new sinks where we decouple the semantics of commits and checkpoints anyways. @Aljoscha Krettek any idea on this one? 2. I'm not sure I get it

Re: Support local aggregate push down for Blink batch planner

2021-01-05 Thread Jark Wu
Thanks for the update. The proposal looks good to me now. Best, Jark On Tue, 5 Jan 2021 at 14:44, Jingsong Li wrote: > Thanks for your proposal! Sebastian. > > +1 for SupportsAggregatePushDown. The above wonderful discussion has > solved many of my concerns. > > ## Semantic problems > > We