Re: (no subject)
Hi Liwen, you don't need contributor permissions to contribute to Apache Flink. Just open or pick an issue and ping a committer to assign the ticket to you.

Best,
Shawn Huang

Liwen Liu (刘力文) wrote on Fri, Dec 25, 2020 at 6:24 PM:
> Hi Guys,
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is Liwen Liu.
>
> 刘力文
> Email: liulw_g...@163.com
[jira] [Created] (FLINK-20773) Support allow-unescaped-control-chars option for JSON format
xiaozilong created FLINK-20773: -- Summary: Support allow-unescaped-control-chars option for JSON format Key: FLINK-20773 URL: https://issues.apache.org/jira/browse/FLINK-20773 Project: Flink Issue Type: Improvement Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile) Affects Versions: 1.12.0 Reporter: xiaozilong Attachments: image-2020-12-25-20-21-50-637.png Can we add an option `allow-unquoted-ctrl-char` for the JSON format? Currently it throws an exception when illegal unquoted control characters exist in the data. !image-2020-12-25-20-21-50-637.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
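For context, the failure described here is standard behavior for strict JSON parsers: a raw, unescaped control character inside a string literal is invalid JSON. A minimal sketch of the strict-vs-lenient distinction, using Python's standard json module purely as an analogy (Flink's JSON format is backed by Jackson, and the option name above is the proposal, not an existing setting):

```python
import json

# A JSON document containing a raw (unescaped) tab character inside a string value.
raw = '{"message": "col1\tcol2"}'

# Strict parsing (the default) rejects unescaped control characters.
strict_failed = False
try:
    json.loads(raw)
except json.JSONDecodeError:
    strict_failed = True

# Lenient parsing accepts them, analogous to what an
# allow-unescaped-control-chars style option would enable.
doc = json.loads(raw, strict=False)
```

Whether such leniency is desirable by default is exactly the trade-off the ticket raises; Jackson exposes it as an opt-in parser feature for the same reason.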
Re: [DISCUSS] Make Temporal Join syntax easier to use
Hi, Shuo Cheng. Thanks for bringing up this topic; I think it's a good idea to simplify the current temporal join syntax. But I think the most valuable thing is to simplify the long keyword FOR SYSTEM_TIME AS OF in the FOR SYSTEM_TIME AS OF L.PROCTIME syntax, rather than to simplify the L.PROCTIME reference to a function PROCTIME() or a pseudo-column _PROCTIME.

> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid
> conflict) to the table by default, just like the pseudo-column of a classic
> database

-1 for the pseudo-column. I don't see any benefit in introducing a pseudo-column concept when the motivation only comes from a minor syntax simplification. For PROCTIME(), I have the same concern: it's not clear who owns the function call, the LHS table, the RHS table, or even the JOIN itself. I think Danny's explanation that the PROCTIME() computation should happen at the row level makes sense, but I think the computation should be triggered when the LHS table's record correlates the version of the RHS table. And the event-time temporal join syntax FOR SYSTEM_TIME AS OF l.rowtime follows a similar semantic: it uses the rowtime of the LHS table's record to correlate the version of the RHS table. Thus I tend to keep the current temporal join syntax; I won't say +1 or -1 for the 'FOR SYSTEM_TIME AS OF PROCTIME()' simplification.

Best,
Leonard

> I don't think adding a pseudo-column is a good solution, for these
> reasons:
>
> - A normal pseudo-column or system column like _rowID_ has
> underlying storage; a user can select the column from a table [1], and each
> row has a deterministic value bound to it (although it
> may change when the row is deleted and inserted again). The PROCTIME
> in Flink behaves more like a row-level runtime attribute, which is
> different for different queries and even different SQL contexts.
>
> - The pseudo-column makes the table schema more complex, but it is only
> useful when we want to use the time attributes.
> >> Actually, we have another simpler solution, i.e., enrich the syntax for
> >> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'.
>
> Maybe our biggest concern is that the syntax does not make it clear who triggers the PROCTIME() computation. Similar to `current_timestamp`, from my understanding, PROCTIME() is computed at the row level by the system, when a record from the LHS is used to join the RHS table. So generally I'm +1 for 'FOR SYSTEM_TIME AS OF PROCTIME()'.
>
> BTW, in the long term, we should think about how to simplify the `FOR SYSTEM_TIME AS OF` syntax, because the proc-time temporal table join is the most common case, and we should make the temporal table join default to 'PROCTIME'. Ideally a normal `A JOIN B` could describe a PROCTIME temporal table join. The solution of adding a pseudo-column seems to deviate further and further from this path.
>
> [1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm
>
> Shuo Cheng wrote on Mon, Dec 21, 2020 at 10:16 AM:
>> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid conflict) to the table by default, just like the pseudo-column of a classic database, e.g., `ROWID` in Oracle.
>>
>> A less elegant solution: actually, we have another simpler option, i.e., enrich the syntax for temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is also very convenient. However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF PROCTIME()' is ambiguous, because it cannot tell where the version time of the temporal table comes from: the left table or the right table? The former is what we want. So I think this solution is not preferred.
>>
>> Looking forward to your feedback~
>>
>> Best,
>> Shuo
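For readers skimming the thread, the two syntax variants under discussion can be sketched in Flink SQL. The table and column names below are invented for illustration; only the FOR SYSTEM_TIME AS OF clauses are the actual subject of the debate:

```sql
-- Current syntax: the processing-time attribute is referenced
-- through a column of the left-hand-side table.
SELECT o.order_id, o.price * r.rate AS converted
FROM Orders AS o
JOIN Rates FOR SYSTEM_TIME AS OF o.proctime AS r
  ON o.currency = r.currency;

-- Proposed simplification: replace the LHS column reference
-- with a call to the built-in PROCTIME() function.
SELECT o.order_id, o.price * r.rate AS converted
FROM Orders AS o
JOIN Rates FOR SYSTEM_TIME AS OF PROCTIME() AS r
  ON o.currency = r.currency;
```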
[jira] [Created] (FLINK-20772) [DISCUSS] RocksDBValueState with TTL occurs NullPointerException when calling update(null) method
Seongbae Chang created FLINK-20772: -- Summary: [DISCUSS] RocksDBValueState with TTL occurs NullPointerException when calling update(null) method Key: FLINK-20772 URL: https://issues.apache.org/jira/browse/FLINK-20772 Project: Flink Issue Type: Bug Components: Runtime / State Backends Affects Versions: 1.11.2 Environment: Flink version: 1.11.2 Flink Cluster: Standalone cluster with 3 Job managers and Task managers on CentOS 7 Reporter: Seongbae Chang

h2. Problem
* I use ValueState for my custom trigger and set TTL for these ValueStates in a RocksDB backend environment.
* I found an error when I used the code below. I know that ValueState.update(null) generally behaves the same as ValueState.clear(). Unfortunately, this error occurs once TTL is enabled.

{code:java}
// My Code
ctx.getPartitionedState(batchTotalSizeStateDesc).update(null);
{code}

* I tested this in Flink 1.11.2, but I think it would be a problem in later versions as well.
* Plus, I'm a beginner. So, if there is any problem in this discussion issue, please give me advice about that, and I'll fix it!

{code:java}
// Error Stacktrace
Caused by: TimerException{org.apache.flink.util.FlinkRuntimeException: Error while adding data to RocksDB}
	... 12 more
Caused by: org.apache.flink.util.FlinkRuntimeException: Error while adding data to RocksDB
	at org.apache.flink.contrib.streaming.state.RocksDBValueState.update(RocksDBValueState.java:108)
	at org.apache.flink.runtime.state.ttl.TtlValueState.update(TtlValueState.java:50)
	at .onProcessingTime(ActionBatchTimeTrigger.java:102)
	at .onProcessingTime(ActionBatchTimeTrigger.java:29)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.onProcessingTime(WindowOperator.java:902)
	at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:498)
	at org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:260)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1220)
	... 11 more
Caused by: java.lang.NullPointerException
	at org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:69)
	at org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:32)
	at org.apache.flink.api.common.typeutils.CompositeSerializer.serialize(CompositeSerializer.java:142)
	at org.apache.flink.contrib.streaming.state.AbstractRocksDBState.serializeValueInternal(AbstractRocksDBState.java:158)
	at org.apache.flink.contrib.streaming.state.AbstractRocksDBState.serializeValue(AbstractRocksDBState.java:178)
	at org.apache.flink.contrib.streaming.state.AbstractRocksDBState.serializeValue(AbstractRocksDBState.java:167)
	at org.apache.flink.contrib.streaming.state.RocksDBValueState.update(RocksDBValueState.java:106)
	... 18 more
{code}

h2. Reason
* It relates to RocksDBValueState being wrapped in TtlValueState.
* In RocksDBValueState (as well as other ValueState types), *.update(null)* is supposed to be caught by the null check. However, with TTL enabled the null check is skipped, and the state then tries to serialize the null value.
{code:java}
// https://github.com/apache/flink/blob/release-1.11/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java#L96-L110
@Override
public void update(V value) {
    if (value == null) {
        clear();
        return;
    }

    try {
        backend.db.put(columnFamily, writeOptions, serializeCurrentKeyWithGroupAndNamespace(), serializeValue(value));
    } catch (Exception e) {
        throw new FlinkRuntimeException("Error while adding data to RocksDB", e);
    }
}
{code}

* This is because TtlValueState wraps the (null) value with the last access timestamp, producing a new non-null TtlValue object that contains the null value.

{code:java}
// https://github.com/apache/flink/blob/release-1.11/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/TtlValueState.java#L47-L51
@Override
public void update(T value) throws IOException {
    accessCallback.run();
    original.update(wrapWithTs(value));
}
{code}

{code:java}
// https://github.com/apache/flink/blob/release-1.11/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/TtlUtils.java#L46-L48
static <V> TtlValue<V> wrapWithTs(V value, long ts) {
    return new TtlValue<>(value, ts);
}
{code}

{code:java}
// https://github.com/apache/flink/blob/release-1.11/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/TtlValue.java
public class TtlValue<T> implements Serializable {
    private static final long serialVersionUID = 5221129704201125020L;

    @Nullable
    private final T
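The failure pattern (a wrapper that unconditionally boxes every value, so the delegate's null check never fires) is easy to demonstrate in isolation. A minimal, language-agnostic sketch in Python; all class names here are invented for illustration and are not Flink's actual classes:

```python
# Sketch of the bug pattern: the TTL wrapper boxes every value,
# so the inner state's None check never sees a plain None.

class TtlBox:
    """Stand-in for TtlValue: pairs a user value with a timestamp."""
    def __init__(self, user_value, ts):
        self.user_value = user_value
        self.ts = ts

class InnerValueState:
    """Stand-in for the RocksDB-backed state with its own null check."""
    def __init__(self):
        self.stored = None
        self.cleared = False

    def update(self, value):
        if value is None:  # analogous to RocksDBValueState's null check
            self.clear()
            return
        # Stand-in for serialization: fails on a boxed None user value.
        if isinstance(value, TtlBox) and value.user_value is None:
            raise TypeError("cannot serialize None user value")
        self.stored = value

    def clear(self):
        self.stored = None
        self.cleared = True

class BuggyTtlState:
    """Wraps unconditionally, so update(None) reaches the serializer."""
    def __init__(self):
        self.inner = InnerValueState()

    def update(self, value):
        self.inner.update(TtlBox(value, ts=0))

class FixedTtlState:
    """One possible fix: short-circuit None to clear() before wrapping."""
    def __init__(self):
        self.inner = InnerValueState()

    def update(self, value):
        if value is None:
            self.inner.clear()
            return
        self.inner.update(TtlBox(value, ts=0))
```

With BuggyTtlState, update(None) blows up during "serialization"; with FixedTtlState, it degrades to clear(), matching the documented equivalence of update(null) and clear(). Where the real fix belongs in Flink's class hierarchy is for the committers to decide.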
(no subject)
Hi Guys, I want to contribute to Apache Flink. Would you please give me the permission as a contributor? My JIRA ID is Liwen Liu.

刘力文 (Liwen Liu)
Email: liulw_g...@163.com
[jira] [Created] (FLINK-20771) Hive partition is not added when there is a lot of data
hehuiyuan created FLINK-20771: - Summary: Hive partition is not added when there is a lot of data Key: FLINK-20771 URL: https://issues.apache.org/jira/browse/FLINK-20771 Project: Flink Issue Type: Bug Components: Connectors / Hive Reporter: hehuiyuan Attachments: image-2020-12-25-18-09-42-707.png, image-2020-12-25-18-15-07-519.png The Hive partition is not added when the amount of data is huge. !image-2020-12-25-18-09-42-707.png|width=437,height=115! Before the partition commit, *inProgressPart* will be reinitialized. !image-2020-12-25-18-15-07-519.png|width=574,height=192! The bucket is still active, so notifyBucketInactive is not executed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-20770) Incorrect description for config option kubernetes.rest-service.exposed.type
Yang Wang created FLINK-20770: - Summary: Incorrect description for config option kubernetes.rest-service.exposed.type Key: FLINK-20770 URL: https://issues.apache.org/jira/browse/FLINK-20770 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Affects Versions: 1.11.3, 1.12.0 Reporter: Yang Wang

{code:java}
public static final ConfigOption<ServiceExposedType> REST_SERVICE_EXPOSED_TYPE =
    key("kubernetes.rest-service.exposed.type")
        .enumType(ServiceExposedType.class)
        .defaultValue(ServiceExposedType.LoadBalancer)
        .withDescription("The type of the rest service (ClusterIP or NodePort or LoadBalancer). " +
            "When set to ClusterIP, the rest service will not be created.");
{code}

The description of this config option is no longer correct: we always create the rest service after the refactoring of the Kubernetes decorators in FLINK-16194. -- This message was sent by Atlassian Jira (v8.3.4#803005)
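One possible corrected wording, sketched here only as a suggestion (the final text belongs to the eventual patch, not this ticket report):

```java
// Hypothetical corrected description: since FLINK-16194 the rest service
// is always created; only its exposed type varies.
.withDescription("The exposed type of the rest service (ClusterIP, NodePort or LoadBalancer). " +
    "The rest service is always created; with ClusterIP it is only reachable from within the cluster.")
```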
Re: [DISCUSS] Releasing Apache Flink 1.12.1
Hi devs,

I'm very glad to announce that all known blocker issues for release-1.12.1 have been resolved. I'm creating our first release candidate now and will start a separate voting thread as soon as RC1 is created. Thanks everyone, and Merry Christmas.

Thank you~
Xintong Song

On Wed, Dec 23, 2020 at 6:07 PM Xintong Song wrote:
> Hi devs,
>
> Updates on the progress of the release.
>
> In the past week, more than 20 issues were resolved for release 1.12.1. Thanks for the efforts.
>
> We still have 3 unresolved release blockers at the moment.
>
> - [FLINK-20648] Unable to restore from savepoints with Kubernetes HA. Consensus has been reached on the solution. @Yang Wang is working on a PR.
> - [FLINK-20654] Unaligned checkpoint recovery may lead to corrupted data stream. @Roman Khachatryan is still investigating the problem.
> - [FLINK-20664] Support setting service account for TaskManager pod. Boris Lublinsky has opened a PR, which is already reviewed and close to mergeable.
>
> Since we are targeting a swift release, I don't intend to further delay the release for other non-blocker issues, unless there's a good reason.
> If there's anything that you believe is absolutely necessary for release 1.12.1, please reach out to me.
> Otherwise, the voting process will be started as soon as the above blockers are addressed.
>
> Thank you~
> Xintong Song
>
> On Mon, Dec 21, 2020 at 10:05 AM Xingbo Huang wrote:
>> Hi Xintong,
>>
>> Thanks a lot for driving this.
>>
>> I'd like to bring one more issue to your attention: https://issues.apache.org/jira/browse/FLINK-20389.
>> This issue occurs quite frequently. Arvid and Kezhu have done some investigation of this issue and it may indicate a bug in the new Source API. It would be great to figure out the root cause of this issue.
>>
>> Best,
>> Xingbo
>>
>> Xintong Song wrote on Fri, Dec 18, 2020 at 7:49 PM:
>>> Thanks for the replies so far.
>>>
>>> I've been reaching out to the owners of the reported issues.
>>> It seems most of the blockers will likely be resolved in the next few days.
>>>
>>> Since some of the issues are quite critical, I'd like to aim for a *feature freeze on Dec. 23rd*, and start the release voting process by the end of this week.
>>>
>>> If there's anything you might need more time for, please reach out to me.
>>>
>>> Thank you~
>>> Xintong Song
>>>
>>> On Fri, Dec 18, 2020 at 3:19 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>> Thanks Xintong for driving this.
>>>>
>>>> I'd like to make two more issues related to the Kinesis connector changes in 1.12.0 blockers for 1.12.1:
>>>> https://issues.apache.org/jira/browse/FLINK-20630
>>>> https://issues.apache.org/jira/browse/FLINK-20629
>>>>
>>>> There are already PRs for these issues from Danny Cranmer; we will try to merge these very soon.
>>>>
>>>> Cheers,
>>>> Gordon
>>>>
>>>> On Fri, Dec 18, 2020 at 1:19 PM Guowei Ma wrote:
>>>>> Thanks for driving this release, Xintong.
>>>>> I think https://issues.apache.org/jira/browse/FLINK-20652 should be addressed.
>>>>>
>>>>> Best,
>>>>> Guowei
>>>>>
>>>>> On Fri, Dec 18, 2020 at 11:53 AM Jingsong Li wrote:
>>>>>> Thanks for volunteering as our release manager, Xintong. +1 for releasing Flink 1.12.1 soon.
>>>>>>
>>>>>> I think https://issues.apache.org/jira/browse/FLINK-20665 should be addressed; I marked it as a Blocker.
>>>>>>
>>>>>> Best,
>>>>>> Jingsong
>>>>>>
>>>>>> On Fri, Dec 18, 2020 at 11:16 AM Yang Wang wrote:
>>>>>>> Hi David,
>>>>>>>
>>>>>>> I will take a look at this ticket FLINK-20648 and try to get it resolved in this release cycle.
>>>>>>>
>>>>>>> @Xintong Song
>>>>>>> One more Kubernetes HA related issue.
>>>>>>> We need to support setting a service account for the TaskManager pod[1]. Even though we have a workaround for this issue, it is not acceptable to always require the default service account to have enough permissions.
>>>>>>>
>>>>>>> [1]. https://issues.apache.org/jira/browse/FLINK-20664
>>>>>>>
>>>>>>> Best,
>>>>>>> Yang
>>>>>>>
>>>>>>> David Morávek wrote on Fri, Dec 18, 2020 at 12:47 AM:
>>>>>>>> Hi, I think https://issues.apache.org/jira/browse/FLINK-20648 should be addressed, as Kubernetes HA was one of the main selling points of this release. WDYT?
>>>>>>>>
>>>>>>>> D.
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On 17. 12. 2020, at 13:54, Yun Tang wrote:
>>>>>>>>>
>>>>>>>>> Thanks for driving this quick-fix release.
>>>>>>>>> +1 for fixing the bug of RocksDB
[jira] [Created] (FLINK-20769) Support minibatch to optimize Python UDAF
Huang Xingbo created FLINK-20769: Summary: Support minibatch to optimize Python UDAF Key: FLINK-20769 URL: https://issues.apache.org/jira/browse/FLINK-20769 Project: Flink Issue Type: Improvement Components: API / Python Reporter: Huang Xingbo Fix For: 1.13.0 Support minibatch to optimize Python UDAF -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [VOTE] Apache Flink Stateful Functions 2.2.2, release candidate #2
Thanks a lot for driving this, Gordon!

+1 (binding)
- verified the checksum and signature
- looked through the changes since 2.2.1 for license changes, verified the NOTICE file of statefun-flink-datastream, and it looks good to me
- the website PR LGTM

Regards,
Dian

> On Dec 24, 2020, at 3:21 PM, Xingbo Huang wrote:
>
> +1 (non-binding)
>
> - Verify checksums and GPG files
> - Verify that the source archives do not contain any binaries
> - Build the source with Maven to ensure all source files have Apache headers (JDK8)
> Command: mvn clean install -Papache-release
> - Run e2e tests (JDK8)
> - Check that all POM files, Dockerfiles, and examples point to the same version. That includes the quickstart artifact POM files.
> - pip install apache_flink_statefun-2.2.2-py3-none-any.whl
> - Verified NOTICE files in statefun-flink-datastream, statefun-flink-distribution and statefun-ridesharing-example-simulator
>
> Best,
> Xingbo
[jira] [Created] (FLINK-20768) Support routing field for Elasticsearch connector
wangsan created FLINK-20768: --- Summary: Support routing field for Elasticsearch connector Key: FLINK-20768 URL: https://issues.apache.org/jira/browse/FLINK-20768 Project: Flink Issue Type: Improvement Components: Connectors / ElasticSearch Reporter: wangsan Routing in Elasticsearch can improve search efficiency for large-scale datasets; we should support this feature as an optional config in the Elasticsearch connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
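To make the proposal concrete, here is a sketch of how such an option might surface in a table DDL. The option name 'sink.routing-field' is hypothetical, invented here purely for illustration; the actual key would be decided during implementation:

```sql
-- Hypothetical DDL: route each document by the user_id column so that
-- all documents for one user land on the same Elasticsearch shard.
CREATE TABLE user_actions (
  user_id STRING,
  action  STRING,
  ts      TIMESTAMP(3)
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://localhost:9200',
  'index' = 'user_actions',
  'sink.routing-field' = 'user_id'   -- hypothetical option name
);
```

Elasticsearch would then use the column's value as the document's routing key, which is what makes targeted (single-shard) searches cheaper for such workloads.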