[jira] [Created] (FLINK-26746) Update the documentation of Timer's Fault Tolerance section
Liwei Lin created FLINK-26746: - Summary: Update the documentation of Timer's Fault Tolerance section Key: FLINK-26746 URL: https://issues.apache.org/jira/browse/FLINK-26746 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Liwei Lin Currently in the documentation of Timer's Fault Tolerance section ( [https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/process_function/#fault-tolerance] ), it says: Timers are always asynchronously checkpointed, except for the combination of RocksDB backend / with incremental snapshots / with heap-based timers (will be resolved with {{{}FLINK-10026{}}}). Actually FLINK-10026 is resolved as 'Won't Do'; the documentation might be updated. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream
[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297080#comment-17297080 ] Liwei Lin commented on FLINK-20654: --- Hi [~pnowojski], very appreciated for your detailed explanation! Exactly what we'd like to know! (y) > Unaligned checkpoint recovery may lead to corrupted data stream > --- > > Key: FLINK-20654 > URL: https://issues.apache.org/jira/browse/FLINK-20654 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.12.0, 1.12.1 >Reporter: Arvid Heise >Assignee: Piotr Nowojski >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.12.2, 1.13.0 > > > Fix of FLINK-20433 shows potential corruption after recovery for all > variations of UnalignedCheckpointITCase. > To reproduce, run UCITCase a couple hundreds times. The issue showed for me > in: > - execute [Parallel union, p = 5] > - execute [Parallel union, p = 10] > - execute [Parallel cogroup, p = 5] > - execute [parallel pipeline with remote channels, p = 5] > with decreasing frequency. > The issue manifests as one of the following issues: > - stream corrupted exception > - EOF exception > - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER > - (for union) ArithmeticException overflow (because the number that should be > [0;10] has been mis-deserialized) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream
[ https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296771#comment-17296771 ] Liwei Lin commented on FLINK-20654: --- hi [~pnowojski] , does this also affect Flink 1.11.3 ? And another thing we'd like to know is, is unaligned checkpoint production-ready as of Flink 1.11.3 or 1.12.2 ? thanks! :) > Unaligned checkpoint recovery may lead to corrupted data stream > --- > > Key: FLINK-20654 > URL: https://issues.apache.org/jira/browse/FLINK-20654 > Project: Flink > Issue Type: Bug > Components: Runtime / Checkpointing >Affects Versions: 1.12.0, 1.12.1 >Reporter: Arvid Heise >Assignee: Piotr Nowojski >Priority: Blocker > Labels: pull-request-available, test-stability > Fix For: 1.12.2, 1.13.0 > > > Fix of FLINK-20433 shows potential corruption after recovery for all > variations of UnalignedCheckpointITCase. > To reproduce, run UCITCase a couple hundreds times. The issue showed for me > in: > - execute [Parallel union, p = 5] > - execute [Parallel union, p = 10] > - execute [Parallel cogroup, p = 5] > - execute [parallel pipeline with remote channels, p = 5] > with decreasing frequency. > The issue manifests as one of the following issues: > - stream corrupted exception > - EOF exception > - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER > - (for union) ArithmeticException overflow (because the number that should be > [0;10] has been mis-deserialized) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-4704) Move Table API to org.apache.flink.table
[ https://issues.apache.org/jira/browse/FLINK-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532176#comment-15532176 ] Liwei Lin commented on FLINK-4704: -- +1 on this! > Move Table API to org.apache.flink.table > > > Key: FLINK-4704 > URL: https://issues.apache.org/jira/browse/FLINK-4704 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther > > This would be a large change. But maybe now is still a good time to do it. > Otherwise we will never fix this. > Actually, the Table API is in the wrong package. At the moment it is in > {{org.apache.flink.api.table}} and the actual Scala/Java APIs are in > {{org.apache.flink.api.java/scala.table}}. All other APIs such as Python, > Gelly, Flink ML do not use the {{org.apache.flink.api}} namespace. > I suggest the following packages: > {code} > org.apache.flink.table > org.apache.flink.table.api.java > org.apache.flink.table.api.scala > {code} > What do you think? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4533) Unprotected access to meters in StatsDReporter#report()
[ https://issues.apache.org/jira/browse/FLINK-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506038#comment-15506038 ] Liwei Lin commented on FLINK-4533: -- Hi [~tedyu] do you plan to fix this? In case you don't, I might take this. Thanks! > Unprotected access to meters in StatsDReporter#report() > --- > > Key: FLINK-4533 > URL: https://issues.apache.org/jira/browse/FLINK-4533 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > Access to meters in AbstractReporter is protected by AbstractReporter.this. > Access to meters in StatsDReporter#report() should do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4534) Lack of synchronization in BucketingSink#restoreState()
[ https://issues.apache.org/jira/browse/FLINK-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506034#comment-15506034 ] Liwei Lin commented on FLINK-4534: -- Hi [~tedyu] do you plan to fix this? In case you don't, I might take this. Thanks! > Lack of synchronization in BucketingSink#restoreState() > --- > > Key: FLINK-4534 > URL: https://issues.apache.org/jira/browse/FLINK-4534 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu > > Iteration over state.bucketStates is protected by synchronization in other > methods, except for the following in restoreState(): > {code} > for (BucketState bucketState : state.bucketStates.values()) { > {code} > and following in close(): > {code} > for (Map.Entryentry : > state.bucketStates.entrySet()) { > closeCurrentPartFile(entry.getValue()); > {code} > w.r.t. bucketState.pendingFilesPerCheckpoint , there is similar issue > starting line 752: > {code} > Set pastCheckpointIds = > bucketState.pendingFilesPerCheckpoint.keySet(); > LOG.debug("Moving pending files to final location on restore."); > for (Long pastCheckpointId : pastCheckpointIds) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4573) Potential resource leak due to unclosed RandomAccessFile in TaskManagerLogHandler
[ https://issues.apache.org/jira/browse/FLINK-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506033#comment-15506033 ] Liwei Lin commented on FLINK-4573: -- Hi [~tedyu] do you plan to fix this? In case you don't, I might take this. Thanks! > Potential resource leak due to unclosed RandomAccessFile in > TaskManagerLogHandler > - > > Key: FLINK-4573 > URL: https://issues.apache.org/jira/browse/FLINK-4573 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > {code} > try { > raf = new > RandomAccessFile(file, "r"); > } catch > (FileNotFoundException e) { > display(ctx, request, > "Displaying TaskManager log failed."); > LOG.error("Displaying > TaskManager log failed.", e); > return; > } > long fileLength = > raf.length(); > final FileChannel fc = > raf.getChannel(); > {code} > If length() throws IOException, raf would be left unclosed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3222) Incorrect shift amount in OperatorCheckpointStats#hashCode()
[ https://issues.apache.org/jira/browse/FLINK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506035#comment-15506035 ] Liwei Lin commented on FLINK-3222: -- Hi [~tedyu] do you plan to fix this? In case you don't, I might take this. Thanks! > Incorrect shift amount in OperatorCheckpointStats#hashCode() > > > Key: FLINK-3222 > URL: https://issues.apache.org/jira/browse/FLINK-3222 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > Here is related code: > {code} > result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length > >>> 32)); > {code} > subTaskStats.length is an int. > The shift amount is greater than 31 bits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-3734) Unclosed DataInputView in AbstractAlignedProcessingTimeWindowOperator#restoreState()
[ https://issues.apache.org/jira/browse/FLINK-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506029#comment-15506029 ] Liwei Lin commented on FLINK-3734: -- Hi [~tedyu] do you plan to fix this? In case you don't, I might take this. Thanks! > Unclosed DataInputView in > AbstractAlignedProcessingTimeWindowOperator#restoreState() > > > Key: FLINK-3734 > URL: https://issues.apache.org/jira/browse/FLINK-3734 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > {code} > DataInputView in = inputState.getState(getUserCodeClassloader()); > final long nextEvaluationTime = in.readLong(); > final long nextSlideTime = in.readLong(); > AbstractKeyedTimePanespanes = > createPanes(keySelector, function); > panes.readFromInput(in, keySerializer, stateTypeSerializer); > restoredState = new RestoredState<>(panes, nextEvaluationTime, > nextSlideTime); > } > {code} > DataInputView in is not closed upon return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4638) Fix exception message for MemorySegment
Liwei Lin created FLINK-4638: Summary: Fix exception message for MemorySegment Key: FLINK-4638 URL: https://issues.apache.org/jira/browse/FLINK-4638 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.1.2, 2.0.0, 1.1.3 Reporter: Liwei Lin Priority: Trivial Please see the code snip below: {code:title=MemorySegment.java|borderStyle=solid} if (offHeapAddress >= Long.MAX_VALUE - Integer.MAX_VALUE) { throw new IllegalArgumentException("Segment initialized with too large address: " + address // here address has not been initialized yet; should really be offHeapAddress + " ; Max allowed address is " + (Long.MAX_VALUE - Integer.MAX_VALUE - 1)); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)