[jira] [Created] (FLINK-26746) Update the documentation of Timer's Fault Tolerance section

2022-03-20 Thread Liwei Lin (Jira)
Liwei Lin created FLINK-26746:
-

 Summary: Update the documentation of Timer's Fault Tolerance section
 Key: FLINK-26746
 URL: https://issues.apache.org/jira/browse/FLINK-26746
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Liwei Lin


The Fault Tolerance section of the Timers documentation ( 
[https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/process_function/#fault-tolerance]
 ) currently says:

Timers are always asynchronously checkpointed, except for the combination of 
RocksDB backend / with incremental snapshots / with heap-based timers (will be 
resolved with {{FLINK-10026}}). 

However, FLINK-10026 has been resolved as 'Won't Do', so the documentation 
should be updated.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream

2021-03-07 Thread Liwei Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17297080#comment-17297080
 ] 

Liwei Lin commented on FLINK-20654:
---

Hi [~pnowojski], thank you very much for the detailed explanation! It is 
exactly what we wanted to know! (y)

> Unaligned checkpoint recovery may lead to corrupted data stream
> ---
>
> Key: FLINK-20654
> URL: https://issues.apache.org/jira/browse/FLINK-20654
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing
>Affects Versions: 1.12.0, 1.12.1
>Reporter: Arvid Heise
>Assignee: Piotr Nowojski
>Priority: Blocker
>  Labels: pull-request-available, test-stability
> Fix For: 1.12.2, 1.13.0
>
>
> Fix of FLINK-20433 shows potential corruption after recovery for all 
> variations of UnalignedCheckpointITCase.
> To reproduce, run UCITCase a couple hundred times. The issue showed for me 
> in:
> - execute [Parallel union, p = 5]
> - execute [Parallel union, p = 10]
> - execute [Parallel cogroup, p = 5]
> - execute [parallel pipeline with remote channels, p = 5]
> with decreasing frequency.
> The issue manifests as one of the following issues:
> - stream corrupted exception
> - EOF exception
> - assertion failure in NUM_LOST or NUM_OUT_OF_ORDER
> - (for union) ArithmeticException overflow (because the number that should be 
> [0;10] has been mis-deserialized)





[jira] [Commented] (FLINK-20654) Unaligned checkpoint recovery may lead to corrupted data stream

2021-03-06 Thread Liwei Lin (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17296771#comment-17296771
 ] 

Liwei Lin commented on FLINK-20654:
---

Hi [~pnowojski], does this also affect Flink 1.11.3?

We would also like to know whether unaligned checkpoints are 
production-ready as of Flink 1.11.3 or 1.12.2.

Thanks!  :)






[jira] [Commented] (FLINK-4704) Move Table API to org.apache.flink.table

2016-09-29 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15532176#comment-15532176
 ] 

Liwei Lin commented on FLINK-4704:
--

+1 on this!

> Move Table API to org.apache.flink.table
> 
>
> Key: FLINK-4704
> URL: https://issues.apache.org/jira/browse/FLINK-4704
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API & SQL
>Reporter: Timo Walther
>
> This would be a large change. But maybe now is still a good time to do it. 
> Otherwise we will never fix this.
> Actually, the Table API is in the wrong package. At the moment it is in 
> {{org.apache.flink.api.table}} and the actual Scala/Java APIs are in 
> {{org.apache.flink.api.java/scala.table}}. All other APIs such as Python, 
> Gelly, Flink ML do not use the {{org.apache.flink.api}} namespace.
> I suggest the following packages:
> {code}
> org.apache.flink.table
> org.apache.flink.table.api.java
> org.apache.flink.table.api.scala
> {code}
> What do you think?





[jira] [Commented] (FLINK-4533) Unprotected access to meters in StatsDReporter#report()

2016-09-20 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506038#comment-15506038
 ] 

Liwei Lin commented on FLINK-4533:
--

Hi [~tedyu], do you plan to fix this? If you don't, I could take it. 
Thanks!

> Unprotected access to meters in StatsDReporter#report()
> ---
>
> Key: FLINK-4533
> URL: https://issues.apache.org/jira/browse/FLINK-4533
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Access to meters in AbstractReporter is protected by AbstractReporter.this.
> Access to meters in StatsDReporter#report() should do the same.
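
The locking pattern the issue asks for can be sketched as follows. This is a minimal illustration, not Flink's actual code: SketchReporter, its meters map, and the value type are hypothetical stand-ins, and the only point is that report() takes the same lock on the reporter instance that registration takes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reporter; not the real AbstractReporter or StatsDReporter.
class SketchReporter {
    // Registered meters by name; every access is guarded by the lock on "this".
    private final Map<String, Long> meters = new HashMap<>();

    // Registration takes the lock on this reporter instance.
    void addMeter(String name, long rate) {
        synchronized (this) {
            meters.put(name, rate);
        }
    }

    // report() takes the same lock before iterating, so a concurrent
    // addMeter() cannot modify the map in the middle of the loop.
    String report() {
        StringBuilder out = new StringBuilder();
        synchronized (this) {
            for (Map.Entry<String, Long> e : meters.entrySet()) {
                out.append(e.getKey()).append(':').append(e.getValue()).append('\n');
            }
        }
        return out.toString();
    }
}
```

Without the synchronized block in report(), iterating a plain HashMap while another thread registers a meter can fail with ConcurrentModificationException or observe an inconsistent map.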





[jira] [Commented] (FLINK-4534) Lack of synchronization in BucketingSink#restoreState()

2016-09-20 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506034#comment-15506034
 ] 

Liwei Lin commented on FLINK-4534:
--

Hi [~tedyu], do you plan to fix this? If you don't, I could take it. 
Thanks!

> Lack of synchronization in BucketingSink#restoreState()
> ---
>
> Key: FLINK-4534
> URL: https://issues.apache.org/jira/browse/FLINK-4534
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>
> Iteration over state.bucketStates is protected by synchronization in other 
> methods, except for the following in restoreState():
> {code}
> for (BucketState bucketState : state.bucketStates.values()) {
> {code}
> and the following in close():
> {code}
> for (Map.Entry entry : state.bucketStates.entrySet()) {
>   closeCurrentPartFile(entry.getValue());
> {code}
> With respect to bucketState.pendingFilesPerCheckpoint, there is a similar 
> issue starting at line 752:
> {code}
>   Set<Long> pastCheckpointIds = bucketState.pendingFilesPerCheckpoint.keySet();
>   LOG.debug("Moving pending files to final location on restore.");
>   for (Long pastCheckpointId : pastCheckpointIds) {
> {code}





[jira] [Commented] (FLINK-4573) Potential resource leak due to unclosed RandomAccessFile in TaskManagerLogHandler

2016-09-20 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-4573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506033#comment-15506033
 ] 

Liwei Lin commented on FLINK-4573:
--

Hi [~tedyu], do you plan to fix this? If you don't, I could take it. 
Thanks!

> Potential resource leak due to unclosed RandomAccessFile in 
> TaskManagerLogHandler
> -
>
> Key: FLINK-4573
> URL: https://issues.apache.org/jira/browse/FLINK-4573
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> try {
>   raf = new RandomAccessFile(file, "r");
> } catch (FileNotFoundException e) {
>   display(ctx, request, "Displaying TaskManager log failed.");
>   LOG.error("Displaying TaskManager log failed.", e);
>   return;
> }
> long fileLength = raf.length();
> final FileChannel fc = raf.getChannel();
> {code}
> If length() throws IOException, raf would be left unclosed.
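
One way to rule out this kind of leak is try-with-resources, sketched below. LogFileSketch and readLength are hypothetical names for illustration only and are not part of TaskManagerLogHandler. The sketch also assumes the file is no longer needed once the method returns; if the channel has to outlive the method (an assumption about the real handler, not something stated in the issue), the analogous fix would be to close raf in a catch block around length().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; not the actual TaskManagerLogHandler code.
class LogFileSketch {
    // Returns the file length. The RandomAccessFile is closed on every path,
    // including the one where length() throws an IOException.
    static long readLength(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            return raf.length();
        }
    }
}
```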





[jira] [Commented] (FLINK-3222) Incorrect shift amount in OperatorCheckpointStats#hashCode()

2016-09-20 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506035#comment-15506035
 ] 

Liwei Lin commented on FLINK-3222:
--

Hi [~tedyu], do you plan to fix this? If you don't, I could take it. 
Thanks!

> Incorrect shift amount in OperatorCheckpointStats#hashCode()
> 
>
> Key: FLINK-3222
> URL: https://issues.apache.org/jira/browse/FLINK-3222
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> Here is related code:
> {code}
> result = 31 * result + (int) (subTaskStats.length ^ (subTaskStats.length >>> 32));
> {code}
> subTaskStats.length is an int.
> The shift amount is greater than 31 bits.
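
To see why this matters: for an int operand, Java uses only the low five bits of the shift distance, so shifting by 32 is the same as shifting by 0, and x ^ (x >>> 32) is always 0. The sketch below illustrates this; ShiftSketch and its method names are hypothetical, and simplerTerm is just one possible replacement, not necessarily the fix that was applied in Flink.

```java
// Hypothetical demo class; not part of OperatorCheckpointStats.
class ShiftSketch {
    // Mirrors the expression from the issue with an int operand.
    // The shift distance 32 is masked to 0, so this is length ^ length, i.e. 0.
    static int reportedTerm(int length) {
        return (int) (length ^ (length >>> 32));
    }

    // Possible replacement (an assumption): an int can be mixed into the hash directly.
    static int simplerTerm(int length) {
        return length;
    }

    public static void main(String[] args) {
        System.out.println(reportedTerm(7)); // prints 0
        System.out.println(simplerTerm(7));  // prints 7
    }
}
```

The x ^ (x >>> 32) idiom is meant for folding a long into an int; applied to an int, it makes the array length contribute nothing to the hash code.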





[jira] [Commented] (FLINK-3734) Unclosed DataInputView in AbstractAlignedProcessingTimeWindowOperator#restoreState()

2016-09-20 Thread Liwei Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15506029#comment-15506029
 ] 

Liwei Lin commented on FLINK-3734:
--

Hi [~tedyu], do you plan to fix this? If you don't, I could take it. 
Thanks!

> Unclosed DataInputView in 
> AbstractAlignedProcessingTimeWindowOperator#restoreState()
> 
>
> Key: FLINK-3734
> URL: https://issues.apache.org/jira/browse/FLINK-3734
> Project: Flink
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> DataInputView in = inputState.getState(getUserCodeClassloader());
> final long nextEvaluationTime = in.readLong();
> final long nextSlideTime = in.readLong();
> AbstractKeyedTimePanes panes = createPanes(keySelector, function);
> panes.readFromInput(in, keySerializer, stateTypeSerializer);
> restoredState = new RestoredState<>(panes, nextEvaluationTime, nextSlideTime);
>   }
> {code}
> DataInputView in is not closed upon return.





[jira] [Created] (FLINK-4638) Fix exception message for MemorySegment

2016-09-20 Thread Liwei Lin (JIRA)
Liwei Lin created FLINK-4638:


 Summary: Fix exception message for MemorySegment
 Key: FLINK-4638
 URL: https://issues.apache.org/jira/browse/FLINK-4638
 Project: Flink
  Issue Type: Improvement
  Components: Core
Affects Versions: 1.1.2, 2.0.0, 1.1.3
Reporter: Liwei Lin
Priority: Trivial


Please see the code snippet below:

{code:title=MemorySegment.java|borderStyle=solid}
if (offHeapAddress >= Long.MAX_VALUE - Integer.MAX_VALUE) {
  // here address has not been initialized yet; should really be offHeapAddress
  throw new IllegalArgumentException("Segment initialized with too large address: " + address
  + " ; Max allowed address is " + (Long.MAX_VALUE - Integer.MAX_VALUE - 1));
}
{code}
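
A sketch of the corrected message, assuming the check is pulled into a standalone helper. AddressCheckSketch and checkAddress are hypothetical names used only for illustration; in MemorySegment the check sits in a constructor. The only change from the snippet above is that the message reports offHeapAddress, the value that was actually validated.

```java
// Hypothetical helper; not the actual MemorySegment constructor.
class AddressCheckSketch {
    // Rejects addresses that are too large and reports the value that was checked.
    static void checkAddress(long offHeapAddress) {
        if (offHeapAddress >= Long.MAX_VALUE - Integer.MAX_VALUE) {
            throw new IllegalArgumentException("Segment initialized with too large address: "
                    + offHeapAddress + " ; Max allowed address is "
                    + (Long.MAX_VALUE - Integer.MAX_VALUE - 1));
        }
    }
}
```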


