[GitHub] [flink] flinkbot edited a comment on pull request #12775: [FLINK-18396][docs] Translate "Formats Overview" page into Chinese

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12775:
URL: https://github.com/apache/flink/pull/12775#issuecomment-650482221


   
   ## CI report:
   
   * 2ceef8370fcb1901caeaecd35a8d62419e7c5acf Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4057)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18436) how to implement the class `MyTupleReducer` in the Flink official document

2020-06-26 Thread appleyuchi (Jira)
appleyuchi created FLINK-18436:
--

 Summary: how to implement the class `MyTupleReducer` in the Flink 
official document
 Key: FLINK-18436
 URL: https://issues.apache.org/jira/browse/FLINK-18436
 Project: Flink
  Issue Type: Test
Reporter: appleyuchi


This question has been posted at 
[https://stackoverflow.com/questions/62553572/how-to-implement-the-class-mytuplereducerin-flink-official-document]
but has not received a final answer yet.
 
 
I'm learning the [Flink DataSet API transformations 
page|https://ci.apache.org/projects/flink/flink-docs-stable/dev/batch/dataset_transformations.html].

There's a class there called {{MyTupleReducer}}.

I'm trying to complete it: [https://paste.ubuntu.com/p/3CjphGQrXP/]

but it's full of red underlines in IntelliJ.

Could you show me the correct way to write the above code?

Thanks for your help~!

PS:

I'm writing part of MyTupleReducer: [https://pastebin.ubuntu.com/p/m4rjs6t8QP/]

but the return part is wrong.
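
For reference, here is a minimal sketch of what {{MyTupleReducer}} could look like, assuming the {{Tuple3<String, Integer, Double>}} element type and the {{groupBy(0, 1)}} grouping from that documentation example (which field to combine, and how, is an assumption):

{code:java}
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple3;

// Combines two tuples of the same group into one. With groupBy(0, 1),
// fields f0 and f1 are the grouping keys and are identical for both
// inputs, so only f2 is combined here (summing is an assumption).
public class MyTupleReducer
        implements ReduceFunction<Tuple3<String, Integer, Double>> {

    @Override
    public Tuple3<String, Integer, Double> reduce(
            Tuple3<String, Integer, Double> in1,
            Tuple3<String, Integer, Double> in2) {
        return new Tuple3<>(in1.f0, in1.f1, in1.f2 + in2.f2);
    }
}
{code}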



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-13874) StreamingFileSink fails to recover (truncate) properly

2020-06-26 Thread JIAN WANG (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146770#comment-17146770
 ] 

JIAN WANG edited comment on FLINK-13874 at 6/27/20, 3:50 AM:
-

Hi [~gyfora], I still hit the same issue on flink-1.10.1. I use Flink on 
YARN (3.0.0-cdh6.3.2) with StreamingFileSink.

The code looks like this:

```
public static StreamingFileSink build(String dir, BucketAssigner assigner, String prefix) {
    return StreamingFileSink
            .forRowFormat(new Path(dir), new SimpleStringEncoder())
            .withRollingPolicy(DefaultRollingPolicy.builder()
                    .withRolloverInterval(TimeUnit.HOURS.toMillis(2))
                    .withInactivityInterval(TimeUnit.MINUTES.toMillis(10))
                    .withMaxPartSize(1024L * 1024L * 1024L * 50) // Max 50GB
                    .build())
            .withBucketAssigner(assigner)
            .withOutputFileConfig(OutputFileConfig.builder().withPartPrefix(prefix).build())
            .build();
}
```

The error is:

```
java.io.IOException: Problem while truncating file: 
hdfs:///business_log/hashtag/2020-06-25/.hashtag-122-37.inprogress.8e65f69c-b5ba-4466-a844-ccc0a5a93de2
```

Due to this issue, the job cannot restart from the latest checkpoint or savepoint.

 

 


was (Author: alvinwj):
Hi [~gyfora], I still hit the same issue on flink-1.10.1. I use Flink on 
YARN (3.0.0-cdh6.3.2) with StreamingFileSink.

The code looks like this:

```
public static StreamingFileSink build(String dir, BucketAssigner assigner, String prefix) {
    return StreamingFileSink
            .forRowFormat(new Path(dir), new SimpleStringEncoder())
            .withRollingPolicy(DefaultRollingPolicy.builder()
                    .withRolloverInterval(TimeUnit.HOURS.toMillis(2))
                    .withInactivityInterval(TimeUnit.MINUTES.toMillis(10))
                    .withMaxPartSize(1024L * 1024L * 1024L * 50) // Max 50GB
                    .build())
            .withBucketAssigner(assigner)
            .withOutputFileConfig(OutputFileConfig.builder().withPartPrefix(prefix).build())
            .build();
}
```

The error is:

```
java.io.IOException: Problem while truncating file: 
hdfs:///business_log/hashtag/2020-06-25/.hashtag-122-37.inprogress.8e65f69c-b5ba-4466-a844-ccc0a5a93de2
```

Due to this issue, the job cannot restart from the latest checkpoint or savepoint.

 

 

> StreamingFileSink fails to recover (truncate) properly
> --
>
> Key: FLINK-13874
> URL: https://issues.apache.org/jira/browse/FLINK-13874
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.9.0
>Reporter: Gyula Fora
>Priority: Blocker
>
> It seems that there might be some problem with the truncate / recovery logic 
> for the HadoopRecoverableFsDataOutputStream.
> I keep hitting the following error:
>  
> {noformat}
> java.io.IOException: Problem while truncating file: 
> hdfs:/user/root/flink/filesink/transaction-test1-text/2019-08-27--07/.part-1-1.inprogress.7e882941-ab98-4404-b16b-87a26256bf4d
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.safelyTruncateFile(HadoopRecoverableFsDataOutputStream.java:166)
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.(HadoopRecoverableFsDataOutputStream.java:89)
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.recover(HadoopRecoverableWriter.java:72)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restoreInProgressFile(Bucket.java:140)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.(Bucket.java:127)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:396)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:64)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:177)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:165)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:149)
>   at 
> 

[jira] [Commented] (FLINK-13874) StreamingFileSink fails to recover (truncate) properly

2020-06-26 Thread JIAN WANG (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146770#comment-17146770
 ] 

JIAN WANG commented on FLINK-13874:
---

Hi [~gyfora], I still hit the same issue on flink-1.10.1. I use Flink on 
YARN (3.0.0-cdh6.3.2) with StreamingFileSink.

The code looks like this:

```
public static StreamingFileSink build(String dir, BucketAssigner assigner, String prefix) {
    return StreamingFileSink
            .forRowFormat(new Path(dir), new SimpleStringEncoder())
            .withRollingPolicy(DefaultRollingPolicy.builder()
                    .withRolloverInterval(TimeUnit.HOURS.toMillis(2))
                    .withInactivityInterval(TimeUnit.MINUTES.toMillis(10))
                    .withMaxPartSize(1024L * 1024L * 1024L * 50) // Max 50GB
                    .build())
            .withBucketAssigner(assigner)
            .withOutputFileConfig(OutputFileConfig.builder().withPartPrefix(prefix).build())
            .build();
}
```

The error is:

```
java.io.IOException: Problem while truncating file: 
hdfs:///business_log/hashtag/2020-06-25/.hashtag-122-37.inprogress.8e65f69c-b5ba-4466-a844-ccc0a5a93de2
```

Due to this issue, the job cannot restart from the latest checkpoint or savepoint.

 

 

> StreamingFileSink fails to recover (truncate) properly
> --
>
> Key: FLINK-13874
> URL: https://issues.apache.org/jira/browse/FLINK-13874
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.9.0
>Reporter: Gyula Fora
>Priority: Blocker
>
> It seems that there might be some problem with the truncate / recovery logic 
> for the HadoopRecoverableFsDataOutputStream.
> I keep hitting the following error:
>  
> {noformat}
> java.io.IOException: Problem while truncating file: 
> hdfs:/user/root/flink/filesink/transaction-test1-text/2019-08-27--07/.part-1-1.inprogress.7e882941-ab98-4404-b16b-87a26256bf4d
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.safelyTruncateFile(HadoopRecoverableFsDataOutputStream.java:166)
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableFsDataOutputStream.(HadoopRecoverableFsDataOutputStream.java:89)
>   at 
> org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.recover(HadoopRecoverableWriter.java:72)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restoreInProgressFile(Bucket.java:140)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.(Bucket.java:127)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.restore(Bucket.java:396)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.DefaultBucketFactoryImpl.restoreBucket(DefaultBucketFactoryImpl.java:64)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.handleRestoredBucketState(Buckets.java:177)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeActiveBuckets(Buckets.java:165)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.initializeState(Buckets.java:149)
>   at 
> org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:334)
>   at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
>   at 
> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
>   at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:878)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:392)
>   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException):
>  Failed to TRUNCATE_FILE 
> /user/root/flink/filesink/transaction-test1-text/2019-08-27--07/.part-1-1.inprogress.7e882941-ab98-4404-b16b-87a26256bf4d
>  for DFSClient_NONMAPREDUCE_-1189574442_56 on 172.31.114.177 because 
> DFSClient_NONMAPREDUCE_-1189574442_56 is already the current lease holder.
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2522)
>   at 
> 

[GitHub] [flink] flinkbot commented on pull request #12775: [FLINK-18396][docs] Translate "Formats Overview" page into Chinese

2020-06-26 Thread GitBox


flinkbot commented on pull request #12775:
URL: https://github.com/apache/flink/pull/12775#issuecomment-650482221


   
   ## CI report:
   
   * 2ceef8370fcb1901caeaecd35a8d62419e7c5acf UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12775: [FLINK-18396][docs] Translate "Formats Overview" page into Chinese

2020-06-26 Thread GitBox


flinkbot commented on pull request #12775:
URL: https://github.com/apache/flink/pull/12775#issuecomment-650480026


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 2ceef8370fcb1901caeaecd35a8d62419e7c5acf (Sat Jun 27 
03:14:30 UTC 2020)
   
✅ no warnings
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] lsyldliu commented on pull request #12775: [FLINK-18396][docs] Translate "Formats Overview" page into Chinese

2020-06-26 Thread GitBox


lsyldliu commented on pull request #12775:
URL: https://github.com/apache/flink/pull/12775#issuecomment-650479900


   @wuchong CC



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18396) Translate "Formats Overview" page into Chinese

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18396:
---
Labels: pull-request-available  (was: )

> Translate "Formats Overview" page into Chinese
> --
>
> Key: FLINK-18396
> URL: https://issues.apache.org/jira/browse/FLINK-18396
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: dalongliu
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/index.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] lsyldliu opened a new pull request #12775: [FLINK-18396][docs] Translate "Formats Overview" page into Chinese

2020-06-26 Thread GitBox


lsyldliu opened a new pull request #12775:
URL: https://github.com/apache/flink/pull/12775


   ## What is the purpose of the change
   
   *Translate "Formats Overview" page into Chinese*
   
   ## Brief change log
   
 - *Translate "Formats Overview" page into Chinese*
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Arvid Heise (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arvid Heise updated FLINK-18433:

Attachment: flink_11.log.gz

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
> Attachments: flink_11.log.gz
>
>
>  
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|change|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> The submit command is like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146650#comment-17146650
 ] 

Arvid Heise commented on FLINK-18433:
-

Hi [~liyu], thanks for the update - I feared as much.

Without further information, I did run a particular comparison that may or may 
not help.

I picked TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap as it had the biggest 
regression (-17.31%) and ran some tests on a single m5ad.2xlarge (with SSD, but 
state backend is heap). I built a flink-dist from release-1.10 and release-1.11.

Since there are no built-in evaluation metrics, I just used {{time}}. To reduce 
the impact of cluster setup and really see if it's related to the heap state 
backend or the network stack, I simply executed on a local executor which took 
the full 8 cores, and I gave it 5 GB RAM (the job doesn't need much and I 
wanted to avoid too much allocation overhead).

Full commands for reference:

{noformat}
time java -Xmx5g -Dlog.file=flink_10.log 
-Dlog4j.configuration=file:///`pwd`/flink-1.10/flink-dist/target/flink-1.10-SNAPSHOT-bin/flink-1.10-SNAPSHOT/conf/log4j.properties
 -cp flink-basic-operations_2.11-1.10-SNAPSHOT.jar:"${flink_10[*]}" 
org.apache.flink.basic.operations.PerformanceTestJob --topologyName TwoInputs 
--LogicalAttributesofEdges KeyBy --ScheduleMode LazyFromSource --CheckpointMode 
AtLeastOnce --recordSize 100 --stateBackend heap --maxCount 100 
--checkpointInterval 100 --checkpointPath file:///home/ec2-user/spda/checkpoints
time java -Xmx5g -Dlog.file=`pwd`/flink_11.log 
-Dlog4j.configurationFile=file:///`pwd`/flink/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/conf/log4j.properties
 -cp flink-basic-operations_2.11-1.11-SNAPSHOT.jar:"${flink_11[*]}" 
org.apache.flink.basic.operations.PerformanceTestJob --topologyName TwoInputs 
--LogicalAttributesofEdges KeyBy --ScheduleMode LazyFromSource --CheckpointMode 
AtLeastOnce --recordSize 100 --stateBackend heap --maxCount 100 
--checkpointInterval 100 --checkpointPath 
file:///home/ec2-user/spda/checkpoints 
--execution.checkpointing.tolerable-failed-checkpoints 1000
{noformat}

I modified the test job so that it compiles and creates a local executor, 
forwarding the parameters to the configuration (more on that later).

I ran these commands interleaved for a few hours and got [this 
sheet|https://docs.google.com/spreadsheets/d/1NPoDQakQu1apdzWZfxD2IoRgo28MpBfD9s4K9Aq9nA4/edit?usp=sharing].
 On average, we have
Flink 1.10  01m59s
Flink 1.11  01m50s
Note that less is better in this case as we measure the time needed to process 
1m elements.

So, TL;DR: in this particular benchmark setup, it rather looks like performance 
actually improved. Note that DOP=8 is higher than what [~Aihua] used. Assuming 
that both benchmarks are okay, I see three possible explanations.
# We may have a regression on local input channels, but an improvement for 
remote input channels. Since remote input channels are usually the bottleneck, 
I'd say this is rather good, but ideally we can still remove the regression 
while keeping the improvement.
# Memory management in 1.11 works differently/incorrectly. My test excludes the 
memory management on TM/JM level, so that may be the root cause for the 
original regression.
# I experienced restarts due to failed checkpoints in the end. My first 
impression was that, when the job is about to finish, some in-progress 
checkpoints may be canceled; the cancellation is propagated to the checkpoint 
coordinator, which ultimately restarts the job because, by default, no 
checkpoint is allowed to fail. In my final setup, I ignored these errors, but 
it is obvious that any restart would impact the performance tremendously. I 
even ran into some kind of live lock for 1m records (oddly, 100k records 
didn't suffer from these issues).

I'm attaching a log that shows this live lock. [~roman_khachatryan] 
investigated but couldn't find anything suspicious.

The key errors are

{noformat}
2020-06-26 14:53:09,662 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Trying to recover from a global failure.
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable 
failure threshold.
at 
org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:66)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1626)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1603)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:90)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 

[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146650#comment-17146650
 ] 

Arvid Heise edited comment on FLINK-18433 at 6/26/20, 9:25 PM:
---

Hi [~liyu], thanks for the update - I feared as much.

Without further information, I did run a particular comparison that may or may 
not help.

I picked TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap as it had the biggest 
regression (-17.31%) and ran some tests on a single m5ad.2xlarge (with SSD, but 
state backend is heap). I built a flink-dist from release-1.10 and release-1.11.

Since there are no built-in evaluation metrics, I just used {{time}}. To reduce 
the impact of cluster setup and really see if it's related to the heap state 
backend or the network stack, I simply executed on a local executor which took 
the full 8 cores, and I gave it 5 GB RAM (the job doesn't need much and I 
wanted to avoid too much allocation overhead).

Full commands for reference:

{noformat}
time java -Xmx5g -Dlog.file=flink_10.log 
-Dlog4j.configuration=file:///`pwd`/flink-1.10/flink-dist/target/flink-1.10-SNAPSHOT-bin/flink-1.10-SNAPSHOT/conf/log4j.properties
 -cp flink-basic-operations_2.11-1.10-SNAPSHOT.jar:"${flink_10[*]}" 
org.apache.flink.basic.operations.PerformanceTestJob --topologyName TwoInputs 
--LogicalAttributesofEdges KeyBy --ScheduleMode LazyFromSource --CheckpointMode 
AtLeastOnce --recordSize 100 --stateBackend heap --maxCount 100 
--checkpointInterval 100 --checkpointPath file:///home/ec2-user/spda/checkpoints
time java -Xmx5g -Dlog.file=`pwd`/flink_11.log 
-Dlog4j.configurationFile=file:///`pwd`/flink/flink-dist/target/flink-1.11-SNAPSHOT-bin/flink-1.11-SNAPSHOT/conf/log4j.properties
 -cp flink-basic-operations_2.11-1.11-SNAPSHOT.jar:"${flink_11[*]}" 
org.apache.flink.basic.operations.PerformanceTestJob --topologyName TwoInputs 
--LogicalAttributesofEdges KeyBy --ScheduleMode LazyFromSource --CheckpointMode 
AtLeastOnce --recordSize 100 --stateBackend heap --maxCount 100 
--checkpointInterval 100 --checkpointPath 
file:///home/ec2-user/spda/checkpoints 
--execution.checkpointing.tolerable-failed-checkpoints 1000
{noformat}

I modified the test job so that it compiles and creates a local executor, 
forwarding the parameters to the configuration (more on that later).

I ran these commands interleaved for a few hours and got [this 
sheet|https://docs.google.com/spreadsheets/d/1NPoDQakQu1apdzWZfxD2IoRgo28MpBfD9s4K9Aq9nA4/edit?usp=sharing].
 On average, we have
Flink 1.10  01m59s
Flink 1.11  01m50s
Note that less is better in this case as we measure the time needed to process 
1m elements.

So, TL;DR: in this particular benchmark setup, it rather looks like performance 
actually improved. Note that DOP=8 is higher than what [~Aihua] used. Assuming 
that both benchmarks are okay, I see three possible explanations.
# We may have a regression on local input channels, but an improvement for 
remote input channels. Since remote input channels are usually the bottleneck, 
I'd say this is rather good, but ideally we can still remove the regression 
while keeping the improvement.
# Memory management in 1.11 works differently/incorrectly. My test excludes the 
memory management on TM/JM level, so that may be the root cause for the 
original regression.
# I experienced restarts due to failed checkpoints in the end. My first 
impression was that, when the job is about to finish, some in-progress 
checkpoints may be canceled; the cancellation is propagated to the checkpoint 
coordinator, which ultimately restarts the job because, by default, no 
checkpoint is allowed to fail. In my final setup, I ignored these errors, but 
it is obvious that any restart would impact the performance tremendously. I 
even ran into some kind of live lock for 1m records (oddly, 100k records 
didn't suffer from these issues).
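
As a side note on the "no checkpoint is allowed to fail" default: a minimal sketch (assuming the standard DataStream API; the value 1000 simply mirrors the `--execution.checkpointing.tolerable-failed-checkpoints 1000` flag in the commands above) of relaxing that default programmatically:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TolerableCheckpointFailures {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 100 ms, matching the test runs above.
        env.enableCheckpointing(100);

        // By default the first failed checkpoint fails (and restarts) the job;
        // tolerate up to 1000 failed checkpoints instead.
        env.getCheckpointConfig().setTolerableCheckpointFailureNumber(1000);
    }
}
{code}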

I'm attaching a log that shows this live lock. [~roman_khachatryan] 
investigated but couldn't find anything suspicious.

The key errors are

{noformat}
2020-06-26 14:53:09,662 INFO  org.apache.flink.runtime.jobmaster.JobMaster  
   [] - Trying to recover from a global failure.
org.apache.flink.util.FlinkRuntimeException: Exceeded checkpoint tolerable 
failure threshold.
at 
org.apache.flink.runtime.checkpoint.CheckpointFailureManager.handleJobLevelCheckpointException(CheckpointFailureManager.java:66)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1626)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.abortPendingCheckpoint(CheckpointCoordinator.java:1603)
 ~[flink-dist_2.11-1.11-SNAPSHOT.jar:1.11-SNAPSHOT]
at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.access$600(CheckpointCoordinator.java:90)
 

[GitHub] [flink] flinkbot edited a comment on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12768:
URL: https://github.com/apache/flink/pull/12768#issuecomment-649142405


   
   ## CI report:
   
   * a2efe0f32d68f1c563e1825ec17e10c4229d44fa Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4053)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] sergiimk commented on a change in pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL

2020-06-26 Thread GitBox


sergiimk commented on a change in pull request #12768:
URL: https://github.com/apache/flink/pull/12768#discussion_r446370561



##
File path: 
flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/ParquetSchemaConverter.java
##
@@ -102,7 +104,8 @@ public static MessageType toParquetType(TypeInformation 
typeInformation, bool
if (originalType != null) {
switch (originalType) {
case DECIMAL:
-   typeInfo = 
BasicTypeInfo.BIG_DEC_TYPE_INFO;
+   DecimalMetadata 
meta = primitiveType.getDecimalMetadata();
+   typeInfo = 
BigDecimalTypeInfo.of(meta.getPrecision(), meta.getScale());

Review comment:
   When the original implementation returns `BasicTypeInfo.BIG_DEC_TYPE_INFO`, 
it basically discards the precision/scale information from the Parquet 
metadata. When you construct a table from such a row, it will have a schema 
with `LEGACY('DECIMAL', 'DECIMAL')` type in it.
   
   My understanding is that for Flink SQL this type is equivalent to maximum 
precision `DECIMAL(38, 18)`. This is problematic when you do arithmetic 
operations on decimals such as multiplication, as it will result in a derived 
type that is very prone to overflowing. So in my SQL, even though my Parquet 
files have `DECIMAL(18, 4)` in them, I had to do lots of casting:
   ```
   SELECT 
 CAST(
       CAST(price as DECIMAL(18, 4)) * CAST(amount as DECIMAL(18, 4))
   as DECIMAL(18, 4)
 ) as value
   FROM ...
   ```
   ...basically re-casting the input values back into the precision/scale that 
they already are in Parquet to avoid overflowing and getting a silent `null` 
result.
   
   Switching this to `BigDecimalTypeInfo.of(...)` preserves the original 
precision/scale and allows me to simplify my query to:
   ```
   SELECT ..., CAST(price * amount as DECIMAL(18, 4)) as value FROM ...
   ```
   
   In my tests this works great except for **one problem** - when I inspect the 
schema of a table created from `ParquetRowInputFormat` stream it shows data 
type as:
   ```
   LEGACY('RAW', 'ANY')
   ```
   Doing an identity operation like `SELECT * FROM input` normalizes the schema 
back to expected `DECIMAL(18, 4)` somehow.
   
   I don't quite understand what's so different about `BIG_DEC_TYPE_INFO` and 
`BigDecimalTypeInfo.of(...)` as they both instantiate a class that implements 
`BasicTypeInfo` ... I can only assume that somewhere in the `Table` 
implementation there's an equality check for `BIG_DEC_TYPE_INFO` 
instead of `instanceof`...
   
   I am very confused by Flink's type system(s), so would appreciate some 
pointers.
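   
   For reference, a minimal sketch of the two constructions being compared (hypothetical: it assumes `BigDecimalTypeInfo` is the blink-runtime class `org.apache.flink.table.runtime.typeutils.BigDecimalTypeInfo`; the exact class the PR uses may differ):
   ```
   import java.math.BigDecimal;
   import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
   import org.apache.flink.api.common.typeinfo.TypeInformation;
   import org.apache.flink.table.runtime.typeutils.BigDecimalTypeInfo;

   public class DecimalTypeInfoDemo {
       public static void main(String[] args) {
           // Generic BigDecimal type info: Parquet's precision/scale metadata is
           // discarded, surfacing as LEGACY('DECIMAL', 'DECIMAL') in Table schemas.
           TypeInformation<BigDecimal> generic = BasicTypeInfo.BIG_DEC_TYPE_INFO;

           // Precision-aware variant: carries DECIMAL(18, 4) through to the Table layer.
           TypeInformation<BigDecimal> precise = BigDecimalTypeInfo.of(18, 4);

           System.out.println(generic);
           System.out.println(precise);
       }
   }
   ```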





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12774: CSV Chinese translation + ORC Chinese translation; Jira links: https://issues.apache.org/jira/browse/FLINK-18388 https://issues.apache.org/jira/browse/FLINK-18395

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650206487


   
   ## CI report:
   
   * 7ade04cbee6e83c3bbd675a232f71f2719226903 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4051)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] sergiimk commented on a change in pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL

2020-06-26 Thread GitBox


sergiimk commented on a change in pull request #12768:
URL: https://github.com/apache/flink/pull/12768#discussion_r446370561



##
File path: 
flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/ParquetSchemaConverter.java
##
@@ -102,7 +104,8 @@ public static MessageType toParquetType(TypeInformation 
typeInformation, bool
if (originalType != null) {
switch (originalType) {
case DECIMAL:
-   typeInfo = 
BasicTypeInfo.BIG_DEC_TYPE_INFO;
+   DecimalMetadata 
meta = primitiveType.getDecimalMetadata();
+   typeInfo = 
BigDecimalTypeInfo.of(meta.getPrecision(), meta.getScale());

Review comment:
   When the original implementation returns `BasicTypeInfo.BIG_DEC_TYPE_INFO`, 
it basically discards the precision/scale information from the Parquet 
metadata. When you construct a table from such a row, it will have a schema 
with `LEGACY('DECIMAL', 'DECIMAL')` type in it.
   
   My understanding is that for Flink SQL this type is equivalent to maximum 
precision `DECIMAL(38, 18)`. This is problematic when you do arithmetic 
operations on decimals such as multiplication, as it will result in a derived 
type that is very prone to overflowing. So in my SQL, even though my Parquet 
files have `DECIMAL(18, 4)` in them, I had to do lots of casting:
   ```
   SELECT 
 CAST(
       CAST(price as DECIMAL(18, 4)) * CAST(amount as DECIMAL(18, 4))
   as DECIMAL(18, 4)
 ) as value
   FROM ...
   ```
   ...basically re-casting the input values back into the precision/scale that 
they already are in Parquet to avoid overflowing and getting a silent `null` 
result.
   
   Switching this to `BigDecimalTypeInfo.of(...)` preserves the original 
precision/scale and allows me to simplify my query to:
   ```
   SELECT ..., CAST(price * amount as DECIMAL(18, 4)) as value FROM ...
   ```
   
   In my tests this works great except for **one problem** - when I inspect the 
schema of a table created from `ParquetRowInputFormat` stream it shows data 
type as:
   ```
   LEGACY('RAW', 'ANY')
   ```
   Doing an identity operation like `SELECT * FROM input` normalizes the schema 
back to expected `DECIMAL(18, 4)` somehow.
   
   I am very confused by Flink's type system(s), so would appreciate some 
pointers.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12768:
URL: https://github.com/apache/flink/pull/12768#issuecomment-649142405


   
   ## CI report:
   
   * 066e41b9e06caf58e4ef5a8fbe6ea49b0bc9168e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4025)
 
   * a2efe0f32d68f1c563e1825ec17e10c4229d44fa Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4053)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12768: [FLINK-17804][parquet] Follow Parquet spec when decoding DECIMAL

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12768:
URL: https://github.com/apache/flink/pull/12768#issuecomment-649142405


   
   ## CI report:
   
   * 066e41b9e06caf58e4ef5a8fbe6ea49b0bc9168e Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4025)
 
   * a2efe0f32d68f1c563e1825ec17e10c4229d44fa UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650174631


   
   ## CI report:
   
   * b6900dba87ec3a9ae7cf7b39c46274c7998357e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4048)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12774: CSV Chinese translation + ORC Chinese translation; Jira links: https://issues.apache.org/jira/browse/FLINK-18388 https://issues.apache.org/jira/browse/FLINK-18395

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650206487


   
   ## CI report:
   
   * e782a0d886f2601e8e4a2deda5022c0987500717 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4049)
 
   * 7ade04cbee6e83c3bbd675a232f71f2719226903 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4051)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (FLINK-17800) RocksDB optimizeForPointLookup results in missing time windows

2020-06-26 Thread Yu Li (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-17800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Li closed FLINK-17800.
-
Resolution: Fixed

Merged the new fix into master via
3516e37ae0aa4ee040b6844f336541315a455ce9
11d45135d85937edd16fb4f8f94ba71f5f794626
1718f50645ddc01d5e2e13cc5627bafe98191fa2

into release-1.11 via
b2e344a46c5d30ad46231d5c6a42bf09d9e8e559
33caa00e8df88565f022d4258148d09c90d9452b
7e1c83ddcf0e5e4417ccf25fd1d0facce9f30e0e

into release-1.10 via
de6f3aa7e5b2e4fcfbed4adeab12d4d519f1e6fb
3f8649e5c7bf731fac3cc5bfd3c5ed466f1dc561
9b45486ccce31ebef3f91dd4e6102efe3c6d51a3


Since all the work is done, closing the JIRA.

> RocksDB optimizeForPointLookup results in missing time windows
> --
>
> Key: FLINK-17800
> URL: https://issues.apache.org/jira/browse/FLINK-17800
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / State Backends
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Yordan Pavlov
>Assignee: Yun Tang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.11.0, 1.10.2, 1.12.0
>
> Attachments: MissingWindows.scala, MyMissingWindows.scala, 
> MyMissingWindows.scala
>
>
> +My Setup:+
> We have been using the _RocksDB_ option of _optimizeForPointLookup_ and 
> running version 1.7 for years. Upon upgrading to Flink 1.10 we started 
> seeing a strange behavior of missing time windows in a streaming Flink 
> job. For the purpose of testing I experimented with previous Flink versions 
> (1.8, 1.9, 1.9.3) and none of them showed the problem.
>  
> A sample of the code demonstrating the problem is here:
> {code:java}
>  val datastream = env
>  .addSource(KafkaSource.keyedElements(config.kafkaElements, 
> List(config.kafkaBootstrapServer)))
>  val result = datastream
>  .keyBy( _ => 1)
>  .timeWindow(Time.milliseconds(1))
>  .print()
> {code}
>  
>  
> The source consists of 3 streams (being either 3 Kafka partitions or 3 Kafka 
> topics), the elements in each of the streams are separately increasing. The 
> elements generate increasing timestamps using an event time and start from 1, 
> increasing by 1. The first partition would consist of timestamps 1, 2, 10, 
> 15..., the second of 4, 5, 6, 11..., the third of 3, 7, 8, 9...
>  
> +What I observe:+
> The time windows would open as I expect for the first 127 timestamps. Then 
> there would be a huge gap with no opened windows; if the source has many 
> elements, the next open window would have a timestamp in the thousands. 
> A gap of hundreds of elements would be created, with what appear to be 'lost' 
> elements. Those elements are not reported as late (if tested with the 
> ._sideOutputLateData_ operator). The way we have been using the option is by 
> setting it inside the config like so:
> ??etherbi.rocksDB.columnOptions.optimizeForPointLookup=268435456??
> We have been using it for performance reasons as we have huge RocksDB state 
> backend.
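
For context, a minimal sketch (assuming Flink 1.10's {{OptionsFactory}} interface from flink-statebackend-rocksdb; the {{etherbi.rocksDB...}} key quoted above is the reporter's own config plumbing, not a Flink option) of how _optimizeForPointLookup_ is typically wired in programmatically:

{code:java}
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Applies optimizeForPointLookup to every RocksDB column family, mirroring
// the 268435456 value from the config key above.
public class PointLookupOptionsFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnFamilyOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions.optimizeForPointLookup(268435456);
    }
}

// Usage: rocksDBStateBackend.setOptions(new PointLookupOptionsFactory());
{code}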



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12774: CSV Chinese translation + ORC Chinese translation; Jira links: https://issues.apache.org/jira/browse/FLINK-18388 https://issues.apache.org/jira/browse/FLINK-18395

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650206487


   
   ## CI report:
   
   * e782a0d886f2601e8e4a2deda5022c0987500717 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4049)
 
   * 7ade04cbee6e83c3bbd675a232f71f2719226903 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4051)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12772: [FLINK-18394][docs-zh] Translate "Parquet Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12772:
URL: https://github.com/apache/flink/pull/12772#issuecomment-650132836


   
   ## CI report:
   
   * c41b83c9509ea244b59f3be5acd0ee822f5ab1b8 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4046)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12774: CSV Chinese translation + ORC Chinese translation; Jira links: https://issues.apache.org/jira/browse/FLINK-18388 https://issues.apache.org/jira/browse/FLINK-18395

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650206487


   
   ## CI report:
   
   * e782a0d886f2601e8e4a2deda5022c0987500717 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4049)
 
   * 7ade04cbee6e83c3bbd675a232f71f2719226903 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650174631


   
   ## CI report:
   
   * 5bee189843863a93c3912b983f721214ade36e3c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4047)
 
   * b6900dba87ec3a9ae7cf7b39c46274c7998357e9 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4048)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] kakaroto928 commented on pull request #12774: CSV Chinese translation + ORC Chinese translation; Jira links: https://issues.apache.org/jira/browse/FLINK-18388 https://issues.apache.org/jira/browse/FLINK-18395

2020-06-26 Thread GitBox


kakaroto928 commented on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650220953


   @flinkbot Thanks for your help. If you have questions about my translation, 
don't hesitate to tell me. Thanks a lot!
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18395) Translate "ORC Format" page into Chinese

2020-06-26 Thread pengweibo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146365#comment-17146365
 ] 

pengweibo commented on FLINK-18395:
---

This task shares the same pull request with the "CSV Format" task. The link is 
[https://github.com/apache/flink/pull/12774] 

 

> Translate "ORC Format" page into Chinese
> 
>
> Key: FLINK-18395
> URL: https://issues.apache.org/jira/browse/FLINK-18395
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: pengweibo
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/orc.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/orc.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18395) Translate "ORC Format" page into Chinese

2020-06-26 Thread pengweibo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146362#comment-17146362
 ] 

pengweibo commented on FLINK-18395:
---

Hello Jark Wu,

I have just finished my first translation task, the "ORC Format" page. Could 
you please review it? This is my first translation, so if there are any 
problems, don't hesitate to tell me.

Thanks for all your help

Weibo PENG 

> Translate "ORC Format" page into Chinese
> 
>
> Key: FLINK-18395
> URL: https://issues.apache.org/jira/browse/FLINK-18395
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: pengweibo
>Priority: Major
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/orc.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/orc.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] carp84 closed pull request #12736: [FLINK-17800][rocksdb] Ensure total order seek to avoid user misuse

2020-06-26 Thread GitBox


carp84 closed pull request #12736:
URL: https://github.com/apache/flink/pull/12736


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12774: CSV Chinese translation; Jira link: https://issues.apache.org/jira/browse/FLINK-18388 by pengweibo

2020-06-26 Thread GitBox


flinkbot commented on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650206487


   
   ## CI report:
   
   * e782a0d886f2601e8e4a2deda5022c0987500717 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] jPrest commented on pull request #8466: [FLINK-12336][monitoring] Add HTTPS support to InfluxDB reporter

2020-06-26 Thread GitBox


jPrest commented on pull request #8466:
URL: https://github.com/apache/flink/pull/8466#issuecomment-650201811


   > @Etienne-Carriere are you still interested to work on this feature?
   
   I am still very interested in this feature. It will be a major step up in 
security for this reporter, because currently the only option is to use insecure 
HTTP if your InfluxDB instance lives somewhere else on the internet (e.g. shared 
monitoring infrastructure), which is unacceptable.
   
   Adding this will have no drawbacks, just enable more security.
   The self-signed certificate discussion is very valid, but does not directly 
influence this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Yu Li (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146322#comment-17146322
 ] 

Yu Li commented on FLINK-18433:
---

[~AHeise] Thanks for the efforts, and sorry but I'm afraid only [~Aihua] could 
answer the questions about testing setup.

We are on public holidays for the Dragon Boat Festival (from Jun. 25 to Jun. 
27) so may be less responsive, JFYI.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> The results show that 1.11 has a performance regression, mostly around 5%, 
> with a maximum of 17%. This needs to be investigated.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit command like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18388) Translate "CSV Format" page into Chinese

2020-06-26 Thread pengweibo (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146320#comment-17146320
 ] 

pengweibo commented on FLINK-18388:
---

Hello Jark Wu,

I have just finished my first translation task, the CSV Format page. Could you 
please review it? This is my first translation, so if there are any problems, 
don't hesitate to tell me.

Thanks for all your help,

Weibo PENG

> Translate "CSV Format" page into Chinese
> 
>
> Key: FLINK-18388
> URL: https://issues.apache.org/jira/browse/FLINK-18388
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: pengweibo
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/csv.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/csv.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] carp84 commented on pull request #12736: [FLINK-17800][rocksdb] Ensure total order seek to avoid user misuse

2020-06-26 Thread GitBox


carp84 commented on pull request #12736:
URL: https://github.com/apache/flink/pull/12736#issuecomment-650197234


   > @carp84 I have triggered CI on `core` modules multiple times and all 
results turned out successful (the only failed case was due to a failure to 
upload logs and is unrelated to Flink itself).
   
   Thanks for the review @StephanEwen and @pnowojski . I'm merging it since 
@Myasuka has double-checked and confirmed the build won't be broken again.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12774: CSV中文翻译 jira连接 https://issues.apache.org/jira/browse/FLINK-18388 by pengweibo

2020-06-26 Thread GitBox


flinkbot commented on pull request #12774:
URL: https://github.com/apache/flink/pull/12774#issuecomment-650197154


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit e782a0d886f2601e8e4a2deda5022c0987500717 (Fri Jun 26 
14:03:12 UTC 2020)
   
✅no warnings
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650174631


   
   ## CI report:
   
   * 5bee189843863a93c3912b983f721214ade36e3c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4047)
 
   * b6900dba87ec3a9ae7cf7b39c46274c7998357e9 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18388) Translate "CSV Format" page into Chinese

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18388:
---
Labels: pull-request-available  (was: )

> Translate "CSV Format" page into Chinese
> 
>
> Key: FLINK-18388
> URL: https://issues.apache.org/jira/browse/FLINK-18388
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: pengweibo
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/csv.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/csv.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] kakaroto928 opened a new pull request #12774: CSV中文翻译 jira连接 https://issues.apache.org/jira/browse/FLINK-18388 by pengweibo

2020-06-26 Thread GitBox


kakaroto928 opened a new pull request #12774:
URL: https://github.com/apache/flink/pull/12774


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / no / don't 
know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650174631


   
   ## CI report:
   
   * 5bee189843863a93c3912b983f721214ade36e3c Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4047)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


flinkbot commented on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650174631


   
   ## CI report:
   
   * 5bee189843863a93c3912b983f721214ade36e3c UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12771: [FLINK-18393] [docs-zh] Translate "Canal Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12771:
URL: https://github.com/apache/flink/pull/12771#issuecomment-650078985


   
   ## CI report:
   
   * fe29c8160b8a717dd899dfcef650379603b8e7fc Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4045)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * fd6c18a3272b1daba70357760bc5e63d2b608ea0 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4044)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


flinkbot commented on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650163353


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 7d816bf28658782e93846d16511d866944c100d1 (Fri Jun 26 
12:54:35 UTC 2020)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] uce commented on pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


uce commented on pull request #12773:
URL: https://github.com/apache/flink/pull/12773#issuecomment-650161678


   I haven't tested this, but it would be great if we can include it in 1.11.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18435) Allow reporter factories to intercept reflection-based instantiation attempts

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18435:
---
Labels: pull-request-available  (was: )

> Allow reporter factories to intercept reflection-based instantiation attempts
> -
>
> Key: FLINK-18435
> URL: https://issues.apache.org/jira/browse/FLINK-18435
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Metrics
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> Before 1.11, to use a reporter its class was configured, and the instance was 
> instantiated via reflection.
> We then introduced reporter factories and added an annotation for redirecting 
> instantiation attempts from the reporter class to its factory, by annotating 
> the reporter class with {{InstantiateViaFactory}}.
> However, when we migrated reporters to plugins, this approach stopped 
> working, the reason being that it required the reporter class to be 
> accessible; the plugin system only exposes the factories.
> To ensure that existing configurations continue to work, I propose to add a 
> new {{InterceptInstantiationViaReflection}} annotation for factories, with 
> which they can specify a class name for which to intercept reflection-based 
> instantiation attempts.
> Basically, we just invert the {{InstantiateViaFactory}} logic.
> Instead of the reporter saying "This factory should be used to instantiate 
> me.", the factory now says "I can instantiate that reporter."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zentol opened a new pull request #12773: [FLINK-18435][metrics] Add support for intercepting reflection-based instantiations

2020-06-26 Thread GitBox


zentol opened a new pull request #12773:
URL: https://github.com/apache/flink/pull/12773


   Adds support for reporter factories to declare that they should be used for 
instantiating a specific reporter class.
   
   With this change we maintain backwards compatibility with previous 
configuration that configure the reporter class name.
   The existing `InstantiateViaFactory` approach does not work for plugins (but 
only for reporters in lib/), since we do not have access to the reporter class 
during the loading process.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (FLINK-18435) Allow reporter factories to intercept reflection-based instantiation attempts

2020-06-26 Thread Chesnay Schepler (Jira)
Chesnay Schepler created FLINK-18435:


 Summary: Allow reporter factories to intercept reflection-based 
instantiation attempts
 Key: FLINK-18435
 URL: https://issues.apache.org/jira/browse/FLINK-18435
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Metrics
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.12.0


Before 1.11, to use a reporter its class was configured, and the instance was 
instantiated via reflection.
We then introduced reporter factories and added an annotation for redirecting 
instantiation attempts from the reporter class to its factory, by annotating 
the reporter class with {{InstantiateViaFactory}}.
However, when we migrated reporters to plugins, this approach stopped working, 
the reason being that it required the reporter class to be accessible; the 
plugin system only exposes the factories.

To ensure that existing configurations continue to work, I propose to add a new 
{{InterceptInstantiationViaReflection}} annotation for factories, with which 
they can specify a class name for which to intercept reflection-based 
instantiation attempts.

Basically, we just invert the {{InstantiateViaFactory}} logic.
Instead of the reporter saying "This factory should be used to instantiate 
me.", the factory now says "I can instantiate that reporter."



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146262#comment-17146262
 ] 

Julius Michaelis commented on FLINK-18150:
--

I finally managed to build my job on release-1.11-rc2. It seems the problem 
persists, so I'm guessing it's unrelated to FLINK-17327.
(I was hoping that even if it's unrelated, the update of the Kafka consumer 
from 2.2.0 to 2.4.2 with better node selection behavior would help. Nope. :/)

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally, with only a short latency spike 
> when Kafka leaders switch.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:821)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.cancel(StreamSource.java:147)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.cancelTask(SourceStreamTask.java:136)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.cancel(StreamTask.java:602)
> at 
> org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1355)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> It recovers, but processes fewer than the expected number of records.
> Finally, the job fails with
> {code:none}
> 2020-06-05 13:59:37
> org.apache.kafka.common.errors.TimeoutException: Timeout expired while 
> fetching topic metadata
> {code}
> and repeats doing so while not processing any records. (The exception comes 
> without any backtrace or other interesting information.)
> I have also observed this behavior with partition-discovery turned off, but 
> only when the Flink job failed (after a broker failure) and had to run 
> checkpoint recovery for some other reason.
> Please see the [Environment] description for information on how to reproduce 
> the issue.
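
For reference, a minimal sketch (topic name and broker list are assumed) of a 
consumer configured with the two ingredients named above, partition discovery 
enabled and the failing broker listed in {{bootstrap.servers}}:

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PartitionDiscoveryRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        // the broker that will be failed must be part of this list
        props.setProperty("bootstrap.servers", "localhost:9092,localhost:9093");
        props.setProperty("group.id", "repro");
        // partition discovery must be on for the failure loop to appear
        props.setProperty("flink.partition-discovery.interval-millis", "10000");

        env.addSource(new FlinkKafkaConsumer<>("test-topic", new SimpleStringSchema(), props))
           .print();
        env.execute("partition-discovery repro");
    }
}
{code}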



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12772: [FLINK-18394][docs-zh] Translate "Parquet Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12772:
URL: https://github.com/apache/flink/pull/12772#issuecomment-650132836


   
   ## CI report:
   
   * c41b83c9509ea244b59f3be5acd0ee822f5ab1b8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4046)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] azagrebin commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-26 Thread GitBox


azagrebin commented on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-650141332


   You can see the CI status for the PR in a comment which is automatically 
created and updated by the Flink bot.
   It is usually in the third comment, like for [this PR 
here](https://github.com/apache/flink/pull/11245#issuecomment-592002512).
   
   Since 1.11, [the Flink 
CI](https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines)
 is hosted by [Azure 
Pipelines](https://dev.azure.com/apache-flink/apache-flink/_build?definitionId=2),
 so it is not Travis anymore. 1.10 still uses Travis. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-15693) Stop receiving incoming RPC messages when RpcEndpoint is closing

2020-06-26 Thread MinWang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-15693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146255#comment-17146255
 ] 

MinWang commented on FLINK-15693:
-

[~trohrmann] I'm still figuring it out. Do you have any ideas?

> Stop receiving incoming RPC messages when RpcEndpoint is closing
> 
>
> Key: FLINK-15693
> URL: https://issues.apache.org/jira/browse/FLINK-15693
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.1, 1.10.0
>Reporter: Till Rohrmann
>Priority: Major
> Fix For: 1.11.0
>
>
> When calling {{RpcEndpoint#closeAsync()}}, the system triggers 
> {{RpcEndpoint#onStop}} and transitions the endpoint into the 
> {{TerminatingState}}. In order to allow asynchronous clean up operations, the 
> main thread executor is not shut down immediately. As a side effect, the 
> {{RpcEndpoint}} still accepts incoming RPC messages from other components. 
> I think it would be cleaner to no longer accept incoming RPC messages once we 
> are in the {{TerminatingState}}. That way we would not worry about the 
> internal state of the {{RpcEndpoint}} when processing RPC messages (similar 
> to 
> [here|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskExecutor.java#L952]).
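
A toy sketch of the proposed behavior (the names are illustrative, not Flink's 
actual {{RpcEndpoint}} code): once the endpoint enters the terminating state, 
incoming invocations are rejected rather than processed against half-closed 
internal state.

{code:java}
import java.util.concurrent.CompletableFuture;

final class GuardedEndpoint {
    private enum State { RUNNING, TERMINATING }

    private volatile State state = State.RUNNING;

    // Entry point for incoming RPC invocations.
    CompletableFuture<String> handleRpc(String message) {
        if (state == State.TERMINATING) {
            // reject early instead of touching internal state mid-shutdown
            CompletableFuture<String> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(
                new IllegalStateException("Endpoint is terminating; RPC rejected"));
            return rejected;
        }
        return CompletableFuture.completedFuture("handled: " + message);
    }

    // Analogous to closeAsync(): flip the state first, then let asynchronous
    // cleanup continue while new RPCs are already being rejected.
    void closeAsync() {
        state = State.TERMINATING;
    }
}
{code}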



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18394) Translate "Parquet Format" page into Chinese

2020-06-26 Thread Luxios (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146248#comment-17146248
 ] 

Luxios commented on FLINK-18394:


I've already submitted my commit, but I found I could not move this issue into 
'In Progress' (there was no 'Start Progress' button on my page). Are there some 
conditions to unlock this function?

> Translate "Parquet Format" page into Chinese
> 
>
> Key: FLINK-18394
> URL: https://issues.apache.org/jira/browse/FLINK-18394
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: Luxios
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/parquet.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/parquet.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot commented on pull request #12772: [FLINK-18394][docs-zh] Translate "Parquet Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot commented on pull request #12772:
URL: https://github.com/apache/flink/pull/12772#issuecomment-650132836


   
   ## CI report:
   
   * c41b83c9509ea244b59f3be5acd0ee822f5ab1b8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #12772: [FLINK-18394][docs-zh] Translate "Parquet Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot commented on pull request #12772:
URL: https://github.com/apache/flink/pull/12772#issuecomment-650127705


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit c41b83c9509ea244b59f3be5acd0ee822f5ab1b8 (Fri Jun 26 
11:21:53 UTC 2020)
   
✅no warnings
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.

   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] Luxios22 opened a new pull request #12772: [FLINK-18394][docs-zh] Translate "Parquet Format" page into Chinese

2020-06-26 Thread GitBox


Luxios22 opened a new pull request #12772:
URL: https://github.com/apache/flink/pull/12772


   
   
   ## What is the purpose of the change
   
   Translate "Parquet Format" page into Chinese.
   
   The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/parquet.html
   
   The markdown file is located in 
flink/docs/dev/table/connectors/formats/parquet.zh.md
   
   
   ## Brief change log
   
   - Translate `flink/docs/dev/table/connectors/formats/parquet.zh.md`.
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? no
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18394) Translate "Parquet Format" page into Chinese

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18394:
---
Labels: pull-request-available  (was: )

> Translate "Parquet Format" page into Chinese
> 
>
> Key: FLINK-18394
> URL: https://issues.apache.org/jira/browse/FLINK-18394
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: Luxios
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/parquet.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/parquet.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12771: [FLINK-18393] [docs-zh] Translate "Canal Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12771:
URL: https://github.com/apache/flink/pull/12771#issuecomment-650078985


   
   ## CI report:
   
   * fe29c8160b8a717dd899dfcef650379603b8e7fc Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4045)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18419) Can not create a catalog

2020-06-26 Thread Dawid Wysakowicz (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Wysakowicz updated FLINK-18419:
-
Fix Version/s: 1.11.1
   1.12.0

> Can not create a catalog 
> -
>
> Key: FLINK-18419
> URL: https://issues.apache.org/jira/browse/FLINK-18419
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Assignee: Dawid Wysakowicz
>Priority: Major
> Fix For: 1.12.0, 1.11.1
>
>
> The {{CREATE CATALOG}} statement does not work if the catalog implementation 
> comes from the user classloader. The problem is that 
> {{org.apache.flink.table.planner.operations.SqlToOperationConverter#convertCreateCatalog}}
>  uses the {{SqlToOperationConverter}} classloader.
> We should use {{Thread.currentThread().getContextClassLoader()}} for now.
> One of the ways to reproduce it is to try to create e.g. a postgres catalog 
> with the {{flink-connector-jdbc}} jar passed as an additional jar to the 
> {{sql-client}}.
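
A minimal sketch of the fix direction (the helper class and its signature are 
illustrative, not the actual planner code):

{code:java}
import org.apache.flink.table.factories.CatalogFactory;

final class CatalogFactoryLoading {
    // Resolve the catalog factory against the context classloader, which sees
    // jars added by the user (e.g. via the SQL client), instead of the
    // converter's own classloader, which does not.
    static Class<? extends CatalogFactory> loadFactoryClass(String factoryClassName)
            throws ClassNotFoundException {
        ClassLoader userClassLoader = Thread.currentThread().getContextClassLoader();
        return Class.forName(factoryClassName, true, userClassLoader)
                .asSubclass(CatalogFactory.class);
    }
}
{code}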



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146139#comment-17146139
 ] 

Arvid Heise edited comment on FLINK-18433 at 6/26/20, 9:27 AM:
---

Phew, this is not easy to reproduce.

* for me, both branches do not compile; the fix seems rather easy, but a 
general cleanup is necessary, including commit structure and source code.
* it's unclear to me why there are 3 machines in the environment of the ticket 
but the DOP is 1. Were the jobs executed on a cluster with 3 machines and one 
TM got picked at random? Why not just go with 1 TM then, to make results more 
comparable?
* *did you actually enable checkpointing? Without setting the 
checkpointInterval, no checkpointing is enabled in my tests (see the sketch 
after this comment).*
* how is the actual measurement performed? And what are we actually measuring? 
Is it throughput? How did you calculate it? The job has no calculation of its 
own as far as I can see. So is it a simple `time`, or did you manually extract 
the execution time and normalize by the number of elements processed?
* how long did things run? With the command you presented, the job runs 
indefinitely. I'm assuming you didn't run to maxCount = Long.MAX_VALUE, 
although that is the default setting. If the runs were rather short, how often 
were they repeated?
* what did you configure in flink-conf.yaml? Was it left at default values?
* this is a bit independent, but when did you run the 1.11 and 1.10 tests? I 
noticed on EMR that I got 5-10% more throughput running at night than during 
the day, so for comparable results the two measurements should be taken close 
together in time.

So I'm pretty much stuck at bisecting because there is too little information. 
I will pick one of the cases that [~liyu] mentioned and do some basic tests.
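
On the checkpointing point above, a minimal sketch: checkpointing stays 
disabled unless an interval is set explicitly on the environment, whatever 
mode the test job's {{--CheckpointMode}} argument parses to.

{code:java}
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class EnableCheckpointing {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // without this call, no checkpoints are taken at all
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);
    }
}
{code}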


was (Author: aheise):
Puh, this is not easy to reproduce. 

First, for me, both branches are not compiling; fix seems rather easy, but 
there is a general cleanup necessary including commit structure and source code.

Second, it's unclear to me why there are 3 machines in environment of the 
ticket, but DOP is 1. Were they executed on a cluster with 3 machines and one 
TM got randomly selected? Why not just go with 1 TM then to make results more 
comparable.

Third, how is the actual measurement performed? And what are we actually 
measuring? Is it throughput? But how did you calculate it? The job has no 
calculation on its own as far as I can see. So is it a simple `time` or did you 
manually extract the execution time and to normalize the number of elements 
processed? 

Fourth, how long did things run? With the command that you presented, stuff is 
running indefinitely. I'm assuming you didn't run to maxCount = Long.MAX_VALUE, 
although that is the default setting. If it's rather short running, how often 
were things repeated?

Fifth, what did you configure in flink-conf.yaml? Left to default values?

Sixth, that's a bit independent, but when did you run 1.11 and 1.10 tests? I 
noticed on EMR that I got 5-10% more throughput by running in the night than 
during the day. So for comparable results, the comparable measurements should 
be taken rather closely.

So I'm pretty much stuck at bisecting because there is too little information. 
I will pick one of the cases that [~liyu] and do some basic tests.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between the Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11| |
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> 

[GitHub] [flink] flinkbot edited a comment on pull request #12702: [FLINK-18325] [table] fix potential NPE when calling SqlDataTypeSpec#getNullable.

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12702:
URL: https://github.com/apache/flink/pull/12702#issuecomment-645869341


   
   ## CI report:
   
   * 135f96e93369341df6be18a1e2df23c20c7434f3 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4042)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146138#comment-17146138
 ] 

Julius Michaelis edited comment on FLINK-18150 at 6/26/20, 9:15 AM:


Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug. Either:
 * Turn on {{flink.partition-discovery.interval-millis}}, and then it does not 
happen unless there's a certain number of producers, or
 * Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager, after failing the Kafka node

Other (non-)similarities to FLINK-17327 are:
 * I am seeing {{Exceeded checkpoint tolerable failure threshold}}
 ** I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
 * I am seeing the {{Handover$ClosedException}}
 * I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right. Couldn't.) If 
there's an easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...


was (Author: caesar):
Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug. Either:
 * Turn on {{flink.partition-discovery.interval-millis}}, and then it does not 
happen unless there's a certain number of producers, or
 * Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager, after failing the Kafka node

Other (non-)similarities to FLINK-17327 are:
 * I am seeing {{Exceeded checkpoint tolerable failure threshold}}
 ** I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
 * I am seeing the {{Handover$ClosedException}}
 * I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally, with only a short latency spike 
> when Kafka leaders switch.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> 

[jira] [Comment Edited] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146138#comment-17146138
 ] 

Julius Michaelis edited comment on FLINK-18150 at 6/26/20, 9:14 AM:


Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug. Either:
 * Turn on {{flink.partition-discovery.interval-millis}}, and then it does not 
happen unless there's a certain number of producers, or
 * Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager, after failing the Kafka node

Other (non-)similarities to FLINK-17327 are:
 * I am seeing {{Exceeded checkpoint tolerable failure threshold}}
 ** I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
 * I am seeing the {{Handover$ClosedException}}
 * I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...


was (Author: caesar):
Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug. Either:
 * Turn on {{flink.partition-discovery.interval-millis}}, which requires 
producers, or
 * Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
 * I am seeing {{Exceeded checkpoint tolerable failure threshold}}
 ** I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
 * I am seeing the {{Handover$ClosedException}}
 * I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally, with only a short latency spike 
> when Kafka leaders switch.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> 

[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * 265d6eb7970325c88d2b3a9c77fa308be89641a9 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4037)
 
   * fd6c18a3272b1daba70357760bc5e63d2b608ea0 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4044)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146139#comment-17146139
 ] 

Arvid Heise commented on FLINK-18433:
-

Phew, this is not easy to reproduce. 

First, for me, both branches do not compile; the fix seems rather easy, but a 
general cleanup is necessary, including the commit structure and source code.

Second, it's unclear to me why there are 3 machines in the environment of the 
ticket while the DOP is 1. Were they executed on a cluster with 3 machines and 
one TM got randomly selected? Why not just go with 1 TM then, to make the 
results more comparable?

Third, how is the actual measurement performed? And what are we actually 
measuring? Is it throughput? But how did you calculate it? The job performs no 
calculation of its own as far as I can see. So is it a simple `time`, or did 
you manually extract the execution time and normalize by the number of elements 
processed? 

Fourth, how long did things run? With the command that you presented, stuff is 
running indefinitely. I'm assuming you didn't run to maxCount = Long.MAX_VALUE, 
although that is the default setting. If it's rather short-running, how often 
were things repeated?

Fifth, what did you configure in flink-conf.yaml? Left at default values?

Sixth, this is a bit independent, but when did you run the 1.11 and 1.10 tests? 
I noticed on EMR that I got 5-10% more throughput by running in the night than 
during the day. So for comparable results, the measurements should be taken 
rather close together.

So I'm pretty much stuck at bisecting because there is too little information. 
I will pick one of the cases that [~liyu] mentioned and do some basic tests.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|diff|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> 

[jira] [Comment Edited] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146138#comment-17146138
 ] 

Julius Michaelis edited comment on FLINK-18150 at 6/26/20, 9:12 AM:


Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
 Either:
 * Turn on {{flink.partition-discovery.interval-millis}}, which requires 
producers, or
 * Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
 * I am seeing {{Exceeded checkpoint tolerable failure threshold}}
 ** I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
 * I am seeing the {{Handover$ClosedException}}
 * I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...


was (Author: caesar):
Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
Either:
* Turn on {{flink.partition-discovery.interval-millis}}, which requires 
producers, or
* Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
* I am seeing {{Exceeded checkpoint tolerable failure threshold}}
  * I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
* I am seeing the {{Handover$ClosedException}}
* I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally without failure with only a short 
> latency spike when switching Kafka leaders.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175)
> at 
> 

[jira] [Comment Edited] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146138#comment-17146138
 ] 

Julius Michaelis edited comment on FLINK-18150 at 6/26/20, 9:12 AM:


Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
Either:
* Turn on {{flink.partition-discovery.interval-millis}}, which requires 
producers, or
* Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
* I am seeing {{Exceeded checkpoint tolerable failure threshold}}
  * I've seen this appear on a job that had 
{{setFailTaskOnCheckpointError(false)}}
* I am seeing the {{Handover$ClosedException}}
* I am not seeing {{Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...


was (Author: caesar):
Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
Either:
* Turn on {flink.partition-discovery.interval-millis} , which requires 
producers, or
* Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
* I am seeing {Exceeded checkpoint tolerable failure threshold}
  * I've seen this appear on a job that had 
{setFailTaskOnCheckpointError(false)}
* I am seeing the {Handover$ClosedException}
* I am not seeing {Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally without failure with only a short 
> latency spike when switching Kafka leaders.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175)
> at 
> 

[jira] [Comment Edited] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146138#comment-17146138
 ] 

Julius Michaelis edited comment on FLINK-18150 at 6/26/20, 9:11 AM:


Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
Either:
* Turn on {flink.partition-discovery.interval-millis} , which requires 
producers, or
* Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
* I am seeing {Exceeded checkpoint tolerable failure threshold}
  * I've seen this appear on a job that had 
{setFailTaskOnCheckpointError(false)}
* I am seeing the {Handover$ClosedException}
* I am not seeing {Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...


was (Author: caesar):
Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
Either:
* Turn on {flink.partition-discovery.interval-millis}, which requires 
producers, or
* Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
* I am seeing {Exceeded checkpoint tolerable failure threshold}
  * I've seen this appear on a job that had 
{setFailTaskOnCheckpointError(false)}
* I am seeing the {Handover$ClosedException}
* I am not seeing {Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally without failure with only a short 
> latency spike when switching Kafka leaders.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175)
> at 
> 

[jira] [Commented] (FLINK-18150) A single failing Kafka broker may cause jobs to fail indefinitely with TimeoutException: Timeout expired while fetching topic metadata

2020-06-26 Thread Julius Michaelis (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146138#comment-17146138
 ] 

Julius Michaelis commented on FLINK-18150:
--

Uh. It probably doesn't happen without producers. But I'm not entirely sure, 
since I have two ways of running into this bug
Either:
* Turn on {flink.partition-discovery.interval-millis}, which requires 
producers, or
* Cause the job to run recovery for some other, unrelated reason, e.g. 
restart a taskmanager

Other (non-)similarities to FLINK-17327 are:
* I am seeing {Exceeded checkpoint tolerable failure threshold}
  * I've seen this appear on a job that had 
{setFailTaskOnCheckpointError(false)}
* I am seeing the {Handover$ClosedException}
* I am not seeing {Increase kafka producers pool size or decrease number of 
concurrent checkpoints.}

If it's already fixed on master and release-1.11, wouldn't it be easy to try 
whether it's reproducible with that? (Or so I thought. I just wasted the better 
part of a day trying to get the dependencies and build right.) If there's an 
easy way of getting a docker image (to use in my [reproducing 
setup|https://github.com/jcaesar/flink-kafka-ha-failure]) and a matching set of 
maven dependencies for a snapshot, please let me know...
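
For concreteness, a consumer with the discovery setting discussed above looks roughly like this (a minimal sketch; the broker, topic and group names are made up, and the interval is arbitrary):

{code:java}
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaReproJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        // Per the report, the broker that later fails must be listed here.
        props.setProperty("bootstrap.servers", "broker1:9092,broker2:9092");
        props.setProperty("group.id", "repro-group");
        // The setting that must be on for the failure loop to appear.
        props.setProperty("flink.partition-discovery.interval-millis", "10000");

        env.addSource(new FlinkKafkaConsumer<>("repro-topic", new SimpleStringSchema(), props))
           .print();
        env.execute("kafka-metadata-timeout-repro");
    }
}
{code}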

> A single failing Kafka broker may cause jobs to fail indefinitely with 
> TimeoutException: Timeout expired while fetching topic metadata
> --
>
> Key: FLINK-18150
> URL: https://issues.apache.org/jira/browse/FLINK-18150
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Kafka
>Affects Versions: 1.10.1
> Environment: It is a bit unclear to me under what circumstances this 
> can be reproduced. I created a "minimum" non-working example at 
> https://github.com/jcaesar/flink-kafka-ha-failure. Note that this deploys the 
> minimum number of Kafka brokers, but it works just as well with replication 
> factor 3 and 8 brokers, e.g.
> I run this with
> {code:bash}
> docker-compose kill; and docker-compose rm -vf; and docker-compose up 
> --abort-on-container-exit --build
> {code}
> The exception should appear on the webui after 5~6 minutes.
> To make sure that this isn't dependent on my machine, I've also checked 
> reproducibility on a m5a.2xlarge EC2 instance.
> You can verify that the Kafka cluster is running "normally", e.g. with:
> {code:bash}
> kafkacat -b localhost,localhost:9093 -L
> {code}
> So far, I only know that
> * {{flink.partition-discovery.interval-millis}} must be set.
> * The broker that failed must be part of the {{bootstrap.servers}}
> * There needs to be a certain number of topics or producers, but I'm unsure 
> which is crucial
> * Changing the values of {{metadata.request.timeout.ms}} or 
> {{flink.partition-discovery.interval-millis}} does not seem to have any 
> effect.
>Reporter: Julius Michaelis
>Priority: Major
>
> When a Kafka broker fails that is listed among the bootstrap servers and 
> partition discovery is active, the Flink job reading from that Kafka may 
> enter a failing loop.
> At first, the job seems to react normally without failure with only a short 
> latency spike when switching Kafka leaders.
> Then, it fails with a
> {code:none}
> org.apache.flink.streaming.connectors.kafka.internal.Handover$ClosedException
> at 
> org.apache.flink.streaming.connectors.kafka.internal.Handover.close(Handover.java:182)
> at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.cancel(KafkaFetcher.java:175)
> at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.cancel(FlinkKafkaConsumerBase.java:821)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.cancel(StreamSource.java:147)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.cancelTask(SourceStreamTask.java:136)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.cancel(StreamTask.java:602)
> at 
> org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1355)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> It recovers, but processes fewer than the expected number of records.
> Finally,  the job fails with
> {code:none}
> 2020-06-05 13:59:37
> org.apache.kafka.common.errors.TimeoutException: Timeout expired while 
> fetching topic metadata
> {code}
> and repeats doing so while not processing any records. (The exception comes 
> without any backtrace or otherwise interesting information)
> I have also observed this behavior with partition-discovery turned off, but 
> only when the Flink job failed (after a broker failure) and had to run 
> checkpoint recovery for some other reason.
> Please see the [Environment] description for 

[GitHub] [flink] flinkbot commented on pull request #12771: [FLINK-18393] [docs-zh] Translate "Canal Format" page into Chinese

2020-06-26 Thread GitBox


flinkbot commented on pull request #12771:
URL: https://github.com/apache/flink/pull/12771#issuecomment-650075442


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit fe29c8160b8a717dd899dfcef650379603b8e7fc (Fri Jun 26 
09:08:10 UTC 2020)
   
✅ no warnings
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer or PMC member is required.
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18393) Translate "Canal Format" page into Chinese

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-18393:
---
Labels: pull-request-available  (was: )

> Translate "Canal Format" page into Chinese
> --
>
> Key: FLINK-18393
> URL: https://issues.apache.org/jira/browse/FLINK-18393
> Project: Flink
>  Issue Type: Sub-task
>  Components: chinese-translation, Documentation, Table SQL / Ecosystem
>Reporter: Jark Wu
>Assignee: Zhiye Zou
>Priority: Major
>  Labels: pull-request-available
>
> The page url is 
> https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/canal.html
> The markdown file is located in 
> flink/docs/dev/table/connectors/formats/canal.zh.md



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] zhiyezou opened a new pull request #12771: [FLINK-18393] [docs-zh] Translate "Canal Format" page into Chinese

2020-06-26 Thread GitBox


zhiyezou opened a new pull request #12771:
URL: https://github.com/apache/flink/pull/12771


   
   
   ## What is the purpose of the change
   
   Translate "Canal Format" page into Chinese
   
   The page url is 
https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/connectors/formats/canal.html
   
   The markdown file is located in 
flink/docs/dev/table/connectors/formats/canal.zh.md
   
   
   ## Brief change log
   
   
 - translate `flink/docs/dev/table/connectors/formats/canal.zh.md`
   
   
   
   ## Verifying this change
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): no
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
 - The serializers: no
 - The runtime per-record code paths (performance sensitive): no
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
 - The S3 file system connector: no
   
   ## Documentation
   
 - Does this pull request introduce a new feature? no
 - If yes, how is the feature documented? no
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #8187: [FLINK-12197] [Formats] Avro row deser for Confluent binary format

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #8187:
URL: https://github.com/apache/flink/pull/8187#issuecomment-650056344


   
   ## CI report:
   
   * 97d2250cc98ea14d8402e81a7f5fcc3de83a5aab Travis: 
[FAILURE](https://travis-ci.com/github/flink-ci/flink/builds/173124797) 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (FLINK-18097) History server doesn't clean all job json files

2020-06-26 Thread Chesnay Schepler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-18097:
-
Component/s: (was: Runtime / Web Frontend)
 Runtime / REST

> History server doesn't clean all job json files
> ---
>
> Key: FLINK-18097
> URL: https://issues.apache.org/jira/browse/FLINK-18097
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / REST
>Affects Versions: 1.10.1
>Reporter: Milan Nikl
>Priority: Minor
>  Labels: pull-request-available
>
> Improvement introduced in https://issues.apache.org/jira/browse/FLINK-14169 
> does not delete all files in the history server folders.
> There is a [json file created for each 
> job|https://github.com/apache/flink/blob/master/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/history/HistoryServerArchiveFetcher.java#L237-L238]
>  in the history server's {{webDir/jobs/}} directory. Such a file is not deleted by 
> {{cleanupExpiredJobs}}.
> And while the cleaned-up job is no longer visible in the History Server's 
> {{Completed Jobs List}} in the web UI, it can still be accessed at 
> {{/#/job/<jobID>/overview}}.
> While this bug probably won't lead to any serious issues, files in the history 
> server's folders should be cleaned up thoroughly.
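
For illustration, the kind of per-job cleanup being asked for could look like this (a sketch only, assuming the {{webDir/jobs/<jobID>.json}} layout described above; the class and method names are hypothetical and do not mirror the actual HistoryServerArchiveFetcher internals):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class JobJsonCleanup {
    // Delete the per-job overview file that the expiry cleanup currently misses.
    static void deleteJobJson(Path webDir, String jobID) throws IOException {
        Files.deleteIfExists(webDir.resolve("jobs").resolve(jobID + ".json"));
    }
}
{code}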



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859


   
   ## CI report:
   
   * 7916851295ea609627070348b3eabf2f3de94e47 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4043)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-592002512


   
   ## CI report:
   
   * f1d4f9163a2cd60daa02fe6b7af5a459cee9660c UNKNOWN
   * 4d11da053e608d9d24e70c001fbb6b1469a392d7 UNKNOWN
   * 265d6eb7970325c88d2b3a9c77fa308be89641a9 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4037)
 
   * fd6c18a3272b1daba70357760bc5e63d2b608ea0 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] dawidwys commented on a change in pull request #12056: [FLINK-17502] [flink-connector-rabbitmq] RMQSource refactor

2020-06-26 Thread GitBox


dawidwys commented on a change in pull request #12056:
URL: https://github.com/apache/flink/pull/12056#discussion_r446041214



##
File path: 
flink-connectors/flink-connector-rabbitmq/src/main/java/org/apache/flink/streaming/connectors/rabbitmq/RMQDeserializationSchema.java
##
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.connectors.rabbitmq;
+
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.java.typeutils.ResultTypeQueryable;
+import org.apache.flink.util.Collector;
+
+import com.rabbitmq.client.AMQP;
+import com.rabbitmq.client.Envelope;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.List;
+
+/**
+ * Interface for the set of methods required to parse an RMQ delivery.
+ * @param <T> The output type of the {@link RMQSource}
+ */
+public interface RMQDeserializationSchema<T> extends Serializable, ResultTypeQueryable<T> {
+   /**
+    * This method takes all the RabbitMQ delivery information supplied by the client, extracts the data and passes it to the
+    * collector.
+    * NOTICE: The implementation of this method MUST call {@link RMQCollector#setMessageIdentifiers(String, long)} with
+    * the correlation ID of the message if checkpointing and UseCorrelationID (in the RMQSource constructor) were enabled
+    * in the {@link RMQSource}.
+    * @param envelope
+    * @param properties
+    * @param body
+    * @param collector
+    * @throws IOException
+    */
+   public void processMessage(Envelope envelope, AMQP.BasicProperties properties, byte[] body, RMQCollector<T> collector) throws IOException;
+
+   /**
+    * Method to decide whether the element signals the end of the stream. If
+    * true is returned the element won't be emitted.
+    *
+    * @param nextElement The element to test for the end-of-stream signal.
+    * @return True, if the element signals end of stream, false otherwise.
+    */
+   public boolean isEndOfStream(T nextElement);
+
+   /**
+    * The {@link TypeInformation} for the deserialized T.
+    * As an example, the proper implementation of this method if T is a String is:
+    * {@code return TypeExtractor.getForClass(String.class)}
+    * @return TypeInformation
+    */
+   public TypeInformation<T> getProducedType();
+
+   /**
+    * Special collector for RMQ messages.
+    * Captures the correlation ID and delivery tag, and also implements the filtering logic for whether a message has
+    * already been processed.
+    * @param <T>
+    */
+   public interface RMQCollector<T> extends Collector<T> {
+      public void collect(List<T> records);
+
+      public void setMessageIdentifiers(String correlationId, long deliveryTag);
+   }

Review comment:
   +1, I was also thinking of an interface like the one described by @austince.
   
   I am not sure about the purpose of the `reset` method, though, especially 
with the default empty implementation. When would users call this method from 
`RMQCollector#deserialize`? I think we might need that method in the 
implementation, but not necessarily in the interface.
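
   For illustration, an implementation of the proposed interface could look roughly like this (a sketch only; a String-typed schema like the one below is not part of this PR, and the class name is made up):
   
   ```java
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.util.Collections;
   
   import org.apache.flink.api.common.typeinfo.TypeInformation;
   import org.apache.flink.api.java.typeutils.TypeExtractor;
   
   import com.rabbitmq.client.AMQP;
   import com.rabbitmq.client.Envelope;
   
   public class StringRMQDeserializationSchema implements RMQDeserializationSchema<String> {
   
       @Override
       public void processMessage(Envelope envelope, AMQP.BasicProperties properties, byte[] body, RMQCollector<String> collector) throws IOException {
           // Must be called when checkpointing and correlation IDs are enabled on the RMQSource.
           collector.setMessageIdentifiers(properties.getCorrelationId(), envelope.getDeliveryTag());
           collector.collect(Collections.singletonList(new String(body, StandardCharsets.UTF_8)));
       }
   
       @Override
       public boolean isEndOfStream(String nextElement) {
           return false; // this sketch never terminates the stream
       }
   
       @Override
       public TypeInformation<String> getProducedType() {
           return TypeExtractor.getForClass(String.class);
       }
   }
   ```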





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot commented on pull request #8187: [FLINK-12197] [Formats] Avro row deser for Confluent binary format

2020-06-26 Thread GitBox


flinkbot commented on pull request #8187:
URL: https://github.com/apache/flink/pull/8187#issuecomment-650056344


   
   ## CI report:
   
   * 97d2250cc98ea14d8402e81a7f5fcc3de83a5aab UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18427) Job failed under java 11

2020-06-26 Thread Stephan Ewen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146120#comment-17146120
 ] 

Stephan Ewen commented on FLINK-18427:
--

[~simahao] Could you confirm that increasing Framework Offheap Memory solves 
this?
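
For anyone hitting the same {{java.lang.OutOfMemoryError: Direct buffer memory}}: the option in question is set in flink-conf.yaml (the value below is only an illustrative starting point, not a recommendation; the 1.10 default is 128m):

{code}
taskmanager.memory.framework.off-heap.size: 256m
{code}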


> Job failed under java 11
> 
>
> Key: FLINK-18427
> URL: https://issues.apache.org/jira/browse/FLINK-18427
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Configuration, Runtime / Network
>Affects Versions: 1.10.0
>Reporter: Zhang Hao
>Priority: Critical
>
> flink version:1.10.0
> deployment mode:cluster
> os:linux redhat7.5
> Job parallelism:greater than 1
> My job runs normally under Java 8, but fails under Java 11. Exception info is 
> shown below: Netty failed to send a message. In addition, I found the job would 
> fail when tasks were distributed across multiple nodes; if I set the job's 
> parallelism to 1, the job runs normally under Java 11 too.
>  
> 2020-06-24 09:52:16
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException:
>  Sending the partition request to '/170.0.50.19:33320' failed. at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>  at 
> org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>  at java.base/java.lang.Thread.run(Thread.java:834)Caused by: 
> java.io.IOException: Error while serializing message: 
> PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177)
>  at 
> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
>  ... 11 moreCaused by: java.io.IOException: java.lang.OutOfMemoryError: 
> Direct buffer memory at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497)
>  at 
> org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174)
>  ... 12 moreCaused by: java.lang.OutOfMemoryError: Direct buffer memory at 
> java.base/java.nio.Bits.reserveMemory(Bits.java:175) at 
> 

[GitHub] [flink] nielsbasjes commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-26 Thread GitBox


nielsbasjes commented on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-650048603


   Side question: Where can I see the CI for Flink? 
   This no longer seems to be used: 
https://travis-ci.org/github/apache/flink/pull_requests ? 
   
   I did a rebase onto the latest master and rebuilt the docs. 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859


   
   ## CI report:
   
   * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4038)
 
   * 7916851295ea609627070348b3eabf2f3de94e47 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4043)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18433) From the end-to-end performance test results, 1.11 has a regression

2020-06-26 Thread Arvid Heise (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146099#comment-17146099
 ] 

Arvid Heise commented on FLINK-18433:
-

I'm starting to bisect on some EC2 instance.

> From the end-to-end performance test results, 1.11 has a regression
> ---
>
> Key: FLINK-18433
> URL: https://issues.apache.org/jira/browse/FLINK-18433
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core, API / DataStream
>Affects Versions: 1.11.0
> Environment: 3 machines
> [|https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
>Reporter: Aihua Li
>Priority: Major
>
>  
> I ran end-to-end performance tests between Release-1.10 and Release-1.11. 
> The results were as follows:
> |scenarioName|release-1.10|release-1.11|diff|
> |OneInput_Broadcast_LazyFromSource_ExactlyOnce_10_rocksdb|46.175|43.8133|-5.11%|
> |OneInput_Rescale_LazyFromSource_ExactlyOnce_100_heap|211.835|200.355|-5.42%|
> |OneInput_Rebalance_LazyFromSource_ExactlyOnce_1024_rocksdb|1721.041667|1618.32|-5.97%|
> |OneInput_KeyBy_LazyFromSource_ExactlyOnce_10_heap|46|43.615|-5.18%|
> |OneInput_Broadcast_Eager_ExactlyOnce_100_rocksdb|212.105|199.688|-5.85%|
> |OneInput_Rescale_Eager_ExactlyOnce_1024_heap|1754.64|1600.12|-8.81%|
> |OneInput_Rebalance_Eager_ExactlyOnce_10_rocksdb|45.9167|43.0983|-6.14%|
> |OneInput_KeyBy_Eager_ExactlyOnce_100_heap|212.0816667|200.727|-5.35%|
> |OneInput_Broadcast_LazyFromSource_AtLeastOnce_1024_rocksdb|1718.245|1614.381667|-6.04%|
> |OneInput_Rescale_LazyFromSource_AtLeastOnce_10_heap|46.12|43.5517|-5.57%|
> |OneInput_Rebalance_LazyFromSource_AtLeastOnce_100_rocksdb|212.038|200.388|-5.49%|
> |OneInput_KeyBy_LazyFromSource_AtLeastOnce_1024_heap|1762.048333|1606.408333|-8.83%|
> |OneInput_Broadcast_Eager_AtLeastOnce_10_rocksdb|46.0583|43.4967|-5.56%|
> |OneInput_Rescale_Eager_AtLeastOnce_100_heap|212.233|201.188|-5.20%|
> |OneInput_Rebalance_Eager_AtLeastOnce_1024_rocksdb|1720.66|1616.85|-6.03%|
> |OneInput_KeyBy_Eager_AtLeastOnce_10_heap|46.14|43.6233|-5.45%|
> |TwoInputs_Broadcast_LazyFromSource_ExactlyOnce_100_rocksdb|156.918|152.957|-2.52%|
> |TwoInputs_Rescale_LazyFromSource_ExactlyOnce_1024_heap|1415.511667|1300.1|-8.15%|
> |TwoInputs_Rebalance_LazyFromSource_ExactlyOnce_10_rocksdb|34.2967|34.1667|-0.38%|
> |TwoInputs_KeyBy_LazyFromSource_ExactlyOnce_100_heap|158.353|151.848|-4.11%|
> |TwoInputs_Broadcast_Eager_ExactlyOnce_1024_rocksdb|1373.406667|1300.056667|-5.34%|
> |TwoInputs_Rescale_Eager_ExactlyOnce_10_heap|34.5717|32.0967|-7.16%|
> |TwoInputs_Rebalance_Eager_ExactlyOnce_100_rocksdb|158.655|147.44|-7.07%|
> |TwoInputs_KeyBy_Eager_ExactlyOnce_1024_heap|1356.611667|1292.386667|-4.73%|
> |TwoInputs_Broadcast_LazyFromSource_AtLeastOnce_10_rocksdb|34.01|33.205|-2.37%|
> |TwoInputs_Rescale_LazyFromSource_AtLeastOnce_100_heap|149.588|145.997|-2.40%|
> |TwoInputs_Rebalance_LazyFromSource_AtLeastOnce_1024_rocksdb|1359.74|1299.156667|-4.46%|
> |TwoInputs_KeyBy_LazyFromSource_AtLeastOnce_10_heap|34.025|29.6833|-12.76%|
> |TwoInputs_Broadcast_Eager_AtLeastOnce_100_rocksdb|157.303|151.4616667|-3.71%|
> |TwoInputs_Rescale_Eager_AtLeastOnce_1024_heap|1368.74|1293.238333|-5.52%|
> |TwoInputs_Rebalance_Eager_AtLeastOnce_10_rocksdb|34.325|33.285|-3.03%|
> |TwoInputs_KeyBy_Eager_AtLeastOnce_100_heap|162.5116667|134.375|-17.31%|
> It can be seen that the performance of 1.11 has a regression, basically 
> around 5%, and the maximum regression is 17%. This needs to be checked.
> the test code:
> flink-1.10.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> flink-1.11.0: 
> [https://github.com/Li-Aihua/flink/blob/test_suite_for_basic_operations_1.11/flink-end-to-end-perf-tests/flink-basic-operations/src/main/java/org/apache/flink/basic/operations/PerformanceTestJob.java]
> submit cmd like this:
> bin/flink run -d -m 192.168.39.246:8081 -c 
> org.apache.flink.basic.operations.PerformanceTestJob 
> /home/admin/flink-basic-operations_2.11-1.10-SNAPSHOT.jar --topologyName 
> OneInput --LogicalAttributesofEdges Broadcast --ScheduleMode LazyFromSource 
> --CheckpointMode ExactlyOnce --recordSize 10 --stateBackend rocksdb
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] flinkbot edited a comment on pull request #12770: [FLINK-18200][python] Replace the deprecated interfaces with the new interfaces in the tests and examples

2020-06-26 Thread GitBox


flinkbot edited a comment on pull request #12770:
URL: https://github.com/apache/flink/pull/12770#issuecomment-649764859


   
   ## CI report:
   
   * b2f1ab0d8197d7515678e8f74572d6ca0cddf5a4 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=4038)
 
   * 7916851295ea609627070348b3eabf2f3de94e47 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run travis` re-run the last Travis build
- `@flinkbot run azure` re-run the last Azure build
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (FLINK-18362) Shipping jdk by shipping archive

2020-06-26 Thread gramo lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146080#comment-17146080
 ] 

gramo lee edited comment on FLINK-18362 at 6/26/20, 7:15 AM:
-

[~fly_in_gis] My company's policy review has just passed, so the patch 
([^support-yarn.ship-archives.patch]) can now be attached. As you mentioned, 
"yarn.ship-archives" is added as an option for this. At first, my code was based 
on the Flink 1.10.1 release. After that, I realized that an implementation of 
"YarnLocalResourceDescriptor" already exists on the master branch 
(afebdc2f19a7e54). Reimplementing on top of it took more time, but it made the 
code simpler than before. Please review the code. 


was (Author: gramo):
My company's policy review has just passed, so the patch 
([^support-yarn.ship-archives.patch]) can now be attached. As you mentioned, 
"yarn.ship-archives" is added as an option for this. At first, my code was based 
on the Flink 1.10.1 release. After that, I realized that an implementation of 
"YarnLocalResourceDescriptor" already exists on the master branch 
(afebdc2f19a7e54). Reimplementing on top of it took more time, but it made the 
code simpler than before. Please review the code. 

> Shipping jdk by shipping archive
> 
>
> Key: FLINK-18362
> URL: https://issues.apache.org/jira/browse/FLINK-18362
> Project: Flink
>  Issue Type: Wish
>Affects Versions: 1.10.1
>Reporter: Noah
>Priority: Minor
> Attachments: support-yarn.ship-archives.patch
>
>
> Hello,
> Our team is running a Flink cluster on YARN, and it works well.
> h4. Functional requirements
> Is there any option to ship an archive to YARN applications?
> h4. Backgrounds
> Recently, one of our jobs was shut down by a JDK 8 version-related issue:
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6675699
> It would be an easy problem if we could set the latest JDK on 
> `containerized.taskmanager.env.JAVA_HOME`.
> However, the cluster administrator said it's difficult to install the latest 
> JDK on all cluster machines.
>  
> So, we planned to run the job on the latest JDK, shipped via shared 
> resources.
> There's an option `yarn.ship-directories`, but it's quite slow because the JDK 
> has a large number of files.
> If Flink supported shipping an archive, e.g. via a `yarn.ship-archive` option, 
> we could ship a JDK archive to remote machines and use the shipped JDK location 
> as JAVA_HOME (using `yarn.container-start-command-template`).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-18362) Shipping jdk by shipping archive

2020-06-26 Thread gramo lee (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146080#comment-17146080
 ] 

gramo lee commented on FLINK-18362:
---

My company's policy review has just passed, so the patch 
([^support-yarn.ship-archives.patch]) can now be attached. As you mentioned, 
"yarn.ship-archives" is added as an option for this. At first, my code was based 
on the Flink 1.10.1 release. After that, I realized that an implementation of 
"YarnLocalResourceDescriptor" already exists on the master branch 
(afebdc2f19a7e54). Reimplementing on top of it took more time, but it made the 
code simpler than before. Please review the code. 

> Shipping jdk by shipping archive
> 
>
> Key: FLINK-18362
> URL: https://issues.apache.org/jira/browse/FLINK-18362
> Project: Flink
>  Issue Type: Wish
>Affects Versions: 1.10.1
>Reporter: Noah
>Priority: Minor
> Attachments: support-yarn.ship-archives.patch
>
>
> Hello,
> Our team is running a Flink cluster on YARN, and it works well.
> h4. Functional requirements
> Is there any option to ship an archive to YARN applications?
> h4. Backgrounds
> Recently, one of our jobs was shut down by a JDK 8 version-related issue:
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6675699
> It would be an easy problem if we could set the latest JDK on 
> `containerized.taskmanager.env.JAVA_HOME`.
> However, the cluster administrator said it's difficult to install the latest 
> JDK on all cluster machines.
>  
> So, we planned to run the job on the latest JDK, shipped via shared 
> resources.
> There's an option `yarn.ship-directories`, but it's quite slow because the JDK 
> has a large number of files.
> If Flink supported shipping an archive, e.g. via a `yarn.ship-archive` option, 
> we could ship a JDK archive to remote machines and use the shipped JDK location 
> as JAVA_HOME (using `yarn.container-start-command-template`).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink-web] pnowojski commented on a change in pull request #352: Add 1.11 Release announcement.

2020-06-26 Thread GitBox


pnowojski commented on a change in pull request #352:
URL: https://github.com/apache/flink-web/pull/352#discussion_r446008533



##
File path: _posts/2020-06-29-release-1.11.0.md
##
@@ -0,0 +1,299 @@
+---
+layout: post 
+title:  "Apache Flink 1.11.0 Release Announcement" 
+date: 2020-06-29T08:00:00.000Z
+categories: news
+authors:
+- morsapaes:
+  name: "Marta Paes"
+  twitter: "morsapaes"
+
+excerpt: The Apache Flink community is proud to announce the release of Flink 
1.11.0! This release marks the first milestone in realizing a new vision for 
fault tolerance in Flink, and adds a handful of new features that simplify (and 
unify) Flink handling across the API stack. In particular for users of the 
Table API/SQL, this release introduces significant improvements to usability 
and opens up completely new use cases, including the much-anticipated support 
for Change Data Capture (CDC)! A great deal of effort has also gone into 
optimizing PyFlink and ensuring that its functionality is available to a 
broader set of Flink users.
+---
+
+The Apache Flink community is proud to announce the release of Flink 1.11.0! 
This release marks the first milestone in realizing a new vision for fault 
tolerance in Flink, and adds a handful of new features that simplify (and 
unify) Flink handling across the API stack. In particular for users of the 
Table API/SQL, this release introduces significant improvements to usability 
and opens up completely new use cases, including the much-anticipated support 
for Change Data Capture (CDC)! A great deal of effort has also gone into 
optimizing PyFlink and ensuring that its functionality is available to a 
broader set of Flink users.
+
+This blog post describes all major new features and improvements, important 
changes to be aware of and what to expect moving forward.
+
+{% toc %}
+
+The binary distribution and source artifacts are now available on the updated 
[Downloads page]({{ site.baseurl }}/downloads.html) of the Flink website, and 
the most recent distribution of PyFlink is available on 
[PyPI](https://pypi.org/project/apache-flink/). For more details, check the 
complete [release 
changelog](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346364=Html=12315522)
 and the [updated documentation]({{ site.DOCS_BASE_URL }}flink-docs-release-1.11/). 
+
+We encourage you to download the release and share your feedback with the 
community through the [Flink mailing 
lists](https://flink.apache.org/community.html#mailing-lists) or 
[JIRA](https://issues.apache.org/jira/projects/FLINK/summary).
+
+## New Features and Improvements
+
+### Unaligned Checkpoints (Beta)
+
+Triggering a checkpoint in Flink will cause a [checkpoint barrier]({{ 
site.DOCS_BASE_URL 
}}flink-docs-release-1.11/internals/stream_checkpointing.html#barriers) to flow 
from the sources of your topology all the way towards the sinks. For operators 
that receive more than one input stream, the barriers flowing through each 
channel need to be aligned before the operator can snapshot its state and 
forward the checkpoint barrier — typically, this alignment will take just a few 
milliseconds to complete, but it can become a bottleneck in backpressured 
pipelines as:
+
+ * Checkpoint barriers will flow much slower through backpressured channels, 
effectively blocking the remaining channels and their upstream operators during 
checkpointing;
+
+ * Slow checkpoint barrier propagation leads to longer checkpointing times and 
can, worst case, result in little to no progress in the application.
+
+To improve the performance of checkpointing under backpressure scenarios, the 
community is rolling out the first iteration of unaligned checkpoints 
([FLIP-76](https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints))
 with Flink 1.11. Compared to the original checkpointing mechanism (Fig. 1), 
this approach doesn’t wait for barrier alignment across input channels, instead 
allowing barriers to overtake in-flight records and forwarding them downstream 
before the synchronous part of the checkpoint takes place (Fig. 2).
+
+[figures omitted in the plain-text archive]
+Fig.1: Aligned Checkpoints
+Fig.2: Unaligned Checkpoints
+
+Because in-flight records have to be persisted as part of the snapshot, 
unaligned checkpoints will lead to increased checkpoint sizes. On the upside, 
**checkpointing times are heavily reduced**, so users will see more progress 
(even in unstable environments) as more up-to-date checkpoints will lighten the 
recovery process. You can learn more about the current limitations of unaligned 
checkpoints in the [documentation]({{ site.DOCS_BASE_URL 
}}flink-docs-release-1.11/concepts/stateful-stream-processing.html#unaligned-checkpointing),
 and track the improvement 
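
Enabling the new mode from user code is a single call on the checkpoint 
config. A minimal sketch against the 1.11 API (the pipeline itself is only a 
placeholder):

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class UnalignedCheckpointsDemo {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // Take a checkpoint every 10 seconds.
            env.enableCheckpointing(10_000L);
            // Opt in to unaligned checkpoints (beta in 1.11): barriers may
            // overtake in-flight records, trading larger checkpoints for a
            // much shorter alignment time under backpressure.
            env.getCheckpointConfig().enableUnalignedCheckpoints(true);

            env.fromElements(1, 2, 3).print();
            env.execute("unaligned-checkpoints-demo");
        }
    }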

[jira] [Updated] (FLINK-18362) Shipping jdk by shipping archive

2020-06-26 Thread gramo lee (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gramo lee updated FLINK-18362:
--
Attachment: support-yarn.ship-archives.patch

> Shipping jdk by shipping archive
> 
>
> Key: FLINK-18362
> URL: https://issues.apache.org/jira/browse/FLINK-18362
> Project: Flink
>  Issue Type: Wish
>Affects Versions: 1.10.1
>Reporter: Noah
>Priority: Minor
> Attachments: support-yarn.ship-archives.patch
>
>
> Hello,
> Our team is running a Flink cluster on YARN, and it works very well.
> h4. Functional requirements
> Is there any option to ship an archive to YARN applications?
> h4. Backgrounds
> Recently, one of our jobs was shut down by a JDK 8 version-related issue:
> https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6675699
> It would be easy to solve if we could set the latest JDK via
> `containerized.taskmanager.env.JAVA_HOME`.
> However, the cluster administrator said it is difficult to install the
> latest JDK on all cluster machines.
>  
> So, we planned to run the job on the latest JDK, shipped via shared
> resources.
> There is an option `yarn.ship-directories`, but it is quite slow because the
> JDK contains a large number of files.
> If Flink supported shipping an archive, e.g. via a `yarn.ship-archive`
> option, we could ship a JDK archive to the remote machines and use the
> shipped JDK location as JAVA_HOME (using
> `yarn.container-start-command-template`).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [flink] wanglijie95 edited a comment on pull request #12754: [FLINK-5717][streaming] Fix NPE on SessionWindows with ContinuousProc…

2020-06-26 Thread GitBox


wanglijie95 edited a comment on pull request #12754:
URL: https://github.com/apache/flink/pull/12754#issuecomment-650017734


   Hi @aljoscha , when I checked the implementations of 
`ContinuousEventTimeTrigger` and `ContinuousProcessingTimeTrigger`, two 
methods confused me: `onElement` and `clear`.
   
   First, I think `ContinuousEventTimeTrigger` and 
`ContinuousProcessingTimeTrigger` should also fire at the end of the window. I 
saw that `ContinuousEventTimeTrigger` registers a timer for 
`window.maxTimestamp()` in `onElement`: 
https://github.com/apache/flink/blob/6227fffbe67c9ce4c5591ed1e940f6c34ee137bb/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/ContinuousEventTimeTrigger.java#L59
   But I did not find the equivalent in `ContinuousProcessingTimeTrigger`.
   
   Second, I think `ContinuousEventTimeTrigger` and 
`ContinuousProcessingTimeTrigger` should delete the `window.maxTimestamp()` 
timer in `clear`, but they don't.
   
   Are these bugs, or are there other considerations? I am a little confused.
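   
   A minimal sketch of the symmetric behavior I would expect, written against 
the public `Trigger` API (illustration only, not the actual 
`ContinuousProcessingTimeTrigger` source):
   
       import org.apache.flink.streaming.api.windowing.triggers.Trigger;
       import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
       import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
       
       public class EndOfWindowAwareTrigger extends Trigger<Object, TimeWindow> {
       
           @Override
           public TriggerResult onElement(Object element, long timestamp,
                   TimeWindow window, TriggerContext ctx) {
               if (window.maxTimestamp() <= ctx.getCurrentWatermark()) {
                   // The window is already complete; fire right away.
                   return TriggerResult.FIRE;
               }
               // Also fire at the end of the window, as point one asks for.
               ctx.registerEventTimeTimer(window.maxTimestamp());
               return TriggerResult.CONTINUE;
           }
       
           @Override
           public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
               return time == window.maxTimestamp()
                       ? TriggerResult.FIRE
                       : TriggerResult.CONTINUE;
           }
       
           @Override
           public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
               return TriggerResult.CONTINUE;
           }
       
           @Override
           public void clear(TimeWindow window, TriggerContext ctx) {
               // Clean up the end-of-window timer; this is the deletion that
               // point two argues is missing from the continuous triggers.
               ctx.deleteEventTimeTimer(window.maxTimestamp());
           }
       }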



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] wanglijie95 commented on pull request #12754: [FLINK-5717][streaming] Fix NPE on SessionWindows with ContinuousProc…

2020-06-26 Thread GitBox


wanglijie95 commented on pull request #12754:
URL: https://github.com/apache/flink/pull/12754#issuecomment-650017734


   Hi @aljoscha , when I checked the implementations of 
`ContinuousEventTimeTrigger` and `ContinuousProcessingTimeTrigger`, two 
methods confused me: `onElement` and `clear`.
   
   First, I think `ContinuousEventTimeTrigger` and 
`ContinuousProcessingTimeTrigger` should also fire at the end of the window. I 
saw that `ContinuousEventTimeTrigger` registers a timer for 
`window.maxTimestamp()` in `onElement`: 
https://github.com/apache/flink/blob/6227fffbe67c9ce4c5591ed1e940f6c34ee137bb/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/ContinuousEventTimeTrigger.java#L59
   But I did not find the equivalent in `ContinuousProcessingTimeTrigger`.
   
   Second, I think `ContinuousEventTimeTrigger` should delete the timer for 
`window.maxTimestamp()` in `clear`, but it doesn't.
   
   Are these bugs, or are there other considerations? I am a little confused.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (FLINK-18434) Can not select fields with JdbcCatalog

2020-06-26 Thread Dawid Wysakowicz (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146074#comment-17146074
 ] 

Dawid Wysakowicz commented on FLINK-18434:
--

I completely agree that this shows one of the flaws of the Catalog interface. 
I don't think, though, that having a {{ReadableOnlyCatalog}} would necessarily 
help in this particular case; in the end, {{Catalog#getFunction}} is a read 
operation. Personally, what I don't like about the Catalog interface is that 
so many different responsibilities are bundled into one interface 
(tables/views/functions/partitions/statistics/...). Honestly, though, I don't 
know how to proceed with it, or whether we can do something about it; it is a 
new API that was introduced only recently.

As for a fix for this particular issue, the easiest thing we can do is throw 
{{FunctionNotExistException}} instead of {{UnsupportedOperationException}}. 
That is, imo, another design flaw, as it controls the application flow with 
exceptions; I would rather see this method return an Optional instead. Simply 
switching to {{Catalog#functionExists}} would not help, as that method also 
throws {{UnsupportedOperationException}} in {{JdbcCatalog}}.
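
To illustrate the first option, the quick fix could look roughly like this in 
{{AbstractJdbcCatalog}} (a sketch only, not a committed change):

{code:java}
// Sketch: JDBC catalogs store no functions, so report "not found" rather
// than "unsupported"; the planner can then continue its function lookup.
@Override
public CatalogFunction getFunction(ObjectPath functionPath) throws FunctionNotExistException {
    throw new FunctionNotExistException(getName(), functionPath);
}
{code}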

> Can not select fields with JdbcCatalog
> --
>
> Key: FLINK-18434
> URL: https://issues.apache.org/jira/browse/FLINK-18434
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.11.0
>Reporter: Dawid Wysakowicz
>Priority: Blocker
> Fix For: 1.12.0, 1.11.1
>
>
> A query which selects fields from a table will fail if we set the 
> PostgresCatalog as default.
> Steps to reproduce:
>  # Create postgres catalog and set it as default
>  # Create any table (in any catalog)
>  # Query that table with {{SELECT field FROM t}} (important: it must be a
> field name, not '{{*}}')
>  # The query will fail
> Stack trace:
> {code}
> org.apache.flink.table.client.gateway.SqlExecutionException: Invalidate SQL 
> statement.
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:100)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parse(SqlCommandParser.java:91)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.CliClient.parseCommand(CliClient.java:257) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.cli.CliClient.open(CliClient.java:211) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.openCli(SqlClient.java:142) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.start(SqlClient.java:114) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at org.apache.flink.table.client.SqlClient.main(SqlClient.java:201) 
> [flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
> Caused by: org.apache.flink.table.api.ValidationException: SQL validation 
> failed. null
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:146)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:108)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:187)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
>  ~[flink-table-blink_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.lambda$parse$0(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.ExecutionContext.wrapClassLoader(ExecutionContext.java:255)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.gateway.local.LocalExecutor$1.parse(LocalExecutor.java:430)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   at 
> org.apache.flink.table.client.cli.SqlCommandParser.parseBySqlParser(SqlCommandParser.java:98)
>  ~[flink-sql-client_2.11-1.12-SNAPSHOT.jar:1.12-SNAPSHOT]
>   ... 6 more
> Caused by: java.lang.UnsupportedOperationException
>   at 
> org.apache.flink.connector.jdbc.catalog.AbstractJdbcCatalog.getFunction(AbstractJdbcCatalog.java:261)
>  ~[?:?]
>   at 
> 

[GitHub] [flink] azagrebin commented on pull request #11245: [FLINK-15794][Kubernetes] Generate the Kubernetes default image version

2020-06-26 Thread GitBox


azagrebin commented on pull request #11245:
URL: https://github.com/apache/flink/pull/11245#issuecomment-650011223


   The option docs have to be rebuilt.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org