[jira] [Commented] (FLINK-5243) Implement an example for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976842#comment-16976842 ] Greg Hogan commented on FLINK-5243: --- Hi [~dinesh.b], is the proposed implementation only running SSSP on either of the top or bottom projections? > Implement an example for BipartiteGraph > --- > > Key: FLINK-5243 > URL: https://issues.apache.org/jira/browse/FLINK-5243 > Project: Flink > Issue Type: Sub-task > Components: Library / Graph Processing (Gelly) >Reporter: Ivan Mushketyk >Priority: Major > Labels: beginner > > Should implement example for BipartiteGraph in gelly-examples project > similarly to examples for Graph class. > Depends on this: https://issues.apache.org/jira/browse/FLINK-2254 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-5243) Implement an example for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848548#comment-16848548 ] Greg Hogan commented on FLINK-5243: --- [~jasleenk22], new contributions are always welcomed! Do you have a distributed algorithm in mind to implement for Bipartite Matching? > Implement an example for BipartiteGraph > --- > > Key: FLINK-5243 > URL: https://issues.apache.org/jira/browse/FLINK-5243 > Project: Flink > Issue Type: Sub-task > Components: Library / Graph Processing (Gelly) >Reporter: Ivan Mushketyk >Priority: Major > Labels: beginner > > Should implement example for BipartiteGraph in gelly-examples project > similarly to examples for Graph class. > Depends on this: https://issues.apache.org/jira/browse/FLINK-2254 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (FLINK-9577) Divide-by-zero in PageRank
[ https://issues.apache.org/jira/browse/FLINK-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan resolved FLINK-9577. --- Resolution: Not A Bug We do test for this in `PageRankTest.testWithEmptyGraphWithoutVertices`. Divide-by-zero is only an error for integer division, and here we are operating on `double` values. > Divide-by-zero in PageRank > -- > > Key: FLINK-9577 > URL: https://issues.apache.org/jira/browse/FLINK-9577 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.4.0, 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: vinoyang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > // org.apache.flink.graph.library.linkanalysis.PageRank#AdjustScores#open > this.vertexCount = vertexCountIterator.hasNext() ? > vertexCountIterator.next().getValue() : 0; > this.uniformlyDistributedScore = ((1 - dampingFactor) + dampingFactor * > sumOfSinks) / this.vertexCount; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
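The resolution can be checked in isolation. A minimal standalone sketch (the damping-factor and sink values below are illustrative, not taken from the test) showing that Java floating-point division by zero follows IEEE 754 and yields Infinity rather than throwing, while integer division throws:

```java
public class DoubleDivision {

    // Mirrors the shape of the quoted AdjustScores#open formula; an empty
    // graph yields vertexCount = 0 and the division still succeeds.
    static double uniformScore(double dampingFactor, double sumOfSinks, long vertexCount) {
        return ((1 - dampingFactor) + dampingFactor * sumOfSinks) / vertexCount;
    }

    public static void main(String[] args) {
        // Floating-point division by zero is well-defined: 0.15 / 0.0 == Infinity.
        double score = uniformScore(0.85, 0.0, 0);
        System.out.println(score); // Infinity
        if (!Double.isInfinite(score)) {
            throw new AssertionError("expected Infinity, got " + score);
        }

        // Integer division by zero, by contrast, throws ArithmeticException.
        boolean threw = false;
        try {
            int zero = 0;
            System.out.println(1 / zero); // never reached
        } catch (ArithmeticException e) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("integer division by zero did not throw");
        }
    }
}
```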
[jira] [Commented] (FLINK-10644) Batch Job: Speculative execution
[ https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760004#comment-16760004 ] Greg Hogan commented on FLINK-10644: I'm not so sure that speculative execution is a good fit for Apache Flink. In MapReduce there are two concepts conducive to speculative execution that are not present in Flink: 1) In MapReduce the task to mapper/reducer/container ratio is often 10:1 or higher. In Flink all tasks are immediately assigned and processed in parallel. 2) In MapReduce intermediate and output data is always persisted, whereas in Flink only state is persisted (and only in streaming). Input is assumed to be replayable, but speculative execution would presumably also work for intermediate tasks. As noted, Spark has included speculative execution, and the Spark processing model is closer to Flink's. I'm just not clear on the circumstances where it is beneficial to start a catch-up task so late. I haven't followed the work on unification of batch and streaming, but it seems more valuable to focus on transitioning a task off a straggler machine rather than starting that task over. > Batch Job: Speculative execution > > > Key: FLINK-10644 > URL: https://issues.apache.org/jira/browse/FLINK-10644 > Project: Flink > Issue Type: New Feature > Components: JobManager >Reporter: JIN SUN >Assignee: ryantaocer >Priority: Major > Fix For: 1.8.0 > > > Stragglers/outliers are tasks that run slower than most of the tasks in a > batch job. This impacts job latency, as the straggler > will often be on the critical path of the job and become the bottleneck. > Tasks may be slow for various reasons, including hardware degradation, > software mis-configuration, or noisy neighbors. It's hard for the JM to predict > the runtime. > To reduce the overhead of stragglers, other systems such as Hadoop/Tez and Spark > have *_speculative execution_*. 
Speculative execution is a health-check > procedure that checks for tasks to be speculated, i.e. tasks running slower in an > ExecutionJobVertex than the median of all successfully completed tasks in > that EJV. Such slow tasks will be re-submitted to another TM. It will not > stop the slow tasks, but run a new copy in parallel, and will kill the others > if one of them completes. > This JIRA is an umbrella to apply this kind of idea in FLINK. Details will be > appended later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9061) add entropy to s3 path for better scalability
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549386#comment-16549386 ] Greg Hogan commented on FLINK-9061: --- Not that we shouldn't implement the general purpose solution but Amazon looks to have increased the PUT rate from 100 to 3500 and the GET rate from 300 to 5500: "This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications." https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/ > add entropy to s3 path for better scalability > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.5.0, 1.4.2 >Reporter: Jamie Grier >Assignee: Indrajit Roychoudhury >Priority: Critical > Labels: pull-request-available > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. 
> > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
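The path-rewriting idea quoted above (s3://bucket/flink/checkpoints becoming s3://bucket/[HASH]/flink/checkpoints) can be sketched as a standalone helper. This is illustrative only — `injectEntropy` is a hypothetical name, not Flink's actual filesystem hook — but it shows how a short hash prefix moves the entropy to the front of the key:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EntropyPath {

    // Rewrite s3://<bucket>/<key> to s3://<bucket>/<hash-prefix>/<key>.
    // Two digest bytes (4 hex characters) of entropy are enough to spread
    // writes across S3 partitions while keeping keys readable.
    static String injectEntropy(String bucket, String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            String prefix = String.format("%02x%02x", digest[0], digest[1]);
            return "s3://" + bucket + "/" + prefix + "/" + key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        // The same input always maps to the same prefix, so the rewrite is
        // deterministic and the original path remains recoverable.
        System.out.println(injectEntropy("bucket", "flink/checkpoints"));
    }
}
```

Note that S3 treats key names as opaque, so the hash segment must come first in the key to count toward the partitioning prefix.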
[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422890#comment-16422890 ] Greg Hogan commented on FLINK-9061: --- Since S3 key names are opaque it sounds like any "prefix" is counted for the initial entropy (so "c/" instead of "checkpoints/"?). This doesn't seem very ops-friendly if various points from various jobs are now intermixed by hash. > S3 checkpoint data not partitioned well -- causes errors and poor performance > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.4.2 >Reporter: Jamie Grier >Priority: Critical > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. > > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417402#comment-16417402 ] Greg Hogan commented on FLINK-9061: --- [~ste...@apache.org], not sure why your [link|https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/] isn't showing in your comment. I went looking for more info and found [this|https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/] explanation for the need to put entropy in the short prefix of the object name: "In fact, S3 even has an algorithm to detect this parallel type of write pattern and will automatically create multiple child partitions from the same parent simultaneously – increasing the system’s operations per second budget as request heat is detected." > S3 checkpoint data not partitioned well -- causes errors and poor performance > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.4.2 >Reporter: Jamie Grier >Priority: Critical > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. 
For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. > > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8809) Decrease maximum value of DirectMemory at default config
[ https://issues.apache.org/jira/browse/FLINK-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416137#comment-16416137 ] Greg Hogan commented on FLINK-8809: --- Will a restricted {{MaxDirectMemorySize}} prevent an OOME? If this is a production issue we may want to document use of {{jdk.nio.maxCachedBufferSize}}, which would prevent the JVM from retaining large "temporary" buffers (some discussion [here|http://www.evanjones.ca/java-bytebuffer-leak.html], and note that the option is available from [jdk8u102|http://www.oracle.com/technetwork/java/javase/8u102-relnotes-3021767.html]). > Decrease maximum value of DirectMemory at default config > > > Key: FLINK-8809 > URL: https://issues.apache.org/jira/browse/FLINK-8809 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Kirill A. Korinskiy >Priority: Major > > Good day! > > As I can see since this > [commit|https://github.com/apache/flink/commit/6c44d93d0a9da725ef8b1ad2a94889f79321db73] > the TaskManager uses 8,388,607 terabytes as the maximum amount of off-heap memory. I guess > that no system has so much memory, and it may cause the java process to be > killed by the OOM killer. > > I suggest decreasing this to a reasonable value by default. > > Right now I see only one way to override this hardcoded value: set > FLINK_TM_HEAP to 0, and specify the heap size by hand via > FLINK_ENV_JAVA_OPTS_TM. > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9090) Replace UUID.randomUUID with more efficient PRNG
[ https://issues.apache.org/jira/browse/FLINK-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416063#comment-16416063 ] Greg Hogan commented on FLINK-9090: --- Could you point to a case where use of {{UUID.randomUUID}} might affect performance? I'm only seeing this used to name things (temp dirs, operators, backends, etc.) rather than on the elementwise hot path. > Replace UUID.randomUUID with more efficient PRNG > > > Key: FLINK-9090 > URL: https://issues.apache.org/jira/browse/FLINK-9090 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Hai Zhou >Priority: Major > > Currently UUID.randomUUID is called in various places in the code base. > * It is non-deterministic. > * It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic by using seeded PRNGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
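For context on the trade-off the issue describes, here is a hedged sketch (not code proposed in the ticket) of a UUID built from {{ThreadLocalRandom}} instead of the shared {{SecureRandom}}: it avoids the JVM-wide lock but gives up cryptographic strength, which only matters if such IDs are generated on a hot path — the very premise the comment questions:

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class FastUuid {

    // Builds a random-style (version 4) UUID from ThreadLocalRandom.
    // Unlike UUID.randomUUID(), this takes no shared lock, but the bits
    // are not cryptographically secure.
    static UUID insecureRandomUuid() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long msb = rnd.nextLong();
        long lsb = rnd.nextLong();
        // Set the RFC 4122 version (4) and variant (IETF) bits so the
        // result is structurally indistinguishable from randomUUID().
        msb = (msb & 0xffffffffffff0fffL) | 0x0000000000004000L;
        lsb = (lsb & 0x3fffffffffffffffL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = insecureRandomUuid();
        if (u.version() != 4 || u.variant() != 2) {
            throw new AssertionError("not a valid v4 UUID: " + u);
        }
        System.out.println(u);
    }
}
```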
[jira] [Comment Edited] (FLINK-9090) Replace UUID.randomUUID with more efficient PRNG
[ https://issues.apache.org/jira/browse/FLINK-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415910#comment-16415910 ] Greg Hogan edited comment on FLINK-9090 at 3/27/18 6:30 PM: Why the need to be deterministic (and user configurable?) and where is this called on a hot path? was (Author: greghogan): Why the need to be deterministic (and user configurable?) and where is this called on from a hot path? > Replace UUID.randomUUID with more efficient PRNG > > > Key: FLINK-9090 > URL: https://issues.apache.org/jira/browse/FLINK-9090 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Hai Zhou >Priority: Major > > Currently UUID.randomUUID is called in various places in the code base. > * It is non-deterministic. > * It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic by using seeded PRNGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9090) Replace UUID.randomUUID with deterministic PRNG
[ https://issues.apache.org/jira/browse/FLINK-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415910#comment-16415910 ] Greg Hogan commented on FLINK-9090: --- Why the need to be deterministic (and user configurable?) and where is this called on from a hot path? > Replace UUID.randomUUID with deterministic PRNG > --- > > Key: FLINK-9090 > URL: https://issues.apache.org/jira/browse/FLINK-9090 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Hai Zhou >Priority: Major > > Currently UUID.randomUUID is called in various places in the code base. > * It is non-deterministic. > * It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic by using seeded PRNGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8907) Unable to start the process by start-cluster.sh
[ https://issues.apache.org/jira/browse/FLINK-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414424#comment-16414424 ] Greg Hogan commented on FLINK-8907: --- [~mingleizhang], is the remote-access issue fixed by FLINK-8843? > Unable to start the process by start-cluster.sh > --- > > Key: FLINK-8907 > URL: https://issues.apache.org/jira/browse/FLINK-8907 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Reporter: mingleizhang >Priority: Minor > Fix For: 1.5.0 > > > Build a flink based on the latest code from master branch. And I get the > flink binary of flink-1.6-SNAPSHOT. When I run ./start-cluster.sh by the > default flink-conf.yaml. I can not start the process and get the error below. > {code:java} > [root@ricezhang-pjhzf bin]# ./start-cluster.sh > >> Starting cluster. > >> Starting standalonesession daemon on host ricezhang-pjhzf.vclound.com. > >> : Temporary failure in name resolutionost > {code} > *By the way, I build this flink from a windows computer. And copy that folder > {{flink-1.6-SNAPSHOT}} to centos.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8907) Unable to start the process by start-cluster.sh
[ https://issues.apache.org/jira/browse/FLINK-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414267#comment-16414267 ] Greg Hogan commented on FLINK-8907: --- [~mingleizhang], do I understand correctly that you are able to access the web frontend locally but not remotely? Is this a firewall issue? > Unable to start the process by start-cluster.sh > --- > > Key: FLINK-8907 > URL: https://issues.apache.org/jira/browse/FLINK-8907 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Reporter: mingleizhang >Priority: Minor > Fix For: 1.5.0 > > > Build a flink based on the latest code from master branch. And I get the > flink binary of flink-1.6-SNAPSHOT. When I run ./start-cluster.sh by the > default flink-conf.yaml. I can not start the process and get the error below. > {code:java} > [root@ricezhang-pjhzf bin]# ./start-cluster.sh > >> Starting cluster. > >> Starting standalonesession daemon on host ricezhang-pjhzf.vclound.com. > >> : Temporary failure in name resolutionost > {code} > *By the way, I build this flink from a windows computer. And copy that folder > {{flink-1.6-SNAPSHOT}} to centos.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8809) Decrease maximum value of DirectMemory at default config
[ https://issues.apache.org/jira/browse/FLINK-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413978#comment-16413978 ] Greg Hogan commented on FLINK-8809: --- As you have noted this maximum value is set only for [MaxDirectMemorySize|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html] which constrains "the maximum total size (in bytes) of the New I/O (the java.nio package) direct-buffer allocations". Flink's allocation of memory segments is controlled by its configuration, so there is no need to constrain this value in the JVM. What would you choose as a reasonable default value? Requiring some users to increase this value is a [DRY|https://en.wikipedia.org/wiki/Don%27t_repeat_yourself] anti-pattern. So I think the explanation is: there is no harm in setting this to an essentially "infinite" value, and no benefit to setting a lower value. > Decrease maximum value of DirectMemory at default config > > > Key: FLINK-8809 > URL: https://issues.apache.org/jira/browse/FLINK-8809 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Kirill A. Korinskiy >Priority: Major > > Good day! > > As I can see since this > [commit|https://github.com/apache/flink/commit/6c44d93d0a9da725ef8b1ad2a94889f79321db73] > the TaskManager uses 8,388,607 terabytes as the maximum amount of off-heap memory. I guess > that no system has so much memory, and it may cause the java process to be > killed by the OOM killer. > > I suggest decreasing this to a reasonable value by default. > > Right now I see only one way to override this hardcoded value: set > FLINK_TM_HEAP to 0, and specify the heap size by hand via > FLINK_ENV_JAVA_OPTS_TM. > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8640) Disable japicmp on java 9
[ https://issues.apache.org/jira/browse/FLINK-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8640: -- Description: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177]. It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. was: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177.] It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. > Disable japicmp on java 9 > - > > Key: FLINK-8640 > URL: https://issues.apache.org/jira/browse/FLINK-8640 > Project: Flink > Issue Type: Sub-task > Components: Build System >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > > The {{japicmp}} plugin does not work out-of-the-box with java 9 as per > [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177]. > It is necessary to modify MAVEN_OPTS which we cannot automatically do as part > of our maven build, hence we should disable the plugin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8640) Disable japicmp on java 9
[ https://issues.apache.org/jira/browse/FLINK-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8640: -- Description: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177.] It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. was: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177.] It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. > Disable japicmp on java 9 > - > > Key: FLINK-8640 > URL: https://issues.apache.org/jira/browse/FLINK-8640 > Project: Flink > Issue Type: Sub-task > Components: Build System >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > > The {{japicmp}} plugin does not work out-of-the-box with java 9 as per > [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177.] > It is necessary to modify MAVEN_OPTS which we cannot automatically do as part > of our maven build, hence we should disable the plugin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
[ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326285#comment-16326285 ] Greg Hogan commented on FLINK-8414: --- You certainly can measure scalability, but as you have discovered the performance will not be monotonically increasing. Redistributing operators require a channel between each pair of tasks, so with a parallelism of 2^7 you will have 2^14 channels per exchange for each iteration. There are many reasons to use Flink and Gelly, but for certain algorithms and use cases you may even get better performance with a single-threaded implementation. See "Scalability! But at what COST?". ConnectedComponents and PageRank require, respectively, no and very little intermediate data, whereas the similarity measures JaccardIndex and AdamicAdar as well as triangle metrics such as ClusteringCoefficient process super-linear intermediate data and benefit much more from Flink's scalability. When comparing against non-distributed implementations it is important to note that all Gelly algorithms process generic data, whereas many "optimized" algorithms assume compact integer representations. > Gelly performance seriously decreases when using the suggested parallelism > configuration > > > Key: FLINK-8414 > URL: https://issues.apache.org/jira/browse/FLINK-8414 > Project: Flink > Issue Type: Bug > Components: Configuration, Documentation, Gelly >Reporter: flora karniav >Priority: Minor > > I am running Gelly examples with different datasets in a cluster of 5 > machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each. > The number of Slots parameter is set to 32 (as suggested) and the parallelism > to 128 (32 cores*4 taskmanagers). > I observe a vast performance degradation using these suggested settings than > setting parallelism.default to 16 for example were the same job completes at > ~60 seconds vs ~140 in the 128 parallelism case. 
> Is there something wrong in my configuration? Should I decrease parallelism > and -if so- will this inevitably decrease CPU utilization? > Another matter that may be related to this is the number of partitions of the > data. Is this somehow related to parallelism? How many partitions are created > in the case of parallelism.default=128? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
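The channel count mentioned in the comment above is simple arithmetic: a redistributing (all-to-all) exchange between two operators at parallelism p needs p × p channels, so parallelism 2^7 implies 2^14 channels per exchange:

```java
public class ChannelCount {

    // Channels needed for one all-to-all exchange between two operators,
    // each running at parallelism p: every producer connects to every consumer.
    static long channels(long p) {
        return p * p;
    }

    public static void main(String[] args) {
        System.out.println(channels(16));  // 256
        System.out.println(channels(128)); // 16384, i.e. 2^14
        if (channels(1L << 7) != (1L << 14)) {
            throw new AssertionError();
        }
    }
}
```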
[jira] [Resolved] (FLINK-6730) Activate strict checkstyle for flink-optimizer
[ https://issues.apache.org/jira/browse/FLINK-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan resolved FLINK-6730. --- Resolution: Won't Fix Closing per comment from #5294: "I would ignore the optimizer module TBH. There are no recent contributions to the module, so we don't benefit from cleaner PR. I also don't know what will happen to the module when we start unifying the batch&streaming APIs, which probably will be either be a full rewrite or removal." > Activate strict checkstyle for flink-optimizer > -- > > Key: FLINK-6730 > URL: https://issues.apache.org/jira/browse/FLINK-6730 > Project: Flink > Issue Type: Sub-task > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > > Long term issue for incrementally introducing the strict checkstyle to > flink-optimizer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (FLINK-8427) Checkstyle for org.apache.flink.optimizer.costs
[ https://issues.apache.org/jira/browse/FLINK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan resolved FLINK-8427. --- Resolution: Won't Fix Assignee: (was: Greg Hogan) > Checkstyle for org.apache.flink.optimizer.costs > --- > > Key: FLINK-8427 > URL: https://issues.apache.org/jira/browse/FLINK-8427 > Project: Flink > Issue Type: Improvement > Components: Optimizer >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8427) Checkstyle for org.apache.flink.optimizer.costs
Greg Hogan created FLINK-8427: - Summary: Checkstyle for org.apache.flink.optimizer.costs Key: FLINK-8427 URL: https://issues.apache.org/jira/browse/FLINK-8427 Project: Flink Issue Type: Improvement Components: Optimizer Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
[ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324542#comment-16324542 ] Greg Hogan commented on FLINK-8414: --- It is incumbent on the user to configure an appropriate parallelism for the quantity of data. Those graphs contain only a few tens of megabytes of data so it is not surprising that the optimal parallelism is around (or even lower than) 16. You can use `VertexMetrics` to pre-compute the size of the graph and adjust the parallelism at runtime (`ExecutionConfig#setParallelism`). Flink and Gelly are designed to scale to 100s to 1000s of parallel tasks and GBs to TBs of data. > Gelly performance seriously decreases when using the suggested parallelism > configuration > > > Key: FLINK-8414 > URL: https://issues.apache.org/jira/browse/FLINK-8414 > Project: Flink > Issue Type: Bug > Components: Configuration, Documentation, Gelly >Reporter: flora karniav >Priority: Minor > > I am running Gelly examples with different datasets in a cluster of 5 > machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each. > The number of Slots parameter is set to 32 (as suggested) and the parallelism > to 128 (32 cores*4 taskmanagers). > I observe a vast performance degradation using these suggested settings than > setting parallelism.default to 16 for example were the same job completes at > ~60 seconds vs ~140 in the 128 parallelism case. > Is there something wrong in my configuration? Should I decrease parallelism > and -if so- will this inevitably decrease CPU utilization? > Another matter that may be related to this is the number of partitions of the > data. Is this somehow related to parallelism? How many partitions are created > in the case of parallelism.default=128? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
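The runtime adjustment suggested above can be sketched as a simple heuristic. The 64 MB-per-task target and the power-of-two stepping below are illustrative choices, not Flink defaults; in a real job the chosen value would be passed to `ExecutionConfig#setParallelism` as the comment describes:

```java
public class ParallelismHeuristic {

    // Pick a parallelism from the input size: double it until each task
    // would process under ~64 MB, capping at the cluster's total slots.
    static int chooseParallelism(long inputBytes, int maxSlots) {
        long megabytes = inputBytes >> 20;
        int parallelism = 1;
        while (parallelism < maxSlots && (long) parallelism * 64 < megabytes) {
            parallelism <<= 1;
        }
        return parallelism;
    }

    public static void main(String[] args) {
        // A few tens of MB (like the graphs in this issue) stays small...
        System.out.println(chooseParallelism(50L << 20, 128)); // 1
        // ...while a 1 TB input saturates the 128 available slots.
        System.out.println(chooseParallelism(1L << 40, 128)); // 128
    }
}
```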
[jira] [Created] (FLINK-8422) Checkstyle for org.apache.flink.api.java.tuple
Greg Hogan created FLINK-8422: - Summary: Checkstyle for org.apache.flink.api.java.tuple Key: FLINK-8422 URL: https://issues.apache.org/jira/browse/FLINK-8422 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Update {{TupleGenerator}} for Flink's checkstyle and rebuild {{Tuple}} and {{TupleBuilder}} classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
[ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324133#comment-16324133 ] Greg Hogan commented on FLINK-8414: --- This is more of a question than a reported bug and may be more appropriate for the flink-user mailing list. Are you able to share what algorithm(s) you are running and describe the dataset(s)? > Gelly performance seriously decreases when using the suggested parallelism > configuration > > > Key: FLINK-8414 > URL: https://issues.apache.org/jira/browse/FLINK-8414 > Project: Flink > Issue Type: Bug > Components: Configuration, Documentation, Gelly >Reporter: flora karniav >Priority: Minor > > I am running Gelly examples with different datasets in a cluster of 5 > machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each. > The number of Slots parameter is set to 32 (as suggested) and the parallelism > to 128 (32 cores*4 taskmanagers). > I observe a vast performance degradation using these suggested settings than > setting parallelism.default to 16 for example were the same job completes at > ~60 seconds vs ~140 in the 128 parallelism case. > Is there something wrong in my configuration? Should I decrease parallelism > and -if so- will this inevitably decrease CPU utilization? > Another matter that may be related to this is the number of partitions of the > data. Is this somehow related to parallelism? How many partitions are created > in the case of parallelism.default=128? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8403) Flink Gelly examples hanging without returning result
[ https://issues.apache.org/jira/browse/FLINK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321032#comment-16321032 ] Greg Hogan commented on FLINK-8403: --- Could this be due to the [{{akka.framesize}}|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#distributed-coordination-via-akka] limit (default: 10 MiB)? Does the job succeed when you write the output to CSV? > Flink Gelly examples hanging without returning result > - > > Key: FLINK-8403 > URL: https://issues.apache.org/jira/browse/FLINK-8403 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.3.2 > Environment: CentOS Linux release 7.3.1611 >Reporter: flora karniav > Labels: examples, gelly, performance > Original Estimate: 72h > Remaining Estimate: 72h > > Hello, I am currently running and measuring Flink Gelly examples (Connected > components and Pagerank algorithms) with different SNAP datasets. When > running with the Twitter dataset for example > (https://snap.stanford.edu/data/egonets-Twitter.html) which has 81,306 > vertices everything executes and finishes OK and I get the reported job > runtime. On the other hand, executions with datasets having a bigger number > of vertices, e.g. https://snap.stanford.edu/data/com-Youtube.html with > 1,134,890 vertices, hang with no result and reported time, while at the same > time I get "Job execution switched to status FINISHED." > I thought that this could be a memory issue so I reached 125GB of RAM > assigned to my taskmanagers (and jobmanager), but still no luck. > The exact command I am running is: > ./bin/flink run examples/gelly/flink-gelly-examples_*.jar --algorithm > PageRank --directed false --input_filename hdfs://sith0:9000/user/xx.txt > --input CSV --type integer --input_field_delimiter $' ' --output print -- This message was sent by Atlassian JIRA (v6.4.14#64029)
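If the framesize limit is indeed the culprit: results of {{--output print}} are collected back through the JobManager as a serialized accumulator and are therefore bounded by the maximum Akka message size. A sketch of raising the limit (the new value is hypothetical; the default is 10485760b):

```yaml
# flink-conf.yaml -- raise the maximum Akka payload size (illustrative value)
akka.framesize: 104857600b
```

Writing the output to CSV, as suggested above, bypasses the collect path entirely, which is why it is a useful test.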
[jira] [Created] (FLINK-8363) Build Hadoop 2.9.0 convenience binaries
Greg Hogan created FLINK-8363: - Summary: Build Hadoop 2.9.0 convenience binaries Key: FLINK-8363 URL: https://issues.apache.org/jira/browse/FLINK-8363 Project: Flink Issue Type: New Feature Components: Build System Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Hadoop 2.9.0 was released on 17 November 2017. A local {{mvn clean verify -Dhadoop.version=2.9.0}} ran successfully. With the new Hadoopless build we may be able to improve the build process by reusing the {{flink-dist}} jar (the per-version jars differ only in build timestamps) and simply make each Hadoop-specific tarball by copying in the corresponding {{flink-shaded-hadoop2-uber}} jar. What portion of the TravisCI jobs can run Hadoopless? We could build and verify these once and then run a Hadoop-versioned job for each Hadoop version. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
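The tarball-assembly idea could be sketched as follows (all paths and jar names are hypothetical; this illustrates the copy step, not the actual release tooling):

```sh
# Sketch: build flink-dist once without Hadoop, then stamp out one
# tarball per Hadoop version by dropping in the matching uber jar.
for v in 2.7.5 2.8.3 2.9.0; do
  cp -r build-target "flink-${v}-dist"
  cp "flink-shaded-hadoop2-uber-${v}.jar" "flink-${v}-dist/lib/"
  tar czf "flink-hadoop${v}.tgz" "flink-${v}-dist"
done
```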
[jira] [Created] (FLINK-8361) Remove create_release_files.sh
Greg Hogan created FLINK-8361: - Summary: Remove create_release_files.sh Key: FLINK-8361 URL: https://issues.apache.org/jira/browse/FLINK-8361 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.5.0 Reporter: Greg Hogan Priority: Trivial The monolithic {{create_release_files.sh}} does not support building Flink without Hadoop and looks to have been superseded by the scripts in {{tools/releasing}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8320) Flink cluster does not work on Java 9
[ https://issues.apache.org/jira/browse/FLINK-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8320: -- Issue Type: Wish (was: Bug) > Flink cluster does not work on Java 9 > - > > Key: FLINK-8320 > URL: https://issues.apache.org/jira/browse/FLINK-8320 > Project: Flink > Issue Type: Wish >Affects Versions: 1.4.0 > Environment: flink-1.4.0, mac os x, 10.13.1 >Reporter: Steve Layland > Labels: java9 > > Recently got a new macbook and figured it was a good time to install java 9 > and try it out. I didn't realize that Java 9 was such a breaking update (eg: > https://blog.codefx.org/java/java-9-migration-guide/) and took the Flink > documentation at face value and assumed that Java 7+ or higher would be fine. > Here's is what happens after starting a local cluster and attempting to run > the sample WordCount program under Java 9: > {noformat} > flink-1.4.0 $ export JAVA_HOME=$(/usr/libexec/java_home -v 9) > cru@lappy:flink-1.4.0 $ java -version > java version "9.0.1" > Java(TM) SE Runtime Environment (build 9.0.1+11) > Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode) > cru@lappy:flink-1.4.0 $ bin/start-cluster.sh > Starting cluster. > Starting jobmanager daemon on host lappy.local. > Starting taskmanager daemon on host lappy.local. > cru@lappy:flink-1.4.0 $ bin/flink run examples/streaming/WordCount.jar > Cluster configuration: Standalone cluster with JobManager at > localhost/127.0.0.1:6123 > Using address localhost:6123 to connect to JobManager. > JobManager web interface address http://localhost:8081 > Starting execution of program > Executing WordCount example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > Submitting job with JobID: ee054ffeb4784848143b76b7d51d99c1. Waiting for job > completion. 
> > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Couldn't retrieve the JobExecutionResult from the > JobManager. > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:492) > at > org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:105) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at > org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:89) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:396) > at > org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:802) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:282) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054) > at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101) > at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098) > at > org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Couldn't > retrieve the JobExecutionResult 
from the JobManager. > at > org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:300) > at > org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:387) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:481) > ... 18 more > Caused by: > org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: > Lost connection to the JobManager. > at > org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:219) > at > org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:104) > at > org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:71) > a
[jira] [Closed] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-5506. - Resolution: Fixed master: a355df6e33f402beac01c2908cb0c64cfeccadb2 release-1.4: 96051e516bdc114b04c5a3cbb75874c420e27e7b > Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException > - > > Key: FLINK-5506 > URL: https://issues.apache.org/jira/browse/FLINK-5506 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.4, 1.3.2, 1.4.1 >Reporter: Miguel E. Coimbra >Assignee: Greg Hogan > Labels: easyfix, newbie > Fix For: 1.5.0, 1.4.1 > > Original Estimate: 2h > Remaining Estimate: 2h > > Reporting this here as per Vasia's advice. > I am having the following problem while trying out the > org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API > (Java). > Specs: JDK 1.8.0_102 x64 > Apache Flink: 1.1.4 > Suppose I have a very small (I tried an example with 38 vertices as well) > dataset stored in a tab-separated file 3-vertex.tsv: > {code} > #id1 id2 score > 010 > 020 > 030 > {code} > This is just a central vertex with 3 neighbors (disconnected between > themselves). > I am loading the dataset and executing the algorithm with the following code: > {code} > // Load the data from the .tsv file. > final DataSet> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(Long.class, Long.class, Double.class); > // Generate a graph and add reverse edges (undirected). > final Graph graph = Graph.fromTupleDataSet(edgeTuples, > new MapFunction() { > private static final long serialVersionUID = 8713516577419451509L; > public Long map(Long value) { > return value; > } > }, > env).getUndirected(); > // CommunityDetection parameters. > final double hopAttenuationDelta = 0.5d; > final int iterationCount = 10; > // Prepare and trigger the execution. 
> DataSet> vs = graph.run(new > org.apache.flink.graph.library.CommunityDetection(iterationCount, > hopAttenuationDelta)).getVertices(); > vs.print(); > {code} > Running this code throws the following exception (check the bold line): > {code} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException > at > org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158) > at > org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389) > at > org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486) > at > 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146) > at > org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642) > at java.lang.Thread.run(Thread.java:745) > {code} > After a further look, I set a breakpoint (Eclipse IDE debugging) at the line > in bold: > org.apache.flink.graph.library.CommunityDetection.java (source code accessed > automatically by
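Note that Jira's renderer has stripped the angle-bracketed generic parameters (and the tab characters in the sample file) from the snippet quoted above. A reconstruction, assuming Long vertex IDs and values and Double edge scores as CommunityDetection requires, would read:

```java
// Reconstruction of the reporter's snippet; the generic parameters were
// lost in Jira's rendering and are inferred from the surrounding context.
// The sample 3-vertex.tsv is "0<TAB>1<TAB>0" etc., one edge per line.
final DataSet<Tuple3<Long, Long, Double>> edgeTuples = env.readCsvFile(inputPath)
        .fieldDelimiter("\t")
        .ignoreComments("#")
        .types(Long.class, Long.class, Double.class);

// Initialize each vertex value with its own ID, then add reverse edges.
final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples,
        new MapFunction<Long, Long>() {
            private static final long serialVersionUID = 8713516577419451509L;
            public Long map(Long value) {
                return value;
            }
        }, env).getUndirected();

final DataSet<Vertex<Long, Long>> vs = graph.run(
        new CommunityDetection<Long>(10, 0.5d)).getVertices();
vs.print();
```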
[jira] [Closed] (FLINK-8222) Update Scala version
[ https://issues.apache.org/jira/browse/FLINK-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8222. - Resolution: Implemented master: 8987de3b241d23bbcc6ca5640e3cb77972a60be4 > Update Scala version > > > Key: FLINK-8222 > URL: https://issues.apache.org/jira/browse/FLINK-8222 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > Update Scala to version {{2.11.12}}. I don't believe this affects the Flink > distribution but rather anyone who is compiling Flink or a > Flink-quickstart-derived program on a shared system. > "A privilege escalation vulnerability (CVE-2017-15288) has been identified in > the Scala compilation daemon." > https://www.scala-lang.org/news/security-update-nov17.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8223) Update Hadoop versions
[ https://issues.apache.org/jira/browse/FLINK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8223. - Resolution: Fixed master: d3cd51a3f9fbb3ffbe6d23a57ff3884733eb47fa > Update Hadoop versions > -- > > Key: FLINK-8223 > URL: https://issues.apache.org/jira/browse/FLINK-8223 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Update 2.7.3 to 2.7.5 and 2.8.0 to 2.8.3. See > http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8223) Update Hadoop versions
[ https://issues.apache.org/jira/browse/FLINK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8223: -- Description: Update 2.7.3 to 2.7.5 and 2.8.0 to 2.8.3. See http://hadoop.apache.org/releases.html (was: Update 2.7.3 to 2.7.4 and 2.8.0 to 2.8.2. See http://hadoop.apache.org/releases.html) > Update Hadoop versions > -- > > Key: FLINK-8223 > URL: https://issues.apache.org/jira/browse/FLINK-8223 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Update 2.7.3 to 2.7.5 and 2.8.0 to 2.8.3. See > http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8223) Update Hadoop versions
[ https://issues.apache.org/jira/browse/FLINK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8223: -- Fix Version/s: 1.5.0 > Update Hadoop versions > -- > > Key: FLINK-8223 > URL: https://issues.apache.org/jira/browse/FLINK-8223 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Update 2.7.3 to 2.7.4 and 2.8.0 to 2.8.2. See > http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8222) Update Scala version
[ https://issues.apache.org/jira/browse/FLINK-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8222: -- Fix Version/s: 1.5.0 > Update Scala version > > > Key: FLINK-8222 > URL: https://issues.apache.org/jira/browse/FLINK-8222 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > Update Scala to version {{2.11.12}}. I don't believe this affects the Flink > distribution but rather anyone who is compiling Flink or a > Flink-quickstart-derived program on a shared system. > "A privilege escalation vulnerability (CVE-2017-15288) has been identified in > the Scala compilation daemon." > https://www.scala-lang.org/news/security-update-nov17.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (FLINK-8037) Missing cast in integer arithmetic in TransactionalIdsGenerator#generateIdsToAbort
[ https://issues.apache.org/jira/browse/FLINK-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reassigned FLINK-8037: - Assignee: Greg Hogan > Missing cast in integer arithmetic in > TransactionalIdsGenerator#generateIdsToAbort > -- > > Key: FLINK-8037 > URL: https://issues.apache.org/jira/browse/FLINK-8037 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Greg Hogan >Priority: Minor > > {code} > public Set generateIdsToAbort() { > Set idsToAbort = new HashSet<>(); > for (int i = 0; i < safeScaleDownFactor; i++) { > idsToAbort.addAll(generateIdsToUse(i * poolSize * > totalNumberOfSubtasks)); > {code} > The operands are integers where generateIdsToUse() expects long parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
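The bug is the usual Java integer-promotion trap: all three operands of {{i * poolSize * totalNumberOfSubtasks}} are {{int}}, so the product is computed in 32-bit arithmetic and wraps before it is widened to the method's {{long}} parameter. A minimal, self-contained demonstration (the field values are made up for illustration and are not taken from the Kafka producer):

```java
public class OverflowDemo {
    // Hypothetical sizes chosen so the product exceeds Integer.MAX_VALUE.
    static final int poolSize = 100_000;
    static final int totalNumberOfSubtasks = 50_000;

    // Buggy form: the multiplication overflows in int before the widening.
    static long startIdWithoutCast(int i) {
        return i * poolSize * totalNumberOfSubtasks;
    }

    // Fixed form: casting the first operand forces 64-bit arithmetic
    // for the whole left-associative chain.
    static long startIdWithCast(int i) {
        return (long) i * poolSize * totalNumberOfSubtasks;
    }

    public static void main(String[] args) {
        System.out.println(startIdWithoutCast(1)); // wrapped: 705032704
        System.out.println(startIdWithCast(1));    // correct: 5000000000
    }
}
```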
[jira] [Updated] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-5506: -- Fix Version/s: 1.4.1 1.5.0 > Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException > - > > Key: FLINK-5506 > URL: https://issues.apache.org/jira/browse/FLINK-5506 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.4, 1.3.2, 1.4.1 >Reporter: Miguel E. Coimbra >Assignee: Greg Hogan > Labels: easyfix, newbie > Fix For: 1.5.0, 1.4.1 > > Original Estimate: 2h > Remaining Estimate: 2h > > Reporting this here as per Vasia's advice. > I am having the following problem while trying out the > org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API > (Java). > Specs: JDK 1.8.0_102 x64 > Apache Flink: 1.1.4 > Suppose I have a very small (I tried an example with 38 vertices as well) > dataset stored in a tab-separated file 3-vertex.tsv: > {code} > #id1 id2 score > 010 > 020 > 030 > {code} > This is just a central vertex with 3 neighbors (disconnected between > themselves). > I am loading the dataset and executing the algorithm with the following code: > {code} > // Load the data from the .tsv file. > final DataSet> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(Long.class, Long.class, Double.class); > // Generate a graph and add reverse edges (undirected). > final Graph graph = Graph.fromTupleDataSet(edgeTuples, > new MapFunction() { > private static final long serialVersionUID = 8713516577419451509L; > public Long map(Long value) { > return value; > } > }, > env).getUndirected(); > // CommunityDetection parameters. > final double hopAttenuationDelta = 0.5d; > final int iterationCount = 10; > // Prepare and trigger the execution. 
> DataSet> vs = graph.run(new > org.apache.flink.graph.library.CommunityDetection(iterationCount, > hopAttenuationDelta)).getVertices(); > vs.print(); > {code} > Running this code throws the following exception (check the bold line): > {code} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException > at > org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158) > at > org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389) > at > org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486) > at > 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146) > at > org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642) > at java.lang.Thread.run(Thread.java:745) > {code} > After a further look, I set a breakpoint (Eclipse IDE debugging) at the line > in bold: > org.apache.flink.graph.library.CommunityDetection.java (source code accessed > automatically by Maven) > // find the highest score of maxScoreLabel > double highestScore
[jira] [Created] (FLINK-8222) Update Scala version
Greg Hogan created FLINK-8222: - Summary: Update Scala version Key: FLINK-8222 URL: https://issues.apache.org/jira/browse/FLINK-8222 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Update Scala to version {{2.11.12}}. I don't believe this affects the Flink distribution but rather anyone who is compiling Flink or a Flink-quickstart-derived program on a shared system. "A privilege escalation vulnerability (CVE-2017-15288) has been identified in the Scala compilation daemon." https://www.scala-lang.org/news/security-update-nov17.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8223) Update Hadoop versions
Greg Hogan created FLINK-8223: - Summary: Update Hadoop versions Key: FLINK-8223 URL: https://issues.apache.org/jira/browse/FLINK-8223 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Update 2.7.3 to 2.7.4 and 2.8.0 to 2.8.2. See http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reassigned FLINK-5506: - Assignee: Greg Hogan > Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException > - > > Key: FLINK-5506 > URL: https://issues.apache.org/jira/browse/FLINK-5506 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.4, 1.3.2, 1.4.1 >Reporter: Miguel E. Coimbra >Assignee: Greg Hogan > Labels: easyfix, newbie > Original Estimate: 2h > Remaining Estimate: 2h > > Reporting this here as per Vasia's advice. > I am having the following problem while trying out the > org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API > (Java). > Specs: JDK 1.8.0_102 x64 > Apache Flink: 1.1.4 > Suppose I have a very small (I tried an example with 38 vertices as well) > dataset stored in a tab-separated file 3-vertex.tsv: > {code} > #id1 id2 score > 010 > 020 > 030 > {code} > This is just a central vertex with 3 neighbors (disconnected between > themselves). > I am loading the dataset and executing the algorithm with the following code: > {code} > // Load the data from the .tsv file. > final DataSet> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(Long.class, Long.class, Double.class); > // Generate a graph and add reverse edges (undirected). > final Graph graph = Graph.fromTupleDataSet(edgeTuples, > new MapFunction() { > private static final long serialVersionUID = 8713516577419451509L; > public Long map(Long value) { > return value; > } > }, > env).getUndirected(); > // CommunityDetection parameters. > final double hopAttenuationDelta = 0.5d; > final int iterationCount = 10; > // Prepare and trigger the execution. 
> DataSet> vs = graph.run(new > org.apache.flink.graph.library.CommunityDetection(iterationCount, > hopAttenuationDelta)).getVertices(); > vs.print(); > {code} > Running this code throws the following exception (check the bold line): > {code} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException > at > org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158) > at > org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389) > at > org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486) > at > 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146) > at > org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642) > at java.lang.Thread.run(Thread.java:745) > {code} > After a further look, I set a breakpoint (Eclipse IDE debugging) at the line > in bold: > org.apache.flink.graph.library.CommunityDetection.java (source code accessed > automatically by Maven) > // find the highest score of maxScoreLabel > double highestScore = labelsWithHighestScore.get(maxScoreLabel); > - maxSc
[jira] [Updated] (FLINK-8180) Refactor driver outputs
[ https://issues.apache.org/jira/browse/FLINK-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8180: -- Description: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build flink-gelly-examples as an uber jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). was: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. 
In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as an uber jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). > Refactor driver outputs > --- > > Key: FLINK-8180 > URL: https://issues.apache.org/jira/browse/FLINK-8180 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > The change in 1.4 of algorithm results from Tuples to POJOs broke the writing > of results as csv. Testing this was and is a challenge so was not done. There > are many additional improvements which can be made based on recent > improvements to the Gelly framework. > Result hash and analytic results should always be printed to the screen. > Results can optionally be written to stdout or to a file. In the latter case > the result hash and analytic results (and schema) will also be written to a > top-level file. > The "verbose" output strings can be replaced with json which is just as > human-readable but also machine readable. In addition to csv and json it may > be simple to support xml, etc. Computed fields will be optionally printed to > screen or file (currently these are always printed to screen but never to > file). > Testing will be simplified since formats are now a separate concern from the > stream. 
> Jackson is available to Gelly as a dependency provided in the Flink > distribution but we may want to build flink-gelly-examples as an uber jar in > order to include additional modules (which may require a direct dependency on > Jackson, which would require checkstyle suppressions around the unshaded > jackson imports). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8180) Refactor driver outputs
[ https://issues.apache.org/jira/browse/FLINK-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8180: -- Description: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as an uber jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). was: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. 
In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). > Refactor driver outputs > --- > > Key: FLINK-8180 > URL: https://issues.apache.org/jira/browse/FLINK-8180 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > The change in 1.4 of algorithm results from Tuples to POJOs broke the writing > of results as csv. Testing this was and is a challenge so was not done. There > are many additional improvements which can be made based on recent > improvements to the Gelly framework. > Result hash and analytic results should always be printed to the screen. > Results can optionally be written to stdout or to a file. In the latter case > the result hash and analytic results (and schema) will also be written to a > top-level file. > The "verbose" output strings can be replaced with json which is just as > human-readable but also machine readable. In addition to csv and json it may > be simple to support xml, etc. Computed fields will be optionally printed to > screen or file (currently these are always printed to screen but never to > file). > Testing will be simplified since formats are now a separate concern from the > stream. 
> Jackson is available to Gelly as a dependency provided in the Flink > distribution but we may want to build Gelly as an uber jar in order to > include additional modules (which may require a direct dependency on Jackson, > which would require checkstyle suppressions around the unshaded jackson > imports). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8180) Refactor driver outputs
[ https://issues.apache.org/jira/browse/FLINK-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8180: -- Description: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). was: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. 
In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would fail the checkstyle requirement to use the shaded package). > Refactor driver outputs > --- > > Key: FLINK-8180 > URL: https://issues.apache.org/jira/browse/FLINK-8180 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > The change in 1.4 of algorithm results from Tuples to POJOs broke the writing > of results as csv. Testing this was and is a challenge so was not done. There > are many additional improvements which can be made based on recent > improvements to the Gelly framework. > Result hash and analytic results should always be printed to the screen. > Results can optionally be written to stdout or to a file. In the latter case > the result hash and analytic results (and schema) will also be written to a > top-level file. > The "verbose" output strings can be replaced with json which is just as > human-readable but also machine readable. In addition to csv and json it may > be simple to support xml, etc. Computed fields will be optionally printed to > screen or file (currently these are always printed to screen but never to > file). > Testing will be simplified since formats are now a separate concern from the > stream. 
> Jackson is available to Gelly as a dependency provided in the Flink > distribution but we may want to build Gelly as a fat jar in order to include > additional modules (which may require a direct dependency on Jackson, which > would require checkstyle suppressions around the unshaded jackson imports). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8180) Refactor driver outputs
Greg Hogan created FLINK-8180: - Summary: Refactor driver outputs Key: FLINK-8180 URL: https://issues.apache.org/jira/browse/FLINK-8180 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Fix For: 1.5.0 The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would fail the checkstyle requirement to use the shaded package). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
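The "formats are a separate concern from the stream" idea above can be sketched in plain Java. The interface and class names below are illustrative, not Gelly's actual driver API, and the JSON emitter is a minimal hand-rolled stand-in for Jackson:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DriverOutputSketch {

    /** Formats one result record; where it is written (stdout or file) is a separate concern. */
    public interface ResultFormatter {
        String format(List<String> fieldNames, List<Object> values);
    }

    /** Plain CSV: values joined by commas, field names ignored. */
    public static class CsvFormatter implements ResultFormatter {
        public String format(List<String> fieldNames, List<Object> values) {
            return values.stream().map(String::valueOf).collect(Collectors.joining(","));
        }
    }

    /** Minimal JSON object: machine-readable but still human-readable. */
    public static class JsonFormatter implements ResultFormatter {
        public String format(List<String> fieldNames, List<Object> values) {
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < fieldNames.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append('"').append(fieldNames.get(i)).append("\":");
                Object v = values.get(i);
                // Numbers are emitted bare; everything else is quoted.
                sb.append(v instanceof Number ? v.toString() : '"' + v.toString() + '"');
            }
            return sb.append('}').toString();
        }
    }
}
```

With this split, adding xml or another format is just one more `ResultFormatter` implementation, and each formatter can be unit-tested without running a job.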
[jira] [Closed] (FLINK-6864) Remove confusing "invalid POJO type" messages from TypeExtractor
[ https://issues.apache.org/jira/browse/FLINK-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-6864. - Resolution: Implemented master: 450b4241404055ed6638e354be421b83380827c5 > Remove confusing "invalid POJO type" messages from TypeExtractor > > > Key: FLINK-6864 > URL: https://issues.apache.org/jira/browse/FLINK-6864 > Project: Flink > Issue Type: Improvement > Components: Documentation, Type Serialization System >Reporter: Tzu-Li (Gordon) Tai >Assignee: Fang Yong > Fix For: 1.5.0 > > > When a user's type cannot be treated as a POJO, the {{TypeExtractor}} will > log warnings such as ".. must have a default constructor to be used as a > POJO.", " ... is not a valid POJO type because not all fields are valid POJO > fields." in the {{analyzePojo}} method. > These messages often mislead the user into thinking that the job should have > failed, whereas in fact in these cases Flink just falls back to Kryo and > treats them as generic types. We should remove these messages, and at the > same time improve the type serialization docs at [1] to explicitly explain > what it means when Flink does / does not recognize a user type as a POJO. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#rules-for-pojo-types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
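For context, Flink's documented POJO rules require a public class with a public no-argument constructor whose fields are either public or reachable through getters and setters; a type that misses any of these is not invalid, it is simply serialized with Kryo as a generic type. A minimal sketch of a conforming type (the class name is illustrative):

```java
// Illustrative type satisfying Flink's POJO rules: public class,
// public no-arg constructor, every field public or exposed via
// getter/setter. Breaking any rule only triggers the Kryo fallback.
public class WordCount {
    public String word;    // public field: a valid POJO field

    private long count;    // private, but exposed via getter/setter

    public WordCount() {}  // required public no-arg constructor

    public long getCount() { return count; }

    public void setCount(long count) { this.count = count; }
}
```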
[jira] [Updated] (FLINK-6864) Remove confusing "invalid POJO type" messages from TypeExtractor
[ https://issues.apache.org/jira/browse/FLINK-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-6864: -- Fix Version/s: 1.5.0 > Remove confusing "invalid POJO type" messages from TypeExtractor > > > Key: FLINK-6864 > URL: https://issues.apache.org/jira/browse/FLINK-6864 > Project: Flink > Issue Type: Improvement > Components: Documentation, Type Serialization System >Reporter: Tzu-Li (Gordon) Tai >Assignee: Fang Yong > Fix For: 1.5.0 > > > When a user's type cannot be treated as a POJO, the {{TypeExtractor}} will > log warnings such as ".. must have a default constructor to be used as a > POJO.", " ... is not a valid POJO type because not all fields are valid POJO > fields." in the {{analyzePojo}} method. > These messages are often conceived as misleading for the user to think that > the job should have failed, whereas in fact in these cases Flink just > fallsback to Kryo and treat then as generic types. We should remove these > messages, and at the same time improve the type serialization docs at [1] to > explicitly inform what it means when Flink does / does not recognizes a user > type as a POJO. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#rules-for-pojo-types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8142) Cleanup reference to deprecated constants in ConfigConstants
[ https://issues.apache.org/jira/browse/FLINK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8142. - Resolution: Implemented master: f2b804a7479dcba7980dd68a445635f4ac2198c0 > Cleanup reference to deprecated constants in ConfigConstants > > > Key: FLINK-8142 > URL: https://issues.apache.org/jira/browse/FLINK-8142 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 >Assignee: Hai Zhou UTC+8 >Priority: Minor > Fix For: 1.5.0 > > > ConfigConstants contains several deprecated String constants that are used by > other Flink modules. Those should be cleaned up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7967) Deprecate Hadoop specific Flink configuration options
[ https://issues.apache.org/jira/browse/FLINK-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7967. - Resolution: Implemented master: fae83c0b0c0a711d5d0f9de0b95fb6b703c4f912 > Deprecate Hadoop specific Flink configuration options > - > > Key: FLINK-7967 > URL: https://issues.apache.org/jira/browse/FLINK-7967 > Project: Flink > Issue Type: Improvement > Components: Configuration >Reporter: Till Rohrmann >Assignee: mingleizhang >Priority: Trivial > Fix For: 1.5.0 > > > I think we should deprecate the hadoop specific configuration options from > Flink and encourage people to use instead the environment variable > {{HADOOP_CONF_DIR}} to configure the Hadoop configuration directory. This > includes: > {code} > fs.hdfs.hdfsdefault > fs.hdfs.hdfssite > fs.hdfs.hadoopconf > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8105) Removed unnecessary null check
[ https://issues.apache.org/jira/browse/FLINK-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8105. - Resolution: Implemented master: 3561222c5d6c7cee79f8c5872f32227632135c48 > Removed unnecessary null check > -- > > Key: FLINK-8105 > URL: https://issues.apache.org/jira/browse/FLINK-8105 > Project: Flink > Issue Type: Improvement > Components: Checkstyle >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 >Assignee: Hai Zhou UTC+8 >Priority: Minor > Fix For: 1.5.0 > > > eg. > {code:java} > if (value != null && value instanceof String) > {code} > null instanceof String returns false hence replaced the check with > {code:java} > if (value instanceof String) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
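The simplification in FLINK-8105 rests on a language guarantee: `instanceof` evaluates to false for a null reference, so the preceding null check adds nothing. A minimal illustration (class and method names are ours, not Flink's):

```java
public class InstanceofCheck {
    // The JLS defines "null instanceof T" as false for every reference
    // type T, so the extra null test in
    // "value != null && value instanceof String" is redundant.
    public static boolean isString(Object value) {
        return value instanceof String;
    }
}
```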
[jira] [Updated] (FLINK-7967) Deprecate Hadoop specific Flink configuration options
[ https://issues.apache.org/jira/browse/FLINK-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7967: -- Fix Version/s: 1.5.0 > Deprecate Hadoop specific Flink configuration options > - > > Key: FLINK-7967 > URL: https://issues.apache.org/jira/browse/FLINK-7967 > Project: Flink > Issue Type: Improvement > Components: Configuration >Reporter: Till Rohrmann >Assignee: mingleizhang >Priority: Trivial > Fix For: 1.5.0 > > > I think we should deprecate the hadoop specific configuration options from > Flink and encourage people to use instead the environment variable > {{HADOOP_CONF_DIR}} to configure the Hadoop configuration directory. This > includes: > {code} > fs.hdfs.hdfsdefault > fs.hdfs.hdfssite > fs.hdfs.hadoopconf > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8126: -- Fix Version/s: 1.4.0 > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.4.0, 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8126. - Resolution: Fixed 1.4: 4eae418b410c928b8e4b7893c1f5b9c48a5e3228 > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.4.0, 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reopened FLINK-8126: --- > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7642) Upgrade maven surefire plugin to 2.19.1
[ https://issues.apache.org/jira/browse/FLINK-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267037#comment-16267037 ] Greg Hogan commented on FLINK-7642: --- Please stop adding and removing a blank line from the description every few days. Does 2.20.1 resolve SUREFIRE-1255? From the parent {{pom.xml}}: {noformat} 2.18.1 {noformat} > Upgrade maven surefire plugin to 2.19.1 > --- > > Key: FLINK-7642 > URL: https://issues.apache.org/jira/browse/FLINK-7642 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu > > Surefire 2.19 release introduced more useful test filters which would let us > run a subset of the test. > This issue is for upgrading maven surefire plugin to 2.19.1 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8036) Consider using gradle to build Flink
[ https://issues.apache.org/jira/browse/FLINK-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261371#comment-16261371 ] Greg Hogan commented on FLINK-8036: --- Are many developers running tests from the command line or using an IDE for incremental compilation and selectively running tests? > Consider using gradle to build Flink > > > Key: FLINK-8036 > URL: https://issues.apache.org/jira/browse/FLINK-8036 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu > > Here is summary from Lukasz over this thread > (http://search-hadoop.com/m/Beam/gfKHFVh4NM151XIu1?subj=Re+DISCUSS+Move+away+from+Apache+Maven+as+build+tool) > w.r.t. performance boost from using gradle: > Maven performs parallelization at the module level, an entire module needs > to complete before any dependent modules can start, this means running all > the checks like findbugs, checkstyle, tests need to finish. Gradle has task > level parallelism between subprojects which means that as soon as the > compile and shade steps are done for a project, and dependent subprojects > can typically start. This means that we get increased parallelism due to > not needing to wait for findbugs, checkstyle, tests to run. I typically see > ~20 tasks (at peak) running on my desktop in parallel. > Flink should consider using gradle - on Linux with SSD, a clean build takes > an hour. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8126: -- Component/s: Build System > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8126) Update and fix checkstyle
Greg Hogan created FLINK-8126: - Summary: Update and fix checkstyle Key: FLINK-8126 URL: https://issues.apache.org/jira/browse/FLINK-8126 Project: Flink Issue Type: Bug Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Fix For: 1.5.0 Our current checkstyle configuration (checkstyle version 6.19) is missing some ImportOrder and variable naming errors which are detected in 1) IntelliJ using the same checkstyle version and 2) with the maven-checkstyle-plugin with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7777) Bump japicmp to 0.11.0
[ https://issues.apache.org/jira/browse/FLINK-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255935#comment-16255935 ] Greg Hogan commented on FLINK-7777: --- Running japicmp v0.11.0 against Flink API v1.3.2 I get only the following error: {{Breaking the build because there is at least one incompatibility: org.apache.flink.core.fs.FileSystem.getKind():METHOD_REMOVED}}, which is already filed as a blocking bug in FLINK-8083. > Bump japicmp to 0.11.0 > -- > > Key: FLINK-7777 > URL: https://issues.apache.org/jira/browse/FLINK-7777 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.3.2 >Reporter: Hai Zhou UTC+8 >Assignee: Hai Zhou UTC+8 >Priority: Minor > Fix For: 1.5.0 > > > Currently, Flink uses japicmp-maven-plugin version 0.7.0; I'm getting > these warnings from the maven plugin during a *mvn clean verify*: > {code:java} > [INFO] Written file '.../target/japicmp/japicmp.diff'. > [INFO] Written file '.../target/japicmp/japicmp.xml'. > [INFO] Written file '.../target/japicmp/japicmp.html'. > Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property > 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not > recognized. > Compiler warnings: > WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property > 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.' > Warning: org.apache.xerces.parsers.SAXParser: Feature > 'http://javax.xml.XMLConstants/feature/secure-processing' is not recognized. > Warning: org.apache.xerces.parsers.SAXParser: Property > 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized. > Warning: org.apache.xerces.parsers.SAXParser: Property > 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not > recognized. > {code} > japicmp fixed this in version 0.7.1: _Excluded xerces from maven-reporting > dependency in order to prevent warnings from SAXParserImpl. 
_ > The current stable version is 0.11.0, we can consider upgrading to this > version. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8041) Recommended Improvements for Gelly's Connected Components Algorithm
[ https://issues.apache.org/jira/browse/FLINK-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253784#comment-16253784 ] Greg Hogan commented on FLINK-8041: --- Currently the vertex value can be any implementation of {{Comparable}}, not just {{Long}}. I can see the benefit to assigning vertices a label as with {{DataSetUtils#zipWithUniqueId}}. There looks to be a cost or limit on scalability with the second proposal, but it would be great to start a discussion on a PR. > Recommended Improvements for Gelly's Connected Components Algorithm > --- > > Key: FLINK-8041 > URL: https://issues.apache.org/jira/browse/FLINK-8041 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.3.2 > Environment: Linux, IntelliJ IDEA >Reporter: Christos Hadjinikolis >Priority: Minor > Fix For: 1.4.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > At the moment, the ConnectedComponents algorithm that comes with Flink's > native Graph API (Gelly) has two issues: > 1. It relies on the user to provide correct values in the vertices > DataSet. Based on how the algorithm works, these values must be of type Long > and be unique for every vertex. If the user provides the same values for > every vertex (e.g. 1) the algorithm still works but as those values are used > for the identification of the different connected components, one will end up > with a single connected component and will have no clue as to why this > happened. This can be easily fixed in two ways: either by checking that the > values that appear alongside vertex-ids are unique and informing the user if > not, or by generating those values for every vertex before the algorithm is > run. I have a running implementation of the second way and I really think it > is an appropriate solution to this problem. > 2. 
Once the connected components are identified, one has to apply additional > transformations and actions to find out which is the biggest or the order of > the connected components in terms of their size. Alternatively, the algorithm > can be implemented so that numerical ids that are given to each component > reflect their ranking when ordered based on size, e.g. connected component 1 > will be the biggest, connected component 2 should be the second biggest and > so on. I have also solved this and I think it would make a nice addition. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
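The size-ranked relabeling proposed in item 2 can be sketched with plain Java collections, independent of Gelly; all names here are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ComponentRanking {

    // Relabels raw component ids so that 1 marks the largest component,
    // 2 the second largest, and so on. The input maps vertex id to the
    // raw component label produced by a connected-components run.
    public static Map<Long, Long> rankBySize(Map<Long, Long> vertexToComponent) {
        // Count how many vertices carry each raw label.
        Map<Long, Long> sizes = vertexToComponent.values().stream()
                .collect(Collectors.groupingBy(c -> c, Collectors.counting()));

        // Order labels by decreasing size and assign ranks 1, 2, ...
        List<Long> ordered = sizes.entrySet().stream()
                .sorted(Map.Entry.<Long, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        Map<Long, Long> rank = new HashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            rank.put(ordered.get(i), (long) (i + 1));
        }

        // Rewrite each vertex's label to its component's ranked id.
        Map<Long, Long> result = new HashMap<>();
        vertexToComponent.forEach((v, c) -> result.put(v, rank.get(c)));
        return result;
    }
}
```

In a distributed setting the same plan is a group-by-label count, a sort on the counts, and a join back onto the vertex set, which is where the scalability cost mentioned in the comment comes from.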
[jira] [Commented] (FLINK-8048) RAT check complaint
[ https://issues.apache.org/jira/browse/FLINK-8048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253683#comment-16253683 ] Greg Hogan commented on FLINK-8048: --- [~tedyu] {{mvn rat:check}} is from a different plugin and {{mvn apache-rat:check}} matches Flink's usage. > RAT check complaint > --- > > Key: FLINK-8048 > URL: https://issues.apache.org/jira/browse/FLINK-8048 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu > > Running {{mvn rat:check}} gives warning about the following files: > test-infra/end-to-end-test/test-data/words > .editorconfig > .github/PULL_REQUEST_TEMPLATE.md > .github/CONTRIBUTING.md -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7919) Join with Solution Set fails with NPE if Solution Set has no entry
[ https://issues.apache.org/jira/browse/FLINK-7919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246487#comment-16246487 ] Greg Hogan commented on FLINK-7919: --- [~fhueske] would an alternative be to use inner join semantics (dropping the unmatched element) and potentially support outer joins in the future? > Join with Solution Set fails with NPE if Solution Set has no entry > -- > > Key: FLINK-7919 > URL: https://issues.apache.org/jira/browse/FLINK-7919 > Project: Flink > Issue Type: Bug > Components: DataSet API, Local Runtime >Affects Versions: 1.4.0, 1.3.2 >Reporter: Fabian Hueske > > A job with a delta iteration fails hard with an NPE in the solution set join, > if the solution set has no entry for the join key of the probe side. > The following program reproduces the problem: > {code} > DataSet<Tuple2<Long, Integer>> values = env.fromElements( > Tuple2.of(1L, 1), Tuple2.of(2L, 1), Tuple2.of(3L, 1)); > DeltaIteration<Tuple2<Long, Integer>, Tuple2<Long, Integer>> di = values > .iterateDelta(values, 5, 0); > DataSet<Tuple2<Long, Integer>> loop = di.getWorkset() > .map(new MapFunction<Tuple2<Long, Integer>, Tuple2<Long, Integer>>() { > @Override > public Tuple2<Long, Integer> map(Tuple2<Long, Integer> value) throws > Exception { > // modifying the key to join on a non existing solution set key > return Tuple2.of(value.f0 + 1, 1); > } > }) > .join(di.getSolutionSet()).where(0).equalTo(0) > .with(new JoinFunction<Tuple2<Long, Integer>, Tuple2<Long, Integer>, > Tuple2<Long, Integer>>() { > @Override > public Tuple2<Long, Integer> join( > Tuple2<Long, Integer> first, > Tuple2<Long, Integer> second) throws Exception { > > return Tuple2.of(first.f0, first.f1 + second.f1); > } > }); > DataSet<Tuple2<Long, Integer>> result = di.closeWith(loop, loop); > result.print(); > {code} > It doesn't matter whether the solution set is managed or not. > The problem occurs because the solution set hash table prober returns a > {{null}} value if the solution set does not contain a value for the probe > side key. > The join operator does not check if the return value is {{null}} or not but > immediately tries to create a copy using a {{TypeSerializer}}. This copy > fails with an NPE. 
> I propose to check for {{null}} and call the join function with {{null}} on > the solution set side. This gives OUTER JOIN semantics for the join. > Since the code was previously failing with an NPE, it is safe to forward the > {{null}} into the {{JoinFunction}}. > However, users must be aware that the solution set value may be {{null}} and > we need to update the documentation (JavaDocs + website) to describe the > behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
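Under the proposed semantics, user join functions would have to guard the solution-set side themselves. A tiny sketch of such a guard, with illustrative types rather than the actual Flink interfaces:

```java
public class NullTolerantJoin {
    // Merges the probe-side count with the solution-set count. The
    // solution-set value may be null when the probe key has no matching
    // entry; under the proposed OUTER JOIN semantics that case must be
    // handled in user code instead of surfacing as an NPE in the runtime.
    public static long mergeCounts(long probeCount, Long solutionCount) {
        return probeCount + (solutionCount == null ? 0L : solutionCount);
    }
}
```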
[jira] [Commented] (FLINK-2973) Add flink-benchmark with compliant licenses again
[ https://issues.apache.org/jira/browse/FLINK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246432#comment-16246432 ] Greg Hogan commented on FLINK-2973: --- [~fhueske] it looks to be GPLv2 (with classpath exception): http://hg.openjdk.java.net/jdk9/jdk9/file/a08cbfc0e4ec/LICENSE > Add flink-benchmark with compliant licenses again > - > > Key: FLINK-2973 > URL: https://issues.apache.org/jira/browse/FLINK-2973 > Project: Flink > Issue Type: Task > Components: Build System >Affects Versions: 1.0.0 >Reporter: Fabian Hueske >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.0.0 > > > We recently created the Maven module {{flink-benchmark}} for micro-benchmarks > and ported most of the existing micro-benchmarks to the Java benchmarking > framework JMH. However, JMH is part of OpenJDK and under GPL license which is > not compatible with the AL2. > Consequently, we need to remove this dependency and either revert the porting > commits or port the benchmarks to another benchmarking framework. An > alternative could be [Google's Caliper|https://github.com/google/caliper] > library which is under AL2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7679) Upgrade maven enforcer plugin to 3.0.0-M1
[ https://issues.apache.org/jira/browse/FLINK-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245711#comment-16245711 ] Greg Hogan commented on FLINK-7679: --- So it looks like Scala 2.12+ is required for Java 9. > Upgrade maven enforcer plugin to 3.0.0-M1 > - > > Key: FLINK-7679 > URL: https://issues.apache.org/jira/browse/FLINK-7679 > Project: Flink > Issue Type: Sub-task > Components: Build System >Reporter: Ted Yu >Assignee: Hai Zhou UTC+8 > > I got the following build error against Java 9: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (enforce-maven) > on project flink-parent: Execution enforce-maven of goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce: > java.lang.ExceptionInInitializerError: null > [ERROR] - > [ERROR] realm =plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.4.1 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/home/hbase/.m2/repository/org/apache/maven/plugins/maven-enforcer-plugin/1.4.1/maven-enforcer-plugin-1.4.1.jar > {code} > Upgrading maven enforcer plugin to 3.0.0-M1 would get over the above error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLINK-8033) Build Flink with JDK 9
[ https://issues.apache.org/jira/browse/FLINK-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244288#comment-16244288 ] Greg Hogan edited comment on FLINK-8033 at 11/8/17 4:54 PM: With Java 8 we had a flink-java8 package which added support for lambdas (etc.?). Are there similar new features in Java 9 or are we simply looking to support compiling and running Flink with Java 9? was (Author: greghogan): Does "support" imply more than compiling and running Flink with Java 9? > Build Flink with JDK 9 > -- > > Key: FLINK-8033 > URL: https://issues.apache.org/jira/browse/FLINK-8033 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 > Fix For: 1.5.0 > > > This is a JIRA to track all issues found to make Flink compatible with > Java 9. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7679) Upgrade maven enforcer plugin to 3.0.0-M1
[ https://issues.apache.org/jira/browse/FLINK-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244295#comment-16244295 ] Greg Hogan commented on FLINK-7679: --- [~yew1eb] do Flink tests pass with Java 8 and maven enforcer plugin 3.0.0-M1? > Upgrade maven enforcer plugin to 3.0.0-M1 > - > > Key: FLINK-7679 > URL: https://issues.apache.org/jira/browse/FLINK-7679 > Project: Flink > Issue Type: Sub-task > Components: Build System >Reporter: Ted Yu >Assignee: Hai Zhou UTC+8 > > I got the following build error against Java 9: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (enforce-maven) > on project flink-parent: Execution enforce-maven of goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce: > java.lang.ExceptionInInitializerError: null > [ERROR] - > [ERROR] realm =plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.4.1 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/home/hbase/.m2/repository/org/apache/maven/plugins/maven-enforcer-plugin/1.4.1/maven-enforcer-plugin-1.4.1.jar > {code} > Upgrading maven enforcer plugin to 3.0.0-M1 would get over the above error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8033) Build Flink with JDK 9
[ https://issues.apache.org/jira/browse/FLINK-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244288#comment-16244288 ] Greg Hogan commented on FLINK-8033: --- Does "support" imply more than compiling and running Flink with Java 9? > Build Flink with JDK 9 > -- > > Key: FLINK-8033 > URL: https://issues.apache.org/jira/browse/FLINK-8033 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 > Fix For: 1.5.0 > > > This is a JIRA to track all issues found to support Flink on Java 9 in > the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8020) Deadlock found in Flink Streaming job
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244126#comment-16244126 ] Greg Hogan commented on FLINK-8020: --- [~whjiang] can this be closed as "not a problem"? > Deadlock found in Flink Streaming job > - > > Key: FLINK-8020 > URL: https://issues.apache.org/jira/browse/FLINK-8020 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Streaming, Streaming Connectors >Affects Versions: 1.3.2 > Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode >Reporter: Weihua Jiang >Priority: Blocker > Attachments: jstack67976-2.log > > > Our streaming job run into trouble in these days after a long time smooth > running. One issue we found is > [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this > one. > After analyzing the jstack, we believe we found a DEAD LOCK in flink: > 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" > hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. > 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: > hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock > 0x0007b6aa1788. > This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7961) Docker-Flink with Docker Swarm doesn't work when machines are in different clouds
[ https://issues.apache.org/jira/browse/FLINK-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7961: -- Description: Task Managers can't find Job Manager by name. Maybe some additional Docker configuration is needed? I am running the standard setup and create-docker-swarm-service.sh script from the Docker Flink project: https://github.com/apache/flink/blob/master/flink-contrib/docker-flink/create-docker-swarm-service.sh This is the log from one of the Task Manager's containers: {noformat} Starting Task Manager config file: jobmanager.rpc.address: flink-jobmanager jobmanager.rpc.port: 6123 jobmanager.heap.mb: 1024 taskmanager.heap.mb: 1024 taskmanager.numberOfTaskSlots: 2 taskmanager.memory.preallocate: false parallelism.default: 1 jobmanager.web.port: 8081 blob.server.port: 6124 query.server.port: 6125 Starting taskmanager as a console application on host c42a6093f7bb. 2017-11-01 11:20:51,459 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC) 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 1024 MiBytes 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /docker-java-home/jre 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options: 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms1024M 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx1024M 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments: 2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir 2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - /opt/flink/conf 2017-11-01 11:20:51,527 INFO 
org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar::: 2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - 2017-11-01 11:20:51,528 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT] 2017-11-01 11:20:51,532 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576 2017-11-01 11:20:51,548 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: jobmanager.rpc.address, flink-jobmanager 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: jobmanager.rpc.port, 6123 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: jobmanager.heap.mb, 1024 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: taskmanager.heap.mb, 1024 2017-
[jira] [Commented] (FLINK-7947) Let ParameterTool return a dedicated GlobalJobParameters object
[ https://issues.apache.org/jira/browse/FLINK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236479#comment-16236479 ] Greg Hogan commented on FLINK-7947: --- Do these changes necessitate modifying the {{@Public}} API? > Let ParameterTool return a dedicated GlobalJobParameters object > --- > > Key: FLINK-7947 > URL: https://issues.apache.org/jira/browse/FLINK-7947 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.4.0 >Reporter: Till Rohrmann >Assignee: Bowen Li >Priority: Major > > The {{ParameterTool}} directly implements the {{GlobalJobParameters}} > interface. Additionally it has grown over time to not only store the > configuration parameters but also to record which parameters have been > requested and what default value was set. This information is irrelevant on > the server side when setting a {{GlobalJobParameters}} object via > {{ExecutionConfig#setGlobalJobParameters}}. > Since we don't separate the {{ParameterTool}} logic and the actual data view, > users ran into problems when reusing the same {{ParameterTool}} to start > multiple jobs concurrently (see FLINK-7943). I think it would be a much > clearer separation of concerns if we would actually split the > {{GlobalJobParameters}} from the {{ParameterTool}}. > Furthermore, we should think about whether {{ParameterTool#get}} should have > side effects or not as it does right now. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7921) Flink downloads link redirect to spark downloads page
[ https://issues.apache.org/jira/browse/FLINK-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234714#comment-16234714 ] Greg Hogan commented on FLINK-7921: --- [~anil.kumar] are you still seeing this issue in your browser? > Flink downloads link redirect to spark downloads page > - > > Key: FLINK-7921 > URL: https://issues.apache.org/jira/browse/FLINK-7921 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.2 >Reporter: Anil Kumar >Priority: Major > > On the Quickstart > [page|https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/setup_quickstart.html] > of flink, there's a download page link under *Download and Unpack tab* which > is redirecting to spark downloads page instead of flink. > Issue : redirection is happening to port 80(http) instead of port 443(https). > Once the download link is changed to https, it works fine. > Current download link : http://flink.apache.org/downloads.html > Required link : https://flink.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-2973) Add flink-benchmark with compliant licenses again
[ https://issues.apache.org/jira/browse/FLINK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193521#comment-16193521 ] Greg Hogan commented on FLINK-2973: --- [~fhueske] [~rmetzger] can we not include this as an optional (or unlisted) module in the same manner as {{flink-connector-kinesis}}? Both are restricted by dependence on a Category X license (GPL and ASL, respectively). > Add flink-benchmark with compliant licenses again > - > > Key: FLINK-2973 > URL: https://issues.apache.org/jira/browse/FLINK-2973 > Project: Flink > Issue Type: Task > Components: Build System >Affects Versions: 1.0.0 >Reporter: Fabian Hueske >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.0.0 > > > We recently created the Maven module {{flink-benchmark}} for micro-benchmarks > and ported most of the existing micro-benchmarks to the Java benchmarking > framework JMH. However, JMH is part of OpenJDK and under GPL license which is > not compatible with the AL2. > Consequently, we need to remove this dependency and either revert the porting > commits or port the benchmarks to another benchmarking framework. An > alternative could be [Google's Caliper|https://github.com/google/caliper] > library which is under AL2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7687) Clarify the master and slaves files are not necessary unless using the cluster start/stop scripts
[ https://issues.apache.org/jira/browse/FLINK-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191504#comment-16191504 ] Greg Hogan commented on FLINK-7687: --- The quoted text is in the section "Standalone Cluster High Availability". > Clarify the master and slaves files are not necessary unless using the > cluster start/stop scripts > - > > Key: FLINK-7687 > URL: https://issues.apache.org/jira/browse/FLINK-7687 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.2 >Reporter: Elias Levy >Priority: Minor > > It would be helpful if the documentation was clearer on the fact that the > master/slaves config files are not needed when configured in > high-availability mode unless you are using the provided scripts to start and > shutdown the cluster over SSH. If you are using some other mechanism to > manage Flink instances (configuration management tools such as Chef or > Ansible, or container management frameworks like Docker Compose or > Kubernetes), these files are unnecessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7687) Clarify the master and slaves files are not necessary unless using the cluster start/stop scripts
[ https://issues.apache.org/jira/browse/FLINK-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189845#comment-16189845 ] Greg Hogan commented on FLINK-7687: --- [~elevy] is there a specific page or pages where this is unclear? > Clarify the master and slaves files are not necessary unless using the > cluster start/stop scripts > - > > Key: FLINK-7687 > URL: https://issues.apache.org/jira/browse/FLINK-7687 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.2 >Reporter: Elias Levy >Priority: Minor > > It would be helpful if the documentation was clearer on the fact that the > master/slaves config files are not needed when configured in > high-availability mode unless you are using the provided scripts to start and > shutdown the cluster over SSH. If you are using some other mechanism to > manage Flink instances (configuration management tools such as Chef or > Ansible, or container management frameworks like Docker Compose or > Kubernetes), these files are unnecessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7631) Ineffective check in PageRank#open()
[ https://issues.apache.org/jira/browse/FLINK-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189835#comment-16189835 ] Greg Hogan commented on FLINK-7631: --- When `vertexCount` is zero the map function will not be called. > Ineffective check in PageRank#open() > > > Key: FLINK-7631 > URL: https://issues.apache.org/jira/browse/FLINK-7631 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > From > flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/linkanalysis/PageRank.java > : > {code} > this.vertexCount = vertexCountIterator.hasNext() ? > vertexCountIterator.next().getValue() : 0; > this.uniformlyDistributedScore = ((1 - dampingFactor) + dampingFactor * > sumOfSinks) / this.vertexCount; > {code} > The check for vertexCountIterator.hasNext() should enclose the assignments to > both this.vertexCount and this.uniformlyDistributedScore > Otherwise there may be divide by zero error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7631) Ineffective check in PageRank#open()
[ https://issues.apache.org/jira/browse/FLINK-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169958#comment-16169958 ] Greg Hogan commented on FLINK-7631: --- Divide by zero is only an error for fixed-point arithmetic and here the numerator is a double. > Ineffective check in PageRank#open() > > > Key: FLINK-7631 > URL: https://issues.apache.org/jira/browse/FLINK-7631 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > From > flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/linkanalysis/PageRank.java > : > {code} > this.vertexCount = vertexCountIterator.hasNext() ? > vertexCountIterator.next().getValue() : 0; > this.uniformlyDistributedScore = ((1 - dampingFactor) + dampingFactor * > sumOfSinks) / this.vertexCount; > {code} > The check for vertexCountIterator.hasNext() should enclose the assignments to > both this.vertexCount and this.uniformlyDistributedScore > Otherwise there may be divide by zero error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
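The behavior behind Greg's comment can be checked directly: with a `double` numerator, Java follows IEEE 754, so division by zero produces Infinity or NaN rather than throwing. A minimal demonstration (the helper below is illustrative, not the Flink code):

```java
public class DoubleDivisionDemo {
    // Mirrors the shape of the PageRank expression: a double numerator
    // divided by a (possibly zero) vertex count.
    static double uniformScore(double numerator, long vertexCount) {
        // vertexCount is promoted to double, so this is IEEE 754
        // floating-point division: it yields Infinity (or NaN for a
        // 0.0 numerator), never an ArithmeticException.
        return numerator / vertexCount;
    }

    public static void main(String[] args) {
        System.out.println(uniformScore(0.85, 0));  // Infinity
        System.out.println(uniformScore(0.0, 0));   // NaN
        // By contrast, integer division by zero throws:
        // int x = 1 / 0;  // ArithmeticException
    }
}
```

This is why the empty-graph case is benign here: the Infinity/NaN score is never used because the map function is not called when `vertexCount` is zero.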
[jira] [Closed] (FLINK-7402) Ineffective null check in NettyMessage#write()
[ https://issues.apache.org/jira/browse/FLINK-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7402. - Resolution: Implemented master: 312e0853483d1aa5ca182827890c570c44ca6a37 > Ineffective null check in NettyMessage#write() > -- > > Key: FLINK-7402 > URL: https://issues.apache.org/jira/browse/FLINK-7402 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ted Yu >Priority: Minor > Fix For: 1.4.0 > > > Here is the null check in finally block: > {code} > finally { > if (buffer != null) { > buffer.recycle(); > } > {code} > But buffer has been dereferenced in the try block without guard. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7402) Ineffective null check in NettyMessage#write()
[ https://issues.apache.org/jira/browse/FLINK-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7402: -- Fix Version/s: 1.4.0 > Ineffective null check in NettyMessage#write() > -- > > Key: FLINK-7402 > URL: https://issues.apache.org/jira/browse/FLINK-7402 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ted Yu >Priority: Minor > Fix For: 1.4.0 > > > Here is the null check in finally block: > {code} > finally { > if (buffer != null) { > buffer.recycle(); > } > {code} > But buffer has been dereferenced in the try block without guard. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7273) Gelly tests with empty graphs
[ https://issues.apache.org/jira/browse/FLINK-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7273. - Resolution: Fixed master: 9437a0ffc04318f6a1a2d19c59f2ae6651b26507 > Gelly tests with empty graphs > - > > Key: FLINK-7273 > URL: https://issues.apache.org/jira/browse/FLINK-7273 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > There exist some tests with empty graphs but the `EmptyGraph` in > `AsmTestBase` contained vertices but no edges. Add a new `EmptyGraph` without > vertices and test both empty graphs for each algorithm. > `PageRank` should (optionally?) include zero-degree vertices in the results. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7199) Graph simplification does not set parallelism
[ https://issues.apache.org/jira/browse/FLINK-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7199: -- Fix Version/s: 1.4.0 > Graph simplification does not set parallelism > - > > Key: FLINK-7199 > URL: https://issues.apache.org/jira/browse/FLINK-7199 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.3.1, 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > The {{Simplify}} parameter should accept and set the parallelism when calling > the {{Simplify}} algorithms. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7199) Graph simplification does not set parallelism
[ https://issues.apache.org/jira/browse/FLINK-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7199. - Resolution: Fixed master: 2ac09c084320f1803c623c719dca8b4776c8110f > Graph simplification does not set parallelism > - > > Key: FLINK-7199 > URL: https://issues.apache.org/jira/browse/FLINK-7199 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.3.1, 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > The {{Simplify}} parameter should accept and set the parallelism when calling > the {{Simplify}} algorithms. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7447) Hope add more committer information to "Community & Project Info" page.
[ https://issues.apache.org/jira/browse/FLINK-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167976#comment-16167976 ] Greg Hogan commented on FLINK-7447: --- The timezone is of little benefit for real-time communication without knowing a developer's availability regarding schedule, holidays, work tasks, etc. If timezones imply an obligation or expectation that committers are reachable at certain times then we should continue to simply provide committer emails (@apache.org) and direct users and developers to the mailing lists. > Hope add more committer information to "Community & Project Info" page. > --- > > Key: FLINK-7447 > URL: https://issues.apache.org/jira/browse/FLINK-7447 > Project: Flink > Issue Type: Wish > Components: Project Website >Reporter: Hai Zhou > > I wish add the "organization" and "time zone" information to committer > introduction, while using the mail instead of Apache ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7482) StringWriter to support compression
[ https://issues.apache.org/jira/browse/FLINK-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7482. - Resolution: Not A Bug Hi [~felixcheung]. This is a great question for the dev or user [mailing lists|https://flink.apache.org/community.html#mailing-lists]. > StringWriter to support compression > --- > > Key: FLINK-7482 > URL: https://issues.apache.org/jira/browse/FLINK-7482 > Project: Flink > Issue Type: Bug > Components: filesystem-connector >Affects Versions: 1.3.2 >Reporter: Felix Cheung > > Is it possible to have StringWriter support compression like > AvroKeyValueSinkWriter or SequenceFileWriter? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7447) Hope add more committer information to "Community & Project Info" page.
[ https://issues.apache.org/jira/browse/FLINK-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127565#comment-16127565 ] Greg Hogan commented on FLINK-7447: --- I can see that listing organizations may be beneficial but this looks to run counter to the principle that [individuals compose the ASF|http://www.apache.org/foundation/how-it-works.html#hats]. data Artisans lists their [people|https://data-artisans.com/about] and other affiliations are often evident from the mailing list. I'm not seeing much benefit to listing time zones, especially given the volunteer nature of the project. > Hope add more committer information to "Community & Project Info" page. > --- > > Key: FLINK-7447 > URL: https://issues.apache.org/jira/browse/FLINK-7447 > Project: Flink > Issue Type: Wish > Components: Project Website >Reporter: Hai Zhou > > I wish add the "organization" and "time zone" information to committer > introduction, while using the mail instead of Apache ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7447) Hope add more committer information to "Community & Project Info" page.
[ https://issues.apache.org/jira/browse/FLINK-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7447: -- Component/s: (was: Documentation) Project Website > Hope add more committer information to "Community & Project Info" page. > --- > > Key: FLINK-7447 > URL: https://issues.apache.org/jira/browse/FLINK-7447 > Project: Flink > Issue Type: Wish > Components: Project Website >Reporter: Hai Zhou > > I wish add the "organization" and "time zone" information to committer > introduction, while using the mail instead of Apache ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7296) Validate commit messages in git pre-receive hook
[ https://issues.apache.org/jira/browse/FLINK-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105217#comment-16105217 ] Greg Hogan commented on FLINK-7296: --- [~uce] I'm trying to think through where this should run. On the client side committers may not have a (pre-push?) hook configured on a repo so changes could slip through. GitHub doesn't allow server-side hooks (instead sending out webhook notifications). Apache Infra may allow a server-side hook but would require a ticket to update (and, if we ever switched to GitBox, it would require committers to continue pushing to the Apache-hosted repo). > Validate commit messages in git pre-receive hook > > > Key: FLINK-7296 > URL: https://issues.apache.org/jira/browse/FLINK-7296 > Project: Flink > Issue Type: Improvement >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Would like to investigate a pre-receive (server-side) hook analyzing the > commit message incoming revisions on the {{master}} branch for the standard > JIRA format ({{\[FLINK-\] \[component\] ...}} or {{\[hotfix\] > \[component\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7296) Validate commit messages in git pre-receive hook
[ https://issues.apache.org/jira/browse/FLINK-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7296: -- Description: Would like to investigate a pre-receive (server-side) hook analyzing the commit message incoming revisions on the {{master}} branch for the standard JIRA format ({{\[FLINK-\] \[component\] ...}} or {{\[hotfix\] \[component\] ...}}). (was: Would like to investigate a pre-receive (server-side) hook analyzing the commit message incoming revisions on the {{master}} branch for the standard JIRA format ({{\[FLINK-\] \[module\] ...}}).) > Validate commit messages in git pre-receive hook > > > Key: FLINK-7296 > URL: https://issues.apache.org/jira/browse/FLINK-7296 > Project: Flink > Issue Type: Improvement >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Would like to investigate a pre-receive (server-side) hook analyzing the > commit message incoming revisions on the {{master}} branch for the standard > JIRA format ({{\[FLINK-\] \[component\] ...}} or {{\[hotfix\] > \[component\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7296) Validate commit messages in git pre-receive hook
[ https://issues.apache.org/jira/browse/FLINK-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105161#comment-16105161 ] Greg Hogan commented on FLINK-7296: --- Yes, I will update the description. > Validate commit messages in git pre-receive hook > > > Key: FLINK-7296 > URL: https://issues.apache.org/jira/browse/FLINK-7296 > Project: Flink > Issue Type: Improvement >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Would like to investigate a pre-receive (server-side) hook analyzing the > commit message incoming revisions on the {{master}} branch for the standard > JIRA format ({{\[FLINK-\] \[module\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7296) Validate commit messages in git pre-receive hook
Greg Hogan created FLINK-7296: - Summary: Validate commit messages in git pre-receive hook Key: FLINK-7296 URL: https://issues.apache.org/jira/browse/FLINK-7296 Project: Flink Issue Type: Improvement Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Would like to investigate a pre-receive (server-side) hook analyzing the commit message incoming revisions on the {{master}} branch for the standard JIRA format ({{\[FLINK-\] \[module\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
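The format check such a hook would perform reduces to a regular expression. A hedged sketch of the validation logic in Java — the exact pattern and class name are assumptions for illustration, not the actual hook:

```java
import java.util.regex.Pattern;

public class CommitMessageCheck {
    // Assumed format, per the issue description: "[FLINK-<nnnn>] [component] ..."
    // or "[hotfix] [component] ...". The precise pattern the hook would
    // enforce is an assumption.
    static final Pattern VALID =
            Pattern.compile("^\\[(FLINK-\\d+|hotfix)\\] \\[[^\\]]+\\] .+");

    static boolean isValid(String message) {
        return VALID.matcher(message).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[FLINK-7296] [build] Add pre-receive hook"));  // true
        System.out.println(isValid("[hotfix] [docs] Fix typo"));                   // true
        System.out.println(isValid("fixed stuff"));                                // false
    }
}
```

A server-side pre-receive hook would run this check over the first line of each incoming commit message on {{master}} and reject the push on a mismatch.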
[jira] [Commented] (FLINK-7220) Update RocksDB dependency to 5.5.5
[ https://issues.apache.org/jira/browse/FLINK-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104898#comment-16104898 ] Greg Hogan commented on FLINK-7220: --- Ah, just a small thing. Unfortunate that we can't fix this but good to know what happened. > Update RocksDB dependency to 5.5.5 > -- > > Key: FLINK-7220 > URL: https://issues.apache.org/jira/browse/FLINK-7220 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Affects Versions: 1.4.0, 1.3.2 >Reporter: Stefan Richter >Assignee: Stefan Richter > Fix For: 1.4.0 > > > The latest release of RocksDB (5.5.5) fixes the issues from previous versions > (slow merge performance, segfaults) in connection with Flink and seems stable > for us to use. We can move away from our custom FRocksDB build, back to the > latest release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7277) Weighted PageRank
Greg Hogan created FLINK-7277: - Summary: Weighted PageRank Key: FLINK-7277 URL: https://issues.apache.org/jira/browse/FLINK-7277 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Add a weighted PageRank algorithm to complement the existing unweighted implementation. Edge values store a `double` weight value which is summed per vertex in place of the vertex degree. The vertex score is joined as the fraction of vertex weight rather than dividing by the vertex degree. The examples `Runner` must now read and generate weighted graphs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
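The scoring change described above — sending each neighbor the fraction of the vertex score given by edge weight over total out-weight, instead of score over out-degree — can be sketched with plain collections. This is an illustration of the rule only, not the Gelly DataSet implementation; all names are hypothetical:

```java
import java.util.*;

// One weighted-PageRank distribution step: each vertex sends
// score * (edgeWeight / totalOutWeight) along each out-edge, replacing the
// unweighted rule of score / outDegree. Plain-Java illustration.
public class WeightedDistribution {
    // edges: [source, target, weight] triples; scores: current score per vertex
    public static Map<String, Double> distribute(
            List<Object[]> edges, Map<String, Double> scores) {
        // Sum outgoing edge weights per source vertex (replaces the out-degree).
        Map<String, Double> outWeight = new HashMap<>();
        for (Object[] e : edges) {
            outWeight.merge((String) e[0], (Double) e[2], Double::sum);
        }
        // Each target receives the source's score scaled by its weight fraction.
        Map<String, Double> received = new HashMap<>();
        for (Object[] e : edges) {
            String src = (String) e[0], dst = (String) e[1];
            double fraction = (Double) e[2] / outWeight.get(src);
            received.merge(dst, scores.get(src) * fraction, Double::sum);
        }
        return received;
    }

    public static void main(String[] args) {
        List<Object[]> edges = Arrays.asList(
            new Object[]{"A", "B", 3.0}, new Object[]{"A", "C", 1.0});
        Map<String, Double> scores = new HashMap<>();
        scores.put("A", 1.0);
        System.out.println(distribute(edges, scores));
    }
}
```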
[jira] [Created] (FLINK-7276) Gelly algorithm parameters
Greg Hogan created FLINK-7276: - Summary: Gelly algorithm parameters Key: FLINK-7276 URL: https://issues.apache.org/jira/browse/FLINK-7276 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Similar to the examples drivers, the algorithm configuration fields should be typed to handle `canMergeConfiguration` and `mergeConfiguration` in `GraphAlgorithmWrappingBase` rather than overriding these methods in each algorithm (which has proven brittle). The existing `OptionalBoolean` is one example. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7275) Differentiate between normal and power-user cli options in Gelly examples
Greg Hogan created FLINK-7275: - Summary: Differentiate between normal and power-user cli options in Gelly examples Key: FLINK-7275 URL: https://issues.apache.org/jira/browse/FLINK-7275 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan The current "hack" is to preface "power-user" options with a double underscore (e.g. '__parallelism'), which are then "hidden" by exclusion from the program usage documentation. Change this to instead be explicit in the {{Parameter}} API and provide a cli option to display "power-user" options. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6909) Flink should support Lombok POJO
[ https://issues.apache.org/jira/browse/FLINK-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101779#comment-16101779 ] Greg Hogan commented on FLINK-6909: --- Hi [~mdkamaruzzaman], judging by the extensive tooling provided at the Lombok website it looks like `TypeExtractor` would need to be modified to parse and interpret the Lombok annotations. The cost to do so is added complexity in a vital and already very complex portion of the core Flink architecture. The benefit of automatic POJO generation is available in any IDE, and also when using Lombok via the provided [delombok tool|https://projectlombok.org/features/delombok], which converts annotated classes to plain POJOs and can be integrated into one's project build. > Flink should support Lombok POJO > > > Key: FLINK-6909 > URL: https://issues.apache.org/jira/browse/FLINK-6909 > Project: Flink > Issue Type: Wish > Components: Type Serialization System >Reporter: Md Kamaruzzaman >Priority: Minor > Fix For: 1.2.1 > > > Project Lombok helps greatly to reduce boilerplate Java code. > It seems that Flink does not accept a Lombok POJO as a valid POJO. > e.g. Here is a POJO defined with Lombok: > @Getter > @Setter > @NoArgsConstructor > public class SimplePojo > Using this POJO class to read from a CSV file throws this exception: > Exception in thread "main" java.lang.ClassCastException: > org.apache.flink.api.java.typeutils.GenericTypeInfo cannot be cast to > org.apache.flink.api.java.typeutils.PojoTypeInfo > It would be great if Flink supported Lombok POJOs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
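As the comment suggests, running delombok (or hand-writing the accessors) yields a class Flink's type extraction accepts as a POJO. A minimal sketch of what the delombok'd `SimplePojo` might look like — the field names here are assumptions, since the report omits them:

```java
// Roughly what the Lombok-annotated SimplePojo expands to after delombok:
// a public class with a public no-arg constructor and a getter/setter per
// field, the shape Flink's POJO rules require. Field names are assumed.
public class SimplePojo {
    private String name;
    private int count;

    public SimplePojo() {} // no-arg constructor required by Flink's POJO rules

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }

    public static void main(String[] args) {
        SimplePojo p = new SimplePojo();
        p.setName("example");
        System.out.println(p.getName());
    }
}
```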
[jira] [Created] (FLINK-7273) Gelly tests with empty graphs
Greg Hogan created FLINK-7273: - Summary: Gelly tests with empty graphs Key: FLINK-7273 URL: https://issues.apache.org/jira/browse/FLINK-7273 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Fix For: 1.4.0 There exist some tests with empty graphs, but the `EmptyGraph` in `AsmTestBase` contains vertices but no edges. Add a new `EmptyGraph` without vertices and test both empty graphs for each algorithm. `PageRank` should (optionally?) include zero-degree vertices in the results. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7234) Fix CombineHint documentation
[ https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7234. - Resolution: Fixed master: 4a88f6587fdfadd5749188a76e6b38a3585cd31b release-1.3: d0a9fe013e0dccf7ff329aaedf085e8c1c133ae0 > Fix CombineHint documentation > - > > Key: FLINK-7234 > URL: https://issues.apache.org/jira/browse/FLINK-7234 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.2.2, 1.4.0, 1.3.2 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.4.0, 1.3.2 > > > The {{CombineHint}} > [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html] > applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be > noted for {{DataSet#distinct}}. It is also set with > {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-6648) Transforms for Gelly examples
[ https://issues.apache.org/jira/browse/FLINK-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-6648. - Resolution: Implemented master: 8695a21098c06cd1cd727d24bca51db9971cb146 > Transforms for Gelly examples > - > > Key: FLINK-6648 > URL: https://issues.apache.org/jira/browse/FLINK-6648 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.4.0 > > > A primary objective of the Gelly examples {{Runner}} is to make adding new > inputs and algorithms as simple and powerful as possible. A recent feature > made it possible to translate the key ID of generated graphs to alternative > numeric or string representations. For floating point and {{LongValue}} it is > desirable to translate the key ID of the algorithm results. > Currently a {{Runner}} job consists of an input, an algorithm, and an output. > A {{Transform}} will translate the input {{Graph}} and the algorithm output > {{DataSet}}. The {{Input}} and algorithm {{Driver}} will return an ordered > list of {{Transform}} which will be executed in that order (processed in > reverse order for algorithm output). The {{Transform}} can be configured as > are inputs and drivers. > Example transforms: > - the aforementioned translation of key ID types > - surrogate types (String -> Long or Int) for user data > - FLINK-4481 Maximum results for pairwise algorithms > - FLINK-3625 Graph algorithms to permute graph labels and edges -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7234) Fix CombineHint documentation
[ https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7234: -- Fix Version/s: 1.3.2 1.4.0 > Fix CombineHint documentation > - > > Key: FLINK-7234 > URL: https://issues.apache.org/jira/browse/FLINK-7234 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.2.2, 1.4.0, 1.3.2 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.4.0, 1.3.2 > > > The {{CombineHint}} > [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html] > applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be > noted for {{DataSet#distinct}}. It is also set with > {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7118) Remove hadoop1.x code in HadoopUtils
[ https://issues.apache.org/jira/browse/FLINK-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097409#comment-16097409 ] Greg Hogan commented on FLINK-7118: --- I'd recommend a fix version of 1.4 for this PR since we are only applying bug fixes (and documentation updates) to 1.3. > Remove hadoop1.x code in HadoopUtils > > > Key: FLINK-7118 > URL: https://issues.apache.org/jira/browse/FLINK-7118 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: mingleizhang >Assignee: mingleizhang > Fix For: 1.3.2 > > > Since Flink no longer supports Hadoop 1.x, we should remove this code. The > code below resides in {{org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils}} > > {code:java} > public static JobContext instantiateJobContext(Configuration configuration, > JobID jobId) throws Exception { > try { > Class clazz; > // for Hadoop 1.xx > if(JobContext.class.isInterface()) { > clazz = > Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl", true, > Thread.currentThread().getContextClassLoader()); > } > // for Hadoop 2.xx > else { > clazz = > Class.forName("org.apache.hadoop.mapreduce.JobContext", true, > Thread.currentThread().getContextClassLoader()); > } > Constructor constructor = > clazz.getConstructor(Configuration.class, JobID.class); > JobContext context = (JobContext) > constructor.newInstance(configuration, jobId); > > return context; > } catch(Exception e) { > throw new Exception("Could not create instance of > JobContext."); > } > } > {code} > And > {code:java} > public static TaskAttemptContext > instantiateTaskAttemptContext(Configuration configuration, TaskAttemptID > taskAttemptID) throws Exception { > try { > Class clazz; > // for Hadoop 1.xx > if(JobContext.class.isInterface()) { > clazz = > Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl"); > } > // for Hadoop 2.xx > else { > clazz = > Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext"); > }
Constructor constructor = > clazz.getConstructor(Configuration.class, TaskAttemptID.class); > TaskAttemptContext context = (TaskAttemptContext) > constructor.newInstance(configuration, taskAttemptID); > > return context; > } catch(Exception e) { > throw new Exception("Could not create instance of > TaskAttemptContext."); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
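With the Hadoop 1.x branch removed, each method above reduces to loading a single class name and invoking its two-argument constructor reflectively. A generic sketch of that reflective pattern follows; since Hadoop is not assumed on the classpath here, the usage is demonstrated with a stdlib class, and the helper name is hypothetical:

```java
import java.lang.reflect.Constructor;

// Generic form of the reflective construction HadoopUtils performs: load a
// class by name with the context class loader and invoke a matching
// constructor. After dropping Hadoop 1.x, only one class name per method
// remains, so the if/else branches in the quoted code disappear.
public class ReflectiveFactory {
    public static Object instantiate(String className,
                                     Class<?>[] signature,
                                     Object[] args) throws Exception {
        Class<?> clazz = Class.forName(
            className, true, Thread.currentThread().getContextClassLoader());
        Constructor<?> constructor = clazz.getConstructor(signature);
        return constructor.newInstance(args);
    }

    public static void main(String[] args) throws Exception {
        // Demonstrated with StringBuilder(String) in place of a Hadoop class.
        Object sb = instantiate("java.lang.StringBuilder",
            new Class<?>[]{String.class}, new Object[]{"demo"});
        System.out.println(sb);
    }
}
```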
[jira] [Closed] (FLINK-7204) CombineHint.NONE
[ https://issues.apache.org/jira/browse/FLINK-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7204. - Resolution: Implemented master: ce10e57bc163babd59005fa250c26e7604f23cf5 > CombineHint.NONE > > > Key: FLINK-7204 > URL: https://issues.apache.org/jira/browse/FLINK-7204 > Project: Flink > Issue Type: New Feature > Components: Core >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > FLINK-3477 added a hash-combine preceding the reducer configured with > {{CombineHint.HASH}} or {{CombineHint.SORT}} (default). In some cases it may > be useful to disable the combiner in {{ReduceNode}} by specifying a new > {{CombineHint.NONE}} value. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
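The API shape described — `setCombineHint` chained on the operator returned by `reduce`, with a `NONE` value disabling the combiner — can be illustrated with minimal stand-in types. These classes only mirror the shape of Flink's DataSet API for illustration; they are not Flink code:

```java
import java.util.function.BinaryOperator;

// Hypothetical stand-ins mirroring the shape of Flink's API: setCombineHint
// is chained on the operator returned by reduce(), and NONE disables the
// combine phase entirely. Not Flink classes.
public class CombineHintDemo {
    enum CombineHint { SORT, HASH, NONE }

    static class ReduceOperator<T> {
        private CombineHint hint = CombineHint.SORT; // SORT is Flink's default
        private final BinaryOperator<T> reducer;

        ReduceOperator(BinaryOperator<T> reducer) { this.reducer = reducer; }

        ReduceOperator<T> setCombineHint(CombineHint hint) {
            this.hint = hint;
            return this; // chainable, as in Flink
        }

        CombineHint getCombineHint() { return hint; }
    }

    static <T> ReduceOperator<T> reduce(BinaryOperator<T> fn) {
        return new ReduceOperator<>(fn);
    }

    public static void main(String[] args) {
        ReduceOperator<Integer> op =
            CombineHintDemo.<Integer>reduce(Integer::sum)
                .setCombineHint(CombineHint.NONE);
        System.out.println(op.getCombineHint());
    }
}
```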
[jira] [Updated] (FLINK-7204) CombineHint.NONE
[ https://issues.apache.org/jira/browse/FLINK-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7204: -- Fix Version/s: 1.4.0 > CombineHint.NONE > > > Key: FLINK-7204 > URL: https://issues.apache.org/jira/browse/FLINK-7204 > Project: Flink > Issue Type: New Feature > Components: Core >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > FLINK-3477 added a hash-combine preceding the reducer configured with > {{CombineHint.HASH}} or {{CombineHint.SORT}} (default). In some cases it may > be useful to disable the combiner in {{ReduceNode}} by specifying a new > {{CombineHint.NONE}} value. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7234) Fix CombineHint documentation
Greg Hogan created FLINK-7234: - Summary: Fix CombineHint documentation Key: FLINK-7234 URL: https://issues.apache.org/jira/browse/FLINK-7234 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.2.2, 1.4.0, 1.3.2 Reporter: Greg Hogan Assignee: Greg Hogan The {{CombineHint}} [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html] applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be noted for {{DataSet#distinct}}. It is also set with {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7211) Exclude Gelly javadoc jar from release
Greg Hogan created FLINK-7211: - Summary: Exclude Gelly javadoc jar from release Key: FLINK-7211 URL: https://issues.apache.org/jira/browse/FLINK-7211 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.4.0, 1.3.2 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)