[jira] [Commented] (FLINK-5243) Implement an example for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976842#comment-16976842 ] Greg Hogan commented on FLINK-5243: --- Hi [~dinesh.b], is the proposed implementation only running SSSP on either of the top or bottom projections? > Implement an example for BipartiteGraph > --- > > Key: FLINK-5243 > URL: https://issues.apache.org/jira/browse/FLINK-5243 > Project: Flink > Issue Type: Sub-task > Components: Library / Graph Processing (Gelly) >Reporter: Ivan Mushketyk >Priority: Major > Labels: beginner > > Should implement example for BipartiteGraph in gelly-examples project > similarly to examples for Graph class. > Depends on this: https://issues.apache.org/jira/browse/FLINK-2254 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-5243) Implement an example for BipartiteGraph
[ https://issues.apache.org/jira/browse/FLINK-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16848548#comment-16848548 ] Greg Hogan commented on FLINK-5243: --- [~jasleenk22], new contributions are always welcomed! Do you have a distributed algorithm in mind to implement for Bipartite Matching? > Implement an example for BipartiteGraph > --- > > Key: FLINK-5243 > URL: https://issues.apache.org/jira/browse/FLINK-5243 > Project: Flink > Issue Type: Sub-task > Components: Library / Graph Processing (Gelly) >Reporter: Ivan Mushketyk >Priority: Major > Labels: beginner > > Should implement example for BipartiteGraph in gelly-examples project > similarly to examples for Graph class. > Depends on this: https://issues.apache.org/jira/browse/FLINK-2254 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (FLINK-9577) Divide-by-zero in PageRank
[ https://issues.apache.org/jira/browse/FLINK-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan resolved FLINK-9577. --- Resolution: Not A Bug We do test for this in `PageRankTest.testWithEmptyGraphWithoutVertices`. Divide-by-zero is only an error for integer division, and here we are operating on `double` values. > Divide-by-zero in PageRank > -- > > Key: FLINK-9577 > URL: https://issues.apache.org/jira/browse/FLINK-9577 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.4.0, 1.5.0, 1.6.0 >Reporter: Chesnay Schepler >Assignee: vinoyang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > {code} > // org.apache.flink.graph.library.linkanalysis.PageRank#AdjustScores#open > this.vertexCount = vertexCountIterator.hasNext() ? > vertexCountIterator.next().getValue() : 0; > this.uniformlyDistributedScore = ((1 - dampingFactor) + dampingFactor * > sumOfSinks) / this.vertexCount; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
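The resolution can be checked in isolation. A minimal standalone sketch (the damping-factor and sink values below are illustrative, not taken from the test) showing that Java floating-point division by zero follows IEEE 754 and yields Infinity rather than throwing, while integer division throws:

```java
public class DoubleDivision {

    // Mirrors the shape of the quoted AdjustScores#open formula; an empty
    // graph yields vertexCount = 0 and the division still succeeds.
    static double uniformScore(double dampingFactor, double sumOfSinks, long vertexCount) {
        return ((1 - dampingFactor) + dampingFactor * sumOfSinks) / vertexCount;
    }

    public static void main(String[] args) {
        // Floating-point division by zero is well-defined: 0.15 / 0.0 == Infinity.
        double score = uniformScore(0.85, 0.0, 0);
        System.out.println(score); // Infinity
        if (!Double.isInfinite(score)) {
            throw new AssertionError("expected Infinity, got " + score);
        }

        // Integer division by zero, by contrast, throws ArithmeticException.
        boolean threw = false;
        try {
            int zero = 0;
            System.out.println(1 / zero); // never reached
        } catch (ArithmeticException e) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("integer division by zero did not throw");
        }
    }
}
```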
[jira] [Commented] (FLINK-10644) Batch Job: Speculative execution
[ https://issues.apache.org/jira/browse/FLINK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760004#comment-16760004 ] Greg Hogan commented on FLINK-10644: I'm not so sure that speculative execution is a good fit for Apache Flink. In MapReduce there are two concepts conducive to speculative execution that are not present in Flink: 1) In MapReduce the task to mapper/reducer/container ratio is often 10:1 or higher. In Flink all tasks are immediately assigned and processed in parallel. 2) In MapReduce intermediate and output data is always persisted, whereas in Flink only state is persisted (and only in streaming). Input is assumed to be replayable, but speculative execution would presumably also work for intermediate tasks. As noted, Spark has included speculative execution, and the Spark processing model is closer to Flink's. I'm just not clear on the circumstances where it is beneficial to start a catch-up task so late. I haven't followed the work on unification of batch and streaming, but it seems more valuable to focus on transitioning a task off a straggler machine rather than starting that task over. > Batch Job: Speculative execution > > > Key: FLINK-10644 > URL: https://issues.apache.org/jira/browse/FLINK-10644 > Project: Flink > Issue Type: New Feature > Components: JobManager >Reporter: JIN SUN >Assignee: ryantaocer >Priority: Major > Fix For: 1.8.0 > > > Stragglers/outliers are tasks that run slower than most of the tasks in a > batch job. This impacts job latency, as the straggler > will often be on the critical path of the job and become the bottleneck. > Tasks may be slow for various reasons, including hardware degradation, > software mis-configuration, or noisy neighbors. It's hard for the JM to predict > the runtime. > To reduce the overhead of stragglers, other systems such as Hadoop/Tez and Spark > have *_speculative execution_*. 
Speculative execution is a health-check > procedure that checks for tasks to be speculated, i.e. tasks running slower in an > ExecutionJobVertex than the median of all successfully completed tasks in > that EJV. Such slow tasks will be re-submitted to another TM. It will not > stop the slow tasks, but run a new copy in parallel, and will kill the others > if one of them completes. > This JIRA is an umbrella to apply this kind of idea in FLINK. Details will be > appended later. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9061) add entropy to s3 path for better scalability
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549386#comment-16549386 ] Greg Hogan commented on FLINK-9061: --- Not that we shouldn't implement the general purpose solution but Amazon looks to have increased the PUT rate from 100 to 3500 and the GET rate from 300 to 5500: "This S3 request rate performance increase removes any previous guidance to randomize object prefixes to achieve faster performance. That means you can now use logical or sequential naming patterns in S3 object naming without any performance implications." https://aws.amazon.com/about-aws/whats-new/2018/07/amazon-s3-announces-increased-request-rate-performance/ > add entropy to s3 path for better scalability > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.5.0, 1.4.2 >Reporter: Jamie Grier >Assignee: Indrajit Roychoudhury >Priority: Critical > Labels: pull-request-available > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. 
> > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
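The path-rewriting idea quoted above (s3://bucket/flink/checkpoints becoming s3://bucket/[HASH]/flink/checkpoints) can be sketched as a standalone helper. This is illustrative only — `injectEntropy` is a hypothetical name, not Flink's actual filesystem hook — but it shows how a short hash prefix moves the entropy to the front of the key:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EntropyPath {

    // Rewrite s3://<bucket>/<key> to s3://<bucket>/<hash-prefix>/<key>.
    // Two digest bytes (4 hex characters) of entropy are enough to spread
    // writes across S3 partitions while keeping keys readable.
    static String injectEntropy(String bucket, String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            String prefix = String.format("%02x%02x", digest[0], digest[1]);
            return "s3://" + bucket + "/" + prefix + "/" + key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }

    public static void main(String[] args) {
        // The same input always maps to the same prefix, so the rewrite is
        // deterministic and the original path remains recoverable.
        System.out.println(injectEntropy("bucket", "flink/checkpoints"));
    }
}
```

Note that S3 treats key names as opaque, so the hash segment must come first in the key to count toward the partitioning prefix.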
[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422890#comment-16422890 ] Greg Hogan commented on FLINK-9061: --- Since S3 key names are opaque it sounds like any "prefix" is counted for the initial entropy (so "c/" instead of "checkpoints/"?). This doesn't seem very ops-friendly if various points from various jobs are now intermixed by hash. > S3 checkpoint data not partitioned well -- causes errors and poor performance > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.4.2 >Reporter: Jamie Grier >Priority: Critical > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. > > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9061) S3 checkpoint data not partitioned well -- causes errors and poor performance
[ https://issues.apache.org/jira/browse/FLINK-9061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16417402#comment-16417402 ] Greg Hogan commented on FLINK-9061: --- [~ste...@apache.org], not sure why your [link|https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/] isn't showing in your comment. I went looking for more info and found [this|https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-tricks-seattle-hiring-event/] explanation for the need to put entropy in the short prefix of the object name: "In fact, S3 even has an algorithm to detect this parallel type of write pattern and will automatically create multiple child partitions from the same parent simultaneously – increasing the system’s operations per second budget as request heat is detected." > S3 checkpoint data not partitioned well -- causes errors and poor performance > - > > Key: FLINK-9061 > URL: https://issues.apache.org/jira/browse/FLINK-9061 > Project: Flink > Issue Type: Bug > Components: FileSystem, State Backends, Checkpointing >Affects Versions: 1.4.2 >Reporter: Jamie Grier >Priority: Critical > > I think we need to modify the way we write checkpoints to S3 for high-scale > jobs (those with many total tasks). The issue is that we are writing all the > checkpoint data under a common key prefix. This is the worst case scenario > for S3 performance since the key is used as a partition key. > > In the worst case checkpoints fail with a 500 status code coming back from S3 > and an internal error type of TooBusyException. > > One possible solution would be to add a hook in the Flink filesystem code > that allows me to "rewrite" paths. 
For example say I have the checkpoint > directory set to: > > s3://bucket/flink/checkpoints > > I would hook that and rewrite that path to: > > s3://bucket/[HASH]/flink/checkpoints, where HASH is the hash of the original > path > > This would distribute the checkpoint write load around the S3 cluster evenly. > > For reference: > https://aws.amazon.com/premiumsupport/knowledge-center/s3-bucket-performance-improve/ > > Any other people hit this issue? Any other ideas for solutions? This is a > pretty serious problem for people trying to checkpoint to S3. > > -Jamie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8809) Decrease maximum value of DirectMemory at default config
[ https://issues.apache.org/jira/browse/FLINK-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416137#comment-16416137 ] Greg Hogan commented on FLINK-8809: --- Will a restricted {{MaxDirectMemorySize}} prevent an OOME? If this is a production issue we may want to document use of {{jdk.nio.maxCachedBufferSize}}, which would prevent the JVM from retaining large "temporary" buffers (some discussion [here|http://www.evanjones.ca/java-bytebuffer-leak.html], and note that the option is available from [jdk8u102|http://www.oracle.com/technetwork/java/javase/8u102-relnotes-3021767.html]). > Decrease maximum value of DirectMemory at default config > > > Key: FLINK-8809 > URL: https://issues.apache.org/jira/browse/FLINK-8809 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Kirill A. Korinskiy >Priority: Major > > Good day! > > As I can see since this > [commit|https://github.com/apache/flink/commit/6c44d93d0a9da725ef8b1ad2a94889f79321db73] > the TaskManager uses 8,388,607 terabytes as the maximum amount of off-heap memory. I guess > that no system has so much memory, and it may cause the java process to be > killed by the OOM killer. > > I suggest decreasing this to a reasonable value by default. > > Right now I see only one way to override this hardcoded value: set > FLINK_TM_HEAP to 0, and specify the heap size by hand via > FLINK_ENV_JAVA_OPTS_TM. > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9090) Replace UUID.randomUUID with more efficient PRNG
[ https://issues.apache.org/jira/browse/FLINK-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416063#comment-16416063 ] Greg Hogan commented on FLINK-9090: --- Could you point to a case where use of {{UUID.randomUUID}} might affect performance? I'm only seeing this used to name things (temp dirs, operators, backends, etc.) rather than on the elementwise hot path. > Replace UUID.randomUUID with more efficient PRNG > > > Key: FLINK-9090 > URL: https://issues.apache.org/jira/browse/FLINK-9090 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Hai Zhou >Priority: Major > > Currently UUID.randomUUID is called in various places in the code base. > * It is non-deterministic. > * It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic by using seeded PRNGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
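For context on the trade-off the issue describes, here is a hedged sketch (not code proposed in the ticket) of a UUID built from {{ThreadLocalRandom}} instead of the shared {{SecureRandom}}: it avoids the JVM-wide lock but gives up cryptographic strength, which only matters if such IDs are generated on a hot path — the very premise the comment questions:

```java
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

public class FastUuid {

    // Builds a random-style (version 4) UUID from ThreadLocalRandom.
    // Unlike UUID.randomUUID(), this takes no shared lock, but the bits
    // are not cryptographically secure.
    static UUID insecureRandomUuid() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long msb = rnd.nextLong();
        long lsb = rnd.nextLong();
        // Set the RFC 4122 version (4) and variant (IETF) bits so the
        // result is structurally indistinguishable from randomUUID().
        msb = (msb & 0xffffffffffff0fffL) | 0x0000000000004000L;
        lsb = (lsb & 0x3fffffffffffffffL) | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = insecureRandomUuid();
        if (u.version() != 4 || u.variant() != 2) {
            throw new AssertionError("not a valid v4 UUID: " + u);
        }
        System.out.println(u);
    }
}
```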
[jira] [Comment Edited] (FLINK-9090) Replace UUID.randomUUID with more efficient PRNG
[ https://issues.apache.org/jira/browse/FLINK-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415910#comment-16415910 ] Greg Hogan edited comment on FLINK-9090 at 3/27/18 6:30 PM: Why the need to be deterministic (and user configurable?) and where is this called on a hot path? was (Author: greghogan): Why the need to be deterministic (and user configurable?) and where is this called on from a hot path? > Replace UUID.randomUUID with more efficient PRNG > > > Key: FLINK-9090 > URL: https://issues.apache.org/jira/browse/FLINK-9090 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Hai Zhou >Priority: Major > > Currently UUID.randomUUID is called in various places in the code base. > * It is non-deterministic. > * It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic by using seeded PRNGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-9090) Replace UUID.randomUUID with deterministic PRNG
[ https://issues.apache.org/jira/browse/FLINK-9090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415910#comment-16415910 ] Greg Hogan commented on FLINK-9090: --- Why the need to be deterministic (and user configurable?) and where is this called on from a hot path? > Replace UUID.randomUUID with deterministic PRNG > --- > > Key: FLINK-9090 > URL: https://issues.apache.org/jira/browse/FLINK-9090 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Hai Zhou >Priority: Major > > Currently UUID.randomUUID is called in various places in the code base. > * It is non-deterministic. > * It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic by using seeded PRNGs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8907) Unable to start the process by start-cluster.sh
[ https://issues.apache.org/jira/browse/FLINK-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414424#comment-16414424 ] Greg Hogan commented on FLINK-8907: --- [~mingleizhang], is the remote-access issue fixed by FLINK-8843? > Unable to start the process by start-cluster.sh > --- > > Key: FLINK-8907 > URL: https://issues.apache.org/jira/browse/FLINK-8907 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Reporter: mingleizhang >Priority: Minor > Fix For: 1.5.0 > > > Build a flink based on the latest code from master branch. And I get the > flink binary of flink-1.6-SNAPSHOT. When I run ./start-cluster.sh by the > default flink-conf.yaml. I can not start the process and get the error below. > {code:java} > [root@ricezhang-pjhzf bin]# ./start-cluster.sh > >> Starting cluster. > >> Starting standalonesession daemon on host ricezhang-pjhzf.vclound.com. > >> : Temporary failure in name resolutionost > {code} > *By the way, I build this flink from a windows computer. And copy that folder > {{flink-1.6-SNAPSHOT}} to centos.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8907) Unable to start the process by start-cluster.sh
[ https://issues.apache.org/jira/browse/FLINK-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414267#comment-16414267 ] Greg Hogan commented on FLINK-8907: --- [~mingleizhang], do I understand correctly that you are able to access the web frontend locally but not remotely? Is this a firewall issue? > Unable to start the process by start-cluster.sh > --- > > Key: FLINK-8907 > URL: https://issues.apache.org/jira/browse/FLINK-8907 > Project: Flink > Issue Type: Bug > Components: Startup Shell Scripts >Reporter: mingleizhang >Priority: Minor > Fix For: 1.5.0 > > > Build a flink based on the latest code from master branch. And I get the > flink binary of flink-1.6-SNAPSHOT. When I run ./start-cluster.sh by the > default flink-conf.yaml. I can not start the process and get the error below. > {code:java} > [root@ricezhang-pjhzf bin]# ./start-cluster.sh > >> Starting cluster. > >> Starting standalonesession daemon on host ricezhang-pjhzf.vclound.com. > >> : Temporary failure in name resolutionost > {code} > *By the way, I build this flink from a windows computer. And copy that folder > {{flink-1.6-SNAPSHOT}} to centos.* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8809) Decrease maximum value of DirectMemory at default config
[ https://issues.apache.org/jira/browse/FLINK-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413978#comment-16413978 ] Greg Hogan commented on FLINK-8809: --- As you have noted this maximum value is set only for [MaxDirectMemorySize|https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html] which constrains "the maximum total size (in bytes) of the New I/O (the java.nio package) direct-buffer allocations". Flink's allocation of memory segments is controlled by its configuration, so there is no need to constrain this value in the JVM. What would you choose as a reasonable default value? Requiring some users to increase this value is a [DRY|https://en.wikipedia.org/wiki/Don%27t_repeat_yourself] anti-pattern. So I think the explanation is: there is no harm in setting this to an essentially "infinite" value, and no benefit to setting a lower value. > Decrease maximum value of DirectMemory at default config > > > Key: FLINK-8809 > URL: https://issues.apache.org/jira/browse/FLINK-8809 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Kirill A. Korinskiy >Priority: Major > > Good day! > > As I can see since this > [commit|https://github.com/apache/flink/commit/6c44d93d0a9da725ef8b1ad2a94889f79321db73] > the TaskManager uses 8,388,607 terabytes as the maximum amount of off-heap memory. I guess > that no system has so much memory, and it may cause the java process to be > killed by the OOM killer. > > I suggest decreasing this to a reasonable value by default. > > Right now I see only one way to override this hardcoded value: set > FLINK_TM_HEAP to 0, and specify the heap size by hand via > FLINK_ENV_JAVA_OPTS_TM. > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8640) Disable japicmp on java 9
[ https://issues.apache.org/jira/browse/FLINK-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8640: -- Description: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177]. It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. was: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177.] It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. > Disable japicmp on java 9 > - > > Key: FLINK-8640 > URL: https://issues.apache.org/jira/browse/FLINK-8640 > Project: Flink > Issue Type: Sub-task > Components: Build System >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > > The {{japicmp}} plugin does not work out-of-the-box with java 9 as per > [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177]. > It is necessary to modify MAVEN_OPTS which we cannot automatically do as part > of our maven build, hence we should disable the plugin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8640) Disable japicmp on java 9
[ https://issues.apache.org/jira/browse/FLINK-8640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8640: -- Description: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177.] It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. was: The {{japicmp}} plugin does not work out-of-the-box with java 9 as per [https://github.com/siom79/japicmp/issues/177.] It is necessary to modify MAVEN_OPTS which we cannot automatically do as part of our maven build, hence we should disable the plugin. > Disable japicmp on java 9 > - > > Key: FLINK-8640 > URL: https://issues.apache.org/jira/browse/FLINK-8640 > Project: Flink > Issue Type: Sub-task > Components: Build System >Affects Versions: 1.5.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Minor > > The {{japicmp}} plugin does not work out-of-the-box with java 9 as per > [https://github.com/siom79/japicmp/issues/177|https://github.com/siom79/japicmp/issues/177.] > It is necessary to modify MAVEN_OPTS which we cannot automatically do as part > of our maven build, hence we should disable the plugin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
[ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326285#comment-16326285 ] Greg Hogan commented on FLINK-8414: --- You certainly can measure scalability, but as you have discovered the performance will not be monotonically increasing. Redistributing operators require a channel between each pair of tasks, so with a parallelism of 2^7 you will have 2^14 channels per exchange for each iteration. There are many reasons to use Flink and Gelly, but for certain algorithms and use cases you may even get better performance with a single-threaded implementation. See "Scalability! But at what COST?". ConnectedComponents and PageRank require, respectively, no and very little intermediate data, whereas the similarity measures JaccardIndex and AdamicAdar as well as triangle metrics such as ClusteringCoefficient process super-linear intermediate data and benefit much more from Flink's scalability. When comparing against non-distributed implementations it is important to note that all Gelly algorithms process generic data, whereas many "optimized" algorithms assume compact integer representations. > Gelly performance seriously decreases when using the suggested parallelism > configuration > > > Key: FLINK-8414 > URL: https://issues.apache.org/jira/browse/FLINK-8414 > Project: Flink > Issue Type: Bug > Components: Configuration, Documentation, Gelly >Reporter: flora karniav >Priority: Minor > > I am running Gelly examples with different datasets in a cluster of 5 > machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each. > The number of Slots parameter is set to 32 (as suggested) and the parallelism > to 128 (32 cores*4 taskmanagers). > I observe a vast performance degradation using these suggested settings than > setting parallelism.default to 16 for example were the same job completes at > ~60 seconds vs ~140 in the 128 parallelism case. 
> Is there something wrong in my configuration? Should I decrease parallelism > and -if so- will this inevitably decrease CPU utilization? > Another matter that may be related to this is the number of partitions of the > data. Is this somehow related to parallelism? How many partitions are created > in the case of parallelism.default=128? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
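The channel count mentioned in the comment above is simple arithmetic: a redistributing (all-to-all) exchange between two operators at parallelism p needs p × p channels, so parallelism 2^7 implies 2^14 channels per exchange:

```java
public class ChannelCount {

    // Channels needed for one all-to-all exchange between two operators,
    // each running at parallelism p: every producer connects to every consumer.
    static long channels(long p) {
        return p * p;
    }

    public static void main(String[] args) {
        System.out.println(channels(16));  // 256
        System.out.println(channels(128)); // 16384, i.e. 2^14
        if (channels(1L << 7) != (1L << 14)) {
            throw new AssertionError();
        }
    }
}
```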
[jira] [Resolved] (FLINK-6730) Activate strict checkstyle for flink-optimizer
[ https://issues.apache.org/jira/browse/FLINK-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan resolved FLINK-6730. --- Resolution: Won't Fix Closing per comment from #5294: "I would ignore the optimizer module TBH. There are no recent contributions to the module, so we don't benefit from cleaner PR. I also don't know what will happen to the module when we start unifying the batch&streaming APIs, which probably will be either be a full rewrite or removal." > Activate strict checkstyle for flink-optimizer > -- > > Key: FLINK-6730 > URL: https://issues.apache.org/jira/browse/FLINK-6730 > Project: Flink > Issue Type: Sub-task > Components: Build System >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > > Long term issue for incrementally introducing the strict checkstyle to > flink-optimizer. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (FLINK-8427) Checkstyle for org.apache.flink.optimizer.costs
[ https://issues.apache.org/jira/browse/FLINK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan resolved FLINK-8427. --- Resolution: Won't Fix Assignee: (was: Greg Hogan) > Checkstyle for org.apache.flink.optimizer.costs > --- > > Key: FLINK-8427 > URL: https://issues.apache.org/jira/browse/FLINK-8427 > Project: Flink > Issue Type: Improvement > Components: Optimizer >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8427) Checkstyle for org.apache.flink.optimizer.costs
Greg Hogan created FLINK-8427: - Summary: Checkstyle for org.apache.flink.optimizer.costs Key: FLINK-8427 URL: https://issues.apache.org/jira/browse/FLINK-8427 Project: Flink Issue Type: Improvement Components: Optimizer Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
[ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324542#comment-16324542 ] Greg Hogan commented on FLINK-8414: --- It is incumbent on the user to configure an appropriate parallelism for the quantity of data. Those graphs contain only a few tens of megabytes of data so it is not surprising that the optimal parallelism is around (or even lower than) 16. You can use `VertexMetrics` to pre-compute the size of the graph and adjust the parallelism at runtime (`ExecutionConfig#setParallelism`). Flink and Gelly are designed to scale to 100s to 1000s of parallel tasks and GBs to TBs of data. > Gelly performance seriously decreases when using the suggested parallelism > configuration > > > Key: FLINK-8414 > URL: https://issues.apache.org/jira/browse/FLINK-8414 > Project: Flink > Issue Type: Bug > Components: Configuration, Documentation, Gelly >Reporter: flora karniav >Priority: Minor > > I am running Gelly examples with different datasets in a cluster of 5 > machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each. > The number of Slots parameter is set to 32 (as suggested) and the parallelism > to 128 (32 cores*4 taskmanagers). > I observe a vast performance degradation using these suggested settings than > setting parallelism.default to 16 for example were the same job completes at > ~60 seconds vs ~140 in the 128 parallelism case. > Is there something wrong in my configuration? Should I decrease parallelism > and -if so- will this inevitably decrease CPU utilization? > Another matter that may be related to this is the number of partitions of the > data. Is this somehow related to parallelism? How many partitions are created > in the case of parallelism.default=128? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
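The runtime adjustment suggested above can be sketched as a simple heuristic. The 64 MB-per-task target and the power-of-two stepping below are illustrative choices, not Flink defaults; in a real job the chosen value would be passed to `ExecutionConfig#setParallelism` as the comment describes:

```java
public class ParallelismHeuristic {

    // Pick a parallelism from the input size: double it until each task
    // would process under ~64 MB, capping at the cluster's total slots.
    static int chooseParallelism(long inputBytes, int maxSlots) {
        long megabytes = inputBytes >> 20;
        int parallelism = 1;
        while (parallelism < maxSlots && (long) parallelism * 64 < megabytes) {
            parallelism <<= 1;
        }
        return parallelism;
    }

    public static void main(String[] args) {
        // A few tens of MB (like the graphs in this issue) stays small...
        System.out.println(chooseParallelism(50L << 20, 128)); // 1
        // ...while a 1 TB input saturates the 128 available slots.
        System.out.println(chooseParallelism(1L << 40, 128)); // 128
    }
}
```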
[jira] [Created] (FLINK-8422) Checkstyle for org.apache.flink.api.java.tuple
Greg Hogan created FLINK-8422: - Summary: Checkstyle for org.apache.flink.api.java.tuple Key: FLINK-8422 URL: https://issues.apache.org/jira/browse/FLINK-8422 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Update {{TupleGenerator}} for Flink's checkstyle and rebuild {{Tuple}} and {{TupleBuilder}} classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8414) Gelly performance seriously decreases when using the suggested parallelism configuration
[ https://issues.apache.org/jira/browse/FLINK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324133#comment-16324133 ] Greg Hogan commented on FLINK-8414: --- This is more of a question than a reported bug and may be more appropriate for the flink-user mailing list. Are you able to share what algorithm(s) you are running and describe the dataset(s)? > Gelly performance seriously decreases when using the suggested parallelism > configuration > > > Key: FLINK-8414 > URL: https://issues.apache.org/jira/browse/FLINK-8414 > Project: Flink > Issue Type: Bug > Components: Configuration, Documentation, Gelly >Reporter: flora karniav >Priority: Minor > > I am running Gelly examples with different datasets in a cluster of 5 > machines (1 Jobmanager and 4 Taskmanagers) of 32 cores each. > The number of Slots parameter is set to 32 (as suggested) and the parallelism > to 128 (32 cores*4 taskmanagers). > I observe a vast performance degradation using these suggested settings than > setting parallelism.default to 16 for example were the same job completes at > ~60 seconds vs ~140 in the 128 parallelism case. > Is there something wrong in my configuration? Should I decrease parallelism > and -if so- will this inevitably decrease CPU utilization? > Another matter that may be related to this is the number of partitions of the > data. Is this somehow related to parallelism? How many partitions are created > in the case of parallelism.default=128? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8403) Flink Gelly examples hanging without returning result
[ https://issues.apache.org/jira/browse/FLINK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321032#comment-16321032 ] Greg Hogan commented on FLINK-8403: --- Could this be due to the [{{akka.framesize}}|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#distributed-coordination-via-akka] limit (default: 10 MiB)? Does the job succeed when you write the output to CSV? > Flink Gelly examples hanging without returning result > - > > Key: FLINK-8403 > URL: https://issues.apache.org/jira/browse/FLINK-8403 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.3.2 > Environment: CentOS Linux release 7.3.1611 >Reporter: flora karniav > Labels: examples, gelly, performance > Original Estimate: 72h > Remaining Estimate: 72h > > Hello, I am currently running and measuring Flink Gelly examples (Connected > components and Pagerank algorithms) with different SNAP datasets. When > running with the Twitter dataset for example > (https://snap.stanford.edu/data/egonets-Twitter.html) which has 81,306 > vertices everything executes and finishes OK and I get the reported job > runtime. On the other hand, executions with datasets having a bigger number > of vertices, e.g. https://snap.stanford.edu/data/com-Youtube.html with > 1,134,890 vertices, hang with no result and reported time, while at the same > time I get "Job execution switched to status FINISHED." > I thought that this could be a memory issue so I reached 125GB of RAM > assigned to my taskmanagers (and jobmanager), but still no luck. > The exact command I am running is: > ./bin/flink run examples/gelly/flink-gelly-examples_*.jar --algorithm > PageRank --directed false --input_filename hdfs://sith0:9000/user/xx.txt > --input CSV --type integer --input_field_delimiter $' ' --output print -- This message was sent by Atlassian JIRA (v6.4.14#64029)
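If the framesize limit is indeed the culprit: results of {{--output print}} are collected back through the JobManager as a serialized accumulator and are therefore bounded by the maximum Akka message size. A sketch of raising the limit (the new value is hypothetical; the default is 10485760b):

```yaml
# flink-conf.yaml -- raise the maximum Akka payload size (illustrative value)
akka.framesize: 104857600b
```

Writing the output to CSV, as suggested above, bypasses the collect path entirely, which is why it is a useful test.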
[jira] [Created] (FLINK-8363) Build Hadoop 2.9.0 convenience binaries
Greg Hogan created FLINK-8363: - Summary: Build Hadoop 2.9.0 convenience binaries Key: FLINK-8363 URL: https://issues.apache.org/jira/browse/FLINK-8363 Project: Flink Issue Type: New Feature Components: Build System Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Hadoop 2.9.0 was released on 17 November 2017. A local {{mvn clean verify -Dhadoop.version=2.9.0}} ran successfully. With the new Hadoopless build we may be able to improve the build process by reusing the {{flink-dist}} jar (the per-version jars differ only in build timestamps) and simply make each Hadoop-specific tarball by copying in the corresponding {{flink-shaded-hadoop2-uber}} jar. What portion of the TravisCI jobs can run Hadoopless? We could build and verify these once and then run a Hadoop-versioned job for each Hadoop version. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
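The tarball-assembly idea could be sketched as follows (all paths and jar names are hypothetical; this illustrates the copy step, not the actual release tooling):

```sh
# Sketch: build flink-dist once without Hadoop, then stamp out one
# tarball per Hadoop version by dropping in the matching uber jar.
for v in 2.7.5 2.8.3 2.9.0; do
  cp -r build-target "flink-${v}-dist"
  cp "flink-shaded-hadoop2-uber-${v}.jar" "flink-${v}-dist/lib/"
  tar czf "flink-hadoop${v}.tgz" "flink-${v}-dist"
done
```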
[jira] [Created] (FLINK-8361) Remove create_release_files.sh
Greg Hogan created FLINK-8361: - Summary: Remove create_release_files.sh Key: FLINK-8361 URL: https://issues.apache.org/jira/browse/FLINK-8361 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.5.0 Reporter: Greg Hogan Priority: Trivial The monolithic {{create_release_files.sh}} does not support building Flink without Hadoop and looks to have been superseded by the scripts in {{tools/releasing}}. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8320) Flink cluster does not work on Java 9
[ https://issues.apache.org/jira/browse/FLINK-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8320: -- Issue Type: Wish (was: Bug) > Flink cluster does not work on Java 9 > - > > Key: FLINK-8320 > URL: https://issues.apache.org/jira/browse/FLINK-8320 > Project: Flink > Issue Type: Wish >Affects Versions: 1.4.0 > Environment: flink-1.4.0, mac os x, 10.13.1 >Reporter: Steve Layland > Labels: java9 > > Recently got a new macbook and figured it was a good time to install java 9 > and try it out. I didn't realize that Java 9 was such a breaking update (eg: > https://blog.codefx.org/java/java-9-migration-guide/) and took the Flink > documentation at face value and assumed that Java 7+ or higher would be fine. > Here's is what happens after starting a local cluster and attempting to run > the sample WordCount program under Java 9: > {noformat} > flink-1.4.0 $ export JAVA_HOME=$(/usr/libexec/java_home -v 9) > cru@lappy:flink-1.4.0 $ java -version > java version "9.0.1" > Java(TM) SE Runtime Environment (build 9.0.1+11) > Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode) > cru@lappy:flink-1.4.0 $ bin/start-cluster.sh > Starting cluster. > Starting jobmanager daemon on host lappy.local. > Starting taskmanager daemon on host lappy.local. > cru@lappy:flink-1.4.0 $ bin/flink run examples/streaming/WordCount.jar > Cluster configuration: Standalone cluster with JobManager at > localhost/127.0.0.1:6123 > Using address localhost:6123 to connect to JobManager. > JobManager web interface address http://localhost:8081 > Starting execution of program > Executing WordCount example with default input data set. > Use --input to specify file input. > Printing result to stdout. Use --output to specify output path. > Submitting job with JobID: ee054ffeb4784848143b76b7d51d99c1. Waiting for job > completion. 
> > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: The program > execution failed: Couldn't retrieve the JobExecutionResult from the > JobManager. > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:492) > at > org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:105) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:456) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at > org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:89) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:396) > at > org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:802) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:282) > at > org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1054) > at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1101) > at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1098) > at > org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1098) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Couldn't > retrieve the JobExecutionResult 
from the JobManager. > at > org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:300) > at > org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:387) > at > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:481) > ... 18 more > Caused by: > org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: > Lost connection to the JobManager. > at > org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:219) > at > org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:104) > at > org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:71) > a
[jira] [Closed] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-5506. - Resolution: Fixed master: a355df6e33f402beac01c2908cb0c64cfeccadb2 release-1.4: 96051e516bdc114b04c5a3cbb75874c420e27e7b > Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException > - > > Key: FLINK-5506 > URL: https://issues.apache.org/jira/browse/FLINK-5506 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.4, 1.3.2, 1.4.1 >Reporter: Miguel E. Coimbra >Assignee: Greg Hogan > Labels: easyfix, newbie > Fix For: 1.5.0, 1.4.1 > > Original Estimate: 2h > Remaining Estimate: 2h > > Reporting this here as per Vasia's advice. > I am having the following problem while trying out the > org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API > (Java). > Specs: JDK 1.8.0_102 x64 > Apache Flink: 1.1.4 > Suppose I have a very small (I tried an example with 38 vertices as well) > dataset stored in a tab-separated file 3-vertex.tsv: > {code} > #id1 id2 score > 010 > 020 > 030 > {code} > This is just a central vertex with 3 neighbors (disconnected between > themselves). > I am loading the dataset and executing the algorithm with the following code: > {code} > // Load the data from the .tsv file. > final DataSet> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(Long.class, Long.class, Double.class); > // Generate a graph and add reverse edges (undirected). > final Graph graph = Graph.fromTupleDataSet(edgeTuples, > new MapFunction() { > private static final long serialVersionUID = 8713516577419451509L; > public Long map(Long value) { > return value; > } > }, > env).getUndirected(); > // CommunityDetection parameters. > final double hopAttenuationDelta = 0.5d; > final int iterationCount = 10; > // Prepare and trigger the execution. 
> DataSet> vs = graph.run(new > org.apache.flink.graph.library.CommunityDetection(iterationCount, > hopAttenuationDelta)).getVertices(); > vs.print(); > {code} > Running this code throws the following exception (check the bold line): > {code} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException > at > org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158) > at > org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389) > at > org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486) > at > 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146) > at > org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642) > at java.lang.Thread.run(Thread.java:745) > {code} > After a further look, I set a breakpoint (Eclipse IDE debugging) at the line > in bold: > org.apache.flink.graph.library.CommunityDetection.java (source code accessed > automatically by
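Note that Jira's renderer has stripped the angle-bracketed generic parameters (and the tab characters in the sample file) from the snippet quoted above. A reconstruction, assuming Long vertex IDs and values and Double edge scores as CommunityDetection requires, would read:

```java
// Reconstruction of the reporter's snippet; the generic parameters were
// lost in Jira's rendering and are inferred from the surrounding context.
// The sample 3-vertex.tsv is "0<TAB>1<TAB>0" etc., one edge per line.
final DataSet<Tuple3<Long, Long, Double>> edgeTuples = env.readCsvFile(inputPath)
        .fieldDelimiter("\t")
        .ignoreComments("#")
        .types(Long.class, Long.class, Double.class);

// Initialize each vertex value with its own ID, then add reverse edges.
final Graph<Long, Long, Double> graph = Graph.fromTupleDataSet(edgeTuples,
        new MapFunction<Long, Long>() {
            private static final long serialVersionUID = 8713516577419451509L;
            public Long map(Long value) {
                return value;
            }
        }, env).getUndirected();

final DataSet<Vertex<Long, Long>> vs = graph.run(
        new CommunityDetection<Long>(10, 0.5d)).getVertices();
vs.print();
```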
[jira] [Closed] (FLINK-8222) Update Scala version
[ https://issues.apache.org/jira/browse/FLINK-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8222. - Resolution: Implemented master: 8987de3b241d23bbcc6ca5640e3cb77972a60be4 > Update Scala version > > > Key: FLINK-8222 > URL: https://issues.apache.org/jira/browse/FLINK-8222 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > Update Scala to version {{2.11.12}}. I don't believe this affects the Flink > distribution but rather anyone who is compiling Flink or a > Flink-quickstart-derived program on a shared system. > "A privilege escalation vulnerability (CVE-2017-15288) has been identified in > the Scala compilation daemon." > https://www.scala-lang.org/news/security-update-nov17.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8223) Update Hadoop versions
[ https://issues.apache.org/jira/browse/FLINK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8223. - Resolution: Fixed master: d3cd51a3f9fbb3ffbe6d23a57ff3884733eb47fa > Update Hadoop versions > -- > > Key: FLINK-8223 > URL: https://issues.apache.org/jira/browse/FLINK-8223 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Update 2.7.3 to 2.7.5 and 2.8.0 to 2.8.3. See > http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8223) Update Hadoop versions
[ https://issues.apache.org/jira/browse/FLINK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8223: -- Description: Update 2.7.3 to 2.7.5 and 2.8.0 to 2.8.3. See http://hadoop.apache.org/releases.html (was: Update 2.7.3 to 2.7.4 and 2.8.0 to 2.8.2. See http://hadoop.apache.org/releases.html) > Update Hadoop versions > -- > > Key: FLINK-8223 > URL: https://issues.apache.org/jira/browse/FLINK-8223 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Update 2.7.3 to 2.7.5 and 2.8.0 to 2.8.3. See > http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8223) Update Hadoop versions
[ https://issues.apache.org/jira/browse/FLINK-8223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8223: -- Fix Version/s: 1.5.0 > Update Hadoop versions > -- > > Key: FLINK-8223 > URL: https://issues.apache.org/jira/browse/FLINK-8223 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Update 2.7.3 to 2.7.4 and 2.8.0 to 2.8.2. See > http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8222) Update Scala version
[ https://issues.apache.org/jira/browse/FLINK-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8222: -- Fix Version/s: 1.5.0 > Update Scala version > > > Key: FLINK-8222 > URL: https://issues.apache.org/jira/browse/FLINK-8222 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > Update Scala to version {{2.11.12}}. I don't believe this affects the Flink > distribution but rather anyone who is compiling Flink or a > Flink-quickstart-derived program on a shared system. > "A privilege escalation vulnerability (CVE-2017-15288) has been identified in > the Scala compilation daemon." > https://www.scala-lang.org/news/security-update-nov17.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (FLINK-8037) Missing cast in integer arithmetic in TransactionalIdsGenerator#generateIdsToAbort
[ https://issues.apache.org/jira/browse/FLINK-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reassigned FLINK-8037: - Assignee: Greg Hogan > Missing cast in integer arithmetic in > TransactionalIdsGenerator#generateIdsToAbort > -- > > Key: FLINK-8037 > URL: https://issues.apache.org/jira/browse/FLINK-8037 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: Greg Hogan >Priority: Minor > > {code} > public Set generateIdsToAbort() { > Set idsToAbort = new HashSet<>(); > for (int i = 0; i < safeScaleDownFactor; i++) { > idsToAbort.addAll(generateIdsToUse(i * poolSize * > totalNumberOfSubtasks)); > {code} > The operands are integers where generateIdsToUse() expects long parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
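The bug is the usual Java integer-promotion trap: all three operands of {{i * poolSize * totalNumberOfSubtasks}} are {{int}}, so the product is computed in 32-bit arithmetic and wraps before it is widened to the method's {{long}} parameter. A minimal, self-contained demonstration (the field values are made up for illustration and are not taken from the Kafka producer):

```java
public class OverflowDemo {
    // Hypothetical sizes chosen so the product exceeds Integer.MAX_VALUE.
    static final int poolSize = 100_000;
    static final int totalNumberOfSubtasks = 50_000;

    // Buggy form: the multiplication overflows in int before the widening.
    static long startIdWithoutCast(int i) {
        return i * poolSize * totalNumberOfSubtasks;
    }

    // Fixed form: casting the first operand forces 64-bit arithmetic
    // for the whole left-associative chain.
    static long startIdWithCast(int i) {
        return (long) i * poolSize * totalNumberOfSubtasks;
    }

    public static void main(String[] args) {
        System.out.println(startIdWithoutCast(1)); // wrapped: 705032704
        System.out.println(startIdWithCast(1));    // correct: 5000000000
    }
}
```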
[jira] [Updated] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-5506: -- Fix Version/s: 1.4.1 1.5.0 > Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException > - > > Key: FLINK-5506 > URL: https://issues.apache.org/jira/browse/FLINK-5506 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.4, 1.3.2, 1.4.1 >Reporter: Miguel E. Coimbra >Assignee: Greg Hogan > Labels: easyfix, newbie > Fix For: 1.5.0, 1.4.1 > > Original Estimate: 2h > Remaining Estimate: 2h > > Reporting this here as per Vasia's advice. > I am having the following problem while trying out the > org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API > (Java). > Specs: JDK 1.8.0_102 x64 > Apache Flink: 1.1.4 > Suppose I have a very small (I tried an example with 38 vertices as well) > dataset stored in a tab-separated file 3-vertex.tsv: > {code} > #id1 id2 score > 010 > 020 > 030 > {code} > This is just a central vertex with 3 neighbors (disconnected between > themselves). > I am loading the dataset and executing the algorithm with the following code: > {code} > // Load the data from the .tsv file. > final DataSet> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(Long.class, Long.class, Double.class); > // Generate a graph and add reverse edges (undirected). > final Graph graph = Graph.fromTupleDataSet(edgeTuples, > new MapFunction() { > private static final long serialVersionUID = 8713516577419451509L; > public Long map(Long value) { > return value; > } > }, > env).getUndirected(); > // CommunityDetection parameters. > final double hopAttenuationDelta = 0.5d; > final int iterationCount = 10; > // Prepare and trigger the execution. 
> DataSet> vs = graph.run(new > org.apache.flink.graph.library.CommunityDetection(iterationCount, > hopAttenuationDelta)).getVertices(); > vs.print(); > {code} > Running this code throws the following exception (check the bold line): > {code} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException > at > org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158) > at > org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389) > at > org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486) > at > 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146) > at > org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642) > at java.lang.Thread.run(Thread.java:745) > {code} > After a further look, I set a breakpoint (Eclipse IDE debugging) at the line > in bold: > org.apache.flink.graph.library.CommunityDetection.java (source code accessed > automatically by Maven) > // find the highest score of maxScoreLabel > double highestScore
[jira] [Created] (FLINK-8222) Update Scala version
Greg Hogan created FLINK-8222: - Summary: Update Scala version Key: FLINK-8222 URL: https://issues.apache.org/jira/browse/FLINK-8222 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Update Scala to version {{2.11.12}}. I don't believe this affects the Flink distribution but rather anyone who is compiling Flink or a Flink-quickstart-derived program on a shared system. "A privilege escalation vulnerability (CVE-2017-15288) has been identified in the Scala compilation daemon." https://www.scala-lang.org/news/security-update-nov17.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8223) Update Hadoop versions
Greg Hogan created FLINK-8223: - Summary: Update Hadoop versions Key: FLINK-8223 URL: https://issues.apache.org/jira/browse/FLINK-8223 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Update 2.7.3 to 2.7.4 and 2.8.0 to 2.8.2. See http://hadoop.apache.org/releases.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (FLINK-5506) Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
[ https://issues.apache.org/jira/browse/FLINK-5506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reassigned FLINK-5506: - Assignee: Greg Hogan > Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException > - > > Key: FLINK-5506 > URL: https://issues.apache.org/jira/browse/FLINK-5506 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.1.4, 1.3.2, 1.4.1 >Reporter: Miguel E. Coimbra >Assignee: Greg Hogan > Labels: easyfix, newbie > Original Estimate: 2h > Remaining Estimate: 2h > > Reporting this here as per Vasia's advice. > I am having the following problem while trying out the > org.apache.flink.graph.library.CommunityDetection algorithm of the Gelly API > (Java). > Specs: JDK 1.8.0_102 x64 > Apache Flink: 1.1.4 > Suppose I have a very small (I tried an example with 38 vertices as well) > dataset stored in a tab-separated file 3-vertex.tsv: > {code} > #id1 id2 score > 010 > 020 > 030 > {code} > This is just a central vertex with 3 neighbors (disconnected between > themselves). > I am loading the dataset and executing the algorithm with the following code: > {code} > // Load the data from the .tsv file. > final DataSet> edgeTuples = > env.readCsvFile(inputPath) > .fieldDelimiter("\t") // node IDs are separated by spaces > .ignoreComments("#") // comments start with "%" > .types(Long.class, Long.class, Double.class); > // Generate a graph and add reverse edges (undirected). > final Graph graph = Graph.fromTupleDataSet(edgeTuples, > new MapFunction() { > private static final long serialVersionUID = 8713516577419451509L; > public Long map(Long value) { > return value; > } > }, > env).getUndirected(); > // CommunityDetection parameters. > final double hopAttenuationDelta = 0.5d; > final int iterationCount = 10; > // Prepare and trigger the execution. 
> DataSet> vs = graph.run(new > org.apache.flink.graph.library.CommunityDetection(iterationCount, > hopAttenuationDelta)).getVertices(); > vs.print(); > {code} > Running this code throws the following exception (check the bold line): > {code} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply$mcV$sp(JobManager.scala:805) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$8.apply(JobManager.scala:751) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.NullPointerException > at > org.apache.flink.graph.library.CommunityDetection$VertexLabelUpdater.updateVertex(CommunityDetection.java:158) > at > org.apache.flink.graph.spargel.ScatterGatherIteration$GatherUdfSimpleVV.coGroup(ScatterGatherIteration.java:389) > at > org.apache.flink.runtime.operators.CoGroupWithSolutionSetSecondDriver.run(CoGroupWithSolutionSetSecondDriver.java:218) > at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486) > at > 
org.apache.flink.runtime.iterative.task.AbstractIterativeTask.run(AbstractIterativeTask.java:146) > at > org.apache.flink.runtime.iterative.task.IterationTailTask.run(IterationTailTask.java:107) > at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:642) > at java.lang.Thread.run(Thread.java:745) > {code} > After a further look, I set a breakpoint (Eclipse IDE debugging) at the line > in bold: > org.apache.flink.graph.library.CommunityDetection.java (source code accessed > automatically by Maven) > // find the highest score of maxScoreLabel > double highestScore = labelsWithHighestScore.get(maxScoreLabel); > - maxSc
[jira] [Updated] (FLINK-8180) Refactor driver outputs
[ https://issues.apache.org/jira/browse/FLINK-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8180: -- Description: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build flink-gelly-examples as an uber jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). was: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. 
In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as an uber jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). > Refactor driver outputs > --- > > Key: FLINK-8180 > URL: https://issues.apache.org/jira/browse/FLINK-8180 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > The change in 1.4 of algorithm results from Tuples to POJOs broke the writing > of results as csv. Testing this was and is a challenge so was not done. There > are many additional improvements which can be made based on recent > improvements to the Gelly framework. > Result hash and analytic results should always be printed to the screen. > Results can optionally be written to stdout or to a file. In the latter case > the result hash and analytic results (and schema) will also be written to a > top-level file. > The "verbose" output strings can be replaced with json which is just as > human-readable but also machine readable. In addition to csv and json it may > be simple to support xml, etc. Computed fields will be optionally printed to > screen or file (currently these are always printed to screen but never to > file). > Testing will be simplified since formats are now a separate concern from the > stream. 
> Jackson is available to Gelly as a dependency provided in the Flink > distribution but we may want to build flink-gelly-examples as an uber jar in > order to include additional modules (which may require a direct dependency on > Jackson, which would require checkstyle suppressions around the unshaded > jackson imports). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8180) Refactor driver outputs
[ https://issues.apache.org/jira/browse/FLINK-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8180: -- Description: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as an uber jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). was: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. 
In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). > Refactor driver outputs > --- > > Key: FLINK-8180 > URL: https://issues.apache.org/jira/browse/FLINK-8180 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > The change in 1.4 of algorithm results from Tuples to POJOs broke the writing > of results as csv. Testing this was and is a challenge so was not done. There > are many additional improvements which can be made based on recent > improvements to the Gelly framework. > Result hash and analytic results should always be printed to the screen. > Results can optionally be written to stdout or to a file. In the latter case > the result hash and analytic results (and schema) will also be written to a > top-level file. > The "verbose" output strings can be replaced with json which is just as > human-readable but also machine readable. In addition to csv and json it may > be simple to support xml, etc. Computed fields will be optionally printed to > screen or file (currently these are always printed to screen but never to > file). > Testing will be simplified since formats are now a separate concern from the > stream. 
> Jackson is available to Gelly as a dependency provided in the Flink > distribution but we may want to build Gelly as an uber jar in order to > include additional modules (which may require a direct dependency on Jackson, > which would require checkstyle suppressions around the unshaded jackson > imports). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8180) Refactor driver outputs
[ https://issues.apache.org/jira/browse/FLINK-8180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8180: -- Description: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would require checkstyle suppressions around the unshaded jackson imports). was: The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. 
In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would fail the checkstyle requirement to use the shaded package). > Refactor driver outputs > --- > > Key: FLINK-8180 > URL: https://issues.apache.org/jira/browse/FLINK-8180 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.5.0 > > > The change in 1.4 of algorithm results from Tuples to POJOs broke the writing > of results as csv. Testing this was and is a challenge so was not done. There > are many additional improvements which can be made based on recent > improvements to the Gelly framework. > Result hash and analytic results should always be printed to the screen. > Results can optionally be written to stdout or to a file. In the latter case > the result hash and analytic results (and schema) will also be written to a > top-level file. > The "verbose" output strings can be replaced with json which is just as > human-readable but also machine readable. In addition to csv and json it may > be simple to support xml, etc. Computed fields will be optionally printed to > screen or file (currently these are always printed to screen but never to > file). > Testing will be simplified since formats are now a separate concern from the > stream. 
> Jackson is available to Gelly as a dependency provided in the Flink > distribution but we may want to build Gelly as a fat jar in order to include > additional modules (which may require a direct dependency on Jackson, which > would require checkstyle suppressions around the unshaded jackson imports). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8180) Refactor driver outputs
Greg Hogan created FLINK-8180: - Summary: Refactor driver outputs Key: FLINK-8180 URL: https://issues.apache.org/jira/browse/FLINK-8180 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Fix For: 1.5.0 The change in 1.4 of algorithm results from Tuples to POJOs broke the writing of results as csv. Testing this was and is a challenge so was not done. There are many additional improvements which can be made based on recent improvements to the Gelly framework. Result hash and analytic results should always be printed to the screen. Results can optionally be written to stdout or to a file. In the latter case the result hash and analytic results (and schema) will also be written to a top-level file. The "verbose" output strings can be replaced with json which is just as human-readable but also machine readable. In addition to csv and json it may be simple to support xml, etc. Computed fields will be optionally printed to screen or file (currently these are always printed to screen but never to file). Testing will be simplified since formats are now a separate concern from the stream. Jackson is available to Gelly as a dependency provided in the Flink distribution but we may want to build Gelly as a fat jar in order to include additional modules (which may require a direct dependency on Jackson, which would fail the checkstyle requirement to use the shaded package). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
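The "formats are a separate concern from the stream" idea above can be sketched in plain Java. The interface and class names below are illustrative, not Gelly's actual driver API, and the JSON emitter is a minimal hand-rolled stand-in for Jackson:

```java
import java.util.List;
import java.util.stream.Collectors;

public class DriverOutputSketch {

    /** Formats one result record; where it is written (stdout or file) is a separate concern. */
    public interface ResultFormatter {
        String format(List<String> fieldNames, List<Object> values);
    }

    /** Plain CSV: values joined by commas, field names ignored. */
    public static class CsvFormatter implements ResultFormatter {
        public String format(List<String> fieldNames, List<Object> values) {
            return values.stream().map(String::valueOf).collect(Collectors.joining(","));
        }
    }

    /** Minimal JSON object: machine-readable but still human-readable. */
    public static class JsonFormatter implements ResultFormatter {
        public String format(List<String> fieldNames, List<Object> values) {
            StringBuilder sb = new StringBuilder("{");
            for (int i = 0; i < fieldNames.size(); i++) {
                if (i > 0) sb.append(",");
                sb.append('"').append(fieldNames.get(i)).append("\":");
                Object v = values.get(i);
                // Numbers are emitted bare; everything else is quoted.
                sb.append(v instanceof Number ? v.toString() : '"' + v.toString() + '"');
            }
            return sb.append('}').toString();
        }
    }
}
```

With this split, adding xml or another format is just one more `ResultFormatter` implementation, and each formatter can be unit-tested without running a job.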
[jira] [Closed] (FLINK-6864) Remove confusing "invalid POJO type" messages from TypeExtractor
[ https://issues.apache.org/jira/browse/FLINK-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-6864. - Resolution: Implemented master: 450b4241404055ed6638e354be421b83380827c5 > Remove confusing "invalid POJO type" messages from TypeExtractor > > > Key: FLINK-6864 > URL: https://issues.apache.org/jira/browse/FLINK-6864 > Project: Flink > Issue Type: Improvement > Components: Documentation, Type Serialization System >Reporter: Tzu-Li (Gordon) Tai >Assignee: Fang Yong > Fix For: 1.5.0 > > > When a user's type cannot be treated as a POJO, the {{TypeExtractor}} will > log warnings such as ".. must have a default constructor to be used as a > POJO.", " ... is not a valid POJO type because not all fields are valid POJO > fields." in the {{analyzePojo}} method. > These messages often mislead the user into thinking that the job should have > failed, whereas in fact in these cases Flink just falls back to Kryo and > treats them as generic types. We should remove these messages, and at the > same time improve the type serialization docs at [1] to explicitly explain > what it means when Flink does / does not recognize a user type as a POJO. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#rules-for-pojo-types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
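For context, Flink's documented POJO rules require a public class with a public no-argument constructor whose fields are either public or reachable through getters and setters; a type that misses any of these is not invalid, it is simply serialized with Kryo as a generic type. A minimal sketch of a conforming type (the class name is illustrative):

```java
// Illustrative type satisfying Flink's POJO rules: public class,
// public no-arg constructor, every field public or exposed via
// getter/setter. Breaking any rule only triggers the Kryo fallback.
public class WordCount {
    public String word;    // public field: a valid POJO field

    private long count;    // private, but exposed via getter/setter

    public WordCount() {}  // required public no-arg constructor

    public long getCount() { return count; }

    public void setCount(long count) { this.count = count; }
}
```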
[jira] [Updated] (FLINK-6864) Remove confusing "invalid POJO type" messages from TypeExtractor
[ https://issues.apache.org/jira/browse/FLINK-6864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-6864: -- Fix Version/s: 1.5.0 > Remove confusing "invalid POJO type" messages from TypeExtractor > > > Key: FLINK-6864 > URL: https://issues.apache.org/jira/browse/FLINK-6864 > Project: Flink > Issue Type: Improvement > Components: Documentation, Type Serialization System >Reporter: Tzu-Li (Gordon) Tai >Assignee: Fang Yong > Fix For: 1.5.0 > > > When a user's type cannot be treated as a POJO, the {{TypeExtractor}} will > log warnings such as ".. must have a default constructor to be used as a > POJO.", " ... is not a valid POJO type because not all fields are valid POJO > fields." in the {{analyzePojo}} method. > These messages are often conceived as misleading for the user to think that > the job should have failed, whereas in fact in these cases Flink just > fallsback to Kryo and treat then as generic types. We should remove these > messages, and at the same time improve the type serialization docs at [1] to > explicitly inform what it means when Flink does / does not recognizes a user > type as a POJO. > [1] > https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/types_serialization.html#rules-for-pojo-types -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8142) Cleanup reference to deprecated constants in ConfigConstants
[ https://issues.apache.org/jira/browse/FLINK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8142. - Resolution: Implemented master: f2b804a7479dcba7980dd68a445635f4ac2198c0 > Cleanup reference to deprecated constants in ConfigConstants > > > Key: FLINK-8142 > URL: https://issues.apache.org/jira/browse/FLINK-8142 > Project: Flink > Issue Type: Improvement > Components: Local Runtime >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 >Assignee: Hai Zhou UTC+8 >Priority: Minor > Fix For: 1.5.0 > > > ConfigConstants contains several deprecated String constants that are used by > other Flink modules. Those should be cleaned up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7967) Deprecate Hadoop specific Flink configuration options
[ https://issues.apache.org/jira/browse/FLINK-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7967. - Resolution: Implemented master: fae83c0b0c0a711d5d0f9de0b95fb6b703c4f912 > Deprecate Hadoop specific Flink configuration options > - > > Key: FLINK-7967 > URL: https://issues.apache.org/jira/browse/FLINK-7967 > Project: Flink > Issue Type: Improvement > Components: Configuration >Reporter: Till Rohrmann >Assignee: mingleizhang >Priority: Trivial > Fix For: 1.5.0 > > > I think we should deprecate the hadoop specific configuration options from > Flink and encourage people to use instead the environment variable > {{HADOOP_CONF_DIR}} to configure the Hadoop configuration directory. This > includes: > {code} > fs.hdfs.hdfsdefault > fs.hdfs.hdfssite > fs.hdfs.hadoopconf > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8105) Removed unnecessary null check
[ https://issues.apache.org/jira/browse/FLINK-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8105. - Resolution: Implemented master: 3561222c5d6c7cee79f8c5872f32227632135c48 > Removed unnecessary null check > -- > > Key: FLINK-8105 > URL: https://issues.apache.org/jira/browse/FLINK-8105 > Project: Flink > Issue Type: Improvement > Components: Checkstyle >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 >Assignee: Hai Zhou UTC+8 >Priority: Minor > Fix For: 1.5.0 > > > eg. > {code:java} > if (value != null && value instanceof String) > {code} > null instanceof String returns false hence replaced the check with > {code:java} > if (value instanceof String) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
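The simplification in FLINK-8105 rests on a language guarantee: `instanceof` evaluates to false for a null reference, so the preceding null check adds nothing. A minimal illustration (class and method names are ours, not Flink's):

```java
public class InstanceofCheck {
    // The JLS defines "null instanceof T" as false for every reference
    // type T, so the extra null test in
    // "value != null && value instanceof String" is redundant.
    public static boolean isString(Object value) {
        return value instanceof String;
    }
}
```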
[jira] [Updated] (FLINK-7967) Deprecate Hadoop specific Flink configuration options
[ https://issues.apache.org/jira/browse/FLINK-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7967: -- Fix Version/s: 1.5.0 > Deprecate Hadoop specific Flink configuration options > - > > Key: FLINK-7967 > URL: https://issues.apache.org/jira/browse/FLINK-7967 > Project: Flink > Issue Type: Improvement > Components: Configuration >Reporter: Till Rohrmann >Assignee: mingleizhang >Priority: Trivial > Fix For: 1.5.0 > > > I think we should deprecate the hadoop specific configuration options from > Flink and encourage people to use instead the environment variable > {{HADOOP_CONF_DIR}} to configure the Hadoop configuration directory. This > includes: > {code} > fs.hdfs.hdfsdefault > fs.hdfs.hdfssite > fs.hdfs.hadoopconf > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8126: -- Fix Version/s: 1.4.0 > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.4.0, 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-8126. - Resolution: Fixed 1.4: 4eae418b410c928b8e4b7893c1f5b9c48a5e3228 > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.4.0, 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan reopened FLINK-8126: --- > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7642) Upgrade maven surefire plugin to 2.19.1
[ https://issues.apache.org/jira/browse/FLINK-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16267037#comment-16267037 ] Greg Hogan commented on FLINK-7642: --- Please stop adding and removing a blank line from the description every few days. Does 2.20.1 resolve SUREFIRE-1255? From the parent {{pom.xml}}: {noformat} 2.18.1 {noformat} > Upgrade maven surefire plugin to 2.19.1 > --- > > Key: FLINK-7642 > URL: https://issues.apache.org/jira/browse/FLINK-7642 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu > > Surefire 2.19 release introduced more useful test filters which would let us > run a subset of the test. > This issue is for upgrading maven surefire plugin to 2.19.1 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8036) Consider using gradle to build Flink
[ https://issues.apache.org/jira/browse/FLINK-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261371#comment-16261371 ] Greg Hogan commented on FLINK-8036: --- Are many developers running tests from the command line or using an IDE for incremental compilation and selectively running tests? > Consider using gradle to build Flink > > > Key: FLINK-8036 > URL: https://issues.apache.org/jira/browse/FLINK-8036 > Project: Flink > Issue Type: Improvement >Reporter: Ted Yu > > Here is summary from Lukasz over this thread > (http://search-hadoop.com/m/Beam/gfKHFVh4NM151XIu1?subj=Re+DISCUSS+Move+away+from+Apache+Maven+as+build+tool) > w.r.t. performance boost from using gradle: > Maven performs parallelization at the module level, an entire module needs > to complete before any dependent modules can start, this means running all > the checks like findbugs, checkstyle, tests need to finish. Gradle has task > level parallelism between subprojects which means that as soon as the > compile and shade steps are done for a project, and dependent subprojects > can typically start. This means that we get increased parallelism due to > not needing to wait for findbugs, checkstyle, tests to run. I typically see > ~20 tasks (at peak) running on my desktop in parallel. > Flink should consider using gradle - on Linux with SSD, a clean build takes > an hour. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8126) Update and fix checkstyle
[ https://issues.apache.org/jira/browse/FLINK-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-8126: -- Component/s: Build System > Update and fix checkstyle > - > > Key: FLINK-8126 > URL: https://issues.apache.org/jira/browse/FLINK-8126 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.5.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.5.0 > > > Our current checkstyle configuration (checkstyle version 6.19) is missing > some ImportOrder and variable naming errors which are detected in 1) IntelliJ > using the same checkstyle version and 2) with the maven-checkstyle-plugin > with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8126) Update and fix checkstyle
Greg Hogan created FLINK-8126: - Summary: Update and fix checkstyle Key: FLINK-8126 URL: https://issues.apache.org/jira/browse/FLINK-8126 Project: Flink Issue Type: Bug Affects Versions: 1.5.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Fix For: 1.5.0 Our current checkstyle configuration (checkstyle version 6.19) is missing some ImportOrder and variable naming errors which are detected in 1) IntelliJ using the same checkstyle version and 2) with the maven-checkstyle-plugin with an up-to-date checkstyle version (8.4). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7777) Bump japicmp to 0.11.0
[ https://issues.apache.org/jira/browse/FLINK-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255935#comment-16255935 ] Greg Hogan commented on FLINK-7777: --- Running japicmp v0.11.0 against Flink API v1.3.2 I get only the following error: {{Breaking the build because there is at least one incompatibility: org.apache.flink.core.fs.FileSystem.getKind():METHOD_REMOVED}}, which is already filed as a blocking bug in FLINK-8083. > Bump japicmp to 0.11.0 > -- > > Key: FLINK-7777 > URL: https://issues.apache.org/jira/browse/FLINK-7777 > Project: Flink > Issue Type: Bug > Components: Build System >Affects Versions: 1.3.2 >Reporter: Hai Zhou UTC+8 >Assignee: Hai Zhou UTC+8 >Priority: Minor > Fix For: 1.5.0 > > > Currently, Flink uses japicmp-maven-plugin version 0.7.0; I'm getting > these warnings from the maven plugin during a *mvn clean verify*: > {code:java} > [INFO] Written file '.../target/japicmp/japicmp.diff'. > [INFO] Written file '.../target/japicmp/japicmp.xml'. > [INFO] Written file '.../target/japicmp/japicmp.html'. > Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property > 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not > recognized. > Compiler warnings: > WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property > 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.' > Warning: org.apache.xerces.parsers.SAXParser: Feature > 'http://javax.xml.XMLConstants/feature/secure-processing' is not recognized. > Warning: org.apache.xerces.parsers.SAXParser: Property > 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized. > Warning: org.apache.xerces.parsers.SAXParser: Property > 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not > recognized. > {code} > japicmp fixed this in version 0.7.1: _Excluded xerces from maven-reporting > dependency in order to prevent warnings from SAXParserImpl. 
_ > The current stable version is 0.11.0, we can consider upgrading to this > version. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8041) Recommended Improvements for Gelly's Connected Components Algorithm
[ https://issues.apache.org/jira/browse/FLINK-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253784#comment-16253784 ] Greg Hogan commented on FLINK-8041: --- Currently the vertex value can be any implementation of {{Comparable}}, not just {{Long}}. I can see the benefit to assigning vertices a label as with {{DataSetUtils#zipWithUniqueId}}. There looks to be a cost or limit on scalability with the second proposal, but it would be great to start a discussion on a PR. > Recommended Improvements for Gelly's Connected Components Algorithm > --- > > Key: FLINK-8041 > URL: https://issues.apache.org/jira/browse/FLINK-8041 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.3.2 > Environment: Linux, IntelliJ IDEA >Reporter: Christos Hadjinikolis >Priority: Minor > Fix For: 1.4.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > At the moment, the ConnectedComponents algorithm that comes with Flink's > native Graph API (Gelly) has two issues: > 1. It relies on the user to provide correct values in the vertices > DataSet. Based on how the algorithm works, these values must be of type Long > and be unique for every vertex. If the user provides the same values for > every vertex (e.g. 1) the algorithm still works but as those values are used > for the identification of the different connected components, one will end up > with a single connected component and will have no clue as to why this > happened. This can be easily fixed in two ways: either by checking that the > values that appear alongside vertex-ids are unique and informing the user if > not, or by generating those values for every vertex before the algorithm is > run. I have a running implementation of the second way and I really think it > is an appropriate solution to this problem. > 2. 
Once the connected components are identified, one has to apply additional > transformations and actions to find out which is the biggest or the order of > the connected components in terms of their size. Alternatively, the algorithm > can be implemented so that numerical ids that are given to each component > reflect their ranking when ordered based on size, e.g. connected component 1 > will be the biggest, connected component 2 should be the second biggest and > so on. I have also solved this and I think it would make a nice addition. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
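The size-ranked relabeling proposed in item 2 can be sketched with plain Java collections, independent of Gelly; all names here are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ComponentRanking {

    // Relabels raw component ids so that 1 marks the largest component,
    // 2 the second largest, and so on. The input maps vertex id to the
    // raw component label produced by a connected-components run.
    public static Map<Long, Long> rankBySize(Map<Long, Long> vertexToComponent) {
        // Count how many vertices carry each raw label.
        Map<Long, Long> sizes = vertexToComponent.values().stream()
                .collect(Collectors.groupingBy(c -> c, Collectors.counting()));

        // Order labels by decreasing size and assign ranks 1, 2, ...
        List<Long> ordered = sizes.entrySet().stream()
                .sorted(Map.Entry.<Long, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
        Map<Long, Long> rank = new HashMap<>();
        for (int i = 0; i < ordered.size(); i++) {
            rank.put(ordered.get(i), (long) (i + 1));
        }

        // Rewrite each vertex's label to its component's ranked id.
        Map<Long, Long> result = new HashMap<>();
        vertexToComponent.forEach((v, c) -> result.put(v, rank.get(c)));
        return result;
    }
}
```

In a distributed setting the same plan is a group-by-label count, a sort on the counts, and a join back onto the vertex set, which is where the scalability cost mentioned in the comment comes from.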
[jira] [Commented] (FLINK-8048) RAT check complaint
[ https://issues.apache.org/jira/browse/FLINK-8048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253683#comment-16253683 ] Greg Hogan commented on FLINK-8048: --- [~tedyu] {{mvn rat:check}} is from a different plugin and {{mvn apache-rat:check}} matches Flink's usage. > RAT check complaint > --- > > Key: FLINK-8048 > URL: https://issues.apache.org/jira/browse/FLINK-8048 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu > > Running {{mvn rat:check}} gives warning about the following files: > test-infra/end-to-end-test/test-data/words > .editorconfig > .github/PULL_REQUEST_TEMPLATE.md > .github/CONTRIBUTING.md -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7919) Join with Solution Set fails with NPE if Solution Set has no entry
[ https://issues.apache.org/jira/browse/FLINK-7919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246487#comment-16246487 ] Greg Hogan commented on FLINK-7919: --- [~fhueske] would an alternative be to use inner join semantics (dropping the unmatched element) and potentially support outer joins in the future? > Join with Solution Set fails with NPE if Solution Set has no entry > -- > > Key: FLINK-7919 > URL: https://issues.apache.org/jira/browse/FLINK-7919 > Project: Flink > Issue Type: Bug > Components: DataSet API, Local Runtime >Affects Versions: 1.4.0, 1.3.2 >Reporter: Fabian Hueske > > A job with a delta iteration fails hard with an NPE in the solution set join, > if the solution set has no entry for the join key of the probe side. > The following program reproduces the problem: > {code} > DataSet<Tuple2<Long, Integer>> values = env.fromElements( > Tuple2.of(1L, 1), Tuple2.of(2L, 1), Tuple2.of(3L, 1)); > DeltaIteration<Tuple2<Long, Integer>, Tuple2<Long, Integer>> di = values > .iterateDelta(values, 5, 0); > DataSet<Tuple2<Long, Integer>> loop = di.getWorkset() > .map(new MapFunction<Tuple2<Long, Integer>, Tuple2<Long, Integer>>() { > @Override > public Tuple2<Long, Integer> map(Tuple2<Long, Integer> value) throws > Exception { > // modifying the key to join on a non existing solution set key > return Tuple2.of(value.f0 + 1, 1); > } > }) > .join(di.getSolutionSet()).where(0).equalTo(0) > .with(new JoinFunction<Tuple2<Long, Integer>, Tuple2<Long, Integer>, > Tuple2<Long, Integer>>() { > @Override > public Tuple2<Long, Integer> join( > Tuple2<Long, Integer> first, > Tuple2<Long, Integer> second) throws Exception { > > return Tuple2.of(first.f0, first.f1 + second.f1); > } > }); > DataSet<Tuple2<Long, Integer>> result = di.closeWith(loop, loop); > result.print(); > {code} > It doesn't matter whether the solution set is managed or not. > The problem occurs because the solution set hash table prober returns a > {{null}} value if the solution set does not contain a value for the probe > side key. > The join operator does not check if the return value is {{null}} or not but > immediately tries to create a copy using a {{TypeSerializer}}. This copy > fails with an NPE. 
> I propose to check for {{null}} and call the join function with {{null}} on > the solution set side. This gives OUTER JOIN semantics for the join. > Since the code was previously failing with an NPE, it is safe to forward the > {{null}} into the {{JoinFunction}}. > However, users must be aware that the solution set value may be {{null}} and > we need to update the documentation (JavaDocs + website) to describe the > behavior. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
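Under the proposed semantics, user join functions would have to guard the solution-set side themselves. A tiny sketch of such a guard, with illustrative types rather than the actual Flink interfaces:

```java
public class NullTolerantJoin {
    // Merges the probe-side count with the solution-set count. The
    // solution-set value may be null when the probe key has no matching
    // entry; under the proposed OUTER JOIN semantics that case must be
    // handled in user code instead of surfacing as an NPE in the runtime.
    public static long mergeCounts(long probeCount, Long solutionCount) {
        return probeCount + (solutionCount == null ? 0L : solutionCount);
    }
}
```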
[jira] [Commented] (FLINK-2973) Add flink-benchmark with compliant licenses again
[ https://issues.apache.org/jira/browse/FLINK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16246432#comment-16246432 ] Greg Hogan commented on FLINK-2973: --- [~fhueske] it looks to be GPLv2 (with classpath exception): http://hg.openjdk.java.net/jdk9/jdk9/file/a08cbfc0e4ec/LICENSE > Add flink-benchmark with compliant licenses again > - > > Key: FLINK-2973 > URL: https://issues.apache.org/jira/browse/FLINK-2973 > Project: Flink > Issue Type: Task > Components: Build System >Affects Versions: 1.0.0 >Reporter: Fabian Hueske >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.0.0 > > > We recently created the Maven module {{flink-benchmark}} for micro-benchmarks > and ported most of the existing micro-benchmarks to the Java benchmarking > framework JMH. However, JMH is part of OpenJDK and under GPL license which is > not compatible with the AL2. > Consequently, we need to remove this dependency and either revert the porting > commits or port the benchmarks to another benchmarking framework. An > alternative could be [Google's Caliper|https://github.com/google/caliper] > library which is under AL2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7679) Upgrade maven enforcer plugin to 3.0.0-M1
[ https://issues.apache.org/jira/browse/FLINK-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245711#comment-16245711 ] Greg Hogan commented on FLINK-7679: --- So it looks like Scala 2.12+ is required for Java 9. > Upgrade maven enforcer plugin to 3.0.0-M1 > - > > Key: FLINK-7679 > URL: https://issues.apache.org/jira/browse/FLINK-7679 > Project: Flink > Issue Type: Sub-task > Components: Build System >Reporter: Ted Yu >Assignee: Hai Zhou UTC+8 > > I got the following build error against Java 9: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (enforce-maven) > on project flink-parent: Execution enforce-maven of goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce: > java.lang.ExceptionInInitializerError: null > [ERROR] - > [ERROR] realm =plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.4.1 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/home/hbase/.m2/repository/org/apache/maven/plugins/maven-enforcer-plugin/1.4.1/maven-enforcer-plugin-1.4.1.jar > {code} > Upgrading maven enforcer plugin to 3.0.0-M1 would get over the above error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (FLINK-8033) Build Flink with JDK 9
[ https://issues.apache.org/jira/browse/FLINK-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244288#comment-16244288 ] Greg Hogan edited comment on FLINK-8033 at 11/8/17 4:54 PM: With Java 8 we had a flink-java8 package which added support for lambdas (etc.?). Are there similar new features in Java 9 or are we simply looking to support compiling and running Flink with Java 9? was (Author: greghogan): Does "support" imply more than compiling and running Flink with Java 9? > Build Flink with JDK 9 > -- > > Key: FLINK-8033 > URL: https://issues.apache.org/jira/browse/FLINK-8033 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 > Fix For: 1.5.0 > > > This is a JIRA to track all issues found to make Flink compatible with > Java 9. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7679) Upgrade maven enforcer plugin to 3.0.0-M1
[ https://issues.apache.org/jira/browse/FLINK-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244295#comment-16244295 ] Greg Hogan commented on FLINK-7679: --- [~yew1eb] do Flink tests pass with Java 8 and maven enforcer plugin 3.0.0-M1? > Upgrade maven enforcer plugin to 3.0.0-M1 > - > > Key: FLINK-7679 > URL: https://issues.apache.org/jira/browse/FLINK-7679 > Project: Flink > Issue Type: Sub-task > Components: Build System >Reporter: Ted Yu >Assignee: Hai Zhou UTC+8 > > I got the following build error against Java 9: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce (enforce-maven) > on project flink-parent: Execution enforce-maven of goal > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce failed: An API > incompatibility was encountered while executing > org.apache.maven.plugins:maven-enforcer-plugin:1.4.1:enforce: > java.lang.ExceptionInInitializerError: null > [ERROR] - > [ERROR] realm =plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.4.1 > [ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy > [ERROR] urls[0] = > file:/home/hbase/.m2/repository/org/apache/maven/plugins/maven-enforcer-plugin/1.4.1/maven-enforcer-plugin-1.4.1.jar > {code} > Upgrading maven enforcer plugin to 3.0.0-M1 would get over the above error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8033) Build Flink with JDK 9
[ https://issues.apache.org/jira/browse/FLINK-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244288#comment-16244288 ] Greg Hogan commented on FLINK-8033: --- Does "support" imply more than compiling and running Flink with Java 9? > Build Flink with JDK 9 > -- > > Key: FLINK-8033 > URL: https://issues.apache.org/jira/browse/FLINK-8033 > Project: Flink > Issue Type: Improvement > Components: Build System >Affects Versions: 1.4.0 >Reporter: Hai Zhou UTC+8 > Fix For: 1.5.0 > > > This is a JIRA to track all issues found to support Flink on Java 9 in > the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-8020) Deadlock found in Flink Streaming job
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16244126#comment-16244126 ] Greg Hogan commented on FLINK-8020: --- [~whjiang] can this be closed as "not a problem"? > Deadlock found in Flink Streaming job > - > > Key: FLINK-8020 > URL: https://issues.apache.org/jira/browse/FLINK-8020 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Streaming, Streaming Connectors >Affects Versions: 1.3.2 > Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode >Reporter: Weihua Jiang >Priority: Blocker > Attachments: jstack67976-2.log > > > Our streaming job run into trouble in these days after a long time smooth > running. One issue we found is > [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this > one. > After analyzing the jstack, we believe we found a DEAD LOCK in flink: > 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" > hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. > 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: > hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock > 0x0007b6aa1788. > This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7961) Docker-Flink with Docker Swarm doesn't work when machines are in different clouds
[ https://issues.apache.org/jira/browse/FLINK-7961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7961: -- Description: Task Managers can't find Job Manager by name. Maybe some additional Docker configuration is needed? I am running the standard setup and create-docker-swarm-service.sh script from the Docker Flink project: https://github.com/apache/flink/blob/master/flink-contrib/docker-flink/create-docker-swarm-service.sh This is the log from one of the Task Manager's containers: {noformat} Starting Task Manager config file: jobmanager.rpc.address: flink-jobmanager jobmanager.rpc.port: 6123 jobmanager.heap.mb: 1024 taskmanager.heap.mb: 1024 taskmanager.numberOfTaskSlots: 2 taskmanager.memory.preallocate: false parallelism.default: 1 jobmanager.web.port: 8081 blob.server.port: 6124 query.server.port: 6125 Starting taskmanager as a console application on host c42a6093f7bb. 2017-11-01 11:20:51,459 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... 
using builtin-java classes where applicable 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager (Version: 1.3.2, Rev:0399bee, Date:03.08.2017 @ 10:23:11 UTC) 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Current user: flink 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.141-b15 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum heap size: 1024 MiBytes 2017-11-01 11:20:51,522 INFO org.apache.flink.runtime.taskmanager.TaskManager - JAVA_HOME: /docker-java-home/jre 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Hadoop version: 2.7.2 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - JVM Options: 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:+UseG1GC 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xms1024M 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Xmx1024M 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -XX:MaxDirectMemorySize=8388607T 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml 2017-11-01 11:20:51,526 INFO org.apache.flink.runtime.taskmanager.TaskManager - Program Arguments: 2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - --configDir 2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - /opt/flink/conf 2017-11-01 11:20:51,527 INFO 
org.apache.flink.runtime.taskmanager.TaskManager - Classpath: /opt/flink/lib/flink-python_2.11-1.3.2.jar:/opt/flink/lib/flink-shaded-hadoop2-uber-1.3.2.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.7.jar:/opt/flink/lib/flink-dist_2.11-1.3.2.jar::: 2017-11-01 11:20:51,527 INFO org.apache.flink.runtime.taskmanager.TaskManager - 2017-11-01 11:20:51,528 INFO org.apache.flink.runtime.taskmanager.TaskManager - Registered UNIX signal handlers for [TERM, HUP, INT] 2017-11-01 11:20:51,532 INFO org.apache.flink.runtime.taskmanager.TaskManager - Maximum number of open file descriptors is 1048576 2017-11-01 11:20:51,548 INFO org.apache.flink.runtime.taskmanager.TaskManager - Loading configuration from /opt/flink/conf 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: jobmanager.rpc.address, flink-jobmanager 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: jobmanager.rpc.port, 6123 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: jobmanager.heap.mb, 1024 2017-11-01 11:20:51,551 INFO org.apache.flink.configuration.GlobalConfiguration- Loading configuration property: taskmanager.heap.mb, 1024 2017-
[jira] [Commented] (FLINK-7947) Let ParameterTool return a dedicated GlobalJobParameters object
[ https://issues.apache.org/jira/browse/FLINK-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236479#comment-16236479 ] Greg Hogan commented on FLINK-7947: --- Do these changes necessitate modifying the {{@Public}} API? > Let ParameterTool return a dedicated GlobalJobParameters object > --- > > Key: FLINK-7947 > URL: https://issues.apache.org/jira/browse/FLINK-7947 > Project: Flink > Issue Type: Improvement > Components: Client >Affects Versions: 1.4.0 >Reporter: Till Rohrmann >Assignee: Bowen Li >Priority: Major > > The {{ParameterTool}} directly implements the {{GlobalJobParameters}} > interface. Additionally it has grown over time to not only store the > configuration parameters but also to record which parameters have been > requested and what default value was set. This information is irrelevant on > the server side when setting a {{GlobalJobParameters}} object via > {{ExecutionConfig#setGlobalJobParameters}}. > Since we don't separate the {{ParameterTool}} logic and the actual data view, > users ran into problems when reusing the same {{ParameterTool}} to start > multiple jobs concurrently (see FLINK-7943). I think it would be a much > clearer separation of concerns if we would actually split the > {{GlobalJobParameters}} from the {{ParameterTool}}. > Furthermore, we should think about whether {{ParameterTool#get}} should have > side effects or not as it does right now. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7921) Flink downloads link redirect to spark downloads page
[ https://issues.apache.org/jira/browse/FLINK-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234714#comment-16234714 ] Greg Hogan commented on FLINK-7921: --- [~anil.kumar] are you still seeing this issue in your browser? > Flink downloads link redirect to spark downloads page > - > > Key: FLINK-7921 > URL: https://issues.apache.org/jira/browse/FLINK-7921 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.2 >Reporter: Anil Kumar >Priority: Major > > On the Quickstart > [page|https://ci.apache.org/projects/flink/flink-docs-release-1.3/quickstart/setup_quickstart.html] > of flink, there's a download page link under *Download and Unpack tab* which > is redirecting to spark downloads page instead of flink. > Issue : redirection is happening to port 80(http) instead of port 443(https). > Once the download link is changed to https, it works fine. > Current download link : http://flink.apache.org/downloads.html > Required link : https://flink.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-2973) Add flink-benchmark with compliant licenses again
[ https://issues.apache.org/jira/browse/FLINK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193521#comment-16193521 ] Greg Hogan commented on FLINK-2973: --- [~fhueske] [~rmetzger] can we not include this as an optional (or unlisted) module in the same manner as {{flink-connector-kinesis}}? Both are restricted by dependence on a Category X license (GPL and ASL, respectively). > Add flink-benchmark with compliant licenses again > - > > Key: FLINK-2973 > URL: https://issues.apache.org/jira/browse/FLINK-2973 > Project: Flink > Issue Type: Task > Components: Build System >Affects Versions: 1.0.0 >Reporter: Fabian Hueske >Assignee: Suneel Marthi >Priority: Minor > Fix For: 1.0.0 > > > We recently created the Maven module {{flink-benchmark}} for micro-benchmarks > and ported most of the existing micro-benchmarks to the Java benchmarking > framework JMH. However, JMH is part of OpenJDK and under GPL license which is > not compatible with the AL2. > Consequently, we need to remove this dependency and either revert the porting > commits or port the benchmarks to another benchmarking framework. An > alternative could be [Google's Caliper|https://github.com/google/caliper] > library which is under AL2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7687) Clarify the master and slaves files are not necessary unless using the cluster start/stop scripts
[ https://issues.apache.org/jira/browse/FLINK-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191504#comment-16191504 ] Greg Hogan commented on FLINK-7687: --- The quoted text is in the section "Standalone Cluster High Availability". > Clarify the master and slaves files are not necessary unless using the > cluster start/stop scripts > - > > Key: FLINK-7687 > URL: https://issues.apache.org/jira/browse/FLINK-7687 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.2 >Reporter: Elias Levy >Priority: Minor > > It would be helpful if the documentation was clearer on the fact that the > master/slaves config files are not needed when configured in > high-availability mode unless you are using the provided scripts to start and > shutdown the cluster over SSH. If you are using some other mechanism to > manage Flink instances (configuration management tools such as Chef or > Ansible, or container management frameworks like Docker Compose or > Kubernetes), these files are unnecessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7687) Clarify the master and slaves files are not necessary unless using the cluster start/stop scripts
[ https://issues.apache.org/jira/browse/FLINK-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189845#comment-16189845 ] Greg Hogan commented on FLINK-7687: --- [~elevy] is there a specific page or pages where this is unclear? > Clarify the master and slaves files are not necessary unless using the > cluster start/stop scripts > - > > Key: FLINK-7687 > URL: https://issues.apache.org/jira/browse/FLINK-7687 > Project: Flink > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.3.2 >Reporter: Elias Levy >Priority: Minor > > It would be helpful if the documentation was clearer on the fact that the > master/slaves config files are not needed when configured in > high-availability mode unless you are using the provided scripts to start and > shutdown the cluster over SSH. If you are using some other mechanism to > manage Flink instances (configuration management tools such as Chef or > Ansible, or container management frameworks like Docker Compose or > Kubernetes), these files are unnecessary. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7631) Ineffective check in PageRank#open()
[ https://issues.apache.org/jira/browse/FLINK-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16189835#comment-16189835 ] Greg Hogan commented on FLINK-7631: --- When `vertexCount` is zero the map function will not be called. > Ineffective check in PageRank#open() > > > Key: FLINK-7631 > URL: https://issues.apache.org/jira/browse/FLINK-7631 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > From > flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/linkanalysis/PageRank.java > : > {code} > this.vertexCount = vertexCountIterator.hasNext() ? > vertexCountIterator.next().getValue() : 0; > this.uniformlyDistributedScore = ((1 - dampingFactor) + dampingFactor * > sumOfSinks) / this.vertexCount; > {code} > The check for vertexCountIterator.hasNext() should enclose the assignments to > both this.vertexCount and this.uniformlyDistributedScore > Otherwise there may be divide by zero error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7631) Ineffective check in PageRank#open()
[ https://issues.apache.org/jira/browse/FLINK-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169958#comment-16169958 ] Greg Hogan commented on FLINK-7631: --- Divide by zero is only an error for fixed-point arithmetic and here the numerator is a double. > Ineffective check in PageRank#open() > > > Key: FLINK-7631 > URL: https://issues.apache.org/jira/browse/FLINK-7631 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > From > flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/linkanalysis/PageRank.java > : > {code} > this.vertexCount = vertexCountIterator.hasNext() ? > vertexCountIterator.next().getValue() : 0; > this.uniformlyDistributedScore = ((1 - dampingFactor) + dampingFactor * > sumOfSinks) / this.vertexCount; > {code} > The check for vertexCountIterator.hasNext() should enclose the assignments to > both this.vertexCount and this.uniformlyDistributedScore > Otherwise there may be divide by zero error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
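The behavior behind Greg's comment can be checked directly: with a `double` numerator, Java follows IEEE 754, so division by zero produces Infinity or NaN rather than throwing. A minimal demonstration (the helper below is illustrative, not the Flink code):

```java
public class DoubleDivisionDemo {
    // Mirrors the shape of the PageRank expression: a double numerator
    // divided by a (possibly zero) vertex count.
    static double uniformScore(double numerator, long vertexCount) {
        // vertexCount is promoted to double, so this is IEEE 754
        // floating-point division: it yields Infinity (or NaN for a
        // 0.0 numerator), never an ArithmeticException.
        return numerator / vertexCount;
    }

    public static void main(String[] args) {
        System.out.println(uniformScore(0.85, 0));  // Infinity
        System.out.println(uniformScore(0.0, 0));   // NaN
        // By contrast, integer division by zero throws:
        // int x = 1 / 0;  // ArithmeticException
    }
}
```

This is why the empty-graph case is benign here: the Infinity/NaN score is never used because the map function is not called when `vertexCount` is zero.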
[jira] [Closed] (FLINK-7402) Ineffective null check in NettyMessage#write()
[ https://issues.apache.org/jira/browse/FLINK-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7402. - Resolution: Implemented master: 312e0853483d1aa5ca182827890c570c44ca6a37 > Ineffective null check in NettyMessage#write() > -- > > Key: FLINK-7402 > URL: https://issues.apache.org/jira/browse/FLINK-7402 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ted Yu >Priority: Minor > Fix For: 1.4.0 > > > Here is the null check in finally block: > {code} > finally { > if (buffer != null) { > buffer.recycle(); > } > {code} > But buffer has been dereferenced in the try block without guard. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7402) Ineffective null check in NettyMessage#write()
[ https://issues.apache.org/jira/browse/FLINK-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7402: -- Fix Version/s: 1.4.0 > Ineffective null check in NettyMessage#write() > -- > > Key: FLINK-7402 > URL: https://issues.apache.org/jira/browse/FLINK-7402 > Project: Flink > Issue Type: Bug > Components: Network >Reporter: Ted Yu >Priority: Minor > Fix For: 1.4.0 > > > Here is the null check in finally block: > {code} > finally { > if (buffer != null) { > buffer.recycle(); > } > {code} > But buffer has been dereferenced in the try block without guard. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7273) Gelly tests with empty graphs
[ https://issues.apache.org/jira/browse/FLINK-7273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7273. - Resolution: Fixed master: 9437a0ffc04318f6a1a2d19c59f2ae6651b26507 > Gelly tests with empty graphs > - > > Key: FLINK-7273 > URL: https://issues.apache.org/jira/browse/FLINK-7273 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > There exist some tests with empty graphs but the `EmptyGraph` in > `AsmTestBase` contained vertices but no edges. Add a new `EmptyGraph` without > vertices and test both empty graphs for each algorithm. > `PageRank` should (optionally?) include zero-degree vertices in the results. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7199) Graph simplification does not set parallelism
[ https://issues.apache.org/jira/browse/FLINK-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7199: -- Fix Version/s: 1.4.0 > Graph simplification does not set parallelism > - > > Key: FLINK-7199 > URL: https://issues.apache.org/jira/browse/FLINK-7199 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.3.1, 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > The {{Simplify}} parameter should accept and set the parallelism when calling > the {{Simplify}} algorithms. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7199) Graph simplification does not set parallelism
[ https://issues.apache.org/jira/browse/FLINK-7199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7199. - Resolution: Fixed master: 2ac09c084320f1803c623c719dca8b4776c8110f > Graph simplification does not set parallelism > - > > Key: FLINK-7199 > URL: https://issues.apache.org/jira/browse/FLINK-7199 > Project: Flink > Issue Type: Bug > Components: Gelly >Affects Versions: 1.3.1, 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > The {{Simplify}} parameter should accept and set the parallelism when calling > the {{Simplify}} algorithms. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7447) Hope add more committer information to "Community & Project Info" page.
[ https://issues.apache.org/jira/browse/FLINK-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167976#comment-16167976 ] Greg Hogan commented on FLINK-7447: --- The timezone is of little benefit for real-time communication without knowing a developer's availability regarding schedule, holidays, work tasks, etc. If timezones imply an obligation or expectation that committers are reachable at certain times then we should continue to simply provide committer emails (@apache.org) and direct users and developers to the mailing lists. > Hope add more committer information to "Community & Project Info" page. > --- > > Key: FLINK-7447 > URL: https://issues.apache.org/jira/browse/FLINK-7447 > Project: Flink > Issue Type: Wish > Components: Project Website >Reporter: Hai Zhou > > I wish add the "organization" and "time zone" information to committer > introduction, while using the mail instead of Apache ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7482) StringWriter to support compression
[ https://issues.apache.org/jira/browse/FLINK-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7482. - Resolution: Not A Bug Hi [~felixcheung]. This is a great question for the dev or user [mailing lists|https://flink.apache.org/community.html#mailing-lists]. > StringWriter to support compression > --- > > Key: FLINK-7482 > URL: https://issues.apache.org/jira/browse/FLINK-7482 > Project: Flink > Issue Type: Bug > Components: filesystem-connector >Affects Versions: 1.3.2 >Reporter: Felix Cheung > > Is it possible to have StringWriter support compression like > AvroKeyValueSinkWriter or SequenceFileWriter? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7447) Hope add more committer information to "Community & Project Info" page.
[ https://issues.apache.org/jira/browse/FLINK-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127565#comment-16127565 ] Greg Hogan commented on FLINK-7447: --- I can see that listing organizations may be beneficial but this looks to run counter to the principle that [individuals compose the ASF|http://www.apache.org/foundation/how-it-works.html#hats]. data Artisans lists their [people|https://data-artisans.com/about] and other affiliations are often evident from the mailing list. I'm not seeing much benefit to listing time zones, especially given the volunteer nature of the project. > Hope add more committer information to "Community & Project Info" page. > --- > > Key: FLINK-7447 > URL: https://issues.apache.org/jira/browse/FLINK-7447 > Project: Flink > Issue Type: Wish > Components: Project Website >Reporter: Hai Zhou > > I wish add the "organization" and "time zone" information to committer > introduction, while using the mail instead of Apache ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7447) Hope add more committer information to "Community & Project Info" page.
[ https://issues.apache.org/jira/browse/FLINK-7447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7447: -- Component/s: (was: Documentation) Project Website > Hope add more committer information to "Community & Project Info" page. > --- > > Key: FLINK-7447 > URL: https://issues.apache.org/jira/browse/FLINK-7447 > Project: Flink > Issue Type: Wish > Components: Project Website >Reporter: Hai Zhou > > I wish add the "organization" and "time zone" information to committer > introduction, while using the mail instead of Apache ID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7296) Validate commit messages in git pre-receive hook
[ https://issues.apache.org/jira/browse/FLINK-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105217#comment-16105217 ] Greg Hogan commented on FLINK-7296: --- [~uce] I'm trying to think through where this should run. On the client side committers may not have a (pre-push?) hook configured on a repo so changes could slip through. GitHub doesn't allow server-side hooks (instead sending out webhook notifications). Apache Infra may allow a server-side hook but would require a ticket to update (and, if we ever switched to GitBox, it would require committers to continue pushing to the Apache-hosted repo). > Validate commit messages in git pre-receive hook > > > Key: FLINK-7296 > URL: https://issues.apache.org/jira/browse/FLINK-7296 > Project: Flink > Issue Type: Improvement >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Would like to investigate a pre-receive (server-side) hook analyzing the > commit message incoming revisions on the {{master}} branch for the standard > JIRA format ({{\[FLINK-\] \[component\] ...}} or {{\[hotfix\] > \[component\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7296) Validate commit messages in git pre-receive hook
[ https://issues.apache.org/jira/browse/FLINK-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7296: -- Description: Would like to investigate a pre-receive (server-side) hook analyzing the commit message incoming revisions on the {{master}} branch for the standard JIRA format ({{\[FLINK-\] \[component\] ...}} or {{\[hotfix\] \[component\] ...}}). (was: Would like to investigate a pre-receive (server-side) hook analyzing the commit message incoming revisions on the {{master}} branch for the standard JIRA format ({{\[FLINK-\] \[module\] ...}}).) > Validate commit messages in git pre-receive hook > > > Key: FLINK-7296 > URL: https://issues.apache.org/jira/browse/FLINK-7296 > Project: Flink > Issue Type: Improvement >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Would like to investigate a pre-receive (server-side) hook analyzing the > commit message incoming revisions on the {{master}} branch for the standard > JIRA format ({{\[FLINK-\] \[component\] ...}} or {{\[hotfix\] > \[component\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7296) Validate commit messages in git pre-receive hook
[ https://issues.apache.org/jira/browse/FLINK-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105161#comment-16105161 ] Greg Hogan commented on FLINK-7296: --- Yes, I will update the description. > Validate commit messages in git pre-receive hook > > > Key: FLINK-7296 > URL: https://issues.apache.org/jira/browse/FLINK-7296 > Project: Flink > Issue Type: Improvement >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > > Would like to investigate a pre-receive (server-side) hook analyzing the > commit message incoming revisions on the {{master}} branch for the standard > JIRA format ({{\[FLINK-\] \[module\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7296) Validate commit messages in git pre-receive hook
Greg Hogan created FLINK-7296: - Summary: Validate commit messages in git pre-receive hook Key: FLINK-7296 URL: https://issues.apache.org/jira/browse/FLINK-7296 Project: Flink Issue Type: Improvement Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Would like to investigate a pre-receive (server-side) hook analyzing the commit message incoming revisions on the {{master}} branch for the standard JIRA format ({{\[FLINK-\] \[module\] ...}}). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
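The format check such a hook would perform reduces to a regular expression. A hedged sketch of the validation logic in Java — the exact pattern and class name are assumptions for illustration, not the actual hook:

```java
import java.util.regex.Pattern;

public class CommitMessageCheck {
    // Assumed format, per the issue description: "[FLINK-<nnnn>] [component] ..."
    // or "[hotfix] [component] ...". The precise pattern the hook would
    // enforce is an assumption.
    static final Pattern VALID =
            Pattern.compile("^\\[(FLINK-\\d+|hotfix)\\] \\[[^\\]]+\\] .+");

    static boolean isValid(String message) {
        return VALID.matcher(message).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("[FLINK-7296] [build] Add pre-receive hook"));  // true
        System.out.println(isValid("[hotfix] [docs] Fix typo"));                   // true
        System.out.println(isValid("fixed stuff"));                                // false
    }
}
```

A server-side pre-receive hook would run this check over the first line of each incoming commit message on {{master}} and reject the push on a mismatch.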
[jira] [Commented] (FLINK-7220) Update RocksDB dependency to 5.5.5
[ https://issues.apache.org/jira/browse/FLINK-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104898#comment-16104898 ] Greg Hogan commented on FLINK-7220: --- Ah, just a small thing. Unfortunate that we can't fix this but good to know what happened. > Update RocksDB dependency to 5.5.5 > -- > > Key: FLINK-7220 > URL: https://issues.apache.org/jira/browse/FLINK-7220 > Project: Flink > Issue Type: Improvement > Components: State Backends, Checkpointing >Affects Versions: 1.4.0, 1.3.2 >Reporter: Stefan Richter >Assignee: Stefan Richter > Fix For: 1.4.0 > > > The latest release of RocksDB (5.5.5) fixes the issues from previous versions > (slow merge performance, segfaults) in connection with Flink and seems stable > for us to use. We can move away from our custom FRocksDB build, back to the > latest release. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7277) Weighted PageRank
Greg Hogan created FLINK-7277: - Summary: Weighted PageRank Key: FLINK-7277 URL: https://issues.apache.org/jira/browse/FLINK-7277 Project: Flink Issue Type: New Feature Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Add a weighted PageRank algorithm to complement the existing unweighted implementation. Edge values store a `double` weight value which is summed per vertex in place of the vertex degree. The vertex score is joined as the fraction of vertex weight rather than dividing by the vertex degree. The examples `Runner` must now read and generate weighted graphs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
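The scoring change described above — sending each neighbor the fraction of the vertex score given by edge weight over total out-weight, instead of score over out-degree — can be sketched with plain collections. This is an illustration of the rule only, not the Gelly DataSet implementation; all names are hypothetical:

```java
import java.util.*;

// One weighted-PageRank distribution step: each vertex sends
// score * (edgeWeight / totalOutWeight) along each out-edge, replacing the
// unweighted rule of score / outDegree. Plain-Java illustration.
public class WeightedDistribution {
    // edges: [source, target, weight] triples; scores: current score per vertex
    public static Map<String, Double> distribute(
            List<Object[]> edges, Map<String, Double> scores) {
        // Sum outgoing edge weights per source vertex (replaces the out-degree).
        Map<String, Double> outWeight = new HashMap<>();
        for (Object[] e : edges) {
            outWeight.merge((String) e[0], (Double) e[2], Double::sum);
        }
        // Each target receives the source's score scaled by its weight fraction.
        Map<String, Double> received = new HashMap<>();
        for (Object[] e : edges) {
            String src = (String) e[0], dst = (String) e[1];
            double fraction = (Double) e[2] / outWeight.get(src);
            received.merge(dst, scores.get(src) * fraction, Double::sum);
        }
        return received;
    }

    public static void main(String[] args) {
        List<Object[]> edges = Arrays.asList(
            new Object[]{"A", "B", 3.0}, new Object[]{"A", "C", 1.0});
        Map<String, Double> scores = new HashMap<>();
        scores.put("A", 1.0);
        System.out.println(distribute(edges, scores));
    }
}
```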
[jira] [Created] (FLINK-7276) Gelly algorithm parameters
Greg Hogan created FLINK-7276: - Summary: Gelly algorithm parameters Key: FLINK-7276 URL: https://issues.apache.org/jira/browse/FLINK-7276 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Similar to the examples drivers, the algorithm configuration fields should be typed to handle `canMergeConfiguration` and `mergeConfiguration` in `GraphAlgorithmWrappingBase` rather than overriding these methods in each algorithm (which has proven brittle). The existing `OptionalBoolean` is one example. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7275) Differentiate between normal and power-user cli options in Gelly examples
Greg Hogan created FLINK-7275: - Summary: Differentiate between normal and power-user cli options in Gelly examples Key: FLINK-7275 URL: https://issues.apache.org/jira/browse/FLINK-7275 Project: Flink Issue Type: Improvement Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan The current "hack" is to preface "power-user" options with a double underscore (e.g. '__parallelism'), which are then "hidden" by exclusion from the program usage documentation. Change this to instead be explicit in the {{Parameter}} API and provide a cli option to display "power-user" options. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-6909) Flink should support Lombok POJO
[ https://issues.apache.org/jira/browse/FLINK-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101779#comment-16101779 ] Greg Hogan commented on FLINK-6909: --- Hi [~mdkamaruzzaman], judging by the extensive tooling provided at the Lombok website it looks like `TypeExtractor` would need to be modified to parse and interpret the Lombok annotations. The cost to do so is added complexity in a vital and already very complex portion of the core Flink architecture. The benefit of automatic POJO generation is available in any IDE, and also when using Lombok via the provided [delombok tool|https://projectlombok.org/features/delombok], which converts annotated classes to plain POJOs and can be integrated into one's project build. > Flink should support Lombok POJO > > > Key: FLINK-6909 > URL: https://issues.apache.org/jira/browse/FLINK-6909 > Project: Flink > Issue Type: Wish > Components: Type Serialization System >Reporter: Md Kamaruzzaman >Priority: Minor > Fix For: 1.2.1 > > > Project Lombok helps greatly to reduce boilerplate Java code. > It seems that Flink does not accept a Lombok POJO as a valid POJO. > e.g. Here is a POJO defined with Lombok: > @Getter > @Setter > @NoArgsConstructor > public class SimplePojo > Using this POJO class to read from a CSV file throws this exception: > Exception in thread "main" java.lang.ClassCastException: > org.apache.flink.api.java.typeutils.GenericTypeInfo cannot be cast to > org.apache.flink.api.java.typeutils.PojoTypeInfo > It would be great if Flink supported Lombok POJOs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
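As the comment suggests, running delombok (or hand-writing the accessors) yields a class Flink's type extraction accepts as a POJO. A minimal sketch of what the delombok'd `SimplePojo` might look like — the field names here are assumptions, since the report omits them:

```java
// Roughly what the Lombok-annotated SimplePojo expands to after delombok:
// a public class with a public no-arg constructor and a getter/setter per
// field, the shape Flink's POJO rules require. Field names are assumed.
public class SimplePojo {
    private String name;
    private int count;

    public SimplePojo() {} // no-arg constructor required by Flink's POJO rules

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getCount() { return count; }
    public void setCount(int count) { this.count = count; }

    public static void main(String[] args) {
        SimplePojo p = new SimplePojo();
        p.setName("example");
        System.out.println(p.getName());
    }
}
```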
[jira] [Created] (FLINK-7273) Gelly tests with empty graphs
Greg Hogan created FLINK-7273: - Summary: Gelly tests with empty graphs Key: FLINK-7273 URL: https://issues.apache.org/jira/browse/FLINK-7273 Project: Flink Issue Type: Bug Components: Gelly Affects Versions: 1.4.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Minor Fix For: 1.4.0 There exist some tests with empty graphs, but the `EmptyGraph` in `AsmTestBase` contains vertices but no edges. Add a new `EmptyGraph` without vertices and test both empty graphs for each algorithm. `PageRank` should (optionally?) include zero-degree vertices in the results. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-7234) Fix CombineHint documentation
[ https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7234. - Resolution: Fixed master: 4a88f6587fdfadd5749188a76e6b38a3585cd31b release-1.3: d0a9fe013e0dccf7ff329aaedf085e8c1c133ae0 > Fix CombineHint documentation > - > > Key: FLINK-7234 > URL: https://issues.apache.org/jira/browse/FLINK-7234 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.2.2, 1.4.0, 1.3.2 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.4.0, 1.3.2 > > > The {{CombineHint}} > [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html] > applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be > noted for {{DataSet#distinct}}. It is also set with > {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (FLINK-6648) Transforms for Gelly examples
[ https://issues.apache.org/jira/browse/FLINK-6648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-6648. - Resolution: Implemented master: 8695a21098c06cd1cd727d24bca51db9971cb146 > Transforms for Gelly examples > - > > Key: FLINK-6648 > URL: https://issues.apache.org/jira/browse/FLINK-6648 > Project: Flink > Issue Type: Improvement > Components: Gelly >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.4.0 > > > A primary objective of the Gelly examples {{Runner}} is to make adding new > inputs and algorithms as simple and powerful as possible. A recent feature > made it possible to translate the key ID of generated graphs to alternative > numeric or string representations. For floating point and {{LongValue}} it is > desirable to translate the key ID of the algorithm results. > Currently a {{Runner}} job consists of an input, an algorithm, and an output. > A {{Transform}} will translate the input {{Graph}} and the algorithm output > {{DataSet}}. The {{Input}} and algorithm {{Driver}} will return an ordered > list of {{Transform}} which will be executed in that order (processed in > reverse order for algorithm output). The {{Transform}} can be configured as > are inputs and drivers. > Example transforms: > - the aforementioned translation of key ID types > - surrogate types (String -> Long or Int) for user data > - FLINK-4481 Maximum results for pairwise algorithms > - FLINK-3625 Graph algorithms to permute graph labels and edges -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-7234) Fix CombineHint documentation
[ https://issues.apache.org/jira/browse/FLINK-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7234: -- Fix Version/s: 1.3.2 1.4.0 > Fix CombineHint documentation > - > > Key: FLINK-7234 > URL: https://issues.apache.org/jira/browse/FLINK-7234 > Project: Flink > Issue Type: Bug > Components: Documentation >Affects Versions: 1.2.2, 1.4.0, 1.3.2 >Reporter: Greg Hogan >Assignee: Greg Hogan > Fix For: 1.4.0, 1.3.2 > > > The {{CombineHint}} > [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html] > applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be > noted for {{DataSet#distinct}}. It is also set with > {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-7118) Remove hadoop1.x code in HadoopUtils
[ https://issues.apache.org/jira/browse/FLINK-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097409#comment-16097409 ] Greg Hogan commented on FLINK-7118: --- I'd recommend a fix version of 1.4 for this PR since we are only applying bug fixes (and documentation updates) to 1.3. > Remove hadoop1.x code in HadoopUtils > > > Key: FLINK-7118 > URL: https://issues.apache.org/jira/browse/FLINK-7118 > Project: Flink > Issue Type: Improvement > Components: Java API >Reporter: mingleizhang >Assignee: mingleizhang > Fix For: 1.3.2 > > > Since Flink no longer supports Hadoop 1.x, we should remove this code. The > code below resides in {{org.apache.flink.api.java.hadoop.mapred.utils.HadoopUtils}} > > {code:java} > public static JobContext instantiateJobContext(Configuration configuration, > JobID jobId) throws Exception { > try { > Class clazz; > // for Hadoop 1.xx > if(JobContext.class.isInterface()) { > clazz = > Class.forName("org.apache.hadoop.mapreduce.task.JobContextImpl", true, > Thread.currentThread().getContextClassLoader()); > } > // for Hadoop 2.xx > else { > clazz = > Class.forName("org.apache.hadoop.mapreduce.JobContext", true, > Thread.currentThread().getContextClassLoader()); > } > Constructor constructor = > clazz.getConstructor(Configuration.class, JobID.class); > JobContext context = (JobContext) > constructor.newInstance(configuration, jobId); > > return context; > } catch(Exception e) { > throw new Exception("Could not create instance of > JobContext."); > } > } > {code} > And > {code:java} > public static TaskAttemptContext > instantiateTaskAttemptContext(Configuration configuration, TaskAttemptID > taskAttemptID) throws Exception { > try { > Class clazz; > // for Hadoop 1.xx > if(JobContext.class.isInterface()) { > clazz = > Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl"); > } > // for Hadoop 2.xx > else { > clazz = > Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext"); > }
Constructor constructor = > clazz.getConstructor(Configuration.class, TaskAttemptID.class); > TaskAttemptContext context = (TaskAttemptContext) > constructor.newInstance(configuration, taskAttemptID); > > return context; > } catch(Exception e) { > throw new Exception("Could not create instance of > TaskAttemptContext."); > } > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
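With the Hadoop 1.x branch removed, each method above reduces to loading a single class name and invoking its two-argument constructor reflectively. A generic sketch of that reflective pattern follows; since Hadoop is not assumed on the classpath here, the usage is demonstrated with a stdlib class, and the helper name is hypothetical:

```java
import java.lang.reflect.Constructor;

// Generic form of the reflective construction HadoopUtils performs: load a
// class by name with the context class loader and invoke a matching
// constructor. After dropping Hadoop 1.x, only one class name per method
// remains, so the if/else branches in the quoted code disappear.
public class ReflectiveFactory {
    public static Object instantiate(String className,
                                     Class<?>[] signature,
                                     Object[] args) throws Exception {
        Class<?> clazz = Class.forName(
            className, true, Thread.currentThread().getContextClassLoader());
        Constructor<?> constructor = clazz.getConstructor(signature);
        return constructor.newInstance(args);
    }

    public static void main(String[] args) throws Exception {
        // Demonstrated with StringBuilder(String) in place of a Hadoop class.
        Object sb = instantiate("java.lang.StringBuilder",
            new Class<?>[]{String.class}, new Object[]{"demo"});
        System.out.println(sb);
    }
}
```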
[jira] [Closed] (FLINK-7204) CombineHint.NONE
[ https://issues.apache.org/jira/browse/FLINK-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan closed FLINK-7204. - Resolution: Implemented master: ce10e57bc163babd59005fa250c26e7604f23cf5 > CombineHint.NONE > > > Key: FLINK-7204 > URL: https://issues.apache.org/jira/browse/FLINK-7204 > Project: Flink > Issue Type: New Feature > Components: Core >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > FLINK-3477 added a hash-combine preceding the reducer configured with > {{CombineHint.HASH}} or {{CombineHint.SORT}} (default). In some cases it may > be useful to disable the combiner in {{ReduceNode}} by specifying a new > {{CombineHint.NONE}} value. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
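The API shape described — `setCombineHint` chained on the operator returned by `reduce`, with a `NONE` value disabling the combiner — can be illustrated with minimal stand-in types. These classes only mirror the shape of Flink's DataSet API for illustration; they are not Flink code:

```java
import java.util.function.BinaryOperator;

// Hypothetical stand-ins mirroring the shape of Flink's API: setCombineHint
// is chained on the operator returned by reduce(), and NONE disables the
// combine phase entirely. Not Flink classes.
public class CombineHintDemo {
    enum CombineHint { SORT, HASH, NONE }

    static class ReduceOperator<T> {
        private CombineHint hint = CombineHint.SORT; // SORT is Flink's default
        private final BinaryOperator<T> reducer;

        ReduceOperator(BinaryOperator<T> reducer) { this.reducer = reducer; }

        ReduceOperator<T> setCombineHint(CombineHint hint) {
            this.hint = hint;
            return this; // chainable, as in Flink
        }

        CombineHint getCombineHint() { return hint; }
    }

    static <T> ReduceOperator<T> reduce(BinaryOperator<T> fn) {
        return new ReduceOperator<>(fn);
    }

    public static void main(String[] args) {
        ReduceOperator<Integer> op =
            CombineHintDemo.<Integer>reduce(Integer::sum)
                .setCombineHint(CombineHint.NONE);
        System.out.println(op.getCombineHint());
    }
}
```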
[jira] [Updated] (FLINK-7204) CombineHint.NONE
[ https://issues.apache.org/jira/browse/FLINK-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Hogan updated FLINK-7204: -- Fix Version/s: 1.4.0 > CombineHint.NONE > > > Key: FLINK-7204 > URL: https://issues.apache.org/jira/browse/FLINK-7204 > Project: Flink > Issue Type: New Feature > Components: Core >Affects Versions: 1.4.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Minor > Fix For: 1.4.0 > > > FLINK-3477 added a hash-combine preceding the reducer configured with > {{CombineHint.HASH}} or {{CombineHint.SORT}} (default). In some cases it may > be useful to disable the combiner in {{ReduceNode}} by specifying a new > {{CombineHint.NONE}} value. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7234) Fix CombineHint documentation
Greg Hogan created FLINK-7234: - Summary: Fix CombineHint documentation Key: FLINK-7234 URL: https://issues.apache.org/jira/browse/FLINK-7234 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.2.2, 1.4.0, 1.3.2 Reporter: Greg Hogan Assignee: Greg Hogan The {{CombineHint}} [documentation|https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/batch/index.html] applies to {{DataSet#reduce}}, not {{DataSet#reduceGroup}}, and should also be noted for {{DataSet#distinct}}. It is also set with {{.setCombineHint(CombineHint)}} rather than alongside the UDF parameter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-7211) Exclude Gelly javadoc jar from release
Greg Hogan created FLINK-7211: - Summary: Exclude Gelly javadoc jar from release Key: FLINK-7211 URL: https://issues.apache.org/jira/browse/FLINK-7211 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.4.0, 1.3.2 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.4.14#64029)