[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-12 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540610#comment-14540610
 ] 

Stephan Ewen commented on FLINK-1959:
-

Great debugging, this is perfect!

I will prepare a fix based on that...

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: master
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: master
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-12 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540146#comment-14540146
 ] 

Alexander Alexandrov commented on FLINK-1959:
-

After some more debugging with [~elbehery] we managed to identify the issue.

Adding {{partitionByHash}} into the pipeline breaks the local chain two and 
creates two tasks. The second task (the receiver of the partition) starts with 
a RegularPactTask which runs a NoOpDriver and does not have a UDF stub. Because 
of that, the test at [line 
511|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java#L511]
 and the accumulators are never called. 

@[~StephanEwen]: Unless you see a problem with that solution, I suggest to 
remove the {{stub != null}} check at [line 
511|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java#L511]
 and report accumulators always.

Regards,
Alexander


> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: master
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: master
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-12 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539987#comment-14539987
 ] 

mustafa elbehery commented on FLINK-1959:
-

I have tried debugging to check the problem .. What I can see is that 
re-partitioning is working find and the data is re-distributed as expected. 
Moreover, the accumulators from each sub-task if created and filled with the 
required information. However, the merge method is never called in case of 
Re-Partitioning !!! 

I could see in debug mode that the accumulator in RunTime is empty.

Any suggestion to solve the problem ?!!

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: master
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: master
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538540#comment-14538540
 ] 

Stephan Ewen commented on FLINK-1959:
-

I will try and look into this very soon...

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: master
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: master
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537845#comment-14537845
 ] 

Alexander Alexandrov commented on FLINK-1959:
-

I think that some communication / traversal chain gets broken in the 
ParitionByHash node.

You can either 

(1) try to dig through the code and see where this happens, or
(2) use an alternative to the accumulator until the issue is resolved (e.g. 
write the information to a pre-defined HDFS path); 

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.8.1
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: 0.8.1
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-10 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537312#comment-14537312
 ] 

mustafa elbehery commented on FLINK-1959:
-

Any updates for this bug ?!!This is kind of critical to me 

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.8.1
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: 0.8.1
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-04-30 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521195#comment-14521195
 ] 

mustafa elbehery commented on FLINK-1959:
-

I just tried with DOP 1,2 and 5 .. All return NULL 

"Number of detected empty fields per column: null"

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.8.1
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: 0.8.1
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-04-29 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519542#comment-14519542
 ] 

Stephan Ewen commented on FLINK-1959:
-

Can you try if this only happens with a parallelism greater than one, or also 
with parallelism one?

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.8.1
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: 0.8.1
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-04-29 Thread mustafa elbehery (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519494#comment-14519494
 ] 

mustafa elbehery commented on FLINK-1959:
-

BTW, the version of Flink is 0.9 but it was not in the drop-down list.

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.8.1
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: 0.8.1
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with "PartitionByHash" function before 
> applying "Filter", and the resulted accumulator was NULL. 
> By Debugging, I could see the accumulator in the RunTime Map. However, by 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)