[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-04-03 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394967#comment-14394967 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/525


 Filtered Semantic Properties for Operators with Iterators
 -

 Key: FLINK-1656
 URL: https://issues.apache.org/jira/browse/FLINK-1656
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Critical

 The documentation of ForwardedFields is incomplete for operators with 
 iterator inputs (GroupReduce, CoGroup). 
 This should be fixed ASAP, because it can lead to incorrect program execution.
 The conditions for forwarded fields on operators with iterator input are:
 1) forwarded fields must be emitted in the order in which they are received 
 through the iterator
 2) all forwarded fields of a record must stick together, i.e., if your 
 function builds a record from field 0 of the 1st, 3rd, 5th, ... and field 1 of 
 the 2nd, 4th, ... record coming through the iterator, these are not valid 
 forwarded fields.
 3) it is OK to completely filter out records coming through the iterator.
 The reason for these conditions is that the optimizer uses forwarded fields 
 to reason about physical data properties such as order and grouping. Mixing 
 up the order of records or emitting records that are composed from different 
 input records might destroy a (secondary) order or grouping.
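The three conditions above can be checked mechanically. The following Java sketch is a hypothetical model, not Flink code: each emitted record is described by the 0-based iterator positions of the input records its forwarded fields were copied from. It assumes that emitting several records from the same input record does not violate condition 1, so positions only have to be non-decreasing.

```java
import java.util.List;

/**
 * Hypothetical model, not Flink code. Each emitted record is described by the
 * 0-based positions (in iterator order) of the input records its forwarded
 * fields were copied from, one entry per forwarded field.
 */
final class IteratorForwardingCheck {

    private IteratorForwardingCheck() {}

    /** Condition 2: all forwarded fields of one output record come from a single input record. */
    static boolean fromSingleInput(int[] sourcePositions) {
        for (int pos : sourcePositions) {
            if (pos != sourcePositions[0]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Conditions 1 to 3: every output record satisfies condition 2, and the
     * source positions never move backwards across emitted records
     * (condition 1). Input positions that never appear were filtered out,
     * which condition 3 allows.
     */
    static boolean satisfiesConditions(List<int[]> emitted) {
        int lastSource = -1;
        for (int[] sources : emitted) {
            if (sources.length == 0) {
                continue; // record forwards nothing, so there is nothing to check
            }
            if (!fromSingleInput(sources)) {
                return false;
            }
            if (sources[0] < lastSource) {
                return false; // emitted out of iterator order
            }
            lastSource = sources[0];
        }
        return true;
    }
}
```

The invalid example from the description (field 0 from the 1st record, field 1 from the 2nd, and so on) corresponds to source arrays such as {0, 1} and {2, 3}, which fail condition 2.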



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-04-02 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393167#comment-14393167 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/525#issuecomment-89009870
  
Will merge this in about 24h unless somebody raises a flag.




[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-26 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381564#comment-14381564 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/525#issuecomment-86400852
  
I updated the PR as discussed:
- GlobalProperties are filtered with the user-specified semantic properties.
- LocalProperties are filtered with forwarded field information for key fields 
only (sorting/grouping can only be preserved for key fields and is completely 
destroyed for key-less operators such as AllReduce or MapPartition).

Documentation is adapted accordingly.
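The split between global and local properties can be sketched as two filters over the declared forwarded field indices. This is a hypothetical Java model (class and method names are not taken from the Flink optimizer): global properties such as partitionings keep all user-declared forwarded fields, while local properties such as sorting and grouping keep only forwarded key fields, which leaves nothing for key-less operators.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; names do not correspond to Flink optimizer classes. */
final class PropertyFieldFilter {

    private PropertyFieldFilter() {}

    /** Global properties (partitioning): all user-declared forwarded fields remain usable. */
    static Set<Integer> forGlobalProperties(Set<Integer> declaredForwarded) {
        return new TreeSet<>(declaredForwarded);
    }

    /**
     * Local properties (sorting/grouping): only declared forwarded fields that
     * are also key fields remain usable. An empty key set, as for AllReduce or
     * MapPartition, yields an empty result.
     */
    static Set<Integer> forLocalProperties(Set<Integer> declaredForwarded, Set<Integer> keyFields) {
        Set<Integer> result = new TreeSet<>(declaredForwarded);
        result.retainAll(keyFields);
        return result;
    }
}
```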




[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377846#comment-14377846 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/525#issuecomment-85499509
  
Yes, I agree. The fix is more conservative than necessary.
I think we can safely relax it if we make the rule for forwarded fields on 
group-wise operators as follows:

*All forwarded fields of an emitted record must be forwarded from the same 
input record. Each output record can have forwarded values from a different 
input record.*

This preserves partitionings (also on composite keys) but voids all 
grouping/sorting on non-key fields, because the order of the emitted 
records is not specified.
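To see why this rule preserves partitionings on composite keys, consider a hypothetical Java sketch (the class, its methods, and the hash partitioner are assumptions for illustration, not Flink's implementation). Records are pairs (a, b) that were hash-partitioned on the composite key (a, b). Copying both key fields from one input record keeps the output in its partition; combining a from one record with b from another may not.

```java
import java.util.Objects;

/**
 * Hypothetical illustration, not Flink's partitioner: records are int pairs
 * (a, b) that were hash-partitioned on the composite key (a, b).
 */
final class CompositeKeyForwarding {

    private CompositeKeyForwarding() {}

    /** Assumed partitioner: hash of the composite key modulo the parallelism. */
    static int partitionOf(int[] record, int parallelism) {
        return Math.floorMod(Objects.hash(record[0], record[1]), parallelism);
    }

    /** Valid under the relaxed rule: both key fields are copied from one input record. */
    static int[] forwardFromOne(int[] in) {
        return new int[] {in[0], in[1]};
    }

    /** Violates the rule: the key fields of one output come from two different inputs. */
    static int[] forwardFromTwo(int[] first, int[] second) {
        return new int[] {first[0], second[1]};
    }
}
```

With a parallelism of 4, the records (0, 0) and (1, 1) both hash to partition 1 under `Objects.hash`, while the mixed record (0, 1) hashes to partition 2, so a downstream operator could no longer rely on the partitioning.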




[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377574#comment-14377574 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/525#issuecomment-85422608
  
I think it is good to fix this.

I was wondering whether we can solve this more specifically for global 
and local properties. Cancelling out the properties of the non-key 
fields was originally motivated by the fact that the group operation destroys 
orders/groupings, which are actually local properties.

Is there a way we can preserve the global properties still? 

What would happen if we moved the new code that cleans the semantic 
properties from the API operators to the optimizer's operator descriptor? There 
we could filter the local properties and global properties independently.

The main benefit is probably to preserve the global properties for 
`mapPartition()`, which is desirable.





[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377959#comment-14377959 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/525#issuecomment-85528267
  
Good catch, that is a critical distinction (all forwarded fields of one output 
record come from the same input record). Mostly relevant to MapPartition, though.




[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-24 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377965#comment-14377965 ]

ASF GitHub Bot commented on FLINK-1656:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/525#issuecomment-85531186
  
Do you think this rule is easy enough for users? 
It would make the handling consistent for all group-wise operators.




[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-23 Thread Fabian Hueske (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375665#comment-14375665 ]

Fabian Hueske commented on FLINK-1656:
--

I am going to address this issue on two levels:

1. The optimizer will filter out all forwarded field information on non-key 
fields for operators with iterators:
 - non-key fields of GroupReduce
 - non-key fields of CoGroup
 - all fields of MapPartition

2. The APIs will log a warning if a user adds forward field information for 
non-key fields of GroupReduce and CoGroup and throw an exception if a user adds 
forward field information for MapPartition and Filter.
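The API-level checks in point 2 could look roughly like the following Java sketch. It is hypothetical: the enum, class, and message texts are assumptions, and `java.util.logging` stands in for whatever logger Flink actually uses. Note that the pull request further down ends up restricting forwarded fields in the optimizer, discarding them entirely for MapPartition, rather than throwing an exception.

```java
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical operator categories for this sketch; not Flink class names. */
enum OperatorKind { GROUP_REDUCE, CO_GROUP, MAP_PARTITION, FILTER }

/** Sketch of the planned API-level validation of user-declared forwarded fields. */
final class ForwardedFieldValidator {

    private static final Logger LOG =
            Logger.getLogger(ForwardedFieldValidator.class.getName());

    private ForwardedFieldValidator() {}

    /**
     * Throws for MapPartition and Filter if any forwarded field is declared;
     * otherwise logs one warning per declared non-key field and returns the
     * number of warnings.
     */
    static int validate(OperatorKind kind, Set<Integer> forwarded, Set<Integer> keys) {
        if (kind == OperatorKind.MAP_PARTITION || kind == OperatorKind.FILTER) {
            if (!forwarded.isEmpty()) {
                throw new IllegalArgumentException(
                        "Forwarded fields are not supported for " + kind);
            }
            return 0;
        }
        int warnings = 0;
        for (int field : forwarded) {
            if (!keys.contains(field)) {
                LOG.warning("Forwarded field " + field + " of " + kind
                        + " is not a key field and will be ignored by the optimizer");
                warnings++;
            }
        }
        return warnings;
    }
}
```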




[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators

2015-03-23 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376777#comment-14376777 ]

ASF GitHub Bot commented on FLINK-1656:
---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/525

[FLINK-1656] Filter ForwardedField properties for group-at-a-time operators 
in Optimizer

Restricts forward field information for group-wise operators.

- For `GroupReduce`, `GroupCombine`, and `CoGroup` operators, forwarded field 
information is restricted to grouping key fields.
- For `MapPartition` operators, all forwarded field information is discarded.

Extended documentation accordingly.
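The restriction described in this pull request can be modeled as a per-input filter. This Java sketch is hypothetical (the enum and class names are not Flink's): GroupReduce, GroupCombine, and CoGroup keep only declared forwarded fields that are grouping keys, and MapPartition keeps none. For CoGroup, the filter would be applied once per input with that input's own grouping keys.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical operator categories for this sketch; not Flink class names. */
enum GroupWiseOp { GROUP_REDUCE, GROUP_COMBINE, CO_GROUP, MAP_PARTITION }

/** Sketch of the restriction in the pull request, applied to one operator input. */
final class GroupWiseForwardFilter {

    private GroupWiseForwardFilter() {}

    /**
     * Filters the forwarded fields declared for one input of a group-wise
     * operator. For CoGroup, call once per input with that input's keys.
     */
    static Set<Integer> filter(GroupWiseOp op, Set<Integer> declared, Set<Integer> groupingKeys) {
        if (op == GroupWiseOp.MAP_PARTITION) {
            // MapPartition has no grouping key, so all forwarded field information is dropped.
            return Collections.emptySet();
        }
        Set<Integer> kept = new TreeSet<>(declared);
        kept.retainAll(groupingKeys); // keep only grouping key fields
        return kept;
    }
}
```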

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhueske/flink groupKeySemProps

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/525.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #525


commit a49967db7ecea782860d7a0fa430cb2281653b03
Author: Fabian Hueske fhue...@apache.org
Date:   2015-03-23T10:55:34Z

[FLINK-1656] Filter ForwardedField properties for group-at-a-time operators 
in Optimizer.



