[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394967#comment-14394967 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/525

Filtered Semantic Properties for Operators with Iterators
---
Key: FLINK-1656
URL: https://issues.apache.org/jira/browse/FLINK-1656
Project: Flink
Issue Type: Bug
Components: Documentation
Affects Versions: 0.9
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Critical

The documentation of ForwardedFields is incomplete for operators with iterator inputs (GroupReduce, CoGroup). This should be fixed ASAP, because it can lead to incorrect program execution.

The conditions for forwarded fields on operators with iterator input are:
1) Forwarded fields must be emitted in the order in which they are received through the iterator.
2) All forwarded fields of a record must stick together, i.e., if your function builds a record from field 0 of the 1st, 3rd, 5th, ... and field 1 of the 2nd, 4th, ... record coming through the iterator, these are not valid forwarded fields.
3) It is OK to completely filter out records coming through the iterator.

The reason for these conditions is that the optimizer uses forwarded fields to reason about physical data properties such as order and grouping. Mixing up the order of records, or emitting records that are composed from different input records, might destroy a (secondary) order or grouping.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
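The three conditions in the issue description can be illustrated with a small check on sample data. This is only a sketch in plain Python, not Flink code: the function name `respects_iterator_conditions` and the representation of records as tuples are assumptions made for illustration, and it further assumes that forwarded fields keep the same position in input and output records. Because it compares values, it can only reject obvious violations; duplicate values in the input make it more permissive than the real rule.

```python
def respects_iterator_conditions(inputs, outputs, forwarded):
    """Check sample data against conditions 1-3 for the given field positions.

    inputs:    records in the order the iterator delivers them (tuples)
    outputs:   records in the order the function emits them (tuples)
    forwarded: field positions declared as forwarded (hypothetical model:
               same position in input and output)
    """
    pos = 0  # earliest input record that may still be matched
    for out in outputs:
        # Skip inputs that were filtered out (condition 3) until one agrees
        # with this output on every forwarded field (condition 2).
        while pos < len(inputs) and any(inputs[pos][f] != out[f] for f in forwarded):
            pos += 1
        if pos == len(inputs):
            # No remaining input matches: fields were mixed across records,
            # or records were emitted out of order (condition 1).
            return False
        # pos is not advanced here; the sketch assumes one input record
        # may be forwarded into several consecutive output records.
    return True
```

For a group `[(1, "a"), (1, "b"), (1, "c")]` with both fields declared as forwarded, emitting only the first and third record passes the check, while emitting them in reverse order, or emitting `(1, "b")` from the inputs `(1, "a")` and `(2, "b")`, does not.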
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393167#comment-14393167 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/525#issuecomment-89009870

Will merge this in about 24h unless somebody raises a flag.
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381564#comment-14381564 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/525#issuecomment-86400852

I updated the PR as discussed:
- GlobalProperties are filtered with the user-specified semantic properties.
- LocalProperties are filtered with forward field info for key fields only (sorting/grouping can only be preserved for key fields and is completely destroyed for key-less operators such as AllReduce or MapPartition).

Documentation is adapted accordingly.
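The split between global and local properties described in this comment might look roughly like the following. This is a simplified model in plain Python and not the optimizer's actual code: `filter_global`, `filter_local`, and the dictionary mapping source field to target field are hypothetical names, and keeping only the surviving prefix of an ordering is an assumption of the sketch.

```python
def filter_global(partition_fields, forwarded):
    """Partitioning survives only if every partitioning field is forwarded.

    forwarded maps a source field position to its target field position.
    Returns the partitioning fields after the operator, or None if lost.
    """
    if all(f in forwarded for f in partition_fields):
        return [forwarded[f] for f in partition_fields]
    return None


def filter_local(order_fields, forwarded, key_fields):
    """An ordering survives only on forwarded key fields.

    Only the leading fields of the ordering that are forwarded key fields
    are kept (assumption: a gap invalidates the rest of the ordering).
    For key-less operators, pass an empty key_fields set.
    """
    key_forwards = {s: t for s, t in forwarded.items() if s in key_fields}
    kept = []
    for f in order_fields:
        if f not in key_forwards:
            break
        kept.append(key_forwards[f])
    return kept
```

With `forwarded = {0: 0, 1: 2}`, a partitioning on fields 0 and 1 is kept as a partitioning on fields 0 and 2, while an ordering on the same fields is reduced to field 0 if only field 0 is a key, and is dropped entirely for a key-less operator.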
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377846#comment-14377846 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/525#issuecomment-85499509

Yes, I agree. The fix is more conservative than necessary. I think we can safely relax it if we define the rule for forwarded fields on group-wise operators as follows:

*All forwarded fields of an emitted record must be forwarded from the same input record. Each output record can have forwarded values from a different input record.*

This preserves partitionings (also on composite keys) but voids all grouping / sorting on non-key fields, because the order of the emitted records is not specified.
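The relaxed rule proposed in the comment above can be sketched as a check on sample data. Again this is plain Python for illustration only: the function names and the tuple representation are assumptions, and forwarded fields are assumed to keep their position. Unlike a check that also enforces emission order, the order of the output records is deliberately ignored here.

```python
def forwarded_from_single_input(inputs, output, forwarded):
    """True if some single input record agrees with `output` on all forwarded fields."""
    return any(all(rec[f] == output[f] for f in forwarded) for rec in inputs)


def satisfies_relaxed_rule(inputs, outputs, forwarded):
    """Every output record must take all its forwarded fields from one input record.

    Different output records may draw from different input records, and the
    emission order is not checked.
    """
    return all(forwarded_from_single_input(inputs, out, forwarded) for out in outputs)
```

Under this rule, emitting the records of a group in reverse order is acceptable, whereas an output that combines field 0 of one input record with field 1 of another is still rejected.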
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377574#comment-14377574 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/525#issuecomment-85422608

I think it is good to fix this.

I was wondering whether we can solve this in a way that is more specific to global and local properties. Cancelling out the properties from the non-key fields was originally motivated by the fact that the group operation destroys orders/groupings, which are actually local properties. Is there a way we can still preserve the global properties?

What would happen if we moved the new code that cleans the semantic properties from the API operators to the optimizer's operator descriptor? There we can filter the local properties and global properties independently.

The main benefit is probably to preserve the global properties for `mapPartition()`, which is desirable.
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377959#comment-14377959 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/525#issuecomment-85528267

Good catch, that is a critical distinction (one output record forwarded from the same input record). Mostly relevant to MapPartition, though.
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377965#comment-14377965 ]

ASF GitHub Bot commented on FLINK-1656:
---
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/525#issuecomment-85531186

Do you think this rule is easy enough for users? It would make the handling consistent for all group-wise operators.
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375665#comment-14375665 ]

Fabian Hueske commented on FLINK-1656:
---
I am going to address this issue on two levels:

1. The optimizer will filter out all forward field information on non-key fields for operators with iterators.
   - non-key fields of GroupReduce
   - non-key fields of CoGroup
   - all fields of MapPartition
2. The APIs will log a warning if a user adds forward field information for non-key fields of GroupReduce and CoGroup and throw an exception if a user adds forward field information for MapPartition and Filter.
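The API-level behaviour planned in point 2 of this comment could be modelled as below. This is a hypothetical sketch in plain Python, not the Flink API: `validate_forwarded_fields`, the operator names as strings, and the use of `ValueError` are all assumptions made for illustration, and the thread does not say whether the final change kept this warning/exception behaviour.

```python
import logging

logger = logging.getLogger("forwarded_fields_sketch")


def validate_forwarded_fields(operator, forwarded, key_fields):
    """Model of the planned API check.

    Raises ValueError if forwarded fields are declared for MapPartition or
    Filter. For GroupReduce and CoGroup, logs a warning for forwarded fields
    that are not key fields and returns them as a sorted list.
    Returns an empty list when there is nothing to report.
    """
    if operator in ("MapPartition", "Filter"):
        if forwarded:
            raise ValueError(
                "forwarded fields are not supported for %s" % operator)
        return []
    if operator in ("GroupReduce", "CoGroup"):
        non_key = sorted(set(forwarded) - set(key_fields))
        if non_key:
            logger.warning(
                "forward field information on non-key fields %s of %s is ignored",
                non_key, operator)
        return non_key
    return []
```

Calling the function for a GroupReduce keyed on field 0 with fields 0 and 1 declared as forwarded would warn about field 1 only, while any forwarded field on a MapPartition raises immediately.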
[jira] [Commented] (FLINK-1656) Filtered Semantic Properties for Operators with Iterators
[ https://issues.apache.org/jira/browse/FLINK-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376777#comment-14376777 ]

ASF GitHub Bot commented on FLINK-1656:
---
GitHub user fhueske opened a pull request:

    https://github.com/apache/flink/pull/525

    [FLINK-1656] Filter ForwardedField properties for group-at-a-time operators in Optimizer

    Restricts forward field information for group-wise operators.
    - For `GroupReduce`, `GroupCombine`, and `CoGroup` operators, forward field information is restricted to grouping key fields.
    - For `MapPartition` operators, all forwarded field information is discarded.

    Extended documentation accordingly.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink groupKeySemProps

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/525.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #525

commit a49967db7ecea782860d7a0fa430cb2281653b03
Author: Fabian Hueske fhue...@apache.org
Date: 2015-03-23T10:55:34Z

    [FLINK-1656] Filter ForwardedField properties for group-at-a-time operators in Optimizer.
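The restriction described in this pull request can be summarised in a few lines. The sketch below is plain Python rather than the optimizer code touched by the PR; `restrict_forwarded_fields` and the operator names as strings are assumptions, and forwarded fields are modelled simply as a set of field positions.

```python
# Operators for which forward field information is limited to grouping keys.
KEYED_GROUP_OPERATORS = {"GroupReduce", "GroupCombine", "CoGroup"}


def restrict_forwarded_fields(operator, forwarded, key_fields):
    """Return the forwarded fields the optimizer may still rely on.

    - MapPartition: everything is discarded.
    - GroupReduce, GroupCombine, CoGroup: only grouping key fields remain.
    - Any other operator: the declaration is left untouched (assumption of
      this sketch; the PR text only covers group-wise operators).
    """
    if operator == "MapPartition":
        return set()
    if operator in KEYED_GROUP_OPERATORS:
        return set(forwarded) & set(key_fields)
    return set(forwarded)
```

For example, a GroupReduce keyed on field 0 that declares fields 0 and 2 as forwarded keeps only field 0, and the same declaration on a MapPartition keeps nothing.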