[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2961 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91906988 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- Not sure, I would like to avoid a solution where projections for aggregates are injected at different locations in the code, i.e., injecting a projection before validation for non-windowed aggregates and after validation (or during optimization) for windowed aggregates. I think it would be better to have this consistent. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user KurtYoung commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91885381 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- Hi @fhueske , i think it will be a more complex situation when window involves. However, i want to keep this change simple and straightforward, and open another issue to track the scenario when involving windows. Does this make sense to you? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91738368 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- At the moment there is no way to specify watermarks inside of a Table API or SQL query. This can only be done on a DataStream before it is converted into a Table. Therefore, watermarks and timestamps are already assigned before the first Table or SQL operator can remove anything. In case of a TableSource which assigns timestamps, the TableSource needs to take care that the assignment happens before a pushed-down projection is applied. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user KurtYoung commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91729382 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- What if user use a customized watermark extracter which used some fields from the element. For example, we have original table source containing 4 fields: a, b, c, d. And user used "a" field to extract the timestamp and watermark. But in the later query on the table, only "b" and "c" are used. And if we do a projection on "b" and "c", and the projection is pushed into table source further, we will not get field "a" which used to produce timestamp anymore. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91691224 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- Watermarks and timestamps should not be affected by this change. They are treated as metadata by Flink and not part of the schema. Also watermarks and timestamps should be assigned before the query. We do not support assigning watermarks within a query. I also had a quick look into it. One problem I found was that a window alias is handled as an `UnresolvedFieldReference` in `select` here and therefore added to the projection. However, the input does to have a field like that and validation fails. During validation, the window alias is correctly recognized. Maybe it makes more sense to add the projection at this point by injection an additional `Project` with the `RelBuilder`. Another solution could be a `RelOptRule`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user KurtYoung commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91641034 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- Oh. My bad, i will add it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/2961#discussion_r91561592 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/table.scala --- @@ -881,24 +883,21 @@ class GroupWindowedTable( * }}} */ def select(fields: Expression*): Table = { --- End diff -- Can you add the pre-aggregation projection for window aggregates as well? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2961: [FLINK-5266] [table] eagerly project unused fields...
GitHub user KurtYoung opened a pull request: https://github.com/apache/flink/pull/2961 [FLINK-5266] [table] eagerly project unused fields when selecting aggregation fields This PR is based on #2926 , only the second commit is related. I add a "plan" test directory to hold all the plan level tests. And i also did a small refactory for ProjectionTranslator, thought it's better to keep each method only do one thing. @fhueske As we discussed earlier in the jira: https://issues.apache.org/jira/browse/FLINK-5266 about where the logics should be added. I decided to add them when we selecting fields from a normal or grouped table. Since this kind of logics involves some fields references rewrite, if we choose to add the needed projection node when we convert the LogicalPlan to Calcite's RelNode, we should also take care the whole rewrite thing. However, if we add the project node in the first place, we only need to extract all the field references used in all selecting expressions, and treat them as UnresolvedFieldReferences. The validation part will take care of the rewrite thing. I think this will be easier and more consistent with other procedures. (Noticed all the "construct" logic are fairly simple) You can merge this pull request into a Git repository by running: $ git pull https://github.com/KurtYoung/flink flink-5266 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2961.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2961 commit 374d231d44f84ae385d8f8adb2353685e1214ff6 Author: Kurt YoungDate: 2016-12-08T01:27:55Z [FLINK-5226] [table] Use correct DataSetCostFactory and improve DataSetCalc costs. commit 8a3ecf8e6362acd9370b11f08018a0143fc9be18 Author: Kurt Young Date: 2016-12-08T02:35:43Z [FLINK-5266] [table] eagerly project unused fields when selecting aggregation fields Add a "plan" test dir to hold all the plan level unit tests Small refactory with ProjectionTranslator, keep each method handle one single thing --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---