[jira] [Commented] (FLINK-4575) DataSet aggregate methods should support POJOs
[ https://issues.apache.org/jira/browse/FLINK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249444#comment-16249444 ]

Gabor Gevay commented on FLINK-4575:

OK, makes sense. Feel free to close this issue if you think we shouldn't do it.

> DataSet aggregate methods should support POJOs
> ----------------------------------------------
>
> Key: FLINK-4575
> URL: https://issues.apache.org/jira/browse/FLINK-4575
> Project: Flink
> Issue Type: Improvement
> Components: DataSet API
> Reporter: Gabor Gevay
> Priority: Minor
> Labels: starter
>
> The aggregate methods of DataSets (aggregate, sum, min, max) currently only
> support Tuples, with the fields specified by indices. With
> https://issues.apache.org/jira/browse/FLINK-3702 resolved, adding support for
> POJOs and field expressions would be easy: {{AggregateOperator}} would create
> {{FieldAccessors}} instead of just storing field positions, and
> {{AggregateOperator.AggregatingUdf}} would use these {{FieldAccessors}}
> instead of the Tuple field access methods.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-4575) DataSet aggregate methods should support POJOs
[ https://issues.apache.org/jira/browse/FLINK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249312#comment-16249312 ]

Fabian Hueske commented on FLINK-4575:

I'm not sure about extending the DataSet API for such special cases. In fact, I'd rather remove the support for built-in aggregation functions on Tuples in the future as well.

IMO, the DataSet API is a rather low-level API that should provide the tools to implement custom functions based on {{MapFunction}}, {{GroupReduceFunction}}, etc. Built-in aggregation functions are much better covered by the Table API or SQL support, and it is very easy to convert a {{DataSet}} into a {{Table}} and vice versa.
[jira] [Commented] (FLINK-4575) DataSet aggregate methods should support POJOs
[ https://issues.apache.org/jira/browse/FLINK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248874#comment-16248874 ]

Gabor Gevay commented on FLINK-4575:

[~vcycyv], I'm not sure how {{getFlatFields}} would help here. (How would you convert back to a POJO at the end?)

But if you would like to work on this issue, the approach outlined in the issue description should work. I think this is the cleanest solution, since {{FieldAccessor}} is meant exactly for situations like this one, where we have to get and set a field based on a field expression.

However, you would have to resolve https://issues.apache.org/jira/browse/FLINK-4578 first. I think that could be resolved by the solution that I wrote in a comment there.
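To illustrate the "get and set a field by name" idea discussed above, here is a minimal plain-Java sketch. It does not use Flink: {{SimpleFieldAccessor}}, {{WordCount}} and {{PojoSum}} are hypothetical names invented for this example, the accessor is built on plain reflection, and it only handles a top-level numeric field rather than full field expressions. How Flink's own {{FieldAccessor}} and {{AggregateOperator}} would be wired together is not shown here.

```java
import java.lang.reflect.Field;
import java.util.List;

/** Hypothetical stand-in for a field accessor; this is NOT Flink's FieldAccessor. */
final class SimpleFieldAccessor<T> {
    private final Field field;

    SimpleFieldAccessor(Class<T> type, String fieldName) {
        try {
            // Resolve the field once, so the per-record work is only get/set.
            this.field = type.getDeclaredField(fieldName);
            this.field.setAccessible(true);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(
                "No field '" + fieldName + "' in " + type.getName(), e);
        }
    }

    long getLong(T record) {
        try {
            return field.getLong(record);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    void setLong(T record, long value) {
        try {
            field.setLong(record, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Example POJO (hypothetical): public fields and a no-argument constructor. */
final class WordCount {
    public String word;
    public long count;

    public WordCount() {}

    WordCount(String word, long count) {
        this.word = word;
        this.count = count;
    }
}

final class PojoSum {
    /**
     * Sums one field over all records and writes the total into the first
     * record, which is returned. The other fields of that record are left
     * untouched, so the result is still a POJO of the input type.
     */
    static <T> T sum(List<T> records, SimpleFieldAccessor<T> accessor) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("Cannot aggregate an empty list");
        }
        long total = 0;
        for (T record : records) {
            total += accessor.getLong(record);
        }
        T result = records.get(0);
        accessor.setLong(result, total);
        return result;
    }
}
```

Because the accessor writes the aggregate back into a record of the original type, no conversion from a Tuple back to a POJO is needed, which is the concern raised about the {{getFlatFields}} idea.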
[jira] [Commented] (FLINK-4575) DataSet aggregate methods should support POJOs
[ https://issues.apache.org/jira/browse/FLINK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248142#comment-16248142 ]

Chuyang Wan commented on FLINK-4575:

[~ggevay], if you are OK with converting the POJO to a Tuple via {{getFlatFields}}, I can take the task. The idea is to accept a POJO field as the argument of the aggregation operators, convert it to a tuple field, and then use the tuple field the same way the current code does.
[jira] [Commented] (FLINK-4575) DataSet aggregate methods should support POJOs
[ https://issues.apache.org/jira/browse/FLINK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16247175#comment-16247175 ]

Chuyang Wan commented on FLINK-4575:

How about converting the POJO to a Tuple via {{getFlatFields}}? Just a thought...
[jira] [Commented] (FLINK-4575) DataSet aggregate methods should support POJOs
[ https://issues.apache.org/jira/browse/FLINK-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15463360#comment-15463360 ]

Gabor Gevay commented on FLINK-4575:

This is a bit harder than I thought, because the ForwardedFields property has to be added when the aggregation field is not the same as the key field. Note that the old logic for determining this has a bug: https://issues.apache.org/jira/browse/FLINK-4578