[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905084#comment-15905084 ] ASF GitHub Bot commented on FLINK-5976: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/3485 > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904857#comment-15904857 ] ASF GitHub Bot commented on FLINK-5976: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3485 Looks good, thanks! Merging this... > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904280#comment-15904280 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 commented on the issue: https://github.com/apache/flink/pull/3485 @StephanEwen FlatMapFunction "Tokenizer" has move to org.apache.flink.test.testfunctions. Help me to review this. Thanks. And collect other reusable functions to org.apache.flink.test.testfunctions later. > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904258#comment-15904258 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 commented on the issue: https://github.com/apache/flink/pull/3485 OK, get it. So how about just do it now in flink-tests like this pull request? > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902736#comment-15902736 ] ASF GitHub Bot commented on FLINK-5976: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3485 I don't think it is necessary to unify all example functions. These are just minimal non critical pieces defined by some tests. One we drop Java 7, most of them will anyways disappear and become Lambdas. So I would not overdo this right now... > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902310#comment-15902310 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 commented on the issue: https://github.com/apache/flink/pull/3485 @StephanEwen It's a good idea. We can remove many duplicate code in flink-tests in this way. But I found it also duplicate in other modules. How to solve it in other modules. ![image](https://cloud.githubusercontent.com/assets/24708126/23732447/1f3cdd08-04ae-11e7-866e-a5a0e6d5e242.png) > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901227#comment-15901227 ] ASF GitHub Bot commented on FLINK-5976: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3485 How about we create a package `org.apache.flink.test.testfunctions` where we collect all shared reusable function implementations? > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901228#comment-15901228 ] ASF GitHub Bot commented on FLINK-5976: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3485 Would be nice to follow up here and collect also functions like "identity mapper" or so in that package. > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900542#comment-15900542 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 commented on a diff in the pull request: https://github.com/apache/flink/pull/3485#discussion_r104826690 --- Diff: flink-tests/src/test/java/org/apache/flink/test/util/Tokenizer.java --- @@ -0,0 +1,17 @@ +package org.apache.flink.test.util; --- End diff -- @zentol fixed. > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899328#comment-15899328 ] ASF GitHub Bot commented on FLINK-5976: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3485#discussion_r104650335 --- Diff: flink-tests/src/test/java/org/apache/flink/test/util/Tokenizer.java --- @@ -0,0 +1,17 @@ +package org.apache.flink.test.util; --- End diff -- this file is missing the apache license. > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899276#comment-15899276 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 commented on the issue: https://github.com/apache/flink/pull/3485 @uce 1. This pull request has squash in one commit. 2. JIRA has set affects version to 1.2.0 Help me to review this code, Thanks. > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899270#comment-15899270 ] ASF GitHub Bot commented on FLINK-5976: --- GitHub user liuyuzhong7 opened a pull request: https://github.com/apache/flink/pull/3485 FLINK-5976 [tests] Deduplicate Tokenizer in tests There are some duplicate code like this in flink-test, I think refactor this will be better. ``` public final class Tokenizer implements FlatMapFunction> { @Override public void flatMap(String value, Collector > out) { // normalize and split the line String[] tokens = value.toLowerCase().split("\\W+"); // emit the pairs for (String token : tokens) { if (token.length() > 0) { out.collect(new Tuple2 (token, 1)); } } } } ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/liuyuzhong7/flink FLINK-5976 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3485.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3485 commit db4192110b1b610846f50527a5fc67450d3d1cca Author: liuyuzhong7 Date: 2017-03-07T11:15:40Z FLINK-5976 [tests] Deduplicate Tokenizer in tests > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunction Tuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898993#comment-15898993 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 closed the pull request at: https://github.com/apache/flink/pull/3482 > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898946#comment-15898946 ] ASF GitHub Bot commented on FLINK-5976: --- Github user liuyuzhong7 commented on the issue: https://github.com/apache/flink/pull/3482 @uce Thanks for review. OK, I'll do it later. > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 1.2.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 1.2.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5976) Refactoring duplicate Tokenizer in flink-test
[ https://issues.apache.org/jira/browse/FLINK-5976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898937#comment-15898937 ] ASF GitHub Bot commented on FLINK-5976: --- Github user uce commented on the issue: https://github.com/apache/flink/pull/3482 Hey @liuyuzhong7! Thanks for the contribution. A few notes: - Could you rebase your branch on top of the master branch. We don't do merge commits. After you've rebased the branch, I will review it (you can force push to your PR branch) - Could you rename the commits and this PR to: ``` [FLINK-5976] [tests] Deduplicate Tokenizer in tests ``` - Could you unset the `fix version` in the JIRA and set `affects version` to `1.2.0`? > Refactoring duplicate Tokenizer in flink-test > - > > Key: FLINK-5976 > URL: https://issues.apache.org/jira/browse/FLINK-5976 > Project: Flink > Issue Type: Improvement > Components: Examples >Affects Versions: 2.0.0 >Reporter: liuyuzhong7 >Priority: Minor > Labels: test > Fix For: 2.0.0 > > > There are some duplicate code like this in flink-test, I think refactor this > will be better. > ``` > public final class Tokenizer implements FlatMapFunctionTuple2 > { > @Override > public void flatMap(String value, Collector > > out) { > // normalize and split the line > String[] tokens = value.toLowerCase().split("\\W+"); > // emit the pairs > for (String token : tokens) { > if (token.length() > 0) { > out.collect(new Tuple2 (token, > 1)); > } > } > } > } > ``` -- This message was sent by Atlassian JIRA (v6.3.15#6346)