[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545121#comment-14545121 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30391422 --- Diff: docs/apis/best_practices.md ---
@@ -0,0 +1,155 @@
+---
+title: "Best Practices"
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<a href="#top"></a>
+
+This page contains a collection of best practices for Flink programmers on how to solve frequently encountered problems.
+
+* This will be replaced by the TOC
+{:toc}
+
+## Parsing command line arguments and passing them around in your Flink application
+
+Almost all Flink applications, both batch and streaming, rely on external configuration parameters.
+For example for specifying input and output sources (like paths or addresses), also system parameters (parallelism, runtime configuration) and application specific parameters (often used within the user functions).
+
+Since version 0.9 we are providing a simple called `ParameterTool` to provide at least some basic tooling for solving these problems.
--- End diff --
a simple ... = utility?
Provide utils to pass -D parameters to UDFs Key: FLINK-1525 URL: https://issues.apache.org/jira/browse/FLINK-1525 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Robert Metzger Assignee: Robert Metzger Labels: starter Hadoop users are used to setting job configuration through -D on the command line. Right now, Flink users have to manually parse command line arguments and pass them to the methods. It would be nice to provide a standard args parser which takes care of such things. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545127#comment-14545127 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30391736 --- Diff: docs/apis/programming_guide.md ---
@@ -665,6 +667,18 @@ DataSet<Integer> result = in.partitionByHash(0)
 </td>
 </tr>
 <tr>
+ <td><strong>Custom Partitioning</strong></td>
+ <td>
--- End diff --
This is unrelated to the change, right?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545145#comment-14545145 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30392459 --- Diff: docs/apis/best_practices.md --- @@ -0,0 +1,155 @@ [...]
+As you'll see Flink is very flexible when it comes to parsing input parameters. You are free to choose any other framework, like [Commons CLI](https://commons.apache.org/proper/commons-cli/), [argparse4j](http://argparse4j.sourceforge.net/), or others.
--- End diff --
I rephrased it
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545130#comment-14545130 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-102312636 Thanks for the docs update! Very nice idea with the best practices page. +1
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545134#comment-14545134 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30392111 --- Diff: docs/apis/programming_guide.md --- @@ -665,6 +667,18 @@ [...] --- End diff -- Yes, it's a fix to https://issues.apache.org/jira/browse/FLINK-1260 ;)
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545125#comment-14545125 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30391587 --- Diff: docs/apis/best_practices.md --- @@ -0,0 +1,155 @@ [...]
+As you'll see Flink is very flexible when it comes to parsing input parameters. You are free to choose any other framework, like [Commons CLI](https://commons.apache.org/proper/commons-cli/), [argparse4j](http://argparse4j.sourceforge.net/), or others.
+
+### Getting your configuration values into the `ParameterTool`
+
+The `ParameterTool` provides a set of predefined static methods for reading the configuration. The tool is internally expecting a `Map<String, String>`, so it's very easy to integrate it with your own configuration style.
+
+#### From `.properties` files
+
+The following method will read a [Properties](https://docs.oracle.com/javase/tutorial/essential/environment/properties.html) file and provide the key/value pairs:
+{% highlight java %}
+String propertiesFile = "/home/sam/flink/myjob.properties";
+ParameterTool parameter = ParameterTool.fromPropertiesFile(propertiesFile);
+{% endhighlight %}
+
+#### From the command line arguments
+
+This allows getting arguments like `--input hdfs:///mydata --elements 42` from the command line.
+{% highlight java %}
+public static void main(String[] args) {
+	ParameterTool parameter = ParameterTool.fromArgs(args);
+	// .. regular code ..
+{% endhighlight %}
+
+#### From system properties
+
+When starting a JVM, you can pass system properties to it: `-Dinput=hdfs:///mydata`. You can also initialize the `ParameterTool` from these system properties:
+
+{% highlight java %}
+ParameterTool parameter = ParameterTool.fromSystemProperties();
+{% endhighlight %}
+
+### Using the parameters in your Flink program
+
+Now that we've got the parameters from somewhere (see above) we can use them in various ways.
+
+**Directly from the `ParameterTool`**
+
+The `ParameterTool` itself has methods for accessing the values.
+{% highlight java %}
+ParameterTool parameters = // ...
+parameters.getRequired("input");
+parameters.get("output", "myDefaultValue");
+parameters.getLong("expectedCount", -1L);
+parameters.getNumberOfParameters();
+// .. there are more methods available.
+{% endhighlight %}
+
+You can use the return values of these methods directly in the main() method (= the client submitting the application).
+For example you could set the parallelism of an operator like this:
+
+{% highlight java %}
+ParameterTool parameters = ParameterTool.fromArgs(args);
+DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).setParallelism(parameters.getInt("mapParallelism", 2));
+{% endhighlight %}
--- End diff --
Maybe to make this pattern readable... do
```
int parallelism = parameters.getInt("mapParallelism", 2);
```
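The `fromPropertiesFile` behavior described in the quoted docs can be approximated with the standard `java.util.Properties` API. The following is a hedged sketch, not Flink's actual implementation; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PropertiesSketch {

    // Hypothetical helper mirroring the idea behind ParameterTool.fromPropertiesFile:
    // load a .properties source into a plain String-to-String map.
    public static Map<String, String> fromProperties(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new RuntimeException("Could not read properties", e);
        }
        Map<String, String> map = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            map.put(name, props.getProperty(name));
        }
        return map;
    }
}
```

A `Map<String, String>` is exactly what the docs say `ParameterTool` expects internally, which is why a plain properties file integrates so easily.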
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545146#comment-14545146 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-102315511 Thanks for the review Henry and Ufuk! I addressed your concerns and updated the PR. I rebased the PR to the current master and filed two JIRAs for missing features https://issues.apache.org/jira/browse/FLINK-2018 https://issues.apache.org/jira/browse/FLINK-2017 I'll probably merge this change in the next 12 hours ...
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545119#comment-14545119 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30391280 --- Diff: flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java ---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.api.java.utils;
+
+import org.apache.flink.api.java.ClosureCleaner;
+import org.apache.flink.configuration.Configuration;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Map;
+import java.util.Properties;
+
+public class ParameterToolTest {
+
+	@Rule
+	public TemporaryFolder tmp = new TemporaryFolder();
+
+	// --------- Parser tests ---------
+
+	@Test(expected = RuntimeException.class)
+	public void testIllegalArgs() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"berlin"});
+		Assert.assertEquals(0, parameter.getNumberOfParameters());
+	}
+
+	@Test
+	public void testNoVal() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"-berlin"});
+		Assert.assertEquals(1, parameter.getNumberOfParameters());
+		Assert.assertTrue(parameter.has("berlin"));
+	}
+
+	@Test
+	public void testNoValDouble() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--berlin"});
+		Assert.assertEquals(1, parameter.getNumberOfParameters());
+		Assert.assertTrue(parameter.has("berlin"));
+	}
+
+	@Test
+	public void testMultipleNoVal() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "--b", "--c", "--d", "--e", "--f"});
+		Assert.assertEquals(6, parameter.getNumberOfParameters());
+		Assert.assertTrue(parameter.has("a"));
+		Assert.assertTrue(parameter.has("b"));
+		Assert.assertTrue(parameter.has("c"));
+		Assert.assertTrue(parameter.has("d"));
+		Assert.assertTrue(parameter.has("e"));
+		Assert.assertTrue(parameter.has("f"));
+	}
+
+	@Test
+	public void testMultipleNoValMixed() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "-c", "-d", "--e", "--f"});
+		Assert.assertEquals(6, parameter.getNumberOfParameters());
+		Assert.assertTrue(parameter.has("a"));
+		Assert.assertTrue(parameter.has("b"));
+		Assert.assertTrue(parameter.has("c"));
+		Assert.assertTrue(parameter.has("d"));
+		Assert.assertTrue(parameter.has("e"));
+		Assert.assertTrue(parameter.has("f"));
+	}
+
+	@Test(expected = IllegalArgumentException.class)
+	public void testEmptyVal() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "--"});
+		Assert.assertEquals(2, parameter.getNumberOfParameters());
+		Assert.assertTrue(parameter.has("a"));
+		Assert.assertTrue(parameter.has("b"));
+	}
+
+	@Test(expected = IllegalArgumentException.class)
+	public void testEmptyValShort() {
+		ParameterTool parameter = ParameterTool.fromArgs(new String[]{"--a", "-b", "-"});
+		Assert.assertEquals(2, parameter.getNumberOfParameters());
+		Assert.assertTrue(parameter.has("a"));
+		Assert.assertTrue(parameter.has("b"));
+	}
+
+	/*@Test
--- End diff --
I'll remove it
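The tests above encode the `fromArgs` contract for value-less flags: both `-key` and `--key` register the key, a following non-dash token becomes the value, and a bare token with no leading dash is rejected. A minimal, self-contained sketch of that contract (a hypothetical helper, not Flink's actual parser, which may differ in edge cases such as negative-number values) could look like:

```java
import java.util.HashMap;
import java.util.Map;

public class ArgsSketch {

    // Marker value for a flag that was passed without an accompanying value.
    public static final String NO_VALUE = "__NO_VALUE_KEY";

    // Hypothetical fromArgs-style parser: strips one or two leading dashes,
    // then pairs the key with the next token unless that token is itself a key.
    public static Map<String, String> fromArgs(String[] args) {
        Map<String, String> map = new HashMap<>();
        int i = 0;
        while (i < args.length) {
            String key;
            if (args[i].startsWith("--")) {
                key = args[i].substring(2);
            } else if (args[i].startsWith("-")) {
                key = args[i].substring(1);
            } else {
                // a bare token like "berlin" is not a valid key
                throw new RuntimeException("Error parsing arguments: expected a key at '" + args[i] + "'");
            }
            if (key.isEmpty()) {
                // "-" and "--" carry no key and must be rejected
                throw new IllegalArgumentException("The input '" + args[i] + "' contains an empty key");
            }
            if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                map.put(key, args[i + 1]);
                i += 2;
            } else {
                map.put(key, NO_VALUE);
                i += 1;
            }
        }
        return map;
    }
}
```

With this sketch, `fromArgs(new String[]{"--input", "hdfs:///mydata", "--elements", "42"})` yields a map with `input` and `elements` set, matching the command-line example in the quoted docs.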
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545129#comment-14545129 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30391850 --- Diff: docs/apis/best_practices.md --- @@ -0,0 +1,155 @@ [...]
+As you'll see Flink is very flexible when it comes to parsing input parameters. You are free to choose any other framework, like [Commons CLI](https://commons.apache.org/proper/commons-cli/), [argparse4j](http://argparse4j.sourceforge.net/), or others.
--- End diff --
I think this paragraph is confusing. I would remove it
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545185#comment-14545185 ] ASF GitHub Bot commented on FLINK-1525: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-102327285 +1
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545140#comment-14545140 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30392280 --- Diff: flink-runtime/src/main/resources/log4j.properties ---
@@ -18,7 +18,7 @@
 # Convenience file for local debugging of the JobManager/TaskManager.
-log4j.rootLogger=OFF, console
+log4j.rootLogger=INFO, console
--- End diff --
I'll very soon fix that issue so that we don't have to change the logging level every time we want to debug something.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545120#comment-14545120 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30391338 --- Diff: flink-runtime/src/main/resources/log4j.properties ---
@@ -18,7 +18,7 @@
 # Convenience file for local debugging of the JobManager/TaskManager.
-log4j.rootLogger=OFF, console
+log4j.rootLogger=INFO, console
--- End diff --
back to OFF?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545401#comment-14545401 ] ASF GitHub Bot commented on FLINK-1525: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/664 (Fix For: 0.9)
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14545399#comment-14545399 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-102383548 Okay .. merging
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544069#comment-14544069 ] ASF GitHub Bot commented on FLINK-1525: --- Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30342062 --- Diff: flink-java/src/test/java/org/apache/flink/api/java/utils/ParameterToolTest.java --- @@ -0,0 +1,220 @@ [...]
+	/*@Test
--- End diff --
Do you want to keep this test as optional? If yes then it is better to have
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544074#comment-14544074 ] ASF GitHub Bot commented on FLINK-1525: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-102111410 Hi @rmetzger, just did a pass and other than comments about unused test and broken Travis due to checkstyle I think this PR is ready to go. +1
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541624#comment-14541624 ] ASF GitHub Bot commented on FLINK-1525: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101579496 Do you want to allow empty keys like in `-- value` or even `--`? I think the parser should throw an exception in this case.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542129#comment-14542129 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101720728 Good catch, thank you. I added test cases that validate we're throwing an exception for `-` and `--` inputs.
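The rejection of bare `-` and `--` discussed in the two comments above can be sketched as follows. This is a minimal, hypothetical stand-in (`ArgsSketch` is not the actual Flink parser), showing only the fail-fast check on empty keys that the added test cases validate:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an argument parser that fails fast on bare "-" and
// "--" tokens, mirroring the behavior validated by the new test cases.
public class ArgsSketch {

    public static Map<String, String> parse(String[] args) {
        Map<String, String> result = new HashMap<>();
        for (String arg : args) {
            String key;
            if (arg.startsWith("--")) {
                key = arg.substring(2);
            } else if (arg.startsWith("-")) {
                key = arg.substring(1);
            } else {
                throw new RuntimeException("Error parsing argument '" + arg + "': expected -key or --key");
            }
            if (key.isEmpty()) {
                // a bare "-" or "--" carries no key name
                throw new IllegalArgumentException("The input '" + arg + "' contains an empty key");
            }
            result.put(key, ""); // flag without a value
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[]{"--a", "-b"}).keySet());
        try {
            parse(new String[]{"--a", "-b", "--"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the key before inserting it keeps the error close to the offending token, so the exception message can name the exact argument that was malformed.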
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539548#comment-14539548 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101201653 I thought it's much nicer for users to provide a typed POJO (implementing `UserConfig`) instead of a String map. This approach leaves at least the door open for a typed configuration object. If you want, I'll also let the `Configuration` class implement the UserConfig interface, so that users can use untyped key/value configuration there as well.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539550#comment-14539550 ] ASF GitHub Bot commented on FLINK-1525: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101203099 Does every user need to implement the user config object? In the code, do you always need to cast the user config to your specific class?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539916#comment-14539916 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101303549 Yes. But my PR will provide two standard implementations (ParameterTool, Configuration), so users only need to implement the config if they want more features. Users have to cast it in their function, yes. @hsaputra: I renamed the class to `ParameterTool`. -- I think the pull request is in a mergeable state now. I didn't change the examples yet. That's going to be a big change which I would like to do separately, once this is merged. Please review the PR ...
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537696#comment-14537696 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100805130 In Hadoop, the `UserConfig` would probably be a `Configuration` object with key/value pairs. In Flink, we are trying to get rid of these untyped maps. Instead, I would recommend users to use a simple Java class, like

```java
public static class MyConfig implements UserConfig {
	public long someLongValue;
	public int someInt;

	// this is optional
	public Map<String, String> toMap() {
		return null;
	}
}
```

It can be used in a similar way to a Configuration object, but the compiler is able to check the types. The `ParameterUtil` implements the UserConfig interface to expose the configuration values through the Flink program in the web interface.
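The compile-time benefit of the typed class over an untyped map can be shown with a small self-contained sketch. The `UserConfig` interface here is a hypothetical stand-in so the example runs on its own (the real interface lives in Flink), and the demo class is invented for illustration:

```java
import java.util.Map;

// Hypothetical stand-in for Flink's UserConfig interface (only so this
// sketch is self-contained and runnable).
interface UserConfig {
    Map<String, String> toMap();
}

// Typed configuration class following the comment above: fields are plain
// Java members, so the compiler checks their types.
class MyConfig implements UserConfig {
    public long someLongValue;
    public int someInt;

    // optional: export key/value pairs (e.g. for the web interface)
    public Map<String, String> toMap() {
        return null;
    }
}

public class TypedConfigDemo {
    public static void main(String[] args) {
        MyConfig conf = new MyConfig();
        conf.someLongValue = 100L; // conf.someLongValue = "100" would not compile
        conf.someInt = 4;
        // With an untyped map, the same access would need a parse at every
        // call site: Long.parseLong(map.get("someLongValue"))
        System.out.println(conf.someLongValue + conf.someInt);
    }
}
```

Assigning a wrong type to `someLongValue` is rejected at compile time, whereas a `Map<String, String>` would only fail at runtime when the value is parsed.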
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537880#comment-14537880 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100891653 Thank you. I didn't reuse the parser there because there is an ongoing JIRA to unify all the command line parsers inside Flink. It seems that we are using at least two different libraries .. and the consensus in the JIRA seems to be using a third one to solve the problem. But I didn't want to add more confusion to the topic than we already have.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537842#comment-14537842 ] ASF GitHub Bot commented on FLINK-1525: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100880185 I like it. But I think it needs some functionality for verifying parameters: letting the user specify parameters that always need to be present, along with a description for each parameter. Similar to how other tools print the usage when you don't give correct arguments.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537847#comment-14537847 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100882340 I agree, however it should be optional. I don't like these tools where you spend a lot of time registering / specifying arguments. People want to analyze their data, not configure a huge parameter parsing framework ;) But I'm going to look into this and see how much work it would be to implement it. If it blocks me from getting this PR merged soon, I'll file a follow-up JIRA.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537878#comment-14537878 ] ASF GitHub Bot commented on FLINK-1525: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100891261 Very helpful utility. I think it's worth adapting all the examples if we merge this. Removes a lot of unnecessary code and makes the examples more readable. May I ask why you didn't reuse the parser in `org.apache.commons.cli`? Too much overhead?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537909#comment-14537909 ] ASF GitHub Bot commented on FLINK-1525: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100897320 Thanks for clarifying. I agree that the parser in Apache Commons is not the nicest...
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537958#comment-14537958 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100917520 "If this tool also supported positional arguments ..." I also thought about adding those, but decided against it, because args[n] is already a way of accessing the arguments by their position ;) "But I'll add it so that users can also specify default values .. and we take care of the parsing." I'm not so sure about this anymore. Positional arguments would break a lot in the design of the `ParameterUtil`. For example the export to the web interface, the Configuration object or Properties.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538035#comment-14538035 ] ASF GitHub Bot commented on FLINK-1525: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100941805 Thanks for clarifying, Robert. I think the convention is for *Util classes to contain just static methods rather than being created as instances. Maybe we could use ```InputParameters``` as the class name for parsing the CLI arguments instead?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537984#comment-14537984 ] ASF GitHub Bot commented on FLINK-1525: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100926127 Yes, something like this.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537975#comment-14537975 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100923321 @aljoscha: Is this the API you were thinking of?

```java
RequiredParameters required = new RequiredParameters();
Option input = required.add("input").alt("i").description("Path to input file or directory"); // parameter with long and short variant
required.add("output"); // parameter only with long variant
Option parallelism = required.add("parallelism").alt("p").type(Integer.class); // parameter with type
Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12).description("Number specifying the number of parallel data source instances"); // parameter with default value, specifying the type.

ParameterUtil parameter = ParameterUtil.fromArgs(new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
required.check(parameter);
required.printHelp();
required.checkAndPopulate(parameter);

String inputString = input.get();
int par = parallelism.getInteger();
String output = parameter.get("output");
int sourcePar = parameter.getInteger(spOption.getName());
```
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538004#comment-14538004 ] ASF GitHub Bot commented on FLINK-1525: --- Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30046159 --- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/PojoExample.java --- @@ -26,6 +26,7 @@ import org.apache.flink.util.Collector; + --- End diff -- Extra new line?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14538574#comment-14538574 ] ASF GitHub Bot commented on FLINK-1525: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101035060 Nice idea. Do we need the special `UserConfig` interface, or can we use a `Properties` object, or directly a `Map<String, String>`?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537616#comment-14537616 ] ASF GitHub Bot commented on FLINK-1525: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100775629 This is a great idea, @rmetzger. I am a bit confused about the addition of the UserConfig interface. Is the interface added to add abstraction for the ParameterUtil? It is a bit confusing what "user" means: is it the program driver or the actual user (i.e. the principal) executing the Flink program?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537086#comment-14537086 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100605507 Thank you for the feedback so far! @uce: "In the word count example: is the only point of setting the user config in the execution config only to show the values in the web interface?" Yes. But I think it's worth it, because it'll teach our users how to use all features of the util. "If this tool also supported positional arguments ..." I also thought about adding those, but decided against it, because args[n] is already a way of accessing the arguments by their position ;) But I'll add it so that users can also specify default values .. and we take care of the parsing.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537076#comment-14537076 ] ASF GitHub Bot commented on FLINK-1525: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100600902 **Feedback w/o code review** I like the idea very much. +1 for the general thing. Exporting it to the web interface is great. Thanks for this. :-) The Hadoop compatibility idea is also very nice. Let's make sure to follow up with it soon (file a JIRA etc.) Regarding adding this to all tests: I think we should discuss this. I am undecided. On the one hand, it will break everything that shows how to run the examples, because you will have to specify the parameter names now. On the other hand, I like that people will be exposed to the utility. I think I'm leaning towards option 2 atm. What's currently missing is documentation. --- In the word count example: is the only point of setting the user config in the execution config only to show the values in the web interface?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537081#comment-14537081 ] ASF GitHub Bot commented on FLINK-1525: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100601591 Cool tool, Robert. This is definitely useful for users. If this tool also supported positional arguments, then we wouldn't have to change how the examples are called. Furthermore, it wouldn't force you to name all of your parameters if you don't want to.
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14536566#comment-14536566 ] ASF GitHub Bot commented on FLINK-1525: --- GitHub user rmetzger opened a pull request: https://github.com/apache/flink/pull/664 [FLINK-1525][FEEDBACK] Introduction of a small input parameter parsing utility Hi, last week I was running a bunch of Flink Streaming jobs on a cluster. One of the jobs had 8 arguments which I changed in different iterations of the program. I ended up passing arguments like

```
16 1 8 3 10k hdp22-w-1.c.internal:6667,hdp22-w-0.c.internal:6667,hdp22-m.c.internal:6667 1
```

It's obvious that this is not easily maintainable. In addition to this experience, I got similar feedback from at least two other Flink users. Therefore, I sat down and implemented a simple class which allows users to work with input parameters in a hassle-free manner. The tool is called **ParameterUtil**. It can be initialized from:
- regular command line arguments (`-` and `--`): `ParameterUtil.fromArgs(new String[]{"--berlin"});`
- `.properties` files: `ParameterUtil.fromPropertiesFile(propertiesFile);`
- system properties (-D arguments to the JVM): `ParameterUtil.fromSystemProperties();`

I'm also planning to provide an initializer which accepts the same arguments as Hadoop's GenericOptionsParser: https://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/util/GenericOptionsParser.html (our users are just too used to Hadoop's tooling) For accessing arguments, it has methods like: `parameter.getRequired("input")`, `parameter.get("output", "myDefaultValue")`, `parameter.getLong("expectedCount", -1L)` and so on ...
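The accessor style listed above (required lookup, lookup with default, typed lookup) can be illustrated with a self-contained sketch. `AccessorSketch` is a hypothetical stand-in backed by a plain map, not the actual ParameterUtil implementation from the pull request:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in illustrating getRequired / get-with-default /
// typed getLong accessors over a plain key/value map.
public class AccessorSketch {

    private final Map<String, String> data;

    public AccessorSketch(Map<String, String> data) {
        this.data = data;
    }

    // fails if the key is absent
    public String getRequired(String key) {
        String value = data.get(key);
        if (value == null) {
            throw new RuntimeException("No data for required key '" + key + "'");
        }
        return value;
    }

    // falls back to the given default if the key is absent
    public String get(String key, String defaultValue) {
        return data.getOrDefault(key, defaultValue);
    }

    // typed accessor: parses the stored string, or returns the default
    public long getLong(String key, long defaultValue) {
        String value = data.get(key);
        return value == null ? defaultValue : Long.parseLong(value);
    }

    public static void main(String[] args) {
        Map<String, String> m = new HashMap<>();
        m.put("input", "/tmp/in");
        m.put("expectedCount", "42");
        AccessorSketch p = new AccessorSketch(m);
        System.out.println(p.getRequired("input"));
        System.out.println(p.get("output", "default"));
        System.out.println(p.getLong("expectedCount", -1L));
    }
}
```

The three accessors cover the common cases: mandatory parameters fail fast with a clear message, while optional ones never force null checks onto the caller.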
Also, I added a method to export the parameters to Flink's `Configuration` class:

```java
Configuration config = parameter.getConfiguration();
config.getLong("expectedCount", -1L);
```

This allows users to pass the input arguments to operators in the APIs:

```java
text.flatMap(new Tokenizer()).withParameters(conf);
```

The `ParameterUtil` itself is Serializable, so it can be passed into user functions (for example to the `Tokenizer`). Also, I extended the `ExecutionConfig` to allow passing a `UserConfig` with custom stuff inside it. The `ParameterUtil` implements the `UserConfig` interface, so users can do the following:

```java
public static void main(String[] args) throws Exception {
	ParameterUtil pt = ParameterUtil.fromArgs(args);
	final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
	env.getConfig().setUserConfig(pt);
	// regular flink stuff
}
```

Inside a (rich) user function, users can access the command line arguments:

```java
text.flatMap(new Tokenizer()).flatMap(new RichFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
	@Override
	public void flatMap(Tuple2<String, Integer> value, Collector<Tuple2<String, Integer>> out) throws Exception {
		ExecutionConfig.UserConfig uc = getRuntimeContext().getExecutionConfig().getUserConfig();
		ParameterUtil pt = (ParameterUtil) uc;
		float norm = pt.getFloat("normalization", 0.15f);
	}
})
```

The `UserConfig` allows exporting key/value pairs to the web interface. Running WordCount:

```
./bin/flink run ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar --input /home/robert/incubator-flink/build-target/README.txt --output /tmp/wo
```

will lead to the following result:

![paramutil](https://cloud.githubusercontent.com/assets/89049/7550566/14ea36c2-f667-11e4-9a81-ee6a017527b0.png)

Before I'm now going to add this to all examples, I would like to get some feedback for the API choices I made (I don't want to change all examples afterwards ;) ).
WordCount currently looks like this:

```java
public static void main(String[] args) throws Exception {
	ParameterUtil pt = ParameterUtil.fromArgs(args);
	boolean fileOutput = pt.getNumberOfParameters() == 2;
	String textPath = null;
	String outputPath = null;
	if (fileOutput) {
		textPath = pt.getRequired("input");
		outputPath = pt.getRequired("output");
	}

	// set up the execution environment
	final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
	env.getConfig().setUserConfig(pt);

	// create initial DataSet, containing the text lines.
	DataSet<String> text;
	if (fileOutput) {
		text = env.readTextFile(textPath);
	} else {
		// get default test text data
		text = WordCountData.getDefaultTextLineDataSet(env);
	}
```
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533358#comment-14533358 ]

Robert Metzger commented on FLINK-1525:
---
I drafted some more code for this feature because I recently needed such a tool for one of my Flink jobs: https://github.com/rmetzger/flink/tree/flink1525

Provide utils to pass -D parameters to UDFs
Key: FLINK-1525
URL: https://issues.apache.org/jira/browse/FLINK-1525
Project: Flink
Issue Type: Improvement
Components: flink-contrib
Reporter: Robert Metzger
Labels: starter

Hadoop users are used to setting job configuration through -D on the command line. Right now, Flink users have to manually parse command line arguments and pass them to the methods. It would be nice to provide a standard args parser which takes care of such stuff.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
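The Hadoop-style `-Dkey=value` flags the issue describes could be parsed with something as small as the following. This is a hypothetical sketch for illustration only, not the code from the linked branch; the class and method names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: collect Hadoop-style "-Dkey=value" arguments into a map.
public class DashDParser {
	public static Map<String, String> parse(String[] args) {
		Map<String, String> params = new HashMap<>();
		for (String arg : args) {
			// only consider well-formed "-Dkey=value" tokens
			if (arg.startsWith("-D") && arg.contains("=")) {
				int eq = arg.indexOf('=');
				params.put(arg.substring(2, eq), arg.substring(eq + 1));
			}
		}
		return params;
	}
}
```

For example, `parse(new String[]{"-Dparallelism=4"})` yields a map with `"parallelism"` mapped to `"4"`; malformed tokens are simply skipped.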
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355112#comment-14355112 ]

Wepngong Ngeh Benaiah commented on FLINK-1525:
---
Hello Robert Metzger! I would love to work on this issue. I'm quite new to Flink and will appreciate any pointers. Thanks!
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355904#comment-14355904 ]

Robert Metzger commented on FLINK-1525:
---
Hi, great. We're always happy about new contributors. The idea behind such a tool is to allow users to easily configure and parameterize their functions and code. I think something like this would be really helpful:
{code}
public static void main(String[] inArgs) throws Exception {
	final ArgsUtil args = new ArgsUtil(inArgs);
	String input = args.getString("input", true); // true for required
	String output = args.getString("output", false, "file:///tmp"); // not required, with default value

	Configuration extParams = args.getParameters(); // with extParams containing extParams.getString("ignoreTerm"); and the other -D arguments

	DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).withParameters(extParams).map(new TermFilter(args));
{code}
I think the right location for this is the {{flink-contrib}} package. Also, it's very important to write test cases for your code and to add some documentation... but I think that can follow after a first working prototype. Let me know if you need more information on this.
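The required/default lookup semantics of the `ArgsUtil` proposal above could be implemented in plain Java along these lines. This is a hypothetical sketch under the assumption of `--key value` pairs; it is independent of any actual Flink API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of required/default argument lookup; names are illustrative only.
public class ArgsLookup {
	private final Map<String, String> params = new HashMap<>();

	public ArgsLookup(String[] args) {
		// expects alternating "--key value" pairs, e.g. {"--input", "/tmp/in"}
		for (int i = 0; i + 1 < args.length; i += 2) {
			params.put(args[i].replaceFirst("^--?", ""), args[i + 1]);
		}
	}

	public String getString(String key, boolean required) {
		String value = params.get(key);
		if (value == null && required) {
			throw new IllegalArgumentException("Missing required argument: " + key);
		}
		return value;
	}

	public String getString(String key, boolean required, String defaultValue) {
		String value = getString(key, required);
		return value != null ? value : defaultValue;
	}
}
```

A required key that is absent fails fast with an exception, while an optional key quietly falls back to its default, matching the two `getString` call shapes in the comment above.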
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318012#comment-14318012 ]

Stephan Ewen commented on FLINK-1525:
---
I think Flink has a much nicer way of passing parameters to functions. This seems very much like a Hadoop artifact that they built because there was no better way. Should we really adopt this?
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318016#comment-14318016 ]

Robert Metzger commented on FLINK-1525:
---
I think it doesn't hurt to have something like this in the {{flink-contrib}} package. The task is nice for somebody who's looking for an easy starter task with Flink. It doesn't have to be the Configuration object we're passing to the UDFs. The util could also return a serializable object one can pass into the functions.
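The serializable parameter object mentioned here could be as simple as the following sketch. The class name and methods are hypothetical; the only real requirement is implementing `java.io.Serializable` so the object can travel with a user function's closure when it is shipped to the cluster:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a serializable parameter holder that a job could
// hand to a rich function via its constructor; not an actual Flink class.
public class JobParameters implements Serializable {
	private static final long serialVersionUID = 1L;
	private final Map<String, String> values = new HashMap<>();

	public JobParameters set(String key, String value) {
		values.put(key, value);
		return this; // fluent style for chaining
	}

	public float getFloat(String key, float defaultValue) {
		String v = values.get(key);
		return v == null ? defaultValue : Float.parseFloat(v);
	}
}
```

A UDF holding such an object as a field gets it serialized and restored automatically, without going through the `Configuration`-based `withParameters(...)` path.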
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14318146#comment-14318146 ]

Stephan Ewen commented on FLINK-1525:
---
I agree, nice starter task...