[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357170#comment-14357170 ] Robert Metzger commented on FLINK-1388: --- Hi, yes, Tuple types are subclassed. Flink provides the predefined classes Tuple1 through Tuple25. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation
[ https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357215#comment-14357215 ] ASF GitHub Bot commented on FLINK-1654: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78308404 Thanks for the fix! Wrong scala example of POJO type in documentation - Key: FLINK-1654 URL: https://issues.apache.org/jira/browse/FLINK-1654 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 0.9 Reporter: Chiwan Park Assignee: Chiwan Park Priority: Trivial In [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos], there is a scala example of a POJO {code} class WordWithCount(val word: String, val count: Int) { def this() { this(null, -1) } } {code} I think this is wrong because a Flink POJO requires public fields, or private fields with getters and setters. Fields in a Scala class are private by default. We should change the field declarations to use the `var` keyword, or change the class declaration to a case class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
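The fix described in the issue, switching the fields to `var` so Scala generates public accessors, might look like this (a sketch of the corrected documentation snippet, not necessarily the exact wording of the merged fix):

```scala
// POJO example with `var` fields: Scala generates public getters and
// setters for them, which satisfies Flink's POJO rules. The auxiliary
// no-arg constructor is kept so the type stays default-constructible.
class WordWithCount(var word: String, var count: Int) {
  def this() {
    this(null, -1)
  }
}
```

A case class with default arguments would work equally well, since case-class fields also get public accessors.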
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357234#comment-14357234 ] Fabian Hueske commented on FLINK-1388: -- Watch out, PojoTypeInformation is not serializable due to the non-serializable ``java.lang.reflect.Field``. You should rather pass the Pojo class name and the names of all fields that should be written out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
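Fabian's suggestion, shipping only the class and the field names and re-resolving the `Field` objects on the worker, could be sketched like this (hypothetical names, not Flink's actual writeAsCsv implementation):

```scala
// Hypothetical sketch: keep only serializable state (the class and the
// field names) and rebuild the non-serializable reflect.Field handles
// lazily after deserialization on the TaskManager.
class PojoCsvFormatter[T](clazz: Class[T], fieldNames: Array[String])
    extends Serializable {

  // java.lang.reflect.Field is not serializable, so mark it transient
  // and re-resolve it on first use.
  @transient private var fields: Array[java.lang.reflect.Field] = _

  private def ensureFields(): Unit =
    if (fields == null) {
      fields = fieldNames.map { name =>
        val f = clazz.getDeclaredField(name)
        f.setAccessible(true) // allow access to non-public fields
        f
      }
    }

  // Render one record as a CSV line in the order of fieldNames.
  def format(record: T): String = {
    ensureFields()
    fields.map(f => String.valueOf(f.get(record))).mkString(", ")
  }
}
```

Only the class reference and the string array cross the wire; the reflective lookup happens again on whichever machine deserializes the formatter.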
[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/478 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-1689) Add documentation on streaming file sinks interaction with the batch outputformat
[ https://issues.apache.org/jira/browse/FLINK-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi updated FLINK-1689: -- Description: OutputFormats supported by the batch API are supported in streaming through the FileSinkFunction. A bit of documentation on that is needed. Add documentation on streaming file sinks interaction with the batch outputformat - Key: FLINK-1689 URL: https://issues.apache.org/jira/browse/FLINK-1689 Project: Flink Issue Type: Task Components: Streaming Reporter: Márton Balassi OutputFormats supported by the batch API are supported in streaming through the FileSinkFunction. A bit of documentation on that is needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1683) Scheduling preferences for non-unary tasks are not correctly computed
[ https://issues.apache.org/jira/browse/FLINK-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357195#comment-14357195 ] ASF GitHub Bot commented on FLINK-1683: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/476#issuecomment-78306554 I updated the PR and made the preference choice a bit more lightweight. Scheduling preferences for non-unary tasks are not correctly computed - Key: FLINK-1683 URL: https://issues.apache.org/jira/browse/FLINK-1683 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9, 0.8.1 Reporter: Fabian Hueske Assignee: Fabian Hueske Fix For: 0.9, 0.8.2 When computing scheduling preferences for an execution task, the JobManager looks at the assigned instances of all its input execution tasks and returns a preference only if not more than 8 instances have been found (if the input of a task is distributed across more than 8 tasks, local scheduling won't help much in any case). However, the JobManager treats all input execution tasks the same and does not distinguish between different logical inputs. The effect is that a join task with one broadcast input and one locally forwarded input is not assigned locally with respect to its locally forwarded input. This can have a significant impact on the performance of tasks that have more than one input and which rely on local forwarding and co-located task scheduling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1683] [jobmanager] Fix scheduling pref...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/476#issuecomment-78306554 I updated the PR and made the preference choice a bit more lightweight. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357223#comment-14357223 ] Till Rohrmann commented on FLINK-1537: -- You're right that it's best if the algorithms are implemented in a general fashion so that you can plug in different regularizers or cost functions, for example. Mahout is indeed a good example to learn from. What I was aiming at are more the general building blocks of ML algorithms. In many cases, distributed algorithms distribute the data. Then some local processing is done which results in some local state. This local state often has to be communicated to the other worker nodes to obtain a consistent global state. Starting from this global state, the next local computation can be started. Depending on the algorithm, whether it is stochastic or not, for example, one only needs a random subset of the data or the complete local partition. If you only need a stochastic subset, then often you will only need the corresponding subset of the global state to perform your local computations. Then sometimes the global state is so huge that it cannot be kept on a single machine and has to be stored in parallel. The question is now how these operations can be realized within Flink to allow an efficient implementation of a multitude of machine learning algorithms. For example, local state can either be stored as part of a stateful operator or as part of the elements stored in the {{DataSet}}. GSoC project: Machine learning with Apache Flink Key: FLINK-1537 URL: https://issues.apache.org/jira/browse/FLINK-1537 Project: Flink Issue Type: New Feature Reporter: Till Rohrmann Priority: Minor Labels: gsoc2015, java, machine_learning, scala Currently, the Flink community is setting up the infrastructure for a machine learning library for Flink. 
The goal is to provide a set of highly optimized ML algorithms and to offer a high level linear algebra abstraction to easily do data pre- and post-processing. By defining a set of commonly used data structures on which the algorithms work it will be possible to define complex processing pipelines. The Mahout DSL constitutes a good fit to be used as the linear algebra language in Flink. It has to be evaluated which means have to be provided to allow an easy transition between the high level abstraction and the optimized algorithms. The machine learning library offers multiple starting points for a GSoC project. Amongst others, the following projects are conceivable. * Extension of Flink's machine learning library by additional ML algorithms ** Stochastic gradient descent ** Distributed dual coordinate ascent ** SVM ** Gaussian mixture EM ** DecisionTrees ** ... * Integration of Flink with the Mahout DSL to support a high level linear algebra abstraction * Integration of H2O with Flink to benefit from H2O's sophisticated machine learning algorithms * Implementation of a parameter server like distributed global state storage facility for Flink. This also includes the extension of Flink to support asynchronous iterations and update messages. Own ideas for a possible contribution on the field of the machine learning library are highly welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
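The local-state/global-state iteration pattern Till describes can be illustrated with a generic, standalone skeleton (plain Scala collections stand in for Flink's distributed data sets and communication; the function and parameter names are illustrative, not a Flink API):

```scala
// Sketch of the pattern: each partition computes a local state from
// the current global state, the local states are merged into a new
// consistent global state, and the next round starts from that state.
def iterateRounds[A, S](partitions: Seq[Seq[A]], init: S, rounds: Int)(
    localUpdate: (S, Seq[A]) => S)(merge: (S, S) => S): S = {
  var global = init
  for (_ <- 1 to rounds) {
    // "local processing" per worker/partition
    val locals = partitions.map(part => localUpdate(global, part))
    // "communicate local state to obtain a consistent global state"
    global = locals.reduce(merge)
  }
  global
}
```

In Flink terms, `localUpdate` would run inside an operator (possibly on a random subset of the partition for stochastic algorithms) and `merge` would correspond to the synchronization step of an iteration.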
[jira] [Comment Edited] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357234#comment-14357234 ] Fabian Hueske edited comment on FLINK-1388 at 3/11/15 5:21 PM: --- Watch out, PojoTypeInformation is not serializable due to the non-serializable java.lang.reflect.Field. You should rather pass the Pojo class name and the names of all fields that should be written out. was (Author: fhueske): Watch out, PojoTypeInformation is not serializable due to the non-serializable ``java.lang.reflect.Field``. You should rather pass the Pojo class name and the names of all fields that should be written out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1688) Add socket sink
Márton Balassi created FLINK-1688: - Summary: Add socket sink Key: FLINK-1688 URL: https://issues.apache.org/jira/browse/FLINK-1688 Project: Flink Issue Type: Sub-task Components: Streaming Reporter: Márton Balassi Priority: Trivial Add a sink that writes output to a socket. I'd consider two options: one which implements a socket server and one which implements a client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
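The client variant mentioned in the issue could be sketched like this (hypothetical class name and method signatures; Flink's streaming sink interface may look different):

```scala
import java.io.PrintWriter
import java.net.Socket

// Hypothetical client-style socket sink: connects to a remote host and
// writes each record as one line of text.
class SocketClientSink[T](host: String, port: Int) {
  private var socket: Socket = _
  private var writer: PrintWriter = _

  // Open the connection once, before the first record.
  def open(): Unit = {
    socket = new Socket(host, port)
    writer = new PrintWriter(socket.getOutputStream, true) // autoflush
  }

  // Emit one record.
  def invoke(record: T): Unit = writer.println(record.toString)

  // Release the connection.
  def close(): Unit = {
    if (writer != null) writer.close()
    if (socket != null) socket.close()
  }
}
```

The server variant would instead bind a `ServerSocket` and fan records out to whichever clients connect.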
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/477#discussion_r26220968 --- Diff: flink-scala/pom.xml --- @@ -236,4 +230,23 @@ under the License. </plugins> </build> + <profiles> + <profile> + <id>scala-2.10</id> + <activation> + <property> + <!-- this is the default scala profile --> + <name>!scala-2.11.profile</name> --- End diff -- Okay. Accepted. Let's do it with the properties --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Issue Comment Deleted] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1537: - Comment: was deleted (was: Implementing first a decision tree algorithm is definitely the right way to go. If you implemented it, then it would be an awesome contribution to Flink. And I think it's the best way to get used to Flink's API. Thus, it's a win-win situation :-) Look at the recently opened [machine learning PR|https://github.com/apache/flink/pull/479] which loosely defines interfaces for {{Learner}} and {{Transformer}}. A {{Learner}} is an algorithm which takes a {{DataSet[A]}} and fits a model to this data. In the case of a decision tree, the input data would be a labeled vector and the output would be the learned tree. A {{Transformer}} simply takes a {{DataSet[A]}} and transforms it into a {{DataSet[B]}}. A feature extractor or data whitening would be an example for that. {{Transformer}} can be arbitrarily chained as long as their types match. A {{Learner}} terminates a transformer pipeline. If you stuck to this model with your implementation, then one could prepend any {{Transformer}} to the decision tree learner. This makes creating a data analysis pipeline really easy. If I can help you with the implementation, then let me know. A deep learning framework is also something really intriguing but at the same time highly ambitious. So far, we haven't made an effort implementing deep learning algorithms with Flink. I know that there is the [H2O project|https://github.com/h2oai/h2o-dev] which does distributed deep learning. However, their underlying data model is different from ours. If I'm not mistaken, then they store the data column-wise whereas we store them row-wise. I don't know what difference this makes. The first thing would probably be to evaluate Flink's potential for deep learning and then to come up with a prototype.) 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1690) ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357146#comment-14357146 ] Stephan Ewen commented on FLINK-1690: - I will create a patch with increased timeout... ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis -- Key: FLINK-1690 URL: https://issues.apache.org/jira/browse/FLINK-1690 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor I got the following error on Travis. {code} ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:244 The program did not finish in time {code} I think we have to increase the timeouts for this test case to make it reliably run on Travis. The log of the failed Travis build can be found [here|https://api.travis-ci.org/jobs/53952486/log.txt?deansi=true] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357163#comment-14357163 ] Till Rohrmann commented on FLINK-1537: -- Implementing first a decision tree algorithm is definitely the right way to go. If you implemented it, then it would be an awesome contribution to Flink. And I think it's the best way to get used to Flink's API. Thus, it's a win-win situation :-) Look at the recently opened [machine learning PR|https://github.com/apache/flink/pull/479] which loosely defines interfaces for {{Learner}} and {{Transformer}}. A {{Learner}} is an algorithm which takes a {{DataSet[A]}} and fits a model to this data. In the case of a decision tree, the input data would be a labeled vector and the output would be the learned tree. A {{Transformer}} simply takes a {{DataSet[A]}} and transforms it into a {{DataSet[B]}}. A feature extractor or data whitening would be an example for that. {{Transformer}} can be arbitrarily chained as long as their types match. A {{Learner}} terminates a transformer pipeline. If you stuck to this model with your implementation, then one could prepend any {{Transformer}} to the decision tree learner. This makes creating a data analysis pipeline really easy. If I can help you with the implementation, then let me know. A deep learning framework is also something really intriguing but at the same time highly ambitious. So far, we haven't made an effort implementing deep learning algorithms with Flink. I know that there is the [H2O project|https://github.com/h2oai/h2o-dev] which does distributed deep learning. However, their underlying data model is different from ours. If I'm not mistaken, then they store the data column-wise whereas we store them row-wise. I don't know what difference this makes. The first thing would probably be to evaluate Flink's potential for deep learning and then to come up with a prototype. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
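The {{Learner}}/{{Transformer}} model described in the comment above can be illustrated with a simplified, standalone sketch (plain `Seq` stands in for Flink's {{DataSet}}, and the trait shapes follow the comment's wording, not necessarily the PR):

```scala
// Simplified pipeline abstractions: Transformers map one data set to
// another and can be chained; a Learner fits a model and terminates
// the pipeline. Seq replaces DataSet so this runs standalone.
trait Learner[A, Model] {
  def fit(input: Seq[A]): Model
}

trait Transformer[A, B] { self =>
  def transform(input: Seq[A]): Seq[B]

  // Transformers chain arbitrarily as long as their types match.
  def chain[C](next: Transformer[B, C]): Transformer[A, C] =
    new Transformer[A, C] {
      def transform(input: Seq[A]): Seq[C] =
        next.transform(self.transform(input))
    }

  // A Learner terminates a transformer pipeline.
  def chain[M](learner: Learner[B, M]): Learner[A, M] =
    new Learner[A, M] {
      def fit(input: Seq[A]): M = learner.fit(self.transform(input))
    }
}
```

For a decision tree, the `Learner`'s input would be labeled vectors and its model the learned tree; any `Transformer` (a feature extractor, say) could then be prepended.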
[jira] [Resolved] (FLINK-1654) Wrong scala example of POJO type in documentation
[ https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1654. -- Resolution: Fixed Fixed with fd9ca4defd54fa150d33d042471e381e0a0a1164 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78308404 Thanks for the fix! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (FLINK-1691) Improve CountCollectITCase
Stephan Ewen created FLINK-1691: --- Summary: Improve CountCollectITCase Key: FLINK-1691 URL: https://issues.apache.org/jira/browse/FLINK-1691 Project: Flink Issue Type: Bug Components: test Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Maximilian Michels Fix For: 0.9 The CountCollectITCase logs heavily and does not reuse the same cluster across multiple tests. Both can be addressed by letting it extend the MultipleProgramsTestBase -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1689) Add documentation on streaming file sinks interaction with the batch outputformat
Márton Balassi created FLINK-1689: - Summary: Add documentation on streaming file sinks interaction with the batch outputformat Key: FLINK-1689 URL: https://issues.apache.org/jira/browse/FLINK-1689 Project: Flink Issue Type: Task Components: Streaming Reporter: Márton Balassi -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26227591 --- Diff: docs/_layouts/default.html --- @@ -23,16 +23,25 @@ <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <title>Apache Flink: {{ page.title }}</title> -<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon"> -<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon"> -<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css"> -<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css"> -<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css"> -<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css"> -<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet"> +<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon"> +<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon"> +<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css"> +<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css"> +<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css"> +<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css"> +<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet"> --- End diff -- Yes, this should fix the problem. I'll add it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78298992 I set the ```{{ site.baseurl }}``` to http://ci.apache.org/projects/flink/flink-docs-master and the local preview is started with ```--baseurl ```. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357577#comment-14357577 ] Fabian Hueske commented on FLINK-1388: -- That is right, but the OutputFormat itself is also serialized and shipped to the TaskManager in the cluster where it is executed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78311798 The reason why I asked regarding Scala 2.11.6 was because this version is shown on the scala-lang website next to the download button. Also, on Maven Central, you get the impression that it is released: http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.scala-lang%22%20AND%20a%3A%22scala-library%22 Why aren't we adding a _2.11 suffix to the Scala 2.11 Flink builds? We can do this, and it certainly makes sense if you want to ship pre-builds of both versions. With the current setup, if you want to use Flink with 2.11 you have to build and install the maven projects yourself (I'm blindly following the Spark model here, let me know if you prefer the other option). That's not entirely true: users have to build Flink with 2.11 themselves and then build their project with `-Dscala-2.11` to activate the profile inside our pom (on `install` maven does not rewrite poms). So I'm in favor of adding a `_2.11` suffix to all affected Flink artifacts. But this is really a major change that we really need to discuss on the mailing list. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Comment Edited] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357238#comment-14357238 ] Sachin Goel edited comment on FLINK-1537 at 3/11/15 5:23 PM: - Yes. I completely agree with the transformer-learner chain design methodology. The decision tree I'll write will provide an interface for first specifying the structure of the data, i.e., the tuple, as in the types, ranges, etc. and any other statistics possible to help with the learning. I do not see myself how it makes a difference to store the data column wise or row wise, although it might have some far-reaching consequences on how the learning process proceeds. In fact, this seems like a valid idea for a learning process which treats each coordinate one by one. It might help in providing all the attributes of one particular coordinate in one go and learn some statistics on it, which might help in a better learning process. In fact, in a decision tree implementation on big data, it becomes prudent to learn such a statistic to ensure only a reasonable number of splits on the data are considered. I will look into how this could be achieved with a row-style data representation. As for the deep learning framework, you are indeed right. I am not sure myself if anyone has yet evaluated the potential of a deep learning system on a distributed system. I will look into the H2O project's implementation related to this. As of yet, I'm still not sure if deep learning can be as fast on distributed systems as it is on GPUs. was (Author: sachingoel0101): Yes. I completely agree with the transformer-learner chain design methodology. The decision I'll write will provide an interface for first specifying the structure of the data, i.e., the tuple, as in the types, ranges, etc. and any other statistics possible to help with the learning. 
I do not see myself how it makes a difference to store the data column wise or row wise, although it might have some far-reaching consequences on how the learning process proceeds. In fact, this seems like a valid idea for a learning process which treats each coordinate one by one. It might help in providing all the attributes of one particular coordinate in one go and learn some statistics on it, which might help in a better learning process. In fact, in a decision tree implementation on big data, it becomes prudent to learn such a statistic to ensure only a reasonable number of splits on the data are considered. I will look into how this could be achieved with a row-style data representation. As for the deep learning framework, you are indeed right. I am not sure myself if anyone has yet evaluated the potential of a deep learning system on a distributed system. I will look into the H2O project's implementation related to this. As of yet, I'm still not sure if deep learning can be as fast on distributed systems as it is on GPUs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357418#comment-14357418 ] Adnan Khan commented on FLINK-1388: --- So I understand: why does it matter whether TypeInformation is serializable or not? Aren't we just using it to access the fields we're going to write out? POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
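The feature request boils down to a small reflective core: enumerate a POJO's fields and join their values with the field delimiter. The sketch below is hypothetical (only the MyPojo shape comes from the issue; it is not Flink's actual writeAsCsv implementation), and it glosses over ordering: getFields() does not guarantee declaration order, so a real implementation would fix an order, e.g. via the field names from the type information.

```java
import java.lang.reflect.Field;

// Hypothetical sketch of the reflective core a POJO writeAsCsv could use.
public class PojoCsvDemo {
    public static class MyPojo {
        public String a;
        public int b;
        public MyPojo(String a, int b) { this.a = a; this.b = b; }
    }

    // Build one CSV row by reading the POJO's public fields reflectively.
    // Note: Class#getFields() does not guarantee a field order; a real
    // implementation would sort or use the order from the TypeInformation.
    static String toCsvRow(Object pojo) throws IllegalAccessException {
        StringBuilder sb = new StringBuilder();
        Field[] fields = pojo.getClass().getFields();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(fields[i].get(pojo));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toCsvRow(new MyPojo("Hello World", 42)));
    }
}
```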
[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357238#comment-14357238 ] Sachin Goel commented on FLINK-1537: Yes. I completely agree with the transformer-learner chain design methodology. The decision tree I'll write will provide an interface for first specifying the structure of the data, i.e., the tuple: the types, ranges, etc., and any other statistics that could help with the learning. I do not myself see how it makes a difference to store the data column-wise or row-wise, although it might have some far-reaching consequences for how the learning process proceeds. In fact, this seems like a valid idea for a learning process that treats the coordinates one by one. It might help to provide all the attributes of one particular coordinate in one go and learn some statistics on it, which might enable a better learning process. In a decision tree implementation on big data, it becomes prudent to learn such a statistic to ensure only a reasonable number of splits on the data are considered. I will look into how this could be achieved with a row-style data representation. As for the deep learning framework, you are indeed right. I am not sure myself whether anyone has yet evaluated the potential of a deep learning system on a distributed system. I will look into the H2O project's implementation related to this. As yet, I'm still not sure whether deep learning can be as fast on distributed systems as it is on GPUs.
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78333617 I asked about Scala 2.11.6 because that version is shown on the scala-lang website next to the download button. Snap, I guess Christmas came early this year :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user aalexandrov commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78348901 I rebased on the current master (includes the changes from PR #454 merged today). I'll take a look at the errors thrown on building later.
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357717#comment-14357717 ] Adnan Khan commented on FLINK-1388: --- Does that make any references to TypeInformation in the OutputFormat unusable once it's shipped to the TaskManager on the cluster? Or does it just not guarantee anything about non-serializable class members like java.lang.reflect.Field?
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357866#comment-14357866 ] Fabian Hueske commented on FLINK-1388: -- It just won't work and will throw an exception that kills the job. You cannot serialize an object that has a non-serializable member variable (at least not with standard Java serialization, which Flink uses to ship function and input/output format objects).
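Fabian's point can be reproduced with plain JDK serialization: an object holding a java.lang.reflect.Field fails to serialize, and the usual remedy is to mark the reflective members transient and re-resolve them by name after deserialization. All class names below are a hypothetical sketch, not Flink's actual output format code:

```java
import java.io.*;
import java.lang.reflect.Field;

class MyPojo implements Serializable {
    public String a;
    public int b;
}

// Naive approach: holds Field objects directly. java.lang.reflect.Field is
// not Serializable, so standard Java serialization of this object fails.
class NaiveFormat implements Serializable {
    Field[] fields = MyPojo.class.getFields();
}

// Usual fix: keep only serializable field *names*, mark the reflective
// members transient, and re-resolve them after shipping (e.g. in open()).
class FixedFormat implements Serializable {
    transient Field[] fields;           // skipped during serialization
    private final String[] fieldNames;  // serializable lookup keys
    FixedFormat(String... names) { fieldNames = names; }
    void open() throws NoSuchFieldException {  // runs on the remote side
        fields = new Field[fieldNames.length];
        for (int i = 0; i < fieldNames.length; i++) {
            fields[i] = MyPojo.class.getField(fieldNames[i]);
        }
    }
}

public class SerializationDemo {
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        try {
            serialize(new NaiveFormat());
            System.out.println("naive: ok");
        } catch (NotSerializableException e) {
            System.out.println("naive: NotSerializableException");  // this branch runs
        }
        FixedFormat f = new FixedFormat("a", "b");
        serialize(f);   // succeeds: the Field[] is transient
        f.open();       // fields re-resolved after "shipping"
        System.out.println("fixed: " + f.fields.length + " fields resolved");
    }
}
```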
[jira] [Commented] (FLINK-1676) enableForceKryo() is not working as expected
[ https://issues.apache.org/jira/browse/FLINK-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356460#comment-14356460 ] ASF GitHub Bot commented on FLINK-1676: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/473#discussion_r26194005 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java --- @@ -310,6 +310,9 @@ public int getFieldIndex(String fieldName) { @Override public TypeSerializer<T> createSerializer(ExecutionConfig config) { + if (config.isForceKryoEnabled()) { + return new GenericTypeInfo<T>(this.typeClass).createSerializer(config); --- End diff -- I thought it makes sense to not have the switch, since the flag is called forceKryo... enableForceKryo() is not working as expected Key: FLINK-1676 URL: https://issues.apache.org/jira/browse/FLINK-1676 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger In my Flink job, I've set the following execution config {code} final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); env.getConfig().disableObjectReuse(); env.getConfig().enableForceKryo(); {code} Setting a breakpoint in the {{PojoSerializer()}} constructor, you'll see that we still serialize data with the POJO serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1676] Rework ExecutionConfig.enableForc...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/473#discussion_r26194005 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java --- @@ -310,6 +310,9 @@ public int getFieldIndex(String fieldName) { @Override public TypeSerializer<T> createSerializer(ExecutionConfig config) { + if (config.isForceKryoEnabled()) { + return new GenericTypeInfo<T>(this.typeClass).createSerializer(config); --- End diff -- I thought it makes sense to not have the switch, since the flag is called forceKryo...
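The change under review follows a simple pattern: a configuration flag short-circuits the specialized serializer and falls back to the generic path. A stripped-down, hypothetical mirror of that control flow (these are not Flink's actual classes, just the pattern from the diff):

```java
// Simplified sketch of "force flag overrides the specialized serializer".
interface Serializer<T> { String name(); }

class Config {
    private boolean forceGeneric;
    void enableForceGeneric() { forceGeneric = true; }
    boolean isForceGenericEnabled() { return forceGeneric; }
}

class GenericTypeInfo<T> {
    Serializer<T> createSerializer(Config config) {
        return () -> "generic";   // stands in for the Kryo-based serializer
    }
}

class PojoTypeInfo<T> {
    Serializer<T> createSerializer(Config config) {
        if (config.isForceGenericEnabled()) {
            // The force flag wins unconditionally, as argued in the review comment.
            return new GenericTypeInfo<T>().createSerializer(config);
        }
        return () -> "pojo";      // the specialized POJO serializer
    }
}

public class ForceFlagDemo {
    public static void main(String[] args) {
        Config cfg = new Config();
        System.out.println(new PojoTypeInfo<String>().createSerializer(cfg).name());
        cfg.enableForceGeneric();
        System.out.println(new PojoTypeInfo<String>().createSerializer(cfg).name());
    }
}
```

Without the flag the specialized serializer is chosen; with it, the generic one, which is exactly the behavior the bug report said was missing.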
[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation
[ https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356488#comment-14356488 ] ASF GitHub Bot commented on FLINK-1654: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78223694 Hi Chiwan, the documentation example is actually working. In order to be treated as a POJO, you either need getters/setters or the fields have to be public. Wrong scala example of POJO type in documentation - Key: FLINK-1654 URL: https://issues.apache.org/jira/browse/FLINK-1654 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 0.9 Reporter: Chiwan Park Assignee: Chiwan Park Priority: Trivial In [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos], there is a scala example of a POJO {code} class WordWithCount(val word: String, val count: Int) { def this() { this(null, -1) } } {code} I think this is wrong because a Flink POJO requires public fields or private fields with getters and setters. Fields in a Scala class are private by default. We should change the field declarations to use the `var` keyword, or change the class declaration to a case class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1594) DataStreams don't support self-join
[ https://issues.apache.org/jira/browse/FLINK-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356379#comment-14356379 ] ASF GitHub Bot commented on FLINK-1594: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/472#issuecomment-78211739 What about the stuck test? Did you look into it or did just a re run of Travis pass? DataStreams don't support self-join --- Key: FLINK-1594 URL: https://issues.apache.org/jira/browse/FLINK-1594 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Environment: flink-0.9.0-SNAPSHOT Reporter: Daniel Bali Assignee: Gábor Hermann Trying to window-join a DataStream with itself will result in exceptions. I get the following stack trace: {noformat} java.lang.Exception: Error setting up runtime environment: Union buffer reader must be initialized with at least two individual buffer readers at org.apache.flink.runtime.execution.RuntimeEnvironment.init(RuntimeEnvironment.java:173) at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:419) at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:44) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at 
org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.IllegalArgumentException: Union buffer reader must be initialized with at least two individual buffer readers at org.apache.flink.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:125) at org.apache.flink.runtime.io.network.api.reader.UnionBufferReader.init(UnionBufferReader.java:69) at org.apache.flink.streaming.api.streamvertex.CoStreamVertex.setConfigInputs(CoStreamVertex.java:101) at org.apache.flink.streaming.api.streamvertex.CoStreamVertex.setInputsOutputs(CoStreamVertex.java:63) at org.apache.flink.streaming.api.streamvertex.StreamVertex.registerInputOutput(StreamVertex.java:65) at org.apache.flink.runtime.execution.RuntimeEnvironment.init(RuntimeEnvironment.java:170) ... 20 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1629][FLINK-1630][FLINK-1547] Rework Fl...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/468#issuecomment-78242400 TODO: add documentation.
[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts
[ https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356717#comment-14356717 ] ASF GitHub Bot commented on FLINK-1605: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/454 Create a shaded Hadoop fat jar to resolve library version conflicts --- Key: FLINK-1605 URL: https://issues.apache.org/jira/browse/FLINK-1605 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger As per mailing list discussion: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/454
[jira] [Resolved] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts
[ https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1605. --- Resolution: Fixed Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/84e76f4d
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26203612 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java --- @@ -64,26 +70,45 @@ private transient int commentCount; private transient int invalidLineCount; + + private CompositeType<OUT> typeInformation = null; + + private String[] fieldsMap = null; + public CsvInputFormat(Path filePath, TypeInformation<OUT> typeInformation) { + this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, typeInformation); + } - public CsvInputFormat(Path filePath) { - super(filePath); - } - - public CsvInputFormat(Path filePath, Class<?>... types) { - this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, types); - } - - public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, Class<?>... types) { + public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, TypeInformation<OUT> typeInformation) { super(filePath); + Preconditions.checkArgument(typeInformation instanceof CompositeType); + this.typeInformation = (CompositeType<OUT>) typeInformation; + setDelimiter(lineDelimiter); setFieldDelimiter(fieldDelimiter); - setFieldTypes(types); + Class<?>[] classes = new Class<?>[typeInformation.getArity()]; + for (int i = 0, arity = typeInformation.getArity(); i < arity; i++) { + classes[i] = this.typeInformation.getTypeAt(i).getTypeClass(); + } + setFieldTypes(classes); + + if (typeInformation instanceof PojoTypeInfo) { + setFieldsMap(this.typeInformation.getFieldNames()); + setAccessibleToField(); + } } - - + + public void setAccessibleToField() { --- End diff -- Can we make this method private?
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356725#comment-14356725 ] ASF GitHub Bot commented on FLINK-1512: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26203612 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java --- @@ -64,26 +70,45 @@ private transient int commentCount; private transient int invalidLineCount; + + private CompositeType<OUT> typeInformation = null; + + private String[] fieldsMap = null; + public CsvInputFormat(Path filePath, TypeInformation<OUT> typeInformation) { + this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, typeInformation); + } - public CsvInputFormat(Path filePath) { - super(filePath); - } - - public CsvInputFormat(Path filePath, Class<?>... types) { - this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, types); - } - - public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, Class<?>... types) { + public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, TypeInformation<OUT> typeInformation) { super(filePath); + Preconditions.checkArgument(typeInformation instanceof CompositeType); + this.typeInformation = (CompositeType<OUT>) typeInformation; + setDelimiter(lineDelimiter); setFieldDelimiter(fieldDelimiter); - setFieldTypes(types); + Class<?>[] classes = new Class<?>[typeInformation.getArity()]; + for (int i = 0, arity = typeInformation.getArity(); i < arity; i++) { + classes[i] = this.typeInformation.getTypeAt(i).getTypeClass(); + } + setFieldTypes(classes); + + if (typeInformation instanceof PojoTypeInfo) { + setFieldsMap(this.typeInformation.getFieldNames()); + setAccessibleToField(); + } } - - + + public void setAccessibleToField() { --- End diff -- Can we make this method private? Add CsvReader for reading into POJOs. 
- Key: FLINK-1512 URL: https://issues.apache.org/jira/browse/FLINK-1512 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter Currently, the {{CsvReader}} supports only TupleXX types. It would be nice if users were also able to read into POJOs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/477#discussion_r26195550 --- Diff: flink-scala/pom.xml --- @@ -236,4 +230,23 @@ under the License. </plugins> </build> + <profiles> + <profile> + <id>scala-2.10</id> + <activation> + <property> + <!-- this is the default scala profile --> + <name>!scala-2.11.profile</name> --- End diff -- Why are you activating the profile via this property? You can just do mvn install -Pscala-2.10 to activate the profile.
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/477#discussion_r26195970 --- Diff: pom.xml --- @@ -599,6 +599,38 @@ under the License. profiles profile + <id>scala-2.10</id> + <activation> + <property> + <!-- this is the default scala profile --> + <name>!scala-2.11.profile</name> + </property> + </activation> + <properties> + <!-- Scala / Akka --> + <scala.version>2.10.4</scala.version> + <scala.binary.version>2.10</scala.binary.version> + <scala.macros.version>2.0.1</scala.macros.version> + <akka.version>2.3.7</akka.version> --- End diff -- The akka and scala macros versions seem to be independent of the profile. I would put them back in the `properties` section above.
[jira] [Created] (FLINK-1685) Document how to read gzip/compressed files with Flink
Robert Metzger created FLINK-1685: - Summary: Document how to read gzip/compressed files with Flink Key: FLINK-1685 URL: https://issues.apache.org/jira/browse/FLINK-1685 Project: Flink Issue Type: Bug Components: Documentation Reporter: Robert Metzger Too many users asked for it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1685) Document how to read gzip/compressed files with Flink
[ https://issues.apache.org/jira/browse/FLINK-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356736#comment-14356736 ] Robert Metzger commented on FLINK-1685: --- The task probably boils down to putting this into an appropriate location in the documentation: http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/read-gz-files-td760.html
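The mailing-list answer the documentation would cover amounts to: a gzip file is not splittable, but each file can be wrapped in a decompressing stream before the record reader sees it. A plain-JDK sketch of that wrapping (independent of Flink's actual InputFormat hooks, which are not shown here):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipReadDemo {
    // Read all lines from a .gz file by wrapping the raw file stream
    // in a GZIPInputStream before handing it to the line reader.
    static List<String> readGzipLines(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small gzip-compressed sample file, then read it back.
        File f = File.createTempFile("demo", ".gz");
        f.deleteOnExit();
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(f)), StandardCharsets.UTF_8)) {
            w.write("Hello World, 42\nHello World 2, 47\n");
        }
        System.out.println(readGzipLines(f));  // [Hello World, 42, Hello World 2, 47]
    }
}
```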
[jira] [Commented] (FLINK-1629) Add option to start Flink on YARN in a detached mode
[ https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356750#comment-14356750 ] ASF GitHub Bot commented on FLINK-1629: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/468#issuecomment-78247421 Nice work. Looks good to me apart from the missing documentation. Add option to start Flink on YARN in a detached mode Key: FLINK-1629 URL: https://issues.apache.org/jira/browse/FLINK-1629 Project: Flink Issue Type: Improvement Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger Right now, we expect the YARN command line interface to be connected with the Application Master all the time to control the yarn session or the job. For very long running sessions or jobs users want to just fire and forget a job/session to YARN. Stopping the session will still be possible using YARN's tools. Also, prior to detaching itself, the CLI frontend could print the required command to kill the session as a convenience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1629) Add option to start Flink on YARN in a detached mode
[ https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356715#comment-14356715 ] ASF GitHub Bot commented on FLINK-1629: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/468#issuecomment-78242400 TODO: add documentation.
[GitHub] flink pull request: [FLINK-1629][FLINK-1630][FLINK-1547] Rework Fl...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/468#issuecomment-78247421 Nice work. Looks good to me apart from the missing documentation.
[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation
[ https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356513#comment-14356513 ] ASF GitHub Bot commented on FLINK-1654: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78228121 As I wrote in JIRA, fields of a Scala class (not a case class) are private. ([Reference](http://stackoverflow.com/questions/1589603/scala-set-a-field-value-reflectively-from-field-name)) Because fields declared with the `val` keyword don't have setters, Flink's TypeExtractor fails to extract information from the POJO example in the documentation. I attached the result of a sample type extraction in [JIRA](https://issues.apache.org/jira/browse/FLINK-1654?focusedCommentId=14348124page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14348124). Flink's TypeExtractor handles Scala case classes differently: case classes are treated like Scala tuples.
[jira] [Created] (FLINK-1684) Make Kafka connectors read/write a partition the worker is on
Gábor Hermann created FLINK-1684: Summary: Make Kafka connectors read/write a partition the worker is on Key: FLINK-1684 URL: https://issues.apache.org/jira/browse/FLINK-1684 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gábor Hermann Kafka connectors could read/write partitions on a different machine. It is a best effort to find the partitions located on the same node as the subtask and read from (or write to) that partition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation
[ https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356525#comment-14356525 ] ASF GitHub Bot commented on FLINK-1654: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78229903 Then I guess that the Scala API does not identify the ```WordWithCount``` as a POJO but as something different.
[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation
[ https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356504#comment-14356504 ] ASF GitHub Bot commented on FLINK-1654: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78226428 Curious why this works. We should merge this pull request anyways, because it shows the recommended way to do things.
[jira] [Comment Edited] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355904#comment-14355904 ] Robert Metzger edited comment on FLINK-1525 at 3/11/15 9:01 AM: Hi, great. We're always happy about new contributors. The idea behind such a tool is to allow users to easily configure and parameterize their functions and code. I think something like this would be really helpful: {code} // inArgs := --input hdfs:///in --output hdfs:///out --readers 3 -DignoreTerm=abc -DfilterFactor=0.2 public static void main(String[] inArgs) throws Exception { final ArgsUtil args = new ArgsUtil(inArgs); String input = args.getString("input", true); // true for required String output = args.getString("output", false, "file:///tmp"); // not required, with a default value int readers = args.getInteger("readers"); Configuration extParams = args.getParameters(); // extParams contains extParams.getString("ignoreTerm") and the other -D arguments DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).withParameters(extParams).map(new TermFilter(args)); {code} I think the right location for this is the {{flink-contrib}} package. Also, it's very important to write test cases for your code and to add some documentation, but I think that can follow after a first working prototype. Let me know if you need more information on this. Provide utils to pass -D parameters to UDFs Key: FLINK-1525 URL: https://issues.apache.org/jira/browse/FLINK-1525 Project: Flink Issue Type: Improvement Components: flink-contrib Reporter: Robert Metzger Labels: starter Hadoop users are used to setting job configuration through -D on the command line. Right now, Flink users have to manually parse command line arguments and pass them to their functions. It would be nice to provide a standard argument parser which takes care of such things.
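The proposed `ArgsUtil` does not exist yet; a minimal sketch of how such a parser could work is below. The class name and method signatures follow the comment above, but the implementation is illustrative, not Flink code (the `getParameters()`/`Configuration` part is omitted to keep the sketch self-contained).

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal sketch of the proposed ArgsUtil; names and behavior are illustrative, not Flink API. */
public class ArgsUtil {
    private final Map<String, String> params = new HashMap<>();

    public ArgsUtil(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].startsWith("-D")) {
                // Hadoop-style -Dkey=value
                String[] kv = args[i].substring(2).split("=", 2);
                params.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else if (args[i].startsWith("--")) {
                // GNU-style --key value
                String key = args[i].substring(2);
                String value = (i + 1 < args.length && !args[i + 1].startsWith("-")) ? args[++i] : "";
                params.put(key, value);
            }
        }
    }

    /** Returns the value for key; throws if required and absent. */
    public String getString(String key, boolean required) {
        String v = params.get(key);
        if (v == null && required) {
            throw new IllegalArgumentException("Missing required argument: " + key);
        }
        return v;
    }

    /** Returns the value for key, or the default if absent. */
    public String getString(String key, boolean required, String defaultValue) {
        String v = getString(key, required);
        return v != null ? v : defaultValue;
    }

    public int getInteger(String key) {
        return Integer.parseInt(getString(key, true));
    }
}
```

A usage matching the comment's example would be `new ArgsUtil(new String[]{"--input", "hdfs:///in", "--readers", "3", "-DfilterFactor=0.2"})`, after which `getInteger("readers")` yields 3 and missing optional keys fall back to their defaults.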
[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/454#issuecomment-78234801 Those are some massive changes to the pom files. As far as I can tell, it looks good to me. The ```Akka``` changes look good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78226428 Curious why this works. We should merge this pull request anyways, because it shows the recommended way to do things.
[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78228121 As I wrote in JIRA, fields of a Scala class (not a case class) are private. ([Reference](http://stackoverflow.com/questions/1589603/scala-set-a-field-value-reflectively-from-field-name)) Because fields declared with the `val` keyword have no setter, Flink's TypeExtractor fails to extract type information for the POJO example in the documentation. I attached the result of a sample type extraction in [JIRA](https://issues.apache.org/jira/browse/FLINK-1654?focusedCommentId=14348124&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14348124). Flink's TypeExtractor handles Scala case classes differently: case classes are treated like Scala tuples.
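The failure mode described above can be illustrated with a small reflection check. This is not Flink's actual TypeExtractor; it is a hedged sketch of the general rule the comment relies on: a non-public field is usable for bean-style access only if matching getter and setter methods exist. The nested classes are Java analogues of a Scala `val` (getter only) and `var` (getter plus setter).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Illustrative sketch (not Flink's TypeExtractor): a field qualifies as a
 *  POJO-style field if it is public, or if getX()/setX() bean methods exist. */
public class PojoFieldCheck {

    public static boolean isValidPojoField(Class<?> clazz, Field field) {
        if (Modifier.isPublic(field.getModifiers())) {
            return true;
        }
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            clazz.getMethod("get" + suffix);                  // getter: no arguments
            clazz.getMethod("set" + suffix, field.getType()); // setter: one argument of the field type
            return true;
        } catch (NoSuchMethodException e) {
            return false; // private field without bean accessors -> not usable
        }
    }

    /** Convenience wrapper that looks the field up by name. */
    public static boolean check(Class<?> clazz, String fieldName) {
        try {
            return isValidPojoField(clazz, clazz.getDeclaredField(fieldName));
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    // Java analogue of a Scala 'val': private field, getter, no setter -> fails the check.
    public static class ValLike {
        private final String word = "hello";
        public String getWord() { return word; }
    }

    // Java analogue of a Scala 'var': getter and setter exist -> passes the check.
    public static class VarLike {
        private int count;
        public int getCount() { return count; }
        public void setCount(int count) { this.count = count; }
    }
}
```

This mirrors why the documentation fix switches the example to `var` fields (or a case class, which Flink handles by a separate code path).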
[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78229903 Then I guess that the Scala API does not identify the ```WordWithCount``` as a POJO but as something different.
[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/454#discussion_r26199162

--- Diff: flink-shaded-hadoop/pom.xml ---
@@ -60,19 +73,36 @@ under the License.
 	<shadedArtifactAttached>false</shadedArtifactAttached>
 	<createDependencyReducedPom>true</createDependencyReducedPom>
 	<dependencyReducedPomLocation>${project.basedir}/target/dependency-reduced-pom.xml</dependencyReducedPomLocation>
-	<artifactSet>
-		<includes>
-			<include>com.google.guava:guava</include>
-		</includes>
-	</artifactSet>
+	<filters>
+		<filter>
+			<!-- maybe slf4j-jdk14 -->
+			<artifact>org.slf4j:*</artifact>
+			<excludes>
+				<exclude>org/slf4j/impl/StaticLoggerBinder*</exclude>
+			</excludes>
+		</filter>
+	</filters>
+	<transformers>
+		<!-- The service transformer is needed to merge META-INF/services files -->
+		<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+	</transformers>
 	<relocations>
 		<relocation>
 			<pattern>com.google</pattern>
-			<shadedPattern>org.apache.flink.shaded.com.google</shadedPattern>
+			<shadedPattern>org.apache.flink.hadoop.shaded.com.google</shadedPattern>
 			<excludes>
-				<exclude>com.google.protobuf.**</exclude>
+				<!-- <exclude>com.google.protobuf.**</exclude> -->
--- End diff --

Why commenting out and not removing?
[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts
[ https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356650#comment-14356650 ] ASF GitHub Bot commented on FLINK-1605: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/454#discussion_r26199162 Why commenting out and not removing? Create a shaded Hadoop fat jar to resolve library version conflicts --- Key: FLINK-1605 URL: https://issues.apache.org/jira/browse/FLINK-1605 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger As per mailing list discussion: http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html
[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/478#issuecomment-78230090 Maybe it is treated as a ```GenericType```
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/477#issuecomment-78234335 Why are you using Scala 2.11.4? The mentioned bug seems to be fixed in 2.11.6. Why aren't we adding a `_2.11` suffix to the Scala 2.11 Flink builds? Otherwise, users who want to use Scala 2.11 in their projects always have to set the profile/property.
[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/454#issuecomment-78235211 Thanks for the review. I'll address your comments and then merge it.
[GitHub] flink pull request: Add support for building Flink with Scala 2.11
Github user aalexandrov commented on a diff in the pull request: https://github.com/apache/flink/pull/477#discussion_r26210582

--- Diff: pom.xml ---
@@ -599,6 +599,38 @@ under the License.
 	<profiles>
 		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<!-- this is the default scala profile -->
+					<name>!scala-2.11.profile</name>
+				</property>
+			</activation>
+			<properties>
+				<!-- Scala / Akka -->
+				<scala.version>2.10.4</scala.version>
+				<scala.binary.version>2.10</scala.binary.version>
+				<scala.macros.version>2.0.1</scala.macros.version>
+				<akka.version>2.3.7</akka.version>
--- End diff --

OK.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/479#issuecomment-78257904 Really nice work, Till! Very nicely documented. Just made a few comments concerning the paths for the documentation.
[GitHub] flink pull request: Kick off of Flink's machine learning library
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/479 Kick off of Flink's machine learning library This PR contains the kick off of Flink's machine learning library. Currently it contains implementations for ALS, multiple linear regression and polynomial base feature mapper. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink flink-ml Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/479.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #479 commit 71afb4ba942840a3bc37c9b67749a8ebbd0ae41b Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-27T12:43:46Z [ml] Initial commit to establish module structure. Adds simple Vector and Matrix types. commit 5e0e42c4746dcca4f266e7a946edf8d07173820e Author: Till Rohrmann trohrm...@apache.org Date: 2015-03-02T13:39:18Z [ml] Adds batch gradient descent linear regression with l2 norm [ml] Adds batch gradient descent linear regression with convergence criterion as relative change in sum of squared residuals [ml] Adds comments to MultipleLinearRegression commit 2a04c750d55922b359cec80c0d46a49001f38cca Author: Till Rohrmann trohrm...@apache.org Date: 2015-03-05T13:05:59Z [ml] Adds alternating least squares (ALS) implementation with test case [ml] Adds comments to ALS commit 933e263609558547591779c7543079d62a5f9a83 Author: Till Rohrmann trohrm...@apache.org Date: 2015-03-06T14:27:25Z [ml] Introduces FlinkTools containing persist methods. 
[ml] Changes comments into proper ScalaDoc in MultipleLinearRegression commit be01b5132589d5a4337887b2e26dbd299959655f Author: Till Rohrmann trohrm...@apache.org Date: 2015-03-09T17:10:52Z [ml] Adds polynomial base feature mapper and test cases [ml] Adds comments to PolynomialBase commit d923033eafc5fffd49a1ea587af453d779ff2b65 Author: Till Rohrmann trohrm...@apache.org Date: 2015-03-10T11:26:37Z [ml] Adds web documentation for multiple linear regression. Changes website links from relative to absolute. commit 2340f7283fbd0c6ffbe18d2d62d76f5abc628e28 Author: Till Rohrmann trohrm...@apache.org Date: 2015-03-10T14:41:40Z [ml] Adds web documentation for alternating least squares. Adds web documentation for polynomial base feature mapper. [ml] Adds comments [ml] Set degree of parallelism of test suites to 2 [ml] Replaces FlatSpec tests with JUnit integration test cases in order to suppress the sysout output.
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26205234

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
--- End diff --

The ``includeMask`` refers to the fields in the CSV file and allows skipping fields of the file. For example, if a line in your file looks like ``Sam,Smith,09-15-1963,123.123`` and you only want to read the first name and the date field, you would set the ``includeMask`` to ``[true, false, true]`` (missing fields are treated as ``false``). So the ``includeMask`` should not depend on the ``fieldsMap``, but the number of ``true`` entries in the ``includeMask`` must be equal to the number of fields in the ``fieldsMap``.
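The invariant described in the review comment can be sketched in a few lines. This is not Flink's CsvInputFormat; the class and method names below are illustrative, showing only the relation between the ``includeMask`` (which CSV columns to read) and the ``fieldsMap`` (which POJO field each included column maps to, in order).

```java
/** Illustrative check (not Flink code): the includeMask selects CSV columns;
 *  each included column maps, in order, to one POJO field name in fieldsMap. */
public class IncludeMaskExample {

    public static String[] project(String[] csvColumns, boolean[] includeMask, String[] fieldsMap) {
        // Invariant from the review comment: #true entries in includeMask == fieldsMap.length.
        int included = 0;
        for (boolean b : includeMask) {
            if (b) { included++; }
        }
        if (included != fieldsMap.length) {
            throw new IllegalArgumentException("Number of included columns (" + included
                    + ") must equal number of mapped fields (" + fieldsMap.length + ")");
        }
        // Build "field=value" pairs for the included columns only; columns past the
        // end of the mask are treated as excluded (mask entries default to false).
        String[] result = new String[included];
        int f = 0;
        for (int i = 0; i < csvColumns.length && i < includeMask.length; i++) {
            if (includeMask[i]) {
                result[f] = fieldsMap[f] + "=" + csvColumns[i];
                f++;
            }
        }
        return result;
    }
}
```

With the comment's example line ``Sam,Smith,09-15-1963,123.123``, a mask of ``[true, false, true]`` and field names ``firstName, birthDate`` would yield exactly the two selected column/field pairs.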
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206307

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}
 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
--- End diff --

May not be null.
[jira] [Created] (FLINK-1686) Streaming iteration heads cannot be instantiated
Paris Carbone created FLINK-1686: Summary: Streaming iteration heads cannot be instantiated Key: FLINK-1686 URL: https://issues.apache.org/jira/browse/FLINK-1686 Project: Flink Issue Type: Bug Components: Streaming Reporter: Paris Carbone Priority: Critical It looks like streaming jobs with iterations and dop > 1 do not work currently. From what I see, when the TaskManager tries to instantiate a new RuntimeEnvironment for the iteration head tasks, it fails since the following exception is thrown: java.lang.Exception: Failed to deploy the task Map (2/8) - execution #0 to slot SimpleSlot (0)(1) - 0e39fcabcab3e8543cc2d8320f9de783 - ALLOCATED/ALIVE: java.lang.Exception: Error setting up runtime environment: java.lang.RuntimeException: Could not register the given element, broker slot is already occupied. at org.apache.flink.runtime.execution.RuntimeEnvironment.<init>(RuntimeEnvironment.java:174) at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:432) . . Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Could not register the given element, broker slot is already occupied. at org.apache.flink.streaming.api.streamvertex.StreamIterationHead.setInputsOutputs(StreamIterationHead.java:64) at org.apache.flink.streaming.api.streamvertex.StreamVertex.registerInputOutput(StreamVertex.java:86) at org.apache.flink.runtime.execution.RuntimeEnvironment.<init>(RuntimeEnvironment.java:171) ... 20 more Caused by: java.lang.RuntimeException: Could not register the given element, broker slot is already occupied. at org.apache.flink.runtime.iterative.concurrent.Broker.handIn(Broker.java:39) at org.apache.flink.streaming.api.streamvertex.StreamIterationHead.setInputsOutputs(StreamIterationHead.java:62) The IterateTest passed since it is using a dop of 1, but for higher parallelism it fails. The IterateExample fails as well if you try to run it.
I will debug this once I find some time, so any ideas of what could possibly cause this are more than welcome.
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356765#comment-14356765 ] ASF GitHub Bot commented on FLINK-1512: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26205234 Add CsvReader for reading into POJOs. - Key: FLINK-1512 URL: https://issues.apache.org/jira/browse/FLINK-1512 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter Currently, the {{CsvReader}} supports only TupleXX types. It would be nice if users were also able to read into POJOs.
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356781#comment-14356781 ] ASF GitHub Bot commented on FLINK-1512: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26205969

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
+				includeMask[i] = false;
+				newFieldTypes[i] = null;
+				continue;
+			}
+
+			for (int j = 0; j < fields.length; j++) {
+				if (fields[j].equals(fieldsMap[i])) {
+					includeMask[i] = true;
+					newFieldTypes[i] = fieldTypes[j];
+					break;
+				}
+			}
--- End diff --

Can you throw an exception if the provided field name was not found in the POJO type information?
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356783#comment-14356783 ] ASF GitHub Bot commented on FLINK-1512: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206133

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
--- End diff --

IMO, ``null`` values should not be allowed in the ``fieldsMap``. Can you throw an exception in that case? The ``fieldsMap`` should be a list of fields that are mapped to columns in the CSV file. As said before, the columns that are read by the format are defined by the ``includeMask``.
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206381

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}
 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
+					continue;
+				}
+
+				try {
+					int fieldIndex = typeInformation.getFieldIndex(fieldsMap[i]);
+					pojoTypeInfo.getPojoFieldAt(fieldIndex).field.set(reuse, parsedValues[i]);
--- End diff --

Same for the ``field``.
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356787#comment-14356787 ] ASF GitHub Bot commented on FLINK-1512: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206307 May not be null.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205510

--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
-<script src="{{ site.baseurl }}js/bootstrap.min.js"></script>
-<script src="{{ site.baseurl }}js/codetabs.js"></script>
+<script src="{{ site.baseurl }}/js/bootstrap.min.js"></script>
+<script src="{{ site.baseurl }}/js/codetabs.js"></script>
+
+{% if page.mathjax %}
+<script type="text/x-mathjax-config">
+MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
+</script>
+<script type="text/javascript"
+  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
--- End diff --

ASL2.0, so that's good.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205970

--- Diff: docs/_includes/navbar.html ---
@@ -24,15 +24,15 @@
 We might be on an externally hosted documentation site. Please keep the site.FLINK_WEBSITE_URL below to ensure a link back to the Flink website. {% endcomment %}
-  <a href="{{ site.FLINK_WEBSITE_URL }}index.html" title="Home">
+  <a href="{{ site.FLINK_WEBSITE_URL }}/index.html" title="Home">
     <img class="hidden-xs hidden-sm img-responsive"
-      src="{{ site.baseurl }}img/logo.png" alt="Apache Flink Logo">
+      src="{{ site.baseurl }}/img/logo.png" alt="Apache Flink Logo">
   </a>
   <div class="row visible-xs">
     <div class="col-xs-3">
-      <a href="{{ site.baseurl }}index.html" title="Home">
+      <a href="{{ site.baseurl }}/index.html" title="Home">
--- End diff --

This will point the Home button to http://ci.apache.org instead of the documentation. {{ site.baseurl }} is actually not set in the config. We could set it to the current docs, or just keep the relative links, which worked fine.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205980

--- Diff: docs/_includes/sidenav.html ---
@@ -17,51 +17,51 @@ under the License. -->
 <ul id="flink-doc-sidenav">
-  <li><div class="sidenav-category"><a href="faq.html">FAQ</a></div></li>
+  <li><div class="sidenav-category"><a href="/faq.html">FAQ</a></div></li>
--- End diff --

Same as above.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205978

--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
 <ul class="nav navbar-nav">
   <li>
-    <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+    <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
   </li>
   <li>
-    <a href="api/java/index.html">Javadoc</a>
+    <a href="/api/java/index.html">Javadoc</a>
   </li>
   <li>
-    <a href="api/scala/index.html#org.apache.flink.api.scala.package">Scaladoc</a>
+    <a href="/api/scala/index.html#org.apache.flink.api.scala.package">Scaladoc</a>
--- End diff --

Same as above.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205983

--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
--- End diff --

This will break the loading of stylesheets on our website. {{ site.baseurl }} is actually not set in the config. Thus, the path /css will resolve against http://ci.apache.org.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205976

--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
 <ul class="nav navbar-nav">
   <li>
-    <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+    <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
   </li>
   <li>
-    <a href="api/java/index.html">Javadoc</a>
+    <a href="/api/java/index.html">Javadoc</a>
--- End diff --

Same as above.
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26205969

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
+				includeMask[i] = false;
+				newFieldTypes[i] = null;
+				continue;
+			}
+
+			for (int j = 0; j < fields.length; j++) {
+				if (fields[j].equals(fieldsMap[i])) {
+					includeMask[i] = true;
+					newFieldTypes[i] = fieldTypes[j];
+					break;
+				}
+			}
--- End diff --

Can you throw an exception if the provided field name was not found in the POJO type information?
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26205975

--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
 <ul class="nav navbar-nav">
   <li>
-    <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+    <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
--- End diff --

Please use relative paths. This will fail with our current docs setup because the base url is http://ci.apache.org/projects/flink/
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206133

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
--- End diff --

IMO, ``null`` values should not be allowed in the ``fieldsMap``. Can you throw an exception in that case? The ``fieldsMap`` should be a list of fields that are mapped to columns in the CSV file. As said before, the columns that are read by the format are defined by the ``includeMask``.
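The reviewer's two requests above (reject ``null`` entries, fail on unknown field names) can be sketched as a standalone validation helper. This is an illustrative sketch, not Flink's actual API; the class and method names are hypothetical:

```java
import java.util.Arrays;

public class FieldsMapSketch {

    /**
     * Builds an include mask mapping CSV columns to POJO fields,
     * throwing if an entry is null or names a field the POJO does not have.
     */
    static boolean[] buildIncludeMask(String[] fieldsMap, String[] pojoFields) {
        boolean[] includeMask = new boolean[fieldsMap.length];
        for (int i = 0; i < fieldsMap.length; i++) {
            if (fieldsMap[i] == null) {
                throw new IllegalArgumentException("null entry in fieldsMap at position " + i);
            }
            // Linear scan over the POJO's declared field names.
            boolean found = false;
            for (String pojoField : pojoFields) {
                if (pojoField.equals(fieldsMap[i])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw new IllegalArgumentException(
                    "Field '" + fieldsMap[i] + "' not found in POJO fields " + Arrays.toString(pojoFields));
            }
            includeMask[i] = true;
        }
        return includeMask;
    }

    public static void main(String[] args) {
        boolean[] mask = buildIncludeMask(new String[]{"a", "b"}, new String[]{"a", "b", "c"});
        System.out.println(Arrays.toString(mask)); // [true, true]
    }
}
```

Failing fast here means a misspelled field name surfaces at configuration time rather than silently skipping a column during reading.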
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206329

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}

 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
+					continue;
+				}
+
+				try {
+					int fieldIndex = typeInformation.getFieldIndex(fieldsMap[i]);
--- End diff --

We could compute the index upfront and avoid the String look-up.
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356788#comment-14356788 ] ASF GitHub Bot commented on FLINK-1512: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/426#discussion_r26206329

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}

 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
+					continue;
+				}
+
+				try {
+					int fieldIndex = typeInformation.getFieldIndex(fieldsMap[i]);
--- End diff --

We could compute the index upfront and avoid the String look-up.

Add CsvReader for reading into POJOs. - Key: FLINK-1512 URL: https://issues.apache.org/jira/browse/FLINK-1512 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter Currently, the {{CsvReader}} supports only TupleXX types. It would be nice if users were also able to read into POJOs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
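The "compute the index upfront" suggestion amounts to resolving each column's target field once, when the format is configured, and reusing the handles for every record instead of doing a String lookup per record. A hedged sketch under that assumption, again with illustrative (non-Flink) names:

```java
import java.lang.reflect.Field;

public class PrecomputedIndexSketch {

    // Hypothetical POJO matching a two-column CSV file.
    public static class MyPojo {
        public String a;
        public int b;
    }

    // Reflective handles resolved once at configuration time, reused per record.
    private final Field[] pojoFields;

    PrecomputedIndexSketch(Class<?> pojoClass, String[] fieldsMap) {
        pojoFields = new Field[fieldsMap.length];
        try {
            for (int i = 0; i < fieldsMap.length; i++) {
                // null means the CSV column is not mapped to any POJO field.
                pojoFields[i] = fieldsMap[i] == null ? null : pojoClass.getDeclaredField(fieldsMap[i]);
            }
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Per-record path: no String lookup, just an array access per column.
    MyPojo readRecord(MyPojo reuse, Object[] parsedValues) {
        try {
            for (int i = 0; i < parsedValues.length; i++) {
                if (pojoFields[i] != null) {
                    pojoFields[i].set(reuse, parsedValues[i]);
                }
            }
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        }
        return reuse;
    }

    public static void main(String[] args) {
        PrecomputedIndexSketch format =
            new PrecomputedIndexSketch(MyPojo.class, new String[]{"a", null});
        MyPojo pojo = format.readRecord(new MyPojo(), new Object[]{"hello", 99});
        System.out.println(pojo.a); // second column was skipped
    }
}
```

Moving the lookup out of the per-record loop matters because ``readRecord`` runs once per input line, while the field mapping is fixed for the lifetime of the input format.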
[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.
[ https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357861#comment-14357861 ] ASF GitHub Bot commented on FLINK-1512: --- Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/426#issuecomment-78401359 @fhueske Thanks for your kind advice. I will fix it as soon as possible. Add CsvReader for reading into POJOs. - Key: FLINK-1512 URL: https://issues.apache.org/jira/browse/FLINK-1512 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter Currently, the {{CsvReader}} supports only TupleXX types. It would be nice if users were also able to read into POJOs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/426#issuecomment-78401359 @fhueske Thanks for your kind advice. I will fix it as soon as possible.
[jira] [Closed] (FLINK-1684) Make Kafka connectors read/write a partition the worker is on
[ https://issues.apache.org/jira/browse/FLINK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi closed FLINK-1684. - Resolution: Duplicate Make Kafka connectors read/write a partition the worker is on - Key: FLINK-1684 URL: https://issues.apache.org/jira/browse/FLINK-1684 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gábor Hermann Kafka connectors currently may read/write partitions located on a different machine. As a best-effort improvement, each subtask should read from (or write to) partitions located on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1684) Make Kafka connectors read/write a partition the worker is on
[ https://issues.apache.org/jira/browse/FLINK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356961#comment-14356961 ] Márton Balassi commented on FLINK-1684: --- This is a duplicate of FLINK-1673. Make Kafka connectors read/write a partition the worker is on - Key: FLINK-1684 URL: https://issues.apache.org/jira/browse/FLINK-1684 Project: Flink Issue Type: Improvement Components: Streaming Reporter: Gábor Hermann Kafka connectors currently may read/write partitions located on a different machine. As a best-effort improvement, each subtask should read from (or write to) partitions located on the same node. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26212489

--- Diff: docs/_includes/navbar.html ---
@@ -24,15 +24,15 @@
 We might be on an externally hosted documentation site. Please keep the site.FLINK_WEBSITE_URL below to ensure a link back to the Flink website. {% endcomment %}
-  <a href="{{ site.FLINK_WEBSITE_URL }}index.html" title="Home">
+  <a href="{{ site.FLINK_WEBSITE_URL }}/index.html" title="Home">
     <img class="hidden-xs hidden-sm img-responsive"
-      src="{{ site.baseurl }}img/logo.png" alt="Apache Flink Logo">
+      src="{{ site.baseurl }}/img/logo.png" alt="Apache Flink Logo">
   </a>
   <div class="row visible-xs">
     <div class="col-xs-3">
-      <a href="{{ site.baseurl }}index.html" title="Home">
+      <a href="{{ site.baseurl }}/index.html" title="Home">
--- End diff --

Do we have control over ``{{ site.baseurl }}``? The problem with the relative links is that you cannot have websites in directories other than the root directory. IMHO this is not how it should be.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26212689

--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
 <ul class="nav navbar-nav">
   <li>
-    <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+    <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
--- End diff --

Then we should try to change the way the docs are linked, I think. Relative paths don't allow you to have any kind of directory structure for the html files. Maintaining a directory with all our html files will quickly become a big mess.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26215901

--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
 <ul class="nav navbar-nav">
   <li>
-    <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+    <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
--- End diff --

We could set `<base href="http://ci.apache.org/projects/flink/.../">` in the header. That way, we would be able to use relative links. However, hard-coding the link to the docs in the template variables is also not very nice.
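The `<base href>` idea above can be sketched as follows; the docs path shown is a hypothetical placeholder (the actual deployment path is elided in the comment), not the project's real configuration:

```html
<!-- Sketch of docs/_layouts/default.html with a <base> element.
     With a base URL set, relative links resolve against the docs root
     instead of the current page's directory. The href below is a
     hypothetical placeholder. -->
<head>
  <base href="http://ci.apache.org/projects/flink/.../">
  <!-- "css/bootstrap.css" now resolves relative to the base URL
       from any subdirectory of the docs. -->
  <link rel="stylesheet" href="css/bootstrap.css">
  <a href="index.html">Documentation</a>
</head>
```

The trade-off discussed in the thread applies: `<base>` restores relative links at the cost of hard-coding the deployment location into the template.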
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26212353

--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
-<script src="{{ site.baseurl }}js/bootstrap.min.js"></script>
-<script src="{{ site.baseurl }}js/codetabs.js"></script>
+<script src="{{ site.baseurl }}/js/bootstrap.min.js"></script>
+<script src="{{ site.baseurl }}/js/codetabs.js"></script>
+
+{% if page.mathjax %}
+<script type="text/x-mathjax-config">
+MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
+</script>
+<script type="text/javascript"
+  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
--- End diff --

Good point. I did not check it. Thanks for doing so.
[GitHub] flink pull request: Kick off of Flink's machine learning library
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/479#discussion_r26212866 --- Diff: docs/_layouts/default.html --- @@ -23,16 +23,25 @@ meta http-equiv=X-UA-Compatible content=IE=edge meta name=viewport content=width=device-width, initial-scale=1 titleApache Flink: {{ page.title }}/title -link rel=shortcut icon href={{ site.baseurl }}favicon.ico type=image/x-icon -link rel=icon href={{ site.baseurl }}favicon.ico type=image/x-icon -link rel=stylesheet href={{ site.baseurl }}css/bootstrap.css -link rel=stylesheet href={{ site.baseurl }}css/bootstrap-lumen-custom.css -link rel=stylesheet href={{ site.baseurl }}css/syntax.css -link rel=stylesheet href={{ site.baseurl }}css/custom.css -link href={{ site.baseurl }}css/main/main.css rel=stylesheet +link rel=shortcut icon href={{ site.baseurl }}/favicon.ico type=image/x-icon +link rel=icon href={{ site.baseurl }}/favicon.ico type=image/x-icon +link rel=stylesheet href={{ site.baseurl }}/css/bootstrap.css +link rel=stylesheet href={{ site.baseurl }}/css/bootstrap-lumen-custom.css +link rel=stylesheet href={{ site.baseurl }}/css/syntax.css +link rel=stylesheet href={{ site.baseurl }}/css/custom.css +link href={{ site.baseurl }}/css/main/main.css rel=stylesheet --- End diff -- Well it seems to be empty. Why do we have ```{{ site.baseurl }}``` in the first place? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---