[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357170#comment-14357170
 ] 

Robert Metzger commented on FLINK-1388:
---

Hi,
yes, Tuple types can be subclassed. Flink provides the predefined Tuple1 to Tuple25.
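For illustration, a minimal sketch of such a subclass (the class name is hypothetical):
{code}
import org.apache.flink.api.java.tuple.Tuple2;

// A named subclass of Tuple2; Flink's type extractor still analyzes it as a tuple type.
public class WordCount extends Tuple2<String, Integer> {

	public WordCount() {}

	public WordCount(String word, Integer count) {
		super(word, count);
	}
}
{code}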

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357215#comment-14357215
 ] 

ASF GitHub Bot commented on FLINK-1654:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78308404
  
Thanks for the fix!


 Wrong scala example of POJO type in documentation
 -

 Key: FLINK-1654
 URL: https://issues.apache.org/jira/browse/FLINK-1654
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Trivial

 In 
 [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos],
  there is a Scala example of a POJO:
 {code}
 class WordWithCount(val word: String, val count: Int) {
   def this() {
 this(null, -1)
   }
 }
 {code}
 I think this is wrong because a Flink POJO requires public fields or private 
 fields with getters and setters. Fields in a Scala class are private by 
 default. We should change the field declarations to use the `var` keyword or 
 change the class declaration to a case class.
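 A minimal sketch of the suggested change (whether the original example already 
 works was discussed later in the thread):
 {code}
 class WordWithCount(var word: String, var count: Int) {
   def this() {
     this(null, -1)
   }
 }
 // Alternatively: case class WordWithCount(word: String, count: Int)
 {code}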





[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357234#comment-14357234
 ] 

Fabian Hueske commented on FLINK-1388:
--

Watch out, PojoTypeInformation is not serializable due to the non-serializable 
``java.lang.reflect.Field``.
You should rather pass the Pojo class name and the names of all fields that 
should be written out.
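A rough sketch of that approach (the class and method names here are 
hypothetical, not Flink API): keep only serializable state and re-resolve the 
{{Field}} handles on the TaskManager, e.g. from the output format's open() 
method.
{code}
import java.io.Serializable;
import java.lang.reflect.Field;

public class PojoFieldAccess<T> implements Serializable {

	private final Class<T> pojoClass;   // serializable
	private final String[] fieldNames;  // serializable
	private transient Field[] fields;   // not serializable; rebuilt on the worker

	public PojoFieldAccess(Class<T> pojoClass, String[] fieldNames) {
		this.pojoClass = pojoClass;
		this.fieldNames = fieldNames;
	}

	// Called on the TaskManager, e.g. from OutputFormat#open().
	public void resolveFields() throws NoSuchFieldException {
		fields = new Field[fieldNames.length];
		for (int i = 0; i < fieldNames.length; i++) {
			fields[i] = pojoClass.getDeclaredField(fieldNames[i]);
			fields[i].setAccessible(true);
		}
	}

	public Object get(T pojo, int i) throws IllegalAccessException {
		return fields[i].get(pojo);
	}
}
{code}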

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/478




[jira] [Updated] (FLINK-1689) Add documentation on streaming file sinks interaction with the batch outputformat

2015-03-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi updated FLINK-1689:
--
Description: OutputFormats supported by the batch API are supported in 
streaming through the FileSinkFunction. A bit of documentation on that is 
needed.

 Add documentation on streaming file sinks interaction with the batch 
 outputformat
 -

 Key: FLINK-1689
 URL: https://issues.apache.org/jira/browse/FLINK-1689
 Project: Flink
  Issue Type: Task
  Components: Streaming
Reporter: Márton Balassi

 OutputFormats supported by the batch API are supported in streaming through 
 the FileSinkFunction. A bit of documentation on that is needed.





[jira] [Commented] (FLINK-1683) Scheduling preferences for non-unary tasks are not correctly computed

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357195#comment-14357195
 ] 

ASF GitHub Bot commented on FLINK-1683:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/476#issuecomment-78306554
  
I updated the PR and made the preference choice a bit more lightweight.


 Scheduling preferences for non-unary tasks are not correctly computed
 -

 Key: FLINK-1683
 URL: https://issues.apache.org/jira/browse/FLINK-1683
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9, 0.8.1
Reporter: Fabian Hueske
Assignee: Fabian Hueske
 Fix For: 0.9, 0.8.2


 When computing scheduling preferences for an execution task, the JobManager 
 looks at the assigned instances of all its input execution tasks and returns 
 a preference only if no more than 8 instances have been found (if the input 
 of a task is distributed across more than 8 tasks, local scheduling won't 
 help a lot in any case).
 However, the JobManager treats all input execution tasks the same and does 
 not distinguish between different logical inputs. The effect is that a join 
 task with one broadcast and one locally forwarded input is not scheduled 
 local to its forwarded input.
 This can have a significant impact on the performance of tasks that have more 
 than one input and which rely on local forwarding and co-located task 
 scheduling.





[GitHub] flink pull request: [FLINK-1683] [jobmanager] Fix scheduling pref...

2015-03-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/476#issuecomment-78306554
  
I updated the PR and made the preference choice a bit more lightweight.




[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-03-11 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357223#comment-14357223
 ] 

Till Rohrmann commented on FLINK-1537:
--

You're right that it's best if the algorithms are implemented in a general 
fashion so that you can plug in different regularizers or cost functions, for 
example. Mahout is indeed a good example to learn from.

What I was aiming at are more the general building blocks of ML algorithms. In 
many cases, distributed algorithms distribute the data. Then some local 
processing is done which results in some local state. This local state often 
has to be communicated to the other worker nodes to obtain a consistent global 
state. Starting from this global state, the next local computation can be 
started. 

Depending on the algorithm, whether it is stochastic or not, for example, one 
only needs a random subset of the data or the complete local partition. If you 
only need a stochastic subset, then often you will only need the corresponding 
subset of the global state to perform your local computations. Sometimes the 
global state is also so huge that it cannot be kept on a single machine and 
has to be stored in parallel.

The question now is how these operations can be realized within Flink to allow 
an efficient implementation of a multitude of machine learning algorithms. For 
example, local state can either be stored as part of a stateful operator or as 
part of the elements stored in the {{DataSet}}.
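As a loose sketch of one such building block, a global state {{DataSet}} can be 
re-distributed to all local computations via a broadcast set (the names below 
are illustrative, not from the issue):
{code}
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.configuration.Configuration;

// 'points' and 'globalState' are assumed DataSets; each local step reads the
// broadcast global state and produces new local state, which a subsequent
// reduce would merge into the next global state.
DataSet<double[]> nextLocalState = points
	.map(new RichMapFunction<double[], double[]>() {
		private List<double[]> globalState;

		@Override
		public void open(Configuration parameters) {
			globalState = getRuntimeContext().getBroadcastVariable("state");
		}

		@Override
		public double[] map(double[] point) {
			// local computation using 'globalState' goes here
			return point;
		}
	})
	.withBroadcastSet(globalState, "state");
{code}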

 GSoC project: Machine learning with Apache Flink
 

 Key: FLINK-1537
 URL: https://issues.apache.org/jira/browse/FLINK-1537
 Project: Flink
  Issue Type: New Feature
Reporter: Till Rohrmann
Priority: Minor
  Labels: gsoc2015, java, machine_learning, scala

 Currently, the Flink community is setting up the infrastructure for a machine 
 learning library for Flink. The goal is to provide a set of highly optimized 
 ML algorithms and to offer a high-level linear algebra abstraction to easily 
 do data pre- and post-processing. By defining a set of commonly used data 
 structures on which the algorithms work, it will be possible to define complex 
 processing pipelines.
 The Mahout DSL constitutes a good fit to be used as the linear algebra 
 language in Flink. It still has to be evaluated which means have to be 
 provided to allow an easy transition between the high-level abstraction and 
 the optimized algorithms.
 The machine learning library offers multiple starting points for a GSoC 
 project. Amongst others, the following projects are conceivable:
 * Extension of Flink's machine learning library with additional ML algorithms
 ** Stochastic gradient descent
 ** Distributed dual coordinate ascent
 ** SVM
 ** Gaussian mixture EM
 ** Decision trees
 ** ...
 * Integration of Flink with the Mahout DSL to support a high-level linear 
 algebra abstraction
 * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
 learning algorithms
 * Implementation of a parameter-server-like distributed global state storage 
 facility for Flink. This also includes extending Flink to support 
 asynchronous iterations and update messages.
 Your own ideas for possible contributions in the area of the machine learning 
 library are highly welcome.





[jira] [Comment Edited] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357234#comment-14357234
 ] 

Fabian Hueske edited comment on FLINK-1388 at 3/11/15 5:21 PM:
---

Watch out, PojoTypeInformation is not serializable due to the non-serializable 
java.lang.reflect.Field.
You should rather pass the Pojo class name and the names of all fields that 
should be written out.


was (Author: fhueske):
Watch out, PojoTypeInformation is not serializable due to the non-serializable 
``java.lang.reflect.Field``.
You should rather pass the Pojo class name and the name of all fields that 
should be written out.

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[jira] [Created] (FLINK-1688) Add socket sink

2015-03-11 Thread JIRA
Márton Balassi created FLINK-1688:
-

 Summary: Add socket sink
 Key: FLINK-1688
 URL: https://issues.apache.org/jira/browse/FLINK-1688
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming
Reporter: Márton Balassi
Priority: Trivial


Add a sink that writes output to a socket. I'd consider two options: one that 
implements a socket server and one that implements a client.
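A rough sketch of the client variant (a hypothetical class, not the eventual 
Flink API):
{code}
import java.io.PrintWriter;
import java.net.Socket;

// Connects to an external socket server and writes one line per record.
public class SocketClientSink<T> {

	private final String host;
	private final int port;
	private transient Socket socket;
	private transient PrintWriter writer;

	public SocketClientSink(String host, int port) {
		this.host = host;
		this.port = port;
	}

	public void open() throws Exception {
		socket = new Socket(host, port);
		writer = new PrintWriter(socket.getOutputStream(), true);
	}

	public void invoke(T value) {
		writer.println(value);
	}

	public void close() throws Exception {
		writer.close();
		socket.close();
	}
}
{code}
The server variant would instead open a ServerSocket and write to whichever 
clients connect.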





[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/477#discussion_r26220968
  
--- Diff: flink-scala/pom.xml ---
@@ -236,4 +230,23 @@ under the License.
 		</plugins>
 	</build>

+	<profiles>
+		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<!-- this is the default scala profile -->
+					<name>!scala-2.11.profile</name>
--- End diff --

Okay, accepted. Let's do it with the properties.




[jira] [Issue Comment Deleted] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-03-11 Thread Till Rohrmann (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann updated FLINK-1537:
-
Comment: was deleted

(was: Implementing a decision tree algorithm first is definitely the right way 
to go. If you implement it, it will be an awesome contribution to Flink, and I 
think it's the best way to get used to Flink's API. Thus, it's a win-win 
situation :-)

Look at the recently opened [machine learning 
PR|https://github.com/apache/flink/pull/479], which loosely defines interfaces 
for {{Learner}} and {{Transformer}}. A {{Learner}} is an algorithm which takes 
a {{DataSet[A]}} and fits a model to this data. In the case of a decision tree, 
the input data would be labeled vectors and the output would be the learned 
tree. A {{Transformer}} simply takes a {{DataSet[A]}} and transforms it into a 
{{DataSet[B]}}; a feature extractor or data whitening would be examples of 
that. {{Transformer}}s can be chained arbitrarily as long as their types match, 
and a {{Learner}} terminates a transformer pipeline. If you stick to this model 
in your implementation, then one could prepend any {{Transformer}} to the 
decision tree learner. This makes creating a data analysis pipeline really 
easy. If I can help you with the implementation, let me know.

A deep learning framework is also something really intriguing, but at the same 
time highly ambitious. So far, we haven't made an effort to implement deep 
learning algorithms with Flink. I know that there is the [H2O 
project|https://github.com/h2oai/h2o-dev] which does distributed deep learning. 
However, their underlying data model is different from ours: if I'm not 
mistaken, they store the data column-wise whereas we store it row-wise. I don't 
know what difference this makes. The first thing would probably be to evaluate 
Flink's potential for deep learning and then come up with a prototype.)

 GSoC project: Machine learning with Apache Flink
 

 Key: FLINK-1537
 URL: https://issues.apache.org/jira/browse/FLINK-1537
 Project: Flink
  Issue Type: New Feature
Reporter: Till Rohrmann
Priority: Minor
  Labels: gsoc2015, java, machine_learning, scala

 Currently, the Flink community is setting up the infrastructure for a machine 
 learning library for Flink. The goal is to provide a set of highly optimized 
 ML algorithms and to offer a high-level linear algebra abstraction to easily 
 do data pre- and post-processing. By defining a set of commonly used data 
 structures on which the algorithms work, it will be possible to define complex 
 processing pipelines.
 The Mahout DSL constitutes a good fit to be used as the linear algebra 
 language in Flink. It still has to be evaluated which means have to be 
 provided to allow an easy transition between the high-level abstraction and 
 the optimized algorithms.
 The machine learning library offers multiple starting points for a GSoC 
 project. Amongst others, the following projects are conceivable:
 * Extension of Flink's machine learning library with additional ML algorithms
 ** Stochastic gradient descent
 ** Distributed dual coordinate ascent
 ** SVM
 ** Gaussian mixture EM
 ** Decision trees
 ** ...
 * Integration of Flink with the Mahout DSL to support a high-level linear 
 algebra abstraction
 * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
 learning algorithms
 * Implementation of a parameter-server-like distributed global state storage 
 facility for Flink. This also includes extending Flink to support 
 asynchronous iterations and update messages.
 Your own ideas for possible contributions in the area of the machine learning 
 library are highly welcome.





[jira] [Commented] (FLINK-1690) ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis

2015-03-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357146#comment-14357146
 ] 

Stephan Ewen commented on FLINK-1690:
-

I will create a patch with increased timeout...

 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously 
 fails on Travis
 --

 Key: FLINK-1690
 URL: https://issues.apache.org/jira/browse/FLINK-1690
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Priority: Minor

 I got the following error on Travis.
 {code}
 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:244 The 
 program did not finish in time
 {code}
 I think we have to increase the timeouts for this test case to make it 
 reliably run on Travis.
 The log of the failed Travis build can be found 
 [here|https://api.travis-ci.org/jobs/53952486/log.txt?deansi=true]





[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-03-11 Thread Till Rohrmann (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357163#comment-14357163
 ] 

Till Rohrmann commented on FLINK-1537:
--

Implementing a decision tree algorithm first is definitely the right way to go. 
If you implement it, it will be an awesome contribution to Flink, and I think 
it's the best way to get used to Flink's API. Thus, it's a win-win situation :-)

Look at the recently opened [machine learning 
PR|https://github.com/apache/flink/pull/479], which loosely defines interfaces 
for {{Learner}} and {{Transformer}}. A {{Learner}} is an algorithm which takes 
a {{DataSet[A]}} and fits a model to this data. In the case of a decision tree, 
the input data would be labeled vectors and the output would be the learned 
tree. A {{Transformer}} simply takes a {{DataSet[A]}} and transforms it into a 
{{DataSet[B]}}; a feature extractor or data whitening would be examples of 
that. {{Transformer}}s can be chained arbitrarily as long as their types match, 
and a {{Learner}} terminates a transformer pipeline. If you stick to this model 
in your implementation, then one could prepend any {{Transformer}} to the 
decision tree learner. This makes creating a data analysis pipeline really 
easy. If I can help you with the implementation, let me know.
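Sketched loosely in Scala (the signatures are assumed from the description 
above, not copied from the PR):
{code}
import org.apache.flink.api.scala.DataSet

// A Transformer maps a DataSet[A] to a DataSet[B]; a Learner fits a model to a
// DataSet[A] and terminates the pipeline.
trait Transformer[A, B] {
  def transform(input: DataSet[A]): DataSet[B]

  // Transformers chain as long as the output type matches the next input type.
  def chain[C](next: Transformer[B, C]): Transformer[A, C] = {
    val self = this
    new Transformer[A, C] {
      def transform(input: DataSet[A]): DataSet[C] =
        next.transform(self.transform(input))
    }
  }
}

trait Learner[A, Model] {
  def fit(input: DataSet[A]): Model
}
{code}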

A deep learning framework is also something really intriguing, but at the same 
time highly ambitious. So far, we haven't made an effort to implement deep 
learning algorithms with Flink. I know that there is the [H2O 
project|https://github.com/h2oai/h2o-dev] which does distributed deep learning. 
However, their underlying data model is different from ours: if I'm not 
mistaken, they store the data column-wise whereas we store it row-wise. I don't 
know what difference this makes. The first thing would probably be to evaluate 
Flink's potential for deep learning and then come up with a prototype.

 GSoC project: Machine learning with Apache Flink
 

 Key: FLINK-1537
 URL: https://issues.apache.org/jira/browse/FLINK-1537
 Project: Flink
  Issue Type: New Feature
Reporter: Till Rohrmann
Priority: Minor
  Labels: gsoc2015, java, machine_learning, scala

 Currently, the Flink community is setting up the infrastructure for a machine 
 learning library for Flink. The goal is to provide a set of highly optimized 
 ML algorithms and to offer a high-level linear algebra abstraction to easily 
 do data pre- and post-processing. By defining a set of commonly used data 
 structures on which the algorithms work, it will be possible to define complex 
 processing pipelines.
 The Mahout DSL constitutes a good fit to be used as the linear algebra 
 language in Flink. It still has to be evaluated which means have to be 
 provided to allow an easy transition between the high-level abstraction and 
 the optimized algorithms.
 The machine learning library offers multiple starting points for a GSoC 
 project. Amongst others, the following projects are conceivable:
 * Extension of Flink's machine learning library with additional ML algorithms
 ** Stochastic gradient descent
 ** Distributed dual coordinate ascent
 ** SVM
 ** Gaussian mixture EM
 ** Decision trees
 ** ...
 * Integration of Flink with the Mahout DSL to support a high-level linear 
 algebra abstraction
 * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
 learning algorithms
 * Implementation of a parameter-server-like distributed global state storage 
 facility for Flink. This also includes extending Flink to support 
 asynchronous iterations and update messages.
 Your own ideas for possible contributions in the area of the machine learning 
 library are highly welcome.





[jira] [Resolved] (FLINK-1654) Wrong scala example of POJO type in documentation

2015-03-11 Thread Fabian Hueske (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Hueske resolved FLINK-1654.
--
Resolution: Fixed

Fixed with fd9ca4defd54fa150d33d042471e381e0a0a1164

 Wrong scala example of POJO type in documentation
 -

 Key: FLINK-1654
 URL: https://issues.apache.org/jira/browse/FLINK-1654
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Trivial

 In 
 [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos],
  there is a Scala example of a POJO:
 {code}
 class WordWithCount(val word: String, val count: Int) {
   def this() {
 this(null, -1)
   }
 }
 {code}
 I think this is wrong because a Flink POJO requires public fields or private 
 fields with getters and setters. Fields in a Scala class are private by 
 default. We should change the field declarations to use the `var` keyword or 
 change the class declaration to a case class.





[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78308404
  
Thanks for the fix!




[jira] [Created] (FLINK-1691) Improve CountCollectITCase

2015-03-11 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1691:
---

 Summary: Improve CountCollectITCase
 Key: FLINK-1691
 URL: https://issues.apache.org/jira/browse/FLINK-1691
 Project: Flink
  Issue Type: Bug
  Components: test
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Maximilian Michels
 Fix For: 0.9


The CountCollectITCase logs heavily and does not reuse the same cluster across 
multiple tests.

Both can be addressed by letting it extend the MultipleProgramsTestBase.
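A minimal sketch of the suggested change (assuming the usual parameterized 
constructor of the MultipleProgramsTestBase):
{code}
// Reuses a shared test cluster across all test methods instead of starting a
// new one per test.
public class CountCollectITCase extends MultipleProgramsTestBase {

	public CountCollectITCase(TestExecutionMode mode) {
		super(mode);
	}

	// ... existing tests, without per-test cluster setup and heavy logging
}
{code}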





[jira] [Created] (FLINK-1689) Add documentation on streaming file sinks interaction with the batch outputformat

2015-03-11 Thread JIRA
Márton Balassi created FLINK-1689:
-

 Summary: Add documentation on streaming file sinks interaction 
with the batch outputformat
 Key: FLINK-1689
 URL: https://issues.apache.org/jira/browse/FLINK-1689
 Project: Flink
  Issue Type: Task
  Components: Streaming
Reporter: Márton Balassi








[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26227591
  
--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
     <meta http-equiv="X-UA-Compatible" content="IE=edge">
     <meta name="viewport" content="width=device-width, initial-scale=1">
     <title>Apache Flink: {{ page.title }}</title>
-    <link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-    <link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-    <link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-    <link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-    <link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-    <link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-    <link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+    <link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+    <link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+    <link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+    <link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+    <link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+    <link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+    <link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
--- End diff --

Yes, this should fix the problem. I'll add it. 




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78298992
  
I set the ```{{ site.baseurl }}``` to 
http://ci.apache.org/projects/flink/flink-docs-master and the local preview is 
started with ```--baseurl ```.




[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357577#comment-14357577
 ] 

Fabian Hueske commented on FLINK-1388:
--

That is right, but the OutputFormat itself is also serialized and shipped to 
the TaskManager in the cluster where it is executed.

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78311798
  
The reason I asked about Scala 2.11.6 was that this version is shown on the 
scala-lang website next to the download button.
Also, on Maven Central, you get the impression that it has been released: 
http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.scala-lang%22%20AND%20a%3A%22scala-library%22

 Why aren't we adding a _2.11 suffix to the Scala 2.11 Flink builds?

 We can do this, and it certainly makes sense if you want to ship 
pre-builds of both versions. With the current setup if you want to use Flink 
with 2.11 you have to build and install the maven projects yourself (I'm 
blindly following the Spark model here, let me know if you prefer the other 
option).

That's not entirely true: users have to build Flink with 2.11 themselves and 
then build their project with `-Dscala-2.11` to activate the profile inside our 
pom (on `install`, Maven does not rewrite poms).

So I'm in favor of adding a `_2.11` suffix to all affected Flink artifacts.
But this is a major change that we really need to discuss on the mailing 
list.




[jira] [Comment Edited] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-03-11 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357238#comment-14357238
 ] 

Sachin Goel edited comment on FLINK-1537 at 3/11/15 5:23 PM:
-

Yes, I completely agree with the transformer-learner chain design methodology. 
The decision tree I'll write will provide an interface for first specifying 
the structure of the data, i.e., the tuple (the types, ranges, etc.), and any 
other statistics available to help with the learning.
I do not myself see how it makes a difference to store the data column-wise or 
row-wise, although it might have some far-reaching consequences for how the 
learning process proceeds. In fact, this seems like a valid idea for a learning 
process that treats the coordinates one by one. It might help to provide all 
the attributes of one particular coordinate in one go and to learn some 
statistics on it, which might lead to a better learning process. Indeed, in a 
decision tree implementation on big data, it becomes prudent to learn such 
statistics to ensure that only a reasonable number of splits on the data are 
considered. I will look into how this could be achieved with a row-style data 
representation.
As for the deep learning framework, you are indeed right. I am not sure myself 
whether anyone has yet evaluated the potential of a deep learning system on a 
distributed system. I will look into the H2O project's implementation related 
to this. As of now, I'm still not sure whether deep learning can be as fast on 
distributed systems as it is on GPUs.


was (Author: sachingoel0101):
Yes. I completely agree with the transformer-learner chain design methodology. 
The decision I'll write will provide an interface for first specifying the 
structure of the data, i.e., the tuple, as in the types, ranges, etc. and any 
other statistics possible to help with the learning.
I do not see myself how it makes a difference to store the data column wise or 
row wise, although it might have some far-reaching consequences on how the 
learning process proceeds. In fact, this seems like a valid idea for a learning 
process which treat each coordinate one by one. It might help in providing all 
the attributes of one particular coordinate in one go and learn some statistics 
on it, which might help in a better learning process. In fact, in a decision 
tree implementation on big data, it becomes prudent to learn such a statistic 
to ensure only a reasonable number of splits on the data are considered. I will 
look into how this could be achieved with a row-style data representation.
As for the deep learning framework, you are indeed right. I am not sure myself 
if anyone has yet evaluated the potential of a deep learning system on a 
distributed system. I will look into the H2O project's implementation related 
to this. As of yet, I'm still not sure if deep learning can be as fast on 
distributed systems as it is on GPUs.

 GSoC project: Machine learning with Apache Flink
 

 Key: FLINK-1537
 URL: https://issues.apache.org/jira/browse/FLINK-1537
 Project: Flink
  Issue Type: New Feature
Reporter: Till Rohrmann
Priority: Minor
  Labels: gsoc2015, java, machine_learning, scala

 Currently, the Flink community is setting up the infrastructure for a machine 
 learning library for Flink. The goal is to provide a set of highly optimized 
 ML algorithms and to offer a high-level linear algebra abstraction to easily 
 do data pre- and post-processing. By defining a set of commonly used data 
 structures on which the algorithms work, it will be possible to define complex 
 processing pipelines.
 The Mahout DSL constitutes a good fit to be used as the linear algebra 
 language in Flink. It still has to be evaluated which means have to be 
 provided to allow an easy transition between the high-level abstraction and 
 the optimized algorithms.
 The machine learning library offers multiple starting points for a GSoC 
 project. Amongst others, the following projects are conceivable:
 * Extension of Flink's machine learning library with additional ML algorithms
 ** Stochastic gradient descent
 ** Distributed dual coordinate ascent
 ** SVM
 ** Gaussian mixture EM
 ** Decision trees
 ** ...
 * Integration of Flink with the Mahout DSL to support a high-level linear 
 algebra abstraction
 * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
 learning algorithms
 * Implementation of a parameter-server-like distributed global state storage 
 facility for Flink. This also includes extending Flink to support 
 asynchronous iterations and update messages.
 Your own ideas for possible contributions in the area of the machine learning 
 library are highly welcome.





[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Adnan Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357418#comment-14357418
 ] 

Adnan Khan commented on FLINK-1388:
---

Just so I understand: why does it matter if the TypeInformation is serializable 
or not? Aren't we just using it to access the fields we're going to write out?

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[jira] [Commented] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-03-11 Thread Sachin Goel (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357238#comment-14357238
 ] 

Sachin Goel commented on FLINK-1537:


Yes, I completely agree with the transformer-learner chain design methodology. 
The decision tree I'll write will provide an interface for first specifying 
the structure of the data, i.e., the tuple (the types, ranges, etc.), and any 
other statistics available to help with the learning.
I do not myself see how it makes a difference to store the data column-wise or 
row-wise, although it might have some far-reaching consequences for how the 
learning process proceeds. In fact, this seems like a valid idea for a learning 
process that treats the coordinates one by one. It might help to provide all 
the attributes of one particular coordinate in one go and to learn some 
statistics on it, which might lead to a better learning process. Indeed, in a 
decision tree implementation on big data, it becomes prudent to learn such 
statistics to ensure that only a reasonable number of splits on the data are 
considered. I will look into how this could be achieved with a row-style data 
representation.
As for the deep learning framework, you are indeed right. I am not sure myself 
whether anyone has yet evaluated the potential of a deep learning system on a 
distributed system. I will look into the H2O project's implementation related 
to this. As of now, I'm still not sure whether deep learning can be as fast on 
distributed systems as it is on GPUs.

 GSoC project: Machine learning with Apache Flink
 

 Key: FLINK-1537
 URL: https://issues.apache.org/jira/browse/FLINK-1537
 Project: Flink
  Issue Type: New Feature
Reporter: Till Rohrmann
Priority: Minor
  Labels: gsoc2015, java, machine_learning, scala

 Currently, the Flink community is setting up the infrastructure for a machine 
 learning library for Flink. The goal is to provide a set of highly optimized 
 ML algorithms and to offer a high level linear algebra abstraction to easily 
 do data pre- and post-processing. By defining a set of commonly used data 
 structures on which the algorithms work it will be possible to define complex 
 processing pipelines. 
 The Mahout DSL constitutes a good fit to be used as the linear algebra 
 language in Flink. It has to be evaluated which means have to be provided to 
 allow an easy transition between the high level abstraction and the optimized 
 algorithms.
 The machine learning library offers multiple starting points for a GSoC 
 project. Amongst others, the following projects are conceivable.
 * Extension of Flink's machine learning library by additional ML algorithms
 ** Stochastic gradient descent
 ** Distributed dual coordinate ascent
 ** SVM
 ** Gaussian mixture EM
 ** DecisionTrees
 ** ...
 * Integration of Flink with the Mahout DSL to support a high level linear 
 algebra abstraction
 * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
 learning algorithms
 * Implementation of a parameter server like distributed global state storage 
 facility for Flink. This also includes the extension of Flink to support 
 asynchronous iterations and update messages.
 Own ideas for a possible contribution on the field of the machine learning 
 library are highly welcome.





[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78333617
  
 The reason I asked about Scala 2.11.6 was that this version is shown on the 
scala-lang website next to the download button.

Snap, I guess Christmas came early this year :)




[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78348901
  
I rebased on the current master (it includes the changes from PR #454 merged 
today). I'll take a look at the errors thrown during the build later.




[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Adnan Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357717#comment-14357717
 ] 

Adnan Khan commented on FLINK-1388:
---

Does that make any references to TypeInformation in the OutputFormat unusable 
once it's shipped to the TaskManager on the cluster? Or does it just not 
guarantee anything about non-serializable class members like 
java.lang.reflect.Field?

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-03-11 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357866#comment-14357866
 ] 

Fabian Hueske commented on FLINK-1388:
--

It just won't work and will throw an exception that kills the job. You cannot 
serialize an object that has a non-serializable member variable (at least not 
with the standard Java serialization, which Flink uses to ship function and 
input/output format objects).
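A minimal, self-contained sketch of that failure mode (hypothetical classes, 
plain Java serialization):
{code}
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

public class NonSerializableMemberDemo {

	static class MyPojo {
		int b;
	}

	static class MyFormat implements Serializable {
		Field field; // java.lang.reflect.Field does not implement Serializable
	}

	public static void main(String[] args) throws Exception {
		MyFormat format = new MyFormat();
		format.field = MyPojo.class.getDeclaredField("b");

		ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
		// throws java.io.NotSerializableException: java.lang.reflect.Field
		out.writeObject(format);
	}
}
{code}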

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}





[jira] [Commented] (FLINK-1676) enableForceKryo() is not working as expected

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356460#comment-14356460
 ] 

ASF GitHub Bot commented on FLINK-1676:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/473#discussion_r26194005
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java 
---
@@ -310,6 +310,9 @@ public int getFieldIndex(String fieldName) {
 
@Override
 	public TypeSerializer<T> createSerializer(ExecutionConfig config) {
+		if (config.isForceKryoEnabled()) {
+			return new GenericTypeInfo<T>(this.typeClass).createSerializer(config);
+		}
--- End diff --

I thought it makes sense to not have the switch, since the flag is called
forceKryo...



 enableForceKryo() is not working as expected
 

 Key: FLINK-1676
 URL: https://issues.apache.org/jira/browse/FLINK-1676
 Project: Flink
  Issue Type: Bug
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger

 In my Flink job, I've set the following execution config:
 {code}
 final ExecutionEnvironment env = 
 ExecutionEnvironment.getExecutionEnvironment();
 env.getConfig().disableObjectReuse();
 env.getConfig().enableForceKryo();
 {code}
 Setting a breakpoint in the {{PojoSerializer()}} constructor, you'll see that 
 we still serialize data with the POJO serializer.





[GitHub] flink pull request: [FLINK-1676] Rework ExecutionConfig.enableForc...

2015-03-11 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/473#discussion_r26194005
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java 
---
@@ -310,6 +310,9 @@ public int getFieldIndex(String fieldName) {
 
@Override
 	public TypeSerializer<T> createSerializer(ExecutionConfig config) {
+		if (config.isForceKryoEnabled()) {
+			return new GenericTypeInfo<T>(this.typeClass).createSerializer(config);
+		}
--- End diff --

I thought it makes sense to not have the switch, since the flag is called
forceKryo...





[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356488#comment-14356488
 ] 

ASF GitHub Bot commented on FLINK-1654:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78223694
  
Hi Chiwan, the documentation example is actually working. In order to be 
treated as a POJO, you either need getters/setters or the fields have to be 
public.


 Wrong scala example of POJO type in documentation
 -

 Key: FLINK-1654
 URL: https://issues.apache.org/jira/browse/FLINK-1654
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Trivial

 In 
 [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos],
  there is a Scala example of a POJO:
 {code}
 class WordWithCount(val word: String, val count: Int) {
   def this() {
 this(null, -1)
   }
 }
 {code}
 I think this is wrong because a Flink POJO requires public fields or private 
 fields with getters and setters. Fields in a Scala class are private by 
 default. We should change the field declarations to use the `var` keyword or 
 change the class declaration to a case class.





[jira] [Commented] (FLINK-1594) DataStreams don't support self-join

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356379#comment-14356379
 ] 

ASF GitHub Bot commented on FLINK-1594:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/472#issuecomment-78211739
  
What about the stuck test? Did you look into it, or did a re-run on Travis 
just pass?


 DataStreams don't support self-join
 ---

 Key: FLINK-1594
 URL: https://issues.apache.org/jira/browse/FLINK-1594
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
 Environment: flink-0.9.0-SNAPSHOT
Reporter: Daniel Bali
Assignee: Gábor Hermann

 Trying to window-join a DataStream with itself will result in exceptions. I 
 get the following stack trace:
 {noformat}
 java.lang.Exception: Error setting up runtime environment: Union buffer 
 reader must be initialized with at least two individual buffer readers
 at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.<init>(RuntimeEnvironment.java:173)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:419)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:261)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:44)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:89)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
 at akka.dispatch.Mailbox.run(Mailbox.scala:221)
 at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.IllegalArgumentException: Union buffer reader must 
 be initialized with at least two individual buffer readers
 at 
 org.apache.flink.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
 at 
 org.apache.flink.runtime.io.network.api.reader.UnionBufferReader.<init>(UnionBufferReader.java:69)
 at 
 org.apache.flink.streaming.api.streamvertex.CoStreamVertex.setConfigInputs(CoStreamVertex.java:101)
 at 
 org.apache.flink.streaming.api.streamvertex.CoStreamVertex.setInputsOutputs(CoStreamVertex.java:63)
 at 
 org.apache.flink.streaming.api.streamvertex.StreamVertex.registerInputOutput(StreamVertex.java:65)
 at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.<init>(RuntimeEnvironment.java:170)
 ... 20 more
 {noformat}





[GitHub] flink pull request: [FLINK-1629][FLINK-1630][FLINK-1547] Rework Fl...

2015-03-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/468#issuecomment-78242400
  
TODO: add documentation.




[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356717#comment-14356717
 ] 

ASF GitHub Bot commented on FLINK-1605:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/454


 Create a shaded Hadoop fat jar to resolve library version conflicts
 ---

 Key: FLINK-1605
 URL: https://issues.apache.org/jira/browse/FLINK-1605
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 As per mailing list discussion: 
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html





[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...

2015-03-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/454




[jira] [Resolved] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts

2015-03-11 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger resolved FLINK-1605.
---
Resolution: Fixed

Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/84e76f4d

 Create a shaded Hadoop fat jar to resolve library version conflicts
 ---

 Key: FLINK-1605
 URL: https://issues.apache.org/jira/browse/FLINK-1605
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 As per mailing list discussion: 
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html





[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26203612
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -64,26 +70,45 @@
private transient int commentCount;
 
private transient int invalidLineCount;
+
+	private CompositeType<OUT> typeInformation = null;
+
+	private String[] fieldsMap = null;

+	public CsvInputFormat(Path filePath, TypeInformation<OUT> typeInformation) {
+		this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, typeInformation);
+	}

-	public CsvInputFormat(Path filePath) {
-		super(filePath);
-	}
-
-	public CsvInputFormat(Path filePath, Class<?>... types) {
-		this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, types);
-	}
-
-	public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, Class<?>... types) {
+	public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, TypeInformation<OUT> typeInformation) {
 		super(filePath);

+		Preconditions.checkArgument(typeInformation instanceof CompositeType);
+		this.typeInformation = (CompositeType<OUT>) typeInformation;
+
 		setDelimiter(lineDelimiter);
 		setFieldDelimiter(fieldDelimiter);

-		setFieldTypes(types);
+		Class<?>[] classes = new Class<?>[typeInformation.getArity()];
+		for (int i = 0, arity = typeInformation.getArity(); i < arity; i++) {
+			classes[i] = this.typeInformation.getTypeAt(i).getTypeClass();
+		}
+		setFieldTypes(classes);
+
+		if (typeInformation instanceof PojoTypeInfo) {
+			setFieldsMap(this.typeInformation.getFieldNames());
+			setAccessibleToField();
+		}
 	}
-
-
+
+	public void setAccessibleToField() {
--- End diff --

Can we make this method private?




[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356725#comment-14356725
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26203612
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -64,26 +70,45 @@
private transient int commentCount;
 
private transient int invalidLineCount;
+
+	private CompositeType<OUT> typeInformation = null;
+
+	private String[] fieldsMap = null;

+	public CsvInputFormat(Path filePath, TypeInformation<OUT> typeInformation) {
+		this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, typeInformation);
+	}

-	public CsvInputFormat(Path filePath) {
-		super(filePath);
-	}
-
-	public CsvInputFormat(Path filePath, Class<?>... types) {
-		this(filePath, DEFAULT_LINE_DELIMITER, DEFAULT_FIELD_DELIMITER, types);
-	}
-
-	public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, Class<?>... types) {
+	public CsvInputFormat(Path filePath, String lineDelimiter, String fieldDelimiter, TypeInformation<OUT> typeInformation) {
 		super(filePath);

+		Preconditions.checkArgument(typeInformation instanceof CompositeType);
+		this.typeInformation = (CompositeType<OUT>) typeInformation;
+
 		setDelimiter(lineDelimiter);
 		setFieldDelimiter(fieldDelimiter);

-		setFieldTypes(types);
+		Class<?>[] classes = new Class<?>[typeInformation.getArity()];
+		for (int i = 0, arity = typeInformation.getArity(); i < arity; i++) {
+			classes[i] = this.typeInformation.getTypeAt(i).getTypeClass();
+		}
+		setFieldTypes(classes);
+
+		if (typeInformation instanceof PojoTypeInfo) {
+			setFieldsMap(this.typeInformation.getFieldNames());
+			setAccessibleToField();
+		}
 	}
-
-
+
+	public void setAccessibleToField() {
--- End diff --

Can we make this method private?


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/477#discussion_r26195550
  
--- Diff: flink-scala/pom.xml ---
@@ -236,4 +230,23 @@ under the License.
 	</plugins>
 </build>
 
+	<profiles>
+		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<!-- this is the default scala profile -->
+					<name>!scala-2.11.profile</name>
--- End diff --

Why are you activating the profile via this property?
You can just do mvn install -Pscala-2.10 to activate the profile.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/477#discussion_r26195970
  
--- Diff: pom.xml ---
@@ -599,6 +599,38 @@ under the License.
 
 	<profiles>
 		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<!-- this is the default scala profile -->
+					<name>!scala-2.11.profile</name>
+				</property>
+			</activation>
+			<properties>
+				<!-- Scala / Akka -->
+				<scala.version>2.10.4</scala.version>
+				<scala.binary.version>2.10</scala.binary.version>
+				<scala.macros.version>2.0.1</scala.macros.version>
+				<akka.version>2.3.7</akka.version>
--- End diff --

The Akka and Scala macros versions seem to be independent of the profile.
I would put them back into the `properties` section above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-1685) Document how to read gzip/compressed files with Flink

2015-03-11 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1685:
-

 Summary: Document how to read gzip/compressed files with Flink
 Key: FLINK-1685
 URL: https://issues.apache.org/jira/browse/FLINK-1685
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Robert Metzger


Too many users asked for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1685) Document how to read gzip/compressed files with Flink

2015-03-11 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356736#comment-14356736
 ] 

Robert Metzger commented on FLINK-1685:
---

the task probably boils down to putting this into an appropriate location in 
the documentation: 
http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/read-gz-files-td760.html
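
For reference, a minimal sketch of the approach from that thread, which wraps
Hadoop's TextInputFormat (Hadoop decompresses gzip files transparently based on
the file extension). The class and package names follow the 0.9-era
flink-hadoop-compatibility module; treat them as an assumption rather than
final documentation:

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.mapred.HadoopInputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class ReadGzipExample {

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		// Hadoop's TextInputFormat picks a compression codec from the file
		// extension (e.g. .gz) and decompresses while reading.
		JobConf jobConf = new JobConf();
		FileInputFormat.addInputPath(jobConf, new Path("hdfs:///logs/input.gz"));
		HadoopInputFormat<LongWritable, Text> input = new HadoopInputFormat<LongWritable, Text>(
				new TextInputFormat(), LongWritable.class, Text.class, jobConf);

		// Each record is (byte offset, line).
		DataSet<Tuple2<LongWritable, Text>> lines = env.createInput(input);
		lines.writeAsText("hdfs:///logs/out");
		env.execute("read gzipped text");
	}
}
{code}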

 Document how to read gzip/compressed files with Flink
 -

 Key: FLINK-1685
 URL: https://issues.apache.org/jira/browse/FLINK-1685
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Robert Metzger

 Too many users asked for it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1629) Add option to start Flink on YARN in a detached mode

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356750#comment-14356750
 ] 

ASF GitHub Bot commented on FLINK-1629:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/468#issuecomment-78247421
  
Nice work. Looks good to me apart from the missing documentation.


 Add option to start Flink on YARN in a detached mode
 

 Key: FLINK-1629
 URL: https://issues.apache.org/jira/browse/FLINK-1629
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger

 Right now, we expect the YARN command line interface to be connected with the 
 Application Master all the time to control the yarn session or the job.
 For very long running sessions or jobs users want to just fire and forget a 
 job/session to YARN.
 Stopping the session will still be possible using YARN's tools.
 Also, prior to detaching itself, the CLI frontend could print the required 
 command to kill the session as a convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1629) Add option to start Flink on YARN in a detached mode

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356715#comment-14356715
 ] 

ASF GitHub Bot commented on FLINK-1629:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/468#issuecomment-78242400
  
TODO: add documentation.


 Add option to start Flink on YARN in a detached mode
 

 Key: FLINK-1629
 URL: https://issues.apache.org/jira/browse/FLINK-1629
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Reporter: Robert Metzger
Assignee: Robert Metzger

 Right now, we expect the YARN command line interface to be connected with the 
 Application Master all the time to control the yarn session or the job.
 For very long running sessions or jobs users want to just fire and forget a 
 job/session to YARN.
 Stopping the session will still be possible using YARN's tools.
 Also, prior to detaching itself, the CLI frontend could print the required 
 command to kill the session as a convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1629][FLINK-1630][FLINK-1547] Rework Fl...

2015-03-11 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/468#issuecomment-78247421
  
Nice work. Looks good to me apart from the missing documentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356513#comment-14356513
 ] 

ASF GitHub Bot commented on FLINK-1654:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78228121
  
As I wrote in JIRA, fields of a Scala class (not a case class) are private. 
([Reference](http://stackoverflow.com/questions/1589603/scala-set-a-field-value-reflectively-from-field-name))
Because fields declared with the `val` keyword don't have a setter, Flink's 
TypeExtractor fails to extract the type information of the POJO example in the 
documentation. I attached the result of a sample type extraction in 
[JIRA](https://issues.apache.org/jira/browse/FLINK-1654?focusedCommentId=14348124&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14348124).

Flink's TypeExtractor deals with Scala case classes differently: case classes 
are treated like Scala tuples.


 Wrong scala example of POJO type in documentation
 -

 Key: FLINK-1654
 URL: https://issues.apache.org/jira/browse/FLINK-1654
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Trivial

 In 
 [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos],
  there is a scala example of POJO
 {code}
 class WordWithCount(val word: String, val count: Int) {
   def this() {
 this(null, -1)
   }
 }
 {code}
 I think that this is wrong because Flink POJOs require public fields or 
 private fields with getters and setters. Fields in a Scala class are private 
 by default. We should change the field declarations to use the `var` keyword 
 or change the class declaration to a case class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1684) Make Kafka connectors read/write a partition the worker is on

2015-03-11 Thread Gábor Hermann (JIRA)
Gábor Hermann created FLINK-1684:


 Summary: Make Kafka connectors read/write a partition the worker 
is on
 Key: FLINK-1684
 URL: https://issues.apache.org/jira/browse/FLINK-1684
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Gábor Hermann


Kafka connectors currently may read from or write to partitions located on a 
different machine. As a best effort, the connector should find the partitions 
located on the same node as the subtask and read from (or write to) those 
partitions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356525#comment-14356525
 ] 

ASF GitHub Bot commented on FLINK-1654:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78229903
  
Then I guess that the Scala API does not identify the ```WordWithCount``` 
as a POJO but as something different.


 Wrong scala example of POJO type in documentation
 -

 Key: FLINK-1654
 URL: https://issues.apache.org/jira/browse/FLINK-1654
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Trivial

 In 
 [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos],
  there is a scala example of POJO
 {code}
 class WordWithCount(val word: String, val count: Int) {
   def this() {
 this(null, -1)
   }
 }
 {code}
 I think that this is wrong because Flink POJOs require public fields or 
 private fields with getters and setters. Fields in a Scala class are private 
 by default. We should change the field declarations to use the `var` keyword 
 or change the class declaration to a case class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1654) Wrong scala example of POJO type in documentation

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356504#comment-14356504
 ] 

ASF GitHub Bot commented on FLINK-1654:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78226428
  
Curious why this works.

We should merge this pull request anyway, because it shows the recommended 
way to do things.


 Wrong scala example of POJO type in documentation
 -

 Key: FLINK-1654
 URL: https://issues.apache.org/jira/browse/FLINK-1654
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 0.9
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Trivial

 In 
 [documentation|https://github.com/chiwanpark/flink/blob/master/docs/programming_guide.md#pojos],
  there is a scala example of POJO
 {code}
 class WordWithCount(val word: String, val count: Int) {
   def this() {
 this(null, -1)
   }
 }
 {code}
 I think that this is wrong because Flink POJOs require public fields or 
 private fields with getters and setters. Fields in a Scala class are private 
 by default. We should change the field declarations to use the `var` keyword 
 or change the class declaration to a case class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-03-11 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355904#comment-14355904
 ] 

Robert Metzger edited comment on FLINK-1525 at 3/11/15 9:01 AM:


Hi,
great. We're always happy about new contributors.

The idea behind such a tool is to allow users to easily configure and 
parameterize their functions and code.
I think something like this would be really helpful:
{code}
# inArgs := --input hdfs:///in --output hdfs:///out --readers 3 -DignoreTerm=abc -DfilterFactor=0.2
public static void main(String[] inArgs) throws Exception {
    final ArgsUtil args = new ArgsUtil(inArgs);
    String input = args.getString("input", true); // true for required
    String output = args.getString("output", false, "file:///tmp"); // not required, with a default value
    int readers = args.getInteger("readers");
    Configuration extParams = args.getParameters();
    // with extParams containing extParams.getString("ignoreTerm"); and the other -D arguments.

    DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).withParameters(extParams).map(new TermFilter(args));
{code}
I think the right location for this is the {{flink-contrib}} package.
Also, it's very important to write test cases for your code and to add some 
documentation... But I think that can follow after a first working prototype.

Let me know if you need more information on this.
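
For illustration only, a rough, self-contained sketch of how the -D arguments 
could be split off into key/value pairs ({{ArgsUtil}} itself does not exist 
yet, and the class below is not part of Flink):

{code}
import java.util.HashMap;
import java.util.Map;

public class DashDArgsSketch {

	public static void main(String[] args) {
		// Collects every argument of the form -Dkey=value into a map.
		Map<String, String> dynamicProps = new HashMap<String, String>();
		for (String arg : args) {
			int eq = arg.indexOf('=');
			if (arg.startsWith("-D") && eq > 2) {
				dynamicProps.put(arg.substring(2, eq), arg.substring(eq + 1));
			}
		}
		// e.g. args = {"-DignoreTerm=abc", "-DfilterFactor=0.2"}
		// prints {ignoreTerm=abc, filterFactor=0.2} (order not guaranteed)
		System.out.println(dynamicProps);
	}
}
{code}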


was (Author: rmetzger):
Hi,
great. We're always happy about new contributors.

The idea behind such a tool is to allow users to easily configure and 
parameterize their functions and code.
I think something like this would be really helpful:
{code}
public static void main(String[] inArgs) throws Exception {
    final ArgsUtil args = new ArgsUtil(inArgs);
    String input = args.getString("input", true); // true for required
    String output = args.getString("output", false, "file:///tmp"); // not required, with a default value
    Configuration extParams = args.getParameters();
    // with extParams containing extParams.getString("ignoreTerm"); and the other -D arguments.

    DataSet<Tuple2<String, Integer>> counts = text.flatMap(new Tokenizer()).withParameters(extParams).map(new TermFilter(args));
{code}
I think the right location for this is the {{flink-contrib}} package.
Also, it's very important to write test cases for your code and to add some 
documentation... But I think that can follow after a first working prototype.

Let me know if you need more information on this.

 Provide utils to pass -D parameters to UDFs 
 

 Key: FLINK-1525
 URL: https://issues.apache.org/jira/browse/FLINK-1525
 Project: Flink
  Issue Type: Improvement
  Components: flink-contrib
Reporter: Robert Metzger
  Labels: starter

 Hadoop users are used to setting job configuration through -D on the 
 command line.
 Right now, Flink users have to manually parse command line arguments and pass 
 them to the methods.
 It would be nice to provide a standard args parser which takes care of 
 such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/454#issuecomment-78234801
  
Those are some massive changes to the pom files. As far as I can tell, it 
looks good to me. The ```Akka``` changes look good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78226428
  
Curious why this works.

We should merge this pull request anyway, because it shows the recommended 
way to do things.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78228121
  
As I wrote in JIRA, fields of a Scala class (not a case class) are private. 
([Reference](http://stackoverflow.com/questions/1589603/scala-set-a-field-value-reflectively-from-field-name))
Because fields declared with the `val` keyword don't have a setter, Flink's 
TypeExtractor fails to extract the type information of the POJO example in the 
documentation. I attached the result of a sample type extraction in 
[JIRA](https://issues.apache.org/jira/browse/FLINK-1654?focusedCommentId=14348124&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14348124).

Flink's TypeExtractor deals with Scala case classes differently: case classes 
are treated like Scala tuples.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78229903
  
Then I guess that the Scala API does not identify the ```WordWithCount``` 
as a POJO but as something different.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/454#discussion_r26199162
  
--- Diff: flink-shaded-hadoop/pom.xml ---
@@ -60,19 +73,36 @@ under the License.
 	<shadedArtifactAttached>false</shadedArtifactAttached>
 	<createDependencyReducedPom>true</createDependencyReducedPom>
 	<dependencyReducedPomLocation>${project.basedir}/target/dependency-reduced-pom.xml</dependencyReducedPomLocation>
-	<artifactSet>
-		<includes>
-			<include>com.google.guava:guava</include>
-		</includes>
-	</artifactSet>
+	<filters>
+		<filter>
+			<!-- maybe slf4j-jdk14 -->
+			<artifact>org.slf4j:*</artifact>
+			<excludes>
+				<exclude>org/slf4j/impl/StaticLoggerBinder*</exclude>
+			</excludes>
+		</filter>
+	</filters>
+	<transformers>
+		<!-- The service transformer is needed to merge META-INF/services files -->
+		<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+	</transformers>
 	<relocations>
 		<relocation>
 			<pattern>com.google</pattern>
-			<shadedPattern>org.apache.flink.shaded.com.google</shadedPattern>
+			<shadedPattern>org.apache.flink.hadoop.shaded.com.google</shadedPattern>
 			<excludes>
-				<exclude>com.google.protobuf.**</exclude>
+				<!-- <exclude>com.google.protobuf.**</exclude> -->
--- End diff --

Why commenting out and not removing?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1605) Create a shaded Hadoop fat jar to resolve library version conflicts

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356650#comment-14356650
 ] 

ASF GitHub Bot commented on FLINK-1605:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/454#discussion_r26199162
  
--- Diff: flink-shaded-hadoop/pom.xml ---
@@ -60,19 +73,36 @@ under the License.
 	<shadedArtifactAttached>false</shadedArtifactAttached>
 	<createDependencyReducedPom>true</createDependencyReducedPom>
 	<dependencyReducedPomLocation>${project.basedir}/target/dependency-reduced-pom.xml</dependencyReducedPomLocation>
-	<artifactSet>
-		<includes>
-			<include>com.google.guava:guava</include>
-		</includes>
-	</artifactSet>
+	<filters>
+		<filter>
+			<!-- maybe slf4j-jdk14 -->
+			<artifact>org.slf4j:*</artifact>
+			<excludes>
+				<exclude>org/slf4j/impl/StaticLoggerBinder*</exclude>
+			</excludes>
+		</filter>
+	</filters>
+	<transformers>
+		<!-- The service transformer is needed to merge META-INF/services files -->
+		<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
+	</transformers>
 	<relocations>
 		<relocation>
 			<pattern>com.google</pattern>
-			<shadedPattern>org.apache.flink.shaded.com.google</shadedPattern>
+			<shadedPattern>org.apache.flink.hadoop.shaded.com.google</shadedPattern>
 			<excludes>
-				<exclude>com.google.protobuf.**</exclude>
+				<!-- <exclude>com.google.protobuf.**</exclude> -->
--- End diff --

Why commenting out and not removing?


 Create a shaded Hadoop fat jar to resolve library version conflicts
 ---

 Key: FLINK-1605
 URL: https://issues.apache.org/jira/browse/FLINK-1605
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Robert Metzger

 As per mailing list discussion: 
 http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-shaded-Hadoop-fat-jar-to-resolve-library-version-conflicts-td3881.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1654] Wrong scala example of POJO type ...

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/478#issuecomment-78230090
  
Maybe it is treated as a ```GenericType```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78234335
  
Why are you using Scala 2.11.4? The mentioned bug seems to be fixed in 
2.11.6.

Why aren't we adding a `_2.11` suffix to the Scala 2.11 Flink builds? 
Otherwise users who want to use Scala 2.11 in their projects always have to 
set the profile/property.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1605] Bundle all hadoop dependencies an...

2015-03-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/454#issuecomment-78235211
  
Thanks for the review.
I'll address your comments and then merge it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/477#discussion_r26210582
  
--- Diff: pom.xml ---
@@ -599,6 +599,38 @@ under the License.
 
 	<profiles>
 		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<!-- this is the default scala profile -->
+					<name>!scala-2.11.profile</name>
+				</property>
+			</activation>
+			<properties>
+				<!-- Scala / Akka -->
+				<scala.version>2.10.4</scala.version>
+				<scala.binary.version>2.10</scala.binary.version>
+				<scala.macros.version>2.0.1</scala.macros.version>
+				<akka.version>2.3.7</akka.version>
--- End diff --

OK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/479#issuecomment-78257904
  
Really nice work, Till! Very nicely documented. Just made a few comments 
concerning the paths for the documentation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/479

Kick off of Flink's machine learning library

This PR contains the kick-off of Flink's machine learning library. 
Currently it contains implementations of ALS, multiple linear regression, and 
a polynomial base feature mapper.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink flink-ml

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/479.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #479


commit 71afb4ba942840a3bc37c9b67749a8ebbd0ae41b
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-02-27T12:43:46Z

[ml] Initial commit to establish module structure. Adds simple Vector and 
Matrix types.

commit 5e0e42c4746dcca4f266e7a946edf8d07173820e
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-03-02T13:39:18Z

[ml] Adds batch gradient descent linear regression with l2 norm

[ml] Adds batch gradient descent linear regression with convergence 
criterion as relative change in sum of squared residuals

[ml] Adds comments to MultipleLinearRegression

commit 2a04c750d55922b359cec80c0d46a49001f38cca
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-03-05T13:05:59Z

[ml] Adds alternating least squares (ALS) implementation with test case

[ml] Adds comments to ALS

commit 933e263609558547591779c7543079d62a5f9a83
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-03-06T14:27:25Z

[ml] Introduces FlinkTools containing persist methods.

[ml] Changes comments into proper ScalaDoc in MultipleLinearRegression

commit be01b5132589d5a4337887b2e26dbd299959655f
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-03-09T17:10:52Z

[ml] Adds polynomial base feature mapper and test cases

[ml] Adds comments to PolynomialBase

commit d923033eafc5fffd49a1ea587af453d779ff2b65
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-03-10T11:26:37Z

[ml] Adds web documentation for multiple linear regression. Changes website 
links from relative to absolute.

commit 2340f7283fbd0c6ffbe18d2d62d76f5abc628e28
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-03-10T14:41:40Z

[ml] Adds web documentation for alternating least squares. Adds web 
documentation for polynomial base feature mapper.

[ml] Adds comments

[ml] Set degree of parallelism of test suites to 2

[ml] Replaces FlatSpec tests with JUnit integration test cases in order to 
suppress the sysout output.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26205234
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
--- End diff --

The ``includeMask`` refers to the fields in the CSV file and allows skipping 
fields of the file.
For example, if a line in your file looks like 
``Sam,Smith,09-15-1963,123.123`` and you only want to read the first name 
and the date field, you would set the ``includeMask`` to ``[true, false, 
true]`` (missing fields are treated as ``false``). 
So the ``includeMask`` should not depend on the ``fieldsMap``, but the 
number of ``true`` entries in the ``includeMask`` must be equal to the number 
of fields in the ``fieldsMap``.
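
To make the pairing concrete, here is a minimal, self-contained sketch of 
these semantics (the helper and the field names are made up for illustration; 
this is not Flink API):

{code}
import java.util.ArrayList;
import java.util.List;

public class IncludeMaskExample {

	// Pairs each 'true' entry of the include mask with the next POJO field name;
	// 'false' entries are columns of the CSV file that are skipped.
	static List<String> columnToField(boolean[] includeMask, String[] fieldsMap) {
		List<String> mapping = new ArrayList<String>();
		int field = 0;
		for (int col = 0; col < includeMask.length; col++) {
			mapping.add("column " + col + " -> " + (includeMask[col] ? fieldsMap[field++] : "skipped"));
		}
		return mapping;
	}

	public static void main(String[] args) {
		// CSV line: Sam,Smith,09-15-1963,123.123 -- read the first name and the date only.
		boolean[] includeMask = {true, false, true}; // trailing column is treated as false
		String[] fieldsMap = {"firstName", "birthDate"};
		for (String m : columnToField(includeMask, fieldsMap)) {
			System.out.println(m);
		}
	}
}
{code}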


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206307
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}
 
 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
--- End diff --

May not be null.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-1686) Streaming iteration heads cannot be instantiated

2015-03-11 Thread Paris Carbone (JIRA)
Paris Carbone created FLINK-1686:


 Summary: Streaming iteration heads cannot be instantiated
 Key: FLINK-1686
 URL: https://issues.apache.org/jira/browse/FLINK-1686
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Reporter: Paris Carbone
Priority: Critical


It looks like streaming jobs with iterations and dop > 1 currently do not work. 
From what I see, when the TaskManager tries to instantiate a new 
RuntimeEnvironment for the iteration head tasks, it fails because the following 
exception is thrown:

java.lang.Exception: Failed to deploy the task Map (2/8) - execution #0 to slot 
SimpleSlot (0)(1) - 0e39fcabcab3e8543cc2d8320f9de783 - ALLOCATED/ALIVE: 
java.lang.Exception: Error setting up runtime environment: 
java.lang.RuntimeException: Could not register the given element, broker slot 
is already occupied.
at 
org.apache.flink.runtime.execution.RuntimeEnvironment.<init>(RuntimeEnvironment.java:174)
at 
org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:432)
.
.
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Could not 
register the given element, broker slot is already occupied.
at 
org.apache.flink.streaming.api.streamvertex.StreamIterationHead.setInputsOutputs(StreamIterationHead.java:64)
at 
org.apache.flink.streaming.api.streamvertex.StreamVertex.registerInputOutput(StreamVertex.java:86)
at 
org.apache.flink.runtime.execution.RuntimeEnvironment.<init>(RuntimeEnvironment.java:171)
... 20 more
Caused by: java.lang.RuntimeException: Could not register the given element, 
broker slot is already occupied.
at 
org.apache.flink.runtime.iterative.concurrent.Broker.handIn(Broker.java:39)
at 
org.apache.flink.streaming.api.streamvertex.StreamIterationHead.setInputsOutputs(StreamIterationHead.java:62)

The IterateTest passed since it uses a dop of 1, but for higher parallelism it 
fails. The IterateExample fails as well if you try to run it.

I will debug this once I find some time, so any ideas of what could possibly 
cause this are more than welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356765#comment-14356765
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26205234
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
--- End diff --

The ``includeMask`` refers to the fields in the CSV file and allows skipping 
fields of the file.
For example, if a line in your file looks like 
``Sam,Smith,09-15-1963,123.123`` and you only want to read the first name 
and the date field, you would set the ``includeMask`` to ``[true, false, 
true]`` (missing fields are treated as ``false``). 
So the ``includeMask`` should not depend on the ``fieldsMap``, but the 
number of ``true`` entries in the ``includeMask`` must be equal to the number 
of fields in the ``fieldsMap``.


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356781#comment-14356781
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26205969
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
+				includeMask[i] = false;
+				newFieldTypes[i] = null;
+				continue;
+			}
+
+			for (int j = 0; j < fields.length; j++) {
+				if (fields[j].equals(fieldsMap[i])) {
+					includeMask[i] = true;
+					newFieldTypes[i] = fieldTypes[j];
+					break;
+				}
+			}
--- End diff --

Can you throw an exception if the provided field name was not found in the 
POJO type information?
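
A possible shape for that check, as a standalone sketch (the helper name is 
illustrative, not code from the pull request):

{code}
// Returns the position of 'requested' in the POJO's field names,
// or fails loudly instead of silently skipping the field.
static int indexOfField(String[] pojoFields, String requested) {
	for (int j = 0; j < pojoFields.length; j++) {
		if (pojoFields[j].equals(requested)) {
			return j;
		}
	}
	throw new IllegalArgumentException(
			"Field '" + requested + "' is not a field of the POJO type.");
}
{code}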


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356783#comment-14356783
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206133
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
--- End diff --

IMO, ``null`` values should not be allowed in the ``fieldsMap``. Can you 
throw an exception in that case?
The ``fieldsMap`` should be a list of fields that are mapped to columns in 
the CSV file. As said before, the columns that are read by the format are 
defined by the ``includeMask``.


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206381
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}
 
 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
+					continue;
+				}
+
+				try {
+					int fieldIndex = typeInformation.getFieldIndex(fieldsMap[i]);
+					pojoTypeInfo.getPojoFieldAt(fieldIndex).field.set(reuse, parsedValues[i]);
--- End diff --

Same for the ``field``.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356787#comment-14356787
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206307
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
 	}
 
 	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
--- End diff --

May not be null.


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205510
  
--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
-<script src="{{ site.baseurl }}js/bootstrap.min.js"></script>
-<script src="{{ site.baseurl }}js/codetabs.js"></script>
+<script src="{{ site.baseurl }}/js/bootstrap.min.js"></script>
+<script src="{{ site.baseurl }}/js/codetabs.js"></script>
+
+{% if page.mathjax %}
+<script type="text/x-mathjax-config">
+MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
+</script>
+<script type="text/javascript"
+  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
--- End diff --

ASL 2.0, so that's good.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205970
  
--- Diff: docs/_includes/navbar.html ---
@@ -24,15 +24,15 @@
 We might be on an externally hosted documentation site.
 Please keep the site.FLINK_WEBSITE_URL below to ensure a link back to the Flink website.
 {% endcomment %}
-<a href="{{ site.FLINK_WEBSITE_URL }}index.html" title="Home">
+<a href="{{ site.FLINK_WEBSITE_URL }}/index.html" title="Home">
   <img class="hidden-xs hidden-sm img-responsive"
-    src="{{ site.baseurl }}img/logo.png" alt="Apache Flink Logo">
+    src="{{ site.baseurl }}/img/logo.png" alt="Apache Flink Logo">
 </a>
 <div class="row visible-xs">
   <div class="col-xs-3">
-    <a href="{{ site.baseurl }}index.html" title="Home">
+    <a href="{{ site.baseurl }}/index.html" title="Home">
--- End diff --

This will point the Home button to http://ci.apache.org instead of the 
documentation. {{ site.baseurl }} is actually not set in the config. We could 
set it to the current docs or just keep the relative links, which worked fine.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205980
  
--- Diff: docs/_includes/sidenav.html ---
@@ -17,51 +17,51 @@
 under the License.
 -->
 <ul id="flink-doc-sidenav">
-  <li><div class="sidenav-category"><a href="faq.html">FAQ</a></div></li>
+  <li><div class="sidenav-category"><a href="/faq.html">FAQ</a></div></li>
--- End diff --

Same as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205978
  
--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
   <ul class="nav navbar-nav">
 
 <li>
-  <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+  <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
 </li>
 
 <li>
-  <a href="api/java/index.html">Javadoc</a>
+  <a href="/api/java/index.html">Javadoc</a>
 </li>
 
 <li>
-  <a href="api/scala/index.html#org.apache.flink.api.scala.package">Scaladoc</a>
+  <a href="/api/scala/index.html#org.apache.flink.api.scala.package">Scaladoc</a>
--- End diff --

Same as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205983
  
--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
--- End diff --

This will break the loading of stylesheets on our website. {{ site.baseurl 
}} is actually not set in the config. Thus the path /css will resolve against 
http://ci.apache.org.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205976
  
--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
  ul class=nav navbar-nav
 
li
- a href=index.html class={% if page.url contains 'index.html' 
%}active{% endif %}Documentation/a
+ a href=/index.html class={% if page.url contains 'index.html' 
%}active{% endif %}Documentation/a
/li
 
li
- a href=api/java/index.htmlJavadoc/a
+ a href=/api/java/index.htmlJavadoc/a
--- End diff --

Same as above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26205969
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
 	public Class<?>[] getFieldTypes() {
 		return super.getGenericFieldTypes();
 	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
+				includeMask[i] = false;
+				newFieldTypes[i] = null;
+				continue;
+			}
+
+			for (int j = 0; j < fields.length; j++) {
+				if (fields[j].equals(fieldsMap[i])) {
+					includeMask[i] = true;
+					newFieldTypes[i] = fieldTypes[j];
+					break;
+				}
+			}
--- End diff --

Can you throw an exception if the provided field name was not found in the 
POJO type information?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26205975
  
--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
   <ul class="nav navbar-nav">
 
 <li>
-  <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+  <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
--- End diff --

Please use relative paths. This will fail with our current docs setup 
because the base url is http://ci.apache.org/projects/flink/




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206133
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -152,6 +177,38 @@ public void setFields(boolean[] sourceFieldMask, Class<?>[] fieldTypes) {
	public Class<?>[] getFieldTypes() {
		return super.getGenericFieldTypes();
	}
+
+	public void setFieldsMap(String[] fieldsMap) {
+		Preconditions.checkNotNull(fieldsMap);
+		Preconditions.checkState(typeInformation instanceof PojoTypeInfo);
+
+		PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+
+		String[] fields = pojoTypeInfo.getFieldNames();
+		Class<?>[] fieldTypes = getFieldTypes();
+		this.fieldsMap = Arrays.copyOfRange(fieldsMap, 0, fieldsMap.length);
+
+		boolean[] includeMask = new boolean[fieldsMap.length];
+		Class<?>[] newFieldTypes = new Class<?>[fieldsMap.length];
+
+		for (int i = 0; i < fieldsMap.length; i++) {
+			if (fieldsMap[i] == null) {
--- End diff --

IMO, ``null`` values should not be allowed in the ``fieldsMap``. Can you 
throw an exception in that case?
The ``fieldsMap`` should be a list of fields that are mapped to columns in 
the CSV file. As said before, the columns that are read by the format are 
defined by the ``includeMask``.
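
A minimal sketch of that separation, with ``null`` entries rejected up front; all names are illustrative, not the actual ``CsvInputFormat`` API:

{code}
import java.util.ArrayList;
import java.util.List;

public class ColumnMapping {

	// includeMask selects CSV columns; fieldsMap names the POJO field
	// for each *included* column, in order
	static List<String> mappedFields(boolean[] includeMask, String[] fieldsMap) {
		List<String> mapping = new ArrayList<String>();
		int next = 0;
		for (boolean include : includeMask) {
			if (include) {
				String field = fieldsMap[next++];
				if (field == null) {
					throw new IllegalArgumentException(
						"fieldsMap must not contain null entries");
				}
				mapping.add(field);
			}
		}
		return mapping;
	}

	public static void main(String[] args) {
		// read CSV columns 0 and 2 into POJO fields "word" and "count"
		System.out.println(mappedFields(
				new boolean[]{true, false, true},
				new String[]{"word", "count"})); // [word, count]
	}
}
{code}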




[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread fhueske
Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206329
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
	}

	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
+					continue;
+				}
+
+				try {
+					int fieldIndex = typeInformation.getFieldIndex(fieldsMap[i]);
--- End diff --

We could compute the index upfront and avoid the String look-up.
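
A minimal sketch of that optimization, resolving the positions once and keeping them in an ``int`` array; names are illustrative:

{code}
import java.util.Arrays;
import java.util.List;

public class PrecomputedIndexes {

	// computed once, e.g. when the fields map is set, instead of a
	// String-based getFieldIndex() look-up for every record
	static int[] precompute(String[] fieldsMap, List<String> pojoFields) {
		int[] indexes = new int[fieldsMap.length];
		for (int i = 0; i < fieldsMap.length; i++) {
			indexes[i] = (fieldsMap[i] == null) ? -1 : pojoFields.indexOf(fieldsMap[i]);
		}
		return indexes;
	}

	public static void main(String[] args) {
		int[] idx = precompute(new String[]{"word", null, "count"},
				Arrays.asList("word", "count"));
		System.out.println(Arrays.toString(idx)); // [0, -1, 1]
		// per record: if (idx[i] >= 0) { set parsedValues[i] on field idx[i]; }
	}
}
{code}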




[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356788#comment-14356788
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user fhueske commented on a diff in the pull request:

https://github.com/apache/flink/pull/426#discussion_r26206329
  
--- Diff: 
flink-java/src/main/java/org/apache/flink/api/java/io/CsvInputFormat.java ---
@@ -234,9 +291,29 @@ public OUT readRecord(OUT reuse, byte[] bytes, int offset, int numBytes) throws
	}

	if (parseRecord(parsedValues, bytes, offset, numBytes)) {
-		// valid parse, map values into pact record
-		for (int i = 0; i < parsedValues.length; i++) {
-			reuse.setField(parsedValues[i], i);
+		if (typeInformation instanceof TupleTypeInfoBase) {
+			// result type is tuple
+			Tuple result = (Tuple) reuse;
+			for (int i = 0; i < parsedValues.length; i++) {
+				result.setField(parsedValues[i], i);
+			}
+		} else {
+			// result type is POJO
+			PojoTypeInfo<OUT> pojoTypeInfo = (PojoTypeInfo<OUT>) typeInformation;
+			for (int i = 0; i < parsedValues.length; i++) {
+				if (fieldsMap[i] == null) {
+					continue;
+				}
+
+				try {
+					int fieldIndex = typeInformation.getFieldIndex(fieldsMap[i]);
--- End diff --

We could compute the index upfront and avoid the String look-up.


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.
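
 For context, a rough usage sketch of what reading into a POJO could look like from the user side; the {{pojoType()}} reader method and the file path are illustrative, based on what the pull request under review proposes:

{code}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CsvToPojoExample {

	public static class WordWithCount {
		public String word;
		public int count;
		public WordWithCount() {}
	}

	public static void main(String[] args) throws Exception {
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		// map CSV columns to the POJO fields "word" and "count"
		DataSet<WordWithCount> words = env
			.readCsvFile("file:///path/to/words.csv")
			.pojoType(WordWithCount.class, "word", "count");

		words.print();
	}
}
{code}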





[jira] [Commented] (FLINK-1512) Add CsvReader for reading into POJOs.

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357861#comment-14357861
 ] 

ASF GitHub Bot commented on FLINK-1512:
---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-78401359
  
@fhueske Thanks for your kind advice. I will fix it as soon as possible.


 Add CsvReader for reading into POJOs.
 -

 Key: FLINK-1512
 URL: https://issues.apache.org/jira/browse/FLINK-1512
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 Currently, the {{CsvReader}} supports only TupleXX types. 
 It would be nice if users were also able to read into POJOs.





[GitHub] flink pull request: [FLINK-1512] Add CsvReader for reading into PO...

2015-03-11 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/426#issuecomment-78401359
  
@fhueske Thanks for your kind advice. I will fix it as soon as possible.




[jira] [Closed] (FLINK-1684) Make Kafka connectors read/write a partition the worker is on

2015-03-11 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/FLINK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Márton Balassi closed FLINK-1684.
-
Resolution: Duplicate

 Make Kafka connectors read/write a partition the worker is on
 -

 Key: FLINK-1684
 URL: https://issues.apache.org/jira/browse/FLINK-1684
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Gábor Hermann

 Kafka connectors could read/write partitions on a different machine. It is a 
 best effort to find the partitions located on the same node as the subtask 
 and read from (or write to) that partition.





[jira] [Commented] (FLINK-1684) Make Kafka connectors read/write a partition the worker is on

2015-03-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14356961#comment-14356961
 ] 

Márton Balassi commented on FLINK-1684:
---

This is a duplicate of FLINK-1673.

 Make Kafka connectors read/write a partition the worker is on
 -

 Key: FLINK-1684
 URL: https://issues.apache.org/jira/browse/FLINK-1684
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Reporter: Gábor Hermann

 Kafka connectors could read/write partitions on a different machine. It is a 
 best effort to find the partitions located on the same node as the subtask 
 and read from (or write to) that partition.





[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26212489
  
--- Diff: docs/_includes/navbar.html ---
@@ -24,15 +24,15 @@
 We might be on an externally hosted documentation site.
 Please keep the site.FLINK_WEBSITE_URL below to ensure a link back to the Flink website.
 {% endcomment %}
-   <a href="{{ site.FLINK_WEBSITE_URL }}index.html" title="Home">
+   <a href="{{ site.FLINK_WEBSITE_URL }}/index.html" title="Home">
      <img class="hidden-xs hidden-sm img-responsive"
-      src="{{ site.baseurl }}img/logo.png" alt="Apache Flink Logo">
+      src="{{ site.baseurl }}/img/logo.png" alt="Apache Flink Logo">
    </a>
    <div class="row visible-xs">
      <div class="col-xs-3">
-       <a href="{{ site.baseurl }}index.html" title="Home">
+       <a href="{{ site.baseurl }}/index.html" title="Home">
--- End diff --

Do we have control over ```{{ site.baseurl }}```? The problem with relative links is that you cannot have pages in directories other than the root directory. IMHO this is not how it should be.




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26212689
  
--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
      <ul class="nav navbar-nav">

        <li>
-         <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+         <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
--- End diff --

Then I think we should change the way the docs are linked. Relative paths don't allow any kind of directory structure for the HTML files, and maintaining a single directory with all our HTML files will quickly become a big mess.




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26215901
  
--- Diff: docs/_includes/navbar.html ---
@@ -49,15 +49,15 @@
      <ul class="nav navbar-nav">

        <li>
-         <a href="index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
+         <a href="/index.html" class="{% if page.url contains 'index.html' %}active{% endif %}">Documentation</a>
--- End diff --

We could set `<base href="http://ci.apache.org/projects/flink/.../">` in the header. That way, we would be able to use relative links. However, hard-coding the link to the docs in the template variables is also not very nice.
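
A minimal sketch of that idea; the docs path in the URL is a placeholder:

{code}
<head>
  <!-- all relative URLs on the page now resolve against this base -->
  <base href="http://ci.apache.org/projects/flink/docs-master/">

  <!-- resolves to http://ci.apache.org/projects/flink/docs-master/css/custom.css -->
  <link rel="stylesheet" href="css/custom.css">
</head>
{code}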




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26212353
  
--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
 <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.0/jquery.min.js"></script>
-<script src="{{ site.baseurl }}js/bootstrap.min.js"></script>
-<script src="{{ site.baseurl }}js/codetabs.js"></script>
+<script src="{{ site.baseurl }}/js/bootstrap.min.js"></script>
+<script src="{{ site.baseurl }}/js/codetabs.js"></script>
+
+{% if page.mathjax %}
+<script type="text/x-mathjax-config">
+MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
+</script>
+<script type="text/javascript"
+  src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
--- End diff --

Good point. I did not check it. Thanks for doing so.




[GitHub] flink pull request: Kick off of Flink's machine learning library

2015-03-11 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/479#discussion_r26212866
  
--- Diff: docs/_layouts/default.html ---
@@ -23,16 +23,25 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <title>Apache Flink: {{ page.title }}</title>
-<link rel="shortcut icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="icon" href="{{ site.baseurl }}favicon.ico" type="image/x-icon">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/bootstrap-lumen-custom.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/syntax.css">
-<link rel="stylesheet" href="{{ site.baseurl }}css/custom.css">
-<link href="{{ site.baseurl }}css/main/main.css" rel="stylesheet">
+<link rel="shortcut icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="icon" href="{{ site.baseurl }}/favicon.ico" type="image/x-icon">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/bootstrap-lumen-custom.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/syntax.css">
+<link rel="stylesheet" href="{{ site.baseurl }}/css/custom.css">
+<link href="{{ site.baseurl }}/css/main/main.css" rel="stylesheet">
--- End diff --

Well, it seems to be empty. Why do we have ```{{ site.baseurl }}``` in the first place?

