[ANNOUNCE] Introducing Bay Area Apache Flink meetup

2015-05-28 Thread Henry Saputra
Hi All, I would like to announce the new Apache Flink meetup in the Bay Area: http://www.meetup.com/Bay-Area-Apache-Flink-Meetup/ We are cooking up the first event for the meetup soon and will have awesome speakers to talk about Apache Flink =) Please join the Bay Area meetup to get the latest news

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gyula Fóra
Let's not get all dramatic :D If we don't call any methods on the empty groups we can still keep them off-memory in persistent storage with a lazy checkpoint/state-access logic with practically zero memory overhead. Automatically dropping everything will break a lot of programs without people

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Gábor Gévay
Hi, I would vote for making the default behaviour to drop all state for empty groups, and allow a configuration to set the current behaviour instead. This issue will probably have a paragraph in the documentation, but if someone overlooks this, then there is potential for a greater disaster with
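
A minimal sketch of the two behaviours discussed in this thread, assuming a toy per-group window buffer. The class `GroupedWindowState` and the flag `drop_empty_groups` are hypothetical names for illustration only; this is not Flink's windowing code.

```python
class GroupedWindowState:
    """Toy per-group window buffers (hypothetical, not Flink's implementation)."""

    def __init__(self, drop_empty_groups=True):
        # Assumed configuration switch: True drops the state of empty groups,
        # False keeps an (empty) buffer around for every key ever seen.
        self.drop_empty_groups = drop_empty_groups
        self.groups = {}

    def add(self, key, value):
        # Create the group's buffer on first use and append the element.
        self.groups.setdefault(key, []).append(value)

    def evict(self, key, count):
        # Remove up to `count` of the oldest elements from one group.
        buf = self.groups.get(key)
        if buf is None:
            return
        del buf[:count]
        if not buf and self.drop_empty_groups:
            # The group is empty: release its state entirely.
            del self.groups[key]

    def tracked_groups(self):
        return len(self.groups)
```

With many short-lived keys, as in a session windowing job, the number of tracked groups grows without bound when empty groups are retained, and stays proportional to the number of active sessions when they are dropped.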

Re: Adding custom Tuple to a DataSet

2015-05-28 Thread Stephan Ewen
Hi! If you want to have type hierarchies (like base tuples and different classes), you cannot use tuples (they are expected to be 'exact schema'), but you can use other classes. Create your own tuple POJO with subclasses, and it should work. Stephan On Thu, May 28, 2015 at 1:30 AM, Amit Pawar

Re: Storm compatibility layer currently does not support Storm's SimpleJoin example

2015-05-28 Thread Szabó Péter
Hi Matthias, Of course, here is the package that contains the example's source classes. https://github.com/mbalassi/flink/tree/storm-backup/flink-staging/flink-streaming/flink-storm-examples/src/main/java/org/apache/flink/stormcompatibility/singlejoin It is mostly a copy-paste of SimpleJoin from

[jira] [Created] (FLINK-2103) Expose partitionBy to the user in Stream API

2015-05-28 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2103: --- Summary: Expose partitionBy to the user in Stream API Key: FLINK-2103 URL: https://issues.apache.org/jira/browse/FLINK-2103 Project: Flink Issue Type:

RE: Changed the behavior of DataSet.print()

2015-05-28 Thread Kruse, Sebastian
Hi everyone, I am a bit worried about the recent change to the print() method. I can understand the rationale that obtaining the stdout from all the TaskManagers is cumbersome (although for local debugging the old print() was fine). However, a major problem I see with the new print() is

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Robert Metzger
Hi Sebastian, thank you for the feedback. I agree that both variants have a right to exist. I would vote for adding another method to the DataSet called printLocal() that has the old behavior. On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote: Hi everyone, I am

[jira] [Created] (FLINK-2104) Fallback implicit values for PredictOperation and TransformOperation don't work if Nothing is inferred as the output type

2015-05-28 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2104: Summary: Fallback implicit values for PredictOperation and TransformOperation don't work if Nothing is inferred as the output type Key: FLINK-2104 URL:

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Robert Metzger
Okay, you are right, local is actually confusing. I'm against introducing worker as a term in the API. It's still called TaskManager. Maybe printOnTaskManager()? On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote: +1 for both. printLocal() might not be the best name,

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Stephan Ewen
Actually, there is a method print(String prefix) which still goes to the sysout of where the job is executed. Let's give that one the name printOnTaskManager() and then we should have it... On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske fhue...@gmail.com wrote: I would avoid calling it

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
+1 for both. printLocal() might not be the best name, because local is not well defined and could also be understood as the local machine of the user. How about naming the method completely differently (writeToWorkerStdOut()?) to make sure users do not confuse eager and lazy execution?

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
I would avoid calling it printXYZ, since print()'s behavior changed to eager execution. 2015-05-28 14:10 GMT+02:00 Robert Metzger rmetz...@apache.org: Okay, you are right, local is actually confusing. I'm against introducing worker as a term in the API. It's still called TaskManager. Maybe

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Fabian Hueske
As I said, the common print prefix might indicate eager execution. I know that writeToTaskManagerStdOut() is quite bulky, but we should make the difference in the behavior very clear, IMO. 2015-05-28 14:29 GMT+02:00 Stephan Ewen se...@apache.org: Actually, there is a method print(String
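
The eager versus lazy distinction behind this naming debate can be illustrated with a toy model. Everything below (`ToyEnv`, `ToyDataSet`, `write_to_worker_stdout`) is hypothetical and only mimics the idea; it is not the Flink API, and it does not model how the real print() treats other pending sinks.

```python
class ToyDataSet:
    """Hypothetical stand-in for a lazily evaluated data set."""

    def __init__(self, env, compute):
        self.env = env
        self.compute = compute  # zero-argument function producing the elements

    def write_to_worker_stdout(self, worker_out):
        # Lazy: only registers a sink; nothing runs until env.execute().
        self.env.sinks.append((self, worker_out))

    def print(self, client_out):
        # Eager: computes the result immediately and hands it to the caller.
        client_out.extend(self.compute())


class ToyEnv:
    """Hypothetical execution environment that collects lazy sinks."""

    def __init__(self):
        self.sinks = []

    def from_elements(self, *items):
        return ToyDataSet(self, lambda: list(items))

    def execute(self):
        # Runs every registered sink, then clears the plan.
        for data_set, worker_out in self.sinks:
            worker_out.extend(data_set.compute())
        self.sinks.clear()
```

In this model a call to `write_to_worker_stdout` has no visible effect until `execute()` is called, whereas `print` produces output right away, which is the behavioural difference a distinct method name is meant to signal.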

[jira] [Created] (FLINK-2106) Add outer joins to API, Optimizer, and Runtime

2015-05-28 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2106: Summary: Add outer joins to API, Optimizer, and Runtime Key: FLINK-2106 URL: https://issues.apache.org/jira/browse/FLINK-2106 Project: Flink Issue Type:

RE: Changed the behavior of DataSet.print()

2015-05-28 Thread Kruse, Sebastian
Thanks for your quick responses! I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager() ;) Cheers, Sebastian -Original Message- From: Fabian Hueske [mailto:fhue...@gmail.com] Sent:

Re: Memleak in the SessionWindowing example

2015-05-28 Thread Márton Balassi
Thanks for debugging this, Gabor, indeed a good catch. I am not so sure about surfacing it in the API though - it seems very specific to the session windowing case. I am also wondering whether maybe this should actually be the default behavior - if there are already empty windows for a group why

Re: Changed the behavior of DataSet.print()

2015-05-28 Thread Maximilian Michels
+1 for printOnTaskManager() On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian sebastian.kr...@hpi.de wrote: Thanks for your quick responses! I also think that renaming the old print method should do the trick. As a contribution to your brainstorming for a name, I propose logOnTaskManager()

[jira] [Created] (FLINK-2107) Implement Hash Outer Join algorithm

2015-05-28 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-2107: Summary: Implement Hash Outer Join algorithm Key: FLINK-2107 URL: https://issues.apache.org/jira/browse/FLINK-2107 Project: Flink Issue Type: Sub-task

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Till Rohrmann
Yes, GradientDescent == (batch-)SGD. That was also my first idea of how to implement it. However, what happens if the regularization is specific to the algorithm actually used? For example, for L-BFGS with L1 regularization you have a different `parameterUpdate` step (Orthant-wise Limited Memory
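
To illustrate why the parameter update can depend on the regularization type, here is a rough sketch for plain gradient descent. The function names `l2_update` and `l1_update` are hypothetical and are not the Flink ML API. The L1 variant uses a proximal (soft-thresholding) step, which is a simpler technique than the orthant-wise method mentioned for L-BFGS.

```python
def l2_update(weights, gradient, step, reg):
    # L2 penalty (reg / 2) * ||w||^2 is differentiable, so its gradient
    # reg * w is simply added to the loss gradient.
    return [w - step * (g + reg * w) for w, g in zip(weights, gradient)]


def l1_update(weights, gradient, step, reg):
    # L1 penalty reg * ||w||_1 is not differentiable at zero, so take a
    # plain gradient step first and then soft-threshold the result.
    shrink = step * reg
    updated = []
    for w, g in zip(weights, gradient):
        z = w - step * g
        if z > shrink:
            updated.append(z - shrink)
        elif z < -shrink:
            updated.append(z + shrink)
        else:
            updated.append(0.0)  # small weights are set exactly to zero
    return updated
```

The two functions share a signature but not a formula, so an optimizer that hard-codes one update rule cannot be combined freely with every regularization type.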

[jira] [Created] (FLINK-2109) CancelTaskException leads to FAILED task state

2015-05-28 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-2109: -- Summary: CancelTaskException leads to FAILED task state Key: FLINK-2109 URL: https://issues.apache.org/jira/browse/FLINK-2109 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-2108) Add score function for Predictors

2015-05-28 Thread Theodore Vasiloudis (JIRA)
Theodore Vasiloudis created FLINK-2108: -- Summary: Add score function for Predictors Key: FLINK-2108 URL: https://issues.apache.org/jira/browse/FLINK-2108 Project: Flink Issue Type:

Re: Some feedback on the Gradient Descent Code

2015-05-28 Thread Theodore Vasiloudis
+1 This separation was the idea from the start; there is a trade-off between having highly configurable optimizers and ensuring that the right types of regularization can only be applied to optimization algorithms that support them. It comes down to viewing the optimization framework mostly as a