[jira] [Created] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other

2015-03-12 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1679:
-

 Summary: Document how degree of parallelism /  parallelism / 
slots are connected to each other
 Key: FLINK-1679
 URL: https://issues.apache.org/jira/browse/FLINK-1679
 Project: Flink
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.9
Reporter: Robert Metzger


I see too many users being confused about properly setting up Flink with 
respect to parallelism.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[DISCUSS] Deprecate Spargel API for 0.9

2015-03-12 Thread Vasiliki Kalavri
Hi all,

I would like your opinion on whether we should deprecate the Spargel API in
0.9.

Gelly doesn't depend on Spargel, it actually contains it -- we have copied
the relevant classes over. I think it would be a good idea to deprecate
Spargel in 0.9, so that we can inform existing Spargel users that we'll
eventually remove it.

Also, I think the fact that we have 2 Graph APIs in the documentation might
be a bit confusing for newcomers. One might wonder why do we have them both
and when shall they use one over the other?

It might be a good idea to add a note in the Spargel guide that would
suggest to use Gelly instead and a corresponding note in the beginning of
the Gelly guide to explain that Spargel is part of Gelly now. Or maybe a
Gelly or Spargel? section. What do you think?

The only thing that worries me is that the Gelly API is not very stable. Of
course, we are mostly adding things, but we are planning to make some
changes as well and I'm sure more will be needed the more we use it.

Looking forward to your thoughts!

Cheers,
Vasia.


Re: [DISCUSS] Add method for each Akka message

2015-03-12 Thread Henry Saputra
+1 for the idea too. Should make it easier to trace/ debug.


- Henry

On Tue, Mar 10, 2015 at 5:45 AM, Stephan Ewen se...@apache.org wrote:
 +1, let's change this lazily whenever we work on an action/message, we pull
 the handling out into a dedicated method.

 On Tue, Mar 10, 2015 at 11:49 AM, Ufuk Celebi u...@apache.org wrote:

 Hey all,

 I currently find it a little bit frustrating to navigate between different
 task manager operations like cancel or submit task. Some of these
 operations are directly done in the event loop (e.g. cancelling), whereas
 others forward the msg to a method (e.g. submitting).

 For me, navigating to methods is way easier than manually scanning the
 event loop.

 Therefore, I would prefer to forward all messages to a corresponding
 method. Can I get some opinions on this? Would someone be opposed? [Or is
 there a way in IntelliJ to do this navigation more efficiently? I couldn't
 find anything.]

 – Ufuk


Re: [jira] [Commented] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other

2015-03-12 Thread Stephan Ewen
+1 for consistently calling it parallelism

-1 for AUTOMAX as the default

On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger rmetz...@apache.org
wrote:

 We can also make the change non-API breaking by adding an additional method
 and deprecating the old one.


 Why would the AUTOMAX parallelism eat up all cluster resources? It would
 only allocate all slots WITHIN the Flink cluster.
 Those users (=new users) who would benefit from the AUTOMAX parallelism
 have probably set the parallelism per TaskManager set to 1 anyways.
 Advanced users will set their parallelism / slots configuration anyways
 properly.

 In my experience, most users:
 - have exclusive access to a test cluster in the beginning (I don't think
 anybody who doesn't know the system at all would start Flink on a
 production cluster)
 - or use YARN
 - do not set any parallelism for jobs or slots per TaskManager.

 From these observations, I would actually set the number of slots on the
 TaskManagers to the number of available CPUs.
 And for the CLI frontend, I would by default let a job use all available
 slots (most users don't know that Flink allows to run multiple jobs at the
 same time).

 If users want to change the behavior, they have to look into the
 documentation.

 On Thu, Mar 12, 2015 at 10:20 AM, Fabian Hueske fhue...@gmail.com wrote:

  +1 for going consistently with parallelism. However, these are
 API-breaking
  changes and we need to mark them deprecated before throwing them out,
 IMO.
 
  I am not comfortable with using AUTOMAX as a default. This is fine on
  dedicated setups like YARN sessions, but will consume all available
  resources of a cluster if a user forgets to set the -p flag (or fix the
 DOP
  in the program). There is already a default-parallelsm flag in the config
  and that value should be used, IMO.
 
  2015-03-12 10:07 GMT+01:00 Robert Metzger (JIRA) j...@apache.org:
 
  
   [
  
 
 https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358345#comment-14358345
   ]
  
   Robert Metzger commented on FLINK-1679:
   ---
  
   I would suggest to remove all occurrences of degreeOfParalleism in
 the
   system and replace it by parallelism everywhere.
   The CLI frontend for example also calls it {{-p}}, not {{-dop}}.
  
   I would also suggest to set the parallelism by default to {{AUTOMAX}}
 in
   the CliFrontend.
  
Document how degree of parallelism /  parallelism / slots are
   connected to each other
   
  
 
 ---
   
Key: FLINK-1679
URL:
 https://issues.apache.org/jira/browse/FLINK-1679
Project: Flink
 Issue Type: Task
 Components: Documentation
   Affects Versions: 0.9
   Reporter: Robert Metzger
   Assignee: Ufuk Celebi
   
I see too many users being confused about properly setting up Flink
  with
   respect to parallelism.
  
  
  
   --
   This message was sent by Atlassian JIRA
   (v6.3.4#6332)
  
 



Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-12 Thread Till Rohrmann
Have you run the 20 builds with the new shading code? With new shading the
TaskManagerFailsITCase should no longer fail. If it still does, then we
have to look into it again.

On Thu, Mar 12, 2015 at 2:01 PM, Stephan Ewen se...@apache.org wrote:

 I am also big time skeptical.

 There are some remaining stability issues with 0.9
   - Apparently a bug in the task canceling
   - Blocking Data Exchange is premature at this point
   - TaskManager startup is not robust
   - TaskManager / JobManager registration is not robust
   - Streaming fault tolerance needs more testing before we can make an
 assessment

 I think this needs a few more weeks...

 On Thu, Mar 12, 2015 at 1:57 PM, Aljoscha Krettek aljos...@apache.org
 wrote:

  I would like to get the Expression API for Java in there, as well.
 
  On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi u...@apache.org wrote:
   On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger rmetz...@apache.org
   wrote:
  
   So you're saying regarding the release you don't feel very confident
  that
   we manage to fork off release-0.9 next week?
  
  
   Yes. At the moment I would be uncomfortable with forking off.
  
   
  
   Regarding the failing tests: I thought that some failings jobs were
  related
   to my changes, but after looking into it, it was a false alarm. See
   comments here: https://github.com/apache/flink/pull/475
 



Building Flink takes long time now =(

2015-03-12 Thread Henry Saputra
Seemed like after few commits between today and yesterday, building
Flink takes very long time.

I used to be able to run mvn clean install -DskipTests in about 17
mins but after pull this morning, it has been 25 mins and the build
has not complete yet.

I am using MacOSX with JDK7

- Henry


Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-12 Thread Ufuk Celebi
On Thursday, March 12, 2015, Till Rohrmann till.rohrm...@gmail.com wrote:

 Have you run the 20 builds with the new shading code? With new shading the
 TaskManagerFailsITCase should no longer fail. If it still does, then we
 have to look into it again.


No, rebased on Monday before shading. Let me rebase and rerun tonight.


Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-12 Thread Aljoscha Krettek
I would like to get the Expression API for Java in there, as well.

On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi u...@apache.org wrote:
 On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger rmetz...@apache.org
 wrote:

 So you're saying regarding the release you don't feel very confident that
 we manage to fork off release-0.9 next week?


 Yes. At the moment I would be uncomfortable with forking off.

 

 Regarding the failing tests: I thought that some failings jobs were related
 to my changes, but after looking into it, it was a false alarm. See
 comments here: https://github.com/apache/flink/pull/475


Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-12 Thread Stephan Ewen
I am also big time skeptical.

There are some remaining stability issues with 0.9
  - Apparently a bug in the task canceling
  - Blocking Data Exchange is premature at this point
  - TaskManager startup is not robust
  - TaskManager / JobManager registration is not robust
  - Streaming fault tolerance needs more testing before we can make an
assessment

I think this needs a few more weeks...

On Thu, Mar 12, 2015 at 1:57 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 I would like to get the Expression API for Java in there, as well.

 On Thu, Mar 12, 2015 at 11:59 AM, Ufuk Celebi u...@apache.org wrote:
  On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger rmetz...@apache.org
  wrote:
 
  So you're saying regarding the release you don't feel very confident
 that
  we manage to fork off release-0.9 next week?
 
 
  Yes. At the moment I would be uncomfortable with forking off.
 
  
 
  Regarding the failing tests: I thought that some failings jobs were
 related
  to my changes, but after looking into it, it was a false alarm. See
  comments here: https://github.com/apache/flink/pull/475



[jira] [Created] (FLINK-1695) Create machine learning library

2015-03-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1695:


 Summary: Create machine learning library
 Key: FLINK-1695
 URL: https://issues.apache.org/jira/browse/FLINK-1695
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Create the infrastructure for Flink's machine learning library. This includes 
the creation of the module structure and the implementation of basic types such 
as vectors and matrices.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1696) Add multiple linear regression to ML library

2015-03-12 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-1696:


 Summary: Add multiple linear regression to ML library
 Key: FLINK-1696
 URL: https://issues.apache.org/jira/browse/FLINK-1696
 Project: Flink
  Issue Type: Improvement
Reporter: Till Rohrmann
Assignee: Till Rohrmann


Add multiple linear regression to ML library.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-12 Thread Ufuk Celebi
On Thu, Mar 12, 2015 at 10:11 AM, Robert Metzger rmetz...@apache.org
wrote:

 So you're saying regarding the release you don't feel very confident that
 we manage to fork off release-0.9 next week?


Yes. At the moment I would be uncomfortable with forking off.



Regarding the failing tests: I thought that some failings jobs were related
to my changes, but after looking into it, it was a false alarm. See
comments here: https://github.com/apache/flink/pull/475


[jira] [Created] (FLINK-1693) Deprecate the Spargel API

2015-03-12 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-1693:


 Summary: Deprecate the Spargel API
 Key: FLINK-1693
 URL: https://issues.apache.org/jira/browse/FLINK-1693
 Project: Flink
  Issue Type: Task
  Components: Spargel
Affects Versions: 0.9
Reporter: Vasia Kalavri


For the upcoming 0.9 release, we should mark all user-facing methods from the 
Spargel API as deprecated, with a warning that we are going to remove it at 
some point.

We should also add a comment in the docs and point people to Gelly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1694) Change the split between create/run of a vertex-centric iteration

2015-03-12 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-1694:


 Summary: Change the split between create/run of a vertex-centric 
iteration
 Key: FLINK-1694
 URL: https://issues.apache.org/jira/browse/FLINK-1694
 Project: Flink
  Issue Type: Task
  Components: Gelly
Reporter: Vasia Kalavri


Currently, the vertex-centric API in Gelly looks like this:

{code:java}
Graph inputGaph = ... //create graph
VertexCentricIteration iteration = inputGraph.createVertexCentricIteration();
... // configure the iteration
Graph newGraph = inputGaph.runVertexCentricIteration(iteration);
{code}

We have this create/run split, in order to expose the iteration object and be 
able to call the public methods of VertexCentricIteration.

However, this is not very nice and might lead to errors, if create and run are  
mistakenly called on different graph objects.

One suggestion is to change this to the following:

{code:java}
VertexCentricIteration iteration = inputGraph.createVertexCentricIteration();
... // configure the iteration
Graph newGraph = iteration.result();
{code}

or to go with a single run call, where we add an IterationConfiguration object 
as a parameter and we don't expose the iteration object to the user at all:

{code:java}
IterationConfiguration parameters  = ...
Graph newGraph = inputGraph.runVertexCentricIteration(parameters);
{code}

and we can also have a simplified method where no configuration is passed.

What do you think?
Personally, I like the second option a bit more.

-Vasia.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Commented] (FLINK-1679) Document how degree of parallelism / parallelism / slots are connected to each other

2015-03-12 Thread Maximilian Michels
+1 for unifying the way to set the parallelism and deprecating the old methods.

We had the AUTOMAX discussion before in the corresponding pull
request. It seems to be that there are two orthogonal views on how
resources should be allocated by default. I strongly agree with
Robert.

Users have exclusive access to resources or use a resource manager
(YARN). They are often unaware of the parallelism and are turned off
by the bad performance with parallelism of 1. Setting AUTOMAX by
default gives the best possible Flink experience. After all, Flink
doesn't even support proper sharing of resources at the moment. So
scenarios where multiple users manually set the parallelism will cause
problems with job canceling due to unavailable resources and missing
queuing features.

Let's leave it up to the advanced users to set the granularity of the
parallelism and provide the best out of the box experience for Flink
novices.

Best regards,
Max

On Thu, Mar 12, 2015 at 10:31 AM, Robert Metzger rmetz...@apache.org wrote:
 We can also make the change non-API breaking by adding an additional method
 and deprecating the old one.


 Why would the AUTOMAX parallelism eat up all cluster resources? It would
 only allocate all slots WITHIN the Flink cluster.
 Those users (=new users) who would benefit from the AUTOMAX parallelism
 have probably set the parallelism per TaskManager set to 1 anyways.
 Advanced users will set their parallelism / slots configuration anyways
 properly.

 In my experience, most users:
 - have exclusive access to a test cluster in the beginning (I don't think
 anybody who doesn't know the system at all would start Flink on a
 production cluster)
 - or use YARN
 - do not set any parallelism for jobs or slots per TaskManager.

 From these observations, I would actually set the number of slots on the
 TaskManagers to the number of available CPUs.
 And for the CLI frontend, I would by default let a job use all available
 slots (most users don't know that Flink allows to run multiple jobs at the
 same time).

 If users want to change the behavior, they have to look into the
 documentation.

 On Thu, Mar 12, 2015 at 10:20 AM, Fabian Hueske fhue...@gmail.com wrote:

 +1 for going consistently with parallelism. However, these are API-breaking
 changes and we need to mark them deprecated before throwing them out, IMO.

 I am not comfortable with using AUTOMAX as a default. This is fine on
 dedicated setups like YARN sessions, but will consume all available
 resources of a cluster if a user forgets to set the -p flag (or fix the DOP
 in the program). There is already a default-parallelsm flag in the config
 and that value should be used, IMO.

 2015-03-12 10:07 GMT+01:00 Robert Metzger (JIRA) j...@apache.org:

 
  [
 
 https://issues.apache.org/jira/browse/FLINK-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358345#comment-14358345
  ]
 
  Robert Metzger commented on FLINK-1679:
  ---
 
  I would suggest to remove all occurrences of degreeOfParalleism in the
  system and replace it by parallelism everywhere.
  The CLI frontend for example also calls it {{-p}}, not {{-dop}}.
 
  I would also suggest to set the parallelism by default to {{AUTOMAX}} in
  the CliFrontend.
 
   Document how degree of parallelism /  parallelism / slots are
  connected to each other
  
 
 ---
  
   Key: FLINK-1679
   URL: https://issues.apache.org/jira/browse/FLINK-1679
   Project: Flink
Issue Type: Task
Components: Documentation
  Affects Versions: 0.9
  Reporter: Robert Metzger
  Assignee: Ufuk Celebi
  
   I see too many users being confused about properly setting up Flink
 with
  respect to parallelism.
 
 
 
  --
  This message was sent by Atlassian JIRA
  (v6.3.4#6332)
 



Re: [jira] [Commented] (FLINK-1106) Deprecate old Record API

2015-03-12 Thread Fabian Hueske
And I'm +1 for removing the old API with the next release.

2015-03-10 17:38 GMT+01:00 Fabian Hueske fhue...@gmail.com:

 Yeah, I spotted a good amount of optimizer tests that depend on the Record
 API.
 I implemented the last optimizer tests with the new API and would
 volunteer to port the other optimizer tests.

 2015-03-10 16:32 GMT+01:00 Stephan Ewen (JIRA) j...@apache.org:


 [
 https://issues.apache.org/jira/browse/FLINK-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355063#comment-14355063
 ]

 Stephan Ewen commented on FLINK-1106:
 -

 A bit of test coverage depends on the deprecated API.

 We would need to port at least some of the tests to the new API.

 We can probably drop some subsumed / obsolete tests.

  Deprecate old Record API
  
 
  Key: FLINK-1106
  URL: https://issues.apache.org/jira/browse/FLINK-1106
  Project: Flink
   Issue Type: Task
   Components: Java API
 Affects Versions: 0.7.0-incubating
 Reporter: Robert Metzger
 Assignee: Robert Metzger
 Priority: Critical
  Fix For: 0.7.0-incubating
 
 
  For the upcoming 0.7 release, we should mark all user-facing methods
 from the old Record Java API as deprecated, with a warning that we are
 going to remove it at some point.
  I would suggest to wait one or two releases from the 0.7 release (given
 our current release cycle). I'll start a mailing-list discussion at some
 point regarding this.



 --
 This message was sent by Atlassian JIRA
 (v6.3.4#6332)





[jira] [Created] (FLINK-1688) Add socket sink

2015-03-12 Thread JIRA
Márton Balassi created FLINK-1688:
-

 Summary: Add socket sink
 Key: FLINK-1688
 URL: https://issues.apache.org/jira/browse/FLINK-1688
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming
Reporter: Márton Balassi
Priority: Trivial


Add a sink that writes output to socket. I'd consider two options, one which 
implements a socket server and one which implements a client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Make a release to be announced at ApacheCon

2015-03-12 Thread Henry Saputra
I will follow up again with Sally this week if there any special
messaging or communications needed to do for the Apache Con from our
side.

- Henry

On Tue, Mar 10, 2015 at 3:20 AM, Robert Metzger rmetz...@apache.org wrote:
 Hey,

 whats the status on this? There is one week left until we are going to fork
 off a branch for 0.9 .. if we stick to the suggested timeline.
 The initial email said I am very much in favor of doing this, under the
 strong condition that we
 are very confident that the master has grown to be stable enough. I think
 it is time to evaluate whether we are confident that the master is stable.

 Best
 Robert



 On Wed, Mar 4, 2015 at 9:42 AM, Robert Metzger rmetz...@apache.org wrote:

 +1 for Marton as a release manager. Thank you!


 On Tue, Mar 3, 2015 at 7:56 PM, Henry Saputra henry.sapu...@gmail.com
 wrote:

 Ah, thanks Márton.

 So we are chartering to the similar concept of Spark RRD staging
 execution =P
 I suppose there will be a runtime configuration or hint to tell the
 Flink Job manager to indicate which execution is preferred?


 - Henry

 On Tue, Mar 3, 2015 at 2:09 AM, Márton Balassi balassi.mar...@gmail.com
 wrote:
  Hi Henry,
 
  Batch mode is a new execution mode for batch Flink jobs where instead of
  pipelining the whole execution the job is scheduled in stages, thus
  materializing the intermediate result before continuing to the next
  operators. For implications see [1].
 
  [1] http://www.slideshare.net/KostasTzoumas/flink-internals, page
 18-21.
 
 
  On Mon, Mar 2, 2015 at 11:39 PM, Henry Saputra henry.sapu...@gmail.com
 
  wrote:
 
  HI Stephan,
 
  What is Batch mode feature in the list?
 
  - Henry
 
  On Mon, Mar 2, 2015 at 5:03 AM, Stephan Ewen se...@apache.org wrote:
   Hi all!
  
   ApacheCon is coming up and it is the 15th anniversary of the Apache
   Software Foundation.
  
   In the course of the conference, Apache would like to make a series
 of
   announcements. If we manage to make a release during (or shortly
 before)
   ApacheCon, they will announce it through their channels.
  
   I am very much in favor of doing this, under the strong condition
 that we
   are very confident that the master has grown to be stable enough
 (there
  are
   major changes in the distributed runtime since version 0.8 that we
 are
   still stabilizing). No use in a widely announced build that does not
 have
   the quality.
  
   Flink has now many new features that warrant a release soon (once we
  fixed
   the last quirks in the new distributed runtime).
  
   Notable new features are:
- Gelly
- Streaming windows
- Flink on Tez
- Expression API
- Distributed Runtime on Akka
- Batch mode
- Maybe even a first ML library version
- Some streaming fault tolerance
  
   Robert proposed to have a feature freeze mid Match for that. His
   cornerpoints were:
  
   Feature freeze (forking off release-0.9): March 17
   RC1 vote: March 24
  
   The RC1 vote is 20 days before the ApacheCon (13. April).
   For the last three releases, the average voting time was 20 days:
   R 0.8.0 -- 14 days
   R 0.7.0 -- 22 days
   R 0.6   -- 26 days
  
   Please share your opinion on this!
  
  
   Greetings,
   Stephan