[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-06 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77569539
  
I think it would definitely be good to have something like a job submission
queue that accepts jobs and executes them as soon as enough resources
become available.
That should not be too hard to do.
Simple dependencies could also be checked, such as executing job Y only if
job X completed successfully.

However, I am not aware of any effort in that direction.
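
A minimal sketch of such a queue, to make the idea concrete. Nothing here exists in Flink: `QueuedJob`, `JobSubmissionQueue`, and all method names are hypothetical, and the policy (start any waiting job whose dependency has completed and whose slots fit) is only one possible choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical job description: not a Flink type.
final class QueuedJob {
    final String name;
    final int requiredSlots;
    final String dependsOn; // job that must have completed first, or null

    QueuedJob(String name, int requiredSlots, String dependsOn) {
        this.name = name;
        this.requiredSlots = requiredSlots;
        this.dependsOn = dependsOn;
    }
}

// Hypothetical queue: accepts jobs and starts them once slots are free.
final class JobSubmissionQueue {
    private final List<QueuedJob> waiting = new ArrayList<>();
    private final Map<String, Integer> running = new HashMap<>();
    private final Set<String> completed = new HashSet<>();
    private int freeSlots;

    JobSubmissionQueue(int totalSlots) {
        this.freeSlots = totalSlots;
    }

    /** Accepts a job; returns the names of all jobs started as a result. */
    List<String> submit(QueuedJob job) {
        waiting.add(job);
        return startRunnable();
    }

    /** Marks a running job as successfully completed and frees its slots. */
    List<String> complete(String name) {
        Integer slots = running.remove(name);
        if (slots == null) {
            throw new IllegalArgumentException("not running: " + name);
        }
        freeSlots += slots;
        completed.add(name);
        return startRunnable();
    }

    // Scans the waiting jobs in submission order and starts every job whose
    // dependency has completed and whose slot demand fits the free slots.
    private List<String> startRunnable() {
        List<String> started = new ArrayList<>();
        Iterator<QueuedJob> it = waiting.iterator();
        while (it.hasNext()) {
            QueuedJob job = it.next();
            boolean dependencyMet =
                job.dependsOn == null || completed.contains(job.dependsOn);
            if (dependencyMet && job.requiredSlots <= freeSlots) {
                freeSlots -= job.requiredSlots;
                running.put(job.name, job.requiredSlots);
                started.add(job.name);
                it.remove();
            }
        }
        return started;
    }
}

final class JobSubmissionQueueDemo {
    public static void main(String[] args) {
        JobSubmissionQueue queue = new JobSubmissionQueue(50);
        System.out.println(queue.submit(new QueuedJob("X", 30, null)));
        System.out.println(queue.submit(new QueuedJob("Y", 30, "X")));
        System.out.println(queue.complete("X"));
    }
}
```

Note that this policy lets a small job overtake an earlier, larger one, and a job that needs more slots than the cluster has would wait forever; a real implementation would have to decide how to handle both cases.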

2015-03-06 11:26 GMT+01:00 Flavio Pompermaier notificati...@github.com:

 I know that in Stratosphere there was an effort to write a job scheduler.
 Do you think that such a thing could be valuable for the future, or are you
 going to rely only on Hadoop-ecosystem tools (like Oozie or Falcon on top of
 YARN)?





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-06 Thread fpompermaier
Github user fpompermaier commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77586066
  
That would be awesome :)
I think you could talk with Markus about the Dopa scheduler. It's probably 
a closed project, but it could be a source of input for creating a ticket for 
contributors who want to implement that!




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-06 Thread fpompermaier
Github user fpompermaier commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77533354
  
That's true, but what if there aren't enough resources? Is there any policy 
to retry the job submission automatically or to give priority to waiting/queued 
jobs?




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-06 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77530332
  
Hey,
Flink already supports running multiple jobs in parallel.
If you have 50 slots available, you can run two jobs that require 25 slots each.
The web frontend is not really able to report the status of concurrent jobs 
properly, but that's only a visualization issue.
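
The slot arithmetic above can be written as a small check. This is only an illustrative sketch; `SlotCheck` and `fitsConcurrently` are hypothetical names and not part of Flink.

```java
// Hypothetical helper: not a Flink class.
final class SlotCheck {
    /** Returns true if all jobs can hold their slots at the same time. */
    static boolean fitsConcurrently(int totalSlots, int... requiredSlotsPerJob) {
        long sum = 0; // long avoids overflow when adding many int demands
        for (int required : requiredSlotsPerJob) {
            if (required < 0) {
                throw new IllegalArgumentException("negative slot demand: " + required);
            }
            sum += required;
        }
        return sum <= totalSlots;
    }

    public static void main(String[] args) {
        // Two jobs requiring 25 slots each on a cluster with 50 slots.
        System.out.println(fitsConcurrently(50, 25, 25));
    }
}
```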




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-06 Thread fpompermaier
Github user fpompermaier commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77537609
  
I know that in Stratosphere there was an effort to write a job scheduler. 
Do you think that such a thing could be valuable for the future, or are you 
going to rely only on Hadoop-ecosystem tools (like Oozie or Falcon on top of YARN)?




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-06 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77536080
  
This is not supported at the moment. The easiest way to execute
multiple jobs concurrently is to start each job in a separate Flink cluster
running on YARN.

On Fri, Mar 6, 2015 at 10:52 AM, Flavio Pompermaier 
notificati...@github.com wrote:

 That's true, but what if there aren't enough resources? Is there any policy
 to retry the job submission automatically or to give priority to
 waiting/queued jobs?







[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-04 Thread StephanEwen
Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/410




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-04 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77205854
  
Manually merged into `release-0.8` in 
a6f9f9939ca03026baeefb3bd0876b90068b7682




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-04 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-77239086
  
Thank you for merging it!




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-03 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-76974155
  
Ping ...




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-03-03 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-76978687
  
@rmetzger I don't see a reason why this should not go to master as well. 
After all, it's optional and quite useful if you want to run a job on the full 
cluster with as many available slots as possible.




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-02-17 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-74638095
  
Cool.
Let's also merge this to master and document it there.




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-02-17 Thread mxm
Github user mxm commented on a diff in the pull request:

https://github.com/apache/flink/pull/410#discussion_r24804242
  
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/jobmanager/JobManager.java ---
@@ -374,6 +375,8 @@ public JobSubmissionResult submitJob(JobGraph job) throws IOException {
LOG.debug(String.format("Running master initialization of job %s (%s)", job.getJobID(), job.getName()));
}
 
+   final int numSlots = scheduler.getTotalNumberOfSlots();
--- End diff --

Shouldn't this be set to `getNumberOfAvailableSlots()` for the 
PARALLELISM_AUTO_MAX case?




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-02-17 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-74642763
  
Right now, the user has to set the parallelism to 
`ExecutionConfig.PARALLELISM_AUTO_MAX`. Why not use all available task slots by 
default? I understand that we shouldn't simply grab all resources, but 
auto-parallelism will only grab resources that were already granted to Flink.




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-02-17 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-74643785
  
I agree with Fabian that grabbing everything that is available is not a good 
default behavior.
It should be an explicit request by the user. For YARN single-job sessions, 
we can make this the default; otherwise it is not very friendly.

`getNumberOfAvailableSlots()` changes very quickly during multi-user 
operation. During single-user operation between jobs (where I see 
auto-parallelism being useful), it is the same as `getTotalNumberOfSlots()`.
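
To illustrate the difference between the two counters, here is a small stand-alone model. It is a sketch under assumptions: `SlotPool` and `AutoParallelism` are hypothetical classes that only mimic the two getter names discussed here, and the sentinel value used for `PARALLELISM_AUTO_MAX` is chosen for the example, since its actual value is not shown in this thread.

```java
// Hypothetical model of the scheduler's slot counters: not Flink code.
final class SlotPool {
    private final int totalSlots;
    private int slotsInUse;

    SlotPool(int totalSlots) {
        this.totalSlots = totalSlots;
    }

    int getTotalNumberOfSlots() {
        return totalSlots;
    }

    int getNumberOfAvailableSlots() {
        return totalSlots - slotsInUse;
    }

    /** Marks the given number of slots as occupied by some running job. */
    void allocate(int slots) {
        if (slots < 0 || slots > getNumberOfAvailableSlots()) {
            throw new IllegalArgumentException("cannot allocate " + slots + " slots");
        }
        slotsInUse += slots;
    }
}

final class AutoParallelism {
    // Assumed sentinel for this example only.
    static final int PARALLELISM_AUTO_MAX = Integer.MAX_VALUE;

    /** Replaces the sentinel by the given slot count; keeps explicit values. */
    static int resolve(int requestedParallelism, int slots) {
        return requestedParallelism == PARALLELISM_AUTO_MAX ? slots : requestedParallelism;
    }

    public static void main(String[] args) {
        SlotPool pool = new SlotPool(40);
        // Idle cluster between jobs: both counters agree.
        System.out.println(resolve(PARALLELISM_AUTO_MAX, pool.getTotalNumberOfSlots()));
        System.out.println(resolve(PARALLELISM_AUTO_MAX, pool.getNumberOfAvailableSlots()));
        // Another user's job now holds 10 slots: the counters diverge.
        pool.allocate(10);
        System.out.println(resolve(PARALLELISM_AUTO_MAX, pool.getTotalNumberOfSlots()));
        System.out.println(resolve(PARALLELISM_AUTO_MAX, pool.getNumberOfAvailableSlots()));
    }
}
```

In the multi-user case the available count is only a snapshot, so a parallelism resolved from it may already be stale by the time the tasks are scheduled.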




[GitHub] flink pull request: Add auto-parallelism to Jobs (0.8 branch)

2015-02-17 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/410#issuecomment-74661555
  
But currently the system does not support multi-user/multi-job scenarios
very well either. If I'm not mistaken, the scheduler schedules tasks eagerly,
which means that two jobs could take required slots away from each other.
As a consequence, both will fail if not properly configured.
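
A toy simulation of that failure mode, as a sketch only. The class `EagerSchedulingDemo` is hypothetical, and the strict round-robin interleaving of slot requests is an assumption made for the example; the real scheduler's ordering is not described in this thread.

```java
// Hypothetical simulation: not Flink code.
final class EagerSchedulingDemo {
    /**
     * Hands out slots one at a time, alternating between two jobs, until the
     * cluster is full or both jobs have what they need.
     * Returns the number of slots obtained by job A and job B.
     */
    static int[] scheduleEagerly(int totalSlots, int needA, int needB) {
        int free = totalSlots;
        int gotA = 0;
        int gotB = 0;
        while (free > 0 && (gotA < needA || gotB < needB)) {
            if (gotA < needA && free > 0) {
                gotA++;
                free--;
            }
            if (gotB < needB && free > 0) {
                gotB++;
                free--;
            }
        }
        return new int[] {gotA, gotB};
    }

    public static void main(String[] args) {
        // Two jobs that each need 30 slots compete for 50 slots.
        int[] got = scheduleEagerly(50, 30, 30);
        System.out.println("job A got " + got[0] + " of 30 slots");
        System.out.println("job B got " + got[1] + " of 30 slots");
    }
}
```

Under this interleaving neither job reaches its required 30 slots, although either one alone would have fit, which is the situation a submission queue or admission check would avoid.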

On Tue, Feb 17, 2015 at 11:01 AM, Fabian Hueske notificati...@github.com
wrote:

 Using max parallelism basically prohibits running more than one program at
 a time. I don't think that would be a good default mode.




