PS: I've read your last email as meaning 64 HT cores per machine. If that was the total across the 16 nodes, you'll have to adjust my response accordingly. ;)
On 19 Jun 2015, at 16:42, Fabian Hueske <fhue...@gmail.com> wrote:

> Hi Bill,
>
> no worries, questions are the purpose of this mailing list.
>
> The number of network buffers is a parameter that needs to be scaled with
> your setup. The reason is Flink's pipelined data transfer, which requires
> a certain number of network buffers to be available at the same time
> during processing.
>
> There is an FAQ entry that explains how to set this parameter according
> to your setup:
> --> http://flink.apache.org/faq.html#i-get-an-error-message-saying-that-not-enough-buffers-are-available-how-do-i-fix-this
>
> The documentation for parallel execution can be found here:
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
>
> If you are working on the latest snapshot, you can also configure Flink
> to use batched data transfer instead of pipelined transfer. This is done
> via ExecutionConfig.setExecutionMode(); you obtain the config by calling
> getConfig() on your ExecutionEnvironment.
>
> Best, Fabian
>
>
> 2015-06-19 16:31 GMT+02:00 Maximilian Michels <m...@apache.org>:
> Hi Bill,
>
> You're right. Simply increasing the task manager slots doesn't do
> anything on its own. It is correct to set the parallelism to
> taskManagers * slots. Simply increase the number of network buffers in
> flink-conf.yaml, e.g. to 4096. In the future, we will configure this
> setting dynamically.
>
> Let us know if your runtime decreases :)
>
> Cheers,
> Max
>
> On Fri, Jun 19, 2015 at 4:24 PM, Bill Sparks <jspa...@cray.com> wrote:
> > Sorry for the post again. I guess I'm not understanding this…
> >
> > The question is how to scale up/increase the execution of a problem.
> > What I'm trying to do is get the best out of the available processors
> > for a given node count and compare this against Spark, using KMeans.
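[Editor's note: the FAQ entry Fabian links gives a rule of thumb for sizing the buffer pool, roughly slots-per-TaskManager squared, times the number of TaskManagers, times 4. A quick sketch of that formula (the function name is ours, not part of Flink):]

```python
def recommended_network_buffers(slots_per_tm, num_task_managers):
    """Rough lower bound for taskmanager.network.numberOfBuffers,
    per the rule of thumb in the Flink FAQ of this era:
    slots_per_tm^2 * num_task_managers * 4."""
    return slots_per_tm ** 2 * num_task_managers * 4

# Bill's cluster has 16 nodes. With 4 slots per TaskManager:
print(recommended_network_buffers(4, 16))   # 1024
# With 8 slots per TaskManager the requirement quadruples:
print(recommended_network_buffers(8, 16))   # 4096
```

Note the quadratic growth in slots per TaskManager: each slot may need to exchange data with every slot on every other machine, so doubling the slots quadruples the buffer requirement.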
> > For Spark, one method is to increase the number of executors and RDD
> > partitions; for Flink I can increase the number of task slots
> > (taskmanager.numberOfTaskSlots). My empirical evidence suggests that
> > just increasing the slots does not increase processing of the data. Is
> > there something I'm missing? Much like re-partitioning your datasets in
> > Spark, is there an equivalent option for Flink? What about the
> > parallelism argument? The referring document seems to be broken…
> >
> > This seems to be a dead link:
> > https://github.com/apache/flink/blob/master/docs/setup/%7B%7Bsite.baseurl%7D%7D/apis/programming_guide.html#parallel-execution
> >
> > If I do increase the parallelism to (taskManagers * slots), I hit the
> > "Insufficient number of network buffers…" error.
> >
> > I have 16 nodes (64 HT cores), and have run with 1, 4, 8, and 16 task
> > slots, and still the execution time is always around 5-6 minutes, using
> > the default parallelism.
> >
> > Regards,
> > Bill
> > --
> > Jonathan (Bill) Sparks
> > Software Architecture
> > Cray Inc.
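[Editor's note: pulling the thread's suggestions together, the flink-conf.yaml settings discussed above might look like the following. The slot count is illustrative; 4096 is Max's suggested buffer value, which also matches the FAQ rule of thumb for 8 slots on 16 TaskManagers (8^2 * 16 * 4 = 4096).]

```yaml
# One TaskManager per node, several slots each (value illustrative)
taskmanager.numberOfTaskSlots: 8

# Enlarged network buffer pool, as suggested in this thread
taskmanager.network.numberOfBuffers: 4096
```

Remember that changing these values alone is not enough: the job must also request the higher parallelism, e.g. via env.setParallelism(taskManagers * slots) or the -p flag when submitting.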