Hi Mingliang,

You can increase the parallelism of the Python SDK Harness via the pipeline 
option

  --experiments=worker_threads=&lt;num_threads&gt;

Note that these workers are Python threads, which are subject to the Global
Interpreter Lock (GIL). We currently do not use real processes, e.g. via the
multiprocessing module.
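
To illustrate why more threads alone may not help, here is a minimal,
stdlib-only sketch (not Beam code) of the GIL effect described above: a
CPU-bound pure-Python function gains essentially nothing from a thread pool,
because only one thread can execute Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count(n):
    # Pure-Python CPU-bound loop; it holds the GIL while running.
    total = 0
    for i in range(n):
        total += i
    return total

N = 2_000_000

start = time.perf_counter()
serial = [count(N) for _ in range(4)]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count, [N] * 4))
threaded_time = time.perf_counter() - start

# The results are identical, but the threaded run is not ~4x faster:
# the GIL serializes the bytecode execution across the four threads.
assert serial == threaded
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor is what lets such work
scale across cores, which is why real processes would matter here.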

There is also sdk_worker_parallelism, which controls the number of SDK
Harnesses per partition, e.g. per Flink TaskManager:

  --sdk_worker_parallelism=<num_sdk_harnesses>

You will probably see a more significant improvement by tuning this parameter.
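
For reference, a sketch of how the two knobs combine in a single submission.
The script name, runner choice, and values are placeholders, and the flag
spellings follow the Beam pipeline-option convention
(--experiments=&lt;key&gt;=&lt;value&gt;):

```shell
# Hypothetical invocation: 2 SDK Harnesses per TaskManager,
# 16 worker threads inside each harness.
python my_pipeline.py \
  --runner=PortableRunner \
  --sdk_worker_parallelism=2 \
  --experiments=worker_threads=16
```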

Cheers,
Max

On 09.06.19 05:36, 青雉(祁明良) wrote:
> Hi all,
>
> I’m currently tuning the performance of the Python SDK with the Flink 
> runner. I found that the multithreading in the Python SDK worker limits 
> CPU usage to around 1 core at most. To my understanding, all the task 
> slots on one TaskManager share one SDK process, which means the low CPU 
> usage of the Python SDK may become the bottleneck. Is it possible to use 
> multiprocessing to bump up CPU usage?
>
> Best,
> Mingliang
>
> This communication may contain privileged or other confidential information 
> of Red. If you have received it in error, please advise the sender by reply 
> e-mail and immediately delete the message and any attachments without copying 
> or disclosing the contents. Thank you.
>
