There is a large overhead to distributing this type of workload, and for a
small problem the overhead dominates. A problem of this size does not need to
be distributed at all, so adding workers is probably just making it slower.
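To illustrate the point, here is a toy cost model (the overhead constant is hypothetical, not measured): compute time shrinks as it is split across slots, while coordination overhead grows with the number of slots. For a small job, the overhead term dominates and more slots means longer wall-clock time, which matches the timings reported below.

```python
def estimated_time(compute_seconds, num_slots, overhead_per_slot=10.0):
    """Rough wall-clock estimate under an assumed linear overhead model:
    compute is divided across slots, coordination cost grows per slot."""
    return compute_seconds / num_slots + overhead_per_slot * num_slots

# Small job (~30 s of single-worker compute): more slots only adds overhead.
for n in (1, 5, 20):
    print(f"small job, num_slots={n}: {estimated_time(30.0, n):.1f} s")

# Large job (~10,000 s of compute): the same overhead is worth paying.
for n in (1, 5, 20):
    print(f"large job, num_slots={n}: {estimated_time(10000.0, n):.1f} s")
```

Under this (assumed) model the small job is fastest at num_slots=1 and the large job is fastest at num_slots=20, which is the crossover behaviour the reply describes.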
On Sun, Apr 30, 2023 at 1:46 AM second_co...@yahoo.com <
I re-tested with the cifar10 example and the results are below. Can you
advise why fewer slots is faster than more slots?
num_slots=20 231 seconds
num_slots=5 52 seconds
num_slots=1 34 seconds
The code is below:
https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32
You don't want to use CPUs with TensorFlow.
If it's not scaling, you may have a problem that is far too small to
distribute.
On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID
wrote:
> Anyone successfully run native tensorflow on Spark ? i tested example at
>
Has anyone successfully run native TensorFlow on Spark? I tested the example at
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
on Kubernetes, running it on multiple workers' CPUs. I do not see any
speedup in training time when increasing the number of slots from 1.