Re: Tensorflow on Spark CPU

2023-04-30 Thread Sean Owen
There is a large overhead to distributing this type of workload. I imagine
that for a small problem, the overhead dominates. A problem of this size is
nowhere near large enough to need distributing, so more workers are probably
just worse.
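
As a rough illustration of that point, here is a toy cost model in Python. It
is only a sketch: the formula and every constant in it (the per-worker
synchronization cost, the startup cost, the compute totals) are hypothetical
values chosen for illustration, not measurements from this thread or from
TensorFlow. It assumes compute divides evenly across workers while
coordination cost grows with the number of workers.

```python
def estimated_time(workers, compute_seconds, sync_seconds_per_worker, startup_seconds):
    """Toy model: compute splits evenly, coordination cost grows with workers."""
    parallel_part = compute_seconds / workers
    # Hypothetical assumption: each extra worker adds a fixed synchronization cost.
    coordination = sync_seconds_per_worker * (workers - 1)
    # Hypothetical assumption: a one-off cluster startup cost when distributing.
    startup = startup_seconds if workers > 1 else 0.0
    return parallel_part + coordination + startup


def best_worker_count(compute_seconds, candidates=(1, 5, 20), sync=10.0, startup=15.0):
    """Pick the candidate worker count with the lowest estimated time."""
    return min(candidates, key=lambda n: estimated_time(n, compute_seconds, sync, startup))


# A small job (30 s of total compute) versus a large one (30,000 s).
small_job = best_worker_count(compute_seconds=30.0)
large_job = best_worker_count(compute_seconds=30_000.0)
print(f"best worker count for the small job: {small_job}")
print(f"best worker count for the large job: {large_job}")
```

Under these made-up constants the small job is fastest on a single worker and
only the large job benefits from more workers, which is the shape of the
argument above.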

On Sun, Apr 30, 2023 at 1:46 AM second_co...@yahoo.com <
second_co...@yahoo.com> wrote:

> I re-tested with the cifar10 example; the results are below. Can you advise
> why fewer slots are faster than more slots?
>
> num_slots=20
>
> 231 seconds
>
>
> num_slots=5
>
> 52 seconds
>
>
> num_slots=1
>
> 34 seconds
>
> The code is at:
> https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32
>
> Do you have an example of tensorflow+big dataset that I can test?
>
> On Saturday, April 29, 2023 at 08:44:04 PM GMT+8, Sean Owen <
> sro...@gmail.com> wrote:
>
>
> You don't want to use CPUs with Tensorflow.
> If it's not scaling, you may have a problem that is far too small to
> distribute.
>
> On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID
>  wrote:
>
> Has anyone successfully run native TensorFlow on Spark? I tested the example at
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
> on Kubernetes with CPUs, running it on multiple CPU workers. I do not see
> any speedup in training time when increasing the number of slots from 1 to
> 10. The training time stays the same. Has anyone tested TensorFlow training
> on Spark distributed workers with CPUs? Can you share a working example?
>


Re: Tensorflow on Spark CPU

2023-04-30 Thread second_co...@yahoo.com.INVALID
I re-tested with the cifar10 example; the results are below. Can you advise
why fewer slots are faster than more slots?

num_slots=20: 231 seconds
num_slots=5: 52 seconds
num_slots=1: 34 seconds

The code is at:
https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32

Do you have an example of TensorFlow with a big dataset that I can test?
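
To put the timings above in perspective, here is a small Python sketch that
computes the speedup and parallel efficiency of each run relative to the
single-slot run. The only inputs are the three timings reported in this
message; the function and variable names are my own, chosen for illustration.

```python
# Wall-clock training times reported in this thread, keyed by num_slots.
timings_seconds = {1: 34, 5: 52, 20: 231}


def speedup_and_efficiency(timings):
    """Return {slots: (speedup, efficiency)} relative to the 1-slot run."""
    baseline = timings[1]
    results = {}
    for slots, seconds in sorted(timings.items()):
        speedup = baseline / seconds  # values below 1.0 mean a slowdown
        results[slots] = (speedup, speedup / slots)
    return results


report = speedup_and_efficiency(timings_seconds)
for slots, (speedup, efficiency) in report.items():
    print(f"num_slots={slots:>2}: speedup={speedup:.2f}x, efficiency={efficiency:.1%}")
```

A speedup below 1.0 for both the 5-slot and 20-slot runs means that adding
slots made this particular job slower, not faster.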

On Saturday, April 29, 2023 at 08:44:04 PM GMT+8, Sean Owen 
 wrote:  
 
 You don't want to use CPUs with Tensorflow. If it's not scaling, you may have a
problem that is far too small to distribute.
On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID 
 wrote:

Has anyone successfully run native TensorFlow on Spark? I tested the example at
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
on Kubernetes with CPUs, running it on multiple CPU workers. I do not see any
speedup in training time when increasing the number of slots from 1 to 10. The
training time stays the same. Has anyone tested TensorFlow training on Spark
distributed workers with CPUs? Can you share a working example?


Re: Tensorflow on Spark CPU

2023-04-29 Thread Sean Owen
You don't want to use CPUs with Tensorflow.
If it's not scaling, you may have a problem that is far too small to
distribute.

On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID
 wrote:

> Has anyone successfully run native TensorFlow on Spark? I tested the example at
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
> on Kubernetes with CPUs, running it on multiple CPU workers. I do not see
> any speedup in training time when increasing the number of slots from 1 to
> 10. The training time stays the same. Has anyone tested TensorFlow training
> on Spark distributed workers with CPUs? Can you share a working example?
>


Tensorflow on Spark CPU

2023-04-29 Thread second_co...@yahoo.com.INVALID
Has anyone successfully run native TensorFlow on Spark? I tested the example at
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
on Kubernetes with CPUs, running it on multiple CPU workers. I do not see any
speedup in training time when increasing the number of slots from 1 to 10. The
training time stays the same. Has anyone tested TensorFlow training on Spark
distributed workers with CPUs? Can you share a working example?