I re-tested with the cifar10 example and below are the results. Can you
advise why a smaller num_slots is faster than more slots?
num_slots=20: 231 seconds
num_slots=5:   52 seconds
num_slots=1:   34 seconds

The code is here:
https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32
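For context, the gist roughly follows the pattern below. This is only a minimal sketch of how num_slots is passed to spark-tensorflow-distributor's MirroredStrategyRunner; the model, batch size, and epoch count are placeholders and not the exact gist code.

from spark_tensorflow_distributor import MirroredStrategyRunner

def train():
    import tensorflow as tf

    # Each participating Spark task slot runs this function; with the default
    # settings the runner wraps it in a MultiWorkerMirroredStrategy, so the
    # model is built and fit directly here.
    (x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
    x_train = x_train / 255.0

    # Placeholder model, just to exercise distributed training on CIFAR-10.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, batch_size=256, epochs=3)

# num_slots is the number of Spark task slots that join the training job;
# use_gpu=False keeps training on CPUs.
MirroredStrategyRunner(num_slots=5, use_gpu=False).run(train)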
Do you have an example with TensorFlow and a big dataset that I can test?

On Saturday, April 29, 2023 at 08:44:04 PM GMT+8, Sean Owen <sro...@gmail.com> wrote:
You don't want to use CPUs with TensorFlow. If it's not scaling, you may
have a problem that is far too small to distribute.
On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID 
<second_co...@yahoo.com.invalid> wrote:

Has anyone successfully run native TensorFlow on Spark? I tested the example at
https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
on Kubernetes, on multiple workers' CPUs. I do not see any speed-up in training
time when increasing the number of slots from 1 to 10; the time taken to train
stays the same. Has anyone tested TensorFlow training on Spark distributed
workers with CPUs? Can you share a working example?
