Re: Tensorflow on Spark CPU
I re-tested with the CIFAR-10 example and the results are below. Can anyone advise why fewer slots run faster than more slots?

num_slots=20: 231 seconds
num_slots=5: 52 seconds
num_slots=1: 34 seconds

The code is at https://gist.github.com/cometta/240bbc549155e22f80f6ba670c9a2e32

Do you have an example of TensorFlow with a big dataset that I can test?

On Saturday, April 29, 2023 at 08:44:04 PM GMT+8, Sean Owen wrote:

> You don't want to use CPUs with TensorFlow. If it's not scaling, you may have a problem that is far too small to distribute.
>
> On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID wrote:
>
>> Has anyone successfully run native TensorFlow on Spark? I tested the example at
>> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
>> on Kubernetes with CPUs, running it across multiple workers' CPUs. I do not see
>> any speedup in training time when changing the number of slots from 1 to 10. The
>> training time stays the same. Has anyone tested TensorFlow training on
>> Spark distributed workers with CPUs? Can you share a working example?
Re: Tensorflow on Spark CPU
You don't want to use CPUs with TensorFlow. If it's not scaling, you may have a problem that is far too small to distribute.

On Sat, Apr 29, 2023 at 7:30 AM second_co...@yahoo.com.INVALID wrote:

> Has anyone successfully run native TensorFlow on Spark? I tested the example at
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
> on Kubernetes with CPUs, running it across multiple workers' CPUs. I do not see
> any speedup in training time when changing the number of slots from 1 to 10. The
> training time stays the same. Has anyone tested TensorFlow training on
> Spark distributed workers with CPUs? Can you share a working example?
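
A rough way to picture the "too small to distribute" point is a toy cost model: per-step compute divides across slots, while synchronization cost grows with the number of slots. The sketch below is only an illustration. The function name `estimated_step_time`, the linear sync term, and every number in it are assumptions made up for the example; none of them are measurements from this thread or anything taken from spark-tensorflow-distributor.

```python
def estimated_step_time(compute_s, sync_s_per_worker, num_slots):
    """Toy estimate of one training step's wall time, in seconds.

    Assumption: compute splits evenly across slots, and each extra slot
    adds a fixed synchronization cost. Real all-reduce costs are more
    complicated; this only shows the general shape of the trade-off.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    return compute_s / num_slots + sync_s_per_worker * (num_slots - 1)


# Hypothetical workloads: a small one where sync dominates, and a large
# one where compute dominates. The values are illustrative only.
small_problem = {n: estimated_step_time(1.0, 0.5, n) for n in (1, 5, 20)}
large_problem = {n: estimated_step_time(1000.0, 0.5, n) for n in (1, 5, 20)}

for name, result in (("small", small_problem), ("large", large_problem)):
    for slots, seconds in result.items():
        print(f"{name} problem, num_slots={slots}: {seconds:.2f} s per step")
```

Under these made-up numbers the small workload gets slower as slots are added, while the large one gets faster, which is the pattern to check for when deciding whether a job is big enough to be worth distributing.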
Tensorflow on Spark CPU
Has anyone successfully run native TensorFlow on Spark? I tested the example at https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor on Kubernetes with CPUs, running it across multiple workers' CPUs. I do not see any speedup in training time when changing the number of slots from 1 to 10. The training time stays the same. Has anyone tested TensorFlow training on Spark distributed workers with CPUs? Can you share a working example?