Hi Riccardo,

Yes, you can run TensorFlow distributed training (and inference) inline with PySpark; see some examples at https://github.com/intel-analytics/analytics-zoo/blob/master/pyzoo/zoo/examples/tensorflow/tfpark/estimator_dataset.py (using the TF Keras API), https://github.com/intel-analytics/analytics-zoo/blob/master/pyzoo/zoo/examples/tensorflow/tfpark/estimator_dataset.py (using the TF Estimator API) and https://github.com/intel-analytics/analytics-zoo/tree/master/pyzoo/zoo/examples/tensorflow/distributed_training .

For Keras API support in Analytics Zoo, it's a new implementation of Keras 1.2.2 on Spark (using BigDL).
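To make the two paths concrete, here is a minimal sketch of the first one: training a tf.keras model on an RDD coming out of a Spark pipeline. I am writing the TFPark calls (init_nncontext, TFDataset.from_rdd, KerasModel) from memory of the current release, so treat the exact signatures as assumptions rather than a definitive example:

import numpy as np
import tensorflow as tf
from zoo import init_nncontext
from zoo.tfpark import TFDataset, KerasModel

sc = init_nncontext()

# dummy (feature, label) records standing in for your real Spark ETL output
train_rdd = sc.parallelize(range(1000)) \
    .map(lambda i: (np.random.rand(28, 28, 1), np.random.randint(10)))

# the model is defined with the standard tf.keras API
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')])
model.compile(optimizer='rmsprop',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# wrap the RDD as a TFDataset and run data-parallel training on the
# Spark executors; the global batch size should be divisible by the
# total number of cores in the cluster
dataset = TFDataset.from_rdd(train_rdd,
                             features=(tf.float32, [28, 28, 1]),
                             labels=(tf.int32, []),
                             batch_size=320)
KerasModel(model).fit(dataset, epochs=5)

The second path, the Keras 1.2.2 reimplementation, keeps the same model-definition style but executes on BigDL; if I remember the namespace correctly, it looks like:

from zoo.pipeline.api.keras.models import Sequential
from zoo.pipeline.api.keras.layers import Dense

# same Sequential/Dense style as (old) Keras, running on BigDL
model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(784,)))
model.add(Dense(10, activation="softmax"))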
Thanks,
-Jason

On Mon, May 6, 2019 at 5:37 AM Riccardo Ferrari <ferra...@gmail.com> wrote:

> Thanks everyone, I really appreciate your contributions here.
>
> @Jason, thanks for the references, I'll take a look. Quickly checking github: https://github.com/intel-analytics/analytics-zoo#distributed-tensorflow-and-keras-on-sparkbigdl
> Do I understand correctly that I can:
>
> - Prepare my data with Spark
> - Define a TensorFlow model
> - Train it in a distributed fashion
>
> When using the Keras API, is it the real Keras with just an adapter layer, or is it a completely different API that mimics Keras?
>
> @Gourav, I agree that "you should pick the right tool for the job".
>
> The purpose of this discussion is to understand/explore whether we really need another stack, or whether we can leverage the existing infrastructure and expertise to accomplish the task.
> We currently have some ML jobs, and Spark proved to be the perfect fit for us. We know it well enough to be confident we can deliver what is asked: it scales, it is resilient, it works.
>
> We are starting to evaluate/introduce some DL models, and being able to leverage the existing infra would be a big plus. It is not only having to deal with a new set of machines running a different stack (i.e. TensorFlow, MXNet, ...), it is everything around it: tuning, managing, packaging applications, testing and so on. Are these reasonable concerns?
>
> Best,
>
> On Sun, May 5, 2019 at 8:06 PM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
>> If someone is trying to actually use deep learning algorithms, their focus should be on choosing the technology stack which gives them maximum flexibility to try the nuances of their algorithms.
>>
>> From a personal perspective, I always prefer to use libraries which provide the best flexibility and extensibility in terms of the science/mathematics of the subject. For example, try to open a book on linear regression and then check whether all the mathematical formulations are available in the Spark module for regression or not.
>>
>> It is always better to choose a technology that fits the nuances and perfection of the science, rather than to choose a technology and then try to fit the science into it.
>>
>> Regards,
>> Gourav
>>
>> On Sun, May 5, 2019 at 2:23 PM Jason Dai <jason....@gmail.com> wrote:
>>
>>> You may find talks from Analytics Zoo users at https://analytics-zoo.github.io/master/#presentations/; in particular, some recent user examples on Analytics Zoo:
>>>
>>> - Mastercard: https://software.intel.com/en-us/articles/deep-learning-with-analytic-zoo-optimizes-mastercard-recommender-ai-service
>>> - Azure: https://software.intel.com/en-us/articles/use-analytics-zoo-to-inject-ai-into-customer-service-platforms-on-microsoft-azure-part-1
>>> - CERN: https://db-blog.web.cern.ch/blog/luca-canali/machine-learning-pipelines-high-energy-physics-using-apache-spark-bigdl
>>> - Midea/KUKA: https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
>>> - Talroo: https://software.intel.com/en-us/articles/talroo-uses-analytics-zoo-and-aws-to-leverage-deep-learning-for-job-recommendations
>>>
>>> Thanks,
>>> -Jason
>>>
>>> On Sun, May 5, 2019 at 6:29 AM Riccardo Ferrari <ferra...@gmail.com> wrote:
>>>
>>>> Thank you for your answers!
>>>>
>>>> While it is clear that each DL framework can solve distributed model training on its own (some better than others), I still see a lot of value in having Spark on the ETL/pre-processing side, hence the origin of my question.
>>>> I am trying to avoid managing multiple stacks/workflows and am hoping to unify my system. Projects like TensorFlowOnSpark or Analytics Zoo (to name a couple) feel like they can help. Still, I really appreciate your comments, and anyone that could add some value to this discussion. Does anyone have experience with them?
>>>>
>>>> Thanks
>>>>
>>>> On Sat, May 4, 2019 at 8:01 PM Pat Ferrel <p...@occamsmachete.com> wrote:
>>>>
>>>>> @Riccardo
>>>>>
>>>>> Spark does not do the DL learning part of the pipeline (afaik), so it is limited to data ingestion and transforms (ETL). It is therefore optional, and other ETL options might be better for you.
>>>>>
>>>>> Most of the technologies @Gourav mentions have their own scaling based on their own compute engines specialized for their DL implementations, so be aware that Spark scaling has nothing to do with scaling most of the DL engines; they have their own solutions.
>>>>>
>>>>> From: Gourav Sengupta <gourav.sengu...@gmail.com>
>>>>> Reply: Gourav Sengupta <gourav.sengu...@gmail.com>
>>>>> Date: May 4, 2019 at 10:24:29 AM
>>>>> To: Riccardo Ferrari <ferra...@gmail.com>
>>>>> Cc: User <user@spark.apache.org>
>>>>> Subject: Re: Deep Learning with Spark, what is your experience?
>>>>>
>>>>> Try using MXNet and Horovod directly (I think MXNet is worth a try as well):
>>>>> 1. https://medium.com/apache-mxnet/distributed-training-using-apache-mxnet-with-horovod-44f98bf0e7b7
>>>>> 2. https://docs.nvidia.com/deeplearning/dgx/mxnet-release-notes/rel_19-01.html
>>>>> 3. https://aws.amazon.com/mxnet/
>>>>> 4. https://aws.amazon.com/blogs/machine-learning/aws-deep-learning-amis-now-include-horovod-for-faster-multi-gpu-tensorflow-training-on-amazon-ec2-p3-instances/
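>>>>>
>>>>> To give a flavour of what "directly" means here, below is a minimal sketch of data-parallel Gluon training with Horovod's MXNet bindings. It is untested and written from memory of the Horovod docs (hvd.init, hvd.broadcast_parameters, hvd.DistributedTrainer), so treat the exact calls as assumptions:
>>>>>
>>>>> import mxnet as mx
>>>>> from mxnet import gluon
>>>>> import horovod.mxnet as hvd
>>>>>
>>>>> hvd.init()  # one process per GPU/CPU slot, started via horovodrun
>>>>> ctx = mx.gpu(hvd.local_rank()) if mx.context.num_gpus() > 0 else mx.cpu()
>>>>>
>>>>> # a toy network; any Gluon model works the same way
>>>>> net = gluon.nn.Sequential()
>>>>> net.add(gluon.nn.Dense(128, activation='relu'), gluon.nn.Dense(10))
>>>>> net.initialize(ctx=ctx)
>>>>>
>>>>> # start all workers from identical weights, then let Horovod
>>>>> # all-reduce the gradients on every optimization step
>>>>> params = net.collect_params()
>>>>> hvd.broadcast_parameters(params, root_rank=0)
>>>>> opt = mx.optimizer.create('sgd', learning_rate=0.01 * hvd.size())
>>>>> trainer = hvd.DistributedTrainer(params, opt)
>>>>> # ...standard Gluon training loop using trainer.step(batch_size)...
>>>>>
>>>>> Launched with something like "horovodrun -np 4 python train.py"; note that the scaling comes from Horovod/MPI here, not from Spark.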
>>>>>
>>>>> Of course, TensorFlow is backed by Google's advertisement team as well: https://aws.amazon.com/blogs/machine-learning/scalable-multi-node-training-with-tensorflow/
>>>>>
>>>>> Regards,
>>>>>
>>>>> On Sat, May 4, 2019 at 10:59 AM Riccardo Ferrari <ferra...@gmail.com> wrote:
>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> I am trying to understand whether it makes sense to leverage Spark as an enabling platform for Deep Learning.
>>>>>>
>>>>>> My open questions to you are:
>>>>>>
>>>>>> - Do you use Apache Spark in your DL pipelines?
>>>>>> - How do you use Spark for DL? Is it just a stand-alone stage in the workflow (i.e. a data preparation script) or is it more integrated?
>>>>>>
>>>>>> I see a major advantage in leveraging Spark as a unified entry point: for example, you can easily abstract data sources and draw on existing team skills for data pre-processing and training. On the flip side, you may hit some limitations, including supported versions and so on.
>>>>>> What is your experience?
>>>>>>
>>>>>> Thanks!