kevin su created SUBMARINE-270:
----------------------------------

             Summary: [Umbrella] Submarine-sdk pipeline
                 Key: SUBMARINE-270
                 URL: https://issues.apache.org/jira/browse/SUBMARINE-270
             Project: Apache Submarine
          Issue Type: New Feature
          Components: Submarine SDK
            Reporter: kevin su
             Fix For: 0.4.0


Going from raw data ingestion to pushing a model into production is very 
complex. The submarine pipeline is being built to deploy portable, scalable 
machine learning workflows.

I created this JIRA ticket to discuss the details and the plan for the 
submarine pipeline. The pipeline would have two main components:

1. *workflow orchestrator* - helps us manage dependencies between tasks, 
schedule workflows, and retry when a failure happens. There are 3 ways to 
build our orchestrator.
 * airflow - use airflow API to build our pipeline
 * submarine workflow - [~10110346] suggests a built-in [submarine 
workflow|https://docs.google.com/document/d/1LiRozgumsYadmESQAXJk5gM5GOvB5bXpOiuFew6i9Os/edit#]
 * abstract orchestrator - support an abstraction layer like 
[TFX|https://github.com/tensorflow/tfx/blob/master/docs/guide/index.md#portability-and-interoperability],
 so that we can support different orchestration frameworks
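
To make the "abstract orchestrator" option more concrete, here is a minimal sketch in Python (3.9+, for the standard-library graphlib module). All names in it (Task, Orchestrator, LocalOrchestrator) are hypothetical and are not part of any existing Submarine, Airflow, or TFX API. It only illustrates the three responsibilities listed above: dependency ordering, scheduling, and retry on failure.

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Task:
    """A unit of work plus the names of the tasks it depends on."""

    def __init__(self, name, fn, depends_on=(), max_retries=0):
        self.name = name
        self.fn = fn
        self.depends_on = tuple(depends_on)
        self.max_retries = max_retries


class Orchestrator(ABC):
    """Hypothetical abstraction layer; each backend would subclass it."""

    @abstractmethod
    def run(self, tasks):
        """Execute the tasks and return their names in execution order."""


class LocalOrchestrator(Orchestrator):
    """Runs tasks in-process, in dependency order, retrying failures."""

    def run(self, tasks):
        by_name = {t.name: t for t in tasks}
        graph = {t.name: set(t.depends_on) for t in tasks}
        executed = []
        # static_order() yields every task after all of its dependencies.
        for name in TopologicalSorter(graph).static_order():
            task = by_name[name]
            failures = 0
            while True:
                try:
                    task.fn()
                    break
                except Exception:
                    failures += 1
                    if failures > task.max_retries:
                        raise  # retries exhausted, fail the workflow
            executed.append(name)
        return executed
```

With this shape, an Airflow-backed or submarine-workflow-backed orchestrator would be another subclass of Orchestrator that translates the same Task list into its own job definitions, so user code would not change when the backend does.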

2. *sdk ML library* - reduces routine ML code development. Building an ML 
pipeline involves several routine tasks, so the SDK would provide callback 
functions that let users easily plug in preprocessing, model training, and 
other steps. We may include different frameworks to handle both small and 
large datasets.
 * Preprocessing (Hive, Spark, Pandas)
 * Training (TF, PyTorch)
 * Evaluation
 * Model Validator
 * Pusher
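
As a rough illustration of the callback idea, the stages listed above could be wired together as in the Python sketch below. The Pipeline class and its field names are assumptions made for this example only, not an existing submarine-sdk API, and the toy callbacks use plain Python in place of Hive, Spark, Pandas, TF, or PyTorch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Pipeline:
    """Hypothetical SDK pipeline: the user supplies one callback per stage."""

    preprocess: Callable[[Any], Any]       # raw data -> training data
    train: Callable[[Any], Any]            # training data -> model
    evaluate: Callable[[Any, Any], float]  # (model, data) -> score
    validate: Callable[[float], bool]      # score -> good enough to push?
    push: Callable[[Any], None]            # publish the model

    def run(self, raw_data: Any) -> Optional[Any]:
        data = self.preprocess(raw_data)
        model = self.train(data)
        score = self.evaluate(model, data)
        if not self.validate(score):
            return None  # model rejected by the validator, nothing pushed
        self.push(model)
        return model


# Toy usage: the "model" is just the mean of the cleaned values.
pushed = []
pipeline = Pipeline(
    preprocess=lambda rows: [r for r in rows if r is not None],
    train=lambda data: sum(data) / len(data),
    evaluate=lambda model, data: max(abs(x - model) for x in data),
    validate=lambda score: score <= 2.0,
    push=pushed.append,
)
result = pipeline.run([1.0, None, 2.0, 3.0])
```

The point of the design is that the SDK owns the routine control flow (stage order, the validation gate before pushing) while users only write the framework-specific callbacks.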

To find out more, check the link below. Feel free to edit or comment on the 
documents.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
