Model metadata (mostly parameter values) is usually tiny. The Parquet data
is most often the model coefficients, so its size depends on the size of
your model, i.e. your feature dimension.

A high-dimensional linear model can be quite large, but it will still
typically fit easily into main memory on a single node. A high-dimensional
multi-layer perceptron with many layers could be quite a lot larger. An ALS
model with millions of users and items could be huge.
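For a concrete picture, here's a rough sketch (Scala; the /tmp path and the
"training" DataFrame are just placeholders) of what ends up on disk when you
save a fitted model:

  import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}

  // Fit a simple model ("training" is assumed to have label/features columns)
  val lr = new LogisticRegression().setMaxIter(10)
  val model = lr.fit(training)

  // Persist it; overwrite() just avoids failing if the path already exists
  model.write.overwrite().save("/tmp/lr-model")

  // The saved directory looks roughly like:
  //   /tmp/lr-model/metadata/  <- small JSON: class, uid, params, Spark version
  //   /tmp/lr-model/data/      <- Parquet: the fitted coefficients, whose size
  //                               scales with the feature dimension

  // Load it back later via the companion object
  val reloaded = LogisticRegressionModel.load("/tmp/lr-model")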

On Thu, 18 Aug 2016 at 18:00, Rich Tarro <richta...@gmail.com> wrote:

> The following Databricks blog on Model Persistence states "Internally, we
> save the model metadata and parameters as JSON and the data as Parquet."
>
>
> https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html
>
>
> What data associated with a model or Pipeline is actually saved (in
> Parquet format)?
>
> What factors determine how large the saved model or pipeline will be?
>
> Thanks.
> Rich
>
