Yes, that would be a suitable option. We could just extend the standard
Spark MLlib Transformer and add the required meta-data.
Just out of curiosity: is there a specific reason why the user of a
standard Transformer cannot add arbitrary key-value pairs as additional
meta-data? This could be handy not just for versioning, but also for
storing evaluation metrics together with a trained pipeline (for people
who aren't using something like MLflow yet).
Cheers,
Martin
Am 2021-10-25 14:38, schrieb Sean Owen:
You can write a custom Transformer or Estimator?
On Mon, Oct 25, 2021 at 7:37 AM Sonal Goyal <sonalgoy...@gmail.com>
wrote:
Hi Martin,
Agreed, if you don't need the other features of MLflow, then it is likely
overkill.
Cheers,
Sonal
https://github.com/zinggAI/zingg
On Mon, Oct 25, 2021 at 4:06 PM <mar...@wunderlich.com> wrote:
Hi Sonal,
Thanks a lot for this suggestion. I presume it might indeed be possible
to use MLflow for this purpose, but at present it seems a bit much to
introduce another framework only for storing arbitrary meta-data with
trained ML pipelines. I was hoping there might be a way to do this
natively in Spark ML. Otherwise, I'll just create a wrapper class for
the trained models.
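Such a wrapper could be as simple as the following sketch (names are hypothetical; the model field would hold a fitted pyspark.ml.PipelineModel in practice):

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AnnotatedModel:
    """Bundles a trained pipeline model with arbitrary meta-data."""
    model: Any  # e.g. a fitted pyspark.ml.PipelineModel
    metadata: Dict[str, Any] = field(default_factory=dict)

    def tag(self, key: str, value: Any) -> "AnnotatedModel":
        """Attach one key-value pair and return self for chaining."""
        self.metadata[key] = value
        return self
```

The obvious drawback is that Spark's own persistence knows nothing about the wrapper, so the meta-data would have to be saved and loaded separately from the model.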
Cheers,
Martin
Am 2021-10-24 21:16, schrieb Sonal Goyal:
Does MLflow help you? https://mlflow.org/
I don't know if MLflow can save arbitrary key-value pairs and
associate them with a model, but versioning, evaluation metrics, etc.
are supported.
Cheers,
Sonal
https://github.com/zinggAI/zingg
On Wed, Oct 20, 2021 at 12:59 PM <mar...@wunderlich.com> wrote:
Hello,
This is my first post to this list, so I hope I won't violate any
(un)written rules.
I recently started working with SparkNLP for a larger project. SparkNLP
in turn is based on Apache Spark's MLlib. One thing I found missing is
the ability to store custom parameters in a Spark pipeline. It seems
only certain pre-configured parameters are allowed (e.g. "stages" for
the Pipeline class).
IMHO, it would be handy to be able to store custom parameters, such as
model versions or other meta-data, so that they are persisted together
with a trained pipeline. This could also be used to include evaluation
results, such as accuracy, with trained ML models.
(I also asked this on Stackoverflow, but didn't get a response, yet:
https://stackoverflow.com/questions/69627820/setting-custom-parameters-for-a-spark-mllib-pipeline)
What does the community think about this proposal? Has it been
discussed before, perhaps? Any thoughts?
Cheers,
Martin