On 10/03/2018 03:32 PM, Nick Pentreath wrote:
For ONNX you may be interested in
https://github.com/onnx/onnxmltools - which supports conversion of a
few sklearn models to ONNX already.
However, as far as I am aware, none of the ONNX backends actually
support the ONNX-ML extended spec (in open-source at least). So you
would not be able to actually do prediction I think...
Exactly, that's what I'm waiting for. MS is working on it, AFAIK.
As for PFA, to my current knowledge there is no library that does it
yet. Our own Aardpfark project
(https://github.com/CODAIT/aardpfark) focuses on SparkML export to PFA
for now, but we would like to add sklearn support in the future.
On Wed, 3 Oct 2018 at 20:07 Sebastian Raschka
<m...@sebastianraschka.com> wrote:
The ONNX approach sounds most promising, esp. because it will also
allow library interoperability, but I wonder if this is for
parametric models only and not for the nonparametric ones like
KNN, tree-based classifiers, etc.
All-in-all I can definitely see the appeal for having a way to
export sklearn estimators in a text-based format (e.g., via JSON),
since it would make sharing code easier. This doesn't even have to
be compatible with multiple sklearn versions. A typical use case
would be to include these JSON exports as e.g., supplemental files
of a research paper for other people to run the models etc. (here,
one can just specify which sklearn version it would require; of
course, one could also share pickle files, but I am personally
always hesitant regarding running/trusting other people's pickle files).
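To make the idea concrete, here is a minimal sketch of such a JSON export; the `StubEstimator` and `export_json` names are made up for illustration and are not an existing sklearn API:

```python
import json

class StubEstimator:
    """Toy stand-in for a sklearn-style estimator: constructor arguments
    are stored as plain attributes, fitted results get a trailing "_"."""
    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def fit(self):
        self.coef_ = [0.5, -1.2]  # pretend these came out of fitting
        return self

def export_json(model):
    """Dump the class name plus all current attributes to a JSON string."""
    return json.dumps({"class": type(model).__name__,
                       "attributes": vars(model)})

blob = export_json(StubEstimator(alpha=0.1).fit())
data = json.loads(blob)
```

A file like this could ship as supplemental material with a paper, alongside a note of the sklearn version used.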
Unfortunately though, as Gael pointed out, this "feature" would be
a huge burden for the devs, and it would probably also negatively
impact the development of scikit-learn itself because it imposes
another design constraint.
However, I do think this sounds like an excellent case for a
contrib project. Like scikit-export, scikit-serialize, or something
like that.
Best,
Sebastian
> On Oct 3, 2018, at 5:49 AM, Javier López <jlo...@ende.cc> wrote:
>
>
> On Tue, Oct 2, 2018 at 5:07 PM Gael Varoquaux
<gael.varoqu...@normalesup.org> wrote:
> The reason that pickles are brittle and that sharing pickles is
a bad
> practice is that pickle uses an implicitly defined data model,
which is
> defined via the internals of objects.
>
> Plus the fact that loading a pickle can execute arbitrary code,
and there is no way to know
> if any malicious code is in there in advance because the
contents of the pickle cannot
> be easily inspected without loading/executing it.
>
> So, the problems of pickle are not specific to pickle, but rather
> intrinsic to any generic persistence code [*]. Writing
persistence code that
> does not fall into these problems is very costly in terms of
developer time
> and makes it harder to add new methods or improve existing ones.
I am not
> excited about it.
>
> My "text-based serialization" suggestion was nowhere near as
ambitious as that,
> as I have already explained, and wasn't aiming at solving the
versioning issues, but
> rather at having something which is "about as good" as pickle
but in a human-readable
> format. I am not asking for a Turing-complete language to
reproduce the prediction
> function, but rather something simple in the spirit of the
output produced by the gist code I linked above, just for the
model families where it is reasonable:
>
> https://gist.github.com/jlopezpena/2cdd09c56afda5964990d5cf278bfd31
>
> The code I posted mostly works (specific cases of nested models
need to be addressed
> separately, as well as pipelines), and we have been using (a
version of) it in production
> for quite some time. But there are hackish aspects to it that we
are not happy with,
> such as the manual separation of init and fitted parameters by
checking if the name ends with "_", having to infer class name and
location using
> "model.__class__.__name__" and "model.__module__", and the wacky
use of "__import__".
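For illustration, the underscore split and the import dance can be written a bit less wackily with `importlib` instead of `__import__`. Everything below is a hypothetical sketch in that style, using `argparse.Namespace` as a stand-in for a fitted estimator rather than a real sklearn model:

```python
import argparse
import importlib

def describe(model):
    """Record import location and parameters, splitting init from fitted
    attributes via the trailing-underscore convention."""
    return {
        "module": type(model).__module__,
        "class": type(model).__name__,
        "init": {k: v for k, v in vars(model).items() if not k.endswith("_")},
        "fitted": {k: v for k, v in vars(model).items() if k.endswith("_")},
    }

def rebuild(desc):
    """Look the class up again via importlib, avoiding __import__ and eval."""
    cls = getattr(importlib.import_module(desc["module"]), desc["class"])
    obj = cls(**desc["init"])
    for name, value in desc["fitted"].items():
        setattr(obj, name, value)
    return obj

# argparse.Namespace stands in for a fitted estimator here
m = argparse.Namespace(alpha=0.1)
m.coef_ = [1.0, 2.0]
clone = rebuild(describe(m))
```

The same shape of code works for any class whose constructor accepts its init parameters as keyword arguments, which is what the sklearn API already guarantees.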
>
> My suggestion was more along the lines of adding some metadata
to sklearn estimators so
> that code in a similar style would be nicer to write; little
things like having `init_parameters` and `fit_parameters`
properties that would return the lists of named parameters,
> or a `model_info` method that would return data like sklearn
version, class name and location, or a package level dictionary
pointing at the estimator classes by a string name, like
>
> from sklearn.linear_model import LogisticRegression
> estimator_classes = {"LogisticRegression": LogisticRegression, ...}
>
> so that one can load the appropriate class from the string
description without calling __import__ or eval; that sort of stuff.
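A registry of that shape is straightforward to sketch; the placeholder classes below just illustrate the lookup and are not the real sklearn estimators:

```python
class LogisticRegression:
    """Placeholder for sklearn.linear_model.LogisticRegression."""
    def __init__(self, C=1.0):
        self.C = C

class Ridge:
    """Placeholder for sklearn.linear_model.Ridge."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

# Package-level lookup table: string name -> class, no __import__ or eval
estimator_classes = {cls.__name__: cls for cls in (LogisticRegression, Ridge)}

def from_description(name, params):
    """Instantiate an estimator from its string name and init params."""
    return estimator_classes[name](**params)

model = from_description("Ridge", {"alpha": 0.5})
```

With such a dictionary shipped at package level, a loader only ever instantiates classes that the library itself has whitelisted.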
>
> I am aware this would not address the common complaint of
"perfect prediction reproducibility"
> across versions, but I think we can all agree that this utopia
of perfect reproducibility is not
> feasible.
>
> And in the long, long run, I agree that PFA/ONNX, or whichever
similar format emerges, is
> the way to go.
>
> J
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn