Thanks Nick!

On Thu, Jul 14, 2016 at 10:18 AM, Nick Pentreath <[email protected]> wrote:
> For PFA, you may wish to check out
> https://github.com/opendatagroup/hadrian/ (the "titus" subproject is a
> full Python impl of PFA, with a focus on some "model producing" hooks such
> as a PrettyPFA higher-level text-based DSL for PFA document construction).
>
> On Thu, 14 Jul 2016 at 16:07 William Komp <[email protected]> wrote:
>
>> Hi,
>> Interesting conversation. I have captured model parameters in SQL and use
>> SQL for scoring in massively parallel setups. You can score billion-record
>> sets in seconds. Works really well with logistic regression and other
>> functional based models. Trees would be a bit more difficult.
>>
>> Has there been any discussion of incorporating PFA (Portable Format for
>> Analytics, http://dmg.org/pfa/index.html) in scikit-learn? Bob Grossman is
>> the driving force behind it. Here is a link to a deck from a Predictive
>> Analytics World talk he gave in Chicago a few months ago.
>>
>> http://www.slideshare.net/rgrossman/how-to-lower-the-cost-of-deploying-analytics-an-introduction-to-the-portable-format-for-analytics
>>
>> William
>>
>> On Thu, Jul 14, 2016 at 8:35 AM, Dale T Smith <[email protected]>
>> wrote:
>>
>>> Hello,
>>>
>>> I investigated this subject last year, and have tried to keep up, so I
>>> can perhaps offer some alternatives.
>>>
>>> · The only packages I know that read PMML in Python are
>>> proprietary. There are several alternatives for writing to PMML, as you can
>>> easily find.
>>>
>>> I also found
>>>
>>> https://code.google.com/archive/p/augustus/
>>>
>>> and
>>>
>>> https://github.com/ctrl-alt-d/lightpmmlpredictor
>>>
>>> Depending on your project, sklearn-compiledtrees may be an option.
>>>
>>> https://github.com/ajtulloch/sklearn-compiledtrees
>>>
>>> Py2PMML (https://support.zementis.com/entries/37092748-Introducing-Py2PMML)
>>> is by Zementis and it’s a commercial product, meaning you pay for a license.
>>>
>>> · Another option is what we planned to do at an old job of mine:
>>> read the model characteristics out of the scikit-learn object after fit,
>>> and produce C code ourselves. This is a viable option for decision trees.
>>> Adapt print_decision_trees() from this Stack Overflow answer.
>>>
>>> http://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree
>>>
>>> · You could also reconsider using joblib.dump. I’m
>>> aware that it has problems, but you can include enough versioning
>>> information in the objects you dump to apply checks in your code
>>> that make sure scikit-learn versions are compatible, etc. I know this is a
>>> pain in the neck, but it’s a viable alternative to creating your own PMML
>>> reader, writing a code generator of some kind, or buying a license.
>>>
>>> __________________________________________________________________________________________
>>> *Dale Smith* | Macy's Systems and Technology | IFS eCommerce | Data
>>> Science and Capacity Planning
>>> | 5985 State Bridge Road, Johns Creek, GA 30097 | [email protected]
>>>
>>> *From:* scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=
>>> [email protected]] *On Behalf Of* Joel Nothman
>>> *Sent:* Thursday, July 14, 2016 4:18 AM
>>> *To:* Scikit-learn user and developer mailing list
>>> *Subject:* Re: [scikit-learn] [Scikit-learn-general] Estimator
>>> serialisability
>>>
>>> This has been discussed numerous times. I suppose no one thinks
>>> supporting only pickle is great, but a custom dict format is unmaintainable.
>>> The best we've got AFAIK (and it looks
>>> <https://github.com/jpmml/jpmml-sklearn/graphs/contributors> like it's
>>> getting better all the time) is a one-way converter to PMML, which is
>>> portable to production environments.
>>> See https://github.com/jpmml/sklearn2pmml (Python interface) and
>>> https://github.com/jpmml/jpmml-sklearn (command-line interface and guts
>>> of the thing).
>>>
>>> I hope that helps; and thanks to Villu Ruusmann: that list of supported
>>> estimators is awesome!
>>>
>>> PS: please write to the new list at [email protected]
>>>
>>> On 14 July 2016 at 17:24, Miroslav Zoričák <[email protected]>
>>> wrote:
>>>
>>> Hi everybody,
>>>
>>> I have been using scikit-learn for a while, but I have run into a
>>> problem that does not seem to have any good solutions.
>>>
>>> Basically I would like to:
>>>
>>> - build my pipeline in a Jupyter Notebook
>>> - persist it (to JSON or HDF5)
>>> - load it in production and execute the prediction there
>>>
>>> The problem is that the recommended way to persist estimators such as
>>> the RobustScaler is to pickle them. I don't want to do this, for three
>>> reasons:
>>>
>>> - Security: pickle is potentially dangerous
>>> - Portability: I can't unpickle it in Scala, for example
>>> - Pickle stores a lot of details and information that is not strictly
>>> necessary to reconstruct the RobustScaler, and therefore might prevent it
>>> from being reconstructed correctly if a different version is used.
>>>
>>> Another option I would seem to have is to access the private members of
>>> each estimator that I want to use and store them on my own, but this is
>>> inconvenient, because:
>>>
>>> - It forces me as a user to understand how the RobustScaler works and
>>> how it stores its internal state, which is generally bad for usability
>>> - The internal implementation could change, leaving me to fix my
>>> serialisers (see #1)
>>> - I would need to do this for each new estimator I decide to use
>>>
>>> Now, to me it seems the solution is quite obvious:
>>> write a Mixin or update the BaseEstimator class to include two
>>> additional methods:
>>>
>>> to_dict() - will return a dictionary such that, when it is passed to
>>> from_dict(dictionary), the original object is reconstructed
>>>
>>> These dictionaries could be passed to the JSON module or the YAML module
>>> or stored elsewhere. We could provide more convenience methods to do this
>>> for the user.
>>>
>>> In the case of the RobustScaler the dict would look something like:
>>>
>>> { "center": "0,0", "scale": "1.0"}
>>>
>>> Now the bulk of the work is writing these serialisers and deserialisers
>>> for all of the estimators, but that can be simplified by adding a method
>>> that does this automatically via reflection, so that each estimator would
>>> only need to specify which fields to serialise.
>>>
>>> I am happy to start working on this and create a pull request on GitHub,
>>> but before I do that I wanted to get some initial thoughts and reactions
>>> from the community, so please let me know what you think.
>>>
>>> Best Regards,
>>>
>>> Miroslav Zoricak
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> [email protected]
>>> https://mail.python.org/mailman/listinfo/scikit-learn
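William's approach of storing model parameters and scoring in SQL is easiest to see for a binary logistic regression, where each row's score is the intercept plus a weighted sum of its feature columns. The sketch below uses a hypothetical helper (`logistic_regression_to_sql` is not part of scikit-learn) that turns exported coefficients into a query and runs it against an in-memory SQLite table; the table name, column names and coefficient values are made up for illustration. With a fitted binary `LogisticRegression`, the numbers would come from `clf.coef_[0]` and `clf.intercept_[0]`.

```python
import sqlite3
from typing import Sequence


def logistic_regression_to_sql(
    table: str,
    feature_columns: Sequence[str],
    coefficients: Sequence[float],
    intercept: float,
) -> str:
    """Return a query that scores every row of `table` with a binary
    logistic-regression model (hypothetical helper, not a library API)."""
    if len(feature_columns) != len(coefficients):
        raise ValueError("expected one coefficient per feature column")
    # Linear score (logit): intercept + sum(coef_i * column_i).
    terms = [repr(float(intercept))]
    terms += [f"({float(c)!r} * {col})" for col, c in zip(feature_columns, coefficients)]
    logit = " + ".join(terms)
    # sigmoid(logit) > 0.5 exactly when logit > 0, so the label needs no exp().
    return (
        f"SELECT id, {logit} AS logit, "
        f"CASE WHEN {logit} > 0 THEN 1 ELSE 0 END AS label "
        f"FROM {table} ORDER BY id"
    )


# Usage with made-up data and coefficients.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 1.0, 0.0), (2, 0.0, 1.0)],
)
query = logistic_regression_to_sql("customers", ["x1", "x2"], [1.5, -2.0], 0.25)
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1.75, 1), (2, -1.75, 0)]
```

Only the logit and the hard label are computed in SQL here, since the positive class is predicted exactly when the logit is positive; a probability would also need `exp()`, which not every SQL dialect or SQLite build provides. Table and column names are interpolated directly into the query, so they must come from a trusted source.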
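Dale's suggestion of reading a fitted decision tree out of the scikit-learn object and generating C code can be sketched as a recursive walk over the parallel arrays a fitted tree exposes on its `tree_` attribute (`feature`, `threshold`, `children_left`, `children_right`, `value`). In those arrays a leaf has `children_left == -1`, and a sample goes to the left child when `x[feature] <= threshold`. The generator below is hypothetical (it is not the code from the Stack Overflow answer), handles a single-output regression tree only, and uses a small hand-written tree so it runs without scikit-learn; for a fitted `DecisionTreeRegressor` the leaf values would be `tree_.value[:, 0, 0]`.

```python
from typing import List, Sequence

TREE_LEAF = -1  # scikit-learn marks leaf nodes with children_left == -1


def tree_to_c(
    feature: Sequence[int],
    threshold: Sequence[float],
    children_left: Sequence[int],
    children_right: Sequence[int],
    leaf_value: Sequence[float],
    func_name: str = "predict",
) -> str:
    """Emit a C function of nested if/else statements for one regression tree."""
    lines: List[str] = [f"double {func_name}(const double *x) {{"]

    def emit(node: int, depth: int) -> None:
        pad = "    " * depth
        if children_left[node] == TREE_LEAF:
            lines.append(f"{pad}return {leaf_value[node]!r};")
            return
        # Same rule as scikit-learn: go left when x[feature] <= threshold.
        lines.append(f"{pad}if (x[{feature[node]}] <= {threshold[node]!r}) {{")
        emit(children_left[node], depth + 1)
        lines.append(f"{pad}}} else {{")
        emit(children_right[node], depth + 1)
        lines.append(f"{pad}}}")

    emit(0, 1)
    lines.append("}")
    return "\n".join(lines)


# Hand-written tree: the root splits on x[0] <= 0.5, its right child on x[1] <= 2.0.
# Leaves use feature -2 / threshold -2.0; leaf_value is ignored for internal nodes.
feature = [0, -2, 1, -2, -2]
threshold = [0.5, -2.0, 2.0, -2.0, -2.0]
children_left = [1, -1, 3, -1, -1]
children_right = [2, -1, 4, -1, -1]
leaf_value = [0.0, 1.0, 0.0, 3.0, 5.0]

print(tree_to_c(feature, threshold, children_left, children_right, leaf_value))
```

For this tree the output is an 11-line C function `predict` whose three leaves return 1.0, 3.0 and 5.0. A classifier would instead return the index of the largest entry in each leaf's `value` row.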
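Dale's third option, keeping a pickle-based dump but storing version information with the model, can be sketched with the standard library's `pickle` so it runs without extra dependencies. The helper names (`dump_with_versions`, `load_with_versions`, `VersionMismatch`) and the version strings are hypothetical; in practice the current version could be read from `sklearn.__version__`. The metadata is pickled as a separate plain dict ahead of the model, because `pickle.load` reads one object per call from the same file handle, so the versions can be checked before the model itself is unpickled.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


class VersionMismatch(RuntimeError):
    """Raised when a saved model was produced with different library versions."""


def dump_with_versions(model: Any, path: str, versions: Dict[str, str]) -> None:
    with open(path, "wb") as fh:
        pickle.dump({"versions": dict(versions)}, fh)  # metadata first
        pickle.dump(model, fh)  # then the model itself


def load_with_versions(path: str, versions: Dict[str, str]) -> Any:
    with open(path, "rb") as fh:
        saved = pickle.load(fh)["versions"]  # reads only the metadata dict
        mismatched = {
            name: (saved.get(name), current)
            for name, current in versions.items()
            if saved.get(name) != current
        }
        if mismatched:
            raise VersionMismatch(f"saved vs. current versions: {mismatched}")
        return pickle.load(fh)  # unpickle the model only after the check


# Usage: a plain dict stands in for a fitted estimator.
model = {"center_": [0.0], "scale_": [1.0]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    dump_with_versions(model, path, {"scikit-learn": "0.17.1"})
    restored = load_with_versions(path, {"scikit-learn": "0.17.1"})
    try:
        load_with_versions(path, {"scikit-learn": "0.18"})
        mismatch_detected = False
    except VersionMismatch:
        mismatch_detected = True
```

This guards against version drift only, not against the security problem Miroslav raises: the metadata is still unpickled, so files must come from a trusted source. `joblib.dump` and `joblib.load` store one object per file, so with joblib the same idea would mean wrapping metadata and model in one object or writing the metadata to a separate file.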
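Miroslav's proposal, a mixin that copies a declared list of fields to and from a plain dict by reflection, can be sketched as follows. None of this is scikit-learn API: `DictSerializableMixin`, `ToyRobustScaler` and the dict layout are hypothetical, and the toy scaler handles a single feature using the median and interquartile range from Python's `statistics` module. The names `with_centering`, `with_scaling`, `center_` and `scale_` mirror `RobustScaler` for readability only.

```python
import json
import statistics
from typing import Any, ClassVar, Dict, List, Sequence, Tuple


class DictSerializableMixin:
    """Hypothetical mixin: subclasses declare which constructor parameters and
    fitted attributes describe them; getattr/setattr copy those fields."""

    _param_fields: ClassVar[Tuple[str, ...]] = ()
    _fitted_fields: ClassVar[Tuple[str, ...]] = ()

    def to_dict(self) -> Dict[str, Any]:
        return {
            "class": type(self).__name__,
            "params": {name: getattr(self, name) for name in self._param_fields},
            "fitted": {name: getattr(self, name) for name in self._fitted_fields},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> Any:
        if data["class"] != cls.__name__:
            raise ValueError(f"expected {cls.__name__}, got {data['class']}")
        obj = cls(**data["params"])
        for name in cls._fitted_fields:  # only restore declared fields
            setattr(obj, name, data["fitted"][name])
        return obj


class ToyRobustScaler(DictSerializableMixin):
    """Single-feature stand-in for a robust scaler (median / IQR)."""

    _param_fields = ("with_centering", "with_scaling")
    _fitted_fields = ("center_", "scale_")

    def __init__(self, with_centering: bool = True, with_scaling: bool = True) -> None:
        self.with_centering = with_centering
        self.with_scaling = with_scaling

    def fit(self, values: Sequence[float]) -> "ToyRobustScaler":
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        self.center_ = float(statistics.median(values)) if self.with_centering else 0.0
        self.scale_ = float(q3 - q1) if self.with_scaling else 1.0
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        return [(v - self.center_) / self.scale_ for v in values]


# Round trip through JSON text instead of pickle.
scaler = ToyRobustScaler().fit([1.0, 2.0, 3.0, 4.0, 5.0])
text = json.dumps(scaler.to_dict())
restored = ToyRobustScaler.from_dict(json.loads(text))
print(restored.transform([1.0, 5.0]))  # [-1.0, 1.0]
```

The reflection part is cheap; Joel's objection in the thread still applies, because every estimator's field lists would have to be kept in sync with its implementation across versions.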
