Re: Status of MLLib exporting models to PMML
Hi,

So you know, I added PMML export for linear models (linear, ridge and lasso) as suggested by Xiangrui. I will be looking at SVMs and logistic regression next.

Vincenzo

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Status-of-MLLib-exporting-models-to-PMML-tp18514p20005.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
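For readers who have not seen what such an export contains, here is a rough sketch of writing a linear model out as a PMML `RegressionModel`. It is not the code from Vincenzo's PR: the function name `linear_model_to_pmml`, the field names and the default target name are made up for illustration, and the output is a simplified document that has not been validated against the PMML 4.2 schema. Ridge and lasso only change how the coefficients are fitted, so the exported form would be the same.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by PMML 4.2 documents.
PMML_42_NS = "http://www.dmg.org/PMML-4_2"


def linear_model_to_pmml(feature_names, coefficients, intercept, target="target"):
    """Serialize y = intercept + sum(coef_i * x_i) as a simplified PMML string.

    Hypothetical helper for illustration only; all names are assumptions.
    """
    if len(feature_names) != len(coefficients):
        raise ValueError("feature_names and coefficients must have equal length")

    pmml = ET.Element("PMML", {"version": "4.2", "xmlns": PMML_42_NS})
    ET.SubElement(pmml, "Header", {"description": "linear model (illustrative)"})

    # Declare every field the model mentions, inputs first, then the target.
    fields = list(feature_names) + [target]
    dictionary = ET.SubElement(pmml, "DataDictionary",
                               {"numberOfFields": str(len(fields))})
    for name in fields:
        ET.SubElement(dictionary, "DataField",
                      {"name": name, "optype": "continuous", "dataType": "double"})

    model = ET.SubElement(pmml, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in feature_names:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "target"})

    # One NumericPredictor per coefficient.
    table = ET.SubElement(model, "RegressionTable",
                          {"intercept": repr(float(intercept))})
    for name, coef in zip(feature_names, coefficients):
        ET.SubElement(table, "NumericPredictor",
                      {"name": name, "coefficient": repr(float(coef))})

    return ET.tostring(pmml, encoding="unicode")


pmml_text = linear_model_to_pmml(["x1", "x2"], [2.0, -1.5], 0.5)
print(pmml_text)
```

A real exporter would also have to carry the feature metadata that the training code knows about; here the feature names are simply passed in by hand.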
Re: Status of MLLib exporting models to PMML
Yes, the case is convincing for PMML with Oryx. I will also investigate the parameter server.

Cheers,
Charles

On Tuesday, November 18, 2014, Sean Owen wrote:
> I'm just using PMML. I haven't hit any limitation of its expressiveness for the model types it supports. I don't think there is a point in defining a new format for models, except that PMML can get very big. Still, just compressing the XML gets it down to a manageable size for just about any realistic model.*
>
> I can imagine some kind of translation from PMML-in-XML to PMML-in-something-else that is more compact. I've not seen anyone do this.
>
> * There still aren't formats for factored matrices and probably won't ever quite be, since they're just too large for a file format.
Re: Status of MLLib exporting models to PMML
I'm just using PMML. I haven't hit any limitation of its expressiveness for the model types it supports. I don't think there is a point in defining a new format for models, except that PMML can get very big. Still, just compressing the XML gets it down to a manageable size for just about any realistic model.*

I can imagine some kind of translation from PMML-in-XML to PMML-in-something-else that is more compact. I've not seen anyone do this.

* There still aren't formats for factored matrices and probably won't ever quite be, since they're just too large for a file format.

On Tue, Nov 18, 2014 at 5:34 AM, Manish Amde wrote:
> Hi Charles,
>
> I am not aware of other storage formats. Perhaps Sean or Sandy can elaborate more given their experience with Oryx.
>
> There is work by Smola et al at Google that talks about large scale model update and deployment.
> https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu
>
> -Manish
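To make the compression point concrete, here is a small sketch using Python's standard `gzip` module. The XML is a synthetic stand-in for a large model (10,000 made-up `NumericPredictor` elements), and `compress_pmml` / `decompress_pmml` are hypothetical helper names, so the sizes it prints say nothing about any real model; it only shows the round trip and that repetitive XML shrinks.

```python
import gzip


def compress_pmml(xml_text: str) -> bytes:
    """Gzip a PMML document held as a string."""
    return gzip.compress(xml_text.encode("utf-8"))


def decompress_pmml(blob: bytes) -> str:
    """Inverse of compress_pmml."""
    return gzip.decompress(blob).decode("utf-8")


# Synthetic, highly repetitive XML standing in for a large model file.
predictors = "".join(
    f'<NumericPredictor name="x{i}" coefficient="{i * 0.001}"/>'
    for i in range(10_000)
)
xml_text = f'<RegressionTable intercept="0.0">{predictors}</RegressionTable>'

blob = compress_pmml(xml_text)
raw_size = len(xml_text.encode("utf-8"))
print(f"raw: {raw_size} bytes, gzipped: {len(blob)} bytes")
```

The ratio depends heavily on how repetitive the document is, so it would be worth measuring on an actual exported model before relying on it.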
Re: Status of MLLib exporting models to PMML
Hi Charles,

I am not aware of other storage formats. Perhaps Sean or Sandy can elaborate more given their experience with Oryx.

There is work by Smola et al at Google that talks about large scale model update and deployment.
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/li_mu

-Manish

On Sunday, November 16, 2014, Charles Earl wrote:
> Manish and others,
> A follow-up question on my mind is whether there are protobuf (or other binary format) frameworks in the vein of PMML. Perhaps scientific data storage frameworks like NetCDF or ROOT are possible also.
> I like the comprehensiveness of PMML but, as you mention, the complexity of management for large models is a concern.
> Cheers
Re: Status of MLLib exporting models to PMML
Manish and others,

A follow-up question on my mind is whether there are protobuf (or other binary format) frameworks in the vein of PMML. Perhaps scientific data storage frameworks like NetCDF or ROOT are possible also.

I like the comprehensiveness of PMML but, as you mention, the complexity of management for large models is a concern.

Cheers

-- 
- Charles

On Fri, Nov 14, 2014 at 1:35 AM, Manish Amde wrote:
> @Aris, we are closely following the PMML work that is going on and, as Xiangrui mentioned, it might be easier to migrate models such as logistic regression first and then migrate trees. Some of the models get fairly large (as pointed out by Sung Chung) with deep trees as building blocks, and we might have to consider a distributed storage and prediction strategy.
Re: Status of MLLib exporting models to PMML
@Aris, we are closely following the PMML work that is going on and, as Xiangrui mentioned, it might be easier to migrate models such as logistic regression first and then migrate trees. Some of the models get fairly large (as pointed out by Sung Chung) with deep trees as building blocks, and we might have to consider a distributed storage and prediction strategy.

On Tuesday, November 11, 2014, Xiangrui Meng wrote:
> Vincenzo sent a PR and included k-means as an example. Sean is helping review it. The PMML standard is quite large, so we may start with simple model export, like linear methods, then move forward to tree-based models.
> -Xiangrui
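As a back-of-the-envelope illustration of why deep trees make exported models large: a binary tree whose root is at depth 0 has at most 2^(d+1) - 1 nodes at depth d, and an ensemble multiplies that by the number of trees. The sketch below just does that arithmetic. The figure of 200 bytes per serialized node is an arbitrary assumption chosen for illustration, not a measured PMML size, and real trees are usually far from full.

```python
def max_nodes(depth: int) -> int:
    """Upper bound on the node count of a binary tree (root at depth 0)."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 2 ** (depth + 1) - 1


def ensemble_size_bytes(num_trees: int, depth: int, bytes_per_node: int) -> int:
    """Worst-case serialized size, given an assumed per-node cost in bytes."""
    return num_trees * max_nodes(depth) * bytes_per_node


# bytes_per_node=200 is an illustrative guess, not a measurement.
for depth in (5, 10, 20):
    nodes = max_nodes(depth)
    approx = ensemble_size_bytes(num_trees=100, depth=depth, bytes_per_node=200)
    print(f"depth={depth:2d}  nodes/tree <= {nodes:>9,d}  "
          f"100 trees ~ {approx / 1e6:,.1f} MB")
```

Because the bound doubles with every extra level, the depth of the trees matters much more than the number of trees once trees get deep.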
Re: Status of MLLib exporting models to PMML
Hi DB,

DB Tsai wrote
> I also worry that the author of JPMML changed the license of jpmml-evaluator in the interest of his commercial business, and that he might change the license of jpmml-model in the future.

I am the principal author of the said Java PMML API projects and I want to assure you that I have no plans to change the license of the JPMML-Model project now or in the future. In fact, most of the codebase is copyrighted by the University of Tartu, so I cannot do it even if I wanted to.

I would also like to clarify the licensing of the JPMML-Evaluator project. It is a fork of the legacy JPMML project (https://github.com/jpmml/jpmml); the fork was started in early 2014 in order to provide support for the PMML specification version 4.2, implement missing functionality and make other enhancements. The project was initiated with the AGPLv3 license; there have been no "unexpected" license changes.

Developing Java PMML APIs is full-time work for me. If you (or anybody else) can suggest how I can support myself doing this under some license other than (A)GPLv3, I'd be interested to find out more. So far, I have learned that the BSD 3-Clause License doesn't work - I've yet to receive a single "thank you" message for my previous work in this field, and many other fields.

VR

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Status-of-MLLib-exporting-models-to-PMML-tp18514p18729.html
Re: Status of MLLib exporting models to PMML
Yes, although I think this difference is on purpose, as part of that commercial strategy. If future versions change license, it would still be possible to not upgrade, or to fork / recreate the bean classes. I'm not worried so much, but it is a good point.

On Nov 11, 2014 10:06 PM, "DB Tsai" wrote:
> I also worry that the author of JPMML changed the license of jpmml-evaluator in the interest of his commercial business, and that he might change the license of jpmml-model in the future.
Re: Status of MLLib exporting models to PMML
I also worry that the author of JPMML changed the license of jpmml-evaluator in the interest of his commercial business, and that he might change the license of jpmml-model in the future.

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Tue, Nov 11, 2014 at 11:43 AM, Sean Owen wrote:
> Yes, jpmml-evaluator is AGPL, but things like jpmml-model are not; they're 3-clause BSD:
>
> https://github.com/jpmml/jpmml-model
>
> So some of the scoring components are off-limits for an AL2 project but the core model components are OK.
Re: Status of MLLib exporting models to PMML
Yes, jpmml-evaluator is AGPL, but things like jpmml-model are not; they're 3-clause BSD:

https://github.com/jpmml/jpmml-model

So some of the scoring components are off-limits for an AL2 project but the core model components are OK.

On Tue, Nov 11, 2014 at 7:40 PM, DB Tsai wrote:
> JPMML evaluator just changed their license to AGPL or commercial license, and I think AGPL is not compatible with apache project. Any advice?
>
> https://github.com/jpmml/jpmml-evaluator
Re: Status of MLLib exporting models to PMML
JPMML evaluator just changed their license to AGPL or a commercial license, and I think AGPL is not compatible with an Apache project. Any advice?

https://github.com/jpmml/jpmml-evaluator

Sincerely,

DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Tue, Nov 11, 2014 at 10:07 AM, Xiangrui Meng wrote:
> Vincenzo sent a PR and included k-means as an example. Sean is helping review it. The PMML standard is quite large, so we may start with simple model export, like linear methods, then move forward to tree-based models.
> -Xiangrui
Re: Status of MLLib exporting models to PMML
Vincenzo sent a PR and included k-means as an example. Sean is helping review it. The PMML standard is quite large, so we may start with simple model export, like linear methods, then move forward to tree-based models.

-Xiangrui

On Mon, Nov 10, 2014 at 11:27 AM, Aris wrote:
> Hello Spark and MLLib folks,
>
> So a common problem in the real world of using machine learning is that some data analysts use tools like R, but the more "data engineers" out there will use more advanced systems like Spark MLLib or even Python Scikit Learn.
>
> In the real world, I want to have "a system" where multiple different modeling environments can learn from data / build models, represent the models in a common language, and then have a layer which just takes the model and runs model.predict() all day long -- scores the models in other words.
>
> It looks like the project openscoring.io and jpmml-evaluator are some amazing systems for this, but they fundamentally use PMML as the model representation here.
>
> I have read some JIRA tickets that Xiangrui Meng is interested in getting PMML implemented to export MLLib models. Is that happening? Further, would something like Manish Amde's boosted ensemble tree methods be representable in PMML?
>
> Thank you!!
> Aris
Status of MLLib exporting models to PMML
Hello Spark and MLLib folks,

So a common problem in the real world of using machine learning is that some data analysts use tools like R, but the more "data engineers" out there will use more advanced systems like Spark MLLib or even Python Scikit Learn.

In the real world, I want to have "a system" where multiple different modeling environments can learn from data / build models, represent the models in a common language, and then have a layer which just takes the model and runs model.predict() all day long -- scores the models in other words.

It looks like the project openscoring.io and jpmml-evaluator are some amazing systems for this, but they fundamentally use PMML as the model representation here.

I have read some JIRA tickets that Xiangrui Meng is interested in getting PMML implemented to export MLLib models. Is that happening? Further, would something like Manish Amde's boosted ensemble tree methods be representable in PMML?

Thank you!!
Aris
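The "layer which just takes the model and runs model.predict()" idea can be sketched in a few lines for the simplest case, a linear regression stored as PMML. This is only an illustration of the concept, not how openscoring.io or jpmml-evaluator work internally: the class name `LinearScorer` is hypothetical, the embedded PMML fragment is hand-written and omits the `DataDictionary` and `MiningSchema` a schema-valid document would need, and only plain `NumericPredictor` terms with the default exponent are handled.

```python
import xml.etree.ElementTree as ET

# Prefix map for the PMML 4.2 namespace.
PMML_NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Hand-written, deliberately minimal PMML fragment (illustrative only).
EXAMPLE_PMML = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="0.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-1.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


class LinearScorer:
    """Scores rows against the first RegressionTable found in a PMML string."""

    def __init__(self, pmml_text: str):
        root = ET.fromstring(pmml_text)
        table = root.find(".//p:RegressionTable", PMML_NS)
        if table is None:
            raise ValueError("no RegressionTable found in PMML document")
        self.intercept = float(table.get("intercept"))
        self.coefficients = {
            p.get("name"): float(p.get("coefficient"))
            for p in table.findall("p:NumericPredictor", PMML_NS)
        }

    def predict(self, row: dict) -> float:
        """Return intercept + sum(coefficient * value) for one input row."""
        return self.intercept + sum(
            coef * row[name] for name, coef in self.coefficients.items()
        )


scorer = LinearScorer(EXAMPLE_PMML)
print(scorer.predict({"x1": 1.0, "x2": 2.0}))  # 0.5 + 2.0*1.0 - 1.5*2.0 = -0.5
```

The point of the sketch is that the scoring layer needs no knowledge of which tool trained the model; it only reads the shared representation. Anything beyond this toy case (transformations, missing-value handling, trees, ensembles) is where a full evaluator earns its keep.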