[GitHub] incubator-predictionio issue #441: pio batchpredict error
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/441 Also, this PR is from `develop` branch to `master`, but AFAIK this project does not use `master`. So, it seems to be committed directly to the mainline already. ---
[GitHub] incubator-predictionio issue #441: pio batchpredict error
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/441 I do not understand what this PR does. Does this pull request fix that SparkException or cause it? Is this a problem using Spark 2.2 and `pio batchpredict`? What engine template does this occur for? ---
Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release
Awesome! The docs for Batch Predict are finally live: https://predictionio.incubator.apache.org/batchpredict/ On Mon, Oct 2, 2017 at 3:57 PM, Donald Szeto <don...@apache.org> wrote: > Mars, it's fixed now. > > Re: https://issues.apache.org/jira/browse/INFRA-15208 > > On Mon, Oct 2, 2017 at 3:37 PM, Donald Szeto <don...@apache.org> wrote: > > > The build went through but the site is not reflecting the new version. I > > will open a ticket against ASF Infra to take a look. > > > > On Mon, Oct 2, 2017 at 2:47 PM, Donald Szeto <don...@apache.org> wrote: > > > >> The doc build failed at Scaladoc: https://builds.apach > >> e.org/job/PredictionIO-build-site/78/console > >> > >> And this has blocked the subsequent publish build. I'll just go ahead > and > >> disable Scaladoc generation for now to get the main site updated first. > >> > >> On Mon, Oct 2, 2017 at 12:59 PM, Mars Hall <mars.h...@salesforce.com> > >> wrote: > >> > >>> Actually, I still don't see the updates for version 0.12.0. > >>> > >>> Why don't we see this "Batch Predictions" entry on the docs site? > >>> > >>> https://github.com/apache/incubator-predictionio/blob/develo > >>> p/docs/manual/data/nav/main.yml#L65 > >>> > >>> When I made that change locally, that entry did appear under "Deploying > >>> an > >>> Engine" navigation section, but it's still not on the docs site: > >>> https://predictionio.incubator.apache.org/deploy/ > >>> > >>> > >>> On Mon, Oct 2, 2017 at 12:52 PM, Mars Hall <mars.h...@salesforce.com> > >>> wrote: > >>> > >>> > Thank you Chan! > >>> > > >>> > On Thu, Sep 28, 2017 at 9:02 AM, Chan Lee <chanlee...@gmail.com> > >>> wrote: > >>> > > >>> >> My apologies. The doc site has been updated now. 
> >>> >> > >>> > > >>> > > >>> > > >>> > -- > >>> > *Mars Hall > >>> > 415-818-7039 <(415)%20818-7039> > >>> > Customer Facing Architect > >>> > Salesforce Platform / Heroku > >>> > San Francisco, California > >>> > > >>> > > >>> > <http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html> > >>> > > >>> > >>> > >>> > >>> -- > >>> *Mars Hall > >>> 415-818-7039 > >>> Customer Facing Architect > >>> Salesforce Platform / Heroku > >>> San Francisco, California > >>> > >>> > >>> <http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html> > >>> > >> > >> > > > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California
Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release
Actually, I still don't see the updates for version 0.12.0. Why don't we see this "Batch Predictions" entry on the docs site? https://github.com/apache/incubator-predictionio/blob/develop/docs/manual/data/nav/main.yml#L65 When I made that change locally, that entry did appear under "Deploying an Engine" navigation section, but it's still not on the docs site: https://predictionio.incubator.apache.org/deploy/ On Mon, Oct 2, 2017 at 12:52 PM, Mars Hall <mars.h...@salesforce.com> wrote: > Thank you Chan! > > On Thu, Sep 28, 2017 at 9:02 AM, Chan Lee <chanlee...@gmail.com> wrote: > >> My apologies. The doc site has been updated now. >> > > > > -- > *Mars Hall > 415-818-7039 <(415)%20818-7039> > Customer Facing Architect > Salesforce Platform / Heroku > San Francisco, California > > > <http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html> > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California <http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html>
Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release
Thank you Chan! On Thu, Sep 28, 2017 at 9:02 AM, Chan Lee <chanlee...@gmail.com> wrote: > My apologies. The doc site has been updated now. > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California <http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html>
Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release
It seems the documentation site has not been updated for the release: the new batch prediction page does not appear. What is the process for updating the doc site? On Thu, Sep 28, 2017 at 1:49 PM, takako shimamoto <chiboch...@gmail.com> wrote: > Awesome sauce! > Chan, I owe you a lot. Thanks! > > > 2017-09-28 12:45 GMT+09:00 Paritosh Piplewar <parit...@greentoe.com>: > >> congratulation. >> >> Sent from my iPhone >> >> On 28-Sep-2017, at 3:10 AM, Chan Lee <chan...@apache.org> wrote: >> >> The Apache PredictionIO team would like to announce the release of Apache >> PredictionIO 0.12.0-incubating. >> >> Release notes are here: >> https://github.com/apache/incubator-predictionio/blob/releas >> e/0.12.0/RELEASE.md >> >> Apache PredictionIO (incubating) is an open source Machine Learning Server >> built on top of state-of-the-art open source stack, that enables >> developers >> to manage and deploy production-ready predictive services for various >> kinds >> of machine learning tasks. >> >> More details regarding Apache PredictionIO (incubating) can be found here: >> http://predictionio.incubator.apache.org/ >> >> The release artifacts can be downloaded here: >> https://dist.apache.org/repos/dist/release/incubator/predict >> ionio/0.12.0-incubating/ >> >> All JIRAs completed for this release are tagged with 'FixVersion = >> 0.12.0-incubating'; the JIRA release notes can be found here: >> https://issues.apache.org/jira/secure/ReleaseNote.jspa?versi >> on=12340591=12320420 >> >> Thanks! >> The Apache PredictionIO Team >> >> DISCLAIMER >> Apache PredictionIO (incubating) is an effort undergoing incubation at the >> Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. >> Incubation is required of all newly accepted projects until a further >> review indicates that the infrastructure, communications, and decision >> making process have stabilized in a manner consistent with other >> successful >> ASF projects. 
While incubation status is not necessarily a reflection of >> the completeness or stability of the code, it does indicate that the >> project has yet to be fully endorsed by the ASF. >> >> > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California
Re: [VOTE] Resolution to create a TLP from graduating Incubator podling
+1 binding On Tue, Sep 26, 2017 at 12:57 Andrew Purtell <andrew.purt...@gmail.com> wrote: > +1 (binding) > > > On Sep 25, 2017, at 8:50 PM, Donald Szeto <don...@apache.org> wrote: > > > > Hi all, > > > > Based on previous discussions ( > > > https://lists.apache.org/thread.html/2b4ef7c394584988cf0c99920824afaa60ee4c648d5c0069b1bf55c0@%3Cdev.predictionio.apache.org%3E > > and > > > https://lists.apache.org/thread.html/1b06e510773ee1d315728e0ce25f220c9cf7d9e8ad601ec9dba4fe1d@%3Cdev.predictionio.apache.org%3E > ), > > I would like to start a formal vote on graduating PredictionIO from an > > Incubator podling to a top level project with the following resolution. > > This thread will be forwarded to the Incubator general mailing list. > > > > Once again, Salesforce has already signed and executed an assignment > > agreement to assign the PredictionIO mark to ASF. > > > > The graduation process we are following is described here: > > http://incubator.apache.org/guides/graduation.html > > > > Once this vote passes, a discussion will be started on Incubator general, > > followed by a vote when a consensus there would be arrived. The vote will > > run for at least 72 hours before closing at 9PM PST on 9/28/2017. > > > > Thank you all! Let's graduate. > > > > +1 (binding) from me. > > > > Regards, > > Donald > > > > - > > > >X. Establish the Apache PredictionIO Project > > > > WHEREAS, the Board of Directors deems it to be in the best > > interests of the Foundation and consistent with the > > Foundation's purpose to establish a Project Management > > Committee charged with the creation and maintenance of > > open-source software, for distribution at no charge to > > the public, related to a machine learning server built on top of > > state-of-the-art open source stack, that enables developers to > manage > > and deploy production-ready predictive services for various kinds > of > > machine learning tasks. 
> > > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management > > Committee (PMC), to be known as the "Apache PredictionIO Project", > > be and hereby is established pursuant to Bylaws of the > > Foundation; and be it further > > > > RESOLVED, that the Apache PredictionIO Project be and hereby is > > responsible for the creation and maintenance of software > > related to a machine learning server built on top of > > state-of-the-art open source stack, that enables developers to > manage > > and deploy production-ready predictive services for various kinds > of > > machine learning tasks; > > and be it further > > > > RESOLVED, that the office of "Vice President, Apache PredictionIO" > be > > and hereby is created, the person holding such office to > > serve at the direction of the Board of Directors as the chair > > of the Apache PredictionIO Project, and to have primary > > responsibility > > for management of the projects within the scope of > > responsibility of the Apache PredictionIO Project; and be it > further > > > > RESOLVED, that the persons listed immediately below be and > > hereby are appointed to serve as the initial members of the > > Apache PredictionIO Project: > > > > * Alex Merritt <emergentor...@apache.org> > > * Andrew Kyle Purtell <apurt...@apache.org> > > * Chan Lee <chan...@apache.org> > > * Donald Szeto <don...@apache.org> > > * Felipe Oliveira <fel...@apache.org> > > * James Taylor <jtay...@apache.org> > > * Justin Yip <yipjus...@apache.org> > > * Kenneth Chan <kenn...@apache.org> > > * Lars Hofhansl <la...@apache.org> > > * Lee Moon Soo <m...@apache.org> > > * Luciano Resende <lrese...@apache.org> > > * Marcin Ziemiński <zie...@apache.org> > > * Marco Vivero <mviv...@apache.org> > > * Mars Hall <m...@apache.org> > > * Matthew Tovbin <tovb...@apache.org> > > * Naoki Takezoe <take...@apache.org> > > * Pat Ferrel <p...@apache.org> > > * Paul Li <pau...@apache.org> > > * Shinsuke Sugaya <shins...@apache.org> > > * Simon Chan 
<sim...@apache.org> >
Re: [DISCUSS] Resolution to create a TLP from graduating Incubator podling
Thank you for creating this resolution Donald. I move that we start the vote, pending any additional feedback from the group. Best regards, On Thu, Sep 21, 2017 at 12:19 PM, Andrew Purtell <apurt...@apache.org> wrote: > This looks great Donald, and I'm so glad you accepted the role of Chair. > > This part of the Special Order will establish the project description text > which must appear at the top of every report to the Board: > > [...] software related to *a machine learning server built on top of > state-of-the-art open source stack, that enables developers to manage and > deploy production-ready predictive services for various kinds of machine > learning tasks* > > > It is in effect the Apache in-house elevator pitch to other projects and > PMC or anyone reading the reports. This is the opportunity to improve this > description, if desired. It could also be fine as-is. > > > On Thu, Sep 21, 2017 at 10:29 AM, Donald Szeto <don...@apache.org> wrote: > > > Hi all, > > > > Based on the previous discussion ( > > https://lists.apache.org/thread.html/2b4ef7c394584988cf0c99920824af > > aa60ee4c648d5c0069b1bf55c0@%3Cdev.predictionio.apache.org%3E), > > I would like to start discussing a graduation resolution and reach a > > consent before starting a community vote on the following. Please read > > carefully the resolution, and voice any concerns you may have. If you > are a > > current PMC member, please make sure your name is listed unless you have > > already asked to be excluded. We will start an official community vote > when > > a consent is reached. > > > > Regarding the PredictionIO trademark assignment, Salesforce has signed > and > > executed an assignment agreement, and is only pending ASF to countersign. > > > > The graduation process we are following is described here: > > http://incubator.apache.org/guides/graduation.html > > > > Thank you all! Let's graduate. > > > > Regards, > > Donald > > > > - > > > > X. 
Establish the Apache PredictionIO Project > > > >WHEREAS, the Board of Directors deems it to be in the best > >interests of the Foundation and consistent with the > >Foundation's purpose to establish a Project Management > >Committee charged with the creation and maintenance of > >open-source software, for distribution at no charge to > >the public, related to a machine learning server built on top of > >state-of-the-art open source stack, that enables developers to > > manage > >and deploy production-ready predictive services for various kinds > of > >machine learning tasks. > > > >NOW, THEREFORE, BE IT RESOLVED, that a Project Management > >Committee (PMC), to be known as the "Apache PredictionIO Project", > >be and hereby is established pursuant to Bylaws of the > >Foundation; and be it further > > > >RESOLVED, that the Apache PredictionIO Project be and hereby is > >responsible for the creation and maintenance of software > >related to a machine learning server built on top of > >state-of-the-art open source stack, that enables developers to > > manage > >and deploy production-ready predictive services for various kinds > of > >machine learning tasks; > >and be it further > > > >RESOLVED, that the office of "Vice President, Apache PredictionIO" > > be > >and hereby is created, the person holding such office to > >serve at the direction of the Board of Directors as the chair > >of the Apache PredictionIO Project, and to have primary > > responsibility > >for management of the projects within the scope of > >responsibility of the Apache PredictionIO Project; and be it > further > > > >RESOLVED, that the persons listed immediately below be and > >hereby are appointed to serve as the initial members of the > >Apache PredictionIO Project: > > > > * Alex Merritt <emergentor...@apache.org> > > * Andrew Kyle Purtell <apurt...@apache.org> > > * Chan Lee <chan...@apache.org> > > * Donald Szeto <don...@apache.org> > > * Felipe Oliveira <fel...@apache.org> > > * James 
Taylor <jtay...@apache.org> > > * Justin Yip <yipjus...@apache.org> > > * Kenneth Chan <kenn...@apache.org> > > * Lars Hofhansl &l
Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC3)
+1 binding I checked: - build, train, deploy, & batchpredict - complete Elasticsearch 5.x functionality - Binaries work directly for Heroku deployment Such an exciting release! Thank you Chan, *Mars On Sun, Sep 17, 2017 at 11:31 AM, Chan Lee <chanlee...@gmail.com> wrote: > > > This is the vote for 0.12.0 of Apache PredictionIO (incubating). > > > > The vote will run for at least 72 hours and will close on Sep 20th, 2017. > > > > The release candidate artifacts can be downloaded here: > > https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.12.0- > > incubating-rc3 > > > > Test results of RC3 can be found here: > > https://travis-ci.org/apache/incubator-predictionio/builds/276558626 > > > > Maven artifacts are built from the release candidate artifacts above, and > > are provided as convenience for testing with engine templates. The Maven > > artifacts are provided at the Maven staging repo here: > > https://repository.apache.org/content/repositories/ > > orgapachepredictionio-1021/ > > > > All JIRAs completed for this release are tagged with 'FixVersion = > > 0.12.0-incubating'. You can view them here: https://issues.apache.or > > g/jira/secure/ReleaseNote.jspa?version=12340591=12320420 > > > > The artifacts have been signed with Key: ytX8GpWv > > > > Please vote accordingly: > > > > [ ] +1, accept RC as the official 0.12.0 release > > [ ] -1, do not accept RC as the official 0.12.0 release because... > > >
[GitHub] incubator-predictionio pull request #435: Revise release notes: clarify brea...
Github user mars closed the pull request at: https://github.com/apache/incubator-predictionio/pull/435 ---
[GitHub] incubator-predictionio issue #435: Revise release notes: clarify breaking ch...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/435 Merged to [release-0.12.0](https://github.com/apache/incubator-predictionio/tree/release/0.12.0) by @chanlee514 ---
Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC2)
Sorry I didn't catch this before RC1. Thank you Chan. On Fri, Sep 15, 2017 at 12:35 PM, Chan Lee <chanlee...@gmail.com> wrote: > Mars, this seems like a necessary change, so I'll create RC3 with the PR > and additional updates here: http://predictionio.incubator.apache.org/ > resources/upgrade/ > > If there are changes anyone would like to add, please let me know by today. > I'll patch up another release tonight and send out a new email. > > Thanks, > Chan > > > On Fri, Sep 15, 2017 at 11:54 AM, Donald Szeto <don...@apache.org> wrote: > > > Votes are tied to tag/commit by ASF convention, so a new RC and vote will > > be required. > > > > On Fri, Sep 15, 2017 at 10:57 AM Mars Hall <mars.h...@salesforce.com> > > wrote: > > > > > I just opened a release notes PR against apache:release/0.12.0, because > > > that seems to be the right place. > > > > > > Chan, will that work okay with the release process? > > > > > > On Fri, Sep 15, 2017 at 10:20 AM, Mars Hall <mars.h...@salesforce.com> > > > wrote: > > > > > > > Also, I'd love to directly link the PIO-* issue numbers to JIRA. > > > > > > > > On Fri, Sep 15, 2017 at 10:19 AM, Mars Hall < > mars.h...@salesforce.com> > > > > wrote: > > > > > > > >> RC2 is working perfectly. > > > >> > > > >> I see a few issues with the releases notes: > > > >> > > > >> > > > >>- PIO-95 should be "Raised request timeout for REST API to > > > 35-seconds" > > > >>- PIO-102, PIO-106, PIO-117, PIO-118, PIO-120 actually includes a > > > >>breaking change to Elasticsearch 5.x StorageClient interface. I > > > think these > > > >>should be enumerated more explicitly with one of them called out > > in a > > > >>"Breaking changes" section. > > > >> > > > >> May I revise RELEASE.md on develop to fix these issues? Does that > > > require > > > >> restarting vote for an RC3? 
> > > >> > > > >> > > > >> On Thu, Sep 14, 2017 at 11:49 PM, Donald Szeto <don...@apache.org> > > > wrote: > > > >> > > > >>> I believe those are fixed by PIO-60, PIO-62 and PIO-63 in the > release > > > >>> notes. > > > >>> > > > >>> +1 binding from me > > > >>> > > > >>> On Thu, Sep 14, 2017 at 2:13 PM Pat Ferrel <p...@occamsmachete.com> > > > >>> wrote: > > > >>> > > > >>> > The last release was hung up by the IPMC regarding content > > licensing > > > >>> > issues and libraries used by the doc site, which we promised to > > > >>> address in > > > >>> > this release. Have these been resolved, don’t recall the > specifics? > > > It > > > >>> > would be great to fly through the IPMC vote without issue. > > > >>> > > > > >>> > > > > >>> > On Sep 14, 2017, at 2:06 PM, Chan Lee <chanlee...@gmail.com> > > wrote: > > > >>> > > > > >>> > This is the vote for 0.12.0 of Apache PredictionIO (incubating). > > > >>> > > > > >>> > The vote will run for at least 72 hours and will close on Sep > 17th, > > > >>> 2017. > > > >>> > > > > >>> > The release candidate artifacts can be downloaded here: > > > >>> > https://dist.apache.org/repos/dist/dev/incubator/predi > > > >>> > ctionio/0.12.0-incubating-rc2 > > > >>> > > > > >>> > Test results of RC1 can be found here: https://travis-ci.org/ap > > > >>> > ache/incubator-predictionio/builds/275634960 > > > >>> > > > > >>> > Maven artifacts are built from the release candidate artifacts > > above, > > > >>> and > > > >>> > are provided as convenience for testing with engine templates. > The > > > >>> Maven > > > >>> > artifacts are provided at the Maven staging repo here: > > > >>> > > > > >>> > https://repository.apache.org/content/repositories/orgapache > > > >>> predictionio-1020 > > > >>> > > > > >>> > All JIRAs completed for this release are tagged with 'FixVersion > = > > > >>> > 0.12.0-incubating'. 
You can view them here: > > https://issues.apache.or > > > >>> > g/jira/secure/ReleaseNote.jspa?version=12340591& > projectId=12320420 > > > >>> > > > > >>> > The artifacts have been signed with Key: ytX8GpWv > > > >>> > > > > >>> > Please vote accordingly: > > > >>> > > > > >>> > [ ] +1, accept RC as the official 0.12.0 release > > > >>> > [ ] -1, do not accept RC as the official 0.12.0 release > because... > > > >>> > > > > >>> > > > > >>> > > > >> > > > >> > > > >> > > > >> -- > > > >> *Mars Hall > > > >> 415-818-7039 <(415)%20818-7039> > > > >> Customer Facing Architect > > > >> Salesforce Platform / Heroku > > > >> San Francisco, California > > > >> > > > > > > > > > > > > > > > > -- > > > > *Mars Hall > > > > 415-818-7039 <(415)%20818-7039> > > > > Customer Facing Architect > > > > Salesforce Platform / Heroku > > > > San Francisco, California > > > > > > > > > > > > > > > > -- > > > *Mars Hall > > > 415-818-7039 > > > Customer Facing Architect > > > Salesforce Platform / Heroku > > > San Francisco, California > > > > > > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California
Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC2)
I just opened a release notes PR against apache:release/0.12.0, because that seems to be the right place. Chan, will that work okay with the release process? On Fri, Sep 15, 2017 at 10:20 AM, Mars Hall <mars.h...@salesforce.com> wrote: > Also, I'd love to directly link the PIO-* issue numbers to JIRA. > > On Fri, Sep 15, 2017 at 10:19 AM, Mars Hall <mars.h...@salesforce.com> > wrote: > >> RC2 is working perfectly. >> >> I see a few issues with the releases notes: >> >> >>- PIO-95 should be "Raised request timeout for REST API to 35-seconds" >>- PIO-102, PIO-106, PIO-117, PIO-118, PIO-120 actually includes a >>breaking change to Elasticsearch 5.x StorageClient interface. I think >> these >>should be enumerated more explicitly with one of them called out in a >>"Breaking changes" section. >> >> May I revise RELEASE.md on develop to fix these issues? Does that require >> restarting vote for an RC3? >> >> >> On Thu, Sep 14, 2017 at 11:49 PM, Donald Szeto <don...@apache.org> wrote: >> >>> I believe those are fixed by PIO-60, PIO-62 and PIO-63 in the release >>> notes. >>> >>> +1 binding from me >>> >>> On Thu, Sep 14, 2017 at 2:13 PM Pat Ferrel <p...@occamsmachete.com> >>> wrote: >>> >>> > The last release was hung up by the IPMC regarding content licensing >>> > issues and libraries used by the doc site, which we promised to >>> address in >>> > this release. Have these been resolved, don’t recall the specifics? It >>> > would be great to fly through the IPMC vote without issue. >>> > >>> > >>> > On Sep 14, 2017, at 2:06 PM, Chan Lee <chanlee...@gmail.com> wrote: >>> > >>> > This is the vote for 0.12.0 of Apache PredictionIO (incubating). >>> > >>> > The vote will run for at least 72 hours and will close on Sep 17th, >>> 2017. 
>>> > >>> > The release candidate artifacts can be downloaded here: >>> > https://dist.apache.org/repos/dist/dev/incubator/predi >>> > ctionio/0.12.0-incubating-rc2 >>> > >>> > Test results of RC1 can be found here: https://travis-ci.org/ap >>> > ache/incubator-predictionio/builds/275634960 >>> > >>> > Maven artifacts are built from the release candidate artifacts above, >>> and >>> > are provided as convenience for testing with engine templates. The >>> Maven >>> > artifacts are provided at the Maven staging repo here: >>> > >>> > https://repository.apache.org/content/repositories/orgapache >>> predictionio-1020 >>> > >>> > All JIRAs completed for this release are tagged with 'FixVersion = >>> > 0.12.0-incubating'. You can view them here: https://issues.apache.or >>> > g/jira/secure/ReleaseNote.jspa?version=12340591=12320420 >>> > >>> > The artifacts have been signed with Key: ytX8GpWv >>> > >>> > Please vote accordingly: >>> > >>> > [ ] +1, accept RC as the official 0.12.0 release >>> > [ ] -1, do not accept RC as the official 0.12.0 release because... >>> > >>> > >>> >> >> >> >> -- >> *Mars Hall >> 415-818-7039 <(415)%20818-7039> >> Customer Facing Architect >> Salesforce Platform / Heroku >> San Francisco, California >> > > > > -- > *Mars Hall > 415-818-7039 <(415)%20818-7039> > Customer Facing Architect > Salesforce Platform / Heroku > San Francisco, California > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California
[GitHub] incubator-predictionio pull request #435: Revise release notes: clarify brea...
GitHub user mars opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/435

    Revise release notes: clarify breaking changes; link issue IDs to JIRA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mars/incubator-predictionio patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/435.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #435

commit cbe7063c869611f7ece0613a16cf0bebf22e404e
Author: Mars Hall <m...@users.noreply.github.com>
Date: 2017-09-15T17:53:02Z

    Revise release notes: clarify breaking changes; link issue IDs to JIRA

---
[jira] [Updated] (PIO-95) Raise request timeout for REST API
[ https://issues.apache.org/jira/browse/PIO-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars Hall updated PIO-95:

    Summary: Raise request timeout for REST API (was: Configurable request timeout for REST API)

> Raise request timeout for REST API
>
>                 Key: PIO-95
>                 URL: https://issues.apache.org/jira/browse/PIO-95
>             Project: PredictionIO
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.11.0-incubating
>            Reporter: Mars Hall
>            Assignee: Mars Hall
>             Fix For: 0.12.0-incubating
>
> We've found the default 20-second REST API request timeout is too short for our batch-prediction use cases. We're running PredictionIO on Heroku which has its own [timeout starting at 30-seconds|https://devcenter.heroku.com/articles/limits#http-timeouts]. So we'd prefer a more generous or easily configurable timeout to allow Heroku's routing layer to impose & track this limit in the platform layer.
> I investigated how to configure this and found [Spray `application.conf`|http://spray.io/documentation/1.2.4/spray-can/configuration/]. This PR simply increases the timeout.
> I would love guidance on how we might extract this config into an environment variable or a value in `pio-env.sh`.
> Investigation / implementation PR: https://github.com/apache/incubator-predictionio/pull/394

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
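On the open question in PIO-95 about reading the timeout from an environment variable, here is a minimal sketch of the lookup-with-fallback idea. It is written in Python for brevity rather than in the project's own code, and it is an illustration only: the variable name `PIO_REQUEST_TIMEOUT_SECONDS` and the helper `resolve_request_timeout` are hypothetical and are not existing PredictionIO settings. The 35-second default is an assumption chosen to sit above the 30-second Heroku limit mentioned in the issue.

```python
import os

# Hypothetical variable name; PredictionIO does not define this setting.
ENV_VAR = "PIO_REQUEST_TIMEOUT_SECONDS"
# Assumed default, chosen to exceed Heroku's 30-second router timeout.
DEFAULT_TIMEOUT_SECONDS = 35


def resolve_request_timeout(environ=None):
    """Return the request timeout in seconds, preferring the environment."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = int(raw)
    except ValueError:
        # Unparseable input: fall back rather than fail at startup.
        return DEFAULT_TIMEOUT_SECONDS
    # Reject zero or negative values and use the default instead.
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

Keeping the application's own timeout above the router's means the platform layer is the one that cuts off slow requests, which is the behavior the issue asks for.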
Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC2)
RC2 is working perfectly. I see a few issues with the releases notes: - PIO-95 should be "Raised request timeout for REST API to 35-seconds" - PIO-102, PIO-106, PIO-117, PIO-118, PIO-120 actually includes a breaking change to Elasticsearch 5.x StorageClient interface. I think these should be enumerated more explicitly with one of them called out in a "Breaking changes" section. May I revise RELEASE.md on develop to fix these issues? Does that require restarting vote for an RC3? On Thu, Sep 14, 2017 at 11:49 PM, Donald Szeto <don...@apache.org> wrote: > I believe those are fixed by PIO-60, PIO-62 and PIO-63 in the release > notes. > > +1 binding from me > > On Thu, Sep 14, 2017 at 2:13 PM Pat Ferrel <p...@occamsmachete.com> wrote: > > > The last release was hung up by the IPMC regarding content licensing > > issues and libraries used by the doc site, which we promised to address > in > > this release. Have these been resolved, don’t recall the specifics? It > > would be great to fly through the IPMC vote without issue. > > > > > > On Sep 14, 2017, at 2:06 PM, Chan Lee <chanlee...@gmail.com> wrote: > > > > This is the vote for 0.12.0 of Apache PredictionIO (incubating). > > > > The vote will run for at least 72 hours and will close on Sep 17th, 2017. > > > > The release candidate artifacts can be downloaded here: > > https://dist.apache.org/repos/dist/dev/incubator/predi > > ctionio/0.12.0-incubating-rc2 > > > > Test results of RC1 can be found here: https://travis-ci.org/ap > > ache/incubator-predictionio/builds/275634960 > > > > Maven artifacts are built from the release candidate artifacts above, and > > are provided as convenience for testing with engine templates. The Maven > > artifacts are provided at the Maven staging repo here: > > > > https://repository.apache.org/content/repositories/ > orgapachepredictionio-1020 > > > > All JIRAs completed for this release are tagged with 'FixVersion = > > 0.12.0-incubating'. 
You can view them here: https://issues.apache.or > > g/jira/secure/ReleaseNote.jspa?version=12340591=12320420 > > > > The artifacts have been signed with Key: ytX8GpWv > > > > Please vote accordingly: > > > > [ ] +1, accept RC as the official 0.12.0 release > > [ ] -1, do not accept RC as the official 0.12.0 release because... > > > > > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California
[jira] [Created] (PIO-121) Authentication for Engine's HTTP API
Mars Hall created PIO-121:

             Summary: Authentication for Engine's HTTP API
                 Key: PIO-121
                 URL: https://issues.apache.org/jira/browse/PIO-121
             Project: PredictionIO
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.12.0-incubating
            Reporter: Mars Hall

PredictionIO already supports key-based authentication for accessing the {{/events.json}} API, but is missing any type of auth for the {{/queries.json}} API and {{/}} status page.

Comprehensive authentication would simplify deployment to cloud platforms by eliminating the current requirement to deploy on a private network in order to prevent public access.

As a first step, adding key-based auth to the Engine APIs that matches the Eventserver API {{accessKey}} behavior would be a huge step forward.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
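The key-based check proposed in PIO-121 could look roughly like the following. This is a sketch in Python under stated assumptions, not the Eventserver's actual implementation: it assumes the key arrives as an `accessKey` query parameter, and the helpers `is_authorized` and `status_for` are hypothetical names introduced only for illustration.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def is_authorized(request_url, expected_key):
    """Check the `accessKey` query parameter against the configured key."""
    query = parse_qs(urlsplit(request_url).query)
    supplied = query.get("accessKey", [""])[0]
    if not supplied or not expected_key:
        # A missing key on either side never authorizes a request.
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


def status_for(request_url, expected_key):
    """Return the HTTP status a guarded endpoint would answer with."""
    return 200 if is_authorized(request_url, expected_key) else 401
```

Applying the same check to `/queries.json` and the `/` status page would let an engine run on a public network while still rejecting requests that lack the key.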
Re: JIRAs to include in 0.12 Release
Wow, so many resolved issues, so much progress! Thank you for sharing this list Chan. [PIO-120] Process hangs if Elasticsearch is not available during train (pending) https://issues.apache.org/jira/browse/PIO-120 Would one of you folks review this simple fix PR? Then I'll merge it so that it makes 0.12.0. https://github.com/apache/incubator-predictionio/pull/432 *Mars
[GitHub] incubator-predictionio issue #432: [PIO-120] Process hangs if Elasticsearch ...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/432 Would be great to have this included in 0.12.0 release. ---
[GitHub] incubator-predictionio pull request #432: [PIO-120] Process hangs if Elastic...
GitHub user mars opened a pull request:

    https://github.com/apache/incubator-predictionio/pull/432

    [PIO-120] Process hangs if Elasticsearch is not available during train

    Fixes [PIO-120](https://issues.apache.org/jira/browse/PIO-120)

    This changeset ensures that the process exits gracefully after ES connection error.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mars/incubator-predictionio fix-es-hang-on-train

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-predictionio/pull/432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #432

commit f1c7337e246c9bd2bed5cc080efcf3dc81e4b055
Author: Mars Hall <m...@heroku.com>
Date: 2017-09-07T21:38:46Z

    Graceful exit after ES connection error during train.

---
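The general pattern behind a graceful exit after a storage connection error can be sketched as follows. This is a generic Python illustration, not the code in PR #432, whose contents are not shown here; `run_train`, `connect`, and `train` are hypothetical names. The idea is to turn a connection failure into a non-zero exit code and to always release the client so nothing keeps the process alive.

```python
import sys


def run_train(connect, train):
    """Run `train` after `connect`; return a process exit code."""
    try:
        client = connect()
    except ConnectionError as err:
        # Report the failure and return non-zero instead of leaving the
        # process waiting on a storage backend that never answered.
        print(f"Storage connection failed: {err}", file=sys.stderr)
        return 1
    try:
        train(client)
        return 0
    finally:
        # Always release the client so its background resources cannot
        # keep the process alive after training ends or fails.
        close = getattr(client, "close", None)
        if close is not None:
            close()
```

A caller would typically end with `sys.exit(run_train(connect, train))`, so that a refused connection surfaces to the shell or scheduler as a failed run rather than a hang.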
[GitHub] incubator-predictionio issue #428: [PIO-117] Cannot delete event data on ESL...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/428 👍 looks good ---
[GitHub] incubator-predictionio issue #430: [PIO-119] Bump up Elasticsearch to 5.5.2
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/430 Just tested build, train, batchpredict, & deploy locally with ES 5.5.2. 👍 looks good! ---
[jira] [Updated] (PIO-120) Process hangs if Elasticsearch is not available during train
[ https://issues.apache.org/jira/browse/PIO-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-120: -- External issue URL: https://github.com/apache/incubator-predictionio/pull/432 > Process hangs if Elasticsearch is not available during train > > > Key: PIO-120 > URL: https://issues.apache.org/jira/browse/PIO-120 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.12.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > I noticed that, when Elasticsearch is configured as meta storage, `pio train` > will hang with the following error unless Elasticsearch is on-line/available: > {code} > Exception in thread "main" java.net.ConnectException: Connection refused > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) > at > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) > at > org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171) > at > org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145) > at > org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.predictionio.shaded.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.predictionio.shaded.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {code}
[jira] [Created] (PIO-120) Process hangs if Elasticsearch is not available during train
Mars Hall created PIO-120: - Summary: Process hangs if Elasticsearch is not available during train Key: PIO-120 URL: https://issues.apache.org/jira/browse/PIO-120 Project: PredictionIO Issue Type: Bug Components: Core Affects Versions: 0.12.0-incubating Reporter: Mars Hall Assignee: Mars Hall I noticed that, when Elasticsearch is configured as meta storage, `pio train` will hang with the following error unless Elasticsearch is on-line/available: {code} Exception in thread "main" java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171) at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145) at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) at org.apache.predictionio.shaded.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) at java.lang.Thread.run(Thread.java:745) {code}
[GitHub] incubator-predictionio issue #401: [PIO-72] Fix class loading for pio-shell
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/401 Yes @BrianOn99, I do believe [class loading for pio-shell is fixed](https://github.com/apache/incubator-predictionio/blob/develop/bin/pio-shell#L59) for the next release, or if you `make-distribution.sh` on main `develop` branch, you'll get these fixes now. ---
Re: Graduation to TLP
Please continue to include me as a committer and PMC member. After some deliberation, I cannot take on further responsibility as VP at this time. *Mars On Tue, Sep 5, 2017 at 10:32 AM, Donald Szeto <don...@apache.org> wrote: > Thanks for the clarification Pat! It always help to have Apache veterans to > provide historical context to these processes. > > As for me, I'd like to remain as PMC and committer. > > I like the idea of polling the current committers and PMC, but like you > said, most of them got pretty busy and may not be reading mailing list in a > while. Maybe let me try a shout out here and see if anyone would > acknowledge it, so that we know whether a poll will be effective. > > *>> If you're a PMC or committer who see this line but hasn't been replying > this thread, please acknowledge. <<* > > Regarding the maturity model, this is my perception right now: > - CD10, CD20, CD30, CD40 (and we start to have CD50 as well) > - LC10, LC20, LC30, LC40, LC50 > - RE10, RE20, RE30, RE50 (I think we hope to also do RE40 with 0.12) > - QU10, QU30, QU40, QU50 (we should put a bit of focus to QU20) > - CO10, CO20, CO30, CO40, CO60, CO70 (for CO50, I think we've been > operating under the assumption that PMC and contributors are pretty > standard definitions by ASF. We can call those out explicitly.) > - CS10, CS50 (We are also assuming implicitly CS20, CS30, and CS40 from > main ASF doc) > - IN10, IN20 > > Let me know what you think. > > On Fri, Sep 1, 2017 at 10:32 AM, Pat Ferrel <p...@occamsmachete.com> wrote: > > > The Chair, PMC, and Committers may be different after graduation. > > PMC/committers are sometimes not active committers but can have a > valuable > > role as mentors, in non-technical roles, as support people on the mailing > > list, or as sometimes committers who don’t seem very active but come in > > every so often to make a key contribution. So I hope this doesn’t become > a > > time to prune too deeply. 
I’d suggest we only do that if one of the > > committers has done something to lessen our project maturity or wants to > be > > left out for their own reasons. An example of bad behavior is someone > > trying to exert corporate dominance (which is severely frowned on by the > > ASF). Another would be someone who is disruptive to the point of > destroying > > team effectiveness. I personally haven’t seen any of this but purposely > > don’t read everything so chime in here. > > > > It would be good to have people declare their interest-level. As for me, > > I’d like to remain on the PMC as a committer but have no interest in > Chair. > > Since people can become busy periodically and not read @dev (me?) we > could, > > maybe should, poll the current committers and PMC to get the lists ready > > for the graduation proposal. > > > > > > Don’t forget that we are not just asking for dev community opinion about > > graduation. We are also asking that people check things like the Maturity > > Checklist to see it we are ready. http://community.apache.org/ > > apache-way/apache-project-maturity-model.html < > > http://community.apache.org/apache-way/apache-project- > maturity-model.html> > > People seem fairly enthusiastic about applying for graduation, but are > > there things we need to do before hand? The goal is to show that we do > not > > require the second level check for decisions that the IPMC provides. The > > last release required no changes but had a proviso about content > licenses. > > This next release should fly through without provisos IMHO. Are there > other > > things we should do? > > > > > > On Sep 1, 2017, at 6:16 AM, takako shimamoto <chiboch...@gmail.com> > wrote: > > > > I entirely agree with everyone else. > > I hope the PIO community will become more active after graduation. > > > > > 2. If we are to graduate, who should we include in the list of the > > initial > > > PMC? 
> > > > Don't all present IPMC members are included in the list of the initial > PMC? > > > > Personally, I think we may as well check and see if present IPMC > > members intend to become an initial PMC for graduation. > > Members who make a declaration of intent to become it will surely > > contribute to the project. > > It is a great contribution not only to develop a program but also to > > respond to email aggressively or fix document. > > > > > > 2017-08-29 14:20 GMT+09:00 Donald Szeto <don...@apache.org>: > > > Hi all, > > > > > > Since the ASF Board meeting in May ( > > > htt
[GitHub] incubator-predictionio issue #401: [PIO-72] Fix class loading for pio-shell
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/401 Hi @BrianOn99, Adding that `--jars` option to `pio-shell` command is the right solution, and then the "No suitable driver found" error can be solved by adding the Postgres driver to your PredictionIO install: 1. download [Postgres JDBC driver](https://jdbc.postgresql.org/download.html) (probably the newest one for Java 8) 2. put it in the PredictionIO distribution's `lib/` directory (this directory is sibling to the `bin/` directory where the `pio` command is located; any jars in that directory are automatically added to the classpath for `pio` commands) We're working on releasing 0.12! ---
Re: Graduation to TLP
Thank you Donald for leading the charge here, From my perspective PredictionIO is already Apache in process & title. Graduation seems quite natural to reach top-level recognition. I'm interested in helping with PMC duties. Would be great to understand what the VP vs Member responsibilities look like. Let's graduate. +1 *Mars On Wed, Aug 30, 2017 at 15:21 Pat Ferrel <p...@occamsmachete.com> wrote: > I have had several people tell me they want to wait until PIO is not > incubating before using it. This even after explaining that “incubating” > has more to do with getting into the Apache Way of doing things and has no > direct link to quality or community. I can only conclude from this that > “incubating” is holding back adoption. > > And yet we have absorbed the Apache Way and will have at least 3 releases > (including 12) a incubating. We have brought in a fair number of new > committers and seem to have a healthy community of users. > > +1 for a push to graduate. > > > On Aug 28, 2017, at 10:20 PM, Donald Szeto <don...@apache.org> wrote: > > Hi all, > > Since the ASF Board meeting in May ( > > http://apache.org/foundation/records/minutes/2017/board_minutes_2017_05_17.txt > ), > PredictionIO has been considered nearing graduation and I think we are > almost there. I am kickstarting this thread so that we can discuss on these > 3 things: > > 1. Does the development community feel ready to graduate? > 2. If we are to graduate, who should we include in the list of the initial > PMC? > 3. If we are to graduate, who should be the VP of the initial PMC? > > These points are relevant for graduation. Please take a look at the > official graduation guide: > http://incubator.apache.org/guides/graduation.html. > > In addition, Sara and I have been working to transfer the PredictionIO > trademark to the ASF. We will keep you updated with our progress. 
> > I would also like to propose to cut a 0.12.0 release by merging JIRAs that > have a target version set to 0.12.0-incubating for graduation. 0.12.0 will > contain cleanups for minor license and copyright issues that were pointed > out in previous releases by IPMC. > > Let me know what you think. > > Regards, > Donald > > -- *Mars Hall 415-818-7039 Customer Facing Architect Salesforce Platform / Heroku San Francisco, California
[jira] [Resolved] (PIO-115) Cache name-to-ID lookups for Storage app & channel
[ https://issues.apache.org/jira/browse/PIO-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall resolved PIO-115. --- Resolution: Fixed > Cache name-to-ID lookups for Storage app & channel > -- > > Key: PIO-115 > URL: https://issues.apache.org/jira/browse/PIO-115 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > When stress testing the Universal Recommender with high-concurrency HTTP/REST > queries, we observed that Elasticsearch traffic was majority composed of > requests resolving the Storage app's name & channel, over and over and over > again! In this case, [each per-query call to > `LEventStore.findByEntity`|https://github.com/heroku/predictionio-engine-ur/blob/master/src/main/scala/URAlgorithm.scala#L694] > re-resolves the app name to an ID. > Implement memoization for the function that performs these name-to-ID > lookups, so that only one set of lookups is performed per process for each > app+channel combination.
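The memoization described in PIO-115 can be sketched roughly as follows. This is an illustration under assumptions, not the code that was merged: `NameToIdCache`, `appNameToId`, and the `resolve` function are hypothetical names, and `resolve` stands in for whatever storage call actually looks up the app and channel IDs.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical sketch of a per-process name-to-ID cache.
// `resolve` is an assumed stand-in for the expensive meta-data lookup
// (for example a request to Elasticsearch).
final class NameToIdCache(resolve: (String, Option[String]) => (Int, Option[Int])) {
  // Keyed by (app name, optional channel name); safe for concurrent readers.
  private val cache = TrieMap.empty[(String, Option[String]), (Int, Option[Int])]

  // Returns the cached IDs, calling `resolve` only when the key is not cached yet.
  // Under heavy contention `resolve` may run more than once for the same key,
  // but only one result is stored and returned afterwards.
  def appNameToId(appName: String, channelName: Option[String]): (Int, Option[Int]) =
    cache.getOrElseUpdate((appName, channelName), resolve(appName, channelName))
}
```

Note that a cache like this never notices when an app or channel is renamed or deleted while the process is running; whether that matters depends on how long engine processes live.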
[jira] [Resolved] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
[ https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall resolved PIO-114. --- Resolution: Fixed > Elasticsearch 5.x StorageClient basic HTTP authentication > - > > Key: PIO-114 > URL: https://issues.apache.org/jira/browse/PIO-114 > Project: PredictionIO > Issue Type: New Feature > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > Add optional username-password configuration for the new Elasticsearch 5 > client; in {{conf/pio-env.sh}} config: > {code} > # Optional basic HTTP auth > PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name > PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret > {code} > These credentials are sent in each Elasticsearch request as an HTTP Basic > Authorization header. > Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai > on Heroku](https://elements.heroku.com/addons/bonsai).
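To make the "HTTP Basic Authorization header" concrete, here is a small sketch of how such a header value is formed from the two settings. It only illustrates the wire format; the actual client presumably delegates this to the HTTP library's credentials handling. `BasicAuthSketch`, `basicAuthHeader`, and `fromEnv` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical sketch: build the value of an HTTP Basic `Authorization` header.
object BasicAuthSketch {
  // "Basic " followed by base64("username:password").
  def basicAuthHeader(username: String, password: String): String = {
    val raw = s"$username:$password".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(raw)
  }

  // Both settings are optional; a header is produced only when both are present.
  // `env` is an assumed stand-in for the process environment (e.g. sys.env).
  def fromEnv(env: Map[String, String]): Option[String] =
    for {
      user <- env.get("PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME")
      pass <- env.get("PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD")
    } yield basicAuthHeader(user, pass)
}
```

Since Basic auth is only base64-encoded, not encrypted, it would normally be paired with an HTTPS connection to the hosted cluster.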
[jira] [Resolved] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall resolved PIO-106. --- Resolution: Fixed > Elasticsearch 5.x StorageClient should reuse RestClient > --- > > Key: PIO-106 > URL: https://issues.apache.org/jira/browse/PIO-106 > Project: PredictionIO > Issue Type: Improvement > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > When using the proposed [PIO-105 Batch > Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an > engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's > REST interface appears to become overloaded, ending with the Spark job being > killed from errors like: > {noformat} > [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search > [ERROR] [Utils] Aborting task > [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search > [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) > [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) > [ERROR] [Common$] Invalid app name ur > [ERROR] [Utils] Aborting task > [ERROR] [URAlgorithm] Error when read recent events: > java.lang.IllegalArgumentException: Invalid app name ur > [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) > [ERROR] [Utils] Aborting task > [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) > [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, > executor driver): java.net.BindException: Can't assign requested address > at sun.nio.ch.Net.connect0(Native Method) > at sun.nio.ch.Net.connect(Net.java:454) > at sun.nio.ch.Net.connect(Net.java:446) > at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) > at > org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) > at > 
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) > at > org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) > at > org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) > at > org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) > at java.lang.Thread.run(Thread.java:745) > {noformat} > After these errors happen & the job is killed, Elasticsearch immediately > recovers. It responds to queries normally. I researched what could cause this > and found an [old issue in the main Elasticsearch > repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints > given therein about *using keep-alive in the ES client* to avoid these > performance issues, I investigated how PredictionIO's [Elasticsearch > StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] > manages its connections. 
> I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), > Elasticsearch creates a new underlying connection, an Elasticsearch > RestClient, for > [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] > > [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] > > [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] > & > [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] > with its API. As a result, *there is no way Elasticsearch TCP connections > can be reused via HTTP keep-alive*. > High-performance workloads with Elasticsearch 5.x will suffer from these > issues unless we refactor Elasticsearch StorageClient to share the underlying > RestClient instead of [building a new one everytime the client is > used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. > There are certainly different approaches we could take to shari
[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/421 I will resolve these conflicts today and then merge this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #425: [PIO-110] Refactoring
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/425 Great Scala-style improvements here, @takezoe. Great to see this gardening of the codebase. I'm wondering, in [PIO-110](https://issues.apache.org/jira/browse/PIO-110) the objective is to refactor the common code between `CreateServer` and `BatchPredict`, yet I do not see that kind of change here. Are you working on extracting & reusing the common code as the next step for this PR? ---
[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/421 👍 ---
[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/421 Based on these [Scala Concurrency/Thread Safety docs](https://twitter.github.io/scala_school/concurrency.html#danger), I believe simply annotating `@volatile` will cause the synchronization needed for thread-safety in this case. So, I updated this PR with that change. ---
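For readers following the thread-safety discussion, here is a hedged sketch of one common way to guard a lazily created shared client. It is not the code in the PR: `SharedClientSketch` and `Client` are hypothetical names, and the sketch pairs `@volatile` with a `synchronized` block, because `@volatile` makes writes visible to other threads but does not by itself stop two threads from both creating a client.

```scala
// Hypothetical sketch of a lazily initialized, shared singleton client.
object SharedClientSketch {
  // Stand-in for an expensive client such as an Elasticsearch RestClient.
  final class Client(val id: Int)

  private var created = 0                            // guarded by `synchronized`
  @volatile private var shared: Option[Client] = None

  def client: Client = shared match {
    case Some(existing) => existing                  // fast path, no locking
    case None =>
      synchronized {
        // Re-check inside the lock: another thread may have won the race.
        shared.getOrElse {
          created += 1
          val fresh = new Client(created)
          shared = Some(fresh)
          fresh
        }
      }
  }

  // Number of clients ever constructed; stays at 1 if the guard works.
  def createdCount: Int = synchronized(created)
}
```

In Scala a `lazy val` gives similar once-only initialization with less code, at the cost of not being able to reset or close and recreate the client later.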
[GitHub] incubator-predictionio issue #424: [PIO-115] Implement Storage app & channel...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/424 Thanks for your feedback @dszeto. I've addressed the code style & JIRA issue. ---
[GitHub] incubator-predictionio pull request #424: Implement Storage app & channel na...
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/424 Implement Storage app & channel name-to-ID cache When stress testing the Universal Recommender with high-concurrency HTTP/REST queries, we observed that Elasticsearch traffic was majority composed of requests resolving the Storage app's name & channel, over and over and over again! In this case, [each per-query call to `LEventStore.findByEntity`](https://github.com/heroku/predictionio-engine-ur/blob/master/src/main/scala/URAlgorithm.scala#L694) re-resolves the app name to an ID. This changeset implements memoization for the function that performs these name-to-ID lookups, so that only one set of lookups is performed per process for each app+channel combination. As a result, we've seen overall throughput increase and error rate drop dramatically. This common optimization affects all storage backends, not just Elasticsearch. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio cache-storage-name-to-id Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/424.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #424 commit 9825ae2a6981431ce49a6ea40ddabd82ab4121f2 Author: Mars Hall <m...@heroku.com> Date: 2017-08-22T18:48:04Z Implement Storage app & channel name-to-ID cache ---
[GitHub] incubator-predictionio pull request #421: Elasticsearch 5.x singleton client...
Github user mars commented on a diff in the pull request: https://github.com/apache/incubator-predictionio/pull/421#discussion_r134630517 --- Diff: storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala --- @@ -18,27 +18,84 @@ package org.apache.predictionio.data.storage.elasticsearch import org.apache.http.HttpHost +import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials} +import org.apache.http.impl.client.BasicCredentialsProvider +import org.apache.http.impl.nio.client.HttpAsyncClientBuilder import org.apache.predictionio.data.storage.BaseStorageClient import org.apache.predictionio.data.storage.StorageClientConfig import org.apache.predictionio.data.storage.StorageClientException +import org.apache.predictionio.workflow.CleanupFunctions import org.elasticsearch.client.RestClient +import org.elasticsearch.client.RestClientBuilder.HttpClientConfigCallback import grizzled.slf4j.Logging -case class ESClient(hosts: Seq[HttpHost]) { - def open(): RestClient = { +object ESClient extends Logging { + var _sharedRestClient: Option[RestClient] = None --- End diff -- Thanks for the hints here. I suspected this would be an issue. I'm investigating how to make this threadsafe. ---
[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/421 Cheers @takezoe I addressed all your Scala style & usage suggestions. Still need to take care of the threadsafety issue with the singleton client. ---
[jira] [Updated] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
[ https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-114: -- External issue URL: https://github.com/apache/incubator-predictionio/pull/421 > Elasticsearch 5.x StorageClient basic HTTP authentication > - > > Key: PIO-114 > URL: https://issues.apache.org/jira/browse/PIO-114 > Project: PredictionIO > Issue Type: New Feature > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > Add optional username-password configuration for the new Elasticsearch 5 > client; in {{conf/pio-env.sh}} config: > {code} > # Optional basic HTTP auth > PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name > PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret > {code} > These credentials are sent in each Elasticsearch request as an HTTP Basic > Authorization header. > Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai > on Heroku](https://elements.heroku.com/addons/bonsai).
[GitHub] incubator-predictionio pull request #421: Elasticsearch singleton client wit...
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/421 Elasticsearch singleton client with authentication Fixes both [PIO-106](https://issues.apache.org/jira/browse/PIO-106) & [PIO-114](https://issues.apache.org/jira/browse/PIO-114), replacing https://github.com/apache/incubator-predictionio/pull/372. These are combined because they each heavily revise the same class. ## Authentication Add optional username-password configuration for the new Elasticsearch 5 client; in `pio-env.sh` config: ```bash # Optional basic HTTP auth PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret ``` These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header. Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai). ## Singleton client This PR moves to a singleton Elasticsearch RestClient which has built-in HTTP keep-alive and TCP connection pooling. Running on this branch, we've seen a 2x speed-up in predictions from the Universal Recommender with ES5, and the feared "cannot assign requested address" 😱 Elasticsearch connection errors have completely disappeared. Running `pio batchpredict` for 160K queries results in only 7 total TCP connections to Elasticsearch. Previously that would escalate to ~25,000 connections before denying further connections. **This fundamentally changes the interface for the new [Elasticsearch 5.x REST client](https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch)** introduced with PredictionIO 0.11.0-incubating. With this changeset, the `client` is a single instance of [`org.elasticsearch.client.RestClient`](https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java). 
🚨 **As a result of this change, any engine templates that directly use the Elasticsearch 5 StorageClient would require an update for compatibility.** The change is this: ### Original ```scala val client: StorageClient = … // code to instantiate client val restClient: RestClient = client.open() try { restClient.performRequest(…) } finally { restClient.close() } ``` ### With this PR ```scala val client: RestClient = … // code to instantiate client client.performRequest(…) ``` *No more balancing `open` & `close` as this is handled by using a new `CleanupFunctions` hook added to the framework in this PR.* [Universal Recommender](https://github.com/actionml/universal-recommender) is the only template that I know of which directly uses the ES StorageClient outside of PredictionIO core. See example [UR changes for compatibility with this PR](https://github.com/heroku/predictionio-engine-ur/compare/esclient-singleton). ### Elasticsearch StorageClient changes * reimplemented as singleton * installs a cleanup function See [StorageClient](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2926f4cfd93ccb02320e2a9503ccd223) ### Core changes A new [`CleanupFunctions`](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2a958821ac58f019fbce38540c775f19) hook has been added which enables developers of storage modules to register anonymous functions with `CleanupFunctions.add { … }` to be executed after Spark-related commands/workflows. The hook is called in a `finally { CleanupFunctions.run() }` from within: * `pio import` * `pio export` * `pio train` * `pio batchpredict` Apologies for the huge indentation shifts from the requisite try-finally blocks: ```scala try { // Freshly indented code. 
} finally { CleanupFunctions.run() } ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio esclient-singleton-with-auth Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/421.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #421 commit f30f27bcc09a397efb42a7923938beceaeac37bf Author: Mars Hall <m...@heroku.com> Date: 2017-08-08T23:29:15Z Migrate to singleton Elasticsearch client to use underlying connection pooling (PoolingNHttpClientConnectionManager) commit d99927089a41cb85f525cb74bdf394eed4686bf2 Author: Mars Hall <m...@heroku.com>
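The cleanup-hook idea from the PR description above can be sketched as a small registry of deferred functions. This is an illustration of the pattern only, under assumptions: `CleanupRegistrySketch` is a hypothetical name, and details such as clearing the list after running are choices made for this sketch, not confirmed behavior of PredictionIO's `CleanupFunctions`.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical sketch of a registry of cleanup functions that a workflow
// runs from a `finally` block once its work is done.
object CleanupRegistrySketch {
  private val functions = ListBuffer.empty[() => Unit]

  // Registers a block to run later; the block is not evaluated here.
  def add(f: => Unit): Unit = synchronized {
    functions.append(() => f)
  }

  // Runs every registered block in registration order, then forgets them
  // so that a second call does not close the same resource twice.
  def run(): Unit = synchronized {
    functions.foreach(f => f())
    functions.clear()
  }
}
```

A storage module would call `add { client.close() }` once when it creates its shared client, and each command wraps its body in `try { … } finally { CleanupRegistrySketch.run() }`.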
[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...
Github user mars closed the pull request at: https://github.com/apache/incubator-predictionio/pull/420 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #420: [PIO-106] Elasticsearch 5.x StorageClient...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/420 Closing in favor of https://github.com/apache/incubator-predictionio/pull/421
[GitHub] incubator-predictionio issue #372: Elasticsearch basic HTTP authentication
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/372 Closing in favor of https://github.com/apache/incubator-predictionio/pull/421
[jira] [Updated] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
[ https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-114: -- Description: Add optional username-password configuration for the new Elasticsearch 5 client; in {{conf/pio-env.sh}} config: {code} # Optional basic HTTP auth PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret {code} These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header. Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai). was: Add optional username-password configuration for the new Elasticsearch 5 client; in {conf/pio-env.sh} config: {code} # Optional basic HTTP auth PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret {code} These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header. Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai). > Elasticsearch 5.x StorageClient basic HTTP authentication > - > > Key: PIO-114 > URL: https://issues.apache.org/jira/browse/PIO-114 > Project: PredictionIO > Issue Type: New Feature > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > Add optional username-password configuration for the new Elasticsearch 5 > client; in {{conf/pio-env.sh}} config: > {code} > # Optional basic HTTP auth > PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name > PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret > {code} > These credentials are sent in each Elasticsearch request as an HTTP Basic > Authorization header. > Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai > on Heroku](https://elements.heroku.com/addons/bonsai). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
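The Basic Authorization header mentioned in PIO-114 can be sketched in plain Scala. This is only an illustration under stated assumptions: `BasicAuthSketch`, `basicAuthHeader` and `headerFromEnv` are hypothetical names, the settings are read from a simple `Map` rather than through PredictionIO's own configuration handling, and the only library call used is the JDK's `java.util.Base64`.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper, not taken from the PredictionIO source.
object BasicAuthSketch {

  /** Builds the value of an HTTP `Authorization` header for Basic auth:
    * the word "Basic", a space, then base64("username:password"). */
  def basicAuthHeader(username: String, password: String): String = {
    val credentials = s"$username:$password".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(credentials)
  }

  /** Returns a header value only when both optional settings are present.
    * `env` stands in for the process environment (assumption for testability). */
  def headerFromEnv(env: Map[String, String]): Option[String] =
    for {
      user <- env.get("PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME")
      pass <- env.get("PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD")
    } yield basicAuthHeader(user, pass)
}
```

Calling `BasicAuthSketch.headerFromEnv(sys.env)` would yield `None` when either variable is unset, which matches the "optional" wording of the issue.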
[jira] [Created] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication
Mars Hall created PIO-114: - Summary: Elasticsearch 5.x StorageClient basic HTTP authentication Key: PIO-114 URL: https://issues.apache.org/jira/browse/PIO-114 Project: PredictionIO Issue Type: New Feature Components: Core Affects Versions: 0.11.0-incubating Reporter: Mars Hall Assignee: Mars Hall Add optional username-password configuration for the new Elasticsearch 5 client; in {conf/pio-env.sh} config: {code:shell} # Optional basic HTTP auth PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret {code} These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header. Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).
[GitHub] incubator-predictionio issue #420: [PIO-106] Elasticsearch 5.x StorageClient...
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/420 Seems to solve this [long ago reported Elasticsearch connection issue](https://github.com/elastic/elasticsearch/issues/3647)
[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...
Github user mars commented on a diff in the pull request: https://github.com/apache/incubator-predictionio/pull/420#discussion_r132601642 --- Diff: storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEvaluationInstances.scala --- @@ -110,28 +104,24 @@ class ESEvaluationInstances(client: ESClient, config: StorageClientConfig, index error(s"Failed to access to /$index/$estype/$id", e) None } finally { - restClient.close() + client.close() --- End diff -- This `close` should be removed.
[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/420 [PIO-106] Elasticsearch 5.x StorageClient should reuse RestClient Implements [PIO-106](https://issues.apache.org/jira/browse/PIO-106) This PR moves to a singleton Elasticsearch RestClient which has built-in HTTP keep-alive and TCP connection pooling. Running on this branch, we've seen a 2x speed-up in predictions from the Universal Recommender with ES5, and the feared "cannot bind" 😱 Elasticsearch connection errors have completely disappeared. Running `pio batchpredict` for 170K queries results in only 7 total TCP connections to Elasticsearch. Previously that would escalate to ~25,000 connections before denying further connections. **This fundamentally changes the interface for the new [Elasticsearch 5.x REST client](https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch)** introduced with PredictionIO 0.11.0-incubating. With this changeset, the `client` is a single instance of [`org.elasticsearch.client.RestClient`](https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java). 
🚨 **As a result of this change, any engine templates that directly use the Elasticsearch 5 StorageClient would require an update for compatibility.** The change is this: ### Original ```scala val client: StorageClient = … // code to instantiate client val restClient: RestClient = client.open() try { restClient.performRequest(…) } finally { restClient.close() } ``` ### With this PR ```scala val client: RestClient = … // code to instantiate client client.performRequest(…) ``` *No more balancing `open` & `close` as this is handled by using a new `CleanupFunctions` hook added to the framework in this PR.* [Universal Recommender](https://github.com/actionml/universal-recommender) is the only template that I know of which directly uses the ES StorageClient outside of PredictionIO core. See the [UR changes for compatibility with this PR](https://github.com/heroku/predictionio-engine-ur/compare/esclient-singleton). ### Elasticsearch StorageClient changes * reimplemented as singleton * installs a cleanup function See [StorageClient](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2926f4cfd93ccb02320e2a9503ccd223) ### Core changes A new [`CleanupFunctions`](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2a958821ac58f019fbce38540c775f19) hook has been added which enables developers of storage modules to register anonymous functions with `CleanupFunctions.add { … }` to be executed after Spark-related commands/workflows. The hook is called in a `finally { CleanupFunctions.run() }` from within: * `pio import` * `pio export` * `pio train` * `pio batchpredict` Apologies for the huge indentation shifts from the requisite try-finally blocks: ```scala try { // Freshly indented code. 
} finally { CleanupFunctions.run() } ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio esclient-singleton Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/420.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #420 commit f30f27bcc09a397efb42a7923938beceaeac37bf Author: Mars Hall <m...@heroku.com> Date: 2017-08-08T23:29:15Z Migrate to singleton Elasticsearch client to use underlying connection pooling (PoolingNHttpClientConnectionManager) commit d99927089a41cb85f525cb74bdf394eed4686bf2 Author: Mars Hall <m...@heroku.com> Date: 2017-08-10T03:00:58Z Log stacktrace for Storage initialization errors. commit dc4c31cbcddbb3b281d52b8099e210adc546d1ed Author: Mars Hall <m...@heroku.com> Date: 2017-08-10T22:55:38Z Remove shade rule that breaks Elasticsearch 5 client commit 7634a7ab720239d5f8efda85f67b26bdaff797f8 Author: Mars Hall <m...@heroku.com> Date: 2017-08-10T22:59:01Z Collect & run cleanup functions to allow spark-submit processes to end gracefully. commit 5953451f40e554eafa887328122c794edbbd8f1d Author: Mars Hall <m...@heroku.com> Date: 2017-08-11T00:06:24Z Rename CleanupFunctions to match object name
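The combination described in PR #420, a singleton client plus a cleanup hook that runs in a `finally`, might be sketched roughly as follows. Everything here is an assumption made for illustration: `CleanupSketch`, `PooledClientSketch` and `ClientHolderSketch` are hypothetical stand-ins, not the PredictionIO `CleanupFunctions` object or the Elasticsearch `RestClient`, and the real signatures may differ.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical registry of cleanup callbacks (stand-in for the hook in the PR).
object CleanupSketch {
  private val functions = ListBuffer.empty[() => Unit]

  /** Registers a block of code to be executed later by `run()`. */
  def add(f: => Unit): Unit = synchronized { functions += (() => f) }

  /** Executes every registered block once, in registration order, then forgets them. */
  def run(): Unit = synchronized {
    functions.foreach(f => f())
    functions.clear()
  }
}

// Hypothetical client that would own a pool of keep-alive connections.
final class PooledClientSketch {
  @volatile private var closed = false
  def isClosed: Boolean = closed
  def performRequest(path: String): String = {
    require(!closed, "client is closed")
    s"GET $path"
  }
  def close(): Unit = { closed = true }
}

// One shared instance per JVM; closing is deferred to the cleanup hook
// instead of being balanced by each caller.
object ClientHolderSketch {
  lazy val client: PooledClientSketch = {
    val c = new PooledClientSketch
    CleanupSketch.add { c.close() }
    c
  }
}

object CleanupDemo extends App {
  try {
    // Callers just use the shared client; no open/close pair.
    println(ClientHolderSketch.client.performRequest("/pio_meta"))
  } finally {
    CleanupSketch.run()
  }
}
```

The design point is that callers never close the client themselves, so connections can be reused across requests, while the workflow's `finally` still releases them so the process can exit.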
Re: August 2017 Release
Yes that is the PR. When I checked out develop yesterday, I thought it was already merged. Apologies for my confusion. I'd like to see if I can get that merged for the release. Looking into it now. *Mars ( <> .. <> ) > On Aug 4, 2017, at 12:46, Donald Szeto <don...@apache.org> wrote: > > Hey Mars, > > Is this the PR in question? > https://github.com/apache/incubator-predictionio/pull/372 > > Regards, > Donald > > On Thu, Aug 3, 2017 at 11:49 AM, Mars Hall <m...@heroku.com> wrote: > >> Hit an Authenticated Elasticsearch 5.x problem on the current develop >> branch. >> >> I just tested the HEAD of develop by performing: >> >> ./make-distribution.sh \ >>-Dscala.version=2.11.8 \ >>-Dspark.version=2.1.0 \ >>-Dhadoop.version=2.7.3 \ >>-Delasticsearch.version=5.1.1 >> >> Then, tried build/train/deploy of our Universal Recommender template. >> >> Locally, it makes it through train to the point when it saves to >> Elasticsearch, failing with: >> >>> Exception in thread "main" java.lang.NoSuchMethodError: >> org.elasticsearch.client.RestClient.performRequest( >> Ljava/lang/String;Ljava/lang/String;Ljava/util/Map;[Lorg/ >> apache/http/Header;)Lorg/elasticsearch/client/Response; >>> at org.template.EsClient$.createIndex(EsClient.scala:132) >>> at org.template.EsClient$.hotSwap(EsClient.scala:218) >>> at org.template.URModel.save(URModel.scala:86) >> >> I tried deploying it to Heroku as well, and it fails much earlier when >> simply connecting to Elasticsearch: >> >>> remote: Exception in thread "main" >>> org.elasticsearch.client.ResponseException: >> HEAD https://xx.us-east-1.bonsaisearch.net:443/pio_meta: HTTP/1.1 401 >> Unauthorized >>> remote: at org.elasticsearch.client.RestClient$1.completed( >> RestClient.java:311) >>> remote: at org.elasticsearch.client.RestClient$1.completed( >> RestClient.java:300) >>> remote: at shadeio.data.http.concurrent.BasicFuture.completed( >> BasicFuture.java:119) >>> remote: at shadeio.data.http.impl.nio.client. 
>> DefaultClientExchangeHandlerImpl.responseCompleted( >> DefaultClientExchangeHandlerImpl.java:177) >>> remote: at shadeio.data.http.nio.protocol. >> HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java: >> 436) >>> remote: at shadeio.data.http.nio.protocol. >> HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java: >> 309) >>> remote: at shadeio.data.http.impl.nio. >> DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection. >> java:255) >> >> >> These issues were previously found to be caused by this shade rule: >> https://github.com/apache/incubator-predictionio/blob/ >> develop/storage/elasticsearch/build.sbt#L42 >> >> It looks like the shaded package does not actually use the new >> authentication code. >> >> Chan Lee mentioned to me that he was only able to make the TravisCI build >> pass by adding this shade rule, but it is clearly breaking the authenicated >> Elasticsearch functionality. >> >> Any ideas how to solve this? >> >> *Mars >> >> ( <> .. <> ) >> >>> On Aug 3, 2017, at 11:02, Donald Szeto <don...@apache.org> wrote: >>> >>> On Thu, Aug 3, 2017 at 10:07 AM, Mars Hall <m...@heroku.com> wrote: >>> >>>> I just opened a PR to add docs for batch predict. >>>> >>>> Moving forward with the 0.12.0 release sounds great. Today, I will pull >>>> develop and see how it's working with the Heroku buildpack. >>>> >>> >>> Awesome. Thanks! >>> >>> >>>>> On Aug 3, 2017, at 00:37, takako shimamoto <chiboch...@gmail.com> >> wrote: >>>>> >>>>> I think it's almost ready, and now we just have to update the current >>>>> documentation. >>>>> The deadline of several unresolved issues for Target Version/s: >>>>> 0.12.0-incubating is extended, right? >>>> >>> >>> Yes. Let's extend those that have not started working if there's no >>> objection. >> >>
Re: August 2017 Release
Hit an Authenticated Elasticsearch 5.x problem on the current develop branch. I just tested the HEAD of develop by performing: ./make-distribution.sh \ -Dscala.version=2.11.8 \ -Dspark.version=2.1.0 \ -Dhadoop.version=2.7.3 \ -Delasticsearch.version=5.1.1 Then, tried build/train/deploy of our Universal Recommender template. Locally, it makes it through train to the point when it saves to Elasticsearch, failing with: > Exception in thread "main" java.lang.NoSuchMethodError: > org.elasticsearch.client.RestClient.performRequest(Ljava/lang/String;Ljava/lang/String;Ljava/util/Map;[Lorg/apache/http/Header;)Lorg/elasticsearch/client/Response; > at org.template.EsClient$.createIndex(EsClient.scala:132) > at org.template.EsClient$.hotSwap(EsClient.scala:218) > at org.template.URModel.save(URModel.scala:86) I tried deploying it to Heroku as well, and it fails much earlier when simply connecting to Elasticsearch: > remote: Exception in thread "main" > org.elasticsearch.client.ResponseException: HEAD > https://xx.us-east-1.bonsaisearch.net:443/pio_meta: HTTP/1.1 401 > Unauthorized > remote: at > org.elasticsearch.client.RestClient$1.completed(RestClient.java:311) > remote: at > org.elasticsearch.client.RestClient$1.completed(RestClient.java:300) > remote: at > shadeio.data.http.concurrent.BasicFuture.completed(BasicFuture.java:119) > remote: at > shadeio.data.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) > remote: at > shadeio.data.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) > remote: at > shadeio.data.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:309) > remote: at > shadeio.data.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255) These issues were previously found to be caused by this shade rule: 
https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/build.sbt#L42 It looks like the shaded package does not actually use the new authentication code. Chan Lee mentioned to me that he was only able to make the TravisCI build pass by adding this shade rule, but it is clearly breaking the authenticated Elasticsearch functionality. Any ideas how to solve this? *Mars ( <> .. <> ) > On Aug 3, 2017, at 11:02, Donald Szeto <don...@apache.org> wrote: > > On Thu, Aug 3, 2017 at 10:07 AM, Mars Hall <m...@heroku.com> wrote: > >> I just opened a PR to add docs for batch predict. >> >> Moving forward with the 0.12.0 release sounds great. Today, I will pull >> develop and see how it's working with the Heroku buildpack. >> > > Awesome. Thanks! > > >>> On Aug 3, 2017, at 00:37, takako shimamoto <chiboch...@gmail.com> wrote: >>> >>> I think it's almost ready, and now we just have to update the current >>> documentation. >>> The deadline of several unresolved issues for Target Version/s: >>> 0.12.0-incubating is extended, right? >> > > Yes. Let's extend those that have not started working if there's no > objection.
[GitHub] incubator-predictionio pull request #418: batchpredict docs
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/418 batchpredict docs JIRA [PIO-111](https://issues.apache.org/jira/browse/PIO-111) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio batchpredict-docs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #418 commit 382d238f73fb04728b5fba9fc0484084ffc0945d Author: Mars Hall <m...@heroku.com> Date: 2017-08-02T22:21:39Z Update therubyracer gem to most recent patch-level for macOS 10.12 compatibility. commit eb79654f2c95abaf747f163bc43f86e8ed9328a0 Author: Mars Hall <m...@heroku.com> Date: 2017-08-03T00:29:39Z Documentation for `pio batchpredict`
[jira] [Updated] (PIO-111) Document pio batchpredict
[ https://issues.apache.org/jira/browse/PIO-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-111: -- External issue URL: https://github.com/apache/incubator-predictionio/pull/418 > Document pio batchpredict > - > > Key: PIO-111 > URL: https://issues.apache.org/jira/browse/PIO-111 > Project: PredictionIO > Issue Type: Task > Components: Documentation >Affects Versions: 0.12.0-incubating >Reporter: Donald Szeto >Assignee: Mars Hall > Labels: newbie > > {{pio batchpredict}} is a new feature created in PIO-105. It needs to be > documented.
[jira] [Created] (PIO-109) Customizable HTTP server configuration
Mars Hall created PIO-109: - Summary: Customizable HTTP server configuration Key: PIO-109 URL: https://issues.apache.org/jira/browse/PIO-109 Project: PredictionIO Issue Type: Improvement Components: Core Affects Versions: 0.11.0-incubating Reporter: Mars Hall Make it possible to customize the Akka/Spray server config [/common/src/main/resources/application.conf|https://github.com/apache/incubator-predictionio/blob/develop/common/src/main/resources/application.conf] without building PredictionIO from source. A possible solution might be an option to the {{pio deploy}} command, like {{--server-config ./application.conf}}, that allows overriding with a user-supplied config. Background: in PIO-95 I requested a configurable timeout for the engine's HTTP server. That issue was resolved with a simple change to the server timeout and led to this more generalized idea.
Re: New PMC member and committer: Mars Hall
Thank you Donald, I'm honored & excited to officially join the project! *Mars ( <> .. <> ) > On Jul 28, 2017, at 12:01, Donald Szeto <don...@apache.org> wrote: > > Hi all, > > The Project Management Committee (PMC) for Apache PredictionIO (incubating) > has asked Mars Hall to become a PMC member and committer, and we are > pleased to announce that he has accepted. > > Mars has been working on PredictionIO since 0.10 and has suggested and made > changes to the core so that it has become more configurable and easier to > deploy on Heroku. He added authentication support to the REST-based > Elasticsearch client. He has also found and fixed core bugs. > > Mars is the primary driver in delivering a good developer experience > through Heroku buildpacks for PredictionIO ( > https://github.com/heroku/predictionio-buildpack), which allows engine > templates to be submitted to Heroku and deployed automatically. He also > made a couple engine templates that are preset to do so ( > https://github.com/heroku/predictionio-engine-classification, > https://github.com/heroku/predictionio-engine-ur). > > Being a committer enables easier contribution to the project since there is > no need to go via the patch submission process. This should enable better > productivity. Being a PMC member enables assistance with the management and > to guide the direction of the project. > > Please join us in welcoming Mars. > > Regards, > Donald
[jira] [Updated] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient
[ https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-106: -- Description: When using the proposed [PIO-105 Batch Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's REST interface appears to become overloaded, ending with the Spark job being killed from errors like: {noformat} [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search [ERROR] [Utils] Aborting task [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749) [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737) [ERROR] [Common$] Invalid app name ur [ERROR] [Utils] Aborting task [ERROR] [URAlgorithm] Error when read recent events: java.lang.IllegalArgumentException: Invalid app name ur [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751) [ERROR] [Utils] Aborting task [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750) [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, executor driver): java.net.BindException: Can't assign requested address at sun.nio.ch.Net.connect0(Native Method) at sun.nio.ch.Net.connect(Net.java:454) at sun.nio.ch.Net.connect(Net.java:446) at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273) at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139) at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348) at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192) at 
org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) at java.lang.Thread.run(Thread.java:745) {noformat} After these errors happen & the job is killed, Elasticsearch immediately recovers. It responds to queries normally. I researched what could cause this and found an [old issue in the main Elasticsearch repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints given therein about *using keep-alive in the ES client* to avoid these performance issues, I investigated how PredictionIO's [Elasticsearch StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch] manages its connections. I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), Elasticsearch creates a new underlying connection, an Elasticsearch RestClient, for [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80] [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157] [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78] & [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205] with its API. As a result, *there is no way Elasticsearch TCP connections can be reused via HTTP keep-alive*. 
High-performance workloads with Elasticsearch 5.x will suffer from these issues unless we refactor Elasticsearch StorageClient to share the underlying RestClient instead of [building a new one everytime the client is used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31]. There are certainly different approaches we could take to sharing a RestClient so that its keep-alive behavior may work as designed: * maintain a singleton RestClient that is reused throughout the ES storage classes * create a RestClient on-demand and pass it as an argument to ES storage methods * other ideas? was: When using the proposed [PIO-105 Batch Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's REST interface appears to become overloaded, ending with the Spark job being killed from errors like: {noformat} [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search [ERROR] [Utils] Aborting task [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search [ERROR] [Executor] Exception in task 747.0
[GitHub] incubator-predictionio issue #412: [PIO-105] Batch Predictions
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/412 @takezoe thank you for the feedback. As a relatively-new Scala programmer I really appreciate this kind of review. I am a bit hesitant to make these changes. I'm trying to maintain likeness with the [`CreateServer.scala`](https://github.com/mars/incubator-predictionio/blob/e7c6ebd8cfe2d4a150319025876520fc39be9a34/core/src/main/scala/org/apache/predictionio/workflow/CreateServer.scala) code, to minimize differences in prediction behavior between `pio deploy` and `pio batchpredict`. Any of these stylistic points should probably be matched in CreateServer, so that it continues to be easy to reason about their similarity.
[GitHub] incubator-predictionio pull request #412: Batch Predictions
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/412 Batch Predictions JIRA issue [PIO-105](https://issues.apache.org/jira/browse/PIO-105) Provides a new `pio batchpredict` command. Reads from multi-object JSON input file. Example: ```json {"user":"1"} {"user":"2"} {"user":"3"} {"user":"4"} {"user":"5"} ``` Writes to multi-object JSON output file (actually Hadoop partition files). Example: ```json {"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}} {"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}} {"query":{"user":"3"},"prediction":{"itemScores":[{"item":"2","score":16},{"item":"3","score":12}]}} {"query":{"user":"4"},"prediction":{"itemScores":[{"item":"3","score":19},{"item":"1","score":18}]}} {"query":{"user":"5"},"prediction":{"itemScores":[{"item":"1","score":24},{"item":"4","score":14}]}} ``` See the included [console usage help](#diff-2cf174557564e09d52157be8e839fecf) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio batch-predict Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/412.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #412 commit 99ee6493bddc8f02aee384f3a2db27c6ae3f68cc Author: Mars Hall <m...@heroku.com> Date: 2017-07-13T00:12:25Z Implement BatchPredict commit c205357498e4a4a745810b04130c5bbad78f8686 Author: Mars Hall <m...@heroku.com> Date: 2017-07-14T22:29:26Z Improve console help for batch predict. 
commit 93f7ed3e5ed10155a688a032e367793d75fa116a Author: Mars Hall <m...@heroku.com> Date: 2017-07-14T22:46:30Z Undo experimental change to publish tools artifact
[jira] [Created] (PIO-105) Batch Predictions
Mars Hall created PIO-105: - Summary: Batch Predictions Key: PIO-105 URL: https://issues.apache.org/jira/browse/PIO-105 Project: PredictionIO Issue Type: New Feature Components: Core Reporter: Mars Hall Assignee: Mars Hall Implement a new {{pio batchpredict}} command to enable massive, fast, batch predictions from a trained model. Read a multi-object JSON file as the input format, with one query object per line. Similarly, write results to a multi-object JSON file, with one prediction result + its original query per line. Currently getting bulk predictions from PredictionIO is possible with either: * a {{pio eval}} script, which will always train a fresh, unvalidated model before getting predictions * a custom script that hits the {{queries.json}} HTTP API, which is a serious bottleneck when requesting hundreds-of-thousands or millions of predictions Neither of these existing bulk-prediction hacks is adequate for the reasons mentioned. It's time for this use-case to be a first-class command :D
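The line-oriented format proposed in PIO-105 (one query object per input line, one query plus prediction per output line) can be illustrated with a small Scala sketch. It makes several assumptions: `BatchPredictFormatSketch` and `predictLines` are hypothetical names, the predictor is an arbitrary function from raw query JSON to raw prediction JSON, each non-blank input line is taken to be valid JSON already, and plain string composition stands in for whatever JSON handling and Spark processing the real command uses.

```scala
// Hypothetical illustration of the multi-object JSON format; not the batchpredict code.
object BatchPredictFormatSketch {

  /** A predictor takes one query as raw JSON text and returns raw prediction JSON. */
  type Predict = String => String

  /** Pairs every non-blank query line with its prediction, one JSON object per line. */
  def predictLines(queries: Iterator[String], predict: Predict): Iterator[String] =
    queries
      .map(_.trim)
      .filter(_.nonEmpty) // tolerate blank lines between query objects
      .map(q => s"""{"query":$q,"prediction":${predict(q)}}""")
}
```

Because the transformation is applied line by line, the input never has to be parsed as a single JSON document, which is what makes the format suitable for very large batches.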
[GitHub] incubator-predictionio issue #401: [PIO-72] Fix class loading for pio-shell
Github user mars commented on the issue: https://github.com/apache/incubator-predictionio/pull/401 Back in May, we fixed an intermittent class loading problem by [making a change to stabilize the classpath](https://github.com/mars/incubator-predictionio/commit/9ecc77628aba347454073e9919096a8fc8e0b952) in our fork of incubator-predictionio. Would you be open to adding that to this PR as well?
[GitHub] incubator-predictionio pull request #406: [PIO-102] Fix ESEngineInstances `g...
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/406 [PIO-102] Fix ESEngineInstances `getAll` results out of order (Elasticsearch 5.x) Fix for [PIO-102](https://issues.apache.org/jira/browse/PIO-102) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio fix-es-getall-order Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/406.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #406 commit 34fb0de8ae91f3bf9edb7a9823ea1784555845a8 Author: Mars Hall <m...@heroku.com> Date: 2017-07-08T01:22:14Z Append Elasticsearch scroll results to maintain order ---
[jira] [Updated] (PIO-102) ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)
[ https://issues.apache.org/jira/browse/PIO-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-102: -- External issue URL: https://github.com/apache/incubator-predictionio/pull/406 Description: Using the new Elasticsearch 5.x REST storage client as the meta storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, that the most recent engine instance is no longer retrieved. So, I tracked down where those Elasticsearch queries originate. In the original Elasticsearch 1.x storage client, [the "scroll" pagination responses are collected by *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44] them to one another. In the new Elasticsearch 5.x client, [the "scroll" responses are collected by *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152] them to one another. This out-of-order concatenation breaks [ESEngineInstances `getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192] by erroneously replacing the head of the results with an older engine instance, when there are enough engine instances to overflow a single page of Elasticsearch hits. I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature. 
Pull request: https://github.com/apache/incubator-predictionio/pull/406 was: Using the new Elasticsearch 5.x REST storage client as the meta storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, that the most recent engine instance is no longer retrieved. So, I tracked down where those Elasticsearch queries originate. In the original Elasticsearch 1.x storage client, [the "scroll" pagination responses are collected by *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44] them to one another. In the new Elasticsearch 5.x client, [the "scroll" responses are collected by *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152] them to one another. This out-of-order concatenation breaks [ESEngineInstances `getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192] by erroneously replacing the head of the results with an older engine instance, when there are enough engine instances to overflow a single page of Elasticsearch hits. I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature. I'll be opening a pull request shortly with the super-simple fix. 
> ESEngineInstances `getAll` results out of order (Elasticsearch 5.x) > --- > > Key: PIO-102 > URL: https://issues.apache.org/jira/browse/PIO-102 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall >Assignee: Mars Hall > > Using the new Elasticsearch 5.x REST storage client as the meta storage > source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in > conf/pio-env.sh), I found that once an engine has been trained a certain > number of times, that the most recent engine instance is no longer retrieved. > So, I tracked down where those Elasticsearch queries originate. > In the original Elasticsearch 1.x storage client, [the "scroll" pagination > responses are collected by > *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44] > them to one another. > In the new Elasticsearch 5.x client, [the "scroll" responses are collected by > *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/
[jira] [Created] (PIO-102) ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)
Mars Hall created PIO-102: Summary: ESEngineInstances `getAll` results out of order (Elasticsearch 5.x) Key: PIO-102 URL: https://issues.apache.org/jira/browse/PIO-102 Project: PredictionIO Issue Type: Bug Components: Core Affects Versions: 0.11.0-incubating Reporter: Mars Hall Assignee: Mars Hall Using the new Elasticsearch 5.x REST storage client as the meta storage source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in conf/pio-env.sh), I found that once an engine has been trained a certain number of times, the most recent engine instance is no longer retrieved. So, I tracked down where those Elasticsearch queries originate. In the original Elasticsearch 1.x storage client, [the "scroll" pagination responses are collected by *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44] them to one another. In the new Elasticsearch 5.x client, [the "scroll" responses are collected by *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152] them to one another. This out-of-order concatenation breaks [ESEngineInstances `getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192] by erroneously replacing the head of the results with an older engine instance, when there are enough engine instances to overflow a single page of Elasticsearch hits. I've observed this buggy behavior after ten trainings, when enough engine instances are stored to trigger Elasticsearch's scroll feature. I'll be opening a pull request shortly with the super-simple fix.
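The ordering problem described in PIO-102 can be reproduced with plain lists. This is a language-neutral sketch rather than the actual `ESUtils` code: the page size of two and the instance names are invented for the example, and the pages stand in for successive scroll responses sorted newest-first.

```python
def collect_appending(pages):
    hits = []
    for page in pages:
        hits = hits + page  # later pages go after earlier ones
    return hits


def collect_prepending(pages):
    hits = []
    for page in pages:
        hits = page + hits  # later pages jump in front: order is broken
    return hits


# Instances sorted newest-first, delivered in pages of (assumed) size 2.
pages = [["instance-5", "instance-4"], ["instance-3", "instance-2"], ["instance-1"]]

appended = collect_appending(pages)
prepended = collect_prepending(pages)

# A "latest completed" lookup that takes the head of the results:
latest_ok = appended[0]    # "instance-5", the newest
latest_bad = prepended[0]  # "instance-1", the oldest
```

With a single page both functions return the same list, which matches the report that the bug only appears once enough engine instances exist to trigger scrolling.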
[jira] [Commented] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs
[ https://issues.apache.org/jira/browse/PIO-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079272#comment-16079272 ] Mars Hall commented on PIO-96: -- Yes Kenneth, the same storage config can be used for (per my example) a Classifier & UR, but the issue I'm raising here is that it's quite simple for someone to not understand this and end up with corrupted storage. As I mentioned, PredictionIO makes it sound like sharing storage and eventserver between engines is okay. Unfortunately this sets folks up for hard to understand, probably time-wasting, and possibly hidden erroneous data problems. > Storage corrupted by sharing databases between engines with different storage > configs > - > > Key: PIO-96 > URL: https://issues.apache.org/jira/browse/PIO-96 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > When getting started with PredictionIO, it's no problem to spin up an engine > and see it work. Problems emerge when a developer tries running multiple > engines with different storage configs on the same underlying database, such > as: > * a Classifier with *Postgres* meta, event, & model storage, and > * the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & > model storage. > The database will become corrupt because the meta tables are stored in > different databases, but the dynamically created event & model tables may > mistakenly share the same name, like {{pio_event_1}}. > We are directing folks to avoid this problem with the Heroku buildpack by > [isolating each engine's > database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database] > and [optionally running an eventserver per > engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver]. > It's still a problem with local development, though. 
> It would be great if PredictionIO's management of the database schemas would > inherently avoid such conflicts, like by using random names or UUIDs for dynamically > created table names, so that they will never conflict.
[jira] [Commented] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs
[ https://issues.apache.org/jira/browse/PIO-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070626#comment-16070626 ] Mars Hall commented on PIO-96: -- Data corruption is an issue, we've seen it happen five times for various local developers & deployments in the exact example I provided. This was never an issue until Universal Recommender required us to mix a PIO engine with a different meta storage source into our environments. Maybe clarification in documentation about this danger is a good way to resolve, as the evolution of PIO itself seems to be headed to solve this? > Storage corrupted by sharing databases between engines with different storage > configs > - > > Key: PIO-96 > URL: https://issues.apache.org/jira/browse/PIO-96 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > When getting started with PredictionIO, it's no problem to spin up an engine > and see it work. Problems emerge when a developer tries running multiple > engines with different storage configs on the same underlying database, such > as: > * a Classifier with *Postgres* meta, event, & model storage, and > * the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & > model storage. > The database will become corrupt because the meta tables are stored in > different databases, but the dynamically created event & model tables may > mistakenly share the same name, like {{pio_event_1}}. > We are directing folks to avoid this problem with the Heroku buildpack by > [isolating each engine's > database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database] > and [optionally running an eventserver per > engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver]. > It's still a problem with local development, though. 
> It would be great if PredictionIO's management of the database schemas would > inherently avoid such conflicts, like by using random names or UUIDs for dynamically > created table names, so that they will never conflict.
[jira] [Created] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs
Mars Hall created PIO-96: Summary: Storage corrupted by sharing databases between engines with different storage configs Key: PIO-96 URL: https://issues.apache.org/jira/browse/PIO-96 Project: PredictionIO Issue Type: Bug Components: Core Affects Versions: 0.11.0-incubating Reporter: Mars Hall When getting started with PredictionIO, it's no problem to spin up an engine and see it work. Problems emerge when a developer tries running multiple engines with different storage configs on the same underlying database, such as: * a Classifier with *Postgres* meta, event, & model storage, and * the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & model storage. The database will become corrupt because the meta tables are stored in different databases, but the dynamically created event & model tables may mistakenly share the same name, like {{pio_event_1}}. We are directing folks to avoid this problem with the Heroku buildpack by [isolating each engine's database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database] and [optionally running an eventserver per engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver]. It's still a problem with local development, though. It would be great if PredictionIO's management of the database schemas would inherently avoid such conflicts, like by using random names or UUIDs for dynamically created table names, so that they will never conflict.
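One way such a name collision could arise is sketched below. It assumes, for illustration only, that each metadata store numbers its apps independently and that table names are derived from that number; `MetaStore` and `event_table_name` are hypothetical names, not PredictionIO classes. The last lines show the UUID-based naming the issue suggests.

```python
import uuid


class MetaStore:
    """Hypothetical metadata store that hands out sequential app IDs."""

    def __init__(self):
        self.next_id = 1

    def new_app_id(self):
        app_id = self.next_id
        self.next_id += 1
        return app_id


def event_table_name(app_id):
    # Naming scheme modelled on the `pio_event_1` example in the issue.
    return "pio_event_{}".format(app_id)


postgres_meta = MetaStore()       # used by the Classifier
elasticsearch_meta = MetaStore()  # used by the Universal Recommender

# Each store starts counting at 1, so both engines pick the same table
# name in the shared Postgres event database.
classifier_table = event_table_name(postgres_meta.new_app_id())
recommender_table = event_table_name(elasticsearch_meta.new_app_id())
collision = classifier_table == recommender_table

# The suggestion from the issue: derive names from UUIDs instead.
unique_a = "pio_event_{}".format(uuid.uuid4().hex)
unique_b = "pio_event_{}".format(uuid.uuid4().hex)
```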
[GitHub] incubator-predictionio pull request #394: [PIO-95] Extend request timeout fo...
Github user mars commented on a diff in the pull request: https://github.com/apache/incubator-predictionio/pull/394#discussion_r122527314 --- Diff: common/src/main/resources/application.conf --- @@ -9,3 +9,7 @@ spray.can { verbose-error-messages = "on" } } + +spray.can.server { + request-timeout = 35s +} --- End diff -- I updated my branch with this change, but for some reason this PR is not updating to reflect the [new commit](https://github.com/mars/incubator-predictionio/commits/extend-request-timeout). Assuming it's a Github glitch which will eventually fix itself. ---
[GitHub] incubator-predictionio pull request #394: Extend request timeout for REST AP...
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/394 Extend request timeout for REST API We've found the default 20-second REST API request timeout is too short for our batch-prediction use cases. We're running PredictionIO on Heroku, which has its own [timeout starting at 30 seconds](https://devcenter.heroku.com/articles/limits#http-timeouts). So we'd prefer a more generous or easily configurable timeout to allow Heroku's routing layer to impose & track this limit in the platform layer. I investigated how to configure this and found [Spray `application.conf`](http://spray.io/documentation/1.2.4/spray-can/configuration/). This PR simply increases the timeout. I would love guidance on how we might extract this config into an environment variable or a value in `pio-env.sh`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio extend-request-timeout Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/394.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #394 commit 4b99967b47350e2f3ef25e505bd1f523680d7f64 Author: Mars Hall <m...@heroku.com> Date: 2017-06-16T18:59:10Z Extend request timeout for REST API ---
[GitHub] incubator-predictionio pull request #393: Fix to show stacktrace for errors ...
GitHub user mars opened a pull request: https://github.com/apache/incubator-predictionio/pull/393 Fix to show stacktrace for errors thrown in `queries.json` REST API We were getting intractable errors from `queries.json` requests, like this one without a stacktrace: ``` [ERROR] [ServerActor] Query '{ "user": "000", "item": "000" }' is invalid. Reason: Expected object but got JNothing ``` This pull request adds stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mars/incubator-predictionio log-queries-stacktrace Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-predictionio/pull/393.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #393 commit f50980d6513c367657374988083e32039c454992 Author: Mars Hall <m...@heroku.com> Date: 2017-06-16T18:45:03Z Fix to show stacktrace for errors thrown in `queries.json` REST API ---
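The general idea behind this change, logging the stack trace together with the one-line error message, can be sketched as follows. This is a generic illustration using Python's standard `logging` module, not the pattern used in `CreateServer.scala`; the logger name and the `parse_query` helper are hypothetical.

```python
import io
import json
import logging

# Capture log output in memory so the sketch is self-contained.
log_stream = io.StringIO()
logger = logging.getLogger("query_server_sketch")
logger.setLevel(logging.ERROR)
logger.addHandler(logging.StreamHandler(log_stream))
logger.propagate = False


def parse_query(raw):
    try:
        return json.loads(raw)
    except ValueError as err:
        # exc_info=True appends the traceback to the log record, so the
        # message alone is no longer the only clue to what went wrong.
        logger.error("Query %r is invalid. Reason: %s", raw, err, exc_info=True)
        return None


parsed = parse_query('{"user": "000", "item": }')
logged = log_stream.getvalue()
```

Without `exc_info=True` the log would contain only the "is invalid" line, which is the situation the pull request describes as intractable.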
[jira] [Updated] (PIO-94) Query parsing may throw intractable errors
[ https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-94: - Description: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: {code} [ERROR] [ServerActor] Query '{ "item": "000" }' is invalid. Reason: Expected object but got JNothing {code} To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 was: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: {{ [ERROR] [ServerActor] Query '{ "item": "000" }' is invalid. Reason: Expected object but got JNothing }} To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 > Query parsing may throw intractable errors > -- > > Key: PIO-94 > URL: https://issues.apache.org/jira/browse/PIO-94 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > We get intractable errors from some `queries.json` requests, like this one > without a stacktrace: > {code} > [ERROR] [ServerActor] Query '{ > "item": "000" > }' is invalid. Reason: Expected object but got JNothing > {code} > To solve, add stacktraces to these errors using the pattern already present > elsewhere in `CreateServer.scala`. > PR: https://github.com/apache/incubator-predictionio/pull/393 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIO-94) Query parsing may throw intractable errors
[ https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-94: - Description: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: {{ [ERROR] [ServerActor] Query '{ "user": "000", "item": "000" }' is invalid. Reason: Expected object but got JNothing }} To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 was: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: {{[ERROR] [ServerActor] Query '{ "user": "000", "item": "000" }' is invalid. Reason: Expected object but got JNothing}} To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 > Query parsing may throw intractable errors > -- > > Key: PIO-94 > URL: https://issues.apache.org/jira/browse/PIO-94 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > We get intractable errors from some `queries.json` requests, like this one > without a stacktrace: > {{ > [ERROR] [ServerActor] Query '{ > "user": "000", > "item": "000" > }' is invalid. Reason: Expected object but got JNothing > }} > To solve, add stacktraces to these errors using the pattern already present > elsewhere in `CreateServer.scala`. > PR: https://github.com/apache/incubator-predictionio/pull/393 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIO-94) Query parsing may throw intractable errors
[ https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-94: - Description: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: {{ [ERROR] [ServerActor] Query '{ "item": "000" }' is invalid. Reason: Expected object but got JNothing }} To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 was: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: [ERROR] [ServerActor] Query '{ "item": "000" }' is invalid. Reason: Expected object but got JNothing To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 > Query parsing may throw intractable errors > -- > > Key: PIO-94 > URL: https://issues.apache.org/jira/browse/PIO-94 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > We get intractable errors from some `queries.json` requests, like this one > without a stacktrace: > {{ > [ERROR] [ServerActor] Query '{ > "item": "000" > }' is invalid. Reason: Expected object but got JNothing > }} > To solve, add stacktraces to these errors using the pattern already present > elsewhere in `CreateServer.scala`. > PR: https://github.com/apache/incubator-predictionio/pull/393 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIO-94) Query parsing may throw intractable errors
[ https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mars Hall updated PIO-94: - Description: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: {{[ERROR] [ServerActor] Query '{ "user": "000", "item": "000" }' is invalid. Reason: Expected object but got JNothing}} To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 was: We get intractable errors from some `queries.json` requests, like this one without a stacktrace: [ERROR] [ServerActor] Query '{ "user": "000", "item": "000" }' is invalid. Reason: Expected object but got JNothing To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393 > Query parsing may throw intractable errors > -- > > Key: PIO-94 > URL: https://issues.apache.org/jira/browse/PIO-94 > Project: PredictionIO > Issue Type: Bug > Components: Core >Affects Versions: 0.11.0-incubating >Reporter: Mars Hall > > We get intractable errors from some `queries.json` requests, like this one > without a stacktrace: > {{[ERROR] [ServerActor] Query '{ > "user": "000", > "item": "000" > }' is invalid. Reason: Expected object but got JNothing}} > To solve, add stacktraces to these errors using the pattern already present > elsewhere in `CreateServer.scala`. > PR: https://github.com/apache/incubator-predictionio/pull/393 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PIO-94) Query parsing may throw intractable errors
Mars Hall created PIO-94: Summary: Query parsing may throw intractable errors Key: PIO-94 URL: https://issues.apache.org/jira/browse/PIO-94 Project: PredictionIO Issue Type: Bug Components: Core Affects Versions: 0.11.0-incubating Reporter: Mars Hall We get intractable errors from some `queries.json` requests, like this one without a stacktrace: [ERROR] [ServerActor] Query '{ "user": "000", "item": "000" }' is invalid. Reason: Expected object but got JNothing To solve, add stacktraces to these errors using the pattern already present elsewhere in `CreateServer.scala`. PR: https://github.com/apache/incubator-predictionio/pull/393
[jira] [Created] (PIO-72) In `pio-shell` jdbc.StorageClient cannot be loaded
Mars Hall created PIO-72: Summary: In `pio-shell` jdbc.StorageClient cannot be loaded Key: PIO-72 URL: https://issues.apache.org/jira/browse/PIO-72 Project: PredictionIO Issue Type: Bug Components: Core Affects Versions: 0.11.0-incubating Environment: local developer machines Reporter: Mars Hall Attachments: image.png Class loading/classpath is currently broken in {{pio-shell}}. Attached screenshot is the public docs that explain the intended functionality. Instead, users see errors when attempting to use storage classes: {code:title=pio-shell.error|borderStyle=solid} java.lang.ClassNotFoundException: jdbc.StorageClient at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:264) at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:228) at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:254) at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215) at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215) at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189) at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91) at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:215) at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284) at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269) at org.apache.predictionio.data.storage.Storage$.getMetaDataApps(Storage.scala:387) at org.apache.predictionio.data.store.Common$.appsDb$lzycompute(Common.scala:27) at org.apache.predictionio.data.store.Common$.appsDb(Common.scala:27) at 
org.apache.predictionio.data.store.Common$.appNameToId(Common.scala:32) at org.apache.predictionio.data.store.PEventStore$.aggregateProperties(PEventStore.scala:108) at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31) at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36) at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38) at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:40) at $line20.$read$$iwC$$iwC$$iwC$$iwC.(:42) at $line20.$read$$iwC$$iwC$$iwC.(:44) at $line20.$read$$iwC$$iwC.(:46) at $line20.$read$$iwC.(:48) at $line20.$read.(:50) at $line20.$read$.(:54) at $line20.$read$.() at $line20.$eval$.(:7) at $line20.$eval$.() at $line20.$eval.$print() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731
Re: New product manager: Sara Asher
Bravo Sara! PredictionIO is fortunate to have you!!! *Mars ( <> .. <> ) On Wed, May 17, 2017 at 09:00 Donald Szeto <don...@apache.org> wrote: Hi all, The Project Management Committee (PMC) for Apache PredictionIO (incubating) has asked Sara Asher to become a product manager, and we are pleased to announce that she has accepted. Sara is a Director of Product Management for Salesforce Einstein, where she creates products that let people build smarter applications with Salesforce and advanced AI. Prior to Salesforce, Sara worked at Alpine Data where she was chief product manager and founding director of Alpine Labs. Sara holds an AB in mathematics from Princeton University and a PhD in mathematics from Northwestern University. Being a product manager enables management of JIRA tickets. This should make prioritizing product features more efficient. Please join us in welcoming Sara. Regards, Donald
[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...
Github user mars commented on a diff in the pull request:

https://github.com/apache/incubator-predictionio/pull/371#discussion_r113494975

--- Diff: storage/s3/build.sbt ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import PIOBuild._
+
+name := "apache-predictionio-data-s3"
+
+libraryDependencies ++= Seq(
+  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
+  "com.google.guava" % "guava" % "14.0.1" % "provided",
+  "com.amazonaws" % "aws-java-sdk-s3" % "1.11.118",
+  "org.scalatest" %% "scalatest" % "2.1.7" % "test")
+
+parallelExecution in Test := false
+
+pomExtra := childrenPomExtra.value
+
+assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
+
+assemblyShadeRules in assembly := Seq(
+  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
+  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
+)
--- End diff --

Hi @marevol,

We're building a [predictionio-incubating branch to add authentication to the Elasticsearch REST client](https://github.com/apache/incubator-predictionio/pull/372) with Scala 2.11 & Spark 2.1.
I believe it was during a runtime ES REST client call to `performRequest` that an underlying method signature suddenly could not be located in the `org.apache.http` package. @dszeto helped me discover that the `shadeio.data` name was being incorrectly resolved. So, I tried [removing the shade rule](https://github.com/apache/incubator-predictionio/pull/372/files#diff-55cfeb297edd310e1efa8b6ac8bdbae6L39) and it started working. No more errors for that branch. Maybe the difference is that we're using Spark 2.1? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio pull request #372: Elasticsearch basic HTTP authentic...
Github user mars commented on a diff in the pull request:

https://github.com/apache/incubator-predictionio/pull/372#discussion_r112755776

--- Diff: storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala ---
@@ -18,27 +18,66 @@ package org.apache.predictionio.data.storage.elasticsearch
 import org.apache.http.HttpHost
+import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials}
+import org.apache.http.impl.client.BasicCredentialsProvider
+import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
 import org.apache.predictionio.data.storage.BaseStorageClient
 import org.apache.predictionio.data.storage.StorageClientConfig
 import org.apache.predictionio.data.storage.StorageClientException
 import org.elasticsearch.client.RestClient
+import org.elasticsearch.client.RestClientBuilder.HttpClientConfigCallback
 import grizzled.slf4j.Logging
-case class ESClient(hosts: Seq[HttpHost]) {
+case class ESClient(
+    hosts: Seq[HttpHost],
+    basicAuth: Option[(String, String)] = None) {
+
   def open(): RestClient = {
     try {
-      RestClient.builder(hosts: _*).build()
+      var builder = RestClient.builder(hosts: _*)
+      builder = basicAuth match {
+        case Some((username, password)) => builder.setHttpClientConfigCallback(
+          new BasicAuthProvider(username, password))
+        case None => builder}
+      builder.build()
     } catch {
       case e: Throwable => throw new StorageClientException(e.getMessage, e)
     }
   }
 }
-class StorageClient(val config: StorageClientConfig) extends BaseStorageClient
-with Logging {
+class StorageClient(val config: StorageClientConfig)
+  extends BaseStorageClient with Logging {
+
   override val prefix = "ES"
-  val client = ESClient(ESUtils.getHttpHosts(config))
+  val usernamePassword = (
+    config.properties.get("USERNAME"),
+    config.properties.get("PASSWORD"))
+  val optionalBasicAuth: Option[(String, String)] = usernamePassword match {
+    case (Some(username), Some(password)) => Some(username, password)
+    case (Some(username), None) => Some(username, "")
+    case (None, Some(password)) => Some("", password)
+    case (None, None) => None}
--- End diff --

Thanks for the feedback, @takezoe. Push this improvement now.
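The four-way match in the diff above can be read as a small pure function from storage properties to an optional credential pair. A minimal sketch, assuming a plain `Map[String, String]` stands in for `config.properties`; the `BasicAuthConfig` object and the `fromProperties` name are hypothetical and not part of PredictionIO:

```scala
// Hypothetical helper, for illustration only (not PredictionIO code).
object BasicAuthConfig {
  // Returns None when neither key is set; otherwise a (username, password)
  // pair in which a missing half defaults to the empty string, mirroring
  // the four cases in the diff.
  def fromProperties(props: Map[String, String]): Option[(String, String)] =
    (props.get("USERNAME"), props.get("PASSWORD")) match {
      case (None, None) => None
      case (username, password) =>
        Some((username.getOrElse(""), password.getOrElse("")))
    }
}
```

Collapsing the three "at least one value present" cases into a single branch keeps the behavior of the diff while leaving only one place where the empty-string default is decided.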
[GitHub] incubator-predictionio pull request #372: Elasticsearch basic HTTP authentic...
GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/372

Elasticsearch basic HTTP authentication

Add optional username-password configuration for the new Elasticsearch 5 client; in `pio-env.sh` config:

```bash
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
```

These credentials are sent in each Elasticsearch request as an HTTP Basic Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on Heroku](https://elements.heroku.com/addons/bonsai).

I'm looking into adding test coverage. (I have the Docker test suite set up now.)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio esclient-auth

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/372.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #372

commit 9f61541df44a5728450c3d25a79639e351e0ae6f
Author: Mars Hall <m...@heroku.com>
Date: 2017-04-19T18:00:36Z

Fix classpath computation error introduced when "storage got refactored" - @dszeto

commit 9ab99f6be9d0f018b3c900effe0be455f74f0046
Author: Mars Hall <m...@heroku.com>
Date: 2017-04-19T18:37:18Z

Optional Elasticsearch support for basic HTTP auth (username & password) using ES 5.3.0's "preemptive authentication"
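For reference, an HTTP Basic `Authorization` header value is the word `Basic` followed by the Base64 encoding of `username:password` (RFC 7617). A minimal sketch using only the JDK; the `BasicAuthHeader` object is hypothetical and is not the `BasicAuthProvider` class from this pull request:

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper, for illustration only (not PredictionIO code).
object BasicAuthHeader {
  // Builds the value of the Authorization header for HTTP Basic auth:
  // "Basic " + base64(username + ":" + password), encoded as UTF-8.
  def value(username: String, password: String): String = {
    val credentials = s"$username:$password".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(credentials)
  }
}
```

Base64 is an encoding, not encryption, so credentials sent this way are only protected when the connection to the cluster uses HTTPS.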
Re: [VOTE] Apache PredictionIO (incubating) 0.11.0 Release (RC2)
My non-binding vote for rc2!

[X] +1, accept RC as the official 0.11.0 release
[ ] -1, do not accept RC as the official 0.11.0 release because...

*Mars ( <> .. <> )

> On Apr 9, 2017, at 21:29, Steven Yan <steven@salesforce.com> wrote:
>
> [X] +1, accept RC as the official 0.11.0 release
> [ ] -1, do not accept RC as the official 0.11.0 release because...
>
> Thanks,
> Steven
>
> On Sun, Apr 9, 2017 at 5:21 PM, Donald Szeto <don...@apache.org> wrote:
>
>> This is the vote for 0.11.0 of Apache PredictionIO (incubating).
>>
>> The vote will run for at least 72 hours and will close on Apr 12th, 2017.
>>
>> RC2 fixes a usage bug where the pio command does not pass through
>> --driver-java-options to spark-submit.
>>
>> The release candidate artifacts can be downloaded here:
>> https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.11.0-incubating-rc2/
>>
>> Test results of RC2 can be found here:
>> https://travis-ci.org/apache/incubator-predictionio/builds/220381611
>>
>> Maven artifacts are built from the release candidate artifacts above, and
>> are provided as convenience for testing with engine templates. The Maven
>> artifacts are provided at the Maven staging repo here:
>> https://repository.apache.org/content/repositories/orgapachepredictionio-1016/
>>
>> All JIRAs completed for this release are tagged with 'FixVersion =
>> 0.11.0-incubating'. You can view them here:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320420=12338381
>>
>> The artifacts have been signed with Key : 8BF4ABEB
>>
>> Please vote accordingly:
>>
>> [ ] +1, accept RC as the official 0.11.0 release
>> [ ] -1, do not accept RC as the official 0.11.0 release because...
>>
[GitHub] incubator-predictionio-template-skeleton issue #5: Example Tests
Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio-template-skeleton/pull/5

I've been trying to support `DataSourceTest` using an in-memory H2 database. Unfortunately, that's currently failing with:

```
[info] DataSourceTest:
[info] readTraining
[info] - should return the data *** FAILED ***
[info] java.lang.IllegalStateException: Connection pool is not yet initialized.(name:'default)
[info] at scalikejdbc.ConnectionPool$$anonfun$get$1.apply(ConnectionPool.scala:57)
[info] at scalikejdbc.ConnectionPool$$anonfun$get$1.apply(ConnectionPool.scala:55)
[info] at scala.Option.getOrElse(Option.scala:120)
[info] at scalikejdbc.ConnectionPool$.get(ConnectionPool.scala:55)
[info] at scalikejdbc.ConnectionPool$.apply(ConnectionPool.scala:46)
[info] at scalikejdbc.DB$.connectionPool(DB.scala:150)
[info] at scalikejdbc.DB$.autoCommit(DB.scala:213)
[info] at org.apache.predictionio.data.storage.jdbc.JDBCApps.<init>(JDBCApps.scala:32)
[info] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[info] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
```

...which seems to be caused by [this `DB autoCommit` call in the JDBC initializer](https://github.com/apache/incubator-predictionio/blob/release/0.10.0/data/src/main/scala/org/apache/predictionio/data/storage/jdbc/JDBCApps.scala#L32).
[GitHub] incubator-predictionio-template-skeleton pull request #5: Example Tests
GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio-template-skeleton/pull/5

Example Tests

This skeleton repo seems to be the authoritative starting point for creating a new engine. Since testing is a great way to improve collaboration and reliability, what do you think about including example tests in the skeleton?

Here I implemented tests with a [ScalaTest](http://www.scalatest.org) suite which includes a mixin `SharedSingletonContext` to make a Spark context available as `SparkContext`.

Each of the engine-defined classes now has a tiny passing test: `AlgorithmTest`, `EngineTest`, `PreparatorTest`, & `ServingTest`.

`DataSourceTest` is the outlier and so currently tagged **ignore**. It's difficult to test in the skeleton, because it requires a database connection as well as database cleansing or transaction isolation between tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio-template-skeleton example-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio-template-skeleton/pull/5.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #5

commit a6954e9bc7bd53fc47e5150f6132757e740b42a5
Author: Mars Hall <m...@heroku.com>
Date: 2017-03-08T21:29:45Z

Implement tests in template skeleton.
Re: Remove engine registration
Hello folks,

Great to hear about this possibility. I've been working on running PredictionIO on Heroku https://www.heroku.com

Heroku's 12-factor architecture https://12factor.net prefers "stateless builds" to ensure that compiled artifacts result in processes which may be cheaply restarted, replaced, and scaled via process count & size. I imagine this stateless property would be valuable for others as well.

The fact that `pio build` inserts stateful metadata into a database causes ripples throughout the lifecycle of PIO engines on Heroku:

* An engine cannot be built for production without the production database available. When a production database contains PII (personally identifiable information) which has security compliance requirements, the build system may not be privileged to access that PII data. This also affects CI (continuous integration/testing), where engines would need to be rebuilt in production, defeating assurances CI is supposed to provide.

* The build artifacts cannot be reliably reused. "Slugs" at Heroku are intended to be stateless, so that you can rollback to a previous version during the lifetime of an app. With `pio build` causing database side-effects, there's a greater-than-zero probability of slug-to-metadata inconsistencies eventually surfacing in a long-running system.

From my user-perspective, a few changes to the CLI would fix it:

1. add a "skip registration" option, `pio build --without-engine-registration`
2. a new command `pio app register` that could be run separately in the built engine (before training)

Alas, I do not know PredictionIO internals, so I can only offer a suggestion for how this might be solved.

Donald, one specific note,

Regarding "No automatic version matching of PIO binary distribution and artifacts version used in the engine template": The Heroku slug contains the PredictionIO binary distribution used to build the engine, so there's never a version matching issue. I guess some systems might deploy only the engine artifacts to production where a pre-existing PIO binary is available, but that seems like a risky practice for long-running systems.

Thanks for listening,

*Mars Hall
Customer Facing Architect
Salesforce App Cloud / Heroku
San Francisco, California

> On Sep 16, 2016, at 10:42, Donald Szeto <don...@apache.org> wrote:
>
> Hi all,
>
> I want to start the discussion of removing engine registration. How many
> people actually take advantage of being able to run pio commands everywhere
> outside of an engine template directory? This will be a nontrivial change on
> the operational side so I want to gauge the potential impact to existing
> users.
>
> Pros:
> - Stateless build. This would work well with many PaaS.
> - Eliminate the "pio build" command once and for all.
> - Ability to use your own build system, i.e. Maven, Ant, Gradle, etc.
> - Potentially better experience with IDE since engine templates no longer
> depends on an SBT plugin.
>
> Cons:
> - Inability to run pio engine training and deployment commands outside of
> engine template directory.
> - No automatic version matching of PIO binary distribution and artifacts
> version used in the engine template.
> - A less unified user experience: from pio-build-train-deploy to build, then
> pio-train-deploy.
>
> Regards,
> Donald