
[GitHub] incubator-predictionio issue #441: pio batchpredict error

2017-10-11 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/441
  
Also, this PR is from `develop` branch to `master`, but AFAIK this project 
does not use `master`. So, it seems to be committed directly to the mainline 
already.


---

[GitHub] incubator-predictionio issue #441: pio batchpredict error

2017-10-11 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/441
  
I do not understand what this PR does.

Does this pull request fix that SparkException or cause it?

Is this a problem using Spark 2.2 and `pio batchpredict`?

What engine template does this occur for?



---

Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release

2017-10-02 Thread Mars Hall

Awesome! The docs for Batch Predict are finally live:

https://predictionio.incubator.apache.org/batchpredict/

On Mon, Oct 2, 2017 at 3:57 PM, Donald Szeto <don...@apache.org> wrote:

> Mars, it's fixed now.
>
> Re: https://issues.apache.org/jira/browse/INFRA-15208
>
> On Mon, Oct 2, 2017 at 3:37 PM, Donald Szeto <don...@apache.org> wrote:
>
> > The build went through but the site is not reflecting the new version. I
> > will open a ticket against ASF Infra to take a look.
> >
> > On Mon, Oct 2, 2017 at 2:47 PM, Donald Szeto <don...@apache.org> wrote:
> >
> >> The doc build failed at Scaladoc:
> >> https://builds.apache.org/job/PredictionIO-build-site/78/console
> >>
> >> And this has blocked the subsequent publish build. I'll just go ahead and
> >> disable Scaladoc generation for now to get the main site updated first.
> >>
> >> On Mon, Oct 2, 2017 at 12:59 PM, Mars Hall <mars.h...@salesforce.com>
> >> wrote:
> >>
> >>> Actually, I still don't see the updates for version 0.12.0.
> >>>
> >>> Why don't we see this "Batch Predictions" entry on the docs site?
> >>>
> >>> https://github.com/apache/incubator-predictionio/blob/develop/docs/manual/data/nav/main.yml#L65
> >>>
> >>> When I made that change locally, that entry did appear under "Deploying an
> >>> Engine" navigation section, but it's still not on the docs site:
> >>>   https://predictionio.incubator.apache.org/deploy/
> >>>
> >>>
> >>> On Mon, Oct 2, 2017 at 12:52 PM, Mars Hall <mars.h...@salesforce.com>
> >>> wrote:
> >>>
> >>> > Thank you Chan!
> >>> >
> >>> > On Thu, Sep 28, 2017 at 9:02 AM, Chan Lee <chanlee...@gmail.com>
> >>> wrote:
> >>> >
> >>> >> My apologies. The doc site has been updated now.
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > *Mars Hall
> >>> > 415-818-7039
> >>> > Customer Facing Architect
> >>> > Salesforce Platform / Heroku
> >>> > San Francisco, California
> >>> >
> >>> >
> >>> > <http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html>
> >>> >
> >>>
> >>
> >>
> >
>



-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California

Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release

2017-10-02 Thread Mars Hall

Actually, I still don't see the updates for version 0.12.0.

Why don't we see this "Batch Predictions" entry on the docs site?

https://github.com/apache/incubator-predictionio/blob/develop/docs/manual/data/nav/main.yml#L65

When I made that change locally, that entry did appear under "Deploying an
Engine" navigation section, but it's still not on the docs site:
  https://predictionio.incubator.apache.org/deploy/


On Mon, Oct 2, 2017 at 12:52 PM, Mars Hall <mars.h...@salesforce.com> wrote:

> Thank you Chan!
>
> On Thu, Sep 28, 2017 at 9:02 AM, Chan Lee <chanlee...@gmail.com> wrote:
>
>> My apologies. The doc site has been updated now.
>>
>



-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California


<http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html>

Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release

2017-10-02 Thread Mars Hall

Thank you Chan!

On Thu, Sep 28, 2017 at 9:02 AM, Chan Lee <chanlee...@gmail.com> wrote:

> My apologies. The doc site has been updated now.
>



-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California


<http://smart.salesforce.com/sig/mars.hall//us_mb/default/link.html>

Re: [ANNOUNCE] Apache PredictionIO 0.12.0-incubating Release

2017-09-28 Thread Mars Hall

It seems the documentation site has not been updated for the release: the
new batch prediction page does not appear.

What is the process for updating the doc site?

On Thu, Sep 28, 2017 at 1:49 PM, takako shimamoto <chiboch...@gmail.com>
wrote:

> Awesome sauce!
> Chan, I owe you a lot. Thanks!
>
>
> 2017-09-28 12:45 GMT+09:00 Paritosh Piplewar <parit...@greentoe.com>:
>
>> Congratulations.
>>
>> Sent from my iPhone
>>
>> On 28-Sep-2017, at 3:10 AM, Chan Lee <chan...@apache.org> wrote:
>>
>> The Apache PredictionIO team would like to announce the release of Apache
>> PredictionIO 0.12.0-incubating.
>>
>> Release notes are here:
>> https://github.com/apache/incubator-predictionio/blob/release/0.12.0/RELEASE.md
>>
>> Apache PredictionIO (incubating) is an open source Machine Learning Server
>> built on top of state-of-the-art open source stack, that enables developers
>> to manage and deploy production-ready predictive services for various kinds
>> of machine learning tasks.
>>
>> More details regarding Apache PredictionIO (incubating) can be found here:
>> http://predictionio.incubator.apache.org/
>>
>> The release artifacts can be downloaded here:
>> https://dist.apache.org/repos/dist/release/incubator/predictionio/0.12.0-incubating/
>>
>> All JIRAs completed for this release are tagged with 'FixVersion =
>> 0.12.0-incubating'; the JIRA release notes can be found here:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340591&projectId=12320420
>>
>> Thanks!
>> The Apache PredictionIO Team
>>
>> DISCLAIMER
>> Apache PredictionIO (incubating) is an effort undergoing incubation at the
>> Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
>> Incubation is required of all newly accepted projects until a further
>> review indicates that the infrastructure, communications, and decision
>> making process have stabilized in a manner consistent with other successful
>> ASF projects. While incubation status is not necessarily a reflection of
>> the completeness or stability of the code, it does indicate that the
>> project has yet to be fully endorsed by the ASF.
>>
>>
>


-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California

Re: [VOTE] Resolution to create a TLP from graduating Incubator podling

2017-09-25 Thread Mars Hall

+1 binding

On Tue, Sep 26, 2017 at 12:57 Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> +1 (binding)
>
> > On Sep 25, 2017, at 8:50 PM, Donald Szeto <don...@apache.org> wrote:
> >
> > Hi all,
> >
> > Based on previous discussions (
> > https://lists.apache.org/thread.html/2b4ef7c394584988cf0c99920824afaa60ee4c648d5c0069b1bf55c0@%3Cdev.predictionio.apache.org%3E
> > and
> > https://lists.apache.org/thread.html/1b06e510773ee1d315728e0ce25f220c9cf7d9e8ad601ec9dba4fe1d@%3Cdev.predictionio.apache.org%3E
> > ),
> > I would like to start a formal vote on graduating PredictionIO from an
> > Incubator podling to a top level project with the following resolution.
> > This thread will be forwarded to the Incubator general mailing list.
> >
> > Once again, Salesforce has already signed and executed an assignment
> > agreement to assign the PredictionIO mark to ASF.
> >
> > The graduation process we are following is described here:
> > http://incubator.apache.org/guides/graduation.html
> >
> > Once this vote passes, a discussion will be started on Incubator general,
> > followed by a vote once consensus is reached there. The vote will
> > run for at least 72 hours before closing at 9PM PST on 9/28/2017.
> >
> > Thank you all! Let's graduate.
> >
> > +1 (binding) from me.
> >
> > Regards,
> > Donald
> >
> > -
> >
> >X. Establish the Apache PredictionIO Project
> >
> >   WHEREAS, the Board of Directors deems it to be in the best
> >   interests of the Foundation and consistent with the
> >   Foundation's purpose to establish a Project Management
> >   Committee charged with the creation and maintenance of
> >   open-source software, for distribution at no charge to
> >   the public, related to a machine learning server built on top of
> >   state-of-the-art open source stack, that enables developers to manage
> >   and deploy production-ready predictive services for various kinds of
> >   machine learning tasks.
> >
> >   NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> >   Committee (PMC), to be known as the "Apache PredictionIO Project",
> >   be and hereby is established pursuant to Bylaws of the
> >   Foundation; and be it further
> >
> >   RESOLVED, that the Apache PredictionIO Project be and hereby is
> >   responsible for the creation and maintenance of software
> >   related to a machine learning server built on top of
> >   state-of-the-art open source stack, that enables developers to manage
> >   and deploy production-ready predictive services for various kinds of
> >   machine learning tasks;
> >   and be it further
> >
> >   RESOLVED, that the office of "Vice President, Apache PredictionIO" be
> >   and hereby is created, the person holding such office to
> >   serve at the direction of the Board of Directors as the chair
> >   of the Apache PredictionIO Project, and to have primary responsibility
> >   for management of the projects within the scope of
> >   responsibility of the Apache PredictionIO Project; and be it further
> >
> >   RESOLVED, that the persons listed immediately below be and
> >   hereby are appointed to serve as the initial members of the
> >   Apache PredictionIO Project:
> >
> > * Alex Merritt <emergentor...@apache.org>
> > * Andrew Kyle Purtell <apurt...@apache.org>
> > * Chan Lee <chan...@apache.org>
> > * Donald Szeto <don...@apache.org>
> > * Felipe Oliveira <fel...@apache.org>
> > * James Taylor <jtay...@apache.org>
> > * Justin Yip <yipjus...@apache.org>
> > * Kenneth Chan <kenn...@apache.org>
> > * Lars Hofhansl <la...@apache.org>
> > * Lee Moon Soo <m...@apache.org>
> > * Luciano Resende <lrese...@apache.org>
> > * Marcin Ziemiński <zie...@apache.org>
> > * Marco Vivero <mviv...@apache.org>
> > * Mars Hall <m...@apache.org>
> > * Matthew Tovbin <tovb...@apache.org>
> > * Naoki Takezoe <take...@apache.org>
> > * Pat Ferrel <p...@apache.org>
> > * Paul Li <pau...@apache.org>
> > * Shinsuke Sugaya <shins...@apache.org>
> > * Simon Chan <sim...@apache.org>
>

Re: [DISCUSS] Resolution to create a TLP from graduating Incubator podling

2017-09-22 Thread Mars Hall

Thank you for creating this resolution Donald.

I move that we start the vote, pending any additional feedback from the
group.

Best regards,

On Thu, Sep 21, 2017 at 12:19 PM, Andrew Purtell <apurt...@apache.org>
wrote:

> This looks great Donald, and I'm so glad you accepted the role of Chair.
>
> This part of the Special Order will establish the project description text
> which must appear at the top of every report to the Board:
>
> [...] software related to *a machine learning server built on top of
> state-of-the-art open source stack, that enables developers to manage and
> deploy production-ready predictive services for various kinds of machine
> learning tasks*
>
>
> It is in effect the Apache in-house elevator pitch to other projects and
> PMC or anyone reading the reports. This is the opportunity to improve this
> description, if desired. It could also be fine as-is.
>
>
> On Thu, Sep 21, 2017 at 10:29 AM, Donald Szeto <don...@apache.org> wrote:
>
> > Hi all,
> >
> > Based on the previous discussion (
> > https://lists.apache.org/thread.html/2b4ef7c394584988cf0c99920824afaa60ee4c648d5c0069b1bf55c0@%3Cdev.predictionio.apache.org%3E),
> > I would like to start discussing a graduation resolution and reach a
> > consensus before starting a community vote on the following. Please read
> > the resolution carefully, and voice any concerns you may have. If you are a
> > current PMC member, please make sure your name is listed unless you have
> > already asked to be excluded. We will start an official community vote when
> > a consensus is reached.
> >
> > Regarding the PredictionIO trademark assignment, Salesforce has signed and
> > executed an assignment agreement, which is only pending the ASF's
> > countersignature.
> >
> > The graduation process we are following is described here:
> > http://incubator.apache.org/guides/graduation.html
> >
> > Thank you all! Let's graduate.
> >
> > Regards,
> > Donald
> >
> > -
> >
> > X. Establish the Apache PredictionIO Project
> >
> >WHEREAS, the Board of Directors deems it to be in the best
> >interests of the Foundation and consistent with the
> >Foundation's purpose to establish a Project Management
> >Committee charged with the creation and maintenance of
> >open-source software, for distribution at no charge to
> >the public, related to a machine learning server built on top of
> > >state-of-the-art open source stack, that enables developers to manage
> > >and deploy production-ready predictive services for various kinds of
> > >machine learning tasks.
> >
> >NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> >Committee (PMC), to be known as the "Apache PredictionIO Project",
> >be and hereby is established pursuant to Bylaws of the
> >Foundation; and be it further
> >
> >RESOLVED, that the Apache PredictionIO Project be and hereby is
> >responsible for the creation and maintenance of software
> >related to a machine learning server built on top of
> > >state-of-the-art open source stack, that enables developers to manage
> > >and deploy production-ready predictive services for various kinds of
> > >machine learning tasks;
> >and be it further
> >
> > >RESOLVED, that the office of "Vice President, Apache PredictionIO" be
> > >and hereby is created, the person holding such office to
> > >serve at the direction of the Board of Directors as the chair
> > >of the Apache PredictionIO Project, and to have primary responsibility
> > >for management of the projects within the scope of
> > >responsibility of the Apache PredictionIO Project; and be it further
> >
> >RESOLVED, that the persons listed immediately below be and
> >hereby are appointed to serve as the initial members of the
> >Apache PredictionIO Project:
> >
> >  * Alex Merritt <emergentor...@apache.org>
> >  * Andrew Kyle Purtell <apurt...@apache.org>
> >  * Chan Lee <chan...@apache.org>
> >  * Donald Szeto <don...@apache.org>
> >  * Felipe Oliveira <fel...@apache.org>
> >  * James Taylor <jtay...@apache.org>
> >  * Justin Yip <yipjus...@apache.org>
> >  * Kenneth Chan <kenn...@apache.org>
> >  * Lars Hofhansl &l

Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC3)

2017-09-18 Thread Mars Hall

+1 binding

I checked:
- build, train, deploy, & batchpredict
- complete Elasticsearch 5.x functionality
- Binaries work directly for Heroku deployment

Such an exciting release! Thank you Chan,

*Mars


On Sun, Sep 17, 2017 at 11:31 AM, Chan Lee <chanlee...@gmail.com> wrote:
>
> > This is the vote for 0.12.0 of Apache PredictionIO (incubating).
> >
> > The vote will run for at least 72 hours and will close on Sep 20th, 2017.
> >
> > The release candidate artifacts can be downloaded here:
> > https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.12.0-incubating-rc3
> >
> > Test results of RC3 can be found here:
> > https://travis-ci.org/apache/incubator-predictionio/builds/276558626
> >
> > Maven artifacts are built from the release candidate artifacts above, and
> > are provided as convenience for testing with engine templates. The Maven
> > artifacts are provided at the Maven staging repo here:
> > https://repository.apache.org/content/repositories/orgapachepredictionio-1021/
> >
> > All JIRAs completed for this release are tagged with 'FixVersion =
> > 0.12.0-incubating'. You can view them here:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340591&projectId=12320420
> >
> > The artifacts have been signed with Key: ytX8GpWv
> >
> > Please vote accordingly:
> >
> > [ ] +1, accept RC as the official 0.12.0 release
> > [ ] -1, do not accept RC as the official 0.12.0 release because...
> >
>

[GitHub] incubator-predictionio pull request #435: Revise release notes: clarify brea...

2017-09-18 Thread mars

Github user mars closed the pull request at:

https://github.com/apache/incubator-predictionio/pull/435


---

[GitHub] incubator-predictionio issue #435: Revise release notes: clarify breaking ch...

2017-09-18 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/435
  
Merged to [release-0.12.0](https://github.com/apache/incubator-predictionio/tree/release/0.12.0) by @chanlee514


---

Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC2)

2017-09-15 Thread Mars Hall

Sorry I didn't catch this before RC1. Thank you Chan.

On Fri, Sep 15, 2017 at 12:35 PM, Chan Lee <chanlee...@gmail.com> wrote:

> Mars, this seems like a necessary change, so I'll create RC3 with the PR
> and additional updates here:
> http://predictionio.incubator.apache.org/resources/upgrade/
>
> If there are changes anyone would like to add, please let me know by today.
> I'll patch up another release tonight and send out a new email.
>
> Thanks,
> Chan
>
>
> On Fri, Sep 15, 2017 at 11:54 AM, Donald Szeto <don...@apache.org> wrote:
>
> > Votes are tied to tag/commit by ASF convention, so a new RC and vote will
> > be required.
> >
> > On Fri, Sep 15, 2017 at 10:57 AM Mars Hall <mars.h...@salesforce.com>
> > wrote:
> >
> > > I just opened a release notes PR against apache:release/0.12.0, because
> > > that seems to be the right place.
> > >
> > > Chan, will that work okay with the release process?
> > >
> > > On Fri, Sep 15, 2017 at 10:20 AM, Mars Hall <mars.h...@salesforce.com>
> > > wrote:
> > >
> > > > Also, I'd love to directly link the PIO-* issue numbers to JIRA.
> > > >
> > > > > On Fri, Sep 15, 2017 at 10:19 AM, Mars Hall <mars.h...@salesforce.com>
> > > > > wrote:
> > > > >
> > > > >> RC2 is working perfectly.
> > > > >>
> > > > >> I see a few issues with the release notes:
> > > > >>
> > > > >>
> > > > >>   - PIO-95 should be "Raised request timeout for REST API to 35-seconds"
> > > > >>   - PIO-102, PIO-106, PIO-117, PIO-118, PIO-120 actually include a
> > > > >>   breaking change to the Elasticsearch 5.x StorageClient interface. I think
> > > > >>   these should be enumerated more explicitly with one of them called out
> > > > >>   in a "Breaking changes" section.
> > > > >>
> > > > >> May I revise RELEASE.md on develop to fix these issues? Does that require
> > > > >> restarting the vote for an RC3?
> > > > >>
> > > > >>
> > > >>
> > > > >> On Thu, Sep 14, 2017 at 11:49 PM, Donald Szeto <don...@apache.org> wrote:
> > > > >>
> > > > >>> I believe those are fixed by PIO-60, PIO-62 and PIO-63 in the release
> > > > >>> notes.
> > > > >>>
> > > > >>> +1 binding from me
> > > > >>>
> > > > >>> On Thu, Sep 14, 2017 at 2:13 PM Pat Ferrel <p...@occamsmachete.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > The last release was hung up by the IPMC regarding content licensing
> > > > >>> > issues and libraries used by the doc site, which we promised to address in
> > > > >>> > this release. Have these been resolved, don’t recall the specifics? It
> > > > >>> > would be great to fly through the IPMC vote without issue.
> > > > >>> >
> > > > >>> >
> > > > >>> > On Sep 14, 2017, at 2:06 PM, Chan Lee <chanlee...@gmail.com> wrote:
> > > > >>> >
> > > > >>> > This is the vote for 0.12.0 of Apache PredictionIO (incubating).
> > > > >>> >
> > > > >>> > The vote will run for at least 72 hours and will close on Sep 17th, 2017.
> > > > >>> >
> > > > >>> > The release candidate artifacts can be downloaded here:
> > > > >>> > https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.12.0-incubating-rc2
> > > > >>> >
> > > > >>> > Test results of RC1 can be found here:
> > > > >>> > https://travis-ci.org/apache/incubator-predictionio/builds/275634960
> > > > >>> >
> > > > >>> > Maven artifacts are built from the release candidate artifacts above, and
> > > > >>> > are provided as convenience for testing with engine templates. The Maven
> > > > >>> > artifacts are provided at the Maven staging repo here:
> > > > >>> >
> > > > >>> > https://repository.apache.org/content/repositories/orgapachepredictionio-1020
> > > > >>> >
> > > > >>> > All JIRAs completed for this release are tagged with 'FixVersion =
> > > > >>> > 0.12.0-incubating'. You can view them here:
> > > > >>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340591&projectId=12320420
> > > > >>> >
> > > > >>> > The artifacts have been signed with Key: ytX8GpWv
> > > > >>> >
> > > > >>> > Please vote accordingly:
> > > > >>> >
> > > > >>> > [ ] +1, accept RC as the official 0.12.0 release
> > > > >>> > [ ] -1, do not accept RC as the official 0.12.0 release because...
> > > >>> >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> *Mars Hall
> > > >> 415-818-7039 <(415)%20818-7039>
> > > >> Customer Facing Architect
> > > >> Salesforce Platform / Heroku
> > > >> San Francisco, California
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > *Mars Hall
> > > > 415-818-7039 <(415)%20818-7039>
> > > > Customer Facing Architect
> > > > Salesforce Platform / Heroku
> > > > San Francisco, California
> > > >
> > >
> > >
> > >
> > > --
> > > *Mars Hall
> > > 415-818-7039
> > > Customer Facing Architect
> > > Salesforce Platform / Heroku
> > > San Francisco, California
> > >
> >
>



-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California

Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC2)

2017-09-15 Thread Mars Hall

I just opened a release notes PR against apache:release/0.12.0, because
that seems to be the right place.

Chan, will that work okay with the release process?

On Fri, Sep 15, 2017 at 10:20 AM, Mars Hall <mars.h...@salesforce.com>
wrote:

> Also, I'd love to directly link the PIO-* issue numbers to JIRA.
>
> On Fri, Sep 15, 2017 at 10:19 AM, Mars Hall <mars.h...@salesforce.com>
> wrote:
>
>> RC2 is working perfectly.
>>
>> I see a few issues with the release notes:
>>
>>
>>   - PIO-95 should be "Raised request timeout for REST API to 35-seconds"
>>   - PIO-102, PIO-106, PIO-117, PIO-118, PIO-120 actually include a
>>   breaking change to the Elasticsearch 5.x StorageClient interface. I think
>>   these should be enumerated more explicitly with one of them called out in a
>>   "Breaking changes" section.
>>
>> May I revise RELEASE.md on develop to fix these issues? Does that require
>> restarting the vote for an RC3?
>>
>>
>> On Thu, Sep 14, 2017 at 11:49 PM, Donald Szeto <don...@apache.org> wrote:
>>
>>> I believe those are fixed by PIO-60, PIO-62 and PIO-63 in the release
>>> notes.
>>>
>>> +1 binding from me
>>>
>>> On Thu, Sep 14, 2017 at 2:13 PM Pat Ferrel <p...@occamsmachete.com>
>>> wrote:
>>>
>>> > The last release was hung up by the IPMC regarding content licensing
>>> > issues and libraries used by the doc site, which we promised to address in
>>> > this release. Have these been resolved, don’t recall the specifics? It
>>> > would be great to fly through the IPMC vote without issue.
>>> >
>>> >
>>> > On Sep 14, 2017, at 2:06 PM, Chan Lee <chanlee...@gmail.com> wrote:
>>> >
>>> > This is the vote for 0.12.0 of Apache PredictionIO (incubating).
>>> >
>>> > The vote will run for at least 72 hours and will close on Sep 17th, 2017.
>>> >
>>> > The release candidate artifacts can be downloaded here:
>>> > https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.12.0-incubating-rc2
>>> >
>>> > Test results of RC1 can be found here:
>>> > https://travis-ci.org/apache/incubator-predictionio/builds/275634960
>>> >
>>> > Maven artifacts are built from the release candidate artifacts above, and
>>> > are provided as convenience for testing with engine templates. The Maven
>>> > artifacts are provided at the Maven staging repo here:
>>> >
>>> > https://repository.apache.org/content/repositories/orgapachepredictionio-1020
>>> >
>>> > All JIRAs completed for this release are tagged with 'FixVersion =
>>> > 0.12.0-incubating'. You can view them here:
>>> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340591&projectId=12320420
>>> >
>>> > The artifacts have been signed with Key: ytX8GpWv
>>> >
>>> > Please vote accordingly:
>>> >
>>> > [ ] +1, accept RC as the official 0.12.0 release
>>> > [ ] -1, do not accept RC as the official 0.12.0 release because...
>>> >
>>> >
>>>
>>



-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California

[GitHub] incubator-predictionio pull request #435: Revise release notes: clarify brea...

2017-09-15 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/435

Revise release notes: clarify breaking changes; link issue IDs to JIRA



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/435.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #435


commit cbe7063c869611f7ece0613a16cf0bebf22e404e
Author: Mars Hall <m...@users.noreply.github.com>
Date:   2017-09-15T17:53:02Z

Revise release notes: clarify breaking changes; link issue IDs to JIRA
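
The "link issue IDs to JIRA" part of this change can be illustrated with a small sketch. This is not the content of the PR itself; it is a Python illustration, assuming the release notes are Markdown and that issue IDs follow the `PIO-<number>` pattern with the `https://issues.apache.org/jira/browse/` URL prefix seen elsewhere in this thread. The function name `link_issue_ids` is hypothetical.

```python
import re

JIRA_BROWSE = "https://issues.apache.org/jira/browse/"

# Match bare IDs such as PIO-95. The lookbehind skips IDs that already sit
# inside a Markdown link label ("[PIO-95]") or inside a URL (".../PIO-95").
ISSUE_ID = re.compile(r"(?<![\[/\w-])(PIO-\d+)(?![\]\w])")


def link_issue_ids(text):
    """Wrap each bare PIO-<n> issue ID in a Markdown link to its JIRA page."""
    return ISSUE_ID.sub(
        lambda m: f"[{m.group(1)}]({JIRA_BROWSE}{m.group(1)})", text
    )
```

Because already-linked IDs are skipped, running the function twice over the same text should leave it unchanged after the first pass.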




---

[jira] [Updated] (PIO-95) Raise request timeout for REST API

2017-09-15 Thread Mars Hall (JIRA)


[ https://issues.apache.org/jira/browse/PIO-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mars Hall updated PIO-95:
-
Summary: Raise request timeout for REST API (was: Configurable request timeout for REST API)

> Raise request timeout for REST API
> --
>
> Key: PIO-95
> URL: https://issues.apache.org/jira/browse/PIO-95
> Project: PredictionIO
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.11.0-incubating
> Reporter: Mars Hall
> Assignee: Mars Hall
> Fix For: 0.12.0-incubating
>
>
> We've found the default 20-second REST API request timeout is too short for 
> our batch-prediction use cases. We're running PredictionIO on Heroku which 
> has its own [timeout starting at 
> 30-seconds|https://devcenter.heroku.com/articles/limits#http-timeouts]. So 
> we'd prefer a more generous or easily configurable timeout to allow Heroku's 
> routing layer to impose & track this limit in the platform layer.
> I investigated how to configure this and found [Spray 
> `application.conf`|http://spray.io/documentation/1.2.4/spray-can/configuration/].
>  This PR simply increases the timeout.
> I would love guidance on how we might extract this config into an environment 
> variable or a value in `pio-env.sh`.
> Investigation / implementation PR: 
> https://github.com/apache/incubator-predictionio/pull/394
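
One possible shape for the environment-variable idea raised above is to read the timeout from the environment and fall back to a default. The sketch below is in Python purely for illustration (PredictionIO itself is written in Scala and configures Spray through `application.conf`). The variable name `PIO_REQUEST_TIMEOUT` is hypothetical and is not an existing PredictionIO setting; the 35-second default is the value mentioned in the 0.12.0 release-notes discussion.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 35  # value cited in the 0.12.0 release-notes thread


def request_timeout_seconds(env=None, name="PIO_REQUEST_TIMEOUT"):
    """Return the request timeout in seconds.

    `name` is a hypothetical environment variable, not a real PredictionIO
    setting. Missing, non-numeric, or non-positive values fall back to the
    default instead of raising.
    """
    env = os.environ if env is None else env
    raw = env.get(name, "").strip()
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS
```

Falling back silently keeps a misconfigured deployment serving requests with the default; a stricter variant could raise on invalid input instead.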



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Re: [VOTE] Apache PredictionIO (incubating) 0.12.0 Release (RC2)

2017-09-15 Thread Mars Hall

RC2 is working perfectly.

I see a few issues with the release notes:


   - PIO-95 should be "Raised request timeout for REST API to 35-seconds"
   - PIO-102, PIO-106, PIO-117, PIO-118, PIO-120 actually include a
   breaking change to the Elasticsearch 5.x StorageClient interface. I think
   these should be enumerated more explicitly with one of them called out in a
   "Breaking changes" section.

May I revise RELEASE.md on develop to fix these issues? Does that require
restarting the vote for an RC3?


On Thu, Sep 14, 2017 at 11:49 PM, Donald Szeto <don...@apache.org> wrote:

> I believe those are fixed by PIO-60, PIO-62 and PIO-63 in the release
> notes.
>
> +1 binding from me
>
> On Thu, Sep 14, 2017 at 2:13 PM Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > The last release was hung up by the IPMC regarding content licensing
> > issues and libraries used by the doc site, which we promised to address in
> > this release. Have these been resolved, don’t recall the specifics? It
> > would be great to fly through the IPMC vote without issue.
> >
> >
> > On Sep 14, 2017, at 2:06 PM, Chan Lee <chanlee...@gmail.com> wrote:
> >
> > This is the vote for 0.12.0 of Apache PredictionIO (incubating).
> >
> > The vote will run for at least 72 hours and will close on Sep 17th, 2017.
> >
> > The release candidate artifacts can be downloaded here:
> > https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.12.0-incubating-rc2
> >
> > Test results of RC1 can be found here:
> > https://travis-ci.org/apache/incubator-predictionio/builds/275634960
> >
> > Maven artifacts are built from the release candidate artifacts above, and
> > are provided as convenience for testing with engine templates. The Maven
> > artifacts are provided at the Maven staging repo here:
> >
> > https://repository.apache.org/content/repositories/orgapachepredictionio-1020
> >
> > All JIRAs completed for this release are tagged with 'FixVersion =
> > 0.12.0-incubating'. You can view them here:
> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12340591&projectId=12320420
> >
> > The artifacts have been signed with Key: ytX8GpWv
> >
> > Please vote accordingly:
> >
> > [ ] +1, accept RC as the official 0.12.0 release
> > [ ] -1, do not accept RC as the official 0.12.0 release because...
> >
> >
>



-- 
*Mars Hall
415-818-7039
Customer Facing Architect
Salesforce Platform / Heroku
San Francisco, California

[jira] [Created] (PIO-121) Authentication for Engine's HTTP API

2017-09-12 Thread Mars Hall (JIRA)

Mars Hall created PIO-121:
-

Summary: Authentication for Engine's HTTP API
Key: PIO-121
URL: https://issues.apache.org/jira/browse/PIO-121
Project: PredictionIO
Issue Type: Improvement
Components: Core
Affects Versions: 0.12.0-incubating
Reporter: Mars Hall


PredictionIO already supports key-based authentication for accessing the 
{{/events.json}} API, but is missing any type of auth for the {{/queries.json}} 
API and {{/}} status page.

Comprehensive authentication would simplify deployment to cloud platforms by 
eliminating the current requirement to deploy on a private network in order to 
prevent public access.

As a first step, adding key-based auth to the Engine APIs that matches the 
Eventserver API {{accessKey}} behavior would be a huge step forward.
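
To make the proposed first step concrete, here is a minimal sketch of a key check for an Engine request. It is written in Python for illustration only and is not PredictionIO code. It assumes the key arrives as an `accessKey` query parameter, mirroring the Eventserver behavior referenced above; the function name `is_authorized` and the way the expected key is supplied are hypothetical.

```python
import hmac
from urllib.parse import parse_qs, urlsplit


def is_authorized(request_url, expected_key):
    """Return True when the URL carries an `accessKey` equal to expected_key.

    An empty expected key never authorizes, so a missing server-side
    configuration cannot accidentally open the endpoint.
    """
    params = parse_qs(urlsplit(request_url).query)
    supplied = params.get("accessKey", [""])[0]
    # compare_digest avoids leaking key contents through timing differences
    return bool(expected_key) and hmac.compare_digest(
        supplied.encode(), expected_key.encode()
    )
```

A server would call this before handling `/queries.json` and respond with an authentication error when it returns False.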



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Re: JIRAs to include in 0.12 Release

2017-09-11 Thread Mars Hall

Wow, so many resolved issues, so much progress! Thank you for sharing this
list Chan.

[PIO-120] Process hangs if Elasticsearch is not available during train
(pending)
  https://issues.apache.org/jira/browse/PIO-120

Would one of you folks review this simple fix PR? Then, I'll merge it to
make 0.12.0.
  https://github.com/apache/incubator-predictionio/pull/432

*Mars

[GitHub] incubator-predictionio issue #432: [PIO-120] Process hangs if Elasticsearch ...

2017-09-07 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/432
  
Would be great to have this included in 0.12.0 release.


---

[GitHub] incubator-predictionio pull request #432: [PIO-120] Process hangs if Elastic...

2017-09-07 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/432

[PIO-120] Process hangs if Elasticsearch is not available during train

Fixes [PIO-120](https://issues.apache.org/jira/browse/PIO-120)

This changeset ensures that the process exits gracefully after ES 
connection error.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio fix-es-hang-on-train

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #432


commit f1c7337e246c9bd2bed5cc080efcf3dc81e4b055
Author: Mars Hall <m...@heroku.com>
Date:   2017-09-07T21:38:46Z

Graceful exit after ES connection error during train.




---

[GitHub] incubator-predictionio issue #428: [PIO-117] Cannot delete event data on ESL...

2017-09-07 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/428
  
👍 looks good


---

[GitHub] incubator-predictionio issue #430: [PIO-119] Bump up Elasticsearch to 5.5.2

2017-09-07 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/430
  
Just tested build, train, batchpredict, & deploy locally with ES 5.5.2.

👍 looks good!


---

[jira] [Updated] (PIO-120) Process hangs if Elasticsearch is not available during train

2017-09-07 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-120:
--
External issue URL: 
https://github.com/apache/incubator-predictionio/pull/432

> Process hangs if Elasticsearch is not available during train
> 
>
> Key: PIO-120
> URL: https://issues.apache.org/jira/browse/PIO-120
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.12.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> I noticed that, when Elasticsearch is configured as meta storage, `pio train` 
> will hang with the following error unless Elasticsearch is on-line/available:
> {code}
> Exception in thread "main" java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171)
>   at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145)
>   at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
>   at org.apache.predictionio.shaded.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
>   at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
>   at java.lang.Thread.run(Thread.java:745)
> {code}




[jira] [Created] (PIO-120) Process hangs if Elasticsearch is not available during train

2017-09-07 Thread Mars Hall (JIRA)

Mars Hall created PIO-120:
-

 Summary: Process hangs if Elasticsearch is not available during 
train
 Key: PIO-120
 URL: https://issues.apache.org/jira/browse/PIO-120
 Project: PredictionIO
  Issue Type: Bug
  Components: Core
Affects Versions: 0.12.0-incubating
Reporter: Mars Hall
Assignee: Mars Hall


I noticed that, when Elasticsearch is configured as meta storage, `pio train` 
will hang with the following error unless Elasticsearch is on-line/available:

{code}
Exception in thread "main" java.net.ConnectException: Connection refused
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
  at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:171)
  at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:145)
  at org.apache.predictionio.shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
  at org.apache.predictionio.shaded.org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
  at org.apache.predictionio.shaded.org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
  at java.lang.Thread.run(Thread.java:745)
{code}





[GitHub] incubator-predictionio issue #401: [PIO-72] Fix class loading for pio-shell

2017-09-06 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/401
  
Yes @BrianOn99, I do believe [class loading for pio-shell is 
fixed](https://github.com/apache/incubator-predictionio/blob/develop/bin/pio-shell#L59)
 for the next release, or if you `make-distribution.sh` on main `develop` 
branch, you'll get these fixes now.


---

Re: Graduation to TLP

2017-09-05 Thread Mars Hall

Please continue to include me as a committer and PMC member.

After some deliberation, I cannot take on further responsibility as VP at
this time.

*Mars


On Tue, Sep 5, 2017 at 10:32 AM, Donald Szeto <don...@apache.org> wrote:

> Thanks for the clarification Pat! It always helps to have Apache veterans
> provide historical context to these processes.
>
> As for me, I'd like to remain as PMC and committer.
>
> I like the idea of polling the current committers and PMC, but like you
> said, most of them got pretty busy and may not be reading mailing list in a
> while. Maybe let me try a shout out here and see if anyone would
> acknowledge it, so that we know whether a poll will be effective.
>
> *>> If you're a PMC member or committer who sees this line but hasn't been
> replying to this thread, please acknowledge. <<*
>
> Regarding the maturity model, this is my perception right now:
> - CD10, CD20, CD30, CD40 (and we start to have CD50 as well)
> - LC10, LC20, LC30, LC40, LC50
> - RE10, RE20, RE30, RE50 (I think we hope to also do RE40 with 0.12)
> - QU10, QU30, QU40, QU50 (we should put a bit of focus to QU20)
> - CO10, CO20, CO30, CO40, CO60, CO70 (for CO50, I think we've been
> operating under the assumption that PMC and contributors are pretty
> standard definitions by ASF. We can call those out explicitly.)
> - CS10, CS50 (We are also assuming implicitly CS20, CS30, and CS40 from
> main ASF doc)
> - IN10, IN20
>
> Let me know what you think.
>
> On Fri, Sep 1, 2017 at 10:32 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> > The Chair, PMC, and Committers may be different after graduation.
> > PMC/committers are sometimes not active committers but can have a valuable
> > role as mentors, in non-technical roles, as support people on the mailing
> > list, or as sometimes committers who don’t seem very active but come in
> > every so often to make a key contribution. So I hope this doesn’t become a
> > time to prune too deeply. I’d suggest we only do that if one of the
> > committers has done something to lessen our project maturity or wants to be
> > left out for their own reasons. An example of bad behavior is someone
> > trying to exert corporate dominance (which is severely frowned on by the
> > ASF). Another would be someone who is disruptive to the point of destroying
> > team effectiveness. I personally haven’t seen any of this but purposely
> > don’t read everything so chime in here.
> >
> > It would be good to have people declare their interest-level. As for me,
> > I’d like to remain on the PMC as a committer but have no interest in Chair.
> > Since people can become busy periodically and not read @dev (me?) we could,
> > maybe should, poll the current committers and PMC to get the lists ready
> > for the graduation proposal.
> >
> >
> > Don’t forget that we are not just asking for dev community opinion about
> > graduation. We are also asking that people check things like the Maturity
> > Checklist to see if we are ready:
> > http://community.apache.org/apache-way/apache-project-maturity-model.html
> > People seem fairly enthusiastic about applying for graduation, but are
> > there things we need to do beforehand? The goal is to show that we do not
> > require the second level check for decisions that the IPMC provides. The
> > last release required no changes but had a proviso about content licenses.
> > This next release should fly through without provisos IMHO. Are there other
> > things we should do?
> >
> >
> > On Sep 1, 2017, at 6:16 AM, takako shimamoto <chiboch...@gmail.com> wrote:
> >
> > I entirely agree with everyone else.
> > I hope the PIO community will become more active after graduation.
> >
> > > 2. If we are to graduate, who should we include in the list of the
> > > initial PMC?
> >
> > Aren't all present IPMC members included in the list of the initial PMC?
> >
> > Personally, I think we may as well check and see if present IPMC
> > members intend to join the initial PMC for graduation.
> > Members who declare their intent to join will surely
> > contribute to the project.
> > Contributing is not only about developing the program; actively
> > responding to email or fixing documentation is a great contribution too.
> >
> >
> > 2017-08-29 14:20 GMT+09:00 Donald Szeto <don...@apache.org>:
> > > Hi all,
> > >
> > > Since the ASF Board meeting in May (
> > > htt

[GitHub] incubator-predictionio issue #401: [PIO-72] Fix class loading for pio-shell

2017-09-05 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/401
  
Hi @BrianOn99,

Adding that `--jars` option to `pio-shell` command is the right solution, 
and then the "No suitable driver found" error can be solved by adding the 
Postgres driver to your PredictionIO install:

1. download [Postgres JDBC 
driver](https://jdbc.postgresql.org/download.html) (probably the newest one for 
Java 8)
2. put it in the PredictionIO distribution's `lib/` directory (this 
directory is sibling to the `bin/` directory where the `pio` command is 
located; any jars in that directory are automatically added to the classpath 
for `pio` commands)

We're working on releasing 0.12!




---

Re: Graduation to TLP

2017-08-30 Thread Mars Hall

Thank you Donald for leading the charge here,

From my perspective PredictionIO is already Apache in process & title.
Graduation seems quite natural to reach top-level recognition.

I'm interested in helping with PMC duties. Would be great to understand
what the VP vs Member responsibilities look like.

Let's graduate. +1

*Mars


On Wed, Aug 30, 2017 at 15:21 Pat Ferrel <p...@occamsmachete.com> wrote:

> I have had several people tell me they want to wait until PIO is not
> incubating before using it. This even after explaining that “incubating”
> has more to do with getting into the Apache Way of doing things and has no
> direct link to quality or community. I can only conclude from this that
> “incubating” is holding back adoption.
>
> And yet we have absorbed the Apache Way and will have at least 3 releases
> (including 12) while incubating. We have brought in a fair number of new
> committers and seem to have a healthy community of users.
>
> +1 for a push to graduate.
>
>
> On Aug 28, 2017, at 10:20 PM, Donald Szeto <don...@apache.org> wrote:
>
> Hi all,
>
> Since the ASF Board meeting in May (
>
> http://apache.org/foundation/records/minutes/2017/board_minutes_2017_05_17.txt
> ),
> PredictionIO has been considered nearing graduation and I think we are
> almost there. I am kickstarting this thread so that we can discuss on these
> 3 things:
>
> 1. Does the development community feel ready to graduate?
> 2. If we are to graduate, who should we include in the list of the initial
> PMC?
> 3. If we are to graduate, who should be the VP of the initial PMC?
>
> These points are relevant for graduation. Please take a look at the
> official graduation guide:
> http://incubator.apache.org/guides/graduation.html.
>
> In addition, Sara and I have been working to transfer the PredictionIO
> trademark to the ASF. We will keep you updated with our progress.
>
> I would also like to propose to cut a 0.12.0 release by merging JIRAs that
> have a target version set to 0.12.0-incubating for graduation. 0.12.0 will
> contain cleanups for minor license and copyright issues that were pointed
> out in previous releases by IPMC.
>
> Let me know what you think.
>
> Regards,
> Donald
>

[jira] [Resolved] (PIO-115) Cache name-to-ID lookups for Storage app & channel

2017-08-29 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall resolved PIO-115.
---
Resolution: Fixed

> Cache name-to-ID lookups for Storage app & channel
> --
>
> Key: PIO-115
> URL: https://issues.apache.org/jira/browse/PIO-115
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> When stress testing the Universal Recommender with high-concurrency HTTP/REST 
> queries, we observed that Elasticsearch traffic consisted mostly of 
> requests resolving the Storage app's name & channel, over and over and over 
> again! In this case, [each per-query call to 
> `LEventStore.findByEntity`|https://github.com/heroku/predictionio-engine-ur/blob/master/src/main/scala/URAlgorithm.scala#L694]
>  re-resolves the app name to an ID.
> Implement memoization for the function that performs these name-to-ID 
> lookups, so that only one set of lookups is performed per process for each 
> app+channel combination.
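
The memoization described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the actual change: `NameToIdCache` and the `lookup` function parameter are hypothetical names, and the real storage lookup is replaced by whatever function the caller passes in.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical sketch; `NameToIdCache` and `lookup` are illustrative names,
// not PredictionIO APIs.
final class NameToIdCache(lookup: (String, Option[String]) => (Int, Option[Int])) {
  // One entry per app+channel combination, kept for the life of the process.
  private val cache = TrieMap.empty[(String, Option[String]), (Int, Option[Int])]

  /** Resolves (appName, channelName) to (appId, channelId), hitting `lookup`
    * only when the combination has not been cached yet. Under heavy contention
    * the lookup may still run more than once for the same key, which is
    * harmless as long as it is side-effect free. */
  def apply(appName: String, channelName: Option[String] = None): (Int, Option[Int]) =
    cache.getOrElseUpdate((appName, channelName), lookup(appName, channelName))
}
```

A cache like this never expires entries, which fits the "one set of lookups per process" goal but assumes app and channel IDs do not change while the process is running.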




[jira] [Resolved] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication

2017-08-29 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall resolved PIO-114.
---
Resolution: Fixed

> Elasticsearch 5.x StorageClient basic HTTP authentication
> -
>
> Key: PIO-114
> URL: https://issues.apache.org/jira/browse/PIO-114
> Project: PredictionIO
>  Issue Type: New Feature
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> Add optional username-password configuration for the new Elasticsearch 5 
> client; in {{conf/pio-env.sh}} config:
> {code}
> # Optional basic HTTP auth
> PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
> {code}
> These credentials are sent in each Elasticsearch request as an HTTP Basic 
> Authorization header.
> Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai 
> on Heroku](https://elements.heroku.com/addons/bonsai).




[jira] [Resolved] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient

2017-08-29 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall resolved PIO-106.
---
Resolution: Fixed

> Elasticsearch 5.x StorageClient should reuse RestClient
> ---
>
> Key: PIO-106
> URL: https://issues.apache.org/jira/browse/PIO-106
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> When using the proposed [PIO-105 Batch 
> Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an 
> engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's 
> REST interface appears to become overloaded, ending with the Spark job being 
> killed from errors like:
> {noformat}
> [ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search
> [ERROR] [Utils] Aborting task
> [ERROR] [ESApps] Failed to access to /pio_meta/apps/_search
> [ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749)
> [ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737)
> [ERROR] [Common$] Invalid app name ur
> [ERROR] [Utils] Aborting task
> [ERROR] [URAlgorithm] Error when read recent events: java.lang.IllegalArgumentException: Invalid app name ur
> [ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751)
> [ERROR] [Utils] Aborting task
> [ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750)
> [WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, executor driver): java.net.BindException: Can't assign requested address
>   at sun.nio.ch.Net.connect0(Native Method)
>   at sun.nio.ch.Net.connect(Net.java:454)
>   at sun.nio.ch.Net.connect(Net.java:446)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>   at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273)
>   at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139)
>   at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
>   at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
>   at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> After these errors happen & the job is killed, Elasticsearch immediately 
> recovers. It responds to queries normally. I researched what could cause this 
> and found an [old issue in the main Elasticsearch 
> repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints 
> given therein about *using keep-alive in the ES client* to avoid these 
> performance issues, I investigated how PredictionIO's [Elasticsearch 
> StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch]
>  manages its connections.
> I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), 
> Elasticsearch creates a new underlying connection, an Elasticsearch 
> RestClient, for 
> [every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80]
>  
> [single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157]
>  
> [query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78]
>  & 
> [interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205]
>  with its API. As a result, *there is no way Elasticsearch TCP connections 
> can be reused via HTTP keep-alive*.
> High-performance workloads with Elasticsearch 5.x will suffer from these 
> issues unless we refactor Elasticsearch StorageClient to share the underlying 
> RestClient instead of [building a new one every time the client is 
> used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31].
> There are certainly different approaches we could take to shari

[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...

2017-08-29 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/421
  
I will resolve these conflicts today and then merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio issue #425: [PIO-110] Refactoring

2017-08-25 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/425
  
Great Scala-style improvements here, @takezoe. Great to see this gardening 
of the codebase.

I'm wondering, in [PIO-110](https://issues.apache.org/jira/browse/PIO-110) 
the objective is to refactor the common code between `CreateServer` and 
`BatchPredict`, yet I do not see that kind of change here. Are you working on 
extracting & reusing the common code as the next step for this PR?


---

[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...

2017-08-23 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/421
  
👍


---

[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...

2017-08-23 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/421
  
Based on these [Scala Concurrency/Thread Safety 
docs](https://twitter.github.io/scala_school/concurrency.html#danger), I 
believe simply annotating `@volatile` will cause the synchronization needed for 
thread-safety in this case.

So, I updated this PR with that change.
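
For context, a caveat that is general to the JVM: `@volatile` guarantees that a write to the field is visible to other threads, but on its own it does not make a "check, then create" sequence atomic, so two threads could still each build a client. A minimal sketch of one common way to close that gap is double-checked locking, shown here with a hypothetical `SharedClient` object and a stand-in `Resource` class in place of `RestClient`. This is an illustration only, not the code in this PR.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch; `SharedClient` and `Resource` are illustrative stand-ins
// for the singleton holder and the shared RestClient.
object SharedClient {
  final class Resource(val id: Int)

  private val created = new AtomicInteger(0)

  // @volatile makes the published reference visible to all threads.
  @volatile private var shared: Option[Resource] = None

  def get(): Resource = shared match {
    case Some(r) => r // fast path, no locking once initialized
    case None =>
      synchronized {
        // Re-check inside the lock so that only one thread ever creates it.
        shared match {
          case Some(r) => r
          case None =>
            val r = new Resource(created.incrementAndGet())
            shared = Some(r)
            r
        }
      }
  }

  /** Number of Resource instances ever constructed (for demonstration). */
  def createdCount: Int = created.get()
}
```

In plain Scala a `lazy val` gives similar once-only initialization with less code; the explicit form is shown because the diff under review uses a `var` holding an `Option[RestClient]`.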


---

[GitHub] incubator-predictionio issue #424: [PIO-115] Implement Storage app & channel...

2017-08-22 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/424
  
Thanks for your feedback @dszeto. I've addressed the code style & JIRA 
issue.


---

[GitHub] incubator-predictionio pull request #424: Implement Storage app & channel na...

2017-08-22 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/424

Implement Storage app & channel name-to-ID cache

When stress testing the Universal Recommender with high-concurrency 
HTTP/REST queries, we observed that Elasticsearch traffic consisted mostly 
of requests resolving the Storage app's name & channel, over and over and over 
again! In this case, [each per-query call to 
`LEventStore.findByEntity`](https://github.com/heroku/predictionio-engine-ur/blob/master/src/main/scala/URAlgorithm.scala#L694)
 re-resolves the app name to an ID.

This changeset implements memoization for the function that performs these 
name-to-ID lookups, so that only one set of lookups is performed per process 
for each app+channel combination. As a result, we've seen overall throughput 
increase and error rate drop dramatically.

This common optimization affects all storage backends, not just 
Elasticsearch.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio cache-storage-name-to-id

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/424.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #424


commit 9825ae2a6981431ce49a6ea40ddabd82ab4121f2
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-22T18:48:04Z

Implement Storage app & channel name-to-ID cache




---

[GitHub] incubator-predictionio pull request #421: Elasticsearch 5.x singleton client...

2017-08-22 Thread mars

Github user mars commented on a diff in the pull request:


https://github.com/apache/incubator-predictionio/pull/421#discussion_r134630517
  
--- Diff: 
storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala
 ---
@@ -18,27 +18,84 @@
 package org.apache.predictionio.data.storage.elasticsearch
 
 import org.apache.http.HttpHost
+import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials}
+import org.apache.http.impl.client.BasicCredentialsProvider
+import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
 import org.apache.predictionio.data.storage.BaseStorageClient
 import org.apache.predictionio.data.storage.StorageClientConfig
 import org.apache.predictionio.data.storage.StorageClientException
+import org.apache.predictionio.workflow.CleanupFunctions
 import org.elasticsearch.client.RestClient
+import org.elasticsearch.client.RestClientBuilder.HttpClientConfigCallback
 
 import grizzled.slf4j.Logging
 
-case class ESClient(hosts: Seq[HttpHost]) {
-  def open(): RestClient = {
+object ESClient extends Logging {
+  var _sharedRestClient: Option[RestClient] = None
--- End diff --

Thanks for the hints here. I suspected this would be an issue. I'm 
investigating how to make this threadsafe.


---

[GitHub] incubator-predictionio issue #421: Elasticsearch 5.x singleton client with a...

2017-08-22 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/421
  
Cheers @takezoe I addressed all your Scala style & usage suggestions. Still 
need to take care of the threadsafety issue with the singleton client.


---

[jira] [Updated] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication

2017-08-14 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-114:
--
External issue URL: 
https://github.com/apache/incubator-predictionio/pull/421

> Elasticsearch 5.x StorageClient basic HTTP authentication
> -
>
> Key: PIO-114
> URL: https://issues.apache.org/jira/browse/PIO-114
> Project: PredictionIO
>  Issue Type: New Feature
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> Add optional username-password configuration for the new Elasticsearch 5 
> client; in {{conf/pio-env.sh}} config:
> {code}
> # Optional basic HTTP auth
> PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
> {code}
> These credentials are sent in each Elasticsearch request as an HTTP Basic 
> Authorization header.
> Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai 
> on Heroku](https://elements.heroku.com/addons/bonsai).




[GitHub] incubator-predictionio pull request #421: Elasticsearch singleton client wit...

2017-08-14 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/421

Elasticsearch singleton client with authentication

Fixes both [PIO-106](https://issues.apache.org/jira/browse/PIO-106) & 
[PIO-114](https://issues.apache.org/jira/browse/PIO-114), replacing 
https://github.com/apache/incubator-predictionio/pull/372. These are combined 
because they each heavily revise the same class.

## Authentication

Add optional username-password configuration for the new Elasticsearch 5 
client; in `pio-env.sh` config:

```bash
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
```

These credentials are sent in each Elasticsearch request as an HTTP Basic 
Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai 
on Heroku](https://elements.heroku.com/addons/bonsai).
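
For readers unfamiliar with the header format, here is a small sketch of how an HTTP Basic `Authorization` value is derived from a username and password (RFC 7617). The `BasicAuth` helper is hypothetical and only illustrates the encoding; the PR's diff imports Apache HttpClient's `BasicCredentialsProvider` and `UsernamePasswordCredentials`, which suggests the client library builds the header itself.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper for illustration; not part of PredictionIO or this PR.
object BasicAuth {
  /** Builds the value of an `Authorization` header: "Basic " followed by
    * base64("username:password"). */
  def headerValue(username: String, password: String): String = {
    val token = Base64.getEncoder.encodeToString(
      s"$username:$password".getBytes(StandardCharsets.UTF_8))
    s"Basic $token"
  }
}
```

Because base64 is an encoding and not encryption, credentials sent this way are only protected when the Elasticsearch endpoint is reached over HTTPS.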

## Singleton client

This PR moves to a singleton Elasticsearch RestClient which has built-in 
HTTP keep-alive and TCP connection pooling. Running on this branch, we've seen 
a 2x speed-up in predictions from the Universal Recommender with ES5, and the 
feared "cannot assign requested address" 😱 Elasticsearch connection errors 
have completely disappeared. Running `pio batchpredict` for 160K queries 
results in only 7 total TCP connections to Elasticsearch. Previously that would 
escalate to ~25,000 connections before denying further connections.

**This fundamentally changes the interface for the new [Elasticsearch 5.x 
REST 
client](https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch)**
 introduced with PredictionIO 0.11.0-incubating. With this changeset, the 
`client` is a single instance of 
[`org.elasticsearch.client.RestClient`](https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java).

🚨 **As a result of this change, any engine templates that directly use 
the Elasticsearch 5 StorageClient would require an update for compatibility.** 
The change is this:

### Original 

```scala
val client: StorageClient = … // code to instantiate client
val restClient: RestClient = client.open()
try {
  restClient.performRequest(…)
} finally {
  restClient.close()
}
```

### With this PR

```scala
val client: RestClient = … // code to instantiate client
client.performRequest(…)
```

*No more balancing `open` & `close` as this is handled by using a new 
`CleanupFunctions` hook added to the framework in this PR.*

[Universal Recommender](https://github.com/actionml/universal-recommender) 
is the only template that I know of which directly uses the ES StorageClient 
outside of PredictionIO core. See example [UR changes for compatibility with 
this 
PR](https://github.com/heroku/predictionio-engine-ur/compare/esclient-singleton).

### Elasticsearch StorageClient changes

* reimplemented as singleton
* installs a cleanup function

See 
[StorageClient](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2926f4cfd93ccb02320e2a9503ccd223)
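
The singleton idea can be sketched in isolation as below. This is a minimal illustration under stated assumptions, not the PR's implementation: `PooledClientStub` is a stand-in for a pooled HTTP client (it is not `org.elasticsearch.client.RestClient`), and `SharedClient` is a hypothetical holder name.

```scala
// Stand-in for a pooled HTTP client; NOT org.elasticsearch.client.RestClient.
final class PooledClientStub(val id: Int) {
  @volatile private var closed = false
  def close(): Unit = { closed = true }
  def isClosed: Boolean = closed
}

// Hypothetical holder: every caller receives the same instance, so
// keep-alive connections can be reused instead of reopened per request.
object SharedClient {
  private var created = 0
  private var instance: Option[PooledClientStub] = None

  // Create the client on first use, then keep returning the same one.
  def get(): PooledClientStub = synchronized {
    instance.getOrElse {
      created += 1
      val client = new PooledClientStub(created)
      instance = Some(client)
      client
    }
  }

  // Close the shared client once, e.g. from a cleanup function.
  def close(): Unit = synchronized {
    instance.foreach(_.close())
    instance = None
  }

  def timesCreated: Int = synchronized(created)
}
```

Calling `SharedClient.get()` repeatedly hands back one instance, which is what lets the underlying connection pool do its job.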

### Core changes

A new 
[`CleanupFunctions`](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2a958821ac58f019fbce38540c775f19)
 hook has been added which enables developers of storage modules to register 
anonymous functions with `CleanupFunctions.add { … }` to be executed after 
Spark-related commands/workflows. The hook is called in a `finally { 
CleanupFunctions.run() }` from within:

* `pio import`
* `pio export`
* `pio train`
* `pio batchpredict`

Apologies for the huge indentation shifts from the requisite try-finally 
blocks:

```scala
try {
  // Freshly indented code.
} finally {
  CleanupFunctions.run()
}
```
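
For readers who want to see the register-then-run pattern end to end, here is a self-contained sketch. `CleanupSketch` and `CleanupDemo` are hypothetical names; the real `CleanupFunctions` object in this PR may differ in naming and details (for example, whether the list is cleared after running is an assumption here).

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical re-creation of the idea, not the PR's code.
object CleanupSketch {
  private val functions = ListBuffer.empty[() => Unit]

  // Register a block of code to run later (by-name parameter).
  def add(block: => Unit): Unit = synchronized {
    functions += (() => block)
    ()
  }

  // Run every registered block in registration order, then forget them.
  def run(): Unit = synchronized {
    functions.foreach(f => f())
    functions.clear()
  }
}

object CleanupDemo {
  // Simulates a workflow: a storage module registers cleanup up front,
  // and the command wrapper runs it in `finally`.
  def workflow(): List[String] = {
    val log = ListBuffer.empty[String]
    CleanupSketch.add { log += "client closed"; () }
    try {
      log += "work done"
    } finally {
      CleanupSketch.run()
    }
    log.toList
  }
}
```

The point of the design is that storage modules do not need to know which command is running; they only register what must be released.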

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio 
esclient-singleton-with-auth

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/421.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #421


commit f30f27bcc09a397efb42a7923938beceaeac37bf
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-08T23:29:15Z

Migrate to singleton Elasticsearch client to use underlying connection 
pooling (PoolingNHttpClientConnectionManager)

commit d99927089a41cb85f525cb74bdf394eed4686bf2
Author: Mars Hall <m...@heroku.com>

[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...

2017-08-14 Thread mars

Github user mars closed the pull request at:

https://github.com/apache/incubator-predictionio/pull/420


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio issue #420: [PIO-106] Elasticsearch 5.x StorageClient...

2017-08-14 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/420
  
Closing in favor of 
https://github.com/apache/incubator-predictionio/pull/421



[GitHub] incubator-predictionio issue #372: Elasticsearch basic HTTP authentication

2017-08-14 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/372
  
Closing in favor of 
https://github.com/apache/incubator-predictionio/pull/421



[jira] [Updated] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication

2017-08-14 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-114:
--
Description: 
Add optional username-password configuration for the new Elasticsearch 5 
client; in {{conf/pio-env.sh}} config:


{code}
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
{code}

These credentials are sent in each Elasticsearch request as an HTTP Basic 
Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on 
Heroku](https://elements.heroku.com/addons/bonsai).

  was:
Add optional username-password configuration for the new Elasticsearch 5 
client; in {conf/pio-env.sh} config:


{code}
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
{code}

These credentials are sent in each Elasticsearch request as an HTTP Basic 
Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on 
Heroku](https://elements.heroku.com/addons/bonsai).


> Elasticsearch 5.x StorageClient basic HTTP authentication
> -
>
> Key: PIO-114
> URL: https://issues.apache.org/jira/browse/PIO-114
> Project: PredictionIO
>  Issue Type: New Feature
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> Add optional username-password configuration for the new Elasticsearch 5 
> client; in {{conf/pio-env.sh}} config:
> {code}
> # Optional basic HTTP auth
> PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
> PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
> {code}
> These credentials are sent in each Elasticsearch request as an HTTP Basic 
> Authorization header.
> Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai 
> on Heroku](https://elements.heroku.com/addons/bonsai).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (PIO-114) Elasticsearch 5.x StorageClient basic HTTP authentication

2017-08-14 Thread Mars Hall (JIRA)

Mars Hall created PIO-114:
-

 Summary: Elasticsearch 5.x StorageClient basic HTTP authentication
 Key: PIO-114
 URL: https://issues.apache.org/jira/browse/PIO-114
 Project: PredictionIO
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.11.0-incubating
Reporter: Mars Hall
Assignee: Mars Hall


Add optional username-password configuration for the new Elasticsearch 5 
client; in {conf/pio-env.sh} config:


{code:shell}
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
{code}

These credentials are sent in each Elasticsearch request as an HTTP Basic 
Authorization header.

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai on 
Heroku](https://elements.heroku.com/addons/bonsai).




[GitHub] incubator-predictionio issue #420: [PIO-106] Elasticsearch 5.x StorageClient...

2017-08-11 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/420
  
Seems to solve this [long ago reported Elasticsearch connection 
issue](https://github.com/elastic/elasticsearch/issues/3647)



[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...

2017-08-10 Thread mars

Github user mars commented on a diff in the pull request:


https://github.com/apache/incubator-predictionio/pull/420#discussion_r132601642
  
--- Diff: 
storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEvaluationInstances.scala
 ---
@@ -110,28 +104,24 @@ class ESEvaluationInstances(client: ESClient, config: 
StorageClientConfig, index
 error(s"Failed to access to /$index/$estype/$id", e)
 None
 } finally {
-  restClient.close()
+  client.close()
--- End diff --

This `close` should be removed.



[GitHub] incubator-predictionio pull request #420: [PIO-106] Elasticsearch 5.x Storag...

2017-08-10 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/420

[PIO-106] Elasticsearch 5.x StorageClient should reuse RestClient

Implements [PIO-106](https://issues.apache.org/jira/browse/PIO-106)

This PR moves to a singleton Elasticsearch RestClient which has built-in 
HTTP keep-alive and TCP connection pooling. Running on this branch, we've seen 
a 2x speed-up in predictions from the Universal Recommender with ES5, and the 
feared "cannot bind" 😱 Elasticsearch connection errors have completely 
disappeared. Running `pio batchpredict` for 170K queries results in only 7 
total TCP connections to Elasticsearch. Previously that would escalate to 
~25,000 connections before denying further connections.

**This fundamentally changes the interface for the new [Elasticsearch 5.x 
REST 
client](https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch)**
 introduced with PredictionIO 0.11.0-incubating. With this changeset, the 
`client` is a single instance of 
[`org.elasticsearch.client.RestClient`](https://github.com/elastic/elasticsearch/blob/master/client/rest/src/main/java/org/elasticsearch/client/RestClient.java).

🚨 **As a result of this change, any engine templates that directly use 
the Elasticsearch 5 StorageClient would require an update for compatibility.** 
The change is this:

### Original 

```scala
val client: StorageClient = … // code to instantiate client
val restClient: RestClient = client.open()
try {
  restClient.performRequest(…)
} finally {
  restClient.close()
}
```

### With this PR

```scala
val client: RestClient = … // code to instantiate client
client.performRequest(…)
```

*No more balancing `open` & `close` as this is handled by using a new 
`CleanupFunctions` hook added to the framework in this PR.*

[Universal Recommender](https://github.com/actionml/universal-recommender) 
is the only template that I know of which directly uses the ES StorageClient 
outside of PredictionIO core. See the [UR changes for compatibility with this 
PR](https://github.com/heroku/predictionio-engine-ur/compare/esclient-singleton).

### Elasticsearch StorageClient changes

* reimplemented as singleton
* installs a cleanup function

See 
[StorageClient](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2926f4cfd93ccb02320e2a9503ccd223)

### Core changes

A new 
[`CleanupFunctions`](https://github.com/apache/incubator-predictionio/compare/develop...mars:esclient-singleton?expand=1#diff-2a958821ac58f019fbce38540c775f19)
 hook has been added which enables developers of storage modules to register 
anonymous functions with `CleanupFunctions.add { … }` to be executed after 
Spark-related commands/workflows. The hook is called in a `finally { 
CleanupFunctions.run() }` from within:

* `pio import`
* `pio export`
* `pio train`
* `pio batchpredict`

Apologies for the huge indentation shifts from the requisite try-finally 
blocks:

```scala
try {
  // Freshly indented code.
} finally {
  CleanupFunctions.run()
}
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio esclient-singleton

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #420


commit f30f27bcc09a397efb42a7923938beceaeac37bf
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-08T23:29:15Z

Migrate to singleton Elasticsearch client to use underlying connection 
pooling (PoolingNHttpClientConnectionManager)

commit d99927089a41cb85f525cb74bdf394eed4686bf2
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-10T03:00:58Z

Log stacktrace for Storage initialization errors.

commit dc4c31cbcddbb3b281d52b8099e210adc546d1ed
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-10T22:55:38Z

Remove shade rule that breaks Elasticsearch 5 client

commit 7634a7ab720239d5f8efda85f67b26bdaff797f8
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-10T22:59:01Z

Collect & run cleanup functions to allow spark-submit processes to end 
gracefully.

commit 5953451f40e554eafa887328122c794edbbd8f1d
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-11T00:06:24Z

Rename CleanupFunctions to match object name





Re: August 2017 Release

2017-08-04 Thread Mars Hall

Yes, that is the PR. When I checked out develop yesterday, I thought it was 
already merged. Apologies for my confusion.

I'd like to see if I can get that merged for the release. Looking into it now.

*Mars

( <> .. <> )

> On Aug 4, 2017, at 12:46, Donald Szeto <don...@apache.org> wrote:
> 
> Hey Mars,
> 
> Is this the PR in question?
> https://github.com/apache/incubator-predictionio/pull/372
> 
> Regards,
> Donald
> 
> On Thu, Aug 3, 2017 at 11:49 AM, Mars Hall <m...@heroku.com> wrote:
> 
>> Hit an Authenticated Elasticsearch 5.x problem on the current develop
>> branch.
>> 
>> I just tested the HEAD of develop by performing:
>> 
>>  ./make-distribution.sh \
>>-Dscala.version=2.11.8 \
>>-Dspark.version=2.1.0 \
>>-Dhadoop.version=2.7.3 \
>>-Delasticsearch.version=5.1.1
>> 
>> Then, tried build/train/deploy of our Universal Recommender template.
>> 
>> Locally, it makes it through train to the point when it saves to
>> Elasticsearch, failing with:
>> 
>>> Exception in thread "main" java.lang.NoSuchMethodError:
>> org.elasticsearch.client.RestClient.performRequest(
>> Ljava/lang/String;Ljava/lang/String;Ljava/util/Map;[Lorg/
>> apache/http/Header;)Lorg/elasticsearch/client/Response;
>>>  at org.template.EsClient$.createIndex(EsClient.scala:132)
>>>  at org.template.EsClient$.hotSwap(EsClient.scala:218)
>>>  at org.template.URModel.save(URModel.scala:86)
>> 
>> I tried deploying it to Heroku as well, and it fails much earlier when
>> simply connecting to Elasticsearch:
>> 
>>> remote: Exception in thread "main" 
>>> org.elasticsearch.client.ResponseException:
>> HEAD https://xx.us-east-1.bonsaisearch.net:443/pio_meta: HTTP/1.1 401
>> Unauthorized
>>> remote:   at org.elasticsearch.client.RestClient$1.completed(
>> RestClient.java:311)
>>> remote:   at org.elasticsearch.client.RestClient$1.completed(
>> RestClient.java:300)
>>> remote:   at shadeio.data.http.concurrent.BasicFuture.completed(
>> BasicFuture.java:119)
>>> remote:   at shadeio.data.http.impl.nio.client.
>> DefaultClientExchangeHandlerImpl.responseCompleted(
>> DefaultClientExchangeHandlerImpl.java:177)
>>> remote:   at shadeio.data.http.nio.protocol.
>> HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:
>> 436)
>>> remote:   at shadeio.data.http.nio.protocol.
>> HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:
>> 309)
>>> remote:   at shadeio.data.http.impl.nio.
>> DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.
>> java:255)
>> 
>> 
>> These issues were previously found to be caused by this shade rule:
>>  https://github.com/apache/incubator-predictionio/blob/
>> develop/storage/elasticsearch/build.sbt#L42
>> 
>> It looks like the shaded package does not actually use the new
>> authentication code.
>> 
>> Chan Lee mentioned to me that he was only able to make the TravisCI build
>> pass by adding this shade rule, but it is clearly breaking the authenicated
>> Elasticsearch functionality.
>> 
>> Any ideas how to solve this?
>> 
>> *Mars
>> 
>> ( <> .. <> )
>> 
>>> On Aug 3, 2017, at 11:02, Donald Szeto <don...@apache.org> wrote:
>>> 
>>> On Thu, Aug 3, 2017 at 10:07 AM, Mars Hall <m...@heroku.com> wrote:
>>> 
>>>> I just opened a PR to add docs for batch predict.
>>>> 
>>>> Moving forward with the 0.12.0 release sounds great. Today, I will pull
>>>> develop and see how it's working with the Heroku buildpack.
>>>> 
>>> 
>>> Awesome. Thanks!
>>> 
>>> 
>>>>> On Aug 3, 2017, at 00:37, takako shimamoto <chiboch...@gmail.com>
>> wrote:
>>>>> 
>>>>> I think it's almost ready, and now we just have to update the current
>>>>> documentation.
>>>>> The deadline of several unresolved issues for Target Version/s:
>>>>> 0.12.0-incubating is extended, right?
>>>> 
>>> 
>>> Yes. Let's extend those that have not started working if there's no
>>> objection.
>> 
>>

Re: August 2017 Release

2017-08-03 Thread Mars Hall

Hit an Authenticated Elasticsearch 5.x problem on the current develop branch.

I just tested the HEAD of develop by performing:

  ./make-distribution.sh \
-Dscala.version=2.11.8 \
-Dspark.version=2.1.0 \
-Dhadoop.version=2.7.3 \
-Delasticsearch.version=5.1.1

Then, tried build/train/deploy of our Universal Recommender template.

Locally, it makes it through train to the point when it saves to Elasticsearch, 
failing with:

> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.elasticsearch.client.RestClient.performRequest(Ljava/lang/String;Ljava/lang/String;Ljava/util/Map;[Lorg/apache/http/Header;)Lorg/elasticsearch/client/Response;
>   at org.template.EsClient$.createIndex(EsClient.scala:132)
>   at org.template.EsClient$.hotSwap(EsClient.scala:218)
>   at org.template.URModel.save(URModel.scala:86)

I tried deploying it to Heroku as well, and it fails much earlier when simply 
connecting to Elasticsearch:

> remote: Exception in thread "main" 
> org.elasticsearch.client.ResponseException: HEAD 
> https://xx.us-east-1.bonsaisearch.net:443/pio_meta: HTTP/1.1 401 
> Unauthorized
> remote:   at 
> org.elasticsearch.client.RestClient$1.completed(RestClient.java:311)
> remote:   at 
> org.elasticsearch.client.RestClient$1.completed(RestClient.java:300)
> remote:   at 
> shadeio.data.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
> remote:   at 
> shadeio.data.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
> remote:   at 
> shadeio.data.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436)
> remote:   at 
> shadeio.data.http.nio.protocol.HttpAsyncRequestExecutor.responseReceived(HttpAsyncRequestExecutor.java:309)
> remote:   at 
> shadeio.data.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:255)


These issues were previously found to be caused by this shade rule:
  
https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/build.sbt#L42

It looks like the shaded package does not actually use the new authentication 
code.
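
For context on how a shade rule could cause the first failure, here is a sketch of what such a rule typically looks like in sbt-assembly. The pattern and target package are assumptions inferred from the `shadeio.data.http` frames in the stack trace above, not copied from the linked `build.sbt` line:

```scala
// build.sbt fragment (sbt-assembly plugin). Assumed shape only; the
// actual rule at the linked line may use different patterns.
assemblyShadeRules in assembly := Seq(
  // Rewrites every reference to org.apache.http.* inside the assembled jar.
  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.http.@1").inAll
)
```

If the rule has this shape, the shaded `RestClient.performRequest` would take a `shadeio.data.http.Header` parameter, while a template compiled against the unshaded client asks for `org.apache.http.Header`. That mismatch would be consistent with the `NoSuchMethodError` above.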

Chan Lee mentioned to me that he was only able to make the TravisCI build pass 
by adding this shade rule, but it is clearly breaking the authenticated 
Elasticsearch functionality.

Any ideas how to solve this?

*Mars

( <> .. <> )

> On Aug 3, 2017, at 11:02, Donald Szeto <don...@apache.org> wrote:
> 
> On Thu, Aug 3, 2017 at 10:07 AM, Mars Hall <m...@heroku.com> wrote:
> 
>> I just opened a PR to add docs for batch predict.
>> 
>> Moving forward with the 0.12.0 release sounds great. Today, I will pull
>> develop and see how it's working with the Heroku buildpack.
>> 
> 
> Awesome. Thanks!
> 
> 
>>> On Aug 3, 2017, at 00:37, takako shimamoto <chiboch...@gmail.com> wrote:
>>> 
>>> I think it's almost ready, and now we just have to update the current
>>> documentation.
>>> The deadline of several unresolved issues for Target Version/s:
>>> 0.12.0-incubating is extended, right?
>> 
> 
> Yes. Let's extend those that have not started working if there's no
> objection.

[GitHub] incubator-predictionio pull request #418: batchpredict docs

2017-08-02 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/418

batchpredict docs

JIRA [PIO-111](https://issues.apache.org/jira/browse/PIO-111)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio batchpredict-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #418


commit 382d238f73fb04728b5fba9fc0484084ffc0945d
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-02T22:21:39Z

Update therubyracer gem to most recent patch-level for macOS 10.12 
compatibility.

commit eb79654f2c95abaf747f163bc43f86e8ed9328a0
Author: Mars Hall <m...@heroku.com>
Date:   2017-08-03T00:29:39Z

Documentation for `pio batchpredict`





[jira] [Updated] (PIO-111) Document pio batchpredict

2017-08-02 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-111:
--
External issue URL: 
https://github.com/apache/incubator-predictionio/pull/418

> Document pio batchpredict
> -
>
> Key: PIO-111
> URL: https://issues.apache.org/jira/browse/PIO-111
> Project: PredictionIO
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 0.12.0-incubating
>Reporter: Donald Szeto
>Assignee: Mars Hall
>  Labels: newbie
>
> {{pio batchpredict}} is a new feature created in PIO-105. It needs to be 
> documented.




[jira] [Created] (PIO-109) Customizable HTTP server configuration

2017-08-01 Thread Mars Hall (JIRA)

Mars Hall created PIO-109:
-

 Summary: Customizable HTTP server configuration
 Key: PIO-109
 URL: https://issues.apache.org/jira/browse/PIO-109
 Project: PredictionIO
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.11.0-incubating
Reporter: Mars Hall


Make it possible to customize the Akka/Spray server config 
[/common/src/main/resources/application.conf|https://github.com/apache/incubator-predictionio/blob/develop/common/src/main/resources/application.conf]
 without building PredictionIO from source.

A possible solution might be an option to the {{pio deploy}} command, like 
{{--server-config ./application.conf}}, that allows overriding with a 
user-supplied config.

Background: in PIO-95 I requested a configurable timeout for the engine's HTTP 
server. That issue was resolved with a simple change to the server timeout and 
led to this more generalized idea.




Re: New PMC member and committer: Mars Hall

2017-07-28 Thread Mars Hall

Thank you Donald,

I'm honored & excited to officially join the project!

*Mars

( <> .. <> )

> On Jul 28, 2017, at 12:01, Donald Szeto <don...@apache.org> wrote:
> 
> Hi all,
> 
> The Project Management Committee (PMC) for Apache PredictionIO (incubating)
> has asked Mars Hall to become a PMC member and committer, and we are
> pleased to announce that he has accepted.
> 
> Mars has been working on PredictionIO since 0.10 and has suggested and made
> changes to the core so that it has become more configurable and easier to
> deploy on Heroku. He added authentication support to the REST-based
> Elasticsearch client. He has also found and fixed core bugs.
> 
> Mars is the primary driver in delivering a good developer experience
> through Heroku buildpacks for PredictionIO (
> https://github.com/heroku/predictionio-buildpack), which allows engine
> templates to be submitted to Heroku and deployed automatically. He also
> made a couple engine templates that are preset to do so (
> https://github.com/heroku/predictionio-engine-classification,
> https://github.com/heroku/predictionio-engine-ur).
> 
> Being a committer enables easier contribution to the project since there is
> no need to go via the patch submission process. This should enable better
> productivity. Being a PMC member enables assistance with the management and
> to guide the direction of the project.
> 
> Please join us in welcoming Mars.
> 
> Regards,
> Donald

[jira] [Updated] (PIO-106) Elasticsearch 5.x StorageClient should reuse RestClient

2017-07-18 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-106:
--
Description: 
When using the proposed [PIO-105 Batch 
Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an 
engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's 
REST interface appears to become overloaded, ending with the Spark job being 
killed from errors like:

{noformat}
[ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search
[ERROR] [Utils] Aborting task
[ERROR] [ESApps] Failed to access to /pio_meta/apps/_search
[ERROR] [Executor] Exception in task 747.0 in stage 1.0 (TID 749)
[ERROR] [Executor] Exception in task 735.0 in stage 1.0 (TID 737)
[ERROR] [Common$] Invalid app name ur
[ERROR] [Utils] Aborting task
[ERROR] [URAlgorithm] Error when read recent events: 
java.lang.IllegalArgumentException: Invalid app name ur
[ERROR] [Executor] Exception in task 749.0 in stage 1.0 (TID 751)
[ERROR] [Utils] Aborting task
[ERROR] [Executor] Exception in task 748.0 in stage 1.0 (TID 750)
[WARN] [TaskSetManager] Lost task 749.0 in stage 1.0 (TID 751, localhost, 
executor driver): java.net.BindException: Can't assign requested address
  at sun.nio.ch.Net.connect0(Native Method)
  at sun.nio.ch.Net.connect(Net.java:454)
  at sun.nio.ch.Net.connect(Net.java:446)
  at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
  at 
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processSessionRequests(DefaultConnectingIOReactor.java:273)
  at 
org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:139)
  at 
org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:348)
  at 
org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:192)
  at 
org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
  at java.lang.Thread.run(Thread.java:745)
{noformat}

After these errors happen & the job is killed, Elasticsearch immediately 
recovers. It responds to queries normally. I researched what could cause this 
and found an [old issue in the main Elasticsearch 
repo|https://github.com/elastic/elasticsearch/issues/3647]. With the hints 
given therein about *using keep-alive in the ES client* to avoid these 
performance issues, I investigated how PredictionIO's [Elasticsearch 
StorageClient|https://github.com/apache/incubator-predictionio/tree/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch]
 manages its connections.

I found that unlike the other StorageClients (Elasticsearch1, HBase, JDBC), 
Elasticsearch creates a new underlying connection, an Elasticsearch RestClient, 
for 
[every|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L80]
 
[single|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESApps.scala#L157]
 
[query|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESChannels.scala#L78]
 & 
[interaction|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L205]
 with its API. As a result, *there is no way Elasticsearch TCP connections can 
be reused via HTTP keep-alive*.

High-performance workloads with Elasticsearch 5.x will suffer from these issues 
unless we refactor Elasticsearch StorageClient to share the underlying 
RestClient instead of [building a new one every time the client is 
used|https://github.com/apache/incubator-predictionio/blob/develop/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala#L31].

There are certainly different approaches we could take to sharing a RestClient 
so that its keep-alive behavior may work as designed:

* maintain a singleton RestClient that is reused throughout the ES storage 
classes
* create a RestClient on-demand and pass it as an argument to ES storage methods
* other ideas?

  was:
When using the proposed [PIO-105 Batch 
Predictions|https://issues.apache.org/jira/browse/PIO-105] feature with an 
engine that queries Elasticsearch in {{Algorithm#predict}}, Elasticsearch's 
REST interface appears to become overloaded, ending with the Spark job being 
killed from errors like:

{noformat}
[ERROR] [ESChannels] Failed to access to /pio_meta/channels/_search
[ERROR] [Utils] Aborting task
[ERROR] [ESApps] Failed to access to /pio_meta/apps/_search
[ERROR] [Executor] Exception in task 747.0

[GitHub] incubator-predictionio issue #412: [PIO-105] Batch Predictions

2017-07-17 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/412
  
@takezoe thank you for the feedback. As a relatively-new Scala programmer I 
really appreciate this kind of review.

I am a bit hesitant to make these changes. I'm trying to maintain likeness 
with the 
[`CreateServer.scala`](https://github.com/mars/incubator-predictionio/blob/e7c6ebd8cfe2d4a150319025876520fc39be9a34/core/src/main/scala/org/apache/predictionio/workflow/CreateServer.scala)
 code, to minimize differences in prediction behavior between `pio deploy` and 
`pio batchpredict`. Any of these stylistic points should probably be matched in 
CreateServer, so that it continues to be easy to reason about their similarity.



[GitHub] incubator-predictionio pull request #412: Batch Predictions

2017-07-14 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/412

Batch Predictions

JIRA issue [PIO-105](https://issues.apache.org/jira/browse/PIO-105)

Provides a new `pio batchpredict` command.

Reads from a multi-object JSON input file, with one query object per line. Example:

```json
{"user":"1"}
{"user":"2"}
{"user":"3"}
{"user":"4"}
{"user":"5"}
```

Writes to a multi-object JSON output file (actually Hadoop partition files), 
with one result per line. Example:

```json
{"query":{"user":"1"},"prediction":{"itemScores":[{"item":"1","score":33},{"item":"2","score":32}]}}
{"query":{"user":"2"},"prediction":{"itemScores":[{"item":"5","score":55},{"item":"3","score":28}]}}
{"query":{"user":"3"},"prediction":{"itemScores":[{"item":"2","score":16},{"item":"3","score":12}]}}
{"query":{"user":"4"},"prediction":{"itemScores":[{"item":"3","score":19},{"item":"1","score":18}]}}
{"query":{"user":"5"},"prediction":{"itemScores":[{"item":"1","score":24},{"item":"4","score":14}]}}
```
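
The line-oriented format above can be sketched in a few lines of Python. This is only an illustration of the input/output shape, not the Spark-based implementation in this PR; `predict` is a hypothetical stand-in for a trained model, and its fixed score is made up.

```python
import io
import json


def predict(query):
    """Hypothetical model: returns a fixed score regardless of the query."""
    return {"itemScores": [{"item": "1", "score": 1.0}]}


def batch_predict(lines, predict_fn):
    """Yield one output line (query plus prediction) per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        query = json.loads(line)
        result = {"query": query, "prediction": predict_fn(query)}
        yield json.dumps(result, separators=(",", ":"))


queries = io.StringIO('{"user":"1"}\n{"user":"2"}\n')
output_lines = list(batch_predict(queries, predict))
for output_line in output_lines:
    print(output_line)
```

Because each line is an independent JSON object, the input can be split and processed in parallel without parsing the whole file first.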

See the included [console usage 
help](#diff-2cf174557564e09d52157be8e839fecf)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio batch-predict

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/412.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #412


commit 99ee6493bddc8f02aee384f3a2db27c6ae3f68cc
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-13T00:12:25Z

Implement BatchPredict

commit c205357498e4a4a745810b04130c5bbad78f8686
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-14T22:29:26Z

Improve console help for batch predict.

commit 93f7ed3e5ed10155a688a032e367793d75fa116a
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-14T22:46:30Z

Undo experimental change to publish tools artifact





[jira] [Created] (PIO-105) Batch Predictions

2017-07-14 Thread Mars Hall (JIRA)

Mars Hall created PIO-105:
-

 Summary: Batch Predictions
 Key: PIO-105
 URL: https://issues.apache.org/jira/browse/PIO-105
 Project: PredictionIO
  Issue Type: New Feature
  Components: Core
Reporter: Mars Hall
Assignee: Mars Hall


Implement a new {{pio batchpredict}} command to enable massive, fast, batch 
predictions from a trained model. Read a multi-object JSON file as the input 
format, with one query object per line. Similarly, write results to a 
multi-object JSON file, with one prediction result + its original query per 
line.

Currently getting bulk predictions from PredictionIO is possible with either:

* a {{pio eval}} script, which will always train a fresh, unvalidated model 
before getting predictions
* a custom script that hits the {{queries.json}} HTTP API, which is a serious 
bottleneck when requesting hundreds-of-thousands or millions of predictions

Neither of these existing bulk-prediction hacks is adequate, for the reasons 
mentioned.
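
For context, the second hack looks roughly like the following Python sketch. It assumes an engine deployed at `http://localhost:8000` (an assumed address); `fake_post` is a hypothetical stand-in so the sketch runs without a live server. Each query costs one blocking HTTP round trip, which is where the bottleneck comes from.

```python
import json
import urllib.request


def http_post(url, body):
    """Send one JSON query and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def predict_over_http(queries, url, post=http_post):
    """Issue the queries one at a time, pairing each with its prediction."""
    return [{"query": q, "prediction": post(url, q)} for q in queries]


def fake_post(url, body):
    """Hypothetical stand-in so the sketch runs without a deployed engine."""
    return {"itemScores": []}


results = predict_over_http(
    [{"user": "1"}, {"user": "2"}],
    "http://localhost:8000/queries.json",
    post=fake_post,
)
print(len(results))
```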

It's time for this use-case to be a first-class command :D



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[GitHub] incubator-predictionio issue #401: [PIO-72] Fix class loading for pio-shell

2017-07-11 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio/pull/401
  
Back in May, we fixed an intermittent class loading problem by [making a 
change to stabilize the 
classpath](https://github.com/mars/incubator-predictionio/commit/9ecc77628aba347454073e9919096a8fc8e0b952)
 in our fork of incubator-predictionio. Would you be open to adding that to 
this PR as well?



[GitHub] incubator-predictionio pull request #406: [PIO-102] Fix ESEngineInstances `g...

2017-07-08 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/406

[PIO-102] Fix ESEngineInstances `getAll` results out of order 
(Elasticsearch 5.x)

Fix for [PIO-102](https://issues.apache.org/jira/browse/PIO-102)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio 
fix-es-getall-order

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #406


commit 34fb0de8ae91f3bf9edb7a9823ea1784555845a8
Author: Mars Hall <m...@heroku.com>
Date:   2017-07-08T01:22:14Z

Append Elasticsearch scroll results to maintain order





[jira] [Updated] (PIO-102) ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)

2017-07-08 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-102:
--
External issue URL: 
https://github.com/apache/incubator-predictionio/pull/406
   Description: 
Using the new Elasticsearch 5.x REST storage client as the meta storage source 
(`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
conf/pio-env.sh), I found that once an engine has been trained a certain number 
of times, the most recent engine instance is no longer retrieved. So, I 
tracked down where those Elasticsearch queries originate.

In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
responses are collected by 
*appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
 them to one another.

In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
*prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152]
 them to one another.

This out-of-order concatenation breaks [ESEngineInstances 
`getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192]
 by erroneously replacing the head of the results with an older engine 
instance, when there are enough engine instances to overflow a single page of 
Elasticsearch hits.

I've observed this buggy behavior after ten trainings, when enough engine 
instances are stored to trigger Elasticsearch's scroll feature.

Pull request: https://github.com/apache/incubator-predictionio/pull/406
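
To illustrate the ordering problem, here is a minimal sketch in Python (not the project's Scala). The instance IDs and page sizes are made up, and it assumes the query returns engine instances sorted newest first, so that the head of the combined results should be the latest one.

```python
def collect_appending(pages):
    """Concatenate pages in arrival order (the 1.x client's behavior)."""
    results = []
    for page in pages:
        results = results + page
    return results


def collect_prepending(pages):
    """Put each new page in front of the earlier ones (the 5.x behavior)."""
    results = []
    for page in pages:
        results = page + results
    return results


# Hypothetical engine instance IDs, newest first, split across two scroll pages.
pages = [
    ["instance-12", "instance-11", "instance-10"],
    ["instance-09", "instance-08"],
]

print(collect_appending(pages)[0])   # head comes from the first page
print(collect_prepending(pages)[0])  # head comes from the last page
```

With a single page the two functions agree, which matches the observation that the bug only appears once results overflow one page.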

  was:
Using the new Elasticsearch 5.x REST storage client as the meta storage source 
(`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
conf/pio-env.sh), I found that once an engine has been trained a certain number 
of times, the most recent engine instance is no longer retrieved. So, I 
tracked down where those Elasticsearch queries originate.

In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
responses are collected by 
*appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
 them to one another.

In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
*prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152]
 them to one another.

This out-of-order concatenation breaks [ESEngineInstances 
`getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192]
 by erroneously replacing the head of the results with an older engine 
instance, when there are enough engine instances to overflow a single page of 
Elasticsearch hits.

I've observed this buggy behavior after ten trainings, when enough engine 
instances are stored to trigger Elasticsearch's scroll feature.

I'll be opening a pull request shortly with the super-simple fix.


> ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)
> ---
>
> Key: PIO-102
> URL: https://issues.apache.org/jira/browse/PIO-102
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>Assignee: Mars Hall
>
> Using the new Elasticsearch 5.x REST storage client as the meta storage 
> source (`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
> conf/pio-env.sh), I found that once an engine has been trained a certain 
> number of times, the most recent engine instance is no longer retrieved. 
> So, I tracked down where those Elasticsearch queries originate.
> In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
> responses are collected by 
> *appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
>  them to one another.
> In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
> *prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/

[jira] [Created] (PIO-102) ESEngineInstances `getAll` results out of order (Elasticsearch 5.x)

2017-07-08 Thread Mars Hall (JIRA)

Mars Hall created PIO-102:
-

 Summary: ESEngineInstances `getAll` results out of order 
(Elasticsearch 5.x)
 Key: PIO-102
 URL: https://issues.apache.org/jira/browse/PIO-102
 Project: PredictionIO
  Issue Type: Bug
  Components: Core
Affects Versions: 0.11.0-incubating
Reporter: Mars Hall
Assignee: Mars Hall


Using the new Elasticsearch 5.x REST storage client as the meta storage source 
(`PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH` setup in 
conf/pio-env.sh), I found that once an engine has been trained a certain number 
of times, the most recent engine instance is no longer retrieved. So, I 
tracked down where those Elasticsearch queries originate.

In the original Elasticsearch 1.x storage client, [the "scroll" pagination 
responses are collected by 
*appending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch1/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L44]
 them to one another.

In the new Elasticsearch 5.x client, [the "scroll" responses are collected by 
*prepending*|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESUtils.scala#L152]
 them to one another.

This out-of-order concatenation breaks [ESEngineInstances 
`getLatestCompleted`|https://github.com/apache/incubator-predictionio/blob/release/0.11.0/storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/ESEngineInstances.scala#L192]
 by erroneously replacing the head of the results with an older engine 
instance, when there are enough engine instances to overflow a single page of 
Elasticsearch hits.

I've observed this buggy behavior after ten trainings, when enough engine 
instances are stored to trigger Elasticsearch's scroll feature.

I'll be opening a pull request shortly with the super-simple fix.




[jira] [Commented] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs

2017-07-08 Thread Mars Hall (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079272#comment-16079272
 ] 

Mars Hall commented on PIO-96:
--

Yes Kenneth, the same storage config can be used for (per my example) a 
Classifier & UR, but the issue I'm raising here is that it's quite simple for 
someone to not understand this and end up with corrupted storage.

As I mentioned, PredictionIO makes it sound like sharing storage and 
eventserver between engines is okay. Unfortunately, this sets folks up for 
hard-to-understand, probably time-wasting, and possibly hidden erroneous-data 
problems.

> Storage corrupted by sharing databases between engines with different storage 
> configs
> -
>
> Key: PIO-96
> URL: https://issues.apache.org/jira/browse/PIO-96
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>
> When getting started with PredictionIO, it's no problem to spin up an engine 
> and see it work. Problems emerge when a developer tries running multiple 
> engines with different storage configs on the same underlying database, such 
> as:
> * a Classifier with *Postgres* meta, event, & model storage, and
> * the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & 
> model storage.
> The database will become corrupt because the meta tables are stored in 
> different databases, but the dynamically created event & model tables may 
> mistakenly share the same name, like {{pio_event_1}}.
> We are directing folks to avoid this problem with the Heroku buildpack by 
> [isolating each engine's 
> database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database]
>  and [optionally running an eventserver per 
> engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver].
>  It's still a problem with local development, though.
> It would be great if PredictionIO's management of the database schemas would 
> inherently avoid such conflicts, like by using random/UUIDs for dynamically 
> created table names, so that they will never conflict.




[jira] [Commented] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs

2017-06-30 Thread Mars Hall (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-96?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070626#comment-16070626
 ] 

Mars Hall commented on PIO-96:
--

Data corruption is an issue: we've seen it happen five times across various
local developers & deployments, in exactly the example I provided.

This was never an issue until Universal Recommender required us to mix a
PIO engine with a different meta storage source into our environments.

Maybe clarifying this danger in the documentation is a good way to resolve
this issue, since the evolution of PIO itself seems headed toward solving it?



> Storage corrupted by sharing databases between engines with different storage 
> configs
> -
>
> Key: PIO-96
> URL: https://issues.apache.org/jira/browse/PIO-96
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>
> When getting started with PredictionIO, it's no problem to spin up an engine 
> and see it work. Problems emerge when a developer tries running multiple 
> engines with different storage configs on the same underlying database, such 
> as:
> * a Classifier with *Postgres* meta, event, & model storage, and
> * the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & 
> model storage.
> The database will become corrupt because the meta tables are stored in 
> different databases, but the dynamically created event & model tables may 
> mistakenly share the same name, like {{pio_event_1}}.
> We are directing folks to avoid this problem with the Heroku buildpack by 
> [isolating each engine's 
> database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database]
>  and [optionally running an eventserver per 
> engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver].
>  It's still a problem with local development, though.
> It would be great if PredictionIO's management of the database schemas would 
> inherently avoid such conflicts, like by using random/UUIDs for dynamically 
> created table names, so that they will never conflict.




[jira] [Created] (PIO-96) Storage corrupted by sharing databases between engines with different storage configs

2017-06-19 Thread Mars Hall (JIRA)

Mars Hall created PIO-96:


 Summary: Storage corrupted by sharing databases between engines 
with different storage configs
 Key: PIO-96
 URL: https://issues.apache.org/jira/browse/PIO-96
 Project: PredictionIO
  Issue Type: Bug
  Components: Core
Affects Versions: 0.11.0-incubating
Reporter: Mars Hall


When getting started with PredictionIO, it's no problem to spin up an engine 
and see it work. Problems emerge when a developer tries running multiple 
engines with different storage configs on the same underlying database, such as:

* a Classifier with *Postgres* meta, event, & model storage, and
* the Universal Recommender with *Elasticsearch* meta plus *Postgres* event & 
model storage.

The database will become corrupt because the meta tables are stored in 
different databases, but the dynamically created event & storage tables may 
mistakenly share the same name, like {{pio_event_1}}.

We are directing folks to avoid this problem with the Heroku buildpack by 
[isolating each engine's 
database|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#provision-the-database]
 and [optionally running an eventserver per 
engine|https://github.com/heroku/predictionio-buildpack/blob/master/CUSTOM.md#user-content-eventserver].
 It's still a problem with local development, though.

It would be great if PredictionIO's management of the database schemas would 
inherently avoid such conflicts, like by using random/UUIDs for dynamically 
created table names, so that they will never conflict.
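
A minimal Python sketch of the collision and of the UUID idea follows. It assumes the numeric suffix in a name like `pio_event_1` comes from an ID that each meta store assigns on its own; `sequential_table_name` and `unique_table_name` are hypothetical helpers, not PredictionIO functions.

```python
import uuid


def sequential_table_name(app_id):
    """Name derived from an ID that each meta store assigns independently."""
    return f"pio_event_{app_id}"


def unique_table_name():
    """Hypothetical alternative: name derived from a random UUID."""
    return f"pio_event_{uuid.uuid4().hex}"


# Two engines with separate meta stores can each hand out ID 1.
classifier_table = sequential_table_name(1)
recommender_table = sequential_table_name(1)
print(classifier_table == recommender_table)  # both engines pick the same name

print(unique_table_name() == unique_table_name())  # random names differ
```

Random names trade readability for safety, so a real fix would probably also record the generated name in the meta store for lookup.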




[GitHub] incubator-predictionio pull request #394: [PIO-95] Extend request timeout fo...

2017-06-16 Thread mars

Github user mars commented on a diff in the pull request:


https://github.com/apache/incubator-predictionio/pull/394#discussion_r122527314
  
--- Diff: common/src/main/resources/application.conf ---
@@ -9,3 +9,7 @@ spray.can {
 verbose-error-messages = "on"
   }
 }
+
+spray.can.server {
+  request-timeout = 35s
+}
--- End diff --

I updated my branch with this change, but for some reason this PR is not 
updating to reflect the [new 
commit](https://github.com/mars/incubator-predictionio/commits/extend-request-timeout).
 I'm assuming it's a GitHub glitch that will eventually fix itself.



[GitHub] incubator-predictionio pull request #394: Extend request timeout for REST AP...

2017-06-16 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/394

Extend request timeout for REST API

We've found the default 20-second REST API request timeout is too short for 
our batch-prediction use cases. We're running PredictionIO on Heroku which has 
its own [timeout starting at 
30 seconds](https://devcenter.heroku.com/articles/limits#http-timeouts). So we'd 
prefer a more generous or easily configurable timeout to allow Heroku's routing 
layer to impose & track this limit in the platform layer.

I investigated how to configure this and found [Spray 
`application.conf`](http://spray.io/documentation/1.2.4/spray-can/configuration/).
 This PR simply increases the timeout.

I would love guidance on how we might extract this config into an 
environment variable or a value in `pio-env.sh`.
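
One possible approach, offered as an untested sketch: Typesafe Config (the HOCON library that reads `application.conf`) supports optional substitutions of the form `${?NAME}`, which fall back to environment variables when no config path of that name exists. `PIO_REQUEST_TIMEOUT` is a hypothetical variable name, not an existing PredictionIO setting.

```
spray.can.server {
  # Default used when the environment variable is not set
  request-timeout = 35s
  # Overrides the default only if PIO_REQUEST_TIMEOUT is defined
  request-timeout = ${?PIO_REQUEST_TIMEOUT}
}
```

If this works as expected, exporting the variable in `pio-env.sh` would be enough to change the timeout without rebuilding.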

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio 
extend-request-timeout

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/394.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #394


commit 4b99967b47350e2f3ef25e505bd1f523680d7f64
Author: Mars Hall <m...@heroku.com>
Date:   2017-06-16T18:59:10Z

Extend request timeout for REST API





[GitHub] incubator-predictionio pull request #393: Fix to show stacktrace for errors ...

2017-06-16 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/393

Fix to show stacktrace for errors thrown in `queries.json` REST API

We were getting intractable errors from `queries.json` requests, like this 
one without a stacktrace:

```
[ERROR] [ServerActor] Query '{
  "user": "000",
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing
```

This pull request adds stacktraces to these errors using the pattern 
already present elsewhere in `CreateServer.scala`.
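
The general idea, shown as a Python sketch rather than the Scala in this PR: log the exception together with its stack trace instead of only its message. The `handle_query` function and the logger setup are hypothetical and exist only to make the example runnable.

```python
import io
import json
import logging

# Capture log output in memory so the sketch is self-contained.
log_output = io.StringIO()
logger = logging.getLogger("ServerActor")
logger.addHandler(logging.StreamHandler(log_output))
logger.setLevel(logging.ERROR)
logger.propagate = False


def handle_query(raw_query):
    """Parse a query; on failure, log the message and the stack trace."""
    try:
        return json.loads(raw_query)["user"]
    except Exception as error:
        # exc_info=True appends the traceback to the log record.
        logger.error(
            "Query '%s' is invalid. Reason: %s", raw_query, error, exc_info=True
        )
        return None


handle_query('{"item": "000"}')
print(log_output.getvalue())
```

With the traceback attached, the log shows where parsing failed, not just that it failed.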

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio 
log-queries-stacktrace

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #393


commit f50980d6513c367657374988083e32039c454992
Author: Mars Hall <m...@heroku.com>
Date:   2017-06-16T18:45:03Z

Fix to show stacktrace for errors thrown in `queries.json` REST API





[jira] [Updated] (PIO-94) Query parsing may throw intractable errors

2017-06-16 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-94:
-
Description: 
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

{code}
[ERROR] [ServerActor] Query '{
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing
{code}


To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393

  was:
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

{{
[ERROR] [ServerActor] Query '{
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing
}}

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393


> Query parsing may throw intractable errors
> --
>
> Key: PIO-94
> URL: https://issues.apache.org/jira/browse/PIO-94
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>
> We get intractable errors from some `queries.json` requests, like this one 
> without a stacktrace:
> {code}
> [ERROR] [ServerActor] Query '{
>   "item": "000"
> }' is invalid. Reason: Expected object but got JNothing
> {code}
> To solve, add stacktraces to these errors using the pattern already present 
> elsewhere in `CreateServer.scala`.
> PR: https://github.com/apache/incubator-predictionio/pull/393




[jira] [Updated] (PIO-94) Query parsing may throw intractable errors

2017-06-16 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-94:
-
Description: 
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

{{
[ERROR] [ServerActor] Query '{
  "user": "000",
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing
}}

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393

  was:
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

{{[ERROR] [ServerActor] Query '{
  "user": "000",
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing}}

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393


> Query parsing may throw intractable errors
> --
>
> Key: PIO-94
> URL: https://issues.apache.org/jira/browse/PIO-94
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>
> We get intractable errors from some `queries.json` requests, like this one 
> without a stacktrace:
> {{
> [ERROR] [ServerActor] Query '{
>   "user": "000",
>   "item": "000"
> }' is invalid. Reason: Expected object but got JNothing
> }}
> To solve, add stacktraces to these errors using the pattern already present 
> elsewhere in `CreateServer.scala`.
> PR: https://github.com/apache/incubator-predictionio/pull/393




[jira] [Updated] (PIO-94) Query parsing may throw intractable errors

2017-06-16 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-94:
-
Description: 
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

{{
[ERROR] [ServerActor] Query '{
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing
}}

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393

  was:
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

[ERROR] [ServerActor] Query '{
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393


> Query parsing may throw intractable errors
> --
>
> Key: PIO-94
> URL: https://issues.apache.org/jira/browse/PIO-94
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>
> We get intractable errors from some `queries.json` requests, like this one 
> without a stacktrace:
> {{
> [ERROR] [ServerActor] Query '{
>   "item": "000"
> }' is invalid. Reason: Expected object but got JNothing
> }}
> To solve, add stacktraces to these errors using the pattern already present 
> elsewhere in `CreateServer.scala`.
> PR: https://github.com/apache/incubator-predictionio/pull/393




[jira] [Updated] (PIO-94) Query parsing may throw intractable errors

2017-06-16 Thread Mars Hall (JIRA)


 [ 
https://issues.apache.org/jira/browse/PIO-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mars Hall updated PIO-94:
-
Description: 
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

{{[ERROR] [ServerActor] Query '{
  "user": "000",
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing}}

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393

  was:
We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

[ERROR] [ServerActor] Query '{
  "user": "000",
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393


> Query parsing may throw intractable errors
> --
>
> Key: PIO-94
> URL: https://issues.apache.org/jira/browse/PIO-94
> Project: PredictionIO
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 0.11.0-incubating
>Reporter: Mars Hall
>
> We get intractable errors from some `queries.json` requests, like this one 
> without a stacktrace:
> {{[ERROR] [ServerActor] Query '{
>   "user": "000",
>   "item": "000"
> }' is invalid. Reason: Expected object but got JNothing}}
> To solve, add stacktraces to these errors using the pattern already present 
> elsewhere in `CreateServer.scala`.
> PR: https://github.com/apache/incubator-predictionio/pull/393




[jira] [Created] (PIO-94) Query parsing may throw intractable errors

2017-06-16 Thread Mars Hall (JIRA)

Mars Hall created PIO-94:


 Summary: Query parsing may throw intractable errors
 Key: PIO-94
 URL: https://issues.apache.org/jira/browse/PIO-94
 Project: PredictionIO
  Issue Type: Bug
  Components: Core
Affects Versions: 0.11.0-incubating
Reporter: Mars Hall


We get intractable errors from some `queries.json` requests, like this one 
without a stacktrace:

[ERROR] [ServerActor] Query '{
  "user": "000",
  "item": "000"
}' is invalid. Reason: Expected object but got JNothing

To solve, add stacktraces to these errors using the pattern already present 
elsewhere in `CreateServer.scala`.

PR: https://github.com/apache/incubator-predictionio/pull/393




[jira] [Created] (PIO-72) In `pio-shell` jdbc.StorageClient cannot be loaded

2017-05-20 Thread Mars Hall (JIRA)

Mars Hall created PIO-72:


 Summary: In `pio-shell` jdbc.StorageClient cannot be loaded
 Key: PIO-72
 URL: https://issues.apache.org/jira/browse/PIO-72
 Project: PredictionIO
  Issue Type: Bug
  Components: Core
Affects Versions: 0.11.0-incubating
 Environment: local developer machines
Reporter: Mars Hall
 Attachments: image.png

Class loading/classpath is currently broken in {{pio-shell}}. The attached 
screenshot shows the public docs that explain the intended functionality. 
Instead, users see errors when attempting to use storage classes:

{code:title=pio-shell.error|borderStyle=solid}
java.lang.ClassNotFoundException: jdbc.StorageClient
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.predictionio.data.storage.Storage$.getClient(Storage.scala:228)
at org.apache.predictionio.data.storage.Storage$.org$apache$predictionio$data$storage$Storage$$updateS2CM(Storage.scala:254)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215)
at org.apache.predictionio.data.storage.Storage$$anonfun$sourcesToClientMeta$1.apply(Storage.scala:215)
at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
at org.apache.predictionio.data.storage.Storage$.sourcesToClientMeta(Storage.scala:215)
at org.apache.predictionio.data.storage.Storage$.getDataObject(Storage.scala:284)
at org.apache.predictionio.data.storage.Storage$.getDataObjectFromRepo(Storage.scala:269)
at org.apache.predictionio.data.storage.Storage$.getMetaDataApps(Storage.scala:387)
at org.apache.predictionio.data.store.Common$.appsDb$lzycompute(Common.scala:27)
at org.apache.predictionio.data.store.Common$.appsDb(Common.scala:27)
at org.apache.predictionio.data.store.Common$.appNameToId(Common.scala:32)
at org.apache.predictionio.data.store.PEventStore$.aggregateProperties(PEventStore.scala:108)
at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:31)
at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36)
at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:38)
at $line20.$read$$iwC$$iwC$$iwC$$iwC$$iwC.(:40)
at $line20.$read$$iwC$$iwC$$iwC$$iwC.(:42)
at $line20.$read$$iwC$$iwC$$iwC.(:44)
at $line20.$read$$iwC$$iwC.(:46)
at $line20.$read$$iwC.(:48)
at $line20.$read.(:50)
at $line20.$read$.(:54)
at $line20.$read$.()
at $line20.$eval$.(:7)
at $line20.$eval$.()
at $line20.$eval.$print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731

Re: New product manager: Sara Asher

2017-05-17 Thread Mars Hall

Bravo Sara!  PredictionIO is fortunate to have you!!!

*Mars

( <> .. <> )


On Wed, May 17, 2017 at 09:00 Donald Szeto <don...@apache.org> wrote:
Hi all,

The Project Management Committee (PMC) for Apache PredictionIO (incubating) has 
asked Sara Asher to become a product manager, and we are pleased to announce 
that she has accepted.

Sara is a Director of Product Management for Salesforce Einstein, where she 
creates products that let people build smarter applications with Salesforce and 
advanced AI. Prior to Salesforce, Sara worked at Alpine Data where she was 
chief product manager and founding director of Alpine Labs. Sara holds an AB in 
mathematics from Princeton University and a PhD in mathematics from 
Northwestern University.

Being a product manager enables management of JIRA tickets. This should make 
prioritizing product features more efficient.

Please join us in welcoming Sara.

Regards,
Donald

Re: New product manager: Sara Asher

2017-05-17 Thread Mars Hall

Bravo Sara!  PredictionIO is fortunate to have you!!!
On Wed, May 17, 2017 at 09:00 Donald Szeto  wrote:

> Hi all,
>
> The Project Management Committee (PMC) for Apache PredictionIO
> (incubating) has asked Sara Asher to become a product manager, and we are
> pleased to announce that she has accepted.
>
> Sara is a Director of Product Management for Salesforce Einstein, where
> she creates products that let people build smarter applications with
> Salesforce and advanced AI. Prior to Salesforce, Sara worked at Alpine Data
> where she was chief product manager and founding director of Alpine Labs.
> Sara holds an AB in mathematics from Princeton University and a PhD in
> mathematics from Northwestern University.
>
> Being a product manager enables management of JIRA tickets. This should
> make prioritizing product features more efficient.
>
> Please join us in welcoming Sara.
>
> Regards,
> Donald
>

[GitHub] incubator-predictionio pull request #371: [PIO-61] Add S3 Model Data Reposit...

2017-04-26 Thread mars

Github user mars commented on a diff in the pull request:


https://github.com/apache/incubator-predictionio/pull/371#discussion_r113494975
  
--- Diff: storage/s3/build.sbt ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import PIOBuild._
+
+name := "apache-predictionio-data-s3"
+
+libraryDependencies ++= Seq(
+  "org.apache.predictionio" %% "apache-predictionio-core" % version.value % "provided",
+  "com.google.guava"        %  "guava"                    % "14.0.1"      % "provided",
+  "com.amazonaws"           %  "aws-java-sdk-s3"          % "1.11.118",
+  "org.scalatest"           %% "scalatest"                % "2.1.7"       % "test")
+
+parallelExecution in Test := false
+
+pomExtra := childrenPomExtra.value
+
+assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
+
+assemblyShadeRules in assembly := Seq(
+  ShadeRule.rename("org.apache.http.**" -> "shadeio.data.s3.http.@1").inAll,
+  ShadeRule.rename("com.fasterxml.**" -> "shadeio.data.s3.fasterxml.@1").inAll
+)
--- End diff --

Hi @marevol,

We're building a [predictionio-incubating branch to add authentication to 
the Elasticsearch REST 
client](https://github.com/apache/incubator-predictionio/pull/372) with Scala 
2.11 & Spark 2.1.

I believe that during a runtime ES REST client call to `performRequest`, an 
underlying method signature suddenly could not be located in the 
`org.apache.http` package. @dszeto helped me discover that the 
`shadeio.data` name was being incorrectly resolved. So, I tried [removing the 
shade 
rule](https://github.com/apache/incubator-predictionio/pull/372/files#diff-55cfeb297edd310e1efa8b6ac8bdbae6L39)
 and it started working. No more errors for that branch.
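
For readers unfamiliar with shading, here is a small sketch of what a rename rule such as `"org.apache.http.**" -> "shadeio.data.s3.http.@1"` does to class names, on the understanding that `**` matches the remainder of the package path and `@1` substitutes that match. This only simulates the name mapping with plain strings; it does not call sbt-assembly, and `ShadeRenameSketch` and `rename` are hypothetical names for this example.

```scala
object ShadeRenameSketch {
  // Simulates a rename rule "<fromPrefix>.**" -> "<toPrefix>.@1":
  // class names under fromPrefix are moved under toPrefix, and all
  // other class names are left untouched.
  def rename(className: String, fromPrefix: String, toPrefix: String): String =
    if (className.startsWith(fromPrefix + ".")) toPrefix + className.drop(fromPrefix.length)
    else className

  def main(args: Array[String]): Unit = {
    // A class inside the shaded package is relocated.
    println(rename("org.apache.http.HttpHost", "org.apache.http", "shadeio.data.s3.http"))
    // A class outside the shaded package keeps its name.
    println(rename("org.elasticsearch.client.RestClient", "org.apache.http", "shadeio.data.s3.http"))
  }
}
```

Code that expects a class under its original `org.apache.http` name will no longer line up with a copy that has been relocated this way, which is the general kind of mismatch a shade rule can introduce.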

Maybe the difference is that we're using Spark 2.1?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-predictionio pull request #372: Elasticsearch basic HTTP authentic...

2017-04-21 Thread mars

Github user mars commented on a diff in the pull request:


https://github.com/apache/incubator-predictionio/pull/372#discussion_r112755776
  
--- Diff: storage/elasticsearch/src/main/scala/org/apache/predictionio/data/storage/elasticsearch/StorageClient.scala ---
@@ -18,27 +18,66 @@
 package org.apache.predictionio.data.storage.elasticsearch
 
 import org.apache.http.HttpHost
+import org.apache.http.auth.{AuthScope, UsernamePasswordCredentials}
+import org.apache.http.impl.client.BasicCredentialsProvider
+import org.apache.http.impl.nio.client.HttpAsyncClientBuilder
 import org.apache.predictionio.data.storage.BaseStorageClient
 import org.apache.predictionio.data.storage.StorageClientConfig
 import org.apache.predictionio.data.storage.StorageClientException
 import org.elasticsearch.client.RestClient
+import org.elasticsearch.client.RestClientBuilder.HttpClientConfigCallback
 
 import grizzled.slf4j.Logging
 
-case class ESClient(hosts: Seq[HttpHost]) {
+case class ESClient(
+    hosts: Seq[HttpHost],
+    basicAuth: Option[(String, String)] = None) {
+
   def open(): RestClient = {
     try {
-      RestClient.builder(hosts: _*).build()
+      var builder = RestClient.builder(hosts: _*)
+      builder = basicAuth match {
+        case Some((username, password)) => builder.setHttpClientConfigCallback(
+          new BasicAuthProvider(username, password))
+        case None                       => builder}
+      builder.build()
     } catch {
       case e: Throwable =>
         throw new StorageClientException(e.getMessage, e)
     }
   }
 }
 
-class StorageClient(val config: StorageClientConfig) extends BaseStorageClient
-    with Logging {
+class StorageClient(val config: StorageClientConfig)
+  extends BaseStorageClient with Logging {
+
   override val prefix = "ES"
 
-  val client = ESClient(ESUtils.getHttpHosts(config))
+  val usernamePassword = (
+    config.properties.get("USERNAME"),
+    config.properties.get("PASSWORD"))
+  val optionalBasicAuth: Option[(String, String)] = usernamePassword match {
+    case (Some(username), Some(password)) => Some(username, password)
+    case (Some(username), None)           => Some(username, "")
+    case (None, Some(password))           => Some("", password)
+    case (None, None)                     => None}
--- End diff --

Thanks for the feedback, @takezoe. Pushing this improvement now.
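
For reference, the credential handling in the diff above can be tried in isolation. The sketch below restates the same four cases in a compact form, assuming the storage properties behave like a `Map[String, String]`. It is not necessarily the improvement that was pushed, and `OptionalBasicAuthSketch` is a hypothetical name for this example.

```scala
object OptionalBasicAuthSketch {
  // No credentials configured -> None. If either key is present, a
  // missing username or password falls back to the empty string,
  // matching the four cases in the diff.
  def optionalBasicAuth(properties: Map[String, String]): Option[(String, String)] =
    (properties.get("USERNAME"), properties.get("PASSWORD")) match {
      case (None, None) => None
      case (username, password) =>
        Some((username.getOrElse(""), password.getOrElse("")))
    }

  def main(args: Array[String]): Unit = {
    println(optionalBasicAuth(Map("USERNAME" -> "my-name", "PASSWORD" -> "my-secret")))
    println(optionalBasicAuth(Map.empty))
  }
}
```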



[GitHub] incubator-predictionio pull request #372: Elasticsearch basic HTTP authentic...

2017-04-19 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio/pull/372

Elasticsearch basic HTTP authentication

Add optional username-password configuration for the new Elasticsearch 5 
client; in `pio-env.sh` config:

```bash
# Optional basic HTTP auth
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=my-name
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=my-secret
```

These credentials are sent in each Elasticsearch request as an HTTP Basic 
Authorization header.
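
To make the header format concrete, here is a minimal sketch of how an HTTP Basic `Authorization` value is derived from a username and password. It uses only the JDK and is an illustration, not the code in this pull request; `BasicAuthHeaderSketch` and `headerValue` are hypothetical names, and the credentials are the placeholder values from the `pio-env.sh` example.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object BasicAuthHeaderSketch {
  // Builds the value of an HTTP `Authorization` header for Basic auth:
  // the literal "Basic " followed by Base64("username:password").
  def headerValue(username: String, password: String): String = {
    val credentials = s"$username:$password".getBytes(StandardCharsets.UTF_8)
    "Basic " + Base64.getEncoder.encodeToString(credentials)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder credentials from the pio-env.sh example.
    println(headerValue("my-name", "my-secret"))
  }
}
```

Base64 is an encoding rather than encryption, so these credentials are only protected in transit when the cluster is reached over HTTPS.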

Enables use of public-cloud, hosted Elasticsearch clusters, such as [Bonsai 
on Heroku](https://elements.heroku.com/addons/bonsai).

I'm looking into adding test coverage. (I have the Docker test suite set up 
now.)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio esclient-auth

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/372.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #372


commit 9f61541df44a5728450c3d25a79639e351e0ae6f
Author: Mars Hall <m...@heroku.com>
Date:   2017-04-19T18:00:36Z

Fix classpath computation error introduced when "storage got 
refactored" - @dszeto

commit 9ab99f6be9d0f018b3c900effe0be455f74f0046
Author: Mars Hall <m...@heroku.com>
Date:   2017-04-19T18:37:18Z

Optional Elasticsearch support for basic HTTP auth (username & password) 
using ES 5.3.0's "preemptive authentication"





Re: [VOTE] Apache PredictionIO (incubating) 0.11.0 Release (RC2)

2017-04-10 Thread Mars Hall

My non-binding vote for rc2!

[X] +1, accept RC as the official 0.11.0 release
[ ] -1, do not accept RC as the official 0.11.0 release because...

*Mars

( <> .. <> )

> On Apr 9, 2017, at 21:29, Steven Yan <steven@salesforce.com> wrote:
> 
> [X] +1, accept RC as the official 0.11.0 release
> [ ] -1, do not accept RC as the official 0.11.0 release because...
> 
> Thanks,
> Steven
> 
> On Sun, Apr 9, 2017 at 5:21 PM, Donald Szeto <don...@apache.org> wrote:
> 
>> This is the vote for 0.11.0 of Apache PredictionIO (incubating).
>> 
>> The vote will run for at least 72 hours and will close on Apr 12th, 2017.
>> 
>> RC2 fixes a usage bug where the pio command does not pass through
>> --driver-java-options to spark-submit.
>> 
>> The release candidate artifacts can be downloaded here:
>> https://dist.apache.org/repos/dist/dev/incubator/predictionio/0.11.0-incubating-rc2/
>> 
>> Test results of RC2 can be found here:
>> https://travis-ci.org/apache/incubator-predictionio/builds/220381611
>> 
>> Maven artifacts are built from the release candidate artifacts above, and
>> are provided as convenience for testing with engine templates. The Maven
>> artifacts are provided at the Maven staging repo here:
>> https://repository.apache.org/content/repositories/orgapachepredictionio-1016/
>> 
>> All JIRAs completed for this release are tagged with 'FixVersion =
>> 0.11.0-incubating'. You can view them here:
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
>> projectId=12320420=12338381
>> 
>> The artifacts have been signed with Key : 8BF4ABEB
>> 
>> Please vote accordingly:
>> 
>> [ ] +1, accept RC as the official 0.11.0 release
>> [ ] -1, do not accept RC as the official 0.11.0 release because...
>>

[GitHub] incubator-predictionio-template-skeleton issue #5: Example Tests

2017-03-10 Thread mars

Github user mars commented on the issue:

https://github.com/apache/incubator-predictionio-template-skeleton/pull/5
  
I've been trying to support `DataSourceTest` using an in-memory H2 
database. Unfortunately, that's currently failing with:

```
[info] DataSourceTest:
[info] readTraining
[info] - should return the data *** FAILED ***
[info]   java.lang.IllegalStateException: Connection pool is not yet initialized.(name:'default)
[info]   at scalikejdbc.ConnectionPool$$anonfun$get$1.apply(ConnectionPool.scala:57)
[info]   at scalikejdbc.ConnectionPool$$anonfun$get$1.apply(ConnectionPool.scala:55)
[info]   at scala.Option.getOrElse(Option.scala:120)
[info]   at scalikejdbc.ConnectionPool$.get(ConnectionPool.scala:55)
[info]   at scalikejdbc.ConnectionPool$.apply(ConnectionPool.scala:46)
[info]   at scalikejdbc.DB$.connectionPool(DB.scala:150)
[info]   at scalikejdbc.DB$.autoCommit(DB.scala:213)
[info]   at org.apache.predictionio.data.storage.jdbc.JDBCApps.(JDBCApps.scala:32)
[info]   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[info]   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
```

…which seems to be caused by [this `DB autoCommit` call in the JDBC 
initializer](https://github.com/apache/incubator-predictionio/blob/release/0.10.0/data/src/main/scala/org/apache/predictionio/data/storage/jdbc/JDBCApps.scala#L32).



[GitHub] incubator-predictionio-template-skeleton pull request #5: Example Tests

2017-03-08 Thread mars

GitHub user mars opened a pull request:

https://github.com/apache/incubator-predictionio-template-skeleton/pull/5

Example Tests

This skeleton repo seems to be the authoritative starting point for 
creating a new engine. Since testing is a great way to improve collaboration 
and reliability, what do you think about including example tests in the 
skeleton?

Here I implemented tests with a [ScalaTest](http://www.scalatest.org) suite 
which includes a mixin `SharedSingletonContext` to make a Spark context 
available as `SparkContext`.

Each of the engine-defined classes now has a tiny passing test: 
`AlgorithmTest`, `EngineTest`, `PreparatorTest`, & `ServingTest`.

`DataSourceTest` is the outlier and so currently tagged **ignore**. It's 
difficult to test in the skeleton, because it requires a database connection as 
well as database cleansing or transaction isolation between tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mars/incubator-predictionio-template-skeleton example-tests

Alternatively you can review and apply these changes as the patch at:


https://github.com/apache/incubator-predictionio-template-skeleton/pull/5.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5


commit a6954e9bc7bd53fc47e5150f6132757e740b42a5
Author: Mars Hall <m...@heroku.com>
Date:   2017-03-08T21:29:45Z

Implement tests in template skeleton.





Re: Remove engine registration

2016-09-17 Thread Mars Hall

Hello folks,

Great to hear about this possibility. I've been working on running PredictionIO 
on Heroku https://www.heroku.com

Heroku's 12-factor architecture https://12factor.net prefers "stateless builds" 
to ensure that compiled artifacts result in processes which may be cheaply 
restarted, replaced, and scaled via process count & size. I imagine this 
stateless property would be valuable for others as well.

The fact that `pio build` inserts stateful metadata into a database causes 
ripples throughout the lifecycle of PIO engines on Heroku:

* An engine cannot be built for production without the production database 
available. When a production database contains PII (personally identifiable 
information) which has security compliance requirements, the build system may 
not be privileged to access that PII data. This also affects CI (continuous 
integration/testing), where engines would need to be rebuilt in production, 
defeating assurances CI is supposed to provide.

* The build artifacts cannot be reliably reused. "Slugs" at Heroku are intended 
to be stateless, so that you can rollback to a previous version during the 
lifetime of an app. With `pio build` causing database side-effects, there's a 
greater-than-zero probability of slug-to-metadata inconsistencies eventually 
surfacing in a long-running system.

From my user-perspective, a few changes to the CLI would fix it:

1. add a "skip registration" option, `pio build --without-engine-registration`
2. a new command `pio app register` that could be run separately in the built 
engine (before training)

Alas, I do not know PredictionIO internals, so I can only offer a suggestion 
for how this might be solved.

Donald, one specific note,

Regarding "No automatic version matching of PIO binary distribution and 
artifacts version used in the engine template":

The Heroku slug contains the PredictionIO binary distribution used to build the 
engine, so there's never a version matching issue. I guess some systems might 
deploy only the engine artifacts to production where a pre-existing PIO binary 
is available, but that seems like a risky practice for long-running systems.

Thanks for listening,

*Mars Hall
Customer Facing Architect
Salesforce App Cloud / Heroku
San Francisco, California

> On Sep 16, 2016, at 10:42, Donald Szeto <don...@apache.org> wrote:
> 
> Hi all,
> 
> I want to start the discussion of removing engine registration. How many 
> people actually take advantage of being able to run pio commands everywhere 
> outside of an engine template directory? This will be a nontrivial change on 
> the operational side so I want to gauge the potential impact to existing 
> users.
> 
> Pros:
> - Stateless build. This would work well with many PaaS.
> - Eliminate the "pio build" command once and for all.
> - Ability to use your own build system, i.e. Maven, Ant, Gradle, etc.
> - Potentially better experience with IDE since engine templates no longer 
> depends on an SBT plugin.
> 
> Cons:
> - Inability to run pio engine training and deployment commands outside of 
> engine template directory.
> - No automatic version matching of PIO binary distribution and artifacts 
> version used in the engine template.
> - A less unified user experience: from pio-build-train-deploy to build, then 
> pio-train-deploy.
> 
> Regards,
> Donald
