Sep 2016 Podling report
Please find below the proposed Podling PIO report for Sep 2016; let me know of any changes/additions you would like to make.

Apache PredictionIO (Incubating)

Apache PredictionIO (incubating) is an open source Machine Learning Server built on top of a state-of-the-art open source stack that lets developers and data scientists create predictive engines for any machine learning task.

PredictionIO has been incubating since 2016-05-26.

The initial code for PredictionIO was granted on 2016-06-16. A second grant of PredictionIO Templates is in process and anticipated to be complete in September 2016.

The important issues to address in the move towards graduation:

1. Establish a formal release schedule and process, allowing for dependable release cycles in a manner consistent with the Apache way.
2. Grow the community to establish diversity of background.

Any issues the Incubator PMC or ASF Board wish/need to be aware of?

None

How has the community developed since the last report?

1. Users have begun to migrate from the old Google group to the Apache user list.
2. 3 new contributors have become active with consistent, acceptable contributions and are on a path to become committers.

How has the project developed since the last report?

1. The old Prediction.io docs site has been migrated to http://predictionio.incubator.apache.org
2. 7 templates have been ported to Apache PredictionIO, pending a decision on a 2nd grant. The pending 2nd grant will not block a PredictionIO release.

Date of last release:

No releases have been made yet. Present activities are directed towards the first Apache release, planned for September 2016.

When were the last committers or PMC members elected?

Paul Li was elected as committer and PMC member on Aug 30, 2016.
Re: Podling Report Reminder - September 2016
Thanks Suneel.

Whenever the report is ready and there's consensus on the PPMC that it's good, we can post it to the incubator wiki. The deadline is Wednesday the 7th.

On Thu, Sep 1, 2016 at 2:23 PM, Suneel Marthi wrote:

> I posted an outline draft here for others to fill in -
> https://docs.google.com/document/d/1ktBY0mUxT1GW42Oq2howNmJZea2xj8PKVXxAPSIetJE/edit
>
> On Thu, Sep 1, 2016 at 5:02 PM, Andrew Purtell wrote:
>
> > Time to report to the Board and IPMC. Do we have a volunteer from the PPMC
> > to produce the report this time?
> >
> > On Tue, Aug 30, 2016 at 3:46 AM, wrote:
> >
> > > Dear podling,
> > >
> > > This email was sent by an automated system on behalf of the Apache
> > > Incubator PMC. It is an initial reminder to give you plenty of time to
> > > prepare your quarterly board report.
> > >
> > > The board meeting is scheduled for Wed, 21 September 2016, 10:30 am PDT.
> > > The report for your podling will form a part of the Incubator PMC
> > > report. The Incubator PMC requires your report to be submitted 2 weeks
> > > before the board meeting, to allow sufficient time for review and
> > > submission (Wed, September 07).
> > >
> > > Please submit your report with sufficient time to allow the Incubator
> > > PMC, and subsequently board members to review and digest. Again, the
> > > very latest you should submit your report is 2 weeks prior to the board
> > > meeting.
> > >
> > > Thanks,
> > >
> > > The Apache Incubator PMC
> > >
> > > Submitting your Report
> > >
> > > --
> > >
> > > Your report should contain the following:
> > >
> > > * Your project name
> > > * A brief description of your project, which assumes no knowledge of
> > >   the project or necessarily of its field
> > > * A list of the three most important issues to address in the move
> > >   towards graduation.
> > > * Any issues that the Incubator PMC or ASF Board might wish/need to be
> > >   aware of
> > > * How has the community developed since the last report
> > > * How has the project developed since the last report.
> > >
> > > This should be appended to the Incubator Wiki page at:
> > >
> > > http://wiki.apache.org/incubator/September2016
> > >
> > > Note: This is manually populated. You may need to wait a little before
> > > this page is created from a template.
> > >
> > > Mentors
> > > ---
> > >
> > > Mentors should review reports for their project(s) and sign them off on
> > > the Incubator wiki page. Signing off reports shows that you are
> > > following the project - projects that are not signed may raise alarms
> > > for the Incubator PMC.
> > >
> > > Incubator PMC
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
Re: Podling Report Reminder - September 2016
I posted an outline draft here for others to fill in -
https://docs.google.com/document/d/1ktBY0mUxT1GW42Oq2howNmJZea2xj8PKVXxAPSIetJE/edit

On Thu, Sep 1, 2016 at 5:02 PM, Andrew Purtell wrote:

> Time to report to the Board and IPMC. Do we have a volunteer from the PPMC
> to produce the report this time?
Re: Podling Report Reminder - September 2016
Time to report to the Board and IPMC. Do we have a volunteer from the PPMC to produce the report this time?

--
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
[GitHub] incubator-predictionio pull request #288: [PIO-30] Set up a cross build for ...
GitHub user Ziemin opened a pull request:

https://github.com/apache/incubator-predictionio/pull/288

[PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala …

I treat this PR as an RFC to get some insights on what could be improved and what would be best for the community. I don't think this is ready for merge, and it also should not be a part of the upcoming release, but it's an example of a working cross-build setup.

### The good

The key changes include:

* `build.sbt` - here I set two versions of Scala for the cross build. Appropriate versions of Spark, Akka and Hadoop are chosen based on the current Scala version. To run an sbt command for both setups at the same time, you simply need to type, e.g. `sbt +test`. To choose Scala 2.10.5 specifically you just have to write, e.g. `sbt ++2.10.5 scalastyle`. Running sbt without either option always uses 2.11.8.
* `data/src/main/scala-2.10/org/apache/predictionio/data/SparkVersionDependent.scala` and `data/src/main/scala-2.11/org/apache/predictionio/data/SparkVersionDependent.scala` - these are the only examples of version-dependent code. They exist solely to provide the proper type of object for Spark SQL related actions. Sbt is smart enough to include version-specific source paths like these.
* `make_distribution.sh` - in order to create an archive for Scala 2.10.5 one has to provide it with an argument (`./make_distribution.sh 2.10.5`). By default it will use 2.11.8.
* **integration tests** - The docker image is updated; I pushed it with the tag spark_2.0.0 so as not to interfere with the current build. It contains both versions of Spark and on startup sets up the environment according to the dependencies that PredictionIO was built with. It uses a simple Java program, `tests/docker-files/BuildInfoPrinter.java`, linked with the assembly of PredictionIO to acquire the necessary information. Travis CI makes use of the setup and runs 8 parallel builds, the number having doubled because of the two different Scala versions.

### The bad

Updating Spark caused some trouble:

* The classpath has to be extended to run the unit tests successfully for the data sub-package (see `build.sbt`).
* Column names have to be handled differently for Postgres in `JDBCPEvents`, as Spark surrounds them with "...", which breaks the current schema in this case.
* `tests/pio_tests/utils.py` - has a special Spark pass-through argument to set `spark.sql.warehouse.dir`, because the defaults cause runtime exceptions. See -> [here](https://mail-archives.apache.org/mod_mbox/spark-user/201608.mbox/%3ccamassd+efz+uscmnzvkfp00qbr9ynv8lrfhvz9lrmnwh2vk...@mail.gmail.com%3E)

### The ugly

I haven't updated the install script. I think that there are too many places containing version strings and related dependencies. The same goes for setting up and downloading the proper version of Spark in the tests. I think that we should come up with a cleaner way of choosing a profile that would be consistent between different scripts and easier to maintain. Currently, bumping the version of Scala or Spark in one place involves modifying many files and is very error prone. Ideally there should be one place with a description of a profile that all the other scripts could use for the setup.

I hope that this PR will lead to some discussion and we will finally devise a better solution.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ziemin/incubator-predictionio crossbuild

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-predictionio/pull/288.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #288

commit 85c06aa3cae53ca8158db34e3f5c6788e06b38cf
Author: Marcin Ziemiński
Date: 2016-08-09T21:20:54Z

[PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala 2.11 (Spark 2.0.0).

Changes also include updating the Travis integration tests, which now run eight parallel builds.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
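The "one place with a description of a profile" idea from the PR's last section could look roughly like the following. This is only a minimal sketch, assuming Python (the language of the existing `tests/pio_tests` helpers). The profile names, the `PIO_SCALA_VERSION` / `PIO_SPARK_VERSION` variable names, and the `resolve` and `as_env` functions are hypothetical and not part of PredictionIO; only the Scala/Spark version pairs are taken from the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildProfile:
    """One Scala/Spark combination that every script would read from here."""
    scala: str
    spark: str


# Version pairs as stated in the PR; the profile names are hypothetical.
PROFILES = {
    "scala-2.10": BuildProfile(scala="2.10.5", spark="1.6.2"),
    "scala-2.11": BuildProfile(scala="2.11.8", spark="2.0.0"),
}

# The PR makes Scala 2.11.8 the default when no version is requested.
DEFAULT_PROFILE = "scala-2.11"


def resolve(name=None):
    """Look up a profile by name, falling back to the default profile."""
    key = name or DEFAULT_PROFILE
    try:
        return PROFILES[key]
    except KeyError:
        known = ", ".join(sorted(PROFILES))
        raise ValueError(f"unknown profile {key!r}; known profiles: {known}") from None


def as_env(profile):
    """Render a profile as environment variables (hypothetical names)."""
    return {
        "PIO_SCALA_VERSION": profile.scala,
        "PIO_SPARK_VERSION": profile.spark,
    }
```

An install script, `make_distribution.sh`, and the test harness could then all call `as_env(resolve(name))` instead of each carrying its own version strings, so bumping a version would touch a single table.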
[jira] [Commented] (PIO-30) Cross build for different versions of scala and spark
[ https://issues.apache.org/jira/browse/PIO-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456437#comment-15456437 ]

ASF GitHub Bot commented on PIO-30:
---

GitHub user Ziemin opened a pull request:

https://github.com/apache/incubator-predictionio/pull/288

[PIO-30] Set up a cross build for Scala 2.10 (Spark 1.6.2) and Scala …

> Cross build for different versions of scala and spark
> -
>
> Key: PIO-30
> URL: https://issues.apache.org/jira/browse/PIO-30
> Project: PredictionIO
> Issue Type: Improvement
> Reporter: Marcin Ziemiński
>
> The present version of Scala is 2.10 and Spark is 1.4, which is quite old.
> With Spark 2.0.0 come many performance improvements and features, that people
> will definitely like to add to their
[jira] [Commented] (PIO-27) Check release artifacts for licenses and the LICENSE.txt file
[ https://issues.apache.org/jira/browse/PIO-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456370#comment-15456370 ]

Pat Ferrel commented on PIO-27:
---

[~dszeto] Will start work on the text license part of this Friday Sept 2.

> Check release artifacts for licenses and the LICENSE.txt file
> -
>
> Key: PIO-27
> URL: https://issues.apache.org/jira/browse/PIO-27
> Project: PredictionIO
> Issue Type: Task
> Affects Versions: 0.10.0
> Reporter: Pat Ferrel
> Priority: Blocker
> Fix For: 0.10.0
>
> Quoth [~smarthi] "I would ask that you do the due diligence on the
> License and Notice files and ensure that all third party jars have been
> accounted for and the License and Notice files are included in the
> appropriate project release artifacts."
> This has to be done by hand. We should be able to do it now on the develop
> branch build since we will not include new features and so no new
> dependencies.
> https://github.com/apache/incubator-predictionio/blob/develop/LICENSE.txt
> https://github.com/apache/incubator-predictionio/blob/develop/NOTICE.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Re: Apache PIO v0.10.0 release
100% agree! I’ll work on the text license verification tomorrow. Oops, back to the Jira.

Ship it!

On Aug 30, 2016, at 1:47 PM, Donald Szeto wrote:

I am aware that the paperwork is in the pipeline, but agree that the release should not be blocked by that.

I will take some time and look at the license requirement of the binary distribution. If anyone is starting the process of examining all third-party dependencies, please keep track of progress clearly and often in JIRA so hopefully we can parallelize the work if possible. I plan to start doing some of it later this week.

I also hope to start the release candidate process in the 1st week of September. The community has been asking and we'd like to ship it to them.

On Tue, Aug 30, 2016 at 10:05 AM, Pat Ferrel wrote:

> It’s been over a week waiting for the template donation paperwork, which I
> imagine will take a few weeks to go through repo creation, license
> validation and PR merging before they can be released. Since they can be on
> a separate release schedule I’d like to stop treating them as release
> blockers. We can point the Gallery at Chan’s github for now (just to remove
> the issue as a blocker). Any other opinions?
>
> #3 below is still the main blocker IMO. I have some time this week to
> handle the source license check but would have no clue about how to make
> SBT include the correct licenses in the binary artifacts. Donald is the
> only SBT expert I know in the project. Can anyone else help with the binary
> build bits mentioned in https://issues.apache.org/jira/browse/PIO-26?
>
> There are several PRs that are in limbo since we aren’t adding new
> features or major new code. The only way to get moving on these is to
> release. Sooo...
>
> What do people think about a release? What sort of target date can we set?
> Can we get things ready by first week of September?
>
> --
>
> What do people think remains for release?
>
> 1) template donation and mods. Chan Lee has done work on this but we can’t
> review until the donation and repos are set up.
> 2) install.sh. There are some suggestions on how to deal with the one-line
> install here: https://issues.apache.org/jira/browse/PIO-22 and a bug here:
> https://issues.apache.org/jira/browse/PIO-25 PIO-22 suggests we have an
> install based on source pull and build so it will work even on snapshots in
> the “develop” branch, but we could also have an install from binary that
> works after the release. Comments welcome.
> 3) https://issues.apache.org/jira/browse/PIO-26 has a PR but I don’t feel
> qualified to merge it. Can someone pick this up?
>
> Anything else? I took the liberty of marking anything I thought was a
> non-blocker but unresolved as minor. Feel free to disagree.
>
> Hopefully when #1 comes through we will have everything else cleared up.
> Let’s ship it.
Re: Regarding PredictionIO templates
This is getting complicated but is still workable.

If the template has an Apache 2 license, which has been a requirement since the old PIO days and should remain a requirement for gallery inclusion, then even if you can’t get the developer interested you can make the changes and host the modified template in your github account. You can submit it for inclusion in the Gallery too, but you will be supporting the template and it will not be accepted into the Apache project because the copyrights are ill-defined.

I agree with Chan, cooperation is the best outcome. I’m just adding that it is not necessary, and I’d love to see those templates supported. Hope that didn’t confuse things.

On Sep 1, 2016, at 10:57 AM, Chan Lee wrote:

Hi Bansari,

In that case, I'd like to suggest that you work with the owners of the templates to update the namespace and add support for newer Scala/Spark versions. You could make a pull request to the original repo and tag a new version.

Due to the unique way PredictionIO works, most templates will remain independent after Apache donation. It would be great if developers could work with each other to maintain and expand independent templates for various use cases.

There is also a lot of work to be done on the main PredictionIO repo. Please feel free to make contributions or suggestions.

Cheers,
Chan

On Wed, Aug 31, 2016 at 9:36 PM, Bansari Shah wrote:

> Hi Chan,
> The templates I have updated to the org.apache.predictionio namespace are
> already in templates.yaml:
>
> - template:
>     name: OpenNLP Sentiment Analysis Template
>     repo: "https://github.com/vshwnth2/OpenNLP-SentimentAnalysis-Template"
>     description: |-
>       Given a sentence, this engine will return a score between 0 and 4. This is
>       the sentiment of the sentence. The lower the number the more negative the
>       sentence is. It uses the OpenNLP library.
>     tags: [nlp]
>     type: Parallel
>     language: Scala
>     license: "Apache Licence 2.0"
>     status: alpha
>     pio_min_version: "-"
>
> And,
>
> - template:
>     name: Sentiment Analysis Template
>     repo: "https://github.com/whhone/template-sentiment-analysis"
>     description: |-
>       Given a sentence, return a score between 0 and 4, indicating the
>       sentence's sentiment. 0 being very negative, 4 being very positive, 2 being
>       neutral. The engine uses the stanford CoreNLP library and the Scala binding
>       `gangeli/CoreNLP-Scala` for parsing.
>     tags: [nlp]
>     type: Parallel
>     language: Scala
>     license: None
>     status: stable
>     pio_min_version: 0.9.0
>
> I have modified them from io.prediction to org.apache.predictionio, built
> with predictionio-0.9.7-SNAPSHOT, Scala 2.10.6 and Spark 1.5.1. In this
> case should I keep the existing entry and add the updated template to
> templates.yaml, or should I remove it? Please advise.
>
> Thank you.
>
> Regards,
> Bansari Shah
>
> On Thu, Sep 1, 2016 at 12:11 AM, Chan Lee wrote:
>
>> Hi Bansari,
>>
>> You can git clone the Apache PredictionIO repo
>> (https://github.com/apache/incubator-predictionio), modify templates.yaml
>> under /docs/manual/source/gallery, and submit a pull request. When merged,
>> people will be able to view your templates here:
>> http://predictionio.incubator.apache.org/gallery/template-gallery/
>>
>> Cheers,
>> Chan
>>
>> On Wed, Aug 31, 2016 at 4:17 AM, Bansari Shah wrote:
>>
>>> Hi Chan,
>>>
>>> The updated PredictionIO templates for coreNLP_SentimentAnalysis and
>>> OpenNLP_SentimentAnalysis are in the repositories below:
>>> https://github.com/peoplehum/coreNLP-Sentiment-Analysis-PredictionIO-Template
>>> https://github.com/peoplehum/OpenNLP-Sentiment-Analysis-PredictionIO-Template
>>>
>>> For integrating these templates I have created jira issue PIO-34:
>>> Upgrading coreNLP_SentimentAnalysis and OpenNLP_Sentiment Analysis
>>> template to org.apache.predictionio namespace.
>>>
>>> Please suggest how to proceed further.
>>>
>>> Thank you
>>>
>>> Regards,
>>> Bansari Shah
>>>
>>> On Tue, Aug 30, 2016 at 11:51 PM, Chan Lee wrote:
>>>
>>>> Hi Bansari,
>>>>
>>>> Apologies for the late reply. I was away yesterday.
>>>>
>>>> I have updated the namespace and written tests for all "previously
>>>> official" templates under the PredictionIO repository
>>>> (https://github.com/PredictionIO) that we are working to donate to
>>>> Apache. coreNLP_SentimentAnalysis and OpenNLP_SentimentAnalysis are not
>>>> included in this. So your work does not overlap with mine.
>>>>
>>>> We are planning to release v0.10 of PredictionIO in September, and we
>>>> would need voluntary updates to the org namespace in all templates. As
>>>> Pat mentioned, the best way to do this is via pull request to the
>>>> template gallery. Below are the links to template gallery page and
>>>> instructions. For both namespace updates and your new template, you can
>>>> just specify the pio_min_version as 0.10.0 and it will work with the
[jira] [Created] (PIO-35) Add integration tests for major templates
Chan created PIO-35:
---

Summary: Add integration tests for major templates
Key: PIO-35
URL: https://issues.apache.org/jira/browse/PIO-35
Project: PredictionIO
Issue Type: Improvement
Reporter: Chan

Developers of engine templates should be able to test that their template works with the latest changes in PredictionIO. As a starting point, we can expand the integration test suite to all previously "official" templates.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
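One cheap check such a template test could start with is whether a template's sources still refer to the pre-Apache `io.prediction` namespace, which the templates thread above says must move to `org.apache.predictionio` for the 0.10 release. The following is only a minimal sketch in Python; the function name `stale_namespace_lines` and the sample source are hypothetical, and it only looks at `import` and `package` lines rather than doing a full parse.

```python
import re

# Matches "import io.prediction..." or "package io.prediction..." but not
# "org.apache.predictionio", because the name must start right after the keyword.
_OLD_NAMESPACE = re.compile(r"\s*(?:import|package)\s+io\.prediction\b")


def stale_namespace_lines(source):
    """Return the import/package lines that still use the old io.prediction namespace."""
    return [
        line.strip()
        for line in source.splitlines()
        if _OLD_NAMESPACE.match(line)
    ]


# Hypothetical template source mixing the old and new namespaces.
sample = """package org.example.template
import io.prediction.controller.Engine
import org.apache.predictionio.controller.EngineFactory
"""

stale = stale_namespace_lines(sample)
```

A template would pass this check when the returned list is empty; a real integration test would still need to build and run the template against the current PredictionIO.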