PredictionIO ASF Board Report for December 2018

2018-12-10 Thread Donald Szeto
Hi all,

Please take a look at the draft report below and make your comments or
edits as you see fit.

Let's come to a consensus by December 12th for the report submission. Thanks!

Regards,
Donald

## Description:
 - PredictionIO is an open source Machine Learning Server built on top of a
state-of-the-art open source stack that enables developers to manage and
deploy production-ready predictive services for various kinds of machine
learning tasks.

## Issues:
 - There are no issues requiring board attention at this time.

## Activity:
 - Improved PySpark support.
 - Most client SDKs have been updated.
 - Successful GitBox migration of all repos.
 - Started re-architecture discussions toward version 1.0.
 - Continued to support the community and drive contributions.

## PMC changes:

 - Currently 28 PMC members.
 - No new PMC members added in the last 3 months
 - Last PMC addition was Mars Hall on Fri Jul 28 2017

## Committer base changes:

 - Currently 29 committers.
 - No new changes to the committer base since last report.
 - Last committer addition was Mars Hall on Fri Jul 28 2017

## Releases:

 - 0.13.0 was released on Wed Sep 19 2018

## JIRA activity:

 - 36 JIRA tickets created in the last 3 months
 - 29 JIRA tickets closed/resolved in the last 3 months


[jira] [Resolved] (PIO-192) Enhance PySpark support

2018-12-10 Thread Naoki Takezoe (JIRA)


 [ https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Naoki Takezoe resolved PIO-192.
---
   Resolution: Done
Fix Version/s: 0.14.0

> Enhance PySpark support
> ---
>
> Key: PIO-192
> URL: https://issues.apache.org/jira/browse/PIO-192
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.13.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
> Fix For: 0.14.0
>
>
> h3. Summary
> Enhance pypio, the Python API for PIO.
> h3. Goals
> The current Python support forces developers to have access to sbt. This
> enhancement removes the build phase.
> h3. Description
> A Python engine requires no build step. Developers can use the pypio module
> from a Jupyter notebook or plain Python code.
> First, import the necessary modules.
> {code:python}
> import pypio
> {code}
> Once the module is imported, the first step is to initialize pypio.
> {code:python}
> pypio.init()
> {code}
> Next, load event data from the event store.
> {code:python}
> event_df = pypio.find_events('BHPApp')
> {code}
> Then fit a Spark ML pipeline and save the resulting model.
> {code:python}
> # model is a PipelineModel, which is produced after a Pipeline's fit() method runs
> pipeline = Pipeline(...)
> model = pipeline.fit(train_df)
> engine_instance_id = pypio.save_model(model, ["prediction"])
> {code}
> h4. Run & Deploy
> h5. Run Jupyter
> {code:sh}
> pio-shell --with-pyspark
> {code}
> h5. Run on Spark
> {code:sh}
> pio train --main-py-file <path-to-python-file>.py
> {code}
> h5. Deploy App
> {code:sh}
> pio deploy --engine-instance-id <engine-instance-id>
> {code}
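
For context, the steps quoted above can be combined into a single notebook
flow. The following is a minimal sketch: the pypio calls follow the issue
description, while the Spark ML pipeline, feature/label column names, and
training data layout are illustrative assumptions only.

{code:python}
# Minimal end-to-end sketch of the workflow quoted above.
# pypio calls are as described in the issue; the pipeline and the column
# names 'feature1', 'feature2', 'label' are illustrative assumptions.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

import pypio

# Initialize pypio (done once after import, per the issue description).
pypio.init()

# Load events for the app from the event store as a DataFrame.
event_df = pypio.find_events('BHPApp')

# Assemble features and fit an illustrative Spark ML pipeline.
assembler = VectorAssembler(inputCols=['feature1', 'feature2'],
                            outputCol='features')
lr = LogisticRegression(featuresCol='features', labelCol='label')
pipeline = Pipeline(stages=[assembler, lr])
model = pipeline.fit(event_df)

# Persist the fitted PipelineModel; the returned id is what
# "pio deploy --engine-instance-id" expects.
engine_instance_id = pypio.save_model(model, ['prediction'])
print(engine_instance_id)
{code}

The sketch assumes it runs inside pio-shell --with-pyspark or via
pio train --main-py-file, so that the Spark session and PredictionIO
storage configuration are already set up.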



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714683#comment-16714683 ]

ASF GitHub Bot commented on PIO-192:


takezoe closed pull request #494: [PIO-192] Enhance PySpark support
URL: https://github.com/apache/predictionio/pull/494
 
 
   

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/bin/pio-shell b/bin/pio-shell
index cd119cdee..e23041ac8 100755
--- a/bin/pio-shell
+++ b/bin/pio-shell
@@ -65,7 +65,6 @@ then
   # Get paths of assembly jars to pass to pyspark
   . ${PIO_HOME}/bin/compute-classpath.sh
   shift
-  export PYTHONSTARTUP=${PIO_HOME}/python/pypio/shell.py
   export PYTHONPATH=${PIO_HOME}/python
   ${SPARK_HOME}/bin/pyspark --jars ${ASSEMBLY_JARS} $@
 else
diff --git a/build.sbt b/build.sbt
index 2a6204215..fc1fcf255 100644
--- a/build.sbt
+++ b/build.sbt
@@ -151,19 +151,19 @@ val core = (project in file("core")).
   enablePlugins(SbtTwirl).
   disablePlugins(sbtassembly.AssemblyPlugin)
 
-val tools = (project in file("tools")).
+val e2 = (project in file("e2")).
   dependsOn(core).
-  dependsOn(data).
   settings(commonSettings: _*).
-  settings(commonTestSettings: _*).
-  settings(skip in publish := true).
   enablePlugins(GenJavadocPlugin).
-  enablePlugins(SbtTwirl)
+  disablePlugins(sbtassembly.AssemblyPlugin)
 
-val e2 = (project in file("e2")).
+val tools = (project in file("tools")).
+  dependsOn(e2).
   settings(commonSettings: _*).
+  settings(commonTestSettings: _*).
+  settings(skip in publish := true).
   enablePlugins(GenJavadocPlugin).
-  disablePlugins(sbtassembly.AssemblyPlugin)
+  enablePlugins(SbtTwirl)
 
 val dataEs = if (majorVersion(es) == 1) dataElasticsearch1 else 
dataElasticsearch
 
diff --git a/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala b/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
index cfc83eb1d..011cd95c9 100644
--- a/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
+++ b/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
@@ -55,9 +55,10 @@ object EngineServerPluginContext extends Logging {
   EngineServerPlugin.outputSniffer -> mutable.Map())
 val pluginParams = mutable.Map[String, JValue]()
 val serviceLoader = ServiceLoader.load(classOf[EngineServerPlugin])
-val variantJson = parse(stringFromFile(engineVariant))
-(variantJson \ "plugins").extractOpt[JObject].foreach { pluginDefs =>
-  pluginDefs.obj.foreach { pluginParams += _ }
+stringFromFile(engineVariant).foreach { variantJson =>
+  (parse(variantJson) \ "plugins").extractOpt[JObject].foreach { pluginDefs =>
+pluginDefs.obj.foreach { pluginParams += _ }
+  }
 }
 serviceLoader foreach { service =>
   pluginParams.get(service.pluginName) map { params =>
@@ -77,11 +78,15 @@ object EngineServerPluginContext extends Logging {
   log)
   }
 
-  private def stringFromFile(filePath: String): String = {
+  private def stringFromFile(filePath: String): Option[String] = {
 try {
-  val uri = new URI(filePath)
-  val fs = FileSystem.get(uri, new Configuration())
-  new String(ByteStreams.toByteArray(fs.open(new Path(uri))).map(_.toChar))
+  val fs = FileSystem.get(new Configuration())
+  val path = new Path(new URI(filePath))
+  if (fs.exists(path)) {
+Some(new String(ByteStreams.toByteArray(fs.open(path)).map(_.toChar)))
+  } else {
+None
+  }
 } catch {
   case e: java.io.IOException =>
 error(s"Error reading from file: ${e.getMessage}. Aborting.")
diff --git a/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala b/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala
index cb71f14e4..3aafe67e0 100644
--- a/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala
+++ b/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala
@@ -32,7 +32,6 @@ import org.json4s.native.JsonMethods.compact
 import org.json4s.native.JsonMethods.pretty
 import org.json4s.native.JsonMethods.parse
 import org.json4s.native.JsonMethods.render
-import org.json4s.reflect.TypeInfo
 
 object JsonExtractor {
 
@@ -144,7 +143,13 @@ object JsonExtractor {
 formats: Formats,
 clazz: Class[T]): T = {
 
-Extraction.extract(parse(json), TypeInfo(clazz, None))(formats).asInstanceOf[T]
+implicit val f = formats
+implicit val m = if (clazz == classOf[Map[_, _]]) {
+  Manifest.classType(clazz, manifest[String], manifest[Any])
+} else {
+  Manifest.classType(clazz)
+}
+Extraction.extract(parse(json))