[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16714683#comment-16714683
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe closed pull request #494: [PIO-192] Enhance PySpark support
URL: https://github.com/apache/predictionio/pull/494
 
 
   

This is a PR merged from a forked repository. As GitHub hides the original
diff on merge, it is displayed below for the sake of provenance:

diff --git a/bin/pio-shell b/bin/pio-shell
index cd119cdee..e23041ac8 100755
--- a/bin/pio-shell
+++ b/bin/pio-shell
@@ -65,7 +65,6 @@ then
   # Get paths of assembly jars to pass to pyspark
   . ${PIO_HOME}/bin/compute-classpath.sh
   shift
-  export PYTHONSTARTUP=${PIO_HOME}/python/pypio/shell.py
   export PYTHONPATH=${PIO_HOME}/python
   ${SPARK_HOME}/bin/pyspark --jars ${ASSEMBLY_JARS} $@
 else
diff --git a/build.sbt b/build.sbt
index 2a6204215..fc1fcf255 100644
--- a/build.sbt
+++ b/build.sbt
@@ -151,19 +151,19 @@ val core = (project in file("core")).
   enablePlugins(SbtTwirl).
   disablePlugins(sbtassembly.AssemblyPlugin)
 
-val tools = (project in file("tools")).
+val e2 = (project in file("e2")).
   dependsOn(core).
-  dependsOn(data).
   settings(commonSettings: _*).
-  settings(commonTestSettings: _*).
-  settings(skip in publish := true).
   enablePlugins(GenJavadocPlugin).
-  enablePlugins(SbtTwirl)
+  disablePlugins(sbtassembly.AssemblyPlugin)
 
-val e2 = (project in file("e2")).
+val tools = (project in file("tools")).
+  dependsOn(e2).
   settings(commonSettings: _*).
+  settings(commonTestSettings: _*).
+  settings(skip in publish := true).
   enablePlugins(GenJavadocPlugin).
-  disablePlugins(sbtassembly.AssemblyPlugin)
+  enablePlugins(SbtTwirl)
 
 val dataEs = if (majorVersion(es) == 1) dataElasticsearch1 else 
dataElasticsearch
 
diff --git 
a/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
 
b/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
index cfc83eb1d..011cd95c9 100644
--- 
a/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
+++ 
b/core/src/main/scala/org/apache/predictionio/workflow/EngineServerPluginContext.scala
@@ -55,9 +55,10 @@ object EngineServerPluginContext extends Logging {
   EngineServerPlugin.outputSniffer -> mutable.Map())
 val pluginParams = mutable.Map[String, JValue]()
 val serviceLoader = ServiceLoader.load(classOf[EngineServerPlugin])
-val variantJson = parse(stringFromFile(engineVariant))
-(variantJson \ "plugins").extractOpt[JObject].foreach { pluginDefs =>
-  pluginDefs.obj.foreach { pluginParams += _ }
+stringFromFile(engineVariant).foreach { variantJson =>
+  (parse(variantJson) \ "plugins").extractOpt[JObject].foreach { 
pluginDefs =>
+pluginDefs.obj.foreach { pluginParams += _ }
+  }
 }
 serviceLoader foreach { service =>
   pluginParams.get(service.pluginName) map { params =>
@@ -77,11 +78,15 @@ object EngineServerPluginContext extends Logging {
   log)
   }
 
-  private def stringFromFile(filePath: String): String = {
+  private def stringFromFile(filePath: String): Option[String] = {
 try {
-  val uri = new URI(filePath)
-  val fs = FileSystem.get(uri, new Configuration())
-  new String(ByteStreams.toByteArray(fs.open(new Path(uri))).map(_.toChar))
+  val fs = FileSystem.get(new Configuration())
+  val path = new Path(new URI(filePath))
+  if (fs.exists(path)) {
+Some(new String(ByteStreams.toByteArray(fs.open(path)).map(_.toChar)))
+  } else {
+None
+  }
 } catch {
   case e: java.io.IOException =>
 error(s"Error reading from file: ${e.getMessage}. Aborting.")
diff --git 
a/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala 
b/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala
index cb71f14e4..3aafe67e0 100644
--- a/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala
+++ b/core/src/main/scala/org/apache/predictionio/workflow/JsonExtractor.scala
@@ -32,7 +32,6 @@ import org.json4s.native.JsonMethods.compact
 import org.json4s.native.JsonMethods.pretty
 import org.json4s.native.JsonMethods.parse
 import org.json4s.native.JsonMethods.render
-import org.json4s.reflect.TypeInfo
 
 object JsonExtractor {
 
@@ -144,7 +143,13 @@ object JsonExtractor {
 formats: Formats,
 clazz: Class[T]): T = {
 
-Extraction.extract(parse(json), TypeInfo(clazz, 
None))(formats).asInstanceOf[T]
+implicit val f = formats
+implicit val m = if (clazz == classOf[Map[_, _]]) {
+  Manifest.classType(clazz, manifest[String], manifest[Any])
+} else {
+  Manifest.classType(clazz)
+}
+Extraction.extract(parse(json))
   }
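The `stringFromFile` change above swaps an unconditional read for an existence check that returns `Option[String]`. As a plain-Python sketch of the same pattern (names here are illustrative, not part of PredictionIO):

```python
import os
from typing import Optional

def string_from_file(file_path: str) -> Optional[str]:
    """Return the file's contents, or None when the file is absent,
    mirroring the Option[String] returned by the patched Scala method."""
    if not os.path.exists(file_path):
        return None
    with open(file_path) as f:
        return f.read()
```

Call sites then handle the missing-variant case explicitly, as the new `stringFromFile(engineVariant).foreach { ... }` does on the Scala side.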
 
   


[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710984#comment-16710984
 ] 

ASF GitHub Bot commented on PIO-192:


shimamoto commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r239329614
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -0,0 +1,125 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import absolute_import
+
+import atexit
+import json
+import os
+import sys
+
+from pypio.data import PEventStore
+from pypio.utils import dict_to_scalamap, list_to_dict
+from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+
+
+def init(app_name=None):
 
 Review comment:
   I'm considering supporting data import, but I don't yet have a clear idea 
of how to implement it. The ideal API would look like the following:
   
   ```
   diabetes_df = pypio.import_file("../input/diabetes.csv", 
destination_frame="diabetes_df")
   ```
   
   I believe we should first examine the exact behavior of the current 
commands and SDK. In any case, this topic is out of scope for my PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Enhance PySpark support
> ---
>
> Key: PIO-192
> URL: https://issues.apache.org/jira/browse/PIO-192
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.13.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>
> h3. Summary
> Enhance the pypio, which is the Python API for PIO.
> h3. Goals
> The limitations of the current Python support force developers to have 
> access to sbt. This enhancement will get rid of the build phase.
> h3. Description
> A Python engine has nothing that needs building. Developers can use the pypio 
> module from a Jupyter notebook or plain Python code.
> First, import the necessary modules.
> {code:python}
> import pypio
> {code}
> Once the module is imported, the first step is to initialize the pypio module.
> {code:python}
> # not create App
> pypio.init()
> # create App (pio app new BHPApp)
> pypio.init('BHPApp')
> {code}
> Next, find data from the event store.
> {code:python}
> event_df = pypio.find_events('BHPApp')
> {code}
> And then, save the model.
> {code:python}
> # model is a PipelineModel, which is produced after a Pipeline’s fit() method 
> runs
> pipeline = Pipeline(...)
> model = pipeline.fit(train_df)
> engine_instance_id = pypio.save_model(model, ["prediction"])
> {code}
> h4. Run & Deploy
> h5. Run Jupyter
> {code:sh}
> pio-shell --with-pyspark
> {code}
> h5. Run on Spark
> {code:sh}
> pio train --main-py-file .py
> {code}
> h5. Deploy App
> {code:sh}
> pio deploy --engine-instance-id 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16710174#comment-16710174
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r239101138
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -0,0 +1,125 @@
+def init(app_name=None):
 
 Review comment:
   Thanks for the information!
   
   Actually, I had forgotten what `pio app new` (and `pio accessKey new`) do. 
If you plan to support importing event data via the event server's Web API, 
how it handles the accessKey could be an important point in the design of this 
Python support.





[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-05 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16709856#comment-16709856
 ] 

ASF GitHub Bot commented on PIO-192:


shimamoto commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r238999000
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -0,0 +1,125 @@
+def init(app_name=None):
 
 Review comment:
   You have a point. The `init` function, as you say, shouldn't take 
`app_name`; I'll remove it.
   
   Just an FYI, I found out the following:
   - Running `pio app new ` creates an event table/index and an accessKey
   - Running `pio accesskey new ` creates another accessKey for the 
specified App
   - Users must use an accessKey to import events via the API
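As a concrete illustration of that last point, events are imported by POSTing JSON to the event server's `/events.json` endpoint with the access key as a query parameter; the key, host, and helper function below are placeholders, not pypio API:

```python
import json
import urllib.request

ACCESS_KEY = "YOUR_ACCESS_KEY"  # placeholder: created by `pio app new` / `pio accesskey new`

def build_event_request(event, host="http://localhost:7070"):
    """Build (but do not send) a POST request that would import one event
    via the event server's /events.json endpoint."""
    return urllib.request.Request(
        "%s/events.json?accessKey=%s" % (host, ACCESS_KEY),
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_event_request({
    "event": "rate",
    "entityType": "user",
    "entityId": "u0",
    "targetEntityType": "item",
    "targetEntityId": "i0",
    "properties": {"rating": 5},
})
# urllib.request.urlopen(req) would perform the actual import
```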




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16708167#comment-16708167
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r238525718
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -0,0 +1,125 @@
+def init(app_name=None):
 
 Review comment:
   While I'm not sure of the intended use case of pypio, I feel there is a 
lack of consistency in these APIs.
   
   For example, `find_events()` takes `app_name`. Does this mean that pypio 
supports multiple apps in the same session? If so, can we call `init()` 
multiple times to create multiple apps?
   
   I think that either of the following options would be better for API 
consistency:
   
   - If pypio supports multiple apps, separate the method for creating a new 
app from `init()`.
   - If pypio doesn't support multiple apps, make `app_name` mandatory in 
`init()` and use it for `find_events()` too.
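These two shapes can be sketched as follows; both classes are illustrative stand-ins, not actual pypio code:

```python
class MultiAppPypio:
    """Option 1 sketch: apps are managed explicitly; init() stays app-agnostic."""
    def __init__(self):
        self.apps = set()

    def create_app(self, app_name):
        self.apps.add(app_name)

    def find_events(self, app_name):
        assert app_name in self.apps, "unknown app"
        return []  # would return a DataFrame of events for app_name


class SingleAppPypio:
    """Option 2 sketch: app_name is mandatory at init and reused implicitly."""
    def __init__(self, app_name):
        self.app_name = app_name

    def find_events(self):
        return []  # would return events for self.app_name
```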





[jira] [Commented] (PIO-192) Enhance PySpark support

2018-12-03 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16706994#comment-16706994
 ] 

ASF GitHub Bot commented on PIO-192:


shimamoto commented on issue #494: [PIO-192] Enhance PySpark support
URL: https://github.com/apache/predictionio/pull/494#issuecomment-443670773
 
 
   @marevol @takezoe I updated the JIRA ticket. Again, could you please review 
this PR?




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687370#comment-16687370
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r233687103
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
+    engine = sc._jvm.org.apache.predictionio.e2.engine.PythonEngine
+    engine.model().set(model._to_java())
+    main_args = utils.toJArray(sc._gateway, sc._gateway.jvm.String, sys.argv)
+    create_workflow = sc._jvm.org.apache.predictionio.workflow.CreateWorkflow
+    spark.stop()
+    create_workflow.main(main_args)
 
 Review comment:
   Ah, top-level commands don't accept illegal arguments, so passing `sys.argv` 
directly to `CreateWorkflow.main()` is safe. I understand.




> Enhance PySpark support
> ---
>
> Key: PIO-192
> URL: https://issues.apache.org/jira/browse/PIO-192
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.13.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>
> h3. Summary
> Enhance the pypio, which is the Python API for PIO.
> h3. Goals
> The limitations of the current Python support always force developers to have 
> access to sbt. This enhancement will get rid of the build phase.
> h3. Description
> A Python engine template requires 3 files:
> * Python code to specify for the --main-py-file option
> * template.json
> {code:json}
> {"pio": {"version": { "min": "0.14.0-SNAPSHOT" }}}
> {code}
> * engine.json
> {code:json}
> {
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
>   "algorithms": [
> {
>   "name": "default",
>   "params": {
> "name": "BHPApp"
>   }
> }
>   ],
>   "serving": {
> "params": {
>   "columns": ["prediction"]
> }
>   }
> }
> {code}
> h4. pypio module
> Developers can use the pypio module from a Jupyter notebook or in Python code.
> First, import the necessary modules.
> {code:python}
> from pypio import pypio
> {code}
> Once the module is imported, the first step is to initialize the pypio module.
> {code:python}
> pypio.init()
> {code}
> Next, find data from the event store.
> {code:python}
> event_df = pypio.find('BHPApp')
> {code}
> And then, save the model.
> {code:python}
> # model is a PipelineModel, which is produced after a Pipeline’s fit() method 
> runs
> pipeline = Pipeline(...)
> model = pipeline.fit(train_df)
> pypio.save(model)
> {code}
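Since engine.json is plain JSON, its structure can be sanity-checked with the standard library. The sketch below just parses the example configuration from the ticket; the field names come from that example and nothing here touches PredictionIO itself.

```python
import json

# The engine.json example from the ticket description, as a literal string.
engine_json = """
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
  "algorithms": [
    {"name": "default", "params": {"name": "BHPApp"}}
  ],
  "serving": {"params": {"columns": ["prediction"]}}
}
"""

config = json.loads(engine_json)
# The engine factory is the fixed Scala entry point for Python engines.
factory = config["engineFactory"]
# The app name lives in the algorithm params.
app_name = config["algorithms"][0]["params"]["name"]
# The serving columns are what the deployed engine will expose.
columns = config["serving"]["params"]["columns"]
```

A check like this catches malformed JSON and missing keys before `pio train` ever runs.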





[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686005#comment-16686005
 ] 

ASF GitHub Bot commented on PIO-192:


shimamoto commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r233292934
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
 
 Review comment:
   Fixed it.




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683128#comment-16683128
 ] 

ASF GitHub Bot commented on PIO-192:


marevol commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232516369
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
 Review comment:
   Is cleanup_functions no longer needed in the new pypio?
   In python/pyspark/shell.py, PySpark seems to register the following code to 
run when the process exits.
   ```
   atexit.register(lambda: sc.stop())
   ```
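For reference, here is how such an `atexit` hook behaves in plain Python. This is a standalone sketch with `sc.stop()` replaced by a print so it can run without Spark; the child interpreter is launched in a subprocess because `atexit` handlers only fire when an interpreter exits.

```python
import subprocess
import sys

# Child script: registers a cleanup handler, as pyspark's shell.py does
# with atexit.register(lambda: sc.stop()); the handler fires on exit.
child = """
import atexit
atexit.register(lambda: print("sc.stop() called"))
print("work done")
"""

result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
# The cleanup line appears after the main program's output, because
# atexit handlers run only as the interpreter shuts down.
output = result.stdout.splitlines()
```

This is why the explicit `cleanup_functions` call can be dropped when PySpark's shell already registers the stop on exit.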




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682674#comment-16682674
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232469362
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
+    engine = sc._jvm.org.apache.predictionio.e2.engine.PythonEngine
+    engine.model().set(model._to_java())
+    main_args = utils.toJArray(sc._gateway, sc._gateway.jvm.String, sys.argv)
+    create_workflow = sc._jvm.org.apache.predictionio.workflow.CreateWorkflow
+    spark.stop()
+    create_workflow.main(main_args)
 
 Review comment:
   This function is a wrapper of `CreateWorkflow`. It seems it can run not only 
`runTrain` but also `runEvaluation`, depending on the bootstrap arguments given. 
Does it work?
   
https://github.com/apache/predictionio/blob/4342fcd9d0a7b549543b59467f5e1b008523fe4f/core/src/main/scala/org/apache/predictionio/workflow/CreateWorkflow.scala#L271-L275
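The dispatch the reviewer is asking about, one entry point choosing training or evaluation based on the parsed arguments, can be sketched generically. This is a hypothetical illustration in Python: the flag names and return values are invented for the sketch and are not CreateWorkflow's actual interface.

```python
import argparse

def main(argv):
    # Hypothetical mirror of an entry point that runs an evaluation when
    # an evaluation class is supplied, and a training workflow otherwise.
    parser = argparse.ArgumentParser()
    parser.add_argument("--engine-id", default="default")
    parser.add_argument("--evaluation-class", default=None)
    args = parser.parse_args(argv)
    if args.evaluation_class is not None:
        # Evaluation path: analogous to runEvaluation.
        return ("runEvaluation", args.evaluation_class)
    # Default path: analogous to runTrain.
    return ("runTrain", args.engine_id)

train = main(["--engine-id", "BHPApp"])
evaluate = main(["--evaluation-class", "org.example.Eval"])
```

The review question is exactly whether `pypio.save(model)`, which forwards `sys.argv` unchanged, could accidentally take the evaluation branch of such a dispatch.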




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682673#comment-16682673
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232469362
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
+    engine = sc._jvm.org.apache.predictionio.e2.engine.PythonEngine
+    engine.model().set(model._to_java())
+    main_args = utils.toJArray(sc._gateway, sc._gateway.jvm.String, sys.argv)
+    create_workflow = sc._jvm.org.apache.predictionio.workflow.CreateWorkflow
+    spark.stop()
+    create_workflow.main(main_args)
 
 Review comment:
   This function is a wrapper of `CreateWorkflow`. It seems it can run not only 
`runTrain` but also `runEvaluation`, depending on the bootstrap arguments given. 
Does it work?
   
https://github.com/apache/predictionio/blob/4342fcd9d0a7b549543b59467f5e1b008523fe4f/core/src/main/scala/org/apache/predictionio/workflow/CreateWorkflow.scala#L271




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682672#comment-16682672
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232469362
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
+    engine = sc._jvm.org.apache.predictionio.e2.engine.PythonEngine
+    engine.model().set(model._to_java())
+    main_args = utils.toJArray(sc._gateway, sc._gateway.jvm.String, sys.argv)
+    create_workflow = sc._jvm.org.apache.predictionio.workflow.CreateWorkflow
+    spark.stop()
+    create_workflow.main(main_args)
 
 Review comment:
   This function is a wrapper of `CreateWorkflow`. It seems it can even run the 
evaluation, given the right bootstrap arguments. Does it work?




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682671#comment-16682671
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232469264
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
 
 Review comment:
   `saveModel` might be better, for the same reason as above.




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682669#comment-16682669
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232469362
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
+    return p_event_store.find(app_name)
 
-p_event_store = PEventStore(spark._jsparkSession, sqlContext)
-cleanup_functions = CleanupFunctions(sqlContext)
 
+def save(model):
+    engine = sc._jvm.org.apache.predictionio.e2.engine.PythonEngine
+    engine.model().set(model._to_java())
+    main_args = utils.toJArray(sc._gateway, sc._gateway.jvm.String, sys.argv)
+    create_workflow = sc._jvm.org.apache.predictionio.workflow.CreateWorkflow
+    spark.stop()
+    create_workflow.main(main_args)
 
 Review comment:
   This function is a wrapper of CreateWorkflow. It seems it can even run the 
evaluation, given the right bootstrap arguments. Does it work?




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682670#comment-16682670
 ] 

ASF GitHub Bot commented on PIO-192:


takezoe commented on a change in pull request #494: [PIO-192] Enhance PySpark 
support
URL: https://github.com/apache/predictionio/pull/494#discussion_r232469236
 
 

 ##
 File path: python/pypio/pypio.py
 ##
 @@ -15,9 +15,35 @@
 # limitations under the License.
 #
 
+from __future__ import absolute_import
+
+import sys
+
 from pypio.data import PEventStore
-from pypio.workflow import CleanupFunctions
+from pyspark.sql import SparkSession
+from pyspark.sql import utils
+
+
+def init():
+    global spark
+    spark = SparkSession.builder.getOrCreate()
+    global sc
+    sc = spark.sparkContext
+    global sqlContext
+    sqlContext = spark._wrapped
+    global p_event_store
+    p_event_store = PEventStore(spark._jsparkSession, sqlContext)
+    print("Initialized pypio")
+
+
+def find(app_name):
 
 Review comment:
   `findEvents` might be better to clarify the purpose of this function.
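If the rename were adopted, the old name could be kept as a thin deprecated alias so existing notebooks keep working. A minimal sketch in plain Python; the warning text and the event-store stub are illustrative, not from the PR:

```python
import warnings

def findEvents(app_name):
    """Load events for app_name (stub standing in for the Spark-backed lookup)."""
    return [{"app": app_name}]  # placeholder result

def find(app_name):
    """Deprecated alias for findEvents(), kept for backwards compatibility."""
    warnings.warn("find() is deprecated; use findEvents()", DeprecationWarning)
    return findEvents(app_name)

events = findEvents("BHPApp")
print(events[0]["app"])  # prints BHPApp
```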


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-09 Thread Wei Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681716#comment-16681716
 ] 

Wei Chen commented on PIO-192:
--

Hello [~shimamoto], just a question.
Since we are doing this restructuring, are we planning to provide functions to deploy the prediction service, e.g.:

{code:python}
pypio.deploy(model)
{code}

Also, should we allow users to create new apps in the notebook?
{code:python}
pypio.newApp("myApp1")
{code}

So users can have complete control just by using the notebook.
Doing so would make the Jupyter notebook a control center for experiments, which I think we should take into consideration before settling on the new architecture.
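To make the proposal above concrete, here is a hypothetical sketch of what such a notebook-driven control surface could look like in plain Python. NotebookControl, newApp, deploy, the app-name parameter on deploy, and the in-memory registry are all invented for illustration; none of them exists in pypio today.

```python
class NotebookControl:
    """Hypothetical control center: manage apps and deployments from a notebook."""

    def __init__(self):
        self.apps = {}         # app name -> access key (faked)
        self.deployments = {}  # app name -> deployed model

    def newApp(self, name):
        # A real implementation would call the PIO app API and return its access key.
        key = "key-" + name
        self.apps[name] = key
        return key

    def deploy(self, app_name, model):
        # A real implementation would start (or update) a prediction service.
        if app_name not in self.apps:
            raise ValueError("unknown app: " + app_name)
        self.deployments[app_name] = model
        return "http://localhost:8000/" + app_name  # fake endpoint URL

pypio = NotebookControl()
pypio.newApp("myApp1")
endpoint = pypio.deploy("myApp1", model={"type": "PipelineModel"})
print(endpoint)  # prints http://localhost:8000/myApp1
```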



[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680893#comment-16680893
 ] 

ASF GitHub Bot commented on PIO-192:


shimamoto commented on issue #494: [PIO-192] Enhance PySpark support
URL: https://github.com/apache/predictionio/pull/494#issuecomment-437253361
 
 
   @marevol @takezoe Can you take a look?




[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-02 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16672863#comment-16672863
 ] 

ASF GitHub Bot commented on PIO-192:


shimamoto opened a new pull request #494: [WIP][PIO-192] Enhance PySpark support
URL: https://github.com/apache/predictionio/pull/494
 
 
   

