[jira] [Commented] (PIO-192) Enhance PySpark support

2018-11-09 Thread Wei Chen (JIRA)


[ https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681716#comment-16681716 ]

Wei Chen commented on PIO-192:
--

Hello [~shimamoto], just a question.
Since we are restructuring, are we planning to provide functions to deploy the 
prediction service, for example:

{code:python}
pypio.deploy(model)
{code}

Also, should we allow users to create new apps in the notebook?
{code:python}
pypio.newApp("myApp1")
{code}

That way, users could have complete control from the notebook alone. This would 
make the Jupyter notebook a control center for experiments, which I think we 
should also take into consideration before settling on the new architecture.

> Enhance PySpark support
> ---
>
> Key: PIO-192
> URL: https://issues.apache.org/jira/browse/PIO-192
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.13.0
>Reporter: Takako Shimamoto
>Assignee: Takako Shimamoto
>Priority: Major
>
> h3. Summary
> Enhance pypio, the Python API for PIO.
> h3. Goals
> Because of the limitations of the current Python support, developers always 
> need access to sbt. This enhancement removes the build phase.
> h3. Description
> A Python engine template requires 3 files:
> * Python code, passed via the --main-py-file option
> * template.json
> {code:json}
> {"pio": {"version": { "min": "0.14.0-SNAPSHOT" }}}
> {code}
> * engine.json
> {code:json}
> {
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
>   "algorithms": [
>     {
>       "name": "default",
>       "params": {
>         "name": "BHPApp"
>       }
>     }
>   ],
>   "serving": {
>     "params": {
>       "columns": ["prediction"]
>     }
>   }
> }
> {code}
> h4. pypio module
> Developers can use the pypio module from a Jupyter notebook or from Python code.
> First, import the necessary modules.
> {code:python}
> from pypio import pypio
> {code}
> Once the module is imported, the first step is to initialize it.
> {code:python}
> pypio.init()
> {code}
> Next, read the app's events from the event store.
> {code:python}
> event_df = pypio.find('BHPApp')
> {code}
> Finally, train a model and save it.
> {code:python}
> from pyspark.ml import Pipeline
>
> # model is a PipelineModel, which is produced when a Pipeline’s fit() method runs
> pipeline = Pipeline(...)
> model = pipeline.fit(train_df)
> pypio.save(model)
> {code}
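The two JSON files listed in the description can also be written from Python, in keeping with the goal of not needing sbt. This is only an illustrative sketch: `write_python_template` is a hypothetical helper, not part of pypio, and it assumes the engine.json and template.json layouts quoted above (with the app name carried in the algorithm params, as in the BHPApp example). It does not create the Python file passed via --main-py-file.

```python
import json
import tempfile
from pathlib import Path


def write_python_template(target_dir, app_name, min_pio_version="0.14.0-SNAPSHOT"):
    """Hypothetical helper: write template.json and engine.json for a Python engine.

    The layouts mirror the PIO-192 example; adapt them to your own template.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    template = {"pio": {"version": {"min": min_pio_version}}}

    engine = {
        "id": "default",
        "description": "Default settings",
        "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
        # As in the BHPApp example, the algorithm params carry the app name.
        "algorithms": [{"name": "default", "params": {"name": app_name}}],
        "serving": {"params": {"columns": ["prediction"]}},
    }

    (target / "template.json").write_text(json.dumps(template, indent=2) + "\n")
    (target / "engine.json").write_text(json.dumps(engine, indent=2) + "\n")
    return target


if __name__ == "__main__":
    # Write the files into a throwaway directory and print engine.json.
    with tempfile.TemporaryDirectory() as tmp:
        out = write_python_template(tmp, "BHPApp")
        print((out / "engine.json").read_text())
```

Keeping both files in one function means the app name is set in a single place; whether that fits a particular template is a judgment call.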



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PIO-193) Use async requests to storage whenever possible

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681535#comment-16681535 ]

ASF GitHub Bot commented on PIO-193:


longliveenduro edited a comment on issue #495: [PIO-193] Async support for 
predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562
 
 
   @dszeto @takezoe This would be one possible solution that is nearly 100% 
backward compatible. By "nearly" I mean that an engine implementor has to add 
"override" to the engine's implementation of predictBase() / predict() when he 
recompiles/changes his engine, because in this suggestion I give that method a 
default implementation that throws a NotImplementedError stating that the 
method is deprecated. But this should only be necessary when the engine 
developer recompiles his engine. With that "override" he also gets a good hint 
to move on to the new async implementation. Already compiled engines should 
still work, as they simply override the new default implementation of 
predict/predictBase. predictBaseAsync and predictAsync simply delegate to the 
old methods, wrapping the call in a blocking block to tell the standard Scala 
execution context to execute them in a second, much bigger thread pool. See 
the chapter "Blocking" in 
https://www.beyondthelines.net/computing/scala-future-and-execution-context/
   
   If you want, we could just drop this default implementation of 
BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, 
but then even a new async implementation would have to implement these old 
method signatures with a dummy.
   
   Other solutions I thought of involved using runtime information (e.g. 
checking dynamically whether a method is present), but I decided not to use 
them, because these kinds of lookups are usually very slow and IMHO break 
type safety.
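The compatibility scheme in the comment above is Scala-specific (a default predict that throws, plus Future and scala.concurrent.blocking in the async wrappers). As a loose analogy only, the same shape can be sketched in Python with asyncio: the old synchronous method defaults to raising NotImplementedError, and the new async method by default runs the old one on a separate, larger thread pool. `BaseAlgorithm`, `predict`, `predict_async` and the pool size are hypothetical stand-ins rather than PredictionIO's API, a dedicated `ThreadPoolExecutor` replaces Scala's `blocking` hint, and Python has no `override` modifier, so the compile-time hint the comment relies on has no equivalent here.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Separate, larger pool reserved for legacy blocking calls (a stand-in for the
# extra threads Scala's `blocking` lets the global execution context use).
BLOCKING_POOL = ThreadPoolExecutor(max_workers=64)


class BaseAlgorithm:
    """Hypothetical Python analog of the proposed Scala design."""

    def predict(self, query):
        # Old synchronous API: the default only signals deprecation.
        # Engines written against the old API override it.
        raise NotImplementedError("predict() is deprecated; implement predict_async()")

    async def predict_async(self, query):
        # New async API: by default, delegate to the old blocking method and run
        # it on the dedicated pool so it does not block the event loop.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(BLOCKING_POOL, self.predict, query)


class LegacyAlgorithm(BaseAlgorithm):
    # Old-style engine: overrides only the synchronous method.
    def predict(self, query):
        return {"prediction": query["x"] * 2}


class AsyncAlgorithm(BaseAlgorithm):
    # New-style engine: implements the async method directly, no dummy predict().
    async def predict_async(self, query):
        await asyncio.sleep(0)  # placeholder for a non-blocking storage call
        return {"prediction": query["x"] + 1}
```

`asyncio.run(LegacyAlgorithm().predict_async({"x": 21}))` goes through the delegation path and returns `{'prediction': 42}`, while `AsyncAlgorithm` never touches the thread pool.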
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use async requests to storage whenever possible
> ---
>
> Key: PIO-193
> URL: https://issues.apache.org/jira/browse/PIO-193
> Project: PredictionIO
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 0.13.0
>Reporter: Chris Wewerka
>Priority: Major
>
> Storage access in PredictionIO uses blocking drivers together with the 
> standard Scala ExecutionContext, which is a bounded ForkJoin thread pool. This 
> leads to poor utilization of machine resources.
>  
> See also 
> [https://lists.apache.org/thread.html/f14e4f8f29410e4585b3d8e9f646b88293a605f4716d3c4d60771854@%3Cuser.predictionio.apache.org%3E]
> and https://jira.apache.org/jira/browse/PIO-182
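To illustrate the resource problem described in the issue, here is a small Python sketch. It is an analogy, not PredictionIO code: a `ThreadPoolExecutor` stands in for the bounded ForkJoin pool, `time.sleep` stands in for a blocking storage driver call, and the pool sizes and call duration are arbitrary. Each blocking call holds a worker thread for its full duration, so the pool size caps how many requests can be in flight even though the threads are mostly idle.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_concurrency(pool_size, n_calls, call_seconds=0.1):
    """Run n_calls simulated blocking calls on a pool; return (peak parallelism, elapsed s)."""
    active = 0
    peak = 0
    lock = threading.Lock()

    def blocking_storage_call():
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(call_seconds)  # stands in for a blocking driver round trip
        with lock:
            active -= 1

    start = time.monotonic()
    # Leaving the `with` block waits for every submitted call to finish.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(n_calls):
            pool.submit(blocking_storage_call)
    elapsed = time.monotonic() - start
    return peak, elapsed
```

With a pool of 2 and 8 calls of 0.1 s each, the peak stays at 2 and the run takes at least about 0.4 s; a pool of 8 typically lets all calls overlap. Exact timings depend on the machine.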



[jira] [Commented] (PIO-193) Use async requests to storage whenever possible

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681515#comment-16681515 ]

ASF GitHub Bot commented on PIO-193:


longliveenduro edited a comment on issue #495: [PIO-193] Async support for 
predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562
 
 
   @dszeto @takezoe This would be one possible solution that is nearly 100% 
backward compatible. By "nearly" I mean that an engine implementor has to add 
"override" to the engine's implementation of predictBase() / predict(), because 
in this suggestion I give that method a default implementation that throws a 
NotImplementedError stating that the method is deprecated. But this should only 
be necessary when the engine developer recompiles his engine. With that 
"override" he also gets a good hint to move on to the new async implementation. 
Already compiled engines should still work, as they simply override the new 
default implementation of predict/predictBase. predictBaseAsync and 
predictAsync simply delegate to the old methods, wrapping the call in a 
blocking block to tell the standard Scala execution context to execute them in 
a second, much bigger thread pool. See the chapter "Blocking" in 
https://www.beyondthelines.net/computing/scala-future-and-execution-context/
   
   If you want, we could just drop this default implementation of 
BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, 
but then even a new async implementation would have to implement these old 
method signatures with a dummy.
   
   Other solutions I thought of involved using runtime information (e.g. 
checking dynamically whether a method is present), but I decided not to use 
them, because these kinds of lookups are usually very slow and IMHO break 
type safety.
   


[jira] [Commented] (PIO-193) Use async requests to storage whenever possible

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681440#comment-16681440 ]

ASF GitHub Bot commented on PIO-193:


longliveenduro commented on issue #495: [PIO-193] Async support for predict 
method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562
 
 
   @dszeto @takezoe This would be one possible solution that is nearly 100% 
backward compatible. By "nearly" I mean that an engine implementor has to add 
"override" to the engine's implementation of predictBase() / predict(), because 
in this suggestion I give that method a default implementation that throws a 
NotImplementedError stating that the method is deprecated.
   
   If you want, we could just drop this default implementation of 
BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, 
but then even a new async implementation would have to implement these old 
method signatures with a dummy.
   
   Other solutions I thought of involved using runtime information (e.g. 
checking dynamically whether a method is present), but I decided not to use 
them, because these kinds of lookups are usually very slow and IMHO break 
type safety.
   


[jira] [Commented] (PIO-193) Use async requests to storage whenever possible

2018-11-09 Thread ASF GitHub Bot (JIRA)


[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681443#comment-16681443 ]

ASF GitHub Bot commented on PIO-193:


longliveenduro edited a comment on issue #495: [PIO-193] Async support for 
predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562
 
 
   @dszeto @takezoe This would be one possible solution that is nearly 100% 
backward compatible. By "nearly" I mean that an engine implementor has to add 
"override" to the engine's implementation of predictBase() / predict(), because 
in this suggestion I give that method a default implementation that throws a 
NotImplementedError stating that the method is deprecated.
   
   If you want, we could just drop this default implementation of 
BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, 
but then even a new async implementation would have to implement these old 
method signatures with a dummy.
   
   Other solutions I thought of involved using runtime information (e.g. 
checking dynamically whether a method is present), but I decided not to use 
them, because these kinds of lookups are usually very slow and IMHO break 
type safety.
   

