[jira] [Commented] (PIO-192) Enhance PySpark support
[ https://issues.apache.org/jira/browse/PIO-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681716#comment-16681716 ]

Wei Chen commented on PIO-192:

Hello [~shimamoto], just a question. Since we are restructuring, are we planning to provide a function to deploy the prediction service?
{code:python}
pypio.deploy(model)
{code}
Also, should we allow users to create new apps in the notebook?
{code:python}
pypio.newApp("myApp1")
{code}
That way users can have complete control just by using the notebook. Doing so would make the Jupyter notebook a control center for experiments, which I think we should also take into consideration before settling on the new architecture.

> Enhance PySpark support
> ---
>
> Key: PIO-192
> URL: https://issues.apache.org/jira/browse/PIO-192
> Project: PredictionIO
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.13.0
> Reporter: Takako Shimamoto
> Assignee: Takako Shimamoto
> Priority: Major
>
> h3. Summary
> Enhance pypio, the Python API for PIO.
> h3. Goals
> The limitations of the current Python support always force developers to have access to sbt. This enhancement will get rid of the build phase.
> h3. Description
> A Python engine template requires 3 files:
> * Python code to specify with the --main-py-file option
> * template.json
> {code:json}
> {"pio": {"version": { "min": "0.14.0-SNAPSHOT" }}}
> {code}
> * engine.json
> {code:json}
> {
>   "id": "default",
>   "description": "Default settings",
>   "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
>   "algorithms": [
>     {
>       "name": "default",
>       "params": {
>         "name": "BHPApp"
>       }
>     }
>   ],
>   "serving": {
>     "params": {
>       "columns": ["prediction"]
>     }
>   }
> }
> {code}
> h4. pypio module
> Developers can use the pypio module with Jupyter notebooks and Python code.
> First, import the necessary modules.
> {code:python}
> from pypio import pypio
> {code}
> Once the module is imported, the first step is to initialize the pypio module.
> {code:python}
> pypio.init()
> {code}
> Next, find data from the event store.
> {code:python}
> event_df = pypio.find('BHPApp')
> {code}
> And then, save the model.
> {code:python}
> # model is a PipelineModel, which is produced after a Pipeline’s fit() method runs
> pipeline = Pipeline(...)
> model = pipeline.fit(train_df)
> pypio.save(model)
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
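The two JSON files listed in the issue description could also be written out from Python, for example from the same notebook that uses pypio. The following is only a sketch: `write_template_files` is a hypothetical helper that is not part of pypio, and the file contents are copied from the description above.

```python
import json
from pathlib import Path


def write_template_files(target_dir, app_name="BHPApp"):
    """Write template.json and engine.json as listed in the issue description.

    Hypothetical helper for illustration only; pypio does not provide it.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Contents copied from the PIO-192 description.
    template = {"pio": {"version": {"min": "0.14.0-SNAPSHOT"}}}
    engine = {
        "id": "default",
        "description": "Default settings",
        "engineFactory": "org.apache.predictionio.e2.engine.PythonEngine",
        "algorithms": [{"name": "default", "params": {"name": app_name}}],
        "serving": {"params": {"columns": ["prediction"]}},
    }

    paths = {}
    for file_name, content in (("template.json", template), ("engine.json", engine)):
        path = target / file_name
        path.write_text(json.dumps(content, indent=2) + "\n", encoding="utf-8")
        paths[file_name] = path
    return paths
```

Calling `write_template_files("my-template")` would create both files in a `my-template` directory next to the Python file passed to --main-py-file.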
[jira] [Commented] (PIO-193) Use async requests to storage whenever possible
[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681535#comment-16681535 ]

ASF GitHub Bot commented on PIO-193:

longliveenduro edited a comment on issue #495: [PIO-193] Async support for predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562

@dszeto @takezoe This would be one possible solution that is nearly 100% backward compatible. By "nearly" I mean that an engine implementor has to add "override" to their implementation of predictBase() / predict() when they recompile or change their engine, because in this suggestion I give these methods a default implementation that throws a NotImplementedError stating that the method is deprecated. This should only be necessary when the engine developer recompiles the engine. Having to add "override" is then also a good hint to move on to the new async implementation.

Already compiled engines should still work, as they simply override the new default implementation of predict/predictBase. predictBaseAsync and predictAsync simply delegate to the old methods and wrap a blocking block around them to tell the standard Scala execution context to execute them in a second, much bigger thread pool. See the chapter "Blocking" in https://www.beyondthelines.net/computing/scala-future-and-execution-context/

If you want, we could just drop this default implementation of BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, but then even a new async implementation would have to implement these old method signatures with a dummy.

Another solution I thought of was to use runtime information (things like checking dynamically whether a method is present or not), but I decided against it because these kinds of lookups are usually very slow and, IMHO, break type safety.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Use async requests to storage whenever possible
> ---
>
> Key: PIO-193
> URL: https://issues.apache.org/jira/browse/PIO-193
> Project: PredictionIO
> Issue Type: Improvement
> Components: Core
> Affects Versions: 0.13.0
> Reporter: Chris Wewerka
> Priority: Major
>
> The storage access in PredictionIO uses blocking drivers and the standard Scala ExecutionContext, which is a bounded ForkJoin thread pool. This leads to poor use of machine resources.
>
> See also [https://lists.apache.org/thread.html/f14e4f8f29410e4585b3d8e9f646b88293a605f4716d3c4d60771854@%3Cuser.predictionio.apache.org%3E] and https://jira.apache.org/jira/browse/PIO-182
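The delegation described in the comment lives in the Scala code of the pull request. The following is only a Python analogy of the same shape, not the PR's implementation: asyncio and a dedicated ThreadPoolExecutor stand in for Scala's Future, ExecutionContext and blocking construct, and all class names, method names and the pool size are illustrative assumptions.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Dedicated pool for blocking calls, kept apart from the event loop.
# The size is an arbitrary illustrative value.
BLOCKING_POOL = ThreadPoolExecutor(max_workers=32)


class BaseAlgorithm:
    """Python stand-in for the base class discussed in the comment."""

    def predict(self, model, query):
        # Old synchronous entry point: the default only reports the deprecation.
        raise NotImplementedError("predict() is deprecated; implement predict_async()")

    async def predict_async(self, model, query):
        # Default async entry point: delegate to the old blocking method on
        # the dedicated pool so that existing engines keep working unchanged.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(BLOCKING_POOL, self.predict, model, query)


class LegacyAlgorithm(BaseAlgorithm):
    """Existing engine that only overrides the old blocking method."""

    def predict(self, model, query):
        time.sleep(0.01)  # stands in for a blocking storage call
        return {"prediction": model[query]}


class AsyncAlgorithm(BaseAlgorithm):
    """New engine that only overrides the async method."""

    async def predict_async(self, model, query):
        await asyncio.sleep(0.01)  # stands in for a non-blocking storage call
        return {"prediction": model[query]}


async def main():
    model = {"q1": 1.5}
    legacy = await LegacyAlgorithm().predict_async(model, "q1")
    modern = await AsyncAlgorithm().predict_async(model, "q1")
    return legacy, modern
```

Running `asyncio.run(main())` serves both engines through the async entry point. The legacy engine is untouched, while an engine that implements neither method fails with the deprecation error, which mirrors the trade-off weighed in the comment.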
[jira] [Commented] (PIO-193) Use async requests to storage whenever possible
[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681515#comment-16681515 ]

ASF GitHub Bot commented on PIO-193:

longliveenduro edited a comment on issue #495: [PIO-193] Async support for predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562

@dszeto @takezoe This would be one possible solution that is nearly 100% backward compatible. By "nearly" I mean that an engine implementor has to add "override" to their implementation of predictBase() / predict(), because in this suggestion I give these methods a default implementation that throws a NotImplementedError stating that the method is deprecated. This should only be necessary when the engine developer recompiles the engine. Having to add "override" is then also a good hint to move on to the new async implementation.

Already compiled engines should still work, as they simply override the new default implementation of predict/predictBase. predictBaseAsync and predictAsync simply delegate to the old methods and wrap a blocking block around them to tell the standard Scala execution context to execute them in a second, much bigger thread pool. See the chapter "Blocking" in https://www.beyondthelines.net/computing/scala-future-and-execution-context/

If you want, we could just drop this default implementation of BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, but then even a new async implementation would have to implement these old method signatures with a dummy.

Another solution I thought of was to use runtime information (things like checking dynamically whether a method is present or not), but I decided against it because these kinds of lookups are usually very slow and, IMHO, break type safety.
[jira] [Commented] (PIO-193) Use async requests to storage whenever possible
[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681440#comment-16681440 ]

ASF GitHub Bot commented on PIO-193:

longliveenduro commented on issue #495: [PIO-193] Async support for predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562

@dszeto @takezoe This would be one possible solution that is nearly 100% backward compatible. By "nearly" I mean that an engine implementor has to add "override" to their implementation of predictBase() / predict(), because in this suggestion I give these methods a default implementation that throws a NotImplementedError stating that the method is deprecated.

If you want, we could just drop this default implementation of BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, but then even a new async implementation would have to implement these old method signatures with a dummy.

Another solution I thought of was to use runtime information (things like checking dynamically whether a method is present or not), but I decided against it because these kinds of lookups are usually very slow and, IMHO, break type safety.
[jira] [Commented] (PIO-193) Use async requests to storage whenever possible
[ https://issues.apache.org/jira/browse/PIO-193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681443#comment-16681443 ]

ASF GitHub Bot commented on PIO-193:

longliveenduro edited a comment on issue #495: [PIO-193] Async support for predict method and storage access, blocking code wrapped in blocking construct
URL: https://github.com/apache/predictionio/pull/495#issuecomment-437362562

@dszeto @takezoe This would be one possible solution that is nearly 100% backward compatible. By "nearly" I mean that an engine implementor has to add "override" to their implementation of predictBase() / predict(), because in this suggestion I give these methods a default implementation that throws a NotImplementedError stating that the method is deprecated.

If you want, we could just drop this default implementation of BaseAlgorithm.predictBase / Subclass.predict and have 100% compatibility, but then even a new async implementation would have to implement these old method signatures with a dummy.

Another solution I thought of was to use runtime information (things like checking dynamically whether a method is present or not), but I decided against it because these kinds of lookups are usually very slow and, IMHO, break type safety.