[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user chanlee514 commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 @pferrel: Yes the hash of engine directory path is used as engineVersion (which kind of acts as the id). However, I'm working on making updates to this PR so that users can provide the directory path as CLI argument for `pio build/train/deploy/undeploy` commands. This will enable users to run pio commands anywhere, and hopefully make this issue clearer. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user pferrel commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 Actually the manifest may be gone but the requirement of the same absolute path is maintained, I believe. A hash of the path to the template code is used to create the id and is part of the metadata for the engine. So the "same path" remains but the manifest goes away. Correct me @chanlee514 if I'm wrong. We are trying to create a roadmap that makes these hidden dependencies more clear and removing those that we can. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user pferrel commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 @chanlee514 said: If the absolute filepath is the same in the machines (since SHA-1 hash of filepath is used as engineVersion), there would be no changes in the above workflow. You can just copy the engine directory and deploy. If not, you can also provide engine-id, engine-version as CLI arguments. There's some additional ongoing work to make the process more robust and clearer, such as enabling engine-directory as CLI input, help messages, and adding tests to cover such scenarios. I don't think the PR will be merged before then. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user dszeto commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 @chanlee514 @pferrel I have moved the discussion to dev@. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user pferrel commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 @dszeto asked: "Do you currently have any production deployments that rely heavily on engine IDs and versions? That would be a bigger immediate concern regarding this change." Yes. That is why so many questions above. We create a pio driver machine that runs training and is then taken down. This is in sync with bringing up Spark and taking it down. The pio driver machine is really part of the cluster and for the UR (and other PIO templates we have) the predict part runs without Spark. So this requires we use some engine instance id across machines to find data in the metastore. We also have deployments with 2 EventServers and 2 PredictionServers behind a load balancer. This requires that the same metadata is available to several different machines running, what I've come to see as 3 independent processes, ES, PS, train. They are independent except for the metadata. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user pferrel commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 ok, this looks like a good step @dszeto we would not do `pio build` we would do `sbt build` in the directory that has the template and this is fine IMO. The confusion users have is over pio commands. BTW @chanlee514 @dszeto Are we thinking of a new command, something like `pio register` that would add metadata to the metastore? This would need to be run every time the engine.json changed for instance? It would also do not compile? Is there an alternative? What state does this leave us in? After the push, what action create binary (I assume `pio build`) what action adds metadata to the metastore (I assume `pio build`) So this is only about changing the instance id and getting rid of manifest, right? If so I'm totally cool with this, a good step I agree. I guess I'm using this PR as a forum for discussion and will move it to @dev --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user dszeto commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 @pferrel For `pio build` we probably can enhance it such that it can take a working directory for compilation to run. If we look at some other build tools, they also rely on having certain files in the current working directory. For `pio train` and `pio deploy`, I absolutely agree that it should be able to run from anywhere as the most advanced scenario. My personal take to this change would be something comparable to a local HBase and distributed HBase setup. It would be ideal if someone can start playing with PredictionIO without setting up any external services (not even a SQL database). I believe Chan's change here is a partial step to make the learning curve less steep, and path ways for us to untangle engine registration so that it could become a feature that can be more easily documented and understandable. Do you currently have any production deployments that rely heavily on engine IDs and versions? That would be a bigger immediate concern regarding this change. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user chanlee514 commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 @pferrel: I agree, and that would be possible with a general metadata registry server which would be the next step. The reason I started by removing manifest.json is because the previous database-dependent build was a roadblock for PredictionIO in PaaS platforms such as Heroku. Information in engine.json is written during `pio train` as engineInstance. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-predictionio issue #328: [PIO-47] Eliminate enginemanifest for sta...
Github user chanlee514 commented on the issue: https://github.com/apache/incubator-predictionio/pull/328 Right now, the commands throw java.io.FileNotFoundException regarding engine.json. May be better if we throw an error message saying the commands should be executed inside engine directory. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---