Pat - thanks for the detailed write-up.

On Fri, Jul 28, 2017 at 11:00 PM, Pat Ferrel <[email protected]> wrote:
> As it happens I just looked into the concurrency issue: how many connections can be made to the Prediction Server. The answer is that Spray HTTP, the earlier version of what is now merged into akka-http, uses something called TCP back-pressure to optimize the number of connections allowed and the number of simultaneous responses that can be worked on in one connection. This accounts for how many cores, and therefore threads, are available on the machine. The upshot is that Spray self-optimizes for the max that the machine can handle.
>
> This means that connections can be raised by increasing cores, but if you are not at 100% CPU usage then some other part of the system is probably the bottleneck, and that means batch queries will not help. Only if you see 100% CPU usage on the Prediction Server is increasing connections or batching queries going to help.
>
> Remember that for evaluation you are putting the worst-case load on the Prediction Server, worse than any real-world scenario is likely to hit. And it is almost certain that you will be able to overload the system, since sending a query is much, much easier than processing it. So processing the query is the most likely cause of speed limits.
>
> Therefore, scale the system to the max time you can wait for all responses. Start with the Prediction Server: either add cores or put a load balancer in so you can have any number of PS machines. Scale HBase and Elasticsearch in the normal ways, using their clustering methods, as you see them start to show max CPU usage. Nothing is stored during a query, but disk may be read, so also watch I/O for bottlenecks. This assumes you have a clustered deployment and can measure each system's load independently. If you are all on one machine, good luck, because there are many over-constrained situations where one service grabs resources another needs, causing artificial bottlenecks.
>
> Assuming a clustered environment, the system is indefinitely scalable because no state is stored in PIO, only in scalable services like HBase and Elasticsearch. There are two internal queries for every query you make, one to HBase and one to ES. Both have been used in massive deployments and so can handle any load you can define.
>
> So by scaling the PS (vertically or with load balancers) and HBase and ES (through vertical or cluster expansion) you should be able to handle as many queries per second as you need.
>
> > On Jul 27, 2017, at 9:39 PM, Mattz <[email protected]> wrote:
> >
> > Thanks Mars. Looks like pio eval may not work for my needs (according to Pat) since I am using the UR template.
> >
> > And querying the REST API in bulk may be limited by the concurrency that the API can handle. I tested with a reasonably sized machine and this number was not high enough.
> >
> > On Thu, Jul 27, 2017 at 11:08 PM, Mars Hall <[email protected]> wrote:
> >
> > > Hi Mattz,
> > >
> > > Yes, that batch evaluator using `pio eval` is currently the only documented way to run batch predictions.
> > >
> > > It's also possible to create a custom script that calls the Queries HTTP/REST API to collect predictions in bulk.
> > >
> > > My team has had this need recur, so I implemented a `pio batchpredict` command for PredictionIO, but it has not yet been merged and released. See the pull request:
> > > https://github.com/apache/incubator-predictionio/pull/412
> > >
> > > *Mars
> > > On Jul 27, 2017, at 05:25, Mattz <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I am using the "Universal Recommender" template. Is the below guide current if I want to create bulk recommendations in batch?
> > > >
> > > > https://predictionio.incubator.apache.org/templates/recommendation/batch-evaluator/
> > > >
> > > > Thanks!
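On Pat's point about finding the real bottleneck, a rough way to see where extra connections stop paying off is to replay a representative query at increasing concurrency levels and watch queries per second alongside CPU on the Prediction Server. A small sketch, again assuming the default /queries.json endpoint and an illustrative query body:

#!/usr/bin/env python
# Sketch: send the same query repeatedly at several concurrency levels
# and report throughput for each level.
import time
import requests
from concurrent.futures import ThreadPoolExecutor

ENGINE_URL = "http://localhost:8000/queries.json"   # assumed deploy address
QUERY = {"user": "some-user-id", "num": 10}          # any representative query
REQUESTS_PER_LEVEL = 200

def one_query(_):
    requests.post(ENGINE_URL, json=QUERY).raise_for_status()

for workers in (1, 2, 4, 8, 16, 32):
    start = time.time()
    with ThreadPoolExecutor(workers) as pool:
        list(pool.map(one_query, range(REQUESTS_PER_LEVEL)))
    elapsed = time.time() - start
    # Watch CPU on the PredictionServer while this runs; once it pins at
    # 100%, more client-side concurrency will not raise this number.
    print("workers=%2d  qps=%.1f" % (workers, REQUESTS_PER_LEVEL / elapsed))

If throughput flattens before the Prediction Server reaches 100% CPU, the limit is elsewhere (HBase, Elasticsearch, or disk I/O), which is where Pat suggests scaling next.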
