As it happens, I just looked into the concurrency issue: how many connections can be made to the Prediction Server? The answer is that Spray HTTP, the predecessor of what has since been merged into akka-http, uses TCP back-pressure to optimize the number of connections allowed and the number of simultaneous responses that can be worked on per connection. This takes into account how many cores, and therefore threads, are available on the machine. The upshot is that Spray self-optimizes for the maximum the machine can handle.
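One practical way to see this in action is to drive the deployed engine with a controlled amount of concurrency and watch CPU on each machine while the probe runs. The sketch below is not from this thread and is only an illustration: the endpoint, the UR query body, and the concurrency level are assumptions you would adjust for your own deployment.

```python
# Minimal concurrency probe (a sketch, not part of PredictionIO): fire N concurrent
# queries at the Prediction Server and report throughput, so you can watch CPU on
# the PS, HBase, and Elasticsearch while it runs. Endpoint, query, and concurrency
# are placeholders.
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

PS_URL = "http://localhost:8000/queries.json"  # assumed default deploy host/port
CONCURRENCY = 32                               # raise until throughput stops improving
QUERY = {"user": "u-123", "num": 10}           # placeholder UR query; use real user ids

def one_query(_):
    req = urllib.request.Request(
        PS_URL,
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

start = time.time()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    statuses = list(pool.map(one_query, range(1000)))
elapsed = time.time() - start
print(f"{len(statuses)} queries in {elapsed:.1f}s "
      f"({len(statuses) / elapsed:.1f} q/s at concurrency {CONCURRENCY})")
```

If raising CONCURRENCY no longer raises queries per second and the Prediction Server is not at 100% CPU, the limit is somewhere else in the stack.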
This means that the number of connections can be raised by adding cores, but if you are not at 100% CPU usage, some other part of the system is probably the bottleneck, and batch queries will not help. Only if you see 100% CPU usage on the Prediction Server will increasing connections or batching queries help. Remember that for evaluation you are putting a worst-case load on the Prediction Server, worse than any real-world scenario is likely to produce. And it is almost certain that you will be able to overload the system, since sending a query is much, much easier than processing it. So processing the query is the most likely cause of speed limits.

Therefore, scale the system to the maximum time you can wait for all responses. Start with the Prediction Server: either add cores or put a load balancer in front so you can run any number of PS machines. Scale HBase and Elasticsearch in the normal ways, using their clustering methods, as you see them approach maximum CPU usage. Nothing is stored during a query, but disk may be read, so also watch I/O for bottlenecks.

This assumes you have a clustered deployment and can measure each system's load independently. If it is all on one machine, good luck, because there are many over-constrained situations where one service grabs resources another needs, causing artificial bottlenecks.

Assuming a clustered environment, the system is indefinitely scalable because no state is stored in PIO, only in scalable services like HBase and Elasticsearch. There are two internal queries for every query you make: one to HBase and one to Elasticsearch. Both have been used in massive deployments and can handle any load you can define. So by scaling the PS (vertically or with load balancers) and HBase and Elasticsearch (through vertical or cluster expansion), you should be able to handle as many queries per second as you need.


On Jul 27, 2017, at 9:39 PM, Mattz <[email protected]> wrote:

Thanks Mars. Looks like the pio eval may not work for my needs (according to Pat) since I am using the UR template. And querying the REST API in bulk may be limited by the concurrency that the API can handle. I tested with a reasonably sized machine and this number was not high enough.

On Thu, Jul 27, 2017 at 11:08 PM, Mars Hall <[email protected]> wrote:

Hi Mattz,

Yes, that batch evaluator using `pio eval` is currently the only documented way to run batch predictions.

It's also possible to create a custom script that calls the Queries HTTP/REST API to collect predictions in bulk. My team has had this need recur, so I implemented a `pio batchpredict` command for PredictionIO, but it has not yet been merged and released. See the pull request: https://github.com/apache/incubator-predictionio/pull/412

*Mars ( <> .. <> )

> On Jul 27, 2017, at 05:25, Mattz <[email protected]> wrote:
>
> Hello,
>
> I am using the "Universal Recommender" template. Is the below guide current
> if I want to create bulk recommendations in batch?
>
> https://predictionio.incubator.apache.org/templates/recommendation/batch-evaluator/
>
> Thanks!
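For anyone who wants the interim approach Mars describes above, before `pio batchpredict` is merged, a bulk-query script can be quite small. The sketch below is only an illustration, not Mars's script and not the batchpredict implementation: the file names, endpoint, and worker count are assumptions, and it expects one JSON query per line in the input file.

```python
# Bulk-prediction sketch (an assumption, not an official PredictionIO tool): read one
# JSON query per line from queries.json, POST each to the Queries REST API, and write
# one prediction per line to predictions.json.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

PS_URL = "http://localhost:8000/queries.json"  # assumed Prediction Server endpoint

def predict(query_line):
    # POST one query (already a JSON string) and return the prediction JSON.
    req = urllib.request.Request(
        PS_URL,
        data=query_line.strip().encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# One JSON query per line, e.g. {"user": "u-123", "num": 10}
with open("queries.json") as qf:
    queries = [line for line in qf if line.strip()]

# Keep concurrency modest so the evaluation load stays below what the PS can absorb.
with ThreadPoolExecutor(max_workers=8) as pool:
    predictions = list(pool.map(predict, queries))

with open("predictions.json", "w") as pf:
    pf.write("\n".join(p.strip() for p in predictions) + "\n")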
