As it happens, I just looked into the concurrency issue: how many connections 
can be made to the Prediction Server? The answer is that Spray HTTP, the 
predecessor of what has now been merged into akka-http, uses something called 
TCP back-pressure to optimize the number of connections allowed and the number 
of simultaneous responses that can be worked on in one connection. This takes 
into account how many cores, and therefore threads, are available on the 
machine. The upshot is that Spray self-optimizes for the maximum the machine 
can handle.
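To make the back-pressure idea concrete, here is a minimal plain-socket sketch 
(this is not Spray itself, just an illustration of the underlying TCP 
mechanism): a slow reader lets the socket buffers fill, so the fast writer's 
write() calls block and the sender is throttled to whatever rate the receiver 
can absorb.

    import java.io.{InputStream, OutputStream}
    import java.net.{ServerSocket, Socket}

    // Plain-socket illustration of TCP back-pressure (not Spray itself).
    object BackPressureDemo extends App {
      val server = new ServerSocket(0)   // listen on any free port
      val port   = server.getLocalPort

      // Slow consumer: reads 64 KB then sleeps, draining roughly 1 MB/s.
      new Thread(new Runnable {
        def run(): Unit = {
          val in: InputStream = server.accept().getInputStream
          val buf = new Array[Byte](64 * 1024)
          while (in.read(buf) != -1) Thread.sleep(50)
        }
      }).start()

      // Fast producer: tries to push 1 MB chunks as quickly as it can.
      val out: OutputStream = new Socket("localhost", port).getOutputStream
      val chunk = new Array[Byte](1024 * 1024)
      for (i <- 1 to 10) {
        val t0 = System.nanoTime()
        out.write(chunk)  // blocks once the socket buffers fill up
        out.flush()
        println(f"chunk $i%2d took ${(System.nanoTime() - t0) / 1e6}%.0f ms")
      }
      out.close()
    }

The first chunk or two go out in milliseconds; once the buffers are full, each 
write takes roughly as long as the reader needs to drain a chunk.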

This means that the number of connections can be raised by adding cores, but 
if you are not at 100% CPU usage then some other part of the system is 
probably the bottleneck, and in that case batch queries will not help. Only if 
you see 100% CPU usage on the Prediction Server will increasing connections or 
batching queries help.
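One way to check this is to drive the server with a bounded number of parallel 
queries and watch its CPU as you raise the limit. A minimal sketch follows; 
the localhost:8000/queries.json endpoint and the {"user", "num"} body follow 
PredictionIO's defaults and a typical UR query, and the user IDs are made up, 
so adjust all of them for your deployment.

    import java.net.{HttpURLConnection, URL}
    import java.util.concurrent.Executors
    import scala.concurrent.duration._
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.io.Source

    // Bounded-concurrency probe for the Prediction Server queries endpoint.
    object QueryLoadProbe extends App {
      val endpoint    = new URL("http://localhost:8000/queries.json")
      val concurrency = 8  // raise only while the server is below 100% CPU
      implicit val ec: ExecutionContext =
        ExecutionContext.fromExecutorService(
          Executors.newFixedThreadPool(concurrency))

      // POST one UR-style query and return the raw JSON response.
      def query(user: String): String = {
        val conn = endpoint.openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        conn.getOutputStream.write(
          s"""{"user": "$user", "num": 10}""".getBytes("UTF-8"))
        val body = Source.fromInputStream(conn.getInputStream).mkString
        conn.disconnect()
        body
      }

      val users = (1 to 1000).map(i => s"user-$i")
      val start = System.nanoTime()
      Await.result(Future.traverse(users)(u => Future(query(u))), 10.minutes)
      val secs = (System.nanoTime() - start) / 1e9
      println(f"${users.size} queries in $secs%.1f s, " +
        f"${users.size / secs}%.1f q/s")
    }

If raising the concurrency number does not raise queries per second and the 
server is not at 100% CPU, the bottleneck is elsewhere.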

Remember that for evaluation you are putting the worst-case load on the 
Prediction Server, worse than any real-world scenario is likely to hit. And it 
is almost certain that you will be able to overload the system, since sending 
a query is much, much easier than processing it. So processing the query is 
the most likely speed limit.

Therefore, scale the system to the maximum time you can wait for all 
responses. Start with the Prediction Server: either add cores or put a load 
balancer in front so you can have any number of PS machines. Scale HBase and 
Elasticsearch in the normal ways, using their clustering methods, as you see 
them start to show maxed-out CPU. Nothing is stored during a query but disk 
may be read, so also watch I/O for bottlenecks. This assumes you have a 
clustered deployment and can measure each system's load independently. If 
everything is on one machine, good luck, because there are many 
over-constrained situations where one service grabs resources another needs, 
causing artificial bottlenecks.

Assuming a clustered environment, the system is indefinitely scalable because 
no state is stored in PIO, only in scalable services like HBase and 
Elasticsearch. There are two internal queries for every query you make: one to 
HBase and one to ES. Both have been used in massive deployments and so can 
handle any load you can define.

So by scaling the PS (vertically or with load balancers) and HBase and ES 
(through vertical or cluster expansion), you should be able to handle as many 
queries per second as you need.


On Jul 27, 2017, at 9:39 PM, Mattz <[email protected]> wrote:

Thanks Mars. Looks like pio eval may not work for my needs (according to Pat) 
since I am using the UR template.

And querying the REST API in bulk may be limited by the concurrency that the 
API can handle. I tested with a reasonably sized machine and this number was 
not high enough.

On Thu, Jul 27, 2017 at 11:08 PM, Mars Hall <[email protected]> wrote:
Hi Mattz,

Yes, that batch evaluator using `pio eval` is currently the only documented way 
to run batch predictions.

It's also possible to create a custom script that calls the Queries HTTP/REST 
API to collect predictions in bulk.

My team has had this need recur, so I implemented a `pio batchpredict` command 
for PredictionIO, but it has not yet been merged and released. See the pull 
request:
  https://github.com/apache/incubator-predictionio/pull/412

*Mars

( <> .. <> )

> On Jul 27, 2017, at 05:25, Mattz <[email protected]> wrote:
>
> Hello,
>
> I am using the "Universal Recommender" template. Is the below guide current 
> if I want to create bulk recommendations in batch?
>
> https://predictionio.incubator.apache.org/templates/recommendation/batch-evaluator/
>
> Thanks!


