Even if you need this, I'd think about 2 deployments with an input proxy that sends to both input servers and a load balancer for queries. That risks the split-brain problem, but if one goes down you can recover since no data is lost: you can export from DS1 and import to DS2. Recommenders can also tolerate small amounts of data loss, unless it's changes to item attributes.
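The dual-write input proxy described above could be sketched roughly as follows. This is a minimal sketch, not part of the thread: the hostnames are hypothetical, and the only assumption taken from PredictionIO itself is that the EventServer accepts event POSTs on its REST endpoint (`/events.json?accessKey=...`). The sender is injectable so the fan-out logic can be exercised without live servers.

```python
# Sketch of the dual-write input proxy idea: every incoming event is
# forwarded to the EventServer in each datacenter so both deployments
# receive the same input stream. Hosts below are hypothetical.

import json
import urllib.request

EVENT_SERVERS = [
    "http://dc1.example.com:7070",  # hypothetical DC1 EventServer
    "http://dc2.example.com:7070",  # hypothetical DC2 EventServer
]

def post_event(base_url, event, access_key):
    """Send one event to a single EventServer over its REST API."""
    url = f"{base_url}/events.json?accessKey={access_key}"
    req = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

def fan_out(event, access_key, send=post_event):
    """Forward an event to every datacenter, collecting failures
    instead of aborting, so one DC being down doesn't lose the write
    to the other."""
    failures = []
    for base_url in EVENT_SERVERS:
        try:
            send(base_url, event, access_key)
        except Exception as exc:
            failures.append((base_url, exc))
    return failures
```

A real proxy would also need retry/queueing for the failed side before the export/import recovery step becomes necessary, but the shape is the same: one write in, N writes out.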
The alternative of making every component service HA to the extent of multiple datacenters would be a lot more work. It's not hard to make PIO HA for a single datacenter; in fact it's pretty easy. It's making each service work across multiple datacenters that might end up being complex.

On Jun 1, 2017, at 1:47 PM, Pat Ferrel <[email protected]> wrote:

I would put PIO in one and query/input from either of your other datacenters. The latency issues with input/query are much, much simpler than spreading HBase across datacenters. I mean, would you want to put Spark and Elasticsearch across datacenters too? You are doing 2 datacenters for HA, right? Do you really need HA for a recommender? If so, all of the components support this, but it will cost a lot in complexity and $$ for instances.

On Jun 1, 2017, at 1:28 PM, Martin Fernandez <[email protected]> wrote:

Actually I have my infrastructure split across multiple datacenters in an active/active mode. So how can I manage to have a PIO instance running in each DC? Do I have to deploy HBase as well? How can I maintain the HBase data?

On Thu, Jun 1, 2017 at 5:23 PM, Pat Ferrel <[email protected]> wrote:

I haven't done this, but it can be done. But why? You always give up performance when instances are not on the same physical LAN. Recommenders are generally not considered mission critical where the ultimate HA is required.

On Jun 1, 2017, at 11:19 AM, Martin Fernandez <[email protected]> wrote:

Thanks Pat for your reply. I am doing video-on-demand e-commerce in which realtime queries would be very helpful, but I want to minimize the risks of HDFS synchronization latency between datacenters. Do you have experience running PredictionIO + Universal Recommender in multiple DCs that you can share? Did you face any latency issues with the HBase cluster?
Thanks in advance.

On Thu, Jun 1, 2017 at 2:53 PM, Pat Ferrel <[email protected]> wrote:

First, I'm not sure this is a good idea. You lose the realtime nature of recommendations based on the up-to-the-second recording of user behavior. You get this with live user event input even without re-calculating the model in realtime. Second, no, you can't disable queries for user history; it is the single most important key to personalized recommendations.

I'd have to know more about your application, but the first line of cost cutting for us in custom installations (I work for ActionML, the maintainer of the UR Template) is to make the Spark cluster temporary, since it is not needed to serve queries and only needs to run during training. We start it up, train, then shut it down.

If you really want to shut the entire system down and don't want realtime user behavior, you can query for all users and put the results in your DB or an in-memory cache like a hashmap, then just serve from your DB or in-memory cache. This takes you back to the days of the old Mahout MapReduce recommenders (pre-2014), but maybe it fits your app.

If you are doing e-commerce, think about a user's shopping behavior. They shop, browse, then buy. Once they buy, that old shopping behavior is no longer indicative of realtime intent. If you miss using that behavior you may miss the shopping session altogether. But again, your needs may vary.

On Jun 1, 2017, at 6:19 AM, Martin Fernandez <[email protected]> wrote:

Hello guys, we are trying to deploy Universal Recommender + PredictionIO in our infrastructure, but we don't want to distribute HBase across datacenters because of the latency. So the idea is to build and train the engine offline and then copy the model and Elasticsearch data to PIO replicas. I noticed that when I deploy the engine, it always tries to connect to the HBase server, since it is used to query user history.
Is there any way to disable those user history queries and avoid the connection to HBase?

Thanks,
Martin

--
Saludos / Best Regards,
Martin Gustavo Fernandez
Mobile: +5491132837292

--
You received this message because you are subscribed to the Google Groups "actionml-user" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAGQyoRcZtFs9BdGx4y9qSd0qyJBxPQN7NuGrgStT%2BNo%2B7nn0Mw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
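The batch alternative Pat describes (query for all users, store the results, then serve from a DB or in-memory cache) could be sketched like this. The sketch is an assumption-laden illustration, not from the thread: the user list and hosts are placeholders, and the only PredictionIO-specific assumption is that a deployed engine answers recommendation POSTs on `/queries.json`. The query function is injectable so the caching logic can be tried without a live deployment.

```python
# Sketch of precomputing recommendations for all known users into an
# in-memory cache (a plain dict, the "hashmap" from the thread), so
# queries can be served without the live PIO deployment or HBase.
# query_engine is a placeholder for a POST to a deployed engine.

import json
import urllib.request

def query_engine(user_id, url="http://localhost:8000/queries.json"):
    """Ask the deployed engine for recommendations for one user.
    The URL is a hypothetical default deployment address."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"user": user_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def precompute(user_ids, query=query_engine):
    """Build the cache: one engine query per known user."""
    return {uid: query(uid) for uid in user_ids}

def recommend(cache, user_id, fallback=None):
    """Serve from the cache; unknown or new users get a fallback,
    e.g. a precomputed popular-items list."""
    return cache.get(user_id, fallback)
```

As Pat notes, this trades away realtime personalization: a user who appears (or changes behavior) after the precompute run only sees the fallback until the next batch.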
