Hi,

We've got quite a wide table (around 50 columns) with about 1 billion rows in 
it, currently stored in MySQL; we're looking at moving it into Phoenix. Its 
primary key in MySQL is an auto-increment column, but each row also contains a 
UUID, and that would naturally become the primary key in Phoenix. Several 
other tables hang off this table, in the sense that the main table's primary 
key appears in them as a foreign key. There are also several indexed columns 
in MySQL that would need to carry over as secondary indexes in Phoenix.

Most of the queries are reads, but maybe 20% are writes. Almost all of them 
are small: point lookups on the primary key, or scans returning a few rows via 
one of the indexes.
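
For context, the access pattern looks roughly like the sketch below (table 
and column names are placeholders, not our real schema):

    -- Point lookup on the primary key (the UUID)
    SELECT other_col FROM main_table WHERE uuid = ?;

    -- Small scan returning a few rows via a secondary index
    SELECT uuid, other_col FROM main_table WHERE indexed_col = ?;

    -- Writes are mostly single-row upserts
    UPSERT INTO main_table (uuid, indexed_col, other_col) VALUES (?, ?, ?);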

Can anyone suggest sensible Phoenix/HBase config to get decent performance out 
of this? Specifically:

  1.  How should we encode the UUID? As BINARY(16)? And given that the key 
values would be randomly generated UUIDs, presumably salting is unnecessary? 
(See the DDL sketch after this list for what we have in mind.)
  2.  How many nodes should we expect to need to match the performance of our 
current MySQL database at 1 billion rows?
  3.  How many regions should the table have (i.e. should we pre-split, and 
into how many)?
  4.  Presumably Phoenix will start to outperform MySQL as the row count 
grows? At 10 billion rows, MySQL might struggle, but hopefully Phoenix will 
be fine?
  5.  Are there any particular HBase settings we should be aware of (RPC 
timeouts etc.) that we'll need to tweak to get decent performance? This 
applies both to the initial bulk load (data migration) and to normal 
production use afterwards.
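
To make point 1 concrete, the DDL we have in mind is roughly the sketch 
below (all table, column, and index names are placeholders). The assumption 
we'd like checked is that BINARY(16) is the right type for the key, and that 
SALT_BUCKETS is unnecessary because random UUIDs already spread writes evenly 
across regions:

    -- Main table: random UUID as the row key, stored as 16 raw bytes.
    -- Deliberately no SALT_BUCKETS, on the assumption that random UUIDs
    -- already give a uniform key distribution.
    CREATE TABLE main_table (
        uuid         BINARY(16) NOT NULL,
        indexed_col  VARCHAR,
        other_col    BIGINT,
        CONSTRAINT pk PRIMARY KEY (uuid)
    );

    -- One of the MySQL indexes carried over as a Phoenix secondary index
    CREATE INDEX idx_main_indexed_col ON main_table (indexed_col);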

We'd be extremely grateful for any tips.

James

