Hef -- do your split points actually correspond to the distribution of
values in your `id` column? You can tell this pretty easily by looking at
the number of requests per region for your table in the HBase UI.
And yes, PQS will not increase performance (it adds "more work" on top of
what the thick driver already accomplishes itself).
5K/s updates with two indexes seems OK to me for a laptop/VM. Maybe you
need to increase the number of handlers you give HBase? What are the
hardware characteristics of the system that you're running HBase on?
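If handler starvation turns out to be the bottleneck, the region server RPC
handler pool is controlled by hbase.regionserver.handler.count in
hbase-site.xml. A sketch, with an illustrative value (the default is 30;
the right number depends on cores and workload, and raising it too far can
hurt):

```xml
<!-- hbase-site.xml on each region server; requires a restart to take effect -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value> <!-- illustrative: default is 30 -->
</property>
```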
On 9/8/17 2:57 AM, Hef wrote:
Hi James,
I have read over the Tuning Guide and tried some of your suggestions:
#3, #5, and #6. Since the data is mutable and read/written frequently, I
did not try #1, #2, or #4.
The schema is as simple as this:

create table if not exists test_data (
    id VARCHAR(32),
    sid VARCHAR(32),
    uid UNSIGNED_LONG,
    xid UNSIGNED_INT,
    ts UNSIGNED_LONG,
    CONSTRAINT id primary key(id)
) SPLIT ON
('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
The indexes are created with:

create LOCAL index data_ts on test_data(ts);
create LOCAL index data_sid_ts on test_data(sid, ts);
With the tuning above, write performance on my 10-RegionServer cluster
rose to 1k/s.
I also tried multi-threaded writes: with 20 threads writing concurrently,
throughput rises to 5k/s, but it doesn't increase any further no matter
how many more threads I add.
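(For reference, a common pattern when throughput plateaus like this is to
give each writer thread its own Phoenix connection and its own commit
batches, since a JDBC connection should not be shared across threads. A
rough sketch, assuming a cluster reachable at the placeholder URL
"jdbc:phoenix:zk-host" and the test_data table above; thread count and
batch size are illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelWriter {
    // Each thread opens its own connection and commits its own batches.
    static void writeRange(int start, int end) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host")) {
            conn.setAutoCommit(false); // batch mutations client-side
            PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO test_data (id, ts) VALUES (?, ?)");
            for (int i = start; i < end; i++) {
                ps.setString(1, "row-" + i);
                ps.setLong(2, System.currentTimeMillis());
                ps.executeUpdate(); // buffered until commit
                if ((i - start + 1) % 1000 == 0) conn.commit();
            }
            conn.commit(); // flush the remainder
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = 20, perThread = 50_000;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int start = t * perThread;
            pool.submit(() -> {
                try { writeRange(start, start + perThread); }
                catch (Exception e) { e.printStackTrace(); }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

If throughput still plateaus with this layout, the bottleneck is more
likely server-side -- region distribution or handler count -- than the
client.)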
I haven't tried writing through the thin client to the Query Server; I
guess it wouldn't give any boost.
On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestay...@apache.org> wrote:
Hi Hef,
Have you had a chance to read our Tuning Guide [1] yet? There's a
lot of good, general guidance there. There are some optimizations
for write performance that depend on how you expect/allow your data
and schema to change:
1) Is your data write-once? Make sure to declare your table with the
IMMUTABLE_ROWS=true property [2]. That will lower the overhead of a
secondary index as it's not necessary to read the data row (to get
the old value) prior to writing it when there are secondary indexes.
2) Does your schema only change in an append-only manner? For
example, are columns only added, but never removed? If so, you can
declare your table as APPEND_ONLY_SCHEMA as described here [2].
3) Does your schema never or rarely change, at known times? If so, you
can declare an UPDATE_CACHE_FREQUENCY property as described here [2]
to reduce the RPC traffic.
4) Can you bulk load data [3] and then add or rebuild the index
afterwards?
5) Have you investigated using local indexes [4]? They're optimized
for write speed since they ensure that the index data is on the same
region server as the data (i.e. all writes are local to the region
server, no cross region server calls, but there's some overhead at
read time).
6) Have you considered not using secondary indexes and just letting
your less common queries be slower?
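(For illustration, options 1-3 are all table properties declared at
CREATE TABLE time. A hypothetical example -- the table, column names, and
property values are placeholders, and IMMUTABLE_ROWS only applies if the
rows really are write-once:

```sql
-- Illustrative only; see [2] for the full list of table options.
CREATE TABLE IF NOT EXISTS events (
    id VARCHAR(32) NOT NULL,
    ts UNSIGNED_LONG,
    CONSTRAINT pk PRIMARY KEY (id)
)
IMMUTABLE_ROWS = true,           -- point 1: rows are write-once
APPEND_ONLY_SCHEMA = true,       -- point 2: columns only ever added
UPDATE_CACHE_FREQUENCY = 300000; -- point 3: re-check metadata at most every 300s (ms)
```

)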
Keep in mind, with secondary indexes, you're essentially writing
your data twice. You'll need to expect that your write performance
will drop. As usual, there's a set of tradeoffs that you need to
understand and choose according to your requirements.
Thanks,
James
[1] https://phoenix.apache.org/tuning_guide.html
[2] https://phoenix.apache.org/language/index.html#options
[3] https://phoenix.apache.org/bulk_dataload.html
[4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes
On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <els...@apache.org> wrote:
500 writes/second seems very low to me. On my wimpy laptop, I can
easily see over 10K writes/second, depending on the schema.
The first check is to make sure that you have autocommit
disabled. Otherwise, every update you make via JDBC will trigger
an HBase RPC. Batching of RPCs to HBase is key to optimal
performance via Phoenix.
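As a sketch of that pattern (the connection URL is a placeholder for your
ZooKeeper quorum, the table and batch size are illustrative, and this
assumes the thick driver against a live cluster) -- with autocommit off,
Phoenix buffers the upserted rows client-side and sends them in batched
RPCs on commit:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedWriter {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host")) {
            conn.setAutoCommit(false); // critical: otherwise every row is its own RPC
            PreparedStatement ps = conn.prepareStatement(
                "UPSERT INTO test_data (id, sid, uid, xid, ts) VALUES (?, ?, ?, ?, ?)");
            final int batchSize = 1000; // tune; hundreds to a few thousand is typical
            for (int i = 0; i < 100_000; i++) {
                ps.setString(1, "row-" + i);
                ps.setString(2, "sid-" + (i % 100));
                ps.setLong(3, i);
                ps.setInt(4, i % 10);
                ps.setLong(5, System.currentTimeMillis());
                ps.executeUpdate(); // buffered client-side, not yet sent
                if ((i + 1) % batchSize == 0) {
                    conn.commit(); // one batched round of RPCs to HBase
                }
            }
            conn.commit(); // flush the remainder
        }
    }
}
```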
Regarding #2: unless you have intimate knowledge of how Phoenix writes
data to HBase, do not investigate this approach.
On 9/5/17 5:56 AM, Hef wrote:
Hi guys,
I'm evaluating Phoenix as a replacement for MySQL, for better
scalability. The version I'm evaluating is 4.11-HBase-1.2, with some
dependencies modified to match the CDH 5.9 we are using.
The problem I'm having is that write performance to Phoenix over JDBC
is too poor: only 500 writes/second, while our data's throughput is
almost 50,000/s. My questions are:
1. Is 500/s TPS a normal speed? How fast can you achieve in your
production deployments?
2. Can I write directly into HBase with the mutation API and read
through Phoenix? That could be fast, but I don't see the secondary
indexes being maintained automatically in that case.
Regards,
Hef