Here's another good tuning resource that covers HBase too: http://phoenix.apache.org/presentations/TuningForOLTP.pdf

On Fri, Sep 8, 2017 at 8:30 AM, Josh Elser <els...@apache.org> wrote:

> Hef -- do your split points actually correspond with the distribution of
> values of your `id` column? You can tell this pretty easily by looking at
> the number of requests per region for your data table on the HBase UI.
>
> And yes, PQS will not increase the performance (as it is adding "more
> work" to accomplish the same thing the thick driver accomplishes itself).
>
> 5K/s updates with two indexes seems OK to me for a laptop/VM. Maybe you
> need to increase the number of handlers you give HBase? What are the
> hardware characteristics of the system that you're running HBase on?
>
> On 9/8/17 2:57 AM, Hef wrote:
>
>> Hi James,
>> I have read over the Tuning Guide, and tried some of your suggestions:
>> #3, #5, #6. Since the data is mutable, and is read and written
>> frequently, I did not try #1, #2, #4.
>> The schema is simple:
>>
>> create table if not exists test_data (
>>     id VARCHAR(32),
>>     sid VARCHAR(32),
>>     uid UNSIGNED_LONG,
>>     xid UNSIGNED_INT,
>>     ts UNSIGNED_LONG
>>     CONSTRAINT id primary key(id)
>> ) SPLIT ON ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e',
>> 'f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');
>>
>> The indexes are created with:
>>
>> create LOCAL index data_ts on test_data(ts);
>> create LOCAL index data_sid_ts on test_data(sid, ts);
>>
>> On my 10 RegionServer cluster, write performance rose to 1k/s with the
>> tuning above.
>> I also tried multi-threaded writes: with 20 threads writing
>> concurrently, performance rises to 5k/s, but it won't go any higher
>> however many threads I add.
>>
>> I didn't try writing with the thin client to QueryServer; I guess there
>> won't be any boost.
>>
>> On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestay...@apache.org>
>> wrote:
>>
>>     Hi Hef,
>>     Have you had a chance to read our Tuning Guide [1] yet? There's a
>>     lot of good, general guidance there. There are some optimizations
>>     for write performance that depend on how you expect/allow your data
>>     and schema to change:
>>     1) Is your data write-once? Make sure to declare your table with the
>>     IMMUTABLE_ROWS=true property [2]. That will lower the overhead of a
>>     secondary index as it's not necessary to read the data row (to get
>>     the old value) prior to writing it when there are secondary indexes.
>>     2) Does your schema only change in an append-only manner? For
>>     example, are columns only added, but never removed? If so, you can
>>     declare your table as APPEND_ONLY_SCHEMA as described here [2].
>>     3) Does your schema never change, or change only rarely at known
>>     times? If so, you can declare an UPDATE_CACHE_FREQUENCY property as
>>     described here [2] to reduce the RPC traffic.
>>     4) Can you bulk load data [3] and then add or rebuild the index
>>     afterwards?
>>     5) Have you investigated using local indexes [4]? They're optimized
>>     for write speed since they ensure that the index data is on the same
>>     region server as the data (i.e. all writes are local to the region
>>     server, no cross region server calls, but there's some overhead at
>>     read time).
>>     6) Have you considered not using secondary indexes and just letting
>>     your less common queries be slower?
>>
>>     Keep in mind, with secondary indexes, you're essentially writing
>>     your data twice. You'll need to expect that your write performance
>>     will drop. As usual, there's a set of tradeoffs that you need to
>>     understand and choose according to your requirements.
>>
>>     Thanks,
>>     James
>>
>>     [1] https://phoenix.apache.org/tuning_guide.html
>>     [2] https://phoenix.apache.org/language/index.html#options
>>     [3] https://phoenix.apache.org/bulk_dataload.html
>>     [4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes
>>
>>     On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <els...@apache.org>
>>     wrote:
>>
>>         500 writes/second seems very low to me. On my wimpy laptop, I
>>         can easily see over 10K writes/second depending on the schema.
>>
>>         The first check is to make sure that you have autocommit
>>         disabled. Otherwise, every update you make via JDBC will trigger
>>         an HBase RPC. Batching of RPCs to HBase is key to optimal
>>         performance via Phoenix.
>>
>>         Regarding #2, unless you have intimate knowledge of how
>>         Phoenix writes data to HBase, do not investigate this approach.
>>
>>         On 9/5/17 5:56 AM, Hef wrote:
>>
>>             Hi guys,
>>             I'm evaluating using Phoenix to replace MySQL for better
>>             scalability.
>>             The version I'm evaluating is 4.11-HBase-1.2, with some
>>             dependencies modified to match CDH5.9 which we are using.
>>
>>             The problem I'm having is that write performance to Phoenix
>>             from JDBC is too poor, only 500 writes/second, while our
>>             data's throughput is almost 50,000/s. My questions are:
>>             1. Is 500/s TPS a normal speed? How fast can you
>>             achieve in your production?
>>             2. Can I write directly into HBase with the mutation
>>             API and read from Phoenix? That could be fast, but I don't
>>             see the secondary index being created automatically in this
>>             case.
>>
>>             Regards,
>>             Hef
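
For reference, Josh's first check (autocommit disabled, so that updates are batched into fewer HBase RPCs) might look roughly like the sketch below. It is only an illustration against the test_data schema quoted in the thread; the class name BatchedUpsert, the Row holder, and the commitSize parameter are hypothetical names introduced here, and a sensible commit size depends on row width and cluster settings.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchedUpsert {
    // Column list follows the test_data schema quoted in the thread.
    public static final String UPSERT =
        "UPSERT INTO test_data (id, sid, uid, xid, ts) VALUES (?, ?, ?, ?, ?)";

    /** Hypothetical holder for one row of test_data. */
    public static final class Row {
        final String id; final String sid; final long uid; final int xid; final long ts;
        public Row(String id, String sid, long uid, int xid, long ts) {
            this.id = id; this.sid = sid; this.uid = uid; this.xid = xid; this.ts = ts;
        }
    }

    /**
     * Upserts all rows with autocommit off and commits once every
     * commitSize rows, plus once for any remainder.
     * Returns the number of commits issued.
     */
    public static int upsertAll(Connection conn, List<Row> rows, int commitSize)
            throws SQLException {
        if (commitSize <= 0) {
            throw new IllegalArgumentException("commitSize must be positive");
        }
        conn.setAutoCommit(false); // buffer mutations until commit()
        int commits = 0;
        int pending = 0;
        try (PreparedStatement stmt = conn.prepareStatement(UPSERT)) {
            for (Row r : rows) {
                stmt.setString(1, r.id);
                stmt.setString(2, r.sid);
                stmt.setLong(3, r.uid);
                stmt.setInt(4, r.xid);
                stmt.setLong(5, r.ts);
                stmt.executeUpdate();
                if (++pending == commitSize) {
                    conn.commit(); // send the buffered rows as one batch
                    commits++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                conn.commit(); // flush the last partial batch
                commits++;
            }
        }
        return commits;
    }
}
```

A caller would obtain the Connection from the thick driver (a URL of the form jdbc:phoenix:<zookeeper quorum>, with the quorum being a placeholder here) and pass it in; with multiple writer threads, each thread would use its own Connection.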
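
Josh's question about whether the split points match the distribution of `id` values can also be checked offline against a sample of keys, in addition to the per-region request counts on the HBase UI. The sketch below is a rough illustration only: the class SplitPointCheck and its method are hypothetical, and it assumes ASCII row keys so that Java string ordering matches HBase's byte-wise ordering. Each region is treated as the half-open range [start key, end key).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SplitPointCheck {
    /**
     * Counts how many sample keys fall into each region implied by the
     * split points. N split points give N + 1 regions; region 0 holds keys
     * below the first split point, region i holds keys from split point
     * i - 1 (inclusive) up to split point i (exclusive).
     */
    public static int[] countPerRegion(List<String> splitPoints, List<String> rowKeys) {
        List<String> splits = new ArrayList<>(splitPoints);
        Collections.sort(splits); // binarySearch needs sorted input
        int[] counts = new int[splits.size() + 1];
        for (String key : rowKeys) {
            int idx = Collections.binarySearch(splits, key);
            // Exact match: the key is the start key of region idx + 1.
            // Otherwise binarySearch returns -(insertionPoint) - 1.
            int region = (idx >= 0) ? idx + 1 : -idx - 1;
            counts[region]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Split points from the thread: '0'..'9' then 'a'..'z'.
        List<String> splits = new ArrayList<>();
        for (char c = '0'; c <= '9'; c++) splits.add(String.valueOf(c));
        for (char c = 'a'; c <= 'z'; c++) splits.add(String.valueOf(c));
        // Hypothetical sample of ids; replace with real keys from the table.
        List<String> sample = Arrays.asList("0f3a", "a91c", "a7b2", "ff00", "c0de");
        System.out.println(Arrays.toString(countPerRegion(splits, sample)));
    }
}
```

If, for example, the ids turned out to be lowercase hex strings (an assumption, the thread does not say), every key would start with 0-9 or a-f and the regions starting at 'g' through 'z' would receive no writes at all.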