Hi James,

I have read over the Tuning Guide and tried some of your suggestions: #3, #5, and #6. Since the data is mutable and is both read and written frequently, I did not try #1, #2, or #4. The schema is simple:
create table if not exists test_data (
    id VARCHAR(32),
    sid VARCHAR(32),
    uid UNSIGNED_LONG,
    xid UNSIGNED_INT,
    ts UNSIGNED_LONG,
    CONSTRAINT id primary key(id)
) SPLIT ON ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');

The indexes are created with:

create LOCAL index data_ts on test_data(ts);
create LOCAL index data_sid_ts on test_data(sid, ts);

On my 10-RegionServer cluster, write performance rose to 1k rows/s with the tuning above. I also tried multi-threaded writes: with 20 threads writing concurrently, throughput reached 5k rows/s, but it would not go any higher no matter how many threads I added. I didn't try writing with the thin client to QueryServer; I doubt there would be any boost.

On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestay...@apache.org> wrote:
> Hi Hef,
> Have you had a chance to read our Tuning Guide [1] yet? There's a lot of
> good, general guidance there. There are some optimizations for write
> performance that depend on how you expect/allow your data and schema to
> change:
> 1) Is your data write-once? Make sure to declare your table with the
> IMMUTABLE_ROWS=true property [2]. That will lower the overhead of a
> secondary index, as it's not necessary to read the data row (to get the
> old value) prior to writing it when there are secondary indexes.
> 2) Does your schema only change in an append-only manner? For example, are
> columns only added, but never removed? If so, you can declare your table
> as APPEND_ONLY_SCHEMA as described here [2].
> 3) Does your schema never or rarely change, at known times? If so, you can
> declare an UPDATE_CACHE_FREQUENCY property as described here [2] to reduce
> the RPC traffic.
> 4) Can you bulk load data [3] and then add or rebuild the index afterwards?
> 5) Have you investigated using local indexes [4]?
> They're optimized for write speed since they ensure that the index data is
> on the same region server as the data (i.e. all writes are local to the
> region server, with no cross-region-server calls), but there's some
> overhead at read time.
> 6) Have you considered not using secondary indexes and just letting your
> less common queries be slower?
>
> Keep in mind, with secondary indexes, you're essentially writing your data
> twice. You'll need to expect that your write performance will drop. As
> usual, there's a set of tradeoffs that you need to understand and choose
> according to your requirements.
>
> Thanks,
> James
>
> [1] https://phoenix.apache.org/tuning_guide.html
> [2] https://phoenix.apache.org/language/index.html#options
> [3] https://phoenix.apache.org/bulk_dataload.html
> [4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes
>
> On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <els...@apache.org> wrote:
>
>> 500 writes/second seems very low to me. On my wimpy laptop, I can easily
>> see over 10K writes/second depending on the schema.
>>
>> The first check is to make sure that you have autocommit disabled.
>> Otherwise, every update you make via JDBC will trigger an HBase RPC.
>> Batching of RPCs to HBase is key to optimal performance via Phoenix.
>>
>> Regarding #2, unless you have intimate knowledge of how Phoenix writes
>> data to HBase, do not investigate this approach.
>>
>> On 9/5/17 5:56 AM, Hef wrote:
>>
>>> Hi guys,
>>> I'm evaluating using Phoenix to replace MySQL for better scalability.
>>> The version I'm evaluating is 4.11-HBase-1.2, with some dependencies
>>> modified to match CDH5.9, which we are using.
>>>
>>> The problem I'm having is that write performance to Phoenix from JDBC
>>> is too poor: only 500 writes/second, while our data's throughput is
>>> almost 50,000/s. My questions are:
>>> 1. Is 500/s a normal TPS? How fast can you achieve in your production?
>>> 2.
>>> Can I write directly into HBase with the native mutation API and read
>>> from Phoenix? That could be fast, but I don't see the secondary index
>>> being created automatically in that case.
>>>
>>> Regards,
>>> Hef
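[Editor's note] The write loop Josh describes (autocommit off, commit every N rows so Phoenix batches mutations into fewer HBase RPCs) can be sketched as below against the test_data schema from this thread. This is a sketch only, not something tested against this cluster: the ZooKeeper quorum host, the batch size of 1000, and the source() data feed are placeholder assumptions, and running it requires the Phoenix JDBC driver on the classpath plus a live cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchedUpsert {

    // Stand-in for your real data feed (hypothetical; empty here).
    record Row(String id, String sid, long uid, int xid, long ts) {}
    static Iterable<Row> source() { return List.of(); }

    public static void main(String[] args) throws Exception {
        // "zk-quorum" is a placeholder for your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-quorum")) {
            // Josh's key point: disable autocommit, otherwise every
            // executeUpdate() becomes its own HBase RPC.
            conn.setAutoCommit(false);

            // James's suggestion #3 could be applied once, e.g.:
            // stmt.executeUpdate("ALTER TABLE test_data SET UPDATE_CACHE_FREQUENCY=900000");
            // (900000 ms is an illustrative value, not a recommendation.)

            final int batchSize = 1000; // illustrative; tune for your row size
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO test_data (id, sid, uid, xid, ts) VALUES (?, ?, ?, ?, ?)")) {
                long rows = 0;
                for (Row r : source()) {
                    ps.setString(1, r.id());
                    ps.setString(2, r.sid());
                    ps.setLong(3, r.uid());
                    ps.setInt(4, r.xid());
                    ps.setLong(5, r.ts());
                    ps.executeUpdate();          // buffered client-side until commit
                    if (++rows % batchSize == 0) {
                        conn.commit();           // one batched round of HBase RPCs
                    }
                }
                conn.commit();                   // flush the final partial batch
            }
        }
    }
}
```

The same loop can then be run from each of the 20 writer threads, each with its own Connection, since Phoenix connections are not thread-safe.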