Re: How to speed up write performance

James Taylor Fri, 08 Sep 2017 08:36:41 -0700

Here's another good tuning resource that covers HBase too:
http://phoenix.apache.org/presentations/TuningForOLTP.pdf


On Fri, Sep 8, 2017 at 8:30 AM, Josh Elser <els...@apache.org> wrote:

> Hef -- do your split points actually correspond with the distribution on
> values of your `id` column? You can tell this pretty easily looking at the
> number of requests per region for your data table on the HBase UI.
>
> And yes, PQS will not increase the performance (as it is adding "more
> work" to accomplish the same thing the thick driver accomplishes itself).
>
> 5K/s updates with two indexes seems OK to me for a laptop/VM. Maybe you
> need to increase the number of handlers you give HBase? What are the
> hardware characteristics of the system that you're running HBase on?
>
> On 9/8/17 2:57 AM, Hef wrote:
>
>> Hi James,
>> I have read over the Tuning Guide, and tried some of your suggestions:
>> #3, #5, #6. Since the date is mutable, and read/write frequently, I did not
>> try #1, #2, #4.
>> The schema is simple as such:
>>
>> /create table if not exists test_data (/
>> /  id VARCHAR(32),/
>> /  sid VARCHAR(32),/
>> /  uid UNSIGNED_LONG,/
>> /  xid UNSIGNED_INT,/
>> /  ts UNSIGNED_LONG/
>> /  CONSTRAINT id primary key(id)/
>> /)  SPLIT ON ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e'
>> ,'f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z')
>> ;/
>>
>>
>> The indexes are created with:
>>
>> /create LOCAL index data_ts on  test_data(ts);/
>> /create LOCAL index data_sid_ts on  test_data(sid, ts);/
>>
>> On my 10 RegionServer cluster, write performance raised to 1k/s with the
>> tuning above.
>> I also tried using multi-thread for write, with 20 threads write
>> concurrently, the performance can raise to 5k/s, but won't raise any more
>> how ever threads increased.
>>
>> I didn't try write with thin-client to QueryServer, I guess there won't
>> be any boost.
>>
>>
>>
>> On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestay...@apache.org
>> <mailto:jamestay...@apache.org>> wrote:
>>
>>     Hi Hef,
>>     Have you had a chance to read our Tuning Guide [1] yet? There's a
>>     lot of good, general guidance there. There are some optimizations
>>     for write performance that depend on how you expect/allow your data
>>     and schema to change:
>>     1) Is your data write-once? Make sure to declare your table with the
>>     IMMUTABLE_ROWS=true property[2]. That will lower the overhead of a
>>     secondary index as it's not necessary to read the data row (to get
>>     the old value) prior to writing it when there are secondary indexes.
>>     2) Does your schema only change in an append-only manner? For
>>     example, are columns only added, but never removed? If so, you can
>>     declare your table as APPEND_ONLY_SCHEMA as described here [2].
>>     3) Does your schema never or rarely change at know times? If so, you
>>     can declare an UPDATE_CACHE_FREQUENCY property as described here [2]
>>     to reduce the RPC traffic.
>>     4) Can you bulk load data [3] and then add or rebuild the index
>>     afterwards?
>>     5) Have you investigated using local indexes [4]? They're optimized
>>     for write speed since they ensure that the index data is on the same
>>     region server as the data (i.e. all writes are local to the region
>>     server, no cross region server calls, but there's some overhead at
>>     read time).
>>     6) Have you considered not using secondary indexes and just letting
>>     your less common queries be slower?
>>
>>     Keep in mind, with secondary indexes, you're essentially writing
>>     your data twice. You'll need to expect that your write performance
>>     will drop. As usual, there's a set of tradeoffs that you need to
>>     understand and choose according to your requirements.
>>
>>     Thanks,
>>     James
>>
>>     [1] https://phoenix.apache.org/tuning_guide.html
>>     <https://phoenix.apache.org/tuning_guide.html>
>>     [2] https://phoenix.apache.org/language/index.html#options
>>     <https://phoenix.apache.org/language/index.html#options>
>>     [3] https://phoenix.apache.org/bulk_dataload.html
>>     <https://phoenix.apache.org/bulk_dataload.html>
>>     [4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes
>>     <https://phoenix.apache.org/secondary_indexing.html#Local_Indexes>
>>
>>     On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <els...@apache.org
>>     <mailto:els...@apache.org>> wrote:
>>
>>         500writes/seconds seems very low to me. On my wimpy laptop, I
>>         can easily see over 10K writes/second depending on the schema.
>>
>>         The first check is to make sure that you have autocommit
>>         disabled. Otherwise, every update you make via JDBC will trigger
>>         an HBase RPC. Batching of RPCs to HBase is key to optimal
>>         performance via Phoenix.
>>
>>         Regarding #2, unless you have intimate knowledge with how
>>         Phoenix writes data to HBase, do not investigate this approach.
>>
>>
>>         On 9/5/17 5:56 AM, Hef wrote:
>>
>>             Hi guys,
>>             I'm evaluating using Phoenix to replace MySQL for better
>>             scalability.
>>             The version I'm evaluating is 4.11-HBase-1.2, with some
>>             dependencies modified to match CDH5.9 which we are using.
>>
>>             The problem I'm having is the write performance to Phoenix
>>             from JDBC is too poor, only 500writes/second, while our
>>             data's throughput is almost 50,000/s. My questions are:
>>             1. If the 500/s TPS is normal speed? How fast can you
>>             achieve in your production?
>>             2. Whether I can write directly into HBase with mutation
>>             API, and read from Phoenix, that could be fast. But I don't
>>             see the secondary index be created automatically in this case.
>>
>>             Regards,
>>             Hef
>>
>>
>>
>>

Re: How to speed up write performance

Reply via email to