Hef -- do your split points actually correspond to the distribution of values in your `id` column? You can tell this pretty easily by looking at the number of requests per region for your data table on the HBase UI.

And yes, PQS will not increase the performance (as it is adding "more work" to accomplish the same thing the thick driver accomplishes itself).

5K/s updates with two indexes seems OK to me for a laptop/VM. Maybe you need to increase the number of handlers you give HBase? What are the hardware characteristics of the system that you're running HBase on?
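The handler count is set in hbase-site.xml. A sketch (the property name is HBase's standard one; the value here is illustrative -- the default is 30):

```xml
<property>
  <!-- RPC handler threads per RegionServer; raise if handlers are saturated under write load -->
  <name>hbase.regionserver.handler.count</name>
  <value>60</value>
</property>
```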

On 9/8/17 2:57 AM, Hef wrote:
Hi James,
I have read over the Tuning Guide and tried some of your suggestions: #3, #5, #6. Since the data is mutable and read/written frequently, I did not try #1, #2, #4.
The schema is simple as such:

create table if not exists test_data (
  id VARCHAR(32),
  sid VARCHAR(32),
  uid UNSIGNED_LONG,
  xid UNSIGNED_INT,
  ts UNSIGNED_LONG,
  CONSTRAINT id primary key(id)
) SPLIT ON ('0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z');


The indexes are created with:

create LOCAL index data_ts on test_data(ts);
create LOCAL index data_sid_ts on test_data(sid, ts);

On my 10-RegionServer cluster, write performance rose to 1k/s with the tuning above. I also tried multi-threaded writes: with 20 threads writing concurrently, performance rose to 5k/s, but it wouldn't go any higher no matter how many threads I added.

I didn't try writing through the thin client to the Query Server; I guess there won't be any boost.



On Wed, Sep 6, 2017 at 3:21 PM, James Taylor <jamestay...@apache.org> wrote:

    Hi Hef,
    Have you had a chance to read our Tuning Guide [1] yet? There's a
    lot of good, general guidance there. There are some optimizations
    for write performance that depend on how you expect/allow your data
    and schema to change:
    1) Is your data write-once? Make sure to declare your table with the
    IMMUTABLE_ROWS=true property[2]. That will lower the overhead of a
    secondary index as it's not necessary to read the data row (to get
    the old value) prior to writing it when there are secondary indexes.
    2) Does your schema only change in an append-only manner? For
    example, are columns only added, but never removed? If so, you can
    declare your table as APPEND_ONLY_SCHEMA as described here [2].
    3) Does your schema never or rarely change, at known times? If so, you
    can declare an UPDATE_CACHE_FREQUENCY property as described here [2]
    to reduce the RPC traffic.
    4) Can you bulk load data [3] and then add or rebuild the index
    afterwards?
    5) Have you investigated using local indexes [4]? They're optimized
    for write speed since they ensure that the index data is on the same
    region server as the data (i.e. all writes are local to the region
    server, no cross region server calls, but there's some overhead at
    read time).
    6) Have you considered not using secondary indexes and just letting
    your less common queries be slower?
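    The write-path options in #1-#3 are table properties declared in the
    DDL. A sketch (a hypothetical write-once "events" table; the property
    names are from the options page [2], the values are illustrative):

    ```sql
    -- Illustrative only: these options fit the write-once / append-only
    -- cases above, not a frequently-updated table.
    CREATE TABLE IF NOT EXISTS events (
      id VARCHAR(32) NOT NULL,
      ts UNSIGNED_LONG,
      CONSTRAINT pk PRIMARY KEY (id)
    )
    IMMUTABLE_ROWS = true,           -- #1: index writes skip reading the old row
    APPEND_ONLY_SCHEMA = true,       -- #2: columns are only ever added, never removed
    UPDATE_CACHE_FREQUENCY = 300000; -- #3: re-check table metadata at most every 300s (in ms)
    ```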

    Keep in mind, with secondary indexes, you're essentially writing
    your data twice. You'll need to expect that your write performance
    will drop. As usual, there's a set of tradeoffs that you need to
    understand and choose according to your requirements.

    Thanks,
    James

    [1] https://phoenix.apache.org/tuning_guide.html
    [2] https://phoenix.apache.org/language/index.html#options
    [3] https://phoenix.apache.org/bulk_dataload.html
    [4] https://phoenix.apache.org/secondary_indexing.html#Local_Indexes

    On Tue, Sep 5, 2017 at 11:48 AM, Josh Elser <els...@apache.org> wrote:

        500 writes/second seems very low to me. On my wimpy laptop, I
        can easily see over 10K writes/second depending on the schema.

        The first check is to make sure that you have autocommit
        disabled. Otherwise, every update you make via JDBC will trigger
        an HBase RPC. Batching of RPCs to HBase is key to optimal
        performance via Phoenix.
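        As a sketch of the pattern (the batch size is arbitrary; through
        raw JDBC the equivalent calls are connection.setAutoCommit(false)
        and connection.commit()):

        ```sql
        -- In sqlline: buffer upserts client-side, then flush as one batch
        !autocommit off
        UPSERT INTO test_data (id, sid, uid, xid, ts)
          VALUES ('a1', 's1', 1, 1, 1504550000);
        -- ... repeat for on the order of 1000 rows, then:
        !commit
        ```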

        Regarding #2, unless you have intimate knowledge of how
        Phoenix writes data to HBase, do not investigate this approach.


        On 9/5/17 5:56 AM, Hef wrote:

            Hi guys,
            I'm evaluating using Phoenix to replace MySQL for better
            scalability.
            The version I'm evaluating is 4.11-HBase-1.2, with some
            dependencies modified to match CDH5.9 which we are using.

            The problem I'm having is that write performance to Phoenix
            over JDBC is too poor, only 500 writes/second, while our
            data's throughput is almost 50,000/s. My questions are:
            1. Is 500/s a normal TPS? How fast can you achieve it in
            your production?
            2. Can I write directly into HBase with the mutation API
            and read from Phoenix? That could be fast, but I don't see
            the secondary indexes being created automatically in this
            case.

            Regards,
            Hef


