I observe that there are some extra mutations in the batch for every one of my UPSERTs. For example, if the app calls executeUpdate() only 5 times, then on commit there will be "DEBUG MutationState:1046 - Sent batch of 10". I can’t figure out where these extra mutations come from, or why.
This means that the “useful” batch size is phoenix.mutate.batchSize / 2.

> * What does your table DDL look like?

CREATE TABLE IF NOT EXISTS TABLE_CODES (
    "id" VARCHAR NOT NULL PRIMARY KEY,
    "d"."tg" VARCHAR,
    "d"."drip" VARCHAR,
    "d"."s" UNSIGNED_TINYINT,
    "d"."se" UNSIGNED_TINYINT,
    "d"."rle" UNSIGNED_TINYINT,
    "d"."dme" TIMESTAMP,
    "d"."dpa" TIMESTAMP,
    "d"."p" VARCHAR,
    "d"."pt" UNSIGNED_TINYINT,
    "d"."x" VARCHAR,
    "d"."pn" VARCHAR,
    "d"."b" VARCHAR,
    "d"."hc" VARCHAR ARRAY,
    "d"."ns" VARCHAR(16),
    "d"."tv" VARCHAR(10),
    "d"."vcp" VARCHAR,
    "d"."et" UNSIGNED_TINYINT,
    "d"."xoa" BINARY(16),
    "d"."j" VARCHAR
) SALT_BUCKETS=30, COLUMN_ENCODED_BYTES=NONE;

CREATE INDEX "IDX_CIS_O" ON "TABLE_CODES" ("d"."x", "d"."dme")
    INCLUDE("d"."tg", "d"."rle", "d"."pt" ... ) SALT_BUCKETS=30;

CREATE INDEX "IDX_CIS_PRID" ON "TABLE_CODES" ("d"."drip", "d"."dme")
    INCLUDE("d"."tg", "d"."rle", "d"."pt" ...) SALT_BUCKETS=30;

In my case, with SALT_BUCKETS=30, every batch with default settings will carry only 50 “useful” rows, and they will be split across 30 servers, so every server will get only 1-2 rows.

> * How large is one mutation you're writing (in bytes)?

Any idea how to calculate it? https://phoenix.apache.org/metrics.html will give me the total mutation count and the total size of the batch in bytes. But, as I mentioned before, there are “extra” mutations that will distort the statistics.

> * How much data ends up being sent to a RegionServer in one RPC?

Where can I get this metric?

> On 3 Sep 2019, at 17:19, Josh Elser <els...@apache.org> wrote:
>
> Hey Alexander,
>
> Was just poking at the code for this: it looks like this is really just
> determining the number of mutations that get "processed together" (as opposed
> to a hard limit).
>
> Since you have done some work, I'm curious if you could generate some data to
> help back up your suggestion:
>
> * What does your table DDL look like?
> * How large is one mutation you're writing (in bytes)?
> * How much data ends up being sent to a RegionServer in one RPC?
>
> You're right in that we would want to make sure that we're sending an
> adequate amount of data to a RegionServer in an RPC, but this is tricky to
> balance for all cases (thus, setting a smaller value to avoid sending batches
> that are too large is safer).
>
> On 9/3/19 8:03 AM, Alexander Batyrshin wrote:
>> Hello all,
>> 1) There is a bug in the documentation - http://phoenix.apache.org/tuning.html
>> phoenix.mutate.batchSize is not 1000, but only 100 by default
>> https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java#L164
>> Changed for https://issues.apache.org/jira/browse/PHOENIX-541
>> 2) I want to discuss this default value. From PHOENIX-541
>> <https://issues.apache.org/jira/browse/PHOENIX-541> I read about an issue with
>> MR and wide rows (2MB per row), and it looks like a rare case. But in most
>> common cases we can get much better write performance with batchSize = 1000,
>> especially if it is used with a SALT table
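
To put numbers on the batch-size arithmetic discussed in this thread, here is a minimal sketch. It assumes two mutations per UPSERT, which is only what the log line in the reply suggests (5 executeUpdate() calls, "Sent batch of 10") and is not taken from the Phoenix code. It also assumes rows spread evenly across the 30 salt buckets. The class and method names (BatchMath, usefulRows, rowsPerBucket) are hypothetical helpers for illustration, not part of any Phoenix API.

```java
public final class BatchMath {
    private BatchMath() {}

    /** Rows carried by one batch when each UPSERT shows up as several mutations. */
    public static int usefulRows(int batchSize, int mutationsPerRow) {
        return batchSize / mutationsPerRow;
    }

    /** Average rows per salt bucket, assuming an even spread across buckets. */
    public static double rowsPerBucket(int usefulRows, int saltBuckets) {
        return (double) usefulRows / saltBuckets;
    }

    public static void main(String[] args) {
        int saltBuckets = 30;     // SALT_BUCKETS from the DDL in this thread
        int mutationsPerRow = 2;  // assumption: inferred from "Sent batch of 10" for 5 UPSERTs

        // Compare the current default (100) with the proposed value (1000).
        for (int batchSize : new int[] {100, 1000}) {
            int rows = usefulRows(batchSize, mutationsPerRow);
            System.out.printf("batchSize=%d: %d rows per batch, about %.1f rows per bucket%n",
                    batchSize, rows, rowsPerBucket(rows, saltBuckets));
        }
    }
}
```

Under these assumptions, the default of 100 yields 50 rows per batch, which matches the "1-2 rows per server" estimate above, while 1000 yields 500 rows per batch.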