Shawn,

The UPSERT SELECT will run in a coprocessor only if it has no LIMIT clause, queries a single table, performs no aggregation, uses no sequences, and auto commit is on. Please check your SQL ... and also check whether some resources have not been released.
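Those conditions can be summarized as a single conjunction. The sketch below is a toy paraphrase in Python, not Phoenix's actual planner code; the function name and parameters are invented for illustration:

```python
# Hypothetical paraphrase of the conditions above -- NOT Phoenix's
# actual planner code. An UPSERT SELECT is pushed to a server-side
# coprocessor only when every one of these holds at once.
def runs_server_side(has_limit: bool,
                     single_table: bool,
                     has_aggregation: bool,
                     uses_sequences: bool,
                     auto_commit: bool) -> bool:
    return (not has_limit
            and single_table
            and not has_aggregation
            and not uses_sequences
            and auto_commit)

# The statement without LIMIT in this thread satisfies all conditions ...
print(runs_server_side(False, True, False, False, True))  # True
# ... while adding a LIMIT clause alone forces the client-side path.
print(runs_server_side(True, True, False, False, True))   # False
```

Under this reading, the LIMIT clause by itself is enough to knock the query off the server-side path, which matches the slowdown reported later in the thread.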
----------------------------------------
Jaanai Zhang
Best regards!


Shawn Li <shawnli...@gmail.com> wrote on Thu, Dec 13, 2018 at 12:10 PM:

> Hi Jaanai,
>
> Thanks for sharing your thoughts. The behavior you describe is correct on
> the HBase region server side: memory usage for the block cache and the
> memstore will be high under such high throughput. But our Phoenix client
> runs on a gateway machine (no HBase region server or any other HBase
> service on it), so I am not sure how to explain such high memory usage for
> an UPSERT SELECT without a LIMIT clause. The memory profile looks as if
> all SELECT results were sent to the client machine, cached in its memory,
> and then inserted back into the target table. That is not the behavior
> that should happen: since the table schemas are exactly the same, all of
> this should be done on the server side. By the way, this happens on both
> Phoenix 4.7 and Phoenix 4.14.
>
> Thanks,
> Shawn
>
> On Wed, Dec 12, 2018 at 10:26 PM Jaanai Zhang <cloud.pos...@gmail.com> wrote:
>
>> Shawn,
>>
>> For the upsert without LIMIT, the source table is read and the target
>> table is written on the server side. I think the higher memory usage is
>> caused by the scan cache and the memstore under the higher throughput.
>>
>> ----------------------------------------
>> Jaanai Zhang
>> Best regards!
>>
>>
>> Shawn Li <shawnli...@gmail.com> wrote on Thu, Dec 13, 2018 at 10:13 AM:
>>
>>> Hi Vincent,
>>>
>>> So you are saying that with LIMIT the results are sent to the client
>>> side and then written back to the server; that might explain why the
>>> upsert with LIMIT is slower than the one without it. But it doesn't
>>> seem to explain the memory usage: memory usage on the client machine
>>> is 8 GB without LIMIT vs 2 GB with LIMIT, and for a big table the
>>> upsert without LIMIT can even reach 20 GB.
>>>
>>> Thanks,
>>> Shawn
>>>
>>> On Wed, Dec 12, 2018 at 6:34 PM Vincent Poon <vincentp...@apache.org> wrote:
>>>
>>>> I think it's done client-side if you have LIMIT. If you have e.g.
>>>> LIMIT 1000, it would be incorrect for each regionserver to upsert
>>>> 1000 rows when you have more than one regionserver. So instead the
>>>> results are sent back to the client, where the LIMIT is applied, and
>>>> then written back to the server in the UPSERT.
>>>>
>>>> On Wed, Dec 12, 2018 at 1:18 PM Shawn Li <shawnli...@gmail.com> wrote:
>>>>
>>>>> Hi Vincent,
>>>>>
>>>>> The table creation statement is similar to the one below. We have
>>>>> about 200 fields. The table is mutable and doesn't have any index
>>>>> on it.
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS us_population (
>>>>>     state CHAR(2) NOT NULL,
>>>>>     city VARCHAR,
>>>>>     population BIGINT,
>>>>>     ...
>>>>>     CONSTRAINT my_pk PRIMARY KEY (state));
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Shawn
>>>>>
>>>>> On Wed, Dec 12, 2018, 13:42 Vincent Poon <vincentp...@apache.org> wrote:
>>>>>
>>>>>> For #2, can you provide the table definition and the statement
>>>>>> used? E.g. is the table immutable, or does it have indexes?
>>>>>>
>>>>>> On Tue, Dec 11, 2018 at 6:08 PM Shawn/Xiang Li <shawnli...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> 1. I want to check what happens under the hood for the LIMIT
>>>>>>> clause in the following UPSERT statement (does it involve any
>>>>>>> coprocessor working behind the scenes?):
>>>>>>>
>>>>>>>     upsert into table2 select * from table1 limit 3000000;
>>>>>>>
>>>>>>> (table1 and table2 have the same schema)
>>>>>>>
>>>>>>> The above statement runs a lot slower than the version without
>>>>>>> the LIMIT clause shown below, even though it upserts less data:
>>>>>>>
>>>>>>>     upsert into table2 select * from table1;
>>>>>>>
>>>>>>> 2. We also observe that memory usage is pretty high without the
>>>>>>> LIMIT clause (8 GB vs 2 GB); for a big table it can sometimes
>>>>>>> reach 20 GB. According to the Phoenix website's description of
>>>>>>> UPSERT SELECT: "If auto commit is on, and both a) the target
>>>>>>> table matches the source table, and b) the select performs no
>>>>>>> aggregation, then the population of the target table will be
>>>>>>> done completely on the server-side (with constraint violations
>>>>>>> logged, but otherwise ignored)."
>>>>>>>
>>>>>>> My question is: if everything is done on the server side, how
>>>>>>> come we have such high memory usage on the client machine?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Shawn
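Vincent's argument for why a LIMIT cannot simply be pushed down to each regionserver can be sketched with a small simulation. This is plain Python with invented function names, nothing Phoenix-specific; each inner list stands in for the rows one regionserver would return:

```python
def per_regionserver_limit(shards, n):
    # WRONG approach: each "regionserver" applies the LIMIT independently,
    # so the union can contain up to len(shards) * n rows.
    out = []
    for shard in shards:
        out.extend(shard[:n])
    return out

def client_side_limit(shards, n):
    # What the client-side path does instead: merge all results first,
    # then apply the LIMIT exactly once.
    merged = [row for shard in shards for row in shard]
    return merged[:n]

# Two "regionservers" holding 1000 rows each, LIMIT 1000:
shards = [list(range(1000)), list(range(1000, 2000))]
print(len(per_regionserver_limit(shards, 1000)))  # 2000 -- too many rows
print(len(client_side_limit(shards, 1000)))       # 1000 -- correct
```

This is consistent with the observed slowdown: the LIMIT path pulls every candidate row across the network to the client before the UPSERT writes them back, instead of short-circuiting inside the coprocessor.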