Shawn,

Your query *upsert into table2 select * from table1;* would not be run on the server - the source and target tables are different. It would have to be something like: *upsert into table1 select * from table1;*

If you want to run a server-side upsert select on a target table that is different from the source table, you need to set "phoenix.client.enable.server.upsert.select" to true on your client. There are some other restrictions: the table can't have any global indexes, and the statement can't have a join or a subquery in the WHERE clause. We need to update the documentation with this information. The reason for all these hurdles is that server-side upsert select across different tables is generally not recommended, because it means cross-regionserver RPCs (e.g. reading data from a region of the source table and writing to a region of the target table on a different regionserver), potentially tying up handlers in an unpredictable way.
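For example, the property can be passed as a JDBC connection property on the client. A rough, untested sketch - the ZooKeeper quorum in the URL is a placeholder, and the table names are the ones from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    public class ServerSideUpsert {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            // Opt in to server-side UPSERT SELECT when source and target tables differ
            props.setProperty("phoenix.client.enable.server.upsert.select", "true");
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost", props)) {
                conn.setAutoCommit(true); // auto commit must be on for server-side execution
                try (Statement stmt = conn.createStatement()) {
                    stmt.executeUpdate("UPSERT INTO table2 SELECT * FROM table1");
                }
            }
        }
    }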
On Sun, Dec 16, 2018 at 7:12 PM Shawn Li <shawnli...@gmail.com> wrote:

> Hi Jaanai,
>
> According to the Phoenix website, "If auto commit is on, and both a) the
> target table matches the source table, and b) the select performs no
> aggregation, then the population of the target table will be done
> completely on the server-side (with constraint violations logged, but
> otherwise ignored). Otherwise, data is buffered on the client and, if
> auto commit is on, committed in row batches as specified by the
> UpsertBatchSize connection property (or the phoenix.mutate.upsertBatchSize
> HBase config property which defaults to 10000 rows)."
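> For reference, a sketch of how that UpsertBatchSize connection property
> can be set (untested; the batch size and URL are illustrative only):
>
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.Statement;
>     import java.util.Properties;
>
>     public class UpsertBatchSize {
>         public static void main(String[] args) throws Exception {
>             Properties props = new Properties();
>             // Rows per commit batch when the upsert is buffered on the client
>             props.setProperty("UpsertBatchSize", "1000");
>             try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost", props);
>                  Statement stmt = conn.createStatement()) {
>                 conn.setAutoCommit(true);
>                 stmt.executeUpdate("UPSERT INTO table2 SELECT * FROM table1");
>             }
>         }
>     }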
> And our sql statement is just: *upsert into table2 select * from table1;*
> which should match the first case, so all operations should happen on the
> server side. But the memory usage on the client machine is higher than
> with an "upsert select limit" clause. The memory usage was checked by
> running the 'top' command under Linux, so we are sure it is caused by the
> "upsert select" in Phoenix and nothing else, and we can't explain why
> there is such high memory usage on the client/gateway machine when all
> operations are supposed to happen on the server side.
>
> Thanks,
> Shawn
>
> On Thu, Dec 13, 2018 at 3:15 AM Jaanai Zhang <cloud.pos...@gmail.com>
> wrote:
>
>> Shawn,
>>
>> The UPSERT SELECT will run in a coprocessor only if it has no limit
>> clause, queries only one table, performs no aggregation, uses no
>> sequences, and auto commit is on. Please check your SQL ... and you can
>> also check whether some resources have not been released.
>>
>> ----------------------------------------
>> Jaanai Zhang
>> Best regards!
>>
>> Shawn Li <shawnli...@gmail.com> wrote on Thu, Dec 13, 2018 at 12:10 PM:
>>
>>> Hi Jaanai,
>>>
>>> Thanks for your thoughts. The behavior you describe is correct on the
>>> HBase region server side: the memory usage for the block cache and
>>> memstore will be high under such high throughput. But our Phoenix
>>> client is on a gateway machine (no HBase region server or any other
>>> HBase service sits on it), so we are not sure how to explain such high
>>> memory usage for an upsert select without a "limit" clause. The high
>>> memory usage looks as if all select results were sent to the client
>>> machine, cached in the client machine's memory, and then inserted back
>>> into the target table, which is not the behavior that should happen:
>>> all of this should be done on the server side, as the table schemas
>>> are exactly the same. By the way, this happens on both Phoenix 4.7 and
>>> Phoenix 4.14.
>>>
>>> Thanks,
>>> Shawn
>>>
>>> On Wed, Dec 12, 2018 at 10:26 PM Jaanai Zhang <cloud.pos...@gmail.com>
>>> wrote:
>>>
>>>> Shawn,
>>>>
>>>> The upsert without a limit will read the source table and write the
>>>> target table on the server side. I think the higher memory usage is
>>>> caused by the scan cache and memstore under the higher throughput.
>>>>
>>>> ----------------------------------------
>>>> Jaanai Zhang
>>>> Best regards!
>>>>
>>>> Shawn Li <shawnli...@gmail.com> wrote on Thu, Dec 13, 2018 at 10:13 AM:
>>>>
>>>>> Hi Vincent,
>>>>>
>>>>> So you are describing that with a limit the results are sent to the
>>>>> client side and then written back to the server; this might explain
>>>>> why an upsert with a limit is slower than one without. But that
>>>>> doesn't seem to explain the memory usage: the memory usage on the
>>>>> client machine is 8 GB (without "limit") vs 2 GB (with limit), and
>>>>> an upsert without "limit" can even reach 20 GB for a big table.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>>>>> On Wed, Dec 12, 2018 at 6:34 PM Vincent Poon <vincentp...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I think it's done client-side if you have LIMIT. If you have e.g.
>>>>>> LIMIT 1000, it would be incorrect for each regionserver to upsert
>>>>>> 1000 rows if you have more than one regionserver. So instead the
>>>>>> results are sent back to the client, where the LIMIT is applied,
>>>>>> and then written back to the server in the UPSERT.
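>>>>>> A quick way to check where a given statement will run is to look at
>>>>>> its plan. A sketch, untested - it assumes your Phoenix version
>>>>>> supports EXPLAIN on an UPSERT SELECT, and the URL is a placeholder:
>>>>>>
>>>>>>     import java.sql.Connection;
>>>>>>     import java.sql.DriverManager;
>>>>>>     import java.sql.ResultSet;
>>>>>>     import java.sql.Statement;
>>>>>>
>>>>>>     public class ExplainUpsert {
>>>>>>         public static void main(String[] args) throws Exception {
>>>>>>             try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zkhost");
>>>>>>                  Statement stmt = conn.createStatement();
>>>>>>                  // EXPLAIN returns the plan as rows of text
>>>>>>                  ResultSet rs = stmt.executeQuery(
>>>>>>                          "EXPLAIN UPSERT INTO table2 SELECT * FROM table1 LIMIT 3000000")) {
>>>>>>                 while (rs.next()) {
>>>>>>                     System.out.println(rs.getString(1));
>>>>>>                 }
>>>>>>             }
>>>>>>         }
>>>>>>     }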
>>>>>> On Wed, Dec 12, 2018 at 1:18 PM Shawn Li <shawnli...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Vincent,
>>>>>>>
>>>>>>> The table creation statement is similar to the one below. We have
>>>>>>> about 200 fields. The table is mutable and doesn't have any
>>>>>>> indexes on it.
>>>>>>>
>>>>>>> CREATE TABLE IF NOT EXISTS us_population (
>>>>>>>     state CHAR(2) NOT NULL,
>>>>>>>     city VARCHAR,
>>>>>>>     population BIGINT,
>>>>>>>     …
>>>>>>>     CONSTRAINT my_pk PRIMARY KEY (state));
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Shawn
>>>>>>>
>>>>>>> On Wed, Dec 12, 2018, 13:42 Vincent Poon <vincentp...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For #2, can you provide the table definition and the statement
>>>>>>>> used? e.g. Is the table immutable, or does it have indexes?
>>>>>>>>
>>>>>>>> On Tue, Dec 11, 2018 at 6:08 PM Shawn/Xiang Li <
>>>>>>>> shawnli...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> 1. I want to check what actually runs for the "limit" clause
>>>>>>>>> used in the following upsert statement (does it involve any
>>>>>>>>> coprocessor behind the scenes?):
>>>>>>>>>
>>>>>>>>> *upsert into table2 select * from table1 limit 3000000;*
>>>>>>>>> (table1 and table2 have the same schema)
>>>>>>>>>
>>>>>>>>> The above statement runs a lot slower than the statement without
>>>>>>>>> the "limit" clause shown below, even though it upserts less
>>>>>>>>> data:
>>>>>>>>>
>>>>>>>>> *upsert into table2 select * from table1;*
>>>>>>>>>
>>>>>>>>> 2. We also observe that memory usage is pretty high without the
>>>>>>>>> limit clause (8 GB vs 2 GB); for a big table it can sometimes
>>>>>>>>> reach 20 GB. According to the Phoenix website description for
>>>>>>>>> upsert select, "If auto commit is on, and both a) the target
>>>>>>>>> table matches the source table, and b) the select performs no
>>>>>>>>> aggregation, then the population of the target table will be
>>>>>>>>> done completely on the server-side (with constraint violations
>>>>>>>>> logged, but otherwise ignored)."
>>>>>>>>>
>>>>>>>>> My question is: if everything is done on the server side, how
>>>>>>>>> come we have such high memory usage on the client machine?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Shawn