Kadir,

The server-side paging changes are in Phoenix 5.1, which is at feature parity with 4.16 for HBase 1.x, not 5.0 (which was released in 2018 and is roughly equivalent to Phoenix 4.14). So if Simon's environment is Phoenix 5.0, it can't be affected by the server-side changes. (Though I see that he's running HBase 2.2, which I thought wasn't supported until Phoenix 5.1?)

Geoffrey

On Tue, Sep 28, 2021 at 4:07 AM Simon Mottram wrote:
> Hi, thanks for the quick response.
>
> I'm going to appear thick now, brace yourself. By client do you mean the
> queryserver running in EMR, which does have an hbase-site.xml, or my Java
> API? As a property, we don't have an hbase-site.xml (unless it's embedded
> in the thin client jar).
>
> Cheers
>
> S
> --
> *From:* Kadir Ozdemir
> *Sent:* 28 September 2021 7:45 PM
> *To:* user
> *Subject:* Re: Urgent problem with Phoenix function LAST_VALUE and joins
>
> In Phoenix 5.0 we introduced a new server-side paging feature to eliminate
> timeouts due to long-running server-side computations such as aggregations
> and joins. I wonder if this new feature caused your issue. If so, the
> workaround is to disable it by setting phoenix.server.paging.enabled to
> false in hbase-site.xml. This is a client-side config param, so you just
> need to restart your client. Hope this will fix your issue.
>
> On Mon, Sep 27, 2021 at 9:23 PM Simon Mottram <
> simon.mott...@cucumber.co.nz> wrote:
>
> Hi
>
> Got my fingers crossed that there's a workaround for this, as this really
> is a big problem for us.
>
> We are using:
>
> Amazon EMR
>
> Release label: emr-6.1.0
> Hadoop distribution: Amazon
> Applications: HBase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0
>
> Thin client version:
> phoenix-5.0.0-HBase-2.0-thin-client.jar
>
> We get the following error when doing a LAST_VALUE aggregation where:
>
> 1. A JOIN is empty
> 2. The column is INTEGER or DATETIME
>
> Remote driver error: IllegalArgumentException: offset (25) + length (4)
> exceed the capacity of the array: 25
>
> The query that breaks is:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> ON DOCID = OBSERVATION_VALUE_ID
> AND TAGNAME = 'TRIAL_ID'
> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
> OFFSET 0;
>
> I can refactor this using EXISTS but get the same error; presumably the
> driver knows to treat them the same:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> EXISTS (
> SELECT
> DOCID
> FROM
> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> WHERE
> DOCID = OBSERVATION_VALUE_ID
> AND TAGNAME = 'TRIAL_ID'
> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
> OFFSET 0;
>
> If we remove the external reference we get no error, regardless of whether
> there are any hits or not.
>
> -- these all work
> There are no hits for this query:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> BIOMATERIAL_TYPE = 'aardvark'
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
> OFFSET 0;
>
> Lots of hits for this query:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10 OFFSET 0;
>
> I've tried weird things like:
>
> Comparing EXISTS to TRUE to try and force it into a normal BOOLEAN value;
> same IllegalArgumentException.
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> (EXISTS (
> SELECT
> DOCID
> FROM
> VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
> WHERE
> DOCID = OBSERVATION_VALUE_ID
> AND TAGNAME = 'TRIAL_ID'
> AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')) = TRUE
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
> OFFSET 0;
>
> And are you prepared for this one, which throws the exact same error:
>
> SELECT
> "BIOMATERIAL_NAME",
> FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
> ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
> FROM
> VARIABLE_VALUES_QA.OBSERVATION
> WHERE
> TRUE
> GROUP BY
> "BIOMATERIAL_NAME"
> LIMIT 10
> OFFSET 0;
>
> Change FIRST_VALUE to AVG and it works fine.
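For completeness, the workaround Kadir describes is a client-side hbase-site.xml entry like the following (only meaningful on releases that actually ship server-side paging, i.e. 4.16/5.1 and later per Geoffrey's correction; on Phoenix 5.0 it has no effect):

```
<!-- Client-side hbase-site.xml: disable Phoenix server-side paging -->
<property>
  <name>phoenix.server.paging.enabled</name>
  <value>false</value>
</property>
```

After adding this, restarting the client (not the region servers) is sufficient, since it is a client-side parameter.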
On behalf of the Apache Phoenix PMC, I'm pleased to announce that Viraj Jasani has accepted the PMC's invitation to become a committer on Apache Phoenix. We appreciate all of the great contributions Viraj has made to the community thus far and we look forward to his continued involvement. Congratulations and welcome, Viraj!
On behalf of the Apache Phoenix PMC, I'm pleased to announce that Daniel Wong has accepted the PMC's invitation to become a committer on Apache Phoenix. We appreciate all of the great contributions Daniel has made to the community thus far and we look forward to his continued involvement. Welcome Daniel! Geoffrey Jacoby
Welcome, Richard, and congratulations!

Geoffrey

On Mon, Jan 4, 2021 at 2:31 PM Ankit Singhal wrote:
> On behalf of the Apache Phoenix PMC, I'm pleased to announce that Richárd
> Antal has accepted the PMC's invitation to become a committer on Apache
> Phoenix.
>
> We appreciate all of the great contributions Richárd has made to the
> community thus far and we look forward to his continued involvement.
>
> Congratulations and welcome, Richárd Antal!
Congrats to Ankit, and thanks to Josh for his service to the Phoenix community!

Geoffrey

On Fri, Apr 17, 2020 at 11:10 AM Andrew Purtell wrote:
> Congratulations on the new role, Ankit.
>
> On Thu, Apr 16, 2020 at 8:14 AM Josh Elser wrote:
>
> > I'm pleased to announce that the ASF board has just approved the
> > transition of VP Phoenix from myself to Ankit. As with all things, this
> > comes with the approval of the Phoenix PMC.
> >
> > The ASF defines the responsibilities of the VP to be largely oversight
> > and secretarial. That is, a VP should be watching to make sure that the
> > project is following all foundation-level obligations and writing the
> > quarterly project reports about Phoenix to summarize the happenings. Of
> > course, a VP can choose to use this title to help drive movement and
> > innovation in the community, as well.
> >
> > With this VP rotation, the PMC has also implicitly agreed to focus on a
> > more regular rotation schedule of the VP role. The current plan is to
> > revisit the VP role in another year.
> >
> > Please join me in congratulating Ankit on this new role and thanking him
> > for volunteering.
> >
> > Thank you all for the opportunity to act as VP these last years.
> >
> > - Josh
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
> - A23, Crosstalk
This is a frequent feature request we unfortunately haven't implemented yet -- see PHOENIX-4286 and PHOENIX-5054, one of which I filed and the other one Josh did. :-) I agree with Josh, I'd love to see an implementation of this if someone has bandwidth.

Geoffrey Jacoby

On Tue, Apr 14, 2020 at 8:01 AM Josh Elser wrote:
> Yeah, I don't have anything handy.
>
> I'll happily review and commit such a utility if you happen to write one
> (even if flawed).
>
> On 4/12/20 1:31 AM, Simon Mottram wrote:
> > Best I can offer is
> >
> > "SELECT * FROM SYSTEM.CATALOG where table_name = '" + tableName + "'
> > and table_schem = '" + schemaName + "'"
> >
> > S
> >
> > *From:* Mich Talebzadeh
> > *Sent:* Sunday, 12 April 2020 1:36 AM
> > *To:* user
> > *Subject:* Reverse engineer a Phoenix table definition
> >
> > Hi,
> >
> > I was wondering if anyone has a handy script to reverse engineer an
> > existing table schema.
> >
> > I guess one can get the info from the SYSTEM.CATALOG table to start
> > with. However, I was wondering if there is a shell script already or I
> > have to write my own.
> >
> > Thanks,
> >
> > Dr Mich Talebzadeh
> >
> > LinkedIn
> > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >
> > http://talebzadehmich.wordpress.com
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for
> > any loss, damage or destruction of data or any other property which may
> > arise from relying on this email's technical content is explicitly
> > disclaimed. The author will in no case be liable for any monetary
> > damages arising from such loss, damage or destruction.
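To sketch what such a utility might look like: assuming the caller has already queried SYSTEM.CATALOG for the table's columns (ordered by ORDINAL_POSITION, with KEY_SEQ marking primary-key columns), DDL can be reassembled from those rows. This is a toy, not the missing tool -- real Phoenix metadata involves integer type codes, salt buckets, column families, and table properties that this deliberately ignores:

```python
# Toy DDL reverse-engineering helper. Input: (column_name, type_name, is_pk)
# tuples already fetched from SYSTEM.CATALOG; type-code mapping, salting,
# and column families are intentionally out of scope.
def build_create_table(schema, table, cols):
    defs = [f'"{name}" {sql_type}' for name, sql_type, _ in cols]
    pk = [f'"{name}"' for name, _, is_pk in cols if is_pk]
    pk_clause = f", CONSTRAINT PK PRIMARY KEY ({', '.join(pk)})" if pk else ""
    return f'CREATE TABLE IF NOT EXISTS {schema}.{table} ({", ".join(defs)}{pk_clause})'
```

A real implementation along the lines of PHOENIX-4286/5054 would also need to round-trip indexes and views, which is the harder part.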
Phoenix 5.1 doesn't actually exist yet, at least not at the Apache level. We haven't released it yet. It's possible that a vendor or user has cut an unofficial release off one of our development branches, but that's not something we can give support on. You should contact your vendor.

Also, since I see you're upgrading from Phoenix 4.14 to 5.1: the 4.x branch of Phoenix is for HBase 1.x systems, and the 5.x branch is for HBase 2.x systems. If you're upgrading from a 4.x to a 5.x, make sure that you also upgrade your HBase. If you're still on HBase 1.x, we recently released Phoenix 4.15, which does have a supported upgrade path from 4.14 (and a very similar set of features to what 5.1 will eventually get).

Geoffrey

On Tue, Jan 14, 2020 at 5:23 AM Prathap Rajendran wrote:
> Hello All,
>
> We are trying to upgrade the Phoenix version from
> "apache-phoenix-4.14.0-cdh5.14.2" to "APACHE_PHOENIX-5.1.0-cdh6.1.0".
>
> I couldn't find any upgrade steps for the same. Please help me out with
> any documents available.
>
> *Note:* I have downloaded the below Phoenix parcel and am trying some DML
> operations. I am getting the following error.
>
> https://github.com/dmilan77/cloudera-phoenix/releases/download/5.1.0-HBase-2.0-cdh6.1.0/APACHE_PHOENIX-5.1.0-cdh6.1.0.p1.0-el7.parcel
>
> *Error:*
> 20/01/13 04:22:41 WARN client.HTable: Error calling coprocessor service
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService
> for row \x00\x00WEB_STAT
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hbase.TableNotFoundException:
> org.apache.hadoop.hbase.TableNotFoundException: SYSTEM.CHILD_LINK
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:860)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:755)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:137)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:326)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:58)
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:192)
>   at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:267)
>   at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:435)
>   at org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:310)
>   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:595)
>   at org.apache.phoenix.coprocessor.ViewFinder.findRelatedViews(ViewFinder.java:94)
>   at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.dropChildViews(MetaDataEndpointImpl.java:2488)
>   at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.createTable(MetaDataEndpointImpl.java:2083)
>   at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17053)
>   at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8218)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2423)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2405)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42010)
>
> Thanks,
> Prathap
Just wanted to add that in the new index architecture recently introduced in Phoenix 4.14.3 and the forthcoming 4.15, the index stays in ACTIVE state even if there's a write failure, and the index will be transparently repaired the next time someone reads from the affected key range. From the client perspective, indexes will always be in sync.

Indexes created using the older index framework will still work, but will need to be upgraded to the new framework with the IndexUpgradeTool in order to benefit from the new behavior.

We'll be updating the docs on the website soon to reflect that; in the meantime you can look at PHOENIX-5156 and PHOENIX-5211 if you'd like more details.

Geoffrey

On Tue, Sep 10, 2019 at 3:02 PM Vincent Poon wrote:
> Normally you're right, this should get retried at the HBase layer and
> would be transparent. However, as part of PHOENIX-4130, we have the HBase
> client only try the write once, so there's no chance to retry. We did that
> to avoid tying up RPC handlers on the server.
> Instead, we retry the entire Phoenix mutation from the client side. The
> index is put into "PENDING_DISABLE", so that if the next write succeeds,
> it can flip back to "ACTIVE".
>
> On Tue, Sep 10, 2019 at 2:29 PM Alexander Batyrshin <0x62...@gmail.com>
> wrote:
>
>> As I know, RegionMovedException is not a problem at all; it's just a
>> notification that we need to update meta information about table regions
>> and retry. Why do we do extra work with changing the state of the index?
>>
>> 2019-09-10 22:35:00,764 WARN
>> [hconnection-0x4a63b6ea-shared--pool10-t961] client.AsyncProcess: #41,
>> table=IDX_TABLE, attempt=1/1 failed=1ops, last exception:
>> org.apache.hadoop.hbase.exceptions.RegionMovedException: Region moved to:
>> hostname=prod023 port=60020 startCode=1568139705179. As of
>> locationSeqNum=93740117. on prod027,60020,1568142287280, tracking
>> started Tue Sep 10 22:35:00 MSK 2019; not retrying 1 - final failure
>> 2019-09-10 22:35:00,789 INFO
>> [RpcServer.default.FPBQ.Fifo.handler=170,queue=10,port=60020]
>> index.PhoenixIndexFailurePolicy: Successfully update
>> INDEX_DISABLE_TIMESTAMP for IDX_TABLE due to an exception while writing
>> updates. indexState=PENDING_DISABLE
>> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:
>> disableIndexOnFailure=true, Failed to write to multiple index tables:
>> [IDX_TABLE]
>>   at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:236)
>>   at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
>>   at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
>>   at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
>>   at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:614)
>>   at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:589)
>>   at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:572)
>>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
>>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
>>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
>>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
>>   at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3677)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3138)
>>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3080)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
>>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2406)
>>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2380)
>>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
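To make Vincent's description concrete: one HBase-level write attempt, PENDING_DISABLE on failure, and a flip back to ACTIVE on a later successful write. This can be sketched as a toy state machine -- illustrative pseudologic only, not Phoenix's actual classes or thresholds:

```python
# Toy sketch of the index-state handling described above (illustrative,
# not Phoenix's implementation).
ACTIVE, PENDING_DISABLE, DISABLED = "ACTIVE", "PENDING_DISABLE", "DISABLED"

class IndexState:
    def __init__(self):
        self.state = ACTIVE
        self.failures = 0

    def on_write(self, succeeded, max_failures=3):
        if succeeded:
            # A subsequent successful write flips a pending index back.
            if self.state == PENDING_DISABLE:
                self.state = ACTIVE
                self.failures = 0
        else:
            # The single HBase-level attempt failed (no server-side retry,
            # per PHOENIX-4130); mark pending and let the client retry the
            # whole Phoenix mutation. Repeated failures eventually disable.
            self.state = PENDING_DISABLE
            self.failures += 1
            if self.failures > max_failures:
                self.state = DISABLED
```

The `max_failures` cutoff is a made-up stand-in for whatever timeout/retry policy the real failure policy applies.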
Alexander,

I can tell you what's happening, but I don't know why. When you do a Put in HBase (which is what Phoenix UPSERTs do underneath), it gets committed to a mutable in-memory buffer called the MemStore. Periodically, the MemStore is flushed to a physical HDFS file called an HFile. The rule for HFiles is that all of the data inside them is sorted by row key. That means that the code to flush the MemStore to an HFile has a built-in sanity check that makes sure that each Cell flushed to an HBase HFile is monotonically increasing. That sanity check is failing here for some reason.

You mention that this happens constantly with increased write load -- that's likely because more write load causes more MemStore flushes.

Since you're using a global index, which stores the index data in a separate table (and hence different regions, each of which has a different MemStore / set of HFiles), and the error's happening to the base table, I'd be surprised if Phoenix indexing is related. What coprocessors (Phoenix and otherwise) are loaded on the table?

Geoffrey

On Thu, Aug 15, 2019 at 12:49 PM Alexander Batyrshin <0x62...@gmail.com> wrote:
>
> > On 15 Aug 2019, at 21:27, Josh Elser wrote:
> >
> > Short answer: it's possible that something around secondary indexing in
> > Phoenix causes this, but it's not possible to say definitively in a
> > vacuum.
>
> As I see it, the region server crashes on the main table's (not the
> index's) MemStore flush. How can I help to provide more information?
> Maybe I should increase the logging level for some specific Java class of
> Phoenix or HBase?
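The sanity check Geoffrey describes boils down to asserting that cells leave the MemStore in non-decreasing row-key order during a flush. A toy illustration of that invariant (not HBase's actual code, which lives in the region flush path and compares full cell keys, not just row keys):

```python
# Toy illustration of the HFile flush invariant described above: cells
# must be appended in sorted row-key order, or the flush aborts.
def check_monotonic(row_keys):
    """row_keys: iterable of row keys (bytes). Raise if order goes backwards."""
    prev = None
    for key in row_keys:
        if prev is not None and key < prev:
            raise ValueError(f"out-of-order cell: {key!r} after {prev!r}")
        prev = key
```

A coprocessor that rewrites or reorders cells in a batch is one way this invariant can be violated from outside HBase itself, which is why Geoffrey asks what coprocessors are loaded.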
Out of curiosity, was the perf acceptable with that many parameters? (assuming, of course, a reasonable query against the row key or an index)

Geoffrey

On Wed, Aug 14, 2019 at 9:24 AM Jon Strayer wrote:
> It turns out that you can have more than 80,000 parameters in a prepared
> statement.
>
> *From:* Jon Strayer
> *Reply-To:* "email@example.com"
> *Date:* Friday, August 9, 2019 at 6:40 PM
> *To:* "firstname.lastname@example.org"
> *Subject:* Maximum sql string length
>
> We've been querying one table and then creating a query against another
> table using a prepared statement with a very large number of parameters
> in an IN clause. A code review brought up a very good question: is there
> a limit to the size of the IN clause? Or is there a limit to the size of
> the SQL used to create a prepared statement?
>
> I assume there is some limit, but is it "well, how much memory do you
> have?" or "we have a 64k buffer"?
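Mechanically, a huge IN clause is just one `?` placeholder per value in the prepared statement's SQL string, which is why the limit is effectively "how much memory do you have" rather than a fixed buffer. A small sketch of how such a statement gets built (table and column names here are made up; the string would then be passed to a JDBC `prepareStatement` or equivalent, binding each value by position):

```python
# Sketch: build a parameterized IN-clause query with one '?' per value.
# Table/column names are hypothetical; execution via JDBC/phoenixdb is
# left to the caller.
def in_clause_query(table, column, values):
    placeholders = ", ".join("?" for _ in values)
    return f"SELECT * FROM {table} WHERE {column} IN ({placeholders})"
```

Per Jon's finding, 80,000 placeholders parse fine; whether scanning with that predicate performs well is the separate question Geoffrey raises.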
Monil,

Could you please post the results of an EXPLAIN plan of your query? For directions on how to do this, please see http://phoenix.apache.org/explainplan.html

Geoffrey Jacoby

On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi wrote:
> Hello,
> Currently we have hourly data in our Phoenix table. However, the schema
> was designed to perform well for daily data. Increasing the number of
> rows by 24x has led to degradation of our service over time.
> Our current schema is as follows:
>
> CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL,
> cid BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid,
> day, cid)) COMPRESSION='SNAPPY'
>
> The query we run is something along the lines of:
>
> SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
> WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd',
> 'GMT') AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT') GROUP BY sid, cid
>
> Based on our investigation, we have concluded that the main reason is
> purely the number of rows that are being read. I am open to other
> suggestions.
>
> If the number of rows is the cause, I am wondering if there is a way
> either
> 1. to roll hourly data up to daily using views, a secondary index, or
> MapReduce. I know MapReduce is possible.
> 2. to migrate to a newer schema where cid is not part of the PK and is
> actually a column family. I was unable to find any kind of documentation
> on this.
>
> Thanks
> Monil
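On Monil's option 1: short of MapReduce, one common approach is a periodic UPSERT SELECT into a pre-aggregated daily table. A sketch of the statement such a job would run (the T1_DAILY target table and its column layout are assumptions; TRUNC is Phoenix's date-truncation function):

```python
# Sketch: generate a Phoenix UPSERT SELECT that rolls hourly rows in T1
# up into an assumed daily table T1_DAILY with the same key shape.
def daily_rollup_sql(src="T1", dst="T1_DAILY"):
    return (
        f"UPSERT INTO {dst} (sid, day, cid, count) "
        f"SELECT sid, TRUNC(day, 'DAY'), cid, CAST(SUM(count) AS INTEGER) "
        f"FROM {src} GROUP BY sid, TRUNC(day, 'DAY'), cid"
    )
```

Scheduled hourly or nightly, this keeps the daily table roughly 24x smaller, so the aggregate query scans far fewer rows; the trade-off is the extra write load and a window of staleness between rollups.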
I'm interested in the new MapReduce features coming in Phoenix 4.3. I was wondering: is there a way to make a MapReduce job where PhoenixInputFormat reads from one HBase cluster, and PhoenixOutputFormat writes to a different cluster? Thanks, Geoffrey Jacoby