I suspect that both rows exist and that they are different,  but you can’t see 
the difference due to
some non-printing character. Did you try doing the query using JDBC or a direct 
HBase scan?

Brian

> On Oct 26, 2016, at 6:20 AM, Craig Roberts <craig.robe...@frogasia.com> wrote:
> 
> Hi all,
> 
> I have a strange issue with Phoenix/HBase (Phoenix 4.4 and HBase 1.1 in HDP 
> 2.4).
> 
> This is my table DDL:
> 
> CREATE TABLE IF NOT EXISTS ANALYTICS(
>   "school_code" VARCHAR NOT NULL,
>   "code" VARCHAR NOT NULL,
>   "iso_8601" DATE NOT NULL,
>   "uuid" VARCHAR NOT NULL,
>   -- Used to ensure the "events" column family exists
>   "events"."___" VARCHAR,
>   CONSTRAINT id PRIMARY KEY ("school_code", "code", "iso_8601", "uuid")
> )
> 
> I then load events into the table using a custom Flume sink.
> 
> 
> 0: jdbc:phoenix:10.124.12.191> SELECT COUNT(*) FROM ANALYTICS;
> +------------------------------------------+
> |                 COUNT(1)                 |
> +------------------------------------------+
> | 1248515                                  |
> +------------------------------------------+
> 1 row selected (2.415 seconds)
> 0: jdbc:phoenix:10.124.12.191> SELECT COUNT(DISTINCT("uuid")) FROM ANALYTICS;
> +------------------------------------------+
> |          DISTINCT_COUNT("uuid")          |
> +------------------------------------------+
> | 773373                                   |
> +------------------------------------------+
> 1 row selected (5.751 seconds)
> 
> Now, this is only showing up with lots of events, and I have almost 1 million 
> events.
> 
> We've managed to pin down an errant event UUID. The primary key is meant to 
> ensure that repeated events (Flume's at-least-once guarantees) are 
> overwritten, rather than duplicated (due to UPSERT). Here's some example 
> queries for the errant UUID:
> 
> 0: jdbc:phoenix:10.124.12.191> SELECT * FROM ANALYTICS WHERE "uuid" = 
> '00011783-c870-463c-9c01-4e530a57714f';
> +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+
> |               school_code                |                   code           
>         |        iso_8601         |                   uuid                   
> |                   
> +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+
> | WRA0004                                  | site.view                        
>         | 2016-03-21 16:00:00.000 | 00011783-c870-463c-9c01-4e530a57714f     
> |                                          
> | WRA0004                                  | site.view                        
>         | 2016-03-21 16:00:00.000 | 00011783-c870-463c-9c01-4e530a57714f     
> |                                          
> +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+
> 
> 
> Which shows two events with what should be a duplicate primary key.
> 
> If I pin it down by including all the primary key fields, however, I get one 
> result:
> 
> 0: jdbc:phoenix:10.124.12.191> SELECT * FROM ANALYTICS WHERE "school_code" = 
> 'WRA0004' AND "code" = 'site.view' AND "iso_8601" = TO_DATE('2016-03-21 
> 16:00:00') AND "uuid" = '00011783-c870-463c-9c01-4e530a57714f';
> +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+
> |               school_code                |                   code           
>         |        iso_8601         |                   uuid                   |
> +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+
> | WRA0004                                  | site.view                        
>         | 2016-03-21 16:00:00.000 | 00011783-c870-463c-9c01-4e530a57714f     |
> +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+
> 
> Finally, hbase shell is showing me the correct number of records:
> 
> hbase(main):001:0> count 'ANALYTICS'
> # lots of output
> 773376 row(s) in 142.3770 seconds
> 
> Now, I'll admit, I would have expected this to be "impossible".
> 
> Does anybody have any idea how Phoenix can show me two primary keys with a 
> generic query, but only one for a specific query? And how I might fix this?
> 
> Thanks in advance,
> Craig Roberts
> Senior Developer
> 
> FrogAsia Sdn Bhd (A YTL Company) | Unit 9, Level 2, D6 at Sentul East | 801, 
> Jalan Sentul, 51000 Kuala Lumpur | 01125618093 | Twitter 
> <http://www.twitter.com/FrogAsia> | Facebook 
> <http://www.facebook.com/FrogAsia> | Website <http://www.frogasia.com/>
> This message (including any attachments) is for the use of the addressee 
> only. It may contain private proprietary or legally privileged statements and 
> information. No confidentiality or privilege is waived or lost by any 
> mistransmission. If you are not the intended recipient, please immediately 
> delete it and all copies of it from your system, destroy any hard copies of 
> it and notify the sender. You must not, directly or indirectly, use, 
> disclose, distribute, print, copy or rely on any part of the message if you 
> are not the intended recipient. Any views expressed in this message 
> (including any attachments) are those of the individual sender and not those 
> of any member of the YTL Group, except where the message states otherwise and 
> the sender is authorized to state them to be the views of any such entity.
> 

Reply via email to