I suspect that both rows exist and that they are different, but you can’t see the difference due to some non-printing character. Did you try doing the query using JDBC or a direct HBase scan?
Brian > On Oct 26, 2016, at 6:20 AM, Craig Roberts <craig.robe...@frogasia.com> wrote: > > Hi all, > > I have a strange issue with Phoenix/HBase (Phoenix 4.4 and HBase 1.1 in HDP > 2.4). > > This is my table DDL: > > CREATE TABLE IF NOT EXISTS ANALYTICS( > "school_code" VARCHAR NOT NULL, > "code" VARCHAR NOT NULL, > "iso_8601" DATE NOT NULL, > "uuid" VARCHAR NOT NULL, > -- Used to ensure the "events" column family exists > "events"."___" VARCHAR, > CONSTRAINT id PRIMARY KEY ("school_code", "code", "iso_8601", "uuid") > ) > > I then load events into the table using a custom Flume sink. > > > 0: jdbc:phoenix:10.124.12.191> SELECT COUNT(*) FROM ANALYTICS; > +------------------------------------------+ > | COUNT(1) | > +------------------------------------------+ > | 1248515 | > +------------------------------------------+ > 1 row selected (2.415 seconds) > 0: jdbc:phoenix:10.124.12.191> SELECT COUNT(DISTINCT("uuid")) FROM ANALYTICS; > +------------------------------------------+ > | DISTINCT_COUNT("uuid") | > +------------------------------------------+ > | 773373 | > +------------------------------------------+ > 1 row selected (5.751 seconds) > > Now, this is only showing up with lots of events, and I have almost 1 million > events. > > We've managed to pin down an errant event UUID. The primary key is meant to > ensure that repeated events (Flume's at-least-once guarantees) are > overwritten, rather than duplicated (due to UPSERT). Here's some example > queries for the errant UUID: > > 0: jdbc:phoenix:10.124.12.191> SELECT * FROM ANALYTICS WHERE "uuid" = > '00011783-c870-463c-9c01-4e530a57714f'; > +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+ > | school_code | code > | iso_8601 | uuid > | > +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+ > | WRA0004 | site.view > | 2016-03-21 16:00:00.000 | 00011783-c870-463c-9c01-4e530a57714f > | > | WRA0004 | site.view > | 2016-03-21 16:00:00.000 | 00011783-c870-463c-9c01-4e530a57714f > | > +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+ > > > Which shows two events with what should be a duplicate primary key. > > If I pin it down by including all the primary key fields, however, I get one > result: > > 0: jdbc:phoenix:10.124.12.191> SELECT * FROM ANALYTICS WHERE "school_code" = > 'WRA0004' AND "code" = 'site.view' AND "iso_8601" = TO_DATE('2016-03-21 > 16:00:00') AND "uuid" = '00011783-c870-463c-9c01-4e530a57714f'; > +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+ > | school_code | code > | iso_8601 | uuid | > +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+ > | WRA0004 | site.view > | 2016-03-21 16:00:00.000 | 00011783-c870-463c-9c01-4e530a57714f | > +------------------------------------------+------------------------------------------+-------------------------+------------------------------------------+ > > Finally, hbase shell is showing me the correct number of records: > > hbase(main):001:0> count 'ANALYTICS' > # lots of output > 773376 row(s) in 142.3770 seconds > > Now, I'll admit, I would have expected this to be "impossible". > > Does anybody have any idea how Phoenix can show me two primary keys with a > generic query, but only one for a specific query? And how I might fix this? > > Thanks in advance, > Craig Roberts > Senior Developer > > FrogAsia Sdn Bhd (A YTL Company) | Unit 9, Level 2, D6 at Sentul East | 801, > Jalan Sentul, 51000 Kuala Lumpur | 01125618093 | Twitter > <http://www.twitter.com/FrogAsia> | Facebook > <http://www.facebook.com/FrogAsia> | Website <http://www.frogasia.com/> > This message (including any attachments) is for the use of the addressee > only. It may contain private proprietary or legally privileged statements and > information. No confidentiality or privilege is waived or lost by any > mistransmission. If you are not the intended recipient, please immediately > delete it and all copies of it from your system, destroy any hard copies of > it and notify the sender. You must not, directly or indirectly, use, > disclose, distribute, print, copy or rely on any part of the message if you > are not the intended recipient. Any views expressed in this message > (including any attachments) are those of the individual sender and not those > of any member of the YTL Group, except where the message states otherwise and > the sender is authorized to state them to be the views of any such entity. >