After using DELETE FROM TABLE_NAME; to purge data from a table, queries that 
"FULL SCAN" TABLE_NAME still return matches against the DELETED data, but 
queries that SCAN OVER indexes do not return the values. Basically, after a 
DELETE, data is in the main table, but not in the indexes.

Any ideas how we can ensure that DELETE fully deletes the original data, or 
detect when this occurs?

Schema is roughly:

CREATE TABLE IF NOT EXISTS table_name
(
  file_time VARCHAR NOT NULL,
  file_name VARCHAR NOT NULL,
  rec_num INTEGER NOT NULL,
  m.f1 VARCHAR,
  m.f2 VARCHAR,
  m.f3 VARCHAR,
  m.f4 VARCHAR,
  m.f5 VARCHAR,
  m.f6 VARCHAR,
  m.f7 VARCHAR,
  m.f8 VARCHAR,
 CONSTRAINT pkey PRIMARY KEY (file_time,file_name,rec_num)
) 
TTL='7776000',IMMUTABLE_ROWS=true,KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',SALT_BUCKETS=10,SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

-- indexes
CREATE INDEX IF NOT EXISTS raw_data_idx  ON table_name(m.f1) 
TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS table_name_f2f3_idx  ON table_name(m.f2,m.f3) 
TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX IF NOT EXISTS table_name_f4f5_idx  ON table_name(m.f4,m.f5) 
TTL='7776000',KEEP_DELETED_CELLS='false',COMPRESSION='SNAPPY',MAX_FILESIZE='1000000000',SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';


Query Examples:
0: jdbc:phoenix:dn01> SELECT FILE_NAME FROM TABLE_NAME WHERE FILE_NAME = 
'abcdefg' AND REC_NUM =101;
+------------------------------------------+
|                FILE_NAME                 |
+------------------------------------------+
+------------------------------------------+
No rows selected (1.329 seconds)

Full scan (includes non-index fields)
0: jdbc:phoenix:dn01> SELECT * FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND 
REC_NUM =101;
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
|                FILE_TIME                |                FILE_NAME            
     |                 REC_NUM                  |                F1             
    |               F3                |                    |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+
| 2015-03-23 06:42:37+00                   | abcdefg          | 101             
                        | 49                                       | 50         
                              | 15               |
+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+------------------------------------------+--------------------+

Full scan (single non-indexed field)
0: jdbc:phoenix:dn01> SELECT F8 FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND 
REC_NUM =101;
+------------------------------------------+
|                  F8                  |
+------------------------------------------+
| ABC                                      |
+------------------------------------------+


Index scan (single indexed field)
0: jdbc:phoenix:dn01> SELECT F1 FROM TABLE_NAME WHERE FILE_NAME = 'abcdefg' AND 
REC_NUM =101;

+------------------------------------------+
|                  F1                          |
+------------------------------------------+
+------------------------------------------+

Thanks,
Bryan G.

Reply via email to