Hi, while importing data using the CsvBulkLoadTool I've run into an issue 
trying to query the data using sqlline.py.  The bulk load tool itself ran 
successfully with no errors.  However, when I attempt to query the data I 
get exceptions:

java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException
        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)

followed by many GlobalMemoryManager warnings:

WARN memory.GlobalMemoryManager: Orphaned chunk of xxxx bytes found during 
finalize

Most queries (though not all) produce this error, and it seems related to the 
existence of a secondary index table:

select * from TABLE limit 10;  -- ERROR (index not used)
select <un-indexed field> from TABLE limit 10;  -- ERROR

If I run a query on an INTEGER column with a secondary index, I do not get 
this error:

select distinct(fieldx) from TABLE limit 10;  -- SUCCESS!

However, a similar query on an indexed VARCHAR field produces a timeout error:
java.lang.RuntimeException: ... PhoenixIOException: Failed after retry of 
OutOfOrderScannerNextException: was there a rpc timeout?
        at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)

select count(*) ... times out as well

Details:
Total records imported: 7.2B
Cluster size: 30 nodes
Splits: 40 (salted)

Phoenix version: 4.2.0
HBase version: 0.98
HDP distro 2.1.5

I can scan the data with no errors from the hbase shell.
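For reference, a scan like this works fine (assuming the Phoenix default of an 
upper-cased physical table name for the unquoted t1_csv_data):

```
scan 'T1_CSV_DATA', {LIMIT => 10}
```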

Basic Phoenix table def:

CREATE TABLE IF NOT EXISTS t1_csv_data (
    timestamp BIGINT NOT NULL,
    location VARCHAR NOT NULL,
    fileid VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    field5 VARCHAR,
    ...
    field45 VARCHAR,
    CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=40,
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';

-- indexes
CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data (somefield1)
COMPRESSION='SNAPPY',
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data (somefield2)
COMPRESSION='SNAPPY',
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data (somefield3)
COMPRESSION='SNAPPY',
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
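In case it helps: given the "was there a rpc timeout?" message, is raising the 
client-side scan/RPC timeouts the right direction here? I haven't changed these 
from the defaults. If so, I'd guess at something like the following in 
hbase-site.xml on the client (property names taken from the HBase 0.98 and 
Phoenix 4.2 docs; the values are just examples, not recommendations):

```
<!-- Sketch only: the timeout knobs usually mentioned alongside
     OutOfOrderScannerNextException / client rpc timeouts -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- 10 minutes -->
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>600000</value>
</property>
```

Or is timeout tuning just masking whatever is wrong with the index tables?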

Thanks for your help,
Ralph
