Hi, while importing data using the CsvBulkLoadTool I ran into an issue
querying the data afterwards from sqlline.py. The bulk load itself completed
successfully, with no errors, but when I attempt to query the data I get
exceptions:
java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)
followed by many GlobalMemoryManager errors:
WARN memory.GlobalMemoryManager: Orphaned chunk of xxxx bytes found during
finalize
Most queries (though not all) produce this error, and it seems related to the
existence of a secondary index table:
select * from TABLE limit 10; -- ERROR (index not used)
select <un-indexed field> from TABLE limit 10; -- ERROR
If I run a query on an INTEGER column with a secondary index I do not get this
error:
select distinct(fieldx) from TABLE limit 10; -- SUCCESS!
However, a similar query on an indexed VARCHAR field produces a timeout error:
java.lang.RuntimeException: ... PhoenixIOException: Failed after retry of
OutOfOrderScannerNextException: was there a rpc timeout?
at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)
select count(*) ... times out as well
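In case the query plans are useful for diagnosis, these are the kinds of EXPLAIN statements I can run and post output for (the table and field names here are placeholders for my actual schema):

```sql
-- Compare the plan Phoenix chooses for the working vs. failing queries
EXPLAIN SELECT DISTINCT(fieldx) FROM TABLE LIMIT 10; -- indexed INTEGER column, succeeds
EXPLAIN SELECT DISTINCT(fieldy) FROM TABLE LIMIT 10; -- indexed VARCHAR column, times out
```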
Details:
Total records imported: 7.2B
Cluster size: 30 nodes
Splits: 40 (salted)
Phoenix version: 4.2.0
HBase version: 0.98
HDP distro: 2.1.5
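If this really is an RPC/scanner timeout, I'm assuming these are the relevant hbase-site.xml settings to raise (property names taken from the HBase and Phoenix docs; the values below are only illustrative, not what I'm currently running):

```xml
<!-- Illustrative values only; I have not tuned these from the defaults -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- 10 minutes -->
</property>
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value>
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>600000</value>
</property>
```

Is raising these the right direction, or is the timeout a symptom of something else (e.g. the index table being unusable)?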
I can scan the data with no errors from the hbase shell.
Basic Phoenix table def:
CREATE TABLE IF NOT EXISTS t1_csv_data (
    timestamp BIGINT NOT NULL,
    location VARCHAR NOT NULL,
    fileid VARCHAR NOT NULL,
    recnum INTEGER NOT NULL,
    field5 VARCHAR,
    ...
    field45 VARCHAR,
    CONSTRAINT pkey PRIMARY KEY (timestamp, location, fileid, recnum)
)
IMMUTABLE_ROWS=true, COMPRESSION='SNAPPY', SALT_BUCKETS=40,
SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
-- indexes
CREATE INDEX t1_csv_data_f1_idx ON t1_csv_data(somefield1)
    COMPRESSION='SNAPPY',
    SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX t1_csv_data_f2_idx ON t1_csv_data(somefield2)
    COMPRESSION='SNAPPY',
    SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
CREATE INDEX t1_csv_data_f3_idx ON t1_csv_data(somefield3)
    COMPRESSION='SNAPPY',
    SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy';
Thanks for your help,
Ralph