Hi Tim,

It sounds like you're following the right steps to build an index with the async approach. Not having records after IndexTool runs successfully is definitely unexpected :)

If you aren't getting any records in the index table after running the IndexTool, my guess is that something is going awry there. Have you looked at the logging produced by that MapReduce job? (e.g. `yarn logs -applicationId <foo>`).

It's curious that the job runs successfully and sets the index state to active, but you don't have any data loaded.

As a sanity check, does `select * from meta_reads limit 10` return the data you expect?
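
A couple of related checks you could run from sqlline.py, as a rough sketch rather than a definitive procedure. The table and index names come from your DDL; 'SOME_GENE' is a hypothetical placeholder that you would replace with a gene value you know exists in the data table.

```sql
-- Look at the index table directly; an empty result here means
-- IndexTool did not write any rows into it.
SELECT * FROM idx_gc LIMIT 10;

-- Ask Phoenix which table it plans to read for a query that filters on gene.
-- If the plan names IDX_GC, the query is being served from the index.
-- 'SOME_GENE' is a placeholder, not a value from your data.
EXPLAIN SELECT gene FROM meta_reads WHERE gene = 'SOME_GENE' LIMIT 10;
```

If the first query comes back empty while the data table has rows, that points at the IndexTool job itself rather than at how your queries are written.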

For future reference, it's always a good idea to let us know which versions of Hadoop/HBase/Phoenix you're running when asking questions.

PS: If you're worried about the SocketTimeoutException, it probably has more to do with the size of your data table and the way a `select count(*)` runs. That query is a full-table scan, so at a minimum you'd have to increase hbase.rpc.timeout to a larger value. If this is a normal query pattern you intend to service, it will be an exercise in tweaking configs.

On 1/27/20 6:01 PM, Tim Dolbeare wrote:
Hello All,

I've run into a problem with a Phoenix index that no amount of googling is solving.  I hope someone might have run into this before and can offer some suggestions.  I'm a noob BTW, so please don't hesitate to point out the most obvious potential issues.  The problem is that after indexing a table already populated with 1M rows a) any query that uses the new index returns 0 results and b) the index table itself is empty.

I have created a table via psql.py, populated it with 1M rows via CsvBulkLoadTool, created an async covered index on that table in sqlline.py, followed by a mapreduce index population with IndexTool. All of that completes without error, and the index is marked "ACTIVE".

Here are my table and index definitions:

DROP TABLE IF EXISTS meta_reads;
CREATE IMMUTABLE TABLE IF NOT EXISTS meta_reads (
       cluster VARCHAR,
       subclass VARCHAR,
       class VARCHAR,
       sex VARCHAR,
       region VARCHAR,
       subregion VARCHAR,
       cell VARCHAR NOT NULL,
       gene VARCHAR NOT NULL,
       read FLOAT,
       CONSTRAINT my_pk PRIMARY KEY (cell, gene))
IMMUTABLE_STORAGE_SCHEME = ONE_CELL_PER_COLUMN;

create index idx_gc on meta_reads(gene, cluster) include(read) ASYNC;


Almost any query that attempts to use the index returns 0 results, however 'select count(*) from meta_reads' throws a SocketTimeoutException.


Any ideas?

Thanks

Tim




