Thanks for the detailed info. I took the advice of using the ASYNC method. The CREATE statement executes fine and I end up with an index table showing in state BUILDING. When I kick off the MR job with `hbase org.apache.phoenix.mapreduce.index.IndexTool --schema trans --data-table event --index-table event_object_id_idx_b --output-path EVENT_OBJECT_ID_IDX_B_HFILES` I get this odd error:
2016-08-12 14:29:40,073 ERROR [main] index.IndexTool: An exception occured while performing the indexing job : java.lang.IllegalArgumentException: trans.event_object_id_idx_b is not an index table for trans.event
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:187)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:378)

My CREATE INDEX was as follows:

create index if not exists event_object_id_idx_b on trans.event (
    object_id
) ASYNC UPDATE_CACHE_FREQUENCY=60000;

On Thu, Aug 11, 2016 at 9:40 PM, James Taylor <[email protected]> wrote:

> Hi Nathan,
> If your table is large, I'd recommend creating your index asynchronously.
> To do that, you'd add the ASYNC keyword to the end of your CREATE INDEX
> call. In this case, the index will be built through Map Reduce in a more
> resilient manner (i.e. the client going up or down won't impact it and you
> have the regular retries of a MR job). On AWS, you'll need to manually
> start the MR job, but at SFDC we have a CRON job that'll start it for you
> automatically (this is open source too, so it'd be good to get that up and
> running in AWS as well). See
> https://phoenix.apache.org/secondary_indexing.html#Index_Population
> for details.
>
> If you don't run it asynchronously, then you'll need to increase the query
> timeout (i.e. phoenix.query.timeoutMs config property) to be larger than
> the time it'll take to build the index. If the client goes down before the
> CREATE INDEX call finished (or the query times out), then the index build
> will stop (and unfortunately will need to be run again).
>
> To monitor the index build, there are a few ways - if running through MR,
> then you can monitor the MR job in the standard way. If running
> synchronously, you can monitor the index table - you'll see new regions
> created as splits occur, or you could query the SYSTEM.STATS table (which
> gets populated as splits and compactions happen), or you could run a
> count(*) query directly against the index table (though that'll put more
> load on your system because it'll require a full table scan).
>
> HTH. Thanks,
> James
>
> On Thu, Aug 11, 2016 at 5:51 PM, Nathan Davis <[email protected]> wrote:
>
>> Hi All,
>> I executed a CREATE INDEX against a fairly large table. And I received a
>> timeout error after a minute or two, which I understand is expected.
>> `!tables` in sqlline shows the index is still in BUILDING state after 2
>> hours, which may be accurate since it is a pretty large table and my
>> cluster is just a smallish EMR throwaway.
>>
>> My question is: Is there some way to verify that the index is in fact
>> still being built? Perhaps some HBase logs or the UI or some hbase shell
>> command? Unfortunately I am just as new to HBase as I am to Phoenix itself.
>>
>> Thanks,
>> -nathan
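One thing that may be worth checking for the IndexTool error above, offered as a guess and not a confirmed diagnosis: Phoenix folds unquoted SQL identifiers to upper case, so the index created by the statement above would be stored as TRANS.EVENT_OBJECT_ID_IDX_B, while the command line passed lower-case names. The Python sketch below only builds the command string with the names upper-cased, using the same flags as the command in this message. The helper names `normalize_identifier` and `index_tool_command` are made up for illustration, and the assumption that IndexTool matches names case-sensitively is not verified here.

```python
import shlex


def normalize_identifier(name: str) -> str:
    """Apply Phoenix's identifier rule: double-quoted names keep their case,
    unquoted names are folded to upper case."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        return name[1:-1]
    return name.upper()


def index_tool_command(schema: str, data_table: str, index_table: str,
                       output_path: str) -> str:
    """Build the IndexTool command line with normalized names.

    Hypothetical helper: it only formats a string and does not run anything.
    The flags are the ones used in the command quoted in this thread.
    """
    args = [
        "hbase", "org.apache.phoenix.mapreduce.index.IndexTool",
        "--schema", normalize_identifier(schema),
        "--data-table", normalize_identifier(data_table),
        "--index-table", normalize_identifier(index_table),
        "--output-path", output_path,  # a filesystem path, left as given
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(index_tool_command("trans", "event", "event_object_id_idx_b",
                             "EVENT_OBJECT_ID_IDX_B_HFILES"))
```

If the upper-case form still fails, the case-folding guess is wrong and the cause lies elsewhere.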
