Re: monitoring status of CREATE INDEX operation

Nathan Davis Fri, 12 Aug 2016 07:39:57 -0700

Thanks for the detailed info. I took the advice of using the ASYNC method.
The CREATE statement executes fine and I end up with an index table showing
in state BUILDING. When I kick off the MR job with `hbase
org.apache.phoenix.mapreduce.index.IndexTool --schema trans --data-table
event --index-table event_object_id_idx_b --output-path
EVENT_OBJECT_ID_IDX_B_HFILES` I get this odd error:


2016-08-12 14:29:40,073 ERROR [main] index.IndexTool:  An exception occured while performing the indexing job : java.lang.IllegalArgumentException:  trans.event_object_id_idx_b is not an index table for trans.event
    at org.apache.phoenix.mapreduce.index.IndexTool.run(IndexTool.java:187)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.index.IndexTool.main(IndexTool.java:378)




My CREATE INDEX was as follows:

create index if not exists event_object_id_idx_b on trans.event (
    object_id
) ASYNC UPDATE_CACHE_FREQUENCY=60000;
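
A possible cause worth ruling out (an assumption, not something confirmed in this thread): Phoenix upper-cases unquoted SQL identifiers, so the statement above stores the index as EVENT_OBJECT_ID_IDX_B on TRANS.EVENT, while the IndexTool arguments were passed in lower case. Whether IndexTool normalizes its command-line arguments the same way is unverified here. A minimal sketch that only prints the same command with upper-cased names; the helper names `to_upper` and `index_tool_cmd` are made up for illustration:

```shell
#!/bin/sh
# Upper-case a name, mirroring how Phoenix treats unquoted SQL identifiers.
to_upper() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# Print (do not run) an IndexTool command line with upper-cased names.
# Arguments: schema, data table, index table, output path.
# Assumption: IndexTool matches names case-sensitively against the catalog.
index_tool_cmd() {
    printf 'hbase org.apache.phoenix.mapreduce.index.IndexTool --schema %s --data-table %s --index-table %s --output-path %s\n' \
        "$(to_upper "$1")" "$(to_upper "$2")" "$(to_upper "$3")" "$4"
}

# Dry run with the names from this message; the output path is left unchanged.
index_tool_cmd trans event event_object_id_idx_b EVENT_OBJECT_ID_IDX_B_HFILES
```

The sketch deliberately echoes the command instead of executing it, so the result can be inspected before a MapReduce job is launched.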



On Thu, Aug 11, 2016 at 9:40 PM, James Taylor <[email protected]>
wrote:

> Hi Nathan,
> If your table is large, I'd recommend creating your index asynchronously.
> To do that, you'd add the ASYNC keyword to the end of your CREATE INDEX
> call. In this case, the index will be built through MapReduce in a more
> resilient manner (i.e. the client going up or down won't impact it and you
> have the regular retries of a MR job). On AWS, you'll need to manually
> start the MR job, but at SFDC we have a CRON job that'll start it for you
> automatically (this is open source too, so it'd be good to get that up and
> running in AWS as well). See
> https://phoenix.apache.org/secondary_indexing.html#Index_Population for
> details.
>
> If you don't run it asynchronously, then you'll need to increase the query
> timeout (i.e. phoenix.query.timeoutMs config property) to be larger than
> the time it'll take to build the index. If the client goes down before the
> CREATE INDEX call finishes (or the query times out), then the index build
> will stop (and unfortunately will need to be run again).
>
> To monitor the index build, there are a few ways - if running through MR,
> then you can monitor the MR job in the standard way. If running
> synchronously, you can monitor the index table - you'll see new regions
> created as splits occur, or you could query the SYSTEM.STATS table (which
> gets populated as splits and compactions happen), or you could run a
> count(*) query directly against the index table (though that'll put more
> load on your system because it'll require a full table scan).
>
> HTH. Thanks,
> James
>
> On Thu, Aug 11, 2016 at 5:51 PM, Nathan Davis <[email protected]> wrote:
>
>> Hi All,
>> I executed a CREATE INDEX against a fairly large table. And I received a
>> timeout error after a minute or two, which I understand is expected.
>> `!tables` in sqlline shows the index is still in BUILDING state after 2
>> hours, which may be accurate since it is a pretty large table and my
>> cluster is just a smallish EMR throwaway.
>>
>> My question is: Is there some way to verify that the index is in fact
>> still being built? Perhaps some HBase logs or the UI or some hbase shell
>> command? Unfortunately I am just as new to HBase as I am to Phoenix itself.
>>
>> Thanks,
>>  -nathan
>>
>
>
