(Running HBase 0.90.0 on 700+ nodes.)

You may have seen many (or most) of the following issues already:
   1. HConnection.isTableAvailable: This does not seem to work reliably. In 
particular, I had this code after creating a table asynchronously:

    do {
      LOG.info("Table " + tableName + " not yet available... Sleeping for "
          + sleepTime + " milliseconds...");
      Thread.sleep(sleepTime);
    } while (!conn.isTableAvailable(table.getTableName()));
    LOG.info("Table is available!! : " + tableName + " Available? "
        + conn.isTableAvailable(table.getTableName()));

It comes out of the loop but then I see this:
Table is available!! : <TABLE> Available? false

And then I see that not all the regions are yet available.
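
One workaround I could imagine is to treat a single "true" as provisional and 
only proceed after several consecutive positive checks. This is just a rough 
sketch under that assumption; StableWait, Check and waitUntilStable are names I 
made up, not HBase APIs, and the check is passed in so that something like 
() -> conn.isTableAvailable(table.getTableName()) could be plugged in. It does 
not explain why the result flips in the first place.

```java
import java.util.concurrent.TimeoutException;

final class StableWait {
    /** Hypothetical check, e.g. () -> conn.isTableAvailable(name). */
    public interface Check {
        boolean ok() throws Exception;
    }

    /**
     * Polls the check until it returns true `required` times in a row,
     * sleeping sleepMs between polls. Gives up after maxPolls polls.
     * Returns the number of polls performed.
     */
    public static int waitUntilStable(Check check, int required,
                                      long sleepMs, int maxPolls)
            throws Exception {
        int streak = 0;
        for (int polls = 1; polls <= maxPolls; polls++) {
            // Any negative result resets the streak to zero.
            streak = check.ok() ? streak + 1 : 0;
            if (streak >= required) {
                return polls;
            }
            Thread.sleep(sleepMs);
        }
        throw new TimeoutException(
            "check not stable after " + maxPolls + " polls");
    }
}
```

Requiring a streak only narrows the window; a region could still go offline 
after the last check.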


   2. The master getting stuck, unable to delete a WAL (I have seen this 
reported on this forum before, along with a related JIRA): We had worked around 
it by manually deleting the WAL. But when the master crashed during table 
creation (with split key boundaries), the node that took over next as the 
master (failover) started getting stuck for around 25% of the cluster. I had to 
wipe out all the logs so that the master could start up properly.

But even then, the regionservers that had suffered the log issue couldn't 
recognize the failed-over master. (Has this been observed before?)


   3. createTableAsync with incorrect split keys: By mistake, I had some 
duplicate keys in the split key byte array when calling createTableAsync. The 
master crashed, throwing a KeeperException (because of the duplicate keys, I 
guess?).
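
On the client side, the duplicates could be filtered out before the call. The 
following is a minimal sketch, assuming the split keys are a plain byte[][] and 
should be ordered as unsigned bytes; SplitKeys and sortedUnique are names I 
made up for illustration, and HBase's own Bytes.BYTES_COMPARATOR could 
presumably replace the hand-written comparator.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class SplitKeys {
    /** Lexicographic order over byte arrays, treating bytes as unsigned. */
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // A shorter key sorts before any longer key it is a prefix of.
        return a.length - b.length;
    };

    /** Returns the split keys in sorted order with duplicates dropped. */
    static byte[][] sortedUnique(byte[][] splitKeys) {
        TreeSet<byte[]> unique = new TreeSet<>(UNSIGNED_LEXICOGRAPHIC);
        for (byte[] key : splitKeys) {
            unique.add(key);
        }
        return unique.toArray(new byte[0][]);
    }
}
```

The result of SplitKeys.sortedUnique(splitKeys) would then be passed to 
createTableAsync in place of the raw array.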


Also, can you let me know why createTableAsync blocks for some time and throws 
a socket timeout exception when I try to create a table with a large number of 
regions?

Thank you
Vidhya
