As in, the use of isTableAvailable there implies that a bulk load should happen
only if all of the table's regions are available. But that may not be the case,
since the function returns true if even a single region is online (the
regionCount.get() > 0 check).

V
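Concretely, the kind of check bulk load really wants is something like the
sketch below: poll .META. until the table shows the expected number of regions
(splitKeys.length + 1 for a pre-split table) rather than trusting
isTableAvailable(). This is untested and only illustrative against the 0.90
client API; waitForAllRegions, expectedRegions and sleepMs are made-up names,
and the check still says nothing about whether each region is actually
assigned to a server.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;

public class RegionAvailabilityCheck {

  /**
   * Polls .META. until the table shows the expected number of regions,
   * instead of relying on HConnection.isTableAvailable().
   */
  public static void waitForAllRegions(Configuration conf, byte[] tableName,
      int expectedRegions, long sleepMs)
      throws IOException, InterruptedException {
    HTable table = new HTable(conf, tableName);
    try {
      while (true) {
        // getRegionsInfo() scans .META. and returns the region -> server map
        // for this table.
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        if (regions.size() >= expectedRegions) {
          // All expected regions show up in .META.; note this does not
          // verify that each region is currently assigned to a server.
          return;
        }
        Thread.sleep(sleepMs);
      }
    } finally {
      table.close();
    }
  }
}

A stricter version could also require a non-empty server column for every
region before returning.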
On 5/17/11 7:14 PM, "Ted Yu" <[email protected]> wrote:

Did you mean that coming out of the following loop, the table might still
be unavailable if there were many regions ?

    while (!conn.isTableAvailable(table.getTableName()) &&
        (ctr < TABLE_CREATE_MAX_RETRIES)) {

Cheers

On Tue, May 17, 2011 at 7:10 PM, Ted Yu <[email protected]> wrote:

> >> Also some of the source for which we had used this function may be
> >> broken (for example in LoadIncrementalHFiles.java)
>
> Can you be more specific ?
>
> Thanks
>
> On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
> [email protected]> wrote:
>
>> For 1, the check in HCM.isTableAvailable() is:
>>
>>     return available.get() && (regionCount.get() > 0);
>>
>> This explains why some regions aren't available.
>>
>> The javadoc says the function returns true if all regions are
>> available. Clearly this statement is wrong going by what is in the
>> code. Also, some of the source for which we had used this function may
>> be broken (for example in LoadIncrementalHFiles.java).
>>
>> For 3, can you provide a unit test so that we can investigate further ?
>>
>> The problem is I am unable to get the master to crash consistently. I
>> can send you the split keys.
>>
>> Thank you
>> Vidhya
>>
>> On 5/17/11 4:59 PM, "Ted Yu" <[email protected]> wrote:
>>
>> For 1, the check in HCM.isTableAvailable() is:
>>
>>     return available.get() && (regionCount.get() > 0);
>>
>> This explains why some regions aren't available.
>>
>> For 3, can you provide a unit test so that we can investigate further ?
>>
>> Thanks
>>
>> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
>> [email protected]> wrote:
>>
>> > (Running HBase 0.90.0 on 700+ nodes.)
>> >
>> > You may have seen many (or mostly all) of the following issues
>> > already:
>> >
>> > 1. HConnection.isTableAvailable: This doesn't seem to be working all
>> > the time. In particular, I had this code after creating a table
>> > asynchronously:
>> >
>> >     do {
>> >       LOG.info("Table " + tableName + " not yet available... Sleeping for "
>> >           + sleepTime + " milliseconds...");
>> >       Thread.sleep(sleepTime);
>> >     } while (!conn.isTableAvailable(table.getTableName()));
>> >     LOG.info("Table is available!! : " + tableName + " Available? "
>> >         + conn.isTableAvailable(table.getTableName()));
>> >
>> > It comes out of the loop, but then I see this:
>> >
>> >     Table is available!! : <TABLE> Available? false
>> >
>> > And then I see that not all the regions are available yet.
>> >
>> > 2. The master getting stuck, unable to delete a WAL (I have seen this
>> > before on this forum, and a related JIRA on this one): We had worked
>> > around it by manually deleting a WAL. But when the master crashed
>> > during table creation (with split key boundaries), the node that took
>> > over next as the master (failover) started getting stuck for around
>> > 25% of the cluster. I had to wipe out all the logs so that the master
>> > could start up right.
>> >
>> > But even then, the regionservers which had suffered the log issue
>> > couldn't recognize the failed-over master. (Is this something that
>> > has been observed before?)
>> >
>> > 3. createTableAsync with incorrect split keys: By mistake, I had some
>> > duplicate keys in the split key byte array while calling the
>> > createTableAsync function. The master crashed throwing a
>> > KeeperException (thanks to the duplicate keys, I guess?)
>> >
>> > Also, can you let me know why createTableAsync blocks for some time
>> > and throws a socket timeout exception when I try creating a table
>> > with a large number of regions?
>> >
>> > Thank you
>> > Vidhya
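One more thought on issue 3 in the quoted mail above: until the master handles
bad split arrays more gracefully, the client can at least protect itself by
sorting and de-duplicating the keys before calling createTableAsync. A rough,
untested sketch (createTablePreSplit is just an illustrative name; HBaseAdmin
and Bytes are the standard 0.90 client classes):

import java.io.IOException;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SafeCreateTable {

  /** Sorts the split keys and drops duplicates before the async create. */
  public static void createTablePreSplit(Configuration conf,
      HTableDescriptor desc, byte[][] splitKeys) throws IOException {
    // A TreeSet over the lexicographic byte[] comparator removes duplicate
    // keys and yields them in sorted order.
    TreeSet<byte[]> distinct = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    for (byte[] key : splitKeys) {
      if (key != null && key.length > 0) {  // skip null/empty split points
        distinct.add(key);
      }
    }
    byte[][] cleaned = distinct.toArray(new byte[distinct.size()][]);

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTableAsync(desc, cleaned);
  }
}

This obviously doesn't fix the KeeperException on the master side; it just
keeps the client from triggering it.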
