>> Also some of the source for which we had used this function may be broken
>> (for example in LoadIncrementalHFiles.java)

Can you be more specific?
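
In the meantime, rather than relying on isTableAvailable() alone, you could
count the deployed regions for the table yourself after createTableAsync()
returns. Here is a rough, untested sketch against the 0.90 client API (the
class/method names and the expectedRegions/sleepTime parameters are just
placeholders; for a pre-split table, expectedRegions would be
splitKeys.length + 1):

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;

public class WaitForTable {
  // Sketch only: poll .META. until the expected number of regions for the
  // table is online and assigned to a server.
  public static void waitForAllRegions(Configuration conf, String tableName,
      int expectedRegions, long sleepTime)
      throws IOException, InterruptedException {
    HTable table = new HTable(conf, tableName);
    try {
      while (true) {
        // getRegionsInfo() scans .META. and returns region -> server
        // assignments for this table.
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        int assigned = 0;
        for (Map.Entry<HRegionInfo, HServerAddress> e : regions.entrySet()) {
          if (!e.getKey().isOffline() && e.getValue() != null) {
            assigned++;
          }
        }
        if (assigned >= expectedRegions) {
          return;
        }
        Thread.sleep(sleepTime);
      }
    } finally {
      table.close();
    }
  }
}

Note that getRegionsInfo() goes back to .META. on every call, so keep the
polling interval fairly large on a cluster of your size.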

Thanks

On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
[email protected]> wrote:

> >> For 1, the check in HCM.isTableAvailable() is:
> >> return available.get() && (regionCount.get() > 0);
> >> This explains why some regions aren't available.
>
> The javadoc says the function returns true if all regions are available.
> Clearly that statement is wrong going by what is in the code. Also some of
> the source for which we had used this function may be broken (for example
> in LoadIncrementalHFiles.java).
>
> >> For 3, can you provide a unit test so that we can investigate further ?
>
> The problem is I am unable to get the master to crash consistently. I can
> send you the split keys.
>
> Thank you
> Vidhya
>
> On 5/17/11 4:59 PM, "Ted Yu" <[email protected]> wrote:
>
> For 1, the check in HCM.isTableAvailable() is:
> return available.get() && (regionCount.get() > 0);
> This explains why some regions aren't available.
>
> For 3, can you provide a unit test so that we can investigate further ?
>
> Thanks
>
> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
> [email protected]> wrote:
>
> > (Running HBase 0.90.0 on 700+ nodes.)
> >
> > You may have seen many (or mostly all) of the following issues already:
> >
> > 1. HConnection.isTableAvailable: This doesn't seem to be working all the
> > time. In particular, I had this code after creating a table
> > asynchronously:
> >
> > do {
> >   LOG.info("Table " + tableName + " not yet available... Sleeping for "
> >       + sleepTime + " milliseconds...");
> >   Thread.sleep(sleepTime);
> > } while (!conn.isTableAvailable(table.getTableName()));
> > LOG.info("Table is available!! : " + tableName + " Available? "
> >     + conn.isTableAvailable(table.getTableName()));
> >
> > It comes out of the loop but then I see this:
> >
> > Table is available!! : <TABLE> Available? false
> >
> > And then I see that not all the regions are yet available.
> >
> > 2. The master getting stuck, unable to delete a WAL (I have seen this
> > before on this forum and a related JIRA on this one): We had worked
> > around it by manually deleting the WAL. But when the master crashed
> > during table creation (with split key boundaries), the node that took
> > over next as the master (failover) started getting stuck for around 25%
> > of the cluster. I had to wipe out all the logs so that the master could
> > start up right.
> >
> > But even then, the region servers which had suffered the log issue
> > couldn't recognize the failed-over master. (Is this something that has
> > been observed before?)
> >
> > 3. createTableAsync with incorrect split keys: By mistake, I had some
> > duplicate keys in the split key byte array while calling the
> > createTableAsync function. The master crashed throwing a KeeperException
> > (due to the duplicate keys, I guess?)
> >
> > Also, can you let me know why createTableAsync blocks for some time and
> > throws a socket timeout exception when I try creating a table with a
> > large number of regions?
> >
> > Thank you
> > Vidhya
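
Regarding 3: until there is a proper check on the master side, a small
client-side guard can reject a bad split-key array before it ever reaches
createTableAsync(). A minimal, untested sketch (class and method names are
just for illustration):

import java.util.TreeSet;

import org.apache.hadoop.hbase.util.Bytes;

public class SplitKeyCheck {
  // Sketch only: sort the split keys and fail fast on empty or duplicate
  // keys, returning a sorted copy to hand to createTableAsync().
  public static byte[][] validateSplitKeys(byte[][] splitKeys) {
    TreeSet<byte[]> sorted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    for (byte[] key : splitKeys) {
      if (key == null || key.length == 0) {
        throw new IllegalArgumentException("Empty split key is not allowed");
      }
      if (!sorted.add(key)) {
        // add() returns false when an equal key is already in the set.
        throw new IllegalArgumentException(
            "Duplicate split key: " + Bytes.toStringBinary(key));
      }
    }
    return sorted.toArray(new byte[sorted.size()][]);
  }
}

Passing the sorted, de-duplicated array along also removes any dependence on
the order in which the caller happened to build the keys.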
