Thanks Ted! Will do it right away.

1. we should provide the following new API where numOfRegions is the expected
number of regions to go online:
I used table.getRegionsInfo() to make sure all regions were online instead of
this function. But that function requires a priori knowledge of the number of
regions.

V

P.S: Copy-pasting my full name could be a little tedious!

On 5/18/11 11:02 AM, "Ted Yu" <[email protected]> wrote:

Vidhyashankar:
Please file the following JIRAs:
1. we should provide the following new API where numOfRegions is the expected
number of regions to go online:
  public boolean isTableAvailable(final byte[] tableName, int numOfRegions)
  throws IOException {
2. HBaseAdmin.createTableAsync() should check whether there're duplicate
keys. Since it is a public method, we shouldn't solely rely on createTable()
to perform the check.

Thanks

On Wed, May 18, 2011 at 10:46 AM, Vidhyashankar Venkataraman <
[email protected]> wrote:

> As in, the use of isTableAvailable there indicates, a bulk load should
> happen only if all the regions are available.
>
> But that may not be the case since the function returns back true if even
> one region (regionCount.get()>0 check) is online.
>
> V
>
>
> On 5/17/11 7:14 PM, "Ted Yu" <[email protected]> wrote:
>
> Did you mean that coming out of the following loop, the table might still
> be unavailable if there were many regions ?
>     while (!conn.isTableAvailable(table.getTableName()) &&
>         (ctr<TABLE_CREATE_MAX_RETRIES)) {
>
> Cheers
>
> On Tue, May 17, 2011 at 7:10 PM, Ted Yu <[email protected]> wrote:
>
> > >> Also some of the source for which we had used this function may be
> > >> broken (for example in LoadIncrementalHFiles.java)
> > Can you be more specific ?
> >
> > Thanks
> >
> >
> > On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
> > [email protected]> wrote:
> >
> >> >> For 1, the check in HCM.isTableAvailable() is:
> >> >>     return available.get() && (regionCount.get() > 0);
> >> >> This explains why some regions aren't available.
> >>
> >> The javadoc says the function returns true if all regions are available.
> >> Clearly this statement is wrong going by what is there in the code. Also
> >> some of the source for which we had used this function may be broken
> >> (for example in LoadIncrementalHFiles.java).
> >>
> >> >> For 3, can you provide a unit test so that we can investigate
> >> >> further ?
> >>
> >> The problem is I am unable to get the master crash consistently. I can
> >> send you the key split.
> >>
> >> Thank you
> >> Vidhya
> >>
> >> On 5/17/11 4:59 PM, "Ted Yu" <[email protected]> wrote:
> >>
> >> For 1, the check in HCM.isTableAvailable() is:
> >>     return available.get() && (regionCount.get() > 0);
> >> This explains why some regions aren't available.
> >>
> >> For 3, can you provide a unit test so that we can investigate further ?
> >>
> >> Thanks
> >>
> >> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
> >> [email protected]> wrote:
> >>
> >> > (Running Hbase 0.90.0 on 700+ nodes.)
> >> >
> >> > You may have seen many (or mostly all) of the following issues
> >> > already:
> >> > 1. HConnection.isTableAvailable: This doesn't seem to be working all
> >> > the time. In particular, I had this code after creating a table
> >> > asynchronously:
> >> >
> >> >     do {
> >> >       LOG.info("Table " + tableName + "not yet available... Sleeping for" +
> >> >           sleepTime + "milliseconds...");
> >> >       Thread.sleep(sleepTime);
> >> >     } while (!conn.isTableAvailable(table.getTableName()));
> >> >     LOG.info("Table is available!! : "+tableName+" Available? "
> >> >         +conn.isTableAvailable(table.getTableName()));
> >> >
> >> > It comes out of the loop but then I see this:
> >> > Table is available!! : <TABLE> Available? false
> >> >
> >> > And then I see that not all the regions are yet available.
> >> >
> >> >
> >> > 2. The master getting stuck unable to delete a WAL (I have seen this
> >> > before on this forum and a related JIRA on this one): We had worked
> >> > around by manually deleting a WAL.
> >> > But during times when the master crashed during table creation (with
> >> > split key boundaries), the node that took over next as the master
> >> > (failover) started getting stuck for around 25% of the cluster. I had
> >> > to wipe out all the logs so that the master could start up right.
> >> >
> >> > But even then, the regionservers which had suffered the log issue
> >> > couldn't recognize the failed over master. (Is this something that has
> >> > been observed before?)
> >> >
> >> >
> >> > 3. createTableAsync with incorrect split keys: By mistake, I had some
> >> > duplicate keys in the split key byte array while calling the
> >> > createTableAsync function. The master crashed throwing a KeeperException
> >> > (thanks to the duplicate keys I guess?)
> >> >
> >> >
> >> > Also, can you let me know why createTableAsync blocks for some time and
> >> > throws a socket timeout exception when I try creating a table with a
> >> > large number of regions?
> >> >
> >> > Thank you
> >> > Vidhya
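
For reference, the two client-side checks discussed in this thread could be sketched roughly as below. This is only an illustration under stated assumptions, not HBase code: the class TableCreationChecks and both method names are hypothetical, and onlineRegionCount stands in for whatever the caller uses to count online regions (for example, something built on the table.getRegionsInfo() approach mentioned above). The first helper rejects duplicate split keys before createTableAsync is called; the second polls until the expected number of regions is online, with a retry limit, rather than stopping as soon as one region is up.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntSupplier;

/** Hypothetical helpers for illustration; none of these names exist in HBase. */
final class TableCreationChecks {

    /** Returns true if any two split keys contain identical bytes. */
    static boolean hasDuplicateSplitKeys(byte[][] splitKeys) {
        if (splitKeys == null) {
            return false;
        }
        // ByteBuffer.equals/hashCode compare contents, unlike byte[] itself.
        Set<ByteBuffer> seen = new HashSet<>();
        for (byte[] key : splitKeys) {
            if (!seen.add(ByteBuffer.wrap(key))) {
                return true;
            }
        }
        return false;
    }

    /**
     * Polls onlineRegionCount up to maxRetries times, sleeping between polls,
     * and returns true once at least expectedRegions are reported online.
     * Returns false if the retries run out or the thread is interrupted.
     */
    static boolean waitForRegions(IntSupplier onlineRegionCount,
                                  int expectedRegions,
                                  int maxRetries,
                                  long sleepMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (onlineRegionCount.getAsInt() >= expectedRegions) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return false;
    }
}
```

With expectedRegions set to the number of split keys plus one, a single online region is no longer enough to end the wait, which is the behaviour the proposed isTableAvailable(tableName, numOfRegions) is meant to provide.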
