Vidhyashankar: table.getRegionsInfo() is for advanced users (such as you) :-) Anyway, we shouldn't force users to call it.
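The count-aware overload Ted proposes in the quoted thread, `isTableAvailable(tableName, numOfRegions)`, can be illustrated with a small self-contained Java model. This is not HBase code. A `List<Boolean>` stands in for the regions the check can currently see (`true` = online), and the model assumes, as a simplification, that regions not yet registered are simply absent from that list. The `waitUntil` helper and the retry numbers are hypothetical, in the spirit of the `TABLE_CREATE_MAX_RETRIES` loop quoted in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

public class TableAvailability {

    // Simplified model of the quoted check `available.get() && (regionCount.get() > 0)`.
    // Assumption: each element is one region the check can see (true = online);
    // regions that have not been registered yet do not appear in the list at all.
    static boolean isTableAvailable(List<Boolean> visibleRegions) {
        boolean available = true;
        int regionCount = 0;
        for (boolean online : visibleRegions) {
            if (!online) {
                available = false;
            } else {
                regionCount++;
            }
        }
        return available && regionCount > 0;
    }

    // Sketch of the proposed overload: additionally require that exactly the
    // expected number of regions (numOfRegions) is online.
    static boolean isTableAvailable(List<Boolean> visibleRegions, int numOfRegions) {
        int online = 0;
        for (boolean o : visibleRegions) {
            if (o) {
                online++;
            }
        }
        return isTableAvailable(visibleRegions) && online == numOfRegions;
    }

    // Hypothetical bounded poll: checks the condition, sleeping between checks,
    // and gives up (returns false) after at most maxRetries sleeps.
    static boolean waitUntil(BooleanSupplier condition, int maxRetries, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 0; ; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (attempt >= maxRetries) {
                return false;
            }
            Thread.sleep(sleepMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // 3 of an expected 4 regions are visible so far, and all 3 are online.
        List<Boolean> partial = Arrays.asList(true, true, true);
        System.out.println(isTableAvailable(partial));    // true: the early exit seen in the thread
        System.out.println(isTableAvailable(partial, 4)); // false: one region is still missing

        // Bounded wait using the count-aware check; gives up after 3 retries here.
        boolean ready = waitUntil(() -> isTableAvailable(partial, 4), 3, 10);
        System.out.println("ready = " + ready);           // ready = false
    }
}
```

With three of an expected four regions visible, the simple check already reports the table as available, while the count-aware variant does not. Bounding the wait lets the caller give up and report an error instead of polling forever.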
On Wed, May 18, 2011 at 11:12 AM, Vidhyashankar Venkataraman <[email protected]> wrote:
> Thanks Ted! Will do it right away.
>
> 1. We should provide the following new API, where numOfRegions is the expected number of regions to go online:
>
> Instead of this function, I used table.getRegionsInfo() to make sure all regions were online. But that function requires a priori knowledge of the number of regions.
>
> V
> P.S.: Copy-pasting my full name could be a little tedious!
>
>
> On 5/18/11 11:02 AM, "Ted Yu" <[email protected]> wrote:
>
> Vidhyashankar:
> Please file the following JIRAs:
> 1. We should provide the following new API, where numOfRegions is the expected number of regions to go online:
> public boolean isTableAvailable(final byte[] tableName, int numOfRegions) throws IOException {
>
> 2. HBaseAdmin.createTableAsync() should check whether there are duplicate keys. Since it is a public method, we shouldn't rely solely on createTable() to perform the check.
>
> Thanks
>
> On Wed, May 18, 2011 at 10:46 AM, Vidhyashankar Venkataraman <[email protected]> wrote:
>
> > As in, the use of isTableAvailable there indicates that a bulk load should happen only if all the regions are available.
> >
> > But that may not be the case, since the function returns true if even one region (regionCount.get()>0 check) is online.
> >
> > V
> >
> >
> > On 5/17/11 7:14 PM, "Ted Yu" <[email protected]> wrote:
> >
> > Did you mean that, coming out of the following loop, the table might still be unavailable if there were many regions?
> > while (!conn.isTableAvailable(table.getTableName()) && (ctr<TABLE_CREATE_MAX_RETRIES)) {
> >
> > Cheers
> >
> > On Tue, May 17, 2011 at 7:10 PM, Ted Yu <[email protected]> wrote:
> >
> > >
> > >> Also some of the source for which we had used this function may be broken (for example in LoadIncrementalHFiles.java)
> > > Can you be more specific?
> > >
> > > Thanks
> > >
> > >
> > > On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <[email protected]> wrote:
> > >
> > >> >> For 1, the check in HCM.isTableAvailable() is:
> > >> >> return available.get() && (regionCount.get() > 0);
> > >> >> This explains why some regions aren't available.
> > >>
> > >> The javadoc says the function returns true if all regions are available. Clearly this statement is wrong, going by what is in the code. Also, some of the source in which we had used this function may be broken (for example, in LoadIncrementalHFiles.java).
> > >>
> > >> >> For 3, can you provide a unit test so that we can investigate further?
> > >>
> > >> The problem is that I am unable to get the master to crash consistently. I can send you the key split.
> > >>
> > >> Thank you
> > >> Vidhya
> > >>
> > >> On 5/17/11 4:59 PM, "Ted Yu" <[email protected]> wrote:
> > >>
> > >> For 1, the check in HCM.isTableAvailable() is:
> > >> return available.get() && (regionCount.get() > 0);
> > >> This explains why some regions aren't available.
> > >>
> > >> For 3, can you provide a unit test so that we can investigate further?
> > >>
> > >> Thanks
> > >>
> > >> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <[email protected]> wrote:
> > >>
> > >> > (Running HBase 0.90.0 on 700+ nodes.)
> > >> >
> > >> > You may have seen many (or almost all) of the following issues already:
> > >> > 1. HConnection.isTableAvailable: This doesn't seem to be working all the time. In particular, I had this code after creating a table asynchronously:
> > >> >
> > >> > do {
> > >> >     LOG.info("Table " + tableName + " not yet available... Sleeping for " + sleepTime + " milliseconds...");
> > >> >     Thread.sleep(sleepTime);
> > >> > } while (!conn.isTableAvailable(table.getTableName()));
> > >> > LOG.info("Table is available!! : " + tableName + " Available? " + conn.isTableAvailable(table.getTableName()));
> > >> >
> > >> > It comes out of the loop, but then I see this:
> > >> > Table is available!! : <TABLE> Available? false
> > >> >
> > >> > And then I see that not all the regions are available yet.
> > >> >
> > >> >
> > >> > 2. The master getting stuck, unable to delete a WAL (I have seen this before on this forum, along with a related JIRA): We had worked around it by manually deleting a WAL. But when the master crashed during table creation (with split key boundaries), the node that took over next as the master (failover) started getting stuck for around 25% of the cluster. I had to wipe out all the logs so that the master could start up right.
> > >> >
> > >> > But even then, the regionservers that had suffered the log issue couldn't recognize the failed-over master. (Is this something that has been observed before?)
> > >> >
> > >> >
> > >> > 3. createTableAsync with incorrect split keys: By mistake, I had some duplicate keys in the split key byte array while calling the createTableAsync function. The master crashed, throwing a KeeperException (because of the duplicate keys, I guess?)
> > >> >
> > >> >
> > >> > Also, can you let me know why createTableAsync blocks for some time and throws a socket timeout exception when I try creating a table with a large number of regions?
> > >> >
> > >> > Thank you
> > >> > Vidhya
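Ted's second JIRA (a duplicate-key check in HBaseAdmin.createTableAsync()) and Vidhya's item 3 both point toward validating split keys on the client before the request is sent. The sketch is a hypothetical helper, not part of the HBase API: `SplitKeyCheck`, `hasDuplicateSplitKeys` and the sample keys are illustrative names. It sorts a copy of the keys using unsigned lexicographic byte order (the order HBase uses for row keys) and flags any two equal neighbours.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

public class SplitKeyCheck {

    // Unsigned lexicographic order over byte arrays, i.e. row-key order.
    static final Comparator<byte[]> UNSIGNED_LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length; // a proper prefix sorts first
    };

    // Hypothetical client-side pre-flight check: true if any split key appears more than once.
    static boolean hasDuplicateSplitKeys(byte[][] splitKeys) {
        byte[][] sorted = splitKeys.clone(); // shallow copy; the caller's array order is untouched
        Arrays.sort(sorted, UNSIGNED_LEXICOGRAPHIC);
        for (int i = 1; i < sorted.length; i++) {
            // After sorting, equal keys are adjacent.
            if (UNSIGNED_LEXICOGRAPHIC.compare(sorted[i - 1], sorted[i]) == 0) {
                return true;
            }
        }
        return false;
    }

    private static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] splits = { key("row-100"), key("row-200"), key("row-100") }; // accidental duplicate
        if (hasDuplicateSplitKeys(splits)) {
            // Hypothetical handling: refuse to submit rather than let the request reach the master.
            System.out.println("Refusing to call createTableAsync: duplicate split keys");
        } else {
            System.out.println("Split keys OK");
        }
    }
}
```

An explicit comparator is needed because Java arrays inherit identity-based `equals` and `hashCode` from `Object`, so dropping raw `byte[]` keys into a `HashSet` would not detect duplicates. Sorting a copy costs O(n log n) and leaves the caller's array as it was.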
