>> Also some of the source for which we had used this function may be broken
>> (for example in LoadIncrementalHFiles.java)

Can you be more specific?
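
In the meantime, rather than relying on isTableAvailable() alone, you could
count the deployed regions for the table yourself after createTableAsync()
returns. Here is a rough, untested sketch against the 0.90 client API (the
class/method names and the expectedRegions/sleepTime parameters are just
placeholders; for a pre-split table, expectedRegions would be
splitKeys.length + 1):

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;

public class WaitForTable {
  // Sketch only: poll .META. until the expected number of regions for the
  // table is online and assigned to a server.
  public static void waitForAllRegions(Configuration conf, String tableName,
      int expectedRegions, long sleepTime)
      throws IOException, InterruptedException {
    HTable table = new HTable(conf, tableName);
    try {
      while (true) {
        // getRegionsInfo() scans .META. and returns region -> server
        // assignments for this table.
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        int assigned = 0;
        for (Map.Entry<HRegionInfo, HServerAddress> e : regions.entrySet()) {
          if (!e.getKey().isOffline() && e.getValue() != null) {
            assigned++;
          }
        }
        if (assigned >= expectedRegions) {
          return;
        }
        Thread.sleep(sleepTime);
      }
    } finally {
      table.close();
    }
  }
}

Note that getRegionsInfo() goes back to .META. on every call, so keep the
polling interval fairly large on a cluster of your size.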

Thanks

On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
[email protected]> wrote:

> >> For 1, the check in HCM.isTableAvailable() is:
> >> return available.get() && (regionCount.get() > 0);
> >> This explains why some regions aren't available.
>
> The javadoc says the function returns true if all regions are available.
> Clearly that statement is wrong going by what is in the code. Also some of
> the source for which we had used this function may be broken (for example
> in LoadIncrementalHFiles.java).
>
> >> For 3, can you provide a unit test so that we can investigate further ?
>
> The problem is I am unable to get the master to crash consistently. I can
> send you the split keys.
>
> Thank you
> Vidhya
>
> On 5/17/11 4:59 PM, "Ted Yu" <[email protected]> wrote:
>
> For 1, the check in HCM.isTableAvailable() is:
> return available.get() && (regionCount.get() > 0);
> This explains why some regions aren't available.
>
> For 3, can you provide a unit test so that we can investigate further ?
>
> Thanks
>
> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
> [email protected]> wrote:
>
> > (Running HBase 0.90.0 on 700+ nodes.)
> >
> > You may have seen many (or mostly all) of the following issues already:
> >
> > 1. HConnection.isTableAvailable: This doesn't seem to be working all the
> > time. In particular, I had this code after creating a table
> > asynchronously:
> >
> > do {
> >   LOG.info("Table " + tableName + " not yet available... Sleeping for "
> >       + sleepTime + " milliseconds...");
> >   Thread.sleep(sleepTime);
> > } while (!conn.isTableAvailable(table.getTableName()));
> > LOG.info("Table is available!! : " + tableName + " Available? "
> >     + conn.isTableAvailable(table.getTableName()));
> >
> > It comes out of the loop but then I see this:
> >
> > Table is available!! : <TABLE> Available? false
> >
> > And then I see that not all the regions are yet available.
> >
> > 2. The master getting stuck, unable to delete a WAL (I have seen this
> > before on this forum and a related JIRA on this one): We had worked
> > around it by manually deleting the WAL. But when the master crashed
> > during table creation (with split key boundaries), the node that took
> > over next as the master (failover) started getting stuck for around 25%
> > of the cluster. I had to wipe out all the logs so that the master could
> > start up right.
> >
> > But even then, the region servers which had suffered the log issue
> > couldn't recognize the failed-over master. (Is this something that has
> > been observed before?)
> >
> > 3. createTableAsync with incorrect split keys: By mistake, I had some
> > duplicate keys in the split key byte array while calling the
> > createTableAsync function. The master crashed throwing a KeeperException
> > (due to the duplicate keys, I guess?)
> >
> > Also, can you let me know why createTableAsync blocks for some time and
> > throws a socket timeout exception when I try creating a table with a
> > large number of regions?
> >
> > Thank you
> > Vidhya
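
Regarding 3: until there is a proper check on the master side, a small
client-side guard can reject a bad split-key array before it ever reaches
createTableAsync(). A minimal, untested sketch (class and method names are
just for illustration):

import java.util.TreeSet;

import org.apache.hadoop.hbase.util.Bytes;

public class SplitKeyCheck {
  // Sketch only: sort the split keys and fail fast on empty or duplicate
  // keys, returning a sorted copy to hand to createTableAsync().
  public static byte[][] validateSplitKeys(byte[][] splitKeys) {
    TreeSet<byte[]> sorted = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    for (byte[] key : splitKeys) {
      if (key == null || key.length == 0) {
        throw new IllegalArgumentException("Empty split key is not allowed");
      }
      if (!sorted.add(key)) {
        // add() returns false when an equal key is already in the set.
        throw new IllegalArgumentException(
            "Duplicate split key: " + Bytes.toStringBinary(key));
      }
    }
    return sorted.toArray(new byte[sorted.size()][]);
  }
}

Passing the sorted, de-duplicated array along also removes any dependence on
the order in which the caller happened to build the keys.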
