As in, the use of isTableAvailable there implies that a bulk load should happen
only if all of the table's regions are available. But that may not be the case,
since the function returns true if even a single region is online (the
regionCount.get() > 0 check).

V
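Concretely, the kind of check bulk load really wants is something like the
sketch below: poll .META. until the table shows the expected number of regions
(splitKeys.length + 1 for a pre-split table) rather than trusting
isTableAvailable(). This is untested and only illustrative against the 0.90
client API; waitForAllRegions, expectedRegions and sleepMs are made-up names,
and the check still says nothing about whether each region is actually
assigned to a server.

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HServerAddress;
import org.apache.hadoop.hbase.client.HTable;

public class RegionAvailabilityCheck {

  /**
   * Polls .META. until the table shows the expected number of regions,
   * instead of relying on HConnection.isTableAvailable().
   */
  public static void waitForAllRegions(Configuration conf, byte[] tableName,
      int expectedRegions, long sleepMs)
      throws IOException, InterruptedException {
    HTable table = new HTable(conf, tableName);
    try {
      while (true) {
        // getRegionsInfo() scans .META. and returns the region -> server map
        // for this table.
        Map<HRegionInfo, HServerAddress> regions = table.getRegionsInfo();
        if (regions.size() >= expectedRegions) {
          // All expected regions show up in .META.; note this does not
          // verify that each region is currently assigned to a server.
          return;
        }
        Thread.sleep(sleepMs);
      }
    } finally {
      table.close();
    }
  }
}

A stricter version could also require a non-empty server column for every
region before returning.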
On 5/17/11 7:14 PM, "Ted Yu" <[email protected]> wrote:

Did you mean that coming out of the following loop, the table might still
be unavailable if there were many regions ?

    while (!conn.isTableAvailable(table.getTableName()) &&
        (ctr < TABLE_CREATE_MAX_RETRIES)) {

Cheers

On Tue, May 17, 2011 at 7:10 PM, Ted Yu <[email protected]> wrote:

> >> Also some of the source for which we had used this function may be
> >> broken (for example in LoadIncrementalHFiles.java)
>
> Can you be more specific ?
>
> Thanks
>
> On Tue, May 17, 2011 at 5:54 PM, Vidhyashankar Venkataraman <
> [email protected]> wrote:
>
>> For 1, the check in HCM.isTableAvailable() is:
>>
>>     return available.get() && (regionCount.get() > 0);
>>
>> This explains why some regions aren't available.
>>
>> The javadoc says the function returns true if all regions are
>> available. Clearly this statement is wrong going by what is in the
>> code. Also, some of the source for which we had used this function may
>> be broken (for example in LoadIncrementalHFiles.java).
>>
>> For 3, can you provide a unit test so that we can investigate further ?
>>
>> The problem is I am unable to get the master to crash consistently. I
>> can send you the split keys.
>>
>> Thank you
>> Vidhya
>>
>> On 5/17/11 4:59 PM, "Ted Yu" <[email protected]> wrote:
>>
>> For 1, the check in HCM.isTableAvailable() is:
>>
>>     return available.get() && (regionCount.get() > 0);
>>
>> This explains why some regions aren't available.
>>
>> For 3, can you provide a unit test so that we can investigate further ?
>>
>> Thanks
>>
>> On Tue, May 17, 2011 at 4:25 PM, Vidhyashankar Venkataraman <
>> [email protected]> wrote:
>>
>> > (Running HBase 0.90.0 on 700+ nodes.)
>> >
>> > You may have seen many (or mostly all) of the following issues
>> > already:
>> >
>> > 1. HConnection.isTableAvailable: This doesn't seem to be working all
>> > the time. In particular, I had this code after creating a table
>> > asynchronously:
>> >
>> >     do {
>> >       LOG.info("Table " + tableName + " not yet available... Sleeping for "
>> >           + sleepTime + " milliseconds...");
>> >       Thread.sleep(sleepTime);
>> >     } while (!conn.isTableAvailable(table.getTableName()));
>> >     LOG.info("Table is available!! : " + tableName + " Available? "
>> >         + conn.isTableAvailable(table.getTableName()));
>> >
>> > It comes out of the loop, but then I see this:
>> >
>> >     Table is available!! : <TABLE> Available? false
>> >
>> > And then I see that not all the regions are available yet.
>> >
>> > 2. The master getting stuck, unable to delete a WAL (I have seen this
>> > before on this forum, and a related JIRA on this one): We had worked
>> > around it by manually deleting a WAL. But when the master crashed
>> > during table creation (with split key boundaries), the node that took
>> > over next as the master (failover) started getting stuck for around
>> > 25% of the cluster. I had to wipe out all the logs so that the master
>> > could start up right.
>> >
>> > But even then, the regionservers which had suffered the log issue
>> > couldn't recognize the failed-over master. (Is this something that
>> > has been observed before?)
>> >
>> > 3. createTableAsync with incorrect split keys: By mistake, I had some
>> > duplicate keys in the split key byte array while calling the
>> > createTableAsync function. The master crashed throwing a
>> > KeeperException (thanks to the duplicate keys, I guess?)
>> >
>> > Also, can you let me know why createTableAsync blocks for some time
>> > and throws a socket timeout exception when I try creating a table
>> > with a large number of regions?
>> >
>> > Thank you
>> > Vidhya
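One more thought on issue 3 in the quoted mail above: until the master handles
bad split arrays more gracefully, the client can at least protect itself by
sorting and de-duplicating the keys before calling createTableAsync. A rough,
untested sketch (createTablePreSplit is just an illustrative name; HBaseAdmin
and Bytes are the standard 0.90 client classes):

import java.io.IOException;
import java.util.TreeSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class SafeCreateTable {

  /** Sorts the split keys and drops duplicates before the async create. */
  public static void createTablePreSplit(Configuration conf,
      HTableDescriptor desc, byte[][] splitKeys) throws IOException {
    // A TreeSet over the lexicographic byte[] comparator removes duplicate
    // keys and yields them in sorted order.
    TreeSet<byte[]> distinct = new TreeSet<byte[]>(Bytes.BYTES_COMPARATOR);
    for (byte[] key : splitKeys) {
      if (key != null && key.length > 0) {  // skip null/empty split points
        distinct.add(key);
      }
    }
    byte[][] cleaned = distinct.toArray(new byte[distinct.size()][]);

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTableAsync(desc, cleaned);
  }
}

This obviously doesn't fix the KeeperException on the master side; it just
keeps the client from triggering it.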
