Re: Put errors via thrift

Chris Tarnas Tue, 15 Feb 2011 14:48:42 -0800

Thanks for the help. It definitely looks like the move to 0.90 would resolve 
many of these issues.


-chris

On Feb 15, 2011, at 2:33 PM, Jean-Daniel Cryans wrote:

> That would make sense... although I've done testing and the more files
> you have to split, the longer it takes to create the reference files
> so the longer the split. Now that I think of it, with your high
> blocking store files setting, you may be running into an extreme case
> of https://issues.apache.org/jira/browse/HBASE-3308
> 
> J-D
> 
> On Tue, Feb 15, 2011 at 2:27 PM, Chris Tarnas <[email protected]> wrote:
>> No swapping, and about 30% of the total CPU is idle. Looking through Ganglia I 
>> do see a spike in cpu_wio at that time, but only to 2%. My suspicion, though, 
>> is that GZ compression is just taking a while.
>> 
>> 
>> 
>> On Feb 15, 2011, at 2:10 PM, Jean-Daniel Cryans wrote:
>> 
>>> Yeah if it's the same key space that splits, it could explain the
>>> issue... 65 seconds is a long time! Is there any swapping going on?
>>> CPU or IO starvation?
>>> 
>>> In that context I don't see any problem setting the pausing time higher.
>>> 
>>> J-D
>>> 
>>> On Tue, Feb 15, 2011 at 1:54 PM, Chris Tarnas <[email protected]> wrote:
>>>> Hi JD,
>>>> 
>>>> Two splits happened within 90 seconds of each other on one server - one 
>>>> took 65 seconds, the next took 43 seconds. With only a 10 second timeout 
>>>> (10 tries, 1 second between) I think that was the issue. Are there any 
>>>> hidden issues with raising those retry parameters so I can withstand a 120 
>>>> second pause?
>>>> 
>>>> thanks,
>>>> -chris
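
The retry arithmetic in the message above can be sketched as follows. This is a minimal illustration, assuming the client sleeps a fixed pause after every failed attempt; the thread does not name the settings involved (they are presumably `hbase.client.retries.number` and `hbase.client.pause`), and a client that applies backoff multipliers would wait longer than this estimate. The function names are hypothetical.

```python
import math


def worst_case_wait(retries: int, pause_seconds: float) -> float:
    """Total time a client keeps retrying, assuming a fixed pause per attempt."""
    return retries * pause_seconds


def retries_needed(target_seconds: float, pause_seconds: float) -> int:
    """Smallest retry count that rides out a stall of target_seconds."""
    return math.ceil(target_seconds / pause_seconds)


def pause_needed(target_seconds: float, retries: int) -> float:
    """Pause per attempt that rides out target_seconds with a fixed retry count."""
    return target_seconds / retries


# Figures taken from the thread: 10 tries, 1 second apart, splits of 65 s and 43 s.
current_budget = worst_case_wait(10, 1.0)
longest_split = 65

print("current budget (s):", current_budget)
print("covers longest split:", current_budget >= longest_split)
print("retries for 120 s at 1 s pause:", retries_needed(120, 1.0))
print("pause for 120 s with 10 retries:", pause_needed(120, 10))
```

Under that assumption, either knob reaches the 120 second target: keep the 1 second pause and raise the retry count, or keep 10 tries and lengthen the pause.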
>>>> 
>>>> On Feb 15, 2011, at 1:37 PM, Chris Tarnas wrote:
>>>> 
>>>>> 
>>>>> On Feb 15, 2011, at 11:32 AM, Jean-Daniel Cryans wrote:
>>>>> 
>>>>>> On Tue, Feb 15, 2011 at 11:24 AM, Chris Tarnas <[email protected]> wrote:
>>>>>>> We are definitely considering writing a bulk loader, but because this 
>>>>>>> load is part of an existing processing pipeline that is not Java and 
>>>>>>> does not fit the importtsv tool (we use column names as data as well), 
>>>>>>> we have not done it yet. I do foresee a Java bulk loader in our future 
>>>>>>> though.
>>>>>> 
>>>>>> Well I was referring to THE bulk loader: 
>>>>>> http://hbase.apache.org/bulk-loads.html
>>>>>> 
>>>>> 
>>>>> It has the same problem really for us. Also - does that need 0.92 for 
>>>>> multi-column support? I'm pretty sure we will be moving to a bulk loader 
>>>>> soon.
>>>>> 
>>>>>>> 
>>>>>>> Does the shell expose the createTable method that defines the number of 
>>>>>>> columns? (Or I suppose I'll probably need to brush up on my JRuby...) 
>>>>>>> Splits were definitely happening then. Currently I'm using 1GB regions, 
>>>>>>> I'll probably go larger (~5GB) and salt my keys to distribute them better.
>>>>>> 
>>>>>> I don't think that method is in the shell, it'd be weird anyway to
>>>>>> write down hundreds of bytes in the shell IMO... Do you see region
>>>>>> hotspots? If so, definitely solve the key distribution as it's going
>>>>>> to kill your performance. Bigger regions won't really help if you're
>>>>>> still always writing to the same few ones.
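
Key salting, as mentioned in the exchange above, can be sketched like this. It is only an illustration of the general technique, not the scheme the posters used: the bucket count of 16, the two-digit prefix format, and the function name are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def salted_key(row_key: bytes, buckets: int = 16) -> bytes:
    """Prefix row_key with a bucket number derived from a hash of the key.

    The prefix is deterministic, so a reader that knows the original key
    can recompute it for point lookups.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("buckets must fit in a two-digit prefix")
    bucket = hashlib.md5(row_key).digest()[0] % buckets
    return b"%02d-" % bucket + row_key


# Sequential keys would otherwise all land in the same (last) region.
keys = [b"load-2011-02-15-%06d" % i for i in range(1000)]
spread = Counter(salted_key(k)[:2] for k in keys)
print(sorted(spread.items()))
```

The trade-off is on the read side: a range scan over the original key order has to be issued once per bucket and merged, so the bucket count should stay small.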
>>>>>> 
>>>>> 
>>>>> We use schema files that we redirect into the shell like DDL. My other 
>>>>> reason to go to larger regions is that we are going to have lots of older data 
>>>>> as well. The top few loads will be hot and used most often but we do need 
>>>>> access to the older data as well. I foresee up to about 2-4 billion rows 
>>>>> a week, so at the rate we are creating these tables that would be quite a 
>>>>> few regions per server at 1GB regions.
>>>>> 
>>>>>>> 
>>>>>>> The reason I had thought it might be compaction related is I saw that 
>>>>>>> we had hit the hbase.hstore.blockingStoreFiles limit as well as having 
>>>>>>> the timeout expire.
>>>>>>> 
>>>>>> 
>>>>>> Well the writes would block on flushing, so unless all the handlers
>>>>>> are filled then you shouldn't see retries exhausted. You could grep
>>>>>> your logs to see how long the splits took btw, but the total locking
>>>>>> time isn't exactly that time... it's less than that. 0.90.1 would
>>>>>> definitely help here.
>>>>>> 
>>>>> 
>>>>> Most splits look to be about 5-7 seconds. I'll investigate more around 
>>>>> the error times and see if any were longer.
>>>>> 
>>>>> We'll be upgrading next week.
>>>>> 
>>>>> Thanks again!
>>>>> -chris
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>>
