Re: Scan only talks to a single region server

Whitney Sorenson Tue, 17 Jul 2012 10:47:31 -0700

The code is pasted above; here it is again:

    ResultScanner rs = table.getScanner(family, qualifier);
    for (Result r : rs) {
        // do something
    }


A ResultScanner is Iterable, which means you can use it in a for-each
loop. In addition, the debug logs indicate that the scanner only ever
retrieves rows from the first region server.
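
To make the expectation concrete, here is a minimal sketch of the behavior
I would expect from the for-each loop above: the iterator should move on to
the next region when the current one is exhausted, rather than stopping
after the first. This is a toy stand-in in plain Java, not HBase's client
code; the class name MultiRegionScanSketch, the countRows helper, and the
use of String rows are all hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model of a scanner over several regions. NOT HBase's implementation;
// every name in this sketch is made up for illustration.
class MultiRegionScanSketch implements Iterable<String> {
    // Each inner list holds the rows of one (hypothetical) region, in order.
    private final List<List<String>> regions;

    MultiRegionScanSketch(List<List<String>> regions) {
        this.regions = regions;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int region = 0; // index of the region currently being read
            private int row = 0;    // index of the next row within that region

            @Override
            public boolean hasNext() {
                // Skip exhausted or empty regions instead of ending the scan.
                while (region < regions.size()
                        && row >= regions.get(region).size()) {
                    region++;
                    row = 0;
                }
                return region < regions.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return regions.get(region).get(row++);
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    // Counts rows with the same for-each shape as the loop in this thread.
    static int countRows(Iterable<String> scanner) {
        int n = 0;
        for (String r : scanner) {
            n++;
        }
        return n;
    }
}
```

With regions of 2, 0, and 1 rows, countRows should visit all 3 rows. What I
am seeing from the real scanner corresponds to the loop ending once the
first region's rows run out.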

On Tue, Jul 17, 2012 at 12:02 PM, Alex Baranau <[email protected]> wrote:
>> How do you create your scan(ner)? Could you paste the code here?
>
> Sorry, I meant to ask how you instantiate the HTable and Configuration objects.
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
>
> On Tue, Jul 17, 2012 at 11:37 AM, Alex Baranau <[email protected]> wrote:
>
>> > this scan is running
>> > inside a map task
>>
>> How do you create your scan(ner)? Could you paste the code here?
>>
>> As you may know, when an HBase table is used as the source for a MapReduce
>> job (via the standard configuration), each Map task consumes data from one
>> region (among other things, this lets it benefit from data locality). That
>> is, it creates one Map task per region. I wonder if this could be related.
>>
>> Sorry for the obvious check...
>>
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
>> Solr
>>
>> On Tue, Jul 17, 2012 at 11:11 AM, Whitney Sorenson <[email protected]> wrote:
>>
>>> I'm trying to scan across an entire table (using only a specific
>>> family or family + qualifier).
>>>
>>> I've tried various methods but I can only get this scan to touch the
>>> first region server. Afterwards, it stops processing. Issuing the same
>>> scan in the shell works (returns 50,000 rows) whereas the Scan made
>>> from Java only returns ~4000 rows.
>>>
>>> I've tried adding/removing start/stop rows, using getScanner(family,
>>> column) vs getScanner(scan), and restarting the region servers which
>>> host the 1st and 2nd regions.
>>>
>>> The debug output from the scan shows that it knows about locations for
>>> each region; however, it calls close after the first region.
>>>
>>> In the simplest case, the code looks like:
>>>
>>>     ResultScanner rs = table.getScanner(family, qualifier);
>>>     for (Result r : rs) {
>>>         // do something
>>>     }
>>>
>>> Any ideas or known issues? (0.90.4-cdh3u2 - this scan is running
>>> inside a map task)
>>>
>>> I figure the next step is to walk through the client scanner code
>>> locally in a Java main, but I haven't done this yet.
>>>
>>
>>
>>
>> --
>> Alex Baranau
>> ------
>> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
>> Solr
>>
>>
>
>
> --
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
> Solr
