The reason I asked about HBaseURLsDaysAggregator.java was that I see no
HBase (client) code in the call stack.
I have little clue about the problem you experienced.

There may be more than one connection to ZooKeeper from a single map task,
so it wouldn't hurt to increase hbase.zookeeper.property.maxClientCnxns.
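
A minimal sketch of that change, assuming an HBase-managed quorum: the
property is read by the ZooKeeper servers, so it belongs in hbase-site.xml
on the quorum nodes (the value below is illustrative), followed by a
restart of those ZooKeeper processes:

  <!-- hbase-site.xml on the ZooKeeper quorum nodes -->
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>1000</value>
  </property>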

Cheers

On Mon, Jul 4, 2011 at 9:47 AM, Lior Schachter <[email protected]> wrote:

> 1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131 are
> not important, since the tasks got stuck even when I removed all my map code
> (though the thread dumps were generated after I restored the code). If you
> think it's important, I'll remove the map code again and regenerate the
> thread dumps...
>
> 2. 82 maps were launched but only 36 ran simultaneously.
>
> 3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it?
>
> Thanks,
> Lior
>
>
> On Mon, Jul 4, 2011 at 7:33 PM, Ted Yu <[email protected]> wrote:
>
> > In the future, please provide the full dump via pastebin.com and put
> > only a snippet of the log in the email.
> >
> > Can you tell us what the following lines are about?
> > HBaseURLsDaysAggregator.java:124
> > HBaseURLsDaysAggregator.java:131
> >
> > How many mappers were launched?
> >
> > What value is used for hbase.zookeeper.property.maxClientCnxns?
> > You may need to increase the value of the above setting.
> >
> > On Mon, Jul 4, 2011 at 9:26 AM, Lior Schachter <[email protected]> wrote:
> >
> > > I used kill -3; the thread dump follows:
> > >
> > > ...
> > >
> > >
> > > On Mon, Jul 4, 2011 at 6:22 PM, Ted Yu <[email protected]> wrote:
> > >
> > > > I wasn't clear in my previous email.
> > > > That was not an answer to why the map tasks got stuck;
> > > > TableInputFormatBase.getSplits() is being called already.
> > > >
> > > > Can you try getting a jstack of one of the map tasks before the task
> > > > tracker kills it?
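> > > >
> > > > A minimal sketch of grabbing that dump, assuming you can ssh to the
> > > > node running the stuck attempt (the grep pattern just narrows the
> > > > listing to the child task JVMs):
> > > >
> > > > $ ps aux | grep attempt_          # find the pid of the stuck child JVM
> > > > $ jstack <pid> > map-task.jstack  # or kill -3 <pid>; that dump lands in the task's stdout log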
> > > >
> > > > Thanks
> > > >
> > > > On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter <[email protected]> wrote:
> > > >
> > > > > 1. Currently every map gets one region, so I don't understand what
> > > > > difference using the splits will make.
> > > > > 2. How should I use TableInputFormatBase.getSplits()? I could not
> > > > > find examples for that.
> > > > >
> > > > > Thanks,
> > > > > Lior
> > > > >
> > > > >
> > > > > On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu <[email protected]> wrote:
> > > > >
> > > > > > For #2, see TableInputFormatBase.getSplits():
> > > > > >   * Calculates the splits that will serve as input for the map tasks.
> > > > > >   * The number of splits matches the number of regions in a table.
> > > > > >
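> > > > > > Since initTableMapperJob() wires in TableInputFormat, which extends
> > > > > > TableInputFormatBase, getSplits() already runs for you and yields one
> > > > > > split per region. A minimal sketch of where it sits; the subclass and
> > > > > > its logging are hypothetical, added only to make the splits visible:
> > > > > >
> > > > > > import java.io.IOException;
> > > > > > import java.util.List;
> > > > > > import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
> > > > > > import org.apache.hadoop.mapreduce.InputSplit;
> > > > > > import org.apache.hadoop.mapreduce.JobContext;
> > > > > >
> > > > > > public class LoggingTableInputFormat extends TableInputFormat {
> > > > > >   @Override
> > > > > >   public List<InputSplit> getSplits(JobContext context) throws IOException {
> > > > > >     // TableInputFormatBase computes one split per region of the table.
> > > > > >     List<InputSplit> splits = super.getSplits(context);
> > > > > >     System.out.println(splits.size() + " splits, one per region");
> > > > > >     return splits;
> > > > > >   }
> > > > > > }
> > > > > >
> > > > > > You would point the job at it with
> > > > > > job.setInputFormatClass(LoggingTableInputFormat.class) after calling
> > > > > > initTableMapperJob().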
> > > > > >
> > > > > > On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter <[email protected]> wrote:
> > > > > >
> > > > > > > 1. yes - I configure my job using this line:
> > > > > > >
> > > > > > > TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME,
> > > > > > >     scan, ScanMapper.class, Text.class, MapWritable.class, job);
> > > > > > >
> > > > > > > which internally uses TableInputFormat.class
> > > > > > >
> > > > > > > 2. One split per region? What do you mean? How do I do that?
> > > > > > >
> > > > > > > 3. hbase version 0.90.2
> > > > > > >
> > > > > > > 4. no exceptions. the logs are very clean.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jul 4, 2011 at 5:22 PM, Ted Yu <[email protected]> wrote:
> > > > > > >
> > > > > > > > Do you use TableInputFormat?
> > > > > > > > To scan a large number of rows, it would be better to produce one
> > > > > > > > split per region.
> > > > > > > >
> > > > > > > > What HBase version do you use?
> > > > > > > > Do you find any exceptions in the master / region server logs around
> > > > > > > > the moment of the timeout?
> > > > > > > >
> > > > > > > > Cheers
> > > > > > > >
> > > > > > > > On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > > I'm running a scan using the M/R framework.
> > > > > > > > > My table contains hundreds of millions of rows, and I'm scanning
> > > > > > > > > about 50 million of them using a start/stop key.
> > > > > > > > >
> > > > > > > > > The problem is that some map tasks get stuck, and the task tracker
> > > > > > > > > kills these maps after 600 seconds. When the task is retried,
> > > > > > > > > everything works fine (sometimes).
> > > > > > > > >
> > > > > > > > > To verify that the problem is in HBase (and not in the map code), I
> > > > > > > > > removed all the code from my map function, so it looks like this:
> > > > > > > > >
> > > > > > > > > public void map(ImmutableBytesWritable key, Result value,
> > > > > > > > >     Context context) throws IOException, InterruptedException {
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Also, when a map got stuck on a region, I tried to scan that region
> > > > > > > > > (using a simple scan from a Java main) and it worked fine.
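> > > > > > > > >
> > > > > > > > > For reference, a minimal sketch of that standalone check against the
> > > > > > > > > 0.90.x client API; the table name and keys below are placeholders:
> > > > > > > > >
> > > > > > > > > import org.apache.hadoop.conf.Configuration;
> > > > > > > > > import org.apache.hadoop.hbase.HBaseConfiguration;
> > > > > > > > > import org.apache.hadoop.hbase.client.HTable;
> > > > > > > > > import org.apache.hadoop.hbase.client.Result;
> > > > > > > > > import org.apache.hadoop.hbase.client.ResultScanner;
> > > > > > > > > import org.apache.hadoop.hbase.client.Scan;
> > > > > > > > > import org.apache.hadoop.hbase.util.Bytes;
> > > > > > > > >
> > > > > > > > > public class RegionScanCheck {
> > > > > > > > >   public static void main(String[] args) throws Exception {
> > > > > > > > >     Configuration conf = HBaseConfiguration.create();
> > > > > > > > >     HTable table = new HTable(conf, "urls");         // placeholder table
> > > > > > > > >     Scan scan = new Scan(Bytes.toBytes("startKey"),  // region start key
> > > > > > > > >         Bytes.toBytes("stopKey"));                   // region end key
> > > > > > > > >     ResultScanner scanner = table.getScanner(scan);
> > > > > > > > >     try {
> > > > > > > > >       int count = 0;
> > > > > > > > >       for (Result r : scanner) {
> > > > > > > > >         count++;  // just walk the region to confirm it scans cleanly
> > > > > > > > >       }
> > > > > > > > >       System.out.println("scanned " + count + " rows");
> > > > > > > > >     } finally {
> > > > > > > > >       scanner.close();
> > > > > > > > >       table.close();
> > > > > > > > >     }
> > > > > > > > >   }
> > > > > > > > > }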
> > > > > > > > >
> > > > > > > > > Any ideas?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Lior
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
