From the master UI, click 'zk dump' (:60010/zk.jsp) to see the active connections. Check whether the count reaches 300 while the map tasks run.
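If the count does approach the limit, the setting can be raised in hbase-site.xml (picked up by the HBase-managed ZooKeeper quorum on restart). A sketch; 1000 is only an illustrative value, not a recommendation:

```xml
<!-- hbase-site.xml: cap on concurrent connections from a single client IP
     to each ZooKeeper server. 300 is the value reported in this thread;
     the value below is just an example increase. -->
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>1000</value>
</property>
```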
On Mon, Jul 4, 2011 at 10:12 AM, Ted Yu <[email protected]> wrote:
> The reason I asked about HBaseURLsDaysAggregator.java was that I see no
> HBase (client) code in the call stack.
> I have little clue about the problem you experienced.
>
> There may be more than one connection to ZooKeeper from one map task,
> so it doesn't hurt to increase hbase.zookeeper.property.maxClientCnxns.
>
> Cheers
>
> On Mon, Jul 4, 2011 at 9:47 AM, Lior Schachter <[email protected]> wrote:
>
>> 1. HBaseURLsDaysAggregator.java:124 and HBaseURLsDaysAggregator.java:131
>> are not important, since the tasks got stuck even when I removed all my
>> map code (but the thread dumps were generated after I restored the
>> code). If you think it's important, I'll remove the map code again and
>> re-generate the thread dumps.
>>
>> 2. 82 maps were launched, but only 36 ran simultaneously.
>>
>> 3. hbase.zookeeper.property.maxClientCnxns = 300. Should I increase it?
>>
>> Thanks,
>> Lior
>>
>> On Mon, Jul 4, 2011 at 7:33 PM, Ted Yu <[email protected]> wrote:
>>
>>> In the future, please provide the full dump using pastebin.com and put
>>> a snippet of the log in the email.
>>>
>>> Can you tell us what the following lines are about?
>>> HBaseURLsDaysAggregator.java:124
>>> HBaseURLsDaysAggregator.java:131
>>>
>>> How many mappers were launched?
>>>
>>> What value is used for hbase.zookeeper.property.maxClientCnxns?
>>> You may need to increase the value of that setting.
>>>
>>> On Mon, Jul 4, 2011 at 9:26 AM, Lior Schachter <[email protected]> wrote:
>>>
>>>> I used kill -3; the thread dump follows:
>>>>
>>>> ...
>>>>
>>>> On Mon, Jul 4, 2011 at 6:22 PM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> I wasn't clear in my previous email.
>>>>> It was not an answer to why the map tasks got stuck.
>>>>> TableInputFormatBase.getSplits() is being called already.
>>>>>
>>>>> Can you try getting a jstack of one of the map tasks before the task
>>>>> tracker kills it?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Mon, Jul 4, 2011 at 8:15 AM, Lior Schachter <[email protected]> wrote:
>>>>>
>>>>>> 1. Currently every map gets one region, so I don't understand what
>>>>>> difference using the splits will make.
>>>>>> 2. How should I use TableInputFormatBase.getSplits()? I could not
>>>>>> find examples for that.
>>>>>>
>>>>>> Thanks,
>>>>>> Lior
>>>>>>
>>>>>> On Mon, Jul 4, 2011 at 5:55 PM, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>>> For #2, see TableInputFormatBase.getSplits():
>>>>>>> * Calculates the splits that will serve as input for the map tasks.
>>>>>>> * The number of splits matches the number of regions in a table.
>>>>>>>
>>>>>>> On Mon, Jul 4, 2011 at 7:37 AM, Lior Schachter <[email protected]> wrote:
>>>>>>>
>>>>>>>> 1. Yes - I configure my job using this line:
>>>>>>>> TableMapReduceUtil.initTableMapperJob(HBaseConsts.URLS_TABLE_NAME,
>>>>>>>>     scan, ScanMapper.class, Text.class, MapWritable.class, job)
>>>>>>>> which internally uses TableInputFormat.class.
>>>>>>>>
>>>>>>>> 2. One split per region? What do you mean? How do I do that?
>>>>>>>>
>>>>>>>> 3. HBase version 0.90.2.
>>>>>>>>
>>>>>>>> 4. No exceptions - the logs are very clean.
>>>>>>>>
>>>>>>>> On Mon, Jul 4, 2011 at 5:22 PM, Ted Yu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Do you use TableInputFormat?
>>>>>>>>> To scan a large number of rows, it would be better to produce one
>>>>>>>>> split per region.
>>>>>>>>>
>>>>>>>>> What HBase version do you use?
>>>>>>>>> Do you find any exception in the master / region server logs
>>>>>>>>> around the moment of the timeout?
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> On Mon, Jul 4, 2011 at 4:48 AM, Lior Schachter <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I'm running a scan using the M/R framework.
>>>>>>>>>> My table contains hundreds of millions of rows, and I'm scanning
>>>>>>>>>> about 50 million rows using a start/stop key.
>>>>>>>>>>
>>>>>>>>>> The problem is that some map tasks get stuck, and the task
>>>>>>>>>> tracker kills these maps after 600 seconds. When the task is
>>>>>>>>>> retried, everything works fine (sometimes).
>>>>>>>>>>
>>>>>>>>>> To verify that the problem is in HBase (and not in the map
>>>>>>>>>> code), I removed all the code from my map function, so it looks
>>>>>>>>>> like this:
>>>>>>>>>>
>>>>>>>>>> public void map(ImmutableBytesWritable key, Result value,
>>>>>>>>>>     Context context) throws IOException, InterruptedException {
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Also, when the map got stuck on a region, I tried to scan this
>>>>>>>>>> region (using a simple scan from a Java main) and it worked
>>>>>>>>>> fine.
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Lior
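For reference, a minimal sketch of the job setup discussed in this thread: a no-op TableMapper driven by TableMapReduceUtil.initTableMapperJob over a bounded Scan, 0.90-era API. The class name, table name, and row-key bounds below are illustrative, not the actual code from the thread, and the block needs the HBase client jars on the classpath plus a running cluster, so it is a sketch rather than something runnable standalone.

```java
// Sketch of the M/R scan from the thread (illustrative names; requires
// the HBase 0.90.x client on the classpath and a live cluster).
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ScanJobSketch {

  // Empty mapper, as in the original isolation test: if tasks still
  // stall, the problem is in the HBase scan itself, not in user code.
  static class ScanMapper extends TableMapper<Text, MapWritable> {
    @Override
    public void map(ImmutableBytesWritable key, Result value, Context context)
        throws IOException, InterruptedException {
      // intentionally empty
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "scan-sketch");
    job.setJarByClass(ScanJobSketch.class);

    // Hypothetical start/stop keys bounding the ~50M-row range.
    Scan scan = new Scan(Bytes.toBytes("start-key"), Bytes.toBytes("stop-key"));
    scan.setCaching(500);        // rows fetched per RPC; tune with care -
                                 // each batch must arrive within the
                                 // scanner/task timeouts
    scan.setCacheBlocks(false);  // generally recommended for MR scans

    // TableInputFormat (used internally here) already produces one split
    // per region via TableInputFormatBase.getSplits(), so each map task
    // scans exactly one region.
    TableMapReduceUtil.initTableMapperJob("urls",  // table name: illustrative
        scan, ScanMapper.class, Text.class, MapWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```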
