Hi James,
I'd recommend just the following in your log4j properties to tone down
the log volume:
log4j.logger.org.apache.hadoop.fs.FSNamesystem.audit=WARN
log4j.logger.org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace=WARN
This will keep the INFO level logs that are very useful for deb
Hi Todd,
Our log files were getting to be several gigabytes in size at the INFO level
(particularly the datanode logs), so we changed the log level in all log4j
configs to be WARN. Do you think we're potentially missing some useful
information at INFO and lower? I could lower the log level if yo
Hi James,
You'll need to go farther back in the logs to find what happened to the
block that caused it to get deleted. All of the logs below are too late (the
block's already gone, we need to figure out why).
Can you look backwards through the past several days of the NN logs? Have
you disabled t
OK, these logs are huge, so I'm just going to post the first 1,000 lines
from each for now. Let me know if it would be helpful to have more. The
namenode logs didn't contain either of the strings you were interested in.
A few of the datanode logs had '4841840178880951849':
http://pastebin.com/4M
If you can grep for '4841840178880951849' as well
as /hbase/users/73382377/data/312780071564432169 across all of your datanode
logs plus your NN, and put that online somewhere, that would be great. If
you can grep with -C 20 to get some context that would help as well.
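
For anyone who would rather script the search than run grep by hand, here is a minimal sketch of the same idea in Python. It is not from the original message: the function name grep_with_context and the way log files are passed on the command line are assumptions made for illustration. Only the two search strings come from the thread.

```python
import sys

# Search strings taken from the thread above.
BLOCK_ID = "4841840178880951849"
REGION_PATH = "/hbase/users/73382377/data/312780071564432169"


def grep_with_context(lines, patterns, context=20):
    """Return (line_number, text) pairs for every line that lies within
    `context` lines of a line containing any of `patterns`.

    This mimics the effect of `grep -C <context>` using plain substring
    matching. Overlapping context windows are merged.
    """
    lines = list(lines)
    keep = set()
    for i, line in enumerate(lines):
        if any(p in line for p in patterns):
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))
    # Line numbers are reported 1-based, as grep -n would.
    return [(i + 1, lines[i]) for i in sorted(keep)]


if __name__ == "__main__":
    # Hypothetical usage: pass the datanode and NN log files as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            hits = grep_with_context(f, [BLOCK_ID, REGION_PATH], context=20)
        for lineno, text in hits:
            print(f"{path}:{lineno}:{text.rstrip()}")
```

Running it over every datanode log plus the NN log and redirecting the output to a file gives something that can be posted in one piece. Plain grep -C 20 does the same job and is likely faster on multi-gigabyte logs, since this sketch reads each file fully into memory.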
Grepping for the region in q
Thanks, I'll check out HBASE-2231. Prior to this problem occurring our
cluster had been running for almost 2 weeks with no problems. I'm not sure
about the GC pauses, but I'll look through the logs. I've never noticed
that before, though.
Also, maybe it would help to understand how we're using
On Sat, May 8, 2010 at 12:02 AM, Stack wrote:
> On Fri, May 7, 2010 at 8:27 PM, James Baldassari
> wrote:
> > java.io.IOException: Cannot open filename
> > /hbase/users/73382377/data/312780071564432169
> >
> This is the regionserver log? Is this deploying the region? It fails?
>
This error is
This could very well be HBASE-2231.
Do you find that region servers occasionally crash after going into GC
pauses?
-Todd
On Fri, May 7, 2010 at 9:02 PM, Stack wrote:
> On Fri, May 7, 2010 at 8:27 PM, James Baldassari
> wrote:
> > java.io.IOException: Cannot open filename
> > /hbase/users/7338
On Fri, May 7, 2010 at 8:27 PM, James Baldassari wrote:
> java.io.IOException: Cannot open filename
> /hbase/users/73382377/data/312780071564432169
>
This is the regionserver log? Is this deploying the region? It fails?
> Our cluster throughput goes from around 3k requests/second down to 500-10