Re: [DISCUSS] Switch to log4j 2

2014-08-18 Thread Colin McCabe
On Fri, Aug 15, 2014 at 8:50 AM, Aaron T. Myers wrote:
> Not necessarily opposed to switching logging frameworks, but I believe we
> can actually support async logging with today's logging system if we wanted
> to, e.g. as was done for the HDFS audit logger in this JIRA:
>
> https://issues.apache.org/jira/browse/HDFS-5241

Yes, this is a great example of making something async without
switching logging frameworks.  +1 for doing that where it is
appropriate.

>
> --
> Aaron T. Myers
> Software Engineer, Cloudera
>
>
> On Fri, Aug 15, 2014 at 5:44 AM, Steve Loughran 
> wrote:
>
>> moving to SLF4J as an API is independent —it's just a better API for
>> logging than commons-logging, was already a dependency and doesn't force
>> anyone to switch to a new log back end.

Interesting idea.  Did anyone do a performance comparison and/or API
comparison with SLF4J on Hadoop?

>>
>>
>> On 15 August 2014 03:34, Tsuyoshi OZAWA wrote:
>>
>> > Hi,
>> >
>> > Steve has started discussion titled "use SLF4J APIs in new modules?"
>> > as a related topic.
>> >
>> >
>> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E
>> >
>> > It sounds good to me to use asynchronous logging when we log INFO. One

-1.  Async logging for everything will make a lot of failures
un-debuggable.  Just to give one example, what if you get a JVM out of
memory crash?  You'll lose the last few log messages which could have
told you what was going on.  Even if the JVM doesn't terminate, log
messages will be out of order, which is annoying, and will make
debugging harder.
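
A minimal sketch of the lost-messages concern, using a toy model rather than any real logging framework. The class and method names (`AsyncLossSketch`, `writerStep`, `lostOnCrash`) are made up for illustration, and the "crash" is simulated by simply stopping before the queue is drained:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model (not a real logging framework): log() only enqueues, a writer persists later. */
class AsyncLossSketch {
    private final Queue<String> pending = new ArrayDeque<>(); // in-JVM buffer
    private final List<String> onDisk = new ArrayList<>();    // stands in for the log file

    void log(String message) {
        pending.add(message);
    }

    /** What a background writer thread would do: persist up to n queued messages. */
    void writerStep(int n) {
        for (int i = 0; i < n && !pending.isEmpty(); i++) {
            onDisk.add(pending.poll());
        }
    }

    List<String> onDisk() {
        return onDisk;
    }

    /** Messages that would vanish if the JVM died at this point. */
    int lostOnCrash() {
        return pending.size();
    }

    public static void main(String[] args) {
        AsyncLossSketch log = new AsyncLossSketch();
        for (int i = 1; i <= 5; i++) {
            log.log("step " + i);
        }
        log.writerStep(3); // the writer only got through three messages before the "crash"
        System.out.println("on disk: " + log.onDisk());      // on disk: [step 1, step 2, step 3]
        System.out.println("lost:    " + log.lostOnCrash()); // lost:    2
    }
}
```

With a synchronous write, by contrast, each message has already been handed to the kernel by the time `log()` returns, so it survives a JVM crash.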

The kernel already buffers the log files in memory.  Not every log
message generates a disk seek.  But on the other hand, if the JVM
process crashes, you've got everything.  In other words, we've already
got as much buffering and asynchronicity as we need!

If the problem is that the noisy logs are overloading the disk
bandwidth, that problem can't be solved by adding Java-level async.
You need more bandwidth.  A simple way of doing this is putting the
log partition on /dev/shm.  We could also look into stripping some of
the boilerplate from log messages-- there are a lot of super-long log
messages that could be much more concise.  Other Java logging
frameworks might have less overhead (I'm not an expert on this, but
maybe someone could post some numbers?)

best,
Colin


>> > concern is that asynchronous logging makes debugging difficult - I
>> > don't know log4j 2 well, but I suspect that ordering of logging can be
>> > changed even if WARN or  FATAL are logged with synchronous logger.
>> >
>> > Thanks,
>> > - Tsuyoshi
>> >
>> > On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal wrote:
>> > > I don't recall whether this was discussed before.
>> > >
>> > > I often find our INFO logging to be too sparse for useful diagnosis. A high
>> > > performance logging framework will encourage us to log more. Specifically,
>> > > Asynchronous Loggers look interesting.
>> > > https://logging.apache.org/log4j/2.x/manual/async.html#Performance
>> > >
>> > > What does the community think of switching to log4j 2 in a Hadoop 2.x
>> > > release?
>> > >
>> > > --
>> > > CONFIDENTIALITY NOTICE
>> > > NOTICE: This message is intended for the use of the individual or entity to
>> > > which it is addressed and may contain information that is confidential,
>> > > privileged and exempt from disclosure under applicable law. If the reader
>> > > of this message is not the intended recipient, you are hereby notified that
>> > > any printing, copying, dissemination, distribution, disclosure or
>> > > forwarding of this communication is strictly prohibited. If you have
>> > > received this communication in error, please contact the sender immediately
>> > > and delete it from your system. Thank You.
>> >
>>


Re: [DISCUSS] Switch to log4j 2

2014-08-17 Thread Arpit Agarwal
The block state change logs are indeed too noisy at INFO and I've not found
them useful when troubleshooting. Just filed HDFS-6860 to fix that.

This is orthogonal to the SLF4J migration; however, moving to SLF4J would
help ease the transition to Log4j 2.

Thanks for the pointer to HDFS-5241, Aaron. Looking into it.


On Sat, Aug 16, 2014 at 5:07 AM, Steve Loughran 
wrote:

> On 15 August 2014 17:20, Karthik Kambatla wrote:
>
> > However, IMO we already log too much at INFO level (particularly YARN).
> > Logging more at DEBUG level and lowering the overhead of enabling DEBUG
> > logging is preferable.
> >
>
> +1
>
> This is the log4j properties file I've adopted for minicluster debugging,
> HDFS is pretty noisy these days too. BlockStateChange, for example. Then
> there's Zookeeper
>
>
>
> log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
> log4j.logger.org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner=WARN
> log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=WARN
> log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=WARN
> log4j.logger.org.apache.hadoop.hdfs=WARN
> log4j.logger.BlockStateChange=WARN
>
> log4j.logger.org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor=WARN
> log4j.logger.org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl=WARN
> log4j.logger.org.apache.zookeeper=WARN
> log4j.logger.org.apache.zookeeper.ClientCnxn=FATAL
>
> log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.security=WARN
> log4j.logger.org.apache.hadoop.metrics2=ERROR
> log4j.logger.org.apache.hadoop.util.HostsFileReader=WARN
> log4j.logger.org.apache.hadoop.yarn.event.AsyncDispatcher=WARN
> log4j.logger.org.apache.hadoop.security.token.delegation=WARN
> log4j.logger.org.apache.hadoop.yarn.util.AbstractLivelinessMonitor=WARN
> log4j.logger.org.apache.hadoop.yarn.server.nodemanager.security=WARN
> log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo=WARN
>


Re: [DISCUSS] Switch to log4j 2

2014-08-16 Thread Steve Loughran
On 15 August 2014 17:20, Karthik Kambatla wrote:

> However, IMO we already log too much at INFO level (particularly YARN).
> Logging more at DEBUG level and lowering the overhead of enabling DEBUG
> logging is preferable.
>

+1

This is the log4j properties file I've adopted for minicluster debugging.
HDFS is pretty noisy these days too: BlockStateChange, for example. Then
there's ZooKeeper.



log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
log4j.logger.org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner=WARN
log4j.logger.org.apache.hadoop.hdfs.server.blockmanagement=WARN
log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=WARN
log4j.logger.org.apache.hadoop.hdfs=WARN
log4j.logger.BlockStateChange=WARN

log4j.logger.org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor=WARN
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl=WARN
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.apache.zookeeper.ClientCnxn=FATAL

log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.security=WARN
log4j.logger.org.apache.hadoop.metrics2=ERROR
log4j.logger.org.apache.hadoop.util.HostsFileReader=WARN
log4j.logger.org.apache.hadoop.yarn.event.AsyncDispatcher=WARN
log4j.logger.org.apache.hadoop.security.token.delegation=WARN
log4j.logger.org.apache.hadoop.yarn.util.AbstractLivelinessMonitor=WARN
log4j.logger.org.apache.hadoop.yarn.server.nodemanager.security=WARN
log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo=WARN
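
The settings above rely on log4j's logger hierarchy: a logger without its own level inherits from the nearest configured ancestor in its dotted name, which is why `org.apache.zookeeper.ClientCnxn=FATAL` can override `org.apache.zookeeper=WARN`. A rough sketch of that lookup follows. It is not log4j's actual code; the `effectiveLevel` helper, the root level of INFO, and the child logger names in `main` are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class EffectiveLevelSketch {
    /**
     * Walk up the dotted logger name until a configured level is found;
     * fall back to the root level when no ancestor is configured.
     */
    static String effectiveLevel(Map<String, String> configured, String logger, String rootLevel) {
        String name = logger;
        while (true) {
            String level = configured.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return rootLevel;
            }
            name = name.substring(0, dot); // try the parent logger next
        }
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("org.apache.hadoop.hdfs", "WARN");
        cfg.put("org.apache.zookeeper", "WARN");
        cfg.put("org.apache.zookeeper.ClientCnxn", "FATAL");

        // The child logger names below are examples only; the root level of INFO is assumed.
        System.out.println(effectiveLevel(cfg, "org.apache.zookeeper.ClientCnxn", "INFO"));                 // FATAL
        System.out.println(effectiveLevel(cfg, "org.apache.zookeeper.ZooKeeper", "INFO"));                  // WARN
        System.out.println(effectiveLevel(cfg, "org.apache.hadoop.hdfs.server.datanode.DataNode", "INFO")); // WARN
        System.out.println(effectiveLevel(cfg, "org.apache.hadoop.mapreduce.Job", "INFO"));                 // INFO
    }
}
```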


Re: [DISCUSS] Switch to log4j 2

2014-08-15 Thread Karthik Kambatla
Using asynchronous loggers for improved performance sounds reasonable.
However, IMO we already log too much at INFO level (particularly YARN).
Logging more at DEBUG level and lowering the overhead of enabling DEBUG
logging is preferable.
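
One way to lower the cost of DEBUG statements is to defer building the message until the level check has passed. A minimal sketch, using java.util.logging only because it ships with the JDK (SLF4J's `{}` placeholders defer string formatting in a similar way, though not evaluation of the arguments). The class name, the logger name, and the `expensiveDump` helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

class LazyDebugSketch {
    /** Counts how often the costly message was actually built. */
    static final AtomicInteger BUILDS = new AtomicInteger();

    /** Stand-in for an expensive toString(), e.g. dumping a large data structure. */
    static String expensiveDump() {
        BUILDS.incrementAndGet();
        return "block report: ...";
    }

    /** Logs one FINE (debug-level) message at the given logger level; returns the build count. */
    static int buildsAt(Level level) {
        Logger log = Logger.getLogger("sketch.lazy");
        log.setUseParentHandlers(false); // keep the demo quiet on the console
        log.setLevel(level);
        BUILDS.set(0);
        // The Supplier is only invoked if FINE is enabled for this logger.
        log.fine(() -> "state=" + expensiveDump());
        return BUILDS.get();
    }

    public static void main(String[] args) {
        System.out.println("builds at INFO: " + buildsAt(Level.INFO)); // builds at INFO: 0
        System.out.println("builds at FINE: " + buildsAt(Level.FINE)); // builds at FINE: 1
    }
}
```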

One concern is the defaults. Based on what I read on the log4j2 page
shared, we might want to keep our audit logging synchronous and make all
other logging asynchronous. Is there an easy way to configure it like this?
If not, what dev cost are we looking at?
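
To make the intended behaviour concrete, here is a sketch in plain Java of "audit synchronous, everything else asynchronous". It does not use log4j 2 and says nothing about how log4j 2 would be configured (the manual page linked in this thread appears to cover mixing synchronous and asynchronous loggers). The class name, the `.audit` suffix rule, and the `async-log-writer` thread name are assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MixedModeSketch {
    /** One written entry: the message plus the name of the thread that wrote it. */
    static final class Entry {
        final String message;
        final String writerThread;

        Entry(String message, String writerThread) {
            this.message = message;
            this.writerThread = writerThread;
        }
    }

    private final List<Entry> sink = new CopyOnWriteArrayList<>(); // stands in for the log file
    private final ExecutorService background = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "async-log-writer");
        t.setDaemon(true);
        return t;
    });

    void log(String loggerName, String message) {
        if (loggerName.endsWith(".audit")) {
            write(message); // synchronous: the caller does the write itself
        } else {
            background.execute(() -> write(message)); // asynchronous: hand off and return
        }
    }

    private void write(String message) {
        sink.add(new Entry(message, Thread.currentThread().getName()));
    }

    /** Stops the background writer, waits for queued messages, and returns everything written. */
    List<Entry> close() {
        background.shutdown();
        try {
            background.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink;
    }

    public static void main(String[] args) {
        MixedModeSketch logs = new MixedModeSketch();
        logs.log("org.example.Worker", "started"); // hypothetical logger name
        logs.log("org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit", "cmd=open");
        for (Entry e : logs.close()) {
            System.out.println(e.writerThread + " wrote: " + e.message);
        }
    }
}
```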



On Wed, Aug 13, 2014 at 2:44 PM, Arpit Agarwal 
wrote:

> I don't recall whether this was discussed before.
>
> I often find our INFO logging to be too sparse for useful diagnosis. A high
> performance logging framework will encourage us to log more. Specifically,
> Asynchronous Loggers look interesting.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance
>
> What does the community think of switching to log4j 2 in a Hadoop 2.x
> release?
>


Re: [DISCUSS] Switch to log4j 2

2014-08-15 Thread Aaron T. Myers
Not necessarily opposed to switching logging frameworks, but I believe we
can actually support async logging with today's logging system if we wanted
to, e.g. as was done for the HDFS audit logger in this JIRA:

https://issues.apache.org/jira/browse/HDFS-5241
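
As a rough illustration of making one logging path async without changing frameworks, the sketch below wraps an existing handler so that callers only enqueue and a daemon thread does the real write. This is not the HDFS-5241 patch; it uses java.util.logging so that it runs on a bare JDK, and the class names `AsyncHandlerSketch` and `CollectingHandler` are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Wraps an existing Handler: publish() only enqueues, a daemon thread forwards records. */
class AsyncHandlerSketch extends Handler {
    private static final LogRecord POISON = new LogRecord(Level.OFF, "stop"); // shutdown marker
    private final BlockingQueue<LogRecord> queue = new LinkedBlockingQueue<>();
    private final Handler delegate;
    private final Thread worker;

    AsyncHandlerSketch(Handler delegate) {
        this.delegate = delegate;
        this.worker = new Thread(this::drain, "async-handler");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Runs on the worker thread: forward records in arrival order until the marker shows up. */
    private void drain() {
        try {
            for (LogRecord r = queue.take(); r != POISON; r = queue.take()) {
                delegate.publish(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override public void publish(LogRecord record) { queue.add(record); }

    @Override public void flush() { delegate.flush(); }

    /** Drains whatever is still queued, then closes the wrapped handler. */
    @Override public void close() {
        queue.add(POISON);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        delegate.close();
    }

    public static void main(String[] args) {
        CollectingHandler target = new CollectingHandler();
        AsyncHandlerSketch async = new AsyncHandlerSketch(target);
        async.publish(new LogRecord(Level.INFO, "cmd=open"));
        async.publish(new LogRecord(Level.INFO, "cmd=delete"));
        async.close(); // a clean shutdown drains the queue; a crash would not
        System.out.println(target.messages); // [cmd=open, cmd=delete]
    }
}

/** Trivial delegate that keeps messages in memory so the demo has something to inspect. */
class CollectingHandler extends Handler {
    final List<String> messages = new ArrayList<>();

    @Override public void publish(LogRecord record) { messages.add(record.getMessage()); }
    @Override public void flush() { }
    @Override public void close() { }
}
```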

--
Aaron T. Myers
Software Engineer, Cloudera


On Fri, Aug 15, 2014 at 5:44 AM, Steve Loughran 
wrote:

> moving to SLF4J as an API is independent —it's just a better API for
> logging than commons-logging, was already a dependency and doesn't force
> anyone to switch to a new log back end.
>
>
> On 15 August 2014 03:34, Tsuyoshi OZAWA wrote:
>
> > Hi,
> >
> > Steve has started discussion titled "use SLF4J APIs in new modules?"
> > as a related topic.
> >
> >
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E
> >
> > It sounds good to me to use asynchronous logging when we log INFO. One
> > concern is that asynchronous logging makes debugging difficult - I
> > don't know log4j 2 well, but I suspect that ordering of logging can be
> > changed even if WARN or  FATAL are logged with synchronous logger.
> >
> > Thanks,
> > - Tsuyoshi
> >
> > On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal wrote:
> > > I don't recall whether this was discussed before.
> > >
> > > I often find our INFO logging to be too sparse for useful diagnosis. A high
> > > performance logging framework will encourage us to log more. Specifically,
> > > Asynchronous Loggers look interesting.
> > > https://logging.apache.org/log4j/2.x/manual/async.html#Performance
> > >
> > > What does the community think of switching to log4j 2 in a Hadoop 2.x
> > > release?
> > >
> >
>


Re: [DISCUSS] Switch to log4j 2

2014-08-15 Thread Steve Loughran
Moving to SLF4J as an API is independent: it's just a better API for
logging than commons-logging, is already a dependency, and doesn't force
anyone to switch to a new log back end.


On 15 August 2014 03:34, Tsuyoshi OZAWA wrote:

> Hi,
>
> Steve has started discussion titled "use SLF4J APIs in new modules?"
> as a related topic.
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E
>
> It sounds good to me to use asynchronous logging when we log INFO. One
> concern is that asynchronous logging makes debugging difficult - I
> don't know log4j 2 well, but I suspect that ordering of logging can be
> changed even if WARN or  FATAL are logged with synchronous logger.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal 
> wrote:
> > I don't recall whether this was discussed before.
> >
> > I often find our INFO logging to be too sparse for useful diagnosis. A high
> > performance logging framework will encourage us to log more. Specifically,
> > Asynchronous Loggers look interesting.
> > https://logging.apache.org/log4j/2.x/manual/async.html#Performance
> >
> > What does the community think of switching to log4j 2 in a Hadoop 2.x
> > release?
> >
>


Re: [DISCUSS] Switch to log4j 2

2014-08-14 Thread Tsuyoshi OZAWA
Hi,

Steve has started discussion titled "use SLF4J APIs in new modules?"
as a related topic.
http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201404.mbox/%3cca+4kjvv_9cmmtdqzcgzy-chslyb1wkgdunxs7wrheslwbuh...@mail.gmail.com%3E

It sounds good to me to use asynchronous logging when we log INFO. One
concern is that asynchronous logging makes debugging difficult - I
don't know log4j 2 well, but I suspect that the ordering of log messages
can change even if WARN or FATAL are logged with a synchronous logger.
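
A toy model of that ordering concern, assuming INFO goes through an in-memory queue while WARN is written directly to the same file. It makes no claim about how log4j 2 actually behaves; `OrderingSketch` and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class OrderingSketch {
    private final Queue<String> infoQueue = new ArrayDeque<>(); // asynchronous path
    private final List<String> file = new ArrayList<>();        // stands in for the shared log file

    /** INFO is only queued; it reaches the file when the writer drains the queue. */
    void info(String message) {
        infoQueue.add("INFO " + message);
    }

    /** WARN is written straight to the file, bypassing the queue. */
    void warn(String message) {
        file.add("WARN " + message);
    }

    /** What the background writer would do once it gets scheduled. */
    void drain() {
        while (!infoQueue.isEmpty()) {
            file.add(infoQueue.poll());
        }
    }

    List<String> file() {
        return file;
    }

    public static void main(String[] args) {
        OrderingSketch log = new OrderingSketch();
        log.info("opening block");    // logged first by the application
        log.warn("block is corrupt"); // logged second, but written first
        log.drain();
        System.out.println(log.file()); // [WARN block is corrupt, INFO opening block]
    }
}
```

Timestamps taken at the call site would let a reader re-sort the file afterwards, but the on-disk order still differs from the order of the calls.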

Thanks,
- Tsuyoshi

On Thu, Aug 14, 2014 at 6:44 AM, Arpit Agarwal wrote:
> I don't recall whether this was discussed before.
>
> I often find our INFO logging to be too sparse for useful diagnosis. A high
> performance logging framework will encourage us to log more. Specifically,
> Asynchronous Loggers look interesting.
> https://logging.apache.org/log4j/2.x/manual/async.html#Performance
>
> What does the community think of switching to log4j 2 in a Hadoop 2.x
> release?
>


[DISCUSS] Switch to log4j 2

2014-08-13 Thread Arpit Agarwal
I don't recall whether this was discussed before.

I often find our INFO logging to be too sparse for useful diagnosis. A high
performance logging framework will encourage us to log more. Specifically,
Asynchronous Loggers look interesting.
https://logging.apache.org/log4j/2.x/manual/async.html#Performance

What does the community think of switching to log4j 2 in a Hadoop 2.x
release?
