HttpAuthentication.html is out of date (HADOOP-10550) - some questions and proposed fix
Hi, I searched for Hadoop HTTP authentication on Google and came across a web page that lists the required steps: http://hadoop.apache.org/docs/r1.0.4/HttpAuthentication.html Should the fix for this bug be to update the document with the steps listed on that page? Thanks.
Re: HADOOP_ROOT_LOGGER
Ah, that makes sense. Would it make sense to default the root logger to the one defined in the log4j.properties file instead of the static value in the script? That way an admin can set all desired logging properties in log4j.properties, but can still override them with HADOOP_ROOT_LOGGER to debug. It feels a little black-box-y that if HADOOP_ROOT_LOGGER isn't set, the root logger set in log4j.properties is ignored. Maybe this is all very well known and only seems black-box-y to me since I'm new-ish to Hadoop.

Rob

On 05/22/2014 03:41 PM, Colin McCabe wrote:
> It's not always practical to edit the log4j.properties file. For one thing, if you're using a management system, there may be many log4j.properties files sprinkled around the system, and it could be difficult to figure out which one you need to edit. For another, you may not (should not?) have permission to do this on a production cluster.
>
> Doing something like
>
>     HADOOP_ROOT_LOGGER="DEBUG,console" hadoop fs -cat /foo
>
> has helped me diagnose problems in the past.
>
> best,
> Colin
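A rough sketch of the suggestion above, not the actual hadoop-config.sh: pass -Dhadoop.root.logger only when HADOOP_ROOT_LOGGER is set, and otherwise leave the choice to log4j.properties. The helper name build_opts and the -Xmx512m value are made up for illustration, and the sketch assumes that log4j.properties itself defines a usable root logger when the system property is absent.

```shell
#!/bin/sh
# Hypothetical variant of the hadoop-config.sh line; NOT the shipped script.
# build_opts is an illustrative helper name, not part of Hadoop.
build_opts() {
  # ${VAR:+word} expands to word only when VAR is set and non-empty,
  # so no -Dhadoop.root.logger flag is emitted unless one was requested.
  echo "${HADOOP_OPTS:-} ${HADOOP_ROOT_LOGGER:+-Dhadoop.root.logger=$HADOOP_ROOT_LOGGER}"
}

# Case 1: variable unset, so the flag is omitted entirely.
unset HADOOP_ROOT_LOGGER
HADOOP_OPTS="-Xmx512m"
opts_unset=$(build_opts)

# Case 2: variable set, so the flag carries the override.
HADOOP_ROOT_LOGGER="DEBUG,console"
opts_set=$(build_opts)

echo "unset: [$opts_unset]"
echo "set:   [$opts_set]"
```

With this shape the environment variable stays available as a debugging override, while the default lives in one place. Whether that is safe for the daemon scripts, which set their own default, is a separate question.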
Re: HADOOP_ROOT_LOGGER
It's not always practical to edit the log4j.properties file. For one thing, if you're using a management system, there may be many log4j.properties files sprinkled around the system, and it could be difficult to figure out which one you need to edit. For another, you may not (should not?) have permission to do this on a production cluster.

Doing something like

    HADOOP_ROOT_LOGGER="DEBUG,console" hadoop fs -cat /foo

has helped me diagnose problems in the past.

best,
Colin

On Thu, May 22, 2014 at 6:34 AM, Robert Rati wrote:
> In my experience the default HADOOP_ROOT_LOGGER definition will override any root logger defined in log4j.properties, which is where the problems have arisen. If the HADOOP_ROOT_LOGGER definition in hadoop-config.sh were removed, wouldn't the root logger defined in the log4j.properties file be used? Or do the client commands not read that configuration file?
>
> I'm trying to understand why the root logger should be defined outside of the log4j.properties file.
>
> Rob
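The one-off form above relies on ordinary shell behaviour: an assignment placed in front of a command applies to that command's environment only. A small sketch of that scoping, using a child sh process as a stand-in for the hadoop launcher (the stand-in and the variable names one_off and afterwards are illustrative assumptions, not Hadoop code):

```shell
#!/bin/sh
# Stand-in for the hadoop launcher: a child shell that reports the logger
# it would use, falling back to the client default quoted in this thread.
unset HADOOP_ROOT_LOGGER

# Prefix assignment: visible to this single child process only.
one_off=$(HADOOP_ROOT_LOGGER="DEBUG,console" sh -c 'echo "${HADOOP_ROOT_LOGGER:-INFO,console}"')

# The next command runs without the override and sees the default again.
afterwards=$(sh -c 'echo "${HADOOP_ROOT_LOGGER:-INFO,console}"')

echo "one-off:    $one_off"
echo "afterwards: $afterwards"
```

Nothing is exported and no file is edited, which is why this form works on a cluster where the configuration is read-only.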
Re: getCounters NPE
(Sorry, I meant THROW an NPE, not return null. Big difference, of course!)

On Thu, May 22, 2014 at 1:36 PM, Jay Vyas wrote:
> Hi hadoop ... Is there a reason why line 220, below, should ever return null when being called through the code path of job.getCounters()?
>
> It appears that there could be some indirection involving RPCs, etc., so I'm thinking it's best to ask before I attempt to trace the call.
>
> 217   @Override
> 218   public GetCountersResponse getCounters(GetCountersRequest request)
> 219       throws IOException {
> 220     JobId jobId = request.getJobId();
> 221     Job job = verifyAndGetJob(jobId);
> 222     GetCountersResponse response =
>           recordFactory.newRecordInstance(GetCountersResponse.class);
> 223     response.setCounters(TypeConverter.toYarn(job.getAllCounters()));
> 224     return response;
> 225   }
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com

--
Jay Vyas
http://jayunit100.blogspot.com
getCounters NPE
Hi hadoop ... Is there a reason why line 220, below, should ever return null when being called through the code path of job.getCounters()?

It appears that there could be some indirection involving RPCs, etc., so I'm thinking it's best to ask before I attempt to trace the call.

217   @Override
218   public GetCountersResponse getCounters(GetCountersRequest request)
219       throws IOException {
220     JobId jobId = request.getJobId();
221     Job job = verifyAndGetJob(jobId);
222     GetCountersResponse response =
          recordFactory.newRecordInstance(GetCountersResponse.class);
223     response.setCounters(TypeConverter.toYarn(job.getAllCounters()));
224     return response;
225   }

--
Jay Vyas
http://jayunit100.blogspot.com
Struggling New Developer!
Hello everyone,

After taking a course in Hadoop and MapReduce and writing a couple of basic jobs, I wanted to take part in developing the next generation of these technologies. However, when I looked through some of the submitted patches, I realized that I am still far from the required level. I have read a couple of textbooks, but all of them gave high-level descriptions and none discussed the Hadoop and MapReduce operations in depth.

How can I get up to the level required to participate effectively in this forum?

Thanks a lot in advance.

Regards,
Hussam
[jira] [Created] (HADOOP-10626) Limit Returning Attributes for LDAP search
Jason Hubbard created HADOOP-10626:
--------------------------------------

             Summary: Limit Returning Attributes for LDAP search
                 Key: HADOOP-10626
                 URL: https://issues.apache.org/jira/browse/HADOOP-10626
             Project: Hadoop Common
          Issue Type: Improvement
          Components: security
    Affects Versions: 2.3.0
            Reporter: Jason Hubbard

When using Hadoop LDAP group mappings in an enterprise environment, searching groups and returning all members can take a long time and cause a timeout. As a result, not all groups are returned for a user. Because the first search only looks up the user DN and the second search retrieves the group member attribute, we only need to return the group member attribute from the search, which speeds it up.

--
This message was sent by Atlassian JIRA (v6.2#6252)
Re: HADOOP_ROOT_LOGGER
In my experience the default HADOOP_ROOT_LOGGER definition will override any root logger defined in log4j.properties, which is where the problems have arisen. If the HADOOP_ROOT_LOGGER definition in hadoop-config.sh were removed, wouldn't the root logger defined in the log4j.properties file be used? Or do the client commands not read that configuration file?

I'm trying to understand why the root logger should be defined outside of the log4j.properties file.

Rob

On 05/22/2014 12:53 AM, Vinayakumar B wrote:
> Hi Robert,
>
> I understand your confusion.
>
> HADOOP_ROOT_LOGGER defaults to "INFO,console" if it has not been set, so logs are displayed on the console. This is true for any client command you run, for example "hdfs dfs -ls /".
>
> For the server scripts (hadoop-daemon.sh, yarn-daemon.sh, etc.), however, HADOOP_ROOT_LOGGER is set to "INFO,RFA" if the env variable is not defined, so that all log messages from the server daemons go to log files maintained by RollingFileAppender. If you want to override these defaults and set your own log level, define it in the env variable HADOOP_ROOT_LOGGER.
>
> For example:
>
>     export HADOOP_ROOT_LOGGER="DEBUG,RFA"
>
> Export the variable above and then start the server scripts or execute client commands; all logs will go to files maintained by RollingFileAppender.
>
> Regards,
> Vinay
>
> On Wed, May 21, 2014 at 6:42 PM, Robert Rati wrote:
>> I noticed in hadoop-config.sh there is this line:
>>
>>     HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
>>
>> which sets a root logger if HADOOP_ROOT_LOGGER isn't set. Why is this here/needed? There is a log4j.properties file provided that defines a default logger. I believe the line above will result in overriding whatever is set for the root logger in the log4j.properties file. This has caused some confusion, and hacks to work around it.
>>
>> Is there a reason not to remove the above code and just have all the logger definitions in the log4j.properties file? Is there maybe a compatibility concern?
>>
>> Rob
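To make the behaviour of the quoted hadoop-config.sh line concrete, here is a minimal sketch that runs the same ${VAR:-default} expansion outside Hadoop. HADOOP_OPTS starts empty here only to keep the output short, and the names default_opts and override_opts are illustrative.

```shell
#!/bin/sh
# Same expansion as the quoted hadoop-config.sh line, run in isolation.

# Case 1: HADOOP_ROOT_LOGGER unset, so the fallback after ":-" is used.
unset HADOOP_ROOT_LOGGER
HADOOP_OPTS=""
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
default_opts=$HADOOP_OPTS

# Case 2: HADOOP_ROOT_LOGGER set, so its value replaces the fallback.
HADOOP_ROOT_LOGGER="DEBUG,RFA"
HADOOP_OPTS=""
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.root.logger=${HADOOP_ROOT_LOGGER:-INFO,console}"
override_opts=$HADOOP_OPTS

echo "default: $default_opts"
echo "override:$override_opts"
```

In both cases a -Dhadoop.root.logger flag ends up on the command line, so the JVM always receives some value for that property whether or not the variable was set.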
[jira] [Resolved] (HADOOP-8446) make hadoop-core jar OSGi friendly
[ https://issues.apache.org/jira/browse/HADOOP-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-8446.
Resolution: Duplicate

> make hadoop-core jar OSGi friendly
> ----------------------------------
>
>                 Key: HADOOP-8446
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8446
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>            Reporter: Freeman Fang
>
> hadoop-core isn't OSGi friendly, so anyone who wants to use it in an OSGi container must wrap it with a tool like bnd or maven-bundle-plugin. Apache ServiceMix always wraps third-party jars that aren't OSGi friendly; you can see we've done it for lots of jars here [1], and more specifically for several hadoop-core versions [2]. Though we could keep doing it this way, the problem is that we need to do it for every newly released version of each third-party jar and, more importantly, we need to ensure that the other Apache project communities are aware that we're doing it.
> In ServiceMix we have just wrapped hadoop-core 1.0.3; the issue tracking it in ServiceMix is [3].
> We hope Apache Hadoop can offer OSGi-friendly jars. In most cases this should be straightforward, as it only requires adding OSGi metadata headers to MANIFEST.MF, which can be done easily with maven-bundle-plugin when building with Maven. There are also some other practices that should be followed, such as not sharing the same package across different modules (avoiding split packages).
> thanks
> [1] http://repo2.maven.org/maven2/org/apache/servicemix/bundles
> [2] http://repo2.maven.org/maven2/org/apache/servicemix/bundles/org.apache.servicemix.bundles.hadoop-core/
> [3] https://issues.apache.org/jira/browse/SMX4-1147