Ubuntu 12.04 - Which JDK?
Hi,

I am planning to use Ubuntu 12.04. http://wiki.apache.org/hadoop/HadoopJavaVersions says the following about OpenJDK:

"Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and beyond, please use other JDKs."

Is it OK to use OpenJDK 7 on Ubuntu 12.04?

Thanks
Re: Ubuntu 12.04 - Which JDK?
Thanks Harsh, This is exactly what I am facing, I have used Ubuntu for long time, 12.04 is the Long Term Support version (LTS), however Oracle JDK/JRE6 is no longer supported in it, "OpenJDK has sometimes a odd behavior - Alexander Lorenz'". Any Ubuntu 12.04 users here to share your selections/options? Thanks On 8 Nov 2012, at 4:37 PM, Harsh J wrote: > Hi Sanjeev, > > Unfortunately, official Ubuntu repositories no longer supports Oracle > JDK/JRE 6. See https://help.ubuntu.com/community/Java#Oracle_.28Sun.29_Java_6 > > On Thu, Nov 8, 2012 at 11:47 AM, Sanjeev Verma > wrote: >> AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop >> from source? Get precompiled binaries and you will be ok. >> >> Also, you can install sun/oracle jdk on ubuntu. Just google for >> instructions, u will find plenty, like here - >> http://www.ubuntututorials.com/install-oracle-java-jdk-7-ubuntu-12-04/. >> These are for jdk 7, but you can follow the same to install jdk 6. >> >> Enjoy! >> >> On Nov 8, 2012 11:30 AM, "a...@hsk.hk" wrote: >>> >>> Hi, >>> >>> I am planning to use Ubuntu 12.04, from >>> http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK >>> >>> "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so >>> OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and >>> beyond, please use other JDKs." >>> >>> Is it OK to use OpenJDK 7 in Ubuntu 12.04? >>> >>> Thanks >>> >> > > > > -- > Harsh J
Re: Ubuntu 12.04 - Which JDK?
Hi, thank you for your info and will head for it. Thanks. On 8 Nov 2012, at 6:07 PM, Mohammad Tariq wrote: > Hi there, > > As per my understanding, all the code has been built and tested against > java6(sun). There are some changes between java7 and 6 and you may face > issues while running your code, like using sort with collections. I faced the > same thing when using List.sort in my code. So, it's better to go with sun's > java6. If you want to install it you can follow these steps : > > 1. Download the zipped file and extract it. > > 2. chmod +x jdk-6u37-linux-x64.bin > > 3. ./jdk-6u37-linux-x64.bin > > 4. sudo mv jdk1.6.0_37/ /usr/lib/jvm/ > > 5. sudo update-alternatives --install /usr/bin/javac javac > /usr/lib/jvm/jdk1.6.0_37/bin/javac 1 > > 6. sudo update-alternatives --install /usr/bin/java java > /usr/lib/jvm/jdk1.6.0_37/bin/java 1 > > 7. sudo update-alternatives --install /usr/bin/javaws javaws > /usr/lib/jvm/jdk1.6.0_37/bin/javaws 1 > > Then choose which java to use : > > sudo update-alternatives --config java > > choose the no for java6 > > Regards, > Mohammad Tariq > > > > On Thu, Nov 8, 2012 at 2:29 PM, a...@hsk.hk wrote: > Thanks Harsh, > > This is exactly what I am facing, I have used Ubuntu for long time, 12.04 is > the Long Term Support version (LTS), however Oracle JDK/JRE6 is no longer > supported in it, "OpenJDK has sometimes a odd behavior - Alexander Lorenz'". > Any Ubuntu 12.04 users here to share your selections/options? > > Thanks > > > On 8 Nov 2012, at 4:37 PM, Harsh J wrote: > > > Hi Sanjeev, > > > > Unfortunately, official Ubuntu repositories no longer supports Oracle > > JDK/JRE 6. See > > https://help.ubuntu.com/community/Java#Oracle_.28Sun.29_Java_6 > > > > On Thu, Nov 8, 2012 at 11:47 AM, Sanjeev Verma > > wrote: > >> AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop > >> from source? Get precompiled binaries and you will be ok. > >> > >> Also, you can install sun/oracle jdk on ubuntu. Just google for > >> instructions, u will find plenty, like here - > >> http://www.ubuntututorials.com/install-oracle-java-jdk-7-ubuntu-12-04/. > >> These are for jdk 7, but you can follow the same to install jdk 6. > >> > >> Enjoy! > >> > >> On Nov 8, 2012 11:30 AM, "a...@hsk.hk" wrote: > >>> > >>> Hi, > >>> > >>> I am planning to use Ubuntu 12.04, from > >>> http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK > >>> > >>> "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so > >>> OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and > >>> beyond, please use other JDKs." > >>> > >>> Is it OK to use OpenJDK 7 in Ubuntu 12.04? > >>> > >>> Thanks > >>> > >> > > > > > > > > -- > > Harsh J > >
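For readers following the same path, here are Mohammad's steps collected into one sketch, plus the hadoop-env.sh line his list leaves implicit. The jdk-6u37 file name is the one from his mail, so adjust it to whatever build you actually download:

  chmod +x jdk-6u37-linux-x64.bin
  ./jdk-6u37-linux-x64.bin
  sudo mv jdk1.6.0_37/ /usr/lib/jvm/
  sudo update-alternatives --install /usr/bin/javac  javac  /usr/lib/jvm/jdk1.6.0_37/bin/javac  1
  sudo update-alternatives --install /usr/bin/java   java   /usr/lib/jvm/jdk1.6.0_37/bin/java   1
  sudo update-alternatives --install /usr/bin/javaws javaws /usr/lib/jvm/jdk1.6.0_37/bin/javaws 1
  sudo update-alternatives --config java     # pick the entry pointing at jdk1.6.0_37

  # conf/hadoop-env.sh: make the Hadoop daemons use this JDK regardless of the system default
  export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_37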
Re: Ubuntu 12.04 - Which JDK? and some more
Hi, After spending time to research Hadoop and Hbase JDK requirements, here is my advice about using Ubuntu 12.04: 1) If you plan to use HBase on Hadoop HDFS, do note that HBase will run on Oracle's JDK only, so do NOT use OpenJDK or other JDKs for the setup. 2) When installing Ubuntu, should NOT use LVM (Linux Logical Volume Manager) for Hadoop data disks! (There will be performance issue between the filesystem and the device, LVM is the default for some Linux package, be careful not to select LVM) regards On 8 Nov 2012, at 6:12 PM, a...@hsk.hk wrote: > Hi, thank you for your info and will head for it. Thanks. > On 8 Nov 2012, at 6:07 PM, Mohammad Tariq wrote: > >> Hi there, >> >> As per my understanding, all the code has been built and tested against >> java6(sun). There are some changes between java7 and 6 and you may face >> issues while running your code, like using sort with collections. I faced >> the same thing when using List.sort in my code. So, it's better to go with >> sun's java6. If you want to install it you can follow these steps : >> >> 1. Download the zipped file and extract it. >> >> 2. chmod +x jdk-6u37-linux-x64.bin >> >> 3. ./jdk-6u37-linux-x64.bin >> >> 4. sudo mv jdk1.6.0_37/ /usr/lib/jvm/ >> >> 5. sudo update-alternatives --install /usr/bin/javac javac >> /usr/lib/jvm/jdk1.6.0_37/bin/javac 1 >> >> 6. sudo update-alternatives --install /usr/bin/java java >> /usr/lib/jvm/jdk1.6.0_37/bin/java 1 >> >> 7. sudo update-alternatives --install /usr/bin/javaws javaws >> /usr/lib/jvm/jdk1.6.0_37/bin/javaws 1 >> >> Then choose which java to use : >> >> sudo update-alternatives --config java >> >> choose the no for java6 >> >> Regards, >> Mohammad Tariq >> >> >> >> On Thu, Nov 8, 2012 at 2:29 PM, a...@hsk.hk wrote: >> Thanks Harsh, >> >> This is exactly what I am facing, I have used Ubuntu for long time, 12.04 is >> the Long Term Support version (LTS), however Oracle JDK/JRE6 is no longer >> supported in it, "OpenJDK has sometimes a odd behavior - Alexander >> Lorenz'". Any Ubuntu 12.04 users here to share your selections/options? >> >> Thanks >> >> >> On 8 Nov 2012, at 4:37 PM, Harsh J wrote: >> >> > Hi Sanjeev, >> > >> > Unfortunately, official Ubuntu repositories no longer supports Oracle >> > JDK/JRE 6. See >> > https://help.ubuntu.com/community/Java#Oracle_.28Sun.29_Java_6 >> > >> > On Thu, Nov 8, 2012 at 11:47 AM, Sanjeev Verma >> > wrote: >> >> AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop >> >> from source? Get precompiled binaries and you will be ok. >> >> >> >> Also, you can install sun/oracle jdk on ubuntu. Just google for >> >> instructions, u will find plenty, like here - >> >> http://www.ubuntututorials.com/install-oracle-java-jdk-7-ubuntu-12-04/. >> >> These are for jdk 7, but you can follow the same to install jdk 6. >> >> >> >> Enjoy! >> >> >> >> On Nov 8, 2012 11:30 AM, "a...@hsk.hk" wrote: >> >>> >> >>> Hi, >> >>> >> >>> I am planning to use Ubuntu 12.04, from >> >>> http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK >> >>> >> >>> "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so >> >>> OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 >> >>> and >> >>> beyond, please use other JDKs." >> >>> >> >>> Is it OK to use OpenJDK 7 in Ubuntu 12.04? >> >>> >> >>> Thanks >> >>> >> >> >> > >> > >> > >> > -- >> > Harsh J >> >> >
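Two quick checks that go with the advice above, as a rough sketch (device and mount names will differ on your machines):

  # 1) Confirm the JVM Hadoop/HBase will actually pick up is Oracle's, not OpenJDK
  java -version                    # should print "Java(TM) SE Runtime Environment", not "OpenJDK Runtime Environment"

  # 2) Confirm the Hadoop data disks are plain partitions rather than LVM logical volumes
  lsblk -o NAME,TYPE,MOUNTPOINT    # a TYPE of "lvm" under a data mount means the volume sits on LVM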
Re: HA for hadoop-0.20.2
Hi, A question, is 2.x ready for production deployment? Thanks On 13 Nov 2012, at 5:19 PM, Harsh J wrote: > Hi, > > Why not just use the 2.x releases for HA-NNs? There is quite a wide > delta between 0.20.x and 2.x, especially around the edit log areas > after HDFS-1073. > > In any case, I think your question suits hdfs-...@hadoop.apache.org > more than the user lists, although I don't quite understand what > you're attempting to do (or point). > > On Tue, Nov 13, 2012 at 2:18 PM, lei liu wrote: >> I want to implement HA function for hadoop-0.20.2. >> >> When I learn the hadoop-2.0 code, I meet some question like this: >> >> Thera are below code in FSEditLogLoader.loadEditRecords method. >> >> if (op.hasTransactionId()) { >>if (op.getTransactionId() > expectedTxId) { >> MetaRecoveryContext.editLogLoaderPrompt("There appears " + >> "to be a gap in the edit log. We expected txid " + >> expectedTxId + ", but got txid " + >> op.getTransactionId() + ".", recovery, "ignoring missing " >> + >> " transaction IDs"); >>} else if (op.getTransactionId() < expectedTxId) { >> MetaRecoveryContext.editLogLoaderPrompt("There appears " + >> "to be an out-of-order edit in the edit log. We " + >> "expected txid " + expectedTxId + ", but got txid " + >> op.getTransactionId() + ".", recovery, >> "skipping the out-of-order edit"); >> continue; >>} >> } >> >> The method use transaction id to guarantee same transaction log is not >> applied to namespace more than once. >> >> But in hadoop-0.20.2, FSEditLog don't store the transaction id into edits >> log file. So I want to know if StandbyNN apply same transaction log to >> namespace more than once, that will lead to the namespace of StandbyNN is >> corrupt? >> >> Please give me some advice,Thanks. >> >> >> >> Best Regards >> >> LiuLei > > > > -- > Harsh J
High Availability - second namenode (master2) issue: Incompatible namespaceIDs
Hi, Please help!

I have installed a Hadoop cluster with a single master (master1) and have HBase running on the HDFS. Now I am setting up the second master (master2) in order to form HA. When I used jps to check the cluster, I found:

2782 Jps
2126 NameNode
2720 SecondaryNameNode

i.e. the DataNode on this server could not be started. In the log file I found:

2012-11-16 10:28:44,851 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 1356148070; datanode namespaceID = 1151604993

One of the possible solutions to fix this issue is to stop the cluster, reformat the NameNode, and restart the cluster.
QUESTION: As I already have HBase running on the cluster, if I reformat the NameNode, do I need to reinstall the entire HBase? I don't mind losing all the data, as I don't have much data in HBase and HDFS, but I don't want to re-install HBase again.

On the other hand, I have tried another solution: stop the DataNode, edit the namespaceID in current/VERSION (i.e. set namespaceID=1151604993), and restart the DataNode. It doesn't work:

Warning: $HADOOP_HOME is deprecated.
starting master2, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out
Exception in thread "main" java.lang.NoClassDefFoundError: master2
Caused by: java.lang.ClassNotFoundException: master2
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: master2. Program will exit.

QUESTION: Any other solutions?

Thanks
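Two observations on the output above, plus a sketch of the usual fixes; the paths and IDs are taken from the quoted log lines, so double-check them against your own config. First, "starting master2, logging to .../hadoop-hduser-master2-master2.out" suggests the daemon script was invoked with the hostname instead of a daemon name, which is why it then fails with "Could not find the main class: master2". Second, the usual namespaceID recipe is to make the DataNode's stored ID match the NameNode's (1356148070 here), not the other way around, or simply to wipe the DataNode storage if losing the blocks is acceptable:

  # Option A: align the DataNode's VERSION file with the NameNode's namespaceID
  sed -i 's/^namespaceID=.*/namespaceID=1356148070/' /app/hadoop/tmp/dfs/data/current/VERSION

  # Option B: since data loss is acceptable, clear the DataNode storage and let it re-register
  rm -rf /app/hadoop/tmp/dfs/data/*

  # Then start the daemon by its daemon name, not the hostname
  $HADOOP_HOME/bin/hadoop-daemon.sh start datanode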
Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs
Thank you very much, will try. On 16 Nov 2012, at 4:31 PM, Vinayakumar B wrote: > Hi, > > If you are moving from NonHA (single master) to HA, then follow the below > steps. > 1. Configure the another namenode’s configuration in the running > namenode and all datanode’s configurations. And configure logical fs.defaultFS > 2. Configure the shared storage related configuration. > 3. Stop the running NameNode and all datanodes. > 4. Execute ‘hdfs namenode –initializeSharedEdits’ from the existing > namenode installation, to transfer the edits to shared storage. > 5. Now format zkfc using ‘hdfs zkfc –formatZK’ and start zkfc using > ‘hadoop-daemon.sh start zkfc’ > 6. Now restart the namenode from existing installation. If all > configurations are fine, then NameNode should start successfully as STANDBY, > then zkfc will make it to ACTIVE. > > 7. Now install the NameNode in another machine (master2) with same > configuration, except ‘dfs.ha.namenode.id’. > 8. Now instead of format, you need to copy the name dir contents from > another namenode (master1) to master2’s name dir. For this you are having 2 > options. > a. Execute ‘hdfs namenode -bootStrapStandby’ from the master2 > installation. > b. Using ‘scp’ copy entire contents of name dir from master1 to > master2’s name dir. > 9. Now start the zkfc for second namenode ( No need to do zkfc format > now). Also start the namenode (master2) > > Regards, > Vinay- > From: Uma Maheswara Rao G [mailto:mahesw...@huawei.com] > Sent: Friday, November 16, 2012 1:26 PM > To: user@hadoop.apache.org > Subject: RE: High Availability - second namenode (master2) issue: > Incompatible namespaceIDs > > If you format namenode, you need to cleanup storage directories of DataNode > as well if that is having some data already. DN also will have namespace ID > saved and compared with NN namespaceID. if you format NN, then namespaceID > will be changed and DN may have still older namespaceID. So, just cleaning > the data in DN would be fine. > > Regards, > Uma > From: hadoop hive [hadooph...@gmail.com] > Sent: Friday, November 16, 2012 1:15 PM > To: user@hadoop.apache.org > Subject: Re: High Availability - second namenode (master2) issue: > Incompatible namespaceIDs > > Seems like you havn't format your cluster (if its 1st time made). > > On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk wrote: > Hi, > > Please help! > > I have installed a Hadoop Cluster with a single master (master1) and have > HBase running on the HDFS. Now I am setting up the second master (master2) > in order to form HA. When I used JPS to check the cluster, I found : > > 2782 Jps > 2126 NameNode > 2720 SecondaryNameNode > i.e. The datanode on this server could not be started > > In the log file, found: > 2012-11-16 10:28:44,851 ERROR > org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: > Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = > 1356148070; datanode namespaceID = 1151604993 > > > > One of the possible solutions to fix this issue is to: stop the cluster, > reformat the NameNode, restart the cluster. > QUESTION: As I already have HBASE running on the cluster, if I reformat the > NameNode, do I need to reinstall the entire HBASE? I don't mind to have all > data lost as I don't have many data in HBASE and HDFS, however I don't want > to re-install HBASE again. > > > On the other hand, I have tried another solution: stop the DataNode, edit the > namespaceID in current/VERSION (i.e. 
set namespaceID=1151604993), restart the > datanode, it doesn't work: > Warning: $HADOOP_HOME is deprecated. > starting master2, logging to > /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out > Exception in thread "main" java.lang.NoClassDefFoundError: master2 > Caused by: java.lang.ClassNotFoundException: master2 > at java.net.URLClassLoader$1.run(URLClassLoader.java:202) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:190) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) > at java.lang.ClassLoader.loadClass(ClassLoader.java:247) > Could not find the main class: master2. Program will exit. > QUESTION: Any other solutions? > > > > Thanks > > > > > > >
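For quick reference, the commands embedded in Vinay's steps boil down to the following sketch. It assumes a 2.x HA configuration exactly as he describes, with the nameservice, namenode IDs and shared-edits settings already in place:

  # On the existing NameNode (master1), after stopping the NN and DNs and adding the HA configs:
  hdfs namenode -initializeSharedEdits    # step 4: push the existing edits to the shared storage
  hdfs zkfc -formatZK                     # step 5: create the ZKFC znode
  hadoop-daemon.sh start zkfc             # step 5
  hadoop-daemon.sh start namenode         # step 6: should come up STANDBY, then ZKFC makes it ACTIVE

  # On the new NameNode (master2), same configs except dfs.ha.namenode.id:
  hdfs namenode -bootstrapStandby         # step 8a: copy the name dir contents from master1
  hadoop-daemon.sh start zkfc             # step 9
  hadoop-daemon.sh start namenode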
Datanode: "Cannot start secure cluster without privileged resources"
Hi,

I am setting up HDFS security with Kerberos. When I manually started the first DataNode (the NameNode is already started), I got the following messages:

1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user
2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.RuntimeException: Cannot start secure cluster without privileged resources.

OS: Ubuntu 12.04
Hadoop: 1.0.4

It seems that it can log in successfully but something is missing. Please help!

Thanks
Re: Datanode: "Cannot start secure cluster without privileged resources"
Hi Harsh, Thank you very much for your reply, got it! Thanks ac On 26 Nov 2012, at 8:32 PM, Harsh J wrote: > Secure DN needs to be started as root (it runs as proper user, but > needs to be started as root to grab reserved ports), and needs a > proper jsvc binary (for your arch/OS) available. Are you using > tarballs or packages (and if packages, are they from Bigtop)? > > On Mon, Nov 26, 2012 at 5:21 PM, a...@hsk.hk wrote: >> Hi, >> >> I am setting up HDFS security with Kerberos: >> When I manually started the first datanode, I got the following messages >> (the namenode is started): >> >> 1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful >> for user >> 2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: >> java.lang.RuntimeException: Cannot start secure cluster without privileged >> resources. >> >> OS: Ubuntu 12.04 >> Hadoop: 1.0.4 >> >> It seems that it could login successfully but something is missing >> Please help! >> >> Thanks >> >> >> >> > > > > -- > Harsh J
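"Privileged resources" here means the reserved ports and the root-started jsvc launcher Harsh mentions. Below is a hedged sketch of the 1.0.x-style settings involved; the user name is the one that appears elsewhere in this archive, and 1004/1006 are just conventional sub-1024 choices:

  # conf/hadoop-env.sh: the DN is started as root via jsvc, then drops to this user
  export HADOOP_SECURE_DN_USER=hduser

  # hdfs-site.xml fragment: a secure DataNode must bind ports below 1024
  <property><name>dfs.datanode.address</name><value>0.0.0.0:1004</value></property>
  <property><name>dfs.datanode.http.address</name><value>0.0.0.0:1006</value></property>

  # then start the DataNode as root; the script hands off to jsvc
  sudo $HADOOP_HOME/bin/hadoop-daemon.sh start datanode

A jsvc binary built for your architecture also has to be reachable by the start script, as Harsh notes.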
Re: Datanode: "Cannot start secure cluster without privileged resources"
Hi, A question: I started Secure DN then ran JPS as root, I could not find any running DN: 16152 16195 Jps However, when I tried to start the secure DN again, I got: Warning: $HADOOP_HOME is deprecated. datanode running as process 16117. Stop it first. Does it mean JPS is no longer a tool to check DN in secure mode? Thanks On 26 Nov 2012, at 9:03 PM, a...@hsk.hk wrote: > Hi Harsh, > > Thank you very much for your reply, got it! > > Thanks > ac > > On 26 Nov 2012, at 8:32 PM, Harsh J wrote: > >> Secure DN needs to be started as root (it runs as proper user, but >> needs to be started as root to grab reserved ports), and needs a >> proper jsvc binary (for your arch/OS) available. Are you using >> tarballs or packages (and if packages, are they from Bigtop)? >> >> On Mon, Nov 26, 2012 at 5:21 PM, a...@hsk.hk wrote: >>> Hi, >>> >>> I am setting up HDFS security with Kerberos: >>> When I manually started the first datanode, I got the following messages >>> (the namenode is started): >>> >>> 1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful >>> for user >>> 2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: >>> java.lang.RuntimeException: Cannot start secure cluster without privileged >>> resources. >>> >>> OS: Ubuntu 12.04 >>> Hadoop: 1.0.4 >>> >>> It seems that it could login successfully but something is missing >>> Please help! >>> >>> Thanks >>> >>> >>> >>> >> >> >> >> -- >> Harsh J >
Re: Datanode: "Cannot start secure cluster without privileged resources"
Hi, Thanks for your reply. However, I think 16152 should not be the DN, since 1) my second try of "/usr/local/hadoop/bin/hadoop-daemon.sh start datanode" says 16117 (i.e. I ran start datanode twice), and 2) ps axu | grep 16117, I got root 16117 0.0 0.0 17004 904 pts/2S21:34 0:00 jsvc.exec -Dproc_datanode -outfile /usr/local/hadoop-1.0.4/libexec/ ... These are the two reasons that I think JPS is no longer a tool to check secure DN. Thanks again! On 26 Nov 2012, at 9:47 PM, Harsh J wrote: > The 16152 should be the DN JVM I think. This is a jps limitation, as > seen at http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jps.html > and jsvc (which secure mode DN uses) is such a custom launcher. > > "The jps command uses the java launcher to find the class name and > arguments passed to the main method. If the target JVM is started with > a custom launcher, the class name (or JAR file name) and the arguments > to the main method will not be available. In this case, the jps > command will output the string Unknownfor the class name or JAR file > name and for the arguments to the main method." > > On Mon, Nov 26, 2012 at 7:11 PM, a...@hsk.hk wrote: >> Hi, >> >> A question: >> I started Secure DN then ran JPS as root, I could not find any running DN: >> 16152 >> 16195 Jps >> >> However, when I tried to start the secure DN again, I got: >> Warning: $HADOOP_HOME is deprecated. >> datanode running as process 16117. Stop it first. >> >> Does it mean JPS is no longer a tool to check DN in secure mode? >> >> Thanks >> >> >> On 26 Nov 2012, at 9:03 PM, a...@hsk.hk wrote: >> >>> Hi Harsh, >>> >>> Thank you very much for your reply, got it! >>> >>> Thanks >>> ac >>> >>> On 26 Nov 2012, at 8:32 PM, Harsh J wrote: >>> >>>> Secure DN needs to be started as root (it runs as proper user, but >>>> needs to be started as root to grab reserved ports), and needs a >>>> proper jsvc binary (for your arch/OS) available. Are you using >>>> tarballs or packages (and if packages, are they from Bigtop)? >>>> >>>> On Mon, Nov 26, 2012 at 5:21 PM, a...@hsk.hk wrote: >>>>> Hi, >>>>> >>>>> I am setting up HDFS security with Kerberos: >>>>> When I manually started the first datanode, I got the following messages >>>>> (the namenode is started): >>>>> >>>>> 1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful >>>>> for user >>>>> 2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: >>>>> java.lang.RuntimeException: Cannot start secure cluster without >>>>> privileged resources. >>>>> >>>>> OS: Ubuntu 12.04 >>>>> Hadoop: 1.0.4 >>>>> >>>>> It seems that it could login successfully but something is missing >>>>> Please help! >>>>> >>>>> Thanks >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Harsh J >>> >> > > > > -- > Harsh J
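For anyone else hitting this, two ways to check a secure DataNode without relying on jps, sketched below. The pid directory defaults to /tmp unless HADOOP_PID_DIR or HADOOP_SECURE_DN_PID_DIR points elsewhere, so adjust the path to your setup:

  # hadoop-daemon.sh records the pid it started (this is what produced "datanode running as process 16117")
  cat /tmp/hadoop-*-datanode.pid

  # or look for the secure starter class directly; jsvc hides it from jps
  ps -ef | grep -v grep | grep SecureDataNodeStarter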
Re: Datanode: "Cannot start secure cluster without privileged resources"
[jsvc command line for process 16117; the long -classpath list of /usr/local/hadoop-1.0.4 jars is omitted] -Xmx1000m -jvm server -Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS -Dcom.sun.management.jmxremote -Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS -Dcom.sun.management.jmxremote -Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS -Dcom.sun.management.jmxremote -Dhadoop.log.dir=/usr/local/hadoop-1.0.4/libexec/../logs -Dhadoop.log.file=hadoop-hduser-datanode-m147.log -Dhadoop.home.dir=/usr/local/hadoop-1.0.4/libexec/.. -Dhadoop.id.str=hduser -Dhadoop.root.logger=INFO,DRFA -Dhadoop.security.logger=INFO,NullAppender -Djava.library.path=/usr/local/hadoop-1.0.4/libexec/../lib/native/Linux-amd64-64 -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter
root 16499 0.0 0.0 9388 920 pts/0 R+ 22:35 0:00 grep --color=auto 16117

I started all DNs in secure mode now. Thanks again!
ac

On 26 Nov 2012, at 10:30 PM, Harsh J wrote:
> Could you also check what 16152 is? The jsvc is a launcher process,
> not the JVM itself.
> > As I mentioned, JPS is pretty reliable, just wont' show the name of > the JVM launched by a custom wrapper - and will show just PID. > > On Mon, Nov 26, 2012 at 7:35 PM, a...@hsk.hk wrote: >> Hi, >> >> Thanks for your reply. >> >> However, I think 16152 should not be the DN, since >> 1) my second try of "/usr/local/hadoop/bin/hadoop-daemon.sh start datanode" >> says 16117 (i.e. I ran start datanode twice), and >> 2) ps axu | grep 16117, I got >> root 16117 0.0 0.0 17004 904 pts/2S21:34 0:00 jsvc.exec >> -Dproc_datanode -outfile /usr/local/hadoop-1.0.4/libexec/ ... >> >> These are the two reasons that I think JPS is no longer a tool to check >> secure DN. >> >> Thanks again! >> >> >> On 26 Nov 2012, at 9:47 PM, Harsh J wrote: >> >>> The 16152 should be the DN JVM I think. This is a jps limitation, as >>> seen at http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jps.html >>> and jsvc (which secure mode DN uses) is such a custom launcher. >>> >>> "The jps command uses the java launcher to find the class name and >>> arguments passed to the main method. If the target JVM is started with >>> a custom launcher, the class name (or JAR file name) and the arguments >>> to the main method wil
Failed To Start SecondaryNameNode in Secure Mode
Hi, Please help!

I tried to start the SecondaryNameNode in secure mode with: {$HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode

1) In the log I see "Login successful":

2012-11-27 22:05:23,120 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user ..
2012-11-27 22:05:23,246 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at ..
/

2) However, on the command line I see:

$ {$HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode
Warning: $HADOOP_HOME is deprecated.
starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-m146.out
Exception in thread "main" java.io.IOException: Login failure for null from keytab /etc/hadoop/hadoop.keytab
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:183)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)
Caused by: javax.security.auth.login.LoginException: Unable to obtain Princpal Name for authentication
        at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733)
        at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629)
        at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

There is no secondarynamenode process if I use jps to check.

QUESTION: Any idea where I am wrong?

Thanks
ac
Re: Failed To Start SecondaryNameNode in Secure Mode
ul for user .. 2012-11-28 22:43:01,447 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Starting web server as: host/m146.. 2012-11-28 22:43:01,480 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2012-11-28 22:43:01,531 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 2012-11-28 22:43:01,536 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down SecondaryNameNode at m146 / Please help! Thanks ac On 28 Nov 2012, at 12:57 AM, Arpit Gupta wrote: > Hi AC, > > Do you have the following property defined in your hdfs-site.xml > > > dfs.secondary.namenode.kerberos.internal.spnego.principal > HTTP/_HOST@REALM > > > and this principal needs to be available in your /etc/hadoop/hadoop.keytab. > From the logs it looks like you only have the following configured > "dfs.secondary.namenode.kerberos.principal" > > > -- > Arpit Gupta > Hortonworks Inc. > http://hortonworks.com/ > > On Nov 27, 2012, at 6:14 AM, "a...@hsk.hk" wrote: > >> Hi, >> >> Please help! >> >> I tried to start SecondaryNameNode in secure mode by the command: >> {$HADOOP_HOME}bin/hadoop-daemon.sh start secondarynamenode >> >> 1) from the log, I saw "Login successful" >> / >> 2012-11-27 22:05:23,120 INFO >> org.apache.hadoop.security.UserGroupInformation: Login successful for user >> .. >> 2012-11-27 22:05:23,246 INFO >> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: >> / >> SHUTDOWN_MSG: Shutting down SecondaryNameNode at .. >> / >> >> >> 2) However, from the command line, I saw >> $ {$HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode >> Warning: $HADOOP_HOME is deprecated. >> starting secondarynamenode, logging to >> /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-m146.out >> Exception in thread "main" java.io.IOException: Login failure for null >> from keytab /etc/hadoop/hadoop.keytab >> at >> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716) >> at >> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:183) >> at >> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:129) >> at >> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567) >> Caused by: javax.security.auth.login.LoginException: Unable to obtain >> Princpal Name for authentication >> at >> com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733) >> at >> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629) >> at >> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> >> >> There is no secondarynamenode process if I use JPS to check >> >> QUESTION: Any idea where I am wrong? >> >> >> Thanks >> ac >> >> >> >
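Putting Arpit's pointer together with the error above, the hdfs-site.xml fragment for the SecondaryNameNode side would look roughly like this. REALM is a placeholder, the keytab path is the one from the stack trace, and the principals named here must actually exist in that keytab:

  <property>
    <name>dfs.secondary.namenode.keytab.file</name>
    <value>/etc/hadoop/hadoop.keytab</value>
  </property>
  <property>
    <name>dfs.secondary.namenode.kerberos.principal</name>
    <value>hduser/_HOST@REALM</value>
  </property>
  <property>
    <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@REALM</value>
  </property>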
Re: Failed To Start SecondaryNameNode in Secure Mode
Hi,

I found this error message in the .out file after trying to start the SecondaryNameNode in secure mode:

Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: m146:m146:0
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:205)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:190)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:190)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)

version: 1.0.4
distributed cluster: true
os: Ubuntu 12.04
server: the same server as the NameNode; the NameNode is already started in secure mode

QUESTION: It seems to be related to configuration, but what does this error mean and where could I be wrong? Please help!

Thanks
ac

P.S. Below is from the .log file:

2012-11-29 17:42:16,687 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG:
/
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG:   host = m146..
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
/
2012-11-29 17:42:17,174 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hduser/m146..
2012-11-29 17:42:17,405 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Starting web server as: host/m146...
2012-11-29 17:42:17,434 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-11-29 17:42:17,468 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-11-29 17:42:17,473 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at m146...
/
Re: Failed To Start SecondaryNameNode in Secure Mode
Hi Since NN and SNN are used in the same server: 1) If i use the default "dfs.secondary.http.address", i.e. 0.0.0.0:50090 (commented out dfs.secondary.http.address property) I got : Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 0.0.0.0:0.0.0.0:0 2) If I add the following to hdfs-site.xml dfs.secondary.http.address m146:50090 I got: Exception in thread "main" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: m146:m146:0 in both cases, the port was not 50090, very strange. Thanks AC On 29 Nov 2012, at 5:46 PM, a...@hsk.hk wrote: > Hi, > > I found this error message in the .out file after trying to start > SecondaryNameNode in secure mode > > Exception in thread "main" java.lang.IllegalArgumentException: Does not > contain a valid host:port authority: m146:m146:0 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:205) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:190) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:190) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:129) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567) > > version: 1.0.4 > distributed cluster: true > os: ubuntu 12.04 > server: also the server of NameNode, the NameNode is already started in > secure mode > > QUESTION: It seems that it is related to configuration but what does this > error mean and where I would be wrong? Please help! > > Thanks > ac > > > > > P.S. > below is form the .log file > > 2012-11-29 17:42:16,687 INFO > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: > / > STARTUP_MSG: Starting SecondaryNameNode > STARTUP_MSG: host = m146.. > STARTUP_MSG: args = [] > STARTUP_MSG: version = 1.0.4 > STARTUP_MSG: build = > https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r > 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012 > / > 2012-11-29 17:42:17,174 INFO org.apache.hadoop.security.UserGroupInformation: > Login successful for user hduser/m146.. > 2012-11-29 17:42:17,405 INFO > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Starting web server > as: host/m146... > 2012-11-29 17:42:17,434 INFO org.mortbay.log: Logging to > org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via > org.mortbay.log.Slf4jLog > 2012-11-29 17:42:17,468 INFO org.apache.hadoop.http.HttpServer: Added global > filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) > 2012-11-29 17:42:17,473 INFO > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: > / > SHUTDOWN_MSG: Shutting down SecondaryNameNode at m146... > /
Re: CheckPoint Node
Hi JM, If you migrate 1.0.3 to 2.0.x, could you mind to share your migration steps? it is because I also have a 1.0.4 cluster (Ubuntu 12.04, Hadoop 1.0.4, Hbase 0.94.2 and ZooKeeper 3.4.4 ) and want to migrate it to 2.0.x in order to avoid the hardware failure of the NameNode. I have a testing cluster ready for the migration test. Thanks ac On 1 Dec 2012, at 10:25 AM, Jean-Marc Spaggiari wrote: > Sorry about that. My fault. > > I have put this on the core-site.xml file but should be on the > hdfs-site.xml... > > I moved it and it's now working fine. > > Thanks. > > JM > > 2012/11/30, Jean-Marc Spaggiari : >> Hi, >> >> Is there a way to ask Hadoop to display its parameters? >> >> I have updated the property as followed: >> >>dfs.name.dir >>${hadoop.tmp.dir}/dfs/name,/media/usb0/ >> >> >> But even if I stop/start hadoop, there is nothing written on the usb >> drive. So I'm wondering if there is a command line like bin/hadoop >> --showparameters >> >> Thanks, >> >> JM >> >> 2012/11/22, Jean-Marc Spaggiari : >>> Perfect. Thanks again for your time! >>> >>> I will first add another drive on the Namenode because this will take >>> 5 minutes. Then I will read about the migration from 1.0.3 to 2.0.x >>> and most probably will use the zookeeper solution. >>> >>> This will take more time, so will be done over the week-end. >>> >>> I lost 2 hard drives this week (2 datanodes), so I'm not a bit >>> concerned about the NameNode data. Just want to secure that a bit >>> more. >>> >>> JM >>> >>> 2012/11/22, Harsh J : Jean-Marc (Sorry if I've been spelling your name wrong), 0.94 does support Hadoop-2 already, and works pretty well with it, if that is your only concern. You only need to use the right download (or if you compile, use the -Dhadoop.profile=23 maven option). You will need to restart the NameNode to make changes to the dfs.name.dir property and set it into effect. A reasonably fast disk is needed for quicker edit log writes (few bytes worth in each round) but a large, or SSD-style disk is not a requisite. An external disk would work fine too (instead of an NFS), as long as it is reliable. You do not need to copy data manually - just ensure that your NameNode process user owns the directory and it will auto-populate the empty directory on startup. Operationally speaking, in case 1/2 disk fails, the NN Web UI (and metrics as well) will indicate this (see bottom of NN UI page for an example of what am talking about) but the NN will continue to run with the lone remaining disk, but its not a good idea to let it run for too long without fixing/replacing the disk, for you will be losing out on redundancy. On Thu, Nov 22, 2012 at 11:59 PM, Jean-Marc Spaggiari wrote: > Hi Harsh, > > Again, thanks a lot for all those details. > > I read the previous link and I totally understand the HA NameNode. I > already have a zookeeper quorum (3 servers) that I will be able to > re-use. However, I'm running HBase 0.94.2 which is not yet compatible > (I think) with Hadoop 2.0.x. So I will have to go with a non-HA > NameNode until I can migrate to a stable 0.96 HBase version. > > Can I "simply" add one directory to dfs.name.dir and restart > my namenode? Is it going to feed all the required information in this > directory? Or do I need to copy the data of the existing one in the > new one before I restart it? Also, does it need a fast transfert rate? > Or will an exteral hard drive (quick to be moved to another server if > required) be enought? 
> > > 2012/11/22, Harsh J : >> Please follow the tips provided at >> http://wiki.apache.org/hadoop/FAQ#How_do_I_set_up_a_hadoop_node_to_use_multiple_volumes.3Fand >> http://wiki.apache.org/hadoop/FAQ#If_the_NameNode_loses_its_only_copy_of_the_fsimage_file.2C_can_the_file_system_be_recovered_from_the_DataNodes.3F >> >> In short, if you use a non-HA NameNode setup: >> >> - Yes the NN is a very vital persistence point in running HDFS and its >> data should be redundantly stored for safety. >> - You should, in production, configure your NameNode's image and edits >> disk (dfs.name.dir in 1.x+, or dfs.namenode.name.dir in 0.23+/2.x+) to >> be a dedicated one with adequate free space for gradual growth, and >> should configure multiple disks (with one off-machine NFS point highly >> recommended for easy recovery) for adequate redundancy. >> >> If you instead use a HA NameNode setup (I'd highly recommend doing >> this since it is now available), the presence of > 1 NameNodes and the >> journal log mount or quorum setup would automatically act as >> safeguards for the FS metadata. >> >> On Thu, Nov 22, 2012 at 11:03 PM, Jean-Marc Spaggiari >> wrote: >>> Hi Harsh,
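For anyone landing on this thread later: the fix JM describes earlier in the thread is simply that the property belongs in hdfs-site.xml rather than core-site.xml, e.g. the fragment below (the second path is whatever redundant mount you choose; Harsh recommends an off-machine NFS point):

  <!-- hdfs-site.xml, not core-site.xml; restart the NameNode after changing it -->
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs/name,/media/usb0/</value>
  </property>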
Re: Map Reduce jobs taking a long time at the end
Hi, Have you also checked .out file of the tasktracker in logs? It could contain some useful information for the issue. Thanks ac On 4 Dec 2012, at 8:27 PM, Jay Whittaker wrote: > Hey, > > We are running Map reduce jobs against a 12 machine hbase cluster and > for a long time they took approx 30 mins to return a result against ~95 > million rows. Without any major changes to the data or any upgrade of > hbase/hadoop they now seem to be taking about 4 hours. and the logs are > full of > > 2012-12-04 13:33:15,602 INFO org.apache.hadoop.mapred.TaskTracker: > attempt_201211210952_0293_m_31_0 0.0% row: 63 6f 6d 2e 70 72 6f 75 > 67 68 74 > ... > 2012-12-04 13:45:17,134 INFO org.apache.hadoop.mapred.TaskTracker: > attempt_201211210952_0293_m_31_0 0.0% row: 63 6f 6d 2e 70 75 72 70 > 6c 65 64 65 73 69 67 6e 73 65 72 76 69 63 65 73 > ... > 2012-12-04 13:46:11,515 INFO org.apache.hadoop.mapred.TaskTracker: > attempt_201211210952_0293_m_31_0 0.0% row: 63 6f 6d 2e 70 75 73 68 > 74 6f 74 61 6c 6b 2d 6f 6e 6c 69 6e 65 > > I presume the 0% is percent complete but I'm not sure as to why the time > to complete has now jumped massively. Ganglia shows no major load on the > nodes in question so I don't think it's that. > > What steps should I be taking to try troubleshoot the problem? > > Regards, > > Jay
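If it helps, a rough sketch of where to look on each node, assuming a 1.0.x tarball install like the ones shown earlier in this archive (adjust the prefix to your own installation):

  ls /usr/local/hadoop-1.0.4/logs/hadoop-*-tasktracker-*.out
  tail -n 200 /usr/local/hadoop-1.0.4/logs/hadoop-*-tasktracker-*.log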
Re: Strange machine behavior
Hi, I always set "vm.swappiness = 0" for my hadoop servers (PostgreSQL servers too). The reason is that Linux moves memory pages to swap space if they have not been accessed for a period of time (swapping). Java virtual machine (JVM) does not act well in the case of swapping that will make MapReduce (and HBase and ZooKeeper) run into trouble. So I would suggest to set vm.swappiness = 0. Thanks ac On 9 Dec 2012, at 12:58 PM, seth wrote: > Oracle frequently recommends vm.swappiness = 0 to get well behaved RAC nodes. > Otherwise you start paging out things you don't usually want paged out in > favor of a larger filesystem cache. > > There is also a vm parameter that controls the minimum size of the free > chain, might want to increase that a bit. > > Also, look into hosting your JVM heap on huge pages, they can't be paged out > and will help the JVM perform better too. > > On Dec 8, 2012, at 6:09 PM, Robert Dyer wrote: > >> Has anyone experienced a TaskTracker/DataNode behaving like the attached >> image? >> >> This was during a MR job (which runs often). Note the extremely high System >> CPU time. Upon investigating I saw that out of 64GB ram the system had >> allocated almost 45GB to cache! >> >> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_cache ; sync" which is >> roughly where the graph goes back to normal (much lower System, much higher >> User). >> >> This has happened a few times. >> >> I have tried playing with the sysctl vm.swappiness value (default of 60) by >> setting it to 30 (which it was at when the graph was collected) and now to >> 10. I am not sure that helps. >> >> Any ideas? Anyone else run into this before? >> >> 24 cores >> 64GB ram >> 4x2TB sata3 hdd >> >> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on >> this machine. >> >> 24 map slots (1gb heap each), no reducers. >> >> Also running HBase 0.94.2 with a RS (8gb ram) on this machine. >>
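For completeness, setting it is a one-liner plus a persistent entry (standard sysctl usage):

  sysctl vm.swappiness                                      # check the current value
  sudo sysctl -w vm.swappiness=0                            # apply immediately
  echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf   # keep it across reboots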
Re: IOException:Error Recovery for block
Hi, Can you let us know which Hadoop version you are using? Thanks ac On 9 Dec 2012, at 3:03 PM, Manoj Babu wrote: > Hi All, > > When grepping the error logs i could see the below for a job which process > some 500GB of data. What would be the cause and how to avoid it further? > > java.io.IOException: Error Recovery for block blk_4907953961673137346_1929435 > failed because recovery from primary datanode 12.104.0.154:50010 failed 6 > times. Pipeline was 12.104.0.154:50010. Aborting... > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305) > at > org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477) > > Cheers! > Manoj. >
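While waiting for the version, a couple of generic first checks, sketched here; they only diagnose, and the block id in the message is the thing to grep for in the DataNode log on 12.104.0.154:

  hadoop version
  hadoop fsck / | tail -n 25     # look at the Status line and the corrupt/missing block counts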