Ubuntu 12.04 - Which JDK?

2012-11-07 Thread a...@hsk.hk
Hi,

I am planning to use Ubuntu 12.04. From 
http://wiki.apache.org/hadoop/HadoopJavaVersions, regarding OpenJDK: 

"Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so OpenJDK 
cannot be used to compile hadoop mapreduce code in branch-0.23 and beyond, 
please use other JDKs."

Is it OK to use OpenJDK 7 in Ubuntu 12.04?

Thanks



Re: Ubuntu 12.04 - Which JDK?

2012-11-08 Thread a...@hsk.hk
Thanks Harsh,

This is exactly what I am facing. I have used Ubuntu for a long time, and 12.04 is 
the Long Term Support (LTS) version; however, Oracle JDK/JRE 6 is no longer 
supported in it, and "OpenJDK has sometimes a odd behavior" (Alexander Lorenz).  
Any Ubuntu 12.04 users here willing to share your selections/options?

Thanks 


On 8 Nov 2012, at 4:37 PM, Harsh J wrote:

> Hi Sanjeev,
> 
> Unfortunately, official Ubuntu repositories no longer supports Oracle
> JDK/JRE 6. See https://help.ubuntu.com/community/Java#Oracle_.28Sun.29_Java_6
> 
> On Thu, Nov 8, 2012 at 11:47 AM, Sanjeev Verma
>  wrote:
>> AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop
>> from source? Get precompiled binaries and you will be ok.
>> 
>> Also, you can install sun/oracle jdk on ubuntu. Just google for
>> instructions, u will find plenty, like here -
>> http://www.ubuntututorials.com/install-oracle-java-jdk-7-ubuntu-12-04/.
>> These are for jdk 7, but you can follow the same to install jdk 6.
>> 
>> Enjoy!
>> 
>> On Nov 8, 2012 11:30 AM, "a...@hsk.hk"  wrote:
>>> 
>>> Hi,
>>> 
>>> I am planning to use Ubuntu 12.04, from
>>> http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK
>>> 
>>> "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so
>>> OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and
>>> beyond, please use other JDKs."
>>> 
>>> Is it OK to use OpenJDK 7 in Ubuntu 12.04?
>>> 
>>> Thanks
>>> 
>> 
> 
> 
> 
> -- 
> Harsh J
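
For anyone weighing the same choice on a fresh Ubuntu 12.04 box, here is a minimal sketch of 
the two routes discussed above (assuming the stock Ubuntu 12.04 repositories; openjdk-7-jdk is 
the package name used there, and the Oracle route follows the manual steps given later in this 
digest):

    # Route A: OpenJDK 7 straight from the Ubuntu 12.04 repositories
    sudo apt-get update
    sudo apt-get install -y openjdk-7-jdk

    # Route B: Oracle/Sun JDK installed manually (see the update-alternatives steps further down)

    # In either case, check which JVM the shell now resolves to
    java -version
    update-alternatives --display java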



Re: Ubuntu 12.04 - Which JDK?

2012-11-08 Thread a...@hsk.hk
Hi, thank you for the info; I will go ahead with it. Thanks.
On 8 Nov 2012, at 6:07 PM, Mohammad Tariq wrote:

> Hi there,
> 
> As per my understanding, all the code has been built and tested against 
> java6(sun). There are some changes between java7 and 6 and you may face 
> issues while running your code, like using sort with collections. I faced the 
> same thing when using List.sort in my code. So, it's better to go with sun's 
> java6. If you want to install it you can follow these steps :
> 
> 1. Download the zipped file and extract it.
> 
> 2. chmod +x jdk-6u37-linux-x64.bin
> 
> 3. ./jdk-6u37-linux-x64.bin
> 
> 4. sudo mv jdk1.6.0_37/ /usr/lib/jvm/
> 
> 5. sudo update-alternatives --install /usr/bin/javac javac 
> /usr/lib/jvm/jdk1.6.0_37/bin/javac 1
> 
> 6. sudo update-alternatives --install /usr/bin/java java 
> /usr/lib/jvm/jdk1.6.0_37/bin/java 1
> 
> 7. sudo update-alternatives --install /usr/bin/javaws javaws 
> /usr/lib/jvm/jdk1.6.0_37/bin/javaws 1
> 
> Then choose which java to use :
> 
> sudo update-alternatives --config java
> 
> choose the no for java6
> 
> Regards,
> Mohammad Tariq
> 
> 
> 
> On Thu, Nov 8, 2012 at 2:29 PM, a...@hsk.hk  wrote:
> Thanks Harsh,
> 
> This is exactly what I am facing, I have used Ubuntu for long time, 12.04 is 
> the Long Term Support version (LTS),  however Oracle JDK/JRE6  is no longer 
> supported in it,  "OpenJDK has sometimes a odd behavior - Alexander Lorenz'". 
>  Any Ubuntu 12.04 users here to share your selections/options?
> 
> Thanks
> 
> 
> On 8 Nov 2012, at 4:37 PM, Harsh J wrote:
> 
> > Hi Sanjeev,
> >
> > Unfortunately, official Ubuntu repositories no longer supports Oracle
> > JDK/JRE 6. See 
> > https://help.ubuntu.com/community/Java#Oracle_.28Sun.29_Java_6
> >
> > On Thu, Nov 8, 2012 at 11:47 AM, Sanjeev Verma
> >  wrote:
> >> AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop
> >> from source? Get precompiled binaries and you will be ok.
> >>
> >> Also, you can install sun/oracle jdk on ubuntu. Just google for
> >> instructions, u will find plenty, like here -
> >> http://www.ubuntututorials.com/install-oracle-java-jdk-7-ubuntu-12-04/.
> >> These are for jdk 7, but you can follow the same to install jdk 6.
> >>
> >> Enjoy!
> >>
> >> On Nov 8, 2012 11:30 AM, "a...@hsk.hk"  wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am planning to use Ubuntu 12.04, from
> >>> http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK
> >>>
> >>> "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so
> >>> OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and
> >>> beyond, please use other JDKs."
> >>>
> >>> Is it OK to use OpenJDK 7 in Ubuntu 12.04?
> >>>
> >>> Thanks
> >>>
> >>
> >
> >
> >
> > --
> > Harsh J
> 
> 
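
Whichever JDK ends up installed, Hadoop reads the JVM location from JAVA_HOME in 
conf/hadoop-env.sh. A minimal sketch, assuming the JDK path used in the steps above 
(/usr/lib/jvm/jdk1.6.0_37) and a tarball install under /usr/local/hadoop; adjust both paths 
to your layout:

    # Point every Hadoop daemon and CLI tool at the chosen JDK
    echo 'export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_37' | sudo tee -a /usr/local/hadoop/conf/hadoop-env.sh

    # Sanity check: the JVM Hadoop will use, and the Hadoop build itself
    /usr/lib/jvm/jdk1.6.0_37/bin/java -version
    /usr/local/hadoop/bin/hadoop version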



Re: Ubuntu 12.04 - Which JDK? and some more

2012-11-10 Thread a...@hsk.hk
Hi,

After spending some time researching the Hadoop and HBase JDK requirements, here is my 
advice about using Ubuntu 12.04 (a quick way to check both points is sketched below):

1) If you plan to use HBase on Hadoop HDFS, note that HBase will run on Oracle's JDK 
only, so do NOT use OpenJDK or other JDKs for the setup.
2) When installing Ubuntu, do NOT use LVM (Linux Logical Volume Manager) for the 
Hadoop data disks. LVM adds a layer between the filesystem and the device that can 
hurt performance, and it is the default in some Linux installers, so be careful not 
to select it.
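
A minimal sketch for checking both points on an installed node (the data path and commands 
are illustrative, not a fixed recipe):

    # 1) Confirm the JVM in use is Oracle's: the output should say "Java HotSpot", not "OpenJDK"
    java -version

    # 2) Confirm the Hadoop data disks are plain partitions rather than LVM logical volumes
    sudo lvs                 # lists LVM logical volumes, if the lvm2 tools are installed
    df -h /app/hadoop        # illustrative data path; the device should look like /dev/sdX1, not /dev/mapper/...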

regards

On 8 Nov 2012, at 6:12 PM, a...@hsk.hk wrote:

> Hi, thank you for your info and will head for it. Thanks.
> On 8 Nov 2012, at 6:07 PM, Mohammad Tariq wrote:
> 
>> Hi there,
>> 
>> As per my understanding, all the code has been built and tested against 
>> java6(sun). There are some changes between java7 and 6 and you may face 
>> issues while running your code, like using sort with collections. I faced 
>> the same thing when using List.sort in my code. So, it's better to go with 
>> sun's java6. If you want to install it you can follow these steps :
>> 
>> 1. Download the zipped file and extract it.
>> 
>> 2. chmod +x jdk-6u37-linux-x64.bin
>> 
>> 3. ./jdk-6u37-linux-x64.bin
>> 
>> 4. sudo mv jdk1.6.0_37/ /usr/lib/jvm/
>> 
>> 5. sudo update-alternatives --install /usr/bin/javac javac 
>> /usr/lib/jvm/jdk1.6.0_37/bin/javac 1
>> 
>> 6. sudo update-alternatives --install /usr/bin/java java 
>> /usr/lib/jvm/jdk1.6.0_37/bin/java 1
>> 
>> 7. sudo update-alternatives --install /usr/bin/javaws javaws 
>> /usr/lib/jvm/jdk1.6.0_37/bin/javaws 1
>> 
>> Then choose which java to use :
>> 
>> sudo update-alternatives --config java
>> 
>> choose the no for java6
>> 
>> Regards,
>> Mohammad Tariq
>> 
>> 
>> 
>> On Thu, Nov 8, 2012 at 2:29 PM, a...@hsk.hk  wrote:
>> Thanks Harsh,
>> 
>> This is exactly what I am facing, I have used Ubuntu for long time, 12.04 is 
>> the Long Term Support version (LTS),  however Oracle JDK/JRE6  is no longer 
>> supported in it,  "OpenJDK has sometimes a odd behavior - Alexander 
>> Lorenz'".  Any Ubuntu 12.04 users here to share your selections/options?
>> 
>> Thanks
>> 
>> 
>> On 8 Nov 2012, at 4:37 PM, Harsh J wrote:
>> 
>> > Hi Sanjeev,
>> >
>> > Unfortunately, official Ubuntu repositories no longer supports Oracle
>> > JDK/JRE 6. See 
>> > https://help.ubuntu.com/community/Java#Oracle_.28Sun.29_Java_6
>> >
>> > On Thu, Nov 8, 2012 at 11:47 AM, Sanjeev Verma
>> >  wrote:
>> >> AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop
>> >> from source? Get precompiled binaries and you will be ok.
>> >>
>> >> Also, you can install sun/oracle jdk on ubuntu. Just google for
>> >> instructions, u will find plenty, like here -
>> >> http://www.ubuntututorials.com/install-oracle-java-jdk-7-ubuntu-12-04/.
>> >> These are for jdk 7, but you can follow the same to install jdk 6.
>> >>
>> >> Enjoy!
>> >>
>> >> On Nov 8, 2012 11:30 AM, "a...@hsk.hk"  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am planning to use Ubuntu 12.04, from
>> >>> http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK
>> >>>
>> >>> "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so
>> >>> OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 
>> >>> and
>> >>> beyond, please use other JDKs."
>> >>>
>> >>> Is it OK to use OpenJDK 7 in Ubuntu 12.04?
>> >>>
>> >>> Thanks
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Harsh J
>> 
>> 
> 



Re: HA for hadoop-0.20.2

2012-11-13 Thread a...@hsk.hk
Hi, a question: is 2.x ready for production deployment?  Thanks

On 13 Nov 2012, at 5:19 PM, Harsh J wrote:

> Hi,
> 
> Why not just use the 2.x releases for HA-NNs? There is quite a wide
> delta between 0.20.x and 2.x, especially around the edit log areas
> after HDFS-1073.
> 
> In any case, I think your question suits hdfs-...@hadoop.apache.org
> more than the user lists, although I don't quite understand what
> you're attempting to do (or point).
> 
> On Tue, Nov 13, 2012 at 2:18 PM, lei liu  wrote:
>> I want to implement HA function for hadoop-0.20.2.
>> 
>> When I learn the hadoop-2.0 code, I meet some question like this:
>> 
>> There is the following code in the FSEditLogLoader.loadEditRecords method:
>> 
>>   if (op.hasTransactionId()) {
>>     if (op.getTransactionId() > expectedTxId) {
>>       MetaRecoveryContext.editLogLoaderPrompt("There appears " +
>>           "to be a gap in the edit log.  We expected txid " +
>>           expectedTxId + ", but got txid " +
>>           op.getTransactionId() + ".", recovery,
>>           "ignoring missing " + " transaction IDs");
>>     } else if (op.getTransactionId() < expectedTxId) {
>>       MetaRecoveryContext.editLogLoaderPrompt("There appears " +
>>           "to be an out-of-order edit in the edit log.  We " +
>>           "expected txid " + expectedTxId + ", but got txid " +
>>           op.getTransactionId() + ".", recovery,
>>           "skipping the out-of-order edit");
>>       continue;
>>     }
>>   }
>> 
>> The method uses the transaction id to guarantee that the same transaction log record
>> is not applied to the namespace more than once.
>> 
>> But in hadoop-0.20.2, FSEditLog does not store the transaction id in the edits log
>> file. So I want to know: if the standby NN applies the same transaction log record to
>> its namespace more than once, will that corrupt the standby NN's namespace?
>> 
>> Please give me some advice. Thanks.
>> 
>> 
>> 
>> Best Regards
>> 
>> LiuLei
> 
> 
> 
> -- 
> Harsh J



High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-15 Thread a...@hsk.hk
Hi,

Please help!

I have installed a Hadoop cluster with a single master (master1) and have HBase 
running on HDFS.  Now I am setting up a second master (master2) in order to form HA.  
When I used JPS to check the cluster, I found:

2782 Jps
2126 NameNode
2720 SecondaryNameNode
i.e. The datanode on this server could not be started

In the log file, found: 
2012-11-16 10:28:44,851 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
java.io.IOException: Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: 
namenode namespaceID = 1356148070; datanode namespaceID = 1151604993



One of the possible solutions to fix this issue is to:  stop the cluster, 
reformat the NameNode, restart the cluster.
QUESTION: As I already have HBase running on the cluster, if I reformat the 
NameNode, do I need to reinstall HBase entirely? I don't mind losing all the data, 
as I don't have much data in HBase and HDFS; however, I don't want to have to 
re-install HBase.


On the other hand, I have tried another solution: stop the DataNode, edit the 
namespaceID in current/VERSION (i.e. set namespaceID=1151604993), and restart the 
DataNode; it doesn't work:
Warning: $HADOOP_HOME is deprecated.
starting master2, logging to 
/usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out
Exception in thread "main" java.lang.NoClassDefFoundError: master2
Caused by: java.lang.ClassNotFoundException: master2
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: master2.  Program will exit.
QUESTION: Any other solutions?



Thanks



  




Re: High Availability - second namenode (master2) issue: Incompatible namespaceIDs

2012-11-16 Thread a...@hsk.hk
Thank you very much, will try. 


On 16 Nov 2012, at 4:31 PM, Vinayakumar B wrote:

> Hi,
>  
> If you are moving from non-HA (single master) to HA, then follow the steps below.
> 1.   Configure the other namenode's settings in the running namenode's and all 
> datanodes' configurations, and configure a logical fs.defaultFS.
> 2.   Configure the shared-storage related settings.
> 3.   Stop the running NameNode and all datanodes.
> 4.   Execute 'hdfs namenode -initializeSharedEdits' from the existing 
> namenode installation, to transfer the edits to the shared storage.
> 5.   Now format zkfc using 'hdfs zkfc -formatZK' and start zkfc using 
> 'hadoop-daemon.sh start zkfc'.
> 6.   Now restart the namenode from the existing installation. If all 
> configurations are fine, the NameNode should start successfully as STANDBY, 
> and zkfc will then make it ACTIVE.
>  
> 7.   Now install the NameNode on the other machine (master2) with the same 
> configuration, except 'dfs.ha.namenode.id'.
> 8.   Now, instead of formatting, you need to copy the name dir contents from the 
> other namenode (master1) to master2's name dir. For this you have 2 
> options:
> a.   Execute 'hdfs namenode -bootstrapStandby' from the master2 
> installation.
> b.   Using 'scp', copy the entire contents of the name dir from master1 to 
> master2's name dir.
> 9.   Now start the zkfc for the second namenode (no need to do a zkfc format 
> now). Also start the namenode (master2).
>  
> Regards,
> Vinay-
> From: Uma Maheswara Rao G [mailto:mahesw...@huawei.com] 
> Sent: Friday, November 16, 2012 1:26 PM
> To: user@hadoop.apache.org
> Subject: RE: High Availability - second namenode (master2) issue: 
> Incompatible namespaceIDs
>  
> If you format the namenode, you need to clean up the DataNode storage directories 
> as well if they already contain data. The DN also has the namespaceID saved, and it 
> is compared with the NN's namespaceID. If you format the NN, the namespaceID 
> changes, while the DN may still have the older namespaceID. So just cleaning 
> the data on the DN would be fine.
>  
> Regards,
> Uma
> From: hadoop hive [hadooph...@gmail.com]
> Sent: Friday, November 16, 2012 1:15 PM
> To: user@hadoop.apache.org
> Subject: Re: High Availability - second namenode (master2) issue: 
> Incompatible namespaceIDs
> 
> Seems like you haven't formatted your cluster (if it has just been created).
> 
> On Fri, Nov 16, 2012 at 9:58 AM, a...@hsk.hk  wrote:
> Hi,
>  
> Please help!
>  
> I have installed a Hadoop Cluster with a single master (master1) and have 
> HBase running on the HDFS.  Now I am setting up the second master  (master2) 
> in order to form HA.  When I used JPS to check the cluster, I found :
>  
> 2782 Jps
> 2126 NameNode
> 2720 SecondaryNameNode
> i.e. The datanode on this server could not be started
>  
> In the log file, found: 
> 2012-11-16 10:28:44,851 ERROR 
> org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: 
> Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID = 
> 1356148070; datanode namespaceID = 1151604993
>  
>  
>  
> One of the possible solutions to fix this issue is to:  stop the cluster, 
> reformat the NameNode, restart the cluster.
> QUESTION: As I already have HBASE running on the cluster, if I reformat the 
> NameNode, do I need to reinstall the entire HBASE? I don't mind to have all 
> data lost as I don't have many data in HBASE and HDFS, however I don't want 
> to re-install HBASE again.
>  
>  
> On the other hand, I have tried another solution: stop the DataNode, edit the 
> namespaceID in current/VERSION (i.e. set namespaceID=1151604993), restart the 
> datanode, it doesn't work:
> Warning: $HADOOP_HOME is deprecated.
> starting master2, logging to 
> /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-master2-master2.out
> Exception in thread "main" java.lang.NoClassDefFoundError: master2
> Caused by: java.lang.ClassNotFoundException: master2
> at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> Could not find the main class: master2.  Program will exit.
> QUESTION: Any other solutions?
>  
>  
>  
> Thanks
>  
>  
>  
>   
>  
>  
>  
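
Putting the advice above into commands: since the data can be thrown away, the simplest route 
is to clear each DataNode's storage directory so it re-registers with the freshly formatted 
NameNode's namespaceID. A minimal sketch, assuming the dfs.data.dir shown in the error 
(/app/hadoop/tmp/dfs/data) and a tarball install under /usr/local/hadoop; run it on each 
DataNode:

    # Stop the DataNode, wipe its storage directory, and start it again
    /usr/local/hadoop/bin/hadoop-daemon.sh stop datanode

    # dfs.data.dir as it appears in the error message above; double-check yours before deleting
    rm -rf /app/hadoop/tmp/dfs/data/*

    /usr/local/hadoop/bin/hadoop-daemon.sh start datanode

    # After registration the DN records the NameNode's current namespaceID here
    cat /app/hadoop/tmp/dfs/data/current/VERSION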



Datanode: "Cannot start secure cluster without privileged resources"

2012-11-26 Thread a...@hsk.hk
Hi,

I am setting up HDFS security with Kerberos: 
When I manually started the first datanode, I got the following messages (the 
namenode is started):

1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful for 
user 
2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
java.lang.RuntimeException: Cannot start secure cluster without privileged 
resources.

OS: Ubuntu 12.04
Hadoop: 1.0.4

It seems that it could login successfully but something is missing
Please help!

Thanks



 

Re: Datanode: "Cannot start secure cluster without privileged resources"

2012-11-26 Thread a...@hsk.hk
Hi Harsh,

Thank you very much for your reply, got it!

Thanks
ac

On 26 Nov 2012, at 8:32 PM, Harsh J wrote:

> Secure DN needs to be started as root (it runs as proper user, but
> needs to be started as root to grab reserved ports), and needs a
> proper jsvc binary (for your arch/OS) available. Are you using
> tarballs or packages (and if packages, are they from Bigtop)?
> 
> On Mon, Nov 26, 2012 at 5:21 PM, a...@hsk.hk  wrote:
>> Hi,
>> 
>> I am setting up HDFS security with Kerberos:
>> When I manually started the first datanode, I got the following messages 
>> (the namenode is started):
>> 
>> 1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful 
>> for user 
>> 2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
>> java.lang.RuntimeException: Cannot start secure cluster without privileged 
>> resources.
>> 
>> OS: Ubuntu 12.04
>> Hadoop: 1.0.4
>> 
>> It seems that it could login successfully but something is missing
>> Please help!
>> 
>> Thanks
>> 
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
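
For a Hadoop 1.0.4 tarball setup, the pieces behind this answer are usually: privileged 
(below-1024) DataNode ports, HADOOP_SECURE_DN_USER in hadoop-env.sh, a jsvc binary for your 
platform, and starting the daemon as root. A minimal sketch, with illustrative port numbers 
and user name rather than required values:

    # hdfs-site.xml: secure DataNodes must bind privileged ports, e.g.
    #   <property><name>dfs.datanode.address</name><value>0.0.0.0:1004</value></property>
    #   <property><name>dfs.datanode.http.address</name><value>0.0.0.0:1022</value></property>

    # conf/hadoop-env.sh: the unprivileged user the DataNode drops to after binding the ports
    #   export HADOOP_SECURE_DN_USER=hduser

    # Start the DataNode as root; the start script hands off to jsvc when HADOOP_SECURE_DN_USER is set
    sudo /usr/local/hadoop/bin/hadoop-daemon.sh start datanode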



Re: Datanode: "Cannot start secure cluster without privileged resources"

2012-11-26 Thread a...@hsk.hk
Hi,

A question:
I started the secure DN and then ran JPS as root, but I could not find any running DN:
16152 
16195 Jps

However, when I tried to start the secure DN again, I got: 
Warning: $HADOOP_HOME is deprecated.
datanode running as process 16117. Stop it first.

Does this mean JPS can no longer be used to check the DN in secure mode?

Thanks


On 26 Nov 2012, at 9:03 PM, a...@hsk.hk wrote:

> Hi Harsh,
> 
> Thank you very much for your reply, got it!
> 
> Thanks
> ac
> 
> On 26 Nov 2012, at 8:32 PM, Harsh J wrote:
> 
>> Secure DN needs to be started as root (it runs as proper user, but
>> needs to be started as root to grab reserved ports), and needs a
>> proper jsvc binary (for your arch/OS) available. Are you using
>> tarballs or packages (and if packages, are they from Bigtop)?
>> 
>> On Mon, Nov 26, 2012 at 5:21 PM, a...@hsk.hk  wrote:
>>> Hi,
>>> 
>>> I am setting up HDFS security with Kerberos:
>>> When I manually started the first datanode, I got the following messages 
>>> (the namenode is started):
>>> 
>>> 1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful 
>>> for user 
>>> 2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
>>> java.lang.RuntimeException: Cannot start secure cluster without privileged 
>>> resources.
>>> 
>>> OS: Ubuntu 12.04
>>> Hadoop: 1.0.4
>>> 
>>> It seems that it could login successfully but something is missing
>>> Please help!
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Harsh J
> 



Re: Datanode: "Cannot start secure cluster without privileged resources"

2012-11-26 Thread a...@hsk.hk
Hi,

Thanks for your reply.

However, I think 16152 should not be the DN, since
1) my second try of "/usr/local/hadoop/bin/hadoop-daemon.sh start datanode" 
says 16117 (i.e. I ran start datanode twice), and 
2) ps axu | grep 16117, I got
root 16117  0.0  0.0  17004   904 pts/2    S    21:34   0:00 jsvc.exec 
-Dproc_datanode -outfile /usr/local/hadoop-1.0.4/libexec/ ...

These are the two reasons why I think JPS can no longer be used to check a secure 
DN.

Thanks again!


On 26 Nov 2012, at 9:47 PM, Harsh J wrote:

> The 16152 should be the DN JVM I think. This is a jps limitation, as
> seen at http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jps.html
> and jsvc (which secure mode DN uses) is such a custom launcher.
> 
> "The jps command uses the java launcher to find the class name and
> arguments passed to the main method. If the target JVM is started with
> a custom launcher, the class name (or JAR file name) and the arguments
> to the main method will not be available. In this case, the jps
> command will output the string Unknown for the class name or JAR file
> name and for the arguments to the main method."
> 
> On Mon, Nov 26, 2012 at 7:11 PM, a...@hsk.hk  wrote:
>> Hi,
>> 
>> A question:
>> I started Secure DN then ran JPS as root, I could not find any running DN:
>> 16152
>> 16195 Jps
>> 
>> However, when I tried to start the secure DN again, I got:
>> Warning: $HADOOP_HOME is deprecated.
>> datanode running as process 16117. Stop it first.
>> 
>> Does it mean JPS is no longer a tool to check DN in secure mode?
>> 
>> Thanks
>> 
>> 
>> On 26 Nov 2012, at 9:03 PM, a...@hsk.hk wrote:
>> 
>>> Hi Harsh,
>>> 
>>> Thank you very much for your reply, got it!
>>> 
>>> Thanks
>>> ac
>>> 
>>> On 26 Nov 2012, at 8:32 PM, Harsh J wrote:
>>> 
>>>> Secure DN needs to be started as root (it runs as proper user, but
>>>> needs to be started as root to grab reserved ports), and needs a
>>>> proper jsvc binary (for your arch/OS) available. Are you using
>>>> tarballs or packages (and if packages, are they from Bigtop)?
>>>> 
>>>> On Mon, Nov 26, 2012 at 5:21 PM, a...@hsk.hk  wrote:
>>>>> Hi,
>>>>> 
>>>>> I am setting up HDFS security with Kerberos:
>>>>> When I manually started the first datanode, I got the following messages 
>>>>> (the namenode is started):
>>>>> 
>>>>> 1) INFO org.apache.hadoop.security.UserGroupInformation: Login successful 
>>>>> for user 
>>>>> 2) ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
>>>>> java.lang.RuntimeException: Cannot start secure cluster without 
>>>>> privileged resources.
>>>>> 
>>>>> OS: Ubuntu 12.04
>>>>> Hadoop: 1.0.4
>>>>> 
>>>>> It seems that it could login successfully but something is missing
>>>>> Please help!
>>>>> 
>>>>> Thanks
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Harsh J
>>> 
>> 
> 
> 
> 
> -- 
> Harsh J



Re: Datanode: "Cannot start secure cluster without privileged resources"

2012-11-26 Thread a...@hsk.hk
/../lib/commons-cli-1.2.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-codec-1.4.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-collections-3.2.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-configuration-1.6.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-daemon-1.0.10.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-digester-1.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-el-1.0.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-httpclient-3.0.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-io-2.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-lang-2.4.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-logging-1.1.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-logging-api-1.0.4.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-math-2.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/commons-net-1.4.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/core-3.1.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/hadoop-capacity-scheduler-1.0.4.jar:/usr/local/hadoop-1.0.4/libexec/../lib/hadoop-fairscheduler-1.0.4.jar:/usr/local/hadoop-1.0.4/libexec/../lib/hadoop-thriftfs-1.0.4.jar:/usr/local/hadoop-1.0.4/libexec/../lib/hsqldb-1.8.0.10.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jackson-core-asl-1.8.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jasper-compiler-5.5.12.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jasper-runtime-5.5.12.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jdeb-0.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jersey-core-1.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jersey-json-1.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jersey-server-1.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jets3t-0.6.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jetty-6.1.26.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jetty-util-6.1.26.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jsch-0.1.42.jar:/usr/local/hadoop-1.0.4/libexec/../lib/junit-4.5.jar:/usr/local/hadoop-1.0.4/libexec/../lib/kfs-0.2.2.jar:/usr/local/hadoop-1.0.4/libexec/../lib/log4j-1.2.15.jar:/usr/local/hadoop-1.0.4/libexec/../lib/mockito-all-1.8.5.jar:/usr/local/hadoop-1.0.4/libexec/../lib/oro-2.0.8.jar:/usr/local/hadoop-1.0.4/libexec/../lib/servlet-api-2.5-20081211.jar:/usr/local/hadoop-1.0.4/libexec/../lib/slf4j-api-1.4.3.jar:/usr/local/hadoop-1.0.4/libexec/../lib/slf4j-log4j12-1.4.3.jar:/usr/local/hadoop-1.0.4/libexec/../lib/xmlenc-0.52.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-2.1.jar:/usr/local/hadoop-1.0.4/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
 -Xmx1000m -jvm server -Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS 
-Dcom.sun.management.jmxremote -Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS 
-Dcom.sun.management.jmxremote -Xmx1024m -Dsecurity.audit.logger=ERROR,DRFAS 
-Dcom.sun.management.jmxremote 
-Dhadoop.log.dir=/usr/local/hadoop-1.0.4/libexec/../logs 
-Dhadoop.log.file=hadoop-hduser-datanode-m147.log 
-Dhadoop.home.dir=/usr/local/hadoop-1.0.4/libexec/.. -Dhadoop.id.str=hduser 
-Dhadoop.root.logger=INFO,DRFA -Dhadoop.security.logger=INFO,NullAppender 
-Djava.library.path=/usr/local/hadoop-1.0.4/libexec/../lib/native/Linux-amd64-64
 -Dhadoop.policy.file=hadoop-policy.xml 
org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter
root 16499  0.0  0.0   9388   920 pts/0    R+   22:35   0:00 grep 
--color=auto 16117



I started all DNs in secure mode now.
Thanks again!

ac
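
Since jps only shows a bare PID for the jsvc-launched process, a small sketch of how one 
might confirm the secure DataNode is actually up (the class name is the one visible in the 
ps output above):

    # Match the starter class that jsvc launches for the secure DataNode
    ps -ef | grep -v grep | grep SecureDataNodeStarter

    # Or match the jsvc launcher itself
    pgrep -lf jsvc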

On 26 Nov 2012, at 10:30 PM, Harsh J wrote:

> Could you also check what 16152 is? The jsvc is a launcher process,
> not the JVM itself.
> 
> As I mentioned, JPS is pretty reliable, just wont' show the name of
> the JVM launched by a custom wrapper - and will show just PID.
> 
> On Mon, Nov 26, 2012 at 7:35 PM, a...@hsk.hk  wrote:
>> Hi,
>> 
>> Thanks for your reply.
>> 
>> However, I think 16152 should not be the DN, since
>> 1) my second try of "/usr/local/hadoop/bin/hadoop-daemon.sh start datanode" 
>> says 16117 (i.e. I ran start datanode twice), and
>> 2) ps axu | grep 16117, I got
>> root 16117  0.0  0.0  17004   904 pts/2S21:34   0:00 jsvc.exec 
>> -Dproc_datanode -outfile /usr/local/hadoop-1.0.4/libexec/ ...
>> 
>> These are the two reasons that I think JPS is no longer a tool to check 
>> secure DN.
>> 
>> Thanks again!
>> 
>> 
>> On 26 Nov 2012, at 9:47 PM, Harsh J wrote:
>> 
>>> The 16152 should be the DN JVM I think. This is a jps limitation, as
>>> seen at http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jps.html
>>> and jsvc (which secure mode DN uses) is such a custom launcher.
>>> 
>>> "The jps command uses the java launcher to find the class name and
>>> arguments passed to the main method. If the target JVM is started with
>>> a custom launcher, the class name (or JAR file name) and the arguments
>>> to the main method wil

Failed To Start SecondaryNameNode in Secure Mode

2012-11-27 Thread a...@hsk.hk
Hi,

Please help!

I tried to start the SecondaryNameNode in secure mode with the command: 
${HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode

1) from the log, I saw "Login successful" 
/
2012-11-27 22:05:23,120 INFO 
org.apache.hadoop.security.UserGroupInformation: Login successful for user 
..
2012-11-27 22:05:23,246 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at ..
/


2) However, from the command line, I saw 
$ ${HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode
Warning: $HADOOP_HOME is deprecated.
starting secondarynamenode, logging to 
/usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-m146.out
Exception in thread "main" java.io.IOException: Login failure for null 
from keytab /etc/hadoop/hadoop.keytab
at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:183)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)
Caused by: javax.security.auth.login.LoginException: Unable to obtain 
Princpal Name for authentication 
at 
com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733)
at 
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629)
at 
com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)


There is no SecondaryNameNode process listed when I check with JPS.

QUESTION: Any idea where I am wrong?


Thanks
ac





Re: Failed To Start SecondaryNameNode in Secure Mode

2012-11-28 Thread a...@hsk.hk
... org.apache.hadoop.security.UserGroupInformation: Login successful for user 
..
2012-11-28 22:43:01,447 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Starting web server 
as: host/m146..
2012-11-28 22:43:01,480 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-11-28 22:43:01,531 INFO org.apache.hadoop.http.HttpServer: Added 
global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-11-28 22:43:01,536 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at m146
/




Please help!
Thanks
ac





On 28 Nov 2012, at 12:57 AM, Arpit Gupta wrote:

> Hi AC,
> 
> Do you have the following property defined in your hdfs-site.xml
> 
> <property>
>   <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
>   <value>HTTP/_HOST@REALM</value>
> </property>
> 
> and this principal needs to be available in your /etc/hadoop/hadoop.keytab. 
> From the logs it looks like you only have the following configured 
> "dfs.secondary.namenode.kerberos.principal"
> 
> 
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
> 
> On Nov 27, 2012, at 6:14 AM, "a...@hsk.hk"  wrote:
> 
>> Hi,
>> 
>> Please help!
>> 
>> I tried to start SecondaryNameNode in secure mode by the command: 
>> {$HADOOP_HOME}bin/hadoop-daemon.sh start secondarynamenode
>> 
>> 1) from the log, I saw "Login successful" 
>>  /
>>  2012-11-27 22:05:23,120 INFO 
>> org.apache.hadoop.security.UserGroupInformation: Login successful for user 
>> ..
>>  2012-11-27 22:05:23,246 INFO 
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>  /
>>  SHUTDOWN_MSG: Shutting down SecondaryNameNode at ..
>>  /
>> 
>> 
>> 2) However, from the command line, I saw 
>>  $ {$HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode
>>  Warning: $HADOOP_HOME is deprecated.
>>  starting secondarynamenode, logging to 
>> /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-m146.out
>>  Exception in thread "main" java.io.IOException: Login failure for null 
>> from keytab /etc/hadoop/hadoop.keytab
>>  at 
>> org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716)
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:183)
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
>>  at 
>> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)
>>  Caused by: javax.security.auth.login.LoginException: Unable to obtain 
>> Princpal Name for authentication 
>>  at 
>> com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733)
>>  at 
>> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629)
>>  at 
>> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 
>> 
>> There is no secondarynamenode process if I use JPS to check 
>> 
>> QUESTION: Any idea where I am wrong?
>> 
>> 
>> Thanks
>> ac
>> 
>> 
>> 
> 
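
Following up on Arpit's pointer, a minimal sketch of how one might confirm the keytab 
actually holds both principals before retrying (the keytab path and hostname come from the 
logs above; REALM is a placeholder):

    # List the principals stored in the keytab the SecondaryNameNode is configured to use
    sudo klist -k /etc/hadoop/hadoop.keytab

    # Expect entries along these lines (names are illustrative):
    #   hduser/m146@REALM   for dfs.secondary.namenode.kerberos.principal
    #   HTTP/m146@REALM     for dfs.secondary.namenode.kerberos.internal.spnego.principal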



Re: Failed To Start SecondaryNameNode in Secure Mode

2012-11-29 Thread a...@hsk.hk
Hi,

I found this error message in the .out file after trying to start 
SecondaryNameNode in secure mode

Exception in thread "main" java.lang.IllegalArgumentException: Does not contain 
a valid host:port authority: m146:m146:0
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:205)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:190)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:190)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)

version: 1.0.4
distributed cluster: true
os: ubuntu 12.04
server: also the server of NameNode, the NameNode is already started in secure 
mode

QUESTION: It seems that this is related to configuration, but what does this error 
mean and where might I be wrong?  Please help!

Thanks
ac




P.S.
below is from the .log file

2012-11-29 17:42:16,687 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: 
/
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG:   host = m146..
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = 
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; 
compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
/
2012-11-29 17:42:17,174 INFO org.apache.hadoop.security.UserGroupInformation: 
Login successful for user hduser/m146..
2012-11-29 17:42:17,405 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Starting web server 
as: host/m146...
2012-11-29 17:42:17,434 INFO org.mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-11-29 17:42:17,468 INFO org.apache.hadoop.http.HttpServer: Added global 
filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-11-29 17:42:17,473 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
/
SHUTDOWN_MSG: Shutting down SecondaryNameNode at m146...
/

Re: Failed To Start SecondaryNameNode in Secure Mode

2012-11-29 Thread a...@hsk.hk
Hi

Since the NN and SNN run on the same server:

1) If I use the default "dfs.secondary.http.address", i.e. 0.0.0.0:50090 (with the 
dfs.secondary.http.address property commented out),

   I got: Exception in thread "main" java.lang.IllegalArgumentException: Does not 
contain a valid host:port authority: 0.0.0.0:0.0.0.0:0


2) If I add the following to hdfs-site.xml:

   <property>
     <name>dfs.secondary.http.address</name>
     <value>m146:50090</value>
   </property>

   I got: Exception in thread "main" java.lang.IllegalArgumentException: Does not 
contain a valid host:port authority: m146:m146:0

In both cases the port was not 50090, which is very strange.



Thanks
AC


On 29 Nov 2012, at 5:46 PM, a...@hsk.hk wrote:

> Hi,
> 
> I found this error message in the .out file after trying to start 
> SecondaryNameNode in secure mode
> 
> Exception in thread "main" java.lang.IllegalArgumentException: Does not 
> contain a valid host:port authority: m146:m146:0
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:162)
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:205)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$1.run(SecondaryNameNode.java:190)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:190)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:129)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:567)
> 
> version: 1.0.4
> distributed cluster: true
> os: ubuntu 12.04
> server: also the server of NameNode, the NameNode is already started in 
> secure mode
> 
> QUESTION:   It seems that it is related to configuration but what does this 
> error mean and where I would be wrong?  Please help!
> 
> Thanks
> ac
> 
> 
> 
> 
> P.S.
> below is form the .log file
> 
> 2012-11-29 17:42:16,687 INFO 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: 
> /
> STARTUP_MSG: Starting SecondaryNameNode
> STARTUP_MSG:   host = m146..
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 1.0.4
> STARTUP_MSG:   build = 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 
> 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
> /
> 2012-11-29 17:42:17,174 INFO org.apache.hadoop.security.UserGroupInformation: 
> Login successful for user hduser/m146..
> 2012-11-29 17:42:17,405 INFO 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Starting web server 
> as: host/m146...
> 2012-11-29 17:42:17,434 INFO org.mortbay.log: Logging to 
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via 
> org.mortbay.log.Slf4jLog
> 2012-11-29 17:42:17,468 INFO org.apache.hadoop.http.HttpServer: Added global 
> filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 2012-11-29 17:42:17,473 INFO 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down SecondaryNameNode at m146...
> /



Re: CheckPoint Node

2012-11-30 Thread a...@hsk.hk
Hi JM,

If you migrate from 1.0.3 to 2.0.x, would you mind sharing your migration steps? I 
also have a 1.0.4 cluster (Ubuntu 12.04, Hadoop 1.0.4, HBase 0.94.2 and ZooKeeper 
3.4.4) and want to migrate it to 2.0.x in order to protect against a hardware 
failure of the NameNode.

I have a testing cluster ready for the migration test.
 
Thanks
ac



On 1 Dec 2012, at 10:25 AM, Jean-Marc Spaggiari wrote:

> Sorry about that. My fault.
> 
> I have put this on the core-site.xml file but should be on the 
> hdfs-site.xml...
> 
> I moved it and it's now working fine.
> 
> Thanks.
> 
> JM
> 
> 2012/11/30, Jean-Marc Spaggiari :
>> Hi,
>> 
>> Is there a way to ask Hadoop to display its parameters?
>> 
>> I have updated the property as followed:
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>${hadoop.tmp.dir}/dfs/name,/media/usb0/</value>
>>   </property>
>> 
>> But even if I stop/start hadoop, there is nothing written on the usb
>> drive. So I'm wondering if there is a command line like bin/hadoop
>> --showparameters
>> 
>> Thanks,
>> 
>> JM
>> 
>> 2012/11/22, Jean-Marc Spaggiari :
>>> Perfect. Thanks again for your time!
>>> 
>>> I will first add another drive on the Namenode because this will take
>>> 5 minutes. Then I will read about the migration from 1.0.3 to 2.0.x
>>> and most probably will use the zookeeper solution.
>>> 
>>> This will take more time, so will be done over the week-end.
>>> 
>>> I lost 2 hard drives this week (2 datanodes), so I'm not a bit
>>> concerned about the NameNode data. Just want to secure that a bit
>>> more.
>>> 
>>> JM
>>> 
>>> 2012/11/22, Harsh J :
 Jean-Marc (Sorry if I've been spelling your name wrong),
 
 0.94 does support Hadoop-2 already, and works pretty well with it, if
 that is your only concern. You only need to use the right download (or
 if you compile, use the -Dhadoop.profile=23 maven option).
 
 You will need to restart the NameNode to make changes to the
 dfs.name.dir property and set it into effect. A reasonably fast disk
 is needed for quicker edit log writes (few bytes worth in each round)
 but a large, or SSD-style disk is not a requisite. An external disk
 would work fine too (instead of an NFS), as long as it is reliable.
 
 You do not need to copy data manually - just ensure that your NameNode
 process user owns the directory and it will auto-populate the empty
 directory on startup.
 
 Operationally speaking, in case 1/2 disk fails, the NN Web UI (and
 metrics as well) will indicate this (see bottom of NN UI page for an
 example of what am talking about) but the NN will continue to run with
 the lone remaining disk, but its not a good idea to let it run for too
 long without fixing/replacing the disk, for you will be losing out on
 redundancy.
 
 On Thu, Nov 22, 2012 at 11:59 PM, Jean-Marc Spaggiari
  wrote:
> Hi Harsh,
> 
> Again, thanks a lot for all those details.
> 
> I read the previous link and I totally understand the HA NameNode. I
> already have a zookeeper quorum (3 servers) that I will be able to
> re-use. However, I'm running HBase 0.94.2 which is not yet compatible
> (I think) with Hadoop 2.0.x. So I will have to go with a non-HA
> NameNode until I can migrate to a stable 0.96 HBase version.
> 
> Can I "simply" add one directory to dfs.name.dir and restart
> my namenode? Is it going to feed all the required information in this
> directory? Or do I need to copy the data of the existing one in the
> new one before I restart it? Also, does it need a fast transfert rate?
> Or will an exteral hard drive (quick to be moved to another server if
> required) be enought?
> 
> 
> 2012/11/22, Harsh J :
>> Please follow the tips provided at
>> http://wiki.apache.org/hadoop/FAQ#How_do_I_set_up_a_hadoop_node_to_use_multiple_volumes.3F and
>> http://wiki.apache.org/hadoop/FAQ#If_the_NameNode_loses_its_only_copy_of_the_fsimage_file.2C_can_the_file_system_be_recovered_from_the_DataNodes.3F
>> 
>> In short, if you use a non-HA NameNode setup:
>> 
>> - Yes the NN is a very vital persistence point in running HDFS and its
>> data should be redundantly stored for safety.
>> - You should, in production, configure your NameNode's image and edits
>> disk (dfs.name.dir in 1.x+, or dfs.namenode.name.dir in 0.23+/2.x+) to
>> be a dedicated one with adequate free space for gradual growth, and
>> should configure multiple disks (with one off-machine NFS point highly
>> recommended for easy recovery) for adequate redundancy.
>> 
>> If you instead use a HA NameNode setup (I'd highly recommend doing
>> this since it is now available), the presence of > 1 NameNodes and the
>> journal log mount or quorum setup would automatically act as
>> safeguards for the FS metadata.
>> 
>> On Thu, Nov 22, 2012 at 11:03 PM, Jean-Marc Spaggiari
>>  wrote:
>>> Hi Harsh,
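
Pulling the fix together: the extra name directory belongs in hdfs-site.xml (not 
core-site.xml), it must be writable by the NameNode user, and the NameNode fills an empty 
directory by itself on restart. A minimal sketch, with the /media/usb0 mount and hduser 
account taken from the thread and the exact paths treated as illustrative:

    # Prepare the additional name directory on the external drive
    sudo mkdir -p /media/usb0/dfs/name
    sudo chown -R hduser /media/usb0/dfs/name

    # hdfs-site.xml (not core-site.xml) carries the property, e.g.:
    #   <property>
    #     <name>dfs.name.dir</name>
    #     <value>${hadoop.tmp.dir}/dfs/name,/media/usb0/dfs/name</value>
    #   </property>

    # Restart the NameNode so it starts writing the image and edits to both locations
    /usr/local/hadoop/bin/hadoop-daemon.sh stop namenode
    /usr/local/hadoop/bin/hadoop-daemon.sh start namenode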

Re: Map Reduce jobs taking a long time at the end

2012-12-04 Thread a...@hsk.hk
Hi,

Have you also checked .out file of the tasktracker in logs? It could contain 
some useful information for the issue.

Thanks
ac


On 4 Dec 2012, at 8:27 PM, Jay Whittaker wrote:

> Hey,
> 
> We are running Map reduce jobs against a 12 machine hbase cluster and
> for a long time they took approx 30 mins to return a result against ~95
> million rows. Without any major changes to the data or any upgrade of
> hbase/hadoop they now seem to be taking about 4 hours. and the logs are
> full of
> 
> 2012-12-04 13:33:15,602 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201211210952_0293_m_31_0 0.0% row: 63 6f 6d 2e 70 72 6f 75
> 67 68 74
> ...
> 2012-12-04 13:45:17,134 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201211210952_0293_m_31_0 0.0% row: 63 6f 6d 2e 70 75 72 70
> 6c 65 64 65 73 69 67 6e 73 65 72 76 69 63 65 73
> ...
> 2012-12-04 13:46:11,515 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201211210952_0293_m_31_0 0.0% row: 63 6f 6d 2e 70 75 73 68
> 74 6f 74 61 6c 6b 2d 6f 6e 6c 69 6e 65
> 
> I presume the 0% is percent complete but I'm not sure as to why the time
> to complete has now jumped massively. Ganglia shows no major load on the
> nodes in question so I don't think it's that.
> 
> What steps should I be taking to try troubleshoot the problem?
> 
> Regards,
> 
> Jay



Re: Strange machine behavior

2012-12-09 Thread a...@hsk.hk
Hi,

I always set vm.swappiness = 0 on my Hadoop servers (and on my PostgreSQL servers 
too).

The reason is that Linux moves memory pages to swap space if they have not been 
accessed for a period of time (swapping). The Java virtual machine (JVM) does not 
behave well when swapped, and that gets MapReduce (and HBase and ZooKeeper) into 
trouble. So I would suggest setting vm.swappiness = 0.
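
For example, a minimal sketch of applying this on Ubuntu (effective immediately and 
persisted across reboots):

    # Apply the setting right away
    sudo sysctl -w vm.swappiness=0

    # Persist it across reboots
    echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf

    # Verify the value currently in effect
    cat /proc/sys/vm/swappiness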

Thanks
ac

On 9 Dec 2012, at 12:58 PM, seth wrote:

> Oracle frequently recommends vm.swappiness = 0 to get well behaved RAC nodes. 
>  Otherwise you start paging out things you don't usually want paged out in 
> favor of a larger filesystem cache.
> 
> There is also a vm parameter that controls the minimum size of the free 
> chain, might want to increase that a bit.
> 
> Also, look into hosting your JVM heap on huge pages, they can't be paged out 
> and will help the JVM perform better too.
> 
> On Dec 8, 2012, at 6:09 PM, Robert Dyer  wrote:
> 
>> Has anyone experienced a TaskTracker/DataNode behaving like the attached 
>> image?
>> 
>> This was during a MR job (which runs often).  Note the extremely high System 
>> CPU time.  Upon investigating I saw that out of 64GB ram the system had 
>> allocated almost 45GB to cache!
>> 
>> I did a sudo sh -c "sync ; echo 3 > /proc/sys/vm/drop_cache ; sync" which is 
>> roughly where the graph goes back to normal (much lower System, much higher 
>> User).
>> 
>> This has happened a few times.
>> 
>> I have tried playing with the sysctl vm.swappiness value (default of 60) by 
>> setting it to 30 (which it was at when the graph was collected) and now to 
>> 10.  I am not sure that helps.
>> 
>> Any ideas?  Anyone else run into this before?
>> 
>> 24 cores
>> 64GB ram
>> 4x2TB sata3 hdd
>> 
>> Running Hadoop 1.0.4, with a DataNode (2gb heap), TaskTracker (2gb heap) on 
>> this machine.
>> 
>> 24 map slots (1gb heap each), no reducers.
>> 
>> Also running HBase 0.94.2 with a RS (8gb ram) on this machine.
>> 



Re: IOException:Error Recovery for block

2012-12-09 Thread a...@hsk.hk
Hi,

Can you let us know which Hadoop version you are using?

Thanks
ac

On 9 Dec 2012, at 3:03 PM, Manoj Babu wrote:

> Hi All,
> 
> When grepping the error logs i could see the below for a job which process 
> some 500GB of data. What would be the cause and how to avoid it further?
> 
> java.io.IOException: Error Recovery for block blk_4907953961673137346_1929435 
> failed  because recovery from primary datanode 12.104.0.154:50010 failed 6 
> times.  Pipeline was 12.104.0.154:50010. Aborting...
>   at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2833)
>   at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
>   at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
> 
> Cheers!
> Manoj.
>