Re: Nutch hadoop integration
Maybe this will help you, if you have not already checked it: http://wiki.apache.org/nutch/NutchHadoopTutorial

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari abhishektiwari.u...@gmail.com wrote:
> How can I integrate Hadoop and Nutch? Can anyone please brief me?

--
Nitin Pawar
Re: Nutch hadoop integration
> How can I integrate Hadoop and Nutch? Can anyone please brief me?

Just configure a Hadoop cluster, then configure the Nutch paths so that the Nutch crawl index and crawl list are stored on HDFS. That's it.

--
*Biju*
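For example, a minimal sketch of a crawl run on an existing cluster (assuming Nutch 1.4's deployable job jar built under runtime/deploy; the seed directory and crawl parameters here are illustrative):

  # upload a directory of seed URL lists to HDFS
  hadoop fs -put urls urls
  # run the crawl as a MapReduce job; the crawl data and index land on HDFS
  hadoop jar apache-nutch-1.4.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3 -topN 50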
Hadoop-Git-Eclipse
Hi, I have done MapReduce programming using Eclipse before, but now I need to learn the Hadoop code internals for one of my projects. I have forked Hadoop on GitHub (https://github.com/apache/hadoop-common) and need to configure it to work with Eclipse. All the links I could find list steps for earlier versions of Hadoop. I am currently following the instructions in these links:
- http://wiki.apache.org/hadoop/GitAndHadoop
- http://wiki.apache.org/hadoop/EclipseEnvironment
- http://wiki.apache.org/hadoop/HowToContribute
Can someone please give me a link to the steps to follow for getting Hadoop (latest from trunk) started in Eclipse? I need to be able to commit changes to my forked repository on GitHub. Thanks in advance.

Regards,
Prajakta
Re: Nutch hadoop integration
Check out these links:
http://wiki.apache.org/nutch/NutchHadoopTutorial
http://wiki.apache.org/nutch/NutchTutorial
http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/
http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari abhishektiwari.u...@gmail.com wrote:
> How can I integrate Hadoop and Nutch? Can anyone please brief me?

Regards,
∞ Shashwat Shriparv
Re: Hadoop-Git-Eclipse
Check out this link: http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:
> [snip]

Regards,
∞ Shashwat Shriparv
Hadoop command not found:hdfs and yarn
Hi, I am trying to execute the following commands while setting up Hadoop:

  # Format the namenode
  hdfs namenode -format
  # Start the namenode
  hdfs namenode
  # Start a datanode
  hdfs datanode

  yarn resourcemanager
  yarn nodemanager

It gives me a "Hadoop command not found" error for all of these commands. When I try to use hadoop namenode -format instead, it gives me a deprecated-command warning. Can someone please tell me if I am missing any environment variables? I have included HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, YARN_HOME, HADOOP_CONF_DIR, YARN_CONF_DIR, and HADOOP_PREFIX in my path (apart from Java etc.). I am following the instructions for setting up Hadoop with Eclipse given in:
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
- http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

Regards,
Prajakta
Re: Nutch hadoop integration
http://wiki.apache.org/nutch/NutchHadoopTutorial

The tutorial above is not working for me. I am using Nutch 1.4. Can you give me the steps? What properties do I have to set in nutch-site.xml?

On Fri, Jun 8, 2012 at 1:34 PM, shashwat shriparv dwivedishash...@gmail.com wrote:
> Check out these links:
> http://wiki.apache.org/nutch/NutchHadoopTutorial
> http://wiki.apache.org/nutch/NutchTutorial
> http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/
> http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster
> [snip]

--
∞ Shashwat Shriparv
Re: Hadoop command not found:hdfs and yarn
Hello, can you quickly review your Hadoop install against the page below? Maybe you will get some hints for the installation: http://jugnu-life.blogspot.in/2012/05/hadoop-20-install-tutorial-023x.html

The deprecation warning is expected, as the Hadoop commands have now been split up.

Regards,
Jagat Singh

On Fri, Jun 8, 2012 at 2:56 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:
> [snip]
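One thing worth double-checking (a minimal sketch, assuming the HADOOP_* variables listed in the question point at valid installation or build trees): a "command not found" error usually means the bin/ directories themselves are not on the PATH; exporting the *_HOME variables alone is not enough.

  # the hdfs and yarn launcher scripts live under bin/ of each component
  export PATH=$HADOOP_COMMON_HOME/bin:$HADOOP_HDFS_HOME/bin:$YARN_HOME/bin:$PATH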
InvalidJobConfException
Hi, I'm developing a MapReduce web crawler which reads URL lists and writes HTML to MongoDB. Each map task reads one URL list file, fetches the HTML, and inserts it into MongoDB. There is no reduce phase and no output from the map. So, how do I set the output directory in this case? If I do not set the output directory, it gives me the following exception:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set.
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
    at com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)

Thank you!

Best,
Huanchen

2012-06-08
huanchen.zhang
Re: InvalidJobConfException
Hi Huanchen,

Just set your output format class to NullOutputFormat (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.html) if you don't need any direct outputs to HDFS etc. from your M/R classes.

On Fri, Jun 8, 2012 at 4:16 PM, huanchen.zhang huanchen.zh...@ipinyou.com wrote:
> [snip]

--
Harsh J
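A minimal sketch of the driver setup (NullOutputFormat is the real Hadoop class; the mapper class and job name here are illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

  Configuration conf = new Configuration();
  Job job = new Job(conf, "url-crawler");
  job.setJarByClass(ExtractFeatureFromURLJob.class);
  job.setMapperClass(CrawlMapper.class);             // hypothetical mapper that writes to MongoDB itself
  job.setNumReduceTasks(0);                          // map-only job, no reduce phase
  job.setOutputFormatClass(NullOutputFormat.class);  // no output directory required or checked
  System.exit(job.waitForCompletion(true) ? 0 : 1);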
RE: InvalidJobConfException
By default a job uses TextOutputFormat (a subclass of FileOutputFormat), which checks for an output path. You can use NullOutputFormat, or a custom output format which doesn't do anything, for your job.

Thanks,
Devaraj

From: huanchen.zhang [huanchen.zh...@ipinyou.com]
Sent: Friday, June 08, 2012 4:16 PM
To: common-user
Subject: InvalidJobConfException
> [snip]
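For reference, such a do-nothing custom output format would look roughly like this (a sketch; this is essentially what NullOutputFormat already does, so writing your own is only worthwhile if you want extra behavior):

  import org.apache.hadoop.mapreduce.JobContext;
  import org.apache.hadoop.mapreduce.OutputCommitter;
  import org.apache.hadoop.mapreduce.OutputFormat;
  import org.apache.hadoop.mapreduce.RecordWriter;
  import org.apache.hadoop.mapreduce.TaskAttemptContext;

  public class NoOpOutputFormat<K, V> extends OutputFormat<K, V> {
    @Override
    public RecordWriter<K, V> getRecordWriter(TaskAttemptContext context) {
      return new RecordWriter<K, V>() {
        public void write(K key, V value) { }          // discard all records
        public void close(TaskAttemptContext c) { }
      };
    }

    @Override
    public void checkOutputSpecs(JobContext context) { }  // no output path to validate

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) {
      return new OutputCommitter() {                   // no-op committer
        public void setupJob(JobContext c) { }
        public void setupTask(TaskAttemptContext c) { }
        public boolean needsTaskCommit(TaskAttemptContext c) { return false; }
        public void commitTask(TaskAttemptContext c) { }
        public void abortTask(TaskAttemptContext c) { }
      };
    }
  }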
Re: Hadoop-Git-Eclipse
I did not find that screencast useful. This one worked for me: http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:
> Check out this link: http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
> [snip]
Re: Hadoop-Git-Eclipse
Hi, yes, I did configure it using the wiki link at http://wiki.apache.org/hadoop/EclipseEnvironment. I am now facing a new problem while setting up Hadoop in pseudo-distributed mode on my laptop. I am trying to execute the following commands:

  hdfs namenode -format
  hdfs namenode
  hdfs datanode
  yarn resourcemanager
  yarn nodemanager

It gives me a "Hadoop Common not found" error for all of them. When I try to use hadoop namenode -format instead, it gives me a deprecated-command warning. I am following the instructions for setting up Hadoop with Eclipse given in:
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
- http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

This issue is discussed in JIRA https://issues.apache.org/jira/browse/HDFS-2014 and is resolved there, so I am not sure why I am getting the error. My environment variables look like this:

  HADOOP_COMMON_HOME=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT
  HADOOP_CONF_DIR=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT/etc/hadoop
  HADOOP_HDFS_HOME=/home/Projects/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-3.0.0-SNAPSHOT
  HADOOP_MAPRED_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/target/hadoop-mapreduce-3.0.0-SNAPSHOT
  YARN_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/target/hadoop-yarn-common-3.0.0-SNAPSHOT
  YARN_CONF_DIR=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/conf

I have included them in the PATH. I am trying to build and set up from the apache hadoop-common git repository (my own cloned fork). Any idea why the 'Hadoop Common not found' error is coming up? Do I have to add anything to hadoop-config.sh or hdfs-config.sh?

Regards,
Prajakta

Deniz Demir denizde...@me.com wrote on 06/08/2012 05:35 PM:
> I did not find that screencast useful. This one worked for me: http://wiki.apache.org/hadoop/EclipseEnvironment
> [snip]
Re: Hadoop-Git-Eclipse
Check out these threads:
http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/22976
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201012.mbox/%3c4cff292d.3090...@corp.mail.ru%3E

On Fri, Jun 8, 2012 at 6:24 PM, Prajakta Kalmegh pkalm...@gmail.com wrote:
> [snip]

--
∞ Shashwat Shriparv
decommissioning datanodes
Hello, I'm trying to figure out how to decommission datanodes. Here's what I do. In hdfs-site.xml I have:

  <property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
  </property>

Then I add to the exclude file:

  host1
  host2

Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two nodes now appear in both the 'Live Nodes' and 'Dead Nodes' lists (but there's nothing in the 'Decommissioning Nodes' list). If I look at the datanode logs on host1 or host2, I still see blocks being copied in, and it does not appear that any additional replication is happening. What am I missing in the decommission process?

-Chris
Re: decommissioning datanodes
Do you mean the file specified by the 'dfs.hosts' parameter? That is not currently set in my configuration (the hosts are only specified in the slaves file).

-Chris

On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy serge.blazhiyevs...@nice.com wrote:
> Your nodes need to be in the include and exclude files at the same time. Do you use both files?
> [snip]
Re: decommissioning datanodes
Thanks, this seems to work now. Note that the parameter is 'dfs.hosts' instead of 'dfs.hosts.include'. (Also, the usual caveats apply, e.g. hostnames are case sensitive.)

-Chris

On Fri, Jun 8, 2012 at 12:19 PM, Serge Blazhiyevskyy serge.blazhiyevs...@nice.com wrote:
> Your config should be something like this:
>
>   <property>
>     <name>dfs.hosts.exclude</name>
>     <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
>   </property>
>   <property>
>     <name>dfs.hosts.include</name>
>     <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
>   </property>
>
> Add to the exclude file:
>   host1
>   host2
>
> Add to the include file:
>   host1
>   host2
> plus the rest of the nodes.
> [snip]
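Putting the thread's correction together, the working hdfs-site.xml would look roughly like this (a sketch using the paths from the thread; note the property name is dfs.hosts, not dfs.hosts.include):

  <property>
    <name>dfs.hosts</name>
    <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
  </property>

With the nodes to retire listed in both files (and all remaining nodes in the include file), 'hadoop dfsadmin -refreshNodes' should move them into the 'Decommissioning Nodes' list.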
hbase client security (cluster is secure)
Hi all, I have created a Hadoop/HBase/ZooKeeper cluster that is secured and verified. Now a simple test is to connect an HBase client (e.g., the shell) to see its behavior. Well, I get the following message on the HBase master: AccessControlException: authentication is required. Looking at the code, it appears that the client passed the simple-authentication byte in the RPC header. Why, I don't know. My client configuration is as follows:

hbase-site.xml:

  <property>
    <name>hbase.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hbase.rpc.engine</name>
    <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
  </property>

hbase-env.sh:

  export HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/usr/local/hadoop/hbase/conf/hbase.jaas"

hbase.jaas:

  Client {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=false
    useTicketCache=true;
  };

I issue kinit for the client user I want to use, then invoke the HBase shell. I simply issue "list" and see the error on the server. Any ideas what I am doing wrong? Thanks so much!

_
From: Tony Dean
Sent: Tuesday, June 05, 2012 5:41 PM
To: common-user@hadoop.apache.org
Subject: hadoop file permission 1.0.3 (security)

Can someone detail the options that are available to set file permissions at the Hadoop and OS level? Here's what I have discovered thus far:

  dfs.permissions = true|false (works as advertised)
  dfs.supergroup = supergroup (works as advertised)
  dfs.umaskmode = umask (I believe this should be used in lieu of dfs.umask) - it appears to set the permissions for files created in the Hadoop fs (minus the execute permission). Why was dfs.umask deprecated? What's the difference between the two?
  dfs.datanode.data.dir.perm = perm (not sure this is working at all?) - I thought it was supposed to set permissions on blocks at the OS level.

Are there any other file-permission configuration properties? What I would really like to do is set data block file permissions at the OS level so that the blocks are locked down from all users except the superuser and supergroup, but can still be accessed through the Hadoop API as permitted by HDFS permissions. Is this possible? Thanks.

Tony Dean
SAS Institute Inc.
Senior Software Developer
919-531-6704
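One quick sanity check (a sketch; the environment variable and path below are taken from the message above): the shell only sends Kerberos credentials if it actually loads the secure client configuration, so confirm the ticket exists and that the shell picks up the right configuration directory:

  # confirm a valid Kerberos ticket is present
  klist
  # make sure the shell loads the hbase-site.xml shown above
  export HBASE_CONF_DIR=/usr/local/hadoop/hbase/conf
  hbase shell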
Sync and Data Replication
I am wondering about the role of sync in the replication of data to other nodes. Say a client writes a line to a file in Hadoop; at this point the file handle is open and sync has not been called. In this scenario, is the data also replicated to other nodes, as defined by the replication factor? I am wondering whether, if a crash occurs at this point, I have the data on other nodes.
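The scenario in code, for concreteness (a sketch against the 1.x API, where FSDataOutputStream.sync() was the flush call; the path is illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  FileSystem fs = FileSystem.get(new Configuration());
  FSDataOutputStream out = fs.create(new Path("/tmp/log.txt"));
  out.writeBytes("one line\n");
  // the question: before the following call, is the line already on the
  // datanodes of the write pipeline, or only in client-side buffers?
  out.sync();
  out.close();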
memory usage tasks
Silly question, but I have our Hadoop slave boxes configured with 7 mappers each, yet I see 14 Java processes for the user mapred on each box, and each process takes up about 2 GB, which equals my memory allocation (mapred.child.java.opts=-Xmx2048m). So it is using twice as much memory as I expected! Why is that?
Compile Hadoop 1.0.3 native library failed on mac 10.7.4
Hello, I am trying to compile the Hadoop native library on Mac OS. My Mac OS X is 10.7.4 and my Hadoop is 1.0.3. I have installed zlib 1.2.7 and lzo 2.0.6 like below:

  ./configure -shared --prefix=/usr/local/[zlib/lzo]
  make
  make install

I checked /usr/local/zlib-1.2.7 and /usr/local/lzo-2.0.6; the header files and libraries are there. I changed my .bash_profile like below:

  export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/zlib-1.2.7/include:/usr/local/lzo-2.06/include
  export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/zlib-1.2.7/lib:/usr/local/lzo-2.06/lib
  export CFLAGS="-arch x86_64"

Then I switched to the Hadoop folder and ran:

  ant -Dcompile.native=true compile-native

I got the following output:

  [exec] checking stddef.h usability... yes
  [exec] checking stddef.h presence... yes
  [exec] checking for stddef.h... yes
  [exec] checking jni.h usability... yes
  [exec] checking jni.h presence... yes
  [exec] checking for jni.h... yes
  [exec] checking zlib.h usability... yes
  [exec] checking zlib.h presence... yes
  [exec] checking for zlib.h... yes
  [exec] checking Checking for the 'actual' dynamic-library for '-lz'...
  [exec] configure: error: Can't find either 'objdump' or 'ldd' to compute the dynamic library for '-lz'

  BUILD FAILED

Has anyone met this issue before?

Best Regards,