Re: Nutch hadoop integration

2012-06-08 Thread Nitin Pawar
Maybe this will help you, if you have not already checked it:

http://wiki.apache.org/nutch/NutchHadoopTutorial

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari 
abhishektiwari.u...@gmail.com wrote:

 how can I integrate Hadoop and Nutch? Can anyone please brief me?




-- 
Nitin Pawar


Re: Nutch hadoop integration

2012-06-08 Thread Biju Balakrishnan
 how can I integrate Hadoop and Nutch? Can anyone please brief me?


Just configure the Hadoop cluster.
Configure the Nutch paths so that the crawl index and crawl list are stored in HDFS.
That's it.

-- 
*Biju*
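
In practice, Biju's steps boil down to something like the following (a sketch,
assuming a Nutch 1.x build packaged as a .job jar and a running cluster; the
urls/ and crawl/ paths are examples, not from this thread):

# push the seed URL list into HDFS
hadoop fs -put urls urls
# run the crawl as a MapReduce job; the crawl db, segments, and index land in HDFS
hadoop jar apache-nutch-1.4.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3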


Hadoop-Git-Eclipse

2012-06-08 Thread Prajakta Kalmegh
Hi 

I have done MapReduce programming using Eclipse before but now I need to 
learn the Hadoop code internals for one of my projects. 

I have forked Hadoop from github (https://github.com/apache/hadoop-common 
) and need to configure it to work with Eclipse. All the links I could 
find list steps for earlier versions of Hadoop. I am right now following 
instructions given in these links:
- http://wiki.apache.org/hadoop/GitAndHadoop 
- http://wiki.apache.org/hadoop/EclipseEnvironment 
- http://wiki.apache.org/hadoop/HowToContribute 

Can someone please give me a link to the steps to be followed for getting 
Hadoop (latest from trunk) started in Eclipse? I need to be able to commit 
changes to my forked repository on github. 

Thanks in advance.
Regards,
Prajakta

Re: Nutch hadoop integration

2012-06-08 Thread shashwat shriparv
Check out these links:

http://wiki.apache.org/nutch/NutchHadoopTutorial

http://wiki.apache.org/nutch/NutchTutorial
http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/
http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster

Regards

∞
Shashwat Shriparv

On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari 
abhishektiwari.u...@gmail.com wrote:

 how can I integrate Hadoop and Nutch? Can anyone please brief me?




-- 


∞
Shashwat Shriparv


Re: Hadoop-Git-Eclipse

2012-06-08 Thread shashwat shriparv
Check out this link:
http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

Regards

∞
Shashwat Shriparv




On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:

 Hi

 I have done MapReduce programming using Eclipse before but now I need to
 learn the Hadoop code internals for one of my projects.

 I have forked Hadoop from github (https://github.com/apache/hadoop-common
 ) and need to configure it to work with Eclipse. All the links I could
 find list steps for earlier versions of Hadoop. I am right now following
 instructions given in these links:
 - http://wiki.apache.org/hadoop/GitAndHadoop
 - http://wiki.apache.org/hadoop/EclipseEnvironment
 - http://wiki.apache.org/hadoop/HowToContribute

 Can someone please give me a link to the steps to be followed for getting
 Hadoop (latest from trunk) started in Eclipse? I need to be able to commit
 changes to my forked repository on github.

 Thanks in advance.
 Regards,
 Prajakta




-- 


∞
Shashwat Shriparv


Hadoop command not found: hdfs and yarn

2012-06-08 Thread Prajakta Kalmegh
Hi

I am trying to execute the following commands for setting up Hadoop:
# Format the namenode
hdfs namenode -format
# Start the namenode
hdfs namenode
# Start a datanode
hdfs datanode

yarn resourcemanager
yarn nodemanager

It gives me a "Hadoop Command not found." error for all the commands. When 
I try to use "hadoop namenode -format" instead, it gives me a deprecated 
command warning. Can someone please tell me if I am missing any env 
variables? I have included HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, 
HADOOP_MAPRED_HOME, YARN_HOME, HADOOP_CONF_DIR, YARN_CONF_DIR, and 
HADOOP_PREFIX in my path (apart from Java etc.).

I am following the instructions for setting up Hadoop with Eclipse given 
in 
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
- 
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

Regards,
Prajakta



Re: Nutch hadoop integration

2012-06-08 Thread abhishek tiwari
http://wiki.apache.org/nutch/NutchHadoopTutorial

The above tutorial is not working for me.
I am using Nutch 1.4. Can you give the steps? What property do I have to set
in nutch-site.xml?
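
For what it's worth, one property Nutch 1.4 does require in nutch-site.xml
before any fetch will run is an HTTP agent name (a sketch; the value here is
an example, not from this thread):

<property>
  <name>http.agent.name</name>
  <value>my-test-crawler</value>
</property>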

On Fri, Jun 8, 2012 at 1:34 PM, shashwat shriparv dwivedishash...@gmail.com
 wrote:

 Check out these links:

 http://wiki.apache.org/nutch/NutchHadoopTutorial

 http://wiki.apache.org/nutch/NutchTutorial
 http://joey.mazzarelli.com/2007/07/25/nutch-and-hadoop-as-user-with-nfs/

 http://stackoverflow.com/questions/5301883/run-nutch-on-existing-hadoop-cluster

 Regards

 ∞
 Shashwat Shriparv

 On Fri, Jun 8, 2012 at 1:29 PM, abhishek tiwari 
 abhishektiwari.u...@gmail.com wrote:

  how can I integrate Hadoop and Nutch? Can anyone please brief me?
 



 --


 ∞
 Shashwat Shriparv



AUTO: Prabhat Pandey is out of the office (returning 06/28/2012)

2012-06-08 Thread Prabhat Pandey


I am out of the office until 06/28/2012.
For any issues please contact Dispatcher: dbqor...@us.ibm.com
Thanks.

Prabhat Pandey


Note: This is an automated response to your message "Nutch hadoop
integration" sent on 06/08/2012 1:59:22.

This is the only notification you will receive while this person is away.

Re: Hadoop command not found: hdfs and yarn

2012-06-08 Thread Jagat Singh
Hello,

Can you quickly review your Hadoop install against the page below? Maybe you
will get some hints.

http://jugnu-life.blogspot.in/2012/05/hadoop-20-install-tutorial-023x.html

The deprecated-command warning is correct, as the hadoop command's functions
have now been divided up (into hdfs, yarn, etc.).
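
One possible cause, as a hedged guess (not confirmed in this thread): the hdfs
and yarn scripts live in each component's bin directory, so exporting the home
variables alone is not enough; the bin directories themselves must be on PATH.
A sketch, reusing the variables named in the question:

export PATH=$PATH:$HADOOP_COMMON_HOME/bin:$HADOOP_HDFS_HOME/bin:$YARN_HOME/bin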

Regards,

Jagat Singh

On Fri, Jun 8, 2012 at 2:56 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:

 Hi

 I am trying to execute the following commands for setting up Hadoop:
 # Format the namenode
 hdfs namenode -format
 # Start the namenode
 hdfs namenode
 # Start a datanode
 hdfs datanode

 yarn resourcemanager
 yarn nodemanager

 It gives me a "Hadoop Command not found." error for all the commands. When
 I try to use "hadoop namenode -format" instead, it gives me a deprecated
 command warning. Can someone please tell me if I am missing any
 env variables? I have included HADOOP_COMMON_HOME, HADOOP_HDFS_HOME,
 HADOOP_MAPRED_HOME, YARN_HOME, HADOOP_CONF_DIR, YARN_CONF_DIR,
 HADOOP_PREFIX in my path (apart from java etc).

 I am following the instructions for setting up Hadoop with Eclipse given
 in
 - http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
 -

 http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

 Regards,
 Prajakta




InvalidJobConfException

2012-06-08 Thread huanchen.zhang
Hi,

Here I'm developing a MapReduce web crawler which reads URL lists and writes 
HTML to MongoDB. Each map task reads one URL list file, fetches the HTML, and 
inserts it into MongoDB. There is no reduce phase and no map output. How do I 
set the output directory in this case? If I do not set the output directory, I 
get the following exception:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
Output directory not set.
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


Thank you ! 

Best,
Huanchen
  

2012-06-08 



huanchen.zhang 


Re: InvalidJobConfException

2012-06-08 Thread Harsh J
Hi Huanchen,

Just set your output format class to NullOutputFormat
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/output/NullOutputFormat.html
if you don't need any direct outputs to HDFS/etc. from your M/R
classes.
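
In the driver, that is essentially a one-line change (a sketch against the
new-API Job class seen in the stack trace; the class name, job name, and the
omitted input/mapper wiring are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class CrawlerDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "url-crawler");
    job.setJarByClass(CrawlerDriver.class);
    job.setNumReduceTasks(0);                         // map-only job
    job.setOutputFormatClass(NullOutputFormat.class); // skips the output-dir check
    // set the input format and mapper class as in the original job
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}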

On Fri, Jun 8, 2012 at 4:16 PM, huanchen.zhang
huanchen.zh...@ipinyou.com wrote:
 Hi,

 Here I'm developing a MapReduce web crawler which reads URL lists and writes
 HTML to MongoDB. Each map task reads one URL list file, fetches the HTML, and
 inserts it into MongoDB. There is no reduce phase and no map output. How do I
 set the output directory in this case? If I do not set the output directory, I
 get the following exception:

 Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
 Output directory not set.
        at 
 org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
        at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
        at 
 com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


 Thank you !

 Best,
 Huanchen


 2012-06-08



 huanchen.zhang



-- 
Harsh J


RE: InvalidJobConfException

2012-06-08 Thread Devaraj k
By default it uses TextOutputFormat (a subclass of FileOutputFormat), which 
checks for an output path. 

You can use NullOutputFormat, or a custom output format that doesn't do 
anything, for your job.



Thanks
Devaraj


From: huanchen.zhang [huanchen.zh...@ipinyou.com]
Sent: Friday, June 08, 2012 4:16 PM
To: common-user
Subject: InvalidJobConfException

Hi,

Here I'm developing a MapReduce web crawler which reads URL lists and writes 
HTML to MongoDB. Each map task reads one URL list file, fetches the HTML, and 
inserts it into MongoDB. There is no reduce phase and no map output. How do I 
set the output directory in this case? If I do not set the output directory, I 
get the following exception:

Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: 
Output directory not set.
at 
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:872)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:476)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:506)
at 
com.ipinyou.data.preprocess.mapreduce.ExtractFeatureFromURLJob.main(ExtractFeatureFromURLJob.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)


Thank you !

Best,
Huanchen


2012-06-08



huanchen.zhang


Re: Hadoop-Git-Eclipse

2012-06-08 Thread Deniz Demir
I did not find that screencast useful. This one worked for me:

http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz
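
Roughly, that wiki's recipe for the trunk of the time comes down to this
(a sketch; exact goals and flags may differ by version):

git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
mvn install -DskipTests
mvn eclipse:eclipse -DskipTests
# then in Eclipse: File > Import > Existing Projects into Workspace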

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

 Check out this link:
 http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
 
 Regards
 
 ∞
 Shashwat Shriparv
 
 
 
 
 On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com wrote:
 
 Hi
 
 I have done MapReduce programming using Eclipse before but now I need to
 learn the Hadoop code internals for one of my projects.
 
 I have forked Hadoop from github (https://github.com/apache/hadoop-common
 ) and need to configure it to work with Eclipse. All the links I could
 find list steps for earlier versions of Hadoop. I am right now following
 instructions given in these links:
 - http://wiki.apache.org/hadoop/GitAndHadoop
 - http://wiki.apache.org/hadoop/EclipseEnvironment
 - http://wiki.apache.org/hadoop/HowToContribute
 
 Can someone please give me a link to the steps to be followed for getting
 Hadoop (latest from trunk) started in Eclipse? I need to be able to commit
 changes to my forked repository on github.
 
 Thanks in advance.
 Regards,
 Prajakta
 
 
 
 
 -- 
 
 
 ∞
 Shashwat Shriparv



Re: Hadoop-Git-Eclipse

2012-06-08 Thread Prajakta Kalmegh
Hi

Yes I did configure using the wiki link at
http://wiki.apache.org/hadoop/EclipseEnvironment.
I am facing a new problem while setting up Hadoop in Pseudo-distributed
mode on my laptop. I am trying to execute the following commands for
setting up Hadoop:
hdfs namenode -format
hdfs namenode
hdfs datanode
yarn resourcemanager
yarn nodemanager

It gives me a "Hadoop Common not found." error for all the commands. When I
try to use "hadoop namenode -format" instead, it gives me a deprecated
command warning.

I am following the instructions for setting up Hadoop with Eclipse given in
- http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
-
http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

This issue is discussed in JIRA (https://issues.apache.org/jira/browse/HDFS-2014)
and is resolved, so I am not sure why I am getting the error.

My environment variables look something like:
HADOOP_COMMON_HOME=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT
HADOOP_CONF_DIR=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT/etc/hadoop
HADOOP_HDFS_HOME=/home/Projects/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-3.0.0-SNAPSHOT
HADOOP_MAPRED_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/target/hadoop-mapreduce-3.0.0-SNAPSHOT
YARN_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/target/hadoop-yarn-common-3.0.0-SNAPSHOT
YARN_CONF_DIR=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/conf

I have included them in the PATH. I am trying to build and set up from the
apache-hadoop-common git repository (my own cloned fork). Any idea why the
'Hadoop Common not found' error occurs? Do I have to add anything to the
hadoop-config.sh or hdfs-config.sh?

Regards,
Prajakta





Deniz Demir denizde...@me.com
06/08/2012 05:35 PM
Please respond to: common-user@hadoop.apache.org
To: common-user@hadoop.apache.org
Subject: Re: Hadoop-Git-Eclipse


I did not find that screencast useful. This one worked for me:

http://wiki.apache.org/hadoop/EclipseEnvironment

Best,
Deniz

On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

 Check out this link:

http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/

 Regards

 ∞
 Shashwat Shriparv




 On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com
wrote:

 Hi

 I have done MapReduce programming using Eclipse before but now I need to
 learn the Hadoop code internals for one of my projects.

 I have forked Hadoop from github (https://github.com/apache/hadoop-common
 ) and need to configure it to work with Eclipse. All the links I could
 find list steps for earlier versions of Hadoop. I am right now following
 instructions given in these links:
 - http://wiki.apache.org/hadoop/GitAndHadoop
 - http://wiki.apache.org/hadoop/EclipseEnvironment
 - http://wiki.apache.org/hadoop/HowToContribute

 Can someone please give me a link to the steps to be followed for getting
 Hadoop (latest from trunk) started in Eclipse? I need to be able to
commit
 changes to my forked repository on github.

 Thanks in advance.
 Regards,
 Prajakta




 --


 ∞
 Shashwat Shriparv


Re: Hadoop-Git-Eclipse

2012-06-08 Thread shashwat shriparv
Check out these threads:

http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/22976
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201012.mbox/%3c4cff292d.3090...@corp.mail.ru%3E


On Fri, Jun 8, 2012 at 6:24 PM, Prajakta Kalmegh pkalm...@gmail.com wrote:

 Hi

 Yes I did configure using the wiki link at
 http://wiki.apache.org/hadoop/EclipseEnvironment.
 I am facing a new problem while setting up Hadoop in Pseudo-distributed
 mode on my laptop. I am trying to execute the following commands for
 setting up Hadoop:
 hdfs namenode -format
 hdfs namenode
 hdfs datanode
 yarn resourcemanager
 yarn nodemanager

 It gives me a "Hadoop Common not found." error for all the commands. When I
 try to use "hadoop namenode -format" instead, it gives me a deprecated
 command warning.

 I am following the instructions for setting up Hadoop with Eclipse given in
 - http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
 -

 http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/SingleCluster.html

 This issue is discussed in JIRA (https://issues.apache.org/jira/browse/HDFS-2014)
 and is resolved, so I am not sure why I am getting the error.

 My environment variables look something like:

 HADOOP_COMMON_HOME=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT

 HADOOP_CONF_DIR=/home/Projects/hadoop-common/hadoop-common-project/hadoop-common/target/hadoop-common-3.0.0-SNAPSHOT/etc/hadoop

 HADOOP_HDFS_HOME=/home/Projects/hadoop-common/hadoop-hdfs-project/hadoop-hdfs/target/hadoop-hdfs-3.0.0-SNAPSHOT

 HADOOP_MAPRED_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/target/hadoop-mapreduce-3.0.0-SNAPSHOT

 YARN_HOME=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/target/hadoop-yarn-common-3.0.0-SNAPSHOT

 YARN_CONF_DIR=/home/Projects/hadoop-common/hadoop-mapreduce-project/hadoop-yarn/conf

 I have included them in the PATH. I am trying to build and set up from the
 apache-hadoop-common git repository (my own cloned fork). Any idea why the
 'Hadoop Common not found' error occurs? Do I have to add anything to the
 hadoop-config.sh or hdfs-config.sh?

 Regards,
 Prajakta





 Deniz Demir denizde...@me.com
 06/08/2012 05:35 PM
 Please respond to: common-user@hadoop.apache.org
 To: common-user@hadoop.apache.org
 Subject: Re: Hadoop-Git-Eclipse


 I did not find that screencast useful. This one worked for me:

 http://wiki.apache.org/hadoop/EclipseEnvironment

 Best,
 Deniz

 On Jun 8, 2012, at 1:08 AM, shashwat shriparv wrote:

  Check out this link:
 

 http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
 
  Regards
 
  ∞
  Shashwat Shriparv
 
 
 
 
  On Fri, Jun 8, 2012 at 1:32 PM, Prajakta Kalmegh prkal...@in.ibm.com
 wrote:
 
  Hi
 
  I have done MapReduce programming using Eclipse before but now I need to
  learn the Hadoop code internals for one of my projects.
 
  I have forked Hadoop from github (
 https://github.com/apache/hadoop-common
  ) and need to configure it to work with Eclipse. All the links I could
  find list steps for earlier versions of Hadoop. I am right now following
  instructions given in these links:
  - http://wiki.apache.org/hadoop/GitAndHadoop
  - http://wiki.apache.org/hadoop/EclipseEnvironment
  - http://wiki.apache.org/hadoop/HowToContribute
 
  Can someone please give me a link to the steps to be followed for
 getting
  Hadoop (latest from trunk) started in Eclipse? I need to be able to
 commit
  changes to my forked repository on github.
 
  Thanks in advance.
  Regards,
  Prajakta
 
 
 
 
  --
 
 
  ∞
  Shashwat Shriparv




-- 


∞
Shashwat Shriparv


decommissioning datanodes

2012-06-08 Thread Chris Grier
Hello,

I'm trying to figure out how to decommission datanodes. Here's what
I do:

In hdfs-site.xml I have:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>

Add to exclude file:

host1
host2

Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two
nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
nothing in the Decommissioning Nodes list). If I look at the datanode logs
running on host1 or host2, I still see blocks being copied in and it does
not appear that any additional replication was happening.

What am I missing during the decommission process?

-Chris


Re: decommissioning datanodes

2012-06-08 Thread Chris Grier
Do you mean the file specified by the 'dfs.hosts' parameter? That is not
currently set in my configuration (the hosts are only specified in the
slaves file).

-Chris

On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy 
serge.blazhiyevs...@nice.com wrote:

 Your nodes need to be in the include and the exclude file at the same time.


 Do you use both files?

 On 6/8/12 11:46 AM, Chris Grier gr...@imchris.org wrote:

 Hello,
 
 I'm trying to figure out how to decommission datanodes. Here's what
 I do:
 
 In hdfs-site.xml I have:
 
 <property>
   <name>dfs.hosts.exclude</name>
   <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
 </property>
 
 Add to exclude file:
 
 host1
 host2
 
 Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the two
 nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
 nothing in the Decommissioning Nodes list). If I look at the datanode logs
 running on host1 or host2, I still see blocks being copied in and it does
 not appear that any additional replication was happening.
 
 What am I missing during the decommission process?
 
 -Chris




Re: decommissioning datanodes

2012-06-08 Thread Chris Grier
Thanks, this seems to work now.

Note that the parameter is 'dfs.hosts' instead of 'dfs.hosts.include'.
(Also, the usual caveats apply; e.g., hostnames are case sensitive.)
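
In other words, the pair of settings the thread converges on looks like this
(same example paths as above):

<property>
  <name>dfs.hosts</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
</property>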

-Chris

On Fri, Jun 8, 2012 at 12:19 PM, Serge Blazhiyevskyy 
serge.blazhiyevs...@nice.com wrote:

 Your config should be something like this:

 <property>
   <name>dfs.hosts.exclude</name>
   <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
 </property>

 <property>
   <name>dfs.hosts.include</name>
   <value>/opt/hadoop/hadoop-1.0.0/conf/include</value>
 </property>



 
 Add to exclude file:
 
 host1
 host2
 



 Add to the include file:
 host1
 host2
 plus the rest of the nodes




 On 6/8/12 12:15 PM, Chris Grier gr...@imchris.org wrote:

 Do you mean the file specified by the 'dfs.hosts' parameter? That is not
 currently set in my configuration (the hosts are only specified in the
 slaves file).
 
 -Chris
 
 On Fri, Jun 8, 2012 at 11:56 AM, Serge Blazhiyevskyy 
 serge.blazhiyevs...@nice.com wrote:
 
  Your nodes need to be in include and exclude file in the same time
 
 
  Do you use both files?
 
  On 6/8/12 11:46 AM, Chris Grier gr...@imchris.org wrote:
 
  Hello,
  
 I'm trying to figure out how to decommission datanodes. Here's what
 I do:
  
  In hdfs-site.xml I have:
  
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/opt/hadoop/hadoop-1.0.0/conf/exclude</value>
  </property>
  
  Add to exclude file:
  
  host1
  host2
  
  Then I run 'hadoop dfsadmin -refreshNodes'. On the web interface the
 two
  nodes now appear in both the 'Live Nodes' and 'Dead Nodes' (but there's
  nothing in the Decommissioning Nodes list). If I look at the datanode
 logs
  running on host1 or host2, I still see blocks being copied in and it
 does
  not appear that any additional replication was happening.
  
  What am I missing during the decommission process?
  
  -Chris
 
 




hbase client security (cluster is secure)

2012-06-08 Thread Tony Dean
Hi all,

I have created a hadoop/hbase/zookeeper cluster that is secured and verified.
Now, as a simple test, I connect an hbase client (e.g., the shell) to see its
behavior.

Well, I get the following message on the hbase master: "AccessControlException:
authentication is required".

Looking at the code, it appears that the client passed the "simple" authentication
byte in the RPC header. Why, I don't know.

My client configuration is as follows:

hbase-site.xml:
   <property>
      <name>hbase.security.authentication</name>
      <value>kerberos</value>
   </property>

   <property>
      <name>hbase.rpc.engine</name>
      <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
   </property>

hbase-env.sh:
export HBASE_OPTS="$HBASE_OPTS -Djava.security.auth.login.config=/usr/local/hadoop/hbase/conf/hbase.jaas"

hbase.jaas:
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=false
   useTicketCache=true
 };

I issue kinit for the client I want to use, then invoke the hbase shell. I simply 
issue "list" and see the error on the server.

Any ideas what I am doing wrong?

Thanks so much!


_
From: Tony Dean
Sent: Tuesday, June 05, 2012 5:41 PM
To: common-user@hadoop.apache.org
Subject: hadoop file permission 1.0.3 (security)


Can someone detail the options that are available to set file permissions at 
the hadoop and os level?  Here's what I have discovered thus far:

dfs.permissions = true|false (works as advertised)
dfs.supergroup = supergroup (works as advertised)
dfs.umaskmode = umask (I believe this should be used in lieu of dfs.umask); it 
appears to set the permissions for files created in the Hadoop fs (minus the 
execute permission).
Why was dfs.umask deprecated? What is the difference between the two?
dfs.datanode.data.dir.perm = perm (not sure this is working at all?); I thought 
it was supposed to set permissions on blocks at the OS level.

Are there any other file permission configuration properties?

What I would really like to do is set data block file permissions at the OS level 
so that the blocks are locked down from all users except super and supergroup, 
but can still be accessed through the Hadoop API as specified by HDFS 
permissions. Is this possible?

Thanks.


Tony Dean
SAS Institute Inc.
Senior Software Developer
919-531-6704






Sync and Data Replication

2012-06-08 Thread Mohit Anchlia
I am wondering about the role of sync in the replication of data to other
nodes. Say a client writes a line to a file in Hadoop; at this point the file
handle is open and sync has not been called. In this scenario, is the data
replicated to other nodes as defined by the replication factor? In other
words, if a crash occurs at this point, do I have the data on other nodes?
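
As a sketch of the scenario against the 1.x client API (the path is an
example; sync() was later superseded by hflush()):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/sync-demo.txt"));
    out.writeBytes("one line\n"); // still buffered client-side at this point
    out.sync();                   // flush to the datanode pipeline
    out.close();
  }
}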


memory usage tasks

2012-06-08 Thread Koert Kuipers
Silly question, but I have our Hadoop slave boxes configured with 7 mappers
each, yet I see 14 java processes for user mapred on each box, and each
process takes up about 2GB, which equals my memory allocation
(mapred.child.java.opts=-Xmx2048m). So it is using twice as much memory as
I expected! Why is that?


Compile Hadoop 1.0.3 native library failed on mac 10.7.4

2012-06-08 Thread Yongwei Xing
Hello

I am trying to compile the Hadoop native library on Mac OS.

My Mac OS X version is 10.7.4; my Hadoop is 1.0.3.

I have installed zlib 1.2.7 and LZO 2.0.6 as below:

./configure -shared --prefix=/usr/local/[zlib/lzo]

make

make install


I check the /usr/local/zlib-1.2.7 and /usr/local/lzo-2.0.6, the header
files and libraries are there.

I changed my .bash_profile as below:

export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/local/zlib-1.2.7/include:/usr/local/lzo-2.06/include
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/zlib-1.2.7/lib:/usr/local/lzo-2.06/lib
export CFLAGS="-arch x86_64"

I switch to the Hadoop folder and run:

ant -Dcompile.native=true compile-native

I get the following output:

[exec] checking stddef.h usability... yes
 [exec] checking stddef.h presence... yes
 [exec] checking for stddef.h... yes
 [exec] checking jni.h usability... yes
 [exec] checking jni.h presence... yes
 [exec] checking for jni.h... yes
 [exec] checking zlib.h usability... yes
 [exec] checking zlib.h presence... yes
 [exec] checking for zlib.h... yes
 [exec] checking Checking for the 'actual' dynamic-library for '-lz'...
 [exec] configure: error: Can't find either 'objdump' or 'ldd' to
compute the dynamic library for '-lz'


BUILD FAILED

Has anyone met this issue before?

Best Regards,

--