Fwd: Need help

2009-06-18 Thread ashish pareek
Hello,
            I am doing my master's degree, and my final-year project is on Hadoop, so I
would like to know something about Hadoop clusters, i.e., are the new versions of
Hadoop able to handle heterogeneous hardware? If you have any
information on this, please mail me, as my project targets a heterogeneous
environment.


Thanks!

Regards,
Ashish Pareek


Re: Need help

2009-06-18 Thread ashish pareek
Does that mean Hadoop is not scalable in a heterogeneous environment? And one
more question: can we run different applications on the same Hadoop
cluster?

Thanks.
Regards,
Ashish

On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop jason.had...@gmail.com wrote:

 Hadoop has always been reasonably agnostic wrt hardware and homogeneity.
 There are optimizations in configuration for near homogeneous machines.



 On Thu, Jun 18, 2009 at 7:46 AM, ashish pareek pareek...@gmail.com
 wrote:

  Hello,
 I am doing my master my final year project is on Hadoop ...so
 I
  would like to know some thing about Hadoop cluster i.e, Do new version of
  Hadoop are able to handle heterogeneous hardware.If you have any
  informantion regarding these please mail me as my project is in
  heterogenous
  environment.
 
 
  Thanks!
 
  Reagrds,
  Ashish Pareek
 



 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals



Re: Need help

2009-06-18 Thread ashish pareek
Can you tell me a few of the challenges in configuring a heterogeneous cluster, or
pass on some links where I can get information about the
challenges of running Hadoop on heterogeneous hardware?

One more thing: how about running different applications on the same
Hadoop cluster, and what challenges are involved in it?

Thanks,
Regards,
Ashish


On Thu, Jun 18, 2009 at 8:53 PM, jason hadoop jason.had...@gmail.com wrote:

 I don't know anyone who has a completely homogeneous cluster.

 So hadoop is scalable across heterogeneous environments.

 I stated that configuration is simpler if the machines are similar (There
 are optimizations in configuration for near homogeneous machines.)

 On Thu, Jun 18, 2009 at 8:10 AM, ashish pareek pareek...@gmail.com
 wrote:

  Does that mean hadoop is not scalable wrt heterogeneous environment? and
  one
  more question is can we run different application on the same hadoop
  cluster
  .
 
  Thanks.
  Regards,
  Ashish
 
  On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop jason.had...@gmail.com
  wrote:
 
   Hadoop has always been reasonably agnostic wrt hardware and
 homogeneity.
   There are optimizations in configuration for near homogeneous machines.
  
  
  
   On Thu, Jun 18, 2009 at 7:46 AM, ashish pareek pareek...@gmail.com
   wrote:
  
Hello,
   I am doing my master my final year project is on Hadoop
  ...so
   I
would like to know some thing about Hadoop cluster i.e, Do new
 version
  of
Hadoop are able to handle heterogeneous hardware.If you have any
informantion regarding these please mail me as my project is in
heterogenous
environment.
   
   
Thanks!
   
Reagrds,
Ashish Pareek
   
  
  
  
   --
   Pro Hadoop, a book to guide you from beginner to hadoop mastery,
   http://www.amazon.com/dp/1430219424?tag=jewlerymall
   www.prohadoopbook.com a community for Hadoop Professionals
  
 



 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals



Re: Need help

2009-06-18 Thread ashish pareek
Hello Everybody,

                      How can we handle different applications with
different requirements being run on the same Hadoop cluster? What are the
various approaches to solving such a problem? If possible, please mention some
of those ideas.

Does such an implementation exist?

Thanks ,

Regards,
Ashish

On Thu, Jun 18, 2009 at 9:36 PM, jason hadoop jason.had...@gmail.com wrote:

 For me, I like to have one configuration file that I distribute to all of
 the machines in my cluster via rsync.

 In there are things like the number of tasks per node to run, and where to
 store dfs data and local temporary data, and the limits to storage for the
 machines.

 If the machines are very different, it becomes important to tailor the
 configuration file per machine or type of machine.

 At this point, you are pretty much going to have to spend the time reading
 through the details of configuring a Hadoop cluster.


 On Thu, Jun 18, 2009 at 8:33 AM, ashish pareek pareek...@gmail.com
 wrote:

  Can you tell few of the challenges in configuring heterogeneous
  cluster...or
  can pass on some link where I would get some information regarding
  challenges in running Hadoop on heterogeneous hardware
 
  One more things is How about running different applications on the same
  Hadoop cluster?and what challenges are involved in it ?
 
  Thanks,
  Regards,
  Ashish
 
 
  On Thu, Jun 18, 2009 at 8:53 PM, jason hadoop jason.had...@gmail.com
  wrote:
 
   I don't know anyone who has a completely homogeneous cluster.
  
   So hadoop is scalable across heterogeneous environments.
  
   I stated that configuration is simpler if the machines are similar
 (There
   are optimizations in configuration for near homogeneous machines.)
  
   On Thu, Jun 18, 2009 at 8:10 AM, ashish pareek pareek...@gmail.com
   wrote:
  
Does that mean hadoop is not scalable wrt heterogeneous environment?
  and
one
more question is can we run different application on the same hadoop
cluster
.
   
Thanks.
Regards,
Ashish
   
On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop 
 jason.had...@gmail.com
wrote:
   
 Hadoop has always been reasonably agnostic wrt hardware and
   homogeneity.
 There are optimizations in configuration for near homogeneous
  machines.



 On Thu, Jun 18, 2009 at 7:46 AM, ashish pareek 
 pareek...@gmail.com
 wrote:

  Hello,
 I am doing my master my final year project is on
 Hadoop
...so
 I
  would like to know some thing about Hadoop cluster i.e, Do new
   version
of
  Hadoop are able to handle heterogeneous hardware.If you have any
  informantion regarding these please mail me as my project is in
  heterogenous
  environment.
 
 
  Thanks!
 
  Reagrds,
  Ashish Pareek
 



 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals

   
  
  
  
   --
   Pro Hadoop, a book to guide you from beginner to hadoop mastery,
   http://www.amazon.com/dp/1430219424?tag=jewlerymall
   www.prohadoopbook.com a community for Hadoop Professionals
  
 



 --
 Pro Hadoop, a book to guide you from beginner to hadoop mastery,
 http://www.amazon.com/dp/1430219424?tag=jewlerymall
 www.prohadoopbook.com a community for Hadoop Professionals



Re: Need help

2009-06-18 Thread Matt Massie
Hadoop can be run on a hardware-heterogeneous cluster.  Currently,
Hadoop clusters really only run well on Linux, although you can run a
Hadoop client on non-Linux machines.


You will need to have a special configuration for each of the machines
in your cluster based on their hardware profile.  Ideally, you'll be
able to group the machines in your cluster into classes of machines
(e.g. machines with 1GB of RAM and 2 cores versus 4GB of RAM and 4
cores) to reduce the burden of managing multiple configurations.  If
you are talking about a Hadoop cluster that is completely
heterogeneous (each machine is completely different), the management
overhead could be high.


Configuration variables like mapred.tasktracker.map.tasks.maximum  
and mapred.tasktracker.reduce.tasks.maximum should be set based on  
the number of cores/memory in each machine.  Variables like  
mapred.child.java.opts need to be set differently based on the  
amount of memory the machine has (e.g. -Xmx250m).  You should have  
at least 250MB of memory dedicated to each task although more is  
better.  It's also wise to make sure that each task has the same  
amount of memory regardless of the machine it's scheduled on;  
otherwise, tasks might succeed or fail based on which machine gets the  
task.  This asymmetry will make debugging harder.
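
For illustration, here is roughly what those settings might look like in the
hadoop-site.xml for one class of machines (say, 4 cores and 4GB of RAM); the
slot counts below are assumptions and should be tuned to your hardware:

<configuration>
  <!-- Assumed values: roughly one map slot per core on a 4-core machine. -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Keep the per-task heap the same on every machine class, as noted above. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx250m</value>
  </property>
</configuration>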


You can use our online configurator (http://www.cloudera.com/configurator/)
to generate optimized configurations for each class of machines in
your cluster.  It will ask simple questions about your configuration
and then produce a hadoop-site.xml file.


Good luck!
-Matt

On Jun 18, 2009, at 8:33 AM, ashish pareek wrote:

Can you tell few of the challenges in configuring heterogeneous  
cluster...or

can pass on some link where I would get some information regarding
challenges in running Hadoop on heterogeneous hardware

One more things is How about running different applications on the  
same

Hadoop cluster?and what challenges are involved in it ?

Thanks,
Regards,
Ashish


On Thu, Jun 18, 2009 at 8:53 PM, jason hadoop  
jason.had...@gmail.comwrote:



I don't know anyone who has a completely homogeneous cluster.

So hadoop is scalable across heterogeneous environments.

I stated that configuration is simpler if the machines are similar  
(There

are optimizations in configuration for near homogeneous machines.)

On Thu, Jun 18, 2009 at 8:10 AM, ashish pareek pareek...@gmail.com
wrote:

Does that mean hadoop is not scalable wrt heterogeneous  
environment? and

one
more question is can we run different application on the same hadoop
cluster
.

Thanks.
Regards,
Ashish

On Thu, Jun 18, 2009 at 8:30 PM, jason hadoop  
jason.had...@gmail.com

wrote:



Hadoop has always been reasonably agnostic wrt hardware and

homogeneity.
There are optimizations in configuration for near homogeneous  
machines.




On Thu, Jun 18, 2009 at 7:46 AM, ashish pareek  
pareek...@gmail.com

wrote:


Hello,
  I am doing my master my final year project is on Hadoop

...so

I

would like to know some thing about Hadoop cluster i.e, Do new

version

of

Hadoop are able to handle heterogeneous hardware.If you have any
informantion regarding these please mail me as my project is in
heterogenous
environment.


Thanks!

Reagrds,
Ashish Pareek





--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals







--
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals





Re: I need help

2009-04-30 Thread Steve Loughran

Razen Alharbi wrote:

Thanks everybody,

The issue was that hadoop writes all the outputs to stderr instead of stdout
and i don't know why. I would really love to know why the usual hadoop job
progress is written to stderr.


because there is a line in log4j.properties telling it to do just that?

log4j.appender.console.target=System.err
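
For what it's worth, a minimal sketch of how the forked-process reader from
this thread could capture that stderr output as well; the jar name and
arguments are placeholders, not the original poster's actual command:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class StderrCapture {
    public static void main(String[] args) throws Exception {
        // Merge the child's stderr into its stdout so one reader sees the job progress.
        ProcessBuilder pb = new ProcessBuilder("hadoop", "jar", "GraphClean.jar", "arg1");
        pb.redirectErrorStream(true);
        Process p = pb.start();

        BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println("hadoop exited with code " + p.waitFor());
    }
}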


--
Steve Loughran  http://www.1060.org/blogxter/publish/5
Author: Ant in Action   http://antbook.org/


Re: I need help

2009-04-29 Thread Razen Alharbi

Thanks everybody,

The issue was that Hadoop writes all of its output to stderr instead of stdout,
and I don't know why. I would really love to know why the usual Hadoop job
progress is written to stderr.

Thanks again.

Razen


Razen Alharbi wrote:
 
 Hi all,
 
 I am writing an application in which I create a forked process to execute
 a specific Map/Reduce job. The problem is that when I try to read the
 output stream of the forked process I get nothing and when I execute the
 same job manually it starts printing the output I am expecting. For
 clarification I will go through the simple code snippet:
 
 
 Process p = rt.exec(hadoop jar GraphClean args);
 BufferedReader reader = new BufferedReader(new
 InputStreamReader(p.getInputStream()));
 String line = null;
 check = true;
 while(check){
     line = reader.readLine();
 if(line != null){// I know this will not finish it's only for testing.
         System.out.println(line);
 } 
 }
 
 If I run this code nothing shows up. But if execute the command (hadoop
 jar GraphClean args) from the command line it works fine. I am using
 hadoop 0.19.0.
 
 Thanks,
 
 Razen
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/I-need-help-tp23273273p23307094.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



I need help

2009-04-28 Thread Razen Al Harbi
Hi all,

I am writing an application in which I create a forked process to execute a 
specific Map/Reduce job. The problem is that when I try to read the output 
stream of the forked process I get nothing, but when I execute the same job 
manually it starts printing the output I am expecting. For clarification I will 
go through this simple code snippet:


Runtime rt = Runtime.getRuntime();
Process p = rt.exec("hadoop jar GraphClean args");
BufferedReader reader = new BufferedReader(new
    InputStreamReader(p.getInputStream()));
String line = null;
boolean check = true;
while (check) {
    line = reader.readLine();
    if (line != null) { // I know this will not finish; it's only for testing.
        System.out.println(line);
    }
}

If I run this code nothing shows up. But if I execute the command (hadoop jar 
GraphClean args) from the command line it works fine. I am using hadoop 0.19.0.

Thanks,

Razen


  

Re: I need help

2009-04-28 Thread Steve Loughran

Razen Al Harbi wrote:

Hi all,

I am writing an application in which I create a forked process to execute a 
specific Map/Reduce job. The problem is that when I try to read the output 
stream of the forked process I get nothing and when I execute the same job 
manually it starts printing the output I am expecting. For clarification I will 
go through the simple code snippet:


Process p = rt.exec(hadoop jar GraphClean args);
BufferedReader reader = new BufferedReader(new 
InputStreamReader(p.getInputStream()));
String line = null;
check = true;
while(check){
line = reader.readLine();
if(line != null){// I know this will not finish it's only for testing.
System.out.println(line);
} 
}


If I run this code nothing shows up. But if execute the command (hadoop jar 
GraphClean args) from the command line it works fine. I am using hadoop 0.19.0.



Why not just invoke the Hadoop job submission calls yourself, no need to 
exec anything?


Look at org.apache.hadoop.util.RunJar to see what you need to do.

Avoid calling RunJar.main() directly as
 - it calls System.exit() when it wants to exit with an error
 - it adds shutdown hooks
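
A minimal sketch of what that in-process submission could look like with the
0.19-era JobClient API; the job name, key/value classes, and paths below are
placeholders rather than the poster's actual GraphClean job:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class InProcessSubmit {
    public static void main(String[] args) throws Exception {
        // JobConf picks up hadoop-site.xml from the classpath, just as the CLI does.
        JobConf conf = new JobConf(InProcessSubmit.class);
        conf.setJobName("graph-clean");

        // Placeholder job settings; a real job would also set its mapper/reducer classes.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Blocks until the job finishes; progress is logged in the current JVM.
        RunningJob job = JobClient.runJob(conf);
        System.out.println("Job successful: " + job.isSuccessful());
    }
}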

-steve


Re: I need help

2009-04-28 Thread Edward J. Yoon
Hi,

Is that command available for all nodes? Did you try as below? ;)

Process proc = rt.exec("/bin/hostname");
..
output.collect(hostname, diskUsage);

On Tue, Apr 28, 2009 at 6:13 PM, Razen Al Harbi razen.alha...@yahoo.com wrote:
 Hi all,

 I am writing an application in which I create a forked process to execute a 
 specific Map/Reduce job. The problem is that when I try to read the output 
 stream of the forked process I get nothing and when I execute the same job 
 manually it starts printing the output I am expecting. For clarification I 
 will go through the simple code snippet:


 Process p = rt.exec(hadoop jar GraphClean args);
 BufferedReader reader = new BufferedReader(new 
 InputStreamReader(p.getInputStream()));
 String line = null;
 check = true;
 while(check){
     line = reader.readLine();
 if(line != null){// I know this will not finish it's only for testing.
         System.out.println(line);
 }
 }

 If I run this code nothing shows up. But if execute the command (hadoop jar 
 GraphClean args) from the command line it works fine. I am using hadoop 
 0.19.0.

 Thanks,

 Razen






-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardy...@apache.org
http://blog.udanax.org


Re: I need help

2009-04-28 Thread Razen Alharbi

Thanks for the reply,

-Steve:
I know that I can use the JobClient to run or submit jobs; however, for the
time being I need to exec the job as a separate process. 

-Edward:
The forked job is not executed from within a map or reduce, so I don't need to
do data collection.

It seems for some reason the output of the reduce tasks is not written to
stdout because when I tried to direct the output to a tmp file using the
following command (hadoop jar GraphClean args > tmp), nothing was written to
the file and the output still goes to the screen.

Regards,

Razen



Razen Alharbi wrote:
 
 Hi all,
 
 I am writing an application in which I create a forked process to execute
 a specific Map/Reduce job. The problem is that when I try to read the
 output stream of the forked process I get nothing and when I execute the
 same job manually it starts printing the output I am expecting. For
 clarification I will go through the simple code snippet:
 
 
 Process p = rt.exec(hadoop jar GraphClean args);
 BufferedReader reader = new BufferedReader(new
 InputStreamReader(p.getInputStream()));
 String line = null;
 check = true;
 while(check){
     line = reader.readLine();
 if(line != null){// I know this will not finish it's only for testing.
         System.out.println(line);
 } 
 }
 
 If I run this code nothing shows up. But if execute the command (hadoop
 jar GraphClean args) from the command line it works fine. I am using
 hadoop 0.19.0.
 
 Thanks,
 
 Razen
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/I-need-help-tp23273273p23284528.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: I need help

2009-04-28 Thread Edward J. Yoon
Why not read the output after the job is done? And if you want to see
the log4j log on stdout, you need to set that option in log4j.properties.

On Wed, Apr 29, 2009 at 4:35 AM, Razen Alharbi razen.alha...@yahoo.com wrote:

 Thanks for the reply,

 -Steve:
 I know that I can use the JobClient to run or submit jobs; however, for the
 time being I need to exec the job as a separate process.

 -Edward:
 The forked job is not executed from witin a map or reduce so I dont need to
 do data collection.

 It seems for some reason the output of the reduce tasks is not written to
 stdout because when I tried to direct the output to a tmp file using the
 following command (hadoop jar GraphClean args > tmp), nothing was written to
 the file and the output still goes to the screen.

 Regards,

 Razen



 Razen Alharbi wrote:

 Hi all,

 I am writing an application in which I create a forked process to execute
 a specific Map/Reduce job. The problem is that when I try to read the
 output stream of the forked process I get nothing and when I execute the
 same job manually it starts printing the output I am expecting. For
 clarification I will go through the simple code snippet:


 Process p = rt.exec(hadoop jar GraphClean args);
 BufferedReader reader = new BufferedReader(new
 InputStreamReader(p.getInputStream()));
 String line = null;
 check = true;
 while(check){
     line = reader.readLine();
 if(line != null){// I know this will not finish it's only for testing.
         System.out.println(line);
 }
 }

 If I run this code nothing shows up. But if execute the command (hadoop
 jar GraphClean args) from the command line it works fine. I am using
 hadoop 0.19.0.

 Thanks,

 Razen





 --
 View this message in context: 
 http://www.nabble.com/I-need-help-tp23273273p23284528.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.





-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardy...@apache.org
http://blog.udanax.org


Re: hadoop need help please suggest

2009-03-26 Thread Snehal Nagmote


Sorry for the inconvenience caused... I will not spam core-dev.
The scale we are thinking of is more nodes in the coming future; the data can grow to
petabytes.
Can you please give some pointers for handling this issue? I am quite
new to Hadoop.

Regards,
Snehal




Raghu Angadi wrote:
 
 
 What is scale you are thinking of? (10s, 100s or more nodes)?
 
 The memory for metadata at NameNode you mentioned is that main issue 
 with small files. There are multiple alternatives for the dealing with 
 that. This issue is discussed many times here.
 
 Also please use core-user@ id alone for asking for help.. you don't need 
 to send to core-devel@
 
 Raghu.
 
 snehal nagmote wrote:
 Hello Sir,
 
 I have some doubts, please help me.
 we have requirement of scalable storage system, we have developed one
 agro-advisory system in which farmers will sent the crop pictures
 particularly in sequential manner some
 6-7 photos of 3-4 kb each would be stored in storage server and these
 photos
 would be read sequentially by scientist to detect the problem, writing to
 images would not be done.
 
 So for storing these images we  are using hadoop file system, is it
 feasible
 to use hadoop
 file system for the same purpose.
 
 As also the images are of only 3-4 kb and hadoop reads the data in blocks
 of
 size 64 mb
 how can we increase the performance, what could be the tricks and tweaks
 that should be done to use hadoop for such kind of purpose.
 
 Next problem is as hadoop stores all the metadata in memory,can we use
 some
 mechanism to store the files in the block of some greater size because as
 the files would be of small size,so it will store the lots metadata and
 will
 overflow the main memory
 please suggest what could be done
 
 
 regards,
 Snehal
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/hadoop-need-help-please-suggest-tp22666530p22721718.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Need Help hdfs -How to minimize access Time

2009-03-25 Thread Brian Bockelman
Hey Snehal (removing the core-dev list; please only post to one at a  
time),


The access time should be fine, but it depends on what you define as  
an acceptable access time.  If this is not acceptable, I'd suggest  
putting it behind a web cache like Squid.  The best way to find out is  
to use the system as a prototype and to evaluate it based on your  
requirements.


Hadoop is useful for small data, but optimized and originally designed  
only for big data.  The primary downfall of the small files is that it  
may cost more per file in terms of memory.  Hadoop as a solution may  
be overkill, however, if your total storage size is never going to  
grow very large.


We currently use HDFS for mostly random access.

Brian

On Mar 25, 2009, at 6:10 AM, snehal nagmote wrote:


Hello Sir,
I am doing mtech in iiit  hyderabad , I am doing research project  
whose aim

is to develop the scalable storage system For esagu.
The esagu is all about taking the crop images from the fields and  
store it
in the filesystem and then those images would be accessed by  
agricultural
scientist to detect the problem, So currently many fields in the  
A.P. are

using this system,it may go beyond A.Pso we require storage system

1)My problem is we are using hadoop for the storage, but hadoop  
retrieves
(reads/writes) in 64 mb chunk . these images stored would be very  
small size

say max 2 to 3 mb, So access time would be larger in case of accessing
images, Can you suggest how this access time can be reduced.Is there
anyother thing we could do to improve the performance like building  
our own

cache, To what extent it would be feasible or helpful in such kind of
application.
2)Second is does hadoop would be useful for small small data like  
this, if
not  what tricks we could do to make it usable for such knid of  
application


Please help, Thanks in advance



Regards,
Snehal Nagmote
IIIT Hyderabad




Re: hadoop need help please suggest

2009-03-24 Thread Raghu Angadi


What is the scale you are thinking of (10s, 100s, or more nodes)?

The memory for metadata at the NameNode that you mentioned is the main issue
with small files. There are multiple alternatives for dealing with
that. This issue has been discussed many times here.
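
One commonly mentioned alternative, purely for illustration: pack the small
files into a SequenceFile keyed by filename, so the NameNode tracks a few
large files instead of millions of tiny ones. The paths and key/value choices
in this sketch are assumptions:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path input = new Path(args[0]);   // directory of small files (e.g. 3-4 KB images)
        Path packed = new Path(args[1]);  // single output SequenceFile

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, packed, Text.class, BytesWritable.class);
        try {
            for (FileStatus status : fs.listStatus(input)) {
                // Each small file fits comfortably in memory, so read it fully.
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                InputStream in = fs.open(status.getPath());
                try {
                    IOUtils.copyBytes(in, bytes, conf, false);
                } finally {
                    in.close();
                }
                writer.append(new Text(status.getPath().getName()),
                              new BytesWritable(bytes.toByteArray()));
            }
        } finally {
            writer.close();
        }
    }
}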


Also, please use the core-user@ id alone for asking for help; you don't need
to send to core-devel@.


Raghu.

snehal nagmote wrote:

Hello Sir,

I have some doubts, please help me.
we have requirement of scalable storage system, we have developed one
agro-advisory system in which farmers will sent the crop pictures
particularly in sequential manner some
6-7 photos of 3-4 kb each would be stored in storage server and these photos
would be read sequentially by scientist to detect the problem, writing to
images would not be done.

So for storing these images we  are using hadoop file system, is it feasible
to use hadoop
file system for the same purpose.

As also the images are of only 3-4 kb and hadoop reads the data in blocks of
size 64 mb
how can we increase the performance, what could be the tricks and tweaks
that should be done to use hadoop for such kind of purpose.

Next problem is as hadoop stores all the metadata in memory,can we use some
mechanism to store the files in the block of some greater size because as
the files would be of small size,so it will store the lots metadata and will
overflow the main memory
please suggest what could be done


regards,
Snehal





Need Help hdfs -How to minimize access Time

2009-03-24 Thread snehal nagmote
Hello Sir,
I am doing my M.Tech at IIIT Hyderabad, and I am doing a research project whose aim
is to develop a scalable storage system for esagu.
esagu is all about taking crop images from the fields and storing them
in the filesystem; those images would then be accessed by agricultural
scientists to detect problems. Currently many fields in A.P. are
using this system, and it may go beyond A.P., so we require a storage system.

1) My problem is that we are using Hadoop for the storage, but Hadoop retrieves
(reads/writes) in 64 MB chunks, while these stored images would be of very small size,
say 2 to 3 MB at most, so the access time would be larger when accessing
images. Can you suggest how this access time can be reduced? Is there
anything else we could do to improve the performance, like building our own
cache? To what extent would that be feasible or helpful in this kind of
application?
2) Second, would Hadoop be useful for small data like this? If not,
what tricks could we use to make it usable for such a kind of application?

Please help, Thanks in advance



Regards,
Snehal Nagmote
IIIT Hyderabad


hadoop need help please suggest

2009-03-23 Thread snehal nagmote
Hello Sir,

I have some doubts; please help me.
We have a requirement for a scalable storage system. We have developed an
agro-advisory system in which farmers send crop pictures,
typically in a sequential manner: some
6-7 photos of 3-4 KB each would be stored on a storage server, and these photos
would be read sequentially by scientists to detect the problem; the
images would not be written to.

So for storing these images we are using the Hadoop file system. Is it feasible
to use the Hadoop
file system for this purpose?

Also, since the images are only 3-4 KB each and Hadoop reads data in blocks of
size 64 MB,
how can we increase the performance? What are the tricks and tweaks
that should be done to use Hadoop for this kind of purpose?

The next problem is that, as Hadoop stores all the metadata in memory, can we use
some mechanism to store the files in blocks of some greater size? Because the
files would be of small size, Hadoop will store lots of metadata and will
overflow the main memory.
Please suggest what could be done.


regards,
Snehal


extreme nubbie need help setting up hadoop

2009-02-03 Thread bjday

Good afternoon all,

I work in tech and am an extreme newbie at Hadoop.  I could sure use some 
help.  I have a professor wanting Hadoop installed on multiple Linux 
computers in a lab.  The computers are running CentOS 5.  I know I have 
something configured wrong and am not sure where to go.  I am following 
the instructions at 
http://www.cs.brandeis.edu/~cs147a/lab/hadoop-cluster/   I get to the 
part "Testing Your Hadoop Cluster", but when I use the command hadoop jar 
hadoop-*-examples.jar grep input output 'dfs[a-z.]+' it hangs.  Could 
anyone be kind enough to point me to a step-by-step installation and 
configuration website?


Thank you
Brian


Re: extreme nubbie need help setting up hadoop

2009-02-03 Thread Ravindra Phulari
Hello Brian,
  Here is the Hadoop project wiki link, which covers detailed Hadoop 
setup and running your first program on a single node as well as on multiple 
nodes.
   Also, below are some more useful links to start understanding and using 
Hadoop.

http://hadoop.apache.org/core/docs/current/quickstart.html
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

 If you still have difficulties running Hadoop-based programs, please reply 
with the error output so that experts can comment.
-
Ravi


On 2/3/09 10:08 AM, bjday bj...@cse.usf.edu wrote:

Good afternoon all,

I work tech and an extreme nubbie at hadoop.  I could sure use some
help.  I have a professor wanting hadoop installed on multiple Linux
computers in a lab.  The computers are running CentOS 5.  I know i have
something configured wrong and am not sure where to go.  I am following
the instructions at
http://www.cs.brandeis.edu/~cs147a/lab/hadoop-cluster/   I get to the
part Testing Your Hadoop Cluster but when i use the command hadoop jar
hadoop-*-examples.jar grep input output 'dfs[a-z.]+'   It hangs.  Could
anyone be kind enough to point me to a step by step instillation and
configuration website?

Thank you
Brian


Ravi Phulari
Yahoo! IM : ravescorp |Office Phone: (408)-336-0806 |
--



Need help regarding DataNodeCluster

2008-12-22 Thread Ramya R
Hello, 

I tried using DataNodeCluster to simulate a set of 3000 datanodes. The
namenode is already running.

When I gave the below command :

$ java DataNodeCluster -n 3000 -simulated -inject 1 1 -d someDirectory

 

I am getting the following error:

Starting 3000 Simulated  Data Nodes that will connect to Name Node at
gs301850.inktomisearch.com:50830

Starting DataNode 0 with dfs.data.dir:
someDirectory/dfs/data/data1,someDirectory/dfs/data/data2

08/12/22 11:42:33 INFO datanode.DataNode: Registered
FSDatasetStatusMBean

08/12/22 11:42:33 INFO datanode.DataNode: Opened info server at 60195

08/12/22 11:42:33 INFO datanode.DataNode: Balancing bandwith is 1048576
bytes/s

08/12/22 11:42:33 INFO datanode.DataNode: Periodic Block Verification is
disabled because verifcation is supported only with FSDataset.

08/12/22 11:42:33 INFO http.HttpServer: Version Jetty/5.1.4

08/12/22 11:42:33 INFO util.Container: Started
HttpContext[/static,/static]

08/12/22 11:42:34 INFO util.Credential: Checking Resource aliases

08/12/22 11:42:34 INFO util.Container: Started
org.mortbay.jetty.servlet.webapplicationhand...@1f03691

08/12/22 11:42:34 INFO http.SocketListener: Started SocketListener on
127.0.0.1:60196

08/12/22 11:42:34 INFO datanode.DataNode: Waiting for threadgroup to
exit, active threads is 0

Error creating data node:java.io.IOException: Problem starting http
server

 

The system just hangs after the above message.

 

Can anyone please let me know why I am getting the above error? 

 

Thanks in advance

Ramya

 



RE: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-18 Thread souravm
Hi Mafish,

Thanks for your suggestions.

Finally I could resolve the issue. The *site.xml on the namenode had 
fs.default.name set to localhost, whereas on the data nodes it was the actual IP. I 
changed localhost to the actual IP on the name node and it started working.
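
In other words, the namenode's config needs to advertise an address the
datanodes can actually reach; as a sketch, using the namenode IP that appears
in the datanode log earlier in this thread as an example value:

  <property>
    <name>fs.default.name</name>
    <!-- the namenode's real IP (or hostname), not localhost -->
    <value>hdfs://192.168.1.102:9000</value>
  </property>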

Regards,
Sourav

-Original Message-
From: Mafish Liu [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 7:37 PM
To: core-user@hadoop.apache.org
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi, souravm:
  I don't know exactly what's wrong with your configuration from your post,
and I guess the possible causes are:

  1. Make sure firewall on namenode is off or the port of 9000 is free to
connect in your firewall configuration.

  2. Namenode. Check the namenode start up log to see if namenode starts up
correctly, or try run 'jps' on your namenode to see if there is process
called namenode.

Hope this helps.


On Tue, Sep 16, 2008 at 10:41 PM, souravm [EMAIL PROTECTED] wrote:

 Hi,

 Tha namenode in machine 1 has started. I can see the following log. Is
 there a specific way to provide the master name in masters file (in
 hadoop/conf) in datanode ? I've currently specified

 2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
 Initializing RPC Metrics with hostName=NameNode, port=9000
 2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up
 at: localhost/127.0.0.1:9000
 2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=NameNode, sessionId=null
 2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics:
 Initializing NameNodeMeterics using context
 object:org.apache.hadoop.metrics.spi.NullContext
 2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem:
 fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem:
 supergroup=supergroup
 2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem:
 isPermissionEnabled=true
 2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished
 loading FSImage in 112 msecs
 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE*
 Leaving safe mode after 0 secs.
 2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE*
 Network topology has 0 racks and 0 datanodes
 2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE*
 UnderReplicatedBlocks has 0 blocks
 2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered
 FSNamesystemStatusMBean
 2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource
 aliases
 2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version
 Jetty/5.1.4
 2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started
 [EMAIL PROTECTED]
 2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started
 WebApplicationContext[/,/]
 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started
 HttpContext[/logs,/logs]
 2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started
 HttpContext[/static,/static]
 2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started
 SocketListener on 0.0.0.0:50070
 2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started
 [EMAIL PROTECTED]
 2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server
 up at: 0.0.0.0:50070
 2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server
 Responder: starting
 2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server
 listener on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 0 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 1 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 2 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 3 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 4 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 5 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 6 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 7 on 9000: starting
 2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 8 on 9000: starting
 2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 9 on 9000: starting

 Is there a specific way to provide the master name in masters file (in
 hadoop/conf) in datanode ? I've currently specified username@namenode
 server ip. I'm thinking there might be a problem as in log file of data
 node I can see the message '2008-09-16 14:38:51,501 INFO
 org.apache.hadoop.ipc.RPC: Server at /192.168.1.102:9000 not available
 yet, Z...'

 Any help ?

 Regards,
 Sourav

Re: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-16 Thread Mafish Liu
Hi:
  You need to configure your nodes to ensure that node 1 can connect to node
2 without password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote:

 Hi All,

 I'm facing a problem in configuring hdfs in a fully distributed way in Mac
 OSX.

 Here is the topology -

 1. The namenode is in machine 1
 2. There is 1 datanode in machine 2

 Now when I execute start-dfs.sh from machine 1, it connects to machine 2
 (after it asks for password for connecting to machine 2) and starts datanode
 in machine 2 (as the console message says).

 However -
 1. When I go to http://machine1:50070 - it does not show the data node at
 all. It says 0 data node configured
 2. In the log file in machine 2 what I see is -
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.17.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r
 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
 /
 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 1 time(s).
 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 2 time(s).
 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 3 time(s).
 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 4 time(s).
 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 5 time(s).
 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 6 time(s).
 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 7 time(s).
 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 8 time(s).
 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 9 time(s).
 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 10 time(s).
 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /
 17.229.23.77:9000 not available yet, Z...

 ... and this retyring gets on repeating


 The  hadoop-site.xmls are like this -

 1. In machine 1
 -
 <configuration>

   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
   </property>

   <property>
     <name>dfs.name.dir</name>
     <value>/Users/souravm/hdpn</value>
   </property>

   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>


 2. In machine 2

 <configuration>

   <property>
     <name>fs.default.name</name>
     <value>hdfs://machine1 ip:9000</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>/Users/nirdosh/hdfsd1</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

 The slaves file in machine 1 has single entry - user name@ip of
 machine2

 The exact steps I did -

 1. Reformat the namenode in machine 1
 2. execute start-dfs.sh in machine 1
 3. Then I try to see whether the datanode is created through http://machine
 1 ip:50070

 Any pointer to resolve this issue would be appreciated.

 Regards,
 Sourav



  CAUTION - Disclaimer *
 This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended
 solely
 for the use of the addressee(s). If you are not the intended recipient,
 please
 notify the sender by e-mail and delete the original message. Further, you
 are not
 to copy, disclose, or distribute this e-mail or its contents to any other
 person and
 any such actions are unlawful. This e-mail may contain viruses. Infosys has
 taken
 every reasonable precaution to minimize this risk, but is not liable for
 any damage
 you may sustain as a result of any virus in this e-mail. You should carry
 out your
 own virus checks before opening the e-mail or attachment. Infosys reserves
 the
 right to monitor and review the content of all messages sent to or from
 this e-mail
 address. Messages sent to or from this e-mail address may be stored on the
 Infosys e-mail system.
 ***INFOSYS End of Disclaimer INFOSYS***




-- 
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.


Re: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-16 Thread souravm
Hi,

I tried the way you suggested. I set up ssh without a password, so now the namenode can 
connect to the datanode without a password - the start-dfs.sh script does not ask for 
any password. However, even with this fix I still face the same problem.

Regards,
Sourav

- Original Message -
From: Mafish Liu [EMAIL PROTECTED]
To: core-user@hadoop.apache.org core-user@hadoop.apache.org
Sent: Mon Sep 15 23:26:10 2008
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

Hi:
  You need to configure your nodes to ensure that node 1 can connect to node
2 without password.

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote:

 Hi All,

 I'm facing a problem in configuring hdfs in a fully distributed way in Mac
 OSX.

 Here is the topology -

 1. The namenode is in machine 1
 2. There is 1 datanode in machine 2

 Now when I execute start-dfs.sh from machine 1, it connects to machine 2
 (after it asks for password for connecting to machine 2) and starts datanode
 in machine 2 (as the console message says).

 However -
 1. When I go to http://machine1:50070 - it does not show the data node at
 all. It says 0 data node configured
 2. In the log file in machine 2 what I see is -
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.17.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r
 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
 /
 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 1 time(s).
 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 2 time(s).
 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 3 time(s).
 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 4 time(s).
 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 5 time(s).
 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 6 time(s).
 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 7 time(s).
 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 8 time(s).
 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 9 time(s).
 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 10 time(s).
 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /
 17.229.23.77:9000 not available yet, Z...

 ... and this retyring gets on repeating


 The  hadoop-site.xmls are like this -

 1. In machine 1
 -
 <configuration>

   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
   </property>

   <property>
     <name>dfs.name.dir</name>
     <value>/Users/souravm/hdpn</value>
   </property>

   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>


 2. In machine 2

 <configuration>

   <property>
     <name>fs.default.name</name>
     <value>hdfs://machine1 ip:9000</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>/Users/nirdosh/hdfsd1</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

 The slaves file in machine 1 has single entry - user name@ip of
 machine2

 The exact steps I did -

 1. Reformat the namenode in machine 1
 2. execute start-dfs.sh in machine 1
 3. Then I try to see whether the datanode is created through http://machine
 1 ip:50070

 Any pointer to resolve this issue would be appreciated.

 Regards,
 Sourav



  CAUTION - Disclaimer *
 This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended
 solely
 for the use of the addressee(s). If you are not the intended recipient,
 please
 notify the sender by e-mail and delete the original message. Further, you
 are not
 to copy, disclose, or distribute this e-mail or its contents to any other
 person and
 any such actions are unlawful. This e-mail may contain viruses. Infosys has
 taken
 every reasonable precaution to minimize this risk, but is not liable for
 any damage
 you may sustain as a result of any virus in this e-mail. You should carry
 out your
 own virus checks before opening the e-mail or attachment. Infosys reserves
 the
 right to monitor and review the content

Re: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-16 Thread Samuel Guo
check the namenode's log in machine1 to see if your namenode started
successfully :)

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote:

 Hi All,

 I'm facing a problem in configuring hdfs in a fully distributed way in Mac
 OSX.

 Here is the topology -

 1. The namenode is in machine 1
 2. There is 1 datanode in machine 2

 Now when I execute start-dfs.sh from machine 1, it connects to machine 2
 (after it asks for password for connecting to machine 2) and starts datanode
 in machine 2 (as the console message says).

 However -
 1. When I go to http://machine1:50070 - it does not show the data node at
 all. It says 0 data node configured
 2. In the log file in machine 2 what I see is -
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = rc0902b-dhcp169.apple.com/17.229.22.169
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.17.2.1
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r
 684969; compiled by 'oom' on Wed Aug 20 22:29:32 UTC 2008
 /
 2008-09-15 18:54:44,626 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 1 time(s).
 2008-09-15 18:54:45,627 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 2 time(s).
 2008-09-15 18:54:46,628 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 3 time(s).
 2008-09-15 18:54:47,629 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 4 time(s).
 2008-09-15 18:54:48,630 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 5 time(s).
 2008-09-15 18:54:49,631 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 6 time(s).
 2008-09-15 18:54:50,632 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 7 time(s).
 2008-09-15 18:54:51,633 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 8 time(s).
 2008-09-15 18:54:52,635 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 9 time(s).
 2008-09-15 18:54:53,640 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: /17.229.23.77:9000. Already tried 10 time(s).
 2008-09-15 18:54:54,641 INFO org.apache.hadoop.ipc.RPC: Server at /
 17.229.23.77:9000 not available yet, Z...

 ... and this retyring gets on repeating


 The  hadoop-site.xmls are like this -

 1. In machine 1
 -
 <configuration>

   <property>
     <name>fs.default.name</name>
     <value>hdfs://localhost:9000</value>
   </property>

   <property>
     <name>dfs.name.dir</name>
     <value>/Users/souravm/hdpn</value>
   </property>

   <property>
     <name>mapred.job.tracker</name>
     <value>localhost:9001</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>


 2. In machine 2

 <configuration>

   <property>
     <name>fs.default.name</name>
     <value>hdfs://machine1 ip:9000</value>
   </property>
   <property>
     <name>dfs.data.dir</name>
     <value>/Users/nirdosh/hdfsd1</value>
   </property>
   <property>
     <name>dfs.replication</name>
     <value>1</value>
   </property>
 </configuration>

 The slaves file in machine 1 has single entry - user name@ip of
 machine2

 The exact steps I did -

 1. Reformat the namenode in machine 1
 2. execute start-dfs.sh in machine 1
 3. Then I try to see whether the datanode is created through http://machine
 1 ip:50070

 Any pointer to resolve this issue would be appreciated.

 Regards,
 Sourav



  CAUTION - Disclaimer *
 This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended
 solely
 for the use of the addressee(s). If you are not the intended recipient,
 please
 notify the sender by e-mail and delete the original message. Further, you
 are not
 to copy, disclose, or distribute this e-mail or its contents to any other
 person and
 any such actions are unlawful. This e-mail may contain viruses. Infosys has
 taken
 every reasonable precaution to minimize this risk, but is not liable for
 any damage
 you may sustain as a result of any virus in this e-mail. You should carry
 out your
 own virus checks before opening the e-mail or attachment. Infosys reserves
 the
 right to monitor and review the content of all messages sent to or from
 this e-mail
 address. Messages sent to or from this e-mail address may be stored on the
 Infosys e-mail system.
 ***INFOSYS End of Disclaimer INFOSYS***



RE: Need help in hdfs configuration fully distributed way in Mac OSX...

2008-09-16 Thread souravm
Hi,

The namenode in machine 1 has started. I can see the following log. Is there a 
specific way to provide the master name in the masters file (in hadoop/conf) on the 
datanode? I've currently specified

2008-09-16 07:23:46,321 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=NameNode, port=9000
2008-09-16 07:23:46,325 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: 
localhost/127.0.0.1:9000
2008-09-16 07:23:46,327 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-09-16 07:23:46,329 INFO org.apache.hadoop.dfs.NameNodeMetrics: 
Initializing NameNodeMeterics using context 
object:org.apache.hadoop.metrics.spi.NullContext
2008-09-16 07:23:46,404 INFO org.apache.hadoop.fs.FSNamesystem: 
fsOwner=souravm,souravm,_lpadmin,_appserveradm,_appserverusr,admin
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: 
supergroup=supergroup
2008-09-16 07:23:46,405 INFO org.apache.hadoop.fs.FSNamesystem: 
isPermissionEnabled=true
2008-09-16 07:23:46,473 INFO org.apache.hadoop.fs.FSNamesystem: Finished 
loading FSImage in 112 msecs
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Leaving 
safe mode after 0 secs.
2008-09-16 07:23:46,475 INFO org.apache.hadoop.dfs.StateChange: STATE* Network 
topology has 0 racks and 0 datanodes
2008-09-16 07:23:46,480 INFO org.apache.hadoop.dfs.StateChange: STATE* 
UnderReplicatedBlocks has 0 blocks
2008-09-16 07:23:46,486 INFO org.apache.hadoop.fs.FSNamesystem: Registered 
FSNamesystemStatusMBean
2008-09-16 07:23:46,561 INFO org.mortbay.util.Credential: Checking Resource 
aliases
2008-09-16 07:23:46,627 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-16 07:23:46,907 INFO org.mortbay.util.Container: Started [EMAIL 
PROTECTED]
2008-09-16 07:23:46,937 INFO org.mortbay.util.Container: Started 
WebApplicationContext[/,/]
2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started 
HttpContext[/logs,/logs]
2008-09-16 07:23:46,938 INFO org.mortbay.util.Container: Started 
HttpContext[/static,/static]
2008-09-16 07:23:46,939 INFO org.mortbay.http.SocketListener: Started 
SocketListener on 0.0.0.0:50070
2008-09-16 07:23:46,939 INFO org.mortbay.util.Container: Started [EMAIL 
PROTECTED]
2008-09-16 07:23:46,940 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up 
at: 0.0.0.0:50070
2008-09-16 07:23:46,940 INFO org.apache.hadoop.ipc.Server: IPC Server 
Responder: starting
2008-09-16 07:23:46,942 INFO org.apache.hadoop.ipc.Server: IPC Server listener 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 
on 9000: starting
2008-09-16 07:23:46,943 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 
on 9000: starting
2008-09-16 07:23:46,944 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 
on 9000: starting

Is there a specific way to provide the master name in the masters file (in 
hadoop/conf) on the datanode? I've currently specified username@namenode server 
ip. I'm thinking there might be a problem, as in the log file of the data node I can 
see the message '2008-09-16 14:38:51,501 INFO org.apache.hadoop.ipc.RPC: Server 
at /192.168.1.102:9000 not available yet, Z...'

Any help ?

Regards,
Sourav



From: Samuel Guo [EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 5:49 AM
To: core-user@hadoop.apache.org
Subject: Re: Need help in hdfs configuration fully distributed way in Mac OSX...

check the namenode's log in machine1 to see if your namenode started
successfully :)

On Tue, Sep 16, 2008 at 2:04 PM, souravm [EMAIL PROTECTED] wrote:

 Hi All,

 I'm facing a problem in configuring hdfs in a fully distributed way in Mac
 OSX.

 Here is the topology -

 1. The namenode is in machine 1
 2. There is 1 datanode in machine 2

 Now when I execute start-dfs.sh from machine 1, it connects to machine 2
 (after it asks for password for connecting to machine 2) and starts datanode
 in machine 2 (as the console message says).

 However -
 1. When I go to http://machine1:50070 - it does not show the data node at
 all. It says 0 data node configured
 2. In the log file in machine 2 what I see is -
 /
 STARTUP_MSG: Starting DataNode

Re: Need help to setup Hadoop on Fedora Core 6

2008-08-19 Thread Sandy
I tried this. Frankly, the hardest part was getting Java set up on that
machine. GIJ got in the way of -everything-, causing me much frustration and
furious anger. Even if you install sun java, it's possible that all the
symbolic links don't point to sun java, but rather gij. I'm not sure if this
is the case for you, but if you do a:

/usr/sbin/alternatives --config java

and you don't see two options, something is messed up.  Also, the Ubuntu
tutorial on how to set up Hadoop on a single node can be applied to Fedora;
you just need to find the associated Fedora packages and use yum.

Hope this helps.

-SM

PS - How my Fedora saga ended: Since I had the option of reformatting, and
Fedora and I have had lots of disagreements in the past, I switched to a
linux distro I was more comfortable with and took it from there. Best of
luck.


On Thu, Jul 24, 2008 at 6:03 PM, hadoop hadoop-chetan 
[EMAIL PROTECTED] wrote:

 Hello Folks

   If somebody has successfully installed Hadoop on FC 6, please help!!!

   Just bootstrapping into the Hadoop madness and was attempting to install
 Hadoop on Fedora Core 6.
   Tried all sorts of things but couldn't get past this error, which prevents
 the reduce tasks from starting:

 2008-07-24 13:04:06,642 INFO org.apache.hadoop.mapred.TaskInProgress: Error
 from task_200807241301_0001_r_00_0: java.lang.NullPointerException
at java.util.Hashtable.get(Hashtable.java:334)
at

 org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1103)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:328)
at
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)


 Before you ask, here are the details:

  1. Running hadoop as a single node cluster
  2. Disabled IPV6
  3. Using Hadoop version */hadoop-0.17.1/*
  4. enabled ssh to access local machine
  5. Master and Slaves are set to localhost
  6. Created simple sample file and loaded into DFS
  7. Encountered error when I was running the sample with the wordcount
 example provided with the package
  8. Here is my hadoop-site.xml

 <configuration>

 <property>
   <name>hadoop.tmp.dir</name>
   <value>/tmp/hadoop-${user.name}</value>
   <description>A base for other temporary directories.</description>
 </property>

 <property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:54310</value>
   <description>The name of the default file system.  A URI whose
   scheme and authority determine the FileSystem implementation.  The
   uri's scheme determines the config property (fs.SCHEME.impl) naming
   the FileSystem implementation class.  The uri's authority is used to
   determine the host, port, etc. for a filesystem.</description>
 </property>

 <property>
   <name>mapred.job.tracker</name>
   <value>localhost:54311</value>
   <description>The host and port that the MapReduce job tracker runs
   at.  If local, then jobs are run in-process as a single map
   and reduce task.
   </description>
 </property>

 <property>
   <name>mapred.map.tasks</name>
   <value>1</value>
   <description>
     define mapred.map tasks to be number of slave hosts
   </description>
 </property>

 <property>
   <name>mapred.reduce.tasks</name>
   <value>1</value>
   <description>
     define mapred.reduce tasks to be number of slave hosts
   </description>
 </property>

 <property>
   <name>dfs.replication</name>
   <value>1</value>
   <description>Default block replication.
   The actual number of replications can be specified when the file is created.
   The default is used if replication is not specified in create time.
   </description>
 </property>

 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx1800m</value>
   <description>Java opts for the task tracker child processes.
   The following symbol, if present, will be interpolated: @taskid@ is
   replaced by current TaskID. Any other occurrences of '@' will go unchanged.
   For example, to enable verbose gc logging to a file named for the taskid in
   /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
   -Xmx1024m -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED]
   </description>
 </property>

 </configuration>



Re: Configuration: I need help.

2008-08-07 Thread Steve Loughran

Allen Wittenauer wrote:

On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

You can put the same hadoop-site.xml on all machines.  Yes, you do want a
secondary NN - a single NN is a SPOF.  Browse the archives a few days back to
find an email from Paul about DRBD (disk replication) to avoid this SPOF.


Keep in mind that even with a secondary name node, you still have a
SPOF.  If the NameNode process dies, so does your HDFS. 



There's always a SPOF; it just moves. Sometimes it moves out of your own 
infrastructure, and then you have big problems :)


Configuration: I need help.

2008-08-06 Thread James Graham (Greywolf)

Seeing as there is no search function on the archives, I'm relegated
to asking a possibly redundant question or four:

I have, as a sample setup:

idx1-tracker    JobTracker
idx2-namenode   NameNode
idx3-slave  DataTracker
...
idx20-slave DataTracker

Q1: Can I put the same hadoop-site.xml file on all machines or do I need
to configure each machine separately?

Q2: My current setup does not seem to find a primary namenode, but instead
wants to put idx1 and idx2 as secondary namenodes; as a result, I am
not getting anything usable on any of the web addresses (50030, 50050,
50070, 50090).

Q3: Possibly connected to Q1:  The current setup seems to go out and start
on all machines (masters/slaves); when I say bin/start-mapred.sh on
the JobTracker, I get the answer jobtracker running...kill it first.

Q4: Do I even *need* a secondary namenode?

IWBN if I did not have to maintain three separate configuration files
(jobtracker/namenode/datatracker).
--
James Graham (Greywolf)   |
650.930.1138|925.768.4053 *
[EMAIL PROTECTED] |
Check out what people are saying about SearchMe! -- click below
http://www.searchme.com/stack/109aa


Re: Configuration: I need help.

2008-08-06 Thread Otis Gospodnetic
Hi James,

You can put the same hadoop-site.xml on all machines.  Yes, you do want a 
secondary NN - a single NN is a SPOF.  Browse the archives a few days back to 
find an email from Paul about DRBD (disk replication) to avoid this SPOF.
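
Concretely, a sketch of such a setup for the machines named in the question
(the RPC ports and which host runs which start script are assumptions, not
something stated in this thread):

conf/masters (start-dfs.sh starts the secondary namenode here):

    idx2-namenode

conf/slaves (datanodes and tasktrackers, one per line):

    idx3-slave
    ...
    idx20-slave

hadoop-site.xml, identical on every node:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://idx2-namenode:9000/</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>idx1-tracker:9001</value>
    </property>

bin/start-dfs.sh is then run on idx2-namenode and bin/start-mapred.sh on
idx1-tracker. The jobtracker running...kill it first message in Q3 usually
just means a jobtracker, or its leftover pid file, is already present on that
host from an earlier start attempt.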


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: James Graham (Greywolf) [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Wednesday, August 6, 2008 1:37:20 PM
 Subject: Configuration: I need help.
 
 Seeing as there is no search function on the archives, I'm relegated
 to asking a possibly redundant question or four:
 
 I have, as a sample setup:
 
 idx1-tracker    JobTracker
 idx2-namenode   NameNode
 idx3-slave  DataTracker
 ...
 idx20-slave    DataTracker
 
 Q1: Can I put the same hadoop-site.xml file on all machines or do I need
  to configure each machine separately?
 
 Q2: My current setup does not seem to find a primary namenode, but instead
  wants to put idx1 and idx2 as secondary namenodes; as a result, I am
  not getting anything usable on any of the web addresses (50030, 
 50050,
  50070, 50090).
 
 Q3: Possibly connected to Q1:  The current setup seems to go out and start
  on all machines (masters/slaves); when I say bin/start-mapred.sh on
  the JobTracker, I get the answer jobtracker running...kill it 
 first.
 
 Q4: Do I even *need* a secondary namenode?
 
 IWBN if I did not have to maintain three separate configuration files
 (jobtracker/namenode/datatracker).
 -- 
 James Graham (Greywolf)  |
 650.930.1138|925.768.4053  *
 [EMAIL PROTECTED]  |
 Check out what people are saying about SearchMe! -- click below
 http://www.searchme.com/stack/109aa



Re: Configuration: I need help.

2008-08-06 Thread James Graham (Greywolf)

Thus spake Otis Gospodnetic::

Hi James,

You can put the same hadoop-site.xml on all machines. Yes, you do want a 
secondary NN - a single NN is a SPOF. Browse the archives a few days 
back to find an email from Paul about DRBD (disk replication) to avoid 
this SPOF.


Okay, thank you!  Good to know (even though the documentation seems to state
that secondary (NN) is a misnomer, since it never takes over for the primary
NN).

Now I have something interesting going on.  Given the following configuration
file, what am I doing wrong?  When I type start-dfs.sh on the namenode,
as instructed in the docs, I end up with, effectively, Address already in use;
shutting down NameNode.

I do not understand this.  It's like it's trying to start it twice; netstat
shows no port 50070 in use after shutdown.

I feel like an idiot trying to wrap my mind around this!  What the heck am
I doing wrong?


<configuration>
<!-- HOST:PORT MAPPINGS -->
<property>
 <name>dfs.secondary.http.address</name>
 <value>0.0.0.0:50090</value>
 <description>
   The secondary namenode http server address and port.
   If the port is 0 then the server will start on a free port.
 </description>
</property>

<property>
 <name>dfs.datanode.address</name>
 <value>0.0.0.0:50010</value>
 <description>
   The address where the datanode server will listen to.
   If the port is 0 then the server will start on a free port.
 </description>
</property>

<property>
 <name>dfs.datanode.http.address</name>
 <value>0.0.0.0:50075</value>
 <description>
   The datanode http server address and port.
   If the port is 0 then the server will start on a free port.
 </description>
</property>

<property>
 <name>dfs.http.address</name>
 <value>idx2-r70:50070</value>
 <description>
   The address and the base port where the dfs namenode web ui will listen on.
   If the port is 0 then the server will start on a free port.
 </description>
</property>

<property>
 <name>mapred.job.tracker</name>
 <value>idx1-r70:50030</value>
 <description>The host and port that the MapReduce job tracker runs
 at.  If local, then jobs are run in-process as a single map
 and reduce task.
 </description>
</property>

<property>
 <name>mapred.job.tracker.http.address</name>
 <value>idx1-r70:50030</value>
 <description>
   The job tracker http server address and port the server will listen on.
   If the port is 0 then the server will start on a free port.
 </description>
</property>


<property>
 <name>fs.default.name</name>
 <value>hdfs://idx2-r70:50070/</value>
 <description>The name of the default file system.  A URI whose
 scheme and authority determine the FileSystem implementation.  The
 uri's scheme determines the config property (fs.SCHEME.impl) naming
 the FileSystem implementation class.  The uri's authority is used to
 determine the host, port, etc. for a filesystem.</description>
</property>

</configuration>
###
--
James Graham (Greywolf)   |
650.930.1138|925.768.4053 *
[EMAIL PROTECTED] |
Check out what people are saying about SearchMe! -- click below
http://www.searchme.com/stack/109aa


Re: Configuration: I need help.

2008-08-06 Thread Allen Wittenauer
On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
 You can put the same hadoop-site.xml on all machines.  Yes, you do want a
 secondary NN - a single NN is a SPOF.  Browse the archives a few days back to
 find an email from Paul about DRBD (disk replication) to avoid this SPOF.

Keep in mind that even with a secondary name node, you still have a
SPOF.  If the NameNode process dies, so does your HDFS. 



Re: Configuration: I need help.

2008-08-06 Thread James Graham (Greywolf)

Thus spake James Graham (Greywolf)::


Now I have something interesting going on. Given the following configuration
file, what am I doing wrong? When I type start-dfs.sh on the namenode,
as instructed in the docs, I end up with, effectively, Address already in use;
shutting down NameNode.

I do not understand this. It's like it's trying to start it twice; netstat
shows no port 50070 in use after shutdown.

I feel like an idiot trying to wrap my mind around this! What the heck am
I doing wrong?


Never mind.  Declaring multiple services on the same port never works.
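
For anyone hitting the same thing: in the configuration above, fs.default.name
and dfs.http.address both point at idx2-r70:50070, and mapred.job.tracker and
mapred.job.tracker.http.address both point at idx1-r70:50030. A sketch of the
usual separation, where 9000 and 9001 are just conventional RPC port choices
rather than anything stated in this thread:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://idx2-r70:9000/</value>
    </property>
    <property>
      <name>dfs.http.address</name>
      <value>idx2-r70:50070</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>idx1-r70:9001</value>
    </property>
    <property>
      <name>mapred.job.tracker.http.address</name>
      <value>idx1-r70:50030</value>
    </property>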



--
James Graham (Greywolf)   |
650.930.1138|925.768.4053 *
[EMAIL PROTECTED] |
Check out what people are saying about SearchMe! -- click below
http://www.searchme.com/stack/109aa


Re: Hadoop and Fedora Core 6 Adventure, Need Help ASAP

2008-07-24 Thread hadoop hadoop-chetan
Hello Folks

   If somebody has successfully installed Hadoop on FC 6, please help!

   I am just bootstrapping into the Hadoop madness and was attempting to install
Hadoop on Fedora Core 6.
   I tried all sorts of things but couldn't get past this error, which keeps the
reduce tasks from starting:

2008-07-24 13:04:06,642 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from task_200807241301_0001_r_00_0: java.lang.NullPointerException
at java.util.Hashtable.get(Hashtable.java:334)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1103)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:328)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)


Before you ask, here are the details:

 1. Running hadoop as a single node cluster
 2. Disabled IPV6
 3. Using Hadoop version */hadoop-0.17.1/*
 4. enabled ssh to access local machine
 5. Master and Slaves are set to localhost
 6. Created simple sample file and loaded into DFS
 7. Encountered error when I was running the sample with the wordcount
example provided with the package
 8. Here is my hadoop-site.xml

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If local, then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1800m</value>
  <description>Java opts for the task tracker child processes.
  The following symbol, if present, will be interpolated: @taskid@ is
  replaced by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
  -Xmx1024m -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED]
  </description>
</property>

</configuration>


Need help to setup Hadoop on Fedora Core 6

2008-07-24 Thread hadoop hadoop-chetan
Hello Folks

   If somebody has successfully installed Hadoop on FC 6, please help!

   I am just bootstrapping into the Hadoop madness and was attempting to install
Hadoop on Fedora Core 6.
   I tried all sorts of things but couldn't get past this error, which keeps the
reduce tasks from starting:

2008-07-24 13:04:06,642 INFO org.apache.hadoop.mapred.TaskInProgress: Error
from task_200807241301_0001_r_00_0: java.lang.NullPointerException
at java.util.Hashtable.get(Hashtable.java:334)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1103)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:328)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)


Before you ask, here are the details:

 1. Running hadoop as a single node cluster
 2. Disabled IPV6
 3. Using Hadoop version */hadoop-0.17.1/*
 4. enabled ssh to access local machine
 5. Master and Slaves are set to localhost
 6. Created simple sample file and loaded into DFS
 7. Encountered error when I was running the sample with the wordcount
example provided with the package
 8. Here is my hadoop-site.xml

<configuration>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If local, then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1800m</value>
  <description>Java opts for the task tracker child processes.
  The following symbol, if present, will be interpolated: @taskid@ is
  replaced by current TaskID. Any other occurrences of '@' will go unchanged.
  For example, to enable verbose gc logging to a file named for the taskid in
  /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:
  -Xmx1024m -verbose:gc -Xloggc:/tmp/@[EMAIL PROTECTED]
  </description>
</property>

</configuration>


Re: Need Help

2008-05-12 Thread Amar Kamat

hemal patel wrote:

Hello,

Can you help me to solve this problem?

When I am trying to run this program, it gives me an error like this:

 bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
08/05/12 17:32:59 INFO mapred.FileInputFormat: Total input paths to process
: 12
java.io.IOException: Not a file: hdfs://localhost:9000/user/hemal/input/conf
at
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:170)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:515)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
at org.apache.hadoop.examples.Grep.run(Grep.java:69)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.Grep.main(Grep.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at
org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at
org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155).


And also one more error

[EMAIL PROTECTED]:~/hadoop-0.15.3 bin/hadoop jar usr/hemal/wordconut.jar
  

Two things to check
1) The jar file should be on the local disk and not on the DFS. It looks 
like 'usr/hemal/wordconut.jar' is the dfs path. So the command would 
look like

bin/hadoop jar /local/path/to/jar/job.jar args
So if you have the jar file in your home folder then you can use 
~/job.jar or /home/user-name/job.jar

2) Also make sure it's wordconut.jar and not wordcount.jar
Amar

P.S Changing the mailing list to core-user.

WordCount /usr/hemal/wordconut/input /usr/hemal/wordcount/output
Exception in thread main java.io.IOException: Error opening job jar:
usr/hemal/wordconut.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)


Please help me out.

Thanks
Hemal

  




I need help to set HADOOP

2008-05-09 Thread sigma syd
Hello!

I am trying to set up Hadoop on two PCs.

In conf/master I have:
master master.visid.com

and in conf/slave:
master.visid.com
slave3.visid.com

When I execute bin/start-dfs.sh and bin/start-mapred.sh, the following error is 
displayed in logs/hadoop-hadoop-datanode-slave3.visid.com.log:

STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = slave3.visid.com/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.16.2
STARTUP_MSG:   build = 
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 642481; 
compiled by 'hadoopqa' on Sat Mar 29 01:59:04 UTC 2008
/
2008-05-08 21:15:40,133 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 1 time(s).
2008-05-08 21:15:41,133 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 2 time(s).
2008-05-08 21:15:42,134 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 3 time(s).
2008-05-08 21:15:43,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 4 time(s).
2008-05-08 21:15:44,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 5 time(s).
2008-05-08 21:15:45,136 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 6 time(s).
2008-05-08 21:15:46,136 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 7 time(s).
2008-05-08 21:15:47,137 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 8 time(s).
2008-05-08 21:15:48,138 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 9 time(s).
2008-05-08 21:15:49,138 INFO org.apache.hadoop.ipc.Client: Retrying connect to 
server: master.visid.com/192.168.46.242:54310. Already tried 10 time(s).
2008-05-08 21:15:50,141 ERROR org.apache.hadoop.dfs.DataNode: 
java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:519)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:161)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:578)
at org.apache.hadoop.ipc.Client.call(Client.java:501)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:291)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:278)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:315)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:260)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:207)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:162)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)


On master.visid.com I execute jps and all services are running:
3984 DataNode
4148 SecondaryNameNode
4373 TaskTracker
4461 Jps
3873 NameNode
4246 JobTracker
but on slave3.visid.com no service is running (jps shows nothing).

Here is my config file, hadoop-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/nutch/filesystem/hadoop-datastore/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>master.visid.com:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property 
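
The NoRouteToHostException in the datanode log points at the network rather
than at this configuration. A few checks worth running from slave3; the
hostname and port come from the log above, and the iptables commands are the
stock Fedora ones:

    ping -c 1 master.visid.com
    telnet master.visid.com 54310
    /sbin/service iptables status

A default Fedora firewall usually rejects connections to 54310 and the other
Hadoop ports, so either open those ports or, strictly as a test, stop iptables
with /sbin/service iptables stop. Also check that /etc/hosts on each machine
maps its hostname to the LAN address rather than 127.0.0.1; the startup banner
above reports host = slave3.visid.com/127.0.0.1.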

need help

2008-04-09 Thread krishna prasanna
Hi, I started using Hadoop very recently and I am stuck with the basic example. 
When I try to run
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

I am getting this output:

08/04/09 21:23:12 INFO mapred.FileInputFormat: Total input paths to process : 2
java.io.IOException: Not a file: hdfs://localhost:9000/user/Administrator/input/conf
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:170)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:515)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
at org.apache.hadoop.examples.WordCount.run(WordCount.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:153)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

Where are the libraries residing? How should I configure this?

Thanks & Regards,
Krishna.
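
The exception is fairly specific: FileInputFormat in these releases does not
recurse into sub-directories, and hdfs://localhost:9000/user/Administrator/input/conf
is a directory, so a whole conf directory was apparently copied into input
instead of individual files. A sketch of one way past it, with paths taken from
the error message and the dfs sub-commands these releases ship with:

    bin/hadoop dfs -rmr input
    bin/hadoop dfs -mkdir input
    bin/hadoop dfs -put conf/hadoop-site.xml input
    bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Any plain files will do as input; the point is that every entry directly under
input/ must be a file, not a directory. The example jar and supporting
libraries themselves live in the Hadoop install directory (hadoop-*-examples.jar
at the top level, dependency jars under lib/).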

