Re: [ANNOUNCE] Hadoop release 0.18.3 available

2009-01-30 Thread Amareshwari Sriramadasu

Anum Ali wrote:

Hi,


Need some kind of guidance related to getting started with Hadoop installation and
system setup. I am a newbie regarding Hadoop. Our system OS is Fedora 8;
should I start from a stable release of Hadoop, or get it from the svn development
version (from the contribute site)?



Thank You



  

Download a stable release from http://hadoop.apache.org/core/releases.html
For installation and setup, you can see 
http://hadoop.apache.org/core/docs/current/quickstart.html and 
http://hadoop.apache.org/core/docs/current/cluster_setup.html
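
If it helps to have the quickstart condensed, a standalone (single-JVM) first
run looks roughly like this; 0.18.3 is just the current stable release used as
an example:

tar xzf hadoop-0.18.3.tar.gz
cd hadoop-0.18.3
# edit conf/hadoop-env.sh and point JAVA_HOME at your JDK
mkdir input
cp conf/*.xml input
bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
cat output/*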


-Amareshwari

On Thu, Jan 29, 2009 at 7:38 PM, Nigel Daley nda...@yahoo-inc.com wrote:

  

Release 0.18.3 fixes many critical bugs in 0.18.2.

For Hadoop release details and downloads, visit:
http://hadoop.apache.org/core/releases.html

Hadoop 0.18.3 Release Notes are at
http://hadoop.apache.org/core/docs/r0.18.3/releasenotes.html

Thanks to all who contributed to this release!

Nigel




  




Hadoop perl

2009-01-30 Thread Daren

Just started using Hadoop and want to use Perl to interface with it.

Thriftfs has some Perl modules which claim to be able to work with the 
thrift server.


Unfortunately I haven't been able to get this to work and was wondering 
if anyone out there can give me some advice on how to get a Perl 
interface to work, if indeed it's possible.


da...@adestra.com


local path

2009-01-30 Thread Hakan Kocakulak
Hello,
How can I write to and read from the datanode's local path directly?

Thanks,
Hakan


How To Encrypt Hadoop Socket Connections

2009-01-30 Thread Brian MacKay
Hello,

Found some archive posts regarding encrypting Hadoop socket connections

https://issues.apache.org/jira/browse/HADOOP-2239

http://markmail.org/message/pmn23y4b3gdxcpif

Couldn't find any documentation or Junit tests.  Does anyone know the
proper configuration changes to make?

It seems like the following are needed in hadoop-site.xml?

https.keystore.info.rsrc  = should reference an external config file, in
this example called sslinfo.xml ?

https.keystore.password  =  ?
https.keystore.keypassword  = ?

---
Snippet from org.apache.hadoop.dfs.DataNode

  void startDataNode(Configuration conf,
                     AbstractList<File> dataDirs
                     ) throws IOException {

    ...
    sslConf.addResource(conf.get("https.keystore.info.rsrc",
        "sslinfo.xml"));
    String keyloc = sslConf.get("https.keystore.location");
    if (null != keyloc) {
      this.infoServer.addSslListener(secInfoSocAddr, keyloc,
          sslConf.get("https.keystore.password", ""),
          sslConf.get("https.keystore.keypassword", ""));
--
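
Going only by that snippet (I haven't verified this against any docs), the
referenced resource looks like a small Configuration-style file that has to be
on the daemon's classpath, e.g. something like the following; the path and
passwords are placeholders:

cat > conf/sslinfo.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>https.keystore.location</name>
    <value>/path/to/keystore.jks</value>
  </property>
  <property>
    <name>https.keystore.password</name>
    <value>changeit</value>
  </property>
  <property>
    <name>https.keystore.keypassword</name>
    <value>changeit</value>
  </property>
</configuration>
EOF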




Re: Hadoop perl

2009-01-30 Thread Keita Higashi

Hello!!

Do you know of the existence of hadoop-streaming.jar?
I recommend that you use hadoop-streaming.jar if you do not.
The usage of hadoop-streaming.jar is described at 
http://hadoop.apache.org/core/docs/r0.18.3/streaming.html.


ex:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar  \
-input httpd_logs \
-output logc_output \
-mapper /home/hadoop/work/hadoop/analog/map.pl \
-reducer /home/hadoop/work/hadoop/analog/reduce.pl \
-inputformat TextInputFormat \
-outputformat TextOutputFormat
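
If map.pl and reduce.pl are not already installed at the same path on every
node, streaming's -file option will ship them with the job; a sketch along the
same lines as above:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input httpd_logs \
-output logc_output \
-mapper map.pl \
-reducer reduce.pl \
-file /home/hadoop/work/hadoop/analog/map.pl \
-file /home/hadoop/work/hadoop/analog/reduce.pl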


Thank you.


- Original Message - 
From: Daren daren.hise...@adestra.com

To: core-user@hadoop.apache.org
Sent: Friday, January 30, 2009 7:53 PM
Subject: Hadoop  perl



Just started using hadoop and want to use perl to interface with it.

Thriftfs has some perl modules which claim to be able to work with the 
thrift server !


Unfortunately I have'nt been able to get this to work and was wondering if 
anyone out there can give me some advice as to how to get a perl interface 
to work, if indeeed it's possible ???


da...@adestra.com 




problem with completion notification from block movement

2009-01-30 Thread Karl Kleinpaste
We have a small test cluster, a double master (NameNode+JobTracker) plus
2 slaves, running 0.18.1.  We are seeing an intermittent problem where
our application logs failures out of DFSClient, thus:

2009-01-30 01:59:42,072 WARN org.apache.hadoop.dfs.DFSClient:
DFSOutputStream ResponseProcessor exception  for block
blk_7603130349014268849_2349933java.net.SocketTimeoutException: 66000
millis timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.0.10.102:54700
remote=/10.0.10.108:50010]
at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.DataInputStream.readFully(DataInputStream.java:176)
at java.io.DataInputStream.readLong(DataInputStream.java:380)
at org.apache.hadoop.dfs.DFSClient$DFSOutputStream
$ResponseProcessor.run(DFSClient.java:2044)

(Apologies for paste formatting.  I hate Evolution.)

Our application here is our JobConsole, which is responsible for
taking notifications from an external data-generating application: The
external app scribbles files into DFS and then tells JobConsole about
them.  JobConsole submits jobs to crunch that data in response to the
external app's notifications of data availability.  JobConsole runs on
the master node.

Chasing that block identifier through our JobConsole log plus the
DataNode logs on the slaves, we have an odd timeline, which is this:
01:58:32slave (.108, above): receiving blk from master (.102)
01:58:35other slave (.107): receiving blk from .108
01:58:36.107: received blk
01:58:38.107: terminate PacketResponder
01:59:42JobConsole (.102): 66s t.o. + Error Recovery (above)
01:59:42.107: invoke recoverBlock on that blk
02:01:15.108: received blk + terminate PacketResponder
03:03:24.108: deleting blk, from Linux pathname in DFS storage

What's clear from this is that .108 got the block quickly, because it
was in a position immediately to send a copy to .107, which responded
promptly enough to say that it was in possession.  But .108's DataNode
sat on the block for a full 3 minutes before announcing what appears to
have been ordinary completion and responder termination.  After the
first minute-plus of that long period, JobConsole gave up and did a
recovery operation, which appears to work.  If .108's DataNode sent a
notification when it finally logged completed reception, no doubt there
was nobody listening for it any more.

What's particularly of interest to us is that the NameNode log shows us
that the data being moved is job.jar:

2009-01-30 01:58:32,353 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
NameSystem.allocateBlock: 
/usr/local/rinera/hadoop/hadoop-runtime/0/mapred/system/job_200901291752_3021/job.jar.
 blk_7603130349014268849_2349933

Note block name and timestamp.

Does anyone else have knowledge or history with such glitches?  We've
recently begun seeing a number of problems in communication between task
management processes and DFS that previously had not been seen, and
we're trying to nail down where they're coming from, without success.
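
One knob that might be relevant, offered only as an untested suggestion: as far
as I can tell the DFS socket timeouts in 0.18 are configurable, so they can be
raised in hadoop-site.xml on the clients and datanodes while the underlying
cause is tracked down (property names below are my reading of the 0.18 code):

cat <<'EOF'
<!-- add inside <configuration> in conf/hadoop-site.xml, then restart -->
<property>
  <name>dfs.socket.timeout</name>
  <value>180000</value>   <!-- read timeout in ms; default 60000 -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>        <!-- 0 disables the datanode write timeout -->
</property>
EOF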



Re: Question about HDFS capacity and remaining

2009-01-30 Thread Bryan Duxbury
Hm, very interesting. Didn't know about that. What's the purpose of  
the reservation? Just to give root preference or leave wiggle room?  
If it's not strictly necessary it seems like it would make sense to  
reduce it to essentially 0%.


-Bryan

On Jan 29, 2009, at 6:18 PM, Doug Cutting wrote:

Ext2 by default reserves 5% of the drive for use by root only.   
That'd be about 45GB of your 907GB capacity, which would account for most  
of the discrepancy.  You can adjust this with tune2fs.


Doug

Bryan Duxbury wrote:

There are no non-dfs files on the partitions in question.
df -h indicates that there is 907GB capacity, but only 853GB  
remaining, with 200M used. The only thing I can think of is the  
filesystem overhead.

-Bryan
On Jan 29, 2009, at 4:06 PM, Hairong Kuang wrote:

It's taken by non-dfs files.

Hairong


On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:


Hey all,

I'm currently installing a new cluster, and noticed something a
little confusing. My DFS is *completely* empty - 0 files in DFS.
However, in the namenode web interface, the reported capacity is
3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go?  
There
are literally zero other files on the partitions hosting the DFS  
data

directories. Where am I losing 240GB?

-Bryan






Re: Question about HDFS capacity and remaining

2009-01-30 Thread stephen mulcahy


Bryan Duxbury wrote:
Hm, very interesting. Didn't know about that. What's the purpose of the 
reservation? Just to give root preference or leave wiggle room? If it's 
not strictly necessary it seems like it would make sense to reduce it to 
essentially 0%.


AFAIK It is needed for defragmentation / fsck to work properly and your 
filesystem performance will degrade a lot if you reduce this to 0% (but 
I'd love to hear otherwise :)


-stephen



Seeing DiskErrorException, but no real error appears to be happening

2009-01-30 Thread John Lee
Folks,

I'm seeing a lot of the following exceptions in my Hadoop logs when I
run jobs under Hadoop 0.19. I dont recall seeing this in Hadoop 0.18:

org.apache.hadoop.mapred.TaskTracker:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
taskTracker/jobcache/job_200901131804_0215/attempt_200901131804_0215_r_09_0/output/file.out
in any of the configured local directories

My understanding is that this means the reduce outputs can't be found
for some reason. However, the jobs seem to complete successfully and
the output is fine. I've double checked my configuration and can't
find any errors or problems. Is this pretty normal behavior? Is there
anything that might cause this other than misconfiguration? I'm trying
to decide if a bug needs to be filed.

Thanks,
John


Re: Question about HDFS capacity and remaining

2009-01-30 Thread Doug Cutting

Bryan Duxbury wrote:
Hm, very interesting. Didn't know about that. What's the purpose of the 
reservation? Just to give root preference or leave wiggle room?


I think it's so that, when the disk is full, root processes don't fail, 
only user processes.  So you don't lose, e.g., syslog.  With modern 
disks, 5% is too much, especially for volumes that are only used for 
user data.  You can safely set this to 1%.
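
For reference, on ext2/ext3 that is a one-line change per data partition (run
as root; /dev/sdb1 is only a placeholder):

tune2fs -l /dev/sdb1 | grep -i 'reserved block count'   # current reservation
tune2fs -m 1 /dev/sdb1                                  # reserve 1% instead of 5%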


Doug


Re: Question about HDFS capacity and remaining

2009-01-30 Thread Brian Bockelman
For what it's worth, our organization did extensive tests on many  
filesystems benchmarking their performance when they are 90 - 95% full.


Only XFS retained most of its performance when it was mostly  
full (ext4 was not tested)... so, if you are thinking of pushing  
things to the limits, that might be something worth considering.


Brian

On Jan 30, 2009, at 11:18 AM, stephen mulcahy wrote:



Bryan Duxbury wrote:
Hm, very interesting. Didn't know about that. What's the purpose of  
the reservation? Just to give root preference or leave wiggle room?  
If it's not strictly necessary it seems like it would make sense to  
reduce it to essentially 0%.


AFAIK It is needed for defragmentation / fsck to work properly and  
your filesystem performance will degrade a lot if you reduce this to  
0% (but I'd love to hear otherwise :)


-stephen




Reducers stuck in Shuffle ...

2009-01-30 Thread Miles Osborne
i've been seeing a lot of jobs where large numbers of reducers keep
failing at the shuffle phase due to timeouts (see a sample reducer
syslog entry below).  our setup consists of 8-core machines, with one
box acting as both a slave and a namenode.  the load on the namenode
is not at full capacity so that doesn't appear to be the problem.  we
also run 0.18.1

reducers which run on the namenode are fine, it is only those running
on slaves which seem affected.

note that i seem to get this if i vary the number of reducers run, so
it doesn't appear to be a function of the shard size

is there some flag i should modify to increase the timeout value?  or,
is this fixed in the latest release?

(i found one thread on this which talked about DNS entries and another
which mentioned HADOOP-3155)

thanks

Miles

2009-01-30 10:26:14,085 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2009-01-30 10:26:14,229 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed exec
[/disk2/hadoop/mapred/local/taskTracker/jobcache/job_200901301017_0001/attempt_200901301017_0001_r_11_0/work/./r-compute-ngram-counts]
2009-01-30 10:26:14,368 INFO org.apache.hadoop.mapred.ReduceTask:
ShuffleRamManager: MemoryLimit=78643200,
MaxSingleShuffleLimit=19660800
2009-01-30 10:26:14,488 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0 Thread started: Thread for
merging on-disk files
2009-01-30 10:26:14,488 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0 Thread waiting: Thread for
merging on-disk files
2009-01-30 10:26:14,488 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0 Thread started: Thread for
merging in memory files
2009-01-30 10:26:14,489 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0 Need another 3895 map output(s)
where 0 is already in progress
2009-01-30 10:26:14,495 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0: Got 6 new map-outputs  number
of known map outputs is 6
2009-01-30 10:26:14,496 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0 Scheduled 1 of 6 known outputs (0
slow hosts and 5 dup hosts)
2009-01-30 10:26:44,566 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_200901301017_0001_r_11_0 copy failed:
attempt_200901301017_0001_m_03_0 from crom.inf.ed.ac.uk
2009-01-30 10:26:44,567 WARN org.apache.hadoop.mapred.ReduceTask:
java.net.SocketTimeoutException: connect timed out
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1296)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1290)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:944)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1143)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1084)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:997)
at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:946)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.Socket.connect(Socket.java:519)
at sun.net.NetworkClient.doConnect(NetworkClient.java:152)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
at sun.net.www.http.HttpClient.New(HttpClient.java:306)
at sun.net.www.http.HttpClient.New(HttpClient.java:323)
at 
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:788)
at 
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:729)
at 
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:654)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:977)
... 4 more

2009-01-30 10:26:45,493 INFO org.apache.hadoop.mapred.ReduceTask: Task
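
(One suggestion I've seen for repeated shuffle copy failures, untested here:
raise the TaskTracker's HTTP server thread count on the slaves, since the
reduces fetch map output over that server; e.g.:)

cat <<'EOF'
<!-- conf/hadoop-site.xml on the slaves, then restart the TaskTrackers -->
<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>   <!-- default is 40 -->
</property>
EOF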

Re: Question about HDFS capacity and remaining

2009-01-30 Thread Bryan Duxbury

Did you publish those results anywhere?

On Jan 30, 2009, at 9:56 AM, Brian Bockelman wrote:

For what it's worth, our organization did extensive tests on many  
filesystems benchmarking their performance when they are 90 - 95%  
full.


Only XFS retained most of its performance when it was mostly  
full (ext4 was not tested)... so, if you are thinking of pushing  
things to the limits, that might be something worth considering.


Brian

On Jan 30, 2009, at 11:18 AM, stephen mulcahy wrote:



Bryan Duxbury wrote:
Hm, very interesting. Didn't know about that. What's the purpose  
of the reservation? Just to give root preference or leave wiggle  
room? If it's not strictly necessary it seems like it would make  
sense to reduce it to essentially 0%.


AFAIK It is needed for defragmentation / fsck to work properly and  
your filesystem performance will degrade a lot if you reduce this  
to 0% (but I'd love to hear otherwise :)


-stephen






Re: How does Hadoop choose machines for Reducers?

2009-01-30 Thread Nathan Marz
This is a huge problem for my application. I tried setting  
mapred.tasktracker.reduce.tasks.maximum to 1 in the job's JobConf, but  
that didn't have any effect. I'm using a custom output format and it's  
essential that Hadoop distribute the reduce tasks to make use of all  
the machines, as there is contention when multiple reduce tasks run on  
one machine. Since my number of reduce tasks is guaranteed to be less  
than the number of machines in the cluster, there's no reason for  
Hadoop not to make use of the full cluster.


Does anyone know of a way to force Hadoop to distribute reduce tasks  
evenly across all the machines?
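
(I suspect, but have not verified, that mapred.tasktracker.reduce.tasks.maximum
is read by each TaskTracker at startup rather than from a job's JobConf, which
would mean it has to be set in hadoop-site.xml on the slaves and the
TaskTrackers restarted; roughly:)

cat <<'EOF'
<!-- conf/hadoop-site.xml on every slave, then restart the TaskTrackers -->
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>
EOF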



On Jan 30, 2009, at 7:32 AM, jason hadoop wrote:

Hadoop just distributes to the available reduce execution slots. I don't
believe it pays attention to which machine they are on.
I believe the plan is to take data locality into account in future (i.e.
distribute tasks to machines that are considered more topologically close to
their input split first), but I don't think this is available to most users.



On Thu, Jan 29, 2009 at 7:05 PM, Nathan Marz nat...@rapleaf.com  
wrote:


I have a MapReduce application in which I configure 16 reducers to  
run on
15 machines. My mappers output exactly 16 keys, IntWritable's from  
0 to 15.
However, only 12 out of the 15 machines are used to run the 16  
reducers (4
machines have 2 reducers running on each). Is there a way to get  
Hadoop to

use all the machines for reducing?





Re: Cannot run program chmod: error=12, Not enough space

2009-01-30 Thread Allen Wittenauer
On 1/28/09 7:42 PM, Andy Liu andyliu1...@gmail.com wrote:
 I'm running Hadoop 0.19.0 on Solaris (SunOS 5.10 on x86) and many jobs are
 failing with this exception:
 
 Error initializing attempt_200901281655_0004_m_25_0:
 java.io.IOException: Cannot run program chmod: error=12, Not enough space
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:459)
...
 at java.lang.UNIXProcess.forkAndExec(Native Method)
 at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
 at java.lang.ProcessImpl.start(ProcessImpl.java:65)
 at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
 ... 20 more
 
 However, all the disks have plenty of disk space left (over 800 gigs).  Can
 somebody point me in the right direction?

"Not enough space" is usually SysV kernel speak for not enough virtual
memory to swap.  See how much memory you have free.
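
e.g., on Solaris:

swap -s      # virtual memory / swap summary
vmstat 5 3   # free memory and paging activity over a few samples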




Re: 2009 Hadoop Summit?

2009-01-30 Thread Bill Au
JavaOne is scheduled for the first week of June this year.  Please keep that
in mind since I am guessing I am not the only one who is interested in
both.

Bill

On Thu, Jan 29, 2009 at 7:45 PM, Ajay Anand aan...@yahoo-inc.com wrote:

 Yes! We are planning one for the first week of June. I will be sending
 out a note inviting talks and reaching out to presenters over the next
 couple of days. You read my mind :)
 Stay tuned.

 Ajay

 -Original Message-
 From: Bradford Stephens [mailto:bradfordsteph...@gmail.com]
 Sent: Thursday, January 29, 2009 4:34 PM
 To: core-user@hadoop.apache.org
 Subject: 2009 Hadoop Summit?

 Hey there,

 I was just wondering if there's plans for another Hadoop Summit this
 year? I went last March and learned quite a bit -- I'm excited to see
 what new things people have done since then.

 Cheers,
 Bradford



Re: Hadoop Streaming Semantics

2009-01-30 Thread S D
Thanks for your response, Amareshwari. I'm unclear on how to take advantage
of NLineInputFormat with Hadoop Streaming. Is the idea that I modify the
streaming jar file (contrib/streaming/hadoop-version-streaming.jar) to
include the NLineInputFormat class and then pass a command line
configuration param to indicate that NLineInputFormat should be used? If
this is the proper approach, can you point me to an example of what kind of
param should be specified? I appreciate your help.
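
For the record, the invocation I'm experimenting with looks roughly like this
(untested; it assumes streaming's -inputformat accepts a fully qualified class
name and that NLineInputFormat honors mapred.line.input.format.linespermap;
file-list.txt and process_file.rb are my own names):

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.19.0-streaming.jar \
-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
-jobconf mapred.line.input.format.linespermap=1 \
-input file-list.txt \
-output results \
-mapper process_file.rb \
-file process_file.rb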

Thanks,
SD

On Thu, Jan 29, 2009 at 10:49 PM, Amareshwari Sriramadasu 
amar...@yahoo-inc.com wrote:

 You can use NLineInputFormat for this, which splits one line (N=1, by
 default) as one split.
 So, each map task processes one line.
 See
 http://hadoop.apache.org/core/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html

 -Amareshwari

 S D wrote:

 Hello,

 I have a clarifying question about Hadoop streaming. I'm new to the list
 and
 didn't see anything posted that covers my questions - my apologies if I
 overlooked a relevant post.

 I have an input file consisting of a list of files (one per line) that
 need
 to be processed independently of each other. The duration for processing
 each file is significant - perhaps an hour each. I'm using Hadoop
 streaming
 without a reduce function to process each file and save the results (back
 to
 S3 native in my case). To handle the long processing time of each file I've
 set mapred.task.timeout=0 and I have a pretty straight forward Ruby script
 reading from STDIN:

 STDIN.each_line do |line|
   # Get file from contents of line
   # Process file (long running)
 end

 Currently I'm using a cluster of 3 workers in which each worker can have
 up
 to 2 tasks running simultaneously. I've noticed that if I have a single
 input file with many lines (more than 6 given my cluster), then not all
 workers will be allocated tasks; I've noticed two workers being allocated
 one task each and the other worker sitting idly. If I split my input file
 into multiple files (at least 6) then all workers will be immediately
 allocated the maximum number of tasks that they can handle.

 My interpretation of this is fuzzy. It seems that Hadoop streaming will
 take
 separate input files and allocate a new task per file (up to the maximum
 constraint) but if given a single input file it is unclear as to whether a
 new task is allocated per file or line. My understanding of Hadoop Java is
 that (unlike Hadoop streaming) when given a single input file, the file
 will
 be broken up into separate lines and the maximum number of map tasks will
 automagically be allocated to handle the lines of the file (assuming the
 use
 of TextInputFormat).

 Can someone clarify this?

 Thanks,
 SD







settin JAVA_HOME...

2009-01-30 Thread zander1013

hi,

i am new to hadoop. i am trying to set it up for the first time as a single
node cluster. at present the snag is that i cannot seem to find the correct
path for setting the JAVA_HOME variable.

i am using ubuntu 8.10. i have tried using whereis java and tried setting
the variable to point to those places (except the dir where i have hadoop).

please advise.

-zander
-- 
View this message in context: 
http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: settin JAVA_HOME...

2009-01-30 Thread Mark Kerzner
You set it in the conf/hadoop-env.sh file, with an entry like this
export JAVA_HOME=/usr/lib/jvm/default-java

Mark

On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com wrote:


 hi,

 i am new to hadoop. i am trying to set it up for the first time as a single
 node cluster. at present the snag is that i cannot seem to find the correct
 path for setting the JAVA_HOME variable.

 i am using ubuntu 8.10. i have tried using whereis java and tried setting
 the variable to point to those places (except the dir where i have hadoop).

 please advise.

 -zander
 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: settin JAVA_HOME...

2009-01-30 Thread zander1013

okay,

here is the section for conf/hadoop-env.sh...

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/default-java

...

and here is what i got for output. i am trying to go through the tutorial at
http://hadoop.apache.org/core/docs/current/quickstart.html

here is the output...

a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar grep
input output 'dfs[a-z.]+'
bin/hadoop: line 243: /usr/lib/jvm/default-java/bin/java: No such file or
directory
bin/hadoop: line 273: /usr/lib/jvm/default-java/bin/java: No such file or
directory
bin/hadoop: line 273: exec: /usr/lib/jvm/default-java/bin/java: cannot
execute: No such file or directory
a...@node0:~/Hadoop/hadoop-0.19.0$ 

...

please advise...




Mark Kerzner-2 wrote:
 
 You set it in the conf/hadoop-env.sh file, with an entry like this
 export JAVA_HOME=/usr/lib/jvm/default-java
 
 Mark
 
 On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com wrote:
 

 hi,

 i am new to hadoop. i am trying to set it up for the first time as a
 single
 node cluster. at present the snag is that i cannot seem to find the
 correct
 path for setting the JAVA_HOME variable.

 i am using ubuntu 8.10. i have tried using whereis java and tried
 setting
 the variable to point to those places (except the dir where i have
 hadoop).

 please advise.

 -zander
 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756569.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: settin JAVA_HOME...

2009-01-30 Thread Mark Kerzner
Oh, you have used my path to JDK, you need yours
do this

which java
something like /usr/bin/java will come back

then do
ls -l /usr/bin/java

it will tell you where the link is to. There may be more redirections, get
the real path to your JDK
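
A shortcut that follows the whole chain of links in one go:

readlink -f $(which java)
# if that prints something like /usr/lib/jvm/java-6-sun-1.6.0.07/jre/bin/java
# (illustrative output), JAVA_HOME is the directory that contains bin/java,
# i.e. everything up to but not including /bin/java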

On Fri, Jan 30, 2009 at 4:09 PM, zander1013 zander1...@gmail.com wrote:


 okay,

 here is the section for conf/hadoop-env.sh...

 # Set Hadoop-specific environment variables here.

 # The only required environment variable is JAVA_HOME.  All others are
 # optional.  When running a distributed configuration it is best to
 # set JAVA_HOME in this file, so that it is correctly defined on
 # remote nodes.

 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
 export JAVA_HOME=/usr/lib/jvm/default-java

 ...

 and here is what i got for output. i am trying to go through the tutorial
 at
 http://hadoop.apache.org/core/docs/current/quickstart.html

 here is the output...

 a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar grep
 input output 'dfs[a-z.]+'
 bin/hadoop: line 243: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: exec: /usr/lib/jvm/default-java/bin/java: cannot
 execute: No such file or directory
 a...@node0:~/Hadoop/hadoop-0.19.0$

 ...

 please advise...




 Mark Kerzner-2 wrote:
 
  You set it in the conf/hadoop-env.sh file, with an entry like this
  export JAVA_HOME=/usr/lib/jvm/default-java
 
  Mark
 
  On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com
 wrote:
 
 
  hi,
 
  i am new to hadoop. i am trying to set it up for the first time as a
  single
  node cluster. at present the snag is that i cannot seem to find the
  correct
  path for setting the JAVA_HOME variable.
 
  i am using ubuntu 8.10. i have tried using whereis java and tried
  setting
  the variable to point to those places (except the dir where i have
  hadoop).
 
  please advise.
 
  -zander
  --
  View this message in context:
  http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756569.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: extra documentation on how to write your own partitioner class

2009-01-30 Thread james warren
Hello Sandy -
Your partitioner isn't using any information from the key/value pair - it's
only using the value T which is read once from the job configuration.
 getPartition() will always return the same value, so all of your data is
being sent to one reducer. :P
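
On the debugging question at the end: anything printed with System.out or
System.err inside a task (the partitioner runs inside the map tasks) ends up in
that task's log files, not on your terminal. You can browse them from the
JobTracker web UI, or look on the slave that ran the task, roughly like this
(the attempt id below is made up):

ls $HADOOP_HOME/logs/userlogs/attempt_200901301740_0001_m_000000_0/
# stdout  stderr  syslog
cat $HADOOP_HOME/logs/userlogs/attempt_200901301740_0001_m_000000_0/stdout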

cheers,
-James

On Fri, Jan 30, 2009 at 1:32 PM, Sandy snickerdoodl...@gmail.com wrote:

 Hello,

 Could someone point me toward some more documentation on how to write one's
 own partitioner class? I am having quite a bit of trouble getting mine to
 work. So far, it looks something like this:

 public class myPartitioner extends MapReduceBase implements
     Partitioner<IntWritable, IntWritable> {

     private int T;

     public void configure(JobConf job) {
         super.configure(job);
         String myT = job.get("tval"); // this is user defined
         T = Integer.parseInt(myT);
     }

     public int getPartition(IntWritable key, IntWritable value,
             int numReduceTasks) {
         int newT = (T / numReduceTasks);
         int id = (value.get() / T);
         return (int) (id / newT);
     }
 }

 In the run() function of my M/R program I just set it using:

 conf.setPartitionerClass(myPartitioner.class);

 Is there anything else I need to set in the run() function?


 The code compiles fine. When I run it, I know it is using the
 partitioner,
 since I get different output than if I just let it use HashPartitioner.
 However, it is not splitting between the reducers at all! If I set the
 number of reducers to 2, all the output shows up in part-0, while
 part-1 has nothing.

 I am having trouble debugging this since I don't know how I can observe the
 values of numReduceTasks (which I assume is being set by the system). Is
 this a proper assumption?

 If I try to insert any println() statements in the function, it isn't
 outputted to either my terminal or my log files. Could someone give me some
 general advice on how best to debug pieces of code like this?



Re: settin JAVA_HOME...

2009-01-30 Thread Bill Au
You actually have to set JAVA_HOME to where Java is actually installed on
your system.  /usr/lib/jvm/default-java is just an example.  The error
messages indicate that that's not where Java is installed on your system.

Bill

On Fri, Jan 30, 2009 at 5:09 PM, zander1013 zander1...@gmail.com wrote:


 okay,

 here is the section for conf/hadoop-env.sh...

 # Set Hadoop-specific environment variables here.

 # The only required environment variable is JAVA_HOME.  All others are
 # optional.  When running a distributed configuration it is best to
 # set JAVA_HOME in this file, so that it is correctly defined on
 # remote nodes.

 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
 export JAVA_HOME=/usr/lib/jvm/default-java

 ...

 and here is what i got for output. i am trying to go through the tutorial
 at
 http://hadoop.apache.org/core/docs/current/quickstart.html

 here is the output...

 a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar grep
 input output 'dfs[a-z.]+'
 bin/hadoop: line 243: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: exec: /usr/lib/jvm/default-java/bin/java: cannot
 execute: No such file or directory
 a...@node0:~/Hadoop/hadoop-0.19.0$

 ...

 please advise...




 Mark Kerzner-2 wrote:
 
  You set it in the conf/hadoop-env.sh file, with an entry like this
  export JAVA_HOME=/usr/lib/jvm/default-java
 
  Mark
 
  On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com
 wrote:
 
 
  hi,
 
  i am new to hadoop. i am trying to set it up for the first time as a
  single
  node cluster. at present the snag is that i cannot seem to find the
  correct
  path for setting the JAVA_HOME variable.
 
  i am using ubuntu 8.10. i have tried using whereis java and tried
  setting
  the variable to point to those places (except the dir where i have
  hadoop).
 
  please advise.
 
  -zander
  --
  View this message in context:
  http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756569.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Re: settin JAVA_HOME...

2009-01-30 Thread zander1013

cool!

here is the output for those commands...

a...@node0:~/Hadoop/hadoop-0.19.0$ which java
/usr/bin/java
a...@node0:~/Hadoop/hadoop-0.19.0$ 
a...@node0:~/Hadoop/hadoop-0.19.0$ ls -l /usr/bin/java 
lrwxrwxrwx 1 root root 22 2009-01-29 18:03 /usr/bin/java ->
/etc/alternatives/java
a...@node0:~/Hadoop/hadoop-0.19.0$ 

... i will try and set JAVA_HOME=/etc/alternatives/java...

thank you for helping...

-zander


Mark Kerzner-2 wrote:
 
 Oh, you have used my path to JDK, you need yours
 do this
 
 which java
 something like /usr/bin/java will come back
 
 then do
 ls -l /usr/bin/java
 
 it will tell you where the link is to. There may be more redirections, get
 the real path to your JDK
 
 On Fri, Jan 30, 2009 at 4:09 PM, zander1013 zander1...@gmail.com wrote:
 

 okay,

 here is the section for conf/hadoop-env.sh...

 # Set Hadoop-specific environment variables here.

 # The only required environment variable is JAVA_HOME.  All others are
 # optional.  When running a distributed configuration it is best to
 # set JAVA_HOME in this file, so that it is correctly defined on
 # remote nodes.

 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
 export JAVA_HOME=/usr/lib/jvm/default-java

 ...

 and here is what i got for output. i am trying to go through the tutorial
 at
 http://hadoop.apache.org/core/docs/current/quickstart.html

 here is the output...

 a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar
 grep
 input output 'dfs[a-z.]+'
 bin/hadoop: line 243: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: exec: /usr/lib/jvm/default-java/bin/java: cannot
 execute: No such file or directory
 a...@node0:~/Hadoop/hadoop-0.19.0$

 ...

 please advise...




 Mark Kerzner-2 wrote:
 
  You set it in the conf/hadoop-env.sh file, with an entry like this
  export JAVA_HOME=/usr/lib/jvm/default-java
 
  Mark
 
  On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com
 wrote:
 
 
  hi,
 
  i am new to hadoop. i am trying to set it up for the first time as a
  single
  node cluster. at present the snag is that i cannot seem to find the
  correct
  path for setting the JAVA_HOME variable.
 
  i am using ubuntu 8.10. i have tried using whereis java and tried
  setting
  the variable to point to those places (except the dir where i have
  hadoop).
 
  please advise.
 
  -zander
  --
  View this message in context:
  http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756569.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756710.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: settin JAVA_HOME...

2009-01-30 Thread zander1013

yes i am trying to locate java's location on  my machine... i installed
sun-java6-jre using the synaptic package manager...

here is the output from the tutorial after i tried to locate java using
which java ls... etc and put that into the .sh file...

a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar grep
input output 'dfs[a-z.]+'
bin/hadoop: line 243: /etc/alternatives/java/bin/java: Not a directory
bin/hadoop: line 273: /etc/alternatives/java/bin/java: Not a directory
bin/hadoop: line 273: exec: /etc/alternatives/java/bin/java: cannot execute:
Not a directory
a...@node0:~/Hadoop/hadoop-0.19.0$ 

please advise.


Bill Au wrote:
 
 You actually have to set JAVA_HOME to where Java is actually installed on
 your system.  /usr/lib/jvm/default-java is just an example.  The error
 messages indicate that that's not where Java is installed on your system.
 
 Bill
 
 On Fri, Jan 30, 2009 at 5:09 PM, zander1013 zander1...@gmail.com wrote:
 

 okay,

 here is the section for conf/hadoop-env.sh...

 # Set Hadoop-specific environment variables here.

 # The only required environment variable is JAVA_HOME.  All others are
 # optional.  When running a distributed configuration it is best to
 # set JAVA_HOME in this file, so that it is correctly defined on
 # remote nodes.

 # The java implementation to use.  Required.
 # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
 export JAVA_HOME=/usr/lib/jvm/default-java

 ...

 and here is what i got for output. i am trying to go through the tutorial
 at
 http://hadoop.apache.org/core/docs/current/quickstart.html

 here is the output...

 a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar
 grep
 input output 'dfs[a-z.]+'
 bin/hadoop: line 243: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: /usr/lib/jvm/default-java/bin/java: No such file or
 directory
 bin/hadoop: line 273: exec: /usr/lib/jvm/default-java/bin/java: cannot
 execute: No such file or directory
 a...@node0:~/Hadoop/hadoop-0.19.0$

 ...

 please advise...




 Mark Kerzner-2 wrote:
 
  You set it in the conf/hadoop-env.sh file, with an entry like this
  export JAVA_HOME=/usr/lib/jvm/default-java
 
  Mark
 
  On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com
 wrote:
 
 
  hi,
 
  i am new to hadoop. i am trying to set it up for the first time as a
  single
  node cluster. at present the snag is that i cannot seem to find the
  correct
  path for setting the JAVA_HOME variable.
 
  i am using ubuntu 8.10. i have tried using whereis java and tried
  setting
  the variable to point to those places (except the dir where i have
  hadoop).
 
  please advise.
 
  -zander
  --
  View this message in context:
  http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756569.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756798.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



job management in Hadoop

2009-01-30 Thread Bill Au
Is there any way to cancel a job after it has been submitted?

Bill


Setting up cluster

2009-01-30 Thread Amandeep Khurana
Hi,

I am a new user and was setting up the HDFS on 3 nodes as of now. I could
get them to run individual pseudo distributed setups but am unable to get
the cluster going together. The site localhost:50070 shows me that there are
no datanodes.

I kept the same hadoop-site.xml as the pseudodistributed setup on the master
node and added the slaves to the list of slaves in the conf directory.
Thereafter, I ran the start-dfs.sh and start-mapred.sh scripts.

Am I missing something out?
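
For context, the relevant bits of my hadoop-site.xml are still the
pseudo-distributed quickstart values, i.e. roughly the following; I suspect
localhost would need to become the master's hostname on every node:

cat conf/hadoop-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>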

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


Re: job management in Hadoop

2009-01-30 Thread Arun C Murthy


On Jan 30, 2009, at 2:41 PM, Bill Au wrote:


Is there any way to cancel a job after it has been submitted?



bin/hadoop job -kill <jobid>

Arun


Re: Setting up cluster

2009-01-30 Thread Amandeep Khurana
Here's the log from the datanode:

2009-01-30 14:54:18,019 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: rndpc1/171.69.102.51:9000. Already tried 8 time(s).
2009-01-30 14:54:19,022 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: rndpc1/171.69.102.51:9000. Already tried 9 time(s).
2009-01-30 14:54:19,026 ERROR org.apache.hadoop.dfs.DataNode:
java.io.IOException: Call failed on local exception
at org.apache.hadoop.ipc.Client.call(Client.java:718)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy4.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:319)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:306)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:343)
at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:288)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:277)
at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:223)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:3031)
at
org.apache.hadoop.dfs.DataNode.instantiateDataNode(DataNode.java:2986)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2994)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3116)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
at sun.nio.ch.SocketAdaptor.connect(Unknown Source)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:300)
at
org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:789)
at org.apache.hadoop.ipc.Client.call(Client.java:704)
... 12 more

What do I need to do for this?
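
For what it's worth, a quick way I know of to check whether the slave can
actually reach the namenode port (rndpc1:9000 here):

telnet rndpc1 9000          # from the slave: should connect if the namenode is up and reachable
# and on the master:
netstat -an | grep 9000     # check the namenode is listening on a non-loopback address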

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Fri, Jan 30, 2009 at 2:49 PM, Amandeep Khurana ama...@gmail.com wrote:

 Hi,

 I am a new user and was setting up the HDFS on 3 nodes as of now. I could
 get them to run individual pseudo distributed setups but am unable to get
 the cluster going together. The site localhost:50070 shows me that there are
 no datanodes.

 I kept the same hadoop-site.xml as the pseudodistributed setup on the
 master node and added the slaves to the list of slaves in the conf
 directory. Thereafter, I ran the start-dfs.sh and start-mapred.sh scripts.

 Am I missing something out?

 Amandeep


 Amandeep Khurana
 Computer Science Graduate Student
 University of California, Santa Cruz



Re: job management in Hadoop

2009-01-30 Thread Bill Au
Thanks.

Does anyone know if there is a plan to add this functionality to the web UI, the
way job priority can be changed from both the command line and the web UI?
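
(For reference, the command-line side I had in mind is, if I remember the
syntax right:)

bin/hadoop job -set-priority <job-id> HIGH
# accepted priorities: VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW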

Bill

On Fri, Jan 30, 2009 at 5:54 PM, Arun C Murthy a...@yahoo-inc.com wrote:


 On Jan 30, 2009, at 2:41 PM, Bill Au wrote:

  Is there any way to cancel a job after it has been submitted?


 bin/hadoop job -kill <jobid>

 Arun



Re: job management in Hadoop

2009-01-30 Thread Bhupesh Bansal
Bill, 

Currently you can kill the job from the UI.
You have to enable the config in hadoop-default.xml

  <name>webinterface.private.actions</name> to be true
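
i.e. roughly this stanza (a sketch; overriding it from hadoop-site.xml should
work the same way):

cat <<'EOF'
<!-- enables kill/priority actions in the JobTracker web UI -->
<property>
  <name>webinterface.private.actions</name>
  <value>true</value>
</property>
EOF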

Best
Bhupesh


On 1/30/09 3:23 PM, Bill Au bill.w...@gmail.com wrote:

 Thanks.
 
 Anyone knows if there is plan to add this functionality to the web UI like
 job priority can be changed from both the command line and the web UI?
 
 Bill
 
 On Fri, Jan 30, 2009 at 5:54 PM, Arun C Murthy a...@yahoo-inc.com wrote:
 
 
 On Jan 30, 2009, at 2:41 PM, Bill Au wrote:
 
  Is there any way to cancel a job after it has been submitted?
 
 
 bin/hadoop job -kill jobid
 
 Arun
 



Re: extra documentation on how to write your own partitioner class

2009-01-30 Thread Sandy
Hi James,

Thank you very much! :-)

-SM

On Fri, Jan 30, 2009 at 4:17 PM, james warren ja...@rockyou.com wrote:

 Hello Sandy -
 Your partitioner isn't using any information from the key/value pair - it's
 only using the value T which is read once from the job configuration.
  getPartition() will always return the same value, so all of your data is
 being sent to one reducer. :P

 cheers,
 -James

 On Fri, Jan 30, 2009 at 1:32 PM, Sandy snickerdoodl...@gmail.com wrote:

  Hello,
 
  Could someone point me toward some more documentation on how to write
 one's
  own partition class? I have having quite a bit of trouble getting mine to
  work. So far, it looks something like this:
 
  public class myPartitioner extends MapReduceBase implements
      Partitioner<IntWritable, IntWritable> {

      private int T;

      public void configure(JobConf job) {
          super.configure(job);
          String myT = job.get("tval"); // this is user defined
          T = Integer.parseInt(myT);
      }

      public int getPartition(IntWritable key, IntWritable value,
              int numReduceTasks) {
          int newT = (T / numReduceTasks);
          int id = (value.get() / T);
          return (int) (id / newT);
      }
  }
 
  In the run() function of my M/R program I just set it using:
 
  conf.setPartitionerClass(myPartitioner.class);
 
  Is there anything else I need to set in the run() function?
 
 
  The code compiles fine. When I run it, I know it is using the
  partitioner,
  since I get different output than if I just let it use HashPartitioner.
  However, it is not splitting between the reducers at all! If I set the
  number of reducers to 2, all the output shows up in part-0, while
  part-1 has nothing.
 
  I am having trouble debugging this since I don't know how I can observe
 the
  values of numReduceTasks (which I assume is being set by the system). Is
  this a proper assumption?
 
  If I try to insert any println() statements in the function, it isn't
  outputted to either my terminal or my log files. Could someone give me
 some
  general advice on how best to debug pieces of code like this?
 



Re: settin JAVA_HOME...

2009-01-30 Thread Sandy
Hi Zander,

Do not use default-jdk. Horrific things happen. You must use Sun Java in order to
use hadoop.

There are packages for sun java on the ubuntu repository. You can download
these directly using apt-get. This will install java 6 on your system.
Your JAVA_HOME line in hadoop-env.sh should look like:
export JAVA_HOME=/usr/lib/jvm/java-6-sun

Also, on the wiki, there is a guide for installing hadoop on ubuntu systems.
I think you may find this helpful.
http://wiki.apache.org/hadoop/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

All the best!
-SM


On Fri, Jan 30, 2009 at 4:33 PM, zander1013 zander1...@gmail.com wrote:


 i am installing default-jdk now. perhaps that was the problem. is this
 the
 right jdk?



 zander1013 wrote:
 
  cool!
 
  here is the output for those commands...
 
  a...@node0:~/Hadoop/hadoop-0.19.0$ which java
  /usr/bin/java
  a...@node0:~/Hadoop/hadoop-0.19.0$
  a...@node0:~/Hadoop/hadoop-0.19.0$ ls -l /usr/bin/java
  lrwxrwxrwx 1 root root 22 2009-01-29 18:03 /usr/bin/java ->
  /etc/alternatives/java
  a...@node0:~/Hadoop/hadoop-0.19.0$
 
  ... i will try and set JAVA_HOME=/etc/alternatives/java...
 
  thank you for helping...
 
  -zander
 
 
  Mark Kerzner-2 wrote:
 
  Oh, you have used my path to JDK, you need yours
  do this
 
  which java
  something like /usr/bin/java will come back
 
  then do
  ls -l /usr/bin/java
 
  it will tell you where the link is to. There may be more redirections,
  get
  the real path to your JDK
 
  On Fri, Jan 30, 2009 at 4:09 PM, zander1013 zander1...@gmail.com
 wrote:
 
 
  okay,
 
  here is the section for conf/hadoop-env.sh...
 
  # Set Hadoop-specific environment variables here.
 
  # The only required environment variable is JAVA_HOME.  All others are
  # optional.  When running a distributed configuration it is best to
  # set JAVA_HOME in this file, so that it is correctly defined on
  # remote nodes.
 
  # The java implementation to use.  Required.
  # export JAVA_HOME=/usr/lib/j2sdk1.5-sun
  export JAVA_HOME=/usr/lib/jvm/default-java
 
  ...
 
  and here is what i got for output. i am trying to go through the
  tutorial
  at
  http://hadoop.apache.org/core/docs/current/quickstart.html
 
  here is the output...
 
  a...@node0:~/Hadoop/hadoop-0.19.0$ bin/hadoop jar hadoop-*-examples.jar
  grep
  input output 'dfs[a-z.]+'
  bin/hadoop: line 243: /usr/lib/jvm/default-java/bin/java: No such file
  or
  directory
  bin/hadoop: line 273: /usr/lib/jvm/default-java/bin/java: No such file
  or
  directory
  bin/hadoop: line 273: exec: /usr/lib/jvm/default-java/bin/java: cannot
  execute: No such file or directory
  a...@node0:~/Hadoop/hadoop-0.19.0$
 
  ...
 
  please advise...
 
 
 
 
  Mark Kerzner-2 wrote:
  
   You set it in the conf/hadoop-env.sh file, with an entry like this
   export JAVA_HOME=/usr/lib/jvm/default-java
  
   Mark
  
   On Fri, Jan 30, 2009 at 3:49 PM, zander1013 zander1...@gmail.com
  wrote:
  
  
   hi,
  
   i am new to hadoop. i am trying to set it up for the first time as a
   single
   node cluster. at present the snag is that i cannot seem to find the
   correct
   path for setting the JAVA_HOME variable.
  
   i am using ubuntu 8.10. i have tried using whereis java and tried
   setting
   the variable to point to those places (except the dir where i have
   hadoop).
  
   please advise.
  
   -zander
   --
   View this message in context:
   http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756240.html
   Sent from the Hadoop core-user mailing list archive at Nabble.com.
  
  
  
  
 
  --
  View this message in context:
  http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756569.html
  Sent from the Hadoop core-user mailing list archive at Nabble.com.
 
 
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/settin-JAVA_HOME...-tp21756240p21756916.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




How to add nodes to existing cluster?

2009-01-30 Thread Amandeep Khurana
I am trying to add nodes to an existing working cluster. Do I need to bring
the entire cluster down, or would just shutting down and restarting the namenode
after adding the new machines to the slaves file be enough?

Amandeep


Re: How to add nodes to existing cluster?

2009-01-30 Thread Amandeep Khurana
Thanks Lohit


On Fri, Jan 30, 2009 at 7:13 PM, lohit lohit.vijayar...@yahoo.com wrote:

 Just starting DataNode and TaskTracker would add it to cluster.
 http://wiki.apache.org/hadoop/FAQ#25
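
Concretely, on the new node that would be roughly (assuming it has the same
Hadoop build and conf as the rest of the cluster):

bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker
# also add the host to conf/slaves on the master so the start/stop scripts include it later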

 Lohit



 - Original Message 
 From: Amandeep Khurana ama...@gmail.com
 To: core-user@hadoop.apache.org
 Sent: Friday, January 30, 2009 6:55:00 PM
 Subject: How to add nodes to existing cluster?

 I am trying to add nodes to an existing working cluster. Do I need to bring
 the entire cluster down or just shutting down and restarting the namenode
 after adding the new machine list to the slaves would work?

 Amandeep