John Wang wrote:
And in the data node log, I see the line:
2006-11-24 10:05:07,261 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
So it looks like the data node is alive.
Also, by clicking on the Browse the filesystem link in the admin UI, I
am taken to the address:
http://19
Also, when 'Version Upgrade' is checked in, datanode and namenode will
accept common arguments (like the -conf and -D options). We still need to
enable hadoop-daemon.sh to pass arguments through to the bin/hadoop
command, but that would be a simple change.
Fix to accept common arguments is independent of versi
Simon Willnauer wrote:
Thanks Timothy for your short answer, I guess I have to be a bit more
specific!
Actually I'm interested in the distributed FS rather than in the
Map/Reduce features. Has HDFS changed very much since it was moved out
of the Nutch project? As far as I can tell the ve
Can you manually try to read one such file with 'hadoop fs -cat'? If it
is not a transient software error, you should see the checksum error
again. If you see the error, it does not confirm a hardware error, but if
you are able to read correctly, then it is most likely a Hadoop bug.
Raghu.
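For reference, a minimal sketch of the same check through the Java API, in
case 'hadoop fs -cat' is inconvenient (the class name is made up; any read
that touches every byte goes through the client-side checksum verification):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CatCheck {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);  // uses fs.default.name from the config
      FSDataInputStream in = fs.open(new Path(args[0]));
      byte[] buf = new byte[64 * 1024];
      try {
        // Reading every byte forces checksum verification on the client;
        // a checksum failure here reproduces the reported error.
        while (in.read(buf) != -1) { }
      } finally {
        in.close();
      }
    }
  }

Reading the whole file this way is equivalent, for this test, to
'hadoop fs -cat <file> > /dev/null'.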
Dennis Kubes wrote:
It turns out that ECC memory did the trick. We replaced all memory on
our 50 node cluster with ECC memory and it has just completed a 50
Million page crawl and merge with 0 errors. Before we would have 10-20
errors or more on this job.
I still find it interesting that th
This is the result of HADOOP-1242. I would prefer it if it did not
require the presence of this image directory.
For now you could manually create image/fsimage file in name/ directory.
If you write random 4 bytes to fsimage, you have 50% chance of success.
Basically readInt() from the file should be les
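A sketch of that manual workaround (heavily hedged: the exact value the
namenode accepts is cut off above, and the "50% chance" for random bytes
suggests a sign-style check, so the negative constant below is only a guess):

  import java.io.DataOutputStream;
  import java.io.File;
  import java.io.FileOutputStream;
  import java.io.IOException;

  public class MakeFsImage {
    public static void main(String[] args) throws IOException {
      // Assumed layout from the note above: <name dir>/image/fsimage
      File imageDir = new File(args[0], "image");
      imageDir.mkdirs();
      DataOutputStream out = new DataOutputStream(
          new FileOutputStream(new File(imageDir, "fsimage")));
      out.writeInt(-1);  // hypothetical value; see the caveat above
      out.close();
    }
  }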
Bwolen,
First of all, Hadoop is not optimized for small clusters or small bursts
of writes/reads. There are some costs (like first storing a copy locally
and then copying it out) that don't have benefits for small clusters.
You could try using different disks (not just partitions) for tmp
dire
Your interest is good. I think you should ask an even smaller number of
questions in one mail and do more experimentation.
Bwolen Yang wrote:
Here is a summary of my remaining questions from the [write and sort
performance] thread.
- Looks like for every 5GB of data I put into Hadoop DFS, it us
> - 1 replica / 1 slave case writes at 15MB/sec. This seems to point
> the performance problem to how datanode writes data (even to itself).
On Hadoop, most of the delay you are seeing for the 1-replica test with
one node is because of this: it first writes 64MB to a local tmp file,
then it sends
Doug Cutting wrote:
Owen wrote:
One side note is that all of the servers have a servlet such that if
you do http://<host>:<port>/stacks you'll get a stack trace of all
the threads in the server. I find that useful for remote debugging.
*smile* Although if it is a task jvm that has the problem, then the
Ankur Sethi wrote:
Then what? Can one bring up the new machine and start a namenode server and
have it repopulate on its own? Please explain?
If you bring up the new Namenode with the same hostname and IP, then you
don't need to restart the Datanodes. If the hostname changes, then you
need to
You can specify multiple directories for Namenode data, in which case
the image is written to all of the directories. You can also use an NFS
mount, RAID, or a similar approach.
Raghu.
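In hadoop-site.xml, that looks roughly like the following (the directory
paths are made up; dfs.name.dir takes a comma-separated list, and each
entry gets a full copy of the image):

  <property>
    <name>dfs.name.dir</name>
    <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
  </property>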
Ankur Sethi wrote:
Thank you for the information.
I want to consider a worst-case scenario where the namenode fails. So y
Ankur Sethi wrote:
How are datanodes added? Do they get added and started only at the start
of the DFS filesystem? Can they be added while Hadoop DFS is running by
editing the slaves file, or does Hadoop have to be restarted?
To add more datanodes, you can just bring up new datanodes with the
right confi
le in logs/ directory as well.
The documentation says to start DFS from the namenode which will startup all
the datanodes.
This is for the simple, common case.
Raghu.
Thanks,
Ankur
-----Original Message-----
From: Raghu Angadi [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 17, 2007 1:33 PM
The stack trace is on the client and not on the datanode. If it is on
Linux, you can check /proc/<pid>/fd to see which fds are still open.
Usually 1024 should be a lot for the client (and even for a datanode).
Raghu.
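If the client is a JVM you control, you can also watch its fd count from
inside the process on Linux (a small sketch; it just counts the entries
under /proc/self/fd):

  import java.io.File;

  public class FdCount {
    public static void main(String[] args) {
      // Each entry under /proc/self/fd is one open file descriptor
      // of this JVM (sockets included).
      String[] fds = new File("/proc/self/fd").list();
      System.out.println("open fds: " + (fds == null ? "unknown" : fds.length));
    }
  }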
Andrzej Bialecki wrote:
Shailendra Mudgal wrote:
Hi,
We have upgraded our code to n
Can you get the stack trace of the threads that are left? It was not
obvious from the code where a thread is started. It might be 'trash
handler'.
You could add a sleep(10 sec) to give yourself enough time to get the trace.
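(For reference, a sketch of dumping the live threads from inside the
program, as an alternative to attaching a debugger during that sleep;
Thread.getAllStackTraces needs Java 5 or later:)

  import java.util.Map;

  public class DumpThreads {
    public static void dump() {
      for (Map.Entry<Thread, StackTraceElement[]> e
          : Thread.getAllStackTraces().entrySet()) {
        Thread t = e.getKey();
        // Non-daemon threads are the ones that keep the JVM from exiting.
        System.err.println(t.getName() + " (daemon=" + t.isDaemon() + ")");
        for (StackTraceElement frame : e.getValue()) {
          System.err.println("    at " + frame);
        }
      }
    }
    public static void main(String[] args) {
      dump();
    }
  }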
FsShell might not be designed for this use, but seems like a pretty
useful
for these uses?
By 'these' I mean: uploading or downloading files and creating
directories programmatically, even with concurrent operations.
Raghu Angadi wrote:
Can you get the stack trace of the threads that are left? It was not
obvious from the code where a thread is started. It might be 'tra
I have not watched large Hadoop clusters closely, but from my experience
with other large clusters that have heavy disk loads (seek dominated), the
behavior you see seems consistent. Some disks do become very slow, and if
they are in some RAID, the whole RAID runs at the speed of the slowest disk.
iost
cluster and would make speculative execution of tasks in map/reduce more
effective.
Raghu.
Raghu Angadi wrote:
I have not watched large Hadoop clusters closely, but from my experience
with other large clusters that have heavy disk loads (seek dominated), the
behavior you see seems consistent. Some
To write data to HDFS, you should be able to connect both to the Namenode
and to all of the datanodes. The Namenode decides where to place the data
blocks and tells the client (you) to send the blocks to the corresponding
datanodes.
Raghu.
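A minimal write through the client API, for context (the path is
hypothetical): the create() call goes to the Namenode, while the bytes
written to the returned stream flow to the datanodes the Namenode picked,
which is why the client needs connectivity to both.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class WriteExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);            // talks to the Namenode
      FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"));
      out.writeBytes("hello, hdfs\n");                 // bytes go to datanodes
      out.close();
    }
  }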
Phantom wrote:
Hi
When I write data into HDFS do I always nee
g the file.
Raghu.
Thanks
A
On 8/7/07, Raghu Angadi <[EMAIL PROTECTED]> wrote:
To write data to HDFS, you should be able to connect to both Namenode
and to all of the datanodes. NameNode decides where to place the data
blocks and tells the client (you) to send the blocks to corresponding
d
Joydeep Sen Sarma wrote:
The only thing is that I have often done 'dfs -put' of large files (from
an NFS mount) from this node. Would this cause local storage to be
allocated by HDFS?
Yes. The local node stores one copy as long as it has space, if it is
also part of the cluster.
Raghu.
Can you check the Namenode log for the filename? There should be at least
one message regarding allocating a block to this file. If grepping for the
exact filename does not give any results, then try to look for something
close to it.
Raghu.
Michael Stack wrote:
Anyone have any pointers debugging why an odd
file.
...
St.Ack
Raghu Angadi wrote:
Can you check the Namenode log for the filename? There should be at
least one message regarding allocating a block to this file. If grepping
for the exact filename does not give any results, then try to look for
something close to it.
Raghu.
Michael Stack wrote
Michael Stack wrote:
I'll make an issue. I'll run some tests first. I think I have a simple
recipe for provoking this failure mode.
Great! Can you try with the logging level set to debug?
Thanks.
Raghu.
Thorsten Schuett wrote:
On Wednesday 22 August 2007, Doug Cutting wrote:
Thorsten Schuett wrote:
In my case, it looks as if the loopback device is the bottleneck. So
increasing the number of tasks won't help.
Hmm. I have trouble believing that the loopback device is actually the
bottleneck.
Regarding the second problem:
It is surprising that this fails repeatedly around the same place. 0.14
does check the checksum at the datanode (0.13 did not do this check). I
will try to reproduce this.
Raghu.
C G wrote:
Hi All:
Second issue is a failure on copyFromLocal with lost connections.
On a related note, please don't use 0.13.0; use the latest released
version of 0.13 (I think it is 0.13.1). If the secondary namenode
actually works, then it will result in all the replications being set to 1.
Raghu.
Joydeep Sen Sarma wrote:
Hi folks,
Would be grateful if someone can help u
when moving to next major version.
Raghu.
Joydeep
-----Original Message-----
From: Raghu Angadi [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 23, 2007 9:44 PM
To: hadoop-user@lucene.apache.org
Subject: Re: secondary namenode errors
On a related note, please don't use 0.13.0, use
66)
Any thoughts or help appreciated... I'm planning to build out a large grid
running terabytes of data... assuming I can get Hadoop to handle more than
500M :-(.
Thanks!
Raghu Angadi <[EMAIL PROTECTED]> wrote:
Regarding the second problem:
It is surprising that this fa
I would say the biggest difference between a C Hadoop and the Java Hadoop
would be memory usage on the Namenode (and memory-allocation-related CPU
benefits). The rest of the nodes on the cluster would perform about the
same. C is more suitable for low-level memory optimizations (both in
overall size and numb
readFully(arr) implies that you are expecting to read nothing less than
arr.length bytes. If the file does not have that many bytes, you will
get an EOFException.
http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInputStream.html#readFully(byte[])
Raghu.
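A small sketch of the difference (the file name is made up): readFully()
throws EOFException when the stream ends early, where a plain read() would
just return however many bytes it got:

  import java.io.DataInputStream;
  import java.io.EOFException;
  import java.io.FileInputStream;
  import java.io.IOException;

  public class ReadFullyDemo {
    public static void main(String[] args) throws IOException {
      byte[] arr = new byte[1024];
      DataInputStream in = new DataInputStream(new FileInputStream("some.dat"));
      try {
        // Throws EOFException if the file holds fewer than arr.length bytes.
        in.readFully(arr);
        System.out.println("read all " + arr.length + " bytes");
      } catch (EOFException e) {
        System.err.println("file is shorter than " + arr.length + " bytes");
      } finally {
        in.close();
      }
    }
  }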
Frank LIN wrote:
Hi all,
For some
The Namenode does not initiate any connections.
The IP address for a Datanode that the Namenode gives to a client is the
IP address that the Datanode used to connect to the Namenode (i.e. the
Namenode just does getRemoteAddress() on the connection from the
Datanode). There is no option to change this.
If you just
Hi,
The Datanode should be able to connect to the Namenode for the upgrade to
make any progress. Do you see any other errors reported in the datanode
log? You need to fix the connection problem first.
Are you comfortable taking a tcpdump for the Namenode port on the client?
I think the client should be trying to reconnect
Open Study wrote:
Hi, all
I noticed there's a wrong timestamp in the status report from './hadoop
dfsadmin -upgradeProgress details', although the time setting on the
server is right. Will this matter?
No. It just means the stats were not updated yet (yeah, it probably
should say "never" inste
Andrew Cantino wrote:
I know that HDFS can handle huge files, but how does it do with very
large numbers of medium sized files? I'm interested in using it to
store very large numbers of files (tens to hundreds of millions).
Will this be a problem?
Pretty much. On a 64-bit JVM, with the current
Taeho Kang wrote:
Hello all.
Due to limited space in current datacenter, I am trying to move my Hadoop
cluster to a new datacenter.
In the new datacenter, each machine will keep its hostname, but each will be
assigned a new IP address.
We should be able to edit our DNS to assign existing host
Taeho Kang wrote:
Thanks for your quick reply, Raghu.
The problem I am faced with is...
- I need to move my machines to a new location
(assuming this goes well, i.e. no data loss),
- The new location will assign new IP addresses to my machines.
I am worried that this change of IP addresses m
(I am not sure if I replied already...)
Taeho Kang wrote:
Thanks for your quick reply, Raghu.
The problem I am faced with is...
- I need to move my machines to a new location
Assuming this goes well (i.e. no data loss),
- The new location will assign new IP addresses to my machines.
I am
Not a complete list by far, but just a start:
For HDFS:
- Make sure you run Java 6 (jdk1.6).
- Set the namenode handler count to 40 or more (dfs.namenode.handler.count,
and maybe mapred.job.tracker.handler.count etc); see the snippet after
this list.
- More config guides are in the works:
https://issues.apache.org/jira/bro
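For reference, the handler-count setting goes into hadoop-site.xml roughly
like this (40 is just the starting point suggested above; tune it to your
cluster):

  <property>
    <name>dfs.namenode.handler.count</name>
    <value>40</value>
  </property>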
Does commodity hardware come with ECC memory? Since Hadoop apps tend to
move large amounts of data around, ECC memory seems pretty important.
With just two machines, you might be limited since you need to run
multiple components (Namenode, job tracker, etc) on one machine.
Two machines seems
Doug Cutting wrote:
Derek Gottfrid wrote:
Are there configuration suggestions for 1k nodes ?
http://wiki.apache.org/lucene-hadoop/FAQ#3
Updated this entry with Java 1.6 recommendation.
Raghu.
ly try different buffer sizes etc.
Thanx,
Taj
Raghu Angadi wrote:
How slow is it? Maybe the code that reads is relevant too.
Raghu.
j2eeiscool wrote:
Hi,
I am new to hadoop. We are evaluating HDFS for use as a reliable,
distributed file system.
From the tests (1 name + 1 data, both on different RHEL
How slow is it? Maybe the code that reads is relevant too.
Raghu.
j2eeiscool wrote:
Hi,
I am new to hadoop. We are evaluating HDFS for use as a reliable,
distributed file system.
From the tests (1 name + 1 data, both on different RHEL 4 m/cs, client
running on the name node m/c) I have run so far
another m/c and read was about 4 times faster.
I have the tcpdump from the original client m/c.
This is probably asking too much, but is there anything in particular I
should be looking for in the tcpdump?
It (the tcpdump) is about 16 megs in size.
Thanx,
Taj
Raghu Angadi wrote:
That's too long.. buffer size
Something to compare with: how long should reading this file
(68 megs) take on a good set-up
(client and datanode on the same network, one hop)?
Thanx for your help,
Taj
Raghu Angadi wrote:
Taj,
Even 4 times faster (400 sec for 68MB) is not very fast. First try to
scp a similar sized file between
To simplify: the read rate should be faster than the write rate.
Raghu.
Raghu Angadi wrote:
Normally, Hadoop read saturates either disk b/w or network b/w on
moderate hardware. So if you have one modern IDE disk and 100Mbps
ethernet, you should expect around 10MB/s read rate for a simple read
Taj,
I don't know what you are trying to do, but simultaneous write and read
won't work on any filesystem (unless the reader is more complicated than
what you had).
For now, I think you will get the most predictable behaviour if you read
after the writer has closed the file.
Raghu.
j2eeiscool wrote:
Do you know if this is a known issue with JVM and a bug is filed? If
this is not expected to be fixed anytime soon, we might be able to work
around.
If you can, please try some other variation of SecureRandom that works on
your platform. Otherwise we might replace it with something else. We
used Se
Khalil Honsali wrote:
Greetings;
I followed the excellent tutorials on the wiki, everything worked fine for
the single node version,
but for the multi-node setup (four nodes, including the master), I had to
use IP addresses instead of fully qualified domain names in
hadoop-site.xml (see appendix).
Try 'Path outFile = new Path("/ryan/test");'
Also check if there is any useful message in the Namenode log.
Raghu.
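Expanded into a minimal runnable form, the suggestion is (class name and
file contents hypothetical, since the original code is cut off below; the
point is the absolute path):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class PathTest {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path outFile = new Path("/ryan/test");   // absolute, as suggested above
      FSDataOutputStream out = fs.create(outFile);
      out.writeBytes("test\n");
      out.close();
    }
  }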
Ryan Wang wrote:
Hope this version can attract others' attention.
Hadoop Version: 0.15.0
JDK version: Sun JDK 6.0.3
Platform: Ubuntu 7.10
IDE: Eclipse 3.2
Code :
public class Hado
I would think after an hour or so things are OK.. but that might not
have helped the job.
Raghu.
Jason Venner wrote:
We have a small cluster of 9 machines on a shared Gig Switch (with a lot
of other machines)
The other day, running a job, the reduce stalled, when the map was
99.99x% done
reduce > copy (623 of 789 at 0.12 MB/s) >
reduce > copy (621 of 789 at 0.12 MB/s) >
Raghu Angadi wrote:
I would think after an hour or so things are OK.. but that might not
have helped the job.
Raghu.
Jason Venner wrote:
We have a small cluster of 9 machines on a shared Gig Switch (w
Joydeep Sen Sarma wrote:
Agreed - I think anyone who is thinking of using Hadoop as a place from
which data is served has to be disturbed by the lack of data protection.
Replication in Hadoop provides protection against hardware failures, not
software failures. Backups (and depending on how t
This is of course not expected. A more detailed info or log message
would help. Do you know if there is at least one good block? Sometimes,
the remaining "good" block might actually be corrupted and thus can not
replicate itself. Restarting might just have brought up the datanodes
that were dow
Target replication factor: 3
Real replication factor: 2.9993873
The filesystem under path '/' is CORRUPT
-Chris
On Jan 4, 2008, at 1:02 PM, Raghu Angadi wrote:
This is of course not expected. A more detailed info or log message
would help. Do you know if there is at least
Hmm... one possibility is that the rest of the nodes were down. But the
namenode showed other nodes were up. If more than one datanode was up,
this indicates some bug.
One last thing : grep for this block id at 10.100.11.31. You might see
some useful error message when the block was written.
Datanode log looks fine. There was an error while writing to mirrors
when the data was first written, which can happen sometimes. It is still
not clear why namenode did not try to replicate these blocks until the
next restart.
How big is the cluster?
Raghu.
Chris Kline wrote:
Ah, yes, very