Re: configuration question

2006-11-27 Thread Raghu Angadi
John Wang wrote: And in the data node log, I see the line: 2006-11-24 10:05:07,261 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010 So it looks like the data node is alive. Also, by clicking on the Browse the filesystem link in the admin ui, I am taken to the address: http://19

Re: overriding config options on command line

2007-01-12 Thread Raghu Angadi
Also, when 'Version Upgrade' is checked in, datanode and namenode will accept common arguments (like the -conf and -D options). We still need to enable hadoop-daemon.sh to pass arguments to the bin/hadoop command, but that would be a simple change. The fix to accept common arguments is independent of versi

Re: Hadoop suitable for production

2007-03-14 Thread Raghu Angadi
Simon Willnauer wrote: Thanks Timothy for your short answer, I guess I have to be a bit more specific! Actually I'm interested in the distributed FS rather than in the Map/Reduce features. Did the HDFS change very much since it has been moved out of the nutch project? As far as I can tell the ve

Re: Many Checksum Errors

2007-05-01 Thread Raghu Angadi
Can you manually try to read one such file with 'hadoop fs -cat'? If it is not a transient software error, you should see the checksum error again. If you see the error, it does not confirm a hardware error, but if you are able to read correctly, then it is mostly a Hadoop bug. Raghu. Dennis K

Re: Many Checksum Errors

2007-05-16 Thread Raghu Angadi
Dennis Kubes wrote: It turns out that ECC memory did the trick. We replaced all memory on our 50 node cluster with ECC memory and it has just completed a 50 Million page crawl and merge with 0 errors. Before we would have 10-20 errors or more on this job. I still find it interesting that th

Re: Upgrade of DFS - Urgent

2007-05-31 Thread Raghu Angadi
This is the result of HADOOP-1242. I would prefer if it did not require the presence of this image directory. For now you could manually create the image/fsimage file in the name/ directory. If you write 4 random bytes to fsimage, you have a 50% chance of success. Basically, readInt() from the file should be les

Re: write and sort performance

2007-06-08 Thread Raghu Angadi
Bwolen, First of all, Hadoop is not optimized for small clusters or small bursts of writes/reads. There are some costs (like storing a copy locally and copying it locally) that don't have benefits for small clusters. You could try using different disks (not just partitions) for tmp dire

Re: performace questions

2007-06-08 Thread Raghu Angadi
Your interest is good. I think you should ask an even smaller number of questions in one mail and try to do more experimentation. Bwolen Yang wrote: Here is a summary of my remaining questions from the [write and sort performance] thread. - Looks like for every 5GB of data I put into Hadoop DFS, it us

Re: performace questions

2007-06-09 Thread Raghu Angadi
> - 1 replica / 1 slave case writes at 15MB/sec. This seems to point > the performance problem to how the datanode writes data (even to itself). In Hadoop, most of the delay you are seeing for the 1 replica test with one node is because of this: It first writes 64MB to a local tmp file, then it sends

Re: map task in initializing phase for too long

2007-06-21 Thread Raghu Angadi
Doug Cutting wrote: Owen wrote: One side note is that all of the servers have a servlet such that if you do http://<host>:<port>/stacks you'll get a stack trace of all the threads in the server. I find that useful for remote debugging. *smile* Although if it is a task jvm that has the problem, then the

Re: New user question

2007-07-14 Thread Raghu Angadi
Ankur Sethi wrote: Then what? Can one bring up the new machine and start a namenode server and have it repopulate on its own? Please explain? If you bring up the new Namenode with the same hostname and IP, then you don't need to restart the Datanodes. If the hostname changes, then you need to

Re: New user question

2007-07-14 Thread Raghu Angadi
You can specify multiple directories for Namenode data, in which case the image is written to all of the directories. You can also use an NFS mount, RAID, or a similar approach. Raghu. Ankur Sethi wrote: Thank you for the information. I want to take a worst case scenario if the namenode fails. So y
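A minimal sketch of how the redundant name directories mentioned above might be set programmatically; the dfs.name.dir property matches the releases discussed here, the two paths (a local disk and an NFS mount) are hypothetical examples, and in practice the setting would normally live in hadoop-site.xml:

import org.apache.hadoop.conf.Configuration;

public class NameDirConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Keep the namenode image in two places: a local disk and an NFS mount
        // (both paths here are hypothetical examples).
        conf.set("dfs.name.dir", "/data/hadoop/name,/mnt/nfs/hadoop/name");
        System.out.println("dfs.name.dir = " + conf.get("dfs.name.dir"));
    }
}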

Re: adding datanodes on the fly?

2007-07-17 Thread Raghu Angadi
Ankur Sethi wrote: How are datanodes added? Do they get added and started only at the start of the DFS filesystem? Can they be added while hadoop fs is running by editing the slaves file, or does hadoop have to be restarted? To add more data nodes, you can just bring up new datanodes with the right confi

Re: adding datanodes on the fly?

2007-07-17 Thread Raghu Angadi
le in logs/ directory as well. The documentation says to start DFS from the namenode which will startup all the datanodes. This is for the simple, common case. Raghu. Thanks, Ankur -Original Message----- From: Raghu Angadi [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 17, 2007 1:33 PM

Re: "Too many open files" error after running a number of jobs

2007-07-17 Thread Raghu Angadi
The stacktrace is on the client and not on datanode. If it is on linux, you can check /proc/pid/fd to see which fds are still open. Usually 1024 should be a lot for the client (and even on datanode). Raghu. Andrzej Bialecki wrote: Shailendra Mudgal wrote: Hi , We have upgraded our code to n
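A small sketch of the /proc check suggested above, assuming a Linux client; it lists the open descriptors of the current JVM via /proc/self/fd (substitute /proc/<pid>/fd to inspect another process):

import java.io.File;

public class OpenFdCheck {
    public static void main(String[] args) {
        // On Linux, every entry under /proc/self/fd is a symlink to an open
        // file, socket or pipe of this process.
        File[] fds = new File("/proc/self/fd").listFiles();
        if (fds == null) {
            System.err.println("/proc/self/fd is not readable (non-Linux system?)");
            return;
        }
        System.out.println("Open file descriptors: " + fds.length);
        for (File fd : fds) {
            System.out.println(fd.getName());
        }
    }
}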

Re: Calling FsShell.doMain() hold so many threads

2007-07-24 Thread Raghu Angadi
Can you get the stack trace of the threads that are left? It was not obvious from the code where a thread is started. It might be 'trash handler'. You could add sleep(10sec) to give you enough time to get the trace. FsShell might not be designed for this use, but seems like a pretty useful

Re: Calling FsShell.doMain() hold so many threads

2007-07-24 Thread Raghu Angadi
for these uses? Here by "these" I mean: uploading or downloading files and creating directories programmatically, even with concurrent operations. Raghu Angadi wrote: Can you get the stack trace of the threads that are left? It was not obvious from the code where a thread is started. It might be 'tra

Re: Hard Disk Failures

2007-07-31 Thread Raghu Angadi
I have not watched large Hadoop clusters closely, but from my experience with other large clusters that have heavy disk loads (seek dominated), the behavior you see seems consistent. Some disks do become very slow, and if they are in a RAID, the whole RAID runs at the speed of the slowest disk. iost

Re: Hard Disk Failures

2007-08-01 Thread Raghu Angadi
cluster and would make speculative execution of tasks in map/reduce more effective. Raghu. Raghu Angadi wrote: I did not watch large hadoop clusters closely but from my experience of other large clusters that have heavy disk loads (seek dominated), the behavior you see seems consistent. Some

Re: Writes to HDFS

2007-08-07 Thread Raghu Angadi
To write data to HDFS, you should be able to connect to both Namenode and to all of the datanodes. NameNode decides where to place the data blocks and tells the client (you) to send the blocks to corresponding datanodes. Raghu. Phantom wrote: Hi When I write data into HDFS do I always nee
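A minimal sketch of a write like the one described above, using the FileSystem API; the output path is a hypothetical example, the namenode address is assumed to come from hadoop-site.xml on the classpath, and the client machine must be able to reach both the Namenode and the Datanodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // fs.default.name is read from hadoop-site.xml; FileSystem.get()
        // talks to the Namenode named there.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical output path.
        Path out = new Path("/user/test/hello.txt");

        // create() asks the Namenode where to place the blocks; the returned
        // stream then sends the data to the chosen Datanodes.
        FSDataOutputStream stream = fs.create(out);
        stream.write("hello hdfs".getBytes());
        stream.close();
        fs.close();
    }
}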

Re: Writes to HDFS

2007-08-07 Thread Raghu Angadi
g the file. Raghu. Thanks A On 8/7/07, Raghu Angadi <[EMAIL PROTECTED]> wrote: To write data to HDFS, you should be able to connect to both Namenode and to all of the datanodes. NameNode decides where to place the data blocks and tells the client (you) to send the blocks to corresponding d

Re: data nodes imbalanced

2007-08-17 Thread Raghu Angadi
Joydeep Sen Sarma wrote: The only thing is that I have often done 'dfs -put' of large files (from an NFS mount) from this node. Would this cause local storage to be allocated by HDFS? Yes. The local node stores one copy as long as it has space, if it is also part of the cluster. Raghu.

Re: hdfs close of a file failing

2007-08-22 Thread Raghu Angadi
Can you check the Namenode log for the filename? There should be at least one message regarding allocating a block to this file. If a grep for the exact filename does not give any results, then try to look for something close to it. Raghu. Michael Stack wrote: Anyone have any pointers debugging why an odd

Re: hdfs close of a file failing

2007-08-22 Thread Raghu Angadi
file. ... St.Ack Raghu Angadi wrote: Can you check in Namenode log for the filename? There should at least be one message regd allocating a block to this file. If exact filename grep does not give any results, then try to look for something close to it. Raghu. Michael Stack wrote

Re: hdfs close of a file failing

2007-08-22 Thread Raghu Angadi
Michael Stack wrote: I'll make an issue. I'll run some tests first. I think I have a simple recipe for provoking this failure mode. Great! Can you try with the logging level set to debug? Thanks. Raghu.

Re: Reduce Performance

2007-08-23 Thread Raghu Angadi
Thorsten Schuett wrote: On Wednesday 22 August 2007, Doug Cutting wrote: Thorsten Schuett wrote: In my case, it looks as if the loopback device is the bottleneck. So increasing the number of tasks won't help. Hmm. I have trouble believing that the loopback device is actually the bottleneck.

Re: Issues with 0.14.0...

2007-08-23 Thread Raghu Angadi
Regarding the second problem: It is surprising that this fails repeatedly around the same place. 0.14 does check the checksum at the datanode (0.13 did not do this check). I will try to reproduce this. Raghu. C G wrote: Hi All: The second issue is a failure on copyFromLocal with lost connections.

Re: secondary namenode errors

2007-08-23 Thread Raghu Angadi
On a related note, please don't use 0.13.0; use the latest released version for 0.13 (I think it is 0.13.1). If the secondary namenode actually works, then it will result in all the replications being set to 1. Raghu. Joydeep Sen Sarma wrote: Hi folks, Would be grateful if someone can help u

Re: secondary namenode errors

2007-08-24 Thread Raghu Angadi
when moving to next major version. Raghu. Joydeep -Original Message- From: Raghu Angadi [mailto:[EMAIL PROTECTED] Sent: Thursday, August 23, 2007 9:44 PM To: hadoop-user@lucene.apache.org Subject: Re: secondary namenode errors On a related note, please don't use 0.13.0, use

Re: Issues with 0.14.0...

2007-08-24 Thread Raghu Angadi
66) Any thoughts or help appreciated... I'm planning to build out a large grid running terabytes of data... assuming I can get Hadoop to handle more than 500M :-(. Thanks! Raghu Angadi <[EMAIL PROTECTED]> wrote: Regd the second problem : It is surprising that this fa

Re: Overhead of Java?

2007-09-05 Thread Raghu Angadi
I would say the biggest difference between a C-Hadoop and Java-Hadoop would be memory usage on the Namenode (and memory-allocation-related cpu benefits). The rest of the nodes in the cluster would perform about the same. C is more suitable for low level memory optimizations (both in overall size and numb

Re: How do I read a file to a buffer from HDFS?

2007-09-05 Thread Raghu Angadi
readFully(arr) implies that you are expecting to read nothing less than arr.length bytes. If the file does not have that many bytes, you will get an EOF exception. http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInputStream.html#readFully(byte[]) Raghu. Frank LIN wrote: Hi all, For some
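A small illustration of the readFully() contract mentioned above, using plain java.io rather than the HDFS client (the contract is the same); the input path is a hypothetical example, and sizing the buffer from the file length keeps readFully() from hitting EOF early:

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadFullyExample {
    public static void main(String[] args) throws IOException {
        File f = new File("/tmp/example.dat");   // hypothetical input file
        byte[] buf = new byte[(int) f.length()]; // exactly as many bytes as exist

        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try {
            // readFully() blocks until buf.length bytes are read, or throws
            // EOFException if the stream ends first.
            in.readFully(buf);
        } catch (EOFException e) {
            System.err.println("File shorter than expected: " + e);
        } finally {
            in.close();
        }
        System.out.println("Read " + buf.length + " bytes");
    }
}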

Re: Hadoop behind a Firewall

2007-09-11 Thread Raghu Angadi
The Namenode does not initiate any connections. The IP address for a Datanode that the Namenode gives to a client is the same as the IP address that the Datanode uses to connect to the Namenode (i.e. the Namenode just does getRemoteAddress() on the connection from the Datanode). There is no option to change this. If you just

Re: Namenode can't connect with Datanode during upgrade from 0.13.1 to 0.14.1

2007-09-12 Thread Raghu Angadi
Hi, Datanode should be able to connect to Namenode for any progress on upgrade. Do you see any other errors reported in datanode log? You need to fix the connection problem first. Are you comfortable taking tcpdump for Namenode port on the client? I think client should be trying to reconnect

Re: Namenode can't connect with Datanode during upgrade from 0.13.1 to 0.14.1

2007-09-12 Thread Raghu Angadi
Open Study wrote: Hi, all I noticed there's wrong timestamp from the status report of " ./hadoop dfsadmin -upgradeProgress details", although the time setting on the server is right, will this matter? No. It just means the stats were not updated yet (yeah, it probably should say "never" inste

Re: Large numbers of files?

2007-09-18 Thread Raghu Angadi
Andrew Cantino wrote: I know that HDFS can handle huge files, but how does it do with very large numbers of medium sized files? I'm interested in using it to store very large numbers of files (tens to hundreds of millions). Will this be a problem? Pretty much. On a 64 bit JVM, with the current
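A rough, illustrative estimate of why hundreds of millions of files strain the Namenode. The ~150 bytes per namenode object is a commonly cited ballpark rather than a figure from this thread, and the file counts below are hypothetical:

public class NamenodeHeapEstimate {
    public static void main(String[] args) {
        // Assumed ballpark: each file, directory and block is an object in the
        // Namenode's heap costing roughly 150 bytes (rule of thumb, not exact).
        final long bytesPerObject = 150;
        long files = 100000000L;   // hypothetical: 100 million medium-sized files
        long blocksPerFile = 1;    // assume each file fits in one block

        long objects = files * (1 + blocksPerFile);
        double heapGB = objects * bytesPerObject / (1024.0 * 1024 * 1024);
        System.out.printf("~%.1f GB of Namenode heap for %d files%n", heapGB, files);
    }
}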

Re: Question on relocation of Hadoop cluster

2007-10-03 Thread Raghu Angadi
Taeho Kang wrote: Hello all. Due to limited space in current datacenter, I am trying to move my Hadoop cluster to a new datacenter. In the new datacenter, each machine will keep its hostname, but each will be assigned to a new ip address. We should be able to edit our DNS to assign existing host

Re: Question on relocation of Hadoop cluster

2007-10-03 Thread Raghu Angadi
Taeho Kang wrote: Thanks for your quick reply, Raghu. The problem I am faced with is... - I need to move my machines to a new location assuming this goes well (i.e. no data loss), - The new location will assign new ip addresses for my machines. I am worried that this change of ip addresses m

Re: Question on relocation of Hadoop cluster

2007-10-04 Thread Raghu Angadi
(I am not sure if I replied already...) Taeho Kang wrote: Thanks for your quick reply, Raghu. The problem I am faced with is... - I need to move my machines to a new location Assuming this goes well (i.e. no data loss), - The new location will assign new ip addresses for my machines. I am

Re: configuration suggestions for 1k nodes

2007-11-06 Thread Raghu Angadi
Not a complete list by far, but just a start: For HDFS: - Make sure you run Java 6 (jdk1.6). - Set the namenode handler count to 40 or more (dfs.namenode.handler.count, and maybe mapred.job.tracker.handler.count etc.). - More config guides are in the works: https://issues.apache.org/jira/bro
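A minimal sketch of the handler-count tuning mentioned above, set programmatically for illustration; in practice these values would go into hadoop-site.xml, and the property names are the ones quoted in the message:

import org.apache.hadoop.conf.Configuration;

public class LargeClusterTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // More RPC handler threads on the Namenode for a large cluster.
        conf.setInt("dfs.namenode.handler.count", 40);
        // The JobTracker has an analogous handler-count setting.
        conf.setInt("mapred.job.tracker.handler.count", 40);
        System.out.println("namenode handlers: "
                + conf.getInt("dfs.namenode.handler.count", 10));
    }
}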

Re: commodity vs. high perf machines: which would you rather

2007-11-07 Thread Raghu Angadi
Does commodity hardware come with ECC memory? Since Hadoop apps tend to move large amounts of data around, ECC memory seems pretty important. With just two machines, you might be limited since you need to run multiple components (Namenode, job tracker, etc) on one machine. Two machines seems

Re: configuration suggestions for 1k nodes

2007-11-07 Thread Raghu Angadi
Doug Cutting wrote: Derek Gottfrid wrote: Are there configuration suggestions for 1k nodes ? http://wiki.apache.org/lucene-hadoop/FAQ#3 Updated this entry with Java 1.6 recommendation. Raghu.

Re: HDFS File Read

2007-11-08 Thread Raghu Angadi
ly try different buffer sizes etc. Thanx, Taj Raghu Angadi wrote: How slow is it? Maybe the code that reads is relevant too. Raghu. j2eeiscool wrote: Hi, I am new to hadoop. We are evaluating HDFS for a reliable, distributed file system use. From the tests (1 name + 1 data, both on different RHEL

Re: HDFS File Read

2007-11-08 Thread Raghu Angadi
How slow is it? Maybe the code that reads is relevant too. Raghu. j2eeiscool wrote: Hi, I am new to hadoop. We are evaluating HDFS for a reliable, distributed file system use. From the tests (1 name + 1 data, both on different RHEL 4 m/cs, client running on the name node m/c) I have run so far

Re: HDFS File Read

2007-11-12 Thread Raghu Angadi
another m/c and the read was about 4 times faster. I have the tcpdump from the original client m/c. This is probably asking too much, but is there anything in particular I should be looking for in the tcpdump? It (the tcpdump) is about 16 megs in size. Thanx, Taj Raghu Angadi wrote: Thats too long.. buffer size

Re: HDFS File Read

2007-11-13 Thread Raghu Angadi
something to compare with: how long should this file read (68 megs) take on a good set-up (client and data node on same network, one hop). Thanx for your help, Taj Raghu Angadi wrote: Taj, Even 4 times faster (400 sec for 68MB) is not very fast. First try to scp a similar sized file between

Re: HDFS File Read

2007-11-13 Thread Raghu Angadi
To simplify, read rate should be faster than write speed. Raghu. Raghu Angadi wrote: Normally, Hadoop read saturates either disk b/w or network b/w on moderate hardware. So if you have one modern IDE disk and 100mbps ethernet, you should expect around 10MBps read rate for a simple read

Re: HDFS File Read

2007-11-16 Thread Raghu Angadi
Taj, I don't know what you are trying to do, but simultaneous write and read won't work on any filesystem (unless the reader is more complicated than what you had). For now, I think you will get the most predictable behaviour if you read after the writer has closed the file. Raghu. j2eeiscool wrote:

Re: HBase PerformanceEvaluation failing

2007-11-19 Thread Raghu Angadi
Do you know if this is a known issue with the JVM and a bug is filed? If this is not expected to be fixed anytime soon, we might be able to work around it. If you can, please try some other variation of SecureRandom that works on your platform. Otherwise we might replace it with something else. We used Se

Re: (repost3) Problem: [multi-node setup] addresss + DNS + ipc.client {query mailist = {!0}

2007-11-28 Thread Raghu Angadi
Khalil Honsali wrote: Greetings; I followed the excellent tutorials on the wiki, everything worked fine for the single node version, but for the multi-node setup (four nodes, including master), I had to use ip addresses instead of fully qualified domain names in the hadoop-site.xml (see appendix)

Re: Any one can tell me about how to write to HDFS?

2007-11-30 Thread Raghu Angadi
try 'Path outFile = new Path("/ryan/test");' also check if there is any useful message in the Namenode log. Raghu. Ryan Wang wrote: Hope this version can attract others' attention. Hadoop Version: 0.15.0 JDK version: Sun JDK 6.0.3 Platform: Ubuntu 7.10 IDE: Eclipse 3.2 Code : public class Hado

Re: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

2007-12-04 Thread Raghu Angadi
I would think after an hour or so things are ok.. but that might not have helped the job. Raghu. Jason Venner wrote: We have a small cluster of 9 machines on a shared Gig Switch (with a lot of other machines). The other day, running a job, the reduce stalled when the map was 99.99x% done

Re: Has anyone had hdfs block move synchronization failures with hadoop 0.15.0?

2007-12-04 Thread Raghu Angadi
> reduce > copy (623 of 789 at 0.12 MB/s) > reduce > copy (621 of 789 at 0.12 MB/s) > Raghu Angadi wrote: I would think after an hours or so things are ok.. but that might not have helped the job. Raghu. Jason Venner wrote: We have a small cluster of 9 machines on a shared Gig Switch (w

Re: Some Doubts of hadoop functionality

2007-12-20 Thread Raghu Angadi
Joydeep Sen Sarma wrote: agreed - i think anyone who is thinking of using hadoop as a place from where data is served has to be disturbed by the lack of data protection. replication in hadoop provides protection against hardware failures, not software failures. backups (and depending on how t

Re: Under replicated block doesn't get fixed until DFS restart

2008-01-04 Thread Raghu Angadi
This is of course not expected. More detailed info or log messages would help. Do you know if there is at least one good block? Sometimes the remaining "good" block might actually be corrupted and thus cannot replicate itself. Restarting might just have brought up the datanodes that were dow

Re: Under replicated block doesn't get fixed until DFS restart

2008-01-07 Thread Raghu Angadi
Target replication factor: 3 Real replication factor: 2.9993873 The filesystem under path '/' is CORRUPT -Chris On Jan 4, 2008, at 1:02 PM, Raghu Angadi wrote: This is of course not expected. A more detailed info or log message would help. Do you know if there is at least

Re: Under replicated block doesn't get fixed until DFS restart

2008-01-07 Thread Raghu Angadi
hmm... one possibility is that the rest of the nodes were down. But the namenode showed other nodes were up. If more than one datanode was up, this indicates some bug. One last thing: grep for this block id at 10.100.11.31. You might see some useful error message from when the block was written.

Re: Under replicated block doesn't get fixed until DFS restart

2008-01-08 Thread Raghu Angadi
Datanode log looks fine. There was an error while writing to mirrors when the data was first written, which can happen sometimes. It is still not clear why namenode did not try to replicate these blocks until the next restart. How big is the cluster? Raghu. Chris Kline wrote: Ah, yes, very