RE: HDFS losing blocks or connection error

2009-01-23 Thread Zak, Richard [USA]
It happens right after the MR job (though once or twice it's happened during). I am not using EBS, just HDFS between the machines. As for tasks, there are 4 mappers and 0 reducers. Richard J. Zak

Re: HDFS losing blocks or connection error

2009-01-23 Thread Jean-Daniel Cryans
Yes, you may overload your machines that way because of the small number. One thing to do would be to look in the logs for any signs of IOExceptions and report them back here. Another thing you can do is to change some configs: increase *dfs.datanode.max.xcievers* to 512 and set the
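The xcievers limit is a datanode daemon setting, configured in hadoop-site.xml on each datanode and picked up on restart. A quick way to verify which value a given conf directory actually resolves is to load it and print it; a minimal sketch, assuming the era's default of 256 (the property name keeps its historical misspelling):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: print the xcievers limit this JVM's configuration resolves.
// new Configuration() loads hadoop-default.xml and hadoop-site.xml from
// the classpath, so run this with the same conf/ directory the datanode uses.
public class PrintXcievers {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // 256 is assumed here as the shipped default of this era.
    int xcievers = conf.getInt("dfs.datanode.max.xcievers", 256);
    System.out.println("dfs.datanode.max.xcievers = " + xcievers);
  }
}
```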

Re: HDFS losing blocks or connection error

2009-01-23 Thread Konstantin Shvachko
Yes guys, we observed such problems. They will be common for 0.18.2 and 0.19.0, exactly as you described, when data-nodes become unstable. There were several issues; please take a look at HADOOP-4997 (workaround for tmp file handling on DataNodes), HADOOP-4663 (links to other related issues), HADOOP-4810

AlreadyBeingCreatedExceptions after upgrade to 0.19.0

2009-01-23 Thread Stefan Will
Hi, Since I've upgraded to 0.19.0, I've been getting the following exceptions when restarting jobs, or even when a failed reducer is being restarted by the job tracker. It appears that stale file locks in the namenode don't get properly released sometimes: org.apache.hadoop.ipc.RemoteException:
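There is no clean lease-recovery API in this era; a common client-side stopgap is to delete the stale file before re-creating it. A hedged sketch against the 0.19-era FileSystem API (the helper itself is hypothetical, and whether a delete actually clears a stuck lease depends on the underlying bug):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.ipc.RemoteException;

// Hypothetical stopgap: if create() fails because a dead writer's lease
// still pins the file, delete the stale file and try once more. The
// namenode-side exception arrives wrapped in a RemoteException, so we
// match on its class name rather than catching it directly.
public class CreateWithRetry {
  static FSDataOutputStream createClobbering(FileSystem fs, Path p)
      throws IOException {
    try {
      return fs.create(p, true); // overwrite = true
    } catch (RemoteException re) {
      if (re.getClassName().endsWith("AlreadyBeingCreatedException")) {
        fs.delete(p, false); // drop the stale file, then retry once
        return fs.create(p, true);
      }
      throw re;
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    createClobbering(fs, new Path(args[0])).close();
  }
}
```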

Re: HDFS losing blocks or connection error

2009-01-23 Thread Raghu Angadi
"It seems hdfs isn't as robust or reliable as the website says and/or I have a configuration issue." Quite possible. How robust does the website say it is? I agree that debugging failures like the following is pretty hard for casual users. You need to look at the logs for the block, or run 'bin/hadoop
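The quoted command is cut off above; from the command line, 'bin/hadoop fsck <path> -files -blocks -locations' prints per-block placement. The same information is reachable from Java; a minimal sketch against the 0.18/0.19 FileSystem API (the class name is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print which datanodes hold each block of a file, a quick way to
// see whether a missing-block report points at one bad node or several.
public class WhereAreMyBlocks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus st = fs.getFileStatus(new Path(args[0]));
    BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
    for (int i = 0; i < blocks.length; i++) {
      System.out.println("block " + i + " on hosts: "
          + java.util.Arrays.toString(blocks[i].getHosts()));
    }
  }
}
```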

Re: using distcp for http source files

2009-01-23 Thread Doug Cutting
Can you please attach your latest version of this to https://issues.apache.org/jira/browse/HADOOP-496? Thanks, Doug. Boris Musykantski wrote: we have fixed up some patches in JIRA for support of a webdav server on top of HDFS, updated to work with a newer version (0.18.0 IIRC), and added support for

Re: Why does Hadoop need ssh access to master and slaves?

2009-01-23 Thread Edward Capriolo
I am looking to create some RA scripts and experiment with starting hadoop via the Linux-HA cluster manager. Linux-HA would handle restarting downed nodes and eliminate the ssh key dependency.

How-to in MapReduce

2009-01-23 Thread Mark Kerzner
Hi, esteemed group, how would I form Maps in MapReduce to recursively look at every file in a directory, and do something to each file, such as produce a PDF or compute its hash? For that matter, Google builds its index using MapReduce, or so the papers say. First the crawlers store all the
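One common pattern, sketched here against the old (0.19-era) mapred API with illustrative names: write the file paths one per line into a small text file (a recursive FileSystem.listStatus walk can produce it), let the default TextInputFormat hand each line to the mapper, and have each map call open that file from HDFS and emit its hash:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch (old "mapred" API): the job's input is a text file with one HDFS
// path per line; each map call opens that file and emits (path, md5-hex).
public class HashFilesMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private JobConf conf;

  public void configure(JobConf job) { this.conf = job; }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    Path p = new Path(value.toString().trim());
    FileSystem fs = p.getFileSystem(conf);
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (Exception e) {
      throw new IOException(e.toString());
    }
    InputStream in = fs.open(p);
    try {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        md5.update(buf, 0, n);
        reporter.progress(); // keep the task alive on large files
      }
    } finally {
      in.close();
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    out.collect(value, new Text(hex.toString()));
  }
}
```

Because each map call does the real I/O, splitting the path list across several input files lets the work spread over more mappers.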

Re: How-to in MapReduce

2009-01-23 Thread tim robertson
Hi, Sounds like you might want to look at the Nutch project architecture and then see the Nutch on Hadoop tutorial - http://wiki.apache.org/nutch/NutchHadoopTutorial It does web crawling and indexing using Lucene. It would be a good place to start for ideas anyway, even if it doesn't end up

RE: Problem running hdfs_test

2009-01-23 Thread Arifa Nisar
Thanks a lot for your help. I solved that problem by removing LDFLAGS (containing libjvm.so) from the hdfs_test compilation. I had added that flag to get it to compile using the Makefile, but it was the real problem. Only after removing it was I able to run with ant. Thanks, Arifa

hadoop consulting?

2009-01-23 Thread Christophe Bisciglia
Hey all, I wanted to reach out to the user / development community to start identifying those of you who are interested in consulting / contract work for new Hadoop deployments. A number of our larger customers are asking for more extensive on-site help than would normally happen under a support

Re: hadoop consulting?

2009-01-23 Thread Mark Kerzner - SHMSoft
Christophe, I am writing my first Hadoop project now, I have 20 years of consulting experience, and I am in Houston. Here is my resume: http://markkerzner.googlepages.com. I have used EC2. Sincerely, Mark

Re: How-to in MapReduce

2009-01-23 Thread Mark Kerzner
Tim, I looked there, but it is a setup manual. I read the MapReduce and Sawzall papers, and the MS paper on these, but I need best practices. Thank you, Mark

Re: hadoop consulting?

2009-01-23 Thread Christophe Bisciglia
Thanks Mark. I'll be getting in touch early next week. Others, I see replies default straight to the list. Please feel free to email just me (christo...@cloudera.com), unless, well, you're in the mood to share your bio with everyone :-) Cheers, Christophe

HDFS - millions of files in one directory?

2009-01-23 Thread Mark Kerzner
Hi, there is a performance penalty in Windows (pardon the expression) if you put too many files in the same directory. The OS becomes very slow, stops seeing them, and lies about their status to my Java requests. I do not know if this is also a problem in Linux, but in HDFS - do I need to balance

Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi
If you are adding and deleting files in the directory, you might notice a CPU penalty (for many loads, higher CPU on the NN is not an issue). This is mainly because HDFS does a binary search on the files in a directory each time it inserts a new file. If the directory is relatively idle, then there
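Since per-insert cost grows with directory size, a common layout (not from this thread; names are illustrative) spreads files over a fixed set of hash buckets so no single directory grows without bound:

```java
import org.apache.hadoop.fs.Path;

// Sketch: spread N files over 256x256 fixed subdirectories by hashing the
// file name, so no single directory grows without bound.
public class Bucketing {
  static Path bucketedPath(Path root, String fileName) {
    int h = fileName.hashCode();
    int d1 = (h >>> 8) & 0xff;   // first-level bucket, 0..255
    int d2 = h & 0xff;           // second-level bucket, 0..255
    return new Path(root,
        String.format("%02x/%02x/%s", d1, d2, fileName));
  }

  public static void main(String[] args) {
    // e.g. prints something like /data/a7/3e/doc-1234.pdf
    // (the actual buckets depend on the hash).
    System.out.println(bucketedPath(new Path("/data"), "doc-1234.pdf"));
  }
}
```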

Re: HDFS - millions of files in one directory?

2009-01-23 Thread Mark V
On Sat, Jan 24, 2009 at 10:03 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, there is a performance penalty in Windows (pardon the expression) if you put too many files in the same directory ...

Re: HDFS - millions of files in one directory?

2009-01-23 Thread Raghu Angadi
Raghu Angadi wrote: If you are adding and deleting files in the directory, you might notice a CPU penalty ... I should add that equal or

Re: hadoop balanceing data

2009-01-23 Thread Hairong Kuang
%Remaining fluctuates much more than %dfs used. This is because dfs shares the disks with mapred, and mapred tasks may use a lot of disk temporarily. So trying to keep the same %free is impossible most of the time. Hairong