Re: large files vs many files

2009-05-09 Thread Sasha Dolgy
Yes, that is the problem: two, or hundreds... data streams in very quickly. On Fri, May 8, 2009 at 8:42 PM, jason hadoop jason.had...@gmail.com wrote: Is it possible that two tasks are trying to write to the same file path? On Fri, May 8, 2009 at 11:46 AM, Sasha Dolgy sdo...@gmail.com wrote:

Re: Huge DataNode Virtual Memory Usage

2009-05-09 Thread Stefan Will
Chris, Thanks for the tip... However, I'm already running 1.6.0_10: java version 1.6.0_10, Java(TM) SE Runtime Environment (build 1.6.0_10-b33), Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode). Do you know of a specific bug # in the JDK bug database that addresses this? Cheers,

Re: large files vs many files

2009-05-09 Thread jason hadoop
You must create unique file names; I don't believe (but I do not know for certain) that the append code will allow multiple writers. Are you writing from within a task, or as an external application writing into Hadoop? You may try using a UUID, http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html, as
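As a minimal sketch of the unique-name approach suggested here (assuming the default HDFS FileSystem; the directory and record content are illustrative):

    import java.util.UUID;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UniqueFileWriter {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Each writer generates a name no other writer can collide with.
            Path out = new Path("/events/" + UUID.randomUUID().toString());
            FSDataOutputStream stream = fs.create(out, false); // fail rather than overwrite
            stream.writeBytes("one record\n");
            stream.close();
        }
    }

Within a map or reduce task, the task attempt ID can serve the same purpose as the UUID, since attempt IDs are unique across the job.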

Re: ClassNotFoundException

2009-05-09 Thread jason hadoop
rel is short for the Hadoop version you are using: 0.18.x, 0.19.x, or 0.20.x, etc. You must make all of the required jars available to all of your tasks. You can either install them on all the tasktracker machines and set up the tasktracker classpath to include them, or distribute them via the
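The truncated alternative is presumably the DistributedCache. A hedged sketch with the 0.18-0.20-era API, assuming the dependency jar has already been copied into HDFS (the path is illustrative):

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class JarShipping {
        public static void addDependencies(JobConf job) throws Exception {
            // The jar must already live in HDFS; each task then sees it on its classpath.
            DistributedCache.addFileToClassPath(new Path("/libs/my-dependency.jar"), job);
        }
    }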

Re: large files vs many files

2009-05-09 Thread Sasha Dolgy
Would WritableFactories not allow me to open one output stream and continue to write() and sync()? Maybe I'm reading into that wrong. Although a UUID would be nice, it would still leave me with the problem of having lots of little files instead of a few large files. -sd On Sat, May 9, 2009 at 8:37
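For reference, the single-writer pattern described here (one open stream, repeated write() and sync()) looks roughly like the sketch below. Whether sync() makes the data durable and visible to readers depends on the append/sync support of the Hadoop version in use, which was still unstable in the 0.19 era; the path is illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamingWriter {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/events/stream.log"));
            for (int i = 0; i < 100; i++) {
                out.writeBytes("event " + i + "\n");
                out.sync(); // flush to the datanodes; visibility semantics vary by version
            }
            out.close();
        }
    }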

Re: Huge DataNode Virtual Memory Usage

2009-05-09 Thread Chris Collins
I think it may have been 6676016: http://java.sun.com/javase/6/webnotes/6u10.html. At the time we were able to repro this through heavy Lucene indexing plus our internal document pre-processing logic, which churned a lot of objects. We still experience similar issues with update 10, but much more rarely.

Heterogeneous cluster - quadcores/8 cores, Fairscheduler

2009-05-09 Thread Saptarshi Guha
Hello, Our unit has 5 quad-core machines running Hadoop. We have a dedicated JobTracker/Namenode. Each machine has 32 GB of RAM. We have the option of buying an 8-core, 128 GB machine, and the question is whether it would be useful as a TaskTracker. A) It can certainly be used as the JobTracker and Namenode
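For reference, the usual lever for making a bigger box do proportionally more work as a TaskTracker is its per-node slot count, set in that machine's local config. The property names below are the 0.19/0.20-era ones; the counts are illustrative, not a recommendation:

    <!-- hadoop-site.xml on the 8-core machine only -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>8</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>4</value>
    </property>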

Re: Error when start hadoop cluster.

2009-05-09 Thread jason hadoop
Looks like you have different versions of the jars, or perhaps someone has run ant in one of your installation directories. On Fri, May 8, 2009 at 7:54 PM, nguyenhuynh.mr nguyenhuynh...@gmail.com wrote: Hi all! I cannot start HDFS successfully. I checked the log file and found the following message:

[ANN] hbase-0.19.2 available for download

2009-05-09 Thread stack
HBase 0.19.2 is now available for download: http://hadoop.apache.org/hbase/releases.html. 17 issues have been fixed since HBase 0.19.1. Release notes are available here: http://tinyurl.com/p3x2bn http://tinyurl.com/8xmyx9 Thanks to all who contributed to this release. At your service, The

Re-Addressing a cluster

2009-05-09 Thread John Kane
I have a situation that I have not been able to find in the mail archives. I have an active cluster that was built on a private switch with private IP address space (192.168.*.*). I need to relocate the cluster into real address space. Assuming that I have working DNS, is there an issue? Do I

Re: Most efficient way to support shared content among all mappers

2009-05-09 Thread Jeff Hammerbacher
Hey, For a more detailed discussion of how to use memcached for this purpose, see the paper "Low-Latency, High-Throughput Access to Static Global Resources within the Hadoop Framework": http://www.umiacs.umd.edu/~jimmylin/publications/Lin_etal_TR2009.pdf. Regards, Jeff On Fri, May 8, 2009 at 2:49
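As a hedged illustration of the pattern the paper describes (the thread names no client library; spymemcached is assumed here, and the host and key are illustrative):

    import java.net.InetSocketAddress;

    import net.spy.memcached.MemcachedClient;

    public class SideDataLookup {
        public static void main(String[] args) throws Exception {
            // Connect once per task, then do cheap key lookups from map().
            MemcachedClient client =
                new MemcachedClient(new InetSocketAddress("cache-host", 11211));
            Object value = client.get("some-shared-key");
            System.out.println(value);
            client.shutdown();
        }
    }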

Re: Most efficient way to support shared content among all mappers

2009-05-09 Thread jason hadoop
Thanks Jeff! On Sat, May 9, 2009 at 1:31 PM, Jeff Hammerbacher ham...@cloudera.com wrote: Hey, For a more detailed discussion of how to use memcached for this purpose, see the paper "Low-Latency, High-Throughput Access to Static Global Resources within the Hadoop Framework":

Re: Re-Addressing a cluster

2009-05-09 Thread jason hadoop
You should be able to relocate the cluster's IP space by stopping the cluster, modifying the configuration files, resetting the DNS, and starting the cluster. It would be best to verify connectivity with the new IP addresses before starting the cluster. To the best of my knowledge the namenode doesn't care
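One hedged illustration of why working DNS makes this painless: if the site config names the daemons by hostname rather than raw IP, re-addressing reduces to updating DNS (or /etc/hosts) on every node. The hostnames and ports below are illustrative:

    <!-- hadoop-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker.example.com:9001</value>
    </property>

The masters and slaves files should likewise list hostnames, not addresses, so they survive the move unchanged.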