Re: performance

2008-03-12 Thread Jason Rennie
Hmm... sounds promising :) How do you distribute the data? Do you use HDFS, or pass the data directly to the individual nodes? We really only need the map operation, as you do. We need to distribute a matrix * vector operation, so we want rows of the matrix distributed across different
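A minimal sketch of the map side of the matrix * vector product discussed above: each mapper receives a subset of matrix rows (as it would via standard input under Hadoop streaming) and emits one dot product per row. The line format and function names are assumptions for illustration, not anything from the original thread.

```python
def map_rows(lines, vector):
    """Each input line: 'row_id<TAB>v1 v2 v3 ...'. Emits (row_id, row . vector)."""
    for line in lines:
        row_id, values = line.rstrip("\n").split("\t")
        row = [float(x) for x in values.split()]
        # The dot product of this row with the (broadcast) vector.
        yield row_id, sum(a * b for a, b in zip(row, vector))

if __name__ == "__main__":
    v = [1.0, 2.0, 0.5]
    lines = ["r0\t1 0 2", "r1\t0 3 4"]
    print(list(map_rows(lines, v)))  # [('r0', 2.0), ('r1', 8.0)]
```

In a real streaming job the `lines` iterable would be `sys.stdin` and the vector would be shipped to each node (e.g. via the distributed cache or a side file).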

Re: performance

2008-03-12 Thread Theodore Van Rooy
I have been using HDFS, setting the block size to some appropriate level and the replication as well. When submitting the job, keep in mind that each block of the file in HDFS will be passed into your mapping script as standard input. The data-file reads will be done locally if possible.
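For reference, block size and replication of the kind mentioned above are set in the site configuration. This is an illustrative `hadoop-site.xml` fragment only; the values are examples, not recommendations from the thread:

```xml
<!-- Illustrative fragment: default block size and replication for new files. -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MB, in bytes -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```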

Re: performance

2008-03-12 Thread Ted Dunning
Identity reduce is nice because the result values can be sorted. On 3/12/08 8:21 AM, Jason Rennie [EMAIL PROTECTED] wrote: Map could perform all the dot-products, which is the heavy lifting in what we're trying to do. Might want to do a reduce after that, not sure...
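A toy simulation (not Hadoop itself) of why an identity reduce is useful: the framework sorts map output by key before the reducer sees it, so an identity reduce emits the dot products in sorted key order.

```python
def identity_reduce(mapped):
    """Stand-in for Hadoop's shuffle/sort followed by an identity reducer."""
    for key, value in sorted(mapped):   # the framework's sort-by-key step
        yield key, value                # identity: values pass through unchanged

if __name__ == "__main__":
    mapped = [("r2", 5.0), ("r0", 2.0), ("r1", 8.0)]
    print(list(identity_reduce(mapped)))  # keys come out sorted: r0, r1, r2
```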

Re: reading input file only once for multiple map functions

2008-03-12 Thread Ted Dunning
Ahhh... There is an old saying for this. I think you are picking fly specks out of pepper. Unless your input format is very, very strange, doing the split again for two jobs does indeed lead to some small inefficiency, but this cost should be so low compared to other inefficiencies that you

RE: reading input file only once for multiple map functions

2008-03-12 Thread Joydeep Sen Sarma
The short answer is no - you can't do this. There are some special cases: if the map output key for the same XML record is the same for both jobs (i.e. sort/partition/grouping is based on the same value), then you can do this in the application layer. If the map output keys differ, then there's
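A hypothetical sketch of the "application layer" trick described above: when two logical jobs share the same map output key, the mapper can tag each value with which job produced it, and the reducer separates the tags again. The tag names and record fields here are made up for illustration.

```python
def tagged_map(record):
    """Emit the outputs of two logical jobs under one shared key, tagged."""
    key = record["id"]
    yield key, ("job_a", record["text"].upper())   # logical job A's output
    yield key, ("job_b", len(record["text"]))      # logical job B's output

def reduce_split(key, tagged_values):
    """Separate the two logical jobs' values again on the reduce side."""
    a = [v for tag, v in tagged_values if tag == "job_a"]
    b = [v for tag, v in tagged_values if tag == "job_b"]
    return key, a, b

if __name__ == "__main__":
    tagged = list(tagged_map({"id": "x1", "text": "hadoop"}))
    key = tagged[0][0]
    print(reduce_split(key, [v for _, v in tagged]))  # ('x1', ['HADOOP'], [6])
```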

Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I would like to use the HDFS component of Hadoop, but I am not interested in MapReduce. All the Hadoop examples I have seen so far use MapReduce classes, and these examples make no reference to HDFS classes, including Hadoop's FileSystem API

Searching email list

2008-03-12 Thread Cagdas Gerede
Is there an easy way to search this email list? I couldn't find any web interface. Please help. CEG

Re: Searching email list

2008-03-12 Thread Daryl C. W. O'Shea
On 12/03/2008 4:18 PM, Cagdas Gerede wrote: Is there an easy way to search this email list? I couldn't find any web interface. Please help. http://wiki.apache.org/hadoop/MailingListArchives Daryl

Re: Does Hadoop Honor Reserved Space?

2008-03-12 Thread Eric Baldeschwieler
Hi Pete, Joydeep, These sound like thoughts that could lead to excellent suggestions with a little more investment of your time. We'd love it if you could invest some effort into contributing to the release process! Hadoop is open source and becoming active contributors is the best

Re: HDFS interface

2008-03-12 Thread Hairong Kuang
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample Hairong On 3/12/08 1:21 PM, Arun C Murthy [EMAIL PROTECTED] wrote: http://hadoop.apache.org/core/docs/r0.16.0/hdfs_user_guide.html Arun On Mar 12, 2008, at 1:16 PM, Cagdas Gerede wrote: I would like to use HDFS component of

Re: HDFS interface

2008-03-12 Thread Eddie C
I used code like this inside a Tomcat web application. It works. Shared webserver filesystem :) On Wed, Mar 12, 2008 at 4:50 PM, Hairong Kuang [EMAIL PROTECTED] wrote: http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample Hairong On 3/12/08 1:21 PM, Arun C Murthy [EMAIL

naming output files from Reduce

2008-03-12 Thread Prasan Ary
I have two Map/Reduce jobs and both of them output a file each. Is there a way I can name these output files differently from the default names of part- ? Thanks.
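One common workaround (a sketch of one option, not the only answer): let the job write its default part-NNNNN files, then rename them once the job finishes. Here plain `os.rename` on a local directory stands in for the equivalent HDFS rename call; the paths and prefix are made up.

```python
import os
import tempfile

def rename_outputs(output_dir, new_prefix):
    """Rename part-* files to <new_prefix>-* after the job finishes."""
    renamed = []
    for name in sorted(os.listdir(output_dir)):
        if name.startswith("part-"):
            new_name = new_prefix + name[len("part"):]   # keep the -NNNNN suffix
            os.rename(os.path.join(output_dir, name),
                      os.path.join(output_dir, new_name))
            renamed.append(new_name)
    return renamed

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    for n in ("part-00000", "part-00001"):
        open(os.path.join(d, n), "w").close()
    print(rename_outputs(d, "matrix"))  # ['matrix-00000', 'matrix-00001']
```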

scaling experiments on a static cluster?

2008-03-12 Thread Chris Dyer
Hi Hadoop mavens- I'm hoping someone out there will have a quick solution for me. I'm trying to run some very basic scaling experiments for a rapidly approaching paper deadline on a Hadoop 0.16.0 cluster that has ~20 nodes with 2 procs/node. Ideally, I would want to run my code on clusters of

Re: scaling experiments on a static cluster?

2008-03-12 Thread Chris Dyer
Thanks-- that should work. I'll follow up with the cluster administrators to see if I can get this to happen. To rebalance the file storage can I just set the replication factor using hadoop dfs? Chris On Wed, Mar 12, 2008 at 6:36 PM, Ted Dunning [EMAIL PROTECTED] wrote: What about just

Re: scaling experiments on a static cluster?

2008-03-12 Thread Ted Dunning
Yes. Increase the replication. Wait. Drop the replication. On 3/12/08 3:44 PM, Chris Dyer [EMAIL PROTECTED] wrote: Thanks-- that should work. I'll follow up with the cluster administrators to see if I can get this to happen. To rebalance the file storage can I just set the replication

Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I see the following paragraphs in the wiki (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample): Create a FileSystem (http://hadoop.apache.org/core/api/org/apache/hadoop/fs/FileSystem.html) instance by passing a new

Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I found the solution. Please let me know if you have a better idea. I added the following addResource lines: Configuration conf = new Configuration(); conf.addResource(new Path("location_of_hadoop-default.xml")); conf.addResource(new Path("location_of_hadoop-site.xml"));

Re: HDFS interface

2008-03-12 Thread Hairong Kuang
If you add the configuration directory to the class path, the configuration files will be automatically loaded. Hairong On 3/12/08 5:32 PM, Cagdas Gerede [EMAIL PROTECTED] wrote: I found the solution. Please let me know if you have a better idea. I added the following addResource lines.

Re: file permission problem

2008-03-12 Thread s29752-hadoopuser
Hi Johannes, Which version of hadoop are you using? There is a known bug in some nightly builds. Nicholas - Original Message From: Johannes Zillmann [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, March 12, 2008 5:47:27 PM Subject: file permission problem Hi, i

Re: file permission problem

2008-03-12 Thread Johannes Zillmann
Hi Nicholas, I'm using the 0.16.0 distribution. Johannes [EMAIL PROTECTED] wrote: Hi Johannes, Which version of hadoop are you using? There is a known bug in some nightly builds. Nicholas - Original Message From: Johannes Zillmann [EMAIL PROTECTED] To:

Re: Pipes example wordcount-nopipe.cc failed when reading from input splits

2008-03-12 Thread 11 Nov.
I tried to specify WordCountInputFormat as the input format; here is the command line: bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input inputdata/ -output outputdata -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat The MapReduce process seems not really