Re: performance

2008-03-12 Thread Jason Rennie
Hmm... sounds promising :) How do you distribute the data? Do you use HDFS, or pass the data directly to the individual nodes? We really only need the map operation, as you do. We need to distribute a matrix * vector operation, so we want rows of the matrix distributed across different
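A minimal sketch of the map side of the matrix * vector product discussed above: each mapper receives a subset of matrix rows (as it would via standard input under Hadoop streaming) and emits one dot product per row. The line format and function names are assumptions for illustration, not anything from the original thread.

```python
def map_rows(lines, vector):
    """Each input line: 'row_id<TAB>v1 v2 v3 ...'. Emits (row_id, row . vector)."""
    for line in lines:
        row_id, values = line.rstrip("\n").split("\t")
        row = [float(x) for x in values.split()]
        # The dot product of this row with the (broadcast) vector.
        yield row_id, sum(a * b for a, b in zip(row, vector))

if __name__ == "__main__":
    v = [1.0, 2.0, 0.5]
    lines = ["r0\t1 0 2", "r1\t0 3 4"]
    print(list(map_rows(lines, v)))  # [('r0', 2.0), ('r1', 8.0)]
```

In a real streaming job the `lines` iterable would be `sys.stdin` and the vector would be shipped to each node (e.g. via the distributed cache or a side file).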

Re: performance

2008-03-12 Thread Theodore Van Rooy
I have been using HDFS, setting the block size to some appropriate level and the replication as well. When submitting the job, keep in mind that each block of the file in HDFS will be passed into your mapping script as standard input. The data-file reads will be done locally if possible.
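For reference, block size and replication of the kind mentioned above are set in the site configuration. This is an illustrative `hadoop-site.xml` fragment only; the values are examples, not recommendations from the thread:

```xml
<!-- Illustrative fragment: default block size and replication for new files. -->
<property>
  <name>dfs.block.size</name>
  <value>67108864</value> <!-- 64 MB, in bytes -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```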

Re: performance

2008-03-12 Thread Ted Dunning
Identity reduce is nice because the result values can be sorted. On 3/12/08 8:21 AM, Jason Rennie [EMAIL PROTECTED] wrote: Map could perform all the dot-products, which is the heavy lifting in what we're trying to do. Might want to do a reduce after that, not sure...
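A toy simulation (not Hadoop itself) of why an identity reduce is useful: the framework sorts map output by key before the reducer sees it, so an identity reduce emits the dot products in sorted key order.

```python
def identity_reduce(mapped):
    """Stand-in for Hadoop's shuffle/sort followed by an identity reducer."""
    for key, value in sorted(mapped):   # the framework's sort-by-key step
        yield key, value                # identity: values pass through unchanged

if __name__ == "__main__":
    mapped = [("r2", 5.0), ("r0", 2.0), ("r1", 8.0)]
    print(list(identity_reduce(mapped)))  # keys come out sorted: r0, r1, r2
```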

Re: reading input file only once for multiple map functions

2008-03-12 Thread Ted Dunning
Ahhh... There is an old saying for this. I think you are picking fly specks out of pepper. Unless your input format is very, very strange, doing the split again for two jobs does indeed lead to some small inefficiency, but this cost should be so low compared to other inefficiencies that you

RE: reading input file only once for multiple map functions

2008-03-12 Thread Joydeep Sen Sarma
The short answer is no - you can't do this. There are some special cases: if the map output key for the same XML record is the same for both jobs (i.e. sort/partition/grouping is based on the same value), then you can do this in the application layer. If the map output keys differ, then there's
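A hypothetical sketch of the "application layer" trick described above: when two logical jobs share the same map output key, the mapper can tag each value with which job produced it, and the reducer separates the tags again. The tag names and record fields here are made up for illustration.

```python
def tagged_map(record):
    """Emit the outputs of two logical jobs under one shared key, tagged."""
    key = record["id"]
    yield key, ("job_a", record["text"].upper())   # logical job A's output
    yield key, ("job_b", len(record["text"]))      # logical job B's output

def reduce_split(key, tagged_values):
    """Separate the two logical jobs' values again on the reduce side."""
    a = [v for tag, v in tagged_values if tag == "job_a"]
    b = [v for tag, v in tagged_values if tag == "job_b"]
    return key, a, b

if __name__ == "__main__":
    tagged = list(tagged_map({"id": "x1", "text": "hadoop"}))
    key = tagged[0][0]
    print(reduce_split(key, [v for _, v in tagged]))  # ('x1', ['HADOOP'], [6])
```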

Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I would like to use the HDFS component of Hadoop, but I am not interested in MapReduce. All the Hadoop examples I have seen so far use MapReduce classes, and these examples make no reference to HDFS classes, including Hadoop's FileSystem API

Searching email list

2008-03-12 Thread Cagdas Gerede
Is there an easy way to search this email list? I couldn't find any web interface. Please help. CEG

Re: Searching email list

2008-03-12 Thread Daryl C. W. O'Shea
On 12/03/2008 4:18 PM, Cagdas Gerede wrote: Is there an easy way to search this email list? I couldn't find any web interface. Please help. http://wiki.apache.org/hadoop/MailingListArchives Daryl

Re: Does Hadoop Honor Reserved Space?

2008-03-12 Thread Eric Baldeschwieler
Hi Pete, Joydeep, These sound like thoughts that could lead to excellent suggestions with a little more investment of your time. We'd love it if you could invest some effort into contributing to the release process! Hadoop is open source and becoming active contributors is the best

Re: HDFS interface

2008-03-12 Thread Hairong Kuang
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample Hairong On 3/12/08 1:21 PM, Arun C Murthy [EMAIL PROTECTED] wrote: http://hadoop.apache.org/core/docs/r0.16.0/hdfs_user_guide.html Arun On Mar 12, 2008, at 1:16 PM, Cagdas Gerede wrote: I would like to use HDFS component of

Re: HDFS interface

2008-03-12 Thread Eddie C
I used code like this inside a Tomcat web application. It works. Shared webserver filesystem :) On Wed, Mar 12, 2008 at 4:50 PM, Hairong Kuang [EMAIL PROTECTED] wrote: http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample Hairong On 3/12/08 1:21 PM, Arun C Murthy [EMAIL

naming output files from Reduce

2008-03-12 Thread Prasan Ary
I have two Map/Reduce jobs and both of them output a file each. Is there a way I can name these output files differently from the default names of part- ? Thanks.
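One common workaround (a sketch of one option, not the only answer): let the job write its default part-NNNNN files, then rename them once the job finishes. Here plain `os.rename` on a local directory stands in for the equivalent HDFS rename call; the paths and prefix are made up.

```python
import os
import tempfile

def rename_outputs(output_dir, new_prefix):
    """Rename part-* files to <new_prefix>-* after the job finishes."""
    renamed = []
    for name in sorted(os.listdir(output_dir)):
        if name.startswith("part-"):
            new_name = new_prefix + name[len("part"):]   # keep the -NNNNN suffix
            os.rename(os.path.join(output_dir, name),
                      os.path.join(output_dir, new_name))
            renamed.append(new_name)
    return renamed

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    for n in ("part-00000", "part-00001"):
        open(os.path.join(d, n), "w").close()
    print(rename_outputs(d, "matrix"))  # ['matrix-00000', 'matrix-00001']
```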

scaling experiments on a static cluster?

2008-03-12 Thread Chris Dyer
Hi Hadoop mavens- I'm hoping someone out there will have a quick solution for me. I'm trying to run some very basic scaling experiments for a rapidly approaching paper deadline on a Hadoop 0.16.0 cluster that has ~20 nodes with 2 procs/node. Ideally, I would want to run my code on clusters of

Re: scaling experiments on a static cluster?

2008-03-12 Thread Chris Dyer
Thanks-- that should work. I'll follow up with the cluster administrators to see if I can get this to happen. To rebalance the file storage can I just set the replication factor using hadoop dfs? Chris On Wed, Mar 12, 2008 at 6:36 PM, Ted Dunning [EMAIL PROTECTED] wrote: What about just

Re: scaling experiments on a static cluster?

2008-03-12 Thread Ted Dunning
Yes. Increase the replication. Wait. Drop the replication. On 3/12/08 3:44 PM, Chris Dyer [EMAIL PROTECTED] wrote: Thanks-- that should work. I'll follow up with the cluster administrators to see if I can get this to happen. To rebalance the file storage can I just set the replication

Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I see the following paragraphs in the wiki (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample): Create a FileSystem (http://hadoop.apache.org/core/api/org/apache/hadoop/fs/FileSystem.html) instance by passing a new

Re: HDFS interface

2008-03-12 Thread Cagdas Gerede
I found the solution. Please let me know if you have a better idea. I added the following addResource lines: Configuration conf = new Configuration(); conf.addResource(new Path("location_of_hadoop-default.xml")); conf.addResource(new Path("location_of_hadoop-site.xml"));

Re: HDFS interface

2008-03-12 Thread Hairong Kuang
If you add the configuration directory to the class path, the configuration files will be automatically loaded. Hairong On 3/12/08 5:32 PM, Cagdas Gerede [EMAIL PROTECTED] wrote: I found the solution. Please let me know if you have a better idea. I added the following addResource lines.

Re: file permission problem

2008-03-12 Thread s29752-hadoopuser
Hi Johannes, Which version of hadoop are you using? There is a known bug in some nightly builds. Nicholas - Original Message From: Johannes Zillmann [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Wednesday, March 12, 2008 5:47:27 PM Subject: file permission problem Hi, i

Re: file permission problem

2008-03-12 Thread Johannes Zillmann
Hi Nicholas, I'm using the 0.16.0 distribution. Johannes [EMAIL PROTECTED] wrote: Hi Johannes, Which version of hadoop are you using? There is a known bug in some nightly builds. Nicholas - Original Message From: Johannes Zillmann [EMAIL PROTECTED] To:

Re: Pipes example wordcount-nopipe.cc failed when reading from input splits

2008-03-12 Thread 11 Nov.
I tried to specify WordCountInputFormat as the input format; here is the command line: bin/hadoop pipes -conf src/examples/pipes/conf/word-nopipe.xml -input inputdata/ -output outputdata -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat The MapReduce process seems not really