scaling experiments on a static cluster?

2008-03-12 Thread Chris Dyer
Hi Hadoop mavens- I'm hoping someone out there will have a quick solution for me. I'm trying to run some very basic scaling experiments for a rapidly approaching paper deadline on a Hadoop 0.16.0 cluster that has ~20 nodes with 2 procs/node. Ideally, I would want to run my code on clusters of

Re: scaling experiments on a static cluster?

2008-03-12 Thread Chris Dyer
numbers. My suggested procedure would be to take all but 2 nodes down, and then
- run test
- double number of nodes
- rebalance file storage
- lather, rinse, repeat
On 3/12/08 3:28 PM, Chris Dyer [EMAIL PROTECTED] wrote: Hi Hadoop mavens- I'm hoping someone out there will have
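A minimal sketch of the doubling schedule in the procedure above, assuming a cluster of about 20 nodes and a starting point of 2 live nodes. `runTest` and `growAndRebalance` are hypothetical placeholders (they only print) for launching the benchmark job and for bringing nodes back up and rebalancing HDFS storage; they are not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the scaling procedure: start with a few live nodes, run the test,
// double the node count, rebalance storage, and repeat until the cluster is used up.
class ScalingSchedule {

    // Node counts to test: startNodes, 2*startNodes, 4*startNodes, ... <= clusterSize.
    static List<Integer> nodeCounts(int startNodes, int clusterSize) {
        if (startNodes < 1) {
            throw new IllegalArgumentException("startNodes must be >= 1");
        }
        List<Integer> counts = new ArrayList<>();
        // long avoids int overflow when doubling near Integer.MAX_VALUE
        for (long n = startNodes; n <= clusterSize; n *= 2) {
            counts.add((int) n);
        }
        return counts;
    }

    // Hypothetical placeholder: submit the benchmark job and record its runtime.
    static void runTest(int nodes) {
        System.out.println("run test on " + nodes + " nodes");
    }

    // Hypothetical placeholder: bring nodes back up, then rebalance HDFS blocks.
    static void growAndRebalance(int nodes) {
        System.out.println("bring cluster up to " + nodes + " nodes and rebalance storage");
    }

    public static void main(String[] args) {
        List<Integer> schedule = nodeCounts(2, 20);   // -> 2, 4, 8, 16
        for (int i = 0; i < schedule.size(); i++) {
            runTest(schedule.get(i));
            if (i + 1 < schedule.size()) {
                growAndRebalance(schedule.get(i + 1));
            }
        }
    }
}
```

With 20 nodes the doubling stops at 16; whether to add a final, non-doubled run on all 20 nodes is a judgment call for the experiment.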

Re: runtime exceptions not killing job

2008-03-17 Thread Chris Dyer
I've noticed this behavior as well in 0.16.0 with RuntimeExceptions in general. Chris On Mon, Mar 17, 2008 at 6:14 PM, Matt Kent [EMAIL PROTECTED] wrote: I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1, if a map or reduce task threw a runtime exception such as an NPE, the

using a set of MapFiles - getting the right partition

2008-03-20 Thread Chris Dyer
Hi all-- I would like to have a reducer generate a MapFile so that in later processes I can look up the values associated with a few keys without processing an entire sequence file. However, if I have N reducers, I will generate N different map files, so to pick the right map file I will need to
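One common way to pick the right file, sketched here under assumptions: re-apply the job's partitioning function to the lookup key. The formula matches the default hash partitioner (`(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`), and the `part-NNNNN` naming assumes the old-API convention for reducer output files. `String` keys are used only to keep the example self-contained; a real lookup must hash the same key type the job used (e.g. `Text`, whose `hashCode()` differs from `String`'s) with the same number of reducers. The old `mapred` API's `MapFileOutputFormat.getReaders`/`getEntry` appear to wrap this lookup; check the documentation for your Hadoop version.

```java
// Sketch: find the reducer output (MapFile) that must contain a key by
// re-applying the job's partitioning function. Only valid if the key type
// (and therefore hashCode()) and the reducer count match the original job.
class PartitionLookup {

    // Same arithmetic as the default hash partitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Assumed naming: reducer i writes its output under part-NNNNN (zero-padded).
    static String mapFileFor(Object key, int numReduceTasks) {
        return String.format("part-%05d", partitionFor(key, numReduceTasks));
    }

    public static void main(String[] args) {
        int numReducers = 8;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> " + mapFileFor(key, numReducers));
        }
    }
}
```

Only the one MapFile returned by `mapFileFor` then needs to be opened, instead of probing all N of them.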

Re: Getting map output as final output by setting number of reduce to zero

2008-04-30 Thread Chris Dyer
Setting the number of reducers to zero has the advantage that no sorting of the intermediate values will occur, which can save considerable amounts of time if the output from the mapper is large. Chris On Wed, Apr 30, 2008 at 6:47 PM, jkupferman [EMAIL PROTECTED] wrote: You actually don't need
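A toy, in-memory model of the point above, not Hadoop's implementation: a map-only job writes records in the order the mapper emits them, while a job with reducers must first sort every intermediate pair by key, which is the O(n log n) step that zero reducers avoid. In the old API, calling `setNumReduceTasks(0)` on the `JobConf` selects the map-only case.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: contrasts the order in which records reach the output.
class MapOnlyVsReduce {

    // Zero reducers: mapper output is written as emitted; no sort or shuffle.
    static List<Map.Entry<String, Integer>> mapOnly(List<Map.Entry<String, Integer>> emitted) {
        return new ArrayList<>(emitted);
    }

    // One or more reducers: all intermediate pairs are sorted by key first.
    static List<Map.Entry<String, Integer>> sortedForReduce(List<Map.Entry<String, Integer>> emitted) {
        List<Map.Entry<String, Integer>> copy = new ArrayList<>(emitted);
        copy.sort(Map.Entry.<String, Integer>comparingByKey());
        return copy;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        emitted.add(new AbstractMap.SimpleEntry<>("to", 1));
        emitted.add(new AbstractMap.SimpleEntry<>("be", 1));
        emitted.add(new AbstractMap.SimpleEntry<>("or", 1));
        System.out.println("map-only:    " + mapOnly(emitted));
        System.out.println("with reduce: " + sortedForReduce(emitted));
    }
}
```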

Re: Know how many records remain?

2008-08-21 Thread Chris Dyer
:36 PM, Chris Dyer [EMAIL PROTECTED] wrote: Qin, since I can guess what you're trying to do with this (emit a bunch of expected counts at the end of EM?), you can write output during the call to close(). It involves having to store the output collector object as a member of the class
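A minimal sketch of the pattern described above: accumulate during `map()`, keep a reference to the collector in a field, and emit everything from `close()`. `KeyValueCollector` is a local stand-in with the same shape as the old API's `OutputCollector` (a single `collect(key, value)` method) so the example compiles without Hadoop; the simplified `map(record, weight, collector)` signature and the EM-style weights are likewise assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in with the same shape as the old-API OutputCollector<K, V>,
// so this sketch compiles without Hadoop on the classpath.
interface KeyValueCollector<K, V> {
    void collect(K key, V value);
}

// Accumulates per-key totals (e.g. expected counts in an EM E-step) across all
// map() calls and emits them only once, from close().
class CountingMapper {
    private final Map<String, Double> expectedCounts = new HashMap<>();
    private KeyValueCollector<String, Double> output;  // remembered for close()

    // Hypothetical simplified signature: one record and its weight per call.
    void map(String record, double weight, KeyValueCollector<String, Double> collector) {
        if (output == null) {
            output = collector;                 // store the collector as a member
        }
        expectedCounts.merge(record, weight, Double::sum);  // accumulate, emit nothing yet
    }

    // Called once after the last map() call for this task.
    void close() {
        if (output == null) {
            return;                             // this task saw no input
        }
        for (Map.Entry<String, Double> e : expectedCounts.entrySet()) {
            output.collect(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        CountingMapper mapper = new CountingMapper();
        KeyValueCollector<String, Double> printer =
                (k, v) -> System.out.println(k + "\t" + v);
        mapper.map("a", 0.5, printer);
        mapper.map("b", 1.0, printer);
        mapper.map("a", 0.25, printer);
        mapper.close();                         // prints a 0.75 and b 1.0 (order not guaranteed)
    }
}
```

The cost of this approach is memory: everything accumulated in the map must fit in the task's heap until `close()` runs.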

Serving contents of large MapFiles/SequenceFiles from memory across many machines

2008-09-17 Thread Chris Dyer
Hi all- One more question. I'm looking for a lightweight way to serve data stored as key-value pairs in a series of MapFiles or SequenceFiles. HBase/Hypertable offer a very robust, powerful solution to this problem with a bunch of extra features like updates and column types, etc., that I don't

Re: Serving contents of large MapFiles/SequenceFiles from memory across many machines

2008-09-19 Thread Chris Dyer
an on-disk storage in addition to in-memory (thus avoiding the "if a machine goes down, data is lost" issue of memcached). On Fri, Sep 19, 2008 at 10:54 AM, James Moore [EMAIL PROTECTED] wrote: On Wed, Sep 17, 2008 at 10:05 PM, Chris Dyer [EMAIL PROTECTED] wrote: I'm looking for a lightweight

streaming silently failing when executing binaries with unresolved dependencies

2008-10-02 Thread Chris Dyer
Hi all- I am using streaming with some C++ mappers and reducers. One of the binaries I attempted to run this evening had a dependency on a shared library that did not exist on my cluster, so it failed during execution. However, the streaming framework didn't appear to recognize this failure, and

Re: Can anyone recommend me a inter-language data file format?

2008-11-03 Thread Chris Dyer
I've been using protocol buffers to serialize the data and then encoding them in base64 so that I can then treat them like text. This obviously isn't optimal, but I'm assuming that this is only a short term solution which won't be necessary when non-Java clients become first class citizens of the
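A small sketch of the base64 trick, assuming Java 8+ (`java.util.Base64`). The byte array stands in for a serialized protocol buffer (what a generated message's `toByteArray()` would return); protobuf itself is not used here. The example bytes include 0x0A, a newline, to show why raw binary cannot pass through a line-oriented text interface while the encoded form can.

```java
import java.util.Arrays;
import java.util.Base64;

// Binary record <-> single line of text, as described in the post.
class Base64Records {

    // Standard base64 uses only A-Z, a-z, 0-9, '+', '/' and '=', so the result
    // never contains the tab or newline bytes that line-oriented tools treat
    // as field and record separators.
    static String toTextLine(byte[] serialized) {
        return Base64.getEncoder().encodeToString(serialized);
    }

    // Reverses toTextLine; trim() drops a trailing newline if the line kept one.
    static byte[] fromTextLine(String line) {
        return Base64.getDecoder().decode(line.trim());
    }

    public static void main(String[] args) {
        // Stand-in for message.toByteArray(); note the raw 0x0A (newline) byte.
        byte[] record = {0x08, 0x0A, 0x12, 0x03, 'f', 'o', 'o'};
        String line = toTextLine(record);
        System.out.println(line);
        System.out.println("round trip ok: " + Arrays.equals(record, fromTextLine(line)));
    }
}
```

The price is size: base64 output is about 4/3 the length of the input bytes.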

Re: Counters missing from jobdetails.jsp page?

2008-11-22 Thread Chris Dyer
I've noticed that the counters sometimes disappear from jobdetails page, but the job context object that is returned at the end of the job always seems to have the correct values for my counters, which has been my primary concern... -Chris On Sat, Nov 22, 2008 at 3:26 PM, Arthur van Hoff [EMAIL

Re: Run Map-Reduce multiple times

2008-12-07 Thread Chris Dyer
Hey Delip- mapreduce doesn't really have any particular support for iterative algorithms. You just have to put a loop in the control program and set the output path from the previous iteration to be the input path in the next iteration. This at least lets you control whether you decide to keep
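A sketch of the control loop described above, with each iteration's output path reused as the next iteration's input path. `runIteration` and `deletePath` are hypothetical stubs that only record what they were asked to do; in a real driver `runIteration` would configure and submit one job and block until it completes, and `deletePath` would remove the directory from HDFS. The `keepIntermediate` flag illustrates the choice of keeping or discarding intermediate outputs; the `iter-N` directory naming is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class IterativeDriver {

    // Records of what the hypothetical stubs were asked to do.
    static final List<String[]> submitted = new ArrayList<>();
    static final List<String> deleted = new ArrayList<>();

    // Hypothetical stub: would submit one MapReduce job and wait for it to finish.
    static void runIteration(String inputPath, String outputPath) {
        submitted.add(new String[] {inputPath, outputPath});
    }

    // Hypothetical stub: would delete the directory from the filesystem.
    static void deletePath(String path) {
        deleted.add(path);
    }

    // Chains the iterations and returns the path holding the final output.
    static String run(String initialInput, String workDir, int iterations, boolean keepIntermediate) {
        String input = initialInput;
        for (int i = 0; i < iterations; i++) {
            String output = workDir + "/iter-" + i;
            runIteration(input, output);          // blocks until this iteration is done
            if (!keepIntermediate && i > 0) {
                deletePath(input);                // previous iteration's output, now consumed
            }
            input = output;                       // this output feeds the next iteration
        }
        return input;
    }

    public static void main(String[] args) {
        String result = run("input", "work", 3, false);
        for (String[] job : submitted) {
            System.out.println(job[0] + " -> " + job[1]);
        }
        System.out.println("final output: " + result + ", deleted: " + deleted);
    }
}
```

The original input is never deleted, because the `i > 0` check skips it on the first pass.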