Hi Hadoop mavens-
I'm hoping someone out there will have a quick solution for me. I'm
trying to run some very basic scaling experiments for a rapidly
approaching paper deadline on a Hadoop 0.16.0 cluster that has ~20 nodes
with 2 procs/node. Ideally, I would want to run my code on clusters
with varying numbers of nodes.
My suggested procedure would be to take all but 2 nodes down, and then
- run test
- double number of nodes
- rebalance file storage
- lather, rinse, repeat
On 3/12/08 3:28 PM, Chris Dyer [EMAIL PROTECTED] wrote:
I've noticed this behavior as well in 0.16.0 with RuntimeExceptions in general.
Chris
On Mon, Mar 17, 2008 at 6:14 PM, Matt Kent [EMAIL PROTECTED] wrote:
I recently upgraded from Hadoop 0.14.1 to 0.16.1. Previously in 0.14.1,
if a map or reduce task threw a runtime exception such as an NPE, the
Hi all--
I would like to have a reducer generate a MapFile so that in later
processes I can look up the values associated with a few keys without
processing an entire sequence file. However, if I have N reducers, I
will generate N different map files, so to pick the right map file I
will need to
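Presumably the missing step is recomputing which reducer, and therefore which map file, a given key was routed to. Assuming the job used Hadoop's default HashPartitioner, that logic is tiny; this standalone sketch (class and method names are mine, not Hadoop's) reproduces it:

```java
// Standalone sketch of the default HashPartitioner's routing rule:
// a key emitted by the job lands in the output of reducer p, i.e. in
// the MapFile directory named part-0000p.
public class MapFilePicker {
    // Mirrors HashPartitioner.getPartition: non-negative hash mod #reducers.
    public static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Output directory entry that holds the key, e.g. "part-00002".
    public static String mapFileFor(String key, int numReducers) {
        return String.format("part-%05d", partitionFor(key, numReducers));
    }
}
```

A lookup would then open a MapFile.Reader on just that one part-NNNNN output instead of scanning all N of them.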
Setting the number of reducers to zero has the advantage that no
sorting of the intermediate values will occur, which can save
considerable amounts of time if the output from the mapper is large.
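For completeness, a map-only job in the old API is a one-line setting (a sketch, assuming a JobConf named conf):

```java
// Zero reducers: mapper output is written straight out, with no
// sort or shuffle phase at all.
conf.setNumReduceTasks(0);
```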
Chris
On Wed, Apr 30, 2008 at 6:47 PM, jkupferman [EMAIL PROTECTED] wrote:
You actually don't need
:36 PM, Chris Dyer [EMAIL PROTECTED] wrote:
Qin, since I can guess what you're trying to do with this (emit a
bunch of expected counts at the end of EM?), you can write output
during the call to close(). It involves having to store the output
collector object as a member of the class
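That pattern is easy to sketch without the Hadoop jars; here OutputCollector and the mapper skeleton are simplified stand-ins for the old org.apache.hadoop.mapred interfaces (all names below are illustrative, not the real API):

```java
// Minimal stand-in for the old mapred OutputCollector interface.
interface OutputCollector<K, V> {
    void collect(K key, V value);
}

// Accumulates expected counts across map() calls and emits them only in
// close(), via a collector reference saved from the last map() call.
class ExpectedCountEmitter {
    private OutputCollector<String, Double> savedCollector;
    private double expectedCount = 0.0;

    public void map(String key, double count, OutputCollector<String, Double> out) {
        savedCollector = out;   // stash the collector for use in close()
        expectedCount += count; // accumulate instead of emitting per record
    }

    // In Hadoop, the framework calls close() once after the last input record.
    public void close() {
        if (savedCollector != null) {
            savedCollector.collect("expected-count", expectedCount);
        }
    }
}
```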
Hi all-
One more question.
I'm looking for a lightweight way to serve data stored as key-value
pairs in a series of MapFiles or SequenceFiles. HBase/Hypertable
offer a very robust, powerful solution to this problem, with a bunch of
extra features like updates and column types, etc., that I don't need.
What I'm really after is on-disk storage in addition to in-memory (thus
avoiding the "if a machine goes down, data is lost" issue of memcached).
On Fri, Sep 19, 2008 at 10:54 AM, James Moore [EMAIL PROTECTED] wrote:
On Wed, Sep 17, 2008 at 10:05 PM, Chris Dyer [EMAIL PROTECTED] wrote:
Hi all-
I am using streaming with some c++ mappers and reducers. One of the
binaries I attempted to run this evening had a dependency on a shared
library that did not exist on my cluster, so it failed during
execution. However, the streaming framework didn't appear to
recognize this failure, and
I've been using protocol buffers to serialize the data and then
encoding them in base64 so that I can then treat them like text. This
obviously isn't optimal, but I'm assuming that this is only a short
term solution which won't be necessary when non-Java clients become
first class citizens of the
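The encode/decode step itself is trivial; this sketch stands in raw bytes for a real serialized protocol-buffer message, and uses java.util.Base64 (a modern convenience; in 2008 one would have pulled in a codec library):

```java
import java.util.Base64;

// Wraps a serialized record as a single text token with no tabs or
// newlines, so Hadoop streaming's line/tab record framing can't mangle it.
public class Base64Record {
    public static String encode(byte[] serialized) {
        return Base64.getEncoder().encodeToString(serialized);
    }

    public static byte[] decode(String token) {
        return Base64.getDecoder().decode(token);
    }
}
```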
I've noticed that the counters sometimes disappear from the jobdetails
page, but the job context object that is returned at the end of the
job always seems to have the correct values for my counters, which has
been my primary concern...
-Chris
On Sat, Nov 22, 2008 at 3:26 PM, Arthur van Hoff [EMAIL PROTECTED] wrote:
Hey Delip-
MapReduce doesn't really have any particular support for iterative
algorithms. You just have to put a loop in the control program and
set the output path from the previous iteration to be the input path
in the next iteration. This at least lets you control whether you
decide to keep
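The loop described here is just path bookkeeping in the control program; a sketch (the job-submission call itself is elided, so only the input/output chaining is shown):

```java
import java.util.ArrayList;
import java.util.List;

// Plans an iterative MapReduce run: iteration i reads the output
// directory written by iteration i-1.
public class IterativeDriver {
    // Returns one {input, output} path pair per iteration; each pair
    // would be handed to the job-submission code (e.g. JobClient.runJob).
    public static List<String[]> plan(String initialInput, String workDir, int iterations) {
        List<String[]> jobs = new ArrayList<>();
        String input = initialInput;
        for (int i = 0; i < iterations; i++) {
            String output = workDir + "/iter-" + i;
            jobs.add(new String[] { input, output });
            input = output; // next iteration's input is this output
        }
        return jobs;
    }
}
```

Whether to delete each iteration's output once the next one has consumed it is exactly the "decide to keep" choice mentioned above.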