I encountered a similar case.
Here is the Jira: https://issues.apache.org/jira/browse/HADOOP-2164
Runping
-Original Message-
From: Vadim Zaliva [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 15, 2008 9:59 PM
To: hadoop-user@lucene.apache.org
Subject: Re: unable to figure out
One way to achieve your goal is to implement your own
OutputFormat/RecordWriter classes.
Your reducer will emit all the key/value pairs as in the normal case.
Your record writer class can open multiple output files and dispatch each
key/value pair to the appropriate file based on the actual values.
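For illustration, a rough sketch against the old o.a.h.mapred API; the
DispatchingOutputFormat class and the rule for picking the destination file
are made up for this example:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

// Hypothetical output format: routes each record to a per-bucket file
// (derived from the value) instead of the single part-NNNNN file.
public class DispatchingOutputFormat extends FileOutputFormat<Text, Text> {

  public RecordWriter<Text, Text> getRecordWriter(final FileSystem ignored,
      final JobConf job, final String name, final Progressable progress)
      throws IOException {
    return new RecordWriter<Text, Text>() {
      private final TextOutputFormat<Text, Text> text =
          new TextOutputFormat<Text, Text>();
      private final Map<String, RecordWriter<Text, Text>> writers =
          new HashMap<String, RecordWriter<Text, Text>>();

      public void write(Text key, Text value) throws IOException {
        // The dispatch rule is made up: use the value's first tab field.
        String bucket = value.toString().split("\t")[0];
        RecordWriter<Text, Text> w = writers.get(bucket);
        if (w == null) {
          // Lazily open one extra output file per bucket in the task's
          // output directory, alongside the normal output.
          w = text.getRecordWriter(ignored, job, name + "-" + bucket, progress);
          writers.put(bucket, w);
        }
        w.write(key, value);
      }

      public void close(Reporter reporter) throws IOException {
        for (RecordWriter<Text, Text> w : writers.values()) {
          w.close(reporter);
        }
      }
    };
  }
}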
An improvement over Doug's proposal is to make the limit soft in the
following sense:
1. A job is entitled to run up to the limit number of tasks.
2. If there are free slots and no other job is waiting for its entitled
slots, a job can run more tasks than the limit.
3. When a job runs more tasks
Your problem may be related to:
http://issues.apache.org/jira/browse/HADOOP-1622
Runping
Ted,
That means going the HADOOP_CLASSPATH route, i.e. creating a separate
directory for those shared jars and then setting it once in
hadoop-env.sh. I think this will work for me too, I am
I have encountered similar problems many times too, especially when the input
data is compressed.
I had to raise the heap size to around 700MB to avoid OOM problems in the
mappers.
Runping
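For example, a sketch only: mapred.child.java.opts is the standard knob for
the child task JVMs, and HeapConfigSketch is a made-up driver class.

import org.apache.hadoop.mapred.JobConf;

public class HeapConfigSketch {
  public static void main(String[] args) {
    // Give each map/reduce child JVM roughly 700MB of heap so that mappers
    // decompressing large inputs do not hit OutOfMemoryError.
    JobConf conf = new JobConf(HeapConfigSketch.class);
    conf.set("mapred.child.java.opts", "-Xmx700m");
    // ... the rest of the job setup and JobClient.runJob(conf) would follow ...
  }
}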
-Original Message-
From: Devaraj Das [mailto:[EMAIL PROTECTED]
Sent: Friday, December 28, 2007 3:28 AM
To:
If your files have .gz as the extension, they will not be split.
Runping
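If you would rather force this in code regardless of the file extension, here
is a sketch against the old o.a.h.mapred API (the class name is made up):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical input format that never splits its files, whatever their size.
public class NonSplittableTextInputFormat extends TextInputFormat {
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}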
-Original Message-
From: Rui Shi [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 13, 2007 2:53 PM
To: hadoop-user@lucene.apache.org
Subject: How to ask hadoop not to split the input
Hi,
My input is a bunch of
The o.a.h.m.jobcontrol.JobControl class allows you to build a
dependency graph of jobs and submit them.
Runping
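A rough sketch of the pattern, where jobConfA and jobConfB stand for two
already configured JobConf objects (the names are made up):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class JobChainSketch {
  public static void runChain(JobConf jobConfA, JobConf jobConfB) throws Exception {
    Job jobA = new Job(jobConfA);
    Job jobB = new Job(jobConfB);
    jobB.addDependingJob(jobA);          // jobB starts only after jobA succeeds

    JobControl control = new JobControl("chain");
    control.addJob(jobA);
    control.addJob(jobB);

    // JobControl is a Runnable: drive it from a thread and poll until done.
    Thread driver = new Thread(control);
    driver.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);
    }
    control.stop();
  }
}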
-Original Message-
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 06, 2007 8:20 PM
To: hadoop-user@lucene.apache.org
Subject: RE: performance
Try to add the package name too:
o.a.h.m.SequenceFileAsTextInputFormat
Runping
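For instance, on the job's JobConf (a sketch, with conf being your JobConf):

// fully qualified, so there is no ambiguity about which class is meant
conf.setInputFormat(org.apache.hadoop.mapred.SequenceFileAsTextInputFormat.class);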
-Original Message-
From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
Sent: Friday, October 26, 2007 12:30 AM
To: hadoop-user@lucene.apache.org
Subject: problems reading compressed sequencefiles in
Subject: Re: Question about valueaggregators in 0.14.1...
Hi Runping and All:
That fixed the problem. Of course my aggregator is now failing for a
different reason, but that's an error in my code that I can fix.
I am extremely grateful for your assistance!
Thanks,
C G
Runping Qi
The values passed to reduce come from a disk-backed iterator.
The problematic part is computing the distinct count.
You have to keep the unique values in memory, or you have to use some other
trick.
One such trick is sampling. Another is to write the values out to
disk, do a merge sort, then read
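A sketch of the in-memory variant with the old reducer API (the key/value
types are assumed); it only works while the value set for a key fits in the
heap, which is exactly the limitation the tricks above work around:

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class DistinctCountReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, IntWritable> {

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Keeping every unique value in memory is the risky part.
    Set<String> distinct = new HashSet<String>();
    while (values.hasNext()) {
      distinct.add(values.next().toString());
    }
    output.collect(key, new IntWritable(distinct.size()));
  }
}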
Try to add something like the following lines in your build.xml:
<path id="project.classpath">
  ...
  <pathelement location="${hadoop.home}/contrib/hadoop-datajoin.jar"/>
  ...
</path>
Runping
-Original Message-
From: C G [mailto:[EMAIL PROTECTED]
Sent: Wednesday,
.../javac rules:
<classpath refid="proto.classpath"/>
Thanks,
C G
Runping Qi [EMAIL PROTECTED] wrote:
Try to add something like the following lines in your build.xml:
Runping
-Original Message-
From: C G [mailto:[EMAIL PROTECTED]
Sent
You can write map/reduce output to multiple files by implementing your own
output format class. The class can open multiple output files and, for each
key/value pair, write it to the appropriate one(s).
Runping
-Original Message-
From: C G [mailto:[EMAIL PROTECTED]
Sent: Friday,
The Hadoop Aggregate package (o.a.h.mapred.lib.aggregate) is a good fit for your
aggregation problem.
Runping
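To illustrate the idea, a made-up descriptor for the aggregate framework that
sums a count per first token of each record (the class name and the dispatch
rule are invented for this sketch):

import java.util.ArrayList;
import java.util.Map.Entry;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorBaseDescriptor;

public class TokenCountDescriptor extends ValueAggregatorBaseDescriptor {
  public ArrayList<Entry<Text, Text>> generateKeyValPairs(Object key, Object val) {
    // Emit a LongValueSum aggregator keyed by the first token, so the
    // framework's generic mapper/reducer sum up the 1s per token.
    ArrayList<Entry<Text, Text>> pairs = new ArrayList<Entry<Text, Text>>();
    String token = val.toString().trim().split("\\s+")[0];
    pairs.add(generateEntry(LONG_VALUE_SUM, token, new Text("1")));
    return pairs;
  }
}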
-Original Message-
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 07, 2007 12:09 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Loading data into HDFS
Seems your thinking is on the right track.
You can use one map/reduce job to split your input file containing the
complex numbers into the desired number of files. This should be easy to do.
Then you can run your main job on the split files, which will offer you the
desired parallelism.
One thing to keep
You can use the data_join lib in contrib to do your job.
Runping
-Original Message-
From: Alexandre Rochette [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 20, 2007 5:44 PM
To: hadoop-user@lucene.apache.org
Subject: 'Combining' input files for maps
Hello Hadoop users,
Hadoop supports random reads.
However, it does not support random writes.
Hadoop's files are write-once only. When you create a file, you can write to
it sequentially. Once you close it, it becomes read-only.
In order to replace a section of a file, you can create a temp file, get
data from the
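Along those lines, a sketch of the temp-file rewrite with the FileSystem API;
the helper class and its parameters are made up, and it assumes a reasonably
current FileSystem with delete(Path, boolean):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReplaceSectionSketch {
  // "Replace" bytes [offset, offset + oldLen) of src by rewriting the whole
  // file to tmp and then renaming it back, since DFS files are write-once.
  public static void replace(Configuration conf, Path src, Path tmp,
      long offset, long oldLen, byte[] replacement) throws IOException {
    FileSystem fs = src.getFileSystem(conf);
    FSDataInputStream in = fs.open(src);
    FSDataOutputStream out = fs.create(tmp);
    try {
      byte[] buf = new byte[64 * 1024];
      long remaining = offset;
      while (remaining > 0) {                       // copy the prefix
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) break;
        out.write(buf, 0, n);
        remaining -= n;
      }
      out.write(replacement);                       // write the new section
      in.seek(offset + oldLen);                     // skip the replaced bytes
      IOUtils.copyBytes(in, out, conf, false);      // copy the suffix
    } finally {
      in.close();
      out.close();
    }
    fs.delete(src, false);                          // swap tmp into place
    fs.rename(tmp, src);
  }
}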
In the current framework, each mapper task will create one combiner object
per partition per spill.
This is very costly, since each time a combiner is created, a new process is
actually spawned to execute the combiner executable. I suspect a job with a
streaming combiner may not run much
With HADOOP-1216, the framework will support the reduce=none feature by
setting numReduceTasks=0.
If a map/reduce job sets numReduceTasks=0, it will not create any reducer
tasks.
The mappers will not generate the map output files either.
Rather, each mapper will generate one DFS file in the
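In code it is just the following (a sketch, with conf being the job's JobConf):

// Zero reduce tasks makes the job map-only; each mapper writes its output
// straight to the job's output directory.
conf.setNumReduceTasks(0);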
Hi,
I am in the process of cleaning up Hadoop streaming.
I noticed there are some half-baked pieces, and I am not sure whether they
have ever been used/tested.
Your feedback will help a lot. Thanks a lot in advance.
TupleInputFormat
MergerInputFormat
PipeCombiner
MustangFile
One way to do that is to store your words in a DFS file.
In the configure method of your mapper class, you can read the words in from
the file and use them. You can use JobConf to pass the file name to the
mapper.
Runping
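A sketch of that pattern with the old mapper API; the property name
my.words.file and the filtering logic are invented for the example:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordFilterMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private final Set<String> words = new HashSet<String>();

  public void configure(JobConf conf) {
    try {
      // The file name is passed in through the JobConf by the job driver.
      Path path = new Path(conf.get("my.words.file"));
      FileSystem fs = path.getFileSystem(conf);
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(fs.open(path)));
      String line;
      while ((line = reader.readLine()) != null) {
        words.add(line.trim());
      }
      reader.close();
    } catch (IOException e) {
      throw new RuntimeException("could not load word file", e);
    }
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    // Emit only lines that contain one of the configured words.
    for (String w : words) {
      if (value.toString().contains(w)) {
        out.collect(value, key);
        break;
      }
    }
  }
}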
-Original Message-
From: Ilya Vishnevsky [mailto:[EMAIL
If the word set is small (< 100), it should be OK to stuff them into the
jobConf.
-Original Message-
From: Ilya Vishnevsky [mailto:[EMAIL PROTECTED]
Sent: Monday, March 19, 2007 9:25 AM
To: hadoop-user@lucene.apache.org
Subject: RE: Global information in MapReduce
Thanks,