RE: unable to figure out this exception from reduce task

2008-01-15 Thread Runping Qi
I encountered a similar case. Here is the Jira: https://issues.apache.org/jira/browse/HADOOP-2164 Runping

RE: writing output files in hadoop streaming

2008-01-14 Thread Runping Qi
One way to achieve your goal is to implement your own OutputFormat/RecordWriter classes. Your reducer will emit all the key/value pairs as in the normal case. Your record writer class can then open multiple output files and dispatch each key/value pair to the appropriate file based on the actual values.
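
A minimal sketch of such a record writer, assuming the old org.apache.hadoop.mapred API; the class name, the tab-delimited routing on the value's first field, and the per-category file naming are illustrative, not from the original mail:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.OutputFormat;
    import org.apache.hadoop.mapred.RecordWriter;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Progressable;

    public class DispatchingOutputFormat implements OutputFormat<Text, Text> {

      public RecordWriter<Text, Text> getRecordWriter(final FileSystem fs,
          final JobConf job, final String name, Progressable progress)
          throws IOException {
        final Path outDir = new Path(job.get("mapred.output.dir"));
        return new RecordWriter<Text, Text>() {
          // one lazily opened stream per routing category
          private final Map<String, FSDataOutputStream> streams =
              new HashMap<String, FSDataOutputStream>();

          public void write(Text key, Text value) throws IOException {
            // route on the first tab-delimited field of the value (illustrative)
            String category = value.toString().split("\t", 2)[0];
            FSDataOutputStream out = streams.get(category);
            if (out == null) {
              out = fs.create(new Path(outDir, category + "-" + name));
              streams.put(category, out);
            }
            out.writeBytes(key + "\t" + value + "\n");
          }

          public void close(Reporter reporter) throws IOException {
            for (FSDataOutputStream out : streams.values()) {
              out.close();
            }
          }
        };
      }

      public void checkOutputSpecs(FileSystem fs, JobConf job) throws IOException {
        // accept any output directory for this sketch
      }
    }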

RE: Question on running simultaneous jobs

2008-01-10 Thread Runping Qi
An improvement over Doug's proposal is to make the limit soft, in the following sense (a toy sketch of the check follows below):
1. A job is entitled to run up to the limit number of tasks.
2. If there are free slots and no other job is waiting for its entitled slots, a job can run more tasks than the limit.
3. When a job runs more tasks …
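
Purely as an illustration of rules 1 and 2, a self-contained toy predicate (hypothetical names; this is not Hadoop scheduler code):

    public class SoftLimit {
      // rule 1: a job may always launch while under its entitlement;
      // rule 2: beyond it, only if slots are free and no one is owed a slot.
      static boolean mayLaunch(int running, int limit,
                               int freeSlots, boolean othersWaitingForSlots) {
        if (running < limit) {
          return true;
        }
        return freeSlots > 0 && !othersWaitingForSlots;
      }
    }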

RE: Jar file location

2008-01-07 Thread Runping Qi
Your problem may be related to: http://issues.apache.org/jira/browse/HADOOP-1622 Runping
Ted: That means going the HADOOP_CLASSPATH route, i.e. creating a separate directory for those shared jars and then setting it once in hadoop-env.sh. I think this will work for me too, I am …

RE: question on Hadoop configuration for non cpu intensive jobs - 0.15.1

2007-12-28 Thread Runping Qi
I encountered similar problems many times too, especially when the input data is compressed. I had to raise the heap size to around 700MB to avoid OOM problems in the mappers. Runping
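
For reference, a minimal driver fragment showing how the task heap was typically raised in that era's API; the 700MB figure is from the mail, but the driver class and everything else here is illustrative:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class HeapDriver {                          // placeholder driver class
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(HeapDriver.class);
        conf.set("mapred.child.java.opts", "-Xmx700m"); // raise child task heap to ~700MB
        // ... set mapper/reducer classes, input and output paths as usual ...
        JobClient.runJob(conf);
      }
    }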

RE: How to ask hadoop not to split the input

2007-12-13 Thread Runping Qi
If your files have .gz as extension, they will not be split, since gzip is not a splittable format. Runping
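
For uncompressed files, the usual way to keep each input file whole is to override isSplitable; a minimal sketch assuming the old org.apache.hadoop.mapred API (the class name is illustrative):

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Hands each file to a single mapper, unsplit, regardless of size.
    public class WholeFileTextInputFormat extends TextInputFormat {
      protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
      }
    }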

RE: performance of multiple map-reduce operations

2007-11-06 Thread Runping Qi
The o.a.h.m.jobcontrol.JobControl class allows you to build a dependency graph of jobs and submit them. Runping
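
A hedged sketch of wiring two jobs together with it, assuming the old org.apache.hadoop.mapred API (the driver class and JobConf arguments are placeholders):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    public class ChainDriver {
      // conf1 and conf2 are fully configured JobConfs;
      // the second job reads the first job's output.
      public static void runChain(JobConf conf1, JobConf conf2) throws Exception {
        Job job1 = new Job(conf1);
        Job job2 = new Job(conf2);
        job2.addDependingJob(job1);        // job2 starts only after job1 succeeds

        JobControl control = new JobControl("chain");
        control.addJob(job1);
        control.addJob(job2);

        new Thread(control).start();       // JobControl is a Runnable
        while (!control.allFinished()) {
          Thread.sleep(1000);              // poll until the whole graph completes
        }
        control.stop();
      }
    }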

RE: problems reading compressed sequencefiles in streaming (0.13.1)

2007-10-26 Thread Runping Qi
Try to add the package name too: org.apache.hadoop.mapred.SequenceFileAsTextInputFormat Runping

RE: Question about valueaggregators in 0.14.1...

2007-10-12 Thread Runping Qi
Hi Runping and All: That fixed the problem. Of course my aggregator is now failing for a different reason, but that's an error in my code that I can fix. I am extremely grateful for your assistance! Thanks, C G

RE: large reduce group sizes

2007-10-11 Thread Runping Qi
The values passed to a reduce call come from a disk-backed iterator. The problematic part is computing the distinct count: you have to keep the unique values in memory, or use some other trick. One such trick is sampling. Another is to write the values out to disk, do a merge sort, and then read …
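
A minimal sketch of the in-memory variant under the old org.apache.hadoop.mapred API; it is only safe while each group's unique values fit in RAM (class name and types are illustrative):

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class DistinctCountReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, LongWritable> {
      public void reduce(Text key, Iterator<Text> values,
          OutputCollector<Text, LongWritable> output, Reporter reporter)
          throws IOException {
        Set<String> uniq = new HashSet<String>();
        while (values.hasNext()) {
          uniq.add(values.next().toString()); // holds every unique value in memory
        }
        output.collect(key, new LongWritable(uniq.size()));
      }
    }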

RE: build question

2007-09-27 Thread Runping Qi
Try to add something like the following lines in your build.xml:

    <path id="project.classpath">
      ...
      <pathelement location="${hadoop.home}/contrib/hadoop-datajoin.jar"/>
      ...
    </path>

Runping

RE: build question

2007-09-27 Thread Runping Qi
… added to the javac rules: <classpath refid="proto.classpath"/> Thanks, C G

RE: Multiple output files, and controlling output file name...

2007-09-21 Thread Runping Qi
You can write map/reduce output to multiple files by implementing your own output format class. The class can open multiple output files and, for each key/value pair, write it to the appropriate one(s). Runping

RE: Loading data into HDFS

2007-08-07 Thread Runping Qi
The Hadoop Aggregate package (o.a.h.mapred.lib.aggregate) is a good fit for your aggregation problem. Runping
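
A hedged sketch of a word-count style descriptor for that package; the method and constant names follow 0.18-era sources, so treat the details as assumptions:

    import java.util.ArrayList;
    import java.util.Map.Entry;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorBaseDescriptor;

    public class WordCountDescriptor extends ValueAggregatorBaseDescriptor {
      public ArrayList<Entry<Text, Text>> generateKeyValPairs(Object key, Object val) {
        ArrayList<Entry<Text, Text>> out = new ArrayList<Entry<Text, Text>>();
        for (String word : val.toString().split("\\s+")) {
          // LONG_VALUE_SUM asks the framework to sum the "1"s per word
          out.add(generateEntry(LONG_VALUE_SUM, word, new Text("1")));
        }
        return out;
      }
    }

The job itself would then be driven through the package's ValueAggregatorJob, with the descriptor class named in the job configuration.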

RE: Setting number of Maps

2007-07-03 Thread Runping Qi
It seems your thinking is on the right track. You can use one map/reduce job to split your input file containing the complex numbers into the desired number of files. This should be easy to do. Then you can run your main job on the split files, which will give you the desired parallelism. One thing to keep …
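
A hedged sketch of such a splitting job, assuming the old JobConf API of that era (class name, paths, and the reducer count are illustrative; setInputPath/setOutputPath were later moved to FileInputFormat/FileOutputFormat). Note that TextOutputFormat will prepend the byte-offset keys to each line, so a real job might use a small mapper that strips them:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class SplitDriver {
      public static void main(String[] args) throws Exception {
        JobConf split = new JobConf(SplitDriver.class);
        split.setJobName("split-input");
        split.setInputPath(new Path("numbers.txt"));   // era-specific setter
        split.setOutputPath(new Path("numbers-split"));
        split.setMapperClass(IdentityMapper.class);
        split.setReducerClass(IdentityReducer.class);
        split.setNumReduceTasks(20);                   // 20 part files -> 20 mappers later
        split.setOutputKeyClass(LongWritable.class);   // TextInputFormat's key type
        split.setOutputValueClass(Text.class);
        JobClient.runJob(split);
      }
    }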

RE: 'Combining' input files for maps

2007-06-20 Thread Runping Qi
You can use the data_join lib in contrib to do your job. Runping

RE: Does Hadoop support Random Read/Write?

2007-05-22 Thread Runping Qi
Hadoop supports random reads. However, it does not support random writes. Hadoop files are write-once: when you create a file, you can write to it sequentially, and once you close it, it becomes read-only. In order to replace a section of a file, you can create a temp file, get data from the …
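
A minimal sketch of a random read with the FileSystem API; the path, offset, and buffer size are illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RandomReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path("/data/input.bin"));
        byte[] buf = new byte[4096];
        in.seek(1 << 20);                  // jump to an arbitrary offset: random read
        in.readFully(buf, 0, buf.length);
        in.close();
        // To "replace a section": stream the original to a temp file, substitute
        // the new bytes while copying, then rename the temp file over the original.
      }
    }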

anybody use stream combiner feature?

2007-04-20 Thread Runping Qi
In the current framework, each mapper task will create one combiner object per partition per spill. This is very costly: each time a combiner is created, a new process is actually spawned to execute the combiner executable. I suspect a job with a stream combiner may not run much …

Real use scenario of streaming with Reduce=None

2007-04-20 Thread Runping Qi
With HADOOP-1216, the framework will support the reduce=none feature by setting numReduceTasks=0. If a map/reduce job sets numReduceTasks=0, it will not create any reduce tasks. The mappers will not generate map output files either; rather, each mapper will generate one DFS file in the …
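
A minimal driver sketch of that map-only mode under the old JobConf API (class and path names are placeholders; setInputPath/setOutputPath are the era's setters):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MapOnlyDriver {                       // placeholder driver class
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyDriver.class);
        conf.setNumReduceTasks(0);                     // map-only: no shuffle, no reducers
        conf.setInputPath(new Path("in"));
        conf.setOutputPath(new Path("map-only-out"));  // each mapper writes one DFS file here
        // ... set the mapper class as usual ...
        JobClient.runJob(conf);
      }
    }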

Anybody uses (or ever used) the following features/classes of Hadoop Streaming?

2007-04-05 Thread Runping Qi
Hi, I am in the process of cleaning up Hadoop streaming. I noticed some half-baked pieces, and I am not sure whether they have ever been used or tested. Your feedback will help a lot. Thanks a lot in advance. The classes in question: TupleInputFormat, MergerInputFormat, PipeCombiner, MustangFile

RE: Global information in MapReduce

2007-03-19 Thread Runping Qi
One way to do that is to store your words in a DFS file. In the configure method of your mapper class, you can read the words in from the file and use them. You can use JobConf to pass the file name to the mapper. Runping
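
A hedged sketch of that pattern under the old org.apache.hadoop.mapred API; the class name, the conf key my.words.file, and the filtering map logic are all made up for illustration:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class WordFilterMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private final Set<String> words = new HashSet<String>();

      public void configure(JobConf job) {
        // "my.words.file" is a made-up key the driver sets via conf.set(...)
        String file = job.get("my.words.file");
        try {
          FileSystem fs = FileSystem.get(job);
          BufferedReader in =
              new BufferedReader(new InputStreamReader(fs.open(new Path(file))));
          String line;
          while ((line = in.readLine()) != null) {
            words.add(line.trim());
          }
          in.close();
        } catch (IOException e) {
          throw new RuntimeException("cannot load word file", e);
        }
      }

      public void map(LongWritable key, Text value,
          OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        if (words.contains(value.toString())) {  // illustrative use of the word set
          out.collect(value, new Text("1"));
        }
      }
    }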

RE: Global information in MapReduce

2007-03-19 Thread Runping Qi
If the word set is small (under 100 or so), it should be OK to stuff them into the jobConf.
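
A tiny sketch of that variant (the key "my.words" and the values are made up):

    // Driver side: pack the small word set into the job configuration.
    conf.set("my.words", "alpha,beta,gamma");

    // Mapper side, inside configure(JobConf job):
    String[] words = job.get("my.words").split(",");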