how to preserve original line order?

2009-03-12 Thread Roldano Cattoni
The task should be simple: I want to put in uppercase all the words of a (large) file. I tried the following: streaming mode; the mapper is a Perl script that puts each line in uppercase (number of mappers > 1); no reducer (number of reducers set to zero). It works fine except for line order
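Since each streaming map task handles its own split and there is no reduce phase, nothing puts the output back into file order. One common approach, sketched here in the Java API rather than the original Perl/streaming setup, is to keep the input byte offset as the key and run a single reducer so the framework's sort restores the original order:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch: TextInputFormat hands the mapper the byte offset of each line;
    // emitting it as the key lets the sort phase rebuild the file order.
    public class UppercaseMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {
      public void map(LongWritable offset, Text line,
                      OutputCollector<LongWritable, Text> out, Reporter reporter)
          throws IOException {
        out.collect(offset, new Text(line.toString().toUpperCase()));
      }
    }
    // Driver side (fragment): conf.setNumReduceTasks(1) plus the default
    // IdentityReducer yields one output file sorted by offset, i.e. original order.

The trade-off is a single reducer; the leading offsets can be stripped afterwards if a plain text file is needed.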

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-03-12 Thread Sriram Rao
Hey TCK, We operate a large cluster in which we run both HDFS and KFS on the same nodes. We run two instances of KFS and one instance of HDFS in the cluster: - Our logs are in KFS and we have KFS set up in WORM mode (a mode in which deletions/renames on files/dirs are permitte

Re: Creating Lucene index in Hadoop

2009-03-12 Thread 王红宝
You can see the Nutch code. 2009/3/13 Mark Kerzner > Hi, > > How do I allow multiple nodes to write to the same index file in HDFS? > > Thank you, > Mark >

Re: tuning performance

2009-03-12 Thread Allen Wittenauer
On 3/12/09 7:13 PM, "Vadim Zaliva" wrote: > The machines have 4 disks each, striped. > However I do not see disks being a bottleneck. When you stripe, you automatically make every disk in the system run at the speed of the slowest disk. In our experience, systems are more likely to ha
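For contrast, a minimal sketch of the JBOD-style layout that avoids the slowest-disk penalty: one directory per physical disk, listed comma-separated. The mount points below are assumptions, and on a real cluster these properties live in hadoop-site.xml on every node rather than in code.

    import org.apache.hadoop.conf.Configuration;

    public class JbodDirs {
      public static void main(String[] args) {
        // Assumed mount points /disk1../disk4; HDFS and MapReduce spread I/O
        // across the listed directories instead of relying on RAID-0 striping.
        Configuration conf = new Configuration();
        conf.set("dfs.data.dir",
            "/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data");
        conf.set("mapred.local.dir",
            "/disk1/mapred/local,/disk2/mapred/local,/disk3/mapred/local,/disk4/mapred/local");
        System.out.println(conf.get("dfs.data.dir"));
      }
    }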

Creating Lucene index in Hadoop

2009-03-12 Thread Mark Kerzner
Hi, How do I allow multiple nodes to write to the same index file in HDFS? Thank you, Mark

Child Nodes processing jobs?

2009-03-12 Thread Richa Khandelwal
Hi, I am running a cluster of map/reduce jobs. How do I confirm that the slaves are actually executing the map/reduce tasks spawned by the JobTracker at the master? All the slaves are running their datanodes and tasktrackers fine. Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-12 Thread Amareshwari Sriramadasu
Are you seeing reducers getting spawned from the web UI? Then it is a bug. If not, there won't be reducers spawned; it could be the job-setup/job-cleanup task that is running on a reduce slot. See HADOOP-3150 and HADOOP-4261. -Amareshwari Chris K Wensel wrote: May have found the answer, waiting on

Re: tuning performance

2009-03-12 Thread jason hadoop
For a simple test, set the replication on your entire cluster to 6: hadoop dfs -setrep -R -w 6 / This will triple your disk usage and probably take a while, but then you are guaranteed that all data is local. You can also get a rough idea from the Job Counters, 'Data-local map tasks' total field

Re: tuning performance

2009-03-12 Thread Vadim Zaliva
The machines have 4 disks each, striped. However I do not see disks being a bottleneck. Monitoring system activity shows that CPU is utilized 2-70%, disk usage is moderate, while network activity seems to be quite high. In this particular cluster we have 6 machines and the replication factor is 2. I wa

Hadoop Streaming throw an exception with wget as the mapper

2009-03-12 Thread Nick Cen
Hi All, I am trying to use Hadoop streaming with "wget" to simulate a distributed downloader. The command line I use is ./bin/hadoop jar -D mapred.reduce.tasks=0 contrib/streaming/hadoop-0.19.0-streaming.jar -input urli -output urlo -mapper /usr/bin/wget -outputformat org.apache.hadoop.mapred

Re: How to let key sorted in the final outputfile

2009-03-12 Thread Edward J. Yoon
For your information - http://wiki.apache.org/hama/MatMult On Wed, Nov 12, 2008 at 2:05 AM, He Chen wrote: > hi everyone > > I use hadoop to do matrix multiplication. I let the key store the row information, and let the value be the whole row, like this: > > 0 (this is the key)              (
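One detail worth calling out (an aside, not from the original thread): if the row index is emitted as Text, the shuffle sorts it lexicographically ("10" before "2"). A minimal sketch of the usual fix, using a numeric key type and a single reducer so the lone output file comes out globally ordered:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;

    public class SortedRowsJobSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setOutputKeyClass(IntWritable.class);  // numeric sort of row indices
        conf.setOutputValueClass(Text.class);       // the row itself
        conf.setNumReduceTasks(1);                  // one reducer => one globally sorted file
      }
    }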

Re: Batch processing with Hadoop -- does HDFS scale for parallel reads?

2009-03-12 Thread Raghu Angadi
TCK wrote: How well does the read throughput from HDFS scale with the number of data nodes ? For example, if I had a large file (say 10GB) on a 10 data node cluster, would the time taken to read this whole file in parallel (ie, with multiple reader client processes requesting different parts of

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-12 Thread Chris K Wensel
May have found the answer, waiting on confirmation from users. It turns out 0.19.0 and 0.19.1 instantiate the reducer class when the task is actually intended for job/task cleanup. branch-0.19 looks like it resolves this issue by not instantiating the reducer class in this case. I've got a work

Hadoop User Group Meeting (Bay Area) 3/18

2009-03-12 Thread Ajay Anand
The next Bay Area Hadoop User Group meeting is scheduled for Wednesday, March 18th at Yahoo! 2811 Mission College Blvd, Santa Clara, Building 2, Training Rooms 5 & 6 from 6:00-7:30 pm. Agenda: "Performance Enhancement Techniques with Hadoop - a Case Study" - Milind Bhandarkar "RPMs for Hadoop D

Re: Building Release 0.19.1

2009-03-12 Thread Tsz Wo (Nicholas), Sze
Hi Aviad, You are right. The eclipse plugin cannot be compiled on Windows. See also HADOOP-4310, https://issues.apache.org/jira/browse/HADOOP-4310 Nicholas Sze - Original Message > From: Aviad sela > To: Hadoop Users Support > Sent: Thursday, March 12, 2009 1:00:12 PM > Subje

Re: tuning performance

2009-03-12 Thread Aaron Kimball
Xeon vs. Opteron is likely not going to be a major factor. More important than this is the number of disks you have per machine. Task performance is proportional to both the number of CPUs and the number of disks. You are probably using way too many tasks. Adding more tasks/node isn't necessarily
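A sketch of the per-node knobs being discussed; the numbers are assumptions, not a recommendation, and in practice these settings go into hadoop-site.xml on each slave rather than into job code:

    import org.apache.hadoop.mapred.JobConf;

    public class SlotSettingsSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Rough rule of thumb: concurrent map tasks ~ number of cores/disks per node.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
      }
    }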

Building Release 0.19.1

2009-03-12 Thread Aviad sela
Building the Eclipse project on Windows XP using Eclipse 3.4 results in the following error. It seems that some of the jars needed to build the project are missing: compile: [echo] contrib: eclipse-plugin [javac] Compiling 45 source files to D:\Work\AviadWork\workspace\cur\W_ECLIPSE\E34_Hadoop_

Re: about block size

2009-03-12 Thread Doug Cutting
One factor is that block size should minimize the impact of disk seeks. For example, if a disk seeks in 10ms and transfers at 100MB/s, then a good block size will be substantially larger than 1MB. With 100MB blocks, seeks would only slow things by 1%. Another factor is that, unless files are
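Making the 1% figure explicit with the numbers given (10 ms seek, 100 MB/s transfer, 100 MB blocks):

    \text{seek overhead} = \frac{t_{\text{seek}}}{t_{\text{seek}} + B / r}
      = \frac{10\ \text{ms}}{10\ \text{ms} + 100\ \text{MB} / (100\ \text{MB/s})}
      = \frac{10}{1010} \approx 1\%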

Reducers spawned when mapred.reduce.tasks=0

2009-03-12 Thread Chris K Wensel
Hey all, I have some users reporting intermittent spawning of reducers when the job.xml shows mapred.reduce.tasks=0, in 0.19.0 and 0.19.1. This is also confirmed when the jobConf is queried in the (supposedly ignored) Reducer implementation. In general this issue would likely go unnoticed since the d
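For reference, a minimal sketch of the map-only setup being described, in the 0.19 JobConf API (the class name is a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    public class MapOnlyJobSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();   // placeholder driver
        conf.setNumReduceTasks(0);      // same effect as mapred.reduce.tasks=0
        // With zero reduces, map output goes straight to the output format;
        // anything occupying a reduce slot should only be the job setup/cleanup
        // tasks discussed elsewhere in this thread (HADOOP-3150, HADOOP-4261).
      }
    }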

How to limit concurrent task numbers of a job.

2009-03-12 Thread Zhou, Yunqing
Here I have a job that contains 2000 map tasks, and each map needs 1 hour or so (the maps cannot be split because the input is a compressed archive). How can I set this job's maximum number of concurrent tasks (map and reduce) to leave resources for other urgent jobs? Thanks.

Re: Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value

2009-03-12 Thread Richa Khandelwal
I am running the same test, and the job that completes in 10 mins for the (hk,lv) case is still running after 30 mins have passed for the (sk,hv) case. It would be interesting to pinpoint the reason behind it. On Wed, Mar 11, 2009 at 1:27 PM, Gyanit wrote: > > Here are exact numbers: > # of (k,v) pairs = 1

Re: using virtual slave machines

2009-03-12 Thread Steve Loughran
Karthikeyan V wrote: There is no specific procedure for configuring virtual machine slaves; just make sure the following things are done. I've used these as the beginning of a wiki page on this: http://wiki.apache.org/hadoop/VirtualCluster

Re: Extending ClusterMapReduceTestCase

2009-03-12 Thread Steve Loughran
jason hadoop wrote: I am having trouble reproducing this one. It happened in a very specific environment that pulled in an alternate sax parser. The bottom line is that jetty expects a parser with particular capabilities and if it doesn't get one, odd things happen. In a day or so I will have h

Re: Persistent HDFS On EC2

2009-03-12 Thread Steve Loughran
Kris Jirapinyo wrote: Why would you lose the locality of storage-per-machine if one EBS volume is mounted to each machine instance? When that machine goes down, you can just restart the instance and re-mount the exact same volume. I've tried this idea before successfully on a 10 node cluster on

Re: HADOOP Oracle connection workaround

2009-03-12 Thread Mridul Muralidharan
It would be better to externalize this through either a template or, at the least, message bundles. - Mridul evana wrote: The out-of-the-box Hadoop implementation has some issues connecting to Oracle. Looks like DBInputFormat is built keeping MySQL/HSQLDB in mind. You need to modify the out of th

How to skip bad records in .19.1

2009-03-12 Thread 柳松
Dear all: I have set the value "SkipBadRecords.setMapperMaxSkipRecords(conf, 1)", and also the "SkipBadRecords.setAttemptsToStartSkipping(conf, 2)". However, after 3 failed attempts, it gave me this exception message: java.lang.NullPointerException at org.apache.hadoop.io.seriali
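For context, a sketch of the skip-mode settings quoted above, plus a related knob that is easy to miss: skipping only starts after the configured number of failed attempts, so the task must be allowed enough retries for skip mode to engage. The max-attempts value here is an assumption, not from the post.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SkipBadRecords;

    public class SkipBadRecordsSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);    // as in the post
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2); // as in the post
        conf.setMaxMapAttempts(4);  // assumption: leave attempts for skip mode to run
      }
    }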

HADOOP Oracle connection workaround

2009-03-12 Thread evana
The out-of-the-box Hadoop implementation has some issues connecting to Oracle. Looks like DBInputFormat is built keeping MySQL/HSQLDB in mind. You need to modify the out-of-the-box implementation of the getSelectQuery method in DBInputFormat. WORKAROUND: here is the code snippet... (remember this works on
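The promised snippet is cut off above; as a rough illustration of the workaround being described, here is a hedged sketch of an Oracle-flavoured paging query built with ROWNUM, which is what getSelectQuery would need to emit instead of a MySQL/HSQLDB-style LIMIT/OFFSET clause. The class name, method name, and split parameters are placeholders, not the actual patch:

    // Hypothetical helper: ROWNUM-based paging for Oracle. 'start' and 'length'
    // would come from the DBInputFormat split; 'baseQuery' is the plain
    // SELECT ... FROM ... ORDER BY ... statement without any paging clause.
    public class OraclePagingSketch {
      static String oracleSelectQuery(String baseQuery, long start, long length) {
        return "SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( " + baseQuery
            + " ) a WHERE ROWNUM <= " + (start + length)
            + " ) WHERE dbif_rno > " + start;
      }
    }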