The task should be simple: I want to uppercase all the words of a
(large) file.
I tried the following:
- streaming mode
- the mapper is a Perl script that puts each line in uppercase (number of
mappers > 1)
- no reducer (number of reducers set to zero)
It works fine except for line or
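For reference, a minimal Java equivalent of the setup described above (map-only
job, every line uppercased, zero reducers); the class name and argument handling
are illustrative, not from the original post:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class UpperCaseJob {
      // Map-only: emit each input line uppercased; NullWritable keys are
      // dropped by TextOutputFormat, so the output is just the lines.
      public static class UpperMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, NullWritable, Text> {
        public void map(LongWritable key, Text line,
                        OutputCollector<NullWritable, Text> out, Reporter r)
            throws IOException {
          out.collect(NullWritable.get(), new Text(line.toString().toUpperCase()));
        }
      }

      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(UpperCaseJob.class);
        conf.setMapperClass(UpperMapper.class);
        conf.setNumReduceTasks(0);  // no reducer, as in the streaming setup
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }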
Hey TCK,
We operate a large cluster in which we run both HDFS and KFS on the
same nodes. We run two instances of KFS and one instance of HDFS in the
cluster:
- Our logs are in KFS and we have KFS setup in WORM mode (a mode in
which deletions/renames on files/dirs are permitte
You can see the Nutch code.
2009/3/13 Mark Kerzner
> Hi,
>
> How do I allow multiple nodes to write to the same index file in HDFS?
>
> Thank you,
> Mark
>
On 3/12/09 7:13 PM, "Vadim Zaliva" wrote:
> The machines have 4 disks each, striped.
> However I do not see disks being a bottleneck.
When you stripe, you automatically make every disk in the system have the
same speed as the slowest disk. In our experience, systems are more likely
to ha
Hi,
How do I allow multiple nodes to write to the same index file in HDFS?
Thank you,
Mark
Hi,
I am running map/reduce jobs on a cluster. How do I confirm that the slaves are
actually executing the map/reduce job spawned by the JobTracker at the
master? All the slaves are running the datanode and tasktracker daemons fine.
Thanks,
Richa Khandelwal
University Of California,
Santa Cruz.
Ph:425-241-
Are you seeing reducers getting spawned in the web UI? Then it is a bug.
If not, there won't be any reducers spawned; it could be a job-setup/
job-cleanup task that is running on a reduce slot. See HADOOP-3150 and
HADOOP-4261.
-Amareshwari
Chris K Wensel wrote:
May have found the answer, waiting on
For a simple test, set the replication on your entire cluster to 6:
hadoop dfs -setrep -R -w 6 /
This will triple your disk usage and probably take a while, but then you are
guaranteed that all data is local.
You can also get a rough idea from the 'Data-local map tasks' total field
in the Job Counters.
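If you would rather read that counter programmatically than from the web UI,
something like this should work with the old API. The group and counter name
strings are my assumption, based on the internal JobInProgress.Counter enum,
so verify them against your Hadoop version:

    import org.apache.hadoop.mapred.*;

    public class LocalityCheck {
      // Run a job and print its 'Data-local map tasks' counter.
      public static void run(JobConf conf) throws Exception {
        RunningJob job = JobClient.runJob(conf);
        Counters counters = job.getCounters();
        long dataLocal = counters.findCounter(
            "org.apache.hadoop.mapred.JobInProgress$Counter",
            "DATA_LOCAL_MAPS").getCounter();
        System.out.println("Data-local map tasks: " + dataLocal);
      }
    }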
The machines have 4 disks each, striped.
However I do not see disks being a bottleneck. Monitoring system activity
shows that CPU is utilized 2-70%, disk usage is moderate, while network
activity seems to be quite high. In this particular cluster we have 6 machines
and replication factor is 2. I wa
Hi All,
I am trying to use Hadoop streaming with "wget" to simulate a
distributed downloader.
The command line I use is:
./bin/hadoop jar contrib/streaming/hadoop-0.19.0-streaming.jar
-D mapred.reduce.tasks=0 -input urli -output urlo
-mapper /usr/bin/wget -outputformat
org.apache.hadoop.mapred
For your information - http://wiki.apache.org/hama/MatMult
On Wed, Nov 12, 2008 at 2:05 AM, He Chen wrote:
> Hi everyone,
>
> I use Hadoop to do matrix multiplication. I let the key store the row
> information, and let the value be the entire row, like this:
>
> 0 (this is the key) (
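For what it's worth, a rough Java sketch of that row-keyed scheme: each input
line carries a row index of A followed by the row's values, and a map-only job
multiplies it against a small matrix B held in memory. The input layout, class
name, and hardcoded B are my assumptions, not He Chen's code:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    // Input lines look like "rowIndex v0 v1 ..."; output is (rowIndex, row of C).
    public class RowTimesB extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, Text> {

      // Hardcoded for illustration; a real job would ship B via the
      // DistributedCache and load it in configure().
      private static final double[][] B = { { 1, 2 }, { 3, 4 } };

      public void map(LongWritable offset, Text line,
                      OutputCollector<IntWritable, Text> out, Reporter r)
          throws IOException {
        String[] parts = line.toString().trim().split("\\s+");
        int row = Integer.parseInt(parts[0]);
        StringBuilder c = new StringBuilder();
        for (int k = 0; k < B[0].length; k++) {    // one column of C at a time
          double sum = 0;
          for (int j = 0; j < B.length; j++)       // dot product: A-row x B-column
            sum += Double.parseDouble(parts[j + 1]) * B[j][k];
          if (k > 0) c.append(' ');
          c.append(sum);
        }
        out.collect(new IntWritable(row), new Text(c.toString()));
      }
    }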
TCK wrote:
How well does the read throughput from HDFS scale with the number of data
nodes?
For example, if I had a large file (say 10GB) on a 10 data node cluster, would the time taken to read this whole file in parallel (ie, with multiple reader client processes requesting different parts of
May have found the answer, waiting on confirmation from users.
Turns out 0.19.0 and .1 instantiate the reducer class when the task is
actually intended for job/task cleanup.
branch-0.19 looks like it resolves this issue by not instantiating the
reducer class in this case.
I've got a work
The next Bay Area Hadoop User Group meeting is scheduled for Wednesday,
March 18th at Yahoo! 2811 Mission College Blvd, Santa Clara, Building 2,
Training Rooms 5 & 6 from 6:00-7:30 pm.
Agenda:
"Performance Enhancement Techniques with Hadoop - a Case Study" - Milind
Bhandarkar
"RPMs for Hadoop D
Hi Aviad,
You are right. The Eclipse plugin cannot be compiled in Windows. See also
HADOOP-4310,
https://issues.apache.org/jira/browse/HADOOP-4310
Nicholas Sze
- Original Message
> From: Aviad sela
> To: Hadoop Users Support
> Sent: Thursday, March 12, 2009 1:00:12 PM
> Subje
Xeon vs. Opteron is likely not going to be a major factor. More important
than this is the number of disks you have per machine. Task performance is
proportional to both the number of CPUs and the number of disks.
You are probably using way too many tasks. Adding more tasks/node isn't
necessarily
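If the problem is simply too many concurrent tasks per node, the per-node slot
counts are the usual knob. A sketch of the relevant hadoop-site.xml entries on
each tasktracker; the values below are only placeholders, not a recommendation
for this particular cluster:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>

Note these are read by each TaskTracker at startup, so they belong in the
node's configuration, not in the per-job JobConf.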
Building the Eclipse project in Windows XP, using Eclipse 3.4,
results in the following error.
It seems that some of the jars needed to build the project are missing:
compile:
    [echo] contrib: eclipse-plugin
    [javac] Compiling 45 source files to
D:\Work\AviadWork\workspace\cur\W_ECLIPSE\E34_Hadoop_
One factor is that block size should minimize the impact of disk seeks.
For example, if a disk seeks in 10ms and transfers at 100MB/s, then a
good block size will be substantially larger than 1MB. With 100MB
blocks, seeks would only slow things by 1%.
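Spelling out that arithmetic with the numbers above:

    seek = 10 ms, transfer rate = 100 MB/s
    100 MB block: 10 ms seek + 1000 ms transfer -> 10/1010, about 1% overhead
      1 MB block: 10 ms seek +   10 ms transfer -> 10/20, i.e. 50% overhead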
Another factor is that, unless files are
Hey all
Have some users reporting intermittent spawning of Reducers when the
job.xml shows mapred.reduce.tasks=0 in 0.19.0 and .1.
This is also confirmed when jobConf is queried in the (supposedly
ignored) Reducer implementation.
In general this issue would likely go unnoticed since the d
I have a job that contains 2000 map tasks, and each map needs an hour
or so (the maps cannot be split because their input is a compressed
archive). How can I cap this job's maximum concurrent task count (map and
reduce) to leave resources for other urgent jobs?
Thanks.
I am running the same test, and the job that completes in 10 mins for the
(hk,lv) case is still running after 30 mins have passed for the (sk,hv) case.
It would be interesting to pinpoint the reason behind it.
On Wed, Mar 11, 2009 at 1:27 PM, Gyanit wrote:
>
> Here are exact numbers:
> # of (k,v) pairs = 1
Karthikeyan V wrote:
There is no specific procedure for configuring virtual machine slaves.
Make sure the following things are done.
I've used these as the beginning of a page on this:
http://wiki.apache.org/hadoop/VirtualCluster
jason hadoop wrote:
I am having trouble reproducing this one. It happened in a very specific
environment that pulled in an alternate SAX parser.
The bottom line is that Jetty expects a parser with particular capabilities,
and if it doesn't get one, odd things happen.
In a day or so I will have h
Kris Jirapinyo wrote:
Why would you lose the locality of storage-per-machine if one EBS volume is
mounted to each machine instance? When that machine goes down, you can just
restart the instance and re-mount the exact same volume. I've tried this
idea before successfully on a 10 node cluster on
It would be better to externalize this through either a template or, at
the least, message bundles.
- Mridul
evana wrote:
The out-of-the-box Hadoop implementation has some issues connecting to Oracle.
Looks like DBInputFormat is built with mysql/hsqldb in mind. You need to
modify the out of th
Dear all:
I have set "SkipBadRecords.setMapperMaxSkipRecords(conf, 1)",
and also "SkipBadRecords.setAttemptsToStartSkipping(conf, 2)".
However, after 3 failed attempts, it gave me this exception message:
java.lang.NullPointerException
at
org.apache.hadoop.io.seriali
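For comparison, a minimal skip-mode setup along the lines of the settings
quoted above; the class name is a placeholder, and the max-attempts value is
my assumption (skipping needs spare attempts to narrow down the bad record):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SkipBadRecords;

    public class SkipSetup {
      public static void main(String[] args) {
        JobConf conf = new JobConf(SkipSetup.class);
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);    // isolate 1 bad record
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2); // skip from attempt 3
        // Leave room above attemptsToStartSkipping for the skip-mode retries.
        conf.setMaxMapAttempts(8);
        // ... set mapper/input/output, then JobClient.runJob(conf)
      }
    }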
The out-of-the-box Hadoop implementation has some issues connecting to Oracle.
Looks like DBInputFormat is built with mysql/hsqldb in mind. You need to
modify the out-of-the-box implementation of the getSelectQuery method in
DBInputFormat.
WORKAROUND
Here is the code snippet... (remember this works on
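Since the snippet is cut off above, here is a sketch of the usual workaround,
reconstructed from the description rather than taken from evana's code: build an
Oracle-compatible paging query with nested ROWNUM filters in place of the
MySQL-style LIMIT/OFFSET clause that DBInputFormat generates by default:

    public class OracleSelectQuery {
      // Builds an Oracle-compatible paging query for one input split.
      // baseQuery is what DBInputFormat would otherwise page through, e.g.
      // "SELECT id, url FROM urls ORDER BY id" (table/columns made up here).
      static String getSelectQuery(String baseQuery, long start, long length) {
        // Oracle has no LIMIT/OFFSET, so emulate them with ROWNUM.
        return "SELECT * FROM (SELECT a.*, ROWNUM dbif_rno FROM ( "
            + baseQuery + " ) a WHERE ROWNUM <= " + (start + length)
            + ") WHERE dbif_rno > " + start;
      }
    }

A stable ORDER BY in the base query matters here, since otherwise different
splits may see rows in different orders.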