Thank you very much for sharing your experience! This is very helpful,
and we will take a look at Mogile.
I have two questions regarding your decision against HDFS. You mention
issues with scale regarding the number of files. Could you elaborate a
bit? At which orders of magnitude would you
Hi
I set the dfs.replication property to 1
and got this error while copying files to the DFS.
[EMAIL PROTECTED] ~/software/Hadoop/hadoop-0.16.0]$ bin/hadoop dfs
-copyFromLocal /home2/mtlinden/simdata/GASS-RDS-3-G/tm IDT
08/03/24 11:46:32 WARN fs.DFSClient: DataStreamer Exception:
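For reference, dfs.replication can also be set from client code rather than in hadoop-site.xml. A minimal sketch against the 0.16-era FileSystem API, with made-up paths standing in for the poster's simdata directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.setInt("dfs.replication", 1);   // one replica per block
FileSystem fs = FileSystem.get(conf);
// copy into HDFS under that replication setting (hypothetical paths)
fs.copyFromLocalFile(new Path("/local/simdata"), new Path("/user/mtlinden/IDT"));
// or lower the replication of an already-written file after the fact
fs.setReplication(new Path("/user/mtlinden/IDT"), (short) 1);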
Thanks for the clarification, dhruba :-)
Anyway, what can cause those other exceptions, such as "Could not get
block locations" and "DataXceiver: java.io.EOFException"? Can anyone
give me a little more insight into those exceptions?
And does anyone have a similar workload (frequent writes and
Hi Hemanth,
More design questions I'm wondering about:
So what determines the spread/location of data blocks that are
uploaded/added to HDFS outside of the Map/Reduce framework? For
instance, if I use dfs -put to upload files to HDFS, does the
DFS try to spread the blocks out across
Hi
I want to copy 1000 files (37GB) of data to the DFS. I have a setup
of 9-10 nodes, each with between 5 and 15GB of free space.
While copying the files from the local file system on nodeA, the node
fills up with data and the process stalls.
I have another free node with 80GB of
Copy from a machine that is *not* running as a data node in order to get
better balancing. Using distcp may also help because the nodes actually
doing the copying will be spread across the cluster.
You should probably be running a rebalancing script as well if your nodes
have differing sizes.
Ted Dunning wrote:
A few million files should fit pretty easily in HDFS.
One problem is that Hadoop is not designed with full high availability in
mind. Mogile is easier to adapt to those needs.
Sorry to be so persistent, but what failure scenario would Mogile handle
better than Hadoop HDFS or
Hi Ted
Thanks for the info. But running distcp I got this exception:
bin/hadoop distcp -update
file:///home2/mtlinden/simdata/GASS-RDS-3-G/tm /user/aolias/IDT
With failures, global counters are inaccurate; consider running with -i
Copy failed: org.apache.hadoop.ipc.RemoteException:
OK, it seems my file system is corrupted. How can I recover from this?
bin/hadoop fsck /
/tmp/hadoop-aolias/mapred/system/job_200803241610_0001/job.jar: Under
replicated blk_4445907956276011533. Target Replicas is 10 but found 7
replica(s).
...
Hi,
I have been unsuccessfully trying to set the map output value class to be
different from the one the reduce outputs (in 0.16.0). AFAIK the following should
do the trick:
conf.setMapOutputValueClass(FooWritable.class)
conf.setOutputValueClass(BarWritable.class)
However, I kept getting exceptions saying
From the exception stack it appears that the map output class is correctly
set to FooWritable.class but you are trying to collect BarWritable(s) in
your map tasks.
Best,
RB
On Mon, Mar 24, 2008 at 1:22 PM, Chang Hu [EMAIL PROTECTED] wrote:
Hi,
I have been unsuccessfully trying to set the
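For reference, a minimal sketch of the intended configuration against the 0.16-era JobConf API; FooWritable and BarWritable are the hypothetical types from the original post, and MyJob/MyMapper/MyReducer are stand-ins:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(BarWritable.class);    // final (reduce) output value
conf.setMapOutputValueClass(FooWritable.class); // intermediate (map) output value
conf.setMapperClass(MyMapper.class);    // must collect (Text, FooWritable)
conf.setReducerClass(MyReducer.class);  // must collect (Text, BarWritable)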
Map-reduce excels at gluing together files like this.
The map phase selects the key and makes sure that you have some way of
telling what the source of the record is.
The reduce phase takes all of the records with the same key and glues them
together. It can do your processing, but it is also
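A rough sketch of that glue pattern in the old (0.16-era) API. The join key is assumed to be the first tab-separated field; the "A:"/"B:" source tags and the ".a" extension test are made up for illustration:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class GlueExample {
  public static class GlueMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    private String tag;
    public void configure(JobConf job) {
      // map.input.file names the file this split came from
      tag = job.get("map.input.file").endsWith(".a") ? "A:" : "B:";
    }
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      String[] parts = line.toString().split("\t", 2);
      String rest = parts.length > 1 ? parts[1] : "";
      // emit the join key, with the record tagged by its source
      out.collect(new Text(parts[0]), new Text(tag + rest));
    }
  }
  public static class GlueReducer extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // all records sharing a key arrive here; glue them together
      StringBuilder glued = new StringBuilder();
      while (values.hasNext()) {
        if (glued.length() > 0) glued.append('|');
        glued.append(values.next().toString());
      }
      out.collect(key, new Text(glued.toString()));
    }
  }
}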
Thanks Riccardo, but that's not the case. I checked and made sure it's
collecting FooWritable. In fact, from the following thread:
http://www.nabble.com/Different-output-classes-from-map-and-reducer-td15728122.html
My exception is the same as if the map output value class had not been set.
- Chang
If the client you use to copy is one of the datanodes, then the first replica
will go to that datanode (the client) and the second will go to another, random
node in your cluster. This policy is designed to improve write performance. On the other
hand if you would like the data to be distributed, as Ted
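One way to see where the replicas of a file actually landed; note that this sketch uses the getFileBlockLocations call from a later FileSystem API than the 0.16 release discussed in this thread, and the path is hypothetical:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
FileStatus status = fs.getFileStatus(new Path("/user/example/part-00000"));
// print the datanodes holding each block's replicas
for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
  System.out.println(Arrays.toString(block.getHosts()));
}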
It's possible to do the whole thing in one round of map/reduce.
The only requirement is to be able to differentiate between the two
types of input files, possibly using different file name
extensions.
One of my coworkers wrote a smart InputFormat class that creates a
different
Code below, also attached. I put this together from the word count
example.
package edu.umd.cs.mapreduce;
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import
Chang Hu wrote:
Code below, also attached. I put this together from the word count
example.
The problem is with your combiner. When a combiner is specified, it
generates the final map output, since combination is a map-side
operation. Your combiner takes Text,IntWritable generated by
Yes, I did the test and it worked. I also ran the distcp command,
the parallel map/reduce copy.
It improves performance because files are copied locally on that
node, so there is no need for network transmission. But isn't that
policy weaker? If that node crashes (the worst case), you
Thanks Doug! I am able to run the job after removing the
setCombinerClass() line. Does it hurt efficiency, and how do I add a combiner?
- Chang
On Mon, Mar 24, 2008 at 6:26 PM, Doug Cutting [EMAIL PROTECTED] wrote:
Chang Hu wrote:
Code below, also attached. I put this together from the word
Good call. Thank you guys for helping me out. I'll do some experiments on
efficiency later and keep you guys updated.
- Chang
On Mon, Mar 24, 2008 at 6:51 PM, Riccardo Boscolo [EMAIL PROTECTED]
wrote:
That's simple: add a combiner that looks exactly like your reducer, but
collects
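Concretely, a minimal sketch of such a combiner, reusing the hypothetical FooWritable from this thread. Because the combiner runs map-side, it must consume and emit the map output types; the merge() call stands in for whatever combining logic the real type supports:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// nested inside the job class, as in the word count example
public static class Combine extends MapReduceBase
    implements Reducer<Text, FooWritable, Text, FooWritable> {
  public void reduce(Text key, Iterator<FooWritable> values,
                     OutputCollector<Text, FooWritable> out, Reporter reporter)
      throws IOException {
    FooWritable merged = values.next();
    while (values.hasNext()) {
      merged.merge(values.next());   // hypothetical merge operation
    }
    // emits (Text, FooWritable), the map output types, not (Text, BarWritable)
    out.collect(key, merged);
  }
}

It would then be registered with conf.setCombinerClass(Combine.class);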
It improves performance because files are copied locally on that
node, so there is no need for network transmission. But isn't that
policy weaker? If that node crashes (the worst case), you lose one
level of redundancy.
This policy was chosen for better write performance. As you mentioned, yes, in
I hate to point this out, but losing *any* data node will decrease the
replication of some blocks.
On 3/24/08 4:53 PM, lohit [EMAIL PROTECTED] wrote:
It improves performance because files are copied locally on that
node, so there is no need for network transmission. But isn't that
policy
sandybandy wrote:
Hi, I have put hadoop-core.jar and all dependent JARs in the webapp lib, and
also all the XMLBeans JARs, since my MapReduce program uses
those XMLBeans JARs to process XML documents. But when I submit a job
via a servlet, it says