Hi, everybody. I am writing a project in C++ and want to use the power of
MapFile class (which belongs to org.apache.hadoop.io) of Hadoop. Can you
please tell me how I can write code in C++ using MapFile, or whether there
is no way to use the org.apache.hadoop.io API in C++ (libhdfs only helps with
Is it possible to write a map reduce job using multiple input files?
For example:
File 1 has data like - Name, Number
File 2 has data like - Number, Address
Using these, I want to create a third file which has something like - Name,
Address
How can a map reduce job be written to do this?
Hey Amandeep,
You can get the file name for a task via the map.input.file property. For
the join you're doing, you could inspect this property and output (number,
name) and (number, address) as your (key, value) pairs, depending on the
file you're working with. Then you can do the combination in
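For example, a mapper along these lines might do it (an untested sketch
using the old org.apache.hadoop.mapred API; the class name and the "file1"
check are made up, and the values carry an "n:"/"a:" prefix so the reducer
can tell a name from an address):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class JoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private boolean nameFile;

  public void configure(JobConf job) {
    // map.input.file holds the path of the file this task is reading
    nameFile = job.get("map.input.file").contains("file1");
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String[] fields = value.toString().split(",");
    if (nameFile) {
      // File 1: name, number -> key on number; "n:" marks a name value
      output.collect(new Text(fields[1].trim()),
          new Text("n:" + fields[0].trim()));
    } else {
      // File 2: number, address -> key on number; "a:" marks an address
      output.collect(new Text(fields[0].trim()),
          new Text("a:" + fields[1].trim()));
    }
  }
}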
Hi Mark,
Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by BytesWritable#getBytes() to use.
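For example, a small helper like this copies out just the valid prefix
(an untested sketch; the class and method names are made up):

import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

public class BytesWritableUtil {
  // getBytes() returns the whole backing array, which may be longer
  // than the stored data; only the first getLength() bytes are valid.
  public static byte[] validBytes(BytesWritable bw) {
    return Arrays.copyOf(bw.getBytes(), bw.getLength());
  }
}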
Tom
On Fri, Feb 6, 2009 at 5:41 AM, Mark Kerzner markkerz...@gmail.com wrote:
Hi,
I have written
Amandeep Khurana ama...@gmail.com writes:
Is it possible to write a map reduce job using multiple input files?
For example:
File 1 has data like - Name, Number
File 2 has data like - Number, Address
Using these, I want to create a third file which has something like - Name,
Address
How
Well, that obviously depends on the RDBMS's implementation. And although
the case is not as bad as you describe (otherwise you had better ask your
RDBMS vendor for your money back), your point is valid. But then
again, an RDBMS is not designed for that kind of work.
What do you mean by creating
Indeed, this was the answer!
Thank you,
Mark
On Fri, Feb 6, 2009 at 4:25 AM, Tom White t...@cloudera.com wrote:
Hi Mark,
Not all the bytes stored in a BytesWritable object are necessarily
valid. Use BytesWritable#getLength() to determine how much of the
buffer returned by
You put the files into a common directory, and use that as your input to the
MapReduce job. You write a single Mapper class that has an if statement
examining the map.input.file property, outputting number as the key for
both files, but address for one and name for the other. By using a
common
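A minimal driver for that setup might look like this (an untested sketch;
the paths are hypothetical and JoinMapper/JoinReducer are the made-up
mapper and reducer classes sketched elsewhere in this thread):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JoinJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(JoinJob.class);
    conf.setJobName("name-address-join");

    // Point the job at the one directory holding both input files
    FileInputFormat.setInputPaths(conf, new Path("/user/me/join-input"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/me/join-output"));

    conf.setMapperClass(JoinMapper.class);
    conf.setReducerClass(JoinReducer.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);

    JobClient.runJob(conf);
  }
}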
On Fri, Feb 6, 2009 at 2:40 PM, Fredrik Hedberg fred...@avafan.com wrote:
Well, that obviously depends on the RDBMS's implementation. And although the
case is not as bad as you describe (otherwise you had better ask your RDBMS
vendor for your money back), your point is valid. But then again, an RDBMS
There is currently no way to read MapFiles in any language other than
Java. You can write a JNI wrapper similar to libhdfs.
Alternatively, you could write the complete stack from scratch, though
this might prove very difficult or impossible. You might want to
check the ObjectFile/TFile
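For reference, the Java side such a wrapper would have to drive is fairly
small; reading a MapFile looks roughly like this (an untested sketch; the
path and key/value types are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // A MapFile is a directory holding "data" and "index" SequenceFiles
    MapFile.Reader reader = new MapFile.Reader(fs, "/user/me/my.map", conf);
    try {
      Text value = new Text();
      if (reader.get(new Text("some-key"), value) != null) {
        System.out.println("found: " + value);
      }
    } finally {
      reader.close();
    }
  }
}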
Hello,
I recently checked out revision 741606, and am attempting to run the
'test' ant task.
I'm new to building hadoop from source, so my problem is most likely
somewhere in my own configuration, but I'm at a bit of a loss as to
how to trace it.
The only environment variable that I've set for
Thanks Jeff...
I am not 100% clear about the first solution you have given. How do I get
the multiple files to be read and then fed into a single reducer? Should I
have multiple mappers in the same class with different job configs for
them, and run two separate jobs with one outputting the key as
Hey Tom,
I also got burned by this. Why does BytesWritable.getBytes() return
non-valid bytes? Or should we add a BytesWritable.getValidBytes() kind of
function?
Best
Bhupesh
-Original Message-
From: Tom White [mailto:t...@cloudera.com]
Sent: Fri 2/6/2009 2:25 AM
To:
How well does the read throughput from HDFS scale with the number of data nodes?
For example, if I had a large file (say 10GB) on a 10 data node cluster, would
the time taken to read this whole file in parallel (ie, with multiple reader
client processes requesting different parts of the file
On Feb 6, 2009, at 7:06 AM, Stefan Podkowinski wrote:
Another scenario I just thought of: what about current/realtime
data? E.g. 'select * from logs where date = today()'. Working with
'offset' may turn out to return different results after the table has
been updated and tasks are still
On Feb 6, 2009, at 11:00 AM, TCK wrote:
How well does the read throughput from HDFS scale with the number of
data nodes?
For example, if I had a large file (say 10GB) on a 10 data node
cluster, would the time taken to read this whole file in parallel
(ie, with multiple reader client
Hey all
I was trying to run the word count example on one of the hadoop systems I
installed, but when I try to copy the text files from the local file system
to the DFS, it throws up the following exception:
[mith...@node02 hadoop]$ jps
8711 JobTracker
8805 TaskTracker
8901 Jps
8419 NameNode
8642
I had to change the master on my running cluster and ended up with the same
problem. Were you able to fix it at your end?
Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Thu, Feb 5, 2009 at 8:46 AM, shefali pawar
Well, that's also implicit by design, and cannot really be solved in a
generic way. As with any system, it's not foolproof; unless you fully
understand what you're doing, you won't reliably get the result you're
seeking.
As I said before, the JDBC interface for Hadoop solves a specific
Ok. Got it.
Now, how would my reducer know whether the name is coming first or the
address? Is it going to be in the same order in the iterator as the files
are read (alphabetically) in the mapper?
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Fri,
I'm getting the following error while running my hadoop job:
09/02/06 15:33:03 INFO mapred.JobClient: Task Id :
attempt_200902061333_0004_r_00_1, Status : FAILED
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Unknown Source)
at
Ok. I was able to get this to run but have a slight problem.
*File 1*
1 10
2 20
3 30
3 35
4 40
4 45
4 49
5 50
*File 2*
a 10 123
b 20 21321
c 45 2131
d 40 213
I want to join the above two based on the second column of file 1. Here's
what I am getting as the
If it was me, I would prefix the map output values with "a:" and "n:",
"a:" for address and "n:" for name. Then in the reduce you can test the
value to see if it's the address or the name with if statements. No need
to worry about which one comes first; just make sure they both have been
set before
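A reducer along those lines might look like this (an untested sketch; it
assumes the map output values were prefixed with "n:" and "a:" as
described above):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class JoinReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String name = null;
    String address = null;
    while (values.hasNext()) {
      String v = values.next().toString();
      if (v.startsWith("n:")) {
        name = v.substring(2);
      } else if (v.startsWith("a:")) {
        address = v.substring(2);
      }
    }
    // Emit only when both sides of the join showed up for this key
    if (name != null && address != null) {
      output.collect(new Text(name), new Text(address));
    }
  }
}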