Hello,
I am using Hadoop 0.19.2 and DataJoin (contrib/datajoin), and I'd like to
know if this is still maintained by anyone, or if there is a wiki page or
something where I could get more info.
I was looking at the Hadoop 0.21 release and it seems that this part of the
code has not changed.
I'd like
Hi all..
I have searched the documentation but could not find an input file
format which will give the line number as the key and the line as the value.
Did I miss something? Can someone give me a clue how to implement
such an input format?
Thanks,
Udaya.
Hi,
For global line numbers, you would need to know the ordering within each split
generated from the input file. The standard input formats provide offsets in
splits, so if the records are of equal length you can compute some kind of
numbering.
I remember someone had implemented sequential
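A minimal sketch of that offset-based idea with the old mapred API, assuming
fixed-length records of a known size (RECORD_LENGTH here is an assumed value):

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class LineNumberMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {
    // assumed fixed record size in bytes, including the newline
    private static final long RECORD_LENGTH = 80;

    public void map(LongWritable offset, Text line,
                    OutputCollector<LongWritable, Text> out, Reporter reporter)
        throws IOException {
      // offsets from TextInputFormat are file-wide, so this number is global
      long lineNumber = offset.get() / RECORD_LENGTH;
      out.collect(new LongWritable(lineNumber), line);
    }
  }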
One easy way is to increase the timeout by setting mapred.task.timeout in
mapred-site.xml.
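For example, a hypothetical mapred-site.xml entry (the value is in
milliseconds; 1200000 is just an example):

  <property>
    <name>mapred.task.timeout</name>
    <value>1200000</value>
  </property>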
On Thu, Jan 28, 2010 at 5:59 PM, #YONG YONG CHENG#
aarnc...@pmail.ntu.edu.sg wrote:
Good Day,
Is there any way to control the cleanup attempt of a failed map task
without changing the Hadoop
Thank you Amogh.
On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
For global line numbers, you would need to know the ordering within each
split generated from the input file. The standard input formats provide
offsets in splits, so if the records are of equal
I had the same doubt but could not find a clue. Please post the
code if you can find it.
On Thu, Jan 28, 2010 at 4:03 PM, Ravi ravindra.babu.rav...@gmail.com wrote:
Thank you Amogh.
On Thu, Jan 28, 2010 at 3:44 PM, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
For global line numbers,
Hi,
Here's the relevant thread with Gordon, the author of the solution:
I am in the process of learning Hadoop (and I think I've made a lot of
progress). I have described the specific problem and solution on my blog
Thank you Amogh
Ravi.
On 1/28/10, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
Here's the relevant thread with Gordon, the author of the solution:
I am in the process of learning Hadoop (and I think I've made a lot of
progress). I have described the specific problem and solution on my blog
Thank you Amogh. I will go through the link.
Udaya.
On 1/28/10, Ravi ravindra.babu.rav...@gmail.com wrote:
Thank you Amogh
Ravi.
On 1/28/10, Amogh Vasekar am...@yahoo-inc.com wrote:
Hi,
Here's the relevant thread with Gordon, the author of the solution:
I am in the process of learning
With the hostnames of the master and slaves added to /etc/hosts, and the entry
for 127.0.1.1 removed, it worked.
I had always specified the IP address instead of the hostname in the conf file. But
Hadoop uses the IP address only at startup; for all other operations, it
uses the hostname only. So I added the IP address in
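For reference, a hypothetical /etc/hosts on each node (the names and
addresses are examples only):

  192.168.1.10  master
  192.168.1.11  slave1
  192.168.1.12  slave2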
hello, all,
As a newbie, I had been used to the (k1, v1, k2, v2) style of
parameter lists for the map and reduce methods in the mapper and reducer (as is
written in many books), but after several failures, I found that in 0.20+, if we
extend the base class org.apache.hadoop.mapreduce.Mapper, the
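For reference, a minimal sketch of the 0.20+ style, where map receives a
Context instead of an OutputCollector and Reporter (the type parameters here
are just an example):

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;

  public class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // emit each line keyed by its text, with the byte offset as the value
      context.write(value, key);
    }
  }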
I met the same problem on WinXP+Cygwin and fixed it by
either:
- moving to a linux box (VMWare works very well)
or:
- configuring a mapred.child.tmp parameter in core-site.xml
I cannot explain why or how mapred.child.tmp is related to the problem.
From the source code, it seems to be a JVM issue on
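For reference, the entry I mean is something like this hypothetical
core-site.xml fragment (the path is only an example):

  <property>
    <name>mapred.child.tmp</name>
    <value>/tmp/hadoop-child</value>
  </property>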
Thanks Amogh.
For the second part of my question, I actually meant loading blocks separately
from HDFS. I don't know whether that is realistic. Anyway, my goal is to
process different divisions of a file separately; doing that at the split level is
OK. But even if I can get the splits from
Hi Le,
I don't think MapReduce can completely combine all the records with the same
key into one record. One such situation is when min.num.spills.for.combine is set
too high: if there are fewer spills than that threshold, the
combiner will not be invoked on those records.
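To make the implication concrete, a minimal sketch with the old mapred API
(Hadoop's IdentityReducer is used only as a stand-in): the combiner may be
invoked zero, one, or many times, so job correctness must never depend on it
having fully merged each key:

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityReducer;

  public class CombinerSetup {
    public static JobConf configure(JobConf job) {
      // the combiner is only an optimization hint; the framework decides
      // when (and whether) to run it during spills and merges
      job.setCombinerClass(IdentityReducer.class);
      job.setReducerClass(IdentityReducer.class);
      return job;
    }
  }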
Actually, I
Hi Gang,
Yes, PathFilters work only on file paths. I meant you can include that type of
logic at the split level.
The input format's getSplits() method is responsible for computing and adding
splits to a list container, from which the JT initializes mapper tasks. You can
override the getSplits() method
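A minimal sketch of that, overriding getSplits() on the old-API
TextInputFormat (the filtering rule here is just an example):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.mapred.FileSplit;
  import org.apache.hadoop.mapred.InputSplit;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.TextInputFormat;

  public class FilteringInputFormat extends TextInputFormat {
    @Override
    public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
      List<InputSplit> kept = new ArrayList<InputSplit>();
      for (InputSplit split : super.getSplits(job, numSplits)) {
        FileSplit fs = (FileSplit) split;
        // keep only splits whose file name matches a job-specific rule
        if (fs.getPath().getName().startsWith("part-")) {
          kept.add(fs);
        }
      }
      return kept.toArray(new InputSplit[kept.size()]);
    }
  }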
On Wed, Jan 27, 2010 at 3:08 PM, Ryan Smith ryan.justin.sm...@gmail.com wrote:
If you just want to use hadoop jars in your maven projects, run your own
caching archive repository manager like Nexus.
What I really want is to publish my own projects with the correct
dependencies, using artifacts
SS,
Unless I'm grossly mistaken, Nexus does exactly this. I have my own projects
that use Hadoop jars. I can easily add custom patched versions of Hadoop,
too. These Hadoop jars aren't in Maven Central, though; they're in my own
instance of Nexus. When I go into my custom Hadoop project and type:
On Thu, Jan 28, 2010 at 4:01 AM, Udaya Lakshmi udaya...@gmail.com wrote:
Hi all..
I have searched the documentation but could not find an input file
format which will give the line number as the key and the line as the value.
Did I miss something? Can someone give me a clue how to implement
one
On Thu, Jan 28, 2010 at 8:14 AM, steven zhuang zhuangxin8...@gmail.com wrote:
hello, all,
As a newbie, I had been used to the (k1, v1, k2, v2) style of
parameter lists for the map and reduce methods in the mapper and reducer (as is
written in many books), but after several failures, I found
Unfortunately, setting mapred.child.tmp doesn't help. Could you share
your sample config files?
What about VMWare - I am thinking about this as a last resort :)
On Thu, Jan 28, 2010 at 3:41 PM, Yang Li liy...@cn.ibm.com wrote:
I met the same problem on WinXP+Cygwin and fixed it by
either:
-
Jay Booth wrote:
Did you set io.file.buffer.size (or whatever the property is) to a large
value?
Just re-ran the benchmark with that bumped to 65536 (as proposed in
http://www.cloudera.com/blog/tag/configuration/). The benchmark is still
slower with jumbo frames than without (but the difference
Hi Erik,
With four priority levels like this, you should just be able to use Hadoop's
priorities, because it has five of them (very high, high, normal, low and very
low). You can just use the default scheduler for this (i.e. don't enable either
the fair or the capacity scheduler). Or am I
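A minimal sketch with the old mapred API (the surrounding job setup is
omitted):

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobPriority;

  public class PrioritySetup {
    public static JobConf withPriority(JobConf job) {
      // one of VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW
      job.setJobPriority(JobPriority.VERY_HIGH);
      return job;
    }
  }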
On 1/28/10 12:59 AM, Alex Parvulescu alex.parvule...@gmail.com wrote:
I am using Hadoop 0.19.2 and DataJoin (contrib/datajoin),
...
I'd like to know if I can submit a patch for this small project. It's nothing
much, I just added some generics. It's not perfect, but I think it's a good
start.
You
We're working on a patch that monkeys with the TCP buffers, because we're
seeing slowdowns with big transfers as well. It might be related...
On 1/28/10 9:25 AM, stephen mulcahy stephen.mulc...@deri.org wrote:
Jay Booth wrote:
Did you set io.file.buffer.size (or whatever the property is) to
Hi all,
I have a use case for collecting several rows of compressed/unstructured
data from MySQL (n rows), expanding the data set, and
storing the expanded results back into a MySQL DB (100,000n rows).
DBInputFormat seems to perform reasonably well, but DBOutputFormat is
inserting rows
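For what it's worth, a minimal sketch of the output side with the old mapred
API (the driver class, URL, table, and field names are assumptions):

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.db.DBConfiguration;
  import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

  public class DbJobSetup {
    public static JobConf configure(JobConf job) {
      DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
          "jdbc:mysql://dbhost/mydb", "user", "password");
      // write the expanded rows into the (hypothetical) "expanded" table
      DBOutputFormat.setOutput(job, "expanded", "id", "payload");
      return job;
    }
  }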
Hm, yes. See how few hits this shows:
http://search-hadoop.com/?q=non-distributed&fc_project=Hadoop
You can set it up on 1 box, but that's really useful only for development.
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
- Original Message
From: Ranganathan,
Hi All,
I got an NPE on a hadoop-0.18.1 datanode:
Exception in thread org.apache.hadoop.dfs.datanod...@107f7f7
java.lang.NullPointerException
at org.apache.hadoop.dfs.FSDataset.getMetaFile(FSDataset.java:571)
at org.apache.hadoop.dfs.FSDataset.updateBlock(FSDataset.java:801)
Good Day,
My Hadoop version is 0.19.1. I have successfully configured it to run on a
Windows machine.
Here is the configuration that I performed:
1. I put the Hadoop files under this folder C:\cygwin\usr\local\hadoop
2. Below is the hadoop-site.xml that I use.
<?xml version="1.0"?>
Yura Taras wrote:
Unfortunately, setting mapred.child.tmp doesn't help. Could you share
your sample config files?
What about VMWare - I am thinking about this as a last resort :)
On Thu, Jan 28, 2010 at 3:41 PM, Yang Li liy...@cn.ibm.com wrote:
I met the same problem on WinXP+Cygwin and
Thank you Jeff.
On 1/29/10, Jeff Zhang zjf...@gmail.com wrote:
Sorry for my mistake; writing your own InputFormat is actually not a
good idea. The cost of getting the line number of each split is a little
high.
On Fri, Jan 29, 2010 at 8:40 AM, Jeff Zhang zjf...@gmail.com wrote:
I'm
Hi,
When the framework splits a file, can it happen that one part of a
line falls in one split and the other part in some other split? Or is
the framework going to take care that it always splits at the end of
a line?
Thanks,
Udaya.
Hadoop will take care of it. If the split boundary would fall in the middle of a
line, the line will be extended till its end, though the split limit will be
exceeded by a few bytes.
On Thu, Jan 28, 2010 at 7:34 PM, Udaya Lakshmi udaya...@gmail.com wrote:
Hi,
When the framework splits a file,
When Hadoop runs multiple jobs concurrently, that is, when Hadoop is busy,
some jobs always have killed tasks, although the jobs succeed in the end.
Can anybody tell me why?
--
Regards
Junyong
The splitting does not know anything about the input file's internal logical
structure; for example, line-oriented text files are split on arbitrary byte
boundaries.
On Fri, Jan 29, 2010 at 1:49 AM, .ke. sivakumar kesivaku...@gmail.com wrote:
Hadoop will take care of it. If the split is supposed
On Fri, Jan 29, 2010 at 2:52 PM, john li lij...@gmail.com wrote:
When Hadoop runs multiple jobs concurrently, that is, when Hadoop is busy,
some jobs always have killed tasks, although the jobs succeed in the end.
Can anybody tell me why?
If they are only killed (not failed), don't mind it. The JobTracker schedules idle
I guess this would be a better answer:
A FileSplit is merely a description of the boundaries, e.g., bytes 0 through
N-1 of a file and bytes N through 2N-1. The Mapper then interprets the boundaries
described by a FileSplit in a way that makes sense at the data level. The
FileSplit does not actually physically
Hi,
In general, a file split may break records; it's the responsibility of the
record reader to present each record as a whole. If you use the standard available
InputFormats, the framework will make sure complete records are presented as
<key, value>.
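As an illustration (this is not Hadoop's actual LineRecordReader, just a
sketch of the convention): a reader for line-oriented splits skips the first,
possibly partial, line unless its split starts at byte 0, and reads one line
past its nominal end, so every line is read exactly once, whole:

  import java.io.IOException;
  import java.io.RandomAccessFile;

  public class SplitLineSketch {
    // print the complete lines "owned" by the byte range [start, end)
    public static void readSplit(String path, long start, long end)
        throws IOException {
      RandomAccessFile in = new RandomAccessFile(path, "r");
      try {
        in.seek(start);
        // unless we start at byte 0, our first line fragment belongs to the
        // previous split, which reads past its own end to finish it
        if (start != 0) {
          in.readLine();
        }
        // emit lines while they *start* inside our range; the last one may
        // run a few bytes past 'end', and that is expected
        while (in.getFilePointer() < end) {
          String line = in.readLine();
          if (line == null) break;
          System.out.println(line);
        }
      } finally {
        in.close();
      }
    }
  }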
Amogh
On 1/29/10 9:04 AM, Udaya Lakshmi
You can find out the reason from the JT logs (e.g., memory/timeout restrictions)
and adjust the timeout (mapred.task.timeout) or the memory parameters
accordingly. Refer to
http://hadoop.apache.org/common/docs/r0.20.0/cluster_setup.html
Cheers,
/R
On 1/29/10 12:22 PM, john li lij...@gmail.com wrote: