Hi,
I implemented a customized input format in Java for a Map Reduce job.
The mapper and reducer classes are implemented in C++, using the Hadoop
Pipes API.
The package documentation for org.apache.hadoop.mapred.pipes states that "The
job may consist of any combination of Java and C++."
You should use the -pipes option in the command.
For the input format, you can pack it into the hadoop core class jar file,
or put it into the cache file.
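In other words, the invocation ends up looking something like this (jar, class,
and path names here are placeholders, not taken from the thread):
hadoop pipes -conf conf/myjob.xml -jar myformat.jar -inputformat com.example.MyInputFormat -input in -output out -program bin/my_cpp_binary
where myformat.jar contains the compiled Java input format and -program points
at the C++ executable.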
2008/4/8, Rahul Sood [EMAIL PROTECTED]:
Hi,
I implemented a customized input format in Java for a Map Reduce job.
The mapper and reducer
Maybe because you pass Strings to the LongWritables?
micha
11 Nov. wrote:
Hi folks,
I'm writing a little test program to check the writing speed of the DFS
file system, but can't get the file size using
fs.getFileStatus(file).getLen() or fs.getContentLength(file). Here is my
code:
I tried to play with the little test by attaching Eclipse to it when it
started. What surprised me is that the size could be obtained in Eclipse, and
the result file is written as expected. Can anybody explain this?
2008/4/8, 11 Nov. [EMAIL PROTECTED]:
Hi folks,
I'm writing a little test program
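Since the original code didn't make it into the digest, here is a minimal
sketch of such a test (0.16-era API; class name, path, and sizes are made up).
Note that the output stream is closed before the length is read; asking for the
size while the stream is still open is a common reason for getting 0 back:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsWriteSpeedTest {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/dfs-write-test");

        byte[] buf = new byte[64 * 1024];          // 64 KB per write
        long start = System.currentTimeMillis();

        FSDataOutputStream out = fs.create(file, true);
        for (int i = 0; i < 1024; i++) {           // ~64 MB total
            out.write(buf);
        }
        out.close();                               // close before reading the size

        long elapsed = System.currentTimeMillis() - start;
        long len = fs.getFileStatus(file).getLen();
        System.out.println(len + " bytes written in " + elapsed + " ms");
    }
}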
Hi,
I have implemented Key and value pairs in the following way:
Key (Text class) Value(Custom class)
word1
word2
class Custom {
    int freq;
    TreeMap<String, ArrayList<String>>
}
I construct key/value pairs of this type in the OutputCollector of the reduce
phase. Now I want to SORT this
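For a value emitted through the OutputCollector, the Custom class normally
needs to implement org.apache.hadoop.io.Writable so it can be serialized; a
rough sketch (the map field name "entries" is invented, since the snippet above
doesn't give one):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.io.Writable;

public class Custom implements Writable {
    int freq;
    TreeMap<String, ArrayList<String>> entries = new TreeMap<String, ArrayList<String>>();

    public void write(DataOutput out) throws IOException {
        out.writeInt(freq);
        out.writeInt(entries.size());
        for (Map.Entry<String, ArrayList<String>> e : entries.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (String s : e.getValue()) {
                out.writeUTF(s);
            }
        }
    }

    public void readFields(DataInput in) throws IOException {
        freq = in.readInt();
        entries.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            int m = in.readInt();
            ArrayList<String> list = new ArrayList<String>();
            for (int j = 0; j < m; j++) {
                list.add(in.readUTF());
            }
            entries.put(key, list);
        }
    }
}

Sorting by freq would then usually be done with a second job that emits freq as
the map output key.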
Hi!
Yes, I'm aware that it's not a good idea to build an ordinary filesystem on top
of Hadoop. Let's say that I'm trying to build a system for my users where every
user gets 500 GB of space. It seems that Hadoop can write/store 500 GB fine, but
reading and altering the data later isn't easy (at least not altering).
How
HDFS has slightly different design goals. It's not meant as a general-purpose
filesystem; it's meant as fast sequential input/output storage for Hadoop's
map/reduce.
Andreas
On Tuesday, 08.04.2008 at 16:24 +0300, Mika Joukainen wrote:
Hi!
Yes, I'm aware that it's not good
Hi,
We had a bad disk issue in one of the boxes and I am seeing
some strange behaviour. Just wanted to confirm whether this is
expected...
* We are running a small cluster with 10 data nodes and a name node
* Each data node has 6 disks
* While a job was running,
I'm invoking hadoop with the pipes command:
hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf
conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr
I tried the -file and -cacheFile options but when either of these is
passed to hadoop pipes, the command just
Running a job on my 5-node cluster, I get these intermittent exceptions in
my logs:
java.io.IOException: incorrect data check
        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
        at
I'd be happy to file a JIRA for the bug, I just want to make sure I understand
what the bug is: is it the misleading null pointer message or is it that
someone is listening on this port and not doing anything useful? I mean,
what is the configuration parameter dfs.secondary.http.address for?
The secondary Namenode uses the HTTP interface to pull the fsimage from
the primary. Similarly, the primary Namenode uses
dfs.secondary.http.address to pull the checkpointed fsimage back from
the secondary to the primary. So, the definition of
dfs.secondary.http.address is needed.
However,
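For reference, the parameter is just a host:port pair in the standard
configuration files; a minimal hadoop-site.xml entry might look like the
following (the value shown is only the usual default for the secondary
Namenode's HTTP port, not something taken from this thread):

<property>
  <name>dfs.secondary.http.address</name>
  <value>0.0.0.0:50090</value>
</property>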
On 4/8/08 10:43 AM, Natarajan, Senthil [EMAIL PROTECTED] wrote:
I would like to try using Hadoop.
That is good for education, probably bad for run time. It could take
SECONDS longer to run (oh my).
Do you mean to write another MapReduce program which takes the output of the
first
Thanks Ted.
I would like to try using Hadoop.
Do you mean to write another MapReduce program which takes the output of the
first MapReduce (the already existing file of this format)
IP Address    Count
1.2.5.42         27
2.8.6.6          24
7.9.24.13         8
7.9.6.9         201
And use count as the key and IP
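That is the usual approach: a second job whose mapper parses each "IP count"
line and emits the count as the key, so the framework sorts by it during the
shuffle. A rough sketch against the 0.16-era mapred API (class name and exact
line format are assumptions):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Second-pass mapper: reads "IP count" lines and emits the count as the key,
// so the shuffle sorts records by count.
public class CountAsKeyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, Text> {

    public void map(LongWritable key, Text value,
                    OutputCollector<IntWritable, Text> output, Reporter reporter)
            throws IOException {
        StringTokenizer tok = new StringTokenizer(value.toString());
        if (tok.countTokens() < 2) {
            return;                       // skip malformed lines
        }
        String ip = tok.nextToken();
        int count = Integer.parseInt(tok.nextToken());
        output.collect(new IntWritable(count), new Text(ip));
    }
}

IntWritable keys sort in ascending order by default; a descending sort would
need a comparator set via JobConf.setOutputKeyComparatorClass.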
Yuri,
The NullPointerException should be fixed as Dhruba proposed.
We do not have any secondary nn web interface as of today.
The http server is used for transferring data between the primary and the
secondary.
I don't see that we can display anything useful on the secondary web UI except
for the
The wiki has been down for more than a day, any ETA? I was going to search the
archives for the status, but I'm getting 403's for each of the Archive links on
the mailing list page:
http://hadoop.apache.org/core/mailing_lists.html
My original question was about specifying
Hello.
I'm using Hadoop to process several XML files, each with several XML
records, through a group of Linux servers. I am using an XMLInputFormat that
I found here in Nabble
(http://www.nabble.com/map-reduce-function-on-xml-string-td15816818.html),
and I'm using the TextOutputFormat with an
On Tuesday 08 April 2008 11:54:35 am Konstantin Shvachko wrote:
If you have anything in mind that can be displayed on the UI please let us
know. You can also file a jira for the issue; it would be good if this
discussion is reflected in it.
Well, I guess we could have an interface to browse the
Looks like it is up to me.
On 4/8/08 12:36 PM, Ian Tegebo [EMAIL PROTECTED] wrote:
The wiki has been down for more than a day, any ETA? I was going to search the
archives for the status, but I'm getting 403's for each of the Archive links on
the mailing list page:
The behavior seems correct.
Assuming "blacklisted" means the NameNode marked this node 'dead':
Murali Krishna wrote:
* We are running a small cluster with 10 data nodes and a name node
* Each data node has 6 disks
* While a job was running, one of the disks in one data node got
corrupted
So, in an attempt to track down this problem, I've stripped out most of the
files for input, trying to identify which ones are causing the problem.
I've narrowed it down, but I can't pinpoint it. I keep getting these
incorrect data check errors below, but the .gz files test fine with gzip.
Is
Colin, how about writing a streaming mapper which simply runs md5sum on each
file it gets as input? Run this task along with the identity reducer, and
you should be able to identify pretty quickly if there's an HDFS corruption
issue.
Norbert
On Tue, Apr 8, 2008 at 5:50 PM, Colin Freas [EMAIL
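Something along these lines should do it (the streaming jar path and
input/output directories are illustrative; the exact jar name depends on the
release):
hadoop jar contrib/streaming/hadoop-streaming.jar -input suspect-files -output md5check -mapper md5sum -reducer org.apache.hadoop.mapred.lib.IdentityReducer
Each map task then emits the checksum of the bytes it read from HDFS, so a task
that fails, or whose checksum differs from one computed locally, points at the
bad data.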
On Tue, Apr 8, 2008 at 12:36 PM, Ian Tegebo [EMAIL PROTECTED] wrote:
My original question was about specifying MaxMapTaskFailuresPercent as a job
conf parameter on the command line for streaming jobs. Is there a conf setting
like the following?
mapred.taskfailure.percent
The job
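For reference, the JobConf method setMaxMapTaskFailuresPercent is backed by the
property mapred.max.map.failures.percent (with a reduce-side counterpart,
mapred.max.reduce.failures.percent), which a streaming job should be able to
set on the command line, e.g. (paths and the value are placeholders):
hadoop jar contrib/streaming/hadoop-streaming.jar -jobconf mapred.max.map.failures.percent=20 -input in -output out -mapper mymapper.sh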
Unfortunately we do not have an API for the secondary NN that would allow
browsing the checkpoint.
I agree it would be nice to have one.
Thanks for filing the issue.
--Konstantin
Yuri Pradkin wrote:
On Tuesday 08 April 2008 11:54:35 am Konstantin Shvachko wrote:
If you have anything in mind
Hi everybody,
I have a question about fuse-j-hadoopfs. Does it handle the Hadoop
permissions?
I'm using Hadoop 0.16.3
Thanks
X