On May 19, 2009, at 12:13 AM, Foss User wrote:
I know that if a file is very large, it will be split into blocks and
the blocks would be spread out over various data nodes. I want to know
whether I can find out, through the GUI or logs, exactly which data
nodes contain which file blocks of a
... oh, and getting it to run a marathon too!
http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html
Owen & Arun
On May 5, 2009, at 4:47 AM, Christian Ulrik Søttrup wrote:
Hi all,
I have a job that creates very big local files, so I need to split it
across as many mappers as possible. The DFS block size I'm
using means that this job is only split into 3 mappers. I don't want
to change the HDFS-wide
On Apr 20, 2009, at 7:49 PM, Xie, Tao wrote:
I am new to Hadoop and am beginning to look into the code. I want to
know the
difference between RawLocalFileSystem and LocalFileSystem. I know
the latter
has the capability to do checksums. Is that all?
Pretty much.
Arun
On Apr 20, 2009, at 9:56 AM, Mark Kerzner wrote:
Hi,
I ran a Hadoop MapReduce task in local mode, reading and writing
from
HDFS, and it took 2.5 minutes. Essentially the same operations on
the local
file system without MapReduce took 1/2 minute. Is this to be
expected?
Hmm...
On Apr 14, 2009, at 9:11 AM, Jothi Padmanabhan wrote:
2. Framework kills the task because it did not progress enough
That should count as a 'failed' task, not 'killed' - it is a bug if we
are not counting timed-out tasks against the job...
Arun
On Apr 4, 2009, at 7:05 AM, Zheng Shao wrote:
I guess the performance will be bad, but we should still be able to
read/write the file. Correct?
Why do we throw an Exception?
java.util.zip.GzipCodec doesn't expose the underlying codec... and
that's critical for doing a *reset*. The native
I assume you have only 2 map and 2 reduce slots per tasktracker,
which totals 2 maps/reduces for your cluster. This means that with more
maps/reduces, they are serialized, 2 at a time.
Also, -m is only a hint to the JobTracker; you might see fewer/more
than the number of maps you have
On Feb 24, 2009, at 4:03 PM, bzheng wrote:
2009-02-23 14:27:50,902 INFO org.apache.hadoop.mapred.TaskTracker:
java.lang.OutOfMemoryError: Java heap space
That tells you that your TaskTracker is running out of memory, not
your reduce tasks.
I think you are hitting
On Feb 23, 2009, at 2:01 AM, Bing TANG wrote:
Hi, everyone,
Could someone tell me the principle of -file when using Hadoop
Streaming? I want to ship a big file to the slaves, so how does it work?
Does Hadoop use SCP to copy? How does Hadoop deal with the -file option?
No, -file just copies the file from
On Feb 10, 2009, at 12:24 PM, Mimi Sun wrote:
I see UnsatisfiedLinkError. Also I'm calling
System.getProperty("java.library.path") in the reducer and logging
it. The only thing that prints out is ...hadoop-0.18.2/bin/../lib/
native/Mac_OS_X-i386-32
I'm using Cascading, not sure if that
On Feb 10, 2009, at 11:06 AM, Mimi Sun wrote:
Hi,
I'm new to Hadoop and I'm wondering what the recommended method is
for using native libraries in mapred jobs.
I've tried the following separately:
1. set LD_LIBRARY_PATH in .bashrc
2. set LD_LIBRARY_PATH and JAVA_LIBRARY_PATH in
On Feb 8, 2009, at 11:26 PM, Taeho Kang wrote:
Dear All,
With Hadoop 0.19.0, the Reduce stage does not start until the Map stage
gets to
100% completion.
Has anyone faced a similar situation?
How many maps and reduces does your job have?
Arun
On Feb 6, 2009, at 12:39 PM, Bryan Duxbury wrote:
I'm seeing some strange behavior on my cluster. Jobs will be done
(that is, all tasks completed), but the job will still be running.
This state seems to persist for minutes, and is really killing my
throughput.
I'm seeing errors
On Feb 5, 2009, at 1:40 PM, S D wrote:
Is there a way to use the Reporter interface (or something similar
such as
Counters) with Hadoop streaming? Alternatively, how could STDOUT
be
intercepted for the purpose of updates? If anyone could point me to
documentation or examples that cover
On Jan 30, 2009, at 2:41 PM, Bill Au wrote:
Is there any way to cancel a job after it has been submitted?
bin/hadoop job -kill jobid
Arun
On Jan 13, 2009, at 7:29 AM, Gert Pfeifer wrote:
Hi,
I want to use an lzo file as input for a mapper. The record reader
determines the codec using a CompressionCodecFactory, like this:
(Hadoop version 0.19.0)
http://hadoop.apache.org/core/docs/r0.19.0/native_libraries.html
hth,
Arun
On Jan 9, 2009, at 12:09 AM, Saptarshi Guha wrote:
Hello,
Sorry for the puzzling subject. I have a single long running
/statement/ in my reduce method, so the framework might assume my
reduce is not responding and kill it.
I solved the problem in the map method by subclassing MapRunner,
On Dec 18, 2008, at 2:09 PM, Zheng Shao wrote:
mapred.compress.map.output is set to true, and the job has 6860
mappers and 300 reducers.
Several reducers failed because of an out-of-memory error in the
shuffle phase.
Error log:
2008-12-18 11:42:46,593 WARN org.apache.hadoop.mapred.ReduceTask:
On Dec 9, 2008, at 10:37 AM, Owen O'Malley wrote:
On Dec 9, 2008, at 2:22 AM, Devaraj Das wrote:
I know that the tasktracker/jobtracker doesn't have any command for
re-reading the configuration. There is built-in support for
restart/shut-down but those are via external scripts that
On Dec 5, 2008, at 12:32 PM, Craig Macdonald wrote:
I have a related question - I have a class which is both mapper and
reducer. How can I tell in configure() if the current task is a map or
a reduce task? Parse the taskid?
Get the taskid, then use
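A minimal sketch of that idea (assuming the old 0.18-era API, where
the framework sets the 'mapred.task.id' property in the JobConf):

// Sketch: distinguish map from reduce tasks in configure(). The task
// attempt id looks like "attempt_200812051232_0001_m_000003_0", and
// the "_m_"/"_r_" segment marks the task type.
public void configure(JobConf job) {
  String taskId = job.get("mapred.task.id");
  boolean isMap = taskId != null && taskId.contains("_m_");
  // branch on isMap as needed
}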
On Dec 5, 2008, at 2:43 PM, Songting Chen wrote:
To summarize the slow shuffle issue:
1. I think one problem is that the Reducer starts very
late in the process, slowing the entire job significantly.
Is there a way to let reducer start earlier?
On Dec 5, 2008, at 10:58 AM, charles du wrote:
Any update on this?
What is the available heapsize for the JobTracker? (HADOOP_HEAPSIZE or
set it in HADOOP_JOBTRACKER_OPTS in conf/hadoop-env.sh).
Do you remember how many total tasks (across all jobs) were executed
before the OOM?
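For reference, a sketch of bumping the heap in conf/hadoop-env.sh (the
values here are illustrative, not recommendations):

# Applies to all daemons started via the bin/ scripts (value in MB):
export HADOOP_HEAPSIZE=2000
# Or, to grow only the JobTracker's heap:
export HADOOP_JOBTRACKER_OPTS="-Xmx2048m"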
On Dec 6, 2008, at 11:40 AM, charles du wrote:
I used the default value, which I believe is 1000 MB. My cluster has
about
30 machines. Each machine is configured to run up to 5 tasks. We run
hourly and
daily jobs on the cluster. When the OOM happened, I was running a job
with 1500
- 1600
On Dec 3, 2008, at 5:49 AM, Zhou, Yunqing wrote:
I'm running a job on data of size 5TB. But currently it reports that
there is a block with a checksum error in the file. That causes a map
task failure, and then the whole job fails.
But the lack of one 64MB block will barely affect the final result.
So
On Nov 23, 2008, at 6:09 AM, Tim Williams wrote:
The Quickstart[1] suggests the minimum Java version is 1.5.x, but I
was only
successful getting the examples running after using 1.6. Thanks,
--tim
[1] - http://hadoop.apache.org/core/docs/current/quickstart.html
Thanks for pointing this
On Nov 7, 2008, at 12:12 PM, Brian MacKay wrote:
Looking for a way to dynamically terminate a job once Reporter in a
Map
job hits a threshold,
Example:
public void map(WritableComparable key, Text values,
OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
if(
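The reply is truncated here; one way to get this effect (a sketch, not
the original answer) is to poll the job's counters from the submitting
client and kill the job once a threshold is crossed. 'MyCounters' and
'THRESHOLD' below are hypothetical:

// Driver-side sketch using the old mapred API (exceptions elided).
RunningJob running = new JobClient(conf).submitJob(conf);
while (!running.isComplete()) {
  long hits = running.getCounters().getCounter(MyCounters.HITS);
  if (hits > THRESHOLD) {
    running.killJob();   // terminate the job from the client
    break;
  }
  Thread.sleep(5000);
}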
On Oct 30, 2008, at 1:16 PM, Scott Whitecross wrote:
Is the presentation online as well? (Hard to see some of the slides
in the video)
http://wiki.apache.org/hadoop/HadoopPresentations
Arun
On Oct 30, 2008, at 1:34 PM, Alex Loddengaard wrote:
Arun gave a great talk about debugging
It's possible that the JobTracker is under duress and unable to
respond to the TaskTrackers... what do the JobTracker logs say?
Arun
On Oct 29, 2008, at 12:33 PM, Aaron Kimball wrote:
Hi all,
I'm working with a 30 node Hadoop cluster that has just started
demonstrating some weird behavior.
On Oct 27, 2008, at 7:05 PM, Grant Ingersoll wrote:
Hi,
Over in Mahout (lucene.a.o/mahout), we are seeing an oddity with
some of our clustering code and Hadoop 0.18.1. The thread in
context is at: http://mahout.markmail.org/message/vcyvlz2met7fnthr
The problem seems to occur when
On Oct 22, 2008, at 2:52 PM, Yih Sun Khoo wrote:
I'd like to hear some good ways of passing constants from one job to
the next.
Unless I'm missing something: JobConf? An HDFS file? DistributedCache?
Arun
These are some ways that I can think of:
1) The obvious solution is to carry the
On Oct 10, 2008, at 12:52 AM, Edward J. Yoon wrote:
Hi,
To get the number of reduce_output_records, I wrote code like this:
long rows = rJob.getCounters().findCounter(
    "org.apache.hadoop.mapred.Task$Counter", 8,
    "REDUCE_OUTPUT_RECORDS")
    .getCounter();
There is no liblzo2.so there. Do I need to rename them to
liblzo2.so somehow?
--- On Thu, 10/9/08, Arun C Murthy [EMAIL PROTECTED] wrote:
From: Arun C Murthy [EMAIL PROTECTED]
Subject: Re: How to make LZO work?
To: core-user@hadoop.apache.org
Date: Thursday, October 9, 2008, 6:35 PM
On Oct 9
SequenceFile.Writer writer = SequenceFile.createWriter(fileSys, jobConf,
file, LongWritable.class, BytesWritable.class,
SequenceFile.CompressionType.BLOCK, new LzoCodec());
Rebuilding the library gave some weird error too.
--- On Fri, 10/10/08, Arun C Murthy [EMAIL PROTECTED] wrote:
From: Arun C
On Oct 9, 2008, at 5:58 PM, Songting Chen wrote:
Hi,
I have installed lzo-2.03 to my Linux box.
But still my code for writing a SequenceFile using LzoCodec returns
the following error:
util.NativeCodeLoader: Loaded the native-hadoop library
java.lang.UnsatisfiedLinkError: Cannot load
On Oct 3, 2008, at 1:10 AM, Devajyoti Sarkar wrote:
Hi Alan,
Thanks for your message.
The object can be read-only once it is initialized - I do not need
to modify
Please take a look at DistributedCache:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache
An
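A minimal usage sketch (paths and names here are hypothetical; see the
tutorial link above for the real guide):

// Driver: register a read-only HDFS file with the cache.
DistributedCache.addCacheFile(new URI("/user/me/lookup.dat"), job);

// Mapper/Reducer configure(): locate the localized copy on this node.
Path[] localFiles = DistributedCache.getLocalCacheFiles(job);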
Nathan,
On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:
Hello,
We have been doing some profiling of our MapReduce jobs, and we are
seeing that about 20% of our jobs' time is spent calling
FileSystem$Statistics.incrementBytesRead when we interact with the
FileSystem. Is there a way to
On Oct 1, 2008, at 10:17 AM, Terrence A. Pietrondi wrote:
I am trying to plan out my map-reduce implementation and I have some
questions of where computation should be split in order to take
advantage of the distributed nodes.
Looking at the architecture diagram
On Oct 1, 2008, at 11:07 AM, Per Jacobsson wrote:
I ran a job last night with Hadoop 0.18.0 on EC2, using the standard
small
AMI. The job was producing gzipped output, otherwise I haven't
changed the
configuration.
The final reduce steps failed with this error that I haven't seen
before:
/jira/browse/HADOOP-3659
You probably will need to monkey with LDFLAGS as well to get it
to work, but we've been able to build the native libs for the
Mac without too much trouble.
Doug Cutting wrote:
Arun C Murthy wrote:
You need to add libhadoop.so to your java.library.path.
libhadoop.so
, Oct 1, 2008 at 11:23 AM, Arun C Murthy [EMAIL PROTECTED]
wrote:
Do you still have the task logs for the reduce?
I suspect you are running into
http://issues.apache.org/jira/browse/HADOOP-3647 which we never could
reproduce reliably to pin it down or fix.
However, in light of http
Nathan,
You need to add libhadoop.so to your java.library.path.
libhadoop.so is available in the corresponding release in the lib/
native directory.
Arun
On Sep 30, 2008, at 11:14 AM, Nathan Marz wrote:
I am trying to use SequenceFiles with LZO compression outside the
context of a
On Sep 30, 2008, at 11:46 AM, Doug Cutting wrote:
Arun C Murthy wrote:
You need to add libhadoop.so to your java.library.path.
libhadoop.so is available in the corresponding release in the lib/
native directory.
I think he needs to first build libhadoop.so, since he appears
On Sep 30, 2008, at 1:37 PM, Bryan Duxbury wrote:
Hey all,
Why is it that FileSystem.rename returns true or false instead of
throwing an exception? It seems incredibly inconvenient to get a
false result and then have to go poring over the namenode logs
looking for the actual error
On Sep 29, 2008, at 3:11 AM, Geethajini C wrote:
Hi everyone,
In the example MultiFileWordCount.java
(hadoop-0.17.0), what happens when the statement
JobClient.runJob(job); is executed? What methods will be
called, in sequence?
This might help:
On Sep 29, 2008, at 2:52 PM, Saptarshi Guha wrote:
Setup:
I am running the namenode on A, the sec. namenode on B and the
jobtracker on C. The datanodes and tasktrackers are on Z1,Z2,Z3.
Problem:
However, the jobtracker is starting up on A. Here are my configs for
Jobtracker
This would
2008-09-25 17:12:18,250 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_200809180916_0027_r_07_2: Got 2 new map-outputs; number
of known map outputs is 21
2008-09-25 17:12:18,251 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_200809180916_0027_r_07_2 Merge of the inmemory files
On Sep 25, 2008, at 2:26 PM, Joe Shaw wrote:
Hi,
I'm trying to build an index using the index contrib in Hadoop
0.18.0, but the reduce tasks are consistently failing.
What did the logs for the task-attempt
'attempt_200809180916_0027_r_07_2' look like? Did the TIP/Job
succeed?
On Sep 23, 2008, at 11:41 AM, Joel Welling wrote:
Hi folks;
I have a small cluster, but each node is big: 8 cores each, with lots
of IO bandwidth. I'd like to increase the number of simultaneous map
and reduce tasks scheduled per node from the default of 2 to something
like 8.
My
==map phase==
input: key = LongWritable, value = Text
output: key = Text, value = LongWritable
==combiner==
input: key = Text, value = Iterator<LongWritable>
output: key = Text, value = Text
The combiner is a pure optimization and *cannot* change the output
types of the map, i.e. the combiner
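For the types quoted above, a legal combiner has to keep <Text,
LongWritable> both in and out; a sketch:

// Sketch: a combiner consumes and emits the map output types; it
// cannot turn <Text, LongWritable> into <Text, Text>.
public class SumCombiner extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {
  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new LongWritable(sum));
  }
}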
On Sep 21, 2008, at 2:05 PM, David Hall wrote:
(New to this list)
Hi,
My research group is setting up a small (20-node) cluster. All of
these machines are linked by NFS. We have a fairly entrenched
codebase/development cycle, and in particular we'd like to be able to
access user $CLASSPATHs
On Sep 16, 2008, at 12:26 PM, pvvpr wrote:
Hello,
A strange thing happened in my job. In the reduce phase, one of the
tasks' status shows 101.44% complete, runs up to some 102%, and then
successfully finishes back at 100%. Is this the right behavior?
Which version of Hadoop are you running; and are
On Sep 11, 2008, at 9:10 AM, pvvpr wrote:
Hello,
Never came across this error before. Upgraded to 0.18.0 this
morning and
ran a nutch fetch job. Got this exception in both the reduce
attempts of
a task and they failed. All other reducers seemed to work fine, except
for one task.
Any
On Sep 7, 2008, at 12:26 PM, Erik Holstad wrote:
Hi!
I'm trying to run a MR job, but it keeps on failing and I can't
understand
why.
Sometimes it shows output at 66% and sometimes 98% or so.
I had a couple of exceptions before that I didn't catch, which made
the job
fail.
The log file
On Aug 22, 2008, at 2:15 PM, Kevin wrote:
Why is -jobconf not recognized, and why is -D overwritten by the program
code?
For Hadoop Streaming:
-jobconf mapred.job.name=myjob
For java Map-Reduce applications:
-Dmapred.job.name=myjob
Arun
Best,
-Kevin
On Fri, Aug 22, 2008 at 2:05 PM,
On Aug 22, 2008, at 11:15 AM, Chris Gray wrote:
All,
I am using Hadoop with a test case set up by Michael Noll, found on
his web
page (http://www.michael-noll.com).
I have successfully run a job on a single-node cluster from his
examples. I
am trying to add an additional machine to the
It wouldn't be too much of a stretch to use Lustre directly...
although it isn't trivial either.
You'd need to implement the 'FileSystem' interface for Lustre, define
a URI scheme (e.g. lfs://) etc. Please take a look at the KFS/
S3 implementations.
Arun
On Aug 21, 2008, at 9:59 AM,
On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote:
Hello list,
Thought I would share this tidbit that frustrated me for a couple of
hours. Beware! Hadoop reuses the Writable objects given to the
reducer. For example:
Yes.
http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in
.
On Tue, Aug 12, 2008 at 5:07 PM, Arun C Murthy [EMAIL PROTECTED]
wrote:
On Aug 12, 2008, at 11:21 AM, charles du wrote:
Hi:
Does hadoop always start a new process for each map task?
Yes. http://issues.apache.org/jira/browse/HADOOP-249 is open to
optimize
that.
Till HADOOP-249 is fixed
On Aug 12, 2008, at 11:51 PM, 11 Nov. wrote:
Hi colleagues,
As you know, the append writer will be available in version 0.18.
We are
waiting here for the feature and want to know the rough time of the
release.
It's currently under vote; it should be released by the end of the
week if it
io.sort.mb and fs.inmemory.size.mb are way too high given you are
using the default -Xmx of 200M.
Bump both down to 100-200 and bump -Xmx up to 512M via
mapred.child.java.opts.
Arun
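For example, something along these lines in hadoop-site.xml (the
values are only a starting point):

<property>
  <name>io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>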
On Aug 12, 2008, at 1:26 PM, James Graham (Greywolf) wrote:
Environment specifications:
Hadoop 0.16.4 (stable)
On Aug 12, 2008, at 2:35 PM, steph wrote:
Is there a tunable in Hadoop to make it clean the jobcache entries?
The JobCache can be extremely greedy in terms of inodes, considering
that for each
task it starts by unjarring the jars (which can be big).
Which version of Hadoop are you running?
On Aug 12, 2008, at 11:52 AM, Stuart Sierra wrote:
Hello, list,
I've seen this question before, but haven't found an answer. If I run
a Hadoop job on a cluster (EC2), how can I download the stdout/stderr
logs from each mapper/reducer task? Are they stored somewhere in
HDFS, or just on the
On Aug 12, 2008, at 11:01 AM, stephanebrossier wrote:
Hi,
I am using hadoop 0.16.3 in a production environment.
We have been using our system for a few weeks already
and are constantly running out of inodes.
We've fixed related bugs in 0.17.2 - it should be released in the next
couple of
On Aug 12, 2008, at 3:15 PM, Ashish Venugopal wrote:
There is definitely functionality in normal mode that is not
available in
streaming, like the ability to write counters to instrument jobs. I
personally just use streaming, so I am interested to see if there are
further key differences...
On Aug 12, 2008, at 11:21 AM, charles du wrote:
Hi:
Does hadoop always start a new process for each map task?
Yes. http://issues.apache.org/jira/browse/HADOOP-249 is open to
optimize that.
Till HADOOP-249 is fixed, you could try and launch fewer, fatter maps
by doing more work on
On Jul 25, 2008, at 3:53 PM, Joydeep Sen Sarma wrote:
Just as an aside - there is probably a general perception that
streaming
is really slow (at least I had it).
The last time I did some profiling (in 0.15), the primary overheads from
streaming came from the scripting language (Python is
Charles,
The right forum for Pig is [EMAIL PROTECTED], I'm
redirecting you there... good luck!
Arun
On Jul 18, 2008, at 11:51 AM, charles du wrote:
Hi:
I just started learning Hadoop and Pig Latin. How can I get the number of
elements in a data bag?
For example, a data bag like the following has
On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:
Hi All,
I am using Hadoop Streaming. I am confused by the streaming
options -file and -cacheFile. It seems that they mean the same thing,
right?
The difference is that -file will 'ship' your file (local file) to
the cluster, while
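Roughly (a sketch; the jar name, HDFS URI and file names are made up):

# -file ships a local file along with the job submission:
bin/hadoop jar hadoop-streaming.jar \
    -file ./dict.txt -mapper mymapper.py ...

# -cacheFile points tasks at a file already in HDFS and symlinks it:
bin/hadoop jar hadoop-streaming.jar \
    -cacheFile hdfs://namenode:9000/user/me/dict.txt#dict ...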
On Jul 16, 2008, at 4:09 PM, Kylie McCormick wrote:
Hello (Again):
I've managed to get Map/Reduce on its feet and running, but the
JobClient
runs the map to 100% and then idles. At least, I think it's idling. It's
certainly not updating, and I let it run 10+ minutes.
I tried to get the
# bin/hadoop dfs -put conf input
08/06/29 09:38:42 INFO dfs.DFSClient:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /
user/root/input/hadoop-env.sh could only be replicated to 0 nodes,
instead of 1
Looks like your datanode didn't come up, anything in the logs?
On Jul 7, 2008, at 9:46 AM, Chris K Wensel wrote:
Hey all
Has anyone had success with RandomTextWriter?
I'm finding it fairly unstable on 0.16.x, haven't tried 0.17 yet
though.
What problems are you seeing? It seems to work fine for me...
Arun
On Jul 1, 2008, at 4:04 AM, novice user wrote:
Hi all,
I have a query regarding the functionality of the combiner.
Is it possible to skip the combiner for some of the outputs of the
mapper, so they are
sent directly to the reducer even though a combiner is specified in the
job configuration?
Because, I
On Jul 1, 2008, at 5:49 AM, boris starchev wrote:
java.io.IOException: File
/tmp/hadoop-bstarchev/mapred/system/job_200807011532_0001/job.jar
could only be replicated to 0 nodes, instead of 1
at
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
Looks like
On Jun 27, 2008, at 4:26 PM, Mori Bellamy wrote:
Hey all,
I was wondering if there's a way to allocate more heap space for
each mapper and reducer process that Hadoop spawns. I'm getting
this error:
Use the 'mapred.child.java.opts' parameter:
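e.g. something like this in hadoop-site.xml (the 1024m value is just
an example):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>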
On Jun 12, 2008, at 6:47 AM, montag wrote:
Hi,
I'm a new Hadoop user, so if this question is blatantly obvious, I
apologize. I'm trying to load a native shared library using the
DistributedCache as outlined in
https://issues.apache.org/jira/browse/HADOOP-1660?
On Jun 11, 2008, at 11:53 AM, Elia Mazzawi wrote:
We concatenated the files to bring them close to, but less than, 64MB,
and the difference was huge without changing anything else:
we went from 214 minutes to 3 minutes!
*smile*
How many reduces are you running now? 1 or more?
Arun
Elia
On Jun 10, 2008, at 2:48 PM, Meng Mao wrote:
I'm interested in the same thing -- is there a recommended way to
batch
Hadoop jobs together?
Hadoop Map-Reduce JobControl:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Job
+Control
and
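A rough sketch of chaining two jobs with the
org.apache.hadoop.mapred.jobcontrol classes (conf1/conf2 stand in for
your JobConfs; exceptions elided):

Job first = new Job(conf1);
Job second = new Job(conf2);
second.addDependingJob(first);   // second runs only after first succeeds
JobControl control = new JobControl("batch");
control.addJob(first);
control.addJob(second);
new Thread(control).start();     // JobControl implements Runnable
while (!control.allFinished()) {
  Thread.sleep(1000);
}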
On Jun 10, 2008, at 3:16 PM, Miles Osborne wrote:
Is there support for counters in streaming? In particular, it
would be nice
to be able to access these after a job has run.
Yes. Streaming applications can update counters in hadoop-0.18:
http://issues.apache.org/jira/browse/HADOOP-1328
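The mechanism is to write specially formatted lines to stderr from
your streaming script, e.g.:

reporter:counter:MyGroup,MyCounter,1
reporter:status:processed 1000 records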
On May 21, 2008, at 10:45 PM, Taeho Kang wrote:
Dear all,
I am trying to use the DistributedCache class for distributing files
required
for running my jobs.
While the API documentation provides good guidelines,
are there any tips or usage examples (e.g. sample code)?
On May 20, 2008, at 2:21 AM, Marianne Spiller wrote:
Can anyone please give me a hint?
The BindExceptions below tell me that there is something already
running on those ports... maybe you need to kill them if they are
Hadoop daemons.
Arun
2008-05-15 12:51:55,077 FATAL
On May 20, 2008, at 9:03 AM, Saptarshi Guha wrote:
Hello,
Does the 'Data-local map tasks' counter mean the number of tasks
that had the input data already present on the machine they
are running on? I.e., there wasn't a need to ship the data to them.
Yes. Your understanding is
On May 19, 2008, at 1:37 AM, Fabrizio detto Mario wrote:
How does Hadoop manage a failure of the JobTracker (master node)?
For example, Google's MapReduce implementation aborts the MapReduce
computation if
the master fails.
As with the NameNode and SecondaryNameNode, does a SecondaryJobTracker
exist?
Wang,
On May 13, 2008, at 8:12 AM, wangxiaowei wrote:
hi all:
I use two computers, A and B, as a Hadoop cluster; A is the JobTracker
and NameNode, and both A and B are slaves.
The input data size is about 80MB, including 100,000 records. The job
is to read one record at a time and find some useful
<property>
  <name>hadoop.tmp.dir</name>
  <value>tmp_storage</value>
</property>
Could you try and change the above to an absolute path and check?
That path should be relevant on each of the tasktrackers.
Of course, you can configure each tasktracker independently by
editing its hadoop-site.xml.
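i.e. something like this (the absolute path is just an example):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp</value>
</property>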
On May 7, 2008, at 12:33 AM, Hadoop wrote:
Is there any chance to see some simple programs for Hadoop (such as
Hello
world, counting numbers 1-10, reading two numbers and printing the
larger
one, and other number, string, and file processing examples, etc.)
written in
Java/C++.
It seems
On May 7, 2008, at 6:30 AM, Roberto Zandonati wrote:
Hi all, I'm a newbie and I have the following problem.
I need to implement an InputFormat such that isSplitable always
returns false, as shown in http://wiki.apache.org/hadoop/FAQ (question
no. 10).
And here is the problem.
I
Yi,
On May 3, 2008, at 1:02 AM, Yi Wang (王益) wrote:
It seems org.apache.hadoop.mapred.join implements the function of
joined inputs. I am wondering whether Hadoop allows Mapper outputs to
go to multiple output channels?
There is a MultipleOutputCollector being currently worked on:
On Apr 30, 2008, at 8:14 AM, ncardoso wrote:
Hello,
I'm using Hadoop for distributed text mining of a large collection of
documents, and in my optimization process, I want to speed things up
a bit,
and I want to know how I can do this step with Hadoop...
Each Map process takes a group of
On Apr 23, 2008, at 8:14 PM, Ashish Venugopal wrote:
2008-04-23 19:43:23,848 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess failed with code 14
Looks like your streaming command failed with an error code of 14; could
you check why your command failed?
On Apr 23, 2008, at 7:51 AM, Apurva Jadhav wrote:
There are six reducers and 24000 mappers because there are 24000
files.
The number of tasks per node is 2.
mapred.child.java.opts is at the default value of 200m. What is a good
value for this? My mappers and reducers are fairly simple and do
On Apr 3, 2008, at 5:36 PM, Jason Venner wrote:
For the first day or so, when the jobs are viewable via the main
page of the JobTracker web interface, the job-specific counters
are also visible. Once the job is only visible in the history page,
the counters are not visible.
Is it
On Mar 26, 2008, at 9:39 AM, Aayush Garg wrote:
HI,
I am developing a simple inverted index program with Hadoop.
My map
function has the output:
<word, doc>
and the reducer has:
<word, list(docs)>
Now I want to use one more MapReduce pass to remove stop and scrub words
from
this output.
On Mar 26, 2008, at 10:08 AM, Parand Darugar wrote:
Hello,
Is there a hadoop recipes / snippets / cookbook site? I'm thinking
something like the Python Cookbook (http://aspn.activestate.com/
ASPN/Python/Cookbook/) or Django Snippets (http://
www.djangosnippets.org/), where people can post
On Mar 26, 2008, at 11:05 AM, Arun C Murthy wrote:
On Mar 26, 2008, at 9:39 AM, Aayush Garg wrote:
HI,
I am developing a simple inverted index program with Hadoop.
My map
function has the output:
<word, doc>
and the reducer has:
<word, list(docs)>
Now I want to use one more MapReduce
On Mar 21, 2008, at 6:35 PM, Stephen J. Barr wrote:
Hello,
I am working on developing my first hadoop app from scratch. It is
a Monte-Carlo simulation, and I am using the PiEstimator code from
the examples as a reference. I believe I have what I want in
a .java file. However, I couldn't
On Mar 14, 2008, at 11:48 PM, Raghavendra K wrote:
Hi,
My apologies for bugging the forum again and again.
I am able to get the sample program for libhdfs working. I followed
these
steps.
--- compiled using ant
--- modified the test-libhdfs.sh to include CLASSPATH, HADOOP_HOME,
On Mar 10, 2008, at 3:18 PM, Jason Rennie wrote:
I just ran through this as a new user and had trouble w/ the JAVA_HOME
setting. Per the instructions, I had JAVA_HOME set appropriately
(via my
.bashrc), but not in conf/hadoop-env.sh. It would be good if 1. under
Required Software specified
On Feb 21, 2008, at 3:29 AM, Raghavendra K wrote:
Hi,
I am able to get Hadoop running and am also able to compile
libhdfs.
But when I run the hdfs_test program, it gives a segmentation fault.
Unfortunately the documentation for using libhdfs is sparse, our
apologies.
You'll need