Hi,
I see there are various posts claiming Hadoop is available through the official
Debian mirrors (for Debian squeeze, i.e. stable):
* http://www.debian-news.net/2010/07/17/apache-hadoop-in-debian-squeeze/
* http://blog.isabel-drost.de/index.php/archives/213/apache-hadoop-in-debian-squeeze
On Thu, 17 Mar 2011 19:33:02 +0100
Thomas Koch tho...@koch.ro wrote:
Currently my advice is to use the Debian packages from Cloudera.
That's the problem, it appears there are none.
Like I said in my earlier mail, Debian is not in Cloudera's list of
supported distros, and they do not have a
Hi, I'm using the streaming API and I notice my reducer gets - in the same
invocation - a bunch of different keys, and I wonder why.
I would expect to get one key per reducer invocation, as with the normal
Hadoop Java API.
Is this to limit the number of spawned processes, assuming creating and
destroying a process per key would be too expensive?
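(For reference: streaming hands each reducer process the entire sorted key
stream for its partition, so the script has to detect key boundaries itself.
A minimal sketch of such a reducer; the tab-separated key/value layout matches
streaming's defaults, but the per-key summing is a made-up example payload:)
#!/bin/bash
# Keys arrive sorted, many per invocation; emit an aggregate
# whenever the key changes, then flush the final key at EOF.
prev=""
sum=0
while IFS=$'\t' read -r key value; do
  if [ -n "$prev" ] && [ "$key" != "$prev" ]; then
    printf '%s\t%d\n' "$prev" "$sum"
    sum=0
  fi
  prev=$key
  sum=$((sum + value))
done
if [ -n "$prev" ]; then
  printf '%s\t%d\n' "$prev" "$sum"
fi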
On Tue, 29 Mar 2011 23:17:13 +0530
Harsh J qwertyman...@gmail.com wrote:
Hello,
On Tue, Mar 29, 2011 at 8:25 PM, Dieter Plaetinck
dieter.plaeti...@intec.ugent.be wrote:
Hi, I'm using the streaming API and I notice my reducer gets - in
the same invocation - a bunch of different keys
Hi,
I use 0.20.2 on Debian 6.0 (squeeze) nodes.
I have 2 problems with my streaming jobs:
1) I start the job like so:
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
-file /proj/Search/wall/experiment/ \
-mapper './nolog.sh mapper' \
-reducer
Hi,
I use hadoop 0.20.2, more specifically hadoop-streaming, on Debian 6.0
(squeeze) nodes.
My question is: how do I make sure input keys being fed to the reducer
are sorted numerically rather than alphabetically?
example:
- standard behavior:
#1 some-value1
#10 some-value10
#100 some-value100
Hi,
I have a cluster of 4 Debian squeeze machines; on all of them I
installed the same version (hadoop-0.20.2.tar.gz).
I have: n-0 namenode, n-1 jobtracker, and n-{0,1,2,3} slaves,
but you can see all my configs in more detail @
http://pastie.org/1754875
The machines have 3 GiB RAM.
I don't see any errors in any of the datanode
or tasktracker logs. And the NameNode web interface even tells me all
nodes are live, none are dead.
This is effectively holding me back from using the cluster;
I'm completely in the dark, and I find this very frustrating. :(
Thank you,
Dieter
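(For what it's worth, a quick way to cross-check what that web interface
claims, using the stock 0.20.2 command line; this is only a sanity check,
not a fix:)
# Ask the NameNode itself which datanodes it considers live/dead:
hadoop dfsadmin -report
# Ask the JobTracker which jobs it currently sees:
hadoop job -list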
On Mon, 4 Apr 2011 18:45:49 +0200
Dieter Plaetinck
parameter for it, as noted in:
http://hadoop.apache.org/common/docs/r0.20.2/streaming.html#A+Useful+Comparator+Class
[The -D mapred.output.key.comparator.class=xyz part]
On Thu, Mar 31, 2011 at 6:26 PM, Dieter Plaetinck
dieter.plaeti...@intec.ugent.be wrote:
couldn't find how I should do that.
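(Concretely, for the numeric-sort question above, the options documented on
that page combine roughly like this; the jar path is the one used elsewhere
in this thread, and -n mimics sort -n on the key. This assumes the keys are
plain numbers - a prefix like the '#' in the earlier example would first need
stripping, or -k field options instead:)
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapred.text.key.comparator.options=-n \
  -mapper ... -reducer ... -input myinput -output myoutput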
Hi,
I have a script something like this (simplified):
for i in $(seq 1 200); do
  regenerate-files $dir $i
  hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
    -D mapred.job.name=$i \
    -file $dir \
    -mapper ... -reducer ... -input $i-input -output $i-output
done
exec_stream_job.sh:
regenerate-files $dir $i
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.job.name=$i \
  -file $dir \
  -mapper ... -reducer ... -input $i-input -output $i-output
From: Dieter
Hi,
I'm running some experiments using hadoop streaming.
I always get an output_dir/part-0 file at the end, but I wonder:
when exactly will this filename show up? When it's completely written,
or will it already show up while the mapreduce framework is still
writing to it? Is the write atomic?
On Thu, 12 May 2011 09:49:23 -0700 (PDT)
Aman aman_d...@hotmail.com wrote:
The creation of files part-n is atomic. When you run a MR job,
these files are created in the directory output_dir/_temporary and
moved to output_dir after the file is closed for writing. This
move is atomic, hence as
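(Building on that answer: because the rename is atomic, a part file is
complete as soon as it becomes visible under output_dir, so a consumer can
safely poll for it. A minimal sketch, reusing the part-0 name from the
question:)
# Poll until the job's output has been moved out of _temporary;
# once the file is visible it is fully written and safe to read.
until hadoop dfs -test -e output_dir/part-0; do
  sleep 5
done
hadoop dfs -cat output_dir/part-0 | head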
What do you mean clunky?
IMHO this is quite an elegant, simple, working solution.
Sure, this spawns multiple processes, but it beats any
API over-complications, IMHO.
Dieter
On Wed, 18 May 2011 11:39:36 -0500
Patrick Angeles patr...@cloudera.com wrote:
kinda clunky but you could do this via
On Fri, 20 May 2011 10:11:13 -0500
Brian Bockelman bbock...@cse.unl.edu wrote:
On May 20, 2011, at 6:10 AM, Dieter Plaetinck wrote:
What do you mean clunky?
IMHO this is quite an elegant, simple, working solution.
Try giving it to a user; watch them feed it a list of 10,000 files
Hi,
if I simplify my code, I basically do this:
hadoop dfs -rm -skipTrash $file
hadoop dfs -copyFromLocal $local $file
(the removal is needed because previous input/output from an earlier run may
exist, so I need to delete it first, as -copyFromLocal does not support
overwriting)
During the 2nd
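(One way to keep that removal from failing on the very first run, when
nothing exists yet, is to test before removing; a sketch with the same
shell commands and variables as above:)
# Remove the old copy only if it actually exists, then upload:
if hadoop dfs -test -e "$file"; then
  hadoop dfs -rm -skipTrash "$file"
fi
hadoop dfs -copyFromLocal "$local" "$file"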
On Thu, 28 Jul 2011 06:13:01 -0700
Thomas Graves tgra...@yahoo-inc.com wrote:
It's currently still on the MR-279 branch -
http://svn.apache.org/viewvc/hadoop/common/branches/MR-279/. It is
planned to be merged to trunk soon.
Tom
On 7/28/11 7:31 AM, real great..
Hi,
On Wed, 10 Aug 2011 13:26:18 -0500
Michel Segel michael_se...@hotmail.com wrote:
This sounds more like a homework assignment than a real-world problem.
Why? just wondering.
I guess people don't race cars against trains or have two trains
traveling in different directions anymore... :-)
huh?
Hi,
I know this question has been asked before, but I could not find
the right solution. Maybe because I use hadoop 0.20.2, while some posts
assumed older versions.
My code (relevant chunk):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
Configuration conf = new Configuration();
On Wed, 31 Aug 2011 08:44:42 -0700
Mohit Anchlia mohitanch...@gmail.com wrote:
Does map-reduce work well with binary content in the files? This
binary content is basically some CAD files, and the map-reduce program needs
to read these files using some proprietary tool, extract values, and do
some
On Wed, 21 Sep 2011 11:21:01 +0100
Steve Loughran ste...@apache.org wrote:
On 20/09/11 22:52, Michael Segel wrote:
PS... There's this junction box in your machine room that has this
very large on/off switch. If pulled down, it will cut power to your
cluster and you will lose everything.
Or, more generally:
isn't using virtualized I/O counterproductive when dealing with Hadoop M/R?
I would think that for running Hadoop M/R you'd want predictable and consistent
I/O on each node,
not to mention your bottlenecks are usually disk I/O (and maybe CPU), so using
virtualisation makes
Hello friends of hadoop,
I just want to inform you about the 12th edition of the Dutch Information
Retrieval conference, which will be organized in the lovely city of Ghent,
Belgium on 23/24 February 2012.
There's the usual CFP, see the website at http://dir2012.intec.ugent.be/
There's definitely
Very clear. The comic format indeed works quite well.
I never considered comics a serious (professional) way to get something
explained efficiently,
but this shows people should think twice before they start writing their next
documentation.
One question though: if a DN has a corrupted
Great work folks! Very interesting.
PS: did you notice that if you google for hanborq or HDH, it's very hard to
find your website, hanborq.com?
Dieter
On Tue, 21 Feb 2012 02:17:31 +0800
Schubert Zhang zson...@gmail.com wrote:
We just updated the slides on these improvements: