There is a C-HDFS API + library (called libhdfs) available @
http://hadoop.apache.org/common/docs/r0.20.2/libhdfs.html. Perhaps you
can make your C++ mapper program use that?
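For reference, here is a minimal, untested sketch of what reading an HDFS file from C++ through libhdfs could look like; the HDFS path and the build flags are placeholders, so check the page above for the exact setup on your version.

// Minimal sketch (untested): read an HDFS file from C++ via libhdfs.
// The path below is hypothetical; build with something roughly like
//   g++ reader.cc -I<libhdfs include dir> -L<libhdfs lib dir> -lhdfs -ljvm
// and make sure CLASSPATH contains the Hadoop jars, per the libhdfs docs.
#include <cstdio>
#include <fcntl.h>   // O_RDONLY
#include "hdfs.h"

int main() {
    // "default" picks up the cluster's configured default filesystem.
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) { std::fprintf(stderr, "hdfsConnect failed\n"); return 1; }

    hdfsFile in = hdfsOpenFile(fs, "/user/someone/input.txt", O_RDONLY, 0, 0, 0);
    if (!in) { std::fprintf(stderr, "hdfsOpenFile failed\n"); hdfsDisconnect(fs); return 1; }

    char buf[4096];
    tSize n;
    while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0) {
        std::fwrite(buf, 1, n, stdout);   // stream the file's bytes to stdout
    }

    hdfsCloseFile(fs, in);
    hdfsDisconnect(fs);
    return 0;
}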
On Thu, Mar 24, 2011 at 10:56 AM, Keith Wiley kwi...@keithwiley.com wrote:
This webpage:
Also, is your Hadoop really under nutch/search, or is it under
nutch/search/hadoop-0.x.x? Set HADOOP_HOME to the exact directory that
Hadoop's files sit immediately under.
On Thu, Mar 24, 2011 at 1:13 PM, Andy XUE andyxuey...@gmail.com wrote:
Hi there:
I'm a new user to Hadoop and
On 23/03/11 15:32, Michael Segel wrote:
Rita,
It sounds like you're only using Hadoop and have no intentions to really get
into the internals.
I'm like most admins/developers/IT guys and I'm pretty lazy.
I find it easier to set up the yum repository and then issue the yum install
hadoop
Hi there:
I'm a new user to Hadoop and Nutch, and I am trying to run the crawler *Nutch*
on a distributed system powered by *Hadoop*. However, as it turns out,
the distributed system does not recognise any slave nodes in the cluster.
I've been stuck at this point for months and am desperate to look
hi,
I run almost 60 reduce tasks for a single job.
If the outputs of a job are part00 to part59,
is there a way to write rows sequentially by sorted keys?
Currently my outputs are like this.
part00)
1
10
12
14
part01)
2
4
6
11
13
part02)
3
5
7
8
9
But my aim is to get the
I'm also new to hadoop, but I was able to get my cluster up and
running. I'm not familiar with Nutch though.
In any case, my assumption is that Nutch relies on a working hadoop
cluster as the base and adds on a few configurations to integrate the
two.
Here are some things that might help you:
*
On Mar 23, 2011, at 11:10 PM, Harsh J wrote:
There is a C-HDFS API + library (called libhdfs) available @
http://hadoop.apache.org/common/docs/r0.20.2/libhdfs.html. Perhaps you
can make your C++ mapper program use that?
Thanks. Actually, I think that with reference to the passage I quoted
Hello,
On Thu, Mar 24, 2011 at 8:45 PM, Keith Wiley kwi...@keithwiley.com wrote:
Thanks. Actually, I think that with reference to the passage I quoted in my
first post, the unstated intent was to simply do a system() call and invoke
hadoop fs -get or hadoop fs -copyToLocal.
Some would
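A rough, untested sketch of that system() approach from C++, in case it helps; the HDFS path, the local path, and the assumption that the hadoop binary is on the task's PATH are all placeholders:

// Rough sketch (untested): shell out to the hadoop CLI from a C++ program
// to pull a file onto local disk. Both paths are hypothetical placeholders,
// and the hadoop binary must be on the PATH of the task's environment.
#include <cstdlib>
#include <cstdio>

int main() {
    int rc = std::system("hadoop fs -get /user/someone/sidedata.txt ./sidedata.txt");
    if (rc != 0) {
        std::fprintf(stderr, "hadoop fs -get failed (exit status %d)\n", rc);
        return 1;
    }
    // ... read ./sidedata.txt with ordinary local-file I/O ...
    return 0;
}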
On Mar 24, 2011, at 8:31 AM, Harsh J wrote:
Hello,
On Thu, Mar 24, 2011 at 8:45 PM, Keith Wiley kwi...@keithwiley.com wrote:
Thanks. Actually, I think that with reference to the passage I quoted in my
first post, the unstated intent was to simply do a system() call and invoke
hadoop fs
On Mar 23, 2011, at 7:29 AM, Rita wrote:
I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/)
instead of the standard Hadoop distribution.
What do most people use? Is CDH free? Do they provide the tars, or do they
provide source code that I simply compile? Can I have
I am setting up a (very) small Hadoop/CDH3 beta 4 cluster in virtual machines
to do some initial feasibility work. I proceeded by working through the
Cloudera documentation (standalone -> pseudo-cluster -> cluster) with a single VM
and then, when I had it stable(-ish), I copied the VM to a
I'm using hadoop streaming and currently have these properties in my
command line:
-Dstream.map.output.field.separator=' ' \
-Dstream.num.map.output.key.fields=1 \
This works for me because my test data happens to have a space at column 14.
If I want to use a fixed-length split, is there a simple
Shi,
The key here is the 99% done mapper. Nothing can move on until all
mappers complete.
Is it possible your data in the larger set has an incomplete record or
some such at the end?
Kevin
-Original Message-
From: Shi Yu [mailto:sh...@uchicago.edu]
Sent: Thursday, March 24, 2011 3:02
Hi,
To solve that simply do the following on the problematic nodes:
1) Stop the datanode (probably not running)
2) Remove everything inside the .../cache/hdfs/
3) Start the datanode again.
Note: with Cloudera, always use the service scripts to stop/start the Hadoop daemons!
service hadoop-0.20-datanode stop
Hi Kevin,
thanks for the reply. I could hardly imagine an example of an incomplete
record. The mapper is very simple: it just reads line by line as Strings,
splits each line by tab, and outputs a Text pair for sort and
secondary sort. If there were an incomplete record, there should be an
error
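For illustration only (the mapper discussed here is presumably a Java one), a Hadoop Streaming style C++ sketch of that same read/split/emit pattern, with a guard that logs any line that does not split into the expected number of fields; the field count and positions are assumptions for the example:

// Illustration (not the actual mapper): read lines from stdin, split on tab,
// and emit "naturalKey<TAB>secondaryKey<TAB>value". With streaming, e.g.
// -Dstream.num.map.output.key.fields=2 makes the first two fields the sort key;
// partitioning on the natural key alone would also need to be configured.
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {
        std::vector<std::string> fields;
        std::stringstream ss(line);
        std::string field;
        while (std::getline(ss, field, '\t')) fields.push_back(field);
        if (fields.size() < 3) {                 // possible incomplete record
            std::cerr << "bad record: " << line << '\n';
            continue;
        }
        // natural key, secondary sort key, then the rest of the record
        std::cout << fields[0] << '\t' << fields[1] << '\t' << fields[2] << '\n';
    }
    return 0;
}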
Dear,
I have checked out via SVN the Hadoop core code. I am trying to
compile it. Is there a build script to work from?
Daniel McEnnis.
Shi,
This states, "Of course, the framework discards the sub-directory of
unsuccessful task-attempts."
http://hadoop-karma.blogspot.com/2011/01/hadoop-cookbook-how-to-write.html
So yes, the missing directory is likely a failure.
If you can, narrow the problem down by looking at sections of your
Hey Rita,
All software developed by Cloudera for CDH is Apache (v2) licensed and
freely available. See these docs [1,2] for more info.
We publish source packages (which include the packaging source) and
source tarballs; you can find these at
http://archive.cloudera.com/cdh/3/. See the
Worked perfectly.
Thanks Niels!
-mgl
On Mar 24, 2011, at 12:48 PM, Niels Basjes wrote:
Hi,
To solve that simply do the following on the problematic nodes:
1) Stop the datanode (probably not running)
2) Remove everything inside the .../cache/hdfs/
3) Start the datanode again.
Note:
Hi Kevin,
thanks for the suggestion. I think I found the problem: my code
is a chained map/reduce, and in the previous iteration there is a
.lzo_deflate output which is 40 times larger than the other files. That was
because of a special key value, which has significantly larger
occurrences
Thanks everyone for your replies.
I knew Cloudera had their release but never knew Y! had one too...
On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins e...@cloudera.com wrote:
Hey Rita,
All software developed by Cloudera for CDH is Apache (v2) licensed and
freely available. See these docs
They do, but IIRC, they recently announced that they're going to be
discontinuing it.
DR
On Thu, March 24, 2011 8:10 pm, Rita wrote:
Thanks everyone for your replies.
I knew Cloudera had their release but never knew Y! had one too...
On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins
Good. Data skew should not make the job look stuck. Try sending status
updates so that at least you can tell one mapper is still busy. Yes, adding
data or including another field in the key can help reduce data skew.
Kevin
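For reference, if the job were run through Hadoop Streaming, status updates can be pushed by writing reporter:status lines to stderr (in the Java API the analogue is Reporter.setStatus()/progress()). A minimal, untested C++ sketch, with made-up counter names:

// Rough sketch: how a C++ Hadoop Streaming mapper can report status so a
// long-running (skewed) task doesn't look stuck. The framework parses
// "reporter:status:<msg>" and "reporter:counter:<group>,<counter>,<amount>"
// lines written to stderr.
#include <iostream>
#include <string>

int main() {
    std::string line;
    long processed = 0;
    while (std::getline(std::cin, line)) {
        // ... real per-record work would go here ...
        if (++processed % 100000 == 0) {
            std::cerr << "reporter:status:processed " << processed << " records\n";
            std::cerr << "reporter:counter:MyJob,Records,100000\n";
        }
        std::cout << line << '\n';   // identity map for the sketch
    }
    return 0;
}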
-Original Message-
From: Shi Yu [mailto:sh...@uchicago.edu]
Sent: Thursday,
Yeah, I got it.
On Mar 22, 2011, at 1:18 PM, Aaron Baff wrote:
Does anyone see this? Can someone at least respond to this to indicate that
it's getting to the mailing list fine? I've just gotten 0 replies to a few
previous emails, so I'm wondering whether it's that nobody is seeing these, or if people
On Thu, Mar 24, 2011 at 7:04 PM, Rita rmorgan...@gmail.com wrote:
Oh! Thanks for the heads up on that...
I guess I will go with the Cloudera source then
On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch dar...@darose.net
wrote:
They do, but IIRC, they recently announced that they're
Hello,
On Fri, Mar 25, 2011 at 2:27 AM, Daniel McEnnis dmcen...@gmail.com wrote:
Dear,
I have checked out via SVN the Hadoop core code. I am trying to
compile it. Is there a build script to work from?
There is an Apache Ant build.xml file bundled along (in the root
directory of the