Re: Direct HDFS access from a streaming job

2011-03-24 Thread Harsh J
There is a C-HDFS API + library (called libhdfs) available @ http://hadoop.apache.org/common/docs/r0.20.2/libhdfs.html. Perhaps you can make your C++ mapper program use that? On Thu, Mar 24, 2011 at 10:56 AM, Keith Wiley kwi...@keithwiley.com wrote: This webpage:

Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

2011-03-24 Thread Harsh J
Also, is your Hadoop really under nutch/search, or is it under nutch/search/hadoop-0.x.x? Set HADOOP_HOME to the exact directory that Hadoop's files sit immediately under. On Thu, Mar 24, 2011 at 1:13 PM, Andy XUE andyxuey...@gmail.com wrote: Hi there: I'm a new user to Hadoop and

Re: CDH and Hadoop

2011-03-24 Thread Steve Loughran
On 23/03/11 15:32, Michael Segel wrote: Rita, It sounds like you're only using Hadoop and have no intentions to really get into the internals. I'm like most admins/developers/IT guys and I'm pretty lazy. I find it easier to set up the yum repository and then issue the yum install hadoop

Hadoop Distributed System Problems: Does not recognise any slave nodes

2011-03-24 Thread Andy XUE
Hi there: I'm a new user to Hadoop and Nutch, and I am trying to run the crawler *Nutch* on a distributed system powered by *Hadoop*. However, as it turns out, the distributed system does not recognise any slave nodes in the cluster. I've been stuck at this point for months and am desperate to look

is there a way to write rows sequentially against 60 reduce tasks?

2011-03-24 Thread JunYoung Kim
hi, I run almost 60 reduce tasks for a single job. if the outputs of a job are from part00 to part59, is there a way to write rows sequentially by sorted keys? currently my outputs are like this. part00) 1 10 12 14 part01) 2 4 6 11 13 part02) 3 5 7 8 9 but, my aim is to get the
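The usual Hadoop answer to this is a total-order partitioner (org.apache.hadoop.mapred.lib.TotalOrderPartitioner, the mechanism TeraSort uses): pick cut points for the key space so that every key sent to reducer i sorts before every key sent to reducer i+1, and concatenating the part files in order then yields a globally sorted result. A minimal Python sketch of that idea, using made-up keys similar to the post's example (the cut points here are hand-picked; Hadoop samples them from the input):

```python
import bisect

def range_partition(keys, cut_points):
    """Assign each key to a partition by comparing it against sorted
    cut points, mimicking what a total-order partitioner does.
    Partition i then receives only keys that sort before every key of
    partition i+1, so concatenating sorted parts in order is globally
    sorted."""
    parts = [[] for _ in range(len(cut_points) + 1)]
    for k in keys:
        # bisect_right: keys < cut stay left of it, keys >= cut go right
        parts[bisect.bisect_right(cut_points, k)].append(k)
    return [sorted(p) for p in parts]  # each "reducer" sorts its own keys

# Toy example with 3 "reducers": cut points 5 and 10 split the key space.
parts = range_partition([1, 10, 12, 14, 2, 4, 6, 11, 13, 3, 5], [5, 10])
flat = [k for p in parts for k in p]  # concatenate part00, part01, part02
print(parts)
print(flat == sorted(flat))  # concatenation is globally sorted
```

With Hadoop streaming the same effect is configured via the partitioner class and a partition file of sampled cut points, rather than code in the mapper.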

Re: Hadoop Distributed System Problems: Does not recognise any slave nodes

2011-03-24 Thread modemide
I'm also new to hadoop, but I was able to get my cluster up and running. I'm not familiar with Nutch though. In any case, my assumption is that Nutch relies on a working hadoop cluster as the base and adds on a few configurations to integrate the two. Here are some things that might help you: *

Re: Direct HDFS access from a streaming job

2011-03-24 Thread Keith Wiley
On Mar 23, 2011, at 11:10 PM, Harsh J wrote: There is a C-HDFS API + library (called libhdfs) available @ http://hadoop.apache.org/common/docs/r0.20.2/libhdfs.html. Perhaps you can make your C++ mapper program use that? Thanks. Actually, I think that with reference to the passage I quoted

Re: Direct HDFS access from a streaming job

2011-03-24 Thread Harsh J
Hello, On Thu, Mar 24, 2011 at 8:45 PM, Keith Wiley kwi...@keithwiley.com wrote: Thanks.  Actually, I think that with reference to the passage I quoted in my first post, the unstated intent was to simply do a system() call and invoke hadoop fs -get or hadoop fs -copyToLocal. Some would
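The system()-call approach Keith describes would look something like the sketch below from a Python streaming task. The helper builds the `hadoop fs -get` argv as a pure function so it can be inspected without a cluster; the actual copy is only illustrative, and note that each such call pays the cost of launching a new `hadoop` client process, which libhdfs avoids:

```python
import subprocess

def hdfs_get_cmd(hdfs_path, local_path):
    """Build the argv for 'hadoop fs -get' (copy an HDFS file to the
    task's local working directory). Kept separate from execution so the
    command can be checked before it is handed to subprocess."""
    return ["hadoop", "fs", "-get", hdfs_path, local_path]

def fetch_side_file(hdfs_path, local_path):
    # Shelling out works from inside a streaming task, but launches a
    # fresh JVM-based client per call; libhdfs keeps a live connection.
    subprocess.check_call(hdfs_get_cmd(hdfs_path, local_path))

# Inspect the command without running it (paths are hypothetical):
print(hdfs_get_cmd("/user/keith/model.bin", "./model.bin"))
```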

Re: Direct HDFS access from a streaming job

2011-03-24 Thread Keith Wiley
On Mar 24, 2011, at 8:31 AM, Harsh J wrote: Hello, On Thu, Mar 24, 2011 at 8:45 PM, Keith Wiley kwi...@keithwiley.com wrote: Thanks. Actually, I think that with reference to the passage I quoted in my first post, the unstated intent was to simply do a system() call and invoke hadoop fs

Re: CDH and Hadoop

2011-03-24 Thread Allen Wittenauer
On Mar 23, 2011, at 7:29 AM, Rita wrote: I have been wondering if I should use CDH (http://www.cloudera.com/hadoop/) instead of the standard Hadoop distribution. What do most people use? Is CDH free? do they provide the tars or does it provide source code and I simply compile? Can I have

Re-generate datanode storageID?

2011-03-24 Thread Marc Leavitt
I am setting up a (very) small Hadoop/CDH3 beta 4 cluster in virtual machines to do some initial feasibility work. I proceeded by progressing through the Cloudera documentation standalone -> pseudo-cluster -> cluster with a single VM and then, when I had it stable(-ish) I copied the VM to a

How do I split input on fixed length keys

2011-03-24 Thread Kevin.Leach
I'm using hadoop streaming and currently have these properties in my command line: -Dstream.map.output.field.separator=' ' \ -Dstream.num.map.output.key.fields=1 \ This works for me as my test data happens to have a space at column 14. If I want to use a fixed length split, is there a simple
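One simple option, since the separator-based properties assume a delimiter character, is to do the fixed-length split inside the streaming mapper itself and emit key<TAB>value, which the framework then sorts on normally. A hedged sketch (the column width 14 is taken from the post; the record content is invented):

```python
import io

KEY_WIDTH = 14  # fixed key length; the post's data has its split at column 14

def split_fixed(line, width=KEY_WIDTH):
    """Split a record at a fixed column instead of at a separator
    character, returning (key, value)."""
    return line[:width], line[width:]

def map_stream(stream, out):
    """Streaming-mapper loop: re-emit each record as key<TAB>value so
    Hadoop's default key/value handling takes over downstream."""
    for line in stream:
        key, value = split_fixed(line.rstrip("\n"))
        out.write("%s\t%s\n" % (key, value))

# Demo on an in-memory record instead of sys.stdin:
out = io.StringIO()
map_stream(io.StringIO("AAAABBBBCCCCDDrest-of-record\n"), out)
print(out.getvalue())
```

In a real job the loop would read sys.stdin and write sys.stdout; once the mapper emits a tab-separated key, the separator properties in the post's command line are no longer needed.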

RE: Program freezes at Map 99% Reduce 33%

2011-03-24 Thread Kevin.Leach
Shi, The key here is the 99% done mapper. Nothing can move on until all mappers complete. Is it possible your data in the larger set has an incomplete record or some such at the end? Kevin -Original Message- From: Shi Yu [mailto:sh...@uchicago.edu] Sent: Thursday, March 24, 2011 3:02

Re: Re-generate datanode storageID?

2011-03-24 Thread Niels Basjes
Hi, To solve that simply do the following on the problematic nodes: 1) Stop the datanode (probably not running) 2) Remove everything inside the .../cache/hdfs/ 3) Start the datanode again. Note: With cloudera always use service way to stop/start hadoop software! service hadoop-0.20-datanode stop

Re: Program freezes at Map 99% Reduce 33%

2011-03-24 Thread Shi Yu
Hi Kevin, thanks for the reply. I could hardly imagine an example of an incomplete record. The mapper is very simple: just reading line by line as Strings, splitting each line by tab, and outputting a Text pair for sort and secondary sort. If there were an incomplete record, there should be an error
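For readers unfamiliar with the pattern Shi is describing: in a secondary sort the framework orders records by a composite (primary, secondary) key but groups reducer input on the primary key alone, so each reduce call sees its values already ordered. A small Python simulation of that behavior (the record tuples are invented for illustration):

```python
from itertools import groupby

def secondary_sort(records):
    """Simulate Hadoop's secondary sort: records are
    (primary, secondary, value) tuples. Sort on the composite
    (primary, secondary) key, then group on the primary key alone,
    so each group's values arrive in secondary-key order."""
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return [(primary, [r[2] for r in group])
            for primary, group in groupby(ordered, key=lambda r: r[0])]

result = secondary_sort([("b", 2, "y"), ("a", 9, "q"), ("b", 1, "x")])
print(result)  # values for key "b" come out in secondary-key order
```

In a real job the grouping is configured with a grouping comparator rather than done in user code.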

build script?

2011-03-24 Thread Daniel McEnnis
Dear, I have checked out via SVN the Hadoop core code. I am trying to compile it. Is there a build script to work from? Daniel McEnnis.

RE: Program freezes at Map 99% Reduce 33%

2011-03-24 Thread Kevin.Leach
Shi, This states "Of course, the framework discards the sub-directory of unsuccessful task-attempts." http://hadoop-karma.blogspot.com/2011/01/hadoop-cookbook-how-to-write.html So yes, the missing directory is likely a failure. If you can, narrow the problem down by looking at sections of your

Re: CDH and Hadoop

2011-03-24 Thread Eli Collins
Hey Rita, All software developed by Cloudera for CDH is Apache (v2) licensed and freely available. See these docs [1,2] for more info. We publish source packages (which includes the packaging source) and source tarballs, you can find these at http://archive.cloudera.com/cdh/3/. See the

Re: Re-generate datanode storageID?

2011-03-24 Thread Marc Leavitt
Worked perfectly. Thanks Niels! -mgl On Mar 24, 2011, at 12:48 PM, Niels Basjes wrote: Hi, To solve that simply do the following on the problematic nodes: 1) Stop the datanode (probably not running) 2) Remove everything inside the .../cache/hdfs/ 3) Start the datanode again. Note:

Re: Program freezes at Map 99% Reduce 33%

2011-03-24 Thread Shi Yu
Hi Kevin, thanks for the suggestion. I think I found the problem, because my code is a chained map/reduce. In the previous iteration there is a .lzo_deflate output which is 40 times larger than the other files. That was because of a special key value, which has significantly larger occurrences

Re: CDH and Hadoop

2011-03-24 Thread Rita
Thanks everyone for your replies. I knew Cloudera had their release but never knew Y! had one too... On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins e...@cloudera.com wrote: Hey Rita, All software developed by Cloudera for CDH is Apache (v2) licensed and freely available. See these docs

Re: CDH and Hadoop

2011-03-24 Thread David Rosenstrauch
They do, but IIRC, they recently announced that they're going to be discontinuing it. DR On Thu, March 24, 2011 8:10 pm, Rita wrote: Thanks everyone for your replies. I knew Cloudera had their release but never knew Y! had one too... On Thu, Mar 24, 2011 at 5:04 PM, Eli Collins

RE: Program freezes at Map 99% Reduce 33%

2011-03-24 Thread Kevin.Leach
Good. Data skew should not look stuck. Try sending status updates so at least you can tell one mapper is still busy. Yes, adding data or including another field into the key can help reduce data skew. Kevin -Original Message- From: Shi Yu [mailto:sh...@uchicago.edu] Sent: Thursday,
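Kevin's suggestion of "including another field into the key" is usually called key salting: a known hot key is spread across several reducers by appending a salt field, and the buckets are re-merged afterwards (or partial aggregates are combined in a second pass). A hedged sketch of the idea, with hypothetical key names:

```python
import random

def salt_key(key, hot_keys, buckets=8, rng=random):
    """Spread a known hot key across several reduce partitions by
    appending a salt field, so no single reducer receives all of its
    records. Non-hot keys pass through unchanged."""
    if key in hot_keys:
        return "%s#%d" % (key, rng.randrange(buckets))
    return key

def strip_salt(salted):
    """Recover the original key when merging the salted buckets back."""
    return salted.split("#", 1)[0]

# The hot key fans out to one of 4 buckets; a normal key is untouched.
print(salt_key("special", {"special"}, buckets=4))
print(salt_key("ordinary", {"special"}, buckets=4))
```

This only helps when the skewed key's work is divisible (e.g. counting or summing); if every record for the key must be seen together, salting just defers the merge to a later step.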

Re: Test, please respond

2011-03-24 Thread Jon Lederman
yeah i got it. On Mar 22, 2011, at 1:18 PM, Aaron Baff wrote: Does anyone see this? Can someone at least respond to this to indicate that it's getting to the mailing list fine? I've just gotten 0 replies to a few previous emails, so I'm wondering whether nobody is seeing these, or if people

Re: CDH and Hadoop

2011-03-24 Thread suresh srinivas
On Thu, Mar 24, 2011 at 7:04 PM, Rita rmorgan...@gmail.com wrote: Oh! Thanks for the heads up on that... I guess I will go with the cloudera source then On Thu, Mar 24, 2011 at 8:41 PM, David Rosenstrauch dar...@darose.net wrote: They do, but IIRC, they recently announced that they're

Re: build script?

2011-03-24 Thread Harsh J
Hello, On Fri, Mar 25, 2011 at 2:27 AM, Daniel McEnnis dmcen...@gmail.com wrote: Dear, I have checked out via SVN the Hadoop core code.  I am trying to compile it.  Is there a build script to work from? There is an Apache Ant build.xml file bundled along (in the root directory of the