Re: Reverse Indexing Programming Help

2011-03-31 Thread Ted Dunning
It would help to get a good book. There are several. For your program, there are several things that will trip you up: a) lots of little files are going to be slow. You want input that is >100MB per file if you want speed. b) That file format is a bit cheesy since it is hard to tell URLs from

Reverse Indexing Programming Help

2011-03-31 Thread DoomUs
I'm just starting out using Hadoop. I've looked through the Java examples and have an idea about what's going on, but I don't really get it. I'd like to write a program that takes a directory of files. Each of those files contains a URL to a website on the first line, and the second line is the
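The message is truncated, but the task reads like building an inverted (reverse) index. A minimal Hadoop Streaming mapper sketch in Python, under the assumption that a preprocessing step has already joined each file into a single "url<TAB>text" record (which also sidesteps the many-small-files slowdown Ted mentions in his reply):

```python
#!/usr/bin/env python
# Hypothetical inverted-index mapper for Hadoop Streaming.
# Assumes each input record is "url<TAB>text" on one line; this format
# and the lowercasing are illustrative assumptions, not from the thread.
import sys

def map_record(line):
    """Emit one "word<TAB>url" pair per word in a "url<TAB>text" record."""
    url, _, text = line.rstrip("\n").partition("\t")
    for word in text.split():
        yield "%s\t%s" % (word.lower(), url)

if __name__ == "__main__":
    for line in sys.stdin:
        for pair in map_record(line):
            print(pair)
```

The shuffle then groups each word's URLs together, so a reducer only has to concatenate them.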

Re: Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 10:43 AM, XiaoboGu wrote: > I have trouble browsing the file system via the namenode web interface; the namenode says in its log file that the –G option is invalid for getting the groups for the user. I thought this was not the case any more, but hadoop forks to the 'id' command t

Reading Records from a Sequence File

2011-03-31 Thread maha
Hello Everyone, As far as I know, when my Java program opens a sequence file from HDFS for map calculations, using SequenceFile.Reader(key, value) will actually read the file in dfs.block.size-sized chunks and then grab it record by record from memory. Is that right? .. I tried a simple program wit

Re: What does "Too many fetch-failures" mean? How do I debug it?

2011-03-31 Thread David Rosenstrauch
On 03/31/2011 05:13 PM, W.P. McNeill wrote: I'm running a big job on my cluster and a handful of attempts are failing with a "Too many fetch-failures" error message. They're all on the same node, but that node doesn't appear to be down. Subsequent attempts succeed, so this looks like a transient

What does "Too many fetch-failures" mean? How do I debug it?

2011-03-31 Thread W.P. McNeill
I'm running a big job on my cluster and a handful of attempts are failing with a "Too many fetch-failures" error message. They're all on the same node, but that node doesn't appear to be down. Subsequent attempts succeed, so this looks like a transient stress issue rather than a problem with my cod

Re: Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread Allen Wittenauer
On Mar 31, 2011, at 7:43 AM, XiaoboGu wrote: > I have trouble browsing the file system via the namenode web interface; the namenode says in its log file that the –G option is invalid for getting the groups for the user. I don't, but I suspect you'll need to enable one of the POSIX personalities

questions on map-side spills

2011-03-31 Thread Shrinivas Joshi
I am trying TeraSort with Apache 0.21.0 build. io.sort.mb is 360M, map.sort.spill.percent is 0.8, dfs.blocksize is 256M. I am having some difficulty understanding spill related decisions from the log files. Here are the relevant log lines: 2011-03-30 13:46:51,591 INFO org.apache.hadoop.mapred.MapT
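The spill points in the log should line up with simple arithmetic on these settings: the map-side sort buffer spills once it fills to io.sort.mb times the spill percent. A quick sanity check of the numbers in the question:

```python
# Sanity check of the spill threshold implied by the settings above.
# A spill is triggered when buffered map output reaches
# io.sort.mb * spill percent (property names as given in the question).
def spill_threshold_mb(io_sort_mb, spill_percent):
    """Size in MB of buffered map output at which a spill starts."""
    return io_sort_mb * spill_percent

# io.sort.mb = 360, map.sort.spill.percent = 0.8
print(spill_threshold_mb(360, 0.8))  # 288.0
```

So each map should spill roughly every 288 MB of buffered output; spill records in the log that don't match this usually mean other buffer limits (e.g. the record-metadata buffer) are kicking in first.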

Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread XiaoboGu
I have trouble browsing the file system via the namenode web interface; the namenode says in its log file that the –G option is invalid for getting the groups for the user.

How to avoid receiving threads send by other people.

2011-03-31 Thread XiaoboGu
Hi, I have subscribed in digest mode, but I still get all the messages instantly from other people on the list. Other mailing lists don't do this; they send all the messages from a time frame in one mail. How can I achieve this with Apache mailing lists? Regards, Xiaobo G

sorting reducer input numerically in hadoop streaming

2011-03-31 Thread Dieter Plaetinck
hi, I use hadoop 0.20.2, more specifically hadoop-streaming, on Debian 6.0 (squeeze) nodes. My question is: how do I make sure the input keys being fed to the reducer are sorted numerically rather than alphabetically? example: - standard behavior: #1 some-value1 #10 some-value10 #100 some-value100 #2
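One commonly cited streaming fix (worth verifying against your version) is `-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator` with `-D mapred.text.key.comparator.options=-n`. A comparator-free workaround is to zero-pad numeric keys in the mapper so that the default lexicographic shuffle sort coincides with numeric order; a sketch (the pad width of 10 is an assumption about the largest key):

```python
# Zero-pad numeric keys so lexicographic sorting equals numeric sorting.
def pad_key(key, width=10):
    """Left-pad a numeric string key with zeros, e.g. "2" -> "0000000002"."""
    return key.zfill(width)

keys = ["1", "10", "100", "2"]
print(sorted(keys))                       # ['1', '10', '100', '2'] -- lexicographic
print(sorted(keys, key=pad_key))          # ['1', '2', '10', '100'] -- numeric order
```

The reducer can strip the leading zeros again (or just call `int()` on the key) before emitting its output.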

DFSIO benchmark

2011-03-31 Thread Matthew John
Can someone provide pointers/links for the DFSIO benchmark to check the I/O performance of HDFS? Thanks, Matthew John

Re: Hadoop Pipes Error

2011-03-31 Thread Adarsh Sharma
Thanks Amareshwari, I found it, but I'm sorry to say it results in another error: bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar -inputformat org.apache.hadoop.mapred.pipes.WordCou

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
Also see TestPipes.java for more details. On 3/31/11 4:29 PM, "Amareshwari Sriramadasu" wrote: Adarsh, The inputformat is present in test jar. So, pass -libjars to your command. libjars option should be passed before program specific options. So, it should be just after your -D parameters.

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
Adarsh, The inputformat is present in test jar. So, pass -libjars to your command. libjars option should be passed before program specific options. So, it should be just after your -D parameters. -Amareshwari On 3/31/11 3:45 PM, "Adarsh Sharma" wrote: Amareshwari Sri Ramadasu wrote: Re: Hado

hadoop streaming shebang line for python and mappers jumping to 100% completion right away

2011-03-31 Thread Dieter Plaetinck
Hi, I use 0.20.2 on Debian 6.0 (squeeze) nodes. I have 2 problems with my streaming jobs: 1) I start the job like so: hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \ -file /proj/Search/wall/experiment/ \ -mapper './nolog.sh mapper' \ -reducer './
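One frequent cause of streaming mappers jumping straight to 100% is the mapper script failing to launch at all (missing or wrong shebang line, or the script not being executable), so it helps to test against a known-good minimal mapper first. A sketch, with purely illustrative identity behavior:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper. The shebang above must be the very
# first line of the file, and the script must be executable (chmod +x);
# otherwise the task cannot start and the job may misreport progress.
import sys

def mapper(lines):
    """Identity mapper: emit each input line as "line<TAB>1"."""
    for line in lines:
        yield "%s\t1" % line.rstrip("\n")

if __name__ == "__main__":
    for out in mapper(sys.stdin):
        print(out)
```

If this minimal script runs correctly under the same `-mapper` invocation, the problem is in the wrapper script or its arguments rather than in streaming itself.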

Re: Hadoop Pipes Error

2011-03-31 Thread Adarsh Sharma
Amareshwari Sri Ramadasu wrote: You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes.WordCountInputFormat. You can pass the input format by passing it in the -inputformat option. I did not try it myself, but it should work. Here is the command that

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes .WordCountInputFormat. You can pass the input format by passing it in -inputformat option. I did not try it myself, but it should work. -Amareshwari On 3/31/11 12:23 PM, "Adarsh Sharma" wrote: Thank

Re: Hadoop Pipes Error

2011-03-31 Thread Steve Loughran
On 31/03/11 07:53, Adarsh Sharma wrote: Thanks Amareshwari, here is the posting: The nopipe example needs more documentation. It assumes that it is run with the InputFormat from src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which has a very specific input split for

Re: How to apply Patch

2011-03-31 Thread Adarsh Sharma
Thanks Steve, you have helped me clear my doubts several times. Let me explain what my problem is: I am trying to run the wordcount-nopipe.cc program in the /home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl directory. I am able to run a simple wordcount.cpp program in the Hadoop cluster, but when

Re: How to apply Patch

2011-03-31 Thread Steve Loughran
On 31/03/11 07:37, Adarsh Sharma wrote: Thanks a lot for such a deep explanation. I have done it now, but it doesn't help me with my original problem, for which I'm doing this. Please comment on it if you have some idea. I attached the problem. Sadly, Matt's deep explanation is what you need, lo

Re: Hadoop Pipes Error

2011-03-31 Thread Adarsh Sharma
What are the steps needed to debug the error and make wordcount-nopipe.cc run properly? Please guide me through the steps if possible. Thanks & best regards, Adarsh Sharma. Amareshwari Sri Ramadasu wrote: Here is an answer for your question in an old mail archive: http://lucene.472066.n3.nabble.com/pipe-a

Re: # of keys per reducer invocation (streaming api)

2011-03-31 Thread Dieter Plaetinck
On Tue, 29 Mar 2011 23:17:13 +0530 Harsh J wrote: > Hello, > On Tue, Mar 29, 2011 at 8:25 PM, Dieter Plaetinck wrote: > > Hi, I'm using the streaming API and I notice my reducer gets - in the same invocation - a bunch of different keys, and I wonder why. > > I would expect to get one ke
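On the streaming side, the behavior Dieter describes is by design: the reducer receives its whole sorted partition as one continuous stream of key<TAB>value lines and must detect key boundaries itself. A minimal Python sketch of that re-grouping using itertools.groupby:

```python
# Sketch of a streaming reducer that re-groups its sorted input stream
# by key, since streaming hands the reducer every key of its partition
# in a single invocation rather than one call per key.
import sys
from itertools import groupby

def parse(lines):
    """Split "key<TAB>value" lines into (key, value) pairs."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, value

def reduce_stream(lines):
    """Yield (key, [values...]) per distinct key; input must be sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

if __name__ == "__main__":
    # Example reduce: count values per key.
    for key, values in reduce_stream(sys.stdin):
        print("%s\t%d" % (key, len(values)))
```

Because the shuffle guarantees the stream is sorted by key, groupby's adjacency-based grouping is sufficient; no dictionary of all keys is needed.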

RE: Hadoop for Bioinformatics

2011-03-31 Thread Evert Lammerts
> The short answer is yes! At CRS4 we are working on this very problem. We have implemented a Hadoop-based workflow to perform short read alignment to support DNA sequencing activities in our lab. Its alignment operation is based on (and therefore equivalent to) BWA. We have written