Re: Reverse Indexing Programming Help

2011-03-31 Thread Ted Dunning
It would help to get a good book. There are several. For your program, there are several things that will trip you up: a) lots of little files are going to be slow. You want input that is >100MB per file if you want speed. b) That file format is a bit cheesy since it is hard to tell URLs from

Reverse Indexing Programming Help

2011-03-31 Thread DoomUs
I'm just starting out using Hadoop. I've looked through the Java examples and have an idea about what's going on, but I don't really get it. I'd like to write a program that takes a directory of files. Each of those files contains a URL to a website on the first line, and the second line is the
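The message is truncated, but the task reads like building an inverted (reverse) index. A minimal Hadoop Streaming mapper sketch in Python, under the assumption that a preprocessing step has already joined each file into a single "url<TAB>text" record (which also sidesteps the many-small-files slowdown Ted mentions in his reply):

```python
#!/usr/bin/env python
# Hypothetical inverted-index mapper for Hadoop Streaming.
# Assumes each input record is "url<TAB>text" on one line; this format
# and the lowercasing are illustrative assumptions, not from the thread.
import sys

def map_record(line):
    """Emit one "word<TAB>url" pair per word in a "url<TAB>text" record."""
    url, _, text = line.rstrip("\n").partition("\t")
    for word in text.split():
        yield "%s\t%s" % (word.lower(), url)

if __name__ == "__main__":
    for line in sys.stdin:
        for pair in map_record(line):
            print(pair)
```

The shuffle then groups each word's URLs together, so a reducer only has to concatenate them.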

Re: Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 10:43 AM, XiaoboGu wrote: > I have trouble browsing the file system via the namenode web interface; the namenode says in its log file that the –G option is invalid for getting the groups for the user. I thought this was not the case any more, but hadoop forks to the 'id' command t

Reading Records from a Sequence File

2011-03-31 Thread maha
Hello Everyone, As far as I know, when my Java program opens a sequence file from HDFS for map calculations, using SequenceFile.Reader(key, value) will actually read the file in dfs.block.size-sized chunks and then grab it record by record from memory. Is that right? .. I tried a simple program wit

Re: What does "Too many fetch-failures" mean? How do I debug it?

2011-03-31 Thread David Rosenstrauch
On 03/31/2011 05:13 PM, W.P. McNeill wrote: I'm running a big job on my cluster and a handful of attempts are failing with a "Too many fetch-failures" error message. They're all on the same node, but that node doesn't appear to be down. Subsequent attempts succeed, so this looks like a transient

What does "Too many fetch-failures" mean? How do I debug it?

2011-03-31 Thread W.P. McNeill
I'm running a big job on my cluster and a handful of attempts are failing with a "Too many fetch-failures" error message. They're all on the same node, but that node doesn't appear to be down. Subsequent attempts succeed, so this looks like a transient stress issue rather than a problem with my cod

Re: Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread Allen Wittenauer
On Mar 31, 2011, at 7:43 AM, XiaoboGu wrote: > I have trouble browsing the file system via the namenode web interface; the namenode says in its log file that the –G option is invalid for getting the groups for the user. I don't, but I suspect you'll need to enable one of the POSIX personalities

questions on map-side spills

2011-03-31 Thread Shrinivas Joshi
I am trying TeraSort with Apache 0.21.0 build. io.sort.mb is 360M, map.sort.spill.percent is 0.8, dfs.blocksize is 256M. I am having some difficulty understanding spill related decisions from the log files. Here are the relevant log lines: 2011-03-30 13:46:51,591 INFO org.apache.hadoop.mapred.MapT
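The spill points in the log should line up with simple arithmetic on these settings: the map-side sort buffer spills once it fills to io.sort.mb times the spill percent. A quick sanity check of the numbers in the question:

```python
# Sanity check of the spill threshold implied by the settings above.
# A spill is triggered when buffered map output reaches
# io.sort.mb * spill percent (property names as given in the question).
def spill_threshold_mb(io_sort_mb, spill_percent):
    """Size in MB of buffered map output at which a spill starts."""
    return io_sort_mb * spill_percent

# io.sort.mb = 360, map.sort.spill.percent = 0.8
print(spill_threshold_mb(360, 0.8))  # 288.0
```

So each map should spill roughly every 288 MB of buffered output; spill records in the log that don't match this usually mean other buffer limits (e.g. the record-metadata buffer) are kicking in first.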

Is anyone running Hadoop 0.21.0 on Solaris 10 X64?

2011-03-31 Thread XiaoboGu
I have trouble browsing the file system via the namenode web interface; the namenode says in its log file that the –G option is invalid for getting the groups for the user.

How to avoid receiving threads send by other people.

2011-03-31 Thread XiaoboGu
Hi, I have subscribed in digest mode, but I still get all the messages instantly from other people on the list. Other mailing lists don't do this; they send all the messages from a time frame in one mail. How can I achieve this with Apache mailing lists? Regards, Xiaobo G

sorting reducer input numerically in hadoop streaming

2011-03-31 Thread Dieter Plaetinck
hi, I use hadoop 0.20.2, more specifically hadoop-streaming, on Debian 6.0 (squeeze) nodes. My question is: how do I make sure the input keys being fed to the reducer are sorted numerically rather than alphabetically? example: - standard behavior: #1 some-value1 #10 some-value10 #100 some-value100 #2
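One commonly cited streaming fix (worth verifying against your version) is `-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator` with `-D mapred.text.key.comparator.options=-n`. A comparator-free workaround is to zero-pad numeric keys in the mapper so that the default lexicographic shuffle sort coincides with numeric order; a sketch (the pad width of 10 is an assumption about the largest key):

```python
# Zero-pad numeric keys so lexicographic sorting equals numeric sorting.
def pad_key(key, width=10):
    """Left-pad a numeric string key with zeros, e.g. "2" -> "0000000002"."""
    return key.zfill(width)

keys = ["1", "10", "100", "2"]
print(sorted(keys))                       # ['1', '10', '100', '2'] -- lexicographic
print(sorted(keys, key=pad_key))          # ['1', '2', '10', '100'] -- numeric order
```

The reducer can strip the leading zeros again (or just call `int()` on the key) before emitting its output.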

DFSIO benchmark

2011-03-31 Thread Matthew John
Can someone provide pointers/links for the DFSIO benchmark to check the I/O performance of HDFS? Thanks, Matthew John

Re: Hadoop Pipes Error

2011-03-31 Thread Adarsh Sharma
Thanks Amareshwari, I found it, but I'm sorry to say it results in another error: bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar -inputformat org.apache.hadoop.mapred.pipes.WordCou

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
Also see TestPipes.java for more details. On 3/31/11 4:29 PM, "Amareshwari Sriramadasu" wrote: Adarsh, The inputformat is present in test jar. So, pass -libjars to your command. libjars option should be passed before program specific options. So, it should be just after your -D parameters.

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
Adarsh, The inputformat is present in test jar. So, pass -libjars to your command. libjars option should be passed before program specific options. So, it should be just after your -D parameters. -Amareshwari On 3/31/11 3:45 PM, "Adarsh Sharma" wrote: Amareshwari Sri Ramadasu wrote: Re: Hado

hadoop streaming shebang line for python and mappers jumping to 100% completion right away

2011-03-31 Thread Dieter Plaetinck
Hi, I use 0.20.2 on Debian 6.0 (squeeze) nodes. I have 2 problems with my streaming jobs: 1) I start the job like so: hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \ -file /proj/Search/wall/experiment/ \ -mapper './nolog.sh mapper' \ -reducer './
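One frequent cause of streaming mappers jumping straight to 100% is the mapper script failing to launch at all (missing or wrong shebang line, or the script not being executable), so it helps to test against a known-good minimal mapper first. A sketch, with purely illustrative identity behavior:

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper. The shebang above must be the very
# first line of the file, and the script must be executable (chmod +x);
# otherwise the task cannot start and the job may misreport progress.
import sys

def mapper(lines):
    """Identity mapper: emit each input line as "line<TAB>1"."""
    for line in lines:
        yield "%s\t1" % line.rstrip("\n")

if __name__ == "__main__":
    for out in mapper(sys.stdin):
        print(out)
```

If this minimal script runs correctly under the same `-mapper` invocation, the problem is in the wrapper script or its arguments rather than in streaming itself.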

Re: Hadoop Pipes Error

2011-03-31 Thread Adarsh Sharma
Amareshwari Sri Ramadasu wrote: You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes.WordCountInputFormat. You can pass the input format by passing it in the -inputformat option. I did not try it myself, but it should work. Here is the command that

Re: Hadoop Pipes Error

2011-03-31 Thread Amareshwari Sri Ramadasu
You can not run it with TextInputFormat. You should run it with org.apache.hadoop.mapred.pipes .WordCountInputFormat. You can pass the input format by passing it in -inputformat option. I did not try it myself, but it should work. -Amareshwari On 3/31/11 12:23 PM, "Adarsh Sharma" wrote: Thank

Re: Hadoop Pipes Error

2011-03-31 Thread Steve Loughran
On 31/03/11 07:53, Adarsh Sharma wrote: Thanks Amareshwari, here is the posting: The nopipe example needs more documentation. It assumes that it is run with the InputFormat from src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which has a very specific input split for

Re: How to apply Patch

2011-03-31 Thread Adarsh Sharma
Thanks Steve, you have helped me clear my doubts several times. Let me explain what my problem is: I am trying to run the wordcount-nopipe.cc program in the /home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl directory. I am able to run a simple wordcount.cpp program in the Hadoop cluster, but when

Re: How to apply Patch

2011-03-31 Thread Steve Loughran
On 31/03/11 07:37, Adarsh Sharma wrote: Thanks a lot for such a deep explanation. I have done it now, but it doesn't help me with my original problem, for which I'm doing this. Please comment on it if you have some idea. I attached the problem. Sadly, Matt's deep explanation is what you need, lo

Re: Hadoop Pipes Error

2011-03-31 Thread Adarsh Sharma
What are the steps needed to debug the error and make wordcount-nopipe.cc run properly? Please guide me through the steps if possible. Thanks & best regards, Adarsh Sharma. Amareshwari Sri Ramadasu wrote: Here is an answer for your question in an old mail archive: http://lucene.472066.n3.nabble.com/pipe-a

Re: # of keys per reducer invocation (streaming api)

2011-03-31 Thread Dieter Plaetinck
On Tue, 29 Mar 2011 23:17:13 +0530 Harsh J wrote: > Hello, > On Tue, Mar 29, 2011 at 8:25 PM, Dieter Plaetinck wrote: > > Hi, I'm using the streaming API and I notice my reducer gets - in the same invocation - a bunch of different keys, and I wonder why. > > I would expect to get one ke
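On the streaming side, the behavior Dieter describes is by design: the reducer receives its whole sorted partition as one continuous stream of key<TAB>value lines and must detect key boundaries itself. A minimal Python sketch of that re-grouping using itertools.groupby:

```python
# Sketch of a streaming reducer that re-groups its sorted input stream
# by key, since streaming hands the reducer every key of its partition
# in a single invocation rather than one call per key.
import sys
from itertools import groupby

def parse(lines):
    """Split "key<TAB>value" lines into (key, value) pairs."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, value

def reduce_stream(lines):
    """Yield (key, [values...]) per distinct key; input must be sorted by key."""
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, [v for _, v in group]

if __name__ == "__main__":
    # Example reduce: count values per key.
    for key, values in reduce_stream(sys.stdin):
        print("%s\t%d" % (key, len(values)))
```

Because the shuffle guarantees the stream is sorted by key, groupby's adjacency-based grouping is sufficient; no dictionary of all keys is needed.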

RE: Hadoop for Bioinformatics

2011-03-31 Thread Evert Lammerts
> The short answer is yes! At CRS4 we are working on this very problem. We have implemented a Hadoop-based workflow to perform short read alignment to support DNA sequencing activities in our lab. Its alignment operation is based on (and therefore equivalent to) BWA. We have written