It would help to get a good book. There are several.
For your program, there are several things that will trip you up:
a) Lots of little files are going to be slow. You want input that is >100MB
per file if you want speed.
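On point (a), one low-tech way to avoid the small-files penalty is to concatenate many small inputs into a few large files before loading them into HDFS. A minimal local sketch (all paths are made up; the hadoop fs step is shown as a comment because it needs a running cluster):

```shell
# Combine many small per-page files into one large input file.
mkdir -p /tmp/pages
printf 'http://example.org/a\n' > /tmp/pages/1.txt
printf 'http://example.org/b\n' > /tmp/pages/2.txt
cat /tmp/pages/*.txt > /tmp/combined-001.txt
# Then load the combined file instead of the individual ones:
#   hadoop fs -put /tmp/combined-001.txt /user/me/input/
wc -l < /tmp/combined-001.txt
```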
b) That file format is a bit cheesy since it is hard to tell URLs from
I'm just starting out using Hadoop. I've looked through the Java examples
and have an idea of what's going on, but I don't really get it.
I'd like to write a program that takes a directory of files. Each of those
files contains a URL to a website on the first line, and the second line is
the
On Thu, Mar 31, 2011 at 10:43 AM, XiaoboGu wrote:
> I have trouble browsing the file system via the namenode web interface; the
> namenode says in its log file that the -G option is invalid for getting the
> groups for the user.
>
>
I thought this was not the case any more, but Hadoop forks to the 'id'
command t
Hello Everyone,
As far as I know, when my Java program opens a sequence file from HDFS for
map calculations, SequenceFile.Reader(key, value) will actually read the
file in dfs.block.size chunks and then grab records one by one from memory.
Is that right?
.. I tried a simple program wit
On 03/31/2011 05:13 PM, W.P. McNeill wrote:
I'm running a big job on my cluster and a handful of attempts are failing
with a "Too many fetch-failures" error message. They're all on the same
node, but that node doesn't appear to be down. Subsequent attempts succeed,
so this looks like a transient
I'm running a big job on my cluster and a handful of attempts are failing
with a "Too many fetch-failures" error message. They're all on the same
node, but that node doesn't appear to be down. Subsequent attempts succeed,
so this looks like a transient stress issue rather than a problem with my
code.
On Mar 31, 2011, at 7:43 AM, XiaoboGu wrote:
> I have trouble browsing the file system via the namenode web interface; the
> namenode says in its log file that the -G option is invalid for getting the
> groups for the user.
>
I don't, but I suspect you'll need to enable one of the POSIX
personalities
I am trying TeraSort with the Apache 0.21.0 build. io.sort.mb is 360M,
map.sort.spill.percent is 0.8, and dfs.blocksize is 256M. I am having some
difficulty understanding spill-related decisions from the log files. Here
are the relevant log lines:
2011-03-30 13:46:51,591 INFO org.apache.hadoop.mapred.MapT
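For what it's worth, the spill threshold implied by those settings can be computed directly: the map-side sort buffer starts spilling when it reaches io.sort.mb times map.sort.spill.percent. A minimal arithmetic sketch using the values from this message:

```shell
# Spill threshold = io.sort.mb * map.sort.spill.percent
# With the settings from this thread: 360 MB * 0.8 = 288 MB.
IO_SORT_MB=360
SPILL_PERCENT=80   # 0.8 written as a whole-number percentage
echo $(( IO_SORT_MB * SPILL_PERCENT / 100 ))
```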
I have trouble browsing the file system via the namenode web interface; the
namenode says in its log file that the -G option is invalid for getting the groups for the user.
Hi,
I have subscribed to digest mode, but I still get every message instantly
from other people on the list. Other mailing lists don't do this; they send
all the messages from a given time frame in one mail.
How can I achieve this with the Apache mailing lists?
Regards,
Xiaobo G
hi,
I use hadoop 0.20.2, more specifically hadoop-streaming, on Debian 6.0
(squeeze) nodes.
My question is: how do I make sure the input keys being fed to the reducer
are sorted numerically rather than alphabetically?
example:
- standard behavior:
#1 some-value1
#10 some-value10
#100 some-value100
#2
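One commonly suggested approach for this (a sketch only, not verified on this exact 0.20.2 setup) is to use KeyFieldBasedComparator, which understands sort(1)-style options such as -n for numeric ordering:

```shell
# Hedged sketch: the jar path and input/output/mapper/reducer values are
# illustrative placeholders, not taken from this thread's actual job.
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
  -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapred.text.key.comparator.options=-n \
  -input myinput -output myoutput \
  -mapper ./mapper.sh -reducer ./reducer.sh
```

This needs a running cluster, so it is shown as a command fragment only.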
Can someone provide pointers/links to DFSIO benchmarks for checking the I/O
performance of HDFS?
Thanks,
Matthew John
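For reference, the stock TestDFSIO benchmark ships in the Hadoop test jar; a typical invocation (the jar name varies by release, so treat the paths as assumptions) looks like:

```shell
# Write benchmark: 10 files of 1000 MB each, then the matching read pass.
hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -read  -nrFiles 10 -fileSize 1000
# Clean up the benchmark's output directory when done.
hadoop jar $HADOOP_HOME/hadoop-*-test.jar TestDFSIO -clean
```

These commands require a running cluster, so they are shown as fragments only.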
Thanks Amareshwari, I found it, but I'm sorry to say it results in another error:
bash-3.2$ bin/hadoop pipes -D hadoop.pipes.java.recordreader=true -D
hadoop.pipes.java.recordwriter=true -libjars
/home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar -inputformat
org.apache.hadoop.mapred.pipes.WordCou
Also see TestPipes.java for more details.
On 3/31/11 4:29 PM, "Amareshwari Sriramadasu" wrote:
Adarsh,
The input format is present in the test jar, so pass -libjars in your command. The -libjars option should be passed before
program-specific options, so it should come just after your -D parameters.
Adarsh,
The input format is present in the test jar, so pass -libjars in your command. The -libjars option should be passed before
program-specific options, so it should come just after your -D parameters.
-Amareshwari
On 3/31/11 3:45 PM, "Adarsh Sharma" wrote:
Amareshwari Sri Ramadasu wrote:
Re: Hado
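Putting that ordering advice together, the corrected pipes command would be shaped like this (a sketch: the jar path and input format class come from earlier in the thread, while the -input/-output/-program values are placeholders):

```shell
# -libjars goes right after the -D options, before -inputformat.
bin/hadoop pipes \
  -D hadoop.pipes.java.recordreader=true \
  -D hadoop.pipes.java.recordwriter=true \
  -libjars /home/hadoop/project/hadoop-0.20.2/hadoop-0.20.2-test.jar \
  -inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat \
  -input in_dir -output out_dir \
  -program bin/wordcount-nopipe
```

This requires a cluster with the pipes binary deployed, so it is a command fragment only.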
Hi,
I use 0.20.2 on Debian 6.0 (squeeze) nodes.
I have 2 problems with my streaming jobs:
1) I start the job like so:
hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
-file /proj/Search/wall/experiment/ \
-mapper './nolog.sh mapper' \
-reducer './
Amareshwari Sri Ramadasu wrote:
You cannot run it with TextInputFormat. You should run it with
org.apache.hadoop.mapred.pipes.WordCountInputFormat. You can pass the
input format in the -inputformat option.
I did not try it myself, but it should work.
Here is the command that
You cannot run it with TextInputFormat. You should run it with
org.apache.hadoop.mapred.pipes.WordCountInputFormat. You can pass the input
format in the -inputformat option.
I did not try it myself, but it should work.
-Amareshwari
On 3/31/11 12:23 PM, "Adarsh Sharma" wrote:
Thank
On 31/03/11 07:53, Adarsh Sharma wrote:
Thanks Amareshwari,
here is the posting :
The nopipe example needs more documentation. It assumes that it is run
with the InputFormat from
src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java,
which has a very specific input split
for
Thanks Steve, you have helped me clear my doubts several times.
Let me explain what my problem is:
I am trying to run the wordcount-nopipe.cc program in the
/home/hadoop/project/hadoop-0.20.2/src/examples/pipes/impl directory.
I am able to run a simple wordcount.cpp program on the Hadoop cluster, but
when
On 31/03/11 07:37, Adarsh Sharma wrote:
Thanks a lot for such a deep explanation:
I have done it now, but it doesn't help me with my original problem, for
which I'm doing this.
Please, if you have some idea, comment on it. I attached the problem.
Sadly, Matt's deep explanation is what you need, lo
What are the steps needed to debug the error and get wordcount-nopipe.cc
running properly?
Please, if possible, guide me through the steps.
Thanks & best regards,
Adarsh Sharma
Amareshwari Sri Ramadasu wrote:
Here is an answer for your question in old mail archive:
http://lucene.472066.n3.nabble.com/pipe-a
On Tue, 29 Mar 2011 23:17:13 +0530
Harsh J wrote:
> Hello,
>
> On Tue, Mar 29, 2011 at 8:25 PM, Dieter Plaetinck
> wrote:
> > Hi, I'm using the streaming API and I notice my reducer gets - in
> > the same invocation - a bunch of different keys, and I wonder why.
> > I would expect to get one ke
> The short answer is yes! At CRS4 we are working on this very problem.
>
> We have implemented a Hadoop-based workflow to perform short read
> alignment to
> support DNA sequencing activities in our lab. Its alignment operation
> is
> based on (and therefore equivalent to) BWA. We have written