how to optimize mapreduce procedure??

2009-03-12 Thread ZhiHong Fu
Hello, I'm writing a program that runs Lucene searches over about 12 index directories, all of which are stored in HDFS. It is done like this: 1. We build about 12 index directories through Lucene's indexing functionality, each about 100M in size. 2. We store these 12 index

How-to in MapReduce

2009-01-23 Thread Mark Kerzner
Hi, esteemed group, how would I form Maps in MapReduce to recursively look at every file in a directory and do something to each file, such as produce a PDF or compute its hash? For that matter, Google builds its index using MapReduce, or so the papers say. First the crawlers store all the
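One way to picture the question above is to treat each file path as one input record and let the map step emit a (path, hash) pair. The following is only a minimal local sketch under that assumption: it uses the local file system and the Python standard library as a stand-in for HDFS and the Hadoop API, and the names `list_files`, `map_file`, and `run_local` are hypothetical, not part of any framework.

```python
import hashlib
import os
from typing import Dict, Iterator, Tuple


def list_files(root: str) -> Iterator[str]:
    """Yield every regular file under root, recursing into subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def map_file(path: str) -> Tuple[str, str]:
    """Map step: one input record (a path) -> one (path, SHA-256 hex digest) pair."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files are never held fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def run_local(root: str) -> Dict[str, str]:
    """Local stand-in for the job: apply map_file to every listed path."""
    return dict(map_file(path) for path in list_files(root))
```

In a real job the recursive listing would typically be done once up front, and the resulting list of paths would become the job input, so that each map call handles one file independently. Whether that fits depends on file sizes and counts, which the thread does not state.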

Re: How-to in MapReduce

2009-01-23 Thread tim robertson
Hi, Sounds like you might want to look at the Nutch project architecture and then see the Nutch on Hadoop tutorial - http://wiki.apache.org/nutch/NutchHadoopTutorial It does web crawling, and indexing using Lucene. It would be a good place to start anyway for ideas, even if it doesn't end up

Re: How-to in MapReduce

2009-01-23 Thread Mark Kerzner
Tim, I looked there, but it is a setup manual. I read the MapReduce, Sawzall, and the MS paper on these, but I need best practices. Thank you, Mark On Fri, Jan 23, 2009 at 3:22 PM, tim robertson timrobertson...@gmail.com wrote: Hi, Sounds like you might want to look at the Nutch project

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-02 Thread GO-HADOOP
is OK; my concern is what happens if the output of the map/reduce becomes so large that I run into similar performance issues with another RDBMS. -- View this message in context: http://www.nabble.com/How-to-read-mapreduce-output-in-HDFS-directory-from-Web-Application-tp20282762p20290241.html Sent from

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-02 Thread Alex Loddengaard
table to display as output in the web application? The MySQL option is OK; my concern is what happens if the output of the map/reduce becomes so large that I run into similar performance issues with another RDBMS.

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-02 Thread Jean-Daniel Cryans
if the output of the map/reduce becomes so large that I run into similar performance issues with another RDBMS.

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-02 Thread GO-HADOOP
I think I found what I was looking for regarding HBase. Thanks -- View this message in context: http://www.nabble.com/How-to-read-mapreduce-output-in-HDFS-directory-from-Web-Application-tp20282762p20294221.html Sent from the Hadoop core-user mailing list archive at Nabble.com.

How to read mapreduce output in HDFS directory from Web Application

2008-11-01 Thread GO-HADOOP
I am new to Hadoop and am trying to understand: what is an efficient way to read the output file from HDFS and display the result in a simple web application? Thanks
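As a rough illustration of the question above, the sketch below reads job output and renders it as an HTML table. It rests on several assumptions: the job wrote plain-text `part-*` files with one tab-separated key/value pair per line, and the output directory has already been copied to a local path that the web application can read. The function names `read_job_output` and `render_table`, and the `limit` parameter, are hypothetical.

```python
import glob
import html
import os
from typing import List, Tuple


def read_job_output(output_dir: str, limit: int = 100) -> List[Tuple[str, str]]:
    """Read up to `limit` key/value rows from the part-* files in output_dir."""
    rows: List[Tuple[str, str]] = []
    for part in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(part, encoding="utf-8") as handle:
            for line in handle:
                # Split on the first tab only; the value may contain tabs itself.
                key, _sep, value = line.rstrip("\n").partition("\t")
                rows.append((key, value))
                if len(rows) >= limit:
                    return rows
    return rows


def render_table(rows: List[Tuple[str, str]]) -> str:
    """Render rows as a minimal HTML table, escaping keys and values."""
    body = "".join(
        f"<tr><td>{html.escape(key)}</td><td>{html.escape(value)}</td></tr>"
        for key, value in rows
    )
    return f"<table>{body}</table>"
```

The `limit` cap only suits output small enough to page through directly. For output that grows very large, loading the results into a store built for lookups is the more usual route.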

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-01 Thread Alex Loddengaard
the result in a simple web application? Thanks