Re: Nutch build failed

2008-03-29 Thread Developer Developer
Do you think it is related to the Java version you are using? On Fri, Mar 28, 2008 at 10:14 AM, John Mitterko [EMAIL PROTECTED] wrote: Hi, I am trying to build Nutch from the 03/28/08 build and I keep getting the following error. compile-core: [javac] Compiling 181 source

Re: Running Nutch on existing Hadoop installation

2008-03-29 Thread Developer Developer
Are you setting this up in a multi-node environment? On Fri, Mar 28, 2008 at 8:04 PM, Bradford Stephens [EMAIL PROTECTED] wrote: Greetings, What should it take to run Nutch on an existing Hadoop installation? I plan on using HBase on my Hadoop cluster as well as Nutch, so I'd like to keep everything

Re: url file and crawl filter file - basic question ( may be )

2008-03-28 Thread Developer Developer
No comments? :) On Fri, Mar 28, 2008 at 12:42 PM, Developer Developer [EMAIL PROTECTED] wrote: Hello Frens, I want Nutch to crawl two hosts, www.oracle.com and www.ibm.com. I think my URL crawl filter is not set up correctly, because I see the message No URLs to fetch - check your seed
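For anyone hitting the same "No URLs to fetch" message, here is a rough sketch of how a list of +/- regex rules can accept or reject seed URLs. It assumes first-match-wins evaluation and rejection when no rule matches. The class name UrlRuleSketch and the three rules are made up for illustration, and this is not Nutch's own filter code, so check it against your actual filter file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: mimics first-match-wins evaluation of '+'/'-' regex rules.
// UrlRuleSketch is a hypothetical name; this is not a Nutch class.
class UrlRuleSketch {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    UrlRuleSketch(String... lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            // Skip blank lines and comments.
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    // The first rule whose pattern is found in the URL decides the outcome.
    // A URL that matches no rule is rejected (an assumption of this sketch).
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlRuleSketch filter = new UrlRuleSketch(
            "+^http://([a-z0-9]*\\.)*oracle\\.com/",
            "+^http://([a-z0-9]*\\.)*ibm\\.com/",
            "-.");
        String[] seeds = {
            "http://www.oracle.com/",
            "http://www.ibm.com/",
            "http://www.example.org/"
        };
        for (String url : seeds) {
            System.out.println(url + " -> " + (filter.accepts(url) ? "accepted" : "rejected"));
        }
    }
}
```

If every seed comes back rejected under rules like these, the filter rather than the seed file is the likely culprit.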

Re: Cluster Summary

2008-03-26 Thread Developer Developer
this helps. Cheers boris On Tue, Mar 25, 2008 at 7:37 PM, Developer Developer [EMAIL PROTECTED] wrote: Hello Frens, I debugged the problem and here is some more information and a question. I think the slave node is not communicating with the master node through continuous heartbeats. I

Cluster summary

2008-03-19 Thread Developer Developer
I am a beginner with Nutch/Hadoop. I have set up Nutch/Hadoop with two nodes. How do I know that my setup is working, i.e. how do I know that both slaves are being used for crawl/index etc.? When I point my browser to http://masternode:50030/jobtracker.jsp , in the cluster summary I only see

Cluster Summary

2008-03-19 Thread Developer Developer
Hello, I have set up Nutch/Hadoop on 2 nodes. How can I make sure that the setup is correct? I can start and stop using the start-all and stop-all scripts with no errors. But during a crawl, when I look at the status on http://master:50030/jobtracker.jsp, the cluster summary shows only one node. Any

Re: Setting nutch/hadopp multi node environment on a SAN device.

2008-03-10 Thread Developer Developer
Hi folks, any more comments from Hadoop experts? On Sun, Mar 9, 2008 at 7:56 AM, Developer Developer [EMAIL PROTECTED] wrote: It is an NFS mount. On Sat, Mar 8, 2008 at 9:07 PM, Dennis Kubes [EMAIL PROTECTED] wrote: How is the SAN accessed, as a network drive, a special protocol? Dennis

Re: Setting nutch/hadopp multi node environment on a SAN device.

2008-03-09 Thread Developer Developer
It is an NFS mount. On Sat, Mar 8, 2008 at 9:07 PM, Dennis Kubes [EMAIL PROTECTED] wrote: How is the SAN accessed, as a network drive, a special protocol? Dennis Developer Developer wrote: Any comments? On Sat, Mar 8, 2008 at 1:36 PM, Developer Developer [EMAIL PROTECTED] wrote

Setting nutch/hadopp multi node environment on a SAN device.

2008-03-08 Thread Developer Developer
Hello Friends, The tutorial @ http://wiki.apache.org/nutch/NutchHadoopTutorial says don't use DFS on an NFS mount (this would be pretty stupid anyway). I am setting up a multi-node Nutch/Hadoop environment with lots of storage available on a SAN device. I tried running Nutch/Hadoop with DFS

Re: Setting nutch/hadopp multi node environment on a SAN device.

2008-03-08 Thread Developer Developer
Any comments? On Sat, Mar 8, 2008 at 1:36 PM, Developer Developer [EMAIL PROTECTED] wrote: Hello Friends, The tutorial @ http://wiki.apache.org/nutch/NutchHadoopTutorial says don't use DFS on an NFS mount (this would be pretty stupid anyway). I am setting up a multi-node Nutch/Hadoop

Re: nutch 0.9, multiple nodes, dedup error and Failed to transfer blk_-1407334809134504262

2008-03-05 Thread Developer Developer
Hello John and fellow coders, Is there any resolution for this 50010 port connection error? I am really struggling to get the multiple-node environment working. I believe I have followed all the steps on the wiki. I am using Nutch 0.9. Thanks! 08-03-05 13:01:08,876 WARN dfs.DataNode -

Re: nutch 0.9, multiple nodes, dedup error and Failed to transfer blk_-1407334809134504262

2008-03-05 Thread Developer Developer
Is there any command to check if port 50010 is open for socket connections? Thanks! On Wed, Mar 5, 2008 at 1:09 PM, Developer Developer [EMAIL PROTECTED] wrote: Hello John and fellow coders, Is there any resolution for this 50010 port connection error? I am really struggling to get
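One way to answer the port question from Java is to attempt a plain TCP connection with a timeout. The sketch below is only a reachability probe: PortCheck and isPortOpen are made-up names, and the default host "localhost" is a placeholder for whichever slave node you want to test from the master.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal TCP reachability probe. PortCheck is a hypothetical helper,
// not part of Nutch or Hadoop.
class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not open".
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the slave host name and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 50010;
        boolean open = isPortOpen(host, port, 3000);
        System.out.println(host + ":" + port + (open ? " is reachable" : " is not reachable"));
    }
}
```

A "not reachable" result cannot by itself distinguish a firewall from a process that is simply not listening, so it is worth running the probe both from the master and on the slave itself.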

Re: nutch 0.9, multiple nodes, dedup error and Failed to transfer blk_-1407334809134504262

2008-03-05 Thread Developer Developer
Thanks, John. I found the solution for port 50010. It was just a firewall issue on the slave machine. I tested with the firewall turned off and it worked. Thanks! On Wed, Mar 5, 2008 at 1:31 PM, John Mendenhall [EMAIL PROTECTED] wrote: On Wed, 05 Mar 2008, Developer Developer wrote: Hello John

cat: /home/user/nutch/search/bin/../conf/masters: No such file or directory

2008-02-25 Thread Developer Developer
Hello, I am trying to set up Nutch in a clustered environment using the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial. I am seeing errors when verifying the setup on a single machine. When I run bin/start-all.sh I get the error cat: /home/user/nutch/search/bin/../conf/masters: No such file or

Re: cat: /home/user/nutch/search/bin/../conf/masters: No such file or directory

2008-02-25 Thread Developer Developer
that wasn't a feature when the tutorial was first written). Feel free to change the wiki to clarify for others. Dennis Developer Developer wrote: Hello, I am trying to set up Nutch in a clustered environment using the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial. I am seeing errors

Cannot delete /home/user/nutch/filesystem/mapreduce/system. Name node is in safe mode. - Error

2008-02-25 Thread Developer Developer
Hello, I am trying to set up Nutch in a clustered environment using the tutorial at http://wiki.apache.org/nutch/NutchHadoopTutorial. I am seeing the following error in the file hadoop-user-jobtracker-localhost.log at startup. 2008-02-25 13:48:01,988 WARN mapred.JobTracker - Error starting

Installing nutch over existing Hadoop cluster

2008-02-14 Thread Developer Developer
Hello Frens, Are there any instructions or information available on how to install Nutch on an existing Hadoop cluster on a set of Linux boxes? I looked at the Nutch wiki instructions at http://wiki.apache.org/nutch/NutchHadoopTutorial, but these are for a new Nutch and Hadoop install. Thanks!

Re: Nutch performance numbers

2008-01-25 Thread Developer Developer
a lot more details before any meaningful response can be made. Imagine yourself on the receiving end of a question this vague in an area of your own expertise. Could you answer it? Best Erick On Jan 25, 2008 12:10 PM, Developer Developer [EMAIL PROTECTED] wrote: Please provide any

Re: Nutch performance numbers

2008-01-25 Thread Developer Developer
Please provide any comments on this one. Thanks! On Jan 23, 2008 9:57 AM, Developer Developer [EMAIL PROTECTED] wrote: Folks, I want to record the performance numbers of Nutch crawl and index. Can you please let me know what is the best way to do it? How do I obtain performance numbers

Nutch performance numbers

2008-01-23 Thread Developer Developer
Folks, I want to record the performance numbers of Nutch crawl and index. Can you please let me know what is the best way to do it? How do I obtain performance numbers for inject, generate, fetch and updatedb? Thanks!
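One simple way to get per-phase numbers is to wrap each step in a wall-clock timer. The sketch below assumes each phase can be started from your own Java code as a Runnable. PhaseTimer is a made-up helper, and the sleep calls are placeholders standing in for the real inject, generate, fetch and updatedb steps, which are not invoked here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that records wall-clock time per named phase.
// It does not call Nutch; the caller supplies the work for each phase.
class PhaseTimer {
    // Insertion-ordered so results print in the order phases first ran.
    private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();

    // Runs the work and adds its duration to the running total for the phase.
    void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            long millis = (System.nanoTime() - start) / 1_000_000;
            elapsedMillis.merge(phase, millis, Long::sum);
        }
    }

    Map<String, Long> results() {
        return elapsedMillis;
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (String phase : new String[] {"inject", "generate", "fetch", "updatedb"}) {
            // Placeholder work; replace with the real step for each phase.
            timer.time(phase, () -> sleepQuietly(50));
        }
        timer.results().forEach((phase, millis) ->
            System.out.println(phase + ": " + millis + " ms"));
    }
}
```

Wall-clock totals like these say nothing about where the time goes inside a phase, so it helps to record the number of URLs processed alongside them.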

Re: How to use Nutch to parse Web-pages!

2008-01-15 Thread Developer Developer
check this out http://kuthrax.blogspot.com/2008/01/how-to-retrieve-parsed-content-from.html On Jan 15, 2008 2:46 PM, Morrowwind [EMAIL PROTECTED] wrote: Hi, My project is about web page processing and I need to parse the web-pages to get all the plain text first. Now I have finished

Support Hardware and OS for nutch and hadoop

2008-01-04 Thread Developer Developer
Hello Frens, I am gathering information on supported hardware and OS for Nutch and Hadoop. I did not find any conclusive information by going through the Nutch wiki. If I want to build a cluster of nodes using Nutch/Hadoop for crawling, then what are my options for H/W and OS?

Prefix Query in Nutch and Wildcard support.

2008-01-03 Thread Developer Developer
Hello Frens, Is there any way to do a prefix query in Nutch? E.g. query the content field for occurrences of abc*? I could do it in Lucene, but I want to do it in Nutch. Going through the mailing list, it appeared that Nutch does not support such queries. Is it true? Thanks!

System.out.println(parsetext.getText()) prints non readable chars - Please help

2008-01-02 Thread Developer Developer
Hello, I need to access the parse text from Nutch documents. I am using NutchBean to search and then access the parseText from it. Here is the sample code Configuration conf = NutchConfiguration.create(); NutchBean nb = new NutchBean(conf); Hits hits = nb.search(Query.parse(irs, conf), 10); //get

Re: System.out.println(parsetext.getText()) prints non readable chars - Please help

2008-01-02 Thread Developer Developer
It is in English. I am pretty sure it is not in another language, because here is the document URL: http://www.irs.gov/pub/irs-pdf/f1040as1.pdf. On Jan 2, 2008 10:49 AM, Dennis Kubes [EMAIL PROTECTED] wrote: Most likely this page is in a different language. Dennis Developer

Re: System.out.println(parsetext.getText()) prints non readable chars - Please help

2008-01-02 Thread Developer Developer
for the pdf-plugin in folder plugins to see how Nutch uses this api. 2008/1/2, Developer Developer [EMAIL PROTECTED]: Hello, I need to access the parse text from Nutch documents. I am using NutchBean to search and then access the parseText from it. Here is the sample code Configuration conf

Accessing parsed content from java application

2007-12-14 Thread Developer Developer
Hello Frens, I believe Nutch stores parsed content somewhere. Can you please let me know how I can access the parsed content for a given URL from Java code? Thanks!

Question on searching nutch from java appliction

2007-12-05 Thread Developer Developer
Hello, I have a requirement to search a Nutch index from a Java application (non-web). Here is the code I am using, but I get errors. Please help. Code: public class TestSearch { /** * @param args */ public static void main(String[] args) { try {