returning explanation as an Object instead of HTML

2006-09-13 Thread David Podunavac
Hi there, I wonder if there is a way to return the result's explanation as a Java object instead of returning HTML. nutchBean.getExplanation(query, show[i]) will give me ul... and so on, which is not very handy. Maybe it should return an Object, and another method getExplanationAsHtml could give the
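Lucene's Explanation class already keeps the two renderings apart (toString() for plain text, toHtml() for the markup NutchBean returns), so returning the object itself is mostly a question of exposing it. A rough sketch, assuming direct access to the underlying Lucene searcher rather than the actual NutchBean API:

    // Rough sketch (not the NutchBean API): fetch the raw Lucene Explanation
    // and let the caller decide how to render it.
    import org.apache.lucene.search.Explanation;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class ExplainAsObject {
        // 'luceneQuery' and 'docId' are assumed to come from the search already run.
        public static Explanation explain(IndexSearcher searcher, Query luceneQuery, int docId)
                throws java.io.IOException {
            Explanation explanation = searcher.explain(luceneQuery, docId);
            // explanation.toString() -> plain text, explanation.toHtml() -> HTML
            return explanation;
        }
    }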

resolving path of plugins relative to nutch-local.xml

2006-09-13 Thread Michael Wechner
Hi, in the case of relative paths the plugins folder path is resolved relative to the classpath, but I would like to resolve it relative to nutch-local.xml. Is that possible somehow? Or would it make sense to add an attribute where one could set how relative paths should be resolved?
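One workaround sketch (an assumption, not a built-in option): compute the absolute path yourself from the location of the config file and override plugin.folders programmatically before Nutch is used. The NutchConfiguration helper and the confFile location are assumed here:

    // Assumed 0.8-era API: resolve "plugin.folders" against the directory that
    // holds nutch-local.xml instead of the classpath, then hand the tweaked
    // Configuration to whatever creates the NutchBean.
    import java.io.File;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.util.NutchConfiguration;

    public class RelativePluginFolders {
        public static Configuration configure(File confFile) {
            Configuration conf = NutchConfiguration.create();
            String folders = conf.get("plugin.folders", "plugins");
            File resolved = new File(folders);
            if (!resolved.isAbsolute()) {
                resolved = new File(confFile.getParentFile(), folders);
            }
            conf.set("plugin.folders", resolved.getAbsolutePath());
            return conf;
        }
    }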

Re: QueryFilter not found

2006-09-13 Thread David Podunavac
David Podunavac wrote: When I try to instantiate NutchBean I get this error message even though I have the class in my lib directory. I am not sure if I missed anything, but help is appreciated. Has anyone run into the same weird error and found a solution? I would be very pleased to get this

Question about using Nutch plug-ins as libraries

2006-09-13 Thread Trym B. Asserson
Hello, We're currently developing an application using the Lucene API for building a search engine and as part of the application we have a component for parsing several file formats. For this component we were hoping to use several of the plug-ins in Nutch and we have written classes in our own
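For what the snippet describes, here is a very rough sketch of driving Nutch's parse plug-ins from application code; the class names and constructors (ParseUtil, Content, Metadata, NutchConfiguration) are assumptions based on the 0.8-era API and may need adjusting, and plugin.folders must point at the plugins directory:

    // Assumed 0.8-era API sketch: feed raw bytes through whatever parse plug-in
    // matches the content type and get the extracted text back.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseUtil;
    import org.apache.nutch.protocol.Content;
    import org.apache.nutch.util.NutchConfiguration;

    public class StandaloneParsing {
        public static String extractText(String url, byte[] raw, String contentType) throws Exception {
            Configuration conf = NutchConfiguration.create();
            Content content = new Content(url, url, raw, contentType, new Metadata(), conf);
            Parse parse = new ParseUtil(conf).parse(content);
            return parse.getText();
        }
    }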

Re: Question about using Nutch plug-ins as libraries

2006-09-13 Thread Dennis Kubes
Is the plugins folder in the root of the war? Dennis Trym B. Asserson wrote: Hello, We're currently developing an application using the Lucene API for building a search engine and as part of the application we have a component for parsing several file formats. For this component we were

problem: install java

2006-09-13 Thread kawther khazri
I wanted to install Nutch with a new version of Java (1.5.0_08) and got an error while installing Java. [EMAIL PROTECTED] ]# rpmbuild --rebuild Desktop/java-1.5.0-sun*src.rpm Installing Desktop/java-1.5.0-sun-compat-1.5.0.08-1jpp.src.rpm warning: user scop does not exist - using root warning: group

RE: Question about using Nutch plug-ins as libraries

2006-09-13 Thread Trym B. Asserson
Hello again, thanks for the reply. With some intensive use of Filemon I eventually figured out where the classes were looking, and the plugins + conf files had to be moved into the WEB-INF\classes directory. We now have it fully functional, cheers. But to answer your question, no, we'd tried

How to crawl this website

2006-09-13 Thread [EMAIL PROTECTED]
Hey, I am confused about crawling with Nutch. As you know, there are some websites which cannot be accessed because they use the POST method; that means, even if you know the website's URL, when you input the URL into the address bar in IE or Mozilla, some of the website's important content has

Bug in Nutch?

2006-09-13 Thread Meghna Kukreja
Hi, I set http.content.limit to -1 so as not to truncate any fetched data; however, if the fetched data was compressed (HTTP response header Content-Encoding: gzip), Nutch was not able to uncompress it. If I set http.content.limit to its default value of 65536, Nutch did not have
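For reference, decompressing a gzipped body is plain java.util.zip work; a minimal sketch with an optional size cap (illustrative only, not the actual Nutch fetcher code):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;

    public class GzipBody {
        // limit = -1 means "no limit", mirroring the http.content.limit convention.
        public static byte[] gunzip(byte[] compressed, int limit) throws IOException {
            GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                if (limit >= 0 && total + n > limit) {
                    out.write(buf, 0, limit - total);  // cap at the limit
                    total = limit;
                    break;
                }
                out.write(buf, 0, n);
                total += n;
            }
            in.close();
            return out.toByteArray();
        }
    }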

0.8 Intranet Crawl Output/Logging?

2006-09-13 Thread jared.dunne
I am using the Nutch 0.8 'crawl' command to crawl some content. When I run the crawl command, I don't see any output, but the crawl is running... Is there a way to see information about what the crawler is doing? I have tried setting 'fetcher.verbose' to 'true' in my nutch-site.xml, causing no

Re: 0.8 Intranet Crawl Output/Logging?

2006-09-13 Thread Ben Ogle
Look in the hadoop.log file under the nutch-0.8/logs dir. It should have that info. Ben jared.dunne wrote: I am using the nutch 0.8 'crawl' command to crawl some content. When I run the crawl command, I don't see any output, but the crawl is running... Is there a way to see information

Re: caching - filetypes

2006-09-13 Thread Ernesto De Santis
Hi Steven, I don't know if I completely understand your email. What do you mean by cache? If you want to crawl PDFs, you need to remove the URL filter for that: in your crawl-urlfilter.txt there is a line starting with a minus followed by a list of file extensions; delete the pdf extension from it. Good
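For illustration only (the real default line lists many more extensions), the relevant crawl-urlfilter.txt entry looks something like:

    -\.(gif|GIF|jpg|JPG|png|PNG|zip|ZIP|pdf|PDF)$

Dropping pdf|PDF from that pattern lets PDF URLs through the filter; the parse-pdf plugin also needs to be enabled via plugin.includes for the fetched PDFs to be parsed.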

Re: 0.8 Intranet Crawl Output/Logging?

2006-09-13 Thread wmelo
I have the same original doubt. I know that the log shows the information, but how can I see things happening in real time, like in Nutch 0.7.2, when you use the crawl command in the terminal? - Original Message - From: Ben Ogle [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent:
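One common way to get 0.7-style console output back (an assumption based on stock log4j configuration, not something stated in the thread) is to add a console appender in conf/log4j.properties, for example:

    log4j.rootLogger=INFO,stdout
    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n

Otherwise, tailing logs/hadoop.log while the crawl runs achieves much the same thing.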

Re: How can I modify the crawler?

2006-09-13 Thread Ernesto De Santis
Hi suxiaoke, I do something similar; I call it a category. You need to do it in two steps: - index your topic. - search, filtering by topic. In my approach, I built a plugin to index the category. In that plugin, the category is resolved by rules applied to the URL. In your case, you know how
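A rough sketch of that rule-based category assignment against plain Lucene classes (the field name and URL rules are made up for illustration; a real Nutch plugin would do this inside an indexing-filter extension):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class CategoryByUrl {
        public static void addCategory(Document doc, String url) {
            String category = "other";                    // fallback when no rule matches
            if (url.indexOf("/news/") >= 0) {
                category = "news";
            } else if (url.indexOf("/products/") >= 0) {
                category = "products";
            }
            // Stored and untokenized so a search can later filter on the exact value.
            doc.add(new Field("category", category, Field.Store.YES, Field.Index.UN_TOKENIZED));
        }
    }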

Re: 0.8 Intranet Crawl Output/Logging?

2006-09-13 Thread Tomi NA
On 9/13/06, wmelo [EMAIL PROTECTED] wrote: I have the same original doubt. I know that the log shows the information, but how can I see things happening in real time, like in Nutch 0.7.2, when you use the crawl command in the terminal? Try something like this (assuming you know what's good for

Re: 0.8 Intranet Crawl Output/Logging?

2006-09-13 Thread Jim Wilson
If you don't know what's good for you, baretail can provide a suitable Windows alternative. http://www.baremetalsoft.com/baretail/ -- Jim On 9/13/06, Tomi NA [EMAIL PROTECTED] wrote: On 9/13/06, wmelo [EMAIL PROTECTED] wrote: I have the same original doubt. I know that the log shows

RE: Charset question

2006-09-13 Thread Ken Krugler
Thanks for your reply. I have found that the method you mentioned looks into the HTTP header from the web server; it looks for the charset and does the mapping. The Apache web server which hosts the document is already configured with: AddDefaultCharset Big5-HKSCS The crawl engine does treat the
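A minimal sketch of the header-driven charset handling being discussed: pull the charset parameter out of the Content-Type value and decode the raw bytes with it, falling back to a default otherwise (illustrative only, not the actual Nutch code path):

    public class CharsetFromHeader {
        public static String decode(byte[] raw, String contentType, String fallback)
                throws java.io.UnsupportedEncodingException {
            String charset = fallback;
            if (contentType != null) {
                int idx = contentType.toLowerCase().indexOf("charset=");
                if (idx >= 0) {
                    // e.g. "text/html; charset=Big5-HKSCS" -> "Big5-HKSCS"
                    charset = contentType.substring(idx + "charset=".length()).trim();
                    int semi = charset.indexOf(';');
                    if (semi >= 0) {
                        charset = charset.substring(0, semi).trim();
                    }
                }
            }
            return new String(raw, charset);
        }
    }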