Hi there
I wonder if there is a way to return the results explanation as a
Java object instead of returning HTML.
nutchBean.getExplanation(query, show[i]) gives me <ul>... and so
on, which is not very handy.
Maybe it should return an object, and another method,
getExplanationAsHtml, would give the
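A minimal sketch of what such a structured explanation object could look like, with the HTML rendering derived from it. All of these names are hypothetical illustrations of the proposal, not existing Nutch API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a structured score explanation that callers can
// inspect programmatically, while HTML stays available as a derived view.
public class ScoreExplanation {
    private final float value;
    private final String description;
    private final List<ScoreExplanation> details = new ArrayList<>();

    public ScoreExplanation(float value, String description) {
        this.value = value;
        this.description = description;
    }

    public void addDetail(ScoreExplanation d) { details.add(d); }
    public float getValue() { return value; }
    public String getDescription() { return description; }
    public List<ScoreExplanation> getDetails() { return details; }

    // The existing <ul>-style HTML output could then be one rendering of
    // the object rather than the only form the caller can get.
    public String toHtml() {
        StringBuilder sb = new StringBuilder("<ul><li>")
            .append(value).append(" = ").append(description);
        for (ScoreExplanation d : details) sb.append(d.toHtml());
        return sb.append("</li></ul>").toString();
    }

    public static void main(String[] args) {
        ScoreExplanation root = new ScoreExplanation(1.5f, "weight(title:foo)");
        root.addDetail(new ScoreExplanation(0.5f, "tf(termFreq=1)"));
        System.out.println(root.toHtml());
    }
}
```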
Hi
In the case of relative paths, the plugins folder path is resolved
relative to the classpath, but
I would like to resolve it relative to nutch-local.xml.
Is that possible somehow? Or would it make sense to add an attribute
that controls how relative paths are
resolved?
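For what it's worth, the property involved is plugin.folders, whose relative values are searched for on the classpath. Until resolution relative to the config file is supported, an absolute value sidesteps the classpath lookup. A nutch-site.xml fragment (the path is an example):

```xml
<property>
  <name>plugin.folders</name>
  <!-- absolute path: used as-is instead of being searched on the classpath -->
  <value>/opt/nutch/plugins</value>
</property>
```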
David Podunavac wrote:
When I try to instantiate NutchBean I get this error message even
though I have the class in my lib directory.
I am not sure if I missed anything, but help is appreciated.
Has anyone encountered the same weird error and found a solution? I would be
very pleased to get this
Hello,
We're currently developing an application using the Lucene API for
building a search engine and as part of the application we have a
component for parsing several file formats. For this component we were
hoping to use several of the plug-ins in Nutch and we have written
classes in our own
Is the plugins folder in the root of the war?
Dennis
Trym B. Asserson wrote:
Hello,
We're currently developing an application using the Lucene API for
building a search engine and as part of the application we have a
component for parsing several file formats. For this component we were
I went to install Nutch with the new version of Java, 1.5.0.08.
I got an error when installing Java.
[EMAIL PROTECTED] ]# rpmbuild --rebuild Desktop/java-1.5.0-sun*src.rpm
Installing Desktop/java-1.5.0-sun-compat-1.5.0.08-1jpp.src.rpm
warning: user scop does not exist - using root
warning: group
Hello again,
Thanks for the reply. With some intensive use of Filemon I eventually
figured out where the classes were looking, and the plugins and conf files
had to be moved into the WEB-INF\classes directory.
We now have it fully functional, cheers.
But to answer your question, no, we'd tried
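The move described above, sketched as shell commands. All paths are stand-ins for a real exploded war and Nutch install:

```shell
# Stand-ins: a scratch "war" directory plus dummy plugins/ and conf/ trees.
WAR_ROOT=$(mktemp -d)
mkdir -p plugins-demo conf-demo
# The servlet classloader looked under WEB-INF/classes, so the plugin and
# configuration folders go there:
mkdir -p "$WAR_ROOT/WEB-INF/classes"
cp -r plugins-demo "$WAR_ROOT/WEB-INF/classes/plugins"
cp -r conf-demo    "$WAR_ROOT/WEB-INF/classes/conf"
ls "$WAR_ROOT/WEB-INF/classes"
```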
Hey, I am confused about crawling with Nutch.
As you know, there are some websites which cannot be accessed because
they use the POST method; that means even if you know the website's
URL, when you enter it into the address bar in IE or Mozilla, the
website's important content has
Hi,
I set http.content.limit to -1 so as not to truncate any data being
fetched; however, if the fetched data was compressed (HTTP response
header Content-Encoding: gzip), then Nutch was not able to uncompress
it. If I set http.content.limit to its default value of 65536,
Nutch did not have
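The setting described above, as a nutch-site.xml fragment:

```xml
<property>
  <name>http.content.limit</name>
  <!-- -1 disables truncation of fetched content -->
  <value>-1</value>
</property>
```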
I am using the nutch 0.8 'crawl' command to crawl some content. When I
run the crawl command, I don't see any output, but the crawl is
running... Is there a way to see information about what the crawler is
doing?
I have tried setting 'fetcher.verbose' to 'true' in my nutch-site.xml
causing no
Look in the hadoop.log file under the nutch-0.8/logs dir. It should have that
info.
Ben
jared.dunne wrote:
I am using the nutch 0.8 'crawl' command to crawl some content. When I
run the crawl command, I don't see any output, but the crawl is
running... Is there a way to see information
Hi Steven
I don't know if I understand your email completely.
What do you mean by cache?
If you want to crawl PDFs, you need to remove the URL filter for them.
In your crawl-urlfilter.txt there is a line starting with a minus sign
and a list of file extensions. Delete the pdf extension.
Good
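The line in question looks something like this (the exact extension list varies between releases); removing "pdf|" from the group lets PDF files through:

```
# crawl-urlfilter.txt: skip URLs ending in these suffixes
-\.(gif|GIF|jpg|JPG|ico|ICO|css|zip|ppt|mpg|xls|gz|exe|pdf|png)$
```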
I have the same original doubt. I know that the log shows this information,
but how do you see things happening in real time, like in Nutch 0.7.2, when
you use the crawl command in the terminal?
- Original Message -
From: Ben Ogle [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Sent:
Hi suxiaoke
I do something similar; I tag each document with a category.
You need to do it in two steps:
- index your topic.
- search, filtering by topic.
In my approach, I built a plugin to index the category. In that plugin,
the category is resolved by rules applied to the URL. In your case,
you know how
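A sketch of the URL-to-category rule idea in plain Java. This is not the actual Nutch plugin API; the class and rules below are illustrative, the way an indexing plugin might compute a category field before adding it to the document:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: resolve a category from a URL with simple
// substring rules, checked in insertion order; first match wins.
public class UrlCategorizer {
    private final Map<String, String> rules = new LinkedHashMap<>();

    public UrlCategorizer() {
        // Hypothetical rules: URL fragment -> category name.
        rules.put("/sports/", "sports");
        rules.put("/news/", "news");
    }

    public String categorize(String url) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (url.contains(rule.getKey())) {
                return rule.getValue();
            }
        }
        return "other"; // fallback category
    }

    public static void main(String[] args) {
        UrlCategorizer c = new UrlCategorizer();
        System.out.println(c.categorize("http://example.com/sports/today.html"));
    }
}
```

At search time the stored category field can then be used as a filter, which is the second step described above.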
On 9/13/06, wmelo [EMAIL PROTECTED] wrote:
I have the same original doubt. I know that the log shows this information,
but how do you see things happening in real time, like in Nutch 0.7.2, when
you use the crawl command in the terminal?
try something like this (assuming you know what's good for
If you don't know what's good for you, baretail can provide a suitable
Windows alternative.
http://www.baremetalsoft.com/baretail/
-- Jim
On 9/13/06, Tomi NA [EMAIL PROTECTED] wrote:
On 9/13/06, wmelo [EMAIL PROTECTED] wrote:
I have the same original doubt. I know that the log shows
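On Linux or OS X the usual way to follow the log live is tail -f. A runnable stand-in demo follows; in a real install you would point it at nutch-0.8/logs/hadoop.log:

```shell
# Real usage (path assumes the default 0.8 layout):
#   tail -f nutch-0.8/logs/hadoop.log
# Demo with a stand-in file so the commands run anywhere:
LOG=$(mktemp)
printf 'fetching http://example.com/\nfetching http://example.org/\n' >> "$LOG"
tail -n 5 "$LOG"
```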
Thanks for your reply.
I have found that the method you mentioned inspects the HTTP headers from
the web server. It looks for the charset and does the mapping. The Apache web
server which hosts the document is already configured with:
AddDefaultCharset Big5-HKSCS
The crawl engine does treat the
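For illustration, here is a minimal sketch (not Nutch's actual implementation) of extracting the charset from a Content-Type header value, with a fallback such as the server-wide default set by AddDefaultCharset:

```java
// Illustrative sketch: pull "charset=..." out of a Content-Type header
// value like "text/html; charset=Big5-HKSCS", else use a fallback.
public class CharsetFromHeader {
    public static String charset(String contentType, String fallback) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase().startsWith("charset=")) {
                    return p.substring("charset=".length()).trim();
                }
            }
        }
        return fallback; // e.g. the server's AddDefaultCharset value
    }

    public static void main(String[] args) {
        System.out.println(charset("text/html; charset=Big5-HKSCS", "utf-8"));
    }
}
```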