Yes, this is currently a bug in trunk which errors out when the content
for a given url is null. This bug is in the process of being fixed.
Dennis
Alexis Votta wrote:
I have updated my copy of Nutch from subversion to revision 597822.
With minimal settings like nutch-site.xml,
Thanks, now it works, just some feedback for everybody:
- Including the Nutch conf directory in the classpath solved the NPE
- I really need to set the path to the index dir in the NutchBean constructor,
otherwise I get 0 hits (despite having a searcher.dir property with the path in
My guess, seeing your error below, is that you didn't move over the
common-terms.utf8 or other needed files from the nutch conf directory
into the classpath of your web application.
Dennis Kubes
Wolfgang Woerndl wrote:
Hello,
I installed Nutch 0.8.1., crawled some Web pages and get
Hey,
I would like to mention 2 points :
- The nutch config files should be in the classpath.
- The 2nd arg in the NutchBean ctor is the path to the index dir
I guess this should solve the NPE
Wolfgang Woerndl wrote:
Hello,
I installed Nutch 0.8.1., crawled some Web pages and get (meaningful)
results
Can you show the source html file that produces this exception?
There's an issue with pages that don't mention the content type in the
header (usually in redirects), so nutch throws an exception.
If that is the case, there's a code line in Content.java that needs to be
modified,
On 9/18/07,
Carl Cerecke wrote:
The problem is that the contentType for the page (that it was redirected
to) is null.
Changing Content.java:165 to:
Text.writeString(out, contentType != null ? contentType : ""); // write contentType
fixes the problem. But is empty string better for an unknown content
The problem is that the contentType for the page (that it was redirected
to) is null.
Changing Content.java:165 to:
Text.writeString(out, contentType != null ? contentType : ""); // write contentType
fixes the problem. But is empty string better for an unknown content
type or something like
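
A minimal sketch of the null guard discussed above. It is not the Nutch code itself: to stay self-contained it assumes java.io's DataOutputStream.writeUTF as a stand-in for Text.writeString, and the class and method names (NullSafeContentType, writeContentType, roundTrip) are hypothetical. writeUTF, like the original write, fails with a NullPointerException when handed null, so the same guard applies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullSafeContentType {

    // Hypothetical stand-in for the write at Content.java:165.
    // Assumption: writeUTF replaces Text.writeString so the example runs
    // without Nutch or Hadoop on the classpath.
    static void writeContentType(DataOutputStream out, String contentType) throws IOException {
        // Substitute an empty string when the content type is unknown (null).
        out.writeUTF(contentType != null ? contentType : "");
    }

    // Serializes the content type and reads it back, to show what a reader sees.
    static String roundTrip(String contentType) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                writeContentType(out, contentType);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + roundTrip(null) + "]");        // null is written as ""
        System.out.println("[" + roundTrip("text/html") + "]"); // non-null passes through
    }
}
```

With this guard a reader gets back an empty string rather than null, so any code that later inspects the content type has to treat "" as "unknown". That is the open question raised in the thread, and the sketch does not settle it.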
I'll try those if I get a chance. (BTW Remuneration is misspelled on
absoluteit.co.nz if you care)
--Kai M.
-----Original Message-----
From: Carl Cerecke [EMAIL PROTECTED]
To: nutch-user@lucene.apache.org
Sent: Thursday, July 26, 2007 4:21:07 PM
Subject: Re: NullPointerException fetching some
Is anybody else getting NullPointerExceptions fetching either of these
two sites (0.90 and latest from trunk) ?
http://www.absoluteit.co.nz
http://defence.allmedia.co.nz
I am, but would be grateful if someone else could test whether they work
or not so I can eliminate nutch configuration
On 7/27/07, Carl Cerecke [EMAIL PROTECTED] wrote:
Carl Cerecke wrote:
The problem is that the contentType for the page (that it was redirected
to) is null.
Changing Content.java:165 to:
Text.writeString(out, contentType != null ? contentType : ""); // write contentType
fixes the
Hi,
On 7/25/07, Carl Cerecke [EMAIL PROTECTED] wrote:
Hi,
Using nutch 0.9, although I get the same with a more recent nightly build.
I'm getting NPE fetching these two pages:
http://www.absoluteit.co.nz
and
http://defence.allmedia.co.nz
I've tracked it down by putting a t.printStackTrace()
Hi Doğacan,
Yes, I get the NullPointerException with the latest trunk, too.
Cheers,
Carl.
Doğacan Güney wrote:
Hi,
On 7/25/07, Carl Cerecke [EMAIL PROTECTED] wrote:
Hi,
Using nutch 0.9, although I get the same with a more recent nightly
build.
I'm getting NPE fetching these two pages:
Hi, Included Content.java. Will retry with latest trunk shortly.
Content.java:137-149
137 protected final void writeCompressed(DataOutput out) throws IOException {
138   out.writeByte(VERSION);
139
140   Text.writeString(out, url); // write url
141   Text.writeString(out, base); // write
Thanks . I attached my nutch-site.xml file.
But for some reason, I now get:
$ bin/nutch fetch $s1
Fetcher: starting
Fetcher: segment: crawl/segments/20070409222306
Fetcher: java.io.IOException: Segment already fetched!
at
open nutch-site.xml and nutch-default.xml
and in the plugin.includes property set the value like
<value>index-basic|index-more|.</value>
with the other values only include these plugins as extra.
Ratnesh,V2Solutions India
Meryl Silverburgh
Check whether you have included index-basic index-more plugin in your
nutch-site.xml file
the same problem was solved including this file.
hope this will solve the issue...
Ratnesh V2Solutions,India
Meryl Silverburgh wrote:
HI,
I am following the
Thanks. but how to include the index-basic, index-more plugin?
I can't find that in the documentation.
Thank you.
On 4/7/07, Ratnesh,V2Solutions India
[EMAIL PROTECTED] wrote:
Check whether you have included index-basic index-more plugin in your
nutch-site.xml file
the same problem
I hope someone can help me with this problem.
This works fine:
#bin/nutch crawl urls.txt
and it creates a directory named something like crawl-20060418105008,
with a working index.
However if I try to add any parameters beyond the root_url_file
parameter I get the output below. I'm really
I didn't see query-basic/query-more on your list of plugins included. This
is what
handles most queries usually. query-url will only handle parts of the
query that look like url:http://www.google.com, and query-site handles
site:www.google.com. Nothing seems to be handling just regular
text in
On 06/03/06, Howie Wang [EMAIL PROTECTED] wrote:
Is query-basic or query-more included in your nutch-default.xml?
It is indeed included in my nutch-site.xml :-
<property>
<name>plugin.includes</name>
Hi, Hasan,
Looking more carefully at the query-more plugin, it seems that it
only adds functionality for date queries and type queries. I think
you need to add query-basic to the list also to get it to search
the default content. Can you try adding query-basic and running:
bin/nutch search http
Hi,
http or www are very good test queries.
Double check that the nutch-default.xml which is inside the nutch.war
points to the correct folder <name>searcher.dir</name>.
Stefan
Am 06.03.2006 um 02:31 schrieb Hasan Diwan:
I've followed the nutch tutorial for crawling and started tomcat from
the
If none are being fetched, something is definitely wrong with
your filter or url file.
Yes, since it is a blog it may have dynamic pages like foo.com?entry=23;
this is definitely filtered by default.
-
blog: http://www.find23.org
company:
Gentlemen:
On 05/03/06, Richard Braman [EMAIL PROTECTED] wrote:
This sounds like your crawl didn't get anything. I have seen that
happen when the url wasn't added right, or the filter was bad. Pipe the
crawl to crawl.log and look in there. It should show some pages being
fetched. If none
It did fetch some urls:
-----Original Message-----
From: Jack Tang [mailto:[EMAIL PROTECTED]
Sent: Sunday, March 05, 2006 9:35 PM
To: nutch-user@lucene.apache.org
Subject: Re: NullPointerException
Hey Hasan
Crawling seems ok. Can you pls try org.apache.nutch.searcher.NutchBean
[your-query
Mr Tang:
Crawling seems ok. Can you pls try org.apache.nutch.searcher.NutchBean
[your-query-string] in shell/cmd?
server: 7:20pm % ./bin/nutch org.apache.nutch.searcher.NutchBean hasan
060305 192042 10 parsing file:/home/hdiwan/nutch-0.7.1/conf/nutch-default.xml
060305 192042 10 parsing
Mr Tang:
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
Weird! You are running nutch on local file system or distributed file system?
Local file system
And can you find the same query hasan via luke?
Nope
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]
On 3/6/06, Hasan Diwan [EMAIL PROTECTED] wrote:
Mr Tang:
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
Weird! You are running nutch on local file system or distributed file
system?
Local file system
And can you find the same query hasan via luke?
Nope
ok. As Stefan said, can you get
On 3/6/06, Hasan Diwan [EMAIL PROTECTED] wrote:
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
ok. As Stefan said, can you get any hit when you try to search http or
www?
No
Hey, can you zip the index and send it to me directly?
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]
--
Keep
Hasan
It seems your index is not complete.
If you get whole (correct) indices, the index dir should include
1. segments file
2. deletable file
3. other files
I am not sure what's wrong in nutch-0.7.1 indexing, but is it
possible for you to upgrade to nutch 0.8 (svn version)?
/Jack
On 3/6/06, Jack
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
I am not sure what's wrong in nutch-0.7.1 indexing, but is it
possible for you to upgrade to nutch 0.8 (svn version)?
It is possible, but I was under the assumption that 0.8 required NDFS?
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
You can still build it on local file system:)
Build, yes, but what of deployment? Can I use it in the same way? At
present, I don't have enough resources to run a distributed crawl.
--
Cheers,
Hasan Diwan [EMAIL PROTECTED]
On 3/6/06, Hasan Diwan [EMAIL PROTECTED] wrote:
On 05/03/06, Jack Tang [EMAIL PROTECTED] wrote:
You can still build it on local file system:)
Build, yes, but what of deployment? Can I use it in the same way?
Of course yes.
At
present, I don't have enough resources to run a distributed
Right then.. compiled the svn version of nutch. Tried running the
crawl with it and this is the log:
server: 11:32pm % ./bin/nutch crawl ../SpectraSearch/urls -dir
../SpectraSearch/crawl -depth 2 -threads 20
060305 233255 parsing