On Wed, 2006-12-20 at 21:52 -0800, sanjeev wrote:
Hello,
My crawl index is not being created correctly using the new settings.
https://issues.apache.org/jira/browse/SOLR-88
Although the log shows no errors, I am not able to open the index using Luke;
it says index corrupt, access denied, invalid
Hi There,
I'm just starting with Nutch and I've been trying to run the index. It seems
to work, but I am getting the following error:
fetch of http://localhost/index.html/ failed with
java.lang.NullPointerException
The page is available through my browser though. When I try to run a search
in
Fixes parsing of XHTML (e.g. title)
---
Key: NUTCH-418
URL: http://issues.apache.org/jira/browse/NUTCH-418
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8.2
Environment: Ubuntu Linux
[ http://issues.apache.org/jira/browse/NUTCH-418?page=all ]
Michael Wechner updated NUTCH-418:
--
Attachment: parse-xhtml-patch.txt
patch which fixes the mime-type
Michael Wechner wrote:
Sami Siren wrote:
Michael Wechner wrote:
Hi
It seems to me that Nutch 0.8.x cannot extract the title from an XHTML
page, e.g.
Try changing the following in your parse-plugins.xml
<mimeType name="application/xhtml+xml">
  <plugin id="parse-html" />
</mimeType>
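
A quick way to confirm that the mapping suggested above is actually in place is to read parse-plugins.xml and list the plugins registered for a MIME type. The following is only a minimal sketch: the helper name plugins_for and the inline SAMPLE document are made up for illustration, and the parse-plugins root element is assumed to match the layout of your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for conf/parse-plugins.xml.
# The real file contains many more entries; the layout here is an assumption.
SAMPLE = """<parse-plugins>
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-html" />
  </mimeType>
</parse-plugins>"""


def plugins_for(xml_text, mime_type):
    """Return the plugin ids mapped to mime_type, in document order."""
    root = ET.fromstring(xml_text)
    return [
        plugin.get("id")
        for entry in root.iter("mimeType")
        if entry.get("name") == mime_type
        for plugin in entry.findall("plugin")
    ]


if __name__ == "__main__":
    print(plugins_for(SAMPLE, "application/xhtml+xml"))
```

To check a real installation, pass the contents of your own parse-plugins.xml instead of SAMPLE. An empty list would suggest that no plugin is mapped to that MIME type in the file.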
Michael Wechner wrote:
I have added a patch
https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12359202
sorry, I actually meant
https://issues.apache.org/jira/browse/NUTCH-418
Cheers
Michi
Thanks
Michi
Cheers
Michi
--
Sami Siren
--
Michael Wechner
[ http://issues.apache.org/jira/browse/NUTCH-418?page=comments#action_12460282 ]
Sami Siren commented on NUTCH-418:
--
We should perhaps include the rest of the changes made in NUTCH-362.