Re: implement Thai language indexing and search

2006-12-21 Thread Thorsten Scherler
On Wed, 2006-12-20 at 21:52 -0800, sanjeev wrote: Hello, My crawl index is not being created correctly using the new settings. https://issues.apache.org/jira/browse/SOLR-88 Although the log shows no errors - I am not able to open using Luke, it says index corrupt, access denied, invalid

crawl null pointer

2006-12-21 Thread hyrogen
Hi there, I'm just starting with Nutch and I've been trying to run the index. It seems to work, but I am getting the following error: fetch of http://localhost/index.html/ failed with java.lang.NullPointerException The page is available through my browser though. When I try to run a search in

[jira] Created: (NUTCH-418) Fixes parsing of XHTML (e.g. title)

2006-12-21 Thread Michael Wechner (JIRA)
Fixes parsing of XHTML (e.g. title)
---
Key: NUTCH-418
URL: http://issues.apache.org/jira/browse/NUTCH-418
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8.2
Environment: Ubuntu Linux

[jira] Updated: (NUTCH-418) Fixes parsing of XHTML (e.g. title)

2006-12-21 Thread Michael Wechner (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-418?page=all ]
Michael Wechner updated NUTCH-418:
--
Attachment: parse-xhtml-patch.txt
patch which fixes the mime-type
Fixes parsing of XHTML (e.g. title)
---

Re: Extracting title from XHTML pages

2006-12-21 Thread Michael Wechner
Michael Wechner wrote: Sami Siren wrote: Michael Wechner wrote: Hi It seems to me that Nutch 0.8.x cannot extract the title from an XHTML page, e.g. Try changing the following in your parse-plugins.xml: <mimeType name="application/xhtml+xml"> <plugin id="parse-html" />
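For readers following the suggestion above, here is a minimal sketch of how such a mapping could be checked. It assumes the mimeType/plugin layout quoted in the message; the surrounding parse-plugins root element, the text/html entry, and the helper name plugins_for are illustrative assumptions, not taken from the thread. It only inspects an XML string and does not touch a real Nutch installation.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt in the shape of a parse-plugins.xml file.
# Only the application/xhtml+xml entry comes from the thread; the rest is assumed.
PARSE_PLUGINS_XML = """
<parse-plugins>
  <mimeType name="text/html">
    <plugin id="parse-html" />
  </mimeType>
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-html" />
  </mimeType>
</parse-plugins>
"""


def plugins_for(xml_text, mime_type):
    """Return the plugin ids listed for mime_type, in document order."""
    root = ET.fromstring(xml_text)
    for mime in root.findall("mimeType"):
        if mime.get("name") == mime_type:
            return [plugin.get("id") for plugin in mime.findall("plugin")]
    # No entry for this MIME type in the excerpt.
    return []


print(plugins_for(PARSE_PLUGINS_XML, "application/xhtml+xml"))
```

If the XHTML entry were missing from the file, the helper would return an empty list for application/xhtml+xml, which would be consistent with the missing-title symptom described in the thread.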

Re: Extracting title from XHTML pages

2006-12-21 Thread Michael Wechner
Michael Wechner wrote: I have added a patch https://issues.apache.org/jira/secure/ManageAttachments.jspa?id=12359202 sorry, I actually meant https://issues.apache.org/jira/browse/NUTCH-418 Cheers Michi Thanks Michi Cheers Michi -- Sami Siren -- Michael Wechner

[jira] Commented: (NUTCH-418) Fixes parsing of XHTML (e.g. title)

2006-12-21 Thread Sami Siren (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-418?page=comments#action_12460282 ] Sami Siren commented on NUTCH-418: -- We should perhaps include the rest of the changes made in NUTCH-362. Fixes parsing of XHTML (e.g. title)