[ http://issues.apache.org/jira/browse/NUTCH-289?page=all ]
Enis Soztutar updated NUTCH-289:
Attachment: ipInCrawlDatumDraftV5.1.patch
The version 5 patch does not run on the current build, so I have fixed it and
resent the patch (did not change any
[ http://issues.apache.org/jira/browse/NUTCH-289?page=comments#action_12450315 ]
Uros Gruber commented on NUTCH-289:
---
One question. Why does IP need to be in CrawlDatum and not in metadata?
CrawlDatum should store IP address
OK. I was able to enable the language identifier plugin by adding the value
to the plugin.includes property in nutch-site.xml, but I'm not sure that doing
just that is enough to have Thai text recognized and tokenized properly.
What else do I have to do? Please help me.
Thanks and regards,
sanjeev.
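
For reference, enabling the plugin as described above could look roughly like
the following nutch-site.xml fragment. This is only a sketch: language-identifier
is the plugin named in the message, but every other entry in the value is an
assumed placeholder and should be replaced with whatever your installation
already lists.

```xml
<!-- nutch-site.xml fragment (sketch); goes inside the <configuration> element
     and overrides plugin.includes from nutch-default.xml -->
<property>
  <name>plugin.includes</name>
  <!-- Assumption: every entry except language-identifier is a placeholder
       for the plugins your installation already includes. -->
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|language-identifier</value>
</property>
```

The value is a regular expression matched against plugin directory names, so
the new plugin is simply appended as one more alternative.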
1. You must create a Thai NGP
Thanks Jerome,
I used an existing ThaiAnalyzer which was in the Lucene package.
OK - I renamed lucene.analysis.th.* to nutch.analysis.th.*, compiled, and
placed all the class files in a jar, analysis-th.jar (do I need to bundle the
ngp file in the jar as well?)
take a look at the log file for a
Hello committers,
Based on a recent discussion on the nutch-user list (Strategic Direction
of Nutch), I would like to prepare a 0.7.3 release. The idea is to allow
people who still use 0.7.2 to get rid of the most important bugs and to let
them add some small features they would need as the claim is