[
https://issues.apache.org/jira/browse/NUTCH-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinci updated NUTCH-624:
Description:
I found that the text parsed by the default parser, Neko, in the 1.0 nightly is
not easy to process - it just
Hi,
Thank you for your feedback.
By default, the readseg utility dumps the parsed text separated only by
spaces, which is not easy to process:
I need to process the text sentence by sentence. However, in most of the
pages I crawled, there is no full stop or comma at the end of
[
https://issues.apache.org/jira/browse/NUTCH-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinci updated NUTCH-625:
Description:
If the crawl db contains both UTF-8 non-ASCII characters and non-UTF-8 non-ASCII
characters (i.e.
Hi, I'm a newbie here.
When I read Better Search with Apache Lucene and Solr, I found an LSI
approach.
Is there any LSI implementation available?
I'm interested in the scalability problem and in parallel matrix operations.
Thanks.
--
B. Regards,
Edward J. Yoon