Hello Everyone,
I am using Lucene Nutch in my project to search the content of web pages.
For a web page or any other document, Lucene takes all the words in the page,
indexes them, and returns the matching results when a search is run.
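The idea described above (take every word of a page, index it, then look pages up by word) can be illustrated with a toy Java sketch of an in-memory inverted index. It is a hypothetical example, not Lucene's API: the class name ToyInvertedIndex and its methods are invented here. Real Lucene adds analyzers, on-disk index segments and relevance scoring on top of this basic structure.

```java
import java.util.*;

/**
 * Toy inverted index (hypothetical illustration, NOT Lucene's API):
 * every word of a page is recorded, then pages are looked up by word.
 */
public class ToyInvertedIndex {
    // word -> names of the pages containing that word (sorted for stable output)
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    /** Splits the text into lowercase words and records that each word occurs in pageName. */
    public void addPage(String pageName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(pageName);
        }
    }

    /** Returns the pages that contain every query word (AND semantics). */
    public SortedSet<String> search(String query) {
        SortedSet<String> result = null;
        for (String token : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue;
            SortedSet<String> pages = postings.getOrDefault(token, Collections.emptySortedSet());
            if (result == null) result = new TreeSet<>(pages);
            else result.retainAll(pages); // keep only pages that also contain this word
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addPage("Webpage1", "Nutch crawls web pages");
        index.addPage("Webpage2", "Lucene indexes web pages");
        System.out.println(index.search("web pages")); // [Webpage1, Webpage2]
        System.out.println(index.search("lucene"));    // [Webpage2]
    }
}
```

The AND semantics (a page must contain every query word) is an arbitrary choice made for the sketch.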
Let's say I have two web pages, as shown below:
Webpage1
Hello,
I built a plugin following this reference page:
http://wiki.apache.org/nutch/WritingPluginExample-0%2e9
After setting everything up, I compiled with Ant 1.6.0 and it reported that the
build was successful. However, when I look in the build folder, the nutch-0.9
WAR file is not there.
Hello Everyone,
Can anyone tell me which page-ranking technique Lucene Nutch uses? I was not
able to find any documentation about it. If you have any documents on the
ranking algorithms used, please e-mail me.
Thanks & Regards,
Kunal Gosar
Hello Everyone,
I have one question. I have used a plugin for searching metadata, called
"recommended", following this web page:
http://wiki.apache.org/nutch/WritingPluginExample-0%2e9
When I search with Nutch, I do not see any difference between a normal search
and a metadata search.
Hello Everyone,
I have two meta tags in the HTML file, for example subject:english and
professor:john. I have added two plugins, one for each meta tag (subject and
professor).
If I query 'subject:english' in Nutch, it returns the pages containing the
metadata subject:english.
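The field-restricted query described above (subject:english matching only pages whose subject meta tag is english) can be modelled with a small hypothetical Java sketch. The class MetaFieldSearch and the page names are invented for illustration. This is not Nutch's indexing-filter or query-filter plugin API. It only shows the matching idea.

```java
import java.util.*;

/**
 * Hypothetical sketch (NOT Nutch's plugin API): each page carries the values of
 * its meta tags, and a "field:value" query matches pages whose field has that value.
 */
public class MetaFieldSearch {
    // page name -> (meta tag name -> meta tag value), both lowercased
    private final Map<String, Map<String, String>> pages = new LinkedHashMap<>();

    public void addPage(String pageName, Map<String, String> metaTags) {
        Map<String, String> normalized = new HashMap<>();
        metaTags.forEach((k, v) -> normalized.put(k.toLowerCase(Locale.ROOT), v.toLowerCase(Locale.ROOT)));
        pages.put(pageName, normalized);
    }

    /** Answers queries of the form "field:value"; returns an empty list for anything else. */
    public List<String> search(String query) {
        int colon = query.indexOf(':');
        if (colon <= 0 || colon == query.length() - 1) return Collections.emptyList();
        String field = query.substring(0, colon).trim().toLowerCase(Locale.ROOT);
        String value = query.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : pages.entrySet()) {
            // a page matches only if it has this meta tag with exactly this value
            if (value.equals(e.getValue().get(field))) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        MetaFieldSearch s = new MetaFieldSearch();
        s.addPage("page1.html", Map.of("subject", "english", "professor", "john"));
        s.addPage("page2.html", Map.of("subject", "math", "professor", "john"));
        System.out.println(s.search("subject:english")); // [page1.html]
        System.out.println(s.search("professor:john"));  // [page1.html, page2.html]
    }
}
```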
Hello,
I have a web page containing around 300 hyperlinks to other pages. When I run
the crawl under Cygwin, it crawls only around 80 of those pages. How can I
crawl the whole page, i.e., follow all of its hyperlinks?
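To confirm how many distinct links the page really contains, for comparison with the roughly 80 pages that were crawled, the href targets can be counted directly. The Java sketch below is a rough, hypothetical diagnostic. LinkCounter is an invented name, and the regular expression is only an approximation. It does not reflect how Nutch parses HTML, and it misses links generated by JavaScript.

```java
import java.util.*;
import java.util.regex.*;

/**
 * Rough diagnostic sketch (hypothetical): counts distinct href targets in an HTML
 * string using a simple regex. This is an approximation, not Nutch's HTML parser.
 */
public class LinkCounter {
    // Matches href="..." or href='...' case-insensitively; group 2 is the link target.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct link targets in document order. */
    public static Set<String> distinctLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2).trim());
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.html\">A</a> <A HREF='b.html'>B</A> <a href=\"a.html\">again</a>";
        System.out.println(distinctLinks(html).size()); // 2
    }
}
```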
Thanks & Regards,
Kunal
Hello Everyone,
I encountered the following errors during the crawl:
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
Hello,
I am using Nutch 0.9 and would like to enable multilingual support in our
existing system. I read the article "Multi-Lingual Support in Nutch" by Jérôme
Charron, but it covers earlier versions of Nutch. I included the analysis-es
plugin in nutch-site.xml. What are the other