Regarding Lucene Nutch

2007-09-07 Thread Kunal Wku
Hello Everyone, I am using Lucene Nutch in my project to search the content of webpages. For a webpage or any other document, Lucene takes all the words in the page, indexes them, and returns the results when searched. Let's say I have 2 webpages as shown below: Webpage1
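
The indexing behaviour described in the message above (every word in a page is indexed and looked up at search time) can be illustrated with a minimal inverted-index sketch. This is not Lucene's or Nutch's actual code: the class name InvertedIndexSketch, the page ids, and the sample text are hypothetical, and real Lucene adds analyzers, fields, and scoring on top of this idea.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene or Nutch.
public class InvertedIndexSketch {
    // Maps each term to the ids of the documents that contain it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Split the text into lowercase words and record the document under each word.
    public void add(String docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of all documents containing the term (case-insensitive).
    public Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        // Sample page text is made up for this example.
        index.add("webpage1", "Lucene indexes every word in the page");
        index.add("webpage2", "Nutch crawls the page and hands it to Lucene");
        System.out.println(index.search("lucene")); // prints [webpage1, webpage2]
        System.out.println(index.search("crawls")); // prints [webpage2]
    }
}
```

Because every word is treated alike here, a match in a page's body and a match in its metadata are indistinguishable, which is the limitation that field-specific indexing is meant to address.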

Re: Regarding Lucene Nutch

2007-09-10 Thread Kunal Wku
Kunal Wku wrote: Hello Everyone, I am using Lucene Nutch in my project to search the content of webpages. For a webpage or any other document, Lucene takes all the words in the page, indexes them, and returns the results when searched. Let's say I have 2 webpages as shown below

Problem: Compiling Plugin Using Ant

2007-09-12 Thread Kunal Wku
Hello, I worked on a plugin using the reference webpage: http://wiki.apache.org/nutch/WritingPluginExample-0%2e9 After setting everything up, when I finally compile using Ant 1.6.0, it reports a successful build. But when I look in the build folder, the nutch-0.9 war file is not found,

Ranking Technology

2007-09-21 Thread Kunal Wku
Hello Everyone, Can anyone please tell me about the page ranking technology used by Lucene Nutch? I was not able to find any documentation on it. If you have any document on the ranking algorithms used, please e-mail me. Thanks. Regards, Kunal Gosar

Plugin for Metadata

2007-09-21 Thread Kunal Wku
Hello Everyone, I have one question. I have used a plugin for searching metadata, called recommended, following this webpage: http://wiki.apache.org/nutch/WritingPluginExample-0%2e9 When I search using Nutch, I do not find any difference between the normal search and the metadata search.

Searching multiple meta fields in a single query

2007-10-03 Thread Kunal Wku
Hello Everyone, I have 2 meta tags in the HTML file, for example subject:english and professor:john. I have added 2 plugins for the respective metadata: subject and professor. If I query 'subject:english' in Nutch, it returns the pages containing the metadata subject:english.

Crawl Problem

2007-10-29 Thread Kunal Wku
Hello, I have a webpage containing around 300 hyperlinks to other pages. When I run the crawl under Cygwin, it crawls only around 80 pages (hyperlinks). How can I crawl the whole webpage, i.e., cover all the hyperlinks? Thanks. Regards, Kunal

Out of Memory Error While Crawling

2007-11-05 Thread Kunal Wku
Hello Everyone, I encountered errors during the crawl process as follows: java.lang.OutOfMemoryError: Java heap space fetcher caught:java.lang.OutOfMemoryError: Java heap space java.lang.OutOfMemoryError: Java heap space fetcher caught:java.lang.OutOfMemoryError: Java heap space

Multi-Lingual Support in Nutch

2009-04-13 Thread Kunal Wku
Hello, I am using Nutch 0.9. I would like to enable multi-lingual support in our existing system. I read the article on Multi-Lingual Support in Nutch by Jérôme Charron, but it covers earlier versions of Nutch. I included the plugin in nutch-site.xml as analysis-es. What are the other