Multi-Lingual Support in Nutch

2009-04-13 Thread Kunal Wku
Hello,

I am using Nutch 0.9 and would like to enable multi-lingual support in our 
existing system. I read the article on multi-lingual support in Nutch by Jérôme 
Charron, but it covers earlier versions of Nutch. I included the plugin 
in nutch-site.xml as analysis-es. What other steps are needed to 
enable multi-lingual support?

Thanks & Regards,
Kunal



  

Out of Memory Error While Crawling

2007-11-05 Thread Kunal Wku
Hello Everyone,
   
I encountered the following errors during the crawl process:

java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
fetcher caught:java.lang.OutOfMemoryError: Java heap space
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

Please help me solve this.

Thanks & Regards,
Kunal Gosar


Crawl Problem

2007-10-29 Thread Kunal Wku
Hello,
   
I have a webpage containing around 300 hyperlinks to other pages. When I 
run the crawl under Cygwin, it crawls only around 80 of those pages. How 
can I crawl the whole webpage, i.e., cover all the hyperlinks?

Thanks & Regards,
Kunal


Searching multiple meta fields in a single query

2007-10-03 Thread Kunal Wku
Hello Everyone,
   
I have 2 meta tags in the HTML file, for example subject:english and 
professor:john.

I have added 2 plugins for the respective metadata: subject & professor.

If I query 'subject:english' in Nutch, it returns the pages containing the 
metadata subject:english.

If I query 'subject:english professor:john' in Nutch, it doesn't return any 
results. How can I query multiple meta tags in a single query?

Please help me solve this problem.

Thanks & Regards,
Kunal
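
[Editor's sketch] The behaviour asked for above is a conjunctive (AND) query across two metadata fields. The plain-Java sketch below illustrates only that matching logic; it does not use the Nutch or Lucene APIs, and the class name, the field:value mini-syntax parser, and the sample pages are all hypothetical. In Lucene itself the same idea is usually expressed by combining per-field term queries as required clauses of a BooleanQuery; whether the two Nutch plugins in the question build such a query is not something this thread establishes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiFieldQuerySketch {
    // Parse "field:value field:value" into a field -> value map (hypothetical mini-syntax).
    static Map<String, String> parse(String query) {
        Map<String, String> clauses = new HashMap<>();
        for (String token : query.trim().split("\\s+")) {
            int colon = token.indexOf(':');
            if (colon > 0 && colon < token.length() - 1) {
                clauses.put(token.substring(0, colon), token.substring(colon + 1));
            }
        }
        return clauses;
    }

    // A page matches only if EVERY clause matches its metadata (logical AND).
    static boolean matches(Map<String, String> meta, Map<String, String> clauses) {
        for (Map.Entry<String, String> c : clauses.entrySet()) {
            String value = meta.get(c.getKey());
            if (value == null || !value.equalsIgnoreCase(c.getValue())) {
                return false;
            }
        }
        return !clauses.isEmpty();
    }

    // Return the ids of all pages whose metadata satisfies the whole query.
    static List<String> search(Map<String, Map<String, String>> pages, String query) {
        Map<String, String> clauses = parse(query);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> p : pages.entrySet()) {
            if (matches(p.getValue(), clauses)) {
                hits.add(p.getKey());
            }
        }
        return hits;
    }

    // Hypothetical sample data: three pages with subject/professor meta tags.
    static Map<String, Map<String, String>> samplePages() {
        Map<String, Map<String, String>> pages = new LinkedHashMap<>();
        pages.put("page1", meta("english", "john"));
        pages.put("page2", meta("english", "mary"));
        pages.put("page3", meta("math", "john"));
        return pages;
    }

    static Map<String, String> meta(String subject, String professor) {
        Map<String, String> m = new HashMap<>();
        m.put("subject", subject);
        m.put("professor", professor);
        return m;
    }

    public static void main(String[] args) {
        // prints [page1, page2]
        System.out.println(search(samplePages(), "subject:english"));
        // prints [page1]
        System.out.println(search(samplePages(), "subject:english professor:john"));
    }
}
```

With this logic the two-field query narrows the single-field result instead of returning nothing, which is the outcome the question is after.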
   

   

Ranking Technology

2007-09-21 Thread Kunal Wku
Hello Everyone,
   
Can anyone please tell me about the page ranking technology used by 
Lucene & Nutch? I was not able to find any documentation on it. If you 
have any document describing the ranking algorithms used, please e-mail me.

Thanks & Regards,
Kunal Gosar

   

Plugin for Metadata

2007-09-21 Thread Kunal Wku
Hello Everyone,
   
I have one question. I have used a plugin for searching metadata, called 
recommended, following this webpage:
  http://wiki.apache.org/nutch/WritingPluginExample-0%2e9
When I search using Nutch, I do not see any difference between the normal 
search and the metadata search. A word matched in the metadata should rank 
higher than one matched in the normal content, but I am not getting that 
result. Please help me solve this problem.

Thanks & Regards,
Kunal Gosar

   

Problem: Compiling Plugin Using Ant

2007-09-12 Thread Kunal Wku
Hello,
   
I worked on a plugin using the reference webpage:
  http://wiki.apache.org/nutch/WritingPluginExample-0%2e9

After setting everything up, when I finally compile using Ant 1.6.0, it 
reports that the build was successful. But when I look in the build folder, 
the nutch-0.9 war file is not there; instead there is a nutch-0.9 file of 
type 'Task Object'. Please help me solve this problem.

Thanks & Regards,
Kunal

   

Re: Regarding Lucene Nutch

2007-09-10 Thread Kunal Wku
Hello Aditya,
   
Thank you for your reply. I just read your e-mail and I will try implementing 
your idea.
I think that with this idea, the search returns the files in which the 
required word appears in the content as well as in the metadata of the file. 
My requirement is that the search should return only the files in which the 
required word appears in the metadata, i.e., it should search only the 
metadata (the required word may appear in the content of the file too, but 
the content need not be searched). How can I achieve this?

Thanks & Regards,
Kunal

aditya naga hemanth kumar [EMAIL PROTECTED] wrote:
Hi,
You can search a file in the meta-data fields and the default fields that are
indexed by the search engine. Say you have a set of files that belong to an
operating systems course. You can add a meta-data field "subject" with the
value "operating systems" to all the files directly by using XMP.
Then, when you are indexing with Lucene, you can add a separate field called
"subject" for each document. When searching, you can boost the score if the
query matches the value of the subject field, which brings that document to
the top. Hope this helps.
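
[Editor's sketch] The idea described above (a separate "subject" field per document, with a higher weight when the query matches it) can be illustrated in plain Java as follows. This is only a sketch under stated assumptions: it does not call the Lucene or Nutch APIs, the scoring is a simple boost times occurrence count rather than Lucene's actual scoring formula, and the class name, boost values, and sample documents are hypothetical. Leaving the content field out of the boost map restricts the search to the metadata field only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FieldBoostSketch {
    // Hypothetical document: an id plus two named text fields.
    static final class Doc {
        final String id;
        final Map<String, String> fields = new LinkedHashMap<>();

        Doc(String id, String subject, String content) {
            this.id = id;
            fields.put("subject", subject);
            fields.put("content", content);
        }
    }

    // Count case-insensitive, non-overlapping occurrences of phrase in text.
    static int count(String text, String phrase) {
        if (text == null || phrase.isEmpty()) {
            return 0;
        }
        String t = text.toLowerCase(Locale.ROOT);
        String p = phrase.toLowerCase(Locale.ROOT);
        int n = 0;
        for (int i = t.indexOf(p); i >= 0; i = t.indexOf(p, i + p.length())) {
            n++;
        }
        return n;
    }

    // Score = sum over the searched fields of (boost * occurrences in that field).
    static double score(Doc d, String phrase, Map<String, Double> boosts) {
        double s = 0.0;
        for (Map.Entry<String, Double> b : boosts.entrySet()) {
            s += b.getValue() * count(d.fields.get(b.getKey()), phrase);
        }
        return s;
    }

    // Return ids of documents with a positive score, highest score first.
    static List<String> search(List<Doc> docs, String phrase, Map<String, Double> boosts) {
        List<Doc> hits = new ArrayList<>();
        for (Doc d : docs) {
            if (score(d, phrase, boosts) > 0) {
                hits.add(d);
            }
        }
        hits.sort((a, b) -> Double.compare(score(b, phrase, boosts), score(a, phrase, boosts)));
        List<String> ids = new ArrayList<>();
        for (Doc d : hits) {
            ids.add(d.id);
        }
        return ids;
    }

    // Only fields with a positive boost are searched at all.
    static Map<String, Double> boosts(double subject, double content) {
        Map<String, Double> m = new LinkedHashMap<>();
        if (subject > 0) m.put("subject", subject);
        if (content > 0) m.put("content", content);
        return m;
    }

    // Hypothetical sample data modelled on the two webpages in the quoted question.
    static List<Doc> sampleDocs() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("webpage2", "",
                "Core courses include Compilers and Operating Systems."));
        docs.add(new Doc("webpage1", "Operating System I",
                "This course covers the basics of the operating system."));
        return docs;
    }

    public static void main(String[] args) {
        // Subject matches weigh 5x content matches: prints [webpage1, webpage2]
        System.out.println(search(sampleDocs(), "operating system", boosts(5.0, 1.0)));
        // Search the subject field only: prints [webpage1]
        System.out.println(search(sampleDocs(), "operating system", boosts(1.0, 0.0)));
    }
}
```

The first call shows boosting (both pages match, the one with a subject match ranks first); the second shows the stricter variant where only the subject field is consulted.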

Cheers
Aditya V

On 9/7/07, Kunal Wku wrote:

 Hello Everyone,

 I am using Lucene & Nutch in my project for searching content in
 webpages.
 For a webpage or any other document, Lucene takes all the words in the
 page, indexes them, and returns the results when searched.

 Let's say I have 2 webpages as shown below:

 Webpage1
 --
 This is the course page of Computer Science Department
 Subject: Operating System I
 Professor: Qi Li
 Details:
 The course Operating System I deals with the basics of the operating
 system. The three main topics covered are process management, storage
 management & memory management.
 etc
 ..
 --

 Webpage2
 --
 This is the home page of Computer Science Department
 The computer science department offers courses at undergraduate level
 and graduate level. The core courses for the graduate students
 are Mathematical Foundations of Computer Science, Compilers, Advanced
 Database, Analysis of Algorithms and Operating Systems.
 etc
 ..
 --

 Now if I search using the words "operating system", the results show
 both webpages (webpage1 & webpage2), since the words "operating system"
 exist in both webpages.

 But my requirement is different. If I search for "Operating System" as
 a term that should appear in the subject field, i.e., as in webpage1,
 the result should show only webpage1. How can I achieve this result?

 Please help me in this regard.
 Thanks & Regards,
 Kunal Gosar





   

Regarding Lucene Nutch

2007-09-07 Thread Kunal Wku
Hello Everyone,
   
I am using Lucene & Nutch in my project for searching content in webpages.
For a webpage or any other document, Lucene takes all the words in the page, 
indexes them, and returns the results when searched.

Let's say I have 2 webpages as shown below:

Webpage1
--
This is the course page of Computer Science Department
Subject: Operating System I
Professor: Qi Li
Details:
The course Operating System I deals with the basics of the operating system. 
The three main topics covered are process management, storage management & 
memory management. etc
..
--

Webpage2
--
This is the home page of Computer Science Department
The computer science department offers courses at undergraduate level and 
graduate level. The core courses for the graduate students are Mathematical 
Foundations of Computer Science, Compilers, Advanced Database, Analysis of 
Algorithms and Operating Systems. etc
..
--

Now if I search using the words "operating system", the results show both 
webpages (webpage1 & webpage2), since the words "operating system" exist in 
both webpages.

But my requirement is different. If I search for "Operating System" as a 
term that should appear in the subject field, i.e., as in webpage1, the 
result should show only webpage1. How can I achieve this result?

Please help me in this regard.
Thanks & Regards,
Kunal Gosar


   