Dear Nutch Developers,

I have three questions about using Nutch.
(1) Following the Nutch tutorial, we executed the script below on Red Hat Linux 8. A large data file was created under the directory segments/20040303*/. I then copied this directory (segments) to jakarta-tomcat-4.1.30/ and started Tomcat as follows:

  jakarta-tomcat-4.1.30/bin/catalina.sh start

However, when I accessed localhost:8080 and searched for a query such as "Iraq", there were no results; the page showed "Hits 1-0 (out of 0 total matching documents)". What is the problem?

(2) I would like to crawl only English-language newspapers in Japan, so I changed content.rdf.u8 as shown in the attachment below. When I checked segments/*, no large files were created. What is the problem?

(3) How can I crawl Japanese newspapers (written in Japanese) and search them?

Thank you for your attention.

Sincerely yours,
Yohei Seki
[EMAIL PROTECTED]

%-- (1) script
#!/bin/bash
rm -rf db segments
mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000
bin/nutch generate db segments
s1=`ls -d segments/2* | tail -1`
echo $s1
bin/nutch fetch $s1
bin/nutch updatedb db $s1
bin/nutch analyze db 5
bin/nutch generate db segments -topN 1000
s2=`ls -d segments/2* | tail -1`
echo $s2
bin/nutch fetch $s2
bin/nutch updatedb db $s2
bin/nutch analyze db 2
bin/nutch generate db segments -topN 1000
s3=`ls -d segments/2* | tail -1`
echo $s3
bin/nutch fetch $s3
bin/nutch updatedb db $s3
%-- (1) end

%-- (2) content.rdf.u8
<?xml version='1.0' encoding='UTF-8' ?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf">
<!-- Generated at 2004-02-23 20:20:18 GMT on dust -->
<Topic r:id="Top">
  <catid>1</catid>
</Topic>
<Topic r:id="Top/Regional/Asia/Japan/News_and_Media/Newspapers">
  <catid>25141</catid>
  <link r:resource="http://www.yomiuri.co.jp/index-e.htm"/>
  <link r:resource="http://home.kyodo.co.jp/"/>
  <link r:resource="http://www.japantimes.co.jp/"/>
  <link r:resource="http://www.hokkoku.co.jp/_e_index/e_index.htm"/>
  <link r:resource="http://www.nni.nikkei.co.jp/"/>
  <link r:resource="http://www.mainichi.co.jp/english/index.html"/>
  <link r:resource="http://www.eal.or.jp/CW/"/>
  <link1 r:resource="http://newslink.org/nonusajap.html"/>
  <link r:resource="http://www.sankei.co.jp/databox/e_seiron"/>
  <link r:resource="http://www.asahi.com/english/"/>
</Topic>
<ExternalPage about="http://www.yomiuri.co.jp/index-e.htm">
  <d:Title>Daily Yomiuri On-Line</d:Title>
  <d:Description>The online presence of the Daily Yomiuri - one of Japan's most respected newspapers (English/Kanji)</d:Description>
</ExternalPage>
<ExternalPage about="http://home.kyodo.co.jp/">
  <d:Title>Kyodo News Web</d:Title>
  <d:Description>The Kyodo newswire service (English/Kanji)</d:Description>
</ExternalPage>
<ExternalPage about="http://www.japantimes.co.jp/">
  <d:Title>Japan Times Online</d:Title>
  <d:Description>Online extension of The Japan Times.</d:Description>
</ExternalPage>
<ExternalPage about="http://www.hokkoku.co.jp/_e_index/e_index.htm">
  <d:Title>Hokkoku Shimbun</d:Title>
  <d:Description>News from Hokuriku Kanazawa City, Japan (English/Kanji)</d:Description>
</ExternalPage>
<ExternalPage about="http://www.nni.nikkei.co.jp/">
  <d:Title>Nikkei Net Interactive</d:Title>
  <d:Description>Financial news and information (English)</d:Description>
</ExternalPage>
<ExternalPage about="http://www.mainichi.co.jp/english/index.html">
  <d:Title>Mainichi Interactive</d:Title>
  <d:Description>The online version of the Mainichi Newspaper.</d:Description>
</ExternalPage>
<ExternalPage about="http://www.eal.or.jp/CW/">
  <d:Title>The Chubu Weekly Online</d:Title>
  <d:Description>Bi-weekly English news source for the Chubu region of Japan.
Includes PDF and video streaming news service.</d:Description>
</ExternalPage>
<ExternalPage about="http://newslink.org/nonusajap.html">
  <d:Title>AJR Newslink</d:Title>
  <d:Description>Index to the English versions of newspapers in Japan.</d:Description>
  <priority>1</priority>
</ExternalPage>
<ExternalPage about="http://www.sankei.co.jp/databox/e_seiron">
  <d:Title>Sankei Shimbun - Seiron</d:Title>
  <d:Description>Selected Op-Ed columns from the Sankei Shimbun.</d:Description>
</ExternalPage>
<ExternalPage about="http://www.asahi.com/english/">
  <d:Title>Asahi Shimbun</d:Title>
  <d:Description>A well respected source with online news summaries. (Kanji/English)</d:Description>
</ExternalPage>
</RDF>
%--

--
Yohei Seki <[EMAIL PROTECTED]>

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
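Script (1) performs the same generate/fetch/updatedb round three times. The following is a minimal bash sketch, offered only as an illustration. It folds those rounds into a function while issuing the same bin/nutch commands with the same arguments in the same order, then prints each segment's size with du. That is one quick way to look at the symptom in question (2). The NUTCH variable and the functions latest_segment, crawl_round, report_segments and run_crawl are hypothetical names introduced for this sketch. It adds no Nutch step that the original script lacks, so it is not an answer to questions (1) to (3).

```shell
# Sketch only (hypothetical names): script (1) restructured so that each
# crawl round is one function call. The bin/nutch commands and arguments
# are exactly those of the original script. NUTCH is a hypothetical
# override (default: bin/nutch), e.g. to dry-run with a stub command.
NUTCH=${NUTCH:-bin/nutch}

# Newest segment directory, using the same idiom as the original script.
latest_segment() {
  ls -d segments/2* | tail -1
}

# One round: generate a segment (any extra generate options such as
# -topN 1000 are passed through), then fetch it and update the db from it.
crawl_round() {
  local seg
  "$NUTCH" generate db segments "$@"
  seg=$(latest_segment)
  echo "$seg"
  "$NUTCH" fetch "$seg"
  "$NUTCH" updatedb db "$seg"
}

# Print the size of every segment in kilobytes; a segment that stays close
# to zero after its fetch suggests that little or nothing was downloaded.
report_segments() {
  du -sk segments/2*
}

# The whole of script (1): create and seed the db, then three rounds with
# an analyze step between them, then the size report.
run_crawl() {
  rm -rf db segments
  mkdir db segments
  "$NUTCH" admin db -create
  "$NUTCH" inject db -dmozfile content.rdf.u8 -subset 3000
  crawl_round
  "$NUTCH" analyze db 5
  crawl_round -topN 1000
  "$NUTCH" analyze db 2
  crawl_round -topN 1000
  report_segments
}
```

The file only defines functions. Source it from the Nutch installation directory and call run_crawl. Setting NUTCH to a stub command or shell function first allows a dry run that fetches nothing.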
