Dear Nutch Developers,

I have three questions about using Nutch.

(1) Following the Nutch tutorial, I executed the script shown below
under ``(1) script'' on Red Hat Linux 8. A large data file was created
under the directory segments/20040303*/

Then I copied this directory (segments) to jakarta-tomcat-4.1.30/ and
started Tomcat as follows:

jakarta-tomcat-4.1.30/bin/catalina.sh start

When I then opened localhost:8080 and searched for a query such as
``Iraq'', no results were returned: ``Hits 1-0 (out of 0 total matching
documents)''.

What is the problem?

(2) I would like to crawl only English-language newspapers in Japan.
I therefore replaced ``content.rdf.u8'' with the version included at the
end of this mail. When I checked segments/*, no large files had been
created.

What is the problem?

(3) How can I crawl Japanese newspapers (written in Japanese) and
search them?

Thank you for your attention.

Sincerely Yours,

Yohei Seki [EMAIL PROTECTED]

%-- (1) script
#!/bin/bash
rm -rf db segments
mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000
bin/nutch generate db segments
s1=`ls -d segments/2* | tail -1`
echo $s1
bin/nutch fetch $s1
bin/nutch updatedb db $s1
bin/nutch analyze db 5
bin/nutch generate db segments -topN 1000
s2=`ls -d segments/2* | tail -1`
echo $s2
bin/nutch fetch $s2
bin/nutch updatedb db $s2
bin/nutch analyze db 2
bin/nutch generate db segments -topN 1000
s3=`ls -d segments/2* | tail -1`
echo $s3
bin/nutch fetch $s3
bin/nutch updatedb db $s3
%-- (1) end
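
The step that picks the newest segment directory is repeated three times
in the script above. It could be factored into a small helper. This is
only a minimal sketch: latest_segment is a hypothetical name (it is not a
Nutch command), and it assumes that segment directory names are
timestamps beginning with "2", so that the sort order of ls matches the
order in which they were created.

```shell
#!/bin/bash
# Hypothetical helper, not part of Nutch: print the newest segment directory.
# Assumes segment names are timestamps starting with "2", so the
# lexicographic order produced by ls is also the chronological order.
latest_segment() {
  local dir="${1:-segments}"
  # Prints nothing if no matching directory exists.
  ls -d "$dir"/2* 2>/dev/null | tail -1
}
```

With this helper, each round would start with s1=$(latest_segment segments),
followed by the same fetch and updatedb calls as in the script above.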

%-- (2) content.rdf.u8
<?xml version='1.0' encoding='UTF-8' ?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf">

<!-- Generated at 2004-02-23 20:20:18 GMT on dust -->

<Topic r:id="Top">
  <catid>1</catid>
</Topic>

<Topic r:id="Top/Regional/Asia/Japan/News_and_Media/Newspapers">
  <catid>25141</catid>
  <link r:resource="http://www.yomiuri.co.jp/index-e.htm"/>
  <link r:resource="http://home.kyodo.co.jp/"/>
  <link r:resource="http://www.japantimes.co.jp/"/>
  <link r:resource="http://www.hokkoku.co.jp/_e_index/e_index.htm"/>
  <link r:resource="http://www.nni.nikkei.co.jp/"/>
  <link r:resource="http://www.mainichi.co.jp/english/index.html"/>
  <link r:resource="http://www.eal.or.jp/CW/"/>
  <link1 r:resource="http://newslink.org/nonusajap.html"/>
  <link r:resource="http://www.sankei.co.jp/databox/e_seiron"/>
  <link r:resource="http://www.asahi.com/english/"/>
</Topic>

<ExternalPage about="http://www.yomiuri.co.jp/index-e.htm">
  <d:Title>Daily Yomiuri On-Line</d:Title>
  <d:Description>The online presence of the Daily Yomiuri - one of Japan's most
respected newspapers (English/Kanji)</d:Description>
</ExternalPage>

<ExternalPage about="http://home.kyodo.co.jp/">
  <d:Title>Kyodo News Web</d:Title>
  <d:Description>The Kyodo newswire service (English/Kanji)</d:Description>
</ExternalPage>

<ExternalPage about="http://www.japantimes.co.jp/">
  <d:Title>Japan Times Online</d:Title>
  <d:Description>Online extension of The Japan Times.</d:Description>
</ExternalPage>

<ExternalPage about="http://www.hokkoku.co.jp/_e_index/e_index.htm">
  <d:Title>Hokkoku Shimbun</d:Title>
  <d:Description>News from Hokuriku Kanazawa City, Japan (English/Kanji)</d:Description>
</ExternalPage>

<ExternalPage about="http://www.nni.nikkei.co.jp/">
  <d:Title>Nikkei Net Interactive</d:Title>
  <d:Description>Financial news and information (English)</d:Description>
</ExternalPage>

<ExternalPage about="http://www.mainichi.co.jp/english/index.html">
  <d:Title>Mainichi Interactive</d:Title>
  <d:Description>The online version of the Mainichi Newspaper.</d:Description>
</ExternalPage>

<ExternalPage about="http://www.eal.or.jp/CW/">
  <d:Title>The Chubu Weekly Online</d:Title>
  <d:Description>Bi-weekly English news source for the Chubu region of Japan.
Includes PDF and video streaming news service.</d:Description>
</ExternalPage>

<ExternalPage about="http://newslink.org/nonusajap.html">
  <d:Title>AJR Newslink</d:Title>
  <d:Description>Index to the English versions of newspapers in Japan.</d:Description>
  <priority>1</priority>
</ExternalPage>

<ExternalPage about="http://www.sankei.co.jp/databox/e_seiron">
  <d:Title>Sankei Shimbun - Seiron</d:Title>
  <d:Description>Selected Op-Ed columns from the Sankei Shimbun.</d:Description>
</ExternalPage>

<ExternalPage about="http://www.asahi.com/english/">
  <d:Title>Asahi Shimbun</d:Title>
  <d:Description>A well respected source with online news summaries. (Kanji/English)</d:Description>
</ExternalPage>


</RDF>
%--
-- 
Yohei Seki <[EMAIL PROTECTED]>



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
