hi all

2006-04-02 Thread kauu
Hi all: I have a big problem when crawling FTP. It seems that Nutch can't parse or index files with Chinese names. The command I run looks like: bin/nutch crawl urls.txt -dir test.dir (I've modified crawl-urlfilter.txt) # skip file:, ftp:, mailto: urls
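
The filter file mentioned above decides which URLs the crawl will follow. As a rough illustration of how such a rule list behaves (first matching rule wins, `+` accepts, `-` rejects, no match means the URL is ignored), here is a minimal Python sketch. It is an emulation written for this note, not Nutch code, and the host `ftp.example.com` and both rule lists are hypothetical.

```python
import re

def accepts(rules, url):
    """Return True if the first rule whose pattern matches `url` starts with '+'."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # A rule matches if its regex is found anywhere in the URL.
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is ignored

# Hypothetical rule lists: the first still skips ftp:, the second does not.
default_rules = [r"-^(file|ftp|mailto):", r"+^ftp://ftp\.example\.com/"]
edited_rules = [r"-^(file|mailto):", r"+^ftp://ftp\.example\.com/"]

url = "ftp://ftp.example.com/docs/readme.txt"  # hypothetical URL
print(accepts(default_rules, url))
print(accepts(edited_rules, url))
```

With the skip rule still listing `ftp`, the FTP URL is rejected before the accept rule is ever reached, so the order of the rules matters as much as their content.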

RE: Multiple crawls how to get them to work together

2006-04-02 Thread Dan Morrill
Berlin, Sorry about the delay - I have dumped my entire experience on my blog http://infosecandpolitics.blogspot.com including shell scripts, merging, whole web crawls and the rest of the lot. The shell script was posted on Thursday on the blog, and this morning was a wrap up of getting the

Unsubscribe

2006-04-02 Thread Shahinul Islam
On 3/5/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Re: hi all

2006-04-02 Thread kauu
Thanks for the advice! Now I know what's up. But my OS is WinXP (Chinese), which supports Chinese very well. I used Luke to look at the index, and there are garbled characters when I crawl Chinese web sites. So how can I deal with it? Any reply will be appreciated. On 4/2/06, Dan Morrill [EMAIL
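
Garbled characters like the ones described here are a common symptom of bytes written in one charset being decoded with another. The following is a minimal Python sketch of that effect only. It assumes, without confirmation from the thread, that the pages are GBK-encoded and are being read as Latin-1; the sample text is also an assumption.

```python
# Illustrative only: the sample text and both charsets are assumptions.
text = "\u4e2d\u6587"  # the two characters of the word "Chinese" (zhongwen)

# Bytes as a GBK-encoded page or file name would store them.
gbk_bytes = text.encode("gbk")

# Decoding those bytes as Latin-1 never raises, but yields unreadable text.
garbled = gbk_bytes.decode("latin-1")

# Decoding with the charset the bytes were written in restores the text.
restored = gbk_bytes.decode("gbk")

# ascii() keeps the output printable on consoles without CJK support.
print(ascii(garbled))
print(restored == text)
```

If the index shows this kind of output, the first thing to check is which charset each stage (fetch, parse, index, viewer) assumes.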

RE: hi all

2006-04-02 Thread Dan Morrill
Kauu, Are you using the simplified Chinese character localization package for Windows XP, or are you using the non-simplified UTF version? You might need an IME from here http://www.microsoft.com/windows/ie/downloads/recommended/ime/default.mspx That may help out. Since you are using Luke to

Re: Nutch 0.7.2 release | upgrading from 0.7.1?

2006-04-02 Thread Håvard W. Kongsgård
What about upgrading from 0.7.1? Can I use my existing db and segments? Piotr Kosiorowski wrote: Hello all, The 0.7.2 release of Nutch is now available. This is a bug fix release for 0.7 branch. See CHANGES.txt

Re: Nutch 0.7.2 release

2006-04-02 Thread Piotr Kosiorowski
Yes. The correct link is http://svn.apache.org/viewcvs.cgi/lucene/nutch/branches/branch-0.7/CHANGES.txt?rev=390158 It was used on the Web site but I made a mistake while pasting it into the email (I used the one for the 0.7.1 release). Thanks for spotting it. Regards Piotr On 4/1/06, TDLN [EMAIL PROTECTED]

Re: Nutch 0.7.2 release | upgrading from 0.7.1?

2006-04-02 Thread Piotr Kosiorowski
The 0.7.2 release should work without problems with 0.7.1 data. Regards Piotr On 4/2/06, Håvard W. Kongsgård [EMAIL PROTECTED] wrote: What about upgrading from 0.7.1? Can I use my existing db and segments? Piotr Kosiorowski wrote: Hello all, The 0.7.2 release of Nutch is now

Problems Installing

2006-04-02 Thread Paul Stewart
Hi there... I am trying to get Nutch running. I have done a trial indexing run successfully etc... Now I'm running into issues that may be more Tomcat related than Nutch: HTTP Status 500 - type Exception

RE: Problems Installing

2006-04-02 Thread Dan Morrill
Did you: 1. remove the root.war from Tomcat? 2. rename nutch.war to root.war and dump that into webapps under Tomcat? 3. did it install OK (can you see the exploded pages under webapps root)? Just checking, this is how I fixed the same issue under Windows. r/d -Original Message- From:

Merging Nutch crawls under 0.8-dev

2006-04-02 Thread Carl Dorestos
Hi all, I'd appreciate your help with this question. I am using Nutch/Hadoop 0.8 (of 3/31/06). I am using DFS. I want to merge multiple crawls and search the combined content. For example, I'd like to be able to: - Crawl 1 million urls into a directory crawlA (with directories segments, crawldb,

RE: Problems Installing

2006-04-02 Thread Paul Stewart
Thanks for the reply... I re-did what you mentioned below. It re-installed just fine (I'm running Fedora Core 4 and installed with yum using RPMs). Even when I rename it, I must now access it via http://www.myserver..:8080/root or else I get a 404 not found... When I try and do a search I

Re: RE: Problems Installing

2006-04-02 Thread José Ramón Pérez Agüera
I think that you must start Tomcat from the directory which contains the directories db/ and segments/; maybe this is the problem. jose José Ramón Pérez Agüera Despacho 411 tlf. 913947599 Dept. de Sistemas Informáticos y Programación Facultad de Informática Universidad Complutense de Madrid
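
The suggestion above can be checked before starting Tomcat. This is a small Python sketch, not part of Nutch or Tomcat; the function name `missing_crawl_dirs` is made up for this note, and the directory names db and segments are taken from the message above.

```python
from pathlib import Path

def missing_crawl_dirs(start_dir, required=("db", "segments")):
    """Return the names in `required` that are not subdirectories of `start_dir`."""
    base = Path(start_dir)
    return [name for name in required if not (base / name).is_dir()]

if __name__ == "__main__":
    # Check the directory Tomcat would be started from.
    missing = missing_crawl_dirs(Path.cwd())
    if missing:
        print("missing here:", ", ".join(missing))
    else:
        print("db/ and segments/ found in", Path.cwd())
```

Running it from the shell just before the Tomcat start script shows whether the current directory is the crawl directory.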

RE: Problems Installing

2006-04-02 Thread sudhendra seshachala
Rename the file to ROOT.war (all upper case). Then http://localhost:8080 should work. Paul Stewart [EMAIL PROTECTED] wrote: Thanks for the reply... I re-did what you mentioned below It re-installed just fine (I'm running Fedora Core 4 and installed with yum using rpm's) Even when I rename
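
The rename step can be sketched as a small Python helper. It is an illustration only: the function name `deploy_as_root` and the paths in the usage note are hypothetical, and it assumes Tomcat is stopped while the files are replaced. On a case-sensitive file system such as Linux, root.war and ROOT.war are different files, which is why the upper-case name matters.

```python
import shutil
from pathlib import Path

def deploy_as_root(war_file, webapps_dir):
    """Copy `war_file` into `webapps_dir` as ROOT.war, replacing any old ROOT app."""
    webapps = Path(webapps_dir)

    # Remove a previously exploded ROOT application so the new war is unpacked.
    exploded = webapps / "ROOT"
    if exploded.is_dir():
        shutil.rmtree(exploded)

    # The target name is upper case; an existing ROOT.war is overwritten.
    target = webapps / "ROOT.war"
    shutil.copyfile(war_file, target)
    return target
```

A call would look like `deploy_as_root("nutch.war", "/path/to/tomcat/webapps")`, where both paths are placeholders for the local installation.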

Re: hi all

2006-04-02 Thread Andrzej Bialecki
Dan Morrill wrote: Since you are using Luke to see the index, luke may not have the character support built in for non utf-8 character sets (meaning gork when you look at it). I went to the luke site http://www.getopt.org/luke/ to see if they make mention of the character sets they support, but

RE: hi all

2006-04-02 Thread Dan Morrill
Andrzej, Cheers! Good to know. Thanks! r/d -Original Message- From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] Sent: Sunday, April 02, 2006 5:01 PM To: nutch-user@lucene.apache.org Subject: Re: hi all Dan Morrill wrote: Since you are using Luke to see the index, luke may not have the

RE: Tomcat Problem

2006-04-02 Thread Babu, KameshNarayana (GE, Research, consultant)
Hey, Check the classpath and your JSP file. Regards Kamesh -Original Message- From: Paul Stewart [mailto:[EMAIL PROTECTED] Sent: Monday, April 03, 2006 4:25 AM To: nutch-user@lucene.apache.org Subject: Tomcat Problem Sorry if this is slightly off-topic but I'm just trying to get Nutch

RE: Tomcat Problem

2006-04-02 Thread Paul Stewart
Where would I check that? I can check the JSP file by copying the nutch--.war file back over to the webroot and watch it expand etc... But confused and new to tomcat stuff -Original Message- From: Babu, KameshNarayana (GE, Research, consultant) [mailto:[EMAIL PROTECTED] Sent: