Hi all,
I have a big problem when crawling an FTP site.
It seems that Nutch can't parse or index files with Chinese names.
The command I run looks like this:
bin/nutch crawl urls.txt -dir test.dir
(i've modified the crawl-urlfilter.txt)
# skip file:, ftp:, mailto: urls
Berlin,
Sorry about the delay - I have dumped my entire experience on my blog
http://infosecandpolitics.blogspot.com including shell scripts, merging,
whole web crawls and the rest of the lot. The shell script was posted on
Thursday on the blog, and this morning was a wrap up of getting the
On 3/5/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Thanks for the advice!
Now I know what's up.
But my OS is Windows XP (Chinese edition), which supports Chinese very well. I used
Luke to inspect the index, and there are garbled characters when I crawl Chinese
websites.
So, how can I deal with it?
any reply will be appreciated.
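One common way text ends up garbled like this is that the bytes of a page are decoded with a different charset than the one they were written in. The following is only a minimal sketch of that effect in Python; the sample string and the `decode_page` helper are made up for illustration and are not part of Nutch or Luke, and the thread does not confirm that this is the cause here.

```python
def decode_page(raw: bytes, charset: str) -> str:
    """Decode fetched page bytes using the given charset name."""
    return raw.decode(charset)


# Hypothetical sample text (not taken from the thread), stored as UTF-8 bytes.
sample = "中文网页"
raw = sample.encode("utf-8")

garbled = decode_page(raw, "latin-1")   # wrong charset: unreadable characters
restored = decode_page(raw, "utf-8")    # matching charset: original text

print(repr(garbled))
print(restored)
```

If the index shows text that looks like the first line of output, it may be worth checking which charset each page declares and which one was actually used to decode it.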
On 4/2/06, Dan Morrill [EMAIL
Kauu,
Are you using the Simplified Chinese character localization package for
Windows XP, or are you using the non-simplified UTF version? You might need an
IME from here
http://www.microsoft.com/windows/ie/downloads/recommended/ime/default.mspx
That may help out.
Since you are using Luke to
What about upgrading from 0.7.1? Can I use my existing db and segments?
Piotr Kosiorowski wrote:
Hello all,
The 0.7.2 release of Nutch is now available. This is a bug fix release
for 0.7 branch. See CHANGES.txt
Yes. The correct link is
http://svn.apache.org/viewcvs.cgi/lucene/nutch/branches/branch-0.7/CHANGES.txt?rev=390158
It was used on the Web site, but I made a mistake while pasting it into the email
(I used the one for the 0.7.1 release).
Thanks for spotting it.
Regards
Piotr
On 4/1/06, TDLN [EMAIL PROTECTED]
The 0.7.2 release should work without problems with 0.7.1 data.
Regards
Piotr
On 4/2/06, Håvard W. Kongsgård [EMAIL PROTECTED] wrote:
What about upgrading from 0.7.1? Can I use my existing db and segments?
Piotr Kosiorowski wrote:
Hello all,
The 0.7.2 release of Nutch is now
Hi there...
I am trying to get Nutch running. I have done a trial indexing run
successfully, etc.
Now I'm running into issues that may be more Tomcat related than Nutch:
HTTP Status 500 -
type Exception
Did you:
1. Remove the root.war from Tomcat?
2. Rename nutch.war to root.war and dump that into webapps under Tomcat?
3. Did it install OK (can you see the exploded pages under webapps root)?
Just checking, this is how I fixed the same issue under windows.
r/d
-Original Message-
From:
Hi all,
I'd appreciate your help with this question. I am using Nutch/Hadoop 0.8 (of
3/31/06). I am using DFS. I want to merge multiple crawls and search the
combined content.
For example, I'd like to be able to:
- Crawl 1 million urls into a directory crawlA (with directories segments,
crawldb,
Thanks for the reply...
I re-did what you mentioned below. It re-installed just fine (I'm
running Fedora Core 4 and installed with yum using RPMs).
Even when I rename it, I must access it now via
http://www.myserver..:8080/root
Or else I get a 404 not found...
When I try and do a search I
I think you must start Tomcat from the directory which contains the
db/ and segments/ directories; maybe this is the problem.
jose
José Ramón Pérez Agüera
Despacho 411 tlf. 913947599
Dept. de Sistemas Informáticos y Programación
Facultad de Informática
Universidad Complutense de Madrid
Rename the file to ROOT.war (all upper case).
Then http://localhost:8080 should work.
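Tomcat serves the web application named ROOT at the root context, and on a case-sensitive file system a file called root.war is deployed under /root instead. The rename step could be sketched like this in Python; `deploy_as_root` is a made-up helper, and the paths in the usage comment are assumptions that will differ per install.

```python
import shutil
from pathlib import Path


def deploy_as_root(war_file: str, webapps_dir: str) -> Path:
    """Copy war_file into webapps_dir as ROOT.war (upper case) and return the new path."""
    target = Path(webapps_dir) / "ROOT.war"
    shutil.copyfile(war_file, target)
    return target


# Example call with hypothetical paths (adjust to your own install):
# deploy_as_root("/path/to/nutch.war", "/path/to/tomcat/webapps")
```

An old root.war and its exploded root directory, if present in webapps, would probably need to be removed by hand, and Tomcat restarted, before the new context shows up.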
Paul Stewart [EMAIL PROTECTED] wrote: Thanks for the reply...
I re-did what you mentioned below. It re-installed just fine (I'm
running Fedora Core 4 and installed with yum using RPMs).
Even when I rename
Dan Morrill wrote:
Since you are using Luke to see the index, Luke may not have the character
support built in for non-UTF-8 character sets (meaning garbage when you look at
it). I went to the Luke site http://www.getopt.org/luke/ to see if they make
mention of the character sets they support, but
Andrzej,
Cheers! Good to know. Thanks!
r/d
-Original Message-
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Sunday, April 02, 2006 5:01 PM
To: nutch-user@lucene.apache.org
Subject: Re: hi all
Dan Morrill wrote:
Since you are using Luke to see the index, luke may not have the
Hey,
Check the classpath and your JSP file.
Regards
Kamesh
-Original Message-
From: Paul Stewart [mailto:[EMAIL PROTECTED]
Sent: Monday, April 03, 2006 4:25 AM
To: nutch-user@lucene.apache.org
Subject: Tomcat Problem
Sorry if this is slightly off-topic but I'm just trying to get Nutch
Where would I check that? I can check the JSP file by copying the
nutch--.war file back over to the webroot and watching it expand, etc. But
I'm confused and new to Tomcat stuff.
-Original Message-
From: Babu, KameshNarayana (GE, Research, consultant)
[mailto:[EMAIL PROTECTED]
Sent: