Hi,
My environment:
Lenya 1.2.3, Cocoon 2.1.7, Tomcat 5.0.28, j2sdk1.4.2, WinXP
I still have a problem with crawling or indexing.
I am trying to set up Lucene in my publication but cannot get it to work,
so any help is welcome.
To rule out faults in my own publication as the cause, I
tried to crawl the default publication.
I modified crawler-live.xconf in the default publication to fit the
Tomcat build:
<crawler>
<user-agent>lenya</user-agent>
<base-url href="http://127.0.0.1:8080/lenya/default/live/index.html"/>
<scope-url href="http://127.0.0.1:8080/lenya/default/live/"/>
<uri-list src="../../work/search/lucene/uris.txt"/>
<htdocs-dump-dir src="../../work/search/lucene/htdocs_dump"/>
<robots src="robots.txt" domain="127.0.0.1"/>
</crawler>
The robots.txt exists, and it allows the user agent lenya to access everything.
Result:
Buildfile: crawl_and_index.xml

init:
     [echo] INFO: Init

crawl:
     [echo] INFO: Crawl and dump hypertext documents (../pubs/default/config/search/crawler-live.xconf)
     [echo] INFO: Show configuration
     [java] Crawler Config: Base URL: http://127.0.0.1:8080/lenya/default/live/index.html
     [java] Crawler Config: Scope URL: http://127.0.0.1:8080/lenya/default/live/
     [java] Crawler Config: User Agent: lenya
     [java] Crawler Config: URI List: ../pubs/default/work/search/lucene/uris.txt (../../work/search/lucene/uris.txt)
     [java] Crawler Config: HTDocs Dump Dir: ../pubs/default/work/search/lucene/htdocs_dump (../../work/search/lucene/htdocs_dump)
     [java] Crawler Config: Robots File: ../pubs/default/config/search/robots.txt (robots.txt)
     [java] Crawler Config: Robots Domain: 127.0.0.1
     [echo] INFO: START crawling ...
     [java] java.lang.StringIndexOutOfBoundsException: String index out of range: -1
     [java]     at org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:180)
     [java]     at org.apache.tools.ant.taskdefs.Java.run(Java.java:710)
     [java]     at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:178)
     [java]     at org.apache.tools.ant.taskdefs.Java.execute(Java.java:84)
A lucene.log is written, but it's empty...
What I did before:
I installed Ant 1.6.5 (the Tomcat build of Lenya doesn't ship an ant.bat
like the Windows installer binaries do). I read crawl_and_index.xml, and
it contains pathelements pointing to jars in the WEB-INF/lib directories.
Uh-huh!
http://lenya.apache.org/1_2_x/installation/source_version.html:
You must then validate that no other instances of these libraries exist in
any of the following directories:
[...]
* Any other location in your Lenya deployment. Specifically, check
webapps/lenya/WEB-INF/lib/.
So where should these files be? In the tomcat-5.0.28\common\endorsed directory
or in WEB-INF/lib?
I put them back in WEB-INF/lib; many other errors disappeared, and the
error described above appeared instead.
I already had this problem with the binaries and started a thread about
it. solprovider wrote that there might be too many "../" in my path, or
something like that (I did not really understand it). So I tried absolute
and relative paths, with and without backslashes, but nothing changed.
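To show what I imagine could go wrong with the path separators, here is a minimal Java sketch. It is not taken from the Lenya crawler, whose source I have not read; the class PathIndexDemo and both of its methods are hypothetical names made up for this illustration. It only demonstrates that String.lastIndexOf returns -1 when a path contains no "/" (for example a Windows path written with backslashes), and that passing this -1 on to substring throws a StringIndexOutOfBoundsException.

```java
// Hypothetical illustration only; this is NOT the Lenya crawler's code.
class PathIndexDemo {

    // Naive version: assumes the path always contains a '/' separator.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');   // -1 when no '/' is present
        return path.substring(0, slash);     // throws when slash is -1
    }

    // Defensive version: normalizes backslashes and handles a missing separator.
    static String safeParentOf(String path) {
        String normalized = path.replace('\\', '/');
        int slash = normalized.lastIndexOf('/');
        return slash < 0 ? "" : normalized.substring(0, slash);
    }

    public static void main(String[] args) {
        String windowsPath = "work\\search\\lucene\\uris.txt";
        System.out.println("safe parent: " + safeParentOf(windowsPath));
        try {
            parentOf(windowsPath);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive version failed: " + e.getMessage());
        }
    }
}
```

If the crawler does something like parentOf internally, that could explain a Windows-specific failure, but this is only a guess on my side.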
Does anybody use a similar environment in which crawling works? Could it be a
Windows-specific problem?
Any ideas?
Please keep answers simple for a simple mind.
Thanks in advance
Franz