Hello,
I use the latest Subversion checkout and Jetty instead of Tomcat;
apart from that my environment is the same as yours, and crawling works for me.
This is how I call the crawler:
"E:\lenya_workspace\lenya\tools\bin\ant.bat" -f
/E:/lenya_workspace/lenya/build/lenya/webapp/lenya/bin/crawl_and_index.xml
-Dcrawler.xconf=/E:/lenya_workspace/lenya/build/lenya/webapp/lenya/pubs/default/config/search/crawler-live.xconf
crawl
The output looks like this:
init:
[echo] INFO: Init
crawl:
[echo] INFO: Crawl and dump hypertext documents
(/E:/lenya_workspace/lenya/build/lenya/webapp/lenya/pubs/default/config/search/crawler-live.xconf)
[echo] INFO: Show configuration
[java] log4j:WARN No appenders could be found for logger
(org.apache.lenya.xml.DOMUtil).
[java] log4j:WARN Please initialize the log4j system properly.
[java] Crawler Config: Base URL:
http://127.0.0.1:8888/default/live/index.html
[java] Crawler Config: Scope URL: http://127.0.0.1:8888/default/live/
[java] Crawler Config: User Agent: lenya
[java] Crawler Config: URI List:
E:\lenya_workspace\lenya\build\lenya\webapp\lenya\pubs\default\work\search\lucene\uris.txt
(../../work/search/lucene/uris.txt)
[java] Crawler Config: HTDocs Dump Dir:
E:\lenya_workspace\lenya\build\lenya\webapp\lenya\pubs\default\work\search\lucene\htdocs_dump
(../../work/search/lucene/htdocs_dump)
[java] Crawler Config: Robots File:
E:\lenya_workspace\lenya\build\lenya\webapp\lenya\pubs\default\config\search\robots.txt
(robots.txt)
[java] Crawler Config: Robots Domain: 127.0.0.1
[echo] INFO: START crawling ...
[java] log4j:WARN No appenders could be found for logger
(org.apache.lenya.xml.DOMUtil).
[java] log4j:WARN Please initialize the log4j system properly.
[echo] INFO: Crawling DONE
As you can see, I have been too lazy to install a log4j.properties so far.
I am using the Ant that ships with Lenya, and absolute Unix-style pathnames
for the parameters when calling the crawler.
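
For reference, the "/E:/..." form of the paths above can be derived
mechanically from ordinary Windows paths. The following is only a sketch in
Python, not part of Lenya: the helper names to_unix_style and crawl_command
are made up for illustration, and the paths are the ones from my command line.

```python
from pathlib import PureWindowsPath


def to_unix_style(windows_path: str) -> str:
    # PureWindowsPath.as_posix() swaps backslashes for forward slashes;
    # the leading "/" gives the /E:/... form used in the call above.
    return "/" + PureWindowsPath(windows_path).as_posix()


def crawl_command(ant_bat: str, build_file: str, crawler_xconf: str) -> list:
    # Hypothetical helper: assembles the same arguments as the manual call.
    return [
        ant_bat,
        "-f", to_unix_style(build_file),
        "-Dcrawler.xconf=" + to_unix_style(crawler_xconf),
        "crawl",
    ]


WEBAPP = r"E:\lenya_workspace\lenya\build\lenya\webapp\lenya"

cmd = crawl_command(
    r"E:\lenya_workspace\lenya\tools\bin\ant.bat",
    WEBAPP + r"\bin\crawl_and_index.xml",
    WEBAPP + r"\pubs\default\config\search\crawler-live.xconf",
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd) on a Windows
machine; I have not tried that myself.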
Maybe it works for you too with the latest Subversion checkout :-/
Michael
Franz Ruebe wrote:
Hi,
My ENV:
lenya 1.2.3, cocoon 2.1.7, tomcat 5.0.28, j2sdk1.4.2, WinXP
I still have a problem with crawling or indexing.
I am trying to set up Lucene in my publication, but I cannot get it to
work, so help is welcome.
To rule out faults in my own publication as the cause, I tried to crawl
the default publication.
I modified crawler-live.xconf in the default publication to fit the
Tomcat build:
<crawler>
<user-agent>lenya</user-agent>
<base-url href="http://127.0.0.1:8080/lenya/default/live/index.html"/>
<scope-url href="http://127.0.0.1:8080/lenya/default/live/"/>
<uri-list src="../../work/search/lucene/uris.txt"/>
<htdocs-dump-dir src="../../work/search/lucene/htdocs_dump"/>
<robots src="robots.txt" domain="127.0.0.1"/>
</crawler>
The robots.txt is there, and it allows the user agent lenya everything.
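
A robots.txt of this kind can be sanity-checked outside Lenya. The sketch
below uses Python's standard urllib.robotparser, which is not the parser the
Lenya crawler uses, so it only shows how the rules would normally be read.
The robots.txt content is an assumed example of "allow lenya everything";
the actual file is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed example content: an empty Disallow line permits everything
# for the named user agent.
ROBOTS_TXT = """\
User-agent: lenya
Disallow:
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    # Parse the rules from a string instead of fetching them over HTTP.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BASE_URL = "http://127.0.0.1:8080/lenya/default/live/index.html"
print(allowed(ROBOTS_TXT, "lenya", BASE_URL))
```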
Result:
Buildfile: crawl_and_index.xml
init: [echo] INFO: Init
crawl: [echo] INFO: Crawl and dump hypertext documents
(../pubs/default/config/search/crawler-live.xconf)
[echo] INFO: Show configuration
[java] Crawler Config: Base URL:
http://127.0.0.1:8080/lenya/default/live/index.html
[java] Crawler Config: Scope URL:
http://127.0.0.1:8080/lenya/default/live/
[java] Crawler Config: User Agent: lenya
[java] Crawler Config: URI List:
../pubs/default/work/search/lucene/uris.txt
(../../work/search/lucene/uris.txt)
[java] Crawler Config: HTDocs Dump Dir:
../pubs/default/work/search/lucene/htdocs_dump
(../../work/search/lucene/htdocs_dump)
[java] Crawler Config: Robots File:
../pubs/default/config/search/robots.txt (robots.txt)
[java] Crawler Config: Robots Domain: 127.0.0.1
[echo] INFO: START crawling ...
[java] java.lang.StringIndexOutOfBoundsException: String index out
of range: -1
[java] at
org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:180)
[java] at org.apache.tools.ant.taskdefs.Java.run(Java.java:710)
[java] at
org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:178)
[java] at org.apache.tools.ant.taskdefs.Java.execute(Java.java:84)
A lucene.log is written, but it's empty...
What I did before:
I installed Ant 1.6.5 (for the Tomcat build, Lenya doesn't ship an
ant.bat as the Windows installer binaries do). I read
crawl_and_index.xml, and it contains pathelements pointing to JARs in the
WEB-INF/lib directories. Uh-huh!
http://lenya.apache.org/1_2_x/installation/source_version.html:
You must then validate that no other instances of these libraries exist
in any of the following directories:
[...]
* Any other location in your Lenya deployment. Specifically, check
webapps/lenya/WEB-INF/lib/.
So where should these files be? In the tomcat-5.0.28\common\endorsed
directory or in WEB-INF/lib?
I put them back in WEB-INF/lib; many other errors disappeared and
the error described above appeared.
Because I already had this problem with the binaries, I started an earlier
thread about it, and solprovider wrote that there might be too many
"../" in my path or something like that (I did not really understand it).
So I tried absolute and relative paths, with and without backslashes,
but nothing changed.
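
Judging only from the "Crawler Config" output in this thread, the relative
src values in crawler-live.xconf seem to be resolved against the directory
that contains the xconf file. The following Python sketch reproduces that
arithmetic. It is an inference from the logs, not taken from the Lenya
source, and the function name resolve is made up for illustration.

```python
import ntpath      # Windows path rules, usable on any platform
import posixpath   # forward-slash path rules


def resolve(xconf_path: str, src: str, pathmod) -> str:
    # Join the src attribute onto the directory of the xconf file,
    # then collapse the "../" segments.
    config_dir = pathmod.dirname(xconf_path)
    return pathmod.normpath(pathmod.join(config_dir, src))


SRC = "../../work/search/lucene/uris.txt"

# xconf given as an absolute path
absolute = resolve(
    r"E:\lenya_workspace\lenya\build\lenya\webapp\lenya"
    r"\pubs\default\config\search\crawler-live.xconf",
    SRC,
    ntpath,
)

# xconf given as a relative path
relative = resolve("../pubs/default/config/search/crawler-live.xconf", SRC, posixpath)

print(absolute)
print(relative)
```

The second result still starts with "../", so where it finally points
depends on the working directory of the Java process. An absolute xconf
path removes that dependency.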
Does anybody use an environment similar to mine in which crawling works?
Could it be a Windows-specific problem?
Any ideas?
Please consider a simple answer for a simple mind.
Thanks in advance
Franz
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]