Hello,
I use the latest Subversion checkout and Jetty instead of Tomcat.
Apart from that, my environment is the same as yours, and crawling works for me.


calling the crawler:

"E:\lenya_workspace\lenya\tools\bin\ant.bat" -f /E:/lenya_workspace/lenya/build/lenya/webapp/lenya/bin/crawl_and_index.xml -Dcrawler.xconf=/E:/lenya_workspace/lenya/build/lenya/webapp/lenya/pubs/default/config/search/crawler-live.xconf crawl


the output looks like this:

init:
     [echo] INFO: Init
crawl:
     [echo] INFO: Crawl and dump hypertext documents (/E:/lenya_workspace/lenya/build/lenya/webapp/lenya/pubs/default/config/search/crawler-live.xconf)
     [echo] INFO: Show configuration
     [java] log4j:WARN No appenders could be found for logger (org.apache.lenya.xml.DOMUtil).
     [java] log4j:WARN Please initialize the log4j system properly.
     [java] Crawler Config: Base URL: http://127.0.0.1:8888/default/live/index.html
     [java] Crawler Config: Scope URL: http://127.0.0.1:8888/default/live/
     [java] Crawler Config: User Agent: lenya
     [java] Crawler Config: URI List: E:\lenya_workspace\lenya\build\lenya\webapp\lenya\pubs\default\work\search\lucene\uris.txt (../../work/search/lucene/uris.txt)
     [java] Crawler Config: HTDocs Dump Dir: E:\lenya_workspace\lenya\build\lenya\webapp\lenya\pubs\default\work\search\lucene\htdocs_dump (../../work/search/lucene/htdocs_dump)
     [java] Crawler Config: Robots File: E:\lenya_workspace\lenya\build\lenya\webapp\lenya\pubs\default\config\search\robots.txt (robots.txt)
     [java] Crawler Config: Robots Domain: 127.0.0.1
     [echo] INFO: START crawling ...
     [java] log4j:WARN No appenders could be found for logger (org.apache.lenya.xml.DOMUtil).
     [java] log4j:WARN Please initialize the log4j system properly.
     [echo] INFO: Crawling DONE

As you can see, I have been too lazy to install a log4j.properties so far.
I am using the Ant that ships with Lenya, and absolute Unix-style pathnames
for the parameters when calling the crawler.
Maybe it works for you too with the latest Subversion checkout :-/
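
For reference, the path form used in the call above (a slash in front of the drive letter, forward slashes throughout) can be derived from a native Windows path as in the following Python sketch. The helper name to_unix_style is made up for illustration, and whether this path form is what makes the crawler run is an assumption, not something confirmed here.

```python
from pathlib import PureWindowsPath


def to_unix_style(windows_path):
    """Turn a path such as E:\\dir\\file into the /E:/dir/file form.

    Hypothetical helper: it only rewrites the string and never touches
    the file system, so it also runs on non-Windows machines.
    """
    # as_posix() swaps backslashes for forward slashes; the leading
    # slash mirrors the style of the paths passed to ant.bat above.
    return "/" + PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    print(to_unix_style(r"E:\lenya_workspace\lenya\build\lenya\webapp\lenya\bin\crawl_and_index.xml"))
```

The result can then be used for both the -f argument and the -Dcrawler.xconf property.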


Michael

Franz Ruebe wrote:
Hi,

My ENV:
lenya 1.2.3, cocoon 2.1.7, tomcat 5.0.28, j2sdk1.4.2, WinXP

I still have a problem with crawling or indexing.
I am trying to set up Lucene in my publication, but I can't get it to work, so help is welcome.

To rule out faults in my own publication as the cause, I tried to crawl the default publication.

I modified the crawler-live.xconf in the default-pub to fit the tomcat-build:
<crawler>
 <user-agent>lenya</user-agent>

 <base-url href="http://127.0.0.1:8080/lenya/default/live/index.html"/>
 <scope-url href="http://127.0.0.1:8080/lenya/default/live/"/>

 <uri-list src="../../work/search/lucene/uris.txt"/>
 <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump"/>

 <robots src="robots.txt" domain="127.0.0.1"/>
</crawler>
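
A quick sanity check of such a configuration can be sketched in Python as below: it tests that the base URL lies inside the scope URL and shows where the relative src paths end up. It assumes that relative paths are resolved against the directory containing the xconf file, which is what the crawler's "Show configuration" output suggests; the function name check_crawler_config is made up for illustration and is not part of Lenya.

```python
import posixpath
import xml.etree.ElementTree as ET

# The configuration from this message, embedded so the sketch is self-contained.
XCONF = """<crawler>
 <user-agent>lenya</user-agent>
 <base-url href="http://127.0.0.1:8080/lenya/default/live/index.html"/>
 <scope-url href="http://127.0.0.1:8080/lenya/default/live/"/>
 <uri-list src="../../work/search/lucene/uris.txt"/>
 <htdocs-dump-dir src="../../work/search/lucene/htdocs_dump"/>
 <robots src="robots.txt" domain="127.0.0.1"/>
</crawler>"""


def check_crawler_config(xconf_text, config_dir):
    """Return (base_url_in_scope, resolved_paths) for a crawler xconf.

    config_dir is the directory holding the xconf file; resolving the
    src attributes against it is an assumption, not confirmed behaviour.
    """
    root = ET.fromstring(xconf_text)
    base = root.find("base-url").get("href")
    scope = root.find("scope-url").get("href")
    resolved = {}
    for tag in ("uri-list", "htdocs-dump-dir", "robots"):
        src = root.find(tag).get("src")
        # normpath collapses the "../" segments so the target is visible.
        resolved[tag] = posixpath.normpath(posixpath.join(config_dir, src))
    return base.startswith(scope), resolved


if __name__ == "__main__":
    in_scope, paths = check_crawler_config(XCONF, "pubs/default/config/search")
    print("base URL inside scope:", in_scope)
    for tag, path in paths.items():
        print(tag, "->", path)
```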

The robots.txt is in place, and it allows the user agent lenya to access everything.

Result:
Buildfile: crawl_and_index.xml
init:
    [echo] INFO: Init
crawl:
    [echo] INFO: Crawl and dump hypertext documents (../pubs/default/config/search/crawler-live.xconf)
    [echo] INFO: Show configuration
    [java] Crawler Config: Base URL: http://127.0.0.1:8080/lenya/default/live/index.html
    [java] Crawler Config: Scope URL: http://127.0.0.1:8080/lenya/default/live/
    [java] Crawler Config: User Agent: lenya
    [java] Crawler Config: URI List: ../pubs/default/work/search/lucene/uris.txt (../../work/search/lucene/uris.txt)
    [java] Crawler Config: HTDocs Dump Dir: ../pubs/default/work/search/lucene/htdocs_dump (../../work/search/lucene/htdocs_dump)
    [java] Crawler Config: Robots File: ../pubs/default/config/search/robots.txt (robots.txt)
    [java] Crawler Config: Robots Domain: 127.0.0.1
    [echo] INFO: START crawling ...
    [java] java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    [java]     at org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:180)
    [java]     at org.apache.tools.ant.taskdefs.Java.run(Java.java:710)
    [java]     at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:178)
    [java]     at org.apache.tools.ant.taskdefs.Java.execute(Java.java:84)

A lucene.log is written, but it's empty...

What I did before:

I installed Ant 1.6.5 (for the Tomcat build, Lenya doesn't ship an ant.bat as the Windows installer binaries do). I read the crawl_and_index.xml, and it contains path elements pointing to jars in the WEB-INF/lib directories. Uh-huh!

http://lenya.apache.org/1_2_x/installation/source_version.html:

You must then validate that no other instances of these libraries exist in any of the following directories:
[...]
* Any other location in your Lenya deployment. Specifically, check webapps/lenya/WEB-INF/lib/.

So where should these files be: in the tomcat-5.0.28\common\endorsed directory or in WEB-INF/lib? I put them back into WEB-INF/lib; many other errors disappeared, and the error described above appeared instead.
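
One way to see which libraries exist in both places is to compare the jar file names of the two directories, as in the Python sketch below. The function name duplicate_jars is made up for illustration, and the sketch only matches identical file names, so two different versions of the same library (with different file names) would not be reported.

```python
import os


def duplicate_jars(dir_a, dir_b):
    """Return the sorted jar file names that occur in both directories.

    Hypothetical helper: it compares file names only, not jar contents.
    """
    jars_a = {name for name in os.listdir(dir_a) if name.lower().endswith(".jar")}
    jars_b = {name for name in os.listdir(dir_b) if name.lower().endswith(".jar")}
    return sorted(jars_a & jars_b)


if __name__ == "__main__":
    import sys

    # Usage (both directory paths are supplied by the caller):
    #   python duplicate_jars.py <endorsed-dir> <WEB-INF/lib-dir>
    for name in duplicate_jars(sys.argv[1], sys.argv[2]):
        print(name)
```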

I already had this problem with the binaries and started a thread about it. solprovider wrote that there might be too many "../" in my path, or something like that (I did not really understand it), so I tried absolute and relative paths, with and without backslashes, but nothing changed.

Does anybody use an environment similar to mine in which crawling works? Could it be a Windows-specific problem?
Any ideas?

Please consider a simple answer for a simple mind.

Thanks in advance

Franz



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



