Hi everybody,
I'm an Italian student (computer engineering) and I'm writing my
thesis on search engines.
As the final part of it, I'm trying to configure my university web server
to work with Nutch,
but I'm having some problems.
The whole site is built with Java JSP/JSPX and the server is Apache
Tomcat, reachable on port 8080.
The site is http://www.alice.unibo.it and the first page redirects the
browser to http://www.alice.unibo.it:8080/index.jspx
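
To make the redirect concrete, here is a minimal sketch of how one might check what the first page actually returns. It assumes the redirect is a plain HTTP 3xx response with a Location header (it could also be a meta refresh or JavaScript, which this would not detect). The function name first_hop is hypothetical, and only plain http:// URLs are handled.

```python
import http.client
from urllib.parse import urlsplit, urljoin


def first_hop(url, timeout=10):
    """Send one HEAD request and do NOT follow redirects.

    Returns (status_code, absolute_redirect_target_or_None).
    Hypothetical helper for illustration; plain http:// only.
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        location = resp.getheader("Location")
        # Resolve a relative Location against the requested URL.
        target = urljoin(url, location) if location else None
        return resp.status, target
    finally:
        conn.close()
```

Calling first_hop("http://www.alice.unibo.it/") would show whether the server answers with a 3xx status and which URL (including the port) it points to.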
The problem is that I need to index that web site. I set up
Nutch on the web server by configuring:
- conf/nutch-site.xml
- url/nutch with http://www.alice.unibo.it
- conf/crawl-urlfilter.txt with alice.unibo.it as the domain
But Nutch can't crawl it. I have tried many things: putting the port in the
filter domain, putting the redirected URL in the url/nutch file,
modifying the plugin properties in conf/nutch-default.xml,
and other changes.
Nutch works perfectly with other domains that have no redirection,
but with the redirection to port 8080 it can't fetch any pages.
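
To illustrate how the port can interact with the URL filter, here is a small sketch. It assumes a rule written in the style of the template line in crawl-urlfilter.txt (+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/) with the domain filled in. The variant with an optional port is only a hypothetical alternative, not a confirmed fix, and Python's re module stands in for the Java regular expressions Nutch uses.

```python
import re

# Rule in the style of the crawl-urlfilter.txt template, domain filled in.
# Note the "/" required directly after the host name.
HOST_ONLY = re.compile(r"^http://([a-z0-9]*\.)*alice.unibo.it/")

# Hypothetical variant that also allows an optional ":port" before the "/".
WITH_PORT = re.compile(r"^http://([a-z0-9]*\.)*alice\.unibo\.it(:[0-9]+)?/")


def accepted(rule, url):
    """Return True if the rule finds a match in the URL."""
    return rule.search(url) is not None


if __name__ == "__main__":
    for url in ("http://www.alice.unibo.it/",
                "http://www.alice.unibo.it:8080/index.jspx"):
        print(url, accepted(HOST_ONLY, url), accepted(WITH_PORT, url))
```

With the host-only rule, the redirected URL is rejected because ":8080" sits between the host name and the first "/", so the filter may be dropping the redirect target before it is ever fetched.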
I hope to find a solution in time to finish and defend my thesis.
Thank you all.
Parini Gianni, Bologna, Italy
System environment:
Nutch 0.8
Tomcat 5
Java 1.5
System hardware:
Mac OS X 10.4.7 on an iBook G4
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general