On 2010-05-09 12:23, Rafael Kubina wrote:
> Hi,
> I'm trying to do a full-text search on my Java sources (.java) via Nutch
> (1.0), SVN and HTTP (mod_dav_svn).
> Other documents, such as HTML, are searchable, but my sources are not.
> Currently the output is the following:
> fetching http://s025/svn/java/foo/trunk/src/main/java/Bar.java
> Pre-configured credentials with scope - host: s025; port: 80; found for
> url: http://s025/svn/java/foo/trunk/src/main/java/Bar.java
> url: http://s025/svn/java/foo/trunk/src/main/java/Bar.java; status
> code: 200; bytes received: 5829; Content-Length: 5829
> The content-type for this file is text/plain.
> There are no exceptions and no other problems.
> I'd really appreciate any help I can get. Thanks a lot!

You need to check the following:

* parse_text in your segment (you can dump it with the readseg command).
It should contain the plain-text content of your file.

* use Luke (www.getopt.org/luke) to examine your Lucene index. You
should be able to retrieve terms coming from your Java documents - use
Reconstruct & Edit in Luke.
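
For the first check, a small script can pull the parse_text of a single URL out of a readseg dump. This is only a sketch under assumptions: it expects the dump to have been produced with something like "bin/nutch readseg -dump <segment_dir> <output_dir>" (both paths are placeholders), and it assumes each record in the resulting text file starts with a "Recno::" line, followed by a "URL::" line and section headers such as "ParseText::". The function name extract_parse_text is made up for illustration. If your dump looks different, adjust the markers.

```python
# Sketch: extract the ParseText section for one URL from a readseg text dump.
# The record markers below are assumptions about the dump layout, not a
# documented format; check them against your own dump file.
SECTION_HEADERS = {"CrawlDatum::", "Content::", "ParseData::", "ParseText::"}


def extract_parse_text(dump_text, url):
    """Return the ParseText body recorded for `url`, or None if not found."""
    current_url = None
    section = None
    collected = None  # becomes a list once the wanted ParseText section starts

    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Recno::"):
            if collected is not None:
                break  # the next record begins, so the wanted text is complete
            current_url, section = None, None
        elif stripped.startswith("URL::") and section is None:
            current_url = stripped[len("URL::"):].strip()
        elif stripped in SECTION_HEADERS:
            if collected is not None:
                break  # another section begins after the wanted ParseText
            section = stripped
            if section == "ParseText::" and current_url == url:
                collected = []
        elif collected is not None:
            collected.append(line)

    if collected is None:
        return None
    return "\n".join(collected).strip()
```

If this returns None or an empty string for the Bar.java URL, the file was fetched but the parser produced no text for it, so nothing reached the index. If it returns the source text, the problem is more likely on the indexing or search side, which is what the Luke check covers.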

Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
