[opengrok-dev] archive analysers query

James Hunt Mon, 31 Oct 2011 21:25:12 -0700

Hi,

I'm currently looking at adding support for "ar" archives to OpenGrok. So far,
it works like the zip and tar analysers do, but I'd like to take it a stage
further. Looking at the code, I'm rather confused by the way different archive
formats are handled. Here's what I've gleaned so far:


File Type   Handling
----------|---------------------------------------------------------------------------
zip         Just index names of all files in archive, but do not index file contents.
tar         Just index names of all files in archive, but do not index file contents.
gzip        Uncompress stream and pass up to other analysers to index.
bzip2       Uncompress stream and pass up to other analysers to index.
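For comparison with the zip and tar rows, here is a minimal Java sketch of what
name-only handling could look like for the common "ar" format. It is not
OpenGrok code: the class ArMembers and the test helper member() are
hypothetical. It relies only on the well-known layout (the global magic
"!<arch>\n", then per member a 60-byte header with a 16-byte name, 12-byte
mtime, 6-byte uid, 6-byte gid, 8-byte mode, 10-byte decimal size and the
terminator "`\n", followed by data padded to an even offset). GNU long-name
tables and BSD "#1/" names are assumed away: members whose names start with
"/" are skipped, and BSD long names are reported verbatim.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of OpenGrok): lists member names of an "ar" archive. */
public class ArMembers {
    public static final byte[] GLOBAL_MAGIC = "!<arch>\n".getBytes(StandardCharsets.US_ASCII);

    public static List<String> listNames(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[GLOBAL_MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, GLOBAL_MAGIC)) {
            throw new IOException("not an ar archive");
        }
        List<String> names = new ArrayList<>();
        byte[] header = new byte[60];
        int first;
        while ((first = in.read()) != -1) {        // clean EOF between members ends the loop
            header[0] = (byte) first;
            in.readFully(header, 1, 59);
            if (header[58] != '`' || header[59] != '\n') {
                throw new IOException("bad member header");
            }
            String name = new String(header, 0, 16, StandardCharsets.US_ASCII).trim();
            long size = Long.parseLong(new String(header, 48, 10, StandardCharsets.US_ASCII).trim());
            // "/" (symbol table), "//" (GNU long-name table) and "/123" references are skipped.
            if (!name.startsWith("/")) {
                // GNU/System V terminate short names with '/'; BSD only pads with spaces.
                names.add(name.endsWith("/") ? name.substring(0, name.length() - 1) : name);
            }
            skipFully(in, size + (size % 2));      // member data is padded to an even offset
        }
        return names;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {                    // skip() may make no progress; fall back to read()
                if (in.read() == -1) {
                    throw new EOFException("truncated member data");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Test helper: builds one member (header, data, padding); names must be at most 15 chars. */
    public static byte[] member(String name, byte[] data) {
        String hdr = String.format("%-16s%-12s%-6s%-6s%-8s%-10d`\n",
                name + "/", "0", "0", "0", "644", data.length);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] h = hdr.getBytes(StandardCharsets.US_ASCII);
        out.write(h, 0, h.length);
        out.write(data, 0, data.length);
        if (data.length % 2 == 1) {
            out.write('\n');
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        archive.write(GLOBAL_MAGIC);
        archive.write(member("hello.c", "int main(void){return 0;}\n".getBytes(StandardCharsets.US_ASCII)));
        archive.write(member("README", "odd".getBytes(StandardCharsets.US_ASCII)));
        // prints [hello.c, README]
        System.out.println(listNames(new ByteArrayInputStream(archive.toByteArray())));
    }
}
```

Member bodies are skipped rather than read, which mirrors the zip/tar behaviour
of indexing names only; reading them instead is what the question below is about.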

Is there a reason that files within zip and tar files are not indexed? Would
it be reasonable to add the file names to the "full" Lucene doc field and then
also call AnalyzerGuru.getAnalyzer() on each file contained within such
archives (the JavaClassAnalyser appears to be adding a "full" field twice in
this fashion)?
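A minimal Java sketch of that idea for zip files, under stated assumptions:
EntryAnalyzer, register() and lookup() are hypothetical stand-ins, and the
extension-keyed map merely plays the role AnalyzerGuru.getAnalyzer() would play;
the real OpenGrok signatures are not reproduced here. Every entry name goes into
one "full"-style text buffer, and entries with a matching analyser also have
their extracted text appended.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ArchiveEntryDispatch {
    /** Hypothetical analyser interface; not the real OpenGrok API. */
    public interface EntryAnalyzer {
        String extractText(String entryName, byte[] content);
    }

    private final Map<String, EntryAnalyzer> byExtension = new HashMap<>();

    public void register(String extension, EntryAnalyzer analyzer) {
        byExtension.put(extension, analyzer);
    }

    /** Hypothetical lookup standing in for AnalyzerGuru.getAnalyzer(). */
    EntryAnalyzer lookup(String entryName) {
        int dot = entryName.lastIndexOf('.');
        return dot < 0 ? null : byExtension.get(entryName.substring(dot + 1));
    }

    /** Returns text for a single "full"-style field: all entry names plus analysed contents. */
    public String indexZip(InputStream in) throws IOException {
        StringBuilder full = new StringBuilder();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            for (ZipEntry e = zis.getNextEntry(); e != null; e = zis.getNextEntry()) {
                if (e.isDirectory()) {
                    continue;
                }
                full.append(e.getName()).append('\n');
                EntryAnalyzer a = lookup(e.getName());
                if (a != null) {
                    byte[] body = zis.readAllBytes(); // reads only the current entry (Java 9+)
                    full.append(a.extractText(e.getName(), body)).append('\n');
                }
            }
        }
        return full.toString();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("src/Hello.java"));
            zos.write("class Hello {}".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("logo.png"));
            zos.write(new byte[] {1, 2, 3});
            zos.closeEntry();
        }
        ArchiveEntryDispatch d = new ArchiveEntryDispatch();
        // Toy "analyser" that passes source text through unchanged.
        d.register("java", (name, body) -> new String(body, StandardCharsets.UTF_8));
        // prints "src/Hello.java", "class Hello {}", "logo.png" on separate lines
        System.out.print(d.indexZip(new ByteArrayInputStream(bos.toByteArray())));
    }
}
```

Appending to one buffer is only one option; a Lucene document also accepts
several fields with the same name, which seems to be what the JavaClassAnalyser
does with "full". Buffering each entry in memory is a simplification; nested
archives and very large entries would likely need a size limit or recursion guard.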

Kind regards,

James
--
James Hunt
_______________________________________________
opengrok-dev mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opengrok-dev
