Hi, Lucene is up for an Apache board report this month, and Tika should contribute its part to the report. Here's a quick draft:
<draft>
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Development towards Tika 0.3 is ongoing. Metadata handling and metadata frameworks like XMP have been a source of much discussion, but so far no clear consensus has been reached on whether or how the metadata features in Tika should be extended.

A wiki was created for Tika.
</draft>

Anything I'm missing or misrepresenting?

BR,

Jukka Zitting