Hi,

Lucene is again up for an Apache board report this month, and Tika should contribute its part to the report. Here's a quick draft:

<draft>
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Tika 0.3 was released in March, and we are planning to release version 0.4 soon. Tika development continues at a steady pace with no major roadblocks in sight.

A Solr-based search feature built and hosted by Lucid Imagination was added to the Tika web site.
</draft>

Comments/improvements are welcome.

BR,

Jukka Zitting