Hi Rida,

I agree totally! You should take a look at the MarkupLanguageParserProposal on the Nutch wiki (http://wiki.apache.org/nutch/MarkupLanguageParserProposal) and at the work done in Frutch on the ParseXml plugin (http://www.krugle.com/kse/files?query=frutch%20parse%20out).
I'd love to chat with you more about this. Let me know what you think.

Thanks,
Chris

On 10/10/07 9:28 AM, "Rida Benjelloun" <[EMAIL PROTECTED]> wrote:

> Hi,
> Do you think that we should have an XmlOutputter that saves the extracted
> content and metadata in an XML file? This would simplify integration with
> other technologies, like Solr for example.
> The XmlOutputter will process a File (a single file, or a directory
> recursively) or a URL. It will use XSLT as a filter to mask or display the
> needed elements, and it will take an output encoding.
> Example:
> TikaXmlOutputter txo = new TikaXmlOutputter();
> txo.output(File|URL input, File xmlOutput, File xsltFilter, String encoding);
>
> Regards.

______________________________________________
Chris Mattmann, Ph.D.
[EMAIL PROTECTED]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory
Pasadena, CA
Office: 171-266B
Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect those of either NASA, JPL, or the California Institute of Technology.
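For discussion, here is a minimal sketch of how the proposed outputter might work using only the JDK's javax.xml.transform API. `TikaXmlOutputter` follows the name in Rida's example, but it is hypothetical, not an existing Tika class, as are the `<documents>`/`<document>`/`<metadata>`/`<content>` element names. The "extraction" step is a placeholder that decodes each input as UTF-8 text, where a real version would hand the stream to a Tika parser.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch of the proposed outputter; extraction is a UTF-8 text placeholder. */
public class TikaXmlOutputter {

    /** File input: a single file, or a directory walked recursively. */
    public void output(File input, File xmlOutput, File xsltFilter, String encoding)
            throws IOException, ParserConfigurationException, TransformerException {
        List<File> files = new ArrayList<>();
        collect(input, files);
        Document doc = newDocument();
        for (File f : files) {
            addEntry(doc, f.getPath(), f.getName(), Files.readAllBytes(f.toPath()));
        }
        write(doc, xmlOutput, xsltFilter, encoding);
    }

    /** URL input: fetch the resource and record it as a single entry. */
    public void output(URL input, File xmlOutput, File xsltFilter, String encoding)
            throws IOException, ParserConfigurationException, TransformerException {
        byte[] bytes;
        try (InputStream in = input.openStream()) {
            bytes = in.readAllBytes(); // requires Java 9+
        }
        Document doc = newDocument();
        addEntry(doc, input.toString(), input.getPath(), bytes);
        write(doc, xmlOutput, xsltFilter, encoding);
    }

    private static void collect(File f, List<File> out) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children == null) return;  // unreadable directory: skip it
            Arrays.sort(children);         // deterministic output order
            for (File c : children) collect(c, out);
        } else {
            out.add(f);
        }
    }

    private static Document newDocument() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("documents"));
        return doc;
    }

    /** Appends <document source=".."><metadata>..</metadata><content>..</content></document>. */
    private static void addEntry(Document doc, String source, String name, byte[] bytes) {
        Element entry = doc.createElement("document");
        entry.setAttribute("source", source);
        Element meta = doc.createElement("metadata");
        appendText(doc, meta, "name", name);
        appendText(doc, meta, "length", Integer.toString(bytes.length));
        entry.appendChild(meta);
        // Placeholder extraction: assumes the input is UTF-8 text.
        appendText(doc, entry, "content", new String(bytes, StandardCharsets.UTF_8));
        doc.getDocumentElement().appendChild(entry);
    }

    private static void appendText(Document doc, Element parent, String tag, String text) {
        Element e = doc.createElement(tag);
        e.setTextContent(text);
        parent.appendChild(e);
    }

    /** Serializes the DOM, through the XSLT filter if one is given, in the requested encoding. */
    private static void write(Document doc, File xmlOutput, File xsltFilter, String encoding)
            throws IOException, TransformerException {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer t = (xsltFilter == null)
                ? tf.newTransformer()                            // identity copy
                : tf.newTransformer(new StreamSource(xsltFilter)); // masking/selecting filter
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        try (OutputStream os = new FileOutputStream(xmlOutput)) {
            t.transform(new DOMSource(doc), new StreamResult(os));
        }
    }
}
```

Passing `null` as the filter writes the unfiltered XML. A stylesheet that copies every node except `<content>` would keep only the metadata, which is the kind of masking described above, and the encoding argument is applied through `OutputKeys.ENCODING`.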
