Re: [Nutch-dev] RE: [proposal] Generic Markup Language Parser

2005-11-25 Thread Jérôme Charron
> Do we talk about parsing rdf or do we discuss to store parsed html
> text in rdf and convert it via xslt to pure text?
> I may misunderstand something. I very like the idea of a general rdf
> parser. Back in the days i played around with jena.sf.net
> Parsing yes, replace nutch sequence file and

Re: [Nutch-dev] RE: [proposal] Generic Markup Language Parser

2005-11-25 Thread Stefan Groschupf
On 25.11.2005 at 11:30, Erik Hatcher wrote: On 24 Nov 2005, at 23:49, Chris Mattmann wrote: Dublin core may be good for semantic web, but not for a content storage. I completely disagree with that. Me too. Do we talk about parsing rdf or do we discuss to store parsed html text in rdf

[jira] Created: (NUTCH-129) rtf-parser does not work when opened with wordpad files and saved

2005-11-25 Thread raghavendra prabhu (JIRA)
rtf-parser does not work when opened with wordpad files and saved
Key: NUTCH-129
URL: http://issues.apache.org/jira/browse/NUTCH-129
Project: Nutch
Type: Bug
Components: indexer
Environment: A sam

Re: [proposal] Generic Markup Language Parser

2005-11-25 Thread Piotr Kosiorowski
Hello, I do agree with Andrzej. I do not see it as a solution for parse-html. But a generic XML plugin may be of some use to some people (even if not for me). Regards Piotr Andrzej Bialecki wrote: Stefan Groschupf wrote: [...] Gentlemen, please let's keep a civilized tone to this

Re: [proposal] Generic Markup Language Parser

2005-11-25 Thread Andrzej Bialecki
Stefan Groschupf wrote: [...] Gentlemen, please let's keep a civilized tone to this exchange, or take it off the list. I applaud this effort, I can certainly sympathize with its goals - just the other day I struggled with parsing an XML feed into Nutch segments. It would be very welcome to

Re: [Nutch-dev] RE: [proposal] Generic Markup Language Parser

2005-11-25 Thread Erik Hatcher
On 24 Nov 2005, at 23:49, Chris Mattmann wrote: Dublin core may be good for semantic web, but not for a content storage. I completely disagree with that. Me too. In fact, I think many people would disagree with that. Dublin core is a "standard" metadata model for electronic reso