Andrzej Bialecki wrote:
Gentlemen, please let's keep a civilized tone to this exchange, or take
it off the list.
+1
Doug
On 24 Nov 2005, at 23:49, Chris Mattmann wrote:
Dublin Core may be good for the semantic web, but not for content
storage.
I completely disagree with that.
Me too.
In fact, I think many people would disagree
with that. Dublin Core is a standard metadata model for
electronic
On 25.11.2005, at 11:30, Erik Hatcher wrote:
Are we talking about parsing RDF, or are we discussing storing parsed HTML
text in RDF and converting it via XSLT to plain text?
I may be misunderstanding something. I really like the idea of a general RDF
parser. Back in the day I played around with jena.sf.net
Parsing, yes; replacing the Nutch sequence file and the
Jérôme,
A mail archive is an amazing source of information, isn't it?! :-)
To answer your question, just ask yourself how many pages per second
you plan to fetch and parse, and how many queries per second a Lucene
index is able to handle and you can deliver in the UI.
I have here
Hi Stefan,
-1!
XSL is terribly slow!
You have to consider what the XSL will be used for. Our proposal suggests
XSL as a means of intermediate transformation of markup content on the
backend, as Jerome suggested in his reply. This means that whenever markup
content is encountered,
Hi Stefan, and Jerome,
Correct me if I'm wrong, but isn't log4j used a lot within Nutch? :-)
No, Nutch uses Java logging; only some plugins use jars that depend
on log4j.
Stefan