Re: Connectors, Parsers, Plugin architecture

Eivind Hasle Amundsen Tue, 16 Jan 2007 07:40:01 -0800

: Solr aims at being an answer to "enterprise needs", by indexing
: structured data for different applications. However I think that many
: enterprises would like to be able to structure information themselves.


that's exactly what Solr is about: letting a schema creator define
what the structure is, and letting them put data in whatever fields they
want.

Could a future "parser plugin" architecture make sure that the outcome
is in a well-defined format? In this case there could be a step for pure
document processing.

Everything fed into the document processor stage should in other words
be in a universal format - complete with source and which parser was
used, of course. From this document, fields could be extracted and
computed via simple programming to meet the requirements of the schema.
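
As a rough illustration of the "universal format" idea, here is a minimal sketch in Python. Everything in it is hypothetical: StandardDocument, to_schema_fields and the field names "id", "title", "body" and "parser" are made up for the example and are not part of Solr.

```python
from dataclasses import dataclass, field


@dataclass
class StandardDocument:
    """Hypothetical universal format emitted by every parser plugin."""
    source: str   # where the raw data came from
    parser: str   # which parser produced this document
    text: str     # character data extracted by the parser
    metadata: dict = field(default_factory=dict)


def to_schema_fields(doc):
    """Compute schema fields from a standard document (assumed mapping)."""
    lines = doc.text.strip().splitlines()
    return {
        "id": doc.source,
        # Prefer a title supplied by the parser, else fall back to line one.
        "title": doc.metadata.get("title") or (lines[0] if lines else ""),
        "body": doc.text,
        "parser": doc.parser,
    }


doc = StandardDocument(
    source="file:///tmp/report.txt",
    parser="plaintext",
    text="Quarterly report\nSales rose.",
)
fields = to_schema_fields(doc)
```

The point of the sketch is that the mapping function only ever sees one document shape, so the per-schema code stays small no matter how many parsers exist.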

the problem with providing support for unstructured data out of the box is
that it's got no structure :) ... how would Solr know what to do with the
binary data it finds? how would it know what charset to use when reading
that data? ... assuming it gets character data, how does it know which
strings should go in which fields? how does it know which analyzers to
use?

With regards to the above, this could be handled by the parser, which
creates the "standard document". This document would also contain
metadata relevant to solving these tasks. The document processing stage
would then know which conversion to use.
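
To make the charset question concrete, here is a small Python sketch of a parser recording metadata and a processing stage choosing the conversion from it. The names parse_raw, CONVERTERS and process, and the "charset" and "content_type" metadata keys, are assumptions for illustration only; nothing here is an existing Solr API.

```python
def parse_raw(raw, source, charset):
    """Hypothetical parser: wraps raw bytes together with what it knows."""
    return {
        "source": source,
        "parser": "raw-bytes",
        "metadata": {"charset": charset, "content_type": "text/plain"},
        "payload": raw,
    }


# Assumed registry: content type -> conversion to character data.
CONVERTERS = {
    "text/plain": lambda doc: doc["payload"].decode(doc["metadata"]["charset"]),
}


def process(doc):
    """Pick the conversion based on the metadata the parser recorded."""
    content_type = doc["metadata"]["content_type"]
    converter = CONVERTERS.get(content_type)
    if converter is None:
        raise ValueError("no converter registered for " + content_type)
    return converter(doc)


doc = parse_raw("caf\u00e9".encode("latin-1"), "file:///tmp/note.txt", "latin-1")
text = process(doc)
```

The processing stage never guesses the charset; it only trusts what the parser wrote into the metadata, and fails loudly when no conversion is registered.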

some code somewhere has to make these decisions ... at the moment that
code needs to be provided by the user and run outside of Solr ... i
suspect it won't be long before much of that code can run inside of Solr
as a plugin, but it will still need to be provided by the user to parse
truly unstructured data.

Yep. But my idea of a "standard document" - wouldn't that help a bit?
Don't look at me, I'm just a newbie :)


Eivind
