On 03/11/2010 16:36, Nils Breunese wrote:
> Weston Ruter wrote:
>> Specifically, I'm looking at books that are in a constant flux, i.e. books that
>> are being edited. The application here is for Bible translations in particular,
>> where each word token needs to be keyed into other metadata, like link to
>> source word, insertion datetime, translator, etc. Now that I think of it, in
>> order to be referenceable, each token would have to exist as a separate document
>> anyway, since parts of documents aren't indexed by ID, I wouldn't think.
> That's right. You'll definitely want to use a document per token here.
I'm not sure this is right. It appears most odd to treat a book that is
being translated as a sequence of words and symbols. I would expect the
translator to translate whole sentences, or paragraphs at a time. For
the Bible, isn't the obvious choice the verse? This would imply two
document types:
Verses - each contains a list of dictionaries, one for each token.
Each dictionary contains the token and the notes about that token. Might
use an ordered dictionary and make the token the key. From this, the
source and target texts can be created. Each dictionary can point to
lexicon entries and carry translation notes, dates, times, translators, etc.
Lexicon - each entry is the meaning of a word, in the context in which
it is used. One entry may be referenced in many, many places.
Translation notes would record data about inferences and implications to
ensure the correct meaning is chosen.
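
As a rough sketch of those two document types, written as plain Python
dictionaries standing in for JSON documents. Every field name here
(`type`, `tokens`, `lexicon_id`, `notes` and so on), the ID format and the
sample values are assumptions made up for illustration, not an existing
schema:

```python
# Hypothetical lexicon entry: one meaning of a word in a given context.
lexicon_entry = {
    "_id": "lexicon:logos:1",
    "type": "lexicon",
    "lemma": "logos",
    "meaning": "word, message",
}

def make_token(text, lexicon_id=None):
    """Build one token dictionary; metadata values are placeholders."""
    return {
        "token": text,
        "lexicon_id": lexicon_id,   # points at a lexicon document, if any
        "translator": "translator_a",
        "inserted": "2010-11-03T16:36:00Z",
        "notes": [],
    }

# Hypothetical verse document: an ordered list of token dictionaries.
verse = {
    "_id": "verse:43:001:001",
    "type": "verse",
    "book": 43, "chapter": 1, "verse": 1,
    "tokens": [
        make_token("In"),
        make_token("the"),
        make_token("beginning"),
        make_token("was"),
        make_token("the"),
        make_token("Word", lexicon_id=lexicon_entry["_id"]),
    ],
}

def target_text(verse_doc):
    """Rebuild the target-language text from the ordered tokens."""
    return " ".join(t["token"] for t in verse_doc["tokens"])

def lexicon_ids(verse_doc):
    """Collect the lexicon entries this verse refers to."""
    return [t["lexicon_id"] for t in verse_doc["tokens"] if t["lexicon_id"]]
```

A list is used here rather than a dictionary keyed by token, because the
same word can occur more than once in a verse.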
I rather suspect that notes about the source or target language words
and how they have been translated would be almost meaningless if
separated from the context of the verse.
If verses are given a key computed from Book No, Chapter No, and Verse
No, then a view that presents the verses in the correct order is trivial
to construct. If there are situations where verses need to be
re-ordered, then you need two views and two Verse Nos (one for each
language) so you can build the correct keys.
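
The key idea can be sketched in Python as follows. In CouchDB itself the
view would emit an equivalent key, but the sorting logic is the same. The
function names, the field names and the `verse_field` parameter (used to
pick a per-language verse number) are assumptions for illustration only:

```python
def verse_key(doc, verse_field="verse"):
    """Composite sort key: (book, chapter, verse).

    verse_field selects which verse number to use, so a second
    ordering (e.g. for the other language) can reuse the same code.
    """
    return (doc["book"], doc["chapter"], doc[verse_field])

def verse_id(doc, verse_field="verse"):
    """Zero-padded string form, so plain string sorting matches numeric order."""
    return "{:02d}:{:03d}:{:03d}".format(*verse_key(doc, verse_field))

# Sample documents, deliberately out of order.
verses = [
    {"book": 43, "chapter": 1, "verse": 2},
    {"book": 1, "chapter": 1, "verse": 1},
    {"book": 43, "chapter": 1, "verse": 1},
    {"book": 1, "chapter": 10, "verse": 3},
]

# Sorting by the composite key puts the verses in reading order.
in_order = sorted(verses, key=verse_key)
```

The zero padding matters only if the key is stored as a single string;
without it, chapter 10 would sort before chapter 2.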
>> As I mentioned above, metadata and related data are both going to be externally
>> attached to each token at various sources, so each token needs to be referenced
>> by ID. This fact alone invalidates a single-document approach because parts of
>> a document can't be linked to, correct?
A list of dictionaries that include the token, and data about the token,
will avoid this problem.
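
One way to make that concrete, as a sketch only: give each token
dictionary its own identifier within the verse, so external data can refer
to a (verse, token) pair. The `id` field, the `verse#token` reference
format and the helper names are my own illustrative assumptions:

```python
# Hypothetical verse document whose tokens carry their own ids.
verse = {
    "_id": "verse:43:001:001",
    "tokens": [
        {"id": "t1", "token": "In"},
        {"id": "t2", "token": "the"},
        {"id": "t3", "token": "beginning"},
    ],
}

def make_ref(verse_doc, token_id):
    """Build an external reference to one token inside a verse."""
    return "{}#{}".format(verse_doc["_id"], token_id)

def find_token(verses_by_id, ref):
    """Resolve a 'verse_id#token_id' reference; return None if not found."""
    verse_id, _, token_id = ref.rpartition("#")
    verse_doc = verses_by_id.get(verse_id)
    if verse_doc is None:
        return None
    for tok in verse_doc["tokens"]:
        if tok["id"] == token_id:
            return tok
    return None

verses_by_id = {verse["_id"]: verse}
ref = make_ref(verse, "t3")
```

Because the identifier does not depend on a token's position in the list,
it would stay valid if the words were re-ordered, provided whatever edits
the verse hands the identifiers back along with the words.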
You will have the user interface problem of presenting a verse with
words in one order, and receiving it back with new words in a new order.
How do you get the program to match up the right notes with the right
words?
Regards
Ian