I would *strongly* encourage you to store them together as one document. There's no real way to do DB-like joins in the underlying Lucene search engine.
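As a rough sketch of what "store them together" could look like, assuming both sources can be keyed by the same record id: the field names, the sample records, and the helper names below are all made up for illustration, and the actual indexing call is deliberately left out.

```python
# Hypothetical sketch: merge two independently updated sources into one
# document per book before indexing. Nothing here is a Lucene or Solr API;
# field names ("author", "title", "toc") and ids are invented.

def merge_record(record_id, biblio_by_id, contents_by_id):
    """Build one combined document from whichever sources know record_id."""
    doc = {"id": record_id}
    # Bibliographic fields (author, title, ...) if this source has the record.
    doc.update(biblio_by_id.get(record_id, {}))
    # OCRed table of contents, if it exists for this record.
    toc = contents_by_id.get(record_id)
    if toc is not None:
        doc["toc"] = toc
    return doc


def build_documents(biblio_by_id, contents_by_id):
    """Return one merged document for every id seen in either source."""
    all_ids = sorted(set(biblio_by_id) | set(contents_by_id))
    return [merge_record(rid, biblio_by_id, contents_by_id) for rid in all_ids]


biblio = {
    "b1": {"author": "Melville", "title": "Moby-Dick"},
    "b2": {"author": "Austen", "title": "Emma"},
}
contents = {"b1": "Loomings; The Carpet-Bag; The Spouter-Inn"}

docs = build_documents(biblio, contents)
```

When only one source changes, calling merge_record for just that id gives you the document to re-add, so the "look in the other source first" step is a single keyed lookup.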
But that's generic advice. The question I have for you is "What's the big deal about coordinating the sources?" That is, you have to have something that allows you to make a 1:1 correspondence between your data sources, or you couldn't relate them in the first place. Is it really that onerous to check? If it is, why not build an index and search it when you want to know?

Surrounding this question is "How often do you really update data?" If it's once an hour, I submit that you don't care how difficult it is to find out whether there's corresponding data in the other data set. If it's once a second, that may be a different story.

You haven't described enough of your problem space for me to render any opinion on whether this is premature optimization or not, but it sure smells like it from a distance <G>...

Best
Erick

On Jan 17, 2008 2:11 AM, Michael Lackhoff <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have two sources of data for the same "things" to search. It is book
> data in a library. First there is the usual bibliographic data (author,
> title...) and then I have scanned and OCRed table of contents data about
> the same books. Both are updated independently.
> Now I don't know how to best index and search this data.
> - One option would be to save the data in different records. That would
>   make updates easy because I don't have to worry about the fields
>   from the other source. But searching would be more difficult: I have
>   to do an additional search for every hit in the "contents" data to
>   get the bibliographic data.
> - The other option would be to save everything in one record but then
>   updates would be difficult. Before I can update a record I must first
>   look if there is any data from the other source, merge it into the
>   record and only then update it. This option sounds very time consuming
>   for a complete reindex.
>
> The best solution would be some sort of join: Have two records in the
> index but always give both in the result no matter where the hit was.
> Any ideas on how to best organize this kind of data?
>
> -Michael