At 6:43 AM -0500 1/11/07, Erik Hatcher wrote:
>If all fields are stored, the implementation could simply pull them all into
>memory on the Solr side and add the document as if it had been sent entirely
>by the client. But what happens for un-stored fields?
I'll observe that Luke has a "Reconstruct and Edit" function which displays the indexed values for each field of the selected Document when stored values aren't available. It iterates the entire inverted index and intersects each term's positions with the target Document ID via TermPositions.skipTo(id).

While that would be too slow to do on a per-update basis, it might be feasible for an update function if it cached a list of partially defined Documents and only at the end (at closing, or whenever the list grew past a defined maximum) did a bulk intersection to find indexed values which are not overridden by new values, with just a single traversal of the index in term order, then updated-DocID order. Once done, the reconstructed Documents could be added and the prior versions deleted.

The roadblocks come up when re-adding the indexed values to the index: the updater can create a new untokenized, unstored Field for each indexed value so that it is literally re-added, but in that case there is no way to externally specify the position offset to match the original. DocumentWriter and the classes it relies on are package-private and final, so there is no way to interpose there. An effective hack, though, might be to mark the reconstructed Fields as tokenized and assign those fields a special Analyzer which acts like KeywordAnalyzer but looks up the position offset in a table created by the update mechanism and returns it with the token. A little convoluted, but probably doable if someone had the time and inclination.

- J.J.
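P.S. - To make the "single traversal in term order" idea concrete, here is a toy sketch of reconstruction-by-intersection over a plain in-memory inverted index. It is not Lucene code; the class and method names are hypothetical, and the nested map stands in for what IndexReader.terms()/termPositions() plus skipTo(id) would give you against a real index.

```java
import java.util.*;

// Toy sketch: walk the whole inverted index in term order and, for each
// posting that matches the target doc id, record (position -> term).
// Reading the result back in position order reconstructs the field text.
public class Reconstruct {
    // inverted index: term -> (docId -> positions of that term in the doc)
    static String reconstructField(SortedMap<String, Map<Integer, int[]>> index,
                                   int targetDoc) {
        SortedMap<Integer, String> byPos = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, int[]>> e : index.entrySet()) {
            int[] positions = e.getValue().get(targetDoc); // plays the role of skipTo(targetDoc)
            if (positions == null) continue;               // term absent from this doc
            for (int p : positions) byPos.put(p, e.getKey());
        }
        return String.join(" ", byPos.values());
    }

    public static void main(String[] args) {
        SortedMap<String, Map<Integer, int[]>> index = new TreeMap<>();
        index.put("brown", Map.of(7, new int[]{1}));
        index.put("fox",   Map.of(7, new int[]{2}, 9, new int[]{0}));
        index.put("quick", Map.of(7, new int[]{0}));
        System.out.println(reconstructField(index, 7)); // quick brown fox
    }
}
```

The expensive part in real life is exactly what the outer loop shows: every term in the index must be visited once, which is why it only pays off when amortized over a batch of pending updates.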
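And a toy model of the position-restoring analyzer hack. Again, this is not Lucene code: the class and method names are hypothetical, the table assumes for simplicity that each value occurs at one position, and a real implementation would subclass Analyzer and call Token.setPositionIncrement() from its TokenStream. The point is only the arithmetic: absolute positions from the reconstruction table become the delta-encoded position increments the indexer expects.

```java
import java.util.*;

// Toy model of the hack: emit each reconstructed value as a single
// keyword-style token, but give it a position increment looked up from a
// table built during reconstruction, so the re-indexed positions match
// the original document (including any gaps).
public class PositionRestoringAnalyzer {
    /** token text -> original absolute position within the field (hypothetical table) */
    final Map<String, Integer> positionTable;

    PositionRestoringAnalyzer(Map<String, Integer> table) { this.positionTable = table; }

    /** Returns "token/+increment" pairs; increments are deltas between
     *  successive absolute positions, which is how the indexer consumes them. */
    List<String> tokenize(List<String> values) {
        List<String> out = new ArrayList<>();
        int lastPos = -1; // implicit position before the first token
        for (String v : values) {
            int abs = positionTable.getOrDefault(v, lastPos + 1); // fall back to "next slot"
            out.add(v + "/+" + (abs - lastPos));
            lastPos = abs;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> table = Map.of("quick", 0, "fox", 2);
        PositionRestoringAnalyzer a = new PositionRestoringAnalyzer(table);
        System.out.println(a.tokenize(List.of("quick", "fox")));
        // [quick/+1, fox/+2] -- the increment of 2 preserves the hole at position 1
    }
}
```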