Yeah, I thought of that one too, but it still requires each DRM to be ordered 
by key, in which case simultaneous iteration works in one pass, I think.

If the DRMs are always sorted by key, you can iterate through both at the same 
time, writing a doc only when you have both fields or know a field is missing 
from one DRM. If both iterators are on the same key, you write a combined doc; 
if the keys differ, you write a one-sided doc for the smaller key and advance 
that iterator until it catches up to the other.
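
A rough sketch of that one-pass merge, in Python for brevity rather than 
against the Mahout API. It assumes each DRM has already been read into an 
iterator of (key, links) pairs, sorted by key with no duplicate keys. 
merge_sorted and the sample rows are made-up names for illustration only.

```python
def merge_sorted(bb_rows, ba_rows):
    """Yield (key, bb_links, ba_links) from two key-sorted iterators.

    A side that has no row for a given key is reported as None.
    """
    bb_iter, ba_iter = iter(bb_rows), iter(ba_rows)
    bb = next(bb_iter, None)
    ba = next(ba_iter, None)
    while bb is not None or ba is not None:
        if ba is None or (bb is not None and bb[0] < ba[0]):
            # Key present only in B'B (or B'A is exhausted): one-sided doc.
            yield bb[0], bb[1], None
            bb = next(bb_iter, None)
        elif bb is None or ba[0] < bb[0]:
            # Key present only in B'A (or B'B is exhausted): one-sided doc.
            yield ba[0], None, ba[1]
            ba = next(ba_iter, None)
        else:
            # Same key on both sides: combined doc, advance both.
            yield bb[0], bb[1], ba[1]
            bb = next(bb_iter, None)
            ba = next(ba_iter, None)


# Hypothetical sample rows standing in for the two DRMs.
bb = [("item1", ["itemX", "itemY"]), ("item3", ["itemQ"])]
ba = [("item1", ["itemZ"]), ("item2", ["itemW"])]
merged = list(merge_sorted(bb, ba))
```

The comparison has to match whatever order the DRM rows are actually sorted 
in (numeric for integer row ids, for example), otherwise the two sides never 
line up.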

Every DRM I've examined seems to be ordered by key and I assume that is not an 
artifact of seqdumper. I'm using SequenceFileDirIterator so the part file 
splits aren't a problem.

An m/r join is pretty simple too, but I'll go with non-m/r unless there is a 
problem with the above.

BTW the schema for the Solr csv is:
id,b_b_links,b_a_links
item1,"itemX itemY","itemZ"
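
For what it's worth, rows in that layout could be written along these lines. 
This is only a sketch: write_solr_csv is a hypothetical helper, and it assumes 
the merged rows arrive as (key, bb_links, ba_links) tuples with None for a 
missing side. Python's csv module quotes a field only when it has to, so the 
space-separated link lists come out unquoted, which is the same value under 
ordinary CSV rules.

```python
import csv
import io


def write_solr_csv(rows, out):
    """Write (key, bb_links, ba_links) tuples as id,b_b_links,b_a_links."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "b_b_links", "b_a_links"])
    for key, bb_links, ba_links in rows:
        # A missing side becomes an empty field; links are space-separated.
        writer.writerow([key, " ".join(bb_links or []), " ".join(ba_links or [])])


# Hypothetical rows, including one with no B'B links.
buf = io.StringIO()
write_solr_csv(
    [("item1", ["itemX", "itemY"], ["itemZ"]), ("item2", None, ["itemW"])],
    buf,
)
csv_text = buf.getvalue()
```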

Am I missing some "normal metadata"?

> On Aug 5, 2013, at 11:05 AM, Ted Dunning <[email protected]> wrote:
> 
> What about just updating the document with the fields?  Have three passes.
> Pass 1 puts the normal meta-data for the item in place.  Pass 2 updates
> with data from B'B.  Pass 3 updates with data from B'A.
> 
> This will cause the entire index to be rewritten more than necessary, but
> it should be fast enough to be a non-issue.
> 
