Thanks Markus, your pointers are very helpful. I have looked at BlockJoins. Since there is a 1-to-1 relationship between the pairs of pages I need to process, I think BlockJoins would add unnecessary complexity to the queries. A custom update processor appears to me to be the better option.
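To make the 1-to-1 pairing concrete, here is a rough sketch of the kind of merge step an update processor could perform. It is plain Java with no Solr dependency, so it is an illustration of the logic only, not working plugin code. The class name `PairMerger`, the field names `id` and `inlink`, and the `partner_` prefix are all hypothetical, and plain maps stand in for Solr documents. In a real plugin this logic would sit in a class extending Solr's UpdateRequestProcessor (in its processAdd method), and the partner lookup would go against the index rather than an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java stand-in for the pairing step; no Solr classes are used. */
public class PairMerger {
    // Hypothetical field names, chosen only for this illustration.
    static final String ID = "id";
    static final String INLINK = "inlink";

    // Documents seen so far, keyed by URL. A real plugin would look the
    // partner up in the index instead of keeping an in-memory map.
    private final Map<String, Map<String, Object>> seen = new HashMap<>();

    /** Remember a document so that its partner can find it later. */
    public void remember(Map<String, Object> doc) {
        seen.put((String) doc.get(ID), doc);
    }

    /**
     * Return a copy of doc with the partner's fields added under a
     * "partner_" prefix. The partner is the remembered document whose id
     * equals doc's "inlink" value. The input document is not modified.
     */
    public Map<String, Object> merge(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object link = doc.get(INLINK);
        Map<String, Object> partner = (link == null) ? null : seen.get(link);
        if (partner == null) {
            return out; // no partner known yet: pass the document through
        }
        for (Map.Entry<String, Object> e : partner.entrySet()) {
            out.putIfAbsent("partner_" + e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        PairMerger merger = new PairMerger();

        Map<String, Object> pageA = new HashMap<>();
        pageA.put(ID, "http://example.com/a");
        pageA.put("title", "Page A");
        merger.remember(pageA);

        Map<String, Object> pageB = new HashMap<>();
        pageB.put(ID, "http://example.com/b");
        pageB.put(INLINK, "http://example.com/a");

        // Page B comes back carrying Page A's fields under "partner_".
        System.out.println(merger.merge(pageB));
    }
}
```

Because each page has exactly one partner, a single lookup per document is enough, which is why this seems simpler than restructuring the index into parent/child blocks.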
I have found a couple of useful examples that may help others tackling similar problems. First, I am going to try the links-extractor indexing plugin at https://github.com/jorgelbg/links-extractor to ensure that I have a reference to "Page A" at the time I index "Page B". Second, I am going to start with the solr-field-update UpdateRequestProcessor at https://github.com/guardian/solr-field-update as a template, but will modify the lookup approach to use the inlink from the link extractor.

I will still need to build the custom parser for vCard, unless anyone has one they can share. I plan to base it on ez-vcard: https://code.google.com/p/ez-vcard/wiki/ReadingVCards#3_Differences_between_Ezvcard_and_reader_classes

Plenty to do, but I think you have me headed in the right direction, and it certainly seems better than hacking the map/reduce processing in the Nutch indexer.

Thanks again

-----Original Message-----
From: Markus Jelsma [mailto:[email protected]]
Sent: Wednesday, November 26, 2014 1:39 PM
To: [email protected]
Subject: RE: Processing Pages in Pairs

Using Solr BlockJoins would probably be the easiest these days unless you really need to process them in Nutch. If you still want to process them simultaneously you can write a custom Solr UpdateRequestProcessor plugin and build the logic there.

-----Original message-----
> From:Lewis John Mcgibbney <[email protected]>
> Sent: Wednesday 26th November 2014 0:10
> To: [email protected]
> Subject: Re: Processing Pages in Pairs
>
> Hi Iain,
>
> On Tue, Nov 25, 2014 at 2:44 PM, <[email protected]> wrote:
>
> > What would you recommend in this situation? Are there other options
> > that I am missing?
>
> I think that our good friend Markus has previously provided some
> insight into the technical implementation of a task which may be
> synonymous with what you are trying to achieve.
> http://www.mail-archive.com/user%40nutch.apache.org/msg04695.html
> Sounds pretty hands on to me, it would be difficult to keep your
> version of Nutch up-to-date with trunk if you were doing that.
> hth
> Lewis
>

