On 8/6/2011 8:49 AM, eks dev wrote:
> I would appreciate some clarifications about DIH
>
> I do not have reliable timestamp, but I do have atomic sequence that
> only grows on inserts/changes.

I use DIH, but I don't use the built-in timestamp facility at all. I have an autoincrement field in a MySQL database that tells me what's new. Here are the three queries I have defined in dih-config.xml:

      query="
        SELECT * FROM ${dataimporter.request.dataView}
        WHERE (
          (
            did > ${dataimporter.request.minDid}
            AND did &lt;= ${dataimporter.request.maxDid}
          )
          ${dataimporter.request.extraWhere}
        ) AND (crc32(did) % ${dataimporter.request.numShards})
          IN (${dataimporter.request.modVal})
        "
      deltaImportQuery="
        SELECT * FROM ${dataimporter.request.dataView}
        WHERE (
          (
            did > ${dataimporter.request.minDid}
            AND did &lt;= ${dataimporter.request.maxDid}
          )
          ${dataimporter.request.extraWhere}
        ) AND (crc32(did) % ${dataimporter.request.numShards})
          IN (${dataimporter.request.modVal})
        "
      deltaQuery="SELECT 1 AS did"

If you look carefully, you'll notice that query and deltaImportQuery are identical, and deltaQuery is just something that always returns a value. I keep track of did (the primary key for both dih-config and the database) in my build system, passing in minDid and maxDid parameters on the DIH URL to tell it what to index. I include more parameters to handle sharding and special situations. I actually use a different field (with its own unique MySQL index) as Solr's uniqueKey.
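
To make the parameter passing concrete, here is a minimal Python sketch of how a build system might assemble such a DIH request URL. The parameter names (dataView, minDid, maxDid, numShards, modVal, extraWhere) come from the queries above. The helper name build_dih_url, the host, the core name, the /dataimport handler path and the sample values are all assumptions for illustration, not part of the setup described here.

```python
from urllib.parse import urlencode


def build_dih_url(base_url, command, min_did, max_did,
                  data_view, num_shards, mod_val, extra_where=""):
    """Build a DIH request URL (hypothetical helper).

    Each custom key becomes available inside dih-config.xml as
    ${dataimporter.request.<key>}.
    """
    params = {
        "command": command,        # "full-import" or "delta-import"
        "dataView": data_view,     # table or view to select from
        "minDid": min_did,         # exclusive lower bound (did > minDid)
        "maxDid": max_did,         # inclusive upper bound (did <= maxDid)
        "numShards": num_shards,   # modulus used in crc32(did) % numShards
        "modVal": mod_val,         # comma-separated remainders for this shard
        "extraWhere": extra_where, # optional extra SQL, e.g. "AND deleted = 0"
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed handler location; adjust to wherever DIH is registered.
    print(build_dih_url(
        "http://localhost:8983/solr/core0/dataimport",
        "delta-import", 1000, 2000, "my_view", 6, "0,1"))
```

urlencode takes care of escaping values such as the comma in modVal or any spaces in extraWhere, which matters because those values are substituted straight into the SQL.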

Currently Solr does not support keeping track of arbitrary data, just the current timestamp ... but if you can track it outside of Solr and pass the appropriate parameters in with the full-import or delta-import request, you can do almost anything.
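
As an illustration of tracking the sequence outside of Solr, the sketch below keeps the last indexed did in a small JSON file and works out the next (minDid, maxDid) range. The file format and the function names next_did_range and record_success are hypothetical. It assumes the caller has already fetched the current maximum did from the database (for example with a SELECT MAX(did) query) and only records success after the import has finished cleanly.

```python
import json
from pathlib import Path


def next_did_range(state_path, current_max_did):
    """Return (min_did, max_did) for the next import, or None if nothing is new.

    min_did is the last did already indexed, matching the exclusive
    lower bound (did > minDid) used in the queries.
    """
    path = Path(state_path)
    last = json.loads(path.read_text())["lastDid"] if path.exists() else 0
    if current_max_did <= last:
        return None
    return last, current_max_did


def record_success(state_path, max_did):
    """Persist the highest did that has been indexed successfully."""
    Path(state_path).write_text(json.dumps({"lastDid": max_did}))
```

The returned pair can be passed as minDid and maxDid on the import request. Writing the state only after a successful import means a failed run is simply retried over the same range.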

This is on Solr 3.2, but I used a similar setup when I was running 1.4.1 as well.

Shawn
