On 8/6/2011 8:49 AM, eks dev wrote:
> I would appreciate some clarifications about DIH.
> I do not have a reliable timestamp, but I do have an atomic sequence that
> only grows on inserts/changes.
I use DIH, but I don't use the built-in timestamp facility at all. I
have an autoincrement field in a MySQL database that tells me what's
new. Here are the three queries I have defined in dih-config.xml:
query="
    SELECT * FROM ${dataimporter.request.dataView}
    WHERE (
      (
        did > ${dataimporter.request.minDid}
        AND did &lt;= ${dataimporter.request.maxDid}
      )
      ${dataimporter.request.extraWhere}
    ) AND (crc32(did) % ${dataimporter.request.numShards})
      IN (${dataimporter.request.modVal})
    "
deltaImportQuery="
    SELECT * FROM ${dataimporter.request.dataView}
    WHERE (
      (
        did > ${dataimporter.request.minDid}
        AND did &lt;= ${dataimporter.request.maxDid}
      )
      ${dataimporter.request.extraWhere}
    ) AND (crc32(did) % ${dataimporter.request.numShards})
      IN (${dataimporter.request.modVal})
    "
deltaQuery="SELECT 1 AS did"
If you look carefully, you'll notice that query and deltaImportQuery are
identical, and deltaQuery is just a placeholder that always returns one
row, so a delta-import always goes on to run deltaImportQuery. I keep
track of did (the primary key for both dih-config and the
database) in my build system, passing in minDid and maxDid parameters on
the DIH URL to tell it what to index. I include more parameters to
handle sharding and special situations. I actually use a different
field (with its own unique MySQL index) as Solr's uniqueKey.
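To illustrate the sharding clause in the queries above, here is a minimal Python sketch of how a did maps to a shard number. It assumes that Python's zlib.crc32 over the decimal text of did produces the same value as MySQL's CRC32(did); both are standard CRC-32, but verify this against your own database before relying on it. The function name shard_for is hypothetical.

```python
import zlib


def shard_for(did, num_shards):
    """Return the shard number that the SQL clause would select for this did.

    Mirrors: crc32(did) % numShards
    Assumption: MySQL's CRC32() hashes the string form of its argument,
    so the decimal text of did is hashed here.
    """
    checksum = zlib.crc32(str(did).encode("ascii"))
    return checksum % num_shards


# A shard whose modVal is "0,1" would index a document only when
# shard_for(did, numShards) is 0 or 1.
if __name__ == "__main__":
    for did in (101, 102, 103):
        print(did, shard_for(did, 6))
```

Hashing did rather than using did % numShards directly spreads consecutive ids across shards even when ids arrive in bursts.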
Currently Solr does not support keeping track of arbitrary data, just
the current timestamp ... but if you can track it outside of Solr and
pass the appropriate parameters in with the full-import or delta-import
request, you can do almost anything.
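As a rough sketch of what passing those parameters looks like, the Python below builds a DIH request URL. Extra request parameters show up in dih-config.xml as ${dataimporter.request.NAME}, which is how minDid, maxDid and the rest reach the queries. The host, port, handler path, view name, and the build_dih_url helper are all assumptions for illustration; command, clean, and commit are standard DIH parameters.

```python
from urllib.parse import urlencode


def build_dih_url(base_url, command, min_did, max_did,
                  num_shards, mod_val, data_view, extra_where=""):
    """Build a DataImportHandler URL carrying externally tracked state.

    base_url is the DIH handler address (hypothetical example below).
    command is "full-import" or "delta-import".
    """
    params = {
        "command": command,
        "clean": "false",      # keep existing documents in the index
        "commit": "true",
        "dataView": data_view,
        "minDid": min_did,     # last did already indexed
        "maxDid": max_did,     # highest did to index in this run
        "numShards": num_shards,
        "modVal": mod_val,     # comma-separated shard numbers for this core
        "extraWhere": extra_where,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host, handler path, and view name.
    print(build_dih_url("http://localhost:8983/solr/dataimport",
                        "delta-import", 1000, 2000, 6, "0,1", "my_view"))
```

After a successful import, the build system would record maxDid and use it as minDid for the next run.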
This is on Solr 3.2, but I used a similar setup when I was running 1.4.1
as well.
Shawn