Thanks Shawn, nice! I didn't notice you can pass more params all the way to sql.
So you really do not care about the DIH incremental facility; you use it
just as a vehicle to provide

- SQL import
- transactional commit to Solr on updates

But keeping the DB and Solr in sync is externalized (I am trying to find a
simple, robust solution for this part as well).

I am researching ways to get this information from the Lucene index
itself: ask "what was the last document added?", then read the stored ID
field from it to feed a DIH query like yours. It should be an easy question
for Solr/Lucene to answer, but I really do not know a simple and fast way.

cheers,
eks

On Sat, Aug 6, 2011 at 8:32 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 8/6/2011 8:49 AM, eks dev wrote:
>>
>> I would appreciate some clarifications about DIH
>>
>> I do not have reliable timestamp, but I do have atomic sequence that
>> only grows on inserts/changes.
>
> I use DIH, but I don't use the built-in timestamp facility at all. I have
> an autoincrement field in a MySQL database that tells me what's new. Here
> are the three queries I have defined in dih-config.xml:
>
> query="
>   SELECT * FROM ${dataimporter.request.dataView}
>   WHERE (
>     (
>       did > ${dataimporter.request.minDid}
>       AND did <= ${dataimporter.request.maxDid}
>     )
>     ${dataimporter.request.extraWhere}
>   ) AND (crc32(did) % ${dataimporter.request.numShards})
>     IN (${dataimporter.request.modVal})
>   "
> deltaImportQuery="
>   SELECT * FROM ${dataimporter.request.dataView}
>   WHERE (
>     (
>       did > ${dataimporter.request.minDid}
>       AND did <= ${dataimporter.request.maxDid}
>     )
>     ${dataimporter.request.extraWhere}
>   ) AND (crc32(did) % ${dataimporter.request.numShards})
>     IN (${dataimporter.request.modVal})
>   "
> deltaQuery="SELECT 1 AS did"
>
> If you look carefully, you'll notice that query and deltaImportQuery are
> identical, and deltaQuery is just something that always returns a value. I
> keep track of did (the primary key for both dih-config and the database) in
> my build system, passing in minDid and maxDid parameters on the DIH URL to
> tell it what to index. I include more parameters to handle sharding and
> special situations. I actually use a different field (with its own unique
> MySQL index) as Solr's uniqueKey.
>
> Currently Solr does not support keeping track of arbitrary data, just the
> current timestamp ... but if you can track it outside of Solr and pass the
> appropriate parameters in with the full-import or delta-import request, you
> can do almost anything.
>
> This is on Solr 3.2, but I used a similar setup when I was running 1.4.1 as
> well.
>
> Shawn
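
For reference, the request that Shawn describes (custom parameters passed on the DIH URL and read back as ${dataimporter.request.*}) could be built roughly like this. It is only a sketch: the host, port, handler path, and view name are made up for illustration, build_dih_url is a hypothetical helper and not part of Solr, and clean=false is included on the assumption that the existing index should be kept, since full-import cleans the index by default.

```python
from urllib.parse import urlencode


def build_dih_url(base_url, command, min_did, max_did, data_view,
                  num_shards=1, mod_val=0, extra_where=""):
    """Build a DIH request URL carrying the custom parameters.

    Hypothetical helper: the parameter names mirror the
    ${dataimporter.request.*} placeholders in the quoted config.
    """
    if min_did > max_did:
        raise ValueError("min_did must not exceed max_did")
    params = {
        "command": command,         # "full-import" or "delta-import"
        "clean": "false",           # assumption: do not wipe the index first
        "commit": "true",           # commit once the import finishes
        "dataView": data_view,      # table or view to select from
        "minDid": min_did,          # exclusive lower bound (did > minDid)
        "maxDid": max_did,          # inclusive upper bound (did <= maxDid)
        "numShards": num_shards,    # divisor in crc32(did) % numShards
        "modVal": mod_val,          # remainder(s) this shard should index
        "extraWhere": extra_where,  # optional extra SQL condition
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr with the handler registered at /dataimport.
    print(build_dih_url("http://localhost:8983/solr/dataimport",
                        "delta-import", 1000, 2000, "my_view",
                        num_shards=2, mod_val=1))
```

The build system would remember the last maxDid it sent and use it as minDid on the next run, so the range did > minDid AND did <= maxDid never overlaps between runs.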