Thanks Shawn, nice! I hadn't noticed that you can pass extra request
parameters all the way through to the SQL.

So you really do not care about the DIH incremental facility; you use it
just as a vehicle to provide:
- SQL import
- transactional commit to Solr on updates...

But keeping the DB and Solr in sync is externalized (I am trying to find
a simple, robust solution for this part as well...).
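
To check that I understand the externalized part, here is a rough sketch
of what I imagine the build-system side looks like. The function name,
host, and core name are made up by me; the /dataimport handler path is
only the usual default and may be registered differently in
solrconfig.xml. The minDid, maxDid, numShards and modVal parameters are
the ones your queries read via ${dataimporter.request.*}.

```python
from urllib.parse import urlencode


def build_dih_url(base_url, min_did, max_did, command="delta-import", **extra):
    """Build a DIH request URL that carries the ID window as request parameters.

    base_url is the core URL (hypothetical here); extra holds any further
    request parameters the SQL expects, e.g. numShards or modVal.
    """
    params = {
        "command": command,   # "full-import" or "delta-import"
        "commit": "true",     # ask DIH to commit once the import finishes
        "minDid": min_did,    # exclusive lower bound (did > minDid)
        "maxDid": max_did,    # inclusive upper bound (did <= maxDid)
    }
    params.update(extra)
    return base_url.rstrip("/") + "/dataimport?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core URL and values, for illustration only.
    print(build_dih_url("http://localhost:8983/solr/core0", 100, 200,
                        numShards=6, modVal="0,1"))
```

The caller would then remember maxDid after a successful import and use
it as the next minDid.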

I am researching ways to get this information from the Lucene index
itself ("what was the last document added?") and then read the stored
ID field from it to feed a DIH query like yours.

This should be an easy question for Solr/Lucene to answer, but I really
do not know a simple and fast way...
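
One candidate I am looking at, as a rough sketch only: ask Solr for a
single document sorted by the sequence field in descending order. This
returns the highest indexed value rather than literally "the last
document added", which is the same thing only because my sequence never
decreases. It assumes the field (called did here, as in your config) is
indexed, stored, single-valued and sortable, and it only sees committed
documents. The function names and the select URL are made up.

```python
import json
from urllib.parse import urlencode


def max_id_query_url(select_url, id_field="did"):
    """Build a select URL that returns only the document with the highest ID."""
    params = {
        "q": "*:*",                  # match every document
        "sort": id_field + " desc",  # highest ID first
        "rows": 1,                   # only the top document is needed
        "fl": id_field,              # return just the ID field
        "wt": "json",                # JSON response writer
    }
    return select_url + "?" + urlencode(params)


def parse_max_id(response_text, id_field="did"):
    """Extract the ID from the JSON response; None if the index is empty."""
    docs = json.loads(response_text)["response"]["docs"]
    return docs[0][id_field] if docs else None


if __name__ == "__main__":
    # Hypothetical select handler URL, for illustration only.
    print(max_id_query_url("http://localhost:8983/solr/core0/select"))
```

The value from parse_max_id could then be passed as minDid on the next
DIH request.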


cheers,
eks


On Sat, Aug 6, 2011 at 8:32 PM, Shawn Heisey <s...@elyograg.org> wrote:
> On 8/6/2011 8:49 AM, eks dev wrote:
>>
>> I would appreciate some clarifications about DIH
>>
>> I do not have reliable timestamp, but I do have atomic sequence that
>> only grows on inserts/changes.
>
> I use DIH, but I don't use the built-in timestamp facility at all.  I have
> an autoincrement field in a MySQL database that tells me what's new.  Here
> are the three queries I have defined in dih-config.xml:
>
>      query="
>        SELECT * FROM ${dataimporter.request.dataView}
>        WHERE (
>          (
>            did &gt; ${dataimporter.request.minDid}
>            AND did &lt;= ${dataimporter.request.maxDid}
>          )
>          ${dataimporter.request.extraWhere}
>        ) AND (crc32(did) % ${dataimporter.request.numShards})
>          IN (${dataimporter.request.modVal})
>        "
>      deltaImportQuery="
>        SELECT * FROM ${dataimporter.request.dataView}
>        WHERE (
>          (
>            did &gt; ${dataimporter.request.minDid}
>            AND did &lt;= ${dataimporter.request.maxDid}
>          )
>          ${dataimporter.request.extraWhere}
>        ) AND (crc32(did) % ${dataimporter.request.numShards})
>          IN (${dataimporter.request.modVal})
>        "
>      deltaQuery="SELECT 1 AS did"
>
> If you look carefully, you'll notice that query and deltaImportQuery are
> identical, and deltaQuery is just something that always returns a value.  I
> keep track of did (the primary key for both dih-config and the database) in
> my build system, passing in minDid and maxDid parameters on the DIH URL to
> tell it what to index.  I include more parameters to handle sharding and
> special situations.  I actually use a different field (with its own unique
> MySQL index) as Solr's uniqueKey.
>
> Currently Solr does not support keeping track of arbitrary data, just the
> current timestamp ... but if you can track it outside of Solr and pass the
> appropriate parameters in with the full-import or delta-import request, you
> can do almost anything.
>
> This is on Solr 3.2, but I used a similar setup when I was running 1.4.1 as
> well.
>
> Shawn
>
>
