I would appreciate some clarification about DIH (DataImportHandler).

I do not have a reliable timestamp, but I do have an atomic sequence that
only grows on inserts/changes.
You can think of it as a timestamp in some funky time zone unrelated
to wall-clock time; it is an integer type.

Does DIH keep track of MAX(committed timestamp), or does it expect the
timestamp in the DB to be wall-clock time?
If it expects a wall-clock timestamp, casting the integer sequence value
to a timestamp at read time (e.g. as a number of seconds since some
fixed point in time) would not work...
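
To make the concern concrete, here is a small Java sketch. It assumes (a guess, not confirmed DIH behaviour) that the delta query compares rows against the wall-clock time of the previous import. The class name, the sequence value 1500, and the import date are all made up for illustration. Read as seconds since the epoch, a small sequence value lands in 1970, so it can never look newer than a recent import and the delta would select nothing.

```java
import java.time.Instant;

// Hypothetical illustration: what happens if an integer sequence is cast to
// epoch seconds and compared against a wall-clock "last import" time.
class SequenceAsTimestamp {
    // True if a row whose sequence is read as epoch seconds would be
    // considered newer than the last import time.
    static boolean wouldBePickedUp(long sequence, Instant lastImport) {
        Instant castValue = Instant.ofEpochSecond(sequence);
        return castValue.isAfter(lastImport);
    }

    public static void main(String[] args) {
        Instant lastImport = Instant.parse("2011-06-01T00:00:00Z"); // made-up wall-clock time
        long freshSequence = 1_500L; // a freshly inserted row (made-up value)
        System.out.println(Instant.ofEpochSecond(freshSequence));
        System.out.println(wouldBePickedUp(freshSequence, lastImport)); // false
    }
}
```

The mapping from sequence to time is arbitrary, so whether a row "qualifies" says nothing about whether it was actually inserted after the last import.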

Ideally, for my case, DIH would keep MAX(whatever_field_specified)...
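
A minimal Java sketch of "keep MAX(whatever_field_specified)" might look like the one below. Everything here is hypothetical: the class `MaxFieldTracker`, rows modelled as `Map<String, Long>`, and the field name `my_sequence` (taken from the query further down). It only shows the bookkeeping an importer would need: remember the largest value of one configured field across the rows of a run.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: track the maximum value of a configured field
// across all rows imported in one run.
class MaxFieldTracker {
    private final String field;
    private long max;

    // startValue is the max remembered from the previous run (or 0).
    MaxFieldTracker(String field, long startValue) {
        this.field = field;
        this.max = startValue;
    }

    // Called once per imported row; keeps the largest value seen so far.
    void observe(Map<String, Long> row) {
        Long value = row.get(field);
        if (value != null && value > max) {
            max = value;
        }
    }

    long max() {
        return max;
    }

    public static void main(String[] args) {
        MaxFieldTracker tracker = new MaxFieldTracker("my_sequence", 100L);
        List<Map<String, Long>> rows = List.of(
                Map.of("id", 1L, "my_sequence", 104L),
                Map.of("id", 2L, "my_sequence", 101L),
                Map.of("id", 3L, "my_sequence", 107L));
        for (Map<String, Long> row : rows) {
            tracker.observe(row);
        }
        System.out.println(tracker.max()); // 107
    }
}
```

Because the value comes from the data itself, it does not depend on clocks at all, which is the point of the request.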

One idea would be to modify DIH so that it passes max(of the field
specified in the DIH config) to Lucene's
IndexWriter.commit(Map<String,String> commitUserData).

On the next import, just read it back with IndexReader.getCommitUserData()
and pass the value to SQL as ${last.committed.sequence}.
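
The proposed round trip could be sketched as below. This is an assumption-laden illustration and does not touch Lucene: a plain `HashMap` stands in for the `Map<String,String>` that would be handed to `IndexWriter.commit(Map)` and later returned by `IndexReader.getCommitUserData()`. The key `last.committed.sequence` mirrors the placeholder above, and the class and method names are hypothetical. Parsing the stored value as a `long` before substituting it keeps anything but digits out of the SQL.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed round trip. A plain Map stands in for the
// commit user data; no Lucene classes are used here.
class CommitUserDataRoundTrip {
    static final String KEY = "last.committed.sequence"; // hypothetical key name

    // At commit time: record the max sequence imported in this run.
    static Map<String, String> buildCommitUserData(long maxSequence) {
        Map<String, String> userData = new HashMap<>();
        userData.put(KEY, Long.toString(maxSequence));
        return userData;
    }

    // At the next import: read the value back (0 if nothing was committed yet)
    // and substitute it into the delta query template.
    static String resolveDeltaQuery(String template, Map<String, String> userData) {
        long last = Long.parseLong(userData.getOrDefault(KEY, "0"));
        return template.replace("${" + KEY + "}", Long.toString(last));
    }

    public static void main(String[] args) {
        String template =
                "select * from source_table where my_sequence > ${last.committed.sequence}";
        Map<String, String> stored = buildCommitUserData(107L);
        System.out.println(resolveDeltaQuery(template, stored));
        // prints: select * from source_table where my_sequence > 107
    }
}
```

Since the value would live inside the index commit itself, any replica holding that commit would also hold the starting point for the next delta. That is what makes the failover property below plausible.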

This would have the charming property, in a master/slave setup, of
continuing to work after a master failover without touching anything...
every slave could take over at any time.

My second question also concerns delta queries. I know there are
no deletes/modifications in my DataSource, only additions.
Can I prevent DIH from trying to resolve deletes? My delta is fully
qualified by:
select * from source_table
where my_sequence > ${last.committed.sequence}
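
Under the stated assumption that the source is append-only, the delta above is the complete change set. Every returned row is a brand-new document, so in principle nothing has to be looked up or deleted. The in-memory Java sketch below illustrates only that reasoning. `AppendOnlyDelta`, `Row`, and the list standing in for the index are hypothetical; this is not how DIH is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory simulation of an append-only delta import.
class AppendOnlyDelta {
    static final class Row {
        final long id;
        final long mySequence;

        Row(long id, long mySequence) {
            this.id = id;
            this.mySequence = mySequence;
        }
    }

    // Same filter as the delta query: my_sequence > last.
    static List<Row> delta(List<Row> sourceTable, long last) {
        List<Row> result = new ArrayList<>();
        for (Row row : sourceTable) {
            if (row.mySequence > last) {
                result.add(row);
            }
        }
        return result;
    }

    // Append-only assumption: each delta row is a new document, so it is
    // appended directly; no existing IDs are looked up, no deletes resolved.
    static void addAll(List<Long> indexedIds, List<Row> deltaRows) {
        for (Row row : deltaRows) {
            indexedIds.add(row.id);
        }
    }

    public static void main(String[] args) {
        List<Row> source = List.of(new Row(1, 101), new Row(2, 104), new Row(3, 107));
        List<Long> index = new ArrayList<>(List.of(1L)); // row 1 was imported earlier
        addAll(index, delta(source, 101));
        System.out.println(index); // [1, 2, 3]
    }
}
```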

I imagine this step takes a lot of time, since it has to look up every
document ID in the index?

Thanks in advance,
eks
