I upgraded from Solr 3.6.2 to 4.2.1 and noticed that my DataImportHandler's
delta import is actually doing a full import.

In my data-config.xml, I have an entity named 'Lists' defined as follows:

-----------------------------------------

<entity name="Lists"
     pk="l.list_id"

     query="SELECT l.list_id AS id, l.user_id, l.name, LOWER(l.name) AS raw_name,
      l.description, l.class_id, l.category_id, l.created_on, l.modified_on,
      l.first_publish_date, l.last_publish_date,
      lbs.burial_score,
      lo.is_votable, lo.open_list_enabled,
      IF (ISNULL(rlvs.like_count), 0, rlvs.invisilike_count) AS like_count,
      ua.user_name, ua.url_friendly_user_name, ua.is_staff,
      IF(ua.permission_level = 0, 1, 0) AS is_admin,
      lt.tagged_date,
      LOWER(slm.property_values) AS source_list_property_value,
      agglvs.total_overall_view AS views
     FROM lists l
     INNER JOIN list_burial_state lbs ON lbs.list_id = l.list_id
     INNER JOIN list_options lo ON lo.list_id = l.list_id
     INNER JOIN user_account ua ON ua.user_id = l.user_id
     LEFT JOIN  list_view_stats rlvs ON rlvs.list_id = l.list_id
     LEFT JOIN  list_tag lt ON lt.list_id = l.list_id AND lt.tag_id = 244948
     LEFT JOIN  source_list_mapping slm ON slm.list_id = l.list_id
     LEFT JOIN  agg_list_view_stats agglvs ON agglvs.list_id = l.list_id
     WHERE l.status = 'ACTIVE' AND l.is_public = 1"

     deltaQuery="SELECT l.list_id FROM lists l
     WHERE l.status = 'ACTIVE' AND l.is_public = 1
     AND l.modified_on &gt; '${dih.Lists.last_index_time}'"

     deletedPkQuery="SELECT l.list_id FROM lists l
     WHERE (l.status != 'ACTIVE' OR l.is_public = 0)
     AND l.modified_on &gt; '${dih.Lists.last_index_time}'">

 ... followed by a bunch of sub-entities ...

-----------------------------------------
dataimport.properties:
-----------------------------------------
#Sat Jun 01 11:06:10 PDT 2013
last_index_time=2013-06-01 11\:05\:18
Lists.last_index_time=2013-06-01 11\:05\:18
-------------------------------------------
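
For reference, the backslashes before the colons above are ordinary Java
properties-file escaping, not part of the timestamp itself. Below is a minimal
Python sketch of how the stored line decodes. The helper name
unescape_properties_value is made up for illustration, and it only handles
simple backslash-character escapes like the ones shown, not \n, \t, or \uXXXX.

```python
def unescape_properties_value(raw: str) -> str:
    """Drop the backslash from simple escapes such as '\\:' (hypothetical helper).

    Assumption: only single-character escapes occur, as in the file above.
    Special sequences like \\n, \\t, or \\uXXXX are NOT interpreted here.
    """
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            # Keep the escaped character, skip the backslash.
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The entity-scoped line copied from dataimport.properties above.
line = r"Lists.last_index_time=2013-06-01 11\:05\:18"
key, _, raw_value = line.partition("=")
value = unescape_properties_value(raw_value)
print(key, "->", value)
```

So the timestamp stored under Lists.last_index_time is a plain
"yyyy-MM-dd HH:mm:ss" string.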

When I run the deltaQuery SQL directly against the DB, I get only 3 IDs.
However, when I run a delta import and check the status, it reports:
<str name="Total Changed Documents">353501</str>

It first deletes a lot of documents and then takes as long as a full import
to finish indexing.

What changed, and how do I get delta import to index only the documents that
were modified after ${dih.Lists.last_index_time}?

Thanks.
