Re: Problem with DIH delta-import delete.
The problem was an incorrect pk definition in data-config.xml. The pk attribute needs to be the same as the Solr uniqueKey field, so in my case changing the pk value from id to uuid solved the problem.

2010/12/7 Matti Oinas:
> Thanks Koji.
>
> Problem seems to be that template transformer is not used when delete
> is performed.
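The pk/uniqueKey mismatch described in this message can be illustrated with a toy model. This is only a sketch, not DIH source code: it assumes the row coming out of deletedPkQuery carries both the numeric `id` and a templated column, and that the delete key is read from whichever column `pk` names. The column name `uuid` and the helper names are assumptions made for illustration.

```python
# Toy model of picking a delete key from a deletedPkQuery row.
# NOT DIH source code; it only illustrates the pk/uniqueKey mismatch.

def apply_template(row, entity_name="entry", column="uuid"):
    """Mimic a template such as "entry-${entry.id}".
    The column name "uuid" is an assumption (hypothetical here)."""
    out = dict(row)
    out[column] = f"{entity_name}-{row['id']}"
    return out

def delete_keys(rows, pk):
    """Return the value stored under the configured pk column for each row."""
    return [str(row[pk]) for row in rows]

def delete_from_index(index, keys):
    """Delete by uniqueKey; a key that matches nothing is treated as a
    silent no-op (assumption, consistent with the report in this thread)."""
    removed = 0
    for key in keys:
        if index.pop(key, None) is not None:
            removed += 1
    return removed

# The index is keyed by the templated uniqueKey, not the numeric database id.
index = {"entry-786": {"type": 2}, "entry-787": {"type": 2}}
deleted_rows = [apply_template({"id": 786}), apply_template({"id": 787})]

wrong = delete_keys(deleted_rows, pk="id")    # plain numeric ids
right = delete_keys(deleted_rows, pk="uuid")  # templated ids

removed_wrong = delete_from_index(dict(index), wrong)
removed_right = delete_from_index(dict(index), right)
print(wrong, removed_wrong)
print(right, removed_right)
```

With `pk="id"` the model issues deletes for keys that do not exist in the index, which matches the symptom of a delete count being reported while the documents stay searchable.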
Re: Problem with DIH delta-import delete.
Thanks Koji.

The problem seems to be that the template transformer is not used when a delete is performed.

...
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed parentDeltaQuery for Entity: entry
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 787
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 786
...

There are entries with id 787 and 786 in the database, and those are marked as deleted. The query returns the right number of deleted documents and the right rows from the database, but the delete fails because Solr is using the plain numeric id when deleting the document. The same happens with blogs.

Matti

2010/12/4 Koji Sekiguchi:
> (10/11/17 20:18), Matti Oinas wrote:
>> Solr does not delete documents from index although delta-import says
>> it has deleted n documents from index. I'm using version 1.4.1.
>>
>> The schema looks like
>>
>> required="true" />
>> required="true" />
>>
>> uuid
>>
>> Relevant fields from database tables:
>>
>> TABLE: blogs and entries both have
>>
>> Field: id
>> Type: int(11)
>> Null: NO
>> Key: PRI
>> Default: NULL
>> Extra: auto_increment
>>
>> Field: modified
>> Type: datetime
>> Null: YES
>> Key:
>> Default: NULL
>> Extra:
>>
>> Field: status
>> Type: tinyint(1) unsigned
>> Null: YES
>> Key:
>> Default: NULL
>> Extra:
>>
>> driver="com.mysql.jdbc.Driver".../>
>>
>> pk="id"
>> query="SELECT id,description,1 as type FROM blogs WHERE status=2"
>> deltaImportQuery="SELECT id,description,1 as type FROM blogs WHERE status=2 AND id='${dataimporter.delta.id}'"
>> deltaQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'< modified AND status=2"
>> deletedPkQuery="SELECT id FROM blogs WHERE '${dataimporter.last_index_time}'<= modified AND status=3"
>> transformer="TemplateTransformer">
>> template="blog-${blog.id}" />
>>
>> pk="id"
>> query="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>> deltaImportQuery="SELECT f.id as id,f.content,f.blog_id,2 as type FROM entries f,blogs b WHERE f.blog_id=b.id AND f.id='${dataimporter.delta.id}'"
>> deltaQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE '${dataimporter.last_index_time}'< b.modified AND b.status=2"
>> deletedPkQuery="SELECT f.id as id FROM entries f JOIN blogs b ON b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}' < b.modified"
>> transformer="HTMLStripTransformer,TemplateTransformer">
>> template="entry-${entry.id}" />
>> stripHTML="true" />
>>
>> Full import and delta import works without problems when it comes to
>> adding new documents to the index but when blog is deleted (status is
>> set to 3 in database), solr report after delta import is something
>> like "Indexing completed. Added/Updated: 0 documents. Deleted 81
>> documents.". The problem is that documents are still found from solr
>> index.
>>
>> 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
>>
>> 2. delta-import =>
>>
>> Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
>>
>> 2010-11-17 13:00:50
>> 2010-11-17 13:00:50
>>
>> So solr says it has deleted documents and that index is also optimized
>> and committed after the operation.
>>
>> 3. Search; blog_id:26 still returns 1 document with type 1 (blog) and
>> 80 documents with type 2 (entry).
>
> Hi Matti,
>
> Can you see something like the following "Completed DeletedRowKey for
> Entity" and then "Deleting document: ID-1" in your solr log?
>
> (sample messages from my Solr log)
> Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport
Re: Problem with DIH delta-import delete.
(10/11/17 20:18), Matti Oinas wrote:
> Solr does not delete documents from index although delta-import says
> it has deleted n documents from index. I'm using version 1.4.1.

Hi Matti,

Can you see something like the following "Completed DeletedRowKey for Entity" and then "Deleting document: ID-1" in your Solr log?

(sample messages from my Solr log)
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
:
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: OVEN-2
:

If you cannot find these messages, I think some setting is incorrect (but I couldn't find anything incorrect in your data-config.xml...).

Koji
--
http://www.rondhuit.com/en/
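The log check suggested in this message could be scripted roughly as follows. This is a minimal sketch: the two patterns are copied from the log lines quoted in this thread, while the function name and the "all digits" heuristic for spotting untemplated ids are assumptions made for illustration.

```python
import re

# Patterns taken from the log lines quoted in this thread.
DELETED_ROWS = re.compile(
    r"Completed DeletedRowKey for Entity: (\S+) rows obtained : (\d+)"
)
DELETING_DOC = re.compile(r"Deleting document: (\S+)")

def summarize_deletes(log_text):
    """Return ({entity: rows_obtained}, [ids passed to deleteDoc])."""
    rows = {m.group(1): int(m.group(2)) for m in DELETED_ROWS.finditer(log_text)}
    ids = DELETING_DOC.findall(log_text)
    return rows, ids

sample = """\
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
INFO: Deleting stale documents
INFO: Deleting document: 787
INFO: Deleting document: 786
"""

rows, ids = summarize_deletes(sample)
# Heuristic (assumption): if the uniqueKey is templated, e.g. "entry-787",
# a purely numeric delete id suggests the wrong column is used as the key.
suspicious = [doc_id for doc_id in ids if doc_id.isdigit()]
print(rows, ids, suspicious)
```

Comparing the ids in the "Deleting document" lines against the uniqueKey values actually stored in the index shows quickly whether the deletes can match anything.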
Problem with DIH delta-import delete.
Solr does not delete documents from the index although delta-import says it has deleted n documents from the index. I'm using version 1.4.1.

The schema looks like

uuid

Relevant fields from database tables:

TABLE: blogs and entries both have

Field: id
Type: int(11)
Null: NO
Key: PRI
Default: NULL
Extra: auto_increment

Field: modified
Type: datetime
Null: YES
Key:
Default: NULL
Extra:

Field: status
Type: tinyint(1) unsigned
Null: YES
Key:
Default: NULL
Extra:

Full import and delta import work without problems when it comes to adding new documents to the index, but when a blog is deleted (status is set to 3 in the database), the Solr report after delta import is something like "Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.". The problem is that the documents are still found in the Solr index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

2. delta-import =>

Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.

2010-11-17 13:00:50
2010-11-17 13:00:50

So Solr says it has deleted the documents and that the index was also optimized and committed after the operation.

3. Search; blog_id:26 still returns 1 document with type 1 (blog) and 80 documents with type 2 (entry).
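Steps 2 and 3 above can be scripted for repeated testing. The sketch below only builds the request URLs and reads the hit count from an already-parsed JSON response. The base URL, the `/dataimport` handler path, and the function names are assumptions; the handler path depends on how the request handler is registered in solrconfig.xml.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumption: adjust to your installation

def delta_import_url(base=SOLR, handler="/dataimport"):
    """URL that triggers a delta import (handler path is an assumption)."""
    return f"{base}{handler}?{urlencode({'command': 'delta-import'})}"

def leftover_query_url(blog_id, base=SOLR):
    """URL for the search from step 3; rows=0 because only the count matters."""
    params = {"q": f"blog_id:{blog_id}", "rows": 0, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"

def leftovers(response):
    """Number of matching documents in a parsed Solr JSON response."""
    return response["response"]["numFound"]

print(delta_import_url())
print(leftover_query_url(26))
```

After a successful delete, `leftovers` should report 0 for the deleted blog; in the situation described above it would still report 81.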