Re: Problem with DIH delta-import delete.

2011-01-11 Thread Matti Oinas
The problem was an incorrect pk definition in data-config.xml.

The pk attribute needs to be the same as the Solr uniqueKey field, so in my case
changing the pk value from id to uuid solved the problem.
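As a rough illustration of why the pk value matters, here is a toy model in Python. It is not DataImportHandler code: it assumes each row returned by deletedPkQuery already carries both the raw `id` and the templated `uuid`, and that the delete key is read from the column named by pk. `delete_stale` and the sample data are hypothetical.

```python
# Toy model (NOT the DataImportHandler source): a delta-import delete step
# that takes the delete key from the column named by the entity's pk.

def delete_stale(index, deleted_rows, pk):
    """Delete index entries keyed by row[pk]; return (reported, removed).

    `index` maps uniqueKey values (here: uuid) to documents. Every delete
    attempt is counted as "deleted", whether or not anything matched.
    """
    reported = removed = 0
    for row in deleted_rows:
        key = str(row[pk])                 # value sent as the delete id
        reported += 1                      # counted even if nothing matches
        if index.pop(key, None) is not None:
            removed += 1                   # only real uniqueKey matches vanish
    return reported, removed


# Index keyed by the templated uniqueKey (uuid), e.g. "entry-787".
index = {"entry-786": {"type": 2}, "entry-787": {"type": 2}}
# Rows from deletedPkQuery, assumed to already carry the templated uuid.
rows = [{"id": 786, "uuid": "entry-786"}, {"id": 787, "uuid": "entry-787"}]

print(delete_stale(dict(index), rows, pk="id"))    # (2, 0): reported, nothing removed
print(delete_stale(dict(index), rows, pk="uuid"))  # (2, 2): both documents removed
```

With pk="id" the model reports two deletions while the index is untouched, which is the same symptom as the "Deleted 81 documents" report in this thread.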

Re: Problem with DIH delta-import delete.

2010-12-06 Thread Matti Oinas
Thanks Koji.

The problem seems to be that the TemplateTransformer is not applied when a
delete is performed.

...
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed ModifiedRowKey for Entity: entry rows obtained : 0
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder
collectDelta
INFO: Completed parentDeltaQuery for Entity: entry
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 787
Dec 7, 2010 7:19:43 AM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: 786
...

There are entries with ids 787 and 786 in the database, and those are marked
as deleted. The query returns the right number of deleted documents and the
right rows from the database, but the delete fails because Solr uses the plain
numeric id (e.g. 787) rather than the templated value (entry-787) when deleting
the document. The same happens with blogs.
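The mismatch can be sketched in Python with a hypothetical `resolve_template` helper. It only expands `${entity.column}` placeholders such as the `template="entry-${entry.id}"` field in the quoted data-config.xml (a deliberate simplification of what TemplateTransformer does) and compares the result with the plain id seen in the log.

```python
import re

# Matches placeholders such as ${entry.id}: group 1 = entity, group 2 = column.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")

def resolve_template(template, entity, row):
    """Hypothetical, simplified stand-in for DIH's TemplateTransformer."""
    def substitute(match):
        name, column = match.group(1), match.group(2)
        if name != entity:
            raise KeyError(f"placeholder for another entity: {match.group(0)}")
        return str(row[column])
    return PLACEHOLDER.sub(substitute, template)


row = {"id": 787}                                   # row marked as deleted
uuid = resolve_template("entry-${entry.id}", "entry", row)
logged_key = "787"                                  # from "Deleting document: 787"
print(uuid)                                         # entry-787
print(uuid == logged_key)                           # False: the delete cannot match
```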

Matti


2010/12/4 Koji Sekiguchi :
> (10/11/17 20:18), Matti Oinas wrote:
>>
>> Solr does not delete documents from the index, although delta-import says
>> it has deleted n documents from it. I'm using version 1.4.1.
>>
>> The schema looks like
>>
>>  
>>     > required="true" />
>>     > required="true" />
>>     
>>     
>>     
>>  
>>  uuid
>>
>>
>> Relevant fields from database tables:
>>
>> TABLE: blogs and entries both have
>>
>>   Field: id
>>    Type: int(11)
>>    Null: NO
>>     Key: PRI
>> Default: NULL
>>   Extra: auto_increment
>> 
>>   Field: modified
>>    Type: datetime
>>    Null: YES
>>     Key:
>> Default: NULL
>>   Extra:
>> 
>>   Field: status
>>    Type: tinyint(1) unsigned
>>    Null: YES
>>     Key:
>> Default: NULL
>>   Extra:
>>
>>
>> 
>> 
>>        > driver="com.mysql.jdbc.Driver".../>
>>        
>>                >                                pk="id"
>>                                query="SELECT id,description,1 as type FROM
>> blogs WHERE status=2"
>>                                deltaImportQuery="SELECT id,description,1
>> as type FROM blogs WHERE
>> status=2 AND id='${dataimporter.delta.id}'"
>>                                deltaQuery="SELECT id FROM blogs WHERE
>> '${dataimporter.last_index_time}'< modified AND status=2"
>>                                deletedPkQuery="SELECT id FROM blogs WHERE
>> '${dataimporter.last_index_time}'<= modified AND status=3"
>>                                transformer="TemplateTransformer">
>>                        > template="blog-${blog.id}" />
>>                        
>>                        
>>                        
>>                
>>                >                                pk="id"
>>                                query="SELECT f.id as
>> id,f.content,f.blog_id,2 as type FROM
>> entries f,blogs b WHERE f.blog_id=b.id AND b.status=2"
>>                                deltaImportQuery="SELECT f.id as
>> id,f.content,f.blog_id,2 as type
>> FROM entries f,blogs b WHERE f.blog_id=b.id AND
>> f.id='${dataimporter.delta.id}'"
>>                                deltaQuery="SELECT f.id as id FROM entries
>> f JOIN blogs b ON
>> b.id=f.blog_id WHERE '${dataimporter.last_index_time}'< b.modified
>> AND b.status=2"
>>                                deletedPkQuery="SELECT f.id as id FROM
>> entries f JOIN blogs b ON
>> b.id=f.blog_id WHERE b.status!=2 AND '${dataimporter.last_index_time}'
>> < b.modified"
>>
>>  transformer="HTMLStripTransformer,TemplateTransformer">
>>                        > template="entry-${entry.id}" />
>>                        
>>                        
>>                        > stripHTML="true" />
>>                        
>>                
>>        
>> 
>>
>> Full import and delta import work without problems when it comes to
>> adding new documents to the index, but when a blog is deleted (its status is
>> set to 3 in the database), the Solr report after delta import is something
>> like "Indexing completed. Added/Updated: 0 documents. Deleted 81
>> documents." The problem is that the documents can still be found in the
>> Solr index.
>>
>> 1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;
>>
>> 2. delta-import =>
>>
>> 
>> Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.
>> 
>> 2010-11-17 13:00:50
>> 2010-11-17 13:00:50
>>
>> So Solr says it has deleted the documents and that the index was also
>> optimized and committed after the operation.
>>
>> 3. Search: blog_id:26 still returns 1 document with type 1 (blog) and
>> 80 documents with type 2 (entry).
>>

Re: Problem with DIH delta-import delete.

2010-12-04 Thread Koji Sekiguchi

Hi Matti,

Can you see something like the following "Completed DeletedRowKey for Entity"
and then "Deleting document: ID-1" in your Solr log?

(sample messages from my Solr log)
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder 
collectDelta
INFO: Completed DeletedRowKey for Entity: product rows obtained : 2
  :
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.DocBuilder deleteAll
INFO: Deleting stale documents
Dec 4, 2010 8:25:40 PM org.apache.solr.handler.dataimport.SolrWriter deleteDoc
INFO: Deleting document: OVEN-2
  :

If you cannot find these messages, I think some setting is incorrect
(but I couldn't find any incorrect ones in your data-config.xml...).
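This check can also be scripted. The Python sketch below rests on assumptions: the regular expressions are written only against the sample log lines in this thread, the function name `scan_dih_log` is made up, and the `entry-`/`blog-` prefixes come from the templates in Matti's data-config.xml. It reports the DeletedRowKey counts, the ids passed to "Deleting document:", and any id that does not look like a uniqueKey value.

```python
import re

DELETED_ROWS = re.compile(r"Completed DeletedRowKey for Entity: (\S+) rows obtained : (\d+)")
DELETING_DOC = re.compile(r"Deleting document: (\S+)")

def scan_dih_log(lines, expected_prefixes):
    """Return (rows per entity, deleted ids, ids that don't look like uniqueKeys)."""
    rows_per_entity, deleted, suspicious = {}, [], []
    for line in lines:
        m = DELETED_ROWS.search(line)
        if m:
            rows_per_entity[m.group(1)] = int(m.group(2))
            continue
        m = DELETING_DOC.search(line)
        if m:
            key = m.group(1)
            deleted.append(key)
            if not key.startswith(tuple(expected_prefixes)):
                suspicious.append(key)  # e.g. a raw numeric pk instead of a uuid
    return rows_per_entity, deleted, suspicious


log = """\
INFO: Completed DeletedRowKey for Entity: entry rows obtained : 1223
INFO: Deleting stale documents
INFO: Deleting document: 787
INFO: Deleting document: 786
""".splitlines()

print(scan_dih_log(log, ["entry-", "blog-"]))
# ({'entry': 1223}, ['787', '786'], ['787', '786'])
```

An empty third list means every deleted id at least has the shape of a uniqueKey value; it does not prove the documents exist in the index.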

Koji
--
http://www.rondhuit.com/en/


Problem with DIH delta-import delete.

2010-11-17 Thread Matti Oinas
Solr does not delete documents from the index, although delta-import says
it has deleted n documents from it. I'm using version 1.4.1.

The schema looks like this:

 





 
 uuid


Relevant fields from database tables:

TABLE: blogs and entries both have

  Field: id
   Type: int(11)
   Null: NO
    Key: PRI
Default: NULL
  Extra: auto_increment

  Field: modified
   Type: datetime
   Null: YES
    Key:
Default: NULL
  Extra:

  Field: status
   Type: tinyint(1) unsigned
   Null: YES
    Key:
Default: NULL
  Extra:

Full import and delta import work without problems when it comes to
adding new documents to the index, but when a blog is deleted (its status is
set to 3 in the database), the Solr report after delta import is something
like "Indexing completed. Added/Updated: 0 documents. Deleted 81
documents." The problem is that the documents can still be found in the
Solr index.

1. UPDATE blogs SET modified=NOW(),status=3 WHERE id=26;

2. delta-import =>


Indexing completed. Added/Updated: 0 documents. Deleted 81 documents.

2010-11-17 13:00:50
2010-11-17 13:00:50

So Solr says it has deleted the documents and that the index was also
optimized and committed after the operation.

3. Search: blog_id:26 still returns 1 document with type 1 (blog) and
80 documents with type 2 (entry).
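The reproduction steps can be turned into a small check: compare the deleted count in the DIH status message with how many documents still match afterwards. A minimal Python sketch, assuming a Solr instance at a base URL you supply (a hypothetical `http://localhost:8983/solr`, for example) and using the standard `/select` handler with `rows=0` and `wt=json` so that only `numFound` is needed. The helper names are made up.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

STATUS_RE = re.compile(r"Added/Updated: (\d+) documents\. Deleted (\d+) documents\.")

def parse_status(message):
    """Return (added, deleted) from a DIH 'Indexing completed' status message."""
    m = STATUS_RE.search(message)
    if m is None:
        raise ValueError("not a DIH completion message: %r" % message)
    return int(m.group(1)), int(m.group(2))

def num_found(body):
    """Extract numFound from a Solr JSON (wt=json) select response body."""
    return json.loads(body)["response"]["numFound"]

def remaining(solr_base, query):
    """Ask Solr how many documents still match `query` (rows=0: count only)."""
    params = urlencode({"q": query, "rows": 0, "wt": "json"})
    with urlopen(f"{solr_base}/select?{params}") as resp:
        return num_found(resp.read().decode("utf-8"))
```

With the status message from step 2, `parse_status` returns `(0, 81)`. If `remaining(base_url, "blog_id:26")` is still non-zero after a delta-import that reported deletions, that points at delete ids that did not match the uniqueKey values, as turned out to be the case in this thread.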