Hi Jimmy
Thanks a lot for your help.
We can confirm that the delayed jobs are being added to the table and
processed by the job runner, but we think we have now isolated the
problem: it comes down to how the sphinx_document_id is calculated.
The document_id was different on the application servers from the db
server, for the reasons below.
The document_id is calculated as follows:
def sphinx_document_id
  primary_key_for_sphinx * ThinkingSphinx.indexed_models.size +
    ThinkingSphinx.indexed_models.index(self.class.source_of_sphinx_index.name)
end
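To make the arithmetic concrete, here's a minimal sketch in plain Ruby (the model names are hypothetical; a real app gets the list from ThinkingSphinx.indexed_models) of how this scheme interleaves records from multiple models:

```ruby
# Hypothetical stand-in for ThinkingSphinx.indexed_models.
INDEXED_MODELS = %w[Category Product User].freeze

# Each model's records land in a distinct residue class modulo the
# number of indexed models, so document ids never collide across models.
def sphinx_document_id(primary_key, model_name)
  primary_key * INDEXED_MODELS.size + INDEXED_MODELS.index(model_name)
end

sphinx_document_id(1, "Product")  # => 4  (1 * 3 + 1)
sphinx_document_id(1, "User")     # => 5  (1 * 3 + 2)
```

The key point is that both the multiplier (the model count) and the offset (the model's position in the array) must be identical on every machine, or the same record gets different ids.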
With regard to ThinkingSphinx.indexed_models.size:
db server:   ThinkingSphinx.indexed_models.size => 14
app servers: ThinkingSphinx.indexed_models.size => 15
This was because we had commented out a model's 'define_index' block
on the db server (our fubar).
(I believe this dependency in the document_id does mean that you
cannot easily add a new model to your index without re-indexing the
entire index, otherwise document ids will become corrupt; although,
to be honest, I cannot think of another way of doing it.)
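To illustrate the corruption risk with a quick sketch (plain Ruby, hypothetical model names): adding a model changes both the multiplier and, potentially, the offsets, so document ids baked into an existing index no longer match what runtime computes:

```ruby
before = %w[Category Product]         # 2 indexed models at build time
after  = %w[Category Product User]    # a model is added later

# Document id for a Product record with primary key 5, under each scheme:
5 * before.size + before.sort.index("Product")  # => 11
5 * after.size  + after.sort.index("Product")   # => 16
```

Until a full re-index runs, the index still holds id 11 while the application computes 16 for the same record.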
However, there was also another problem:
db server:
  ThinkingSphinx.indexed_models.index(self.class.source_of_sphinx_index.name) => 2
app servers:
  ThinkingSphinx.indexed_models.index(self.class.source_of_sphinx_index.name) => 10
It seems the order of the indexed models array is assumed to be the
same everywhere, but for some reason it differs between machines. To
ensure it is always the same, we added a sort to the
sphinx_document_id method:
ThinkingSphinx.indexed_models.sort.index(self.class.source_of_sphinx_index.name)
This fixed the problem, so all servers now return the same
sphinx_document_id at runtime.
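For anyone wanting to see why the sort helps, here is a sketch in plain Ruby (hypothetical model lists; in reality the order can vary with load order, autoload paths, etc.):

```ruby
# Two machines that happened to load the same models in different orders.
dbserver_models  = %w[Product Category User]
appserver_models = %w[User Product Category]

dbserver_models.index("Product")        # => 0
appserver_models.index("Product")       # => 1  (disagrees!)

# Sorting first makes the offset deterministic on every machine.
dbserver_models.sort.index("Product")   # => 1
appserver_models.sort.index("Product")  # => 1  (now consistent)
```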
However, there is now another problem: although each server returns
the same document_id at runtime, the document_id calculated when the
index was built was different from the one calculated afterwards!
In the configuration the document id is coded into the SQL as "`id` *
index_model.size + index". In our case the index baked into the
configuration file was 10, BUT the one obtained at runtime was now 4.
So we hunted down another use of indexed_models.index, in
configuration.rb:146, and added a sort there too (maybe this
calculation should be abstracted into a helper method).
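As a sketch of the helper-method idea (the module name is hypothetical; a real patch would live inside Thinking Sphinx so both the runtime method and configuration.rb call the same code):

```ruby
# Hypothetical helper centralising the calculation so that index build
# time and runtime can never disagree about a record's document id.
module SphinxDocumentIdHelper
  def self.model_offset(model_name, indexed_models)
    indexed_models.sort.index(model_name)
  end

  def self.document_id(primary_key, model_name, indexed_models)
    primary_key * indexed_models.size +
      model_offset(model_name, indexed_models)
  end
end

SphinxDocumentIdHelper.document_id(7, "Product", %w[User Product Category])
# => 22  (7 * 3 + 1, since "Product" sorts to offset 1)
```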
Thanks for your help, it got us looking along the right track.
Hope this may be of help to others too.
Thanks
Joe
On 17 Aug 2009, at 02:30, James Healy wrote:
> I think I can shed some light on what might be happening, although I
> don't have a clear solution.
>
> When your product instance is updated and added to the Delta index, it
> remains in the core index. The core index has a sphinx_deleted
> attribute
> that TS uses to identify core index records that should be ignored.
> Once
> a record is in the delta index, the sphinx_deleted attribute in the
> matching core index record needs to be toggled on.
>
> If this fails, your product will be returned if a search matches it in
> the core OR delta index.
>
> For delayed deltas, the sphinx_deleted toggle is handled by an
> additional delayed job, defined here:
>
> lib/thinking_sphinx/deltas/delayed_delta/flag_as_deleted_job.rb
>
> Can you verify that the task is being added to the queue and executing
> correctly?
>
> -- James Healy <jimmy-at-deefa-dot-com> Mon, 17 Aug 2009 11:19:59
> +1000
>
> joesimms wrote:
>>
>> We have an issue running delta indexes and then filtering with
>> attributes.
>>
>> I present our configuration below:
>>
>> class Product
>> define_index do
>> # State can be 'in_stock', 'out_of_stock'
>> has "CRC32(products.state)", :type => :integer, :as => :state
>> end
>> end
>>
>> We have a multi application server deployment and are using delayed
>> deltas.
>> Our database server, searchd and job runner all run on the same
>> machine (dbserver).
>> Our application servers DO NOT run searchd and do not have access to
>> the index files on the dbserver. We assume that as the delta indexing
>> is done via the job runner directly on the dbserver this is not
>> required.
>>
>> Our production.rb contains:
>> ThinkingSphinx.remote_sphinx = true
>>
>> and sphinx.yml points the address to the dbserver where searchd is
>> running:
>> production:
>> address: dbserver
>>
>> The problem we experience is the following:
>>
>> Product.search('', :with => {:state => Zlib.crc32('in_stock')})
>> returns, let's say, records with ids 1, 2, 3.
>>
>> Then we update the state of 1
>>
>> Product.find(1).update_attribute(:state, 'out_of_stock')
>>
>> Once the job runner has run (next to no delay) we can run the
>> following and id:1 is returned, so the delta has worked correctly,
>> these results are returned from all app servers:
>>
>> Product.search('', :with => {:state => Zlib.crc32('out_of_stock')})
>>
>> BUT
>>
>> The record is also still present in the following resultset
>> Product.search('', :with => {:state => Zlib.crc32('in_stock')})
>>
>> This problem only exists in our production configuration outlined
>> above; when we perform these tests on a single machine in dev there
>> are no problems.
>>
>> Any help would be much appreciated.
>>
>> Many thanks
>>
>> Joe
>>
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Thinking Sphinx" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---