Thanks for the debugging work Joe - indexed_models definitely needs to  
be rethought (and lazily-loaded, instead of on the environment  
initialisation), so it's good to know that sorting is pretty critical.  
Will keep that in mind when I can find some time to work on it.

Cheers

-- 
Pat

On 17/08/2009, at 1:20 PM, Joe Simms wrote:

>
> Hi Jimmy
>
> Thanks a lot for your help.
>
> We can confirm that the delayed jobs are being added to the table and
> being processed by the job runner, but I think we have now isolated
> the problem and it's to do with calculating the sphinx_document_id.
>
> The problem is that the document_id was different on the application
> servers than on the dbserver, and this was due to the following:
>
> The document_id is calculated as follows:
>
>     def sphinx_document_id
>       primary_key_for_sphinx * ThinkingSphinx.indexed_models.size +
>         ThinkingSphinx.indexed_models.index(self.class.source_of_sphinx_index.name)
>     end
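To make the arithmetic concrete, here is a minimal standalone sketch of the scheme in sphinx_document_id: primary keys from all indexed models are interleaved by multiplying by the number of indexed models and adding the model's position in that list. The helper names encode_document_id and decode_document_id are hypothetical and are not part of Thinking Sphinx; the example numbers are illustrative only.

```ruby
# Hypothetical helpers mirroring the formula used by sphinx_document_id:
#   document_id = primary_key * model_count + model_offset
# where model_offset is the model's position in the indexed models list.
def encode_document_id(primary_key, model_offset, model_count)
  primary_key * model_count + model_offset
end

# Reverses the encoding: integer division recovers the primary key and the
# remainder recovers the model offset.
def decode_document_id(document_id, model_count)
  document_id.divmod(model_count)
end

# With 15 indexed models, row 1 of the model at offset 10 becomes 25.
doc_id = encode_document_id(1, 10, 15)
puts doc_id                       # => 25
p decode_document_id(doc_id, 15)  # => [1, 10]
```

The scheme only works if every process that encodes or decodes an id agrees on both model_count and the offset of each model.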
>
> With regards to indexed_models.size:
>
> dbserver:             ThinkingSphinx.indexed_models.size => 14
> app servers:          ThinkingSphinx.indexed_models.size => 15
>
> This was because we had commented out a model's 'define_index' block
> on the dbserver (our fubar).
> (Although I believe this dependency in the document_id means you
> cannot easily add a new model to your index without re-indexing the
> entire index, otherwise document ids become corrupt; to be honest, I
> cannot think of another way of doing it.)
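A small sketch of why a change in the number of indexed models corrupts existing ids, using the sizes (14 vs 15) and offsets (2 vs 10) reported in this thread. The document_id helper is hypothetical and only reproduces the formula quoted above.

```ruby
# Hypothetical helper reproducing the quoted formula.
def document_id(primary_key, model_offset, model_count)
  primary_key * model_count + model_offset
end

# The same row (primary key 1) on each machine:
before = document_id(1, 2, 14)   # dbserver: 14 models, offset 2
after  = document_id(1, 10, 15)  # app servers: 15 models, offset 10
puts "dbserver id: #{before}, app server id: #{after}"  # 16 vs 25

# Worse, an id computed on one machine decodes to something else on the
# other: 25 read with 14 models is primary key 1 at offset 11, i.e. a
# different model entirely.
p 25.divmod(14)  # => [1, 11]
```

This is why the core index has to be rebuilt whenever the set of indexed models changes: ids already stored in the index were computed with the old model count.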
>
> However, there was also another problem:
>
> dbserver:
>     ThinkingSphinx.indexed_models.index(self.class.source_of_sphinx_index.name) => 2
> app servers:
>     ThinkingSphinx.indexed_models.index(self.class.source_of_sphinx_index.name) => 10
>
> It seems this is because the order of the indexed_models array is
> assumed to be the same everywhere, but for some reason it differs
> between machines. To make sure it is always the same, we added a sort
> to the sphinx_document_id method:
>
>     ThinkingSphinx.indexed_models.sort.index(self.class.source_of_sphinx_index.name)
>
> And this fixed the problem, so now all servers return the same
> sphinx_document_id at runtime.
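A toy illustration of why adding the sort helps. The thread doesn't establish why the load order differs between machines, and the model names below are hypothetical; the point is only that a position in an unsorted list is not stable, while a position in a sorted list is, provided every machine indexes the same set of models.

```ruby
# Hypothetical model lists, loaded in a different order on each machine.
dbserver_models  = %w[Product Category Brand]
appserver_models = %w[Brand Product Category]

puts dbserver_models.index('Product')   # => 0
puts appserver_models.index('Product')  # => 1

# Sorting first gives every machine the same offset for the same model.
puts dbserver_models.sort.index('Product')   # => 2
puts appserver_models.sort.index('Product')  # => 2
```

Note that sorting does not fix a difference in the *set* of models (the 14 vs 15 problem); it only removes the dependency on load order.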
>
> However, there is now another problem: although each server returns
> the same document_id, the document_id calculated when creating the
> index was different from the one calculated after the index was
> built!
>
> In the configuration, the document id is written into the SQL as
> "`id` * indexed_models.size + index". In our case the index in the
> configuration file was 10, but the one obtained at runtime was now 4,
> so we hunted down another use of indexed_models.index in
> configuration.rb:146 and added a sort there too (maybe this
> calculation should be abstracted into a helper method).
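Following Joe's suggestion of abstracting the calculation into a helper, here is a sketch of what a single shared implementation could look like, so the SQL written into the configuration and the runtime document id cannot disagree about ordering. DocumentIdScheme and its methods are hypothetical names, not existing Thinking Sphinx code, and the SQL string only mimics the shape described in the thread.

```ruby
# Hypothetical shared helper: one place computes a model's offset, used both
# when generating the configuration SQL and when computing ids at runtime.
module DocumentIdScheme
  module_function

  # Offset of a model within the *sorted* list of indexed model names.
  def offset_for(model_name, indexed_models)
    offset = indexed_models.sort.index(model_name)
    raise ArgumentError, "#{model_name} is not indexed" if offset.nil?
    offset
  end

  # SQL fragment of the form "`id` * <model count> + <offset>".
  def sql_for(model_name, indexed_models, column = '`id`')
    "#{column} * #{indexed_models.size} + #{offset_for(model_name, indexed_models)}"
  end

  # Runtime equivalent for a single record.
  def document_id_for(primary_key, model_name, indexed_models)
    primary_key * indexed_models.size + offset_for(model_name, indexed_models)
  end
end

models = %w[Product Category Brand]  # hypothetical model names
puts DocumentIdScheme.sql_for('Product', models)             # => `id` * 3 + 2
puts DocumentIdScheme.document_id_for(1, 'Product', models)  # => 5
```

Raising on an unknown model is a design choice for the sketch: a silent nil offset would otherwise surface later as a confusing TypeError in the arithmetic.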
>
> Thanks for your help, it got us looking along the right track.
>
> Hope this may be of help to others too.
>
> Thanks
>
> Joe
>
>
> On 17 Aug 2009, at 02:30, James Healy wrote:
>
>> I think I can shed some light on what might be happening, although I
>> don't have a clear solution.
>>
>> When your product instance is updated and added to the delta index, it
>> remains in the core index. The core index has a sphinx_deleted
>> attribute that TS uses to identify core index records that should be
>> ignored. Once a record is in the delta index, the sphinx_deleted
>> attribute in the matching core index record needs to be toggled on.
>>
>> If this fails, your product will be returned if a search matches it in
>> the core OR delta index.
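A simplified in-memory model of the core/delta behaviour James describes. This is not Sphinx itself: Doc and search are hypothetical stand-ins, and the document ids are made up. It shows how a stale core copy keeps matching an old attribute value until sphinx_deleted is set.

```ruby
# Simplified stand-in for indexed documents (not real Sphinx structures).
Doc = Struct.new(:document_id, :state, :sphinx_deleted)

core  = [Doc.new(25, 'in_stock', 0), Doc.new(40, 'in_stock', 0)]
delta = [Doc.new(25, 'out_of_stock', 0)]  # document 25 was updated

# Core documents flagged as deleted are skipped; the delta is searched as-is.
def search(core, delta, state)
  (core.reject { |d| d.sphinx_deleted == 1 } + delta)
    .select { |d| d.state == state }
    .map(&:document_id)
end

p search(core, delta, 'in_stock')  # => [25, 40] (stale core copy still matches)

# What the flag-as-deleted step is meant to do for the updated record:
core.find { |d| d.document_id == 25 }.sphinx_deleted = 1
p search(core, delta, 'in_stock')  # => [40]
```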
>>
>> For delayed deltas, the sphinx_deleted toggle is handled by an
>> additional delayed job, defined here:
>>
>> lib/thinking_sphinx/deltas/delayed_delta/flag_as_deleted_job.rb
>>
>> Can you verify that the task is being added to the queue and executing
>> correctly?
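A hypothetical sketch (plain Ruby, not the actual job class in flag_as_deleted_job.rb) of why the flagging step depends on the document id: it targets the core copy by id, so if the id computed at runtime differs from the one the indexer stored, the flag lands nowhere and the old record stays visible. The numbers reuse the offsets reported later in the thread (10 in the configuration, 4 at runtime) with an assumed model count of 15.

```ruby
# Hypothetical core index attribute table: document_id => sphinx_deleted.
core_flags = { 25 => 0, 40 => 0 }

# Sets the flag if the document exists; in this sketch a missing id is a
# silent miss that returns false.
def flag_as_deleted(core_flags, document_id)
  return false unless core_flags.key?(document_id)
  core_flags[document_id] = 1
  true
end

indexed_id = 1 * 15 + 10  # id the indexer stored (offset 10)
runtime_id = 1 * 15 + 4   # id computed at runtime (offset 4)

p flag_as_deleted(core_flags, runtime_id)  # => false, core copy stays visible
p flag_as_deleted(core_flags, indexed_id)  # => true
```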
>>
>> -- James Healy <jimmy-at-deefa-dot-com>  Mon, 17 Aug 2009 11:19:59 +1000
>>
>> joesimms wrote:
>>>
>>> We have an issue running delta indexes and then filtering with
>>> attributes.
>>>
>>> I present our configuration below:
>>>
>>> class Product < ActiveRecord::Base
>>>   define_index do
>>>     # State can be 'in_stock' or 'out_of_stock'
>>>     has "CRC32(products.state)", :type => :integer, :as => :state
>>>   end
>>> end
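Since the state attribute is indexed as CRC32(products.state), search filters have to pass the same checksum, which is what the Zlib.crc32 calls in Joe's searches do. A small sketch, assuming Ruby's Zlib.crc32 (the standard CRC-32, returned as an unsigned 32-bit integer) matches MySQL's CRC32() for the same ASCII string; STATE_CRCS and state_filter are hypothetical names.

```ruby
require 'zlib'

# Precompute the checksum for each known state so filters can't be misspelled.
STATES = %w[in_stock out_of_stock].freeze
STATE_CRCS = STATES.map { |s| [s, Zlib.crc32(s)] }.to_h.freeze

# Builds the :with hash for a state filter; unknown states fail loudly.
def state_filter(state)
  raise ArgumentError, "unknown state #{state}" unless STATE_CRCS.key?(state)
  { :state => STATE_CRCS[state] }
end

p state_filter('in_stock')
# e.g. Product.search('', :with => state_filter('out_of_stock'))
```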
>>>
>>> We have a multi-application-server deployment and are using delayed
>>> deltas.
>>> Our database server, searchd and job runner all run on the same
>>> machine (dbserver).
>>> Our application servers DO NOT run searchd and do not have access to
>>> the index files on the dbserver. We assume this is not required, as
>>> delta indexing is done by the job runner directly on the dbserver.
>>>
>>> Our production.rb contains:
>>> ThinkingSphinx.remote_sphinx = true
>>>
>>> and sphinx.yml points the address to the dbserver where searchd is
>>> running:
>>>
>>> production:
>>>   address: dbserver
>>>
>>> The problem we experience is the following:
>>>
>>> Product.search('', :with => {:state => Zlib.crc32('in_stock')})
>>> returns, let's say, records with ids 1, 2 and 3.
>>>
>>> Then we update the state of 1
>>>
>>> Product.find(1).update_attribute(:state, 'out_of_stock')
>>>
>>> Once the job runner has run (next to no delay), we can run the
>>> following and id 1 is returned, so the delta has worked correctly;
>>> these results are returned from all app servers:
>>>
>>> Product.search('', :with => {:state => Zlib.crc32('out_of_stock')})
>>>
>>> BUT
>>>
>>> The record is also still present in the following result set:
>>> Product.search('', :with => {:state => Zlib.crc32('in_stock')})
>>>
>>> This problem only exists in our production configuration outlined
>>> above; when we perform these tests on a single machine in development,
>>> there are no problems.
>>>
>>> Any help would be much appreciated.
>>>
>>> Many thanks
>>>
>>> Joe
>>>


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en
-~----------~----~----~----~------~----~------~--~---
