Hi there,

I've run into some scaling problems in the way the indexer handles
long lists of multi-valued attributes. In the worst case I have items
with over 25,000 attributes attached. Indexing these through a left
join with group_concat took a long time and put considerable load on
the database.

Reading up on the Sphinx documentation, I found that multi-valued
attributes can also be indexed through a separate query that simply
retrieves all the <document, attribute> pairs. A quick test showed
that this speeds up indexing tremendously.
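
To make the difference concrete, here is a rough sketch of the two
declaration styles as they would appear in a Sphinx source, rendered
from a small Ruby helper. The helper name mva_declaration and the
attribute and table names are made up for illustration and are not
part of thinking-sphinx; only the sql_attr_multi syntax itself comes
from the Sphinx documentation.

```ruby
# Hypothetical helper (not part of thinking-sphinx): renders a Sphinx
# sql_attr_multi line for one multi-valued attribute.
#   :field  values come from a column in the main query,
#           e.g. the result of a GROUP_CONCAT over a left join
#   :query  values come from a separate <document, attribute> pair query
def mva_declaration(name, source, query = nil)
  case source
  when :field
    "sql_attr_multi = uint #{name} from field"
  when :query
    raise ArgumentError, "a pair query is required" if query.nil?
    "sql_attr_multi = uint #{name} from query; #{query}"
  else
    raise ArgumentError, "unknown source: #{source.inspect}"
  end
end

# Illustrative names only.
puts mva_declaration("tag_ids", :field)
puts mva_declaration("tag_ids", :query, "SELECT item_id, tag_id FROM taggings")
```

With the query form the main select no longer needs the join and
group_concat for that attribute, which is where the load came from in
my case.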

This feature isn't supported by thinking-sphinx, so I took a stab at
it in my fork: http://github.com/menno/thinking-sphinx/commits/mva

It has been tested in production for my use case, which is along the
lines of Item.has_many :tags, :through => :taggings. For that case it
can run "select item_id, tag_id from taggings" to get all the pairs.
There are specs and code for other has-many associations, but those,
and other cases, haven't been thoroughly tested.

Another point of concern is that I needed access to the unique-id
expression used in the select query, so that the ids in the pair query
match up with the document ids. I've moved this logic to
ThinkingSphinx.unique_id_expression(offset), but I still had to pass
the offset around a lot more than I'd like.
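
Roughly, the idea looks like the sketch below. This is a simplified,
standalone version and not the code from the fork: it assumes the
document id is computed as "record id * number of indexed models +
offset", it takes the model count as an explicit argument so it runs
on its own, and mva_pair_query is a hypothetical name.

```ruby
# Assumption: a Sphinx document id is derived from the record id as
#   id * model_count + offset
# so that ids from different models do not collide in one setup.
# Returns the SQL fragment to append to an id column.
def unique_id_expression(model_count, offset)
  "* #{model_count} + #{offset}"
end

# Hypothetical helper: builds the <document, attribute> pair query for
# a multi-valued attribute, applying the same id expression as the main
# select so both queries agree on the document ids.
def mva_pair_query(table, foreign_key, value_column, model_count, offset)
  doc_id = "#{foreign_key} #{unique_id_expression(model_count, offset)}"
  "SELECT #{doc_id} AS id, #{value_column} FROM #{table}"
end

puts mva_pair_query("taggings", "item_id", "tag_id", 3, 1)
# prints: SELECT item_id * 3 + 1 AS id, tag_id FROM taggings
```

The offset has to reach every place that builds such a query, which is
why it ends up being passed around so much.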

I hope this is of use to someone. Feel free to comment on the
implementation and tests, as this is my first encounter with the
internals of thinking-sphinx, cucumber and rspec ;)

Cheers,

Menno van der Sman
