Hi Pat,

I implemented this as you described, and the indexing time went down (5 times faster on development). However, the delta indexing time went up (30 times slower on development). The indexing stats are below:

Index                   Total docs     Bytes Time (sec)   Index                   Total docs     Bytes Time (sec)
incident_index_1_core         7331   6531122     39.436   incident_index_6_core         7331  28239593      8.802
incident_index_1_delta           6      1128      0.184   incident_index_6_delta           6  24763425      5.234
incident_index_2_core         7319   6751189     45.477   incident_index_7_core         7319  28331726      8.819
incident_index_2_delta           5       843      0.233   incident_index_7_delta           5  24763289      5.321
incident_index_3_core         7390   6803814     42.064   incident_index_8_core         7390  28310121      7.913
incident_index_3_delta           8      2143      0.203   incident_index_8_delta           8  24764366      5.282
incident_index_4_core         7278   6377664     37.665   incident_index_9_core         7278  28162260      7.891
incident_index_4_delta           6      1108      0.436   incident_index_9_delta           6  24763330      5.456
incident_index_5_core         7396   6601358     39.704   incident_index_10_core        7396  28152075      9.562
incident_index_5_delta           6       944      0.216   incident_index_10_delta          6  24763308      5.303

Any idea why this is happening?

Thanks,
Jonathan

On Friday, July 26, 2013 at 3:57:38 PM UTC+3, Pat Allan wrote:
>
> Heya Steve
>
> Was just looking into how difficult this would be to implement properly, and noticed I have added the ability to take a string as the source query - instead of the column references. So, it's possible without hacking around in the index definition itself:
>
> https://gist.github.com/pat/6088629
>
> It's worth noting that the document id (Sphinx's equivalent of a primary key) involves the normal primary key with an offset and a multiplier. Make sure those two integers match what's in your generated index in sql_query. They may change when you add other indices to your app (depends on alphabetical order of your index files).
>
> Also: there's probably some metaprogramming you could add to simplify things a bit more.
>
> Would love to hear if this approach helps with your real app and not just the test one :)
>
> --
> Pat
>
> On 26/07/2013, at 12:14 AM, Pat Allan wrote:
>
> > Hi Steve
> >
> > I've got a way forward to greatly improve the speed of indexing… unfortunately, it's not going to work within Thinking Sphinx easily right now.
> >
> > Sphinx has the ability to gather attribute and field values from separate queries - this existed for TS v1/v2 for attributes, and fields was added in TS v3, but the catch is those separate queries don't work for HABTM joins. I'd love to change that, it's just painful from an ActiveRecord perspective because you're not dealing with a model's table as the base, but the HABTM join table.
> >
> > Here's the configuration for the relevant source that I modified by hand:
> > https://gist.github.com/pat/6080031
> >
> > You'll see that the main query is nice and short - and then there's each of the MVA and joined field definitions. If you put this in the generated source definition in config/development.sphinx.conf, and then run the indexer manually (NOT through the rake task, that'll overwrite this):
> > indexer --config config/development.sphinx.conf --all --rotate
> >
> > (Remove --rotate if Sphinx isn't running.) You'll see it's pretty damn fast.
> >
> > Now, ways forward? Well, I'd love to write something for TS v3 that can handle HABTM - it's just a shame that it might need to be pure ARel rather than ActiveRecord-built (which can otherwise help with joins).
> >
> > But otherwise: switch from HABTM to has_many/has_many :through - make each of the joins an actual model. Then, you can add :source => :query to each of the appropriate field and attribute definitions, and it should generate something pretty much the same.
> >
> > Hope this provides some clarity at the very least! And also: thanks for the test app, really helped with debugging!
> >
> > --
> > Pat
> >
> >
> > On 25/07/2013, at 2:54 PM, Steve Kenworthy wrote:
> >
> >> Hi there,
> >>
> >> Firstly, thinking-sphinx is awesome and I love it. Thanks Pat for an excellent project. V3 is looking great and represents a lot of hard work and effort.
> >>
> >> I've been using thinking-sphinx to index a document model and it's really slowed down when I add lots of associations in the index. In fact, it never finishes on my machine (8Gig RAM, 8 CPU's) when I add 4 indexes.
> >>
> >> Times:
> >>      • 4 seconds - when 1 association (images) is indexed
> >>      • 6 seconds - when 2 associations (images and subscribers) are indexed
> >>      • 23 seconds - when 2 associations (images and countries) are indexed
> >>      • 115 seconds - when 3 associations (images, subscribers and tags) are indexed
> >>      • 113 seconds - when 3 associations (images, subscribers and videos) are indexed (just to prove it's not tags slowing it down)
> >>      • ∞ (not finishing) - when 4 associations or more are selected.
> >>
> >> Here's my index file:
> >>
> >> ThinkingSphinx::Index.define :document, with: :active_record, delta: true, sql_range_step: 999999999, group_concat_max_len: 16384 do
> >>
> >>   has countries(:id), as: :country_ids
> >>   has images(:id), as: :image_ids, facet: true
> >>   has subscribers(:id), as: :subscriber_ids, facet: true
> >>   has tags(:id), as: :tag_ids, facet: true
> >>   has videos(:id), as: :video_ids, facet: true
> >>
> >>   indexes countries.name, as: :countries
> >>   indexes images.title, as: :images
> >>   indexes subscribers.title, as: :subscribers
> >>   indexes tags.name, as: :tags
> >>   indexes videos.title, as: :videos
> >>
> >>   has updated_at
> >>
> >> end
> >>
> >> The generated sql is a massive group_by query and is not finishing. See it here https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
> >>
> >> I'd really appreciate some advice on how to optimise this so indexing becomes viable again. Do I just have too much going on here? I'm using facets, indexes and attributes. Perhaps there is a better way to optimise? A friend suggested pre-computing with some joins... how would this work?
> >>
> >> Vital stats: using mysql v14.14, sphinx 2.0.4, Ubuntu, rails 3.2.13, thinking-sphinx 3.0.4
> >>
> >> For those who'd like to take a look, I've uploaded a sample project here https://github.com/crossroads/rails3-ts-example which can be cloned. If you follow the instructions, it will setup a db with test data and reproduce the problem quickly.
> >>
> >> There's also the sphinx generated SQL and EXPLAIN: https://github.com/crossroads/rails3-ts-example#what-sphinx-is-doing
> >>
> >> Thanks in advance for anyone taking the time to read.
> >>
> >> Regards,
> >> Steve
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> >> To post to this group, send email to [email protected].
> >> Visit this group at http://groups.google.com/group/thinking-sphinx.
> >> For more options, visit https://groups.google.com/groups/opt_out.
