Hi Pedro Not that it's particularly helpful, but this is definitely not how Sphinx behaves. It definitely should not take an hour to index 15 documents. It shouldn't even take a minute, unless the query is particularly complex - and this isn't what I'd call complex!
The reason for adding the group_by call, is that PostgreSQL is strict with the SQL it expects - anything that isn't within an aggregation function should be mentioned in the GROUP BY clause. Thinking Sphinx will do this automatically for any fields it references to - but it can't look into SQL snippets to figure out which columns are used, hence why you need to manually add it yourself, because of your CRC'd state attribute. The repeated data is definitely odd. Thinking Sphinx concatenates data from columns in associations, but the query shouldn't duplicate data (unless the record is attached to many associated records that look the same - but that's not a SQL problem, that's a data problem). When you limit the query to one restaurant, can you confirm it's joining only on the menu categories and menu items you're expecting? (Add the id columns from both tables to the start of your query, and to the GROUP BY clause). -- Pat On 03/03/2011, at 6:13 AM, Pedro Cunha wrote: > Hello Pat, > > Have been investigating last days and I'm almost at the same place. > > I'm using PostgreSQL. > > As far as I read we need to trust how postgres analyses queries and he > can choose or not to use indexes. > > Here http://explain.depesz.com/s/sA9 is a bit (only a bit..) more > detailed the EXPLAIN ANALYSE. > > I've been digging through the query that TS generates, and for the 10 > entities (restaurants) i'm trying to index, 8500~ rows are created. > > What is the reason for when including all "has x" to use "group by"? > > Also I noticed something weird, imagine I pick one of the lines of the > output, and now pick a column, is it supposed to see lots of > information repeated? like: > > ".. Negócios Negócios Negócios Negócios Negócios Negócios Sair a dois > Negócios Negócios Negócios Negócios Negócios Negócios Negócios Sair a > dois Negócios Negócios Negócios Sair a dois Negócios Sair a dois > Descontrair Sair a dois Sair a dois Sair a dois Sair a dois Sair a > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a > dois Sair a dois Sair a dois ..." > > Production database is at the moment with 15 restaurants.. and it's > taking 1h to index > > As DB grows, time is not incremental but exponential and this is > worrying me a lot. Also because delta indexing is taking around 15~ > mins. > > Is it supposed that 1:N associations decay so badly? > > I'm running out of ideas and considering breaking references for this > case to save one inner join to happen. > > Restaurant (1 : N ) Menu Categories > Menu Categories ( 1 : N ) Menu Items <====== Shortcut this one > > > Best regards, > Pedro > > On Feb 24, 3:59 am, Pat Allan <[email protected]> wrote: >> Hi Pedro >> >> Wow, PostgreSQL's EXPLAIN output is pretty complex! >> >> I'm not spotting anything obviously odd, but I'm not sure what to look for. >> >> Perhaps you try it all again, but comment out all the fields and attributes >> except those from menu items? Indexing should be a bit faster, but I'd >> imagine there's still a speed hit to some extent. I'm not sure if that'll >> shine any light on the situation though... >> >> Is this all on your development environment? If it's production or staging, >> are you sure that the database indexes on the foreign keys exist? (ie: >> latest migrations have been run) >> >> -- >> Pat >> >> On 24/02/2011, at 11:47 AM, Pedro Cunha wrote: >> >> >> >> >> >> >> >>> Hello Pat, >> >>> Here is the link: >>> https://gist.github.com/841520 >> >>> With: >>> - define_index block >>> - sql query on conf >>> - explain >>> - explain analyse >> >>> Thanks in advance >> >>> On Feb 23, 11:51 pm, Pat Allan <[email protected]> wrote: >>>> Hi Pedro >> >>>> Perhaps post the EXPLAIN and EXPLAIN ANALYSE outputs to a gist? To be >>>> honest, given I've not used the PostgreSQL equivalents, I'm not sure how >>>> much sense I'll be able to make out of it, but let's start with that. >> >>>> Also, can you add the SQL query to the gist as well? >> >>>> Cheers >> >>>> -- >>>> Pat >> >>>> On 24/02/2011, at 6:07 AM, Pedro Cunha wrote: >> >>>>> Hey Pat, I'm using PostgresSQL, and command is the same. >> >>>>> I tried looking at the output and it kinda looks like chinese to me. >>>>> From what I see, I can see no reference to indexes whatsoever. That >>>>> can be a problem. According to Postgres docs I must do an ANALYZE >>>>> <table_name> to make pg server aware of an index on that specific >>>>> table. >> >>>>> After I ve done that and run the EXPLAIN and EXPLAIN ANALYSE I still >>>>> can't see any reference to indexes. That might be a problem. >> >>>>> Do you want me to copy paste here the whole log or is there some way I >>>>> can attach that file to this thread? (it's a bit lengthy) >> >>>>> Best regards >> >>>>> On Feb 23, 12:31 am, Pat Allan <[email protected]> wrote: >>>>>> Hi Pedro >> >>>>>> Given you've got the database indexes, this definitely feels far too >>>>>> slow. Are you using MySQL or PostgreSQL? >> >>>>>> If you're using MySQL, then I would recommend taking the generated >>>>>> sql_query value from the restaurant_core_0 source in your sphinx >>>>>> configuration file, remove the $start and $end parts of the WHERE >>>>>> clause, and add EXPLAIN to the start of the query. Then, run this query >>>>>> in a MySQL console: >> >>>>>> EXPLAIN SELECT ... >> >>>>>> This should give you some information on what database indexes are being >>>>>> used for the query. >> >>>>>> I'm not sure what the equivalent of the EXPLAIN command for PostgreSQL >>>>>> is, but I'd be surprised if such a thing didn't exist. >> >>>>>> Cheers >> >>>>>> -- >>>>>> Pat >> >>>>>> On 23/02/2011, at 1:32 AM, Pedro Cunha wrote: >> >>>>>>> Hello there, >> >>>>>>> I'm having a weird slow indexing. Tried looking for similar issues on >>>>>>> this group and couldn't find. >> >>>>>>> When I index an has_many through association, perfomance decays a lot. >>>>>>> To help trace this issue I ll give as much info as I can. >> >>>>>>> So the association looks like this: >> >>>>>>> Restaurant (1 : N ) Menu Categories >>>>>>> Menu Categories ( 1 : N ) Menu Items >> >>>>>>> Index on table menu_categories for restaurant_id (DB level) >>>>>>> Index on table menu_items for menu_category_id (DB level) >> >>>>>>> define_index on restaurant >>>>>>> ... >>>>>>> indexes menu_categories.menu_items(:name), :as => :menu_item_name >>>>>>> indexes menu_categories.menu_items(:subtitle), :as >>>>>>> => :menu_item_subtitle >>>>>>> indexes menu_categories.menu_items(:description), :as >>>>>>> => :menu_item_description >>>>>>> ... >>>>>>> end >> >>>>>>> Without the indexes before, and with a lot of other stuff I get this >>>>>>> perfomance: >> >>>>>>> indexing index 'restaurant_core'... >>>>>>> collected 12 docs, 3.0 MB >>>>>>> collected 122936 attr values >>>>>>> sorted 0.2 Mvalues, 100.0% done >>>>>>> sorted 2.2 Mhits, 100.0% done >>>>>>> total 12 docs, 2973437 bytes >>>>>>> total 16.741 sec, 177611 bytes/sec, 0.71 docs/sec >>>>>>> distributed index 'restaurant' can not be directly indexed; skipping. >> >>>>>>> With the indexes >> >>>>>>> indexing index 'restaurant_core'... >>>>>>> collected 12 docs, 15.9 MB >>>>>>> collected 423920 attr values >>>>>>> sorted 0.8 Mvalues, 100.0% done >>>>>>> sorted 11.4 Mhits, 100.0% done >>>>>>> total 12 docs, 15885288 bytes >>>>>>> total 320.783 sec, 49520 bytes/sec, 0.03 docs/sec >>>>>>> distributed index 'restaurant' can not be directly indexed; skipping. >> >>>>>>> table ids are pretty small so I think that issue is not causing >>>>>>> problems.. still I tried adding the sql step config and performance >>>>>>> stays the same >> >>>>>>> Properties on define_index are: >> >>>>>>> set_property :min_prefix_len => 3 >>>>>>> set_property :stopwords => "#{RAILS_ROOT}/config/sphinx/ >>>>>>> stoplist.txt" >>>>>>> set_property :delta => ThinkingSphinx::Deltas::ResqueDelta >> >>>>>>> Tried commenting min_prefix_len and perfomance keeps the same.. >> >>>>>>> Running out of ideas :( >> >>>>>>> Best regards, >>>>>>> Pedro >> >>>>>>> -- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "Thinking Sphinx" group. >>>>>>> To post to this group, send email to [email protected]. >>>>>>> To unsubscribe from this group, send email to >>>>>>> [email protected]. >>>>>>> For more options, visit this group >>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. >> >>>>> -- >>>>> You received this message because you are subscribed to the Google Groups >>>>> "Thinking Sphinx" group. >>>>> To post to this group, send email to [email protected]. >>>>> To unsubscribe from this group, send email to >>>>> [email protected]. >>>>> For more options, visit this group >>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en. >> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "Thinking Sphinx" group. >>> To post to this group, send email to [email protected]. >>> To unsubscribe from this group, send email to >>> [email protected]. >>> For more options, visit this group >>> athttp://groups.google.com/group/thinking-sphinx?hl=en. > > -- > You received this message because you are subscribed to the Google Groups > "Thinking Sphinx" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/thinking-sphinx?hl=en. > -- You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
