Hi Pedro

Looks like there's something seriously wrong in the generated SQL statement. If 
possible (and let's do this off-list), can I get access to the app's code so I 
can reproduce the problem locally, and see if I can fix it?

-- 
Pat

On 05/03/2011, at 9:55 AM, Pedro Cunha wrote:

> Pat could you take a look at: http://dl.dropbox.com/u/1175879/output.zip
> 
> As you can see, this is an output for just 1 restaurant and it's
> totally weird :\
> 
> Best Regards,
> Pedro
> 
> On Mar 3, 6:57 am, Pat Allan <[email protected]> wrote:
>> Hi Pedro
>> 
>> Not that it's particularly helpful, but this is definitely not how Sphinx 
>> behaves. It definitely should not take an hour to index 15 documents. It 
>> shouldn't even take a minute, unless the query is particularly complex - and 
>> this isn't what I'd call complex!
>> 
>> The reason for adding the group_by call, is that PostgreSQL is strict with 
>> the SQL it expects - anything that isn't within an aggregation function 
>> should be mentioned in the GROUP BY clause.
>> 
>> Thinking Sphinx will do this automatically for any fields it references to - 
>> but it can't look into SQL snippets to figure out which columns are used, 
>> hence why you need to manually add it yourself, because of your CRC'd state 
>> attribute.
>> 
>> The repeated data is definitely odd. Thinking Sphinx concatenates data from 
>> columns in associations, but the query shouldn't duplicate data (unless the 
>> record is attached to many associated records that look the same - but 
>> that's not a SQL problem, that's a data problem).
>> 
>> When you limit the query to one restaurant, can you confirm it's joining 
>> only on the menu categories and menu items you're expecting? (Add the id 
>> columns from both tables to the start of your query, and to the GROUP BY 
>> clause).
>> 
>> --
>> Pat
>> 
>> On 03/03/2011, at 6:13 AM, Pedro Cunha wrote:
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> Hello Pat,
>> 
>>> Have been investigating last days and I'm almost at the same place.
>> 
>>> I'm using PostgreSQL.
>> 
>>> As far as I read we need to trust how postgres analyses queries and he
>>> can choose or not to use indexes.
>> 
>>> Herehttp://explain.depesz.com/s/sA9is a bit (only a bit..) more
>>> detailed the EXPLAIN ANALYSE.
>> 
>>> I've been digging through the query that TS generates, and for the 10
>>> entities (restaurants) i'm trying to index, 8500~ rows are created.
>> 
>>> What is the reason for when including all "has x" to use "group by"?
>> 
>>> Also I noticed something weird, imagine I pick one of the lines of the
>>> output, and now pick a column, is it supposed to see lots of
>>> information repeated?  like:
>> 
>>> ".. Negócios Negócios Negócios Negócios Negócios Negócios Sair a dois
>>> Negócios Negócios Negócios Negócios Negócios Negócios Negócios Sair a
>>> dois Negócios Negócios Negócios Sair a dois Negócios Sair a dois
>>> Descontrair Sair a dois Sair a dois Sair a dois Sair a dois Sair a
>>> dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair
>>> a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois
>>> Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a
>>> dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair
>>> a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois
>>> Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a
>>> dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair
>>> a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois
>>> Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a
>>> dois Sair a dois Sair a dois ..."
>> 
>>> Production database is at the moment with 15 restaurants.. and it's
>>> taking 1h to index
>> 
>>> As DB grows, time is not incremental but exponential and this is
>>> worrying me a lot. Also because delta indexing is taking around 15~
>>> mins.
>> 
>>> Is it supposed that 1:N associations decay so badly?
>> 
>>> I'm running out of ideas and considering breaking references for this
>>> case to save one inner join to happen.
>> 
>>> Restaurant (1 : N ) Menu Categories
>>> Menu Categories ( 1 : N ) Menu Items  <====== Shortcut this one
>> 
>>> Best regards,
>>> Pedro
>> 
>>> On Feb 24, 3:59 am, Pat Allan <[email protected]> wrote:
>>>> Hi Pedro
>> 
>>>> Wow, PostgreSQL's EXPLAIN output is pretty complex!
>> 
>>>> I'm not spotting anything obviously odd, but I'm not sure what to look for.
>> 
>>>> Perhaps you try it all again, but comment out all the fields and 
>>>> attributes except those from menu items? Indexing should be a bit faster, 
>>>> but I'd imagine there's still a speed hit to some extent. I'm not sure if 
>>>> that'll shine any light on the situation though...
>> 
>>>> Is this all on your development environment? If it's production or 
>>>> staging, are you sure that the database indexes on the foreign keys exist? 
>>>> (ie: latest migrations have been run)
>> 
>>>> --
>>>> Pat
>> 
>>>> On 24/02/2011, at 11:47 AM, Pedro Cunha wrote:
>> 
>>>>> Hello Pat,
>> 
>>>>> Here is the link:
>>>>> https://gist.github.com/841520
>> 
>>>>> With:
>>>>> - define_index block
>>>>> - sql query on conf
>>>>> - explain
>>>>> - explain analyse
>> 
>>>>> Thanks in advance
>> 
>>>>> On Feb 23, 11:51 pm, Pat Allan <[email protected]> wrote:
>>>>>> Hi Pedro
>> 
>>>>>> Perhaps post the EXPLAIN and EXPLAIN ANALYSE outputs to a gist? To be 
>>>>>> honest, given I've not used the PostgreSQL equivalents, I'm not sure how 
>>>>>> much sense I'll be able to make out of it, but let's start with that.
>> 
>>>>>> Also, can you add the SQL query to the gist as well?
>> 
>>>>>> Cheers
>> 
>>>>>> --
>>>>>> Pat
>> 
>>>>>> On 24/02/2011, at 6:07 AM, Pedro Cunha wrote:
>> 
>>>>>>> Hey Pat, I'm using PostgresSQL, and command is the same.
>> 
>>>>>>> I tried looking at the output and it kinda looks like chinese to me.
>>>>>>> From what I see, I can see no reference to indexes whatsoever. That
>>>>>>> can be a problem. According to Postgres docs I must do an ANALYZE
>>>>>>> <table_name>  to make pg server aware of an index on that specific
>>>>>>> table.
>> 
>>>>>>> After I ve done that and run the EXPLAIN and EXPLAIN ANALYSE I still
>>>>>>> can't see any reference to indexes. That might be a problem.
>> 
>>>>>>> Do you want me to copy paste here the whole log or is there some way I
>>>>>>> can attach that file to this thread? (it's a bit lengthy)
>> 
>>>>>>> Best regards
>> 
>>>>>>> On Feb 23, 12:31 am, Pat Allan <[email protected]> wrote:
>>>>>>>> Hi Pedro
>> 
>>>>>>>> Given you've got the database indexes, this definitely feels far too 
>>>>>>>> slow. Are you using MySQL or PostgreSQL?
>> 
>>>>>>>> If you're using MySQL, then I would recommend taking the generated 
>>>>>>>> sql_query value from the restaurant_core_0 source in your sphinx 
>>>>>>>> configuration file, remove the $start and $end parts of the WHERE 
>>>>>>>> clause, and add EXPLAIN to the start of the query. Then, run this 
>>>>>>>> query in a MySQL console:
>> 
>>>>>>>>   EXPLAIN SELECT ...
>> 
>>>>>>>> This should give you some information on what database indexes are 
>>>>>>>> being used for the query.
>> 
>>>>>>>> I'm not sure what the equivalent of the EXPLAIN command for PostgreSQL 
>>>>>>>> is, but I'd be surprised if such a thing didn't exist.
>> 
>>>>>>>> Cheers
>> 
>>>>>>>> --
>>>>>>>> Pat
>> 
>>>>>>>> On 23/02/2011, at 1:32 AM, Pedro Cunha wrote:
>> 
>>>>>>>>> Hello there,
>> 
>>>>>>>>> I'm having a weird slow indexing. Tried looking for similar issues on
>>>>>>>>> this group and couldn't find.
>> 
>>>>>>>>> When I index an has_many through association, perfomance decays a lot.
>>>>>>>>> To help trace this issue I ll give as much info as I can.
>> 
>>>>>>>>> So the association looks like this:
>> 
>>>>>>>>> Restaurant (1 : N ) Menu Categories
>>>>>>>>> Menu Categories ( 1 : N ) Menu Items
>> 
>>>>>>>>> Index on table menu_categories for restaurant_id (DB level)
>>>>>>>>> Index on table menu_items for menu_category_id (DB level)
>> 
>>>>>>>>> define_index on restaurant
>>>>>>>>> ...
>>>>>>>>>    indexes menu_categories.menu_items(:name), :as => :menu_item_name
>>>>>>>>>    indexes menu_categories.menu_items(:subtitle), :as
>>>>>>>>> => :menu_item_subtitle
>>>>>>>>>    indexes menu_categories.menu_items(:description), :as
>>>>>>>>> => :menu_item_description
>>>>>>>>> ...
>>>>>>>>> end
>> 
>>>>>>>>> Without the indexes before, and with a lot of other stuff I get this
>>>>>>>>> perfomance:
>> 
>>>>>>>>> indexing index 'restaurant_core'...
>>>>>>>>> collected 12 docs, 3.0 MB
>>>>>>>>> collected 122936 attr values
>>>>>>>>> sorted 0.2 Mvalues, 100.0% done
>>>>>>>>> sorted 2.2 Mhits, 100.0% done
>>>>>>>>> total 12 docs, 2973437 bytes
>>>>>>>>> total 16.741 sec, 177611 bytes/sec, 0.71 docs/sec
>>>>>>>>> distributed index 'restaurant' can not be directly indexed; skipping.
>> 
>>>>>>>>> With the indexes
>> 
>>>>>>>>> indexing index 'restaurant_core'...
>>>>>>>>> collected 12 docs, 15.9 MB
>>>>>>>>> collected 423920 attr values
>>>>>>>>> sorted 0.8 Mvalues, 100.0% done
>>>>>>>>> sorted 11.4 Mhits, 100.0% done
>>>>>>>>> total 12 docs, 15885288 bytes
>>>>>>>>> total 320.783 sec, 49520 bytes/sec, 0.03 docs/sec
>>>>>>>>> distributed index 'restaurant' can not be directly indexed; skipping.
>> 
>>>>>>>>> table ids are pretty small so I think that issue is not causing
>>>>>>>>> problems.. still I tried adding the sql step config and performance
>>>>>>>>> stays the same
>> 
>>>>>>>>> Properties on define_index are:
>> 
>>>>>>>>>    set_property :min_prefix_len => 3
>>>>>>>>>    set_property :stopwords => "#{RAILS_ROOT}/config/sphinx/
>>>>>>>>> stoplist.txt"
>>>>>>>>>    set_property :delta => ThinkingSphinx::Deltas::ResqueDelta
>> 
>>>>>>>>> Tried commenting min_prefix_len and perfomance keeps the same..
>> 
>>>>>>>>> Running out of ideas :(
>> 
>>>>>>>>> Best regards,
>>>>>>>>> Pedro
>> 
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "Thinking Sphinx" group.
>>>>>>>>> To post to this group, send email to [email protected].
>>>>>>>>> To unsubscribe from this group, send email to 
>>>>>>>>> [email protected].
>>>>>>>>> For more options, visit this group 
>>>>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>> 
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "Thinking Sphinx" group.
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> To unsubscribe from this group, send email to 
>>>>>>> [email protected].
>>>>>>> For more options, visit this group 
>>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>> 
>>>>> --
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "Thinking Sphinx" group.
>>>>> To post to this group, send email to [email protected].
>>>>> To unsubscribe from this group, send email to 
>>>>> [email protected].
>>>>> For more options, visit this
>> 
>> ...
>> 
>> read more »
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/thinking-sphinx?hl=en.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.

Reply via email to