Pat could you take a look at: http://dl.dropbox.com/u/1175879/output.zip

As you can see, this is an output for just 1 restaurant and it's
totally weird :\

Best Regards,
Pedro

On Mar 3, 6:57 am, Pat Allan <[email protected]> wrote:
> Hi Pedro
>
> Not that it's particularly helpful, but this is definitely not how Sphinx 
> behaves. It definitely should not take an hour to index 15 documents. It 
> shouldn't even take a minute, unless the query is particularly complex - and 
> this isn't what I'd call complex!
>
> The reason for adding the group_by call, is that PostgreSQL is strict with 
> the SQL it expects - anything that isn't within an aggregation function 
> should be mentioned in the GROUP BY clause.
>
> Thinking Sphinx will do this automatically for any fields it references to - 
> but it can't look into SQL snippets to figure out which columns are used, 
> hence why you need to manually add it yourself, because of your CRC'd state 
> attribute.
>
> The repeated data is definitely odd. Thinking Sphinx concatenates data from 
> columns in associations, but the query shouldn't duplicate data (unless the 
> record is attached to many associated records that look the same - but that's 
> not a SQL problem, that's a data problem).
>
> When you limit the query to one restaurant, can you confirm it's joining only 
> on the menu categories and menu items you're expecting? (Add the id columns 
> from both tables to the start of your query, and to the GROUP BY clause).
>
> --
> Pat
>
> On 03/03/2011, at 6:13 AM, Pedro Cunha wrote:
>
>
>
>
>
>
>
> > Hello Pat,
>
> > Have been investigating last days and I'm almost at the same place.
>
> > I'm using PostgreSQL.
>
> > As far as I read we need to trust how postgres analyses queries and he
> > can choose or not to use indexes.
>
> > Herehttp://explain.depesz.com/s/sA9is a bit (only a bit..) more
> > detailed the EXPLAIN ANALYSE.
>
> > I've been digging through the query that TS generates, and for the 10
> > entities (restaurants) i'm trying to index, 8500~ rows are created.
>
> > What is the reason for when including all "has x" to use "group by"?
>
> > Also I noticed something weird, imagine I pick one of the lines of the
> > output, and now pick a column, is it supposed to see lots of
> > information repeated?  like:
>
> > ".. Negócios Negócios Negócios Negócios Negócios Negócios Sair a dois
> > Negócios Negócios Negócios Negócios Negócios Negócios Negócios Sair a
> > dois Negócios Negócios Negócios Sair a dois Negócios Sair a dois
> > Descontrair Sair a dois Sair a dois Sair a dois Sair a dois Sair a
> > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair
> > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois
> > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a
> > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair
> > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois
> > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a
> > dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair
> > a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois
> > Sair a dois Sair a dois Sair a dois Sair a dois Sair a dois Sair a
> > dois Sair a dois Sair a dois ..."
>
> > Production database is at the moment with 15 restaurants.. and it's
> > taking 1h to index
>
> > As DB grows, time is not incremental but exponential and this is
> > worrying me a lot. Also because delta indexing is taking around 15~
> > mins.
>
> > Is it supposed that 1:N associations decay so badly?
>
> > I'm running out of ideas and considering breaking references for this
> > case to save one inner join to happen.
>
> > Restaurant (1 : N ) Menu Categories
> > Menu Categories ( 1 : N ) Menu Items  <====== Shortcut this one
>
> > Best regards,
> > Pedro
>
> > On Feb 24, 3:59 am, Pat Allan <[email protected]> wrote:
> >> Hi Pedro
>
> >> Wow, PostgreSQL's EXPLAIN output is pretty complex!
>
> >> I'm not spotting anything obviously odd, but I'm not sure what to look for.
>
> >> Perhaps you try it all again, but comment out all the fields and 
> >> attributes except those from menu items? Indexing should be a bit faster, 
> >> but I'd imagine there's still a speed hit to some extent. I'm not sure if 
> >> that'll shine any light on the situation though...
>
> >> Is this all on your development environment? If it's production or 
> >> staging, are you sure that the database indexes on the foreign keys exist? 
> >> (ie: latest migrations have been run)
>
> >> --
> >> Pat
>
> >> On 24/02/2011, at 11:47 AM, Pedro Cunha wrote:
>
> >>> Hello Pat,
>
> >>> Here is the link:
> >>>https://gist.github.com/841520
>
> >>> With:
> >>> - define_index block
> >>> - sql query on conf
> >>> - explain
> >>> - explain analyse
>
> >>> Thanks in advance
>
> >>> On Feb 23, 11:51 pm, Pat Allan <[email protected]> wrote:
> >>>> Hi Pedro
>
> >>>> Perhaps post the EXPLAIN and EXPLAIN ANALYSE outputs to a gist? To be 
> >>>> honest, given I've not used the PostgreSQL equivalents, I'm not sure how 
> >>>> much sense I'll be able to make out of it, but let's start with that.
>
> >>>> Also, can you add the SQL query to the gist as well?
>
> >>>> Cheers
>
> >>>> --
> >>>> Pat
>
> >>>> On 24/02/2011, at 6:07 AM, Pedro Cunha wrote:
>
> >>>>> Hey Pat, I'm using PostgresSQL, and command is the same.
>
> >>>>> I tried looking at the output and it kinda looks like chinese to me.
> >>>>> From what I see, I can see no reference to indexes whatsoever. That
> >>>>> can be a problem. According to Postgres docs I must do an ANALYZE
> >>>>> <table_name>  to make pg server aware of an index on that specific
> >>>>> table.
>
> >>>>> After I ve done that and run the EXPLAIN and EXPLAIN ANALYSE I still
> >>>>> can't see any reference to indexes. That might be a problem.
>
> >>>>> Do you want me to copy paste here the whole log or is there some way I
> >>>>> can attach that file to this thread? (it's a bit lengthy)
>
> >>>>> Best regards
>
> >>>>> On Feb 23, 12:31 am, Pat Allan <[email protected]> wrote:
> >>>>>> Hi Pedro
>
> >>>>>> Given you've got the database indexes, this definitely feels far too 
> >>>>>> slow. Are you using MySQL or PostgreSQL?
>
> >>>>>> If you're using MySQL, then I would recommend taking the generated 
> >>>>>> sql_query value from the restaurant_core_0 source in your sphinx 
> >>>>>> configuration file, remove the $start and $end parts of the WHERE 
> >>>>>> clause, and add EXPLAIN to the start of the query. Then, run this 
> >>>>>> query in a MySQL console:
>
> >>>>>>   EXPLAIN SELECT ...
>
> >>>>>> This should give you some information on what database indexes are 
> >>>>>> being used for the query.
>
> >>>>>> I'm not sure what the equivalent of the EXPLAIN command for PostgreSQL 
> >>>>>> is, but I'd be surprised if such a thing didn't exist.
>
> >>>>>> Cheers
>
> >>>>>> --
> >>>>>> Pat
>
> >>>>>> On 23/02/2011, at 1:32 AM, Pedro Cunha wrote:
>
> >>>>>>> Hello there,
>
> >>>>>>> I'm having a weird slow indexing. Tried looking for similar issues on
> >>>>>>> this group and couldn't find.
>
> >>>>>>> When I index an has_many through association, perfomance decays a lot.
> >>>>>>> To help trace this issue I ll give as much info as I can.
>
> >>>>>>> So the association looks like this:
>
> >>>>>>> Restaurant (1 : N ) Menu Categories
> >>>>>>> Menu Categories ( 1 : N ) Menu Items
>
> >>>>>>> Index on table menu_categories for restaurant_id (DB level)
> >>>>>>> Index on table menu_items for menu_category_id (DB level)
>
> >>>>>>> define_index on restaurant
> >>>>>>> ...
> >>>>>>>    indexes menu_categories.menu_items(:name), :as => :menu_item_name
> >>>>>>>    indexes menu_categories.menu_items(:subtitle), :as
> >>>>>>> => :menu_item_subtitle
> >>>>>>>    indexes menu_categories.menu_items(:description), :as
> >>>>>>> => :menu_item_description
> >>>>>>> ...
> >>>>>>> end
>
> >>>>>>> Without the indexes before, and with a lot of other stuff I get this
> >>>>>>> perfomance:
>
> >>>>>>> indexing index 'restaurant_core'...
> >>>>>>> collected 12 docs, 3.0 MB
> >>>>>>> collected 122936 attr values
> >>>>>>> sorted 0.2 Mvalues, 100.0% done
> >>>>>>> sorted 2.2 Mhits, 100.0% done
> >>>>>>> total 12 docs, 2973437 bytes
> >>>>>>> total 16.741 sec, 177611 bytes/sec, 0.71 docs/sec
> >>>>>>> distributed index 'restaurant' can not be directly indexed; skipping.
>
> >>>>>>> With the indexes
>
> >>>>>>> indexing index 'restaurant_core'...
> >>>>>>> collected 12 docs, 15.9 MB
> >>>>>>> collected 423920 attr values
> >>>>>>> sorted 0.8 Mvalues, 100.0% done
> >>>>>>> sorted 11.4 Mhits, 100.0% done
> >>>>>>> total 12 docs, 15885288 bytes
> >>>>>>> total 320.783 sec, 49520 bytes/sec, 0.03 docs/sec
> >>>>>>> distributed index 'restaurant' can not be directly indexed; skipping.
>
> >>>>>>> table ids are pretty small so I think that issue is not causing
> >>>>>>> problems.. still I tried adding the sql step config and performance
> >>>>>>> stays the same
>
> >>>>>>> Properties on define_index are:
>
> >>>>>>>    set_property :min_prefix_len => 3
> >>>>>>>    set_property :stopwords => "#{RAILS_ROOT}/config/sphinx/
> >>>>>>> stoplist.txt"
> >>>>>>>    set_property :delta => ThinkingSphinx::Deltas::ResqueDelta
>
> >>>>>>> Tried commenting min_prefix_len and perfomance keeps the same..
>
> >>>>>>> Running out of ideas :(
>
> >>>>>>> Best regards,
> >>>>>>> Pedro
>
> >>>>>>> --
> >>>>>>> You received this message because you are subscribed to the Google 
> >>>>>>> Groups "Thinking Sphinx" group.
> >>>>>>> To post to this group, send email to [email protected].
> >>>>>>> To unsubscribe from this group, send email to 
> >>>>>>> [email protected].
> >>>>>>> For more options, visit this group 
> >>>>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>
> >>>>> --
> >>>>> You received this message because you are subscribed to the Google 
> >>>>> Groups "Thinking Sphinx" group.
> >>>>> To post to this group, send email to [email protected].
> >>>>> To unsubscribe from this group, send email to 
> >>>>> [email protected].
> >>>>> For more options, visit this group 
> >>>>> athttp://groups.google.com/group/thinking-sphinx?hl=en.
>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups 
> >>> "Thinking Sphinx" group.
> >>> To post to this group, send email to [email protected].
> >>> To unsubscribe from this group, send email to 
> >>> [email protected].
> >>> For more options, visit this
>
> ...
>
> read more »

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.

Reply via email to