RE: Problem with database load

Boris Milikič Fri, 14 Mar 2008 02:03:14 -0700

I agree with Allen to fix the query. Creating indexes, was my idea to solve 
this problem for now. 
I ran explain plan on Oracle 10gR2 and  got the same plan with and without 
those two indexes.


Boris


 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 13, 2008 5:07 PM
To: [email protected]
Subject: Re: Problem with database load

Yes, I am convinced that sql statement is an abomination.

You said this is with 4.0 final running with the JPA backend correct?

Adding the indexes is an option, but I think the bigger problem is that query 
is ridiculous.  It is obviously trying to load way more data than it should 
need to in a single query because a 9 way join in 1 query is insane.  I'm 
wondering if this is a configuration problem with OpenJPA not properly doing 
lazy fetching.  I would think that at the very most you should see 3 tables 
involved in a single query.

The reason I questioned the need for indexes on the 2 columns below is that 
they are really only associations that should be fetched lazily, specifically 
for reasons like this.  There is no need for the object model to fetch and 
populate those category objects when you are querying for a list of weblog 
entries.

So in this case I would prefer to see us work to fix the query rather than just 
try and throw new indexes at the problem.

-- Allen


David Fisher wrote:
>  From my experience the need for an index on a database table is more 
> a function of the database configuration and the current size of a table.
> Many times the need has cropped up in my own webapp development. One 
> of the reasons why we rolled our own set of classes is that it is easy 
> to get to and manipulate the sql. Tuning is a fine art, sometimes an 
> index is indicated and other times a rewrite of the statement with a 
> hint so that it is joined in another manner. A third option is found 
> with Oracle
> - analyze the tables - this allows the Database to decide how best to 
> organize the table's index and that can improve plans. If MySQL offers 
> a similar feature perhaps this will avoid the index.
> 
> Let's look at the way the SQL is constructed:
> 
> FROM roller_comment t0 INNER JOIN weblogentry t1 ON t0.entryid = t1.id 
> LEFT OUTER JOIN weblogcategory t2 ON t1.categoryid = t2.id LEFT OUTER 
> JOIN rolleruser t8 ON t1.userid = t8.id LEFT OUTER JOIN website t9 ON 
> t1.websiteid = t9.id LEFT OUTER JOIN weblogcategory t3 ON t2.parentid 
> = t3.id LEFT OUTER JOIN website t4 ON t2.websiteid = t4.id LEFT OUTER 
> JOIN weblogcategory t5 ON t4.bloggercatid = t5.id LEFT OUTER JOIN 
> rolleruser t6 ON t4.userid = t6.id LEFT OUTER JOIN weblogcategory t7 
> ON t4.defaultcatid = t7.id WHERE (t1.websiteid = 
> '8a926693072c38bf010741c83fcb36d6' AND t0.status = 'APPROVED') ORDER 
> BY t0.posttime DESC LIMIT 0, 10
> 
> This rewrites as:
> 
> FROM roller_comment t0, weblogentry t1, weblogcategory t2, rolleruser 
> t8, website t9, weblogcategory t3, website t4, weblogcategory t5, 
> rolleruser t6, weblogcategory t7 WHERE t1.categoryid = t2.id AND 
> t1.userid = t8.id AND t1.websiteid = t9.id AND t2.parentid = t3.id AND 
> t2.websiteid = t4.id AND t4.bloggercatid = t5.id AND t4.userid = t6.id 
> AND t4.defaultcatid = t7.id AND t1.websiteid = 
> '8a926693072c38bf010741c83fcb36d6' AND t0.status = 'APPROVED' ORDER BY 
> t0.posttime DESC LIMIT 0, 10
> 
> These indexes are provided:
>  website.userid
>  weblogentry.categoryid
>  weblogentry.userid
>  weblogentry.websiteid
>  weblogcategory.id (primary key)
>  weblogcategory.parentid
>  weblogcategory.websiteid
>  rolleruser.id (primary key)
>  roller_comment.status
> 
>> create index ws_bloggercatid_idx    on website(bloggercatid);
> 
> is for "t4.bloggercatid = t5.id"
> 
>> create index ws_defaultcatid_idx    on website(defaultcatid );
> 
> is for "t4.defaultcatid = t7.id"
> 
> In other words every other possible index that would help make this 
> horrible join function except for the two missed:
> 
>> My company has a blog service with 30k blogs and 70k unique browsers 
>> per week.
> 
> This clearly indicates that without the two indexes the 30K blogs 
> impacts every JOIN significantly! The factor is roughly by 30,000 x 
> 30,000 or 900,000,000. Maybe not that bad, but you can see the EXPLOSION!
> 
> Are you convinced?
> 
> Regards,
> Dave
> 
> On Mar 12, 2008, at 11:36 AM, Allen Gilliland wrote:
> 
>> Yes, I understand why we use indexes, what I meant was that when you 
>> are suggesting that we need a new index that you provide some 
>> evidence about why that particular index is necessary.
>>
>> Your example below is completely valid, but the ur_userid_idx index 
>> has always been there, so that's not a new index.
>>
>> For example, you said we need to add these 2 indexes ...
>>
>> create index ws_bloggercatid_idx    on website(bloggercatid);
>> create index ws_defaultcatid_idx    on website(defaultcatid );
>>
>> why?  off the top of my head I can't think of a reason those indexes 
>> would be of benefit because I don't believe we run any standard 
>> queries which do a lookup or join on those columns.  I may be wrong, 
>> but that's why I would like some evidence before we would just add 
>> the indexes for no reason.
>>
>> -- Allen
>>
>>
>> Boris Milikič wrote:
>>> 1) I've got "Impossible WHERE noticed after reading const tables" 
>>> message, when I ran explain on sql from prevoious e-mail:
>>>  SELECT t0.id, t0.content, t0.contenttype, t0.email, t0.name, 
>>> t0.notify, t0.plugins, t0.posttime,  ....
>>> 2) So I run explain on this simple query:
>>> explain select a.rolename ,b.username from rolleruser b, userrole a 
>>> where b.id=a.userid;
>>> 3) Explanations of why  indexes are necessary Column  "type" in 
>>> following table is the join type. Type ref means, that all rows with 
>>> matching index values are read from this table for each combination 
>>> of rows from the previous tables, example a).
>>> For a tables that are  completely read in sequence from the hard 
>>> drive EXPLAIN lists "ALL" in the "type" column. To the second table 
>>> in the join plan for a two table query, EXPLAIN lists type: ALL,  as 
>>> for table without index, example b). In example b)  when EXPLAIN 
>>> lists type: ALL for each table in a join "this output indicates that 
>>> MySQL is generating a Cartesian product of all the tables; that is, 
>>> every combination of rows" (MySQL manual). In simpler terms: Two 
>>> tables of 10 rows each joined together does not result in 20 rows, 
>>> it results in 100 rows (10 multiplied by 10).
>>> a) with index
>>> select_type     table    type    possible_keys    key    key_len    
>>> ref    rows    extra                                    
>>> 1    SIMPLE    b    ALL    PRIMARY    <NULL>    <NULL>    <NULL>    3   
>>> 1    SIMPLE    a    ref    ur_userid_idx    ur_userid_idx    144    
>>> roller.b.id    1   
>>> b) without index (drop index ur_userid_idx on userrole( userid );)    
>>>                                
>>> 1    SIMPLE    a    ALL    <NULL>    <NULL>    <NULL>    <NULL>    4   
>>> 1    SIMPLE    b    ALL    PRIMARY    <NULL>    <NULL>    <NULL>    
>>> 3    Using where
>>> Using too many indexes on tables can make things worse, as you said. 
>>> In many cases, MySQL can calculate the best possible query plan. In 
>>> very large database partitioning help.
>>> -- Boris
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: 
>>> Friday, March 07, 2008 6:04 PM
>>> To: [email protected]
>>> Subject: Re: Problem with database load Can you also provide 
>>> explanations of why you think those indexes are necessary?  Having 
>>> too many indexes on tables can actually make things worse, so you 
>>> don't want to just flood the db with tons of indexes.
>>> -- Allen
>>> Boris Milikič wrote:
>>>> I just found  one more missing:
>>>>
>>>> create index ws_posttime_idx    on roller_comment (posttime);
>>>>
>>>> Next weekend I will walk through sql log and  dbcreate.sql script 
>>>> and record in JIRA if I find something.
>>>>
>>>> Boris
>>>>
>>>> -----Original Message-----
>>>> From: Dave [mailto:[EMAIL PROTECTED]
>>>> Sent: Friday, March 07, 2008 4:31 PM
>>>> To: [email protected]
>>>> Subject: Re: Problem with database load
>>>>
>>>> On Fri, Mar 7, 2008 at 10:15 AM, Boris Milikič 
>>>> <[EMAIL PROTECTED]> wrote:
>>>>>  Did you  created missing indexes in rollerdb (4.0):
>>>>>
>>>>>  create index ws_bloggercatid_idx    on website(bloggercatid);
>>>>>  create index ws_defaultcatid_idx    on website(defaultcatid );
>>>> Thanks Boris,
>>>>
>>>> I just opened an issue for this:
>>>> https://issues.apache.org/roller/browse/ROL-1687
>>>>
>>>> Are there any other 4.0 database issues that you know of that are 
>>>> not on record in JIRA?
>>>>
>>>> - Dave
>>
>

      Rows Plan
---------- -------------------------------------------------------------------
         1 SELECT STATEMENT
         1  NESTED LOOPS OUTER
         1   NESTED LOOPS OUTER
         1    NESTED LOOPS OUTER
         1     NESTED LOOPS OUTER
         1      NESTED LOOPS OUTER
         1       NESTED LOOPS OUTER
         1        NESTED LOOPS OUTER
         1         NESTED LOOPS OUTER
         1          NESTED LOOPS
         1           TABLE ACCESS BY INDEX ROWID ROLLER_COMMENT

      Rows Plan
---------- -------------------------------------------------------------------
         1            INDEX RANGE SCAN DESCENDING CO_COMBO1_IDX
         1           TABLE ACCESS BY INDEX ROWID WEBLOGENTRY
         1            INDEX UNIQUE SCAN SYS_C00136183
         1          TABLE ACCESS BY INDEX ROWID WEBSITE
         1           INDEX UNIQUE SCAN SYS_C00136144
         1         TABLE ACCESS BY INDEX ROWID ROLLERUSER
         1          INDEX UNIQUE SCAN SYS_C00136094
         1        TABLE ACCESS BY INDEX ROWID WEBLOGCATEGORY
         1         INDEX UNIQUE SCAN SYS_C00136164
         1       TABLE ACCESS BY INDEX ROWID WEBSITE
         1        INDEX UNIQUE SCAN SYS_C00136144

      Rows Plan
---------- -------------------------------------------------------------------
         1      TABLE ACCESS BY INDEX ROWID WEBLOGCATEGORY
         1       INDEX UNIQUE SCAN SYS_C00136164
         1     TABLE ACCESS BY INDEX ROWID ROLLERUSER
         1      INDEX UNIQUE SCAN SYS_C00136094
         1    TABLE ACCESS BY INDEX ROWID WEBLOGCATEGORY
         1     INDEX UNIQUE SCAN SYS_C00136164
         1   TABLE ACCESS BY INDEX ROWID WEBLOGCATEGORY
         1    INDEX UNIQUE SCAN SYS_C00136164

RE: Problem with database load

Reply via email to