On 4/25/2012 10:01 AM, Ted Roche wrote:
> Quantify, please: "I load 50 million rows of data and run a query that
> should return 10 records and it takes 15 minutes." Or "I expect 25
> million rows" in which case it should take 15 minutes! Or "It takes 7
> seconds and I want it to take 5" -- these are different questions.

Actually, I forgot one field.  There is a 4th field.  So focusing on 
those 4 fields (region/expcat/product/ratcell), there are say 10 
different regions, 6 different expcats, 4 different products, and 60 
different ratecells.  These all feed out to an Excel spreadsheet, with 
the region being the tab, the expcats being columns, the ratecells being 
rows, and the products helping to differentiate between the two types of 
report output (e.g., these products "a,b,c" mean it's RptType 1, "d,e,f" 
meaning RptType 2, etc.  Those products also help relate to an 
"Adjustment factor" table to pick up a multiplier value for the end-user 
to apply to the sum totals as well.  So 60 x 6 x 10 = 3600 queries to be 
run to provide the summed output.  (I'm doing a SUM on certain fields.) 
Each query goes against against 3 core tables (separate structures...not 
union-able) about 5.1 GB in total size.  I found that some queries were 
taking about 45 seconds to return their rowsets (ranging anywhere from 
very small to very large for popular data).  So 3600 queries @ 45 
seconds each = 2700 minutes, or 45 hours.  Nearly 2 days of running 
queries straight to get my datasets and totals!!  THIS IS WHAT I'M 
TRYING TO IMPROVE.  :-)

(NOTE:  It should be noted that I'm pulling the dataset sample and then 
SUMing it because I need to keep these records for auditing purposes 
later, so when the auditor says "what records went into deriving this 
total?" I can easily show them the CSV.  I'm using HeidiSQL for the 
maintenance/setup of the database and VFP9 for the SQL downloads for 
samples.  These are NOT Visual Foxpro queries.  I'm using SPT against my 
laptop's MySQL database.)

>
>>   I load the data, add the appropriate
>> indexes on the key fields in my WHERE clauses (very common usually...not
>> much variance in my queries.  E.g.., "...where region = ?cRegion and
>> expcat = ?cExpcat and product = ?cProduct"  so I have a compound index
>> on region+expcat+product), and then query away.
>
> Over optimization. If you have a compound index, your query ought to be:
>
> region+expcat+product = ?mycompoundparameter
>
> It's far, far better to use one index per column and your original
> query, though. Though it is not Rushmore, and some VFP rules don't
> apply, the rule of "index expression EXACTLY matching the left side of
> the WHERE expression" does apply.

Yes, and I thought I read you and the MySQL manual online earlier 
(http://dev.mysql.com/doc/refman/5.1/en/index-merge-optimization.html) 
to say that MySQL needs that compound index of the 3 fields to best 
optimize this query?!?  I thought that MySQL couldn't use all 3 separate 
single-field indexes and as such I wouldn't gain as much optimization.  ???


> InnoDB is for transactions, which you're not doing.  It's slower on
> reads and writes than MyISAM, but don't go changing it until we learn
> some more.

Now that my production goal has been met, I've got time to run 
alternative tests (which is why I started this) so I'll track times on 
MyISAM vs InnoDB once I've got everything in place.

>
> You don't need a great big honking $12,000 machine. You could probably
> get away with it for $1000, which is less than you've wasted waiting
> on the machine and typing all these emails.

I doubt it.  My cost is fixed since this is a W-2 gig, regardless.  :-)


> If your database is too big to be loaded into memory completely, and
> each query is pulling in different results, creating too large a cache
> means you're spending all of your time reading records into memory
> (and indexing them there, and updating the cache tables) when you're
> going to need to read new records off disk the next time.
>
> If you've got the time to experiment, trying one index per column in
> the WHERE clause and trying a couple different sets of ini settings on
> one known query, shutting down and restarting between queries, can
> give you a pretty good idea which way you should look at optimizing.

Yeah, that's a good plan.  The variables I'll try are the MyISAM vs 
InnoDB and the 3-part compound index vs the 3 single indexes.

Thanks again for the feedback!


-- 
Mike Babcock, MCP
MB Software Solutions, LLC
President, Chief Software Architect
http://mbsoftwaresolutions.com
http://fabmate.com
http://twitter.com/mbabcock16

_______________________________________________
Post Messages to: [email protected]
Subscription Maintenance: http://leafe.com/mailman/listinfo/profox
OT-free version of this list: http://leafe.com/mailman/listinfo/profoxtech
Searchable Archive: http://leafe.com/archives/search/profox
This message: 
http://leafe.com/archives/byMID/profox/[email protected]
** All postings, unless explicitly stated otherwise, are the opinions of the 
author, and do not constitute legal or medical advice. This statement is added 
to the messages for those lawyers who are too stupid to see the obvious.

Reply via email to