Re: [sqlite] Performance Problems with joining and subqueries
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Da Martian wrote: > Regarding using "Case" I didnt know sqlite supported this. In fact looking > at core functions on the web there isnt any mention of a case statement, nor > in aggregate functions. Case is not a function but rather an expression. You can see the doc at http://www.sqlite.org/lang_expr.html which also covers a lot of other things, although it would be nice if the page included a case example. Roger -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) iEYEARECAAYFAkkBmRwACgkQmOOfHg372QSCFACeNOY/PPpARA3BT+HUbteOR3+n 7aUAn3n7kczRGEhVWPqVF1PMgSuES2IW =dXC6 -END PGP SIGNATURE- ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance Problems with joining and subqueries
Hi I do appologise. In my hast, I did copy bits of a larger query. The queries are correct they just have extra bits around the edges. I will post the corrections below. Regarding using "Case" I didnt know sqlite supported this. In fact looking at core functions on the web there isnt any mention of a case statement, nor in aggregate functions. Knowing this I can restructure as you have suggested. My first attempt at running the query as provided showed 10+ minutes performance similiar to the two subqueries joined together meaning the temp table option still wins hands down. So I still have the question about why a temp table is quicker than a subquery (derived table)? >For this query, an index on (Customer, PrincipalContractNo, SASContract, >BusinessArea, ProductGroup, Site), or any prefix of this list, might >help. Regarding this, I did create this index as mentioned in the post before, however it performed worse than the PK index. The PK index query takes around 1 minute. Creating the above index it balloons to over 10+ minutes which I found strange as in theory a index matching the group by should help collect the records together in the correct order. Thanks again for your help, and sorry about the shody "copy paste" job. Corrected Subqueires: Corrections: 1st subquery: SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) where rr.period_id > 200809 group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site, rr.period_id 2nd Subquery: SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) where rr.period_id <= 200809 group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site On Thu, Oct 23, 2008 at 4:52 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote: > Da Martian <[EMAIL PROTECTED]> wrote: > > I have two tables as follows: > > > > CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine > > Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item > > Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site > > Text,Customer Integer,PrincipalContractNo Text,SASContract > > Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths > > Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort > > Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY > > (ContractNo ASC,ContractLine ASC)) > > -CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON > > ROT_DIM_CONTRACT_LINE (ContractNo ASC,ContractLine ASC) > > You don't need this index. PRIMARY KEY clause has already created the > same index implicitly. > > > CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo > > Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY > > (Period_ID > > ASC,ContractNo ASC,ContractLine ASC)) > > CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT > > (Period_ID ASC,ContractNo ASC,ContractLine ASC) > > Same here. Drop the index. > > > ROT_DIM_CONTRACT_LINE has 131,747 records > > ROT_FACT_REV_ROLLOUT has 3,971,369 records > > > > The process I am doing is two fold: > > 1) Joining the data for complete list > > 2) Splitting the data by date (Period_ID) into two catagories based > > on a > > threshold date. > >so if my threshold is Sep 2008 (200809) I want all records after > > Sep 2008 to be displayed AND > >I want a total column for all records prior to Sep 2008. > > > No case statements in SQLite so two subqueries does the job: > > > >> SEP 2008 > > select * from > > (SELECT cl.Customer as Customer, cl.PrincipalContractNo as > > PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, > > cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as > > NCRAmt, sum(rr.revenue_amount) as RevenueAmt > > FROM > > ROT_DIM_CONTRACT_LINE cl join > > ROT_FACT_REV_ROLLOUT rr on > >(cl.ContractNo = rr.ContractNo and > > cl.ContractLine = rr.ContractLine) > > where > >rr.period_id > 200809 > > group by > > cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, > > cl.ProductGroup, cl.Site, rr.period_id > > You have fewer closing parens here than you have opening ones. Why are > you doing "select from select"? Why an extra level of indirection? > > For this query, an index on (Customer, PrincipalContractNo, SASContract, > BusinessArea, ProductGroup, Site), or any prefix of this list, might > help. >
Re: [sqlite] Performance Problems with joining and subqueries
Da Martian <[EMAIL PROTECTED]> wrote: > I have two tables as follows: > > CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine > Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item > Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site > Text,Customer Integer,PrincipalContractNo Text,SASContract > Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths > Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort > Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY > (ContractNo ASC,ContractLine ASC)) > -CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON > ROT_DIM_CONTRACT_LINE (ContractNo ASC,ContractLine ASC) You don't need this index. PRIMARY KEY clause has already created the same index implicitly. > CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo > Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY > (Period_ID > ASC,ContractNo ASC,ContractLine ASC)) > CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT > (Period_ID ASC,ContractNo ASC,ContractLine ASC) Same here. Drop the index. > ROT_DIM_CONTRACT_LINE has 131,747 records > ROT_FACT_REV_ROLLOUT has 3,971,369 records > > The process I am doing is two fold: > 1) Joining the data for complete list > 2) Splitting the data by date (Period_ID) into two catagories based > on a > threshold date. >so if my threshold is Sep 2008 (200809) I want all records after > Sep 2008 to be displayed AND >I want a total column for all records prior to Sep 2008. > No case statements in SQLite so two subqueries does the job: > >> SEP 2008 > select * from > (SELECT cl.Customer as Customer, cl.PrincipalContractNo as > PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, > cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as > NCRAmt, sum(rr.revenue_amount) as RevenueAmt > FROM > ROT_DIM_CONTRACT_LINE cl join > ROT_FACT_REV_ROLLOUT rr on >(cl.ContractNo = rr.ContractNo and > cl.ContractLine = rr.ContractLine) > where >rr.period_id > 200809 > group by > cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, > cl.ProductGroup, cl.Site, rr.period_id You have fewer closing parens here than you have opening ones. Why are you doing "select from select"? Why an extra level of indirection? For this query, an index on (Customer, PrincipalContractNo, SASContract, BusinessArea, ProductGroup, Site), or any prefix of this list, might help. > <= SEP 2008 > SELECT cl.Customer as Customer, cl.PrincipalContractNo as > PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, > cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt > FROM >ROT_DIM_CONTRACT_LINE cl join >ROT_FACT_REV_ROLLOUT rr on > (cl.ContractNo = rr.ContractNo and >cl.ContractLine = rr.ContractLine) > where > rr.period_id <= 200809 > group by > cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, > cl.ProductGroup, cl.Site) ncritd on And here you have more closing parents than you have opening ones. I'm confused. Could you post syntactically correct statements that can actually be executed? If I understand your problem, you want, for each (Customer, ...) combination, one row for each month after 9/08, and a single row for all months before and including that. E.g., now is October 2008, so you want two lines for each (Customer, ...) - one for October, and one for all time before that. I believe something like this should work: SELECT cl.Customer, cl.PrincipalContractNo, cl.SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site, (case when rr.period_id > 200809 then rr.period_id else 0 end) as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site, period; Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Performance Problems with joining and subqueries
I have two tables as follows: CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site Text,Customer Integer,PrincipalContractNo Text,SASContract Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY (ContractNo ASC,ContractLine ASC)) -CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON ROT_DIM_CONTRACT_LINE (ContractNo ASC,ContractLine ASC) CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY (Period_ID ASC,ContractNo ASC,ContractLine ASC)) CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT (Period_ID ASC,ContractNo ASC,ContractLine ASC) ROT_DIM_CONTRACT_LINE has 131,747 records ROT_FACT_REV_ROLLOUT has 3,971,369 records The process I am doing is two fold: 1) Joining the data for complete list 2) Splitting the data by date (Period_ID) into two catagories based on a threshold date. so if my threshold is Sep 2008 (200809) I want all records after Sep 2008 to be displayed AND I want a total column for all records prior to Sep 2008. No case statements in SQLite so two subqueries does the job: > SEP 2008 select * from (SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) where rr.period_id > 200809 group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site, rr.period_id <= SEP 2008 SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) where rr.period_id <= 200809 group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site) ncritd on (fut.Customer = ncritd.Customer and fut.PrincipalContractNo = ncritd.PrincipalContractNo and fut.SASContractNo = ncritd.SASContractNo and fut.BusinessArea = ncritd.BusinessArea and fut.ProductGroup = ncritd.ProductGroup and fut.Site = ncritd.Site) order by fut.Customer, fut.PrincipalContractNo, fut.SASContractNo, fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period Ignoring joining for the moment. Each of these queries on there own takes around 1 minute. If however I create an index to satisfy the group by, say : create index IDX_ROT_DIM_CONTRACT_LINE_2 on ROT_DIM_CONTRACT_LINE (Customer ASC, PrincipalContractNo ASC, SASContract ASC, BusinessArea ASC, ProductGroup ASC, Site ASC) create index IDX_ROT_DIM_CONTRACT_LINE_3 on ROT_DIM_CONTRACT_LINE (ContractNo ASC, ContractLine ASC, Customer ASC, PrincipalContractNo ASC, SASContract ASC, BusinessArea ASC, ProductGroup ASC, Site ASC) analyze The query will run for 13 muntes + (killed after 13 minutes last time). Explain Query plan confirms the use of IDX_ROT_DIM_CONTRACT_LINE_2 Now back to joining, without any extra indices and an empty sqlite_Stat1 table the following query performs terribly (in my opinion). Its simply a join between the above two subqueries: select * from (SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) where rr.period_id > 200809 group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site, rr.period_id) fut join (SELECT cl.Customer as Customer, cl.PrincipalContractNo as PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea, cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt FROM ROT_DIM_CONTRACT_LINE cl join ROT_FACT_REV_ROLLOUT rr on (cl.ContractNo = rr.ContractNo and cl.ContractLine = rr.ContractLine) where rr.period_id <= 200809 group by cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea, cl.ProductGroup, cl.Site) ncritd on (fut.Customer = ncritd.Customer and fut.PrincipalContractNo = ncritd.PrincipalContractNo and fut.SASContractNo = ncritd.SASContractNo and fut.BusinessArea = ncritd.BusinessArea and fut.ProductGroup = ncritd.ProductGroup and fut.Site = ncritd
Re: [sqlite] Performance Problems with joining and subqueries
"Da Martian" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Does anyone have any ideas on how to optimise this type of process in > SQLite? What type of process? Show your tables and your query. Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Performance Problems with joining and subqueries
Hi I have two queries. Each returns a full result set in around 1 minute when run on there own. Making each of these queries a subquery and joining them with an inner join, and the query takes more than 10 minutes. I cancelled it after 10 minutes. Does anyone have any ideas on how to optimise this type of process in SQLite? Indices only make things worse. I have read through most of the threads on this and it still seems unclear as to when and why indices chosen seem to be worse. Analyse always seems to make things worse. So does adding indices. I have a two tables: one wide and low on records and one thin and having around 4 million records. Withonly UNIQUE PK indices, the queries complete in around 1 minute usually. With an index that satisfies the group by, it takes over 10 minutes and still doesnt complete. Analsye makes sure it uses the group by index as well. To get best performance I seem to have to clear all indices and the sqlite_stat1 table. So I am at a loss as to how to take two 1 minute queriues and join them in an efficient manner as subqueries. If I make the first subquery into a temp table then do join the temp table to the second query, it completes in 1.5 minutes. I would expect the same performance from subqueries. Thanksm ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users