Re: [sqlite] Performance Problems with joining and subqueries

2008-10-24 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Da Martian wrote:
> Regarding using "Case" I didnt know sqlite supported this. In fact looking
> at core functions on the web there isnt any mention of a case statement, nor
> in aggregate functions.

Case is not a function but rather an expression.  You can see the doc at
http://www.sqlite.org/lang_expr.html which also covers a lot of other
things, although it would be nice if the page included a case example.

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkkBmRwACgkQmOOfHg372QSCFACeNOY/PPpARA3BT+HUbteOR3+n
7aUAn3n7kczRGEhVWPqVF1PMgSuES2IW
=dXC6
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Problems with joining and subqueries

2008-10-24 Thread Da Martian
Hi

I do appologise. In my hast, I did copy bits of a larger query. The queries
are correct they just have extra bits around the edges. I will post the
corrections below.

Regarding using "Case" I didnt know sqlite supported this. In fact looking
at core functions on the web there isnt any mention of a case statement, nor
in aggregate functions.
Knowing this I can restructure as you have suggested. My first attempt at
running the query as provided showed 10+ minutes performance similiar to the
two subqueries joined together meaning the temp table option still wins
hands down.

So I still have the question about why a temp table is quicker than a
subquery (derived table)?

>For this query, an index on (Customer, PrincipalContractNo, SASContract,
>BusinessArea, ProductGroup, Site), or any prefix of this list, might
>help.
Regarding this, I did create this index as mentioned in the post before,
however it performed worse than the PK index. The PK index query takes
around 1 minute. Creating the above index it balloons to over 10+ minutes
which I found strange as in theory a index matching the group by should help
collect the records together in the correct order.


Thanks again for your help, and sorry about the shody "copy paste" job.

Corrected Subqueires:

Corrections:
1st subquery:
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as
NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
  (cl.ContractNo = rr.ContractNo and
 cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
  group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
  cl.ProductGroup, cl.Site, rr.period_id

2nd Subquery:
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
  PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
 cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
  FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
   (cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
  where
   rr.period_id <= 200809
  group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
  cl.ProductGroup, cl.Site

On Thu, Oct 23, 2008 at 4:52 PM, Igor Tandetnik <[EMAIL PROTECTED]> wrote:

> Da Martian <[EMAIL PROTECTED]> wrote:
> > I have two tables as follows:
> >
> > CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine
> > Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item
> > Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site
> > Text,Customer Integer,PrincipalContractNo Text,SASContract
> > Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths
> > Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort
> > Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY
> > (ContractNo ASC,ContractLine ASC))
> > -CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON
> > ROT_DIM_CONTRACT_LINE (ContractNo ASC,ContractLine ASC)
>
> You don't need this index. PRIMARY KEY clause has already created the
> same index implicitly.
>
> > CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo
> > Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY
> > (Period_ID
> > ASC,ContractNo ASC,ContractLine ASC))
> > CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT
> > (Period_ID ASC,ContractNo ASC,ContractLine ASC)
>
> Same here. Drop the index.
>
> > ROT_DIM_CONTRACT_LINE has 131,747 records
> > ROT_FACT_REV_ROLLOUT has 3,971,369 records
> >
> > The process I am doing is two fold:
> > 1) Joining the data for complete list
> > 2) Splitting the data by date (Period_ID) into two catagories based
> > on a
> > threshold date.
> >so if my threshold is Sep 2008 (200809) I want all records after
> > Sep 2008 to be displayed AND
> >I want a total column for all records prior to Sep 2008.
>
> > No case statements in SQLite so two subqueries does the job:
> >
> >> SEP 2008
> > select * from
> > (SELECT cl.Customer as Customer, cl.PrincipalContractNo as
> > PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
> > cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as
> > NCRAmt, sum(rr.revenue_amount) as RevenueAmt
> > FROM
> >  ROT_DIM_CONTRACT_LINE cl join
> >  ROT_FACT_REV_ROLLOUT rr on
> >(cl.ContractNo = rr.ContractNo and
> > cl.ContractLine = rr.ContractLine)
> > where
> >rr.period_id > 200809
> > group by
> >   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
> > cl.ProductGroup, cl.Site, rr.period_id
>
> You have fewer closing parens here than you have opening ones. Why are
> you doing "select from select"? Why an extra level of indirection?
>
> For this query, an index on (Customer, PrincipalContractNo, SASContract,
> BusinessArea, ProductGroup, Site), or any prefix of this list, might
> help.
>

Re: [sqlite] Performance Problems with joining and subqueries

2008-10-23 Thread Igor Tandetnik
Da Martian <[EMAIL PROTECTED]> wrote:
> I have two tables as follows:
>
> CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine
> Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item
> Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site
> Text,Customer Integer,PrincipalContractNo Text,SASContract
> Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths
> Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort
> Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY
> (ContractNo ASC,ContractLine ASC))
> -CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON
> ROT_DIM_CONTRACT_LINE (ContractNo ASC,ContractLine ASC)

You don't need this index. PRIMARY KEY clause has already created the 
same index implicitly.

> CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo
> Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY
> (Period_ID
> ASC,ContractNo ASC,ContractLine ASC))
> CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT
> (Period_ID ASC,ContractNo ASC,ContractLine ASC)

Same here. Drop the index.

> ROT_DIM_CONTRACT_LINE has 131,747 records
> ROT_FACT_REV_ROLLOUT has 3,971,369 records
>
> The process I am doing is two fold:
> 1) Joining the data for complete list
> 2) Splitting the data by date (Period_ID) into two catagories based
> on a
> threshold date.
>so if my threshold is Sep 2008 (200809) I want all records after
> Sep 2008 to be displayed AND
>I want a total column for all records prior to Sep 2008.

> No case statements in SQLite so two subqueries does the job:
>
>> SEP 2008
> select * from
> (SELECT cl.Customer as Customer, cl.PrincipalContractNo as
> PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
> cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as
> NCRAmt, sum(rr.revenue_amount) as RevenueAmt
> FROM
>  ROT_DIM_CONTRACT_LINE cl join
>  ROT_FACT_REV_ROLLOUT rr on
>(cl.ContractNo = rr.ContractNo and
> cl.ContractLine = rr.ContractLine)
> where
>rr.period_id > 200809
> group by
>   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
> cl.ProductGroup, cl.Site, rr.period_id

You have fewer closing parens here than you have opening ones. Why are 
you doing "select from select"? Why an extra level of indirection?

For this query, an index on (Customer, PrincipalContractNo, SASContract, 
BusinessArea, ProductGroup, Site), or any prefix of this list, might 
help.

> <= SEP 2008
> SELECT cl.Customer as Customer, cl.PrincipalContractNo as
> PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
> cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
> FROM
>ROT_DIM_CONTRACT_LINE cl join
>ROT_FACT_REV_ROLLOUT rr on
>   (cl.ContractNo = rr.ContractNo and
>cl.ContractLine = rr.ContractLine)
> where
>   rr.period_id <= 200809
> group by
>   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
> cl.ProductGroup, cl.Site) ncritd on

And here you have more closing parents than you have opening ones. I'm 
confused. Could you post syntactically correct statements that can 
actually be executed?

If I understand your problem, you want, for each (Customer, ...) 
combination, one row for each month after 9/08, and a single row for all 
months before and including that. E.g., now is October 2008, so you want 
two lines for each (Customer, ...) - one for October, and one for all 
time before that. I believe something like this should work:

SELECT cl.Customer, cl.PrincipalContractNo, cl.SASContract, 
cl.BusinessArea,
cl.ProductGroup, cl.Site,
(case when rr.period_id > 200809 then rr.period_id else 0 end) as 
period,
sum(cl.NCRAmt) as NCRAmt, sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
 cl.ContractLine = rr.ContractLine)
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
   cl.ProductGroup, cl.Site, period;

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Performance Problems with joining and subqueries

2008-10-23 Thread Da Martian
I have two tables as follows:

CREATE TABLE ROT_DIM_CONTRACT_LINE (ContractNo Integer,ContractLine
Integer,StartDate Integer,EndDate Integer,NCRAmt Float,Item
Integer,BusinessArea Text,ProductGroup Text,ProductStyle Text,Site
Text,Customer Integer,PrincipalContractNo Text,SASContract
Text,ContractMonths Float,StartMonths Float,FullMonths Float,EndMonths
Float,MonthlyAmortAmt Float,FirstAmortAmt Float,LastAmort
Float,BalancingAmortAmt Float,RolloutToContractDiff Float, PRIMARY KEY
(ContractNo ASC,ContractLine ASC))
-CREATE UNIQUE INDEX IDX_ROT_DIM_CONTRACT_LINE_1 ON ROT_DIM_CONTRACT_LINE
(ContractNo ASC,ContractLine ASC)

CREATE TABLE ROT_FACT_REV_ROLLOUT (Period_ID Integer,ContractNo
Integer,ContractLine Integer,Revenue_Amount Float, PRIMARY KEY (Period_ID
ASC,ContractNo ASC,ContractLine ASC))
CREATE UNIQUE INDEX IDX_ROT_FACT_REV_ROLLOUT_1 ON ROT_FACT_REV_ROLLOUT
(Period_ID ASC,ContractNo ASC,ContractLine ASC)


ROT_DIM_CONTRACT_LINE has 131,747 records
ROT_FACT_REV_ROLLOUT has 3,971,369 records

The process I am doing is two fold:
1) Joining the data for complete list
2) Splitting the data by date (Period_ID) into two catagories based on a
threshold date.
so if my threshold is Sep 2008 (200809) I want all records after Sep
2008 to be displayed AND
I want a total column for all records prior to Sep 2008.

No case statements in SQLite so two subqueries does the job:

> SEP 2008
select * from
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt,
sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
 cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id

<= SEP 2008
SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
   (cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
   rr.period_id <= 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site) ncritd on
   (fut.Customer = ncritd.Customer and
fut.PrincipalContractNo = ncritd.PrincipalContractNo and
fut.SASContractNo = ncritd.SASContractNo and
fut.BusinessArea = ncritd.BusinessArea and
fut.ProductGroup = ncritd.ProductGroup and
fut.Site = ncritd.Site)
order by
   fut.Customer, fut.PrincipalContractNo, fut.SASContractNo,
fut.BusinessArea, fut.ProductGroup, fut.Site, fut.Period

Ignoring joining for the moment. Each of these queries on there own takes
around 1 minute. If however I create an index to satisfy the group by, say :

create index IDX_ROT_DIM_CONTRACT_LINE_2 on ROT_DIM_CONTRACT_LINE (Customer
ASC, PrincipalContractNo ASC, SASContract ASC, BusinessArea ASC,
ProductGroup ASC, Site ASC)
create index IDX_ROT_DIM_CONTRACT_LINE_3 on ROT_DIM_CONTRACT_LINE
(ContractNo ASC, ContractLine ASC, Customer ASC, PrincipalContractNo ASC,
SASContract ASC, BusinessArea ASC, ProductGroup ASC, Site ASC)
analyze

The query will run for 13 muntes + (killed after 13 minutes last time).
Explain Query plan confirms the use of IDX_ROT_DIM_CONTRACT_LINE_2

Now back to joining, without any extra indices and an empty sqlite_Stat1
table the following query performs terribly (in my opinion). Its simply a
join between the above two subqueries:

select * from
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id as period, sum(cl.NCRAmt) as NCRAmt,
sum(rr.revenue_amount) as RevenueAmt
FROM
  ROT_DIM_CONTRACT_LINE cl join
  ROT_FACT_REV_ROLLOUT rr on
(cl.ContractNo = rr.ContractNo and
 cl.ContractLine = rr.ContractLine)
where
rr.period_id > 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site, rr.period_id) fut join
(SELECT cl.Customer as Customer, cl.PrincipalContractNo as
PrincipalContractNo, SASContract as SASContractNo, cl.BusinessArea,
cl.ProductGroup, cl.Site, sum(cl.NCRAmt) as NCRAmt
FROM
ROT_DIM_CONTRACT_LINE cl join
ROT_FACT_REV_ROLLOUT rr on
   (cl.ContractNo = rr.ContractNo and
cl.ContractLine = rr.ContractLine)
where
   rr.period_id <= 200809
group by
   cl.Customer, PrincipalContractNo, SASContract, cl.BusinessArea,
cl.ProductGroup, cl.Site) ncritd on
   (fut.Customer = ncritd.Customer and
fut.PrincipalContractNo = ncritd.PrincipalContractNo and
fut.SASContractNo = ncritd.SASContractNo and
fut.BusinessArea = ncritd.BusinessArea and
fut.ProductGroup = ncritd.ProductGroup and
fut.Site = ncritd

Re: [sqlite] Performance Problems with joining and subqueries

2008-10-23 Thread Igor Tandetnik
"Da Martian" <[EMAIL PROTECTED]> wrote
in message
news:[EMAIL PROTECTED]
> Does anyone have any ideas on how to optimise this type of process in
> SQLite?

What type of process? Show your tables and your query.

Igor Tandetnik



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Performance Problems with joining and subqueries

2008-10-23 Thread Da Martian
Hi

I have two queries. Each returns a full result set in around 1 minute when
run on there own. Making each of these queries a subquery and joining them
with an inner join, and the query takes more than 10 minutes. I cancelled it
after 10 minutes.

Does anyone have any ideas on how to optimise this type of process in
SQLite?

Indices only make things worse. I have read through most of the threads on
this and it still seems unclear as to when and why indices chosen seem to be
worse.

Analyse always seems to make things worse. So does adding indices.

I have a two tables:  one wide and low on records and one thin and having
around 4 million records. Withonly UNIQUE PK indices, the queries complete
in around 1 minute usually.

With an index that satisfies the group by, it takes over 10 minutes and
still doesnt complete. Analsye makes sure it uses the group by index as
well. To get best performance I seem to have to clear all indices and the
sqlite_stat1 table.


So I am at a loss as to how to take two 1 minute queriues and join them in
an efficient manner as subqueries.

If I make the first subquery into a temp table then do join the temp table
to the second query, it completes in 1.5 minutes.

I would expect the same performance from subqueries.

Thanksm
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users