Re: [sqlite] Improving performance of GROUP BY

Udi Karni Sun, 17 Jun 2012 10:03:08 -0700

Thanks. There are no indexes of any kind because I would need many and they
take very long to create.


I am experimenting with various GROUP-BY scenarios like the following where
all the work is done in TEMP.

When TEMP is on disk - it finishes but is slower - when it's in memory -
it's faster but I sometime get an "out of memory" error.

I didn't completely understand the ANALYZE output. Is there a way to get it
to estimate how many MB/GB of TEMP a SQL statement would need to
successfully complete?

  CREATE TABLE X
      AS
  SELECT
         COL1,
         COL2,
         COUNT (*) CNT
    FROM
         300K_ROW_TABLE X,
         100M_ROW_TABLE Y
   WHERE
         Y.YYYYMM BETWEEN X.YYYYMM_START  AND  X.YYYYMM_END
GROUP BY
         COL1,
         COL2
  HAVING
         COUNT (*) = 24
/


On Tue, May 29, 2012 at 3:04 PM, Nico Williams <[email protected]>wrote:

> On Fri, May 25, 2012 at 1:38 PM, Udi Karni <[email protected]> wrote:
> > I am running the following query -
> >
> > CREATE TABLE XYZ AS
> > SELECT ID, MIN (DATE)
> > FROM SOURCE-TABLE
> > WHERE CRITERIA
> > GROUP BY ID ;
> >
> > SOURCE-TABLE has 600 million rows, 2 million meet the WHERE criteria, and
> > get grouped to 100,000 distinct IDs. There are no indexes as there is too
> > much variety in the WHERE clauses and index creation on this table takes
> > many hours. Temp_store=memory. When executed - the source table is read
> > very slowly, and runtime is 1/2 hour.
> >
> > I split this SQL into 2 steps - the first just extracting - the second
> just
> > grouping. The extract read the table extremely fast and finished in 3
> > minutes. The GROUP BY ran in a few seconds. So a 10X improvement - which
> is
> > pretty incredible speed.
>
> Meaning?  Something like this:
>
> CREATE TABLE FOO AS SELECT ID, DATE FROM SOURCE_TABLE;
> CREATE TABLE BAR AS SELECT ID, MIN(DATE) FROM FOO GROUP BY ID;
>
> ?
>
> Are you sure that there are no indexes?  Maybe there are indexes
> implied by UNIQUE constraints and maybe SQLite3 is incorrectly
> deciding to scan the index instead of the table?
>
> My guess is that you have a UNIQUE constraint on ID in SOURCE_TABLE
> and SQLite3 is scanning the index and then fetching the DATE from the
> table.  But this makes no sense because then it makes no sense to
> select the min() of DATE as there would be a single row per-ID.  My
> guess has to be wrong.  Maybe there's a UNIQUE constraint on several
> columns including ID?  That would not be incompatible with the
> semantics of your query.
>
> Can you post EXPLAIN QUERY PLAN output for your statements?  And/or your
> schema.
>
> Nico
> --
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Improving performance of GROUP BY

Reply via email to