Re: [sqlite] help with a complicated join of two tables

2011-09-14 Thread Petite Abeille

On Sep 14, 2011, at 8:40 PM, Igor Tandetnik wrote:

> Think about it this way. You have a phone book, where names are sorted by 
> last name, then first name. You want to find all people whose last name is 
> greater than 'Smith' and first name less than 'John'. The alphabetic order 
> helps you with the first half, but not really with the second half - names 
> satisfying both conditions don't appear sequentially in the list.

Ah, yes, I wish SQLite supported bitmap indices :))

http://en.wikipedia.org/wiki/Bitmap_index

Bitmap Index vs. B-tree Index: Which and When?
http://www.oracle.com/technetwork/articles/sharma-indexes-093638.html
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-14 Thread Igor Tandetnik

On 9/14/2011 2:07 PM, Jan Hudec wrote:

On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:

select geo.id, min_age, max_age, age_bottom, age_top, name, color

>from geo left join intervals i on i.id = (

select id from intervals
where age_bottom>=
(select age_bottom from intervals where name = geo.max_age)
and age_top<=
(select age_top from intervals where name = geo.min_age)
order by (age_bottom - age_top) limit 1
);


Looking at the query I'd say the needed indices are:

  create index intervals_name on intervals (name)
  create index intervals_ages on intervals (age_bottom, age_top)

separate indices on age_bottom and age_top are much less useful to this
query.


I don't believe an index on intervals(age_bottom, age_top) can be used 
here. Or rather, it can be, but really only one half of it, to satisfy 
age_bottom>=X condition. The other half, age_top<=Y, needs to be 
satisfied with a linear scan. That's why I said that an index on 
intervals(age_bottom) would help, or one on intervals(age_top), but not 
both at the same time.


Think about it this way. You have a phone book, where names are sorted 
by last name, then first name. You want to find all people whose last 
name is greater than 'Smith' and first name less than 'John'. The 
alphabetic order helps you with the first half, but not really with the 
second half - names satisfying both conditions don't appear sequentially 
in the list.

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-14 Thread Jan Hudec
On Mon, Sep 12, 2011 at 12:16:55 -0400, Igor Tandetnik wrote:
> On 9/12/2011 12:02 PM, Mr. Puneet Kishor wrote:
> >
> >On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:
> >>Something like this:
> >>
> >>select geo.id, min_age, max_age, age_bottom, age_top, name, color
> >>from geo left join intervals i on i.id = (
> >>select id from intervals
> >>where age_bottom>=
> >>(select age_bottom from intervals where name = geo.max_age)
> >>and age_top<=
> >>(select age_top from intervals where name = geo.min_age)
> >>order by (age_bottom - age_top) limit 1
> >>);
> >>
> >
> >
> >Thanks Igor. The above does work and produces the correct result. The
> >query speed, however, is pretty slow ~ 75 seconds. So, I created
> >indexes on intervals.name, geo.max_age, and geo.min_age, and that
> >brought the query time to ~ 11 seconds. Still too slow.
> 
> Indexes on geo.max_age and min_age are unlikely to help with this
> query (use EXPLAIN QUERY PLAN to see which indexes are actually
> used). An index on intervals.age_bottom might. So would an index on
> intervals.age_top (but not both at the same time).

Looking at the query I'd say the needed indices are:

 create index intervals_name on intervals (name)
 create index intervals_ages on intervals (age_bottom, age_top)

separate indices on age_bottom and age_top are much less useful to this
query.

-- 
 Jan 'Bulb' Hudec 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-12 Thread Igor Tandetnik

On 9/12/2011 12:02 PM, Mr. Puneet Kishor wrote:


On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:

Something like this:

select geo.id, min_age, max_age, age_bottom, age_top, name, color
from geo left join intervals i on i.id = (
select id from intervals
where age_bottom>=
(select age_bottom from intervals where name = geo.max_age)
and age_top<=
(select age_top from intervals where name = geo.min_age)
order by (age_bottom - age_top) limit 1
);




Thanks Igor. The above does work and produces the correct result. The
query speed, however, is pretty slow ~ 75 seconds. So, I created
indexes on intervals.name, geo.max_age, and geo.min_age, and that
brought the query time to ~ 11 seconds. Still too slow.


Indexes on geo.max_age and min_age are unlikely to help with this query 
(use EXPLAIN QUERY PLAN to see which indexes are actually used). An 
index on intervals.age_bottom might. So would an index on 
intervals.age_top (but not both at the same time).

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-12 Thread Mr . Puneet Kishor

On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote:

> Mr. Puneet Kishor  wrote:
 The table geo can also have rows with min_age = max_age. I want a result 
 set with geo.id, min_age, max_age, age_bottom, age_top,
 name, color like so:
 
 - every row should be for one and only one geo record. I have 39K rows in 
 "geo" table, so the result set should have 39K rows.
 
 - when min_age = max_age, list the corresponding intervals.name and color
 
 - when min_age != max_age, find the interval with the smallest different 
 between age_bottom and age_top that would span min_age
 and max_age of geo. In other words, the interval whose age_bottom is 
 bigger than the age_bottom of the max_age and whose age_top
 is smaller than the age_top of the min_age.
> 
> Something like this:
> 
> select geo.id, min_age, max_age, age_bottom, age_top, name, color
> from geo left join intervals i on i.id = (
>select id from intervals
>where age_bottom >=
>(select age_bottom from intervals where name = geo.max_age)
>and age_top <= 
>(select age_top from intervals where name = geo.min_age)
>order by (age_bottom - age_top) limit 1
> );
> 


Thanks Igor. The above does work and produces the correct result. The query 
speed, however, is pretty slow ~ 75 seconds. So, I created indexes on 
intervals.name, geo.max_age, and geo.min_age, and that brought the query time 
to ~ 11 seconds. Still too slow. So, I created an intermediate table to hold 
the results, and the speed is more satisfactory.

Many thanks again.

Puneet.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-12 Thread Igor Tandetnik
Mr. Puneet Kishor  wrote:
>>> The table geo can also have rows with min_age = max_age. I want a result 
>>> set with geo.id, min_age, max_age, age_bottom, age_top,
>>> name, color like so:
>>> 
>>> - every row should be for one and only one geo record. I have 39K rows in 
>>> "geo" table, so the result set should have 39K rows.
>>> 
>>> - when min_age = max_age, list the corresponding intervals.name and color
>>> 
>>> - when min_age != max_age, find the interval with the smallest different 
>>> between age_bottom and age_top that would span min_age
>>> and max_age of geo. In other words, the interval whose age_bottom is bigger 
>>> than the age_bottom of the max_age and whose age_top
>>> is smaller than the age_top of the min_age.

Something like this:

select geo.id, min_age, max_age, age_bottom, age_top, name, color
from geo left join intervals i on i.id = (
select id from intervals
where age_bottom >=
(select age_bottom from intervals where name = geo.max_age)
and age_top <= 
(select age_top from intervals where name = geo.min_age)
order by (age_bottom - age_top) limit 1
);

-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-11 Thread Mr. Puneet Kishor

On Sep 11, 2011, at 9:58 PM, Igor Tandetnik wrote:

> Mr. Puneet Kishor  wrote:
>> geo table: 39K rows
>> id max_age min_age
>> --- --- 
>> 1 Holocene Holocene
>> 5 Cambrian Silurian
>> 12 Cambrian Ordovician
>> 229 Cretaceous Quaternary
>> 
>> intervals table: ~450 rows
>> id age_bottom age_top name color
>> --- --  --- --- ---
>> 3 0.0117 0. Holocene #FEF2E0
>> 105 443.7000 416. Silurian #B3E1B6
>> 112 488.3000 443.7000 Ordovician #009270
>> 421 2.5880 0. Quaternary #F9F97F
>> 122 542. 488.3000 Cambrian #7FA056
>> 33 145.5000 65.5000 Cretaceous #7FC64E
>> 
>> Keep in mind, max_age is older than min_age, and age_bottom is older than 
>> age_top.
>> 
>> The table geo can also have rows with min_age = max_age. I want a result set 
>> with geo.id, min_age, max_age, age_bottom, age_top,
>> name, color like so: 
>> 
>> - every row should be for one and only one geo record. I have 39K rows in 
>> "geo" table, so the result set should have 39K rows.
>> 
>> - when min_age = max_age, list the corresponding intervals.name and color
>> 
>> - when min_age != max_age, find the interval with the smallest different 
>> between age_bottom and age_top that would span min_age
>> and max_age of geo. In other words, the interval whose age_bottom is bigger 
>> than the age_bottom of the max_age and whose age_top
>> is smaller than the age_top of the min_age.  
> 
> I'm not sure I understand. Let's take geo.id = 5, max_age=Cambrian, 
> min_age=Silurian. You say you want a record whose age_bottom is greater than 
> that corresponding to Cambrian, that is 542.; and whose age_top is 
> smaller than that corresponding to Silurian, or 416.. I don't seem to see 
> any such record in your example.


Hi Igor, I muddied the issue by providing limited sample data that doesn't have 
an answer that fits my needs. Needless to say, in the complete tables there 
would be answers. My (slightly simplified) query below shows the kind of result 
I want

SELECT g.id, max_age, min_age, Min(i.age_bottom - i.age_top) age_range, 
i.interval_name, i.interval_color
FROM geo g 
 LEFT JOIN intervals i_max ON g.max_age = i_max.interval_name 
 LEFT JOIN intervals i_min ON g.min_age = i_min.interval_name
 JOIN intervals i ON 
i.age_bottom >= i_max.age_bottom AND 
i.age_top <= i_min.age_top 
GROUP BY g.gid, g.max_age, g.min_age, i.interval_name, i.interval_color
ORDER BY g.gid, age_range;

produces the following ---

g.idmax_age min_age age_range   interval_name   
interval_color
--- --- --- --- --- 
--
1   Paleozoic   Paleozoic   291.Paleozoic   #99C08D
1   Paleozoic   Paleozoic   542.Phanerozoic #9AD9DD
4   Precambrian Precambrian 3458.   Precambrian #F04370
5   CambrianSilurian291.Paleozoic   #99C08D
5   CambrianSilurian542.Phanerozoic #9AD9DD
6   SilurianSilurian27.7000 Silurian#B3E1B6
6   SilurianSilurian291.Paleozoic   #99C08D
..
94K+ rows

What I want from the above table is only the first row of each g.id group 
because that has the interval that has the smallest "age_range" that spans the 
"max_age" and "min_age"

Sorry, I am not able to articulate this more clearly, but I hope the above 
example makes things clearer.

Puneet.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] help with a complicated join of two tables

2011-09-11 Thread Igor Tandetnik
Mr. Puneet Kishor  wrote:
> geo table: 39K rows
> id max_age min_age
> --- --- 
> 1 Holocene Holocene
> 5 Cambrian Silurian
> 12 Cambrian Ordovician
> 229 Cretaceous Quaternary
> 
> intervals table: ~450 rows
> id age_bottom age_top name color
> --- --  --- --- ---
> 3 0.0117 0. Holocene #FEF2E0
> 105 443.7000 416. Silurian #B3E1B6
> 112 488.3000 443.7000 Ordovician #009270
> 421 2.5880 0. Quaternary #F9F97F
> 122 542. 488.3000 Cambrian #7FA056
> 33 145.5000 65.5000 Cretaceous #7FC64E
> 
> Keep in mind, max_age is older than min_age, and age_bottom is older than 
> age_top.
> 
> The table geo can also have rows with min_age = max_age. I want a result set 
> with geo.id, min_age, max_age, age_bottom, age_top,
> name, color like so: 
> 
> - every row should be for one and only one geo record. I have 39K rows in 
> "geo" table, so the result set should have 39K rows.
> 
> - when min_age = max_age, list the corresponding intervals.name and color
> 
> - when min_age != max_age, find the interval with the smallest different 
> between age_bottom and age_top that would span min_age
> and max_age of geo. In other words, the interval whose age_bottom is bigger 
> than the age_bottom of the max_age and whose age_top
> is smaller than the age_top of the min_age.  

I'm not sure I understand. Let's take geo.id = 5, max_age=Cambrian, 
min_age=Silurian. You say you want a record whose age_bottom is greater than 
that corresponding to Cambrian, that is 542.; and whose age_top is smaller 
than that corresponding to Silurian, or 416.. I don't seem to see any such 
record in your example.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users