Re: [sqlite] help with a complicated join of two tables
On Sep 14, 2011, at 8:40 PM, Igor Tandetnik wrote: > Think about it this way. You have a phone book, where names are sorted by > last name, then first name. You want to find all people whose last name is > greater than 'Smith' and first name less than 'John'. The alphabetic order > helps you with the first half, but not really with the second half - names > satisfying both conditions don't appear sequentially in the list. Ah, yes, I wish SQLite supported bitmap indices :)) http://en.wikipedia.org/wiki/Bitmap_index Bitmap Index vs. B-tree Index: Which and When? http://www.oracle.com/technetwork/articles/sharma-indexes-093638.html ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
On 9/14/2011 2:07 PM, Jan Hudec wrote: On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote: select geo.id, min_age, max_age, age_bottom, age_top, name, color >from geo left join intervals i on i.id = ( select id from intervals where age_bottom>= (select age_bottom from intervals where name = geo.max_age) and age_top<= (select age_top from intervals where name = geo.min_age) order by (age_bottom - age_top) limit 1 ); Looking at the query I'd say the needed indices are: create index intervals_name on intervals (name) create index intervals_ages on intervals (age_bottom, age_top) separate indices on age_bottom and age_top are much less useful to this query. I don't believe an index on intervals(age_bottom, age_top) can be used here. Or rather, it can be, but really only one half of it, to satisfy age_bottom>=X condition. The other half, age_top<=Y, needs to be satisfied with a linear scan. That's why I said that an index on intervals(age_bottom) would help, or one on intervals(age_top), but not both at the same time. Think about it this way. You have a phone book, where names are sorted by last name, then first name. You want to find all people whose last name is greater than 'Smith' and first name less than 'John'. The alphabetic order helps you with the first half, but not really with the second half - names satisfying both conditions don't appear sequentially in the list. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
On Mon, Sep 12, 2011 at 12:16:55 -0400, Igor Tandetnik wrote: > On 9/12/2011 12:02 PM, Mr. Puneet Kishor wrote: > > > >On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote: > >>Something like this: > >> > >>select geo.id, min_age, max_age, age_bottom, age_top, name, color > >>from geo left join intervals i on i.id = ( > >>select id from intervals > >>where age_bottom>= > >>(select age_bottom from intervals where name = geo.max_age) > >>and age_top<= > >>(select age_top from intervals where name = geo.min_age) > >>order by (age_bottom - age_top) limit 1 > >>); > >> > > > > > >Thanks Igor. The above does work and produces the correct result. The > >query speed, however, is pretty slow ~ 75 seconds. So, I created > >indexes on intervals.name, geo.max_age, and geo.min_age, and that > >brought the query time to ~ 11 seconds. Still too slow. > > Indexes on geo.max_age and min_age are unlikely to help with this > query (use EXPLAIN QUERY PLAN to see which indexes are actually > used). An index on intervals.age_bottom might. So would an index on > intervals.age_top (but not both at the same time). Looking at the query I'd say the needed indices are: create index intervals_name on intervals (name) create index intervals_ages on intervals (age_bottom, age_top) separate indices on age_bottom and age_top are much less useful to this query. -- Jan 'Bulb' Hudec___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
On 9/12/2011 12:02 PM, Mr. Puneet Kishor wrote: On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote: Something like this: select geo.id, min_age, max_age, age_bottom, age_top, name, color from geo left join intervals i on i.id = ( select id from intervals where age_bottom>= (select age_bottom from intervals where name = geo.max_age) and age_top<= (select age_top from intervals where name = geo.min_age) order by (age_bottom - age_top) limit 1 ); Thanks Igor. The above does work and produces the correct result. The query speed, however, is pretty slow ~ 75 seconds. So, I created indexes on intervals.name, geo.max_age, and geo.min_age, and that brought the query time to ~ 11 seconds. Still too slow. Indexes on geo.max_age and min_age are unlikely to help with this query (use EXPLAIN QUERY PLAN to see which indexes are actually used). An index on intervals.age_bottom might. So would an index on intervals.age_top (but not both at the same time). -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
On Sep 12, 2011, at 6:51 AM, Igor Tandetnik wrote: > Mr. Puneet Kishorwrote: The table geo can also have rows with min_age = max_age. I want a result set with geo.id, min_age, max_age, age_bottom, age_top, name, color like so: - every row should be for one and only one geo record. I have 39K rows in "geo" table, so the result set should have 39K rows. - when min_age = max_age, list the corresponding intervals.name and color - when min_age != max_age, find the interval with the smallest different between age_bottom and age_top that would span min_age and max_age of geo. In other words, the interval whose age_bottom is bigger than the age_bottom of the max_age and whose age_top is smaller than the age_top of the min_age. > > Something like this: > > select geo.id, min_age, max_age, age_bottom, age_top, name, color > from geo left join intervals i on i.id = ( >select id from intervals >where age_bottom >= >(select age_bottom from intervals where name = geo.max_age) >and age_top <= >(select age_top from intervals where name = geo.min_age) >order by (age_bottom - age_top) limit 1 > ); > Thanks Igor. The above does work and produces the correct result. The query speed, however, is pretty slow ~ 75 seconds. So, I created indexes on intervals.name, geo.max_age, and geo.min_age, and that brought the query time to ~ 11 seconds. Still too slow. So, I created an intermediate table to hold the results, and the speed is more satisfactory. Many thanks again. Puneet. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
Mr. Puneet Kishorwrote: >>> The table geo can also have rows with min_age = max_age. I want a result >>> set with geo.id, min_age, max_age, age_bottom, age_top, >>> name, color like so: >>> >>> - every row should be for one and only one geo record. I have 39K rows in >>> "geo" table, so the result set should have 39K rows. >>> >>> - when min_age = max_age, list the corresponding intervals.name and color >>> >>> - when min_age != max_age, find the interval with the smallest different >>> between age_bottom and age_top that would span min_age >>> and max_age of geo. In other words, the interval whose age_bottom is bigger >>> than the age_bottom of the max_age and whose age_top >>> is smaller than the age_top of the min_age. Something like this: select geo.id, min_age, max_age, age_bottom, age_top, name, color from geo left join intervals i on i.id = ( select id from intervals where age_bottom >= (select age_bottom from intervals where name = geo.max_age) and age_top <= (select age_top from intervals where name = geo.min_age) order by (age_bottom - age_top) limit 1 ); -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
On Sep 11, 2011, at 9:58 PM, Igor Tandetnik wrote: > Mr. Puneet Kishorwrote: >> geo table: 39K rows >> id max_age min_age >> --- --- >> 1 Holocene Holocene >> 5 Cambrian Silurian >> 12 Cambrian Ordovician >> 229 Cretaceous Quaternary >> >> intervals table: ~450 rows >> id age_bottom age_top name color >> --- -- --- --- --- >> 3 0.0117 0. Holocene #FEF2E0 >> 105 443.7000 416. Silurian #B3E1B6 >> 112 488.3000 443.7000 Ordovician #009270 >> 421 2.5880 0. Quaternary #F9F97F >> 122 542. 488.3000 Cambrian #7FA056 >> 33 145.5000 65.5000 Cretaceous #7FC64E >> >> Keep in mind, max_age is older than min_age, and age_bottom is older than >> age_top. >> >> The table geo can also have rows with min_age = max_age. I want a result set >> with geo.id, min_age, max_age, age_bottom, age_top, >> name, color like so: >> >> - every row should be for one and only one geo record. I have 39K rows in >> "geo" table, so the result set should have 39K rows. >> >> - when min_age = max_age, list the corresponding intervals.name and color >> >> - when min_age != max_age, find the interval with the smallest different >> between age_bottom and age_top that would span min_age >> and max_age of geo. In other words, the interval whose age_bottom is bigger >> than the age_bottom of the max_age and whose age_top >> is smaller than the age_top of the min_age. > > I'm not sure I understand. Let's take geo.id = 5, max_age=Cambrian, > min_age=Silurian. You say you want a record whose age_bottom is greater than > that corresponding to Cambrian, that is 542.; and whose age_top is > smaller than that corresponding to Silurian, or 416.. I don't seem to see > any such record in your example. Hi Igor, I muddied the issue by providing limited sample data that doesn't have an answer that fits my needs. Needless to say, in the complete tables there would be answers. My (slightly simplified) query below shows the kind of result I want SELECT g.id, max_age, min_age, Min(i.age_bottom - i.age_top) age_range, i.interval_name, i.interval_color FROM geo g LEFT JOIN intervals i_max ON g.max_age = i_max.interval_name LEFT JOIN intervals i_min ON g.min_age = i_min.interval_name JOIN intervals i ON i.age_bottom >= i_max.age_bottom AND i.age_top <= i_min.age_top GROUP BY g.gid, g.max_age, g.min_age, i.interval_name, i.interval_color ORDER BY g.gid, age_range; produces the following --- g.idmax_age min_age age_range interval_name interval_color --- --- --- --- --- -- 1 Paleozoic Paleozoic 291.Paleozoic #99C08D 1 Paleozoic Paleozoic 542.Phanerozoic #9AD9DD 4 Precambrian Precambrian 3458. Precambrian #F04370 5 CambrianSilurian291.Paleozoic #99C08D 5 CambrianSilurian542.Phanerozoic #9AD9DD 6 SilurianSilurian27.7000 Silurian#B3E1B6 6 SilurianSilurian291.Paleozoic #99C08D .. 94K+ rows What I want from the above table is only the first row of each g.id group because that has the interval that has the smallest "age_range" that spans the "max_age" and "min_age" Sorry, I am not able to articulate this more clearly, but I hope the above example makes things clearer. Puneet. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] help with a complicated join of two tables
Mr. Puneet Kishorwrote: > geo table: 39K rows > id max_age min_age > --- --- > 1 Holocene Holocene > 5 Cambrian Silurian > 12 Cambrian Ordovician > 229 Cretaceous Quaternary > > intervals table: ~450 rows > id age_bottom age_top name color > --- -- --- --- --- > 3 0.0117 0. Holocene #FEF2E0 > 105 443.7000 416. Silurian #B3E1B6 > 112 488.3000 443.7000 Ordovician #009270 > 421 2.5880 0. Quaternary #F9F97F > 122 542. 488.3000 Cambrian #7FA056 > 33 145.5000 65.5000 Cretaceous #7FC64E > > Keep in mind, max_age is older than min_age, and age_bottom is older than > age_top. > > The table geo can also have rows with min_age = max_age. I want a result set > with geo.id, min_age, max_age, age_bottom, age_top, > name, color like so: > > - every row should be for one and only one geo record. I have 39K rows in > "geo" table, so the result set should have 39K rows. > > - when min_age = max_age, list the corresponding intervals.name and color > > - when min_age != max_age, find the interval with the smallest different > between age_bottom and age_top that would span min_age > and max_age of geo. In other words, the interval whose age_bottom is bigger > than the age_bottom of the max_age and whose age_top > is smaller than the age_top of the min_age. I'm not sure I understand. Let's take geo.id = 5, max_age=Cambrian, min_age=Silurian. You say you want a record whose age_bottom is greater than that corresponding to Cambrian, that is 542.; and whose age_top is smaller than that corresponding to Silurian, or 416.. I don't seem to see any such record in your example. -- Igor Tandetnik ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users