Re: [sqlite] The performance of indexed select

2018-01-07 Thread Nick
Thank you Keith for your useful advice. I am considering to organize the
columns based on BCNF.

I guess that table t3 is needed to remove functional dependency, which means
I should use table t2 and t3 instead of one table t2 with 4 columns a-d. Is
that right?

I am not familiar with the concept BCNF, and I want to make sure that if it
is recommended to create my tables in the way you wrote.

Thanks



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-07 Thread Dinu
... and the downside that it's just linear overhead for i.e. an unique index,
it works best for indexes with low cardinality... win some, lose some :)



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-07 Thread Dinu
What I do notice reading https://www.sqlite.org/fileformat.html (if I get it
right) is that index lines are in the form (for an index on a,b,c ie):

   
   
   

Whereas they could be represented as:

   [ , ,  ](3)

whith [pk_list] being a subtree; reverse lookup from table record to index
record would be arguably as fast if not faster. So this would have the
advantage of more compact indexes (less data), having an index line count
(no prefix so there is always just 1 update involved), with the downside of
the complexity of an added level of indirection.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Dinu
I think storing index prefix counts would only make sense in a special kind
of 'statistical' index, where you would store count(x IS NOT NULL), sum(x),
sum(x^2) so that usual statistical functions can be computed optimally.

For a table count, I think it would make sense.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Keith Medcalf

If I understand your question correctly you have not normalized your data.  The 
whole point of a RELATIONAL DATABASE is that the relationships are based ON THE 
DATA and ONLY ON THE DATA.  If you have not normalized you data to at least 
BCNF you can expect terrible performance and all sorts of access and update 
anomalies.

https://en.wikipedia.org/wiki/Database_normalization

Your schema, assuming that c is unique text should be something like (add the 
COLLATE NOCASE if it is not case sensitive):

create table t3
(
  d integer primary key,
  c text [collate nocase] unique
);
create table t2
(
  a INTEGER,
  b INTEGER,
  d INTEGER REFERENCES t3
);
create index t2d on t2 (d);
+ whatever indexes you need on a and b ...

When you add a new c, you insert it into t3 and then find out what the 
last_rowid (d) was so you can insert a and b and d in t2.  Or if the c already 
exists, then you lookup d and use a, b, d to insert into t2.

populate it with data, and run ANALYZE;

select a,b,c from t3, t2
 where t2.d = t3.d
   and (your other conditions on a b and c)

Then let the query planner decide how to compute the answer ...

---
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.


>-Original Message-
>From: sqlite-users [mailto:sqlite-users-
>boun...@mailinglists.sqlite.org] On Behalf Of Nick
>Sent: Friday, 5 January, 2018 20:32
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] The performance of indexed select
>
>I am trying to analysis the performance of indexed select.
>
>CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT);
>CREATE INDEX t2c ON t2(c);
>
>I think there may be much more leaf index b-tree pages whose header
>is
>'0x0A' if the length of the content of index key 'c' is always 20-25
>bytes,
>as I notice the format of index inside sqlite consist of the index
>key and
>rowid.
>
>I can establish mapping relation between column 'c' and a new INTEGER
>column
>'d'. Then I am wondering if it is reasonable to create new index
>t2(d) to
>get a better performance, as sqlite stores INTEGER in a variable-
>length way
>which means there will be less index pages.
>
>So if it is correct that the performance of indexed select is up to
>the
>number of index pages which is fetched in getPageNormal() within the
>select?
>I think it has positive correlation but I do not know if it is the
>major
>constraint.
>
>And does sqlite have a profile tool to get call tree or execution
>time of
>each functions? All I know is VDBE_PROFILE.
>
>Thanks for any light you can shed.
>
>
>I want to profile sqlite
>
>
>
>--
>Sent from: http://sqlite.1065341.n5.nabble.com/
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Simon Slavin
On 6 Jan 2018, at 8:42pm, Dinu  wrote:

> Richard Hipp-3 wrote
>> all the parent b-tree pages must be updated
> 
> Yup, no question about it, at best it could be an opt-in. But as it is a
> design decision, I checked to make sure count() really is O(n) as Jonathan's
> question implied.

It would be possible instead just to keep a count of the total number of rows 
in each table.   The count could be kept with the table header, or in a new 
column of the table structure as the current autoincrement counter



Just like the discussion up to this point, it would be very difficult to figure 
out, for the average user, whether doing this either way would make things 
significantly faster or slower.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Dinu
Richard Hipp-3 wrote
> all the parent b-tree pages must be updated

Yup, no question about it, at best it could be an opt-in. But as it is a
design decision, I checked to make sure count() really is O(n) as Jonathan's
question implied.



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Richard Hipp
On 1/6/18, Dinu  wrote:
>
> I think b-trees can store the counts of descendant nodes for every node to
> solve this issue in O(log n), but I don't see anything like it in the SQLite
> format.

They can do that, but it also means that all the parent b-tree pages
must be updated whenever an entry is added or removed from a leaf
page, which increases the amount of I/O needed for operations like
INSERT, DELETE, or UPDATE.  It seemed to me that INSERT, DELETE, and
UPDATE are rather more common than SELECT count(*),  so I decided not
to keep a child count in the b-tree pages when I first designed the
current b-tree format for SQLite.  back in 2003.
-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Dinu
Clemens Ladisch wrote
> For count(*), the database does not need the actual table rows.

I think this is not true, he has a point here: SELECT COUNT(*) WHERE
=? needs to examine every index key prefix (excluding at least
ROWID) that matches. This may mean reading in the whole index.

I think b-trees can store the counts of descendant nodes for every node to
solve this issue in O(log n), but I don't see anything like it in the SQLite
format.




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-06 Thread Clemens Ladisch
Nick wrote:
> Or in another word, if a TEXT column has similar meaning with an INTEGER
> column in my applications,(such as use userID instead of userName, still the
> way that the data works in my head:) ) is it recommended to use INTEGER one
> in order to get a less index pages?

Yes; an index on a smaller column needs less space, so it's faster to load
from disk.

But you should always measure how much difference it actually makes.

>> For instance, once SQLite has found the right entry in the index it might
>> need to look up that entry in the table to retrieve values which are not
>> in the index.
>
> I understand the execution process you said. And in my opinion, sqlite
> should fetch pages when looking up the entry both in the index and then in
> the table. But I only found pages with '0x0A' and '0x02' when
> getPageNormal() is called during the time running select SQL. Could you give
> me any advises to find the code when sqlite fetching the '0x0D' pages?

For count(*), the database does not need the actual table rows.
You'd have to run a query that reads some other column:

  SELECT sum(a) FROM t2 WHERE d = xx;


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Simon Slavin


On 6 Jan 2018, at 6:41am, Nick  wrote:

> I find it is indeed faster than t2(c).

If you want to know which is the best index, create all the indexes you think 
might be good, run ANALYZE, then use

EXPLAIN QUERY PLAN SELECT (rest of SELECT statement here)

and see which index SQLite chooses to use.  Then you can delete the index(es) 
it didn’t choose.

Note that this gives accurate results only if your tables have convincing data 
in.  It’s not pointless, but nowhere near as accurate, if you're running it on 
a test database with 10 entries.

> Or in another word, if a TEXT column has similar meaning with an INTEGER
> column in my applications,(such as use userID instead of userName, still the
> way that the data works in my head:) ) is it recommended to use INTEGER one
> in order to get a less index pages?  

The correct way to arrange your data is to have userID everywhere you don’t 
need to know the actual name.  UserID can always remain the same but people 
change their names.  You wouldn’t want to have to go update your invoice table 
every time someone got married, would you ?

When you need to print out your invoices showing the actual name, that’s what 
JOIN is for.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Nick
Some simple SQLs:
SELECT count(*) FROM t2 WHERE c = xx; (or d = xx) 



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Nick
Thank you Simon. 

But I am still uncertain if it is a good way to replace column 'c'. 

CREATE TABLE t2(a INTEGER, b INTEGER, d INTEGER); 
or:
CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT, d INTEGER); 
and then
CREATE INDEX t2d ON t2(d);
SELECT count(*) FROM t2 WHERE d = xx;

I find it is indeed faster than t2(c). 

Or in another word, if a TEXT column has similar meaning with an INTEGER
column in my applications,(such as use userID instead of userName, still the
way that the data works in my head:) ) is it recommended to use INTEGER one
in order to get a less index pages?  


One more small question:
> For instance, once SQLite has found the right entry in the index it might
> need to look up that entry in the table to retrieve values which are not
> in the index.

I understand the execution process you said. And in my opinion, sqlite
should fetch pages when looking up the entry both in the index and then in
the table. But I only found pages with '0x0A' and '0x02' when
getPageNormal() is called during the time running select SQL. Could you give
me any advises to find the code when sqlite fetching the '0x0D' pages? 

Thanks.




--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Clemens Ladisch
Nick wrote:
>I am trying to analysis the performance of indexed select. 
>
>CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT);
>CREATE INDEX t2c ON t2(c); 

Show the query that you are trying to analyze.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The performance of indexed select

2018-01-05 Thread Simon Slavin
On 6 Jan 2018, at 3:32am, Nick  wrote:

> I think there may be much more leaf index b-tree pages whose header is
> '0x0A' if the length of the content of index key 'c' is always 20-25 bytes,
> as I notice the format of index inside sqlite consist of the index key and
> rowid.

You’re overthinking it.  Establish your tables and indexes according to however 
the data works in your head.  Insert some plausible data into the tables. The 
more this data is like a fully-populated production database, the better.  Then 
run the SQL command "ANALYZE".

See if the results are fast enough for your intended purposes.  They probably 
will be.  Only if they’re not, start worrying about optimization.

Try to make SQL serve the way you want to organise your data, not the other way 
around.

> So if it is correct that the performance of indexed select is up to the
> number of index pages which is fetched in getPageNormal() within the select?
> I think it has positive correlation but I do not know if it is the major
> constraint.

More complicated than that.  For instance, once SQLite has found the right 
entry in the index it might need to look up that entry in the table to retrieve 
values which are not in the index.  So it might be better to make an index 
which contains all those values (called a "covering index").  But it might not, 
because that will make the index bigger, and will mean that your computer has 
to do a lot more disk access while searching the index.

> And does sqlite have a profile tool to get call tree or execution time of
> each functions? All I know is VDBE_PROFILE.

No.  But the shell tool has a timer which can closely time the execution of any 
command.  And that is a far more reliable way of knowing what will take longer 
or shorter in the real world than timing individual calls.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] The performance of indexed select

2018-01-05 Thread Nick
I am trying to analysis the performance of indexed select. 

CREATE TABLE t2(a INTEGER, b INTEGER, c TEXT);
CREATE INDEX t2c ON t2(c); 

I think there may be much more leaf index b-tree pages whose header is
'0x0A' if the length of the content of index key 'c' is always 20-25 bytes,
as I notice the format of index inside sqlite consist of the index key and
rowid.

I can establish mapping relation between column 'c' and a new INTEGER column
'd'. Then I am wondering if it is reasonable to create new index t2(d) to
get a better performance, as sqlite stores INTEGER in a variable-length way
which means there will be less index pages. 

So if it is correct that the performance of indexed select is up to the
number of index pages which is fetched in getPageNormal() within the select?
I think it has positive correlation but I do not know if it is the major
constraint. 

And does sqlite have a profile tool to get call tree or execution time of
each functions? All I know is VDBE_PROFILE. 

Thanks for any light you can shed.


I want to profile sqlite



--
Sent from: http://sqlite.1065341.n5.nabble.com/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users