[sqlite] How would sqlite read this from disk?

2015-10-30 Thread Simon Slavin

On 30 Oct 2015, at 4:22pm, Mark Hamburg  wrote:

> I knew it would dirty the whole page. I was thinking about the memory work 
> and ignoring that the disk I/O swamps that.

As someone who is not part of the development team I can write this.

I can't answer your question, but your pattern of thinking suggests that you're 
trying to optimize SQLite performance in some way.  But the questions you're 
asking suggest that you don't actually know of anything SQLite does in a bad 
way.  So you're choosing the wrong things to criticize.

SQLite is seriously good.  The things it does, and the choices made when it was 
written, have been under test for ten years.  And the developer team are all 
professional computer scientists: they not only have programming experience, 
but they've read all the articles and books you'd expect them to have read.

So I urge you to go out and experiment.  Write some code which uses the API.  
If you come back to us with poor performance in something then we have a proper 
basis for discussion.  But right now you're just pointlessly learning details 
about SQLite -- which you could learn by reading the source code -- and 
mentioning things which are done the way they are for good reasons.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-30 Thread Mark Hamburg
On Oct 30, 2015, at 5:56 AM, Richard Hipp  wrote:

>> Will SQLite rewrite the whole row if you just change field2 from one float
>> to another?
> 
> Yes.  Not just the whole row but the whole page on which that row
> resides.  And even if SQLite did just try to write the 8 bytes that
> changes, your OS and disk controller will both end up writing the
> entire sector and/or track, so it amounts to about the same I/O either
> way.

I knew it would dirty the whole page. I was thinking about the memory work and 
ignoring that the disk I/O swamps that. Will it at least avoid dirtying 
continuation pages if the space before the blob doesn't change?

Mark


[sqlite] How would sqlite read this from disk?

2015-10-30 Thread Richard Hipp
On 10/30/15, Mark Hamburg  wrote:
>
>> On Oct 29, 2015, at 12:24 PM, Richard Hipp  wrote:
>>
>> If you do have large BLOBs or strings, SQLite handles this best if the
>> large blob/string is stored in a table by itself and then linked into
>> other tables using an integer primary key.  For example:
>>
>>   CREATE TABLE BigBlobTab (
>>   blobid INTEGER PRIMARY KEY,
>>   content BLOB -- big BLOB or text field.
>>   );
>>   CREATE TABLE OtherStuff (
>>   field1 VARCHAR(10),
>>   field2 FLOAT,
>>   field3 BOOLEAN,
>>   blobid INT REFERENCES BigBlobTab
>>   );
>
> Will SQLite rewrite the whole row if you just change field2 from one float
> to another?
>

Yes.  Not just the whole row but the whole page on which that row
resides.  And even if SQLite did just try to write the 8 bytes that
changes, your OS and disk controller will both end up writing the
entire sector and/or track, so it amounts to about the same I/O either
way.
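[Editorial note: a minimal sketch of the layout quoted above, using Python's built-in sqlite3 module. The schema is Richard's; the 1 MB zeroblob and the specific values are illustrative assumptions. The point is that the UPDATE rewrites only the page(s) holding the small OtherStuff row, not the blob's pages.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE BigBlobTab (
        blobid INTEGER PRIMARY KEY,
        content BLOB            -- big BLOB or text field
    );
    CREATE TABLE OtherStuff (
        field1 VARCHAR(10),
        field2 FLOAT,
        field3 BOOLEAN,
        blobid INT REFERENCES BigBlobTab
    );
""")
# A 1 MB blob, kept out of the small OtherStuff row.
con.execute("INSERT INTO BigBlobTab VALUES (1, zeroblob(1000000))")
con.execute("INSERT INTO OtherStuff VALUES ('abc', 1.0, 1, 1)")

# Changing field2 rewrites the page holding this small row;
# the blob's pages are left alone.
con.execute("UPDATE OtherStuff SET field2 = 2.0 WHERE blobid = 1")
row = con.execute(
    "SELECT field2, length(content) "
    "FROM OtherStuff JOIN BigBlobTab USING (blobid)").fetchone()
print(row)  # (2.0, 1000000)
```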
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-30 Thread Mark Hamburg

> On Oct 29, 2015, at 12:24 PM, Richard Hipp  wrote:
> 
> If you do have large BLOBs or strings, SQLite handles this best if the
> large blob/string is stored in a table by itself and then linked into
> other tables using an integer primary key.  For example:
> 
>   CREATE TABLE BigBlobTab (
>   blobid INTEGER PRIMARY KEY,
>   content BLOB -- big BLOB or text field.
>   );
>   CREATE TABLE OtherStuff (
>   field1 VARCHAR(10),
>   field2 FLOAT,
>   field3 BOOLEAN,
>   blobid INT REFERENCES BigBlobTab
>   );

Will SQLite rewrite the whole row if you just change field2 from one float to 
another?

Mark




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H

> On 10/29/15, Jason H  wrote:
> >
> > I'm open to ideas, but I was going to use this as an excuse to invent
> > something of a general tool.
> 
> Post your schema and query.  Also run ANALYZE and post the content of
> the sqlite_stat1 table.

I really appreciate the offer, but unfortunately I can't post it, and it's not 
my design. I can alter it, though; however, I'd have to design a process to map 
from the original schema to the altered one every time we get a snapshot. There 
already exists a script which adds indexes on the columns that we join through. 

Perhaps I could write a script that would textually replace the table and 
column names. That will take a little time.

(FWIW, I'm rather proud of my own schemas, and they look nothing like this 
mess.)

Thanks again.



[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
>
> On 10/29/15, Jason H  wrote:
> >>
> > If I were to try to work around...
> 
> Before we go any further, have you actually measured a performance
> problem?  Or are you optimizing without prior knowledge of where your
> application is spending time?

Currently, I have a SQLite database of around 10 GB that takes 25 minutes to 
run a single query, with no other activity (it's never queried read/write, just 
read). I've created indexes as best I can. To get this lower, I'm looking at 
Hadoop, or at using this as an excuse to re-architect something in SQLite. 
Hadoop has many non-ideal attributes. I want to stick with SQLite because 
that's what we know, and my research indicates it's also still the fastest at 
reading. 

For this specific database, we join through about 12 tables on an average query 
(519 tables total, 319 code tables). Most of the tables have over two dozen 
columns, some over 256 columns, max 384. The longest row is 16 KB in total.


I'm open to ideas, but I was going to use this as an excuse to invent something 
of a general tool.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 7:49pm, Jason H  wrote:

> If I were to try to work around all this excess reading, how good or bad 
> would it be to take the following approach:
> Define a set of smaller tables (column count and size) with a common key, 
> then create a view for the specific query, then query the view.
> 
> create table k (id integer primary key, ... other always used columns...);
> create table c1 (k integer, data text); -- seldomly used large column
> create table c2 (k integer, data text); -- seldomly used large column
> create table c3 (k integer, data text); -- seldomly used large column
> ...
> create table c128 (k integer, data text); -- seldomly used large column

Handling large numbers of tables is slow.  Handling large numbers of rows is 
fast.  For cases where you have "seldomly used large column" use sparse storage:

CREATE TABLE sparseColumns (id INTEGER, k INTEGER, value TEXT, PRIMARY KEY 
(id,k));
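[Editorial note: an illustrative sketch of Simon's sparse layout with Python's sqlite3 module. The ids, keys, and values are hypothetical; the point is one row per populated cell instead of one wide row per record.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sparseColumns
               (id INTEGER, k INTEGER, value TEXT,
                PRIMARY KEY (id, k))""")
# One row per populated cell: records that lack a given "column"
# simply have no entry for that k.
con.executemany("INSERT INTO sparseColumns VALUES (?, ?, ?)",
                [(1, 7, "rarely used"), (1, 42, "also rare"),
                 (2, 7, "another")])
# Fetching one cell is an index lookup on (id, k), not a scan
# across wide rows.
v = con.execute("SELECT value FROM sparseColumns "
                "WHERE id = 1 AND k = 42").fetchone()[0]
print(v)  # also rare
```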

Generally speaking SQLite is ridiculously good at this stuff and trying to work 
around it only makes things worse.  It's in its third version, and even version 
3 has been worked on, including major rewrites, for ten years.  It doesn't do 
slow complicated things (row-level locking, character sets).  By all means tell 
us when you get slow results, but if you're trying to predict slow results just 
from reading the documentation you'll probably not get useful data.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 6:22pm, Jason H  wrote:

> Ah, so this is what I seem to have missed. The pages... This is unfortunate 
> as the read-heavy application won't likely benefit from SQLite. Only a small 
> fraction of the row data is used each query, and it seems like It'll have to 
> read the entire row (via the page) each time?

It will read the entire page each time.  But the page is normally the size of a 
disk block, and the read is done in one operation: seek-to-sector-head and a 
read.  It's the quickest possible thing to do and it happens very quickly 
indeed.  It's actually as fast as seek-to-sector and read-a-few-bytes, because 
reading the entire block is how disk subsystems work internally.

Also, because pages are quite big, the page you want is probably already in 
memory.  If you're not doing multi-access stuff it won't need to be re-read.


On 29 Oct 2015, at 6:32pm, Wade, William  wrote:

> if a particular row requires twenty blocks of storage

That happens almost never.  The size of a page is a power of two between 512 
and 65536 inclusive.  When you create a new database, if you know you're going 
to have really long rows you tend to set a big page size.

> and I
> want to read a one-byte field that is in the twentieth block, all twenty
> blocks will be read.

Which is why you generally put small commonly-used columns before long text and 
blobs.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 8:59pm, Jason H  wrote:

> Currently, I have a SQLite database of around 10gig that takes 25 minutes to 
> run a single query against no other activity (it's never queried in 
> read/write, just read). I've created indexes the best I can.

You can do better.  I have a 43 Gigabyte database and I get responses from 
queries including JOINs in milliseconds.  It has four tables and the widest 
table has five columns.

> For this specific database, we join through about 12 tables on an average 
> query (519 tables total, 319 code tables), most of which have over 2 dozen 
> columns, some over 256 columns, max 384. the longest row is 16k in total.

There's your problem.  You need to redesign your schema.  As a general rule, if 
a database has so many tables -- or a table has so many columns -- that you 
can't keep them all in your head at the same time, it's badly-designed.  256 is 
silly.

You don't need a tool.  You need a better schema.

Simon.


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
> On 10/29/15, Jason H  wrote:
> >
> > Ah, so this is what I seem to have missed. The pages... This is unfortunate
> > as the read-heavy application won't likely benefit from SQLite.
> 
> Your filesystem and your disk hardware work the same way.  Your
> application might only ask for 10 bytes, but the underlying OS and
> hardware are pulling a complete sector (or perhaps a complete track)
> from disk, passing the 10 bytes you asked for off to your app, and
> holding the rest in cache.
> 
> So, no, you cannot improve performance by only reading a few bytes
> here and a few bytes there.  The disk has to work just as hard.

First, thanks everyone, I've learned a lot about SQLite3!

You make a very good point here: the OS will be caching disk reads (16-64k from 
what I understand), so each read is cheaper on average, but the total work is 
more expensive if I do not need all the data. I was hoping that those very 
large tables (width in terms of column count and size, not row count) could 
benefit from a database being able to discard fields. If I could ask one more 
question:

If I were to try to work around all this excess reading, how good or bad would 
it be to take the following approach:
Define a set of smaller tables (column count and size) with a common key, then 
create a view for the specific query, then query the view.

create table k (id integer primary key, ... other always used columns...);
create table c1 (k integer, data text); -- seldomly used large column
create table c2 (k integer, data text); -- seldomly used large column
create table c3 (k integer, data text); -- seldomly used large column
...
create table c128 (k integer, data text); -- seldomly used large column

create view myquery as 
select k.id id, c1.data c1, c2.data c2, c3.data c3 from k
left join c1 on c1.k=k.id 
left join c2 on c2.k=k.id 
left join c3 on c3.k=k.id;

select * from myquery;

Shouldn't that skip the data I don't need and make good use of disk caches? 
What I don't know, though, is how efficient SQLite3 will be at putting that 
view together. Did I make a mess giving sqlite3 busy work, or did I speed it up? 
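[Editorial note: one way to answer that empirically rather than by prediction is to ask SQLite for its plan. A sketch with two of the narrow tables, following the naming above; the indexes on the k columns are an added assumption.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE k  (id INTEGER PRIMARY KEY);
    CREATE TABLE c1 (k INTEGER, data TEXT);
    CREATE TABLE c2 (k INTEGER, data TEXT);
    CREATE INDEX c1_k ON c1(k);
    CREATE INDEX c2_k ON c2(k);
    CREATE VIEW myquery AS
        SELECT k.id id, c1.data c1, c2.data c2 FROM k
        LEFT JOIN c1 ON c1.k = k.id
        LEFT JOIN c2 ON c2.k = k.id;
""")
# EXPLAIN QUERY PLAN shows how SQLite flattens the view: one step
# per table, ideally "SEARCH ... USING INDEX" for c1 and c2.
plan = list(con.execute("EXPLAIN QUERY PLAN SELECT * FROM myquery"))
for step in plan:
    print(step)
```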

"Nobody ever made them like this! I mean the architect had to be a certified 
genius or an authentic wacko!" - Dr Ray Stantz, Ghostbusters



[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 8:24 PM, Richard Hipp  wrote:

> On 10/29/15, Dominique Devienne  wrote:
> >
> > I've discussed blobs a few times already :). Especially the fact you
> can't
> > have transactional and incremental updates of blobs (i.e. the transaction
> > "copies" only the blob pages actually modified).
>
> Large blobs are transactional when they are stored in SQLite tables.
>

Even in the context of https://www.sqlite.org/c3ref/blob_write.html ?

Don't "it is not possible to increase the size of a BLOB" and
"Writes to the BLOB that occurred before the BLOB handle expired are not
rolled back"
imply that there is a loss of transactionality?


> Perhaps you are talking about storing BLOBs in separate files and then
> just storing the filename in the SQLite table?
>

No, I was talking about "in-row" blobs. Sorry, I was indeed unclear.

I wish to write blobs incrementally and transactionally *without* that 
implying having SQLite rewrite the whole blob (to maintain transactionality). 
Said differently, I'd like incremental blob updates to be transactional *and* 
efficient, when the amount written is a fraction of the total blob size.

The way Oracle achieves that, AFAIK, is that the row contains a small blob 
that holds the page indexes for the blob, and the blob itself is stored 
"out-of-row" in blob-dedicated pages. When doing an incremental blob update, 
only the blob pages actually written are copied for the transaction, as well 
as the page for the row, with the "blob-index" blob now referencing the new 
blob pages (as well as the same old unmodified blob pages).


> If you do have large BLOBs or strings, SQLite handles this best if the
> large blob/string is stored in a table by itself and then linked into
> other tables using an integer primary key.  For example:
>
>CREATE TABLE BigBlobTab (
>blobid INTEGER PRIMARY KEY,
>content BLOB -- big BLOB or text field.
>);
>CREATE TABLE OtherStuff (
>field1 VARCHAR(10),
>field2 FLOAT,
>field3 BOOLEAN,
>blobid INT REFERENCES BigBlobTab
>);
>

But then there's a lifetime management issue. The blob is an "attribute" of 
the row, and the parent-child relationship is "upside-down". If the "row" 
table merely references the "blob" table, then the row can exist w/o a blob 
(even when it is meant to be a non-null attribute). And conversely, the same 
blob can be used by several rows. Either way, you can't enforce the 1:1 
relationship that a row attribute implicitly enjoys (and which is required in 
some cases).

Plus there's the fact that foreign keys (and ON DELETE CASCADE) are optional, 
so any user of the DB file can break the referential integrity of those files, 
which it can't do if the blob is in-row.


> This technique work similarly to storing BLOBs in separate files, in
> that it does not mix the BLOB content with the rest of the row, making
> the row faster to search and update.  But it has the advantage over
> separate storage that BLOB changes are transactional and all content
> is stored in a single file on disk.
>

I don't want separate storage at all! I'd want my blobs *in* SQLite in fact.
But the above basically makes it too inefficient IMHO.


> This is how the Fossil VCS (https://www.fossil-scm.org/) stores content,
> fwiw.


Fossil stores its "blobs" by SHA-1, and sharing of blobs across 
revisions/branches is in fact a feature. And all those blobs are immutable by 
design. At least that's the impression I got reading the documentation.

SQLite is a wonderful piece of software, and I thank you for it. I'm a big fan. 
But the above is one of my main pet peeves with it :), which is based on my 
limited comprehension of how SQLite works. Please do correct me if I'm wrong.
Thanks, --DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 7:32 PM, Wade, William  wrote:

> From https://www.sqlite.org/fileformat2.html, if a row does not fit into
> a single block, its overflow is stored in a linked list. The implication
> (to me) is that if a particular row requires twenty blocks of storage, and
> I want to read a one-byte field that is in the twentieth block, all twenty
> blocks will be read.
>
> Clearly it would be possible to do better even without switching to a
> column-major layout (instead use a record-internal b-tree mapping
> column-number-for-this-record to page number of the overflow block) but I
> don't see any documented evidence that sqlite does that. For some
> applications it would certainly be nice.
>

But that would require a format change, which the SQLite team is not
willing to make IMHO.


> It would also be nice to get something closer to random access into large
> blobs (and the same technique works there).
>

I've discussed blobs a few times already :). Especially the fact that you 
can't have transactional and incremental updates of blobs (i.e. where the 
transaction "copies" only the blob pages actually modified). This would 
require a blob-index, and again that's a format change. There is a vacuum 
mode that tracks a bit more info about pages, I vaguely remember, with 
special pages, which helps a little, but I don't recall the details.
--DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 7:22 PM, Jason H  wrote:
>
> Or is that just crazy talk?
>

Mostly, yes. You seem to think that reading from disk at arbitrary offsets
and in arbitrarily small increments will be efficient. It won't be.
And as you pointed out, that ignores transactionality, WAL mode, etc... --DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 7:09 PM, Jason H  wrote:

> This seems to be at odds with my understanding of "2.1 Record Format" of
> the document. Given that it reads the row varint, which contains the length
> of itself and the serial types of the columns, in order, it should be
> completely able to skip the reading of unneeded columns. If I have 39,39,39
> as my header (excepting the length of header) I know I am dealing with 3
> text columns of 13 bytes each. If I want only the last column, I can then
> seek(78), read(26). At no time did I need to read the prior two columns.
>
> Or did I misunderstand something?
>

You seem to have missed what Scott and Paul already explained, which is that 
SQLite is *paged*. The IO increment is the page (from 1/2K to 64K, with 1K, 
2K, 4K, and 8K being popular choices I think). An SQLite file is a 100-byte 
header, and then N pages of a given size, determined at DB creation time. The 
sqlite_master table is the "root" page, which lists all tables, indexes, etc., 
and their starting page in that file. Look at the diagram to the right of 
http://www.sqlite.org/vfs.html (that I already sent). All IO is done by the 
*pager*; as its name indicates, it reads and writes *pages*.

What you call "reading" column c1, skipping c2, etc. is not "reading" per se. 
It's "decoding" the memory block that represents a disk-file page, which the 
pager read from the file (actual IO of a whole page) and put in the page 
cache. It's a layered architecture with a clean separation of concerns and 
responsibilities. --DD
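[Editorial note: the 100-byte header mentioned above is documented in the file format. As a sketch, the page size lives at byte offset 16 as a 2-byte big-endian value (the special value 1 means 65536); the 8192-byte page size here is an arbitrary choice.]

```python
import os
import sqlite3
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("PRAGMA page_size = 8192")  # must come before the first write
con.execute("CREATE TABLE t (x)")       # forces the file to be created
con.commit()
con.close()

with open(path, "rb") as f:
    header = f.read(100)                # the 100-byte database header

assert header[:16] == b"SQLite format 3\x00"
(page_size,) = struct.unpack(">H", header[16:18])
print(page_size)  # 8192
```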


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
> Sent: Thursday, October 29, 2015 at 2:04 PM
> From: "Paul Sanderson" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> It reads a complete page at a time so there is no seeking other than
> to the start of each row - in the sense of a disk seek.
> 
> Note that there may be multiple required rows on the same page if the
> row length is much less than the page length, or if rows are longer
> than a size determined by some arcane math from the page size (see the
> file format documentation), a row may overflow to one or more pages


Ah, so this is what I seem to have missed. The pages... This is unfortunate, 
as the read-heavy application won't likely benefit from SQLite. Only a small 
fraction of the row data is used in each query, and it seems like it'll have 
to read the entire row (via the page) each time?


I wonder how much work it would be to come up with a "SQLite read-only mode" 
that bypasses the page interface but allows the SQL parser to query the data. 
It seems that pages are only needed when you're managing updates to the table. 
If you're write-once-query-forever, it seems there would be tremendous 
performance gains to be had. 

Or is that just crazy talk?


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H


> Sent: Thursday, October 29, 2015 at 1:53 PM
> From: "Simon Slavin" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> 
> On 29 Oct 2015, at 5:20pm, Jason H  wrote:
> 
> > Thanks, this is some really great information!
> 
> You're welcome.
> 
> > If I could ask a followup question. You made the statement "SQLite reads 
> > that row of the table from storage, from the first column to the last 
> > column needed by the SELECT, but perhaps not all the way to the end of the 
> > columns in the row.", Given an example select that requires columns 2,3,5, 
> > does this mean that column 4 is also read, or is column 4 skipped? I 
> > guess the question is I am assuming that a row is serialization of one or 
> > more strings that are a serialization of a string preceded by their lengths 
> > (i.e. [L|data] ), would SQLite do something akin to (using my 2,3,5 
> > example):
> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read 
> > L3*byte, read L4, seek L4, read L5, read L5*byte
> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
> 
> You are missing some useful information.  Whenever SQLite has to change even 
> one variable-length column in a row it rewrites the whole row.  It has no way 
> of storing a short column in the space originally used by a longer one.  So 
> even if all you do is
> 
> UPDATE t SET c5='short string' WHERE c5='the original extremely long string'
> 
> the new copy of any of those rows has the new c6 immediately following the 
> new c5.  If you've read to the end of c5 then you're already pointing to c6 
> so there's no need to seek it.  Contrariwise, the only way to seek c6 is to 
> read all the columns before it.  There is no pointer to the beginning of c6 
> and no magic mark that says "Column c6 starts here".
> 
> I'll answer your specific question (I hope) with an example.
> 
> CREATE TABLE t (c1, c2, c3, ... c8);
> CREATE INDEX t_23 ON t (c2, c3);
> SELECT c1, c3, c5, c6 FROM t WHERE c2='Smith' ORDER BY c3;
> 
> Given the above, the clauses in the SELECT can be perfectly satisfied by the 
> index, but that index doesn't supply the c1, c5 or c6 needed by the SELECT.
> 
> First SQLite figures out which rows are needed to answer the query.  And in 
> doing that it decides to use index t_23, which means that for each of those 
> rows it already has the value of c3.  So it knows which rows to read, and it 
> knows that it still needs c1, c5 and c6 from each of those rows.  The rows go 
> up to c8, but it doesn't need any column past c6.
> 
> So for each row needed, SQLite is pointed to the beginning of c1 for that 
> row.  There's no way of knowing where to find c5 without reading all the 
> preceding columns, because any of those may be any length.  So it progresses 
> through all the columns from c1 to c6 using the length-indicator to skip them 
> or read them depending on whether it needs their values.  It knows it can 
> stop after c6 so it stops once it has seen that one.
> 
> Hope that helps.  If you want the fine detail about the file format, you can 
> find it here:
> 
> <https://www.sqlite.org/fileformat2.html>
> 
> though that won't give you the algorithm details I've described above.

This seems to be at odds with my understanding of "2.1 Record Format" of the 
document. Given that it reads the row varint, which contains the length of 
itself and the serial types of the columns, in order, it should be completely 
able to skip the reading of unneeded columns. If I have 39,39,39 as my header 
(excepting the length of header) I know I am dealing with 3 text columns of 13 
bytes each. If I want only the last column, I can then seek(78), read(26). At 
no time did I need to read the prior two columns.

Or did I misunderstand something?

Many thanks, yet again.




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H


> Sent: Thursday, October 29, 2015 at 1:34 PM
> From: "Scott Hess" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
> >
> > If I could ask a followup question. You made the statement "SQLite reads
> > that row of the table from storage, from the first column to the last
> > column needed by the SELECT, but perhaps not all the way to the end of the
> > columns in the row.", Given an example select that requires columns 2,3,5,
> > does this mean that column 4 is also read, or is column 4 skipped? I
> > guess the question is I am assuming that a row is serialization of one or
> > more strings that are a serialization of a string preceded by their lengths
> > (i.e. [L|data] ), would SQLite do something akin to (using my 2,3,5
> > example):
> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
> > L3*byte, read L4, seek L4, read L5, read L5*byte
> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
> 
> 
> You should consider reading https://www.sqlite.org/fileformat2.html ,
> especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
> Pages".  If your project _really_ needs to know this level of detail, then
> you really should read up on the underlying system.  Also maybe throw in
> https://www.sqlite.org/arch.html to get a broad feel of how things break
> down.


Thanks Scott. I had actually already linked to the fileformat2 url, but I did 
not see a discussion at the level of detail of row reading that I am after. I 
did however re-read it in the context of your previous statement and gained 
some insight. I should have stated it more like this:
begin row read header_varint, serial_types*N, seek, read, seek, read or
begin row read header_varint, serial_types*N, seek, read, read, read

The documentation does not go into the detail of whether the engine is able to 
skip the reading of unneeded interior columns. In theory, it can, because the 
length is contained in the header. So instead of read() on every column in the 
row, it can call seek() if it knows it doesn't need that column. My question is 
now simply: does it seek past unneeded columns, or does everything get sent 
through read once data has started being read? It's a minor detail with big 
performance implications. 

Many thanks again.



[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Wade, William


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
> Sent: Thursday, October 29, 2015 at 12:10 PM
> From: "Simon Slavin" 
> To: "SQLite mailing list" 
> Subject: Re: [sqlite] How would sqlite read this from disk?
>
> 
> On 29 Oct 2015, at 2:29pm, Jason H  wrote:
> 
> > In college databases, we calculated the estimated number of blocks 
> > (512-byte device blocks) read given schema and statistics. Now, I'm asking 
> > how would SQLite actually do that?
> 
> SQLite uses a block structure in its database files, but one file block is 
> not always the same size as a storage-subsystem block.  If you're okay with 
> that, then the answer is to do it at the VFS level as Dominique described.
> 
> > I am particularly interested in knowing if SQLite can skip the reading of 
> > columns (and their disk blocks) not used in the query. that is, would Q1 
> > read on order of 128 blocks, or 16384 blocks (the whole table). And in Q2, 
> > 256 blocks, or 16384 blocks (the whole table again, irrespective of the 
> > previous statement and caches). Under what scenarios does the behavior 
> > change (indexes?)
> 
> From your question it seems that you believe that SQLite stores the data for 
> many rows in a column in the same block.  This is not the case.  SQLite 
> stores data in tables in one contiguous section per row of the table.  So the 
> data for one row, not one column, is stored in one chunk (possibly all in the 
> same block), with column 1 first, then column 2, etc..  Like this
> 
> A table
>   A row
>   column 1
>   column 2 ...
>   A different row 
>   column 1
>   column 2 ...
>   A different row ...
> 
> Let's assume that your SELECT's WHERE and ORDER BY clauses can be perfectly 
> served using an index -- either the primary key index for the table or 
> another index created using CREATE INDEX.  When this is the case, data from 
> the table is needed only once the right row has been found, and the columns 
> needed are detailed in the part between SELECT and FROM.  SQLite reads that 
> row of the table from storage, from the first column to the last column 
> needed by the SELECT, but perhaps not all the way to the end of the columns 
> in the row. [1]
> 
> So when creating a table you would theoretically get an improvement in 
> efficiency if you put the columns in decreasing order of use.  In real life 
> situations this tends to have significant effect only when you put long text 
> columns at the end of the rows.
> 
> Something similar is done when the SELECT's WHERE and ORDER BY clauses can't 
> be perfectly served using an index.  In that case the columns needed for 
> those clauses need to be read from the table as well as the columns needed 
> by the part between SELECT and FROM.
> 
> Simon Slavin.
> 
> [1] Columns whose values are mentioned in the index used do not need to be 
> read from the table since those values are already known.  And there are some 
> details I've left out for brevity.


Thanks, this is some really great information!

If I could ask a followup question. You made the statement "SQLite reads that 
row of the table from storage, from the first column to the last column needed 
by the SELECT, but perhaps not all the way to the end of the columns in the 
row." Given an example select that requires columns 2, 3, and 5, does this mean 
that column 4 is also read, or is column 4 skipped? I guess the question is: I 
am assuming that a row is a serialization of one or more strings, each preceded 
by its length (i.e. [L|data]). Would SQLite do something akin to (using my 
2,3,5 example):
begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read L3*byte, 
read L4, seek L4, read L5, read L5*byte
or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte'?

Many thanks again!




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 6:09pm, Jason H  wrote:

> This seems to be at odds with my understanding of "2.1 Record Format" of the 
> document. Given that it reads the row varint, which contains the length of 
> itself and the serial types of the columns, in order, it should be completely 
> able to skip the reading of unneeded columns. If I have 39,39,39 as my header 
> (excepting the length of header) I know I am dealing with 3 text columns of 
> 13 bytes each. If I want only the last column, I can then seek(78), read(26). 
> At no time did I need to read the prior two columns.

You're right.  I explained it wrong.  You only need to read the serial type of 
the preceding columns.  You don't need to read past the contents.  But you do 
need to read and interpret the serial types for all columns 1 to 4 in order to 
know where column 5 starts.  There's no indication that says "column 5 starts 
at byte 1828.".

Thanks for checking and not just believing me.

Simon.
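Simon's walk over the serial types can be sketched as a toy decoder (a simplified model of the record format described above: the varint encoding and TEXT/BLOB size rules follow the file format document, but reserved serial types 10/11 are omitted and no real page handling is done):

```python
def read_varint(buf, pos):
    """Decode a SQLite varint: big-endian, 7 bits per byte, 9th byte full."""
    result = 0
    for i in range(8):
        byte = buf[pos + i]
        result = (result << 7) | (byte & 0x7F)
        if byte < 0x80:
            return result, pos + i + 1
    return (result << 8) | buf[pos + 8], pos + 9

def serial_type_size(st):
    """Content size in bytes for a serial type (reserved types 10/11 omitted)."""
    if st >= 13 and st % 2 == 1:          # TEXT of (st - 13) / 2 bytes
        return (st - 13) // 2
    if st >= 12 and st % 2 == 0:          # BLOB of (st - 12) / 2 bytes
        return (st - 12) // 2
    return {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 6, 6: 8, 7: 8, 8: 0, 9: 0}[st]

def column_offsets(record):
    """Scan the record header once, returning (offset, size) per column.

    Finding where column N starts requires decoding serial types 1..N-1,
    exactly as Simon says -- but none of the *content* before column N
    needs to be examined.
    """
    header_len, pos = read_varint(record, 0)   # header length includes itself
    serial_types = []
    while pos < header_len:
        st, pos = read_varint(record, pos)
        serial_types.append(st)
    offsets, off = [], header_len              # content starts after the header
    for st in serial_types:
        size = serial_type_size(st)
        offsets.append((off, size))
        off += size
    return offsets

# Jason's example header: serial types 39,39,39 = three 13-byte TEXT columns.
record = bytes([4, 39, 39, 39]) + b"A" * 13 + b"B" * 13 + b"C" * 13
offs = column_offsets(record)                  # [(4, 13), (17, 13), (30, 13)]
third = record[offs[2][0]:offs[2][0] + offs[2][1]]
```

Note that serial type 39 means a 13-byte TEXT value, so fetching only the third column means decoding three one-byte serial types and skipping 26 content bytes, without ever examining the first two values.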


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Paul Sanderson
It reads a complete page at a time so there is no seeking other than
to the start of each row - in the sense of a disk seek.

Note that there may be multiple required rows on the same page if the
row length is much less than the page length, or if rows are longer
than a size determined by some arcane math from the page size (see the
file format documentation), a row may overflow onto one or more pages.
Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
http://sandersonforensics.com/forum/content.php?195-SQLite-Forensic-Toolkit
-Forensic Toolkit for SQLite
email from a work address for a fully functional demo licence


On 29 October 2015 at 17:59, Jason H  wrote:
>
>
>> Sent: Thursday, October 29, 2015 at 1:34 PM
>> From: "Scott Hess" 
>> To: "SQLite mailing list" 
>> Subject: Re: [sqlite] How would sqlite read this from disk?
>>
>> On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
>> >
>> > If I could ask a followup question. You made the statement "SQLite reads
>> > that row of the table from storage, from the first column to the last
>> > column needed by the SELECT, but perhaps not all the way to the end of the
>> > columns in the row." Given an example select that requires columns 2,3,5,
>> > does this mean that column 4 is also read, or is column 4 skipped? I
>> > guess the question is: I am assuming that a row is a serialization of one
>> > or more values, each preceded by its length (i.e. [L|data]). Would SQLite
>> > do something akin to (using my 2,3,5
>> > example):
>> > begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
>> > L3*byte, read L4, seek L4, read L5, read L5*byte
>> > or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?
>>
>>
>> You should consider reading https://www.sqlite.org/fileformat2.html ,
>> especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
>> Pages".  If your project _really_ needs to know this level of detail, then
>> you really should read up on the underlying system.  Also maybe throw in
>> https://www.sqlite.org/arch.html to get a broad feel of how things break
>> down.
>
>
> Thanks Scott. I had actually already linked to the fileformat2 url, but I did 
> not see a discussion at the level of detail of row reading that I am 
> after. I did however re-read it in the context of your previous statement and 
> gained some insight. I should have stated it more like this:
> begin row read header_varint, serial_types*N, seek, read, seek, read or
> begin row read header_varint, serial_types*N, seek, read, read, read
>
> The documentation does not go into whether the engine is able to skip 
> the reading of unneeded interior columns. In theory, it can, because each length 
> is contained in the header. So instead of read() on every column in the row, 
> it can call seek() if it knows it doesn't need that column. My question is 
> now simply: does it seek past unneeded columns, or does everything get sent 
> through read once data has started being read? It's a minor detail with big 
> performance implications.
>
> Many thanks again.
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 5:20pm, Jason H  wrote:

> Thanks, this is some really great information!

You're welcome.

> If I could ask a followup question. You made the statement "SQLite reads that 
> row of the table from storage, from the first column to the last column 
> needed by the SELECT, but perhaps not all the way to the end of the columns 
> in the row.", Given an example select that requires columns 2,3,5, does this 
> mean that column 4 is also read, or is column 4 be skipped? I guess the 
> question is I am assuming that a row is serialization of one or more strings 
> that are a serialization of a string preceded by their lengths (i.e. [L|data] 
> ), would SQLite do something akin to (using my 2,3,5 example):
> begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read 
> L3*byte, read L4, seek L4, read L5, read L5*byte
> or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?

You are missing some useful information.  Whenever SQLite has to change even 
one variable-length column in a row it rewrites the whole row.  It has no way 
of storing a short column in the space originally used by a longer one.  So 
even if all you do is

UPDATE t SET c5='short string' WHERE c5='the original extremely long string'

the new copy of any of those rows has the new c6 immediately following the new 
c5.  If you've read to the end of c5 then you're already pointing to c6 so 
there's no need to seek it.  Contrariwise, the only way to seek c6 is to read all 
the columns before it.  There is no pointer to the beginning of c6 and no magic 
mark that says "Column c6 starts here".

I'll answer your specific question (I hope) with an example.

CREATE TABLE t (c1, c2, c3, ... c8);
CREATE INDEX t_23 ON t (c2, c3);
SELECT c1, c3, c5, c6 FROM t WHERE c2='Smith' ORDER BY c3;

Given the above, the WHERE and ORDER BY clauses can be perfectly satisfied by the 
index, but that index doesn't supply the c1, c5 or c6 needed by the SELECT.

First SQLite figures out which rows are needed to answer the query.  And in 
doing that it decides to use index t_23, which means that for each of those rows 
it already has the value of c3.  So it knows which rows to read, and it knows that 
it still needs c1, c5 and c6 from each of those rows.  The rows go up to c8, 
but it doesn't need any column past c6.

So for each row needed, SQLite is pointed to the beginning of c1 for that row.  
There's no way of knowing where to find c5 without reading all the preceding 
columns, because any of those may be any length.  So it progresses through all 
the columns from c1 to c6 using the length-indicator to skip them or read them 
depending on whether it needs their values.  It knows it can stop after c6 so 
it stops once it has seen that one.

Hope that helps.  If you want the fine detail about the file format, you can 
find it here:



though that won't give you the algorithm details I've described above.

Simon.
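The plan Simon describes can be confirmed with EXPLAIN QUERY PLAN (a sketch using Python's built-in sqlite3 module; the exact wording of the plan text varies across SQLite versions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (c1, c2, c3, c4, c5, c6, c7, c8);
    CREATE INDEX t_23 ON t (c2, c3);
""")
# Ask the planner how it would execute Simon's query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT c1, c3, c5, c6 FROM t WHERE c2='Smith' ORDER BY c3"
).fetchall()
detail = " ".join(row[-1] for row in plan)
# The index satisfies both the WHERE (c2 equality) and the ORDER BY (c3);
# c1, c5 and c6 are then fetched from the table rows the index points at.
```

On a typical build the plan detail reads something like "SEARCH t USING INDEX t_23 (c2=?)", with no separate sort step.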


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>
> I'm open to ideas, but I was going to use this as an excuse to invent
> something of a general tool.

Post your schema and query.  Also run ANALYZE and post the content of
the sqlite_stat1 table.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Simon Slavin

On 29 Oct 2015, at 2:29pm, Jason H  wrote:

> In college databases, we calculated the estimated number of blocks (512-byte 
> device blocks) read given schema and statistics. Now, I'm asking how would 
> SQLite actually do that?

SQLite uses a block structure in its database files, but one file block is not 
always the same size as a storage-subsystem block.  If you're okay with that, 
then the answer is to do it at the VFS level as Dominique described.

> I am particularly interested in knowing if SQLite can skip the reading of 
> columns (and their disk blocks) not used in the query. That is, would Q1 read 
> on the order of 128 blocks, or 16384 blocks (the whole table)? And in Q2, 256 
> blocks, or 16384 blocks (the whole table again, irrespective of the previous 
> statement and caches)? Under what scenarios does the behavior change 
> (indexes)?


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>>
> If I were to try to work around...

Before we go any further, have you actually measured a performance
problem?  Or are you optimizing without prior knowledge of where your
application is spending time?

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Dominique Devienne
On Thu, Oct 29, 2015 at 3:29 PM, Jason H  wrote:

> I'm trying to figure out how SQLite3 would read the schema and data off
> the disk below. I read https://www.sqlite.org/fileformat2.html but didn't
> find what I was looking for.
> In college databases, we calculated the estimated number of blocks
> (512-byte device blocks) read given schema and statistics. Now, I'm asking
> how would SQLite actually do that?
>
> Given this pseudo code:
> create table aTable(c1 text, ..., c128 text);
> for 1 to 100,000:
>  insert into aTable (data1, ..., data128); -- where each dataN is a random
> length from 0 to 128k bytes, with an average of 64kb
>
> Now for the magic:
> select data64 from aTable; -- Q1
> select data1, data128 from aTable; -- Q2
>
> I am particularly interested in knowing if SQLite can skip the reading of
> columns (and their disk blocks) not used in the query. That is, would Q1
> read on the order of 128 blocks, or 16384 blocks (the whole table)? And in Q2,
> 256 blocks, or 16384 blocks (the whole table again, irrespective of the
> previous statement and caches)? Under what scenarios does the behavior
> change (indexes)?
>
> Many, many thanks in advance for this info.
>

Depends on the page size, page cache size, vacuum options, etc...

If you really want to know at that level of detail, then write your own
shim VFS (http://www.sqlite.org/vfs.html) to know when/if SQLite reads and
writes to disk.

I think I remember seeing one of the graphical SQLite tools showing a "heat
map" of DB pages, which at the time I thought must be using a VFS to show
that. --DD


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Jason H
I'm trying to figure out how SQLite3 would read the schema and data off the 
disk below. I read https://www.sqlite.org/fileformat2.html but didn't find what 
I was looking for.
In college databases, we calculated the estimated number of blocks (512-byte 
device blocks) read given schema and statistics. Now, I'm asking how would 
SQLite actually do that?

Given this pseudo code:
create table aTable(c1 text, ..., c128 text);
for 1 to 100,000:
 insert into aTable (data1, ..., data128); -- where each dataN is a random 
length from 0 to 128k bytes, with an average of 64kb

Now for the magic:
select data64 from aTable; -- Q1
select data1, data128 from aTable; -- Q2

I am particularly interested in knowing if SQLite can skip the reading of 
columns (and their disk blocks) not used in the query. That is, would Q1 read 
on the order of 128 blocks, or 16384 blocks (the whole table)? And in Q2, 256 
blocks, or 16384 blocks (the whole table again, irrespective of the previous 
statement and caches)? Under what scenarios does the behavior change (indexes)?

Many, many thanks in advance for this info. 
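A scaled-down version of this experiment can be run directly to see the page-level footprint (a sketch with 8 columns, 100 rows and ~1 KB values instead of the sizes above, using Python's sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 8 TEXT columns standing in for the 128 in the question.
cols = ", ".join(f"c{i} TEXT" for i in range(1, 9))
conn.execute(f"CREATE TABLE aTable ({cols})")

# 100 rows of ~8 KB each: every row exceeds a typical 4 KB page,
# so overflow pages get chained onto each row's main cell.
row = tuple("x" * 1024 for _ in range(8))
conn.executemany("INSERT INTO aTable VALUES (?,?,?,?,?,?,?,?)", [row] * 100)
conn.commit()

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
page_count = conn.execute("PRAGMA page_count").fetchone()[0]
total_bytes = page_size * page_count   # at least the raw content size
```

Comparing `page_count` before and after a query won't show reads, though; as the replies below the original message note, observing actual page reads needs a shim VFS.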




[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Dominique Devienne  wrote:
>
> I've discussed blobs a few times already :). Especially the fact you can't
> have transactional and incremental updates of blobs (i.e. the transaction
> "copies" only the blob pages actually modified).

Large blobs are transactional when they are stored in SQLite tables.

Perhaps you are talking about storing BLOBs in separate files and then
just storing the filename in the SQLite table?

If you do have large BLOBs or strings, SQLite handles this best if the
large blob/string is stored in a table by itself and then linked into
other tables using an integer primary key.  For example:

   CREATE TABLE BigBlobTab (
   blobid INTEGER PRIMARY KEY,
   content BLOB -- big BLOB or text field.
   );
   CREATE TABLE OtherStuff (
   field1 VARCHAR(10),
   field2 FLOAT,
   field3 BOOLEAN,
   blobid INT REFERENCES BigBlobTab
   );

This technique works similarly to storing BLOBs in separate files, in
that it does not mix the BLOB content with the rest of the row, making
the row faster to search and update.  But it has the advantage over
separate storage that BLOB changes are transactional and all content
is stored in a single file on disk.

This is how the Fossil VCS (https://www.fossil-scm.org/) stores content, fwiw.

-- 
D. Richard Hipp
drh at sqlite.org
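Richard's two-table pattern in runnable form (a minimal sketch via Python's sqlite3 module, using the schema from his message; the 100 KB zero-filled blob is just placeholder data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BigBlobTab (
        blobid INTEGER PRIMARY KEY,
        content BLOB -- big BLOB or text field
    );
    CREATE TABLE OtherStuff (
        field1 VARCHAR(10),
        field2 FLOAT,
        field3 BOOLEAN,
        blobid INT REFERENCES BigBlobTab
    );
""")
cur = conn.execute("INSERT INTO BigBlobTab(content) VALUES (?)",
                   (b"\x00" * 100_000,))
conn.execute("INSERT INTO OtherStuff VALUES (?, ?, ?, ?)",
             ("abc", 1.5, True, cur.lastrowid))

# Scans and updates of OtherStuff never touch the 100 KB blob's pages;
# the blob is pulled in only when a query asks for it via the key.
row = conn.execute("""
    SELECT o.field1, length(b.content)
    FROM OtherStuff AS o JOIN BigBlobTab AS b USING (blobid)
""").fetchone()
```

The integer key join keeps the narrow row fast to search while still giving transactional updates of the blob, exactly the trade-off described above.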


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Wade, William  wrote:
> From https://www.sqlite.org/fileformat2.html, if a row does not fit into a
> single block, its overflow is stored in a linked list. The implication (to
> me) is that if a particular row requires twenty blocks of storage, and I
> want to read a one-byte field that is in the twentieth block, all twenty
> blocks will be read.

(1) We encourage developers to put small boolean and integer value
fields, and frequently accessed fields, first in the CREATE TABLE
statement, and put huge strings and blobs at the end, for exactly this
reason.

(2) SQLite is able to skip intermediate blocks on the linked list
sometimes, when the database is in incremental- or auto-vacuum mode.
It's not 100%* but it works pretty well.  Even so, solution (1) works
better so that is what you should do.

* In the previous paragraph "It's not 100%" means that SQLite is not
always successful in skipping intermediate blocks.  It does always get
the correct answer, 100% of the time.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>
> Ah, so this is what I seem to have missed. The pages... This is unfortunate
> as the read-heavy application won't likely benefit from SQLite.

Your filesystem and your disk hardware work the same way.  Your
application might only ask for 10 bytes, but the underlying OS and
hardware are pulling a complete sector (or perhaps a complete track)
from disk, pass the 10 bytes you asked for off to your app, and
holding the rest in cache.

So, no, you cannot improve performance by only reading a few bytes
here and a few bytes there.  The disk has to work just as hard.
-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Richard Hipp
On 10/29/15, Jason H  wrote:
>
> This seems to be at odds with my understanding of "2.1 Record Format" of the
> document. Given that it reads the row varint, which contains the length of
> itself and the serial types of the columns, in order, it should be
> completely able to skip the reading of unneeded columns. If I have 39,39,39
> as my header (excepting the length of header) I know I am dealing with 3
> text columns of 13 bytes each. If I want only the last column, I can then
> seek(78), read(26). At no time did I need to read the prior two columns.
>

I think you are confusing "reading" with "decoding".

SQLite always reads content from disk in page-size chunks.  Each page
is typically 1024 or 4096 bytes, but can be any power of two between
512 and 64K.

Most of the time, a complete row is contained on a single page.  And
that whole page is read off of disk and into memory all in one go.  If
the row has 12 columns and you only ask for the 5th one, then only the
5th column is decoded.  And only the first 5 "serial type" integers
are decoded.  But since all 12 columns are on the same page, they all
get read into memory.

If a single row overflows onto multiple pages, the overflow pages are
only read if needed.

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Scott Hess
On Thu, Oct 29, 2015 at 10:59 AM, Jason H  wrote:
>
> The documentation does not go into whether the engine is able to
> skip the reading of unneeded interior columns. In theory, it can, because
> each length is contained in the header. So instead of read() on every column
> in the row, it can call seek() if it knows it doesn't need that column. My
> question is now simply: does it seek past unneeded columns, or does
> everything get sent through read once data has started being read? It's a
> minor detail with big performance implications.


Everything is page-structured, so you can't skip anything smaller than a
page (even if you could, there may be no benefit, since seeks cost way more
than reads).  Rows too large to fit on a page use a singly-linked
list of overflow pages, so you can't get to page N+1 without reading page N
first.

-scott


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Scott Hess
On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
>
> If I could ask a followup question. You made the statement "SQLite reads
> that row of the table from storage, from the first column to the last
> column needed by the SELECT, but perhaps not all the way to the end of the
> columns in the row." Given an example select that requires columns 2,3,5,
> does this mean that column 4 is also read, or is column 4 skipped? I
> guess the question is: I am assuming that a row is a serialization of one or
> more values, each preceded by its length (i.e. [L|data]). Would SQLite do
> something akin to (using my 2,3,5
> example):
> begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
> L3*byte, read L4, seek L4, read L5, read L5*byte
> or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?


You should consider reading https://www.sqlite.org/fileformat2.html ,
especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
Pages".  If your project _really_ needs to know this level of detail, then
you really should read up on the underlying system.  Also maybe throw in
https://www.sqlite.org/arch.html to get a broad feel of how things break
down.

-scott