Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Jim Wilcoxson
On 6/22/10, Eric Smith  wrote:
> Jim Wilcoxson wrote:
...
>> Did you see my earlier note about combining your two integers into the
>> primary key?  This will also give you constant insert times, if you
>> insert items in the order:
...
> Thanks also for the tip on insertion order.  Does that also hold for
> multi-column indices (and not single-column indices transformed from two
> integers)?  I assume it's because we get more cache hits and fewer tree
> rebalances when we insert in key-order?

Yes, I'm pretty sure this applies to multi-column indices too.  So if
you do your inserts as:

a=0 b=0
a=1 b=0
...
a=0 b=1
a=1 b=1

Then the first set of rows with b=0 will be added in "more or less"
constant time.  When you start doing the second set of inserts, with
b=1, that will cause pain, because you will be modifying every index
record you created earlier and squeezing a new entry in between every
existing entry.  This will require a lot of journalling.

I think it would run faster (less journalling) to insert in order with:

a=0 b=0
a=0 b=1
a=1 b=0
a=1 b=1
etc.

Even if you load the data without indexes and add the index later, my
guess is that SQLite will still traverse the data in rowid order to
create the index.  So you are still better off inserting in the 2nd
order rather than the first.  The added advantage is that your index
pages will be clustered together.

Jim
-- 
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Simon Slavin

On 23 Jun 2010, at 1:18am, Eric Smith wrote:

> *sigh* kill me.  Sorry for wasting your time there. :/

We've all done it.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Scott Hess wrote: 

> You should reduce your demonstration case to something you'd be 
> willing to post the code for.  Probably using synthetic data 
> (preferably generated data).  There's something missing in the thread 
> right now, and it's unlikely to be exposed by random shots in the 
> dark.  

As I was putting together the answer to this request, I decided to 
double-check the result by #ifdef'ing out the sqlite calls.

Turns out I *was* being stupid.

An old app-level error check ran after a hunk of data was inserted.
The check was supported by the PK definition, which I had just removed.
So sqlite was doing a table scan every batch.  Measurements were 
better with user-level indices because one of the indices was usable 
in the error check.  

*sigh* kill me.  Sorry for wasting your time there. :/

So the summary of this thread for those who follow is: 

1. Primary keys cause implicit indices to be defined.
2. If you insert data out of order (according to the index) then you
   have to read/write more pages per insertion.  This, among other
   things, means the journal file can grow with the starting db size,
   not just with the size of the transaction.
3. Consider reducing churn against OS-level caches (or the disk) 
   by increasing sqlite's cache_size.

Thanks again, everyone, for your help!

Eric 

-- 
Eric A. Smith

You made the fatal mistake of assuming that your friends are genuine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Scott Hess wrote: 

> You should reduce your demonstration case to something you'd be 
> willing to post the code for.  Probably using synthetic data 
> (preferably generated data).  There's something missing in the thread 
> right now, and it's unlikely to be exposed by random shots in the 
> dark.  

I'll start doing that and reply here with an obfuscated schema.  In 
the mean time, where can I find version 3.6.18 (whom someone 
claimed definitely does constant-time insertions)?  

-- 
Eric A. Smith

You will never amount to much. 
-- Munich Schoolmaster, to Albert Einstein, age 10
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Scott Hess
Eric,

You should reduce your demonstration case to something you'd be
willing to post the code for.  Probably using synthetic data
(preferably generated data).  There's something missing in the thread
right now, and it's unlikely to be exposed by random shots in the
dark.

-scott


On Tue, Jun 22, 2010 at 3:01 PM, Eric Smith  wrote:
> Richard Hipp wrote:
>
>> When there are no indices, SQLite can put the rows into the database in
>> any order it wants, and it chooses to put them in ascending order.  Thus,
>> each insert becomes a constant-time append.  (Approximately - the truth is
>> a little more complicated, but by waving our arms vigorously, we can still
>> claim constant time per row.)  But if there are indices, each row must be
>> inserted in index order, which involves a b-tree search to find the right
>> spot and possible a rebalancing operation - O(logN) per row.
>
> Again, my observation is that you are *not* doing constant-time inserts
> when there are no indices.
>
> What do you mean, "ascending order"?
>
> The only constraint on the relevant table is a foreign-key ref to a tiny
> table.  But the asymptotic behavior is the same with pragma foreign_keys
> off or on.
>
> I double-checked sqlite_master and there are no indices (not even
> auto-indices) on the table.
>
> Inserts are *faster* at high row counts when there *are* indices.
>
> I am using 3.6.23.1.  I haven't tested earlier versions (waiting on a
> reply in another thread to find out where to get them).
>
> How vigorously are you waving?  Can you describe the real algorithm, or
> at least a second-order approximation?
>
> Eric
>
> --
> Eric A. Smith
>
> What the hell is it good for?
>    -- Robert Lloyd (engineer of the Advanced Computing Systems
>       Division of IBM), to colleagues who insisted that the
>       microprocessor was the wave of the future, c. 1968
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Jay A.  Kreibich wrote: 

> What OS/filesystem are you using?  
> 
> SQL inserts should be near-constant, assuming the table does not 
> have an INTEGER PRIMARY KEY with explicit values.  The table's root 
> B-Tree needs to re-balance every now and then, but if the inserts are 
> in-order (which they will be with an automatic ROWID) this should be 
> rare and cheap-- should should get more rare as the number of rows 
> increases.  
> 
> Many *filesystems* do not provide linear access times, however, 
> especially with larger files.  

Interesting.  But behavior is better (logarithmic) with indices 
defined.  

Right now I'm in 64-bit linux 2.6.18, rhel 5.4.  The fs is ext3.  Not 
sure if this particular box has a raid5 array like the other box did.  
But again, I think it's a moot point: even when I'm completely in the 
page cache behavior is linear, and it improves with indices.  This 
suggests a software algo issue.  

(Just got your corrections, I knew what you meant.:-)

Eric

-- 
Eric A. Smith

This non-pronunciation of initial _h_ is especially common among 
French and British people, who can't pronounce English very well.
-- Tim Pulju
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Jay A. Kreibich

   Uggg

On Tue, Jun 22, 2010 at 05:12:38PM -0500, Jay A. Kreibich scratched on the wall:
> On Tue, Jun 22, 2010 at 04:16:42PM -0400, Eric Smith scratched on the wall:
> > Jim Wilcoxson wrote: 
> > 
> > > Insert times should be constant for the 2nd case: no primary key, no 
> > > indexes; ie, it doesn't matter how many records are already in the 
> > > database.  I confirmed this with SQLite 3.6.18.  
> > 
> > Definitely not constant.  Looks linear to me -- you saw the plot, you
> > can decide for yourself.
> 
>   What OS/filesystem are you using? 
>   
>   SQL inserts should be near-constant, assuming the table does not
>   have an INTEGER PRIMARY KEY with explicit values.  The table's root
>   B-Tree needs to re-balance every now and then, but if the inserts are
>   in-order (which they will be with an automatic ROWID) this should be
>   rare and cheap-- should should get more rare as the number of rows

 and should...

>   increases.
> 
>   Many *filesystems* do not provide linear access times, however,
>   especially with larger files.

   ...constant access...  Many filesystems do not provide constant access.

> 
>-j
> 

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Jay A. Kreibich
On Tue, Jun 22, 2010 at 04:16:42PM -0400, Eric Smith scratched on the wall:
> Jim Wilcoxson wrote: 
> 
> > Insert times should be constant for the 2nd case: no primary key, no 
> > indexes; ie, it doesn't matter how many records are already in the 
> > database.  I confirmed this with SQLite 3.6.18.  
> 
> Definitely not constant.  Looks linear to me -- you saw the plot, you
> can decide for yourself.

  What OS/filesystem are you using? 
  
  SQL inserts should be near-constant, assuming the table does not
  have an INTEGER PRIMARY KEY with explicit values.  The table's root
  B-Tree needs to re-balance every now and then, but if the inserts are
  in-order (which they will be with an automatic ROWID) this should be
  rare and cheap-- should should get more rare as the number of rows
  increases.

  Many *filesystems* do not provide linear access times, however,
  especially with larger files.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Richard Hipp wrote: 

> When there are no indices, SQLite can put the rows into the database in 
> any order it wants, and it chooses to put them in ascending order.  Thus, 
> each insert becomes a constant-time append.  (Approximately - the truth is 
> a little more complicated, but by waving our arms vigorously, we can still 
> claim constant time per row.)  But if there are indices, each row must be 
> inserted in index order, which involves a b-tree search to find the right 
> spot and possible a rebalancing operation - O(logN) per row.  

Again, my observation is that you are *not* doing constant-time inserts 
when there are no indices.  

What do you mean, "ascending order"?  

The only constraint on the relevant table is a foreign-key ref to a tiny 
table.  But the asymptotic behavior is the same with pragma foreign_keys 
off or on.  

I double-checked sqlite_master and there are no indices (not even 
auto-indices) on the table.  

Inserts are *faster* at high row counts when there *are* indices.  

I am using 3.6.23.1.  I haven't tested earlier versions (waiting on a 
reply in another thread to find out where to get them).  

How vigorously are you waving?  Can you describe the real algorithm, or 
at least a second-order approximation?  

Eric 

-- 
Eric A. Smith

What the hell is it good for?
-- Robert Lloyd (engineer of the Advanced Computing Systems
   Division of IBM), to colleagues who insisted that the
   microprocessor was the wave of the future, c. 1968
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Richard Hipp
On Tue, Jun 22, 2010 at 3:08 PM, Eric Smith  wrote:

> I have confirmed that INSERT times are roughly logarithmic in
> the number of existing records after creating my specific user
> indices.
>
> But INSERT times appeared to be *linear* in the number of existing
> records before I had created any user indices (and with no primary
> keys or unique indices defined).
>
> Can anyone explain this?
>

When there are no indices, SQLite can put the rows into the database in any
order it wants, and it chooses to put them in ascending order.  Thus, each
insert becomes a constant-time append.  (Approximately - the truth is a
little more complicated, but by waving our arms vigorously, we can still
claim constant time per row.)  But if there are indices, each row must be
inserted in index order, which involves a b-tree search to find the right
spot and possible a rebalancing operation - O(logN) per row.



>
> Eric
>
> >   time (minutes) to insert 2m records
> > 10
> ++--+--+---+--+---+-++
> >+   +  +   +  A   +   +
>+
> >  9
> ++..++
> >|  AAA A
>   |
> >  8
> ++..AAA..A..++
> >| A   
>   |
> >  7
> ++.A.A..AAA.++
> >|  
>|
> >  6
> ++..AA...A.A++
> >|AAA
>   |
> >  5
> ++A...AA++
> >  4
> ++.AA.AA..A.A...++
> >|   A AAA  AA
>|
> >  3
> ++..++
> >|   AAA A  A
>   |
> >  2
> ++.AA...++
> >| AA
>   |
> >  1
> ++...AAA++
> >AAA +  +   +  +   +
>+
> >  0
> ++--+--+---+--+---+-++
> >0  100200 300400 500
>  600
> >millions of existing records
>
> --
> Eric A. Smith
>
> A nickel ain't worth a dime anymore.
>-- Yogi Berra
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
-
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
Jim Wilcoxson wrote: 

> Insert times should be constant for the 2nd case: no primary key, no 
> indexes; ie, it doesn't matter how many records are already in the 
> database.  I confirmed this with SQLite 3.6.18.  

Definitely not constant.  Looks linear to me -- you saw the plot, you
can decide for yourself.

I'm in SQLite 3.6.23.1.  How do I score an old version to test it?

> Did you see my earlier note about combining your two integers into the 
> primary key?  This will also give you constant insert times, if you 
> insert items in the order: 

Hey sorry, I didn't see that.  Cute idea, but my accessors are in Tcl, 
I don't want to do bit twiddling or query mangling on the read side from 
Tcl, and I don't want to re-write it in C.  Plus a host of other reasons
that would bore the SQLite community.  I'm actually rather happy without
any primary key definition right now.

Thanks also for the tip on insertion order.  Does that also hold for 
multi-column indices (and not single-column indices transformed from two 
integers)?  I assume it's because we get more cache hits and fewer tree 
rebalances when we insert in key-order?

Before I make any more adjustments, I want to understand why I'm linear
with no indices!

I'm pretty sure I'm not doing anything stupid, like setting evil 
compile-time options or whatever.  But then again most stupid people 
don't think their results come from being stupid.  

Eric 

-- 
Eric A. Smith

Aeropalmics (ayr o palm' iks), n.: 
The study of wind resistance conducted by holding a cupped 
hand out the car window.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Jim Wilcoxson
On 6/22/10, Eric Smith  wrote:
> I have confirmed that INSERT times are roughly logarithmic in
> the number of existing records after creating my specific user
> indices.
>
> But INSERT times appeared to be *linear* in the number of existing
> records before I had created any user indices (and with no primary
> keys or unique indices defined).
>
> Can anyone explain this?
>
> Eric

Insert times should be constant for the 2nd case: no primary key, no
indexes; ie, it doesn't matter how many records are already in the
database.  I confirmed this with SQLite 3.6.18.

Did you see my earlier note about combining your two integers into the
primary key?  This will also give you constant insert times, if you
insert items in the order:

a=0, b=0
a=0, b=1
a=1, b=0
a=1, b=1
etc.

Jim
-- 
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Eric Smith
I have confirmed that INSERT times are roughly logarithmic in 
the number of existing records after creating my specific user 
indices.

But INSERT times appeared to be *linear* in the number of existing 
records before I had created any user indices (and with no primary 
keys or unique indices defined).

Can anyone explain this?

Eric

>   time (minutes) to insert 2m records
> 10 ++--+--+---+--+---+-++
>+   +  +   +  A   +   +  +
>  9 ++..++
>|  AAA A |
>  8 ++..AAA..A..++
>| A      |
>  7 ++.A.A..AAA.++
>|    |
>  6 ++..AA...A.A++
>|AAA |
>  5 ++A...AA++
>  4 ++.AA.AA..A.A...++
>|   A AAA  AA|
>  3 ++..++
>|   AAA A  A |
>  2 ++.AA...++
>| AA |
>  1 ++...AAA++
>AAA +  +   +  +   +  +
>  0 ++--+--+---+--+---+-++
>0  100200 300400 500600
>millions of existing records

--
Eric A. Smith

A nickel ain't worth a dime anymore.
-- Yogi Berra
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users