Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-05 Thread Udi Karni
Watching Windows 7 Resource Monitor (launched from a button on Windows Task
Manager), I see that SQLite - directly or through Windows - generates quite a
bit of activity on a temp file located in C:\users\owner\AppData\Local\Temp -
especially when running large joins, etc. There are large read and write
queues on this file - so it's definitely slowing things down.

I suspect this is the sqlite equivalent of TempDB - a scratchpad where
sorting / grouping / joining takes place.

I am wondering about adding a solid-state drive and redirecting the Windows
temp location from C:\ to this SSD - where performance will hopefully be
better than on the C: drive. There seem to be some posts on how to do this.
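
For what it's worth, SQLite has a couple of knobs of its own for this - a
minimal sketch, with the E:\ path purely as an illustration:

PRAGMA temp_store = MEMORY;                  -- keep temp tables / sort areas in RAM
                                             -- (only helps if the intermediate results fit)
PRAGMA temp_store_directory = 'E:\sqltemp';  -- deprecated but still honored: put SQLite's
                                             -- temp files on another drive

Redirecting the TMP / TEMP environment variables for the session that runs
sqlite3 should have a similar effect.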

Then again - Resource Monitor shows dozens of other files on the C: drive
being constantly read / written as SQLite works - though to a much lesser
degree than the Temp location. Since Windows seems very tightly bound to the
C: drive - I am also wondering about getting a PC with an SSD as the C: drive,
so that IO to all locations on C: will hopefully be faster.

Lots to research, and a new concept to get used to - having the entire OS
on an SSD.

If anyone has tried any of this and would like to share their experience -
it would be much appreciated.



On Fri, Feb 3, 2012 at 10:48 AM, Alexey Pechnikov wrote:

> You can try a page size of 8192 and a cache size of 128000.
> Note: vacuuming is required after changing the page size of an existing
> database.
>
> Inserting into or updating big indexed tables may be slow. In recent SQLite
> versions index-creation performance has been optimized, so indexing a fully
> populated table is a good idea when possible.
>
> Don't use a b-tree index for text fields. Instead, use the FTS4 extension or
> an integer hash value to index text fields. Note: index compression is not
> supported by SQLite.
>
> A 100+ GB table or database is not too much for SQLite. I think your problems
> are with big indexed tables, not with big tables as such. A big cache, a RAM
> drive, or an SSD may speed up index updates. When an index is larger than the
> cache (with the parameters above the cache is 128,000 * 8,192 bytes, about
> 1 GB), every index modification becomes a very expensive disk operation.
>
> P.S. Do not use a cache size larger than 2 GB on 32-bit hosts.

Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-03 Thread Udi Karni
Thanks! I will experiment.

I have Windows 7 (64-bit) with 4 GB of RAM - so I suspect that 2 GB is the most
I can specify anyway.

Are there official 32-bit and 64-bit releases of the command-line shell?

On Fri, Feb 3, 2012 at 10:48 AM, Alexey Pechnikov wrote:

> You can try a page size of 8192 and a cache size of 128000.
> Note: vacuuming is required after changing the page size of an existing
> database.
>
> Inserting into or updating big indexed tables may be slow. In recent SQLite
> versions index-creation performance has been optimized, so indexing a fully
> populated table is a good idea when possible.
>
> Don't use a b-tree index for text fields. Instead, use the FTS4 extension or
> an integer hash value to index text fields. Note: index compression is not
> supported by SQLite.
>
> A 100+ GB table or database is not too much for SQLite. I think your problems
> are with big indexed tables, not with big tables as such. A big cache, a RAM
> drive, or an SSD may speed up index updates. When an index is larger than the
> cache (with the parameters above the cache is 128,000 * 8,192 bytes, about
> 1 GB), every index modification becomes a very expensive disk operation.
>
> P.S. Do not use a cache size larger than 2 GB on 32-bit hosts.

Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-03 Thread Alexey Pechnikov
You can try a page size of 8192 and a cache size of 128000.
Note: vacuuming is required after changing the page size of an existing database.
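
A minimal sketch of that recipe in the sqlite3 shell (same values as above;
note that cache_size applies per connection, so it has to be set each time the
database is opened):

PRAGMA page_size = 8192;     -- takes effect for a new database, or after VACUUM
VACUUM;                      -- rebuilds the existing database with the new page size
PRAGMA cache_size = 128000;  -- counted in pages: 128,000 x 8,192 bytes = roughly 1 GB of cache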

Inserting into or updating big indexed tables may be slow. In recent SQLite
versions index-creation performance has been optimized, so indexing a fully
populated table is a good idea when possible.

Don't use a b-tree index for text fields. Instead, use the FTS4 extension or
an integer hash value to index text fields. Note: index compression is not
supported by SQLite.
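
A rough sketch of the integer-hash idea - the application computes the hash
itself, since core SQLite has no built-in hash function in SQL, and the table
and column names here are invented:

CREATE TABLE claims (
  id            INTEGER PRIMARY KEY,
  provider      TEXT,
  provider_hash INTEGER   -- e.g. a 64-bit FNV/CRC of provider, computed by the application
);
CREATE INDEX claims_provider_hash_idx ON claims(provider_hash);

-- look up via the small integer index, then confirm the actual text
SELECT * FROM claims WHERE provider_hash = :hash AND provider = :text;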

A 100+ GB table or database is not too much for SQLite. I think your problems
are with big indexed tables, not with big tables as such. A big cache, a RAM
drive, or an SSD may speed up index updates. When an index is larger than the
cache (with the parameters above the cache is 128,000 * 8,192 bytes, about
1 GB), every index modification becomes a very expensive disk operation.

P.S. Do not use a cache size larger than 2 GB on 32-bit hosts.

2012/2/3 Udi Karni :
> Marcus - thanks. I will experiment with those 2 PRAGMAs.
>
> Meanwhile - I was trying to update some columns in a table with 130 million
> rows and got this error -
>
> "Error: out of memory"
>
> I am not sure why. I thought the "UPDATE" just writes out new rows and
> maybe a journal of the old rows for recoverability - but I am not sure why
> it would need very much memory.
>
> Be that as it may - and with regard to your suggestion - and in light of
> this error message - given that I only have 4GB of RAM on my PC - is this
> really enough RAM to handle tables of this size ? Or am I giving Sqlite an
> unreasonably small amount of resources and it's time for a serious hardware
> upgrade?
>
> Thanks,
>
> Udi



-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Marcus Grimm
> Marcus - thanks. I will experiment with those 2 PRAGMAs.
>
> Meanwhile - I was trying to update some columns in a table with 130 million
> rows and got this error -
>
> "Error: out of memory"

Did you get that after you increased the cache?

Strange, I never got that during my huge DB tests.
But I'm not sure what will happen if a cache is
specified that would exceed the hardware (RAM) capabilities.
Interesting question, though.

Be careful: the cache_size pragma works in units of pages - that is, it
specifies the cache in, for example, 1 KB blocks.
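
In other words (numbers purely as an example):

PRAGMA page_size;             -- suppose this reports 1024
PRAGMA cache_size = 500000;   -- 500,000 pages x 1 KB/page = roughly 500 MB of cache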

>
> I am not sure why. I thought the "UPDATE" just writes out new rows and
> maybe a journal of the old rows for recoverability - but I am not sure why
> it would need very much memory.

It may depend on the UPDATE - whether it involves loading and updating indexes.

>
> Be that as it may - and with regard to your suggestion - and in light of
> this error message - given that I only have 4GB of RAM on my PC - is this
> really enough RAM to handle tables of this size ? Or am I giving Sqlite an
> unreasonably small amount of resources and it's time for a serious hardware
> upgrade?

I don't think you will be able to specify more than a 2-3 GB cache, depending
on your OS, unless you compile a 64-bit version of SQLite (I never did that).
However, a 100 GB SQLite DB file should not be any problem, apart from the
reported slowdown.

Marcus



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Udi Karni
Marcus - thanks. I will experiment with those 2 PRAGMAs.

Meanwhile - I was trying to update some columns in a table with 130 million
rows and got this error -

"Error: out of memory"

I am not sure why. I thought the "UPDATE" just writes out new rows and maybe
a journal of the old rows for recoverability - so I don't see why it would
need very much memory.

Be that as it may - and with regard to your suggestion - and in light of
this error message - given that I only have 4 GB of RAM on my PC - is this
really enough RAM to handle tables of this size? Or am I giving SQLite an
unreasonably small amount of resources, and is it time for a serious hardware
upgrade?

Thanks,

Udi

On Thu, Feb 2, 2012 at 10:03 PM, Marcus Grimm wrote:

> You didn't tell if you already set the usual tricks to speed up
> your load phase. That would be to increase the page cache and try
> with reduced syncs.
> See
> PRAGMA cache_size
> PRAGMA synchronous
>
> In particular the page cache should be increased dramatically
> for huge DB files.
>
> Marcus
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Marcus Grimm
> Given how clever and compelling Sqlite is - I am testing how it scales to
> tables in the 100GB / 200 million row range. This is for a strictly "read
> only" application - but first the tables must be populated in a one time
> process. As is often the case with Big Data - the data is a little dirty -
> so the process involves importing - selecting - counting - inspecting -
> updating some rows - deleting some rows - selecting - counting -
> inspecting, etc. until clean.
>
> Placing the SQLite database on a traditional C: drive - IO was too slow. At
> 15 MB/sec - reading a 50GB table would take an hour. So I moved it to an
> external RAID array where I ran across an interesting find. IO wasn't that
> much faster - until I vacuumed the database - which increased IO 10X to 150
> MB/sec - with the same CPU utilization.
>
> This is good news for the final implementation of this read-only database -
> but still a dilemma at the data load phase. After a ".vacuum" - issuing a
> single DML against a table - even a DELETE which deletes no rows at all -
> causes IO to drop back down to 15 MB/sec - on the table I'm selecting /
> DMLing - which makes the data loading / cleansing phase very long.
>
> So I have 2 questions -
>
> (1) Why would simple DML cause such an extreme slowdown compared with
> "post-vacuum" speeds?
>
> (2) Any knobs to turn to try and maintain the higher speeds post-DML -
> without resorting to ".vacuum"?


You didn't say whether you have already applied the usual tricks to speed up
your load phase. That would be to increase the page cache and to try
reducing syncs.
See
PRAGMA cache_size
PRAGMA synchronous

In particular the page cache should be increased dramatically
for huge DB files.
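
For example - a sketch only, with illustrative values, and with the bulk work
wrapped in a single transaction as an additional common trick:

PRAGMA cache_size = 200000;    -- in pages; with 8 KB pages this is roughly 1.6 GB
PRAGMA synchronous = OFF;      -- or NORMAL: far fewer syncs during the load/cleanse phase
BEGIN;
-- ... bulk INSERT / UPDATE / DELETE statements ...
COMMIT;
PRAGMA synchronous = FULL;     -- restore full durability afterwards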

Marcus

>
> Thanks,
>
> Udi


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Udi Karni
Thanks! Good ideas. SSDs are still a little exotic price-wise, but closing
and defragging is easy to try.

Come to think of it - your suggestion to break the process into batches is
probably the way to go. By placing each table in its own DB - not only are
they smaller and easier to manage - but the vacuum becomes a single-table
vacuum instead of a multi-table vacuum - no need to redo tables that are
already streamlined. Very cool.

In which case - how about not combining them at all - and attaching?

In other words - is the performance of 1 large DB that includes all the
data tables plus the "work" tables - about equivalent to a "work only" DB
that attaches, as needed, to multiple external DBs - each containing 1
table? With some databases - cross-database communication is much slower
than keeping everything local. Is there substantial overhead to using
multiple "attach" statements in Sqlite?

Thanks,

Udi

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Simon Slavin

On 2 Feb 2012, at 11:31pm, Udi Karni wrote:

> Given how clever and compelling Sqlite is - I am testing how it scales to
> tables in the 100GB / 200 million row range. This is for a strictly "read
> only" application - but first the tables must be populated in a one time
> process. As is often the case with Big Data - the data is a little dirty -
> so the process involves importing - selecting - counting - inspecting -
> updating some rows - deleting some rows - selecting - counting -
> inspecting, etc. until clean.
> 
> Placing the SQLite database on a traditional C: drive - IO was too slow. At
> 15 MB/sec - reading a 50GB table would take an hour. So I moved it to an
> external RAID array where I ran across an interesting find. IO wasn't that
> much faster - until I vacuumed the database - which increased IO 10X to 150
> MB/sec - with the same CPU utilization.
> 
> This is good news for the final implementation of this read-only database -
> but still a dilemma at the data load phase. After a ".vacuum" - issuing a
> single DML against a table - even a DELETE which deletes no rows at all -
> causes IO to drop back down to 15 MB/sec - on the table I'm selecting /
> DMLing - which makes the data loading / cleansing phase very long.

Nice description of your situation and requirements.  Makes it easier to answer 
your questions.

> So I have 2 questions -
> 
> (1) Why would simple DML cause such an extreme slowdown as compared with
> "post vaccuum" speeds ?

Spins.  Reading one area of the database file means waiting for rotations of
the disk.  If the data you need is fragmented you end up wasting a lot of time.
You can't speed it up because you are just sitting there waiting for the disk
to turn to the right place.  Instead of a RAID of rotating disks, try a
non-rotating medium like solid-state storage.

> (2) Any knobs to turn to try and maintain the higher speeds post DML -
> without resorting to ".vacuum"?

There are a ton of things which will help but probably not enough to make it 
worthwhile for a one-shot job.  You can presort your data into batches.  
Searching an index which is already in order is faster.  You can close the 
database, use your OS' facilities to defragment the file, then reopen the 
database.  (This helps a lot under Windows which is very sensitive to 
fragmentation, somewhat less for other OSen.)  You can do your initial 
insertions into smaller databases then merge them.

But the way you describe your situation suggests to me that this isn't worth
doing.  You have a one-time-only activity.  It's probably gonna take you
longer to do the programming than you'll save.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Speeding up Sqlite reads after DML

2012-02-02 Thread Udi Karni
Given how clever and compelling SQLite is - I am testing how it scales to
tables in the 100 GB / 200 million row range. This is for a strictly "read
only" application - but first the tables must be populated in a one-time
process. As is often the case with Big Data - the data is a little dirty -
so the process involves importing - selecting - counting - inspecting -
updating some rows - deleting some rows - selecting - counting -
inspecting, etc. until clean.

Placing the SQLite database on a traditional C: drive - IO was too slow. At
15 MB/sec - reading a 50 GB table would take an hour. So I moved it to an
external RAID array, where I ran across an interesting find. IO wasn't that
much faster - until I vacuumed the database - which increased IO 10X, to 150
MB/sec - with the same CPU utilization.

This is good news for the final implementation of this read-only database -
but still a dilemma at the data load phase. After a ".vacuum" - issuing a
single DML statement against a table - even a DELETE which deletes no rows at
all - causes IO to drop back down to 15 MB/sec on the table I'm selecting /
DMLing - which makes the data loading / cleansing phase very long.

So I have 2 questions -

(1) Why would simple DML cause such an extreme slowdown compared with
"post-vacuum" speeds?

(2) Any knobs to turn to try to maintain the higher speeds post-DML -
without resorting to ".vacuum"?

Thanks,

Udi
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users