Re: [sqlite] Shared in-memory SQLite database in shared memory

2012-11-03 Thread Jaco Breitenbach
Hi Simon,

In my application I have multiple separate processes, not threads.  I
believe the shared cache doesn't work across processes.  I essentially want
a single shared read-only database that can be accessed at high speed.
 Since the data set can become quite large and with multiple processes
requiring access to the identical data, I wanted to have the database in
shared memory.   The database will contain reference information with which
the data I'm processing must be enriched.  So once the database has been
loaded in memory, it will remain static and its contents will never change.
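
To make the memory concern concrete, this is roughly what each reader
process would look like without shared memory (a sketch only; the file
name and cache size are placeholders):

    #include <sqlite3.h>
    #include <stdio.h>

    int main(void)
    {
        sqlite3 *db = NULL;

        /* Open the on-disk reference database read-only. */
        if (sqlite3_open_v2("reference.db", &db, SQLITE_OPEN_READONLY, NULL)
                != SQLITE_OK) {
            fprintf(stderr, "open failed\n");
            sqlite3_close(db);
            return 1;
        }

        /* A large private page cache means the whole data set ends up
           cached after the first pass -- but duplicated in every process,
           which is exactly what I am trying to avoid. */
        sqlite3_exec(db, "PRAGMA cache_size=1000000", NULL, NULL, NULL);

        /* ... read-only lookups against the reference data ... */

        sqlite3_close(db);
        return 0;
    }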

Best regards,
Jaco

On Saturday, November 3, 2012, Simon Slavin wrote:

>
> On 3 Nov 2012, at 12:26pm, Jaco Breitenbach wrote:
>
> > My application consists of several indepent processes that must all
> access
> > (read) the same data table during processing.  In order to optimise
> memory
> > usage I was wondering if it is possible to load an in-memory SQLite
> > database into shared memory.
>
> Not simply and not without a little strangeness.  However, SQLite has a
> Shared Cache mode which is easy to use, and you can open your database on
> disk, but declare a really big cache size for it.  This will remove the
> need to use an in-memory database since SQLite will gradually read the
> whole database in to the cache.  Read this:
>
> <http://www.sqlite.org/sharedcache.html>
>
> and this:
>
> <http://www.sqlite.org/pragma.html#pragma_cache_size>
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org 
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Sent from Gmail Mobile
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Page cache block size problem

2012-02-09 Thread Jaco Breitenbach
Dear experts,

In my application performance is of critical importance.  I've chosen to
preallocate a large contiguous block of memory for the page cache.

I make use of the following function to configure the page cache:
sqlite3_config(SQLITE_CONFIG_PAGECACHE, void*, int sz, int N);

The documentation for this function says the following:
*This option specifies a static memory buffer that SQLite can use for the
database page cache with the default page cache implementation. There are
three arguments to this option: A pointer to 8-byte aligned memory, the
size of each page buffer (sz), and the number of pages (N). The sz argument
should be the size of the largest database page (a power of two between 512
and 32768) plus a little extra for each page header. The page header size
is 20 to 40 bytes depending on the host architecture. It is harmless, apart
from the wasted memory, to make sz a little too large.*

Now, first of all, with my default database block size of 1024, the 20 to
40 bytes documented for the cache header size is wrong.  On the Linux and
HP-UX Itanium systems where I've tested this with SQLite 3.7.8, the cache
header size was 248 bytes.  This could be seen in the output from
sqlite3_status(SQLITE_STATUS_PAGECACHE_SIZE) which returned a value of
about 1272 bytes.  (1272 - 1024 = 248).  This means the page cache header
size is already 25% of the size of the actual data block, or put
differently, 20% of the preallocated cache buffer is 'lost' to the header
component.  With a cache buffer of 5 GB, this means a total of 1 GB for the
header meta data only.

SQLite is a little quirky here: even though I may allocate a 1 GB block of
memory for the page cache, if I specify too small a cache block size (the sz
value) in the call to sqlite3_config(SQLITE_CONFIG_PAGECACHE, void*, int sz,
int N), then cache page allocation from the preallocated buffer fails and
that memory isn't used for the cache at all; instead SQLite falls back to
the default allocator and simply allocates additional memory for the cache.
The preallocated memory block is then totally wasted.
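
For reference, my configuration sequence looks essentially like this (a
simplified sketch; the 248-byte header allowance is my own measurement on
3.7.8, not a documented figure):

    #include <sqlite3.h>
    #include <stdlib.h>

    #define DB_PAGE_SIZE   1024
    #define HDR_ALLOWANCE  248       /* measured on 3.7.8 */
    #define CACHE_PAGES    4000000   /* about 5 GB including headers */

    static void *cache_buf;

    int setup_page_cache(void)
    {
        int sz = DB_PAGE_SIZE + HDR_ALLOWANCE;

        /* The buffer must be 8-byte aligned; malloc() gives suitable
           alignment on the platforms I use. */
        cache_buf = malloc((size_t)sz * CACHE_PAGES);
        if (cache_buf == NULL) return SQLITE_NOMEM;

        /* Must be called before sqlite3_initialize() and before the
           first connection is opened. */
        return sqlite3_config(SQLITE_CONFIG_PAGECACHE, cache_buf,
                              sz, CACHE_PAGES);
    }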

Now since my application performs a high number of transactions, the page
cache grows to several GB of memory.  Since memory is also a finite
resource, I tried to limit the cache block size to the minimum required.
When I first tried it with the documented 20 - 40 additional bytes for the
header, it failed.  However, using the 248 header bytes that resulted from
the output of sqlite3_status(SQLITE_STATUS_PAGECACHE_SIZE), it worked fine.
That was with SQLite version 3.7.8.

Now my dilemma is that it seems the page cache block header size was
increased from 3.7.8 to 3.7.10.  With 3.7.10
sqlite3_status(SQLITE_STATUS_PAGECACHE_SIZE) now returns 1312 bytes,
indicating a page cache header size of 288 bytes.  This means that once
again my application is not using the preallocated buffer at all, but is
allocating cache with malloc() and is running out of memory.  So I have to
change the header size constant, recompile and redeploy the application.

In order to avoid this scenario with future releases of SQLite, is there a
way to determine the page cache header size within the code, keeping in mind
that the page cache is configured using sqlite3_config() *before* the call
to sqlite3_initialize()?
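
At the moment my stop-gap is to verify at run time that the configured sz
is still sufficient, so that a silent fall-back to malloc() at least gets
noticed (a sketch; as I read the documentation, only the highwater value
of PAGECACHE_SIZE is meaningful, and only once pages have actually been
requested):

    /* configured_sz is the sz value passed to SQLITE_CONFIG_PAGECACHE. */
    int page_cache_sz_ok(int configured_sz)
    {
        int cur = 0, hw = 0;

        sqlite3_status(SQLITE_STATUS_PAGECACHE_SIZE, &cur, &hw, 0);

        /* hw is the largest allocation the page cache has asked for,
           i.e. page size plus the per-page header.  If it exceeds the
           configured sz, the static buffer is being bypassed. */
        return hw <= configured_sz;
    }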

Any advice on handling this problem would be much appreciated.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] WAL checkpointing

2011-09-29 Thread Jaco Breitenbach
Dear all,

In the WAL documentation (http://www.sqlite.org/wal.html) it is hinted that
the checkpoint operation, which normally runs in the same thread of execution
as the database commit, could be moved to a separate thread or process.  If
I were to run the checkpoint in a separate thread (using
sqlite3_wal_checkpoint(DB, NULL)), can I reuse the original database
connection, or will the checkpointing interfere with the ongoing processing
(inserts done in the main thread) if the connection is shared?  Would it be
better to create a new connection to the database just for the sake of the
checkpointing?
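
In other words, something along these lines, with a connection dedicated
to checkpointing (a sketch only; path, interval and error handling are
simplified):

    #include <sqlite3.h>
    #include <pthread.h>
    #include <unistd.h>

    static const char *g_db_path = "my.db";   /* placeholder path */
    static volatile int g_running = 1;

    static void *checkpoint_thread(void *arg)
    {
        sqlite3 *db = NULL;
        (void)arg;

        /* Dedicated connection used only for checkpointing. */
        if (sqlite3_open(g_db_path, &db) != SQLITE_OK) return NULL;

        while (g_running) {
            sleep(600);                        /* every 10 minutes */
            sqlite3_wal_checkpoint(db, NULL);  /* checkpoint all databases */
        }
        sqlite3_close(db);
        return NULL;
    }

    /* In the main thread, after opening the main connection:
         pthread_t tid;
         pthread_create(&tid, NULL, checkpoint_thread, NULL);   */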

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Cache question

2011-09-26 Thread Jaco Breitenbach
I just wanted to make a correction:

On 26 September 2011 22:50, Jaco Breitenbach wrote:

> Now as the cache usage increases, I find that it finally caps at 3226.381
> MB (of the total 4096 MB) before the overflow size starts to increase.  At
> an initial page size of 1300, the 4096 MB gives 3,303,821 pages.  So,
> dividing the used 3226.381 MB by the number of pages, gives 1024 bytes per
> page.
>


> 2.  The size reported by S*_PAGECACHE_SIZE reports only the size used to
> store actual data, excluding the headers?


I've checked again, and realised that SQLITE_STATUS_PAGECACHE_USED returns
the number of pages and not a size in bytes as
SQLITE_STATUS_PAGECACHE_OVERFLOW does, so the discrepancy is explained by my
multiplying the page count by the wrong constant to obtain the physical
memory size.
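
For the record, the corrected conversion is simply this (sz being the
per-page buffer size passed to SQLITE_CONFIG_PAGECACHE, i.e. page size
plus header):

    long long pagecache_bytes_used(int sz)
    {
        int pages = 0, hw1 = 0;
        int overflow = 0, hw2 = 0;

        /* USED is a page count ... */
        sqlite3_status(SQLITE_STATUS_PAGECACHE_USED, &pages, &hw1, 0);
        /* ... while OVERFLOW is already in bytes. */
        sqlite3_status(SQLITE_STATUS_PAGECACHE_OVERFLOW, &overflow, &hw2, 0);

        return (long long)pages * sz + overflow;
    }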

Apologies for that.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Cache question

2011-09-26 Thread Jaco Breitenbach
Dear experts,

I have a question about the following cache related functions:

   - sqlite3_config(SQLITE_CONFIG_PAGECACHE, B, S, P)
   - sqlite3_status(SQLITE_STATUS_PAGECACHE_USED, C, H, 0)
   - sqlite3_status(SQLITE_STATUS_PAGECACHE_OVERFLOW, C, H, 0)
   - sqlite3_status(SQLITE_STATUS_PAGECACHE_SIZE, C, H, 0)

I preallocate an 8-byte-aligned memory block of 4096 MB at start-up to use
as page cache.  My page size is 1024 bytes.  According to the documentation,
the page header should be 20 to 40 bytes.  If I assume a cache page header
size of 40 for a total of 1064 bytes, the sqlite3_status() call
S*_PAGECACHE_USED returns 0 while S*_PAGECACHE_OVERFLOW returns a growing
size.  This seems to indicate that the preallocated buffer is not actually
being used by the application, presumably because the configured cache page
size is too small.

Looking at the output of S*_PAGECACHE_SIZE, the reported page size is 1272.
If I set the cache page size to a value larger than this (I used 1300),
S*_PAGECACHE_OVERFLOW is now 0, while S*_PAGECACHE_USED shows a steadily
growing size.  This is what I would expect to see if the preallocated
buffer is now being used successfully.

Now as the cache usage increases, I find that it finally caps at 3226.381 MB
(of the total 4096 MB) before the overflow size starts to increase.  At an
initial page size of 1300, the 4096 MB gives 3,303,821 pages.  So, dividing
the used 3226.381 MB by the number of pages, gives 1024 bytes per page.

Is it therefore correct to assume that:

   1. The page cache header size is actually 248 bytes, and not 20 or 40 as
   stated in the documentation, and
   2. The size reported by S*_PAGECACHE_SIZE reports only the size used to
   store actual data, excluding the headers?

If that is correct, it looks like the page cache header is about 25% the
size of the actual data page, which becomes rather expensive as the cache
size increases.

Any comments or feedback will be much appreciated.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite disc and I/O usage

2011-09-19 Thread Jaco Breitenbach
On 19 September 2011 19:25, Simon Slavin  wrote:

>
> On 19 Sep 2011, at 7:13pm, Jaco Breitenbach wrote:
>
> > While on the topic of performance, I've performed a test in which new
> tables
> > were created within an active transaction (manual 'partitioning') and
> > populated within the same transaction.  What I've found, was that as the
> > number of tables created within the transaction increased,
>
> You mean 'tables' or 'rows' ?


I meant 'tables'.  Depending on a configuration parameter, up to 1440 tables
could be created dynamically within the same transaction in which they are
also being populated.  If the tables already existed from a previous
execution, they were simply reused (CREATE TABLE IF NOT EXISTS ...).  If no
tables were created within a transaction, the transaction completed within
1-2 s.  If tables were also created within a transaction, the speed of record
inserts decreased drastically following the table creation (transaction
processing time would increase from 2 s to several minutes).  It must also be
noted that any one table creation typically completes within milliseconds, so
the time spent on the actual table creation cannot account for the total
time lost.
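
For clarity, the shape of such a transaction is roughly the following
(table name and column are illustrative, not the real schema):

    #include <sqlite3.h>

    /* One partitioned load.  The slowdown described above shows up in
       the INSERT phase whenever the CREATE statement actually had to
       create a new table. */
    static int load_interval(sqlite3 *db)
    {
        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);

        sqlite3_exec(db,
            "CREATE TABLE IF NOT EXISTS keys_1130 (k varchar(22) PRIMARY KEY)",
            NULL, NULL, NULL);

        /* ... many prepared INSERTs into keys_1130 (and the other
           per-interval tables) follow inside the same transaction ... */

        return sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    }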


> > the performance
> > of the inserts within the same transaction decreased significantly.  I
> > haven't spent any further time trying to analyze the cause for this
> apparent
> > behaviour.
>
> Because it has to do more work.  Inserting the 10,000th record requires
> comparing it with 10,000 others to see whether it's a duplicate.  Inserting
> the 20,000th record requires twice as many comparisons.  It's not a linear
> function, since the comparison is done by searching a tree rather than a
> list, but that's a big component of it.


Agreed, but that is not the issue here.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite disc and I/O usage

2011-09-19 Thread Jaco Breitenbach
Hi Christian,

Thanks for your response.

On 19 September 2011 16:45, Christian Smith  wrote:

> On Mon, Sep 19, 2011 at 02:42:42PM +0100, Jaco Breitenbach wrote:
> > Hi Simon,
> >
> > Thanks for the reply.
> >
> > On 19 September 2011 13:23, Simon Slavin  wrote:
> >
> > > > I run the database (3.7.7.1) in WAL mode, with checkpointing
> performed at
> > > 10
> > > > minute intervals.
> > >
> > > You turned 'PRAGMA wal_autocheckpoint' off, right ?
> > >
> > Correct.  The frequency of checkpointing seems to have a significant
> impact
> > on overall performance.  I've disabled auto-checkpointing and introduced
> a
> > time-based manual checkpoint frequency similar to that of TimesTen.  In
> my
> > tests I've set the interval to 10 minutes.
> >
> > If you know you are going to do lots of INSERTs in a row you can put
> > > BEGIN/END around them.  This will dramatically speed up the operation.
>  On
> > > the other hand, if your benchmark is simulating lots of separate
> logging
> > > entries you will not want to do this.
> > >
> > That is indeed how my application works.  Records are processed in
> > transactions of approximately 20,000 records each.
> >
> > > <http://www.sqlite.org/draft/pragma.html#pragma_journal_mode>
> > >
> > > If, on the other hand you aren't concerned about using too much space,
> and
> > > are simply exploring to find out how things are likely to work in real
> life,
> > > carry on.
> > >
> > My application is rather simple.  It simply inserts random 22-character
> > strings into a single-column, unique-indexed table as fast as possible.
>  The
> > main concern is application performance and I've spent many hours
> optimising
> > as much as possible.  In terms of hardware, the resource I am most
> concerned
> > about is I/O bandwidth and I am trying to estimate the I/O requirement as
> a
> > function of record size and record processing rate.
>
>
> Is this the actual application, or just some test application that is
> representative of the final application?


> I ask because I can't see random 22-character string inserts as
> representative
> of anything other than worst case performance, so the test might not
> actually
> be valid.
>

The test was performed with the actual application.  The processing rates
I've mentioned initially weren't that important at that point.  Actual
performance varies greatly depending on the hardware platform.

The purpose of my application is to check input records for duplicates.
This is done by calculating a 22-character string that uniquely identifies
an input record.  This key is then inserted into the unique-indexed table.
If the insert fails because of a unique constraint violation, the record is
flagged as duplicate and filtered, otherwise the insert succeeds and the
input record is processed further.  My statement about inserting random
strings was therefore not entirely accurate, but it is a fair approximation
of reality.
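
In code, the check amounts to the following (a simplified sketch; in the
real application the INSERT statement is prepared once and reused with
sqlite3_reset()):

    /* Returns 1 if key22 was already present, 0 if it was inserted,
       -1 on any other error.  T and K are the table and column from
       my original post. */
    int is_duplicate(sqlite3 *db, const char *key22)
    {
        sqlite3_stmt *stmt = NULL;
        int rc, dup;

        rc = sqlite3_prepare_v2(db, "INSERT INTO T (K) VALUES (?)",
                                -1, &stmt, NULL);
        if (rc != SQLITE_OK) return -1;

        sqlite3_bind_text(stmt, 1, key22, -1, SQLITE_TRANSIENT);
        rc = sqlite3_step(stmt);

        if (rc == SQLITE_CONSTRAINT)
            dup = 1;       /* the unique index rejected it: duplicate */
        else if (rc == SQLITE_DONE)
            dup = 0;       /* new key stored, process the record */
        else
            dup = -1;

        sqlite3_finalize(stmt);
        return dup;
    }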

> You'd be better off testing with real data with real insert patterns, then
> evaluating the IO demands of that. Your test above will result in lots of
> random IO, which is the worst case for spinning disk media.
>
> For insert mostly updates in big batches, WAL mode might not be optimal.
> The data will be written twice, once to the WAL, once to the database file
> whereas in rollback journal mode, most data will be written just once (with
> smaller amounts of rollback data written to the journal.)
>
> Once you introduce updates and deletes, the performance patterns can change
> as more updates become in place updates, which might favour WAL mode more.
>

There will never be any updates or deletes; only inserts.


> Either way, the IO requirements are not likely to reflect record size and
> processing rate per se. You're more likely to be limited to transaction
> rate, transaction rate being limited by the cache flush rate of your
> IO stack and bound in the spinning platter case to the write latency
> of the disk. Such limits are in the order of 10s of transactions per second
> max for non-NVRAM cached spinning disks.
>
> Of course the first rule of programming is "make it work", followed
> afterwards by "make it work fast". And well designed schemas and indexes
> are likely to have more impact on performance (orders of magnitude
> potentially) than the micro-benchmark based optimizations you're
> looking at at the moment.
>

Some of the things I have experimented with in order to optimise
performance are the follow

Re: [sqlite] SQLite disc and I/O usage

2011-09-19 Thread Jaco Breitenbach
I have to apologize:  I had the PRAGMA options confused.  My application
makes exclusive use of PRAGMA journal_mode=WAL.

What I meant to say was that the WAL log file is never truncated, except
that it is removed when the database connection is closed successfully.

On 19 September 2011 15:02, Simon Slavin  wrote:

>
> On 19 Sep 2011, at 2:42pm, Jaco Breitenbach wrote:
>
> > On 19 September 2011 13:23, Simon Slavin  wrote:
> >
> >> If you're actually concerned about filesize, then WAL mode is probably
> not
> >> the right mode to use.  You may be better off with 'PRAGMA journal_mode
> =
> >> DELETE'.  If your platform stores data on an SSD, then you may be better
> off
> >> with 'PRAGMA journal_mode = PERSIST'.  For further details see
> >
> > My application is currently using the PERSIST journal mode.  Actual file
> > size is not important.  I've mentioned the file sizes only to add more
> > detail and a clearer indication of where I get the numbers from.  The
> > PERSIST mode is used because it seems to have the least overhead of the
> > available journal modes.
>
> Okay.  If I understand the page correctly, then if you have the mode set to
> 'PERSIST' then you are not using WAL mode.  This means SQLite uses the
> journal file (what you called the log file) very differently and you should
> ignore the link to <http://www.sqlite.org/draft/wal.html#ckpt> I posted
> earlier.  Also, it means that ...
>
> On 19 Sep 2011, at 10:13am, Jaco Breitenbach wrote:
>
> > When it comes to checkpointing (every 10 minutes), I assume the log file
> is
> > read once in its entirety, and the data file written once for every
> record
> > inserted.
>
>
> If you're not using WAL mode, then I don't think the journal file will ever
> be read in its entirety unless SQLite is trying to recover from a crash or
> power failure.  In this mode the data file is kept pretty-much up-to-date.
>  The journal file contains rollback and recovery information.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite disc and I/O usage

2011-09-19 Thread Jaco Breitenbach
Hi Simon,

Thanks for the reply.

On 19 September 2011 13:23, Simon Slavin  wrote:

> On 19 Sep 2011, at 10:13am, Jaco Breitenbach wrote:
>
> > CREATE TABLE T  (   K   varchar(22)   PRIMARY KEY   );
>
> Note that SQLite doesn't really do 'varchar'.  All text fields are text
> fields and can have any number of characters in.  You can choose to put 22
> character in each entry, but you might choose not to.  For clarity, defining
> that as 'K text' would have an identical result.
>

I've understood from the SQLite documentation that there really are only a
small(ish) number of primitive column types.  I've chosen to use the
'varchar' label because it's similar to the Oracle syntax that I'm familiar
with.


> > I run the database (3.7.7.1) in WAL mode, with checkpointing performed at
> 10
> > minute intervals.
>
> You turned 'PRAGMA wal_autocheckpoint' off, right ?
>

Correct.  The frequency of checkpointing seems to have a significant impact
on overall performance.  I've disabled auto-checkpointing and introduced a
time-based manual checkpoint frequency similar to that of TimesTen.  In my
tests I've set the interval to 10 minutes.

> If you know you are going to do lots of INSERTs in a row you can put
> BEGIN/END around them.  This will dramatically speed up the operation.  On
> the other hand, if your benchmark is simulating lots of separate logging
> entries you will not want to do this.
>

That is indeed how my application works.  Records are processed in
transactions of approximately 20,000 records each.


> If you're actually concerned about filesize, then WAL mode is probably not
> the right mode to use.  You may be better off with 'PRAGMA journal_mode =
> DELETE'.  If your platform stores data on an SSD, then you may be better off
> with 'PRAGMA journal_mode = PERSIST'.  For further details see
>

My application is currently using the PERSIST journal mode.  Actual file
size is not important.  I've mentioned the file sizes only to add more
detail and a clearer indication of where I get the numbers from.  The
PERSIST mode is used because it seems to have the least overhead of the
available journal modes.


> <http://www.sqlite.org/draft/pragma.html#pragma_journal_mode>
>
> If, on the other hand you aren't concerned about using too much space, and
> are simply exploring to find out how things are likely to work in real life,
> carry on.
>

My application is rather simple.  It simply inserts random 22-character
strings into a single-column, unique-indexed table as fast as possible.  The
main concern is application performance and I've spent many hours optimising
as much as possible.  In terms of hardware, the resource I am most concerned
about is I/O bandwidth and I am trying to estimate the I/O requirement as a
function of record size and record processing rate.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] SQLite disc and I/O usage

2011-09-19 Thread Jaco Breitenbach
Dear experts,

I am creating a (simple) model of the disc and I/O usage of my SQLite
database so I can advise my customers on hardware requirements.

My database schema is very simple.  It contains only one table with a single
column with a primary key:

CREATE TABLE T  (   K   varchar(22)   PRIMARY KEY   );

I run the database (3.7.7.1) in WAL mode, with checkpointing performed at 10
minute intervals.

From tests performed on an HP-UX Itanium server I have the following
figures:

Processing rate: 135,416 inserts/min (2,257 inserts/s)
Total Inserts @ checkpoint (10 min):  1,354,167

After inserting 2,500,000 records, the file sizes (in bytes) on disc were as
follows:
Data file: 174,230,528 (174 MB)
Log file:  2,050,719,096 (2 GB)

These file sizes of the log and data files were obtained after the final
checkpoint.

So, from these figures I calculate the following:
Total bytes per record in the data file = 174,230,528 / 2,500,000 = 70
bytes/record
Total bytes per record in the log file = 2,050,719,096 / 1,354,167 = 1514
bytes/record

In the case of the data file, since my record is only 22 bytes wide, am I
correct in assuming that the extra 48 bytes are for the index?

Now in terms of the I/O bandwidth, if an entry of approximately 1514 bytes
is written to the log file for every record inserted, am I correct in
assuming that if normal processing runs at N records/s, that the I/O
(writing) generated is approximately 1514*N bytes/s?  In the example above,
that is approximately 3.26 MB/s.

When it comes to checkpointing (every 10 minutes), I assume the log file is
read once in its entirety, and the data file written once for every record
inserted.

In our system we have several of these SQLite-based processing units running
in parallel together with a large number of other related applications.
Disc I/O has proven to be a major limiting factor in system performance and
the more accurately we can model the generated I/O, the better we can
estimate the hardware requirements.

Any feedback would be much appreciated.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speeding up index creation

2011-09-05 Thread Jaco Breitenbach
Dear all,

Any speculation on whether and how the new code would affect inserts into an
indexed table?

Jaco

On 5 September 2011 09:15, Filip Navara  wrote:

> On Mon, Sep 5, 2011 at 9:39 AM, Baruch Burstein 
> wrote:
> > This is unrelated, but I have never done any serious testing, and am
> > wondering what the user/sys results are in all these performance tests.
> Is
> > user I/O time and sys CPU time?
>
> User is the time spent in user-mode code, ie. mostly SQLite code and
> operating system libraries.
> Sys is the time spent in kernel-mode code, ie. kernel, drivers, I/O
> operatons.
>
> In both cases the values are per-process and thus they should be more
> accurate than wall clock time.
>
> Sum of both values should give the total time spent executing a given
> command.
>
> Best regards,
> Filip Navara
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Maximum number of tables in a data file

2011-08-09 Thread Jaco Breitenbach
Hi Gabriel,

Is there such a database that is both free and non-GPL that you can
recommend?

Jaco

On 9 August 2011 14:38, gabriel.b...@gmail.com wrote:

> Heve you ever considere using a NOSQL database I think it would serve you
> better
>
> 2011/8/9 Jaco Breitenbach 
>
> > Hi Igor and Michael,
> >
> > Yes, of course, 1440 minutes in a day. :-)
> >
> > I am building an application that filters out duplicate input data by
> > generating an MD5 hash of each input, and implicitly comparing that
> against
> > a set of keys already stored in the SQLite database by doing an insert
> into
> > a unique-indexed table.  If the insert fails, a duplicate is assumed,
> > otherwise the new unique key is stored, and the input processed.
> >
> > The problem that I'm facing, is that I would ultimately need to process
> > 1,000,000,000 records a day, with history to be kept for up to 128 days.
>  I
> > am currently creating a new data file per day, with hourly tables.
> >  However,
> > that will eventually result in 40,000,000+ records to be inserted into a
> > single indexed table.  Unfortunately the performance rate of the inserts
> > into the indexed tables decreases significantly as the number of records
> in
> > the tables increases.  This seems to be because of a CPU bottleneck
> rather
> > than I/O while doing the searches.
> >
> > I am now considering partitioning the data even further into tables that
> > span shorter time periods, e.g. 60 min, 30 min, 15 min, 5 min, 1 min.  I
> am
> > hoping that reducing the search space will help to maintain a higher
> insert
> > rate.
> >
> > I'd appreciate any feedback and comments on my suggested approach.
> >
> > Regards,
> > Jaco
> >
> >
> > On 9 August 2011 14:13, Igor Tandetnik  wrote:
> >
> > > Jaco Breitenbach  wrote:
> > > > Can anyone please tell me if there is a limit to the number of tables
> > > that
> > > > can be held in a single data file?  I am considering an application
> > that
> > > > will require a table for every minute in a day, i.e. 3600+ tables in
> a
> > > > single database or data file.
> > >
> > > First, there are 1440 minutes in a day. Second, you should be able to
> > > create this number of tables: if the limit exists, it's likely much
> > higher
> > > than that. Finally, I predict that the schema you envision would be
> very
> > > awkward to work with. Have you considered a single table having
> > MinuteOfDay
> > > as an extra column?
> > > --
> > > Igor Tandetnik
> > >
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users@sqlite.org
> > > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Maximum number of tables in a data file

2011-08-09 Thread Jaco Breitenbach
Yes, but each input record also contains a timestamp that can be used to
identify the relevant table.

On 9 August 2011 14:43, Igor Tandetnik  wrote:

> Jaco Breitenbach  wrote:
> > I am building an application that filters out duplicate input data by
> > generating an MD5 hash of each input, and implicitly comparing that
> against
> > a set of keys already stored in the SQLite database by doing an insert
> into
> > a unique-indexed table.  If the insert fails, a duplicate is assumed,
> > otherwise the new unique key is stored, and the input processed.
> >
> > The problem that I'm facing, is that I would ultimately need to process
> > 1,000,000,000 records a day, with history to be kept for up to 128 days.
>  I
> > am currently creating a new data file per day, with hourly tables.
>
> Doesn't that defeat the point? Wouldn't that only guarantee uniqueness
> within the last hour?
> --
> Igor Tandetnik
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Maximum number of tables in a data file

2011-08-09 Thread Jaco Breitenbach
Hi Igor and Michael,

Yes, of course, 1440 minutes in a day. :-)

I am building an application that filters out duplicate input data by
generating an MD5 hash of each input, and implicitly comparing that against
a set of keys already stored in the SQLite database by doing an insert into
a unique-indexed table.  If the insert fails, a duplicate is assumed,
otherwise the new unique key is stored, and the input processed.

The problem that I'm facing is that I would ultimately need to process
1,000,000,000 records a day, with history to be kept for up to 128 days.  I
am currently creating a new data file per day, with hourly tables.  However,
that will eventually result in 40,000,000+ records to be inserted into a
single indexed table.  Unfortunately the performance rate of the inserts
into the indexed tables decreases significantly as the number of records in
the tables increases.  This seems to be because of a CPU bottleneck rather
than I/O while doing the searches.

I am now considering partitioning the data even further into tables that
span shorter time periods, e.g. 60 min, 30 min, 15 min, 5 min, 1 min.  I am
hoping that reducing the search space will help to maintain a higher insert
rate.
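
For illustration, the kind of mapping from a record's timestamp to its
table that I have in mind (a sketch; the real naming scheme and interval
handling differ):

    #include <stdio.h>
    #include <time.h>

    /* Derive a per-interval table name, e.g. KEYS_0930 for the slot
       starting at 09:30.  interval_min is the configurable partition
       size (60, 30, 15, 5 or 1 minutes). */
    void interval_table_name(time_t ts, int interval_min,
                             char *buf, size_t buflen)
    {
        struct tm tm_utc;
        int minute_of_day, slot_start;

        gmtime_r(&ts, &tm_utc);
        minute_of_day = tm_utc.tm_hour * 60 + tm_utc.tm_min;
        slot_start = (minute_of_day / interval_min) * interval_min;

        snprintf(buf, buflen, "KEYS_%02d%02d",
                 slot_start / 60, slot_start % 60);
    }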

I'd appreciate any feedback and comments on my suggested approach.

Regards,
Jaco


On 9 August 2011 14:13, Igor Tandetnik  wrote:

> Jaco Breitenbach  wrote:
> > Can anyone please tell me if there is a limit to the number of tables
> that
> > can be held in a single data file?  I am considering an application that
> > will require a table for every minute in a day, i.e. 3600+ tables in a
> > single database or data file.
>
> First, there are 1440 minutes in a day. Second, you should be able to
> create this number of tables: if the limit exists, it's likely much higher
> than that. Finally, I predict that the schema you envision would be very
> awkward to work with. Have you considered a single table having MinuteOfDay
> as an extra column?
> --
> Igor Tandetnik
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Maximum number of tables in a data file

2011-08-09 Thread Jaco Breitenbach
Dear experts,

Can anyone please tell me if there is a limit to the number of tables that
can be held in a single data file?  I am considering an application that
will require a table for every minute in a day, i.e. 3600+ tables in a
single database or data file.

Regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Insert not working for sqlite3

2011-07-07 Thread Jaco Breitenbach
Are you missing a COMMIT perhaps?

On 7 July 2011 09:11, Stephan Beal  wrote:

> On Thu, Jul 7, 2011 at 9:31 AM, James_21th  wrote:
>
> > try {
> > $result=$dbh->exec("INSERT INTO tbl1(one,two) VALUES ('new1','new2')");
> > } catch(PDOExecption $e) {
> > print "Error!: " . $e->getMessage() . "";
> > }
> >
>
> PDO _only_ throws exceptions if you set the exceptions reporting option.
> Try
> instantiating it like this:
>
>$attr = array( PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION );
>$dbh = new PDO($dsn, "", "", $attr );
>
> --
> - stephan beal
> http://wanderinghorse.net/home/stephan/
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] AIX 6.1 Autoconf support

2011-06-07 Thread Jaco Breitenbach
Dear experts,

I've been trying to get SQLite compiled on an AIX 6.1 system.  The configure
script that came with version 3.7.3 contains several exceptions for
different AIX versions, notably versions 4 and 5, but not 6.  This means for
example that it is unable to generate a shared library because the
appropriate tests are not performed correctly.  After editing the configure
script and adding a case for AIX 6, I managed to persuade it to create the
shared library instead of the static one.

I was just wondering if there were any plans in the pipeline to add AIX 6.1
support.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query question

2011-05-06 Thread Jaco Breitenbach
Hi Matthew,

sqlite> select * from a;
10|1
10|2
10|2
11|2
11|2
11|3

sqlite> select c.a, count(*) from (select distinct a, b from a) c group by
c.a;
10|2
11|2

Was that the result you were after?

Jaco


On 6 May 2011 15:03, Matthew Jones  wrote:

> A simple query question that I really ought to know the answer to but
> don't:
>
> Given a table with multiple columns in it how do I count how many
> entries there are with column A matching some value where column B is
> distinct?
>
> sqlite> create table a (a, b);
> sqlite> insert into a values (10, 1);
> sqlite> insert into a values (10, 2);
> sqlite> insert into a values (10, 2);
> sqlite> insert into a values (11, 2);
> sqlite> insert into a values (11, 2);
> sqlite> insert into a values (11, 3);
> sqlite> select * from a where a=10 group by b;
> 10|1
> 10|2
> sqlite> select * from a where a=11 group by b;
> 11|2
> 11|3
>
> How can I do count equivalent of such a query to find out how many
> distinct values of b there are for a given a? (That is get an answer of
> 2 in the above)
>
> Thanks
>
> --
> Matthew Jones
> Hewlett-Packard Ltd
> Long Down Avenue
> Stoke Gifford
> Bristol.  BS34 8QZ
> Tel:   +44 (0) 117 312 7490
> Email:matthew.jo...@hp.com
>
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks,
> RG12 1HN. Registered No: 690597 England
>
> The contents of this message and any attachments to it are confidential
> and may be legally privileged. If you have received this message in
> error, you should delete it from your system immediately and advise the
> sender.
>
> To any recipient of this message within HP, unless otherwise stated you
> should consider this message and attachments as "HP CONFIDENTIAL".
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Backing up online SQLite data files

2011-05-06 Thread Jaco Breitenbach
Hi Igor,

On 6 May 2011 14:15, Igor Tandetnik  wrote:

> Jaco Breitenbach  wrote:
> > My application has several data files open (one main, several other
> > attached) during normal operation.  The journaling mode is WAL.  I need
> to
> > perform an online backup of the data files without shutting down the
> > application.
>
> http://sqlite.org/backup.html
>
>
I've also looked at that page.  However, that either involves modifying the
main application or writing a separate backup utility application based on
the examples provided.

One thing I've noticed is that if the database being backed up is
continuously being written to, then following a call to sqlite3_sleep() the
backup is automatically restarted.  This seems to indicate that my backups
might never complete.

Would a simple copy of the database, -shm and -wal files result in a
corrupted database if these files are used to restore the system, or would
it simply appear as if SQLite is recovering from a power failure?  Sure,
there may be a small amount of data loss in doing the copy without a lock,
but that may be acceptable so long as the data that is in the database is
not corrupted.

Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Backing up online SQLite data files

2011-05-06 Thread Jaco Breitenbach
Dear experts,

My application has several data files open (one main, several other
attached) during normal operation.  The journaling mode is WAL.  I need to
perform an online backup of the data files without shutting down the
application.

What is the risk in simply copying all files, i.e. data files, -shm and -wal
files to a backup directory?  Could the sqlite3 built-in .backup command be
used while the application is accessing the data files?  Is there a specific
backup utility available for SQLite?

Any comments and suggestions will be much appreciated.

Regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Problem setting journal_mode=WAL

2011-04-12 Thread Jaco Breitenbach
Dear experts,

My application makes use of several data files attached to the SQLite
database.  Data files are attached on demand as they become necessary.  I'm
experimenting with the WAL journalling mode as a speed/performance
optimisation.  Unfortunately I'm having some difficulty setting the journal
mode to WAL on the attached databases.

Here is some diagnostic output:
Setting database 'DC2_DB_20110412' journal mode to WAL.
sqlite3_exec('0x1f8f57a8','COMMIT',NULL,NULL,'cannot commit - no transaction
is active') = SQLITE_ERROR
sqlite3_exec('0x1f8f57a8','PRAGMA
DC2_DB_20110412.JOURNAL_MODE=WAL',NULL,NULL,'cannot change into wal mode
from within a transaction') = SQLITE_ERROR

The two sqlite3_exec() statements are executed directly after each other.
If the 'COMMIT' reports that no transaction is active, why is the
JOURNAL_MODE pragma still complaining about it being within a transaction?
Am I missing something here?
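
For clarity, what I am trying to do, reduced to its essentials (a sketch;
the file name is a placeholder for the real on-demand data file):

    static int attach_in_wal_mode(sqlite3 *db)
    {
        int rc = sqlite3_exec(db,
                "ATTACH 'DC2_DB_20110412.db' AS DC2_DB_20110412",
                NULL, NULL, NULL);
        if (rc != SQLITE_OK) return rc;

        /* This is the call that fails with "cannot change into wal mode
           from within a transaction", even though the preceding COMMIT
           claims no transaction is active. */
        return sqlite3_exec(db,
                "PRAGMA DC2_DB_20110412.journal_mode=WAL",
                NULL, NULL, NULL);
    }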

I'm working with SQLite 3.7.3.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Disabling a unique index

2011-04-08 Thread Jaco Breitenbach
Hi Dan,

I haven't changed the vacuum setting from the default, so it should be
0/NONE.  Having just connected to it using the command line tool, the value
returned was 0 as well.

Jaco

On 8 April 2011 11:12, Dan Kennedy  wrote:

> On 04/08/2011 04:42 PM, Jaco Breitenbach wrote:
> > Dear experts,
> >
> > My application makes use of a SQLite table with a unique index.  During
> > normal processing, the unique index is used to ensure no duplicate
> records
> > are entered into the table.  However, at regular intervals large numbers
> of
> > records (millions of records) that are already known to be unique, are
> > inserted into the SQLite table from another source.  In order to speed up
> > these bulk inserts, I first drop the index on the SQLite table, do the
> bulk
> > insert, and then recreate the index.
> >
> > The problem I'm encountering is that dropping of the index can
> potentially
> > take much longer than recreating it.  In a test I've run this morning, it
> > took 53 minutes to complete the "DROP INDEX my_unique_index", but only 9
> > minutes to recreate it.
>
> Is your database an auto-vacuum database? If so, it might be faster if
> you issue a "PRAGMA auto_vacuum = 2" before dropping the index.
>
> Dan.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Disabling a unique index

2011-04-08 Thread Jaco Breitenbach
Dear experts,

My application makes use of a SQLite table with a unique index.  During
normal processing, the unique index is used to ensure no duplicate records
are entered into the table.  However, at regular intervals large numbers of
records (millions of records) that are already known to be unique, are
inserted into the SQLite table from another source.  In order to speed up
these bulk inserts, I first drop the index on the SQLite table, do the bulk
insert, and then recreate the index.

The problem I'm encountering is that dropping the index can potentially
take much longer than recreating it.  In a test I've run this morning, it
took 53 minutes to complete the "DROP INDEX my_unique_index", but only 9
minutes to recreate it.  Looking at the documentation for "DROP INDEX" I've
noticed that it says, "The index is completely removed from the disk."  I
can only assume that that is the reason why dropping the index could take so
long.

Are there any alternatives to dropping the index?  Ideally I only want to
disable it, without actually removing it from the file, and incurring the
unnecessary cost of rewriting the data file.

Any ideas or comments would be much appreciated.

Regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Permanently disabling AutoCommit

2011-03-24 Thread Jaco Breitenbach
Dear Experts,

Is there an easy way to permanently disable the default AutoCommit using the
C API?  The nature of my application is such that I don't know exactly where
a new transaction begins.  Records are inserted into the database, and at
the end an explicit COMMIT is requested.

It seems that it would be possible to check whether a transaction has been
created (BEGIN TRANSACTION) by using sqlite3_get_autocommit(), and if the
return value is TRUE (AutoCommit is active), then issue the BEGIN.  However,
it seems that the internal DB structure contains an autoCommit flag, and I'm
hoping there's a simple way of setting it not to autocommit at startup.
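
In other words, something like this before each batch of inserts (a
sketch of the workaround, not what I would prefer):

    /* Begin a transaction only if the connection is still in
       autocommit mode, i.e. no explicit transaction is open yet. */
    static int ensure_transaction(sqlite3 *db)
    {
        if (sqlite3_get_autocommit(db))
            return sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);

        return SQLITE_OK;   /* already inside an explicit transaction */
    }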

Any ideas or advice will be most welcome.

Best regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] PRAGMA cache_size problem

2011-03-23 Thread Jaco Breitenbach
Hi Dan,

That fixed the problem for me.  Thanks a lot, much appreciated!

Best regards,
Jaco

On 23 March 2011 09:10, Dan Kennedy  wrote:

> On 03/23/2011 03:26 PM, Jaco Breitenbach wrote:
> >   Dear experts,
> >
> > I have compiled sqlite-autoconf-3070500 on a RedHat Linux machine.  My
> > application is written in C, so uses the SQLite C interface.
> >
> > At startup I connect to an admin datafile, and then attach 20 more
> datafiles
> > to the connection.  After creating the initial connection (before
> attaching
> > to the other datafiles), I issue "PRAGMA cache_size=1048576".  If I query
> > the cache size with "PRAGMA cache_size", the correct value is returned.
> > However, this appears to have no actual effect on the application.  As I
> > proceed to insert data randomly into the 20 attached datafiles, the
> > application's memory footprint is capped at 64 MB, even though the
> combined
> > size of the datafiles is well over 400 MB.
> >
> > As a test I recompiled SQLite with -DSQLITE_DEFAULT_CACHE_SIZE=1048576.
> > Without issuing the PRAGMA, the memory (cache) size now continues to grow
> to
> > match the size of the datafiles as expected.
>
> Each attached database has a separate limit. You will need to do:
>
>   ATTACH 'new.db' AS aux;
>   PRAGMA aux.cache_size = 1048576;
>
> or something.
>
> Dan.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] PRAGMA cache_size problem

2011-03-23 Thread Jaco Breitenbach
Dear experts,

I have compiled sqlite-autoconf-3070500 on a RedHat Linux machine.  My
application is written in C, so uses the SQLite C interface.

At startup I connect to an admin datafile, and then attach 20 more datafiles
to the connection.  After creating the initial connection (before attaching
to the other datafiles), I issue "PRAGMA cache_size=1048576".  If I query
the cache size with "PRAGMA cache_size", the correct value is returned.
However, this appears to have no actual effect on the application.  As I
proceed to insert data randomly into the 20 attached datafiles, the
application's memory footprint is capped at 64 MB, even though the combined
size of the datafiles is well over 400 MB.

As a test I recompiled SQLite with -DSQLITE_DEFAULT_CACHE_SIZE=1048576.
Without issuing the PRAGMA, the memory (cache) size now continues to grow to
match the size of the datafiles as expected.

Has any of you encountered this problem before?  Am I doing something wrong,
or could there be a problem with this release of SQLite?

Any ideas or comments will be much appreciated.

Regards,
Jaco
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users