Re: [sqlite] Coding standard

2019-12-12 Thread Valentin Davydov
On Thu, Dec 12, 2019 at 11:19:44AM -0500, Richard Hipp wrote:
> 
> #define sqlite3Strlen30NN(C) (strlen(C)&0x3fff)
> 
> The tool does not provide any details beyond "Use of strlen".

So why not just #define sqlite3Strlen30NN(C) (strnlen(C,0x3fff)) ?
From the point of view of program logic it looks equivalent (at least to
me), but it shifts the security burden from you to the authors of libc. And
of course it should calm static analyzers that are anxious about strlen(),
sprintf() and the like.
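
Schematically (the _orig/_alt names below are only for illustration, and
strnlen() is POSIX.1-2008, so it may be missing from older C libraries):

#include <string.h>

/* Variant quoted above: scan the whole string, then mask the result. */
#define sqlite3Strlen30NN_orig(C)  (strlen(C) & 0x3fff)

/* Suggested variant: bound the scan itself, so libc enforces the limit. */
#define sqlite3Strlen30NN_alt(C)   (strnlen((C), 0x3fff))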

Valentin Davydov.



Re: [sqlite] How to increase performance when inserting a lot of small data into table using indices

2019-09-12 Thread Valentin Davydov
On Tue, Sep 10, 2019 at 05:25:38PM +0200, mailing lists wrote:
> Hi,
> 
> I cannot really put all the inserts into one transaction because in case of a 
> failure I lose all the already inserted data. Though I made some tests. 
> There is hardly any performance gain anymore when doing 1000 or 10 000 
> insertions in one transaction including immediate insertion into indices (in 
> my case the difference is in the per cent range).

What do you mean by "losing data"? Do you need the data to be immediately
available via SQL, or just written to persistent storage? In the latter case
you can implement your own data cache, for example as sequential log files
which are periodically (and/or on demand) rotated, then asynchronously parsed
and inserted into the SQLite database with an optimized CACHE_SIZE,
transaction size, journal mode etc., and deleted only after a successful
commit. That way you shift the burden from SQL to the filesystem, which is
less constrained by the natural structure of the data and might perform better.
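
Schematically (the table name, file names and the assumption of plain
comma-separated numeric fields are only illustrative; text fields would need
quoting and escaping):

# Writers append one record per line to current.log (cheap sequential I/O).
# Periodically rotate the log, replay it into SQLite in a single transaction,
# and remove the staged file only if sqlite3 exits successfully.
mv current.log batch.log &&
{ echo "BEGIN;"
  awk -F',' '{ printf "INSERT INTO samples VALUES(%s);\n", $0 }' batch.log
  echo "COMMIT;"
} | sqlite3 -bail data.db &&
rm batch.log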

Valentin Davydov.



[sqlite] 2D query

2017-12-13 Thread Valentin Davydov
Hi, all!

Given the following table:

CREATE TABLE people(name,sex);
INSERT INTO people VALUES("Alex","F");
INSERT INTO people VALUES("Alex","M");
INSERT INTO people VALUES("Jane","F");
INSERT INTO people VALUES("John","M");

How do I construct a query which returns the coalesced sex but the individual
names, such as "F: Alex, Jane. M: Alex, John."?

Sincerely, 
Valentin Davydov.


[sqlite] whish list for 2016

2015-12-28 Thread Valentin Davydov
On Fri, Dec 25, 2015 at 12:51:55PM +0000, Simon Slavin wrote:
> 
> On 25 Dec 2015, at 12:39pm, Valentin Davydov wrote:
> 
> > It would be desirable to improve the algorithm of the INTEGRITY_CHECK pragma.
> > Presently it is generally useless for indexed databases which don't fit
> > entirely in RAM (and which usually need checking much more than
> > smaller ones).
> 
> Valentin, could you expand on that ?

As far as I understand, INTEGRITY_CHECK simply iterates over the records
(of tables and indices) one by one in some arbitrary order. So, if the
database is too big to fit in the available memory (sqlite's own cache, the
system file cache etc.), then each iteration implies a random seek on the
disk(s), or even several in some scenarios. A check of a few-terabyte database
with some tens of billions of records and a dozen indices would therefore take
more than 10^11 disk operations of more than 10 milliseconds each, i.e. more
than 10^9 seconds. That is, years.

My wish is to introduce some intermediate level of cache (perhaps bitmaps?)
which could reside in a reasonable amount of memory (say, < 1% of the total
database size) and obviate the need to read each page more than once.
Things would then speed up roughly in proportion to the ratio of page size
to record size.

> PRAGMA schema.foreign_key_check;
> 
> Does it have the same problem as far as you're concerned ?

I don't have access to any huge databases with foreign keys at the moment.

Valentin Davydov.


[sqlite] whish list for 2016

2015-12-25 Thread Valentin Davydov
Hi, All!

It would be desirable to improve the algorithm of the INTEGRITY_CHECK pragma.
Presently it is generally useless for indexed databases which don't fit
entirely in RAM (and which usually need checking much more than
smaller ones).

Valentin Davydov.


[sqlite] Native sqlite4 on FreeBSD

2015-09-11 Thread Valentin Davydov
On Thu, Sep 10, 2015 at 01:30:37PM +0200, Eduardo Morras wrote:
> 
> Use gmake to compile.

It didn't work either. Finally I just installed a brand new Linux on a
nearby virtual machine, ran make -f Makefile.linux-gcc there, and thoroughly
repeated its output line by line on my FreeBSD box, replacing gcc with cc.
Eventually it compiled fine (with almost the same set of warnings) and the
result seems to work (at least within my own coverage).

Thanks to the generally high coding standard, there were only two
idiosyncrasies to correct: the lack of a system-wide malloc.h (which is
replaced by unistd.h on FreeBSD) and the lack of the fdatasync() syscall,
which is already carefully screened out of the sources by a special symbol
called __ANDROID__ ;-)

By the way, clang kindly revealed a couple of forgotten "unsigned" qualifiers
in the sources, which could (and in the past did) produce some nice bugs.
Developers, take note.

> Note that there isn't a port,

I know. Even though the code has been frozen for almost a full year, nobody
wants to take on the trouble of maintaining a FreeBSD port (perhaps myself
included).

Valentin Davydov.



[sqlite] Native sqlite4 on FreeBSD

2015-09-10 Thread Valentin Davydov
Hi, all!

How does one build native sqlite4 (preferably a statically linked shell only,
without any extensions) on a recent version of FreeBSD? It has clang instead
of gcc as the default system compiler, so the makefiles bundled with the
sqlite sources don't work.

Valentin Davydov.


[sqlite] AUTOINC vs. UUIDs

2015-05-21 Thread Valentin Davydov
On Wed, May 20, 2015 at 11:52:08PM +0000, Peter Aronson wrote:
> Now you're just getting silly.  What if the application sets all rowids,
> everywhere, to 1?  The fact is, the chance of collision on a UUID is pretty
> astronomically low as long as a decent source of entropy is used
> (see http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates).
> ?Yes, some application might not generate proper UUIDs, but that's true
> with any scheme that needs to coordinate disconnected data editing or
> generation on multiple machines.

Moreover, there are widespread examples of colliding UUIDs, say
EBD0A0A2-B9E5-4433-87C0-68B6B72699C7. This means the idea has already
proven to fail at its intended usage.

> There are lots of applications out there that use UUIDs pretty successfully.

Far fewer than the number of applications which use integers ;-)

Val. Dav.



Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Valentin Davydov
On Fri, Nov 21, 2014 at 02:01:39PM -0500, Joseph Fernandes wrote:

> We wanted to known the following
> 1) How could we improve the performance on the write side so that we have 
> minimal latency?
> 2) Will ther be any write performance hit when the number of records in the 
> DB increase?

Generally speaking, you have to do some work to arrange your data (modification
times) in some ordered way. This work can be done either in advance, as you
suggest, or on demand, as some people have already told you. Either way it
will consume resources, regardless of whose code does it, yours or SQLite's.
In practice (given magnetic disks as the underlying storage), the scarcest of
those resources is rotational/seek latency, which detrimentally affects all
disk operations at any scheduling priority. SQLite performs extensive random
disk access (mostly reads) in most usage scenarios - selects, inserts,
indexing etc. - with the possible exception of small updates of non-indexed
data (which are accessed in a similar fashion by later selects). The only way
to cope with a slow disk is to keep all the necessary data somewhere else, for
example in a RAM cache. Of course, the cache itself has to be populated in
advance to give this benefit, and, given current RAM prices, it seems not very
feasible to steal available memory from smart applications in favour of a
dumb cache.

I hope these considerations will help you in tuning your code.

Valentin Davydov.


Re: [sqlite] 50% faster than 3.7.17

2014-09-22 Thread Valentin Davydov
On Fri, Sep 19, 2014 at 09:14:17PM -0400, Richard Hipp wrote:

> The latest SQLite 3.8.7 alpha version (available on the download page
> http://www.sqlite.org/download.html) is 50% faster than the 3.7.17 release
> from 16 months ago.  That is to say, it does 50% more work using the same
> number of CPU cycles.

Are there any similar benchmarks with regard to disk I/O operations rather
than CPU? I mean random reads on cache misses, especially.

Valentin Davydov.



Re: [sqlite] Slow select from database

2014-03-12 Thread Valentin Davydov
On Wed, Mar 12, 2014 at 02:38:15PM +0400, Георгий Жуйков wrote:
> 1 . We have a database of measurements: time DATETIME, name TEXT, value
> NUMERIC
> indexes: 
> 'tags_name_index' ON 'TAGS' ('NAME' ASC)
> 'tags_name_itemtime_index' ON 'TAGS' ('NAME' ASC, 'ITEMTIME' ASC)
> In case of record auto_vacuum=INCREMENTAL flag is used
[skip]
> 4 . The request of data is made for time slot, i.e. from several databases.
> For example:
> SELECT COUNT (*) as COUNTER FROM Entries WHERE (TIMESTAMP BETWEEN @STARTTIME
> AND @ENDTIME)
> SELECT * from Entries WHERE (TIMESTAMP BETWEEN @STARTTIME AND @ENDTIME) of
> ORDER BY TIMESTAMP DESC LIMIT 1000 OFFSET 0

I can't completely understand your database schema, but why don't you use an
index on just the DATETIME field when selecting data for a given time period?

> 5 . Initially all requests are expedited.
> After a while (about 50 minutes) the same requests start being executed more
> slowly, request to each database (from 1 to 30 minutes). 
> Repeated request of the same data - quickly.
> The System.Data.SQLite.x86 updating to version 1.0.91.3 doesn't bring any
> positive result.
> 
> What to do?

First try to figure out where it spends so much time (I suspect random
disk reads: there seems to be no other slow physical phenomenon in your
system, except perhaps database locking by some other process which you
didn't mention). Then either revise the logic of your application (e.g. by
adding the above-mentioned index, changing auto-vacuum to a forced VACUUM
scheduled at a specific time of day, etc.) or give it more resources (say,
by putting the database on an SSD drive).

Valentin Davydov.


Re: [sqlite] SQLite version 3.8.2 running 2.5x faster for some queries.

2013-11-24 Thread Valentin Davydov
On Sat, Nov 23, 2013 at 06:18:29AM -0500, Richard Hipp wrote:
> On Sat, Nov 23, 2013 at 5:26 AM, Pepijn Van Eeckhoudt <
> pep...@vaneeckhoudt.net> wrote:
> 
> > Is datetime special in thuis context or will constant expression hoisting
> > like this happen for any function?
> 
> SQLite must know that the function always gives the same output given the
> same inputs.  Not every function works that way.  Counterexamples include
> random() and last_insert_rowid().  But most built-in functions are
> factorable in the same way that datetime() is.

Wait a second... and you'll get a different value of datetime('now'). In this
sense datetime() is as deterministic as random(): it may give the same result
on the next invocation or it may not, depending on various circumstances not
related to the function itself.

> Currently there is no API to designate an application-defined function as
> being "constant" in the sense that it always generates the same output
> given the same inputs.  Hence, SQLite assumes the worst about
> application-defined functions and never tries to factor them out of the
> inner loop.  Probably we should add a new API that allows the application
> to state auxiliary properties about application-defined functions (such as
> whether or not it is "constant", whether or not it can return NULL, whether
> or not it might change the encoding of its input parameters, etc.)  But
> that has not been done yet.

In my opinion, the most general solution is to let the application
programmer decide whether to calculate the function once (say, at
the beginning of a transaction), store the result and then access
the stored value, or to call the function anew on each iteration,
depending on the application semantics.

Valentin Davydov.



Re: [sqlite] SQLite scalability

2013-11-22 Thread Valentin Davydov
On Thu, Nov 21, 2013 at 11:43:32AM +0200, Baruch Burstein wrote:
> Hi all,
> 
> I know SQLite is supposed to support DB sizes in the TB (I think the
> default configuration can reach 1TB). I am curious if anyone actually uses
> SQlite at anywhere near this.

Yes. 

> Does anyone use it regularly for DBs 500GB+
> in size, or with tables containing 10 billion rows+? 

I've got an installation with an SQLite DB several terabytes in size.
It contains about 20 billion thoroughly indexed records, and grows every
day (more precisely, every night) by a few tens of millions of new records.

> How much concurrency does your use require?

I've spent some effort on eliminating concurrency in the application. That is,
updates and selects occur at very different times of the day.

> How long do selects take (assuming indexes are set correctly?)

It depends on the size of the select. A single row is selected instantaneously.
A uniqueness check takes about 1-2 minutes per million records, most of the
time being spent parsing SQL commands. Aggregate functions over a substantial
fraction of the entire database, of course, take too long to be executed in
real time.

> Are there problems of locking ("normal" SQLite doesn't usually
> suffer from locking since transactions are very quick, but if transactions
> can be in the order of 100's of ms, I think locking can easily happen if
> the DB is accessed concurrently a few times a second, though I am not sure
> if this may only apply to writes).

Yes. A single transaction (the insertion of those tens of millions of new
records) takes hours in the worst case.

> I understand that the answer to most of these questions can be very
> hardware (and software) dependent,

Actually, not so much. The only hardware capable of storing such an amount of
data is an array of magnetic disks, and their latency (about 10-20 ms for
random access) far exceeds any reasonable software overhead. Even the caches
(the internal SQLite page cache and/or the operating system file cache)
occupy the same memory and therefore have almost the same effect. The only
software that determines the performance is SQLite itself, in my case perhaps
the tree-rebalancing algorithm.

> but I am just trying to get a feel for
> SQLite's applicability for a project I am working on that may reach limits
> like these.

The only definitive SQLite limits are documented in the relevant manual page.

Valentin Davydov.


Re: [sqlite] Write performance question for 3.7.15

2012-12-29 Thread Valentin Davydov
On Fri, Dec 28, 2012 at 03:35:17PM -0600, Dan Frankowski wrote:
> 
> 3. Would horizontal partitioning (i.e. creating multiple tables, each for a
> different key range) help?

This would seriously impair read performance (you'd have to access two indices
instead of one).

Valentin Davydov.


Re: [sqlite] Fwd: Write performance question for 3.7.15

2012-12-29 Thread Valentin Davydov
On Fri, Dec 28, 2012 at 03:34:02PM -0600, Dan Frankowski wrote:
> I am running a benchmark of inserting 100 million (100M) items into a
> table. I am seeing performance I don't understand. Graph:
> http://imgur.com/hH1Jr. Can anyone explain:
> 
> 1. Why does write speed (writes/second) slow down dramatically around 28M
> items?

Most probably the indices became too large to fit in the in-memory cache.
You can verify this by tracing system activity: this threshold should
manifest itself as a drastic increase in _read_ operations on the disk(s).

> 2. Are there parameters (perhaps related to table size) that would change
> this write performance?

CACHE_SIZE. It makes sense to enlarge it to use most of the available memory.
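
For example (the figure is purely illustrative; here the value is a number of
pages, so the cache occupies roughly cache_size times page_size bytes):

PRAGMA cache_size = 1000000;  -- 1,000,000 pages, about 1 GB at the default 1 KiB page size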

Valentin Davydov.


Re: [sqlite] Any limitation of file size on sqlite?

2012-12-17 Thread Valentin Davydov
On Sun, Dec 16, 2012 at 10:02:41PM +0200, Çağlar Orhan wrote:
> Hello again,
> Thank you Simon.
> The page http://www.sqlite.org/limits.html has my answer partly. In MS IIS
>  what gbytes bigger size of .sqlite file should significant to work on it
> properly?
> I mean, is there a 10-12 GB sqlite file that working properly and with any
> problem?

The only limitation is on the total number of pages in the file, so the file
size is limited to SQLITE_MAX_PAGE_COUNT times PAGE_SIZE. Thus, given a large
enough page size, say, 8 to 64 kilobytes, SQLite successfully handles files
many terabytes in size, provided the underlying filesystem supports them, of
course. Be prepared to wait quite a long time for non-SQLite operations (copy,
delete etc.) to complete on such big files.
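
A quick way to see both factors on a given build (max_page_count defaults to
the compile-time SQLITE_MAX_PAGE_COUNT unless it has been lowered):

PRAGMA page_size;        -- bytes per page (changeable only before the database is created, or via VACUUM)
PRAGMA max_page_count;   -- maximum number of pages for this database
-- upper bound on the file size = page_size * max_page_count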

Valentin Davydov.


Re: [sqlite] IP address support (was: Consequences of lexicographic sorting of keys in SQLite4?)

2012-07-11 Thread Valentin Davydov
On Mon, Jul 02, 2012 at 05:05:25PM +0100, Niall O'Reilly wrote:
> 
> On 2 Jul 2012, at 16:13, Nico Williams wrote:
> 
> > That reminds me: it'd be nice to have a bit string type, since the
> > correct way to sort IPv4 CIDR blocks is as bit strings.
> 
>   Nice, definitely!
> 
> > This is also
> > a proper way to sort IPv6 blocks.  Alternatively, it'd be nice to have
> > native IP address types in SQLite4, as otherwise one has to jump
> > through hoops to handle IP addresses properly.
> 
>   Bit strings would be more general.
>   Native IP would remove a sometimes-asserted motivation for preferring
>   PostgreSQL.
> 
>   As I see it, ranges, as well as single addresses and CIDR prefixes, 
>   need to be supported, perhaps like the Perl Net::IP module does.

Individual IP addresses are nicely supported in the form of unsigned
integers, and prefixes/ranges as contiguous ranges of such integers.
For example, to determine whether a given IP address belongs to a particular
subnet, one can evaluate "IP BETWEEN NETWORK_MIN AND NETWORK_MAX", which
sqlite does quite efficiently. This is for IPv4 at least.
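
For example (a sketch with invented table and column names; the constants are
the integer forms of 192.168.1.0 and 192.168.1.255, i.e. the bounds of the
subnet 192.168.1.0/24):

CREATE TABLE hosts(ip INTEGER);      -- IPv4 address stored as a 32-bit unsigned integer
CREATE INDEX hosts_ip ON hosts(ip);  -- makes the range scan below cheap
SELECT * FROM hosts WHERE ip BETWEEN 3232235776 AND 3232236031;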

Valentin Davydov.


Re: [sqlite] Can't create empty database

2012-06-25 Thread Valentin Davydov
On Mon, Jun 25, 2012 at 01:01:46AM -0700, L Anderson wrote:
> Googling on how to create a sqlite database (empty one) it appears
> I need only do 'sqlite3 test.db'.  However, when I try that I get:
> 
> ->sqlite3 test.db
> SQLite version 3.7.13 2012-06-11 02:05:22
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite>
> 
> however, no database is created.  What am I doing wrong and what
> do I need to do to create an empty database 'test.db'?

Perhaps you have to fill the database with some of your own data (and later
delete it if you want a really empty DB). Otherwise it will not even remember
your PRAGMA settings etc.:

$ rm -f test.db
$ sqlite3 test.db "pragma page_size=512;"
$ wc -c test.db
   0 test.db
$ sqlite3 test.db "pragma page_size;"
1024
$ sqlite3 test.db "pragma page_size=512;
create table t(c);drop table t;vacuum;"
$ wc -c test.db
 512 test.db
$ sqlite3 test.db "pragma page_size;"
512
$

Valentin Davydov.


Re: [sqlite] index creation for single-use DB

2012-05-13 Thread Valentin Davydov
On Sun, May 13, 2012 at 11:02:54AM +0300, Baruch Burstein wrote:
> I am using an in-memory DB to load data into it, do a few sort / find
> duplicates / SELECTs, and then dispose of the DB. It can vary in size from
> a few thousand rows to over a million.
> Would the time used for creating an index be worth it for just a single
> sort and a few SELECTs?

It depends on the selects. If more than 2-3 of them iterate over the entire
table, then appropriate indexing will help. In fact, indexing a column
amounts to iterating over it and sorting the result (besides storing the
index in memory or in a file).

> If so, would it be faster to do all the INSERTs and
> then add the index, or create the index and then do the INSERTs?

The former is faster, because all necessary information is already available.

Вал. Дав.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-09 Thread Valentin Davydov
On Mon, May 07, 2012 at 01:34:13PM -0400, peter korinis wrote:

> So, if I use gawk to change my comma separated file to | (pipe) delimiter ...
> will it work?

If you use gawk, you can just as easily turn your comma-separated file into a
series of correct SQL INSERT statements (while also ensuring the validity of
the input data, such as the right number of fields in each row) and not rely
on the sqlite shell's CSV-parsing capabilities at all.
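
For example (a sketch only: the table name and the expected field count of 600
are placeholders, and the fields are assumed to need no quoting; text fields
containing quotes or commas would need escaping):

gawk -F',' 'BEGIN { print "BEGIN;" }   # wrap the whole load in one transaction
  NF == 600 { printf "INSERT INTO data VALUES(%s);\n", $0 }  # emit only well-formed rows
  END { print "COMMIT;" }' input.csv | sqlite3 -bail analysis.db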

Valentin Davydov.



Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-05 Thread Valentin Davydov
On Fri, May 04, 2012 at 11:23:42AM -0400, peter korinis wrote:
>
> After I get the data loaded and inspect for nulls in prospective index
> attributes, can I add indices?

Yes, of course. Moreover, it is much faster to add the indices all at once
at the end rather than create them beforehand and then update them with every
new piece of data.

> I was planning to load using sqlite3 CLI ".import" command. Is there a way I
> can monitor the progress of the load, with only minimal impact on
> performance ?

You can monitor (by some external means) either the database file size or the
read position in the input data file. Both grow roughly linearly with the
amount of data processed.

> I've started several loads only to find out hours later that
> nothing has been loaded.

In any case, be prepared to spend some of your time on learning.

Valentin Davydov.


Re: [sqlite] is SQLite the right tool to analyze a 44GB file

2012-05-04 Thread Valentin Davydov
On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite ... not a programmer ... not a DBA ... just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data. 

Yes, SQLite is quite capable of simple analysis of such amounts of data,
especially selecting small subsets based on simple criteria. However, before
trying to do real work you have to understand the structure of your data,
anticipate your likely queries and carefully design the database schema
(tables and, equally important, indices). Putting all the data in a single
600-column table is perhaps not a good idea (though technically allowed),
especially if your columns are all of the same physical nature: it is easy to
select arbitrary subsets of rows, but not arbitrarily computed columns.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
> with 4GB RAM + 200GB free disk space.

A well-indexed database of small data pieces usually takes up several times
more disk space than the raw data. 200GB would probably not be enough,
depending mostly on the number of indices. Consider dedicating a separate
disk (or even a RAID array) to it.

> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).

Try the bare sqlite3 shell instead of those external tools. Expect it to take
at least several hours to fill your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3 terabytes 
of disk space, contains more than 10^10 records and still provides small 
selects of indexed data in real time.

> 1.   Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

That depends on the data structure, its semantics and what you are going to
look for. SQLite isn't very good at computing complex aggregate functions,
but works fine for simple selecting and sorting.

> 2.   If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load ... or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size, and perhaps
CACHE_SIZE to fill most of the available RAM, would help a bit. Also, don't
forget to turn off journaling and wrap everything in a single transaction when
creating the database for the first time.
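
Schematically, the initial load might look like this (the values and names are
illustrative, not recommendations):

PRAGMA page_size = 4096;     -- match the filesystem cluster size; takes effect only before the database is created
PRAGMA cache_size = 500000;  -- number of pages to cache (about 2 GB at 4 KiB per page)
PRAGMA journal_mode = OFF;   -- no rollback journal while building the database from scratch
BEGIN;
-- ... bulk INSERTs (or the shell's .import) of the raw data go here ...
COMMIT;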

Valentin Davydov.


Re: [sqlite] question about zipvfs & performance

2012-03-25 Thread Valentin Davydov
On Sun, Mar 25, 2012 at 07:48:51PM +0200, Tal Tabakman wrote:
> Hi,
> I am writing an application that performs a lot of DB writes. I am using a
> lot of recommended optimizations (like using transactions and more...)
> I want to improve my recording time by reducing the amount of I/O. one way
> to do so is by compressing the data before dumping it to DISK.
> I am evaluating a sqlite extension called zipvfs. this VFS extension
> compresses pages before writing them to disk
> I am using zlib compress/uncompress as my compression callback functions
> for this VFS. I assumed that database writing will  be faster with this VFS
> since
> compression [means less I/O], in reality I see no difference (but the data
> is indeed compressed)...
> any idea why I don't see any recording time improvement ?

Yes. If you are using a drive with rotating magnetic platters, then the most
critical factor is seek (latency) time, rather than linear throughput. In
other words, you are limited by the _number_ of random I/O operations, not
by the _amount_ of information written.

So, if you want to improve your write performance, you have to use
low-latency storage, such as an SSD drive for small databases or a RAID
array with plenty of write-cache memory for huge ones.

> is there an overhead with zipvfs ?

You can easily measure this overhead yourself on an in-memory database.
RAM is cheap now ;-)

Valentin Davydov.


Re: [sqlite] Slow inserts in indexed table

2012-02-09 Thread Valentin Davydov
On Thu, Feb 09, 2012 at 02:05:53PM +0000, Simon Slavin wrote:
> 
> On 9 Feb 2012, at 1:45pm, Valentin Davydov wrote:
> 
> > CREATE TABLE T(X, Y, Z, ..., UNIQUE (X, Y, Z, ...) ON CONFLICT IGNORE);
> > CREATE INDEX IX ON T(X);
> > CREATE INDEX IY ON T(Y);
> > CREATE INDEX IZ ON T(Z);
> > .
> 
> Not an answer to your question, but it's worth knowing that those
> indexes may not be helping anything you're doing.

Perhaps IX is indeed redundant (because it should coincide with the beginning
of the implicit uniqueness index), but the other indices are quite useful for
quick selects of small subsets of the data (e.g. SELECT MAX(Z) FROM T;),
which is a typical usage pattern in my application.

>SQLite already has to create an index idea for coping with your UNIQUE 
>clause.  So it already has an index on
> 
> (X, Y, Z, ...)

But this index is useless for searches which don't include the X column.

> Try removing those indexes and see if this improves matters. 
> It'll certainly make your filesize smaller.

Not that significant. Altogether they use up no more than one third of the
total space, the other two thirds being occupied by the table itself and by
the implicit index, respectively. Actually even less, because some columns
don't have individual indices.

Valentin Davydov.


[sqlite] Slow inserts in indexed table

2012-02-09 Thread Valentin Davydov
Hi, All!

I've got a database containing a single table with a dozen columns and some
indices, like this:

CREATE TABLE T(X, Y, Z, ..., UNIQUE (X, Y, Z, ...) ON CONFLICT IGNORE);
CREATE INDEX IX ON T(X);
CREATE INDEX IY ON T(Y);
CREATE INDEX IZ ON T(Z);
.

The data in the different columns are small positive integers, 32-bit integers
or few-byte blobs. None of X, Y, Z, ... is unique by itself; only their
combination is unique (that's why I opted for a relational database to process
them). My application treats this table as append-only, that is, doing either
INSERTs or SELECTs on it, without any UPDATEs. A few million inserts are
wrapped in a single transaction in order to reduce journal usage. The total
number of records in the table is more than 10^10, so it doesn't fit in any
RAM. PRAGMA CACHE_SIZE is adjusted to use most of the available memory. The
journal file (large enough to hold all the pages dirtied by the biggest
transaction) is created in advance and PRAGMA JOURNAL_MODE is set to PERSIST
in order not to bother the operating system with creating/deleting files. The
page size is matched to the underlying filesystem block size and to the stripe
size of the RAID containing that filesystem. The SQLite version is currently
3.7.3, but it seems that the exact version doesn't matter.

When trying to insert data already present in the table, performance is
fairly good: most of the CPU time is spent just parsing SQL statements and
converting data to the internal format, while the database operation itself
(that is, checking the data against the constraint) is almost instantaneous,
which is quite impressive given the table size. But when the application
inserts new data, things change drastically: total throughput drops by 2-3
orders of magnitude. The CPU stays almost idle, and all the time is spent
waiting for disk _reads_ (rarely interspersed with fast and happy write
bursts on each COMMIT). What is sqlite reading there? Does it try to
perfectly rebalance each index on each insert (a million times per
transaction), or is it something else?

Sincerely,
Valentin Davydov.


[sqlite] Shell bug: incorrect handling of some BLOBs.

2011-11-10 Thread Valentin Davydov
Subsystem: Shell

Title: Incorrect handling of some BLOBs.

Type: Code_Defect

Severity: Important

Priority: Unknown

Affected: All SQLite versions containing output_hex_blob() up to 3.7.9.

Environment: Tested on various FreeBSD 8.x versions, both i386 and amd64.
Probably, other operating systems would be affected as well.

How-to-reproduce: Obtain FreeBSD and install sqlite3 in one of the standard
ways (from a package, port, etc.). Then see the screenshot (shell output
indented for clarity):
$ sqlite3 /tmp/db.tmp
SQLite version 3.7.9 2011-11-01 00:52:41
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table t(v blob);
sqlite> insert into t values(X'0123456789');
sqlite> .mode insert
sqlite> select * from t;
INSERT INTO table VALUES(X'01234567ff89');
sqlite> 

Description: In some circumstances (for example, when .mode insert is set
and the select results contain BLOBs) the shell calls the internal function
output_hex_blob(). This function uses the wrong data type, which can result
in an implicit sign extension and thus data corruption.

Suggested-fix: The following patch helps. Perhaps there is a better solution,
such as reimplementing the necessary fprintf(3) functionality without
relying on the operating system libraries.
--- src/shell.c.orig2011-11-01 16:31:18.0 +0400
+++ src/shell.c 2011-11-10 22:45:11.0 +0400
@@ -490,7 +490,7 @@
 */
 static void output_hex_blob(FILE *out, const void *pBlob, int nBlob){
   int i;
-  char *zBlob = (char *)pBlob;
+  unsigned char *zBlob = (unsigned char *)pBlob;
   fprintf(out,"X'");
   for(i=0; i<nBlob; i++){ fprintf(out,"%02x",zBlob[i]); }