[sqlite] error during bind

2015-07-03 Thread Kumar Suraj
Hi

So what's the solution here? I am giving 4 statements because I need the insert
to be transactional. Some other process could be inserting into the same db
and so we need the insert and row id values to be in one transaction.  Is there
any other way I can achieve it?

-Suraj

On Fri, Jul 3, 2015 at 5:13 PM, Stephan Beal  wrote:

> On Fri, Jul 3, 2015 at 1:39 PM, Kumar Suraj  wrote:
>
> > #define INSERT_DN "BEGIN TRANSACTION; INSERT INTO TBL (dn) VALUES (?);
> > SELECT last_insert_rowid(); COMMIT;"
> >
> >
> prepare() expects a _single_ statement. You're giving it 4 statements and
> then trying to bind to part of that, which won't work.
>
>
> > snprintf(command, 512, INSERT_DN);
> >
>
> --
> - stephan beal
> http://wanderinghorse.net/home/stephan/
> http://gplus.to/sgbeal
> "Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
> those who insist on a perfect world, freedom will have to do." -- Bigby
> Wolf


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Mikael
Wait, so if connection 1 is doing VACUUM and connection 2 attempts
INSERT/UPDATE/DELETE and connection 3 attempts SELECT all concurrently,
what happens and how ACID will the DB be?

2015-07-03 20:07 GMT+07:00 Clemens Ladisch :

> Simon Slavin wrote:
> > On 3 Jul 2015, at 1:39pm, Rob Willett 
> wrote:
> >> is Vacuum the same as doing the .dump and restore or is it different?
> >
> > It's the same thing (more or less).
>
> The implementation is completely different.  From the outside, the main
> difference is that VACUUM works inside SQLite's transaction and locking
> mechanisms, so it is safe against concurrent accesses, and against data
> loss even if problems happen while the old database file is replaced
> with the new one.
>
> >> We like the .dump as it gives us a nice easy to use backup :)
>
> In theory, a backup created with .backup is even easier to use; .dump is
> more useful if you want a text file instead of a working database file.
>
> > Good for keeping many generations of archive around.
>
> ... and for creating diffs between them.
>
>
> Regards,
> Clemens


[sqlite] Reader.GetBytes() - when is a byte not a byte?

2015-07-03 Thread Jean Chevalier
In SQLite, every value you store is stored alongside its type. This is unlike 
other databases, where the column determines the type and every value stored 
against it shares it. In SQLite you could have a table in which all the value 
types as stored contradict all the column types as declared, if you so craft it 
(except for an integer primary key).

As for a column declared BLOB, it results in an affinity of NONE, meaning 
"perform no implicit conversions" (as long as lossless and reversible); it 
doesn't mean "affinity binary", because there is no such thing as something more 
generic than binary that should implicitly be cast to binary upon storing it. So 
a blob-declared column will not try to convert implicitly, but it will still 
store exactly what you put in, including its type, inferred from the way in 
which you did the assignment: e.g., for set c = '1' it will store {text-1}; for 
an integer it will store {integer-of-1-} (a special case); for x'01' it will 
store {blob-01}; and so on. This lower-case 'blob' is the most generic case 
(your 'bytes'), but to obtain it one must cast() to a blob, which means "strip 
from what I'm giving you any indication that it may be text, or an integer, or a 
real, as it is really none of these". Think of blob not as a type ('binary') but 
as the absence of type.

You can further apply a check constraint to the column to ensure that no value 
will be stored along with a tag meaning "I'm of a certain kind". This 
check(typeof(c)='blob') will ensure the data you put in is 'naked', so to speak, 
and when the data you put in is naked, SQLite can only store it as bytes, and by 
default return it as bytes. So you can see how the two differ: blob as a column 
type means 'suggest no type' or 'add no type', and blob as a value type means 'a 
value with no type tag attached' or 'convey no type'. They're complementary. 
Think of the two together as "none given - none taken" (or 'none added').

Naturally the constraint will not let you set c = '1', because '1' implies text 
by way of the quotes; the same goes for an int (and how would it know if you 
meant a one-byte int, or two, or four?), and I think that's what you want in 
your application. But it will let you assign something in the x'' notation, or a 
cast() expression. So a check constraint makes the column more 'demanding', so 
to speak, type-wise: stricter, and closer to what you observe in other 
databases. Just remember to tweak the check constraint to allow nulls for 
nullable columns.

After which, we'd expect that if no type marker was carried in, none will come 
out upon querying, unless you have some other layer wrapped around it, such as 
an ODBC driver which assumes it is its duty to further cast a column to a target 
type based on the column's declared type. But this is because some wrapper 
drivers assume the user will try to use SQLite as he uses most other databases, 
or because the driver cannot support exposing a series of column values of 
varying types. A native SQLite driver (one built in accordance with how SQLite 
really works) shouldn't take this liberty.


You wrote:
>
> My assumption was that GetBytes() could be used to read
> the bytes of any column regardless of type. What difference
> should column type or affinity make if I'm asking for bytes?
> If I'm using GetBytes() on a column with REAL affinity,
> it should put 8 bytes in the read buffer, but it doesn't.
> If I send text to a BLOB column its typeof() changes to
> TEXT which confuses GetBytes(). I think this is pretty silly
> because bytes are bytes, right? Why bother verifying type
> when the method name already tells you what you are getting
> and bytes can be obtained from any data type?
> (Rhetorical questions, but feel free to respond anyway.)
>
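To make the check-constraint point above concrete, here is a minimal sketch; the
table name t and column c are invented for illustration and error handling is
trimmed:

#include <stdio.h>
#include <sqlite3.h>

int main(void)
{
    sqlite3 *db;
    char *err = NULL;
    int rc;

    sqlite3_open(":memory:", &db);

    /* hypothetical single-column table: c may only hold NULL or untyped bytes */
    sqlite3_exec(db,
        "CREATE TABLE t(c CHECK(c IS NULL OR typeof(c)='blob'));",
        NULL, NULL, NULL);

    /* rejected: '1' carries a text type tag */
    rc = sqlite3_exec(db, "INSERT INTO t VALUES ('1');", NULL, NULL, &err);
    printf("text insert:  %s\n", rc == SQLITE_OK ? "ok" : err);
    sqlite3_free(err); err = NULL;

    /* accepted: x'01' and cast(... as blob) carry no type tag */
    rc = sqlite3_exec(db,
        "INSERT INTO t VALUES (x'01'); INSERT INTO t VALUES (CAST('1' AS BLOB));",
        NULL, NULL, &err);
    printf("blob inserts: %s\n", rc == SQLITE_OK ? "ok" : err);
    sqlite3_free(err);

    sqlite3_close(db);
    return 0;
}

The first INSERT fails with a CHECK constraint error because '1' is stored as
text; the two blob inserts carry no type tag and are accepted.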


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Mikael
2015-07-03 5:51 GMT+07:00 Simon Slavin :

>
> On 2 Jul 2015, at 11:16pm, Rob Willett 
> wrote:
>
> > We process this XML feed and pull out chunks of the XML, process it and
> update our database with it. This database is currently 16GB in size. Our
> concern is the time taken to process each file every five minutes. It has
> been getting steadily longer and longer. It started out at around 6 seconds
> and is now around 22-24 seconds. Whilst we expect processing time to get
> bigger, we are getting concerned about the performance and decided to have
> a look.
>
> This suggests fragmentation.  There can be fragmentation at various levels
> for a SQLite database.  Fortunately it's simple to get rid of them.  First,
> take some timings.
>
> Use the '.dump' and '.read' commands from the command-line tool to
> create a new database.  It will be created with each row in each table in
> primary-key order.  Once you've done that defragment the disk the database
> is stored on using your OS tools (if possible).  Once you've done that take
> the same timings and see whether anything improved.
>

VACUUM is the ultimate anti-fragmenting tool isn't it (in particular when
combined with WAL_checkpoint() .. or??)?


[sqlite] error during bind

2015-07-03 Thread Stephan Beal
On Fri, Jul 3, 2015 at 6:16 PM, Kumar Suraj  wrote:

> So what's the solution here? I am giving 4 statements because I need the insert
> to be transactional.


You need to prepare() 4 statements or, like Clemens suggests, use exec()
where possible instead of prepare/step.

> Some other process could be inserting into the same db
> and so we need the insert and row id values to be in one transaction.  Is there
> any other way I can achieve it?


Simplest, I think, is something like this pseudocode:

exec("BEGIN;");
prepare("INSERT...;");
bind(...)
step(...);
finalize(...);
id = sqlite3_last_insert_rowid(...);
exec("COMMIT;");

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread GB
Aargh! Should have paid attention to that collation thing right away. 
And tried it for myself.

@Rob Willett:

try

"select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ? COLLATE 
NOCASE"

SQLite should then choose RAG_Idx1 without being forced to.

In general, collations should be defined in the table definition. Create 
indexes with specific collations only when absolutely necessary.

hth
Gerd
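For reference, a minimal sketch of that query from the C API; it assumes the
RAG schema and RAG_Idx1 index posted elsewhere in this thread, and the example
text value is made up:

#include <stdio.h>
#include <sqlite3.h>

/* Sketch: case-insensitive lookup of a single RAG row.  The COLLATE NOCASE on
 * the comparison lets SQLite consider the NOCASE index without INDEXED BY. */
static void lookup_rag(sqlite3 *db, const char *text)
{
    sqlite3_stmt *stmt = NULL;
    if (sqlite3_prepare_v2(db,
            "SELECT Id, Calculation, Peak, Red, Amber, Green "
            "FROM RAG WHERE Text = ?1 COLLATE NOCASE;",
            -1, &stmt, NULL) == SQLITE_OK) {
        sqlite3_bind_text(stmt, 1, text, -1, SQLITE_TRANSIENT);
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("Id=%lld Calculation=%d\n",
                   (long long)sqlite3_column_int64(stmt, 0),
                   sqlite3_column_int(stmt, 1));
    }
    sqlite3_finalize(stmt);
}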


[sqlite] error during bind

2015-07-03 Thread Simon Slavin

On 3 Jul 2015, at 5:16pm, Kumar Suraj  wrote:

> So what's the solution here? I am giving 4 statements because I need the insert
> to be transactional. Some other process could be inserting into the same db
> and so we need the insert and row id values to be in one transaction.  Is there
> any other way I can achieve it?

Just issue your SQL commands as four separate commands.  The database knows 
that they all came from the same connection and it will keep the database 
locked for you until you COMMIT.

Simon.


[sqlite] SQLite in multi-thread application

2015-07-03 Thread ALBERT Aurélien
Hi,

I'm using SQLite v3.8.8.3 in my multi-threaded application.

SQLite is configured so that these asserts are satisfied:

assert(sqlite3_threadsafe() > 0);
assert(sqlite3_config(SQLITE_CONFIG_MULTITHREAD) == SQLITE_OK);

I have multiple connections to the same database file:
-   a single connection is used by only a single thread
-   a single thread uses only one connection
-   all threads (about 4-8 threads) can read simultaneously
-   using a mutex, only a single thread can write to the database (but 
reads can happen during this time)

But I sometimes get "database is locked" errors.

Did I miss something in my configuration?
Did I miss something in my mutex protection?

Thanks for your help!
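A minimal sketch of one common mitigation, a per-connection busy timeout (as
suggested in the "database is locked" thread elsewhere in this digest); the path
and the 30-second value are only examples:

#include <sqlite3.h>

/* Sketch: open a connection and give it a busy timeout so that a locked
 * database is retried instead of failing immediately with SQLITE_BUSY.
 * This is per-connection, so every thread's connection needs it. */
static int open_with_timeout(const char *path, sqlite3 **out)
{
    int rc = sqlite3_open(path, out);
    if (rc == SQLITE_OK)
        rc = sqlite3_busy_timeout(*out, 30000);   /* milliseconds */
    return rc;
}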




[sqlite] error during bind

2015-07-03 Thread Kumar Suraj
Hi

I am trying to insert into a sqlite db. Here is the code, but I am getting the
following error. What could be the issue?

*Error Insert : sqlite3_bind_blob, Error code : 25*

#define INSERT_DN "BEGIN TRANSACTION; INSERT INTO TBL (dn) VALUES (?);
SELECT last_insert_rowid(); COMMIT;"

BinBuffer aInBuffer;
char buffer[100] = "a/b/c/d";
char * testdn = buffer;
aInBuffer.size = 100;
aInBuffer.ptr = testdn;

reset_stmt(newStmt);
snprintf(command, 512, INSERT_DN);
if ( (rv = sqlite3_prepare_v2(sqlHandle->db, command, -1,
 &newStmt, NULL) ) != SQLITE_OK )
{
   fprintf(stderr, "Error Insert : sqlite3_prepare_v2, Error code :
%d\n", rv);
   return;
}
rv = sqlite3_bind_blob(newStmt, 1, aInBuffer.ptr, aInBuffer.size,
SQLITE_STATIC);
if (rv != SQLITE_OK)
{
fprintf(stderr, "Error Insert : sqlite3_bind_blob, Error code :
%d\n", rv);
return;
}


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Clemens Ladisch
Simon Slavin wrote:
> On 3 Jul 2015, at 1:39pm, Rob Willett  wrote:
>> is Vacuum the same as doing the .dump and restore or is it different?
>
> It's the same thing (more or less).

The implementation is completely different.  From the outside, the main
difference is that VACUUM works inside SQLite's transaction and locking
mechanisms, so it is safe against concurrent accesses, and against data
loss even if problems happen while the old database file is replaced
with the new one.

>> We like the .dump as it gives us a nice easy to use backup :)

In theory, a backup created with .backup is even easier to use; .dump is
more useful if you want a text file instead of a working database file.

> Good for keeping many generations of archive around.

... and for creating diffs between them.


Regards,
Clemens
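For reference, the shell's ".backup" command corresponds to the online backup
API; a minimal sketch, with placeholder file names:

#include <sqlite3.h>

/* Sketch: copy a live database into a backup file using the online backup API.
 * It works within SQLite's locking, so the source can stay in use. */
static int backup_db(const char *src_path, const char *dst_path)
{
    sqlite3 *src = NULL, *dst = NULL;
    int rc = sqlite3_open(src_path, &src);
    if (rc == SQLITE_OK)
        rc = sqlite3_open(dst_path, &dst);
    if (rc == SQLITE_OK) {
        sqlite3_backup *b = sqlite3_backup_init(dst, "main", src, "main");
        if (b) {
            sqlite3_backup_step(b, -1);        /* -1 = copy all pages */
            rc = sqlite3_backup_finish(b);     /* SQLITE_OK on success */
        } else {
            rc = sqlite3_errcode(dst);
        }
    }
    sqlite3_close(src);
    sqlite3_close(dst);
    return rc;
}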


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Simon Slavin

On 3 Jul 2015, at 2:31pm, Mikael  wrote:

> Wait, so if connection 1 is doing VACUUM and connection 2 attempts
> INSERT/UPDATE/DELETE and connection 3 attempts SELECT all concurrently,
> what happens and how ACID will the DB be?

VACUUM locks the database, just like INSERT and UPDATE.  Other writing 
operations will be forced to wait until the VACUUM is complete and the database 
is unlocked again.

Depending on your settings and journal mode, read operations will (I believe) 
read the uncommitted changes from the 'old' copy of the table, or will be 
delayed until the VACUUM is complete.  Whichever it is, they should get correct 
data.

Of course, waiting for VACUUM may take far longer than waiting for a single 
INSERT or UPDATE, which is why you need a longer timeout setting than just a 
few seconds.

Simon.


[sqlite] error during bind

2015-07-03 Thread Clemens Ladisch
Kumar Suraj wrote:
> BEGIN TRANSACTION; INSERT INTO TBL (dn) VALUES (?); SELECT 
> last_insert_rowid(); COMMIT;

Please note that the value returned by the SQL function "last_insert_rowid()"
is also available with the C API function "sqlite3_last_insert_rowid(db)".

When you have SQL commands without parameters and without a result, it is
easier to execute them with sqlite3_exec():

rv = sqlite3_exec(sqlHandle->db, "COMMIT;", NULL, NULL, NULL);
if (rv != SQLITE_OK) ...


Regards,
Clemens
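A small elaboration of the sqlite3_exec() call above, for reference: the fifth
argument can capture the error message, which must then be released with
sqlite3_free(). This sketch reuses the sqlHandle from the code earlier in the
thread:

char *errmsg = NULL;
rv = sqlite3_exec(sqlHandle->db, "COMMIT;", NULL, NULL, &errmsg);
if (rv != SQLITE_OK) {
    fprintf(stderr, "COMMIT failed (%d): %s\n", rv, errmsg ? errmsg : "?");
    sqlite3_free(errmsg);   /* sqlite3_exec allocates the message */
}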


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Simon Slavin

On 3 Jul 2015, at 1:39pm, Rob Willett  wrote:

> is Vacuum the same as doing the .dump and restore or is it different? We like 
> the .dump as it gives us a nice easy to use backup :)

It's the same thing (more or less).  Mikael remembered it and I'd forgotten it 
(thanks).  But if your data are important then there are reasons to do both.  
Do a .dump first, then VACUUM.  The VACUUM is to speed things up and the .dump 
is for archiving.  Not that I've ever seen VACUUM fail and lose both the 
'before' and 'after' versions of a table, even when running out of disk space.

I like using '.dump' too because I can read the results with human eyes and I 
don't need to have a working SQLite platform to use the resulting file.  For 
all I know the problem that caused me to need the backup was so serious 
(duplicate UNIQUE keys ?) that I will have to read the results into a text 
processor and modify them, which is something you cannot do with a SQLite 
database file.

The SQL command files which result from .dump compress well into .zip or .7z 
files, of course.  Good for keeping many generations of archive around.

Simon.


[sqlite] error during bind

2015-07-03 Thread Stephan Beal
On Fri, Jul 3, 2015 at 1:43 PM, Stephan Beal  wrote:

> On Fri, Jul 3, 2015 at 1:39 PM, Kumar Suraj  wrote:
>
>> #define INSERT_DN "BEGIN TRANSACTION; INSERT INTO TBL (dn) VALUES (?);
>> SELECT last_insert_rowid(); COMMIT;"
>>
>>
> prepare() expects a _single_ statement. You're giving it 4 statements and
> then trying to bind to part of that, which won't work.
>

Minor elaboration:

the prepare() docs say:

https://www.sqlite.org/c3ref/prepare.html

"If pzTail is not NULL then *pzTail is made to point to the first byte past
the end of the first SQL statement in zSql. These routines only compile the
first statement in zSql, so *pzTail is left pointing to what remains
uncompiled."

so you're only compiling the BEGIN part of your statement, to which you
cannot bind() anything.


-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


[sqlite] error during bind

2015-07-03 Thread Stephan Beal
On Fri, Jul 3, 2015 at 1:39 PM, Kumar Suraj  wrote:

> #define INSERT_DN "BEGIN TRANSACTION; INSERT INTO TBL (dn) VALUES (?);
> SELECT last_insert_rowid(); COMMIT;"
>
>
prepare() expects a _single_ statement. You're giving it 4 statements and
then trying to bind to part of that, which won't work.


> snprintf(command, 512, INSERT_DN);
>

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Simon,

We had missed the incorrect definition of Calculation in the index. We've been 
changing stuff around and suspect we omitted to check as carefully as we should 
have when changing it :(

We'll also update BayesAttributes.

We're actually rebuilding the database from scratch anyway; it'll take around 
3-4 days now as opposed to 12-16 days before :) 

We'll check if we get any improvements from rebuilding using .dump by copying 
the database to another server and doing it there. That's one of the great 
things about SQLite: pick the database up and copy it somewhere else :)

Is VACUUM the same as doing the .dump and restore, or is it different? We like 
the .dump as it gives us a nice, easy-to-use backup :)

Rob

> On 3 Jul 2015, at 13:15, Simon Slavin  wrote:
> 
> 
> On 3 Jul 2015, at 11:35am, Rob Willett  
> wrote:
> 
>> CREATE TABLE "RAG" (
>>"Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
>>"Count" integer NOT NULL DEFAULT 0,
>>"Text" TEXT COLLATE NOCASE ,
>>"Peak" integer,
>>"Calculation" integer NOT NULL DEFAULT 0,
>>"Red" integer DEFAULT 0,
>>"Amber" integer DEFAULT 0,
>>"Green" integer DEFAULT 0,
>>"BayesAttributes" TEXT
>> );
> 
> [...]
> 
>> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
>> CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, 
>> "Text" COLLATE NOCASE ASC);
>> CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
>> "BayesAttributes" COLLATE NOCASE ASC);
> 
> To explain further, now that you have declared your "Text" column as being 
> COLLATE NOCASE, NOCASE is now the default collation for it, and all sorting 
> and matching on Text will be NOCASE unless you state otherwise.  So it's not 
> necessary to state COLLATE NOCASE when you create indexes on it or mention it 
> elsewhere.  It won't do any harm, though.  Shout out to Clemens for his 
> corrections to my earlier post.
> 
> I notice that "Calculation" is an INTEGER column, and that "BayesAttributes" 
> is a TEXT column but in "Rag_Idx3" you have used "Calculation" COLLATE 
> NOCASE.  I was wondering whether you had meant to use "BayesAttributes" TEXT 
> COLLATE NOCASE in your table definition instead, and rewrite the index 
> accordingly.  There doesn't seem to be any point in having "BayesAttributes" 
> be case-sensitive.
> 
> Since you've significantly re-done your table definition and indexes, now is 
> a good time to run ANALYZE.  It may or may not help, but it will overwrite 
> data which is no longer correct.
> 
>> The second biggest improvement was the 10 secs saved by dumping the database 
>> and rebuilding from the dump file. We may have got a better increase if we 
>> did that first and then the COLLATE NOCASE which would probably then have a 
>> lesser increase :) 
> 
> While the improvement from rewriting the table definition will be permanent, 
> the improvement from defragmentation will gradually reduce as the database 
> gets changed in a fragmented way.  So you may find out !
> 
> However, it should not be necessary to dump and rebuild your database on a 
> monthly basis.  SQLite is unusually efficient at dealing with fragmented 
> database pages, and OS X, of course, has background defragmentation built in. 
>  I doubt that any difference .dump & .read would make is worth doing that 
> amount of clunky data processing on a regular basis.  However, if you do 
> decide to do it, you should do an ANALYZE before using the newly remade 
> database.  Including when you do this to your production database after 
> implementing the changes you've decided on in this thread.
> 
> Delighted we could make such a big difference.
> 
> Simon.



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Simon Slavin

On 3 Jul 2015, at 1:18pm, Mikael  wrote:

> VACUUM is the ultimate anti-fragmenting tool isn't it

I completely forgot VACUUM.  Yes.  It's easier than doing .dump & .read or 
writing your own code.

Simon.


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Simon Slavin

On 3 Jul 2015, at 11:35am, Rob Willett  wrote:

> CREATE TABLE "RAG" (
> "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
> "Count" integer NOT NULL DEFAULT 0,
> "Text" TEXT COLLATE NOCASE ,
> "Peak" integer,
> "Calculation" integer NOT NULL DEFAULT 0,
> "Red" integer DEFAULT 0,
> "Amber" integer DEFAULT 0,
> "Green" integer DEFAULT 0,
> "BayesAttributes" TEXT
> );

[...]

> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
> CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, 
> "Text" COLLATE NOCASE ASC);
> CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
> "BayesAttributes" COLLATE NOCASE ASC);

To explain further, now that you have declared your "Text" column as being 
COLLATE NOCASE, NOCASE is now the default collation for it, and all sorting and 
matching on Text will be NOCASE unless you state otherwise.  So it's not 
necessary to state COLLATE NOCASE when you create indexes on it or mention it 
elsewhere.  It won't do any harm, though.  Shout out to Clemens for his 
corrections to my earlier post.

I notice that "Calculation" is an INTEGER column, and that "BayesAttributes" is 
a TEXT column but in "Rag_Idx3" you have used "Calculation" COLLATE NOCASE.  I 
was wondering whether you had meant to use "BayesAttributes" TEXT COLLATE 
NOCASE in your table definition instead, and rewrite the index accordingly.  
There doesn't seem to be any point in having "BayesAttributes" be 
case-sensitive.

Since you've significantly re-done your table definition and indexes, now is a 
good time to run ANALYZE.  It may or may not help, but it will overwrite data 
which is no longer correct.

> The second biggest improvement was the 10 secs saved by dumping the database 
> and rebuilding from the dump file. We may have got a better increase if we 
> did that first and then the COLLATE NOCASE which would probably then have a 
> lesser increase :) 

While the improvement from rewriting the table definition will be permanent, 
the improvement from defragmentation will gradually reduce as the database gets 
changed in a fragmented way.  So you may find out !

However, it should not be necessary to dump and rebuild your database on a 
monthly basis.  SQLite is unusually efficient at dealing with fragmented 
database pages, and OS X, of course, has background defragmentation built in.  
I doubt that any difference .dump & .read would make is worth doing that amount 
of clunky data processing on a regular basis.  However, if you do decide to do 
it, you should do an ANALYZE before using the newly remade database.  Including 
when you do this to your production database after implementing the changes 
you've decided on in this thread.

Delighted we could make such a big difference.

Simon.


[sqlite] Could frequent 'database is locked' errors mean SQLite is not a good fit for my application?

2015-07-03 Thread Teg
Hello Kathleen,

I'd set it to 5 minutes and see what happens. The timeout is the worst
case; it's not going to slow normal processing. I use infinite here.
Is there ever a situation where you don't want to wait for it to
finish? If you have hard real-time requirements, I'd probably
re-design it to be client-server and have only one process actually
talk to the DB and act like a server to the other processes. If you're
going down that route, though, a big database might be a better
solution.

C

Thursday, July 2, 2015, 6:22:52 PM, you wrote:

KA> Thanks so much for the help. I set the timeout to 5 seconds (at each
KA> instance of a db open connection) and was still seeing the database locked
KA> errors with some frequency (haven't quantified it yet), so I will try
KA> extending it to 30 seconds, as you suggest, and hopefully that makes a
KA> difference.

KA> Failing that, PostgreSQL looks promising (thanks so much for the
KA> suggestion!), so I will start trying to implement it.

KA> Thanks Again!

KA> On Thu, Jul 2, 2015 at 6:06 PM, Simon Slavin  
wrote:

>>
>> On 2 Jul 2015, at 3:59pm, Kathleen Alexander  wrote:
>>
>> > I don't explicitly set sqlite3_busy_timeout(sqlite3*, int ms) so I will
>> try
>> > adding that after the database connection is opened to see if it limits
>> > those errors.
>>
>> This will have an extreme effect.  The default timeout for SQLite is not
>> to use a timeout at all: any clash in access will result in immediate
>> failure.  Set your timeout to 30 seconds and see what happens.
>>
>> Worth noting that the timeout needs to be set by each application for each
>> connection.  If you set it for the first application which opens the
>> database it won't automatically be applied by other connections.
>>
>> Simon.



-- 
 Teg    mailto:Teg at djii.com



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
I forgot to add one thing to my last e-mail. I just checked the function 
call time, which was approx 13ms; it is now approx 110 microseconds, over 100x 
faster. 

This looks like the speed we can get out of SQLite.

Thanks again,

Rob.


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Thanks to everybody who helped with ideas and suggestions on improving the 
performance of a query. We've implemented and tested out most of the 
suggestions made and it has produced a startling improvement. 

Here's a table of the tests we made. We doubled the number of files we used to 
20, as opposed to 10, to try to make sure that we get an average cross-section 
of files. So bear this in mind when looking at the old figures; basically 
you need to double them. We also did not go back to the same database each time, 
as copying and recopying a 16GB file is boring ;)

HTML tables look like junk so this is really text.

1 - Initial Baseline Perl - 213 secs
2 - Adding COLLATE NOCASE to database table RAG and NOT changing the query - 
Run 1 - 112 Secs
3 - Rebuild RAG table without COLLATE NOCASE to check that the performance 
increase was really working  - 234 secs
4 - Adding COLLATE NOCASE to database table RAG and NOT changing the query - 
Run 2 - 99 secs
5 - Rerun performance check again to make sure. No changes to RAG table, 
COLLATE NOCASE still there, Run 3 - 98 secs
6 - Rebuild database using .dump and cat file back into new DB - Run 1   100 
secs - (Odd spike for one of the files which took 11 secs)
7 - Rerun test with existing database to check spike - Run 2 - 90 secs
8 - rerun test yet again just to be sure - Run 3 - 90 secs
9 - Vacuum test not run as rebuild using .dump and cat done instead
10 - Remove all queries to __unknown__. Simple optimisation in Perl - No other 
optimisations such as collate - 153 sec - This run should be compared to 
"baseline perl" and not to the other optimisations
11 - Remove all queries to __unknown__. Simple optimisation in Perl. All other 
optimisations in place - 90 secs - The Perl 'optimisation' has now been 
optimised out by all the other collate stuff. This is odd, we thought this 
would speed things up again - Needs further investigation.
12 - Remove Perl optimisation and check it still OK - 91 secs
13 - Added ORDER BY Text COLLATE NOCASE ASC to query - 92 secs

The headline figures are we have gone from 213 secs to process 20 files down to 
90 secs to process 20 files. We are running approx 2.5x faster. To get this 
improvement the biggest change was simply adding COLLATE NOCASE to the table 
schema. This saved around 120-130 secs which was brilliant. 

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT COLLATE NOCASE ,
 "Peak" integer,
 "Calculation" integer NOT NULL DEFAULT 0,
 "Red" integer DEFAULT 0,
 "Amber" integer DEFAULT 0,
 "Green" integer DEFAULT 0,
 "BayesAttributes" TEXT
);
INSERT INTO "main".sqlite_sequence (name, seq) VALUES ("RAG", '43330');

-- 
--  Indexes structure for table RAG
-- 
CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, "Text" 
COLLATE NOCASE ASC);
CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
"BayesAttributes" COLLATE NOCASE ASC);

We also deleted one of the indexes as well: 

> CREATE UNIQUE INDEX "Rag_Idx4" ON RAG ("Id" COLLATE NOCASE ASC, "Calculation" 
> COLLATE NOCASE ASC, "Peak" COLLATE NOCASE ASC, "Red" COLLATE NOCASE ASC, 
> "Amber" COLLATE NOCASE ASC, "Green" COLLATE NOCASE ASC, "Text" COLLATE NOCASE 
> ASC);

Index Rag_Idx4 was not in the database version we copied from to start our 
testing. Our bad, we were too quick to start testing. However we checked using 
the query plan and didn't see any issues so we have left it out. 

The second biggest improvement was the 10 secs saved by dumping the database 
and rebuilding from the dump file. We may have got a better increase if we had 
done that first and then the COLLATE NOCASE, which would probably then have 
shown a lesser increase :) 

We did add a Perl logic coding optimisation which made a massive difference of 
60-70 secs when run against the baseline, but when we added the optimisation 
back into the database with COLLATE NOCASE it made no difference. This puzzles 
us as it removes a lot of the SELECT searches, so it should still make a 
difference at 90 secs. We'll investigate further.

Adding ORDER BY Text COLLATE NOCASE ASC to the query made no difference as we 
would expect as we have updated the indexes. 

We also recreated the test where we ran direct SELECT statements into sqlite3

time cat ,output | sqlite3 tfl.sqlite > /dev/null

real0m0.247s
user0m0.152s
sys 0m0.096s

This was probably the most revealing of all. It basically took no time to 
run 10,551 selects. This implies that we had issues on the database before, with 
either fragmentation or indexes being poorly set up.  

When we look at the graphical output from the performance monitoring program we 
find that the bottleneck has gone from there and another area now takes up the 
most time.

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Clemens Ladisch
Rob Willett wrote:
>> Do you ever depend on any indexing on the "Text" column which is not COLLATE 
>> NOCASE ASC ?  If not, then you should be able to speed up your search by 
>> defining the column the way you think of it.  So in your table definition use
>>
>>   "Text" TEXT COLLATE NOCASE ASC,
>
> We don't think adding ASC is allowable

Indeed it isn't; ASC makes sense only for (sorting) an index.


Regards,
Clemens


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Clemens,

Fine, thank you. We thought we had misunderstood and we were right, we had!

We are just checking the COLLATE NOCASE option on the RAG table and re-running 
the tests again as the improvement was so dramatic we don't believe it :)

Rob.

> On 3 Jul 2015, at 09:26, Clemens Ladisch  wrote:
> 
> Rob Willett wrote:
>>> Do you ever depend on any indexing on the "Text" column which is not 
>>> COLLATE NOCASE ASC ?  If not, then you should be able to speed up your 
>>> search by defining the column the way you think of it.  So in your table 
>>> definition use
>>> 
>>>  "Text" TEXT COLLATE NOCASE ASC,
>> 
>> We don't think adding ASC is allowable
> 
> Indeed it isn't; ASC makes sense only for (sorting) an index.
> 
> 
> Regards,
> Clemens



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Steven M. McNeese
This was one of the best posts I have read. No ego chest-thumping - just 
knowledgeable, experience-based support. 

Shout out to the original poster for providing such thorough detail on the 
problem. This allowed the experts to quickly identify the possible problem and 
a potential solution. 

This thread helped less experienced people like me quickly follow the 
performance problem and solution. 

Appreciated,

Steve 

Sent from my iPhone

> On Jul 3, 2015, at 8:47 AM, Simon Slavin  wrote:
> 
> 
>> On 3 Jul 2015, at 2:31pm, Mikael  wrote:
>> 
>> Wait, so if connection 1 is doing VACUUM and connection 2 attempts
>> INSERT/UPDATE/DELETE and connection 3 attempts SELECT all concurrently,
>> what happens and how ACID will the DB be?
> 
> VACUUM locks the database, just like INSERT and UPDATE.  Other writing 
> operations will be forced to wait until the VACUUM is complete and the 
> database is unlocked again.
> 
> Depending on your settings and journal mode, read operations will (I believe) 
> read the uncommitted changes from the 'old' copy of the table, or will be 
> delayed until the VACUUM is complete.  Whichever it is, they should get 
> correct data.
> 
> Of course, waiting for VACUUM may take far longer than waiting for a single 
> INSERT or UPDATE, which is why you need a longer timeout setting than just a 
> few seconds.
> 
> Simon.




[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Simon, Clemens,

> Do you ever depend on any indexing on the "Text" column which is not COLLATE 
> NOCASE ASC ?  If not, then you should be able to speed up your search by 
> defining the column the way you think of it.  So in your table definition use
> 
>"Text" TEXT COLLATE NOCASE ASC,

We're just checking this out and we're getting an error on creating the table. 
We've just checked the SQLite spec 
(https://www.sqlite.org/syntax/column-constraint.html) and can't see how to 
add ASC to it unless we create the Text field as a primary key, which is a 
significant coding change for us. 

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT COLLATE NOCASE ,
 "Peak" integer,
 "Calculation" integer NOT NULL DEFAULT 0,
 "Red" integer DEFAULT 0,
 "Amber" integer DEFAULT 0,
 "Green" integer DEFAULT 0,
 "BayesAttributes" TEXT
);

works

and 

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT COLLATE NOCASE ASC , 

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Gerd,

Thanks for this. 

We'll add this to the to-do list for the day. We've had a lot of responses back 
and we need to put a plan together to work through them one by one and see the 
impact. It'll be an interesting day.

Thank you for taking the time to look at our e-mail and replying.

Rob

> On 3 Jul 2015, at 07:06, GB  wrote:
> 
> 
> 
> Rob Willett schrieb am 03.07.2015 um 00:16:
>> SCAN TABLE RAG USING COVERING INDEX Rag_Idx4
>> 
>> 
> 
> Since Rag_Idx4 is quite wide and not primarily ordered by "Text", an index 
> scan might not be significantly faster than a table scan. As already 
> mentioned, ANALYZE might help.
> 
> For curiosity you may try to force it to use "Rag_Idx1" by executing
> 
> "select Id,Calculation,Peak,Red,Amber,Green from RAG INDEXED BY Rag_Idx1 
> where text = ?"
> 
> and see if it makes any difference. If it does you may consider rearranging 
> Rag_Idx4.
> 
> 
> hth
> Gerd



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Clemens Ladisch
GB wrote:
> For curiosity you may try to force it to use "Rag_Idx1" by executing
>
> "select Id,Calculation,Peak,Red,Amber,Green from RAG INDEXED BY Rag_Idx1 
> where text = ?"
>
> and see if it makes any difference.

"Error: no query solution"

The Rag_Idx1 index cannot be used because the collations do not match.
(But this is a good way of checking your assumptions.)


Regards,
Clemens


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Clemens Ladisch
Simon Slavin wrote:
> On 2 Jul 2015, at 11:16pm, Rob Willett  
> wrote:
>> 'select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ?'
>>
>> CREATE TABLE "RAG" (
>>   ...
>>   "Text" TEXT,
>>   ...
>> );
>>
>> It has four indexes on it
>>
>> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
>
> Do you ever depend on any indexing on the "Text" column which is not COLLATE 
> NOCASE ASC ?  If not, then you should be able to speed up your search by 
> defining the column the way you think of it.  So in your table definition use
>
>"Text" TEXT COLLATE NOCASE ASC,

This is correct.

> You may find that this immediately speeds up the search.  Or you may find 
> that you may have to change your SELECT to
>
> SELECT Id,Calculation,Peak,Red,Amber,Green FROM RAG WHERE text = ? ORDER BY 
> Text COLLATE NOCASE ASC

This would not help; the lookup on the text is independent of the sorting.

To make the lookup use NOCASE, both operands of the = need to have the same 
collation:

  SELECT ... FROM RAG WHERE text = ? COLLATE NOCASE


Regards,
Clemens


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Clemens Ladisch
Ward Willats wrote:
> It may or may not apply to your situation, but after doing lots of inserts, 
> running ANALYZE can sometimes work wonders.

As a rule of thumb, ANALYZE can help only if there are multiple ways of
using indexes, and if SQLite's heuristics happen to choose the wrong one.

(Also see .)


Regards,
Clemens


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread GB


Rob Willett schrieb am 03.07.2015 um 00:16:
> SCAN TABLE RAG USING COVERING INDEX Rag_Idx4
>
>

Since Rag_Idx4 is quite wide and not primarily ordered by "Text", an 
index scan might not be significantly faster than a table scan. As 
already mentioned, ANALYZE might help.

For curiosity you may try to force it to use "Rag_Idx1" by executing

"select Id,Calculation,Peak,Red,Amber,Green from RAG INDEXED BY Rag_Idx1 
where text = ?"

and see if it makes any difference. If it does you may consider 
rearranging Rag_Idx4.


hth
Gerd


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Thanks for the reply

> I can tell you that the SQLite3 command line program uses threading and is 
> extremely fast and optimised, it is no surprise that the query takes much 
> longer in many other programs using a library or even compiled-in SQLite 
> code. If it takes 50 seconds piped into SQLite then 100+ seconds in Perl is 
> not completely weird.
> 
> You say that the SELECT on the Table is taking 13ms/call, but I assume you 
> mean the SELECT statement takes 13ms to run from start to finish and copy all 
> the selected DB content into Perl text arrays... this may be a bottleneck. 
> How many lines (records) are returned by the queries on average? For 10K+ 
> lines, 13ms would be a fantastic result - but on a 43K row table I doubt that 
> very much.

I have no doubt that sqlite3 is fast and optimised, that's why we like it. 

My comments before were about the speed difference of 50 secs for doing 5,000 
selects on the database (approx) vs 120 seconds for doing ALL the Perl work, 
which includes the 5,000 individual SELECTs. My assumption was that sqlite3 is 
very good at its job and that using sqlite3 directly is as good a performance as 
we can get. So that's my baseline to work from.

Each select will return either one row or no rows. The text column is unique 
across the table. 

We see the difference of 10ms and 13ms as acceptable in relative terms. If Perl 
adds 0.03ms to the select query we can understand that, as that's the overhead 
of using Perl as the database manager. What we are trying to understand, and I 
apologise for stating it inelegantly or not being clear, is whether 10ms is 
acceptable performance to pull a single line out of a database using a select 
on a text field in a table that has an index on it. 

If 10ms is the very best sqlite3 can do using nothing but a select statement 
directly on the database, with no overhead of Perl, then we are comfortable 
with the 13ms it takes in Perl. We understand the 0.03ms extra that Perl adds; 
we're trying to see if the 10ms is acceptable. That only works out at 100 
selects per second, but the test pages we have seen suggest we should get far, 
far faster. 

> 
> The most immediate optimisation I can see is making the column collation 
> NOCASE (as opposed to the Index). This will avoid running the collation 
> function on every item. SQLite may actually optimise this already and I don't 
> know if it does, but it's worth testing.

We will try this and report back.

> Another thing I would do is add a column which uses tokens for strings, as it 
> seems these are standard repeating strings which may be reduced to a simple 
> Integer, which will be much faster to look up than long string comparisons.

Sadly the text column is not standard repeating strings. The aim of the table 
is to hold all the myriad ways that Transport for London can say similar 
things, e.g. they will state 'Traffic is normal for the time of day', 'Traffic 
is normal', 'Normal Traffic', 'Traffic is normal this morning', 'trafic is 
slow' (the spelling mistake is deliberate) and so on and so on. 

Add in all the various ways you can describe lane closures, traffic jams, 
traffic lights being out, bridge closures and openings. A person writes the 
update for each disruption and they often do use the same words, but they 
misspell something, they change things round, they add new stuff in. 

The RAG table is a way to 'normalise' that information so that we have a 
database of what has been said, what we classified it as (Red/Amber/Green), and 
how we calculated that RAG status; we also shorten and stem the words and do 
Bayesian analysis on new versions of phrases to try and work out what is meant. 
It's a key table for us and a lot of pre-processing work goes into creating it. 

We wish there was a standard set of phrases but there isn't. The 43K lines 
demonstrate it; we are constantly surprised by the new ways people find to 
describe exactly the same situation :) We had looked at your idea previously 
and assumed that we could use a similar approach, but the wide range of text we 
found stopped that working. Using natural language stemming and Bayesian 
analysis got us closer, but it's still imperfect.

> If it isn't part of the usual maintenance, try re-pack the DB file using 
> VACUUM.

We'll check to see if we do that, and if not we'll try it and report back.
> 
> In normal Code we would of course compute and re-use the prepared statements 
> shaving off another significant bit of time, but not sure if the Perl 
> interface allows for this or if it will reduce significant time for those 
> straight-forward queries.

We already use prepared statements wherever possible. It's pretty much standard 
coding practice for us. We also tried removing the prepared statement for this 
particular query and found it made little difference, which surprised us. 

> Looking at your system specs, it's hard to imagine your results NOT being 
> significantly faster than the quoted 

[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Ward,

Thanks for this. 

We are already doing ANALYZE and sadly it has not made any difference.

Rob

> On 3 Jul 2015, at 00:04, Ward Willats  wrote:
> 
> 
>> On Jul 2, 2015, at 3:16 PM, Rob Willett  
>> wrote:
>> 
>> We?re trying to understand whether or not we have a performance problem with 
>> our Sqlite database.
> 
> It may or may not apply to your situation, but after doing lots of inserts, 
> running ANALYZE can sometimes work wonders.
> 
> -- Ward
> 



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread R.Smith


On 2015-07-03 12:16 AM, Rob Willett wrote:
> To try and isolate the problem away from the Perl program, we then generated 
> the 5,000 SQL calls that would be made by the Perl program into a file. e.g
>
> select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'traffic is 
> moving well on diversion.';
> select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'approach 
> with caution.';
> select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'traffic is 
> currently flowing well in the area.';
> select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'traffic is 
> slow moving past the restrictions.';
> select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'traffic is 
> slow moving on approach to closures in both directions.';
> select Id,Calculation,Peak,Red,Amber,Green from RAG where text = 'expect 
> delays.';
>
> ... similar but slightly different SQL for the next 5,000 lines.
>
> We then piped that straight into sqlite and timed the output.
>
> time cat ,test1 | sqlite3 db.sqlite > /dev/null
>
> real  0m49.610s
> user  0m31.328s
> sys   0m18.272s
>
> This took around 50 seconds which is faster than the Perl program (122 
> seconds). Since the Perl program has to read in a 10 x 5MB XML file, build up 
> XML structures, walk the XML, extract lines from the XML and do lots of other 
> programming stuff this doesn?t seem unreasonable.
>
> Whilst piping a large amount of SQL into sqlite doesn't feel quite right, 
> we're struggling with how else we can benchmark this to see if we have a 
> problem or not. All suggestions welcomed.

I can tell you that the SQLite3 command line program uses threading and 
is extremely fast and optimised, it is no surprise that the query takes 
much longer in many other programs using a library or even compiled-in 
SQLite code. If it takes 50 seconds piped into SQLite then 100+ seconds 
in Perl is not completely weird.

You say that the SELECT on the Table is taking 13ms/call, but I assume 
you mean the SELECT statement takes 13ms to run from start to finish and 
copy all the selected DB content into Perl text arrays... this may be a 
bottleneck. How many lines (records) are returned by the queries on 
average? For 10K+ lines, 13ms would be a fantastic result - but on a 43K 
row table I doubt that very much.

The most immediate optimisation I can see is making the column collation 
NOCASE (as opposed to the Index). This will avoid running the collation 
function on every item. SQLite may actually optimise this already and I 
don't know if it does, but it's worth testing. Another thing I would do 
is add a column which uses tokens for strings, as it seems these are 
standard repeating strings which may be reduced to a simple Integer, 
which will be much faster to look up than long string comparisons.

You could add the token replacement in Perl or SQL, and update the 
current db by simply:
UPDATE RAG SET token = 1 WHERE text = 'traffic is moving well on diversion.'
UPDATE RAG SET token = 2 WHERE text = 'approach with caution.'
etc.
Then create an index for token and select like this:
select Id,Calculation,Peak,Red,Amber,Green from RAG where token = 2
and so on.
The gain will be minimal for any single select, but on 5K selects it 
will not be insignificant.

If it isn't part of the usual maintenance, try re-packing the DB file using 
VACUUM.

In normal code we would of course compute and re-use the prepared 
statements, shaving off another significant bit of time, but I am not sure if 
the Perl interface allows for this or if it will reduce significant time 
for those straight-forward queries.
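For reference, the reuse pattern in the C API looks roughly like the sketch
below (prepare once, then reset and re-bind per lookup); the RAG query is the
one from earlier in the thread:

#include <sqlite3.h>

/* Sketch: one prepared statement reused for many lookups instead of
 * re-preparing the SQL each time. */
static void run_lookups(sqlite3 *db, const char **texts, int n)
{
    sqlite3_stmt *stmt = NULL;
    int i;
    if (sqlite3_prepare_v2(db,
            "SELECT Id, Calculation, Peak, Red, Amber, Green "
            "FROM RAG WHERE Text = ?1;", -1, &stmt, NULL) != SQLITE_OK)
        return;

    for (i = 0; i < n; i++) {
        sqlite3_bind_text(stmt, 1, texts[i], -1, SQLITE_STATIC);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            /* read columns here with sqlite3_column_*() */
        }
        sqlite3_reset(stmt);           /* make it runnable again */
        sqlite3_clear_bindings(stmt);  /* optional: drop the old parameter */
    }
    sqlite3_finalize(stmt);
}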

Looking at your system specs, it's hard to imagine your results NOT 
being significantly faster than the quoted test on the SQLite site 
(Since then, even SQLite itself has undergone many significant 
performance improvements). If the above still doesn't make it much 
faster I'd start checking for hardware bottlenecks or other settings 
that might influence the speed.




[sqlite] Weird FTS bugs (wrong, missing and false positive results)

2015-07-03 Thread Dan Kennedy
On 07/02/2015 09:24 PM, Kevin Benson wrote:
> FTFY, you're welcome ;-)
> Now *MAYBE* someone can read it and possibly reply.
>
> --
> --
>--
>   --???--
>  K e V i N
>
> On Thu, Jul 2, 2015 at 9:11 AM,  wrote:
>
>> We use SQLite for indexing and searching the text contents of our app
>> using FTS4
>> (about 27k unique words, about 1 million words for the whole contents). In
>> particular,
>> we use the offsets function. Currently, after some testing, we're
>> experiencing plenty
>> of problems with finding the results needed.
>>
>> For the forms of the words searching we use the 'all-to-all' way, which
>> means we have
>> some standard English endings and word forms and modify the initial
>> request so that
>> all possible combinations are included and separated by OR from each
>> other.
>>
>> I. e. if we have two forms and two words in the request (for simplicity),
>> that would look
>> like (the MATCH part):
>>
>> '(word1_form1 NEAR/10 word2_form1)
>> OR (word1_form1 NEAR/10 word2_form2)
>> OR (word1_form2 NEAR/10 word2_form1)
>> OR (word1_form2 NEAR/10 word2_form2)'.
>>
>> Initially, the problem appeared that the query returned offsets for
>> absolutely wrong words.
>> While searching for 'honest fair', we got words like 'good', 'sport',
>> 'natural' offsets.
>> We moved from the system provided (iOS, OS X) SQLite (version 3.8.5,
>> further called
>> 'old SQLite') to the latest one available on the official site (version
>> 3.8.10.2,
>> further called 'new SQLite' or just 'SQLite'), which solved the issue, but
>> returned offsets
>> for words which were not near (according to NEAR/10). I supposed it's
>> because the request
>> was too big because of lots of the word forms. But, for example, this
>> request
>>
>> for 'offense is rank':
>> select documents.name, contents.rowid, part, offsets(contents)
>> from contents left join documents on contents.document_id = documents.id
>> where text match '(\"offense\" NEAR/10 \"is\" NEAR/10 \"rank\")'
>> order by document_id
>>
>> returned the correct results, while this one returned false positive
>> results (0 stands for
>> apostrophe so it's not interpreted as a separate token for the simple
>> tokenizer we use):
>>
>> select documents.name, contents.rowid, part, offsets(contents)
>> from contents left join documents on contents.document_id = documents.id
>> where text match '(\"offense\" NEAR/10 \"is\" NEAR/10 \"rank\")
>> OR (\"offense\" NEAR/10 \"is\" NEAR/10 \"rank0s\")'
>> order by document_id
>>
>> The experiments revealed that 'rank0s' can be whatever: "ranks",
>> "rankqwerty" and so on,
>> even "rankqwertyvrfbgbrevwkefovmwpsrvrm" or "yugbuoiipkipnuo". If it's
>> removed, nothing
>> found. If after that the first word is modified, the one correct result
>> found. Also, a
>> search for 'speak again' didn't find the results at all, though there
>> should be quite a
>> lot of them.
>>
>> The 'database disk image is malformed' error was reported, but integrity
>> check completed
>> with no errors, the database was recreated and that didn?t help, meanwhile
>> the old SQLite
>> did find the results for this request in exactly that file with no error
>> reported.
>>
>> Also, the new SQLite worked well for the same request if the offsets
>> function was replaced
>> with the snippet function (that also solved the false positive results
>> problem described
>> above). The search for 'father' did not return all results. For example, the
>> first result
>> which can manually be found in the contents table using SQLite Database
>> Browser for this
>> request:
>>
>> select documents.name, contents.rowid, part, offsets(contents)
>> from contents left join documents on contents.document_id = documents.id
>> where text match '(\"father\")
>> OR (\"fathere\") OR (\"fathering\")
>> OR (\"fatherish\") OR (\"fathers\")
>> OR (\"fatheres\") OR (\"fatherian\")
>> OR (\"fatheral\") OR (\"father0s\")'
>> and document_id in (25)
>> order by document_id
>>
>> missed when actually searched, the snippet function doesn?t help here,
>> neither helps
>> searching for a single word form. Error logging with
>>
>> sqlite3_config(SQLITE_CONFIG_LOG, errorLogCallback, NULL)
>>
>> was never called.
>>   Any ideas?


It's difficult to say. The level of detail you have provided is a little 
overwhelming.

If you are using a custom tokenizer and it behaves inconsistently you 
could get these kind of problems. Or it could also be a bug in the 
combination of NEAR, OR and an auxiliary FTS function.

Are you able to post a database online along with a query that returns 
an incorrect result? And the tokenizer implementation too if possible, 
although it might be possible to figure it out without that.

Does "neither helps searching for a single word form" mean that a simple 
query like "text MATCH 'father'" is also failing? If so, that's the one 
to post.

You are compiling with SQLITE_ENABLE_FTS3_PARENTHESIS defined, correct?

Dan.
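For anyone putting together such a repro, a minimal self-contained skeleton
might look like the sketch below; it assumes an FTS4-enabled build and the
document text is invented:

#include <stdio.h>
#include <sqlite3.h>

/* Sketch of a minimal FTS4 test case: one table, one row, and a NEAR/OR query
 * using offsets().  The parenthesised OR needs the enhanced query syntax
 * (SQLITE_ENABLE_FTS3_PARENTHESIS). */
int main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;

    sqlite3_open(":memory:", &db);
    sqlite3_exec(db,
        "CREATE VIRTUAL TABLE contents USING fts4(text, tokenize=simple);"
        "INSERT INTO contents(text) VALUES"
        " ('o my offense is rank it smells to heaven');",
        NULL, NULL, NULL);

    sqlite3_prepare_v2(db,
        "SELECT offsets(contents) FROM contents WHERE text MATCH "
        "'(\"offense\" NEAR/10 \"rank\") OR (\"offense\" NEAR/10 \"rankqwerty\")';",
        -1, &stmt, NULL);
    while (sqlite3_step(stmt) == SQLITE_ROW)
        printf("offsets: %s\n", (const char *)sqlite3_column_text(stmt, 0));
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}

Shrinking a failing query down to something of this shape (plus the database
file) is usually enough for others to reproduce the problem.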






[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Simon Slavin

On 2 Jul 2015, at 11:16pm, Rob Willett  wrote:

> We process this XML feed and pull out chunks of the XML, process it and 
> update our database with it. This database is currently 16GB in size. Our 
> concern is the time taken to process each file every five minutes. It has 
> been getting steadily longer and longer. It started out at around 6 seconds 
> and is now around 22-24 seconds. Whilst we expect processing time to get 
> bigger, we are getting concerned about the performance and decided to have a 
> look.

This suggests fragmentation.  There can be fragmentation at various levels for 
a SQLite database.  Fortunately it's simple to get rid of them.  First, take 
some timings.

Use the '.dump' and '.read' commands from the command-line tool to create a 
new database.  It will be created with each row in each table in primary-key 
order.  Once you've done that defragment the disk the database is stored on 
using your OS tools (if possible).  Once you've done that take the same timings 
and see whether anything improved.

> The database (DB) we use is only ever read and written by a single process, 
> we do not network the database

Don't forget that write access to the database requires exclusive access to the 
entire database.  You could implement some fancy multi-processing scheme only 
to find that it's defeated by the SQLite library.

> The actual SQL called 5,000 times is 
> 
> 'select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ?'

[...]
> 
> The RAG table schema is 
> 
> CREATE TABLE "RAG" (
>"Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
>"Count" integer NOT NULL DEFAULT 0,
>"Text" TEXT,
>"Peak" integer,
>"Calculation" integer NOT NULL DEFAULT 0,
>"Red" integer DEFAULT 0,
>"Amber" integer DEFAULT 0,
>"Green" integer DEFAULT 0,
>"BayesAttributes" TEXT
> );
> 
> It has four indexes on it 
> 
> CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);

Do you ever depend on any indexing on the "Text" column which is not COLLATE 
NOCASE ASC ?  If not, then you should be able to speed up your search by 
defining the column the way you think of it.  So in your table definition use

 "Text" TEXT COLLATE NOCASE ASC,

You may find that this immediately speeds up the search.  Or you may find that 
you may have to change your SELECT to

SELECT Id,Calculation,Peak,Red,Amber,Green FROM RAG WHERE text = ? ORDER BY 
Text COLLATE NOCASE ASC
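
As a minimal sketch of the first approach (a cut-down version of your table, 
not the full schema), the column and the index then share the collation and 
the equality lookup can be answered straight from the index:

#include <sqlite3.h>
#include <cstdio>

int main() {
  sqlite3 *db = nullptr;
  sqlite3_open("rag-demo.db", &db);

  // Text carries COLLATE NOCASE on the column itself, and the unique index
  // uses the same collation, so "WHERE Text = ?" can use the index.
  sqlite3_exec(db,
      "CREATE TABLE IF NOT EXISTS RAG ("
      "  Id INTEGER PRIMARY KEY AUTOINCREMENT,"
      "  Text TEXT COLLATE NOCASE,"
      "  Calculation INTEGER NOT NULL DEFAULT 0,"
      "  Peak INTEGER, Red INTEGER, Amber INTEGER, Green INTEGER);"
      "CREATE UNIQUE INDEX IF NOT EXISTS RAG_Idx1 ON RAG (Text COLLATE NOCASE ASC);",
      nullptr, nullptr, nullptr);

  sqlite3_stmt *stmt = nullptr;
  sqlite3_prepare_v2(db,
      "SELECT Id, Calculation, Peak, Red, Amber, Green FROM RAG WHERE Text = ?",
      -1, &stmt, nullptr);
  sqlite3_bind_text(stmt, 1, "Some road condition sentence", -1, SQLITE_TRANSIENT);
  while (sqlite3_step(stmt) == SQLITE_ROW) {
    printf("Id=%d Calculation=%d\n",
           sqlite3_column_int(stmt, 0), sqlite3_column_int(stmt, 1));
  }
  sqlite3_finalize(stmt);
  sqlite3_close(db);
  return 0;
}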

Simon.


[sqlite] Could frequent 'database is locked' errors mean SQLite is not a good fit for my application?

2015-07-03 Thread purch...@bilney.co.uk
Hi,

I'm not an expert on bash or scripts etc.
However, if I were doing something similar in C++ then I'd consider having a 
dedicated thread to
manage a queue of dB operations and committing them all from this single thread.
You can then take control of a whole host of things - maintaining the
write order if important, doing the writes either synchronously or 
asynchronously (and notifying
the caller when complete etc.)
You might also have to consider amalgamating many requests into a single 
transaction (although that would complicate
the handling of errors)
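
As a very rough sketch of the shape of it (names, SQL and error handling are 
only illustrative), one connection owned by one thread draining a 
mutex-protected queue:

#include <sqlite3.h>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Single dedicated writer: other threads push SQL text onto a queue;
// only this thread touches the database for writes, in arrival order.
class DbWriter {
public:
    explicit DbWriter(const char *path) {
        sqlite3_open(path, &db_);
        sqlite3_busy_timeout(db_, 30000);   // wait rather than fail on a lock
        worker_ = std::thread(&DbWriter::run, this);
    }
    ~DbWriter() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join();
        sqlite3_close(db_);
    }
    void enqueue(std::string sql) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(sql)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
            if (q_.empty() && stop_) return;       // drained, shutting down
            std::string sql = std::move(q_.front());
            q_.pop();
            lk.unlock();
            // All writes funnel through this one connection.
            sqlite3_exec(db_, sql.c_str(), nullptr, nullptr, nullptr);
        }
    }
    sqlite3 *db_ = nullptr;
    std::thread worker_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool stop_ = false;
};

int main() {
    DbWriter w("app.db");
    w.enqueue("UPDATE configs_table SET searched='2' WHERE id='31'");
    // ... other threads enqueue their writes here ...
}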

Graham

On 02/07/2015 15:09, Kathleen Alexander wrote:
> Hi,
>
> I apologize if this is an incorrect forum for this question, but I am
> pretty new to SQLite and have been unable to resolve this issue through
> other searches. Feel free to direct me to a more appropriate forum.
>
> Essentially, I have written an application in C++ that interfaces (reads
> and writes) with a SQLite database, and I am getting lots of 'database is
> locked' errors. Right now, I am trying to establish whether those errors
> are due to my improper use of SQLite itself, or if the real problem is that
> SQLite is not a good fit for my application.
>
> My application runs on Linux (ubuntu 13.10), and is driven by a bash script
> that spawns many (~60 on a 64 core workstation) instances of a serial, C++
> program, each of which opens its own connection to the database and
> performs reads and writes.
>
> *An example SELECT query from my program looks like:*
> //open db connection
> sqlite3 *db;
> char *zErrMsg = 0;
> SQLITE3 sql(dbase.c_str());
>
> statement = "SELECT * from configs_table WHERE id='31'";
> sql.exe(statement.c_str());
> if( sql.vcol_head.size() > 0 ){
> //do things with sql.vdata[]
> }//end query returned results
>
> *An example of a write statement looks like:*
> statement = "UPDATE configs_table SET searched='2' WHERE id='31'";
> sql.exe(statement.c_str());
>
> About 97% of the time, the select statement works fine, but in the other 3%
> of cases, I see a 'database is locked' error in the log file of my program.
> About 50% of the time, the write statement returns 'database is locked'.
>
> Additionally, if this application is running and I try to query the
> database from the terminal, I almost always get a 'database is locked'
> error.
>
> Thus, I am wondering if I am doing something wrong in my implementation of
> the C++ --> SQLite interaction, or if the real problem is that this
> application is not well suited to use with SQLite (I went through the
> checklist before implementing it and thought my application passed the
> suitability requirements).
>
> Lastly:
> A. if it seems like this is an implementation issue, rather than a
> capability issue, if I were to scale up my method to spawn say 500-1000
> processes at a time (on a supercomputing cluster), would there be any
> concern about SQLite scaling to that level?
> B. If SQLite is not a good fit for my program, do you have any suggestions
> of an alternative database engine that is free or might be free or
> inexpensive for academic use?
>
> Thanks in advance,
> Kathleen



[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-03 Thread Rob Willett
Hi,

We're trying to understand whether or not we have a performance problem with 
our Sqlite database. It may well be that we are 'optimal' for some value of 
optimal or it may be that we can improve our system. I was hoping to use the 
experience here to help focus our attention or give some guidance based on real 
world usage.

It may well be that we don't have a problem and our performance is appropriate. 
That's also a good result as we can stop worrying about things :)

This e-mail has quite a lot of information in it, as we want to give as much 
information as is appropriate. We don't want to swamp you, but we can't simplify 
it too much.

Background

Our database takes a five-minute feed of traffic information in London. This 
data is XML based and has approximately 500-700 items in it. These items cover 
everything from a new traffic jam through to three-year-old roadworks. Yes, 
there are roadworks in London that go back three years!

We process this XML feed and pull out chunks of the XML, process it and update 
our database with it. This database is currently 16GB in size. Our concern is 
the time taken to process each file every five minutes. It has been getting 
steadily longer and longer. It started out at around 6 seconds and is now 
around 22-24 seconds. Whilst we expect processing time to get bigger, we are 
getting concerned about the performance and decided to have a look.

Our XML processing system is written in Perl, which has advantages and 
disadvantages. Probably the biggest advantage is that it's forgiving for 
development and allows us to move quickly when processing significantly large 
amounts of text within the XML. The biggest disadvantage for us is the lack of 
multithreading, so it's limited by the clock speed of the CPU. The database (DB) 
we use is only ever read and written by a single process; we do not network the 
database or allow any access apart from processing the XML file and getting an 
output JSON file.

As our processing time has got longer, we decided to have a look at where our 
Perl program was spending most of its time. For the interested, we use the Perl 
module Devel::NYTProf. This is a well-known and well-respected performance 
profiling tool.

http://search.cpan.org/~timb/Devel-NYTProf-6.01/lib/Devel/NYTProf.pm 


We do not need to adapt our Perl code to use it; you simply invoke Perl with 
-d:NYTProf.

The output is a nice HTML web page that allows you to drill down into the 
sections of the code that take up the most time. We ran the program over ten 
iterations of files to even things out on a 16GB Sqlite database and looked at 
the output. We were very surprised to see where most of the time was spent, out 
of a two minute run 70% of the time was spent in an innocuous function. This 
was odd!

We drilled down even further and found that a single select call was occupying 
90% of that 70%. The performance profiler was indicating that a select on a 
table was taking around 13ms/call and we were doing around 5,000 calls in our 
ten file run. The approx 5,000 calls was about right as there are around 500 
distinct chunks of information per file. So the numbers seemed to add up. The 
issue for us was the 13ms per call.

The actual SQL called 5,000 times is 

'select Id,Calculation,Peak,Red,Amber,Green from RAG where text = ?'

RAG is a Red/Amber/Green table; the Text value is a simple sentence that 
describes road conditions. See below for examples.

The RAG table schema is 

CREATE TABLE "RAG" (
 "Id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
 "Count" integer NOT NULL DEFAULT 0,
 "Text" TEXT,
 "Peak" integer,
 "Calculation" integer NOT NULL DEFAULT 0,
 "Red" integer DEFAULT 0,
 "Amber" integer DEFAULT 0,
 "Green" integer DEFAULT 0,
 "BayesAttributes" TEXT
);

It has four indexes on it 

CREATE UNIQUE INDEX "RAG_Idx1" ON RAG ("Text" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "RAG_Idx2" ON RAG ("Calculation" COLLATE NOCASE ASC, "Text" 
COLLATE NOCASE ASC);
CREATE INDEX "Rag_Idx3" ON RAG ("Calculation" COLLATE NOCASE ASC, 
"BayesAttributes" COLLATE NOCASE ASC);
CREATE UNIQUE INDEX "Rag_Idx4" ON RAG ("Id" COLLATE NOCASE ASC, "Calculation" 
COLLATE NOCASE ASC, "Peak" COLLATE NOCASE ASC, "Red" COLLATE NOCASE ASC, 
"Amber" COLLATE NOCASE ASC, "Green" COLLATE NOCASE ASC, "Text" COLLATE NOCASE 
ASC);

The table has approximately 43,000 rows in it and doesn't grow very much now. 
We don't have complex keys linking things together; we have a simple table.

Our first thought was that the Perl program was in the way.

We made sure we prepared statements in advance, analysed the indexes and 
rebuilt them, checked the query plan again and did the normal simple 
housekeeping.
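
For reference, the prepare-once / bind / reset pattern we follow is roughly 
this in C API terms (our real code is Perl DBI, so the names and file here are 
only illustrative):

#include <sqlite3.h>
#include <string>
#include <vector>

int main() {
  sqlite3 *db = nullptr;
  sqlite3_open("traffic.db", &db);

  // Prepared once, then reused for every lookup in the run.
  sqlite3_stmt *stmt = nullptr;
  sqlite3_prepare_v2(db,
      "SELECT Id, Calculation, Peak, Red, Amber, Green FROM RAG WHERE Text = ?",
      -1, &stmt, nullptr);

  std::vector<std::string> sentences = { /* ~500 sentences per feed file */ };
  for (const auto &s : sentences) {
    sqlite3_bind_text(stmt, 1, s.c_str(), -1, SQLITE_TRANSIENT);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
      // use the row...
    }
    sqlite3_reset(stmt);
    sqlite3_clear_bindings(stmt);
  }
  sqlite3_finalize(stmt);
  sqlite3_close(db);
  return 0;
}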

The query plan showed 

SCAN TABLE RAG USING COVERING INDEX Rag_Idx4

which we think is OK.

To try and isolate the problem away from the Perl program, we then generated 
the 

[sqlite] Could frequent 'database is locked' errors mean SQLite is not a good fit for my application?

2015-07-03 Thread Simon Slavin

On 2 Jul 2015, at 3:59pm, Kathleen Alexander  wrote:

> I don't explicitly set sqlite3_busy_timeout(sqlite3*, int ms) so I will try
> adding that after the database connection is opened to see if it limits
> those errors.

This will have an extreme effect.  The default timeout for SQLite is not to use 
a timeout at all: any clash in access will result in immediate failure.  Set 
your timeout to 30 seconds and see what happens.
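
In C that is one call per connection, made right after opening it - a sketch, 
with the path made up and the UPDATE borrowed from your earlier message:

#include <sqlite3.h>
#include <cstdio>

int main() {
  sqlite3 *db = nullptr;
  if (sqlite3_open("app.db", &db) != SQLITE_OK) return 1;

  // Retry for up to 30 seconds on SQLITE_BUSY instead of failing at once.
  sqlite3_busy_timeout(db, 30000);

  char *err = nullptr;
  int rc = sqlite3_exec(db, "UPDATE configs_table SET searched='2' WHERE id='31'",
                        nullptr, nullptr, &err);
  if (rc != SQLITE_OK) {
    fprintf(stderr, "update failed (%d): %s\n", rc, err ? err : "?");
    sqlite3_free(err);
  }
  sqlite3_close(db);
  return 0;
}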

Worth noting that the timeout needs to be set by each application for each 
connection.  If you set it for the first application which opens the database 
it won't automatically be applied by other connections.

Simon.