Re: [sqlite] Massively multithreaded SQLite queries
You must never have used a Microsoft Operating system ...

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Deon Brewis
> Sent: Saturday, 4 March, 2017 00:33
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
> Can you give an example of an Intel Processor SKU with half-assed cores?
>
> There's HyperThreading of course, but I don't think anybody has ever
> considered HyperThreading to be separate cores.
>
> - Deon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
Can you give an example of an Intel Processor SKU with half-assed cores?

There's HyperThreading of course, but I don't think anybody has ever considered HyperThreading to be separate cores.

- Deon

-----Original Message-----
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
On Behalf Of Keith Medcalf
Sent: Friday, March 3, 2017 4:52 PM
To: SQLite mailing list
Subject: Re: [sqlite] Massively multithreaded SQLite queries

No, the good rule of thumb is to allocate one thread per CPU. Depending on the particular multi-core CPU, you "may" be able to use all the cores simultaneously, but in many cases diminishing returns will set in long before you can execute one thread per core. If this is an Intel processor that claims it has more than one thread per core, be *extremely* careful, as that will give you one thread and one half-assed thread per core. Sometimes half-assed cores are presented as real cores when they lack a separate execution unit. Be very wary.

I presume you are forcing separate threads to separate cores by setting processor affinity on the threads and not just blindly hoping that the OS scheduler does "the right thing"?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] [BUG REPORT] Open blob is invalidated by the update to unrelated fields in the same row
The write operation using the open sqlite3_blob object fails after some other field in the same row is updated. The testcase below illustrates the problem.

Yuri

---testcase---

#include <stdio.h>
#include <stdlib.h>
#include "sqlite3.h"

int main(int argc, char **argv)
{
  if(argc < 3) {
    fprintf(stderr, "Usage: %s <database> <file>\n", argv[0]);
    exit(1);
  }

  // file to insert
  FILE *f = fopen(argv[2], "rb");
  if(NULL == f) {
    fprintf(stderr, "Couldn't open file %s\n", argv[2]);
    exit(1);
  }

  // calculate size of file
  fseek(f, 0, SEEK_END);
  long filesize = ftell(f);
  fseek(f, 0, SEEK_SET);

  // table and column name we're going to use
  char tablename[] = "testblob";
  char columnname[] = "blobby";

  // actual database handle
  sqlite3 *db = NULL;

  // database commands
  char create_sql[1024];
  snprintf(create_sql, sizeof(create_sql),
           "CREATE TABLE IF NOT EXISTS %s ("
           "id INTEGER PRIMARY KEY, fld INTEGER, %s BLOB)",
           tablename, columnname);

  // going to insert a zeroblob of the size of the file
  char insert_sql[1024];
  snprintf(insert_sql, sizeof(insert_sql),
           "INSERT INTO %s (%s) VALUES (?)", tablename, columnname);

  // open the database
  int rc = sqlite3_open(argv[1], &db);
  if(SQLITE_OK != rc) {
    fprintf(stderr, "Can't open database %s (%i): %s\n",
            argv[1], rc, sqlite3_errmsg(db));
    exit(1);
  }

  char *exec_errmsg;
  rc = sqlite3_exec(db, create_sql, NULL, NULL, &exec_errmsg);
  if(SQLITE_OK != rc) {
    fprintf(stderr, "Can't create table (%i): %s\n", rc, sqlite3_errmsg(db));
    sqlite3_close(db);
    exit(1);
  }

  sqlite3_stmt *insert_stmt;
  rc = sqlite3_prepare_v2(db, insert_sql, -1, &insert_stmt, NULL);
  if(SQLITE_OK != rc) {
    fprintf(stderr, "Can't prepare insert statement %s (%i): %s\n",
            insert_sql, rc, sqlite3_errmsg(db));
    sqlite3_close(db);
    exit(1);
  }

  // bind a block of zeros the size of the file we're going to insert later
  sqlite3_bind_zeroblob(insert_stmt, 1, filesize);
  if(SQLITE_DONE != (rc = sqlite3_step(insert_stmt))) {
    fprintf(stderr, "Insert statement didn't work (%i): %s\n",
            rc, sqlite3_errmsg(db));
    exit(1);
  }

  sqlite3_int64 rowid = sqlite3_last_insert_rowid(db);
  printf("Created a row, id %i, with a blank blob size %i\n",
         (int)rowid, (int)filesize);

  sqlite3_blob *blob;
  rc = sqlite3_blob_open(db, "main", tablename, columnname, rowid, 1, &blob);
  if(SQLITE_OK != rc) {
    fprintf(stderr, "Couldn't get blob handle (%i): %s\n",
            rc, sqlite3_errmsg(db));
    exit(1);
  }

  const int BLOCKSIZE = 1024;
  int len;
  void *block = malloc(BLOCKSIZE);
  int offset = 0;
  while(0 < (len = fread(block, 1, BLOCKSIZE, f))) {
    if(SQLITE_OK != (rc = sqlite3_blob_write(blob, block, len, offset))) {
      fprintf(stderr, "Error writing to blob handle. Offset %i, len %i, rc=%d\n",
              offset, len, rc);
      exit(1);
    }
    offset += len;
    printf("... wrote block @off=%d\n", offset);
    // update that breaks the following sqlite3_blob_write
    if(SQLITE_OK != (rc = sqlite3_exec(db, "UPDATE testblob SET fld=1",
                                       NULL, NULL, &exec_errmsg))) {
      fprintf(stderr, "Error while updating the integer field, error=%d\n", rc);
      exit(1);
    }
  }

  sqlite3_blob_close(blob);
  printf("Successfully wrote to blob\n");
  free(block);
  fclose(f);
  sqlite3_finalize(insert_stmt);
  sqlite3_close(db);
  return 0;
}

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
As an aside, you may well find that the OS schedules multiple processes far more efficiently than it schedules multiple threads (especially from the perspective of scheduling the cores independently). It may be far more efficient to run, say, a dozen processes with six to a dozen worker threads each (or whatever number proves optimal) and use some kind of IPC to submit workload to the individual processes.

Also, make sure the database is in WAL journal mode -- this will help even if you are only running read transactions. And if you have enough memory, make sure the page cache and temp space are big enough on each connection and force temp work into memory -- but do not overcommit memory.

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Andrew Brown
> Sent: Friday, 3 March, 2017 18:24
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
> Now this is an interesting question. As it happens I /am/ blindly assuming
> the OS would schedule it properly. I will look at my options this evening
> and see about addressing that.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] UNION
Thanks, Don.

-----Original Message-----
From: don v nielsen
Sent: Friday, March 3, 2017 3:21 PM
To: SQLite mailing list
Subject: Re: [sqlite] UNION

Might I suggest: https://www.w3schools.com/sql/

dvn

On 03/01/2017 09:02 AM, jose isaias cabrera wrote:

Ryan, I just want to thank you for your kindness and display of goodwill to mankind. This is a great response. I even learned something from this post. Thanks so much for your responses. There are others like us in this group that love to learn, and your posts are always well received. Thanks. In Spanish we say, "muchas gracias."

josé

On 2017-03-01 09:51, R Smith wrote:

On 2017/03/01 3:40 AM, do...@mail.com wrote:

# SELECT * FROM processors UNION SELECT * FROM storage;
Error: SELECTs to the left and right do not have the same number of result columns.

All tables that I created in my database have differing column names, values, and numbers of columns, with the noted exception of the one column which is common (board). I've no idea what to do now. Why is this an error?

I think that perhaps you are new to SQL and other replies assumed you simply wanted what you wrote. I could be wrong, but just in case, here are my suggestions. Perhaps what you would rather like to do is JOIN these tables and not UNION them?

Do you wish to match /every/ processor with /every/ board? In this case, the statement should read (warning: this could produce excessively long listings):

SELECT * FROM processors, storage;

Do you wish to match only processors and storage that fit on the same boards? In this case the statement might read something like:

SELECT * FROM processors JOIN storage USING (board);

Do you wish to list /all/ processors and add the storage options for the same board /if/ there are any? In this case the statement might read something like:

SELECT * FROM processors LEFT JOIN storage ON storage.board = processors.board;

As you can see, lots of different things can be achieved.
A quick course in SQL via perhaps W3Schools will teach all these in a few hours.

Cheers,
Ryan

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
You might want to try enabling mmap mode:

  pragma mmap_size = 4294967296;

or something like that. Try to make it larger than your databases. I'd expect that if you're running with that many cores, you're _probably_ running in a 64-bit address space, so it'll probably work.

-scott

On Fri, Mar 3, 2017 at 5:22 PM, Andrew Brown wrote:

> Yes, each thread has its own connection.
>
> On Mar 3, 2017 4:45 PM, Keith Medcalf wrote:
>
> Does each thread have its own connection?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
On Mar 3, 2017, at 6:24 PM, Andrew Brown wrote: > > I /am/ blindly assuming the os would schedule it properly. Why would it? Windows won’t do that for you, either. Now, if you’re using .NET’s thread pool mechanisms, it *should* be making sure it doesn’t oversubscribe the cores too much, but then we’re assuming Microsoft is writing good Linux code here. The .NET Core code is open, so you can see what it does on Linux. (I could see, too, but I can’t say I care enough to bother. :) ) ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
Now this is an interesting question. As it happens I /am/ blindly assuming the OS would schedule it properly. I will look at my options this evening and see about addressing that.

On Mar 3, 2017 4:52 PM, Keith Medcalf wrote:

No, the good rule of thumb is to allocate one thread per CPU. Depending on the particular multi-core CPU, you "may" be able to use all the cores simultaneously, but in many cases diminishing returns will set in long before you can execute one thread per core. If this is an Intel processor that claims it has more than one thread per core, be *extremely* careful, as that will give you one thread and one half-assed thread per core. Sometimes half-assed cores are presented as real cores when they lack a separate execution unit. Be very wary.

I presume you are forcing separate threads to separate cores by setting processor affinity on the threads and not just blindly hoping that the OS scheduler does "the right thing"?

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
Yes, each thread has its own connection.

On Mar 3, 2017 4:45 PM, Keith Medcalf wrote:

Does each thread have its own connection?

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Andrew Brown
> Sent: Friday, 3 March, 2017 13:14
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Massively multithreaded SQLite queries
>
> Based on my reading of the documentation it seems like SQLite is a great
> candidate for cases where you have a large number of threads that only
> need to read a database simultaneously [...]

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
Yes, if they are lock bound, then they need to use the number of threads which reduces the locking overhead to the point where it's not degrading performance too much. Though I guess the OP really didn't say that (more CPUs may spend more time in spinlocks and still spend less wallclock time).

Another thing to look at is whether any queries can be more effectively scheduled. Having hundreds of completely unrelated queries seems unlikely to me. More likely is that you have a smaller number of queries which are targeting various different bind parameters. Preparing a particular query once, then looping and running each set of bind parameters on one thread, is probably going to be _much_ more efficient.

-scott

On Fri, Mar 3, 2017 at 5:03 PM, Warren Young wrote:

> On Mar 3, 2017, at 5:51 PM, Keith Medcalf wrote:
> >
> > No, the good rule of thumb is to allocate one thread per CPU.
>
> It depends on the workload. Parallel make (e.g. "make -jN" in GNU make)
> typically improves in speed past N=core count to about 1.5x the core count.
>
> Not that any of this is relevant at the current point, since the OP is
> currently neither I/O bound nor CPU-bound, but lock-bound.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
On Mar 3, 2017, at 5:51 PM, Keith Medcalf wrote: > > No, the good rule of thumb is to allocate one thread per CPU. It depends on the workload. Parallel make (e.g. “make -jN” in GNU make) typically improves in speed past N=core count to about 1.5x the core count. SQLite seems like a similar kind of workload: lots of CPU *and* disk I/O, so that you need a bit of oversubscription to keep all the cores busy, because some threads/processes will be stalled on I/O. Not that any of this is relevant at the current point, since the OP is currently neither I/O bound nor CPU-bound, but lock-bound. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
No, the good rule of thumb is to allocate one thread per CPU. Depending on the particular multi-core CPU, you "may" be able to use all the cores simultaneously, but in many cases diminishing returns will set in long before you can execute one thread per core. If this is an Intel processor that claims it has more than one thread per core, be *extremely* careful, as that will give you one thread and one half-assed thread per core. Sometimes half-assed cores are presented as real cores when they lack a separate execution unit. Be very wary.

I presume you are forcing separate threads to separate cores by setting processor affinity on the threads and not just blindly hoping that the OS scheduler does "the right thing"?

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Jens Alfke
> Sent: Friday, 3 March, 2017 17:19
> To: SQLite mailing list
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
> > On Mar 3, 2017, at 3:51 PM, Simon Slavin wrote:
> >
> > Then mess with the '5' until you find a good value.
>
> A common rule of thumb with thread pools is to allocate one thread per CPU
> core.
>
> —Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
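On Linux, the processor-affinity pinning mentioned above is typically done with the glibc-specific pthread_setaffinity_np; a minimal sketch (requires _GNU_SOURCE; the helper name pin_to_core is illustrative):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single CPU core so the scheduler cannot
 * migrate it. Returns 0 on success, or an errno value on failure. */
int pin_to_core(int core) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(core, &set);
  return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Each worker thread would call pin_to_core(i) with its own index at startup; on Windows the analogous call is SetThreadAffinityMask.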
Re: [sqlite] Massively multithreaded SQLite queries
Does each thread have its own connection?

> -----Original Message-----
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Andrew Brown
> Sent: Friday, 3 March, 2017 13:14
> To: sqlite-users@mailinglists.sqlite.org
> Subject: [sqlite] Massively multithreaded SQLite queries
>
> Hello,
>
> Based on my reading of the documentation it seems like SQLite is a great
> candidate for cases where you have a large number of threads that only
> need to read a database simultaneously, in our case for fast data lookup,
> aggregation, etc. I've been able to generate SQL queries that do this, but
> once we start running them on a large server with a lot of multithreading
> going on, I find that we spend a lot of time in __raw_spin_lock - perhaps
> 75%+ of the time (in one case, 87% of 350 seconds x 72 cores was spent in
> __raw_spin_lock). This is being run on 64 and 72 core machines, and the
> more cores I run it on, the slower it ends up going.
>
> To give a bit more detail, I'm working with dotnet core, have written a
> custom sqlite wrapper (since the dotnet core one lacks the ability to set
> connection flags beyond readonly, and doesn't have sqlite3_prepare_v2()
> implemented), and I'm running on linux against a bunch of SQLite files in
> the 2gb-400gb size range. Individual queries are wicked fast, but once I
> start spreading the load over all the cores by running simultaneous
> queries I lose the performance advantage and it actually becomes
> significantly slower.
>
> Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried
> shared cache, read uncommitted. Tried without shared cache, read
> uncommitted. Tried WAL. If I write a less efficient query, I spend less
> time in __raw_spin_lock, but of course then it takes longer for the
> queries themselves to return.
>
> Any tips to handle massively multithreaded side by side chunked queries on
> the same database?
>
> Thank you in advance,
> Andrew Brown

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
So in this case I'm running on a 72 core machine. Also the databases have one table in them each... The goal here is to bring all the cores to bear on the many queries - each grabbing the next query to be run and running it, until finally there are no more chunks to run. Then within my own APIs I aggregate the results and form my response.

I'm going to try preparing all the statements before I run any later tonight. I'm also toying with the idea of using a shared-nothing architecture, in which I run 72 processes instead of threads, in the hope that there will be less contention that way. Thoughts on that idea?

I really appreciate everyone's responsiveness.

On Mar 3, 2017 4:19 PM, Jens Alfke wrote:

> On Mar 3, 2017, at 3:51 PM, Simon Slavin wrote:
>
> > Then mess with the '5' until you find a good value.
>
> A common rule of thumb with thread pools is to allocate one thread per CPU
> core.
>
> —Jens

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
On Mar 3, 2017, at 1:13 PM, Andrew Brown wrote: > > This is being run on 64 and 72 core machines, and the more cores I run it on, > the slower it ends up going. What happens if you rework the app to do only one thing, single-threaded, but run between 64 and 108 instances in parallel on the same system? (108 == 72 * 1.5, a common work factor value for parallel applications.) You can use a tool like GNU parallel to manage the work queue: https://www.gnu.org/software/parallel/ If nothing else, this would separate out the in-process locking from the file locking, thereby making more clear where the delays are coming from. Your __raw_spin_lock() result isn’t terribly enlightening because that’s a very low level Linux kernel mechanism. It doesn’t tell us whether the slowdown is due to file locking in SQLite, mutexes in SQLite, some non-optimal pthreads use in .NET Core, etc. ___ sqlite-users mailing list sqlite-users@mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] Massively multithreaded SQLite queries
> On Mar 3, 2017, at 3:51 PM, Simon Slavin wrote:
> > Then mess with the '5' until you find a good value.

A common rule of thumb with thread pools is to allocate one thread per CPU core.

—Jens
Re: [sqlite] Can sqlite3_blob_write be done through SQL?
On 03/01/2017 23:41, Clemens Ladisch wrote:
> It would certainly be possible to add your own user-defined SQL function to call sqlite3_blob_write().

I think this should be added to SQLite itself. Writing a portion of a blob is very much like updating a field, and there is currently no efficient way to do it in SQL. Such a command could then be chained with other SQL statements and would make the client code much simpler.

Yuri
Re: [sqlite] Massively multithreaded SQLite queries
On 3 Mar 2017, at 11:47pm, Scott Hess wrote:
> I'd say you should consider switching to some sort of queue feeding a worker pool, then experimenting with pool sizes.

Agreed. Set up 5 threads which have actual access to the database. Have your 1835 jobs queue up their work for servicing by those 5 threads. Then mess with the '5' until you find a good value.

Simon.
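The queue-plus-N-threads design suggested above can be sketched with Python's stdlib queue, threading, and sqlite3 modules (the trivial `SELECT ? + 1` jobs are stand-ins for the real 1835 queries):

```python
import queue
import sqlite3
import threading

NUM_WORKERS = 5          # the '5' to tune
jobs = queue.Queue()
results = queue.Queue()

def worker():
    # Each worker owns its own connection; a SQLite connection should
    # only be used by one thread at a time.
    con = sqlite3.connect(":memory:")
    while True:
        job = jobs.get()
        if job is None:          # poison pill: shut down
            break
        sql, args = job
        results.put(con.execute(sql, args).fetchone()[0])

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for n in range(20):              # stand-ins for the real jobs
    jobs.put(("SELECT ? + 1", (n,)))
for _ in threads:
    jobs.put(None)
for t in threads:
    t.join()
total = sum(results.get() for _ in range(20))
print(total)  # 210
```

Because only NUM_WORKERS connections ever touch the database, raising or lowering that one constant is all the experimentation the tuning requires.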
Re: [sqlite] Massively multithreaded SQLite queries
I'd say you should consider switching to some sort of queue feeding a worker pool, then experimenting with pool sizes. Often problems reward the first few threads you add, but at some point additional threads become a negative unless the system is specifically designed for high thread counts (and such a design can be annoying for low-thread-count users). There may also be caching interactions which improve with a smaller number of threads.

Something else to try is to have multiple databases which do not share page caches (to reduce locking). It is entirely possible that having 4 databases, each with 8 threads, could be faster than one database with 32 threads, because they each keep out of each other's way more.

[None of the above is really SQLite specific.]

-scott
Re: [sqlite] Massively multithreaded SQLite queries
Well, in the situation I'm working with, my API is constructing 1835 small SQL jobs to be run, and then passing them off to a structure in which 72 threads are running, each with its own db connection (I assume that's what you mean by a database handle, a DB connection, but please correct me if I'm wrong!). So in this case, 72 database handles on my bigger server.

Unfortunately, I'm not running the same queries over and over (one example is a 400 GB database with 3-5 dimension columns and a few data columns, and this is slicing on that data), so preparing them will have somewhat less benefit in that sense than in other cases. That said, I can still try preparing all the statements before I run any.
Re: [sqlite] Massively multithreaded SQLite queries
> On Mar 3, 2017, at 2:52 PM, Josh Hunsaker wrote:
> > Your problem might be mitigated if you could compile your queries in advance.

Precompiled statements are a must if you want the best performance (and you're running the same queries over and over).

Andrew, how many database handles are open?

—Jens
Re: [sqlite] Massively multithreaded SQLite queries
On Fri, Mar 3, 2017 at 1:13 PM, Andrew Brown wrote:
> Any tips to handle massively multithreaded side by side chunked queries on the same database?

In my (limited) experience, it seems that multithreaded SQLite acquires a lock on a shared, in-memory b-tree every time a statement is prepared. It might not be the database read itself, but statement preparation, that is causing threads to wait for each other. Your problem might be mitigated if you could compile your queries in advance.
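As one concrete illustration of amortizing preparation cost: Python's sqlite3 binding keeps a per-connection statement cache (sized by the `cached_statements` parameter of `connect()`), so re-issuing the same parameterized SQL text reuses the compiled statement instead of calling the prepare step again. Other bindings expose an explicit prepare API instead. A sketch:

```python
import sqlite3

# cached_statements (default 128) sizes the per-connection statement cache.
con = sqlite3.connect(":memory:", cached_statements=256)
con.execute("CREATE TABLE t(x INTEGER)")
con.executemany("INSERT INTO t(x) VALUES (?)", [(i,) for i in range(1000)])

# Same SQL text, different parameters: after the first call the cached
# compiled statement is reused rather than re-prepared.
sql = "SELECT COUNT(*) FROM t WHERE x < ?"
counts = [con.execute(sql, (n,)).fetchone()[0] for n in (10, 100, 1000)]
print(counts)  # [10, 100, 1000]
```

The key point is to keep the SQL text identical and vary only the bound parameters; any change to the text forces a fresh preparation.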
Re: [sqlite] sqlite3 feature or regression
On 3 Mar 2017, at 1:12pm, Vermes Mátyás wrote:
> The attached ruby script demonstrates a feature of the newer sqlite3 libraries, which may be a regression.

Just a note that you cannot attach files to posts to this list. If your file is text, just paste it into a message. Otherwise please put it on the web and post a URL.

Simon.
Re: [sqlite] UNION
Might I suggest: https://www.w3schools.com/sql/

dvn

On 03/01/2017 09:02 AM, jose isaias cabrera wrote:

Ryan, I just want to thank you for your kindness and display of goodwill to mankind. This is a great response; I even learned something from this post. Thanks so much for your responses. There are others like us in this group who love to learn, and your posts are always well received. Thanks. In Spanish we say, "muchas gracias."

josé

On 2017-03-01 09:51, R Smith wrote:

On 2017/03/01 3:40 AM, do...@mail.com wrote:
> # SELECT * FROM processors UNION SELECT * FROM storage;
> Error: SELECTs to the left and right do not have the same number of result columns.
>
> All tables that I created in my database have differing column names, values, and numbers of columns, with the noted exception of the one column which is common (board). I've no idea what to do now. Why is this an error?

I think that perhaps you are new to SQL, and the other replies assumed you simply wanted what you wrote. I could be wrong, but just in case, here are my suggestions. Perhaps what you would rather like to do is JOIN these tables, not UNION them?

Do you wish to match /every/ processor with /every/ board? In this case, the statement should read (warning: this could produce excessively long listings):

SELECT * FROM processors, storage;

Do you wish to match only processors and storage that fit on the same boards? In this case the statement might read something like:

SELECT * FROM processors JOIN storage USING (board);

Do you wish to list /all/ processors and add the storage options for the same board /if/ there are any? In this case the statement might read something like:

SELECT * FROM processors LEFT JOIN storage ON storage.board = processors.board;

As you can see, lots of different things can be achieved. A quick course in SQL, via perhaps W3Schools, will teach all these in a few hours.

Cheers,
Ryan
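The three join variants above can be tried out against a toy schema; a sketch using Python's sqlite3 (the column names beyond `board` are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE processors(board TEXT, cpu TEXT);
    CREATE TABLE storage(board TEXT, disk TEXT);
    INSERT INTO processors VALUES ('b1', 'i7'), ('b2', 'ryzen');
    INSERT INTO storage VALUES ('b1', 'ssd');
""")

# Cross join: every processor paired with every storage row.
cross = con.execute("SELECT * FROM processors, storage").fetchall()
# Inner join on the shared column (note the parentheses after USING);
# USING collapses the two board columns into one.
inner = con.execute(
    "SELECT * FROM processors JOIN storage USING (board)").fetchall()
# Left join: all processors, with NULLs where no storage matches.
left = con.execute(
    "SELECT * FROM processors LEFT JOIN storage "
    "ON storage.board = processors.board").fetchall()
print(len(cross), inner, left)
```

Running this shows the row-count differences directly: the cross join produces every pairing, the inner join only matched boards, and the left join keeps the unmatched processor with NULL storage columns.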
[sqlite] Massively multithreaded SQLite queries
Hello,

Based on my reading of the documentation, it seems like SQLite is a great candidate for cases where you have a large number of threads that only need to read a database simultaneously; in our case, for fast data lookup, aggregation, etc. I've been able to generate SQL queries that do this, but once we start running them on a large server with a lot of multithreading going on, I find that we spend a lot of time in __raw_spin_lock, perhaps 75%+ of the time (in one case, 87% of 350 seconds x 72 cores was spent in __raw_spin_lock). This is being run on 64 and 72 core machines, and the more cores I run it on, the slower it ends up going.

To give a bit more detail, I'm working with dotnet core, have written a custom sqlite wrapper (since the dotnet core one lacks the ability to set connection flags beyond readonly, and doesn't have sqlite3_prepare_v2() implemented), and I'm running on Linux against a bunch of SQLite files in the 2 GB-400 GB size range. Individual queries are wicked fast, but once I start spreading the load over all the cores by running simultaneous queries, I lose the performance advantage and it actually becomes significantly slower.

Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried shared cache, read uncommitted. Tried without shared cache, read uncommitted. Tried WAL. If I write a less efficient query, I spend less time in __raw_spin_lock, but of course then the queries themselves take longer to return.

Any tips to handle massively multithreaded side-by-side chunked queries on the same database?

Thank you in advance,
Andrew Brown
[sqlite] sqlite3 feature or regression
Hi! The attached ruby script demonstrates a feature of the newer sqlite3 libraries, which may be a regression.

-- Vermes Mátyás
Re: [sqlite] Deleting records from a large table
Thanks all for your input, it has really helped. In my real-world application, tmp_keep is a temporary table populated by examining a number of other tables, and I suddenly realised that it could even contain duplicate ids. Sloppy thinking on my part. I get the best results by creating another table:

CREATE TEMPORARY TABLE tmp_keep_unique (id1 INTEGER PRIMARY KEY);
INSERT INTO tmp_keep_unique SELECT DISTINCT id1 FROM tmp_keep;

It takes far longer to create an index on tmp_keep than the index saves on the above query. Then, *with a primary key on both table1 and tmp_keep_unique*:

DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep_unique);

is acceptably efficient. On SQLite, NOT EXISTS is quicker than NOT IN, but I also need to use the same SQL on a MySQL implementation, and there it behaves the opposite way.
Re: [sqlite] Deleting records from a large table
Dave Blake wrote:
> Yes the id1 are integer primary keys.

In both tables? If yes, then there is not much you could do, and the problem probably is all the modifications done to the actual table and its indexes.

Regards,
Clemens
Re: [sqlite] Deleting records from a large table
On 3 Mar 2017, at 10:53am, Dave Blake wrote:
> DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);
>
> But this becomes inefficient when tmp_keep is large.

Do you have an appropriate index on tmp_keep?

CREATE INDEX tk_id1 ON tmp_keep (id1);

then try it again.

Simon.
Re: [sqlite] Deleting records from a large table
DELETE FROM table1 WHERE NOT EXISTS (SELECT 1 FROM tmp_keep WHERE id1 = table1.id1);

Does it in a single pass, by running a correlated subquery on each row in table1 to see if the id is in tmp_keep.
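A runnable sketch of the correlated-subquery delete, using Python's sqlite3 with made-up sample data (keep ids 2, 5, 9; everything else is removed in one statement):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1(id1 INTEGER PRIMARY KEY, val TEXT)")
con.execute("CREATE TEMP TABLE tmp_keep(id1 INTEGER PRIMARY KEY)")
con.executemany("INSERT INTO table1(id1, val) VALUES (?, ?)",
                [(i, f"row{i}") for i in range(1, 11)])
con.executemany("INSERT INTO tmp_keep(id1) VALUES (?)", [(2,), (5,), (9,)])

# Single pass over table1; the subquery probes tmp_keep's primary key
# for each row, so no temporary index needs to be built.
con.execute(
    "DELETE FROM table1 WHERE NOT EXISTS "
    "(SELECT 1 FROM tmp_keep WHERE tmp_keep.id1 = table1.id1)")
kept = [r[0] for r in con.execute("SELECT id1 FROM table1 ORDER BY id1")]
print(kept)  # [2, 5, 9]
```

With tmp_keep.id1 declared INTEGER PRIMARY KEY, each probe is a rowid lookup rather than a scan, which is why this scales better than NOT IN with a large keep-list.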
[sqlite] Fossil version 2.0
Fossil is the version control system (VCS) written specifically to support SQLite. See https://www.fossil-scm.org/ for details.

Version 2.0 of Fossil has just been released. The 2.0 release supports the use of SHA3-256 hashes for naming files and check-ins in the repository. SHA1 hashes are also still supported for backwards compatibility. In other words, Fossil 2.0 will read both SHA1 and SHA3-256 hashes. On the other hand, older Fossil 1.x releases will only read SHA1 hashes and will generate errors if they encounter SHA3-256 hashes.

To be clear: Fossil 2.0 and Fossil 1.x are fully compatible and will seamlessly interoperate on repositories that contain only SHA1 content. However, once a repository acquires SHA3-256 content, Fossil 1.x will stop working and you will need to upgrade to Fossil 2.0.

If you are using Fossil to access the SQLite source repository, you should upgrade, since we will soon begin pushing SHA3-256 hashed check-ins and files to the main SQLite repository. Once that happens, you will no longer be able to pull or clone the SQLite repository using older 1.x versions of Fossil.

The web interface to Fossil continues to operate exactly as it did before. All legacy URLs (even those involving SHA1 hash names) continue to work without change. If you only access Fossil through the web interface, no action is necessary.

-- D. Richard Hipp
d...@sqlite.org
Re: [sqlite] Deleting records from a large table
It could be keeping almost all of the records, so ~50, but it varies greatly; sometimes it will keep just 10. I can adjust the approach depending on size if necessary. Yes, the id1 columns are integer primary keys. Table1 has a number of indexes and views, so the create-new-table approach is less attractive.
Re: [sqlite] Index usefulness for GROUP BY
Jeffrey Mattox wrote:
> is my index on weekday worthwhile, time-wise and space-wise? (Query speed is not a big issue for me, and the DB is relatively small

Indexes are optimizations. In a small DB, the effect is probably not noticeable, which implies that you should not bother. Where exactly the point is at which the index becomes useful in your system is something you have to measure yourself.

Regards,
Clemens
Re: [sqlite] Deleting records from a large table
Dave Blake wrote:
> For deleting a large number of records (tmp_keep is small), this works fine:
> DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);
>
> But this becomes inefficient when tmp_keep is large.

SQLite usually creates a temporary index for the values in the IN clause. How large is "large"? Are the id1 values integers? Then you can make tmp_keep.id1 the INTEGER PRIMARY KEY.

Regards,
Clemens
Re: [sqlite] Deleting records from a large table
On 2017/03/03 12:53 PM, Dave Blake wrote:
> What is the best strategy for doing the deletion?

BEGIN TRANSACTION;
ALTER TABLE table1 RENAME TO tmp1;
CREATE TABLE table1 (
  -- Your standard table creation code for table1...
);
INSERT INTO table1 SELECT tmp1.* FROM tmp1 JOIN tmp_keep ON tmp_keep.id1 = tmp1.id1;
DROP TABLE tmp1;
COMMIT;
[sqlite] Deleting records from a large table
Say table1 has more than 50 records, and there is a second table tmp_keep with the ids of the records in table1 to be kept; the rest need to be deleted. The number of records in tmp_keep can vary from 0 to all the records in table1, with any value in between.

What is the best strategy for doing the deletion? For deleting a large number of records (when tmp_keep is small), this works fine:

DELETE FROM table1 WHERE id1 NOT IN (SELECT id1 FROM tmp_keep);

But this becomes inefficient when tmp_keep is large. Any suggestions?
Re: [sqlite] Index usefulness for GROUP BY
YES. AFAIK, if SQLite detects that the rows are (or can be made to be) returned in GROUP BY order, it can use internal variables to accumulate the group results. This is expected to be significantly faster than locating and updating a temporary B-tree row for each record scanned.

Gunter Hick
Software Engineer
Scientific Games International GmbH
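The effect described above can be inspected with EXPLAIN QUERY PLAN; a sketch using Python's sqlite3, with a fixed numeric cutoff standing in for strftime('%s','now','-28 days') and invented sample rows (the plan text varies by SQLite version, so it is only printed, not checked):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE History (history_ID INTEGER PRIMARY KEY, "
            "gameCount INTEGER, weekday INTEGER, hour INTEGER, "
            "datetime INTEGER)")
rows = [(10, 0, 1000), (5, 0, 2000), (7, 3, 1500), (1, 3, 500)]
con.executemany(
    "INSERT INTO History(gameCount, weekday, datetime) VALUES (?, ?, ?)",
    rows)

def plan(sql, args=()):
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return [r[3] for r in con.execute("EXPLAIN QUERY PLAN " + sql, args)]

cutoff = 900  # stand-in for strftime('%s','now','-28 days')
sql = ("SELECT TOTAL(gameCount), weekday FROM History "
       "WHERE datetime >= ? GROUP BY weekday ORDER BY 1 DESC")
print(plan(sql, (cutoff,)))   # temp B-tree needed for the GROUP BY
con.execute("CREATE INDEX Idx_weekday ON History(weekday)")
print(plan(sql, (cutoff,)))   # index scan delivers rows in GROUP BY order
result = con.execute(sql, (cutoff,)).fetchall()
print(result)  # [(15.0, 0), (7.0, 3)]
```

With the index in place the scan itself returns rows in weekday order, so only the ORDER BY still needs a temporary B-tree, matching the two query plans quoted in the question.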
[sqlite] Index usefulness for GROUP BY
Given this DB schema (simplified; there are other columns):

CREATE TABLE History (
    history_ID INTEGER PRIMARY KEY,
    gameCount  INTEGER,
    weekday    INTEGER,  /* 0=Sunday, 6=Saturday */
    hour       INTEGER,  /* (0..23) */
    datetime   INTEGER   /* unix datetime */
);
CREATE INDEX Idx_weekday ON History( weekday );

Now, I look at a recent set of rows...

SELECT TOTAL(gameCount), weekday
FROM History
WHERE datetime >= strftime('%s','now','-28 days')
GROUP BY weekday
ORDER BY 1 DESC

QUERY PLANS:

without the index:
0 0 0 SCAN TABLE History
0 0 0 USE TEMP B-TREE FOR GROUP BY  <-- weekday (7 groups)
0 0 0 USE TEMP B-TREE FOR ORDER BY

with the index:
0 0 0 SCAN TABLE History USING INDEX Idx_weekday
0 0 0 USE TEMP B-TREE FOR ORDER BY

Either way, the entire table is scanned (right?). My index covers the entire table, but the TEMP B-TREE FOR GROUP BY contains only the rows matching the WHERE clause, so the TEMP B-TREE is much smaller (right?). So, is my index on weekday worthwhile, time-wise and space-wise? (Query speed is not a big issue for me, and the DB is relatively small: there are, at most, 60 rows added per day. Memory is plentiful, OSX.)

--- Jeff