Re: [sqlite] Best practices for huge batch inserts with WAL?

2017-04-24 Thread Scott Hess
What I'd do:

1) Look at the indices, and make sure the input is sorted to insert in
index order.  Also drop any unnecessary indices and add them back at the
end.  [Read the code for vacuum to see what kinds of things make sense to
defer.]

2) Bump up the cache a lot.  Inserting in sorted order makes this less
essential, but for a one-off like this it can't hurt.

3) Turn synchronous off while loading, and commit to starting over if you
get an OS-level crash.  Even with synchronous off, things should be
correctly ordered for app-level crashes.

4) Maybe use exclusive locking?  That wouldn't have a huge impact if you're
batching inserts, I expect.  (A sketch of the overall setup follows below.)
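
For reference, the overall shape I have in mind (a sketch only; assumes an
open sqlite3 *db, the numbers are illustrative, error checking omitted):

  /* One-off bulk load: trade durability for speed, batch the inserts. */
  sqlite3_exec(db, "PRAGMA synchronous = OFF", 0, 0, 0);
  sqlite3_exec(db, "PRAGMA cache_size = -1048576", 0, 0, 0);  /* ~1GB */
  sqlite3_exec(db, "BEGIN", 0, 0, 0);
  /* ... insert rows in index order, COMMIT and re-BEGIN every N rows ... */
  sqlite3_exec(db, "COMMIT", 0, 0, 0);
  /* ... then re-create any indices dropped before the load ... */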

---

WRT #3, you could also consider journal_mode off (or memory, if your code
requires transactions to work right).  In that case, the database state is
indeterminate if you have an app-level crash, but you should be fine if you
make it to the end.

WRT #1, I would consider FTS as an index for these purposes, but it may
require schema changes to make it possible to selectively disable/enable
the indexing.  See https://sqlite.org/fts5.html#external_content_tables .
I'm not sure there would be much gain from disabling fts when loading,
though, as long as your bulk transactions are large.
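
The kind of schema change I mean looks roughly like this (a sketch; the
table and column names are made up):

  /* fts5 external-content index over a regular table.  The fts table
  ** can be dropped before the bulk load, then recreated and rebuilt. */
  sqlite3_exec(db,
      "CREATE TABLE articles(id INTEGER PRIMARY KEY, body TEXT);"
      "CREATE VIRTUAL TABLE articles_fts USING fts5("
      "  body, content='articles', content_rowid='id');",
      0, 0, 0);
  /* ... after the bulk load into articles ... */
  sqlite3_exec(db,
      "INSERT INTO articles_fts(articles_fts) VALUES('rebuild');",
      0, 0, 0);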

-scott


On Mon, Apr 24, 2017 at 11:00 AM, Jens Alfke  wrote:

> I’m importing a large data set with a lot of rows — an entire Wikipedia
> dump, about 60GB, one article per row — into a brand new SQLite database in
> WAL mode. What’s the fastest way to import it?
>
> I started with one big transaction, but noted that (of course) the WAL
> file was growing rapidly while the main database file stayed tiny. I
> figured this would become inefficient, so I stopped the run and adjusted my
> code to commit and re-open a transaction every 10,000 rows.
>
> With that, the import started quickly, but as time went on the commits
> were taking longer and longer, so the process was spending most of its time
> committing. (I wasn’t able to finish the job, as it ran into an unrelated
> fatal error in my code about ⅔ of the way through.)
>
> Would it have been faster to use a single transaction? Even if the commit
> at the end is epic-length, it wouldn’t be rewriting the b-tree nodes over
> and over again. If so, would periodic WAL checkpoints help?
>
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Memoization in sqlite json1 functions

2017-03-23 Thread Scott Hess
What is the goal, though?  Your app knows your data and performance needs,
so if you find yourself running the same query to read off the same result
set over and over, change your app to do the right thing.

If it's somehow more convenient to have SQLite do it, populate a temporary
table and pull the data from that, which doesn't require any new API at all.
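
Something along these lines, say (a sketch, reusing the names from your
example):

  /* Parse each json value once into a temp table, then read the
  ** extracted columns as many times as you like. */
  sqlite3_exec(db,
      "CREATE TEMP TABLE parsed AS"
      "  SELECT json_extract(json, '$.name') AS name,"
      "         json_extract(json, '$.address') AS address"
      "  FROM some_table;",
      0, 0, 0);
  /* ... then SELECT name, address FROM parsed; as often as needed ... */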

-scott


On Thu, Mar 23, 2017 at 6:23 AM, Domingo Alvarez Duarte 
wrote:

> Hello Richard !
>
> Yes, I do see "because the implementation parses the JSON anew each time",
> and this is a recurring pattern in some sqlite functions.  It would be nice
> if we could have a "session/query/row" storage space to store query
> information that can be reused; for example, in the json functions we could
> reuse an already-parsed json field several times.  As another example, we
> could have session/query/row variables.
>
> We could have something like "sqlite3_set_auxdata" but with granularity
> for row/query/session and as a bonus would be nice to have session
> variables like https://dev.mysql.com/doc/refman/5.7/en/user-variables.html
> .
>
> Cheers !
>
> On 23/03/17 08:30, Richard Hipp wrote:
>
>> On 3/22/17, Domingo Alvarez Duarte  wrote:
>>
>>> Hello Richard !
>>>
>>> I noticed that sqlite do not use any memoization in json1 functions.
>>>
>>> For example jsonExtractFunc and others parse the json string every time
>>> it's called even when the json string is the same.
>>>
>>> minimal example : "select json_extract(json, '$.name') name,
>>> json_extract(json, '$.address') address from some_table;"
>>>
>>> Could be possible to have some kind of memoization as a general option
>>> for any sqlite functions ?'
>>>
>> In paragraph 3.0 of https://www.sqlite.org/json1.html:  "All json1
>> functions currently throw an error if any of their arguments are BLOBs
>> because BLOBs are reserved for a future enhancement in which BLOBs
>> will store the binary encoding for JSON."
>>
>> But let me ask this:  Have you actually measured a performance
>> problem?  Or are you just assuming that because the implementation
>> parses the JSON anew each time it sees it that it must therefore be
>> inefficient?
>>
>>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Massively multithreaded SQLite queries

2017-03-03 Thread Scott Hess
You might want to try enabling mmap mode:
  pragma mmap_size = 4294967296;
or something like that.  Try to make it larger than your databases.  I'd
expect that if you're running with that many cores, you're _probably_
running in a 64-bit address space, so it'll probably work.
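
Note that the pragma reports back the limit actually put in effect (the
build's SQLITE_MAX_MMAP_SIZE may cap it), so it's worth checking the
result (sketch):

  sqlite3_stmt *stmt;
  sqlite3_prepare_v2(db, "PRAGMA mmap_size = 4294967296", -1, &stmt, 0);
  if( sqlite3_step(stmt)==SQLITE_ROW ){
    printf("mmap_size in effect: %lld\n",
           (long long)sqlite3_column_int64(stmt, 0));
  }
  sqlite3_finalize(stmt);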

-scott


On Fri, Mar 3, 2017 at 5:22 PM, Andrew Brown <
andrew.br...@economicmodeling.com> wrote:

> Yes, each thread has its own connection.
>
> On Mar 3, 2017 4:45 PM, Keith Medcalf  wrote:
>
> Does each thread have its own connection?
>
> > -Original Message-
> > From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> > On Behalf Of Andrew Brown
> > Sent: Friday, 3 March, 2017 13:14
> > To: sqlite-users@mailinglists.sqlite.org
> > Subject: [sqlite] Massively multithreaded SQLite queries
> >
> > Hello,
> >
> > Based on my reading of the documentation it seems like SQLite is a great
> > candidate for cases where you have a large number of threads that only
> > need to read a database simultaneously, in our case for fast data lookup,
> > aggregation, etc. I've been able to generate SQL queries that do this,
> but
> > once we start running them on a large server with a lot of multithreading
> > going on, I find that we spend a lot of time in __raw_spin_lock - perhaps
> > 75%+ of the time (In one case, 87% of 350 seconds x 72 cores was spent in
> > __raw_spin_lock). This is being run on 64 and 72 core machines, and the
> > more cores I run it on, the slower it ends up going.
> >
> > To give a bit more detail, I'm working with dotnet core, have written a
> > custom sqlite wrapper (since the dotnet core one lacks the ability to set
> > connection flags beyond readonly, and doesn't have sqlite_prepare_v2()
> > implemented), and I'm running on linux against a bunch of SQLite files in
> > the 2gb-400gb size range. Individual queries are wicked fast, but once I
> > start spreading the load over all the cores by running simultaneous
> > queries I lose the performance advantage and it actually becomes
> > significantly slower.
> >
> > Have tried setting SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX. Tried
> > shared cache, read uncommitted. Tried without shared cache, read
> > uncommitted. Tried WAL. If I write a less efficient query, I spend less
> > time in __raw_spin_lock, but of course then it takes longer for the
> > queries themselves to return.
> >
> > Any tips to handle massively multithreaded side by side chunked queries
> on
> > the same database?
> >
> > Thank you in advance,
> > Andrew Brown
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Massively multithreaded SQLite queries

2017-03-03 Thread Scott Hess
Yes, if they are lock bound, then they need to run with the number of cores
that reduces the locking overhead to the point where it's not degrading
performance too much.  Though I guess the OP really didn't say that (more
CPUs may spend more time in spinlocks and still spend less wallclock time).

Another thing to look at is whether any queries can be more effectively
scheduled.  Having hundreds of completely-unrelated queries seems unlikely
to me.  More likely is that you have a smaller number of queries which are
targeting various different bind parameters.  Preparing a particular query
once, then looping and running each set of bind parameters on one thread is
probably going to be _much_ more efficient.
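
Concretely, per thread, something like this (a sketch; the query, params[],
and nParams are stand-ins for whatever your work items look like):

  /* Prepare once, then loop over this thread's parameter sets. */
  sqlite3_stmt *stmt;
  int i;
  sqlite3_prepare_v2(db, "SELECT total(val) FROM t WHERE dim = ?",
                     -1, &stmt, 0);
  for( i = 0; i < nParams; i++ ){
    sqlite3_bind_int64(stmt, 1, params[i]);
    while( sqlite3_step(stmt)==SQLITE_ROW ){
      /* ... consume sqlite3_column_*() values ... */
    }
    sqlite3_reset(stmt);
  }
  sqlite3_finalize(stmt);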

-scott


On Fri, Mar 3, 2017 at 5:03 PM, Warren Young  wrote:

> On Mar 3, 2017, at 5:51 PM, Keith Medcalf  wrote:
> >
> > No, the good rule of thumb is to allocate one thread per CPU.
>
> It depends on the workload.  Parallel make (e.g. “make -jN” in GNU make)
> typically improves in speed past N=core count to about 1.5x the core count.
>
> SQLite seems like a similar kind of workload: lots of CPU *and* disk I/O,
> so that you need a bit of oversubscription to keep all the cores busy,
> because some threads/processes will be stalled on I/O.
>
> Not that any of this is relevant at the current point, since the OP is
> currently neither I/O bound nor CPU-bound, but lock-bound.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Massively multithreaded SQLite queries

2017-03-03 Thread Scott Hess
I'd say you should consider switching to some sort of queue feeding a
worker pool, then experimenting with pool sizes.  Often problems reward the
first few threads you add, but at some point additional threads become a
negative unless the system is specifically designed for high thread counts
(and such design can be annoying for low-thread-count users).  There also
may be caching interactions which improve with a smaller number of threads.

Something else to try is to have multiple databases which are not sharing
page caches (to reduce locking).  It is entirely possible that having 4
databases each with 8 threads could be faster than one database with 32
threads, because they each stay out of each other's way more.

[None of the above is really SQLite specific.]

-scott


On Fri, Mar 3, 2017 at 3:37 PM, Andrew Brown <
andrew.br...@economicmodeling.com> wrote:

> Well, in the situation I'm working with, my API is constructing 1835 small
> SQL jobs to be run, and then passing them off to a structure in which 72
> threads are running, each with their own db connection (I assume that's
> what you mean by a database handle, a DB connection, but please, correct me
> if I'm wrong!). So in this case, 72 database handles on my bigger server.
>
> Unfortunately, I'm not running the same queries over and over (one example
> is a 400gb database with 3-5 dimension columns and a few data columns, and
> this is slicing on that data) so preparing them will have somewhat less
> benefit in that sense than in other cases. That said, I can still try
> preparing all the statements before I run any.
>
>
>
> -Original Message-
> From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> On Behalf Of Jens Alfke
> Sent: Friday, March 3, 2017 3:25 PM
> To: SQLite mailing list 
> Subject: Re: [sqlite] Massively multithreaded SQLite queries
>
>
> > On Mar 3, 2017, at 2:52 PM, Josh Hunsaker 
> wrote:
> >
> > Your problem might be mitigated if you could compile your queries in
> advance.
>
> Precompiled statements are a must if you want the best performance (and
> you’re running the same queries over and over.)
>
> Andrew, how many database handles are open?
>
> —Jens
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Thread safety of serialized mode

2017-02-14 Thread Scott Hess
On Tue, Feb 14, 2017 at 5:05 PM, Darren Duncan 
wrote:

> On 2017-02-14 4:46 PM, Richard Hipp wrote:
>
>>  This is yet another reason why I say "threads are evil".  For
>> whatever reason, programmers today think that "goto" and pointers and
>> assert() are the causes of all errors, but threads are cool and
>> healthful.  Entire programming languages are invented (I'm thinking of
>> Java) to make goto and pointers impossible or to make assert()
>> impossible (Go) and yet at the same time encourage people to use
>> threads.  It boggles the mind...
>>
>
> There is nothing inherently wrong with threads in principle, just in how
> some people implement them.  Multi-core and multi-CPU hardware is normal
> these days and is even more the future.  Being multi-threaded is necessary
> to properly utilize the hardware, or else we're just running on a single
> core and letting the others go idle.  The real problem is about properly
> managing memory.  Also giving sufficient hints to the programming language
> so that it can implicitly parallelize operations.  For example, want to
> filter or map or reduce a relation and have 2 cores, have one core evaluate
> half the tuples and another evaluate the other half, and this can be
> implicit simply by declaring the operation associative and commutative and
> lacking of side-effects or whatever.


I'm with Dr Hipp - threads are evil.  It's not so much how they work when
everything goes well, it's that it's so challenging to align everything so
that it goes well.  My experience is that even very talented programmers
write bugs into their multi-threaded code without realizing it.  I think
what happens is that multi-threaded code often makes things much more
complicated than they look, so if you write to the limits of the complexity
you can understand, you're already over your head.

IMHO, if you're using a message-passing system which does implicit
parallelization, well, _you're_ not using threads, the implementation is
using threads on your behalf.  That I can get behind.  Unfortunately,
decent systems along those lines are like nuclear fusion, they've been just
around the corner for decades, now :-).

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Patch for consideration: auto_vacuum slack.

2017-02-13 Thread Scott Hess
On Mon, Feb 13, 2017 at 12:13 PM, Richard Hipp  wrote:
>
> Scott:  The motivation for your patch seem to be to get auto-vacuum to
> run a little faster.  But if performance is your goal, why not just
> turn auto-vacuum off?  Or, failing that, set it to INCREMENTAL and
> then run "PRAGMA incremental_vacuum" when "PRAGMA freelist_count"
> reaches some threshold?
>

My first implementation did this manually using incremental-vacuum in a
layer above SQLite, because I already had a convenient function which
detected when updates had happened.  For explicit transactions, this is
pretty reasonable, albeit inelegant depending on whether you provide a
Commit() function or have to detect a COMMIT statement using string
operations.  But for auto-commit statements you can't get your incremental
vacuum into the commit scope.  [AFAICT, the commit and update hooks do not
allow running anything against the database.]  This version is certainly
doable if my suggested patch is considered beyond the pale.
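
For the archives, the app-layer version looks roughly like this (a sketch;
the slack policy is whatever suits your app, and the database needs PRAGMA
auto_vacuum=INCREMENTAL):

  /* Call after a commit: if the freelist has grown past the allowed
  ** slack, trim it back down. */
  static void MaybeTrimFreelist(sqlite3 *db, sqlite3_int64 slackPages){
    sqlite3_stmt *stmt;
    sqlite3_int64 freePages = 0;
    if( sqlite3_prepare_v2(db, "PRAGMA freelist_count",
                           -1, &stmt, 0)==SQLITE_OK ){
      if( sqlite3_step(stmt)==SQLITE_ROW ){
        freePages = sqlite3_column_int64(stmt, 0);
      }
      sqlite3_finalize(stmt);
    }
    if( freePages>slackPages ){
      char *sql = sqlite3_mprintf("PRAGMA incremental_vacuum(%lld)",
                                  freePages - slackPages);
      sqlite3_exec(db, sql, 0, 0, 0);
      sqlite3_free(sql);
    }
  }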

I then reimplemented as more of an auto-incremental-vacuum, basically the
same code as I posted except that it keyed off the incremental-vacuum
flag.  It was pretty clean, but I did find myself thinking that it wouldn't
work well if you wrote code assuming the periodic cleaning was happening
automatically, but someone had neglected to set up the slack-sizing pragma.
In that case it would just stop collecting garbage, and your code would
never do it explicitly.  With the auto-vacuum version, it would just fall
back to auto-vacuum-every-time.  [I'm not sure which is actually worse, in
the end.]

Mostly, my motivation was that it felt like there was a gap between the
SQLITE_FCNTL_CHUNK_SIZE feature and auto_vacuum/incremental_vacuum, and
when I looked the gap was pretty clean to fill.  Having it key off of the
actual "Should I vacuum free pages" decision seemed more reasonable than
writing external code which makes educated guesses about what's going on.

I'm not really looking at "Should I rewrite all of the auto-vacuum
databases I can find to use manual vacuum".  I think a change like this
would make auto-vacuum databases behave a bit more like non-auto-vacuum
databases in terms of reuse of free space.

---

WRT your list of reasons for "Why even bother", another consideration to
add is that fragmentation on SSDs may not be the big problem it is on hard
drives.  If you have your page sizes reasonably aligned with the units of
the underlying filesystem and hardware, it may not matter much whether a
particular page is next to other pages which are logically adjacent in the
btree.  Of course, there's a lot of complexity in there, like whether the
OS continues to do read-ahead for SSDs.

That said, AFAICT, there's no code in place to make page placement
decisions based on locality, so I would expect that a
non-auto/incremental-vacuum database being actively updated would also
generate fragmentation.

---

I'm not going to get into the "Why not just use VACUUM".  My experience is
that this is a tough issue with databases in general, and developers often
have troubles wrapping their heads around it.  In my case, I'm thinking of
how to mitigate some concerns developers had with existing code, not how
they might write greenfield code going forward.  [As if they'd even ask my
advice before writing code and shipping it.  They only ask me things after
they've already shipped a problem :-).]

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Patch for consideration: auto_vacuum slack.

2017-02-13 Thread Scott Hess
A developer was asking me questions about auto_vacuum I/O characteristics,
because they were worried about "churn", where a page is moved to fill a
freelist gap, then soon enough a new page is allocated anyhow, so the move
wasn't really necessary.  This made me wonder if auto_vacuum recognized
that in some cases SQLITE_FCNTL_CHUNK_SIZE would make the move pointless
(because it wouldn't even save any space).

Below is a patch which implements [PRAGMA auto_vacuum_slack_pages = N].
This setting allows client code to signal that auto_vacuum can leave pages
on the freelist until releasing them would allow a db size change.  I think
this would reduce the number of auto_vacuum moves for many databases.  The
FCNTL and PRAGMA could obviously be integrated into one setting, I haven't
done that in this patch.  Also, the test in autoVacuumCommit() could
certainly be more cleanly combined with the existing code, I left it
distinct to make the logic easier to see.  I'd be happy to adjust the code
to be cleaner and write tests, mostly I didn't want to invest time in that
without floating a prototype to see if it would be an interesting addition.

Thanks,
scott


--- Patch follows ---
Index: src/btree.c
==================================================================
--- src/btree.c
+++ src/btree.c
@@ -2810,10 +2810,50 @@
   sqlite3BtreeLeave(p);
   return rc;
 #endif
 }

+/*
+** Change the 'auto-vacuum-slack-pages' property of the database. If auto
+** vacuum is enabled, this is the number of chunks of slack to allow before
+** automatically running an incremental vacuum.
+*/
+int sqlite3BtreeSetAutoVacuumSlackPages(Btree *p, int autoVacuumSlack){
+#ifdef SQLITE_OMIT_AUTOVACUUM
+  return SQLITE_READONLY;
+#else
+  BtShared *pBt = p->pBt;
+  int rc = SQLITE_OK;
+  u8 cc = (u8)autoVacuumSlack;
+  if( autoVacuumSlack>cc ){
+    cc = 0xFF;
+  }
+
+  sqlite3BtreeEnter(p);
+  pBt->autoVacuumSlack = cc;
+  sqlite3BtreeLeave(p);
+  return rc;
+#endif
+}
+
+/*
+** Return the value of the 'auto-vacuum-slack-pages' property.
+*/
+int sqlite3BtreeGetAutoVacuumSlackPages(Btree *p){
+#ifdef SQLITE_OMIT_AUTOVACUUM
+  return 0;
+#else
+  int rc = 0;
+  sqlite3BtreeEnter(p);
+  if( p->pBt->autoVacuum!=0 ){
+    rc = p->pBt->autoVacuumSlack;
+  }
+  sqlite3BtreeLeave(p);
+  return rc;
+#endif
+}
+

 /*
 ** Get a reference to pPage1 of the database file.  This will
 ** also acquire a readlock on that file.
 **
@@ -3651,17 +3691,31 @@
 ** i.e. the database has been reorganized so that only the first *pnTrunc
 ** pages are in use.
 */
 static int autoVacuumCommit(BtShared *pBt){
   int rc = SQLITE_OK;
+  int bShouldVacuum = pBt->autoVacuum && !pBt->incrVacuum;
   Pager *pPager = pBt->pPager;
   VVA_ONLY( int nRef = sqlite3PagerRefcount(pPager); )

   assert( sqlite3_mutex_held(pBt->mutex) );
   invalidateAllOverflowCache(pBt);
   assert(pBt->autoVacuum);
-  if( !pBt->incrVacuum ){
+  if( bShouldVacuum && pBt->autoVacuumSlack ){
+    Pgno nOrig;        /* Database size before freeing */
+    Pgno nFree;        /* Number of pages on the freelist initially */
+
+    nOrig = btreePagecount(pBt);
+    nFree = get4byte(&pBt->pPage1->aData[36]);
+    bShouldVacuum =
+        (nOrig-nFree)/pBt->autoVacuumSlack < nOrig/pBt->autoVacuumSlack;
+    /* TODO: When integrating this test with the following code, contrive
+    ** to trim to the integral chunk boundary, rather than trimming the
+    ** entire free list.
+    */
+  }
+  if( bShouldVacuum ){
     Pgno nFin;         /* Number of pages in database after autovacuuming */
     Pgno nFree;        /* Number of pages on the freelist initially */
     Pgno iFree;        /* The next page to be freed */
     Pgno nOrig;        /* Database size before freeing */


Index: src/btree.h
==================================================================
--- src/btree.h
+++ src/btree.h
@@ -76,10 +76,12 @@
 int sqlite3BtreeSecureDelete(Btree*,int);
 int sqlite3BtreeGetOptimalReserve(Btree*);
 int sqlite3BtreeGetReserveNoMutex(Btree *p);
 int sqlite3BtreeSetAutoVacuum(Btree *, int);
 int sqlite3BtreeGetAutoVacuum(Btree *);
+int sqlite3BtreeSetAutoVacuumSlackPages(Btree *, int);
+int sqlite3BtreeGetAutoVacuumSlackPages(Btree *);
 int sqlite3BtreeBeginTrans(Btree*,int);
 int sqlite3BtreeCommitPhaseOne(Btree*, const char *zMaster);
 int sqlite3BtreeCommitPhaseTwo(Btree*, int);
 int sqlite3BtreeCommit(Btree*);
 int sqlite3BtreeRollback(Btree*,int,int);

Index: src/btreeInt.h
==================================================================
--- src/btreeInt.h
+++ src/btreeInt.h
@@ -410,10 +410,11 @@
   BtCursor *pCursor;    /* A list of all open cursors */
   MemPage *pPage1;      /* First page of the database */
   u8 openFlags;         /* Flags to sqlite3BtreeOpen() */
 #ifndef SQLITE_OMIT_AUTOVACUUM
   u8 autoVacuum;        /* True if auto-vacuum is enabled */
+  u8 autoVacuumSlack;   /* Optional pages of slack for auto-vacuum */
   u8 incrVacuum;        /* True if incr-vacuum is enabled */
   u8 bDoTruncate;

Re: [sqlite] [OT] mySQL ORDER BY clause in Views

2017-01-19 Thread Scott Hess
On Thu, Jan 19, 2017 at 1:03 PM, R Smith <rsm...@rsweb.co.za> wrote:
> On 2017/01/19 9:01 PM, Simon Slavin wrote:
>> On 19 Jan 2017, at 6:54pm, Scott Hess <sh...@google.com> wrote:
>>> Just to be clear, you're saying that the VIEW has an ORDER BY, but
>>> when you SELECT from the VIEW you aren't using an ORDER BY?
>>>
>>> If your outer SELECT is using an ORDER BY and that is not respected,
>>> that seems like an egregious bug.  But if your outer SELECT has no
>>> ORDER BY, then that seems like depending on implementation-defined
>>> behavior.
>>
>> Scott wins.  I should have read more carefully.  I thought you were
>> referring to a VIEW with both WHERE and ORDER BY.
>>
>> If you perform a SELECT on a VIEW and your SELECT does not specify an
>> order, the results can be in any order.  To enforce an order on the SELECT,
>> you have to specify the order in the SELECT.
>
> Very correct - just to add a little bit as to WHY this is (since the OP
> seems to be on new territory) - A view, just like a table, is regarded by
> SQL as a set and it has no inherent order, nor can it have order explicitly
> per definition (meant here as "as it was defined"). The fact that SQLite
> allows ordering in a view is simply 'cause it is nice, in the same way that
> it will order output from a table when using an ordered index, but this
> behaviour is not required by the standard, nor guaranteed by SQLite, plus,
> it might change in future. You simply /must/ include an ORDER BY in the
> final SELECT if you wish to see ordered output.  Yes... every time.
>
> It boils down to: If you do not add the ORDER BY clause explicitly to your
> final SELECT, then you have no right to expect an ordered outcome.

Note that your engine should do the right thing if you specify "too
many" ORDER BY clauses.  If you have an index which orders the data
the same way your ORDER BY clause does, then SQLite can happily
compile to the same code with or without the ORDER BY clause.

SQLite implements a VIEW by kind of inlining the VIEW's SELECT
statement.  There is no data storage associated with a SQLite VIEW.
So defining the VIEW without an ORDER BY and defining the SELECT with
the ORDER BY should result in identical performance, but with improved
correctness.

[Yes, I can see how it may be annoying to some to have to sprinkle
ORDER BY clauses all over the place, rather than having them
centralized.  But note that ORDER BY is specifying the order that the
code calling this current statement expects, so it is appropriate to
specify the ORDER BY at that point, rather than having it baked into
the schema somehow, where it can easily be misplaced.  Basically, if
your code wants the data in a particular order for a particular
statement, you should consider the act of specifying that ordering as
part of your API contract.]

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [OT] mySQL ORDER BY clause in Views

2017-01-19 Thread Scott Hess
On Wed, Jan 18, 2017 at 3:36 PM, Peter Haworth  wrote:
> I am in the process of converting an SQLite database to mySQL.  The SQLIte
> db includes several views with ORDER BY clauses that have always returned
> qualifying rows in the correct order.
>
> I am discovering that in mySQL issuing a SELECT statement against these
> same views works fine in terms of the order in which the rows are returned
>  if the SELECT does not include a WHERE clause but if I include a WHERE
> claus, the view's ORDER BY clause is ignored and the rows are returned in
> seemingly random order.
>
> Searching around the web suggests that this behavior is accepted as correct
> in mySQL although I haven't been able to find a justification for it and it
> seems to me that the SQLite behavior is correct.
>
> Does anyone know what the official SQL specification has to say on this
> topic, if anything?

Just to be clear, you're saying that the VIEW has an ORDER BY, but
when you SELECT from the VIEW you aren't using an ORDER BY?

If your outer SELECT is using an ORDER BY and that is not respected,
that seems like an egregious bug.  But if your outer SELECT has no
ORDER BY, then that seems like depending on implementation-defined
behavior.

If you have:

CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT);
INSERT INTO t VALUES (1, 'x'), (2, 'a'), (3, 'h');
CREATE VIEW tv (id, v) AS SELECT id, v FROM t ORDER BY v;

Then:
  SELECT * FROM tv ORDER BY v;
should always return rows as ordered by column v, but:
  SELECT * FROM tv;
can return rows in an implementation-defined order.  That order may
happen to be the order defined by CREATE VIEW, depending on
implementation.

I don't think the standard is likely to address this, because the code
which constructs the result set from the VIEW might use various
optimization tricks (such as temporary tables or indices) based on the
interactions of the various WHERE clauses.  I don't think a VIEW or a
TABLE would matter for this.

In fact, I'd expect it to be more likely to forbid ORDER BY in a VIEW
definition, which happens:
   https://msdn.microsoft.com/en-us/library/ms188385.aspx
"The ORDER BY clause is not valid in views, inline functions, derived
tables, and subqueries, unless either the TOP or OFFSET and FETCH
clauses are also specified. When ORDER BY is used in these objects,
the clause is used only to determine the rows returned by the TOP
clause or OFFSET and FETCH clauses. The ORDER BY clause does not
guarantee ordered results when these constructs are queried, unless
ORDER BY is also specified in the query itself."

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Executing multiple statements at once

2017-01-19 Thread Scott Hess
On Thu, Jan 19, 2017 at 9:24 AM, Igor Korot  wrote:
> Is it possible to write something like this:
>
> sqlite3_prepare_v2( m_db, "BEGIN TRANSACTION; CREATE TEMP TABLE temp
> AS SELECT * FROM mytable; DROP TABLE mytable; CREATE TABLE mytable(id
> INTEGER PRIMARY KEY, name TEXT PRIMARY KEY, salary INTEGER); INSERT
> INTO mytable SELECT * FROM temp; DROP TABLE temp; COMMIT;", -1, &stmt,
> NULL );
>
> or I will have to issue multiple single SQL statements?

You could use sqlite3_exec(), which is internally just going to issue
multiple single SQL statements.
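
Something like this, say (a sketch; note that your CREATE TABLE declares
two PRIMARY KEYs, which SQLite will reject - I've kept just the first):

  char *errmsg = 0;
  int rc = sqlite3_exec(m_db,
      "BEGIN TRANSACTION;"
      " CREATE TEMP TABLE temp AS SELECT * FROM mytable;"
      " DROP TABLE mytable;"
      " CREATE TABLE mytable(id INTEGER PRIMARY KEY, name TEXT,"
      "                      salary INTEGER);"
      " INSERT INTO mytable SELECT * FROM temp;"
      " DROP TABLE temp; COMMIT;",
      0, 0, &errmsg);
  if( rc!=SQLITE_OK ){
    fprintf(stderr, "exec error: %s\n", errmsg);
    sqlite3_free(errmsg);
  }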

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] extension to run bash

2017-01-11 Thread Scott Hess
Though it may be cleaner long-term to implement system() to pass
individual arguments, rather than passing a single string which will
have to be re-processed by the shell.  So the API would end up like:
  UPDATE result SET nRows = system('wc', '-l', fileNames);

The reason I suggest this is because [fileNames] could have spaces
which would have to be escaped, but there are probably a dozen other
similar issues which are likely to come up.

[Though, yes, this means you'll have to use fork() and execlp() and
waitpid() to implement, rather than popen().  There are examples out
there of how to do that:
   
https://www.securecoding.cert.org/confluence/pages/viewpage.action?pageId=2130132
]
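
To illustrate, a rough sketch (error handling is thin, output is truncated
at 64KB, and registration is shown in the trailing comment):

  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>
  #include "sqlite3.h"

  /* system(cmd, arg, ...): exec the argv directly (no shell), capture
  ** the child's stdout, and return it as the function result. */
  static void systemFunc(sqlite3_context *ctx, int argc, sqlite3_value **argv){
    char *args[argc + 1];              /* C99 VLA */
    char out[65536];
    int fds[2], i;
    ssize_t n, total = 0;
    pid_t pid;

    if( argc<1 ){
      sqlite3_result_error(ctx, "system() requires a command", -1);
      return;
    }
    for( i = 0; i < argc; i++ ){
      args[i] = (char *)sqlite3_value_text(argv[i]);
      if( args[i]==0 ){
        sqlite3_result_error(ctx, "system() arguments must be text", -1);
        return;
      }
    }
    args[argc] = 0;

    if( pipe(fds)!=0 ){
      sqlite3_result_error(ctx, "pipe() failed", -1);
      return;
    }
    pid = fork();
    if( pid<0 ){
      close(fds[0]);
      close(fds[1]);
      sqlite3_result_error(ctx, "fork() failed", -1);
      return;
    }
    if( pid==0 ){                      /* child: stdout -> pipe, then exec */
      dup2(fds[1], 1);
      close(fds[0]);
      close(fds[1]);
      execvp(args[0], args);
      _exit(127);                      /* exec failed */
    }
    close(fds[1]);
    while( total<(ssize_t)sizeof(out) &&
           (n = read(fds[0], out+total, sizeof(out)-total))>0 ){
      total += n;
    }
    close(fds[0]);
    waitpid(pid, 0, 0);
    sqlite3_result_text(ctx, out, (int)total, SQLITE_TRANSIENT);
  }

  /* Register with:
  **   sqlite3_create_function(db, "system", -1, SQLITE_UTF8, 0,
  **                           systemFunc, 0, 0);
  */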

-scott

On Wed, Jan 11, 2017 at 1:38 PM, Roman Fleysher
 wrote:
> Yes, Richard, this is exactly what I mean.
>
> Roman
> 
> From: sqlite-users [sqlite-users-boun...@mailinglists.sqlite.org] on behalf 
> of Richard Hipp [d...@sqlite.org]
> Sent: Wednesday, January 11, 2017 4:34 PM
> To: SQLite mailing list
> Subject: Re: [sqlite] extension to run bash
>
> On 1/11/17, Roman Fleysher  wrote:
>> Dear SQLites,
>>
>> I am using exclusively sqlite3 shell for all the processing and may need
>> ability to run bash commands and assign result to a column. For example:
>>
>> UPDATE result SET nRows = `wc -l fileNames`;
>>
>> Here I used `` as would be in bash for command substitution. This would run
>> wc command (word count), count number of lines in each file listed in column
>> fileNames and update the row correspondingly.
>>
>> As far as I understand I should be able to write loadable extension to
>> accomplish this.
>
> No, You cannot do exactly what you describe with a loadable extension.
>
> But you could, perhaps, create a loadable extension that implements a
> new system() SQL function like this:
>
>UPDATE result SET nRows = system('wc -l ' || fileNames);
>
> Note that || is the SQL string concatenation operator.  You didn't
> say, but I'm guessing that fileNames is a column in the result table.
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Allow overriding unsigned 64-bit integer

2017-01-08 Thread Scott Hess
On Sun, Jan 8, 2017 at 6:23 AM, Kirill Müller  wrote:
> On 08.01.2017 14:20, Clemens Ladisch wrote:
>> Kirill Müller wrote:
>>> On 08.01.2017 12:54, Clemens Ladisch wrote:
 Kirill Müller wrote:
> ... there's no portable support for 64-bit integers.
> I'm working around this issue by using a struct of size 8

 A struct of size 8 does not behave the same as an integer type.

>>> For compiling sqlite.c, I'm using a 64-bit integer type. I'm only
>>> using the struct for compiling C++ modules that include sqlite.h.
>>
>> This means that you cannot link the C and C++ code together, because
>> they use different types.
>
> Good point. I was planning to cross fingers that an 8-byte POD struct will
> align nicely with a 64-bit integer. (Would need to take care about
> endianness, too.) I'm open to alternatives.
>>
>>> On the C++ side I'll find a suitable way to handle the data.
>>
>> How exactly do you get the C++ compiler to handle an 8-byte struct the
>> same as a 64-bit integer (a type which it supposedly does not have)?
>
> I'd use a multiprecision C++ class. My code uses 64-bit integers only for
> reading and writing column data.

SQLite uses 64-bit integers all over the place internally, and it's
going to expect those to behave like 64-bit integers.

If you need to compile on a platform which does not support an
integer-like 64-bit integer, and you can modify your code to store
large integers in blobs, you might try defining the 64-bit integer
type to be your largest int (presumably 32-bit) and see if that works.
There will be many things which would definitely _not_ work, but as
long as you don't ever present SQLite with an integer outside of the
32-bit range, it might work well enough.

[YMMV, I have never tried this myself.  I also suspect that you might
find that 64-bit integer support is not your last problem.]

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug report + fix: SQLite 3.11+ broken on EBCDIC systems

2016-12-12 Thread Scott Hess
On Mon, Dec 12, 2016 at 9:30 PM, Bradford Larsen  wrote:
> An alternative possibility would be to revert to the pre-3.11 tokenizer on
> EBCDIC systems.  If I recall, the old tokenizer used a big switch statement
> with character literals instead of the 'aiClass' table.  I believe this
> would avoid the EBCDIC tokenizing troubles, at the expense of lower
> performance on those systems, and with the maintenance cost of keeping 2
> code paths around.

Or the build pass could compile a switch-based generator.  It would
make cross-compiling more complicated.

Perhaps easier would be to have a set of unrolled table initializers
which used the raw characters as indices, called as part of
sqlite3_init().

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there any way to close a connection without checkpointing the WAL?

2016-12-05 Thread Scott Hess
On Mon, Dec 5, 2016 at 1:34 PM, Simon Slavin <slav...@bigfraud.org> wrote:
> On 5 Dec 2016, at 9:26pm, Scott Hess <sh...@google.com> wrote:
>> An obvious solution would be to simply not call sqlite3_close(),
>> though that has various other unfortunate side effects.
>
> Yeah.  Don’t do that, eh ?

:-).  OK, the biggest unfortunate effect is that you don't release the
various resources like memory and file descriptors, which in turn
means that you spend your time populating exception policies for
automated leak detectors and the like.  Also, it means you have to
have high confidence that you're skipping the close because of
shutdown rather than some other reason (like you're closing the
database before deleting it).  All of this can probably be ground
through eventually, but this kind of thing is likely to cause people
to wonder if the code is doing the right thing.

> What you’re actually trying to do is disable/delay fsync() for
> a time for a particular storage device.  Can that not be done
> at device-driver level ?  It would then affect all the programs
> writing "I just quit" to their log files too, which would be an
> additional benefit.

I don't think so, that seems like it could result in corruption.
Unless you mean something more like causing the OS to block all fsync
calls on the filesystem and release them as a single uber-sync?

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there any way to close a connection without checkpointing the WAL?

2016-12-05 Thread Scott Hess
On Mon, Dec 5, 2016 at 1:38 PM, Richard Hipp <d...@sqlite.org> wrote:
> On 12/5/16, Scott Hess <sh...@google.com> wrote:
>> Is there any clean way to request no WAL checkpoint on sqlite3_close()?
>
> sqlite3_db_config(SQLITE_DBCONFIG_NO_CKPT_ON_CLOSE, db).  See
> https://www.sqlite.org/draft/c3ref/c_dbconfig_enable_fkey.html at the
> bottom.

Oh!  Now I wonder if I asked about this earlier :-).
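
For anyone finding this in the archives later, usage ends up looking like
this (a sketch; requires a build new enough to have the option):

  int wasSet = 0;
  sqlite3_db_config(db, SQLITE_DBCONFIG_NO_CKPT_ON_CLOSE, 1, &wasSet);
  /* ... */
  sqlite3_close(db);  /* WAL file is left in place for the next opener */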

Thanks,
scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Is there any way to close a connection without checkpointing the WAL?

2016-12-05 Thread Scott Hess
At Chromium shutdown, various services desire to write data to their
SQLite databases, which results in a (small) thundering herd of
fsyncs, which makes shutdown slower than it could be.  Normally, one
could enable WAL mode to amortize the fsync cost across longer periods
than a single transaction, but as best I can tell, sqlite3_close()
requires the WAL checkpoint, so won't help.

Is there any clean way to request no WAL checkpoint on sqlite3_close()?

An obvious solution would be to simply not call sqlite3_close(),
though that has various other unfortunate side effects.

Thanks,
scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-29 Thread Scott Hess
On Tue, Nov 29, 2016 at 10:10 AM, Mark Hamburg  wrote:
> On Nov 29, 2016, at 9:09 AM, Simon Slavin  wrote:
>>> On 29 Nov 2016, at 4:18pm, Mark Hamburg  wrote:
>>>
>>> Does this make sense? Does it seem useful? (It seems useful to me when I 
>>> see multi-megabyte WAL files.)
>>
>> Sorry, but I cannot spare the time right now to analyze the system you laid 
>> out.  It usually takes half an hour to diagram out the read and write 
>> procedures and point out where multiple simultaneous ones don’t fit together.
>>
>> I can tell you that entire books are written about the difficulties of 
>> simultaneous access to a database, and that I’ve read too many of them.  And 
>> that it has been proven many times that there’s no solution.  You cannot 
>> design a system which (A) provides up-to-date data to readers (B) allows 
>> writers to get rid of their data immediately without ever locking up and (C) 
>> guarantees that earlier changes to the data are ’saved' before later 
>> changes, thus preserving uncorrupted data in the case of power-cuts, etc..
>>
>> It is possible to implement a version if you drop one of the requirements.  
>> For example, you can have many simultaneous writers as long as you don’t 
>> need any readers.  Or you can have many readers as long as you have only one 
>> writer and you don’t need readers to be completely up-to-date.
>>
>> You might like to read more about Brewer’s Theorem:
>>
>> 
>>
>> In the meantime, I’m glad that WAL mode seems to be useful for you, if you 
>> can cope with big journal files until all connections are closed, it’s a 
>> good solution.
>
> What I probably haven't accounted for is what it would take to do an 
> atomic/safe swap of the WAL files in my double WAL scenario. I need to give 
> that some more thought.

Don't do swaps.  The issue is that WAL needs everyone to agree to
reset to the front of the WAL file.  The _A and _B files could have a
serial number to differentiate which comes first.  Then when the first
file has no readers blocking, that file can be checkpointed to the
main database.  Then that file becomes available for future use.

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Pragma to flag unknown pragma?

2016-11-23 Thread Scott Hess
On Tue, Nov 22, 2016 at 10:50 PM, R Smith <rsm...@rsweb.co.za> wrote:
> On 2016/11/23 2:08 AM, Scott Hess wrote:
>> https://www.sqlite.org/pragma.html has:
>> "No error messages are generated if an unknown pragma is issued.
>> Unknown pragmas are simply ignored. This means if there is a typo in a
>> pragma statement the library does not inform the user of the fact."
>>
>> I just lost some time due to this, even though I was fully aware of
>> it.  My code wasn't working, so I instrumented to report errors, and
>> gradually dug things deeper and deeper.  It wasn't until I was
>> verifying statements line-by-line against sqlite3 in a terminal window
>> that I saw that I was setting journal_mod rather than journal_mode!
>>
>> I realize that pragma don't have the compatibility guarantees that
>> other syntax has.  But that means I actually _would_ want my code to
>> start barfing if a PRAGMA stops being supported.  Say I'm issuing
>> "PRAGMA secure_delete=ON" - presumably it's for a reason, and I'm
>> going to want to revisit that reason if it's no longer supported.
>
> The problem is more the other way round - Backward compatibility is hard
> when you introduce a new pragma, and a previous version of SQLite "barfs"
> suddenly when it encounters that statement. Recent posts here re-emphasize
> the frequency with which old versions are still used out there.
>
> Silently ignoring an unrecognized pragma is a great way to be able to
> introduce new functionality without worrying that the old will break.

I may be misunderstanding where you're going with that, but my point
was that there are some situations where I'm calling a PRAGMA because
it's important to some implementation detail.  For instance, if your
app's schema requires operational foreign keys, then simply ignoring
"PRAGMA foreign_keys = ON" is _not_ kosher (I mean, yes, there's
nothing SQLite can do to fix being old or compiled that way, but
signalling "I can't do that, Dave" would be helpful).  How the PRAGMA
I am _not_ calling are handled doesn't seem super relevant to that
case, because I'm not calling them.

Of course, my "PRAGMA pedantic_pragma = ON" call would itself succeed
by failing on an older version of SQLite.  You have to bootstrap
someplace, so you'd have to make the call to turn it on, then do
another query to verify that it is on.  Which I guess is basically
what your PRAGMA should already be doing, though there are subtleties
(PRAGMA result patterns aren't consistent across different PRAGMA).
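
E.g. the verify-after-set dance for foreign keys looks like (sketch):

  /* Ask for enforcement, then confirm we actually got it.  An old
  ** build, or one with SQLITE_OMIT_FOREIGN_KEY, returns no row. */
  sqlite3_stmt *stmt;
  int fkEnabled = 0;
  sqlite3_exec(db, "PRAGMA foreign_keys = ON", 0, 0, 0);
  if( sqlite3_prepare_v2(db, "PRAGMA foreign_keys",
                         -1, &stmt, 0)==SQLITE_OK ){
    if( sqlite3_step(stmt)==SQLITE_ROW ){
      fkEnabled = sqlite3_column_int(stmt, 0);
    }
    sqlite3_finalize(stmt);
  }
  if( !fkEnabled ){
    /* "I can't do that, Dave" - handle it here. */
  }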

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Pragma to flag unknown pragma?

2016-11-22 Thread Scott Hess
https://www.sqlite.org/pragma.html has:
"No error messages are generated if an unknown pragma is issued.
Unknown pragmas are simply ignored. This means if there is a typo in a
pragma statement the library does not inform the user of the fact."

I just lost some time due to this, even though I was fully aware of
it.  My code wasn't working, so I instrumented to report errors, and
gradually dug things deeper and deeper.  It wasn't until I was
verifying statements line-by-line against sqlite3 in a terminal window
that I saw that I was setting journal_mod rather than journal_mode!

I realize that pragma don't have the compatibility guarantees that
other syntax has.  But that means I actually _would_ want my code to
start barfing if a PRAGMA stops being supported.  Say I'm issuing
"PRAGMA secure_delete=ON" - presumably it's for a reason, and I'm
going to want to revisit that reason if it's no longer supported.

One could perhaps fake a pedantic pragma with SQLITE_FCNTL_PRAGMA in a
custom VFS, but that seems a little excessive for something like this.
Something like "PRAGMA pedantic_pragma = on" would be much slicker.

OK, back to the mines,
scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about sqlite3_backup() versus page_size.

2016-10-27 Thread Scott Hess
As soon as I hit Send, I thought of something.  This only holds until
you actually read the destination database.  So in my example code,
after the check that the database contains the right data
("e\nf\ng\nh" bit), another page_size check will show 4096 instead of
1024.  So the window for this to confuse something is pretty narrow.

-scott


On Thu, Oct 27, 2016 at 4:08 PM, Scott Hess <sh...@google.com> wrote:
> Here's some example code for reference:
>http://pastebin.com/pQdfkneR
>
> I just noticed that if you use sqlite3_backup() where the source
> database page_size isn't the same as the destination database
> page_size, after a successful backup the destination database
> continues to report the page_size from before the backup was made.  In
> most cases, this is not the case.  For instance, if you try to change
> the page_size on a database which contains pages, then later query the
> page_size, you will get back the _actual_ page_size, not the one you
> set.
>
> Is this intentional?
>
> -scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Question about sqlite3_backup() versus page_size.

2016-10-27 Thread Scott Hess
Here's some example code for reference:
   http://pastebin.com/pQdfkneR

I just noticed that if you use sqlite3_backup() where the source
database page_size isn't the same as the destination database
page_size, after a successful backup the destination database
continues to report the page_size from before the backup was made.  In
other situations this is not the case.  For instance, if you try to change
the page_size on a database which contains pages, then later query the
page_size, you will get back the _actual_ page_size, not the one you
set.

Is this intentional?

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Q about integer overflow in sqlite3MulInt64().

2016-09-20 Thread Scott Hess
Yes - once the undefined behavior has happened, the compiler can
dispense with everything else, so if it can prove that your
after-the-fact checks can only happen in case of signed overflow, it
can simply omit them.  Great fun.

Dr Hipp landed https://www.sqlite.org/src/info/db3ebd7c52cfc5fc ,
which is basically what you suggested.  I also see that some systems
include __builtin_mul_overflow() intrinsics, which can use the CPU's
overflow flag, if available, which seems plausible.
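
For reference, the division-based check looks something like this (a
sketch; the actual check-in differs in detail):

  #include <stdint.h>

  /* Returns nonzero if a*b would overflow, without ever performing the
  ** (undefined) overflowing multiply. */
  static int mul64Overflows(int64_t a, int64_t b){
    if( a>0 ){
      if( b>0 && a > INT64_MAX/b ) return 1;
      if( b<0 && b < INT64_MIN/a ) return 1;
    }else if( a<0 ){
      if( b>0 && a < INT64_MIN/b ) return 1;
      if( b<0 && b < INT64_MAX/a ) return 1;
    }
    return 0;
  }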

-scott

On Tue, Sep 20, 2016 at 4:51 PM, Bernardo Sulzbach
 wrote:
> In time, ignore my previous reply to this thread as SQLite portability
> requirements make it invalid (at least I think they would). According to the
> C language standard, signed overflow is undefined behavior and, therefore,
> should not be relied upon.
>
> There is also a simpler way to check it using a division of the maximum
> possible value by the multiplier (which will never overflow).
>
>
> --
> Bernardo Sulzbach
> http://www.mafagafogigante.org/
> mafagafogiga...@mafagafogigante.org
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Q about integer overflow in sqlite3MulInt64().

2016-09-20 Thread Scott Hess
sqlite3MulInt64() in util.c appears to try to detect integer overflow
by dividing the inputs by 2^32.  If both inputs are 0 when divided by
2^32, it does the 64-bit multiplication and moves on.

In the case of something like |SELECT 3452005775*3452005775|, both
inputs are greater than 2^31 but less than 2^32, but the result is
greater than 2^63, so it ends up as a large negative number (ie,
overflow, which is undefined for signed integers in C).  The smallest
number this overflow happens to is sqrt(2^63)+1, which is 3037000500.
Obviously there's a range of values where this can happen.

No patch suggested, though I wouldn't be surprised if my brain makes a
suggestion after things simmer for an hour or so.  If either value
needs less than 31 bits, it can't happen, but there's not a simple bit
pattern to check, AFAICT.

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite - Delete large table in sqlite

2016-09-09 Thread Scott Hess
Do you have auto_vacuum turned on?  It may be that the cost isn't
actually in deleting the table, it may be that the cost is rearranging
the rest of the file to fill the gaps left by deleting the table.  In
that case you could turn off auto_vacuum, or you could use incremental
vacuum to smooth out the cost of doing this over time.

Your point about 183 columns concerns me.  The minimum size of such a row
is around 200 bytes, but presumably you're actually storing data in there.
If the amount of data per row is often larger than your page size, you
could have a lot of overflow pages, and rows substantially larger than the
page size can be a problem.  You might experiment with different page
sizes, but, honestly, you might want to revisit the reasons for having 183
columns in the first place.

-scott


On Fri, Sep 9, 2016 at 5:49 AM, Bhavesh Patel
 wrote:
> I have a 4GB SQLite database (currently using SQLCipher for encryption).  I
> have created a physical table in the SQLite DB which has approx 20 rows x
> 183 columns.  When I try to delete the table, SQLite takes a long time.
>
> I tried drop table and also deleting all the rows - in both cases it takes
> approx 1-2 minutes.
>
> While the delete/drop is running, if I try to connect I randomly get a
> database-locked error.
>
> I would like to know if there is an optimized way to delete such a huge
> table, or another way to run the drop/delete that still allows other users
> to connect and use the database without waiting.
>
> Thanks in advance
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] page_size on ATTACH-ed databases

2016-08-16 Thread Scott Hess
Is there any possibility that the attached db already existed before
you ran this?  Because once a db exists (contains pages) the page size
is fixed until you run vacuum.

On Tue, Aug 16, 2016 at 10:53 AM, Ward WIllats  wrote:
>
>>> On Aug 12, 2016, at 11:44 PM, Dan Kennedy  wrote:
>>>
>>> On 08/13/2016 01:14 AM, Ward WIllats wrote:
>>>
>>> Can't reproduce this problem here. Are you able to reproduce it with the 
>>> shell tool?
>>>
>>
>>
>> Yes, if I use the shell on our embedded system (OpenWRT/Linux). I should 
>> have added that:
>>
>> 1. The main DB in on a JFFS filesystem and we've moved the -shm file onto 
>> /tmp with a marginally-supported sqlite #define.
>> 2. The secondary DB is on /tmp in a RAM FS (which is larger/faster/volatile).
>>
>> We actually issue quite a few pragmas on open. I'll post the whole sequence 
>> when I can get back to our box.
>>
>
> Here we go:
>
> ~# /usr/local/bin/sqlite3 /opt/foundation/core_db.db
> SQLite version 3.10.1 2016-01-13 21:41:56
>
> Enter ".help" for usage hints.
> sqlite> attach database '/tmp/RareData.db' as rd;   <-- ATTACH SECOND DB
> sqlite>
> sqlite> pragma page_size=512;   <-- SET MAIN DB PAGE SIZE
> sqlite> pragma cache_size=200;
> sqlite> pragma mmap_size=0;
> 0
> sqlite> pragma busy_timeout=57000;
> 57000
> sqlite> pragma foreign_keys=ON;
> sqlite> pragma synchronous=normal;
> sqlite> pragma journal_mode=WAL;
> wal
> sqlite> pragma temp_store=2;
> sqlite> pragma waL_autocheckpoint=10;
> 10
> sqlite> pragma journal_size_limit=15360;
> 15360
> sqlite> pragma auto_vacuum=2;
> sqlite> pragma max_page_count=16384;
> 16384
> sqlite>
> sqlite> pragma page_size;
> 512   <-- MAIN DB PAGE SIZE OK
> sqlite>
> sqlite> pragma rd.page_size=4096; <-- SET ATTACHED DB PAGE SIZE TO 4K
> sqlite> pragma rd.cache_size=32;
> sqlite> pragma rd.mmap_size=0;
> 0
> sqlite> pragma rd.busy_timeout=57000;
> 57000
> sqlite> pragma rd.foreign_keys=ON;
> sqlite> pragma rd.synchronous=normal;
> sqlite> pragma rd.journal_mode=WAL;
> wal
> sqlite> pragma rd.temp_store=2;
> sqlite> pragma rd.waL_autocheckpoint=2;
> 2
> sqlite> pragma rd.journal_size_limit=16384;
> 16384
> sqlite> pragma rd.auto_vacuum=2;
> sqlite> pragma rd.max_page_count=5000;
> 5000
> sqlite>
> sqlite> pragma rd.page_size;
> 1024   <-- GET 1K DEFAULT PAGE SIZE BACK ON ATTACHED DB, *NOT* 4K SET ABOVE
> sqlite>
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bad db feature request

2016-06-29 Thread Scott Hess
On Wed, Jun 29, 2016 at 10:36 AM, Simon Slavin  wrote:
> On 29 Jun 2016, at 5:45pm, Drago, William @ CSG - NARDA-MITEQ 
>  wrote:
>> Aren't there things like that already built in to the hard disk controllers 
>> (CRC, Reed Solomon, etc.)?
>
> Yes.  But they operate at the level they understand.  For instance ...
>
> A change is made in a field which involves changing just one page of data.  
> In terms of the SQLite file format this would mean that a table page is 
> overwritten -- a one disk sector change.  If SQLite checksums existed then 
> this would mean that the checksum, stored in the table pointer page, would 
> also have to be updated.  Which would mean that another disk sector has to be 
> changed too.
>
> Now suppose there's a bug in the storage medium driver which means it
> occasionally writes the correct data to the wrong sector on disk.  Without 
> checksums this fault would not be noticed: since the wrong sector on disk was 
> updated the wrong checksum on disk would be updated too.  The errors would 
> match.

I think the bigger problem is that delegating this means that you
assume the entire underlying stack is working correctly.  For
instance, the disk may have elaborate error-correction protocols that
are working correctly per sector, but SQLite's pages may span sectors.
Or the underlying disk may be perfect and the filesystem doesn't
provide the same guarantees.  Or someone is running things over NFS.
Having the page checksum embedded in the page at the SQLite level
would provide end-to-end confidence.
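
As a rough illustration of the idea -- not SQLite's actual page format;
the per-page reserved-bytes region would be one natural place to stash
such a checksum -- verification might look like:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy checksum standing in for a real CRC. */
static uint32_t page_checksum(const uint8_t *page, size_t n){
  uint32_t sum = 0;
  size_t i;
  for(i=0; i<n; i++) sum = sum*31 + page[i];
  return sum;
}

/* Assumes the last 4 bytes of each page hold its checksum. */
static int page_ok(const uint8_t *page, size_t page_size){
  uint32_t stored;
  memcpy(&stored, page + page_size - 4, sizeof(stored));
  return stored == page_checksum(page, page_size - 4);
}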

Chaining the checksums is a whole different level of assurance.  To
the best of my knowledge _all_ legitimately (1) corrupted databases
I've seen had pages which were individually valid, but not valid when
taken together.  Like an index page referred to a row which wasn't
present in the table page.  This implies that the atomicity guarantees
SQLite relies on were broken at the filesystem or disk level.

-scott

(1) I consider a system where the filesystem is simply broken to not
be legitimate corruption.  For instance, if you get a page of random
data which doesn't appear to have ever been SQLite data in the first
place.  There's not much SQLite can do about that kind of thing.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bad db feature request

2016-06-29 Thread Scott Hess
On Wed, Jun 29, 2016 at 2:17 AM, R Smith  wrote:
> In response to a recent forum post and many other posts, where SQLite
> corrupt files or Index integrity was the problem at hand, I was wondering if
> we could ask for an API function that would corrupt a DB for us.

I have done some things like this in Chromium's sql/ wrapper.  There
are helpers at:
   https://cs.chromium.org/chromium/src/sql/test/test_helpers.h

sql::test::CorruptSizeInHeader() is a helper function which uses
mostly libc operations to modify the page count in the header, which
results in SQLITE_CORRUPT for most operations.

In the same file, sql::test::CorruptTableOrIndex() is a helper which
snapshots the root page of a table or index, runs a given update
query, then restores the snapshot.  This can be used to build
mis-matched indices or tables.

These have example use in:
   https://cs.chromium.org/chromium/src/sql/connection_unittest.cc
   https://cs.chromium.org/chromium/src/sql/recovery_unittest.cc
Of course, none of this would be directly useful, since it's written
in Chromium C++.  But it would be straightforward to write them using
libc and the SQLite C API, or your appropriate wrapper.
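
A libc-only equivalent of CorruptSizeInHeader() can be sketched like
this (the in-header database size is a 4-byte big-endian page count at
byte offset 28; note that whether SQLite trusts that field also depends
on the header's change counters, so treat this as a sketch, not a
guaranteed repro):

#include <stdio.h>

static void corrupt_size_in_header(const char *path){
  /* Stamp a nonsense page count over offset 28; error handling
  ** omitted for brevity. */
  unsigned char bogus[4] = {0xFF, 0xFF, 0xFF, 0xFF};
  FILE *f = fopen(path, "r+b");
  if( f ){
    fseek(f, 28, SEEK_SET);
    fwrite(bogus, 1, sizeof(bogus), f);
    fclose(f);
  }
}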

Unfortunately, one thing I found in this is that it's kind of hard to
inject real corruption reliably.  You have to figure out ways to be
wrong enough to be noticeable, and right enough that SQLite can make
progress to see the corruption, and then you have to make sure SQLite
is running through the code paths which will actually expose the
corruption.

It might be cleaner to just have a way to do error injection.  Like
maybe something like the authorizer interface, but where it could
detect the table row/column being touched and introduce an error.  Or
maybe you could write a vtable which proxies a real table and
introduces errors.

-scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQlite database corrupt , PHP and multithread C program accessing same database

2016-06-22 Thread Scott Hess
On Wed, Jun 22, 2016 at 8:13 PM, Simon Slavin  wrote:
> On 23 Jun 2016, at 3:52am, mon siong  wrote:
>> PHP is using sqlite3 library (http://php.net/manual/en/book.sqlite3.php)
>> and C program is handle the sqlite using Serialized.
>>
>> Both of them are accessing the same DB at the same time . This can cause DB 
>> corrupt ?
>
> No.  These two work correctly together.  You should get no corruption from 
> using these two together.

I wonder if it would be worthwhile to provide a PRAGMA (or function)
which returned a blob derived from the VFS pointer, perhaps hashed.
Then you could very easily verify if multiple higher-level libraries
are sharing the same underlying sqlite library, because that operation
would return the same result in both cases.

I'm suggesting to derive from the VFS pointer because if separate
sqlite libraries were sharing the same Unix VFS I think the POSIX
locking problem would not occur.  sqlite_source_id() is _really_
close, but technically could be identical for distinct sqlite
libraries linked into the same process.

Sorry if I'm too tired to make sense,
scott
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Managing SQLite indices.

2016-06-14 Thread Scott Hess
One thing I would add is to try to populate your example database with
representative data - in fact, try hard to figure out what
representative data looks like, it informs many decisions.  My
experience is that sometimes people assume that because something is
fast enough on their workstation, it's fast enough for production, but
in production their joins involving multiple full table scans are
suddenly generating tens of thousands of rows for their SORT to order
and their WHERE to filter, instead of the small constant number of
rows in their simple test database.
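
As a quick sanity check, comparing plans in the shell makes the
difference visible (schema invented for the example; EXPLAIN QUERY PLAN
output varies between versions):

sqlite> CREATE TABLE orders(customer_id INTEGER, total REAL);
sqlite> EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7;
0|0|0|SCAN TABLE orders
sqlite> CREATE INDEX orders_customer ON orders(customer_id);
sqlite> EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 7;
0|0|0|SEARCH TABLE orders USING INDEX orders_customer (customer_id=?)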

-scott


On Mon, Jun 13, 2016 at 5:04 PM, Simon Slavin  wrote:
>
> On 14 Jun 2016, at 12:27am, Smith, Randall  wrote:
>
>> the info from EXPLAIN QUERY PLAN and from reading the query itself don't 
>> always lead to an obvious statement of what indices are needed.
>
> I don't think this can be done well by software.  Of course, I haven't tried 
> it.
>
>> Has anyone figured out a good system for managing indices in a smooth, 
>> efficient, and reliable way in a non-trivial SQLite application?
>
> Sure.  But you're not going to like it.
>
> General principles:
>
> A) Plan your schema properly.  Tables represent things.  Think through 
> COLLATE for every column, especially key columns.  Use foreign keys rather 
> than copying data into other tables.  No need to be obsessive about it but 
> "that's how it was done when I started here" is not good enough.  Work out 
> your tables, your primary keys and your views, and your indexes will take 
> care of themselves.
>
> B) If a query runs fast enough, it runs fast enough.  Don't mess with "as 
> fast as possible".  That way lies madness.
>
> C) Don't index a column just because it looks important.  You create an 
> index, when you create an index, for a particular statement. You look at the 
> "WHERE" and "ORDER BY" clauses and figure it out from there.  It's always 
> possible to create the best possible index for a statement by inspecting 
> those two clauses and thinking about how "chunky" each column is.  You may 
> not need the full index -- the rightmost column(s) may be unnecessary -- but 
> it's a good starting point.
>
> Got the principles ?  Right.  Now here's the procedure:
>
> 1) Delete all indexes.
> 2) Run ANALYZE.
> 3) Run your application.
> 4) Note the SQLite command which takes the most annoyingly long time.
> 5) Work out a good index which will fix the problem.
> 6) Create the index.
> 7) Repeat from step 2.
>
> When your application runs fast enough not to annoy you, you're done.  If 
> you're not willing to do step (1), don't bother with anything else.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] 2 different SQLite versions inside the same process space

2016-05-10 Thread Scott Hess
IF you have two different versions of SQLite linked into the same
executable, both accessing the same database, then the problem that
the globals work around can happen.  It won't happen if different
processes use different versions of SQLite (say two versions of the
sqlite3 binary, or sqlite3 versus Python, etc), nor is there an issue
if a single process with multiple versions linked is used to access
different database files (say if one version is part of nss used to
access the system certificate database and the other version is what
you compile in to access your databases).

The gist of the problem is that if a single process has two file
descriptors open on the same file, and one holds a lock, closing the
other file descriptor clears the lock.  SQLite's Unix VFS puts the
close in a global queue to close later to prevent this.
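
The quirk itself is easy to state in code (sketch, no SQLite involved):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

void posix_lock_quirk(const char *path){
  int fd1 = open(path, O_RDWR);
  int fd2 = open(path, O_RDWR);   /* second fd on the same file */
  struct flock fl;
  memset(&fl, 0, sizeof(fl));
  fl.l_type = F_WRLCK;
  fl.l_whence = SEEK_SET;         /* l_start/l_len of 0 == whole file */
  fcntl(fd1, F_SETLK, &fl);       /* lock held via fd1 */
  close(fd2);                     /* POSIX: this drops fd1's lock too */
  close(fd1);
}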

-scott


On Tue, May 10, 2016 at 2:29 PM, Steve Schow  wrote:
>
> I would like to understand this issue a little bit better?
>
>
> On May 10, 2016, at 2:31 PM, Richard Hipp  wrote:
>>
>> In unix, SQLite has to use global variables to work around the
>> well-known design bugs in posix advisory locks.  And so if you have
>> two different instances of SQLite running on unix, they will use
>> different global variables, causing them to break each others locks
>> and you will get database corruption.
>
>
> are you saying that on UNIX, if two different versions of the sqlite3 binary 
> attempt to access a DB file at the same time, then the globals that are used 
> in the sqlite3 binaries related to locking may be different in the two 
> different binaries, and may result in DB corruption?
>
> If that is the case, then although the internal DB file format may be 
> backwards compatible between versions of sqlite3, it's very important that I 
> take care not to allow two different versions of the SQLITE executable code 
> to attempt to access the DB file at the same time.  As long as they are 
> totally separate non-concurrent accesses, it sounds like it should be fine, 
> but if they attempt concurrently, then concurrency locking between them can't 
> be guaranteed due to changes in the way you are handling it with globals as 
> the code has evolved.  On UNIX anyway.  Do I have that right?
>
> That's a very important thing to keep in mind with so many different versions 
> of sqlite3 executable code floating around.. the one built into Python is a 
> lot older than the sqlite3 binary I have installed, which might be different 
> from what is compiled into fossil, etc..
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Make mmap_size dynamic?

2016-05-02 Thread Scott Hess
If you're enabling mmap on a platform where it is intentionally not enabled
by SQLite, then you probably need to get in and thoroughly verify that it's
going to work correctly.  You'll need to enable SQLITE_MMAP_READWRITE,
obviously, but I don't think there's anything in there which makes the
guarantee that SQLite will _only_ use memory-mapped access to the file.

[I think it might be reasonable to have a SQLITE_MMAP_NO_FILEIO type flag
for this case, which would force a memory-mapped file to be accessed only
via mmap.]

-scott


On Mon, May 2, 2016 at 12:19 PM, Mikael  wrote:

> Aha, great! So what I was asking for was already in the box, and a
> mmap_size of say 2^63 will be fine then!
>
>
> One reason that I asked for this was that I want to use it on OpenBSD, and
> there, mmaping in Sqlite is disabled altogether, in the absence of a
> unified buffer cache (UBC) in the OS.
>
> If I just force it on (by hacking the build script), as long as mmap_size
> always is 2^63, will Sqlite access the file via memory accesses only, and
> never using fread/fwrite which would lead to undefined behavior because of
> the absence of a UBC?
>
> Thanks!
>
> On Tuesday, 3 May 2016, Scott Hess  wrote:
>
> > The existing mmap functionality only maps the actual blocks associated
> with
> > the file.  So if your file is 16kb and your mmap_size is 1GB, only 16kb
> is
> > used.  Unless you add data to the file, then the mmap area grows,
> > obviously.
> >
> > -scott
> >
> >
> > On Mon, May 2, 2016 at 2:01 AM, Mikael  wrote:
> >
> > > Dear Dr. Hipp & list,
> > >
> > > What about making Sqlite's memory mapping adapt to the current database
> > > size, in increments of say 100MB?
> > >
> > > The at least 48 bits (256TB) of addressing space that modern 64bit
> > > architectures give per process is not suffering any risk of depletion, as
> > > long as the space is not used wastefully, which would be the case now as
> > > today in the absence of an incremental setting: to guarantee that a
> > > database never will grow outside of the mmap size, a developer is tempted
> > > to set mmap_size to a value so high that it guaranteedly never will be
> > > reached e.g. 1TB, and that way an application could be almost 100% wasteful
> > > with address space, and that way a process would get a constraint of max
> > > 200 or so databases.
> > >
> > > Can Sqlite user code implement this by itself already somehow?
> > >
> > > This would also be useful to do memory-mapped-only IO on an OS that does
> > > not have a unified buffer cache, such as OpenBSD, where memory mapping is
> > > disabled altogether for this reason currently.
> > >
> > > Looking forward to your response,
> > >
> > > Thanks,
> > > Mikael
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users at mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > >
> > ___
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Make mmap_size dynamic?

2016-05-02 Thread Scott Hess
The existing mmap functionality only maps the actual blocks associated with
the file.  So if your file is 16kb and your mmap_size is 1GB, only 16kb is
used.  Unless you add data to the file, then the mmap area grows, obviously.
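
So a huge mmap_size mostly spends address space, not memory.  For
example (value illustrative; the effective ceiling also depends on the
SQLITE_MAX_MMAP_SIZE compile-time limit):

sqlite> PRAGMA mmap_size=268435456;  -- ask for a 256MB mapping
268435456
-- only pages actually read or written ever become resident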

-scott


On Mon, May 2, 2016 at 2:01 AM, Mikael  wrote:

> Dear Dr. Hipp & list,
>
> What about making Sqlite's memory mapping adapt to the current database
> size, in increments of say 100MB?
>
> The at least 48 bits (256TB) of addressing space that modern 64bit
> architectures give per process is not suffering any risk of depletion, as
> long as the space is not used wastefully, which would be the case now as
> today in the absence of an incremental setting, to guarantee that a
> database never will grow outside of the mmap size, a developer is tempted
> to set mmap_size to a value so high that it guaranteedly never will be
> reached e.g. 1TB, and that way an application could be almost 100% wasteful
> with address space, and that way a process would get a constraint of max
> 200 or so databases.
>
> Can Sqlite user code implement this by itself already somehow?
>
> This would also be useful to do memory-mapped-only IO on an OS that does not
> have a unified buffer cache, such as OpenBSD, where memory mapping is disabled
> altogether for this reason currently.
>
> Looking forward to your response,
>
> Thanks,
> Mikael
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Calling some predefined SQL function from another custom SQL function?

2016-03-31 Thread Scott Hess
On Thu, Mar 31, 2016 at 6:39 AM, Olivier Mascia  wrote:

> > On 31 March 2016 at 11:03, Clemens Ladisch  wrote:
> >> I think it is obvious I could build a SQL statement from within the
> >> function and execute it. But it sounds costly to involve the parser
> >> (yes, it's fast) for that, isn't it?
> >
> > You can prepare the statement beforehand.
>
> Not really. Preparing the statement involves the connection object which
> won't be known before function call. So preparation has to be done
> dynamically.  Did I really miss something here?


Put your function's persistent storage in the void*pApp passed to
sqlite3_create_function(), then get it back using sqlite3_user_data(ctx).
If the user_data is per-db, then you can have per-db state in there.
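
A minimal sketch of that (names invented, error handling omitted):

typedef struct MyState MyState;
struct MyState {
  sqlite3_stmt *pStmt;   /* prepared once per connection */
};

static void myFunc(sqlite3_context *ctx, int argc, sqlite3_value **argv){
  MyState *p = (MyState *)sqlite3_user_data(ctx);
  /* ... step/reset p->pStmt here instead of re-preparing ... */
}

/* Registration, once per connection:
**   MyState *p = allocateAndPrepare(db);
**   sqlite3_create_function(db, "myfunc", 1, SQLITE_UTF8, p, myFunc, 0, 0);
*/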

-scott


[sqlite] Article about pointer abuse in SQLite

2016-03-18 Thread Scott Hess
Not sure where you're going with this.  "Undefined behavior" in this case
is obviously referring to things defined by the C standard.  Things not
defined by the standard can (and do) change over time as compilers advance,
and also often differ between compilers from different vendors.

-scott


On Fri, Mar 18, 2016 at 1:40 PM, Keith Medcalf  wrote:

>
> There is no such thing as "undefined behaviour".  The machine code does
> exactly what it is told to do in exactly the manner in which it has been
> told to do it and obtains exactly the correct answer every time.
>
> That the computation is "advanced beyond the realm of understanding of the
> observer" does not make the behaviour undefined.  It is perfectly defined,
> however, it is occasionally necessary to describe things as "undefined",
> oftentimes because it is too complicated to explain.  Just because someone
> says something as "undefined" does not mean that is so.  It is simply a
> euphemism for "I don't understand how it did that/what it is supposed to be
> doing" (or, more often, an appeal to self-proclaimed authority which said
> that such behaviour was undefined) without having to admit fault, much in
> the same way that "supported" is a euphemism for "make money from".
>
> Things will only be non-deterministic and perhaps undefined when run on
> Quantum Computers using Heisenberg registers for intermediate results.
>
> > 
>
> "SQLite is a carefully engineered and thoroughly tested piece of software.
> Even so, it contains undefined behaviors because, until recently, no good
> checker for these behaviors existed. If anything is going to save us from
> UB hell, it's tools combined with developers who care to listen to them."
>
> > Simon.
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] [sqlite-dev] Changing the default page_size in 3.12.0

2016-03-08 Thread Scott Hess
On Fri, Mar 4, 2016 at 7:48 AM, Richard Hipp  wrote:

> The tip of trunk (3.12.0 alpha) changes the default page size for new
> database file from 1024 to 4096 bytes.


I have noticed that the OSX sqlite library seems to use default page_size
of 4096, and default cache_size of either -2000 or 500.  I think that
provides some support for the notion that for many users this may be a
non-event.

-scott


[sqlite] Page_size

2016-02-29 Thread Scott Hess
Also note that almost all current storage you can purchase uses 4k basic
blocks.  So it's not just some weird Windows thing.

In addition to performance advantages of getting the block size right,
there is also the advantage that most storage systems strive hard to make
sure block operations are atomic in the face of power losses.  This is
easier to accomplish when writing a full block, as opposed to doing a
read-modify-write required to write a smaller block.

Circling back to testing page_size, I have noticed that it is easy to
mistake cache effects for improvements due to page_size.  If your pages are
2x as big, your cache is 2x as big (unless you modify it).  You can
convince yourself to use huge pages, only to later realize that actually
the improvement was from using a huge cache or from preloading the cache.
So when experimenting, do try to determine _why_ the improvement happens,
because there may be other ways to accomplish the improvement.
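
One way to keep the comparison honest (numbers illustrative):

PRAGMA page_size = 4096;    -- candidate page size
PRAGMA cache_size = -2000;  -- negative = KiB, so the cache budget stays
                            -- ~2MB no matter which page size you try
VACUUM;                     -- rebuilds an existing file at the new size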

-scott


On Mon, Feb 29, 2016 at 7:28 AM, Keith Medcalf  wrote:

>
> The NTFS cluster size is usually 4K, unless you have changed it (you can
> use fsutil or chkdsk to see what the cluster size is).  Windows does I/O in
> units of a cluster.  Therefore the smallest (and largest) I/O which Windows
> will do is 1 Cluster, or 4K.  While you may set the page size larger or
> smaller at the application level, Windows will always do I/O in cluster
> units.  So if you set the page size to 1K, it will still I/O 4K for each
> page read/write, and if you set it larger than the cluster size, then it
> will scatter-gather the number of clusters required to fill your request.
>
> > -Original Message-
> > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-
> > bounces at mailinglists.sqlite.org] On Behalf Of Olivier Mascia
> > Sent: Monday, 29 February, 2016 02:15
> > To: SQLite mailing list
> > Subject: Re: [sqlite] Page_size
> >
> >
> > > On 29 Feb 2016 at 09:22, Jim Wang <2004wqg2008 at 163.com> wrote:
> > >
> > > hi,all
> > >
> > >     Could a different page_size impact the speed of retrieving records?
> > >     Is page_size 8192 faster than page_size 1024 or page_size 4096?  Or
> > > does the page size have nothing to do with retrieval speed?
> > >
> > > Best regards!
> > > Jim Wang.
> >
> > I'm eager to read the comments of seasoned users on this.
> >
> > On my side, having started using SQLite very recently (December 2015),
> > after a lot of tests using real data, we have found the sweet spot (for
> > us, and for now) to be 4K page sizes, which just happen to be the virtual
> > memory page size of Windows system (except older Itanium versions).
> > Though we use a slightly smaller cache size (between 500 and max 1000
> > pages) instead of the default 2000 pages.  We use private caches per
> > connection only, and use only wal mode.
> >
> > You'll obviously pack more record and index cells per page when the page
> > size is larger, but you will have larger I/O too to read or write any of
> > those pages.  Your cache memory usage will obviously be higher with
> larger
> > page size, unless you cache fewer pages (which we chose to do).
> >
> > I can't bring any comment regarding other OS.
> > But there is probably not ONE answer to your question.
> >
> > --
> > Meilleures salutations, Met vriendelijke groeten, Best Regards,
> > Olivier Mascia, integral.be/om
>
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Patch to fix buffer overflow in icu extension.

2016-02-26 Thread Scott Hess
Summary: Certain Unicode code points expand to more than two code points
when run through u_strToUpper().

SQLite's src/ext/icu/icu.c contains icuCaseFunc16() which implements custom
upper() and lower() functions.  It allocates a buffer of twice the input
size because some code points take more space when uppercased (or
lowercased) than the input.  Code points such as U+FB04 (ffl ligature)
uppercase to _three_ code points, so this can lead to a buffer overflow
because the result is not nul-terminated.

The following patch catches the U_BUFFER_OVERFLOW_ERROR result and
re-allocates to the actual size needed.

CL patching Chromium is:
https://codereview.chromium.org/1704103002/

-scott

---
 third_party/sqlite/src/ext/icu/icu.c | 31 +++++++++++++++++++++++++------
 third_party/sqlite/src/test/icu.test |  7 +++++++
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/third_party/sqlite/src/ext/icu/icu.c b/third_party/sqlite/src/ext/icu/icu.c
index 7e2b800..d384f71 100644
--- a/third_party/sqlite/src/ext/icu/icu.c
+++ b/third_party/sqlite/src/ext/icu/icu.c
@@ -341,26 +341,45 @@ static void icuCaseFunc16(sqlite3_context *p, int nArg, sqlite3_value **apArg){
   if( !zInput ){
     return;
   }
-  nInput = sqlite3_value_bytes16(apArg[0]);
+  nOutput = nInput = sqlite3_value_bytes16(apArg[0]);

-  nOutput = nInput * 2 + 2;
   zOutput = sqlite3_malloc(nOutput);
   if( !zOutput ){
     return;
   }

   if( sqlite3_user_data(p) ){
-    u_strToUpper(zOutput, nOutput/2, zInput, nInput/2, zLocale, &status);
+    nOutput = u_strToUpper(
+        zOutput, nOutput/2, zInput, nInput/2, zLocale, &status) * 2;
   }else{
-    u_strToLower(zOutput, nOutput/2, zInput, nInput/2, zLocale, &status);
+    nOutput = u_strToLower(
+        zOutput, nOutput/2, zInput, nInput/2, zLocale, &status) * 2;
   }

-  if( !U_SUCCESS(status) ){
+  if ( status == U_BUFFER_OVERFLOW_ERROR ) {
+    UChar* newOutput = sqlite3_realloc(zOutput, nOutput);
+    if( !newOutput ){
+      sqlite3_free(zOutput);
+      return;
+    }
+    zOutput = newOutput;
+    status = U_ZERO_ERROR;
+    if( sqlite3_user_data(p) ){
+      nOutput = u_strToUpper(
+          zOutput, nOutput/2, zInput, nInput/2, zLocale, &status) * 2;
+    }else{
+      nOutput = u_strToLower(
+          zOutput, nOutput/2, zInput, nInput/2, zLocale, &status) * 2;
+    }
+  }
+
+  if( U_FAILURE(status) ){
     icuFunctionError(p, "u_strToLower()/u_strToUpper", status);
+    sqlite3_free(zOutput);
     return;
   }

-  sqlite3_result_text16(p, zOutput, -1, xFree);
+  sqlite3_result_text16(p, zOutput, nOutput, xFree);
 }

 /*
diff --git a/third_party/sqlite/src/test/icu.test b/third_party/sqlite/src/test/icu.test
index 73cb9b9..22948aa 100644
--- a/third_party/sqlite/src/test/icu.test
+++ b/third_party/sqlite/src/test/icu.test
@@ -56,6 +56,10 @@ set ::ograve "\xF2"
 #
 set ::szlig "\xDF"

+# U+FB03 (ffi ligature) and U+FB04 (ffl ligature). They're uppercased
+# to 'FFI' and 'FFL'.
+set ::ffi_ffl "\ufb03\ufb04"
+
 # Tests of the upper()/lower() functions.
 #
 test_expr icu-2.1 {i1='HellO WorlD'} {upper(i1)} {HELLO WORLD}
@@ -72,6 +76,9 @@ test_expr icu-2.6 {i1=$::OGRAVE} {upper(i1)} $::OGRAVE
 test_expr icu-2.7 {i1=$::szlig} {upper(i1)}  "SS"
 test_expr icu-2.8 {i1='SS'} {lower(i1)}  "ss"

+test_expr icu-2.9 {i1=$::ffi_ffl} {upper(i1)}  "FFIFFL"
+test_expr icu-2.10 {i1=$::ffi_ffl} {lower(i1)}  $::ffi_ffl
+
 # In turkish (locale="tr_TR"), the lower case version of I
 # is "small dotless i" (code point 0x131 (decimal 305)).
 #
-- 
2.7.0


[sqlite] SQLite tuning advice for caching scenario

2016-02-17 Thread Scott Hess
Just FYI, FTS writes each transaction's index data in a segment, then does
segment merges over time.  So there's some advantage to bulk updates versus
one-at-a-time updates in terms of index fragmentation and write overhead.
Having an in-memory FTS table which you spill to the on-disk table(s) as a
big transaction can work pretty well.

Note that big transactions can have their own issues, for instance if you
end up having to spill the page cache.

[Obviously, you'll want to test this for your case.]
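
A sketch of that shape (schema names invented, fts4 shown):

ATTACH ':memory:' AS mem;
CREATE VIRTUAL TABLE mem.cache USING fts4(content);
-- ...many small INSERTs into mem.cache as data arrives...
BEGIN;
INSERT INTO docs(content) SELECT content FROM mem.cache;
DELETE FROM mem.cache;
COMMIT;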

-scott


On Wed, Feb 17, 2016 at 8:20 AM, Dave Baggett  wrote:

> That's a great suggestion. One issue, though is that I'd have to run two
> FTS searches to search -- one on the disk-based database, and one on the
> memory-based one. I also already have the database split into 8 .dat files
> for scaling purposes. :)
>
> But this may be workable -- thanks. (BTW, I am using SQLite via apsw --
> thanks for that too!)
>
> Dave
>
> Sent with inky
>
> "Roger Binns"  wrote:
>
> On 17/02/16 06:37, Dave Baggett wrote:
> > I'd welcome any suggestions
>
> How about two databases?  Create an in memory database for the cache.
> Then whenever it hits a certain size (eg 64MB) or time passed (eg 5
> minutes), copy/move data from the memory database to the persistent
> (disk) one.  This ensures the writes to the disk database are in big
> chunks.
>
> Roger
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Use of __builtin_expect in SQLite

2016-02-08 Thread Scott Hess
On Sun, Feb 7, 2016 at 10:39 PM, Matthias-Christian Ott 
wrote:

> On 2016-02-08 04:31, Roger Binns wrote:
> > On 07/02/16 00:56, Dominique Pell? wrote:
> >> I'm curious about the outcome on SQLite benchmarks.
> >
> > About a year ago I tried them out on some tight code (non-SQLite) that
> > absolutely had to use less CPU time.  I couldn't get them to make any
> > difference outside the bounds of measurement error.  Since SQLite has
> > lots of "tight code" places, the instrumentation would have to help in
> > most of them to make a difference (Amdahl's law).
>
> Amdahl's law is not applicable here and describes a completely different
> problem. SQLite does not involve concurrency.
>

"Amdahl's law _In computer architecture, Amdahl's law (or Amdahl's
argument[1]) gives the theoretical speedup in latency of the execution of a
task at fixed workload that can be expected of a system whose resources are
improved."

You _can_ use it to analyze why parallelizing an algorithm across N CPUs
won't cause the runtime to become 1/N, but that's not the only use.
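
(For reference, the usual statement: if a fraction p of the runtime is
improved by a factor s, the overall speedup is 1 / ((1 - p) + p/s), so
the untouched fraction (1 - p) bounds what any local improvement can
buy.)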

> In typical scenarios SQLite is also limited by the number of IOPS, and
> optimizations to improve branch predictability have no measurable effect
> and you just waste time with them. Just use PCIe SSDs that are able to
> saturate the bus and all of your performance problems with SQLite are
> gone if you use SQLite on a typical computer.
>

In fact, your argument here is Amdahl's Law.  If your performance is
primarily limited by I/O latency, then improving CPU efficiency won't help
much.

-scott


[sqlite] Find SQLITE_BUSY reason?

2016-01-21 Thread Scott Hess
On Thu, Jan 21, 2016 at 4:25 AM, Daniel Polski 
wrote:

> On 2016-01-21 at 11:30, Simon Slavin wrote:
>
>> On 21 Jan 2016, at 9:44am, Daniel Polski  wrote:
>>
>>> The Webserver/PHP can process up to 16 requests simultanuously and will
>>> share one database connection among all instances.
>>> The process for each request is:
>>>
>>> * Use PHP's PDO extension to open a persistent (shared among all
>>> instances) connection to the database
>>>   -> In the background PHP will only open the database once and then use
>>> this connection forever
>>>
>> You have a persistent PHP process on the server ?  How are you limiting
>> this to 16 simultaneous connections ?  I'm not saying any of your procedure
>> is wrong, I'm just curious about how it is done under (presumably) Apache.
>>
>
> PHP is running in fastcgi mode, in combination with a lightttp server.
> This lighttp server always starts 16 instances of the PHP daemon - once a
> request arrives it passes a request to one of the PHP processes.


I don't know what this means in real life, but what it sounds like is you
believe you're opening a single database connection then forking that into
the 16 children.  SQLite is in-process, so that doesn't work like it might
for something like MySQL - you can't have a connection "shared among all
instances" if the instances are in separate processes, you'd need some sort
of SQLite server proxy to share in that way.

So I think you actually do have multiple connections to the underlying
SQLite database file, and those multiple connections can cause locking
issues between themselves.  From what I can recall the most common reason
I've seen causing SQLITE_BUSY is when someone leaves dangling prepared
statements around.  If the prepared statement is stepped to SQLITE_DONE, it
should be fine, but if the most recent sqlite3_step() returned SQLITE_ROW
and the client code decided to early exit, then it needs to call
sqlite3_reset() or sqlite3_finalize().

Hmm.  That sounded ambiguous, and I often find it's a subtle issue to
explain.  Specifically, what I mean is that if you have /tmp/db.db
something like:
   CREATE TABLE t (c INTEGER);
   INSERT INTO "t" VALUES(1);
   INSERT INTO "t" VALUES(2);

Then if you write code something like:
#include <assert.h>
#include <stdio.h>
#include "sqlite3.h"

int main(void) {
  const char* db_name = "/tmp/db.db";
  sqlite3* db;
  sqlite3_stmt* s;
  int i;

  assert(SQLITE_OK == sqlite3_open(db_name, &db));
  assert(SQLITE_OK == sqlite3_exec(db, "BEGIN", NULL, NULL, NULL));
  assert(SQLITE_OK == sqlite3_exec(db, "INSERT INTO t VALUES (3)", NULL,
NULL, NULL));
  assert(SQLITE_OK == sqlite3_prepare(db, "SELECT c FROM t", -1, &s, NULL));
  assert(SQLITE_ROW == sqlite3_step(s));
  i = sqlite3_column_int(s, 0);
  /* sqlite3_reset(s); */
  assert(SQLITE_OK == sqlite3_exec(db, "COMMIT", NULL, NULL, NULL));

  /* Sleep forever. */
  printf("Successful, going to sleep.\n");
  while (1) {
sqlite3_sleep(1000 * 1000);
  }

  return 0;
}

Then if you compile this and run it in one window, you'll get "Successful,
going to sleep.", then if you run it in another window, you'll get:
Assertion failed: (SQLITE_OK == sqlite3_exec(db, "COMMIT", NULL, NULL,
NULL)), function main, file /tmp/ex.c, line 19.

If you remove the comments around sqlite3_reset() and try again, the second
run will also get to the "Successful..." output.  Or it could be
sqlite3_finalize(), though often this comes up because you have a
prepared-statement cache, which is why I phrased it as sqlite3_reset() in
the example.

THIS MAY NOT BE WHAT YOU ARE SEEING!  But this kind of issue is the first
place I'd look.

-scott


[sqlite] Wish list: allow developers use the power of sqliteparser

2016-01-18 Thread Scott Hess
On Mon, Jan 18, 2016 at 10:27 PM, David Barrett 
wrote:

> One use of this I would like is to create a security framework around
> arbitrary SQL queries from the user.  So, for example, I'd love to
> determine which tables (and which columns of those tables) a particular
> query is going to access, and then compare that list against a whitelist of
> columns the user is authorized to access.


You shouldn't use the parser interface for access control, you should use
the authorizer interface.
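
A minimal sketch (the whitelist lookup is hypothetical):

static int xAuth(void *pArg, int op, const char *z1, const char *z2,
                 const char *zDb, const char *zTrigger){
  if( op==SQLITE_READ ){
    /* For SQLITE_READ, z1 is the table name and z2 the column name. */
    if( !userMayRead(pArg, z1, z2) ) return SQLITE_IGNORE; /* read NULL */
  }
  return SQLITE_OK;   /* or SQLITE_DENY to fail the statement outright */
}

/* sqlite3_set_authorizer(db, xAuth, pUserContext); */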

-scott


[sqlite] whish list for 2016

2016-01-13 Thread Scott Hess
On Wed, Jan 13, 2016 at 12:42 AM, Jean-Christophe Deschamps <
jcd at antichoc.net> wrote:

> At 08:28 13/01/2016, you wrote:
>
>> On Wed, Jan 13, 2016 at 2:39 AM, Simon Slavin 
>> wrote:
>> > On 12 Jan 2016, at 11:56pm, Scott Hess  wrote:
>> > > If I am writing a client that can read SQLite databases, then I
>> probably
>> > > don't want your database to be injecting a bunch of arbitrary PRAGMA
>> > calls
>> > > into my client.
>> >
>> > It is, after all, the equivalent of an autoexecute macro.  And we all
>> know
>> > how good an idea that was.
>> >
>> > Okay, folks, I'm with Scott.  If you want SQLite to execute a bunch of
>> > commands when the database is opened write your own replacement for
>> _open_
>> > which opens the file then reads commands from a table and executes them.
>> >
>>
>> You have in my opinion taken this out of context, and are assuming the
>> important part is the application, and not the data (i.e. database file).
>>
>> The original idea was simply to associate with the database some essential
>> settings, which should have been such in the first place and aren't only
>> because of legacy reasons, and not arbitrary macros. Keith just got
>> carried
>> away :) --DD
>>
>
> Exactly! I see a clear difference between settings which are *-required-*
> by the schema and those which are only a convenience or comfort for the
> application(s).
>
> The former type includes what should actually be part of the schema in an
> ideal world:
>   foreign keys, recursive triggers, loading of [auto]extensions set
> ON/OFF, ...;
>   function extensions and collations, ICU, ...
>
> If some index uses a custom collation which isn't loaded, then queries
> will give erroneous results and not perform DB changes as intended.
>

Many of these items are things which require that your version of SQLite be
compiled with the right settings and/or additional code, so they definitely
_are_ the purview of the client, not the data.  The data may be corrupted
by trying to use it in the wrong context, sure, but the client may be
rendered insecure or buggy by loading inappropriate binary code.  I think
best case is for the database to have a catalog of what it needs, and the
client to call an API function to verify that it is constructed in a way to
meet the catalog's needs.  Perhaps the intersection could have allowances
for missing requirements, for instance read-only access can work without
foreign-key support.

If a schema change requires for instance recursive triggers ON, then maybe
> 5, 10 or more applications will need maintainance and redistribution.
> Forgetting update of a single app may silently spoil DB integrity.
>

This issue would seem to apply if you make a change requiring a new SQLite
version, or different SQLite compile-time settings.  Or if you find a bug
with your queries or schema.  These are really distinct areas of
responsibility, so rather than partial piecemeal solutions at the SQLite
level, you'd probably be better off finding a distribution strategy which
makes such updates reasonable, or an architecture which isolates the
database access into a single component.

-scott


[sqlite] whish list for 2016

2016-01-12 Thread Scott Hess
On Tue, Jan 12, 2016 at 3:43 PM, Keith Medcalf  wrote:

> On Tuesday, 12 January, 2016 13:51, James K. Lowden <
> jklowden at schemamania.org> said:
> > On Fri, 8 Jan 2016 08:28:29 +0100
> > Dominique Devienne  wrote:
> > > > One way to do that would be to honor a special user-created table,
> > > > say "PRAGMAS", with name-value pairs that are automatically applied
> > > > when the database is opened.
> > > Interesting idea. A "special"-to-SQLite sqlite_pragmas table, similar
> > > to sqlite_master.
> > Similar in name, yes.  sqlite_master is special in that it's built into
> > the database file structure, though.  You couldn't add another such
> > table without changing the file structure, something OFL understandably
> > doesn't want to do.
>
> > I'm suggesting that the presence of a user-created
> > "configuration table" would govern the behavior of the database,
> > analogous to a configuration file.
>
> I would suppose that one could implement this using an SQLITE_EXTRA_INIT
> linked function (a symbol attached to SQLITE_EXTRA_INIT is called at the
> end of the sqlite3_initialize routine to perform extra initialization).
> Pointing this to a function which calls a bunch of sqlite3_auto_extension
> to add internally compiled extension init routines to the initialization of
> each connection thus making such extensions available to all connections by
> auto-magic.
>
> I suppose one could write some code which looks for the desired table and,
> if found, reads and executes the statements contained therein.
>
> The entry point of this routine can be added into the auto_extension list
> (even though it is not an extension) through the EXTRA_INIT hook.
>
> From my reading of the code, a pointer in the auto_extension list does not
> *have* to actually be an extension -- it is just code that is called when a
> connection object is created that receives the connection object as an
> argument.  If it *is* an extension init routine, it just happens to make
> some calls to add functions/aggregates/modules/etc., but I see nothing to
> indicate that this is actually required.
>
> The only thing you will NOT be able to do this way is anything that needs
> to be set *before* initialization.  Running a set of pragma's on each
> connection should work just peachy though, I would expect.


If you are writing a client that uses SQLite databases, then you probably
have a wrapper for sqlite3_open*(), and you could just put the code to
query and run the pragmas in there.
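
Something along these lines (sketch; the pragma list is just an example):

int my_open(const char *zPath, sqlite3 **pDb){
  int rc = sqlite3_open(zPath, pDb);
  if( rc==SQLITE_OK ){
    rc = sqlite3_exec(*pDb,
        "PRAGMA foreign_keys=ON; PRAGMA journal_mode=WAL;", 0, 0, 0);
    if( rc!=SQLITE_OK ){
      sqlite3_close(*pDb);
      *pDb = 0;
    }
  }
  return rc;
}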

If I am writing a client that can read SQLite databases, then I probably
don't want your database to be injecting a bunch of arbitrary PRAGMA calls
into my client.

To be honest, I'm kind of pressed to imagine the set of pragmas which make
sense for the database to select, rather than the client, since most of
them affect client operation.  I'd be very annoyed if my client has strong
memory constraints and the database forced some huge cache_size setting.  Or
if my client was written assuming temp_store = FILE and the database forced
temp_store = MEMORY.  Having the database run synchronous = OFF or
journal_mode = OFF would be pretty fun!  The only ones which seem more like
they could be part of the schema itself are foreign_keys and secure_delete,
so maybe it would make more sense to handle them differently in the first
place, more like page_size or encoding where once set they're sticky.

-scott


[sqlite] hard links and SQLite

2016-01-11 Thread Scott Hess
On Mon, Jan 11, 2016 at 9:12 PM, Felipe Gasper 
wrote:

> On 11 Jan 2016 9:06 PM, Rowan Worth wrote:
>
>> * if it returns SQLITE_OK and zero rows, the schema hasn't been created
>> yet
>>
>
> Sure; however, by the time you do the next action it's possible that
> something else will be creating the schema. This is the condition that I'm
> trying to avoid.


If something else created the schema between check and use, what's the
problem?

You can avoid this by doing:
 - query sqlite_master.
 - if the query returns rows, the schema exists, exit; else
 - open a transaction.
 - query sqlite_master.
 - if the query returns rows, the schema exists, end transaction and exit;
else
 - create the schema.
 - commit transaction.

Or, just use CREATE IF NOT EXISTS style when creating the schema, and you
will successfully not create it the second time.  Or just pay attention to
your error codes in the create code.
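
In SQL terms the transactional check is roughly (schema invented):

BEGIN IMMEDIATE;
SELECT count(*) FROM sqlite_master;
-- if that returned 0, create the schema:
CREATE TABLE IF NOT EXISTS t(id INTEGER PRIMARY KEY, v TEXT);
COMMIT;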

-scott


[sqlite] hard links and SQLite

2016-01-11 Thread Scott Hess
On Mon, Jan 11, 2016 at 11:00 AM, Felipe Gasper 
wrote:

> On 11 Jan 2016 1:45 PM, Scott Hess wrote:
>
>> As far as preventing the other process from using it before the schema
>> exists, do "SELECT count(*) FROM sqlite_master", and if the result is 0,
>> the schema does not exist.  If you create the schema as a transaction,
>> that
>> will be atomic.
>>
>
> But in order for that SELECT to avert TOCTTOU errors, we'd have to do
> BEGIN EXCLUSIVE LOCK at the beginning of every single DB handle creation.
> That would seem to get expensive?


The solution really depends on what you're doing.  You only need to verify
that the schema exists once, if you check and find that it doesn't exist,
you shouldn't be using it, if you check and find that it does exist, then
it exists until you take action to change that situation.

Presumably right now you have some sort of startup loop which looks like
"Does the database exist?  If not, sleep and check again."  I'd just change
that to "Does the schema exist?  If not, sleep and check again."

Note that the sqlite_master table is held in the same page as the SQLite
header, unless it grows large enough to split into multiple pages.  Also
note that any use of the database tables will require reading the schema
page.  So querying the sqlite_master table at startup will not add any
additional costs.  Another option would be to use PRAGMA user_version, to
signal from the schema-creation process to the schema-using process.  I
don't think it's really any better than querying the schema, though.

Another option would be to just prepare a statement and if it fails with
SQLITE_ERROR, that's probably because the relevant schema doesn't exist.
So sleep and try again later.

-scott


[sqlite] hard links and SQLite

2016-01-11 Thread Scott Hess
Since this doesn't provide a -journal file, certain kinds of crashes cannot
be recovered correctly.

Why you you hard-link before the commit?  The schema doesn't exist until
the commit is successful, so there's no advantage to anyone else reading
the file before then.

As far as preventing the other process from using it before the schema
exists, do "SELECT count(*) FROM sqlite_master", and if the result is 0,
the schema does not exist.  If you create the schema as a transaction, that
will be atomic.

-scott


On Mon, Jan 11, 2016 at 10:25 AM, Felipe Gasper 
wrote:

> Hi all,
>
> To prevent race conditions where a 2nd process accesses a
> newly-created SQLite file before the creator process can set up the schema,
> I've implemented logic like this:
>
> - create SQLite file as a .tmp beside permanent location
>
> - BEGIN EXCLUSIVE LOCK
>
> - hard-link the temp file to the permanent location
>
> - create schema
>
> - COMMIT
>
> - unlink temp file
>
> This prevents the problem because the permanent file never exists
> unlocked prior to the schema existence. The 2nd process would either fail
> to open the DB because it's locked or open it and see the schema.
>
> I'm seeing some corruption in our SQLite files now but have had a
> hard time tracking down the cause. Could it be because of the above logic?
> What have folks here done to avoid the race condition originally described?
> I've seen some documentation of SQLite and hard links but
>
> -FG
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Some FTS5 guidance

2016-01-08 Thread Scott Hess
With fts4 you could search for matching terms in an fts4aux table, then use
those to construct a query against the original table.  You'd have a full
scan of the fts index, but you'd not have to do a full table scan of the
primary data.  Unfortunately if there were a large number of hits in the
index scan, then it would be cheaper to just do the full table scan and
skip the index scan.  I don't know if there's a similar thing for fts5 at
this time.
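
In fts4 terms the shape was roughly (names invented):

CREATE VIRTUAL TABLE docs USING fts4(content);
CREATE VIRTUAL TABLE docs_terms USING fts4aux(docs);
-- find indexed terms containing the substring...
SELECT term FROM docs_terms WHERE col='*' AND term LIKE '%ight%';
-- ...then OR the surviving terms into a MATCH query against docs.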

This wouldn't be as efficient as something more suited to substring matches
(an N-gram index, maybe?), but I haven't heard anyone talking about writing
a virtual table to do that.

-scott


On Fri, Jan 8, 2016 at 11:54 AM, Charles Leifer  wrote:

> You can create a custom tokenizer as well then use the standard search
> APIs. I imagine that functionality would work well in this case:
> https://sqlite.org/fts5.html#section_7
>
> On Thu, Jan 7, 2016 at 3:59 PM, Stadin, Benjamin <
> Benjamin.Stadin at heidelberg-mobil.com> wrote:
>
> > One such algorithm would be a (generalized) Ukkonnen suffix tree (
> > https://en.m.wikipedia.org/wiki/Ukkonen%27s_algorithm).
> > It allows you to search efficiently for substrings.
> > It would be possible to do some match weigthing based on match distance
> > within words. But a general solution for a database is probably not
> trivial
> > to implement.
> >
> > Ben
> >
> > Von meinem iPad gesendet
> >
> > > Am 07.01.2016 um 21:46 schrieb Matthias-Christian Ott :
> > >
> > >> On 2016-01-07 19:31, Mario M. Westphal wrote:
> > >> I hence wonder if this problem has been tackled already and if there
> is
> > a
> > >> "standard" solution.
> > >
> > > If I understand you correctly, it seems that you are looking for a
> > > compound splitting or decompounding algorithm. Unfortunately there is
> > > not a "standard solution" for this. There are many languages in the
> > > world and for some usable compound splitting algorithms exist. There
> are
> > > also attempts to create statistical universal algorithms.
> > >
> > > As you said, for English a simple sub-string search might suffice but
> > > for other languages it more complex. I assume that you speak German. If
> > > you have a document that contains the term "Verkehrsleitsystem" and
> your
> > > search query is "Verkehr leiten", it's reasonable to assume that the
> > > document is relevant to the search query. Unfortunately a sub-string
> > > search could not find the document. Other languages are even more
> > > difficult (a textbook on linguistics will explain this better than I
> > can).
> > >
> > > Even if you have such algorithm, it's not trivial to score the results
> > > and there are more aspects to consider to create a simple search
> > > algorithm. For example, in English you will also have to do some
> > > analysis of the phrase structure to identify open compounds.
> > >
> > > Perhaps it helps to mention the languages you are interested in and the
> > > application you have in mind to evaluate whether the SQLite FTS5 could
> > > meet your requirements.
> > > ___
> > > sqlite-users mailing list
> > > sqlite-users at mailinglists.sqlite.org
> > > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> > ___
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] How to make sqlite3_release_memory produce a result?

2016-01-07 Thread Scott Hess
Hmm.  I see that sqlite3PcacheReleaseMemory() is a no-op
if sqlite3GlobalConfig.nPage is non-zero, and that happens
when SQLITE_DEFAULT_PCACHE_INITSZ is not zero.  If I compile with
-DSQLITE_DEFAULT_PCACHE_INITSZ=0 , then I see the expected pages freed.
AFAICT, it doesn't matter if you make use of more than
SQLITE_DEFAULT_PCACHE_INITSZ pages, it still doesn't free the excess.

Looks like that was added in 3.8.11 back in July.
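
(For an amalgamation build that is along the lines of:

cc -DSQLITE_ENABLE_MEMORY_MANAGEMENT -DSQLITE_DEFAULT_PCACHE_INITSZ=0 \
   -c sqlite3.c

plus whatever flags you already use.)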

-scott


On Thu, Jan 7, 2016 at 5:44 AM, Bart Smissaert 
wrote:

> These are the compile flags of my sqlite3.dll:
>
> compile_option
> 
> DEBUG
> ENABLE_COLUMN_METADATA
> ENABLE_MEMORY_MANAGEMENT
> MEMDEBUG
> OMIT_LOOKASIDE
> THREADSAFE=0
>
> And these are the set pragma's:
>
> PRAGMA cache_size 32768
> PRAGMA default_cache_size 32768
> PRAGMA page_count 712711
> PRAGMA max_page_count 1073741823
> PRAGMA page_size 1024
> PRAGMA journal_size_limit -1
> PRAGMA locking_mode normal
> PRAGMA automatic_index 1
> PRAGMA encoding UTF-8
> PRAGMA ignore_check_constraints 0
> PRAGMA read_uncommitted 0
> PRAGMA recursive_triggers 0
> PRAGMA reverse_unordered_selects 0
> PRAGMA secure_delete 0
> PRAGMA wal_autocheckpoint 1000
> PRAGMA writable_schema 0
> PRAGMA journal_mode off
> PRAGMA auto_vacuum NONE
> PRAGMA synchronous OFF
> PRAGMA temp_store DEFAULT
>
>
> RBS
>
> On Thu, Jan 7, 2016 at 1:39 PM, Bart Smissaert 
> wrote:
>
> > > So you should
> > see results if you start a transaction, do a few update statements,
> commit
> > the transaction, the call sqlite3_release_memory(db).
> >
> > I tried this with a large table, first with no transaction then with a
> > transaction
> > and tried the sqlite3_release_memory directly after the sqlite3_finalize,
> > but in both cases result still zero:
> >
> > sqlite3_memory_used: 37190712
> > sqlite3_release_memory:0
> > sqlite3_memory_used: 37190712
> >
> > This was a single update, where I don't think a transaction is helpful.
> > Still no idea how I can make sqlite3_release_memory produce non-zero.
> >
> > RBS
> >
> >
> >
> > On Thu, Jan 7, 2016 at 6:05 AM, Scott Hess  wrote:
> >
> >> On Wed, Jan 6, 2016 at 3:03 PM, Bart Smissaert <
> bart.smissaert at gmail.com>
> >> wrote:
> >>
> >> > Have compiled sqlite3.dll (latest) compiled with
> >> ENABLE_MEMORY_MANAGEMENT,
> >> > but sofar
> >> > not been able yet to make sqlite3_release_memory produce anything else
> >> than
> >> > 0.
> >> > What would be the simplest way to make this happen?
> >> > I don't want to do this with C coding, so it should be some SQL
> >> scenario or
> >> > to do with simple
> >> > SQLite functions such as sqlite3_step, _prepare, -finalize etc.
> >>
> >>
> >> Last time I was paying attention to this, I believe that I found that
> the
> >> biggest effect was to free unpinned pages from the page cache.  So it
> >> might
> >> not free pages if you're in a transaction, for instance.  I would guess
> >> that if you had memory-mapped mode on and are doing only reads, there
> >> would
> >> be no pages to free (mmap pages aren't in the page cache).  So you
> should
> >> see results if you start a transaction, do a few update statements,
> commit
> >> the transaction, the call sqlite3_release_memory(db).
> >>
> >> -scott
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users at mailinglists.sqlite.org
> >> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >>
> >
> >
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] How to make sqlite3_release_memory produce a result?

2016-01-07 Thread Scott Hess
In that case - oops.  I definitely was experimenting with the db-centric
version.  It looks to me like the lowest-level mechanism is similar between
the two, but sqlite3_db_release_memory() isn't optionally compiled.

-scott


On Thu, Jan 7, 2016 at 12:32 AM, Bart Smissaert 
wrote:

> OK, will try that.
> sqlite3_release_memory doesn't have the DB connection as an argument, but
> found
> sqlite3_db_release_memory and that has that as an argument and that may
> work better.
>
> RBS
>
> On Thu, Jan 7, 2016 at 6:05 AM, Scott Hess  wrote:
>
> > On Wed, Jan 6, 2016 at 3:03 PM, Bart Smissaert  >
> > wrote:
> >
> > > Have compiled sqlite3.dll (latest) compiled with
> > ENABLE_MEMORY_MANAGEMENT,
> > > but sofar
> > > not been able yet to make sqlite3_release_memory produce anything else
> > than
> > > 0.
> > > What would be the simplest way to make this happen?
> > > I don't want to do this with C coding, so it should be some SQL
> scenario
> > or
> > > to do with simple
> > > SQLite functions such as sqlite3_step, _prepare, -finalize etc.
> >
> >
> > Last time I was paying attention to this, I believe that I found that the
> > biggest effect was to free unpinned pages from the page cache.  So it
> might
> > not free pages if you're in a transaction, for instance.  I would guess
> > that if you had memory-mapped mode on and are doing only reads, there
> would
> > be no pages to free (mmap pages aren't in the page cache).  So you should
> > see results if you start a transaction, do a few update statements,
> commit
> > the transaction, the call sqlite3_release_memory(db).
> >
> > -scott
> > ___
> > sqlite-users mailing list
> > sqlite-users at mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] How to make sqlite3_release_memory produce a result?

2016-01-06 Thread Scott Hess
On Wed, Jan 6, 2016 at 3:03 PM, Bart Smissaert 
wrote:

> Have compiled sqlite3.dll (latest) compiled with ENABLE_MEMORY_MANAGEMENT,
> but sofar
> not been able yet to make sqlite3_release_memory produce anything else than
> 0.
> What would be the simplest way to make this happen?
> I don't want to do this with C coding, so it should be some SQL scenario or
> to do with simple
> SQLite functions such as sqlite3_step, _prepare, -finalize etc.


Last time I was paying attention to this, I believe that I found that the
biggest effect was to free unpinned pages from the page cache.  So it might
not free pages if you're in a transaction, for instance.  I would guess
that if you had memory-mapped mode on and are doing only reads, there would
be no pages to free (mmap pages aren't in the page cache).  So you should
see results if you start a transaction, do a few update statements, commit
the transaction, the call sqlite3_release_memory(db).

-scott


[sqlite] whish list for 2016

2016-01-05 Thread Scott Hess
Maybe one option would be to add a layer to affect that explicitly, so that
instead of the problem being that the existing rows can't be reordered
without re-writing the entire table, the problem is to just change the
schema to indicate where the columns should appear in "SELECT *" statements.

Basically, you might have:
  CREATE TABLE x (id INTEGER PRIMARY KEY, v TEXT);
then say:
  ALTER TABLE x ADD COLUMN k TEXT UNIQUE AFTER id;
which converts the table to:
  CREATE TABLE x (id INTEGER PRIMARY KEY, v TEXT, k TEXT UNIQUE AFTER id);

This could be extended to delete columns by instead annotating the schema
to indicate that a column is hidden or ignored.  Perhaps the annotations
could be constrained to only the end of the statement (so that series of
placements and deletions can be comprehended).  All that would be left
would be renaming a column, which is already plausible (just rename the
column in the schema).

You could already implement this by renaming the table and replacing it
with a view plus a trigger.  But specifying that would be somewhat involved
and brittle.
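
For the "x" example above, the shape of that workaround would be
something like (INSERT shown; UPDATE/DELETE triggers follow the same
pattern):

ALTER TABLE x RENAME TO x_data;
CREATE VIEW x AS SELECT id, k, v FROM x_data;
CREATE TRIGGER x_insert INSTEAD OF INSERT ON x BEGIN
  INSERT INTO x_data(id, k, v) VALUES (NEW.id, NEW.k, NEW.v);
END;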

WRT handling things like view and trigger and foreign key references, I
think that would be reasonable to just expose manually.   Replace the views
and triggers in the same transaction.  Provide a means to replace column
definitions so that foreign key references can be handled in a similar
fashion.

[Personally my position is that "SELECT * FROM x" is not appropriate for
production code.  Otherwise you're implicitly relying on implementation
details.]

-scott


On Tue, Jan 5, 2016 at 5:23 AM, gunnar  wrote:

> Indeed those two possibilities! And perhaps also to MODIFY a column. We
> alter tables all the time :)
>
> The possibility to specify at which place an added column should appear
> when I do a "SELECT * ..." would be a nice extra, in MySQL it is done like:
> ALTER TABLE table_name ADD COLUMN col_name1 [_AFTER_ col_name2 | FIRST]
>
>
>
>
> Gr.,
> Gunnar
>
>
>
>
> On 12/24/2015 05:14 PM, Simon Slavin wrote:
>
>> On 24 Dec 2015, at 3:12pm, John McKown 
>> wrote:
>>
>> I'm curious as to why. Doing so would, most likely, require rewriting the
>>> entire table. If you want a SELECT * to get the columns in a particular
>>> order, just create a VIEW with the columns in the order in which you want
>>> them.
>>>
>> I think Gunnar just wants forms such as
>>
>> ALTER TABLE CREATE COLUMN ...
>> ALTER TABLE DROP COLUMN ...
>>
>> to aid with making small changes to the schema.  These are available in
>> most SQL engines but the way SQLite3 is written makes it difficult or
>> inefficient to implement them.
>>
>> I have a database where one table takes up more than 30 Gigabytes of
>> space.  While developing the software I needed several times to change a
>> column definition and since SQLite lacks these facilities I had to move 30
>> Gig of data around every time I did it.  Annoying.  But it's not normally
>> that much of a problem for me.
>>
>> Simon.


[sqlite] TEXT columns with Excel/VBA

2015-12-03 Thread Scott Hess
On Thu, Dec 3, 2015 at 4:49 AM, R Smith  wrote:

> On 2015/12/03 3:04 AM, Scott Hess wrote:
>
>> I discourage this kind of usage because it means that in some distant
>> future when someone has to make things work with a different database
>> engine, they have to grind through and check every weirdo VARCHAR(73) and
>> MEDIUMBIGINT declaration someone put in, because none of them have ever
>> been tested with range enforcement enabled.  So where someone meant
>> "VARCHAR(256)" but actually typed "VARCHAR(6)", will your code suddenly
>> start throwing clean errors which immediately suggest where to look, or
>> will it just muddle through corrupting your data?  There can certainly be
>> subtle issues in any type, but my experience is that when you're trying to
>> decode someone's code, it's easiest when the code says what is actually
>> happening, not what it wishes were happening!
>>
>
> I posit that a column declared as:
> col VARCHAR(32) NOT NULL
> says a whole lot more about what is actually happening than:
> col TEXT NOT NULL
>

I'm saying that:
   col TEXT NOT NULL
fairly describes what is actually happening, but:
   col VARCHAR(32) NOT NULL
implies that things are happening which are not happening.  CHAR is kind of
happening, in that it is character data, VAR is not happening in that it
can store more data than a 1-byte prefix can describe, and 32 is definitely
not happening.

And sure, I agree a mistyped word can be hard to decode by a future
> programmer, but that is a statistical probability in any case.
>

For most engines if you type VRCHAR(32) you'll get an error.  In SQLite
you'll get a field with TEXT affinity.  If you typed VRCHR(32) SQLite will
give you a field with NUMERIC affinity, which will work just fine for the most
part.  If you develop code with VARCHAR(6) on another database, you'll
notice that your new code isn't storing your 60-character strings pretty
quickly, but if it's three years later and you're porting, you might _not_
notice the problem unless you have a good test suite in place.
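
To make that concrete, SQLite will happily store a value far longer than
the declared length, where most other engines would throw an error:

   sqlite> CREATE TABLE t (c VARCHAR(6));
   sqlite> INSERT INTO t VALUES ('this string is far longer than six characters');
   sqlite> SELECT length(c) FROM t;
   45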

[I'd _love_ something like SQLITE_ENABLE_PEDANTIC and "PRAGMA pedantic =
true" to provide an extra layer of checking on these things.  When you find
that someone made a mistake in shipping code, you have to decide whether to
risk fixing it, or just to leave it be, and I'd rather have stronger
assertions about this kind of thing up front.]

Of course, if you are _currently_ writing cross-platform code, absolutely
>> write the cross-platform code!  And I will agree that the above position
>> has some issues when faced with things such as INT being 32 bits//
>> etc.
>>
>
> I *ALWAYS* write cross-platform code as far as SQL is concerned. I even
> think in this day and age every programmer should, or is there a case for
> the opposite?


My experience has always been that cross-platform code written to run on a
single platform turns out to not be very cross-platform at all.  Changing
TEXT to VARCHAR as appropriate is the kind of thing which is generally
pretty easy to do, the hard parts will be the things that nobody even
realized had platform issues, like INT storing 32 bits rather than 64 bits,
or subtle differences in treatment of NULL values.

-scott


[sqlite] TEXT columns with Excel/VBA

2015-12-02 Thread Scott Hess
On Wed, Dec 2, 2015 at 4:29 PM, R Smith  wrote:
>
> Personally I use VARCHAR(Len) in table column definitions - simply because
> my schema is then directly interchangeable with MySQL/PostGres and the
> SQLite query planner sometimes notes that length when considering data
> shape - but for data purposes, SQLite doesn't care and neither do any
> wrappers I know of. Also, I can't imagine Excel would have an issue,
> whether referencing a data object via VB Script or OLE DB table import etc,
> I have never seen it matter in Excels 2003 through 2013.


I discourage this kind of usage because it means that in some distant
future when someone has to make things work with a different database
engine, they have to grind through and check every weirdo VARCHAR(73) and
MEDIUMBIGINT declaration someone put in, because none of them have ever
been tested with range enforcement enabled.  So where someone meant
"VARCHAR(256)" but actually typed "VARCHAR(6)", will your code suddenly
start throwing clean errors which immediately suggest where to look, or
will it just muddle through corrupting your data?  There can certainly be
subtle issues in any type, but my experience is that when you're trying to
decode someone's code, it's easiest when the code says what is actually
happening, not what it wishes were happening!

Of course, if you are _currently_ writing cross-platform code, absolutely
write the cross-platform code!  And I will agree that the above position
has some issues when faced with things such as INT being 32 bits on some
platforms, which I guess would argue for using BIGINT and BIGTEXT.  Unless
you need VARCHAR(MAX) or LONGTEXT, but maybe MEDIUMTEXT is more portable
... and pretty soon you give up and just circle back to not decorating with
unused/unchecked type annotations.

-scott


[sqlite] How hard is it to add the constraint name to the 'FOREIGN KEY constraint failed' message?

2015-11-18 Thread Scott Hess
On Wed, Nov 18, 2015 at 3:22 PM, Yuri  wrote:

> On 11/18/2015 09:55, R Smith wrote:
>
>> There is no "first" constraint that can fail. There is a procession of
>> constraints either within a statement or within a transaction (both can
>> have many constraints) and as they are required, they are counted up, and
>> as they become resolved they are counted down. At the end the result is
>> simply either that "some constraint failed" or "All resolved eventually".
>>
>
> Why not have two variants of the error message: one for immediate failure
> with the foreign key name, and one like now, for the complicated case of
> delayed constraints?


What do you plan to use the error message for?  There is generally no
interface contract with error messages, so you can't generally depend on
them being readable by code.  They are helpful to the developer, but
usually they just save you a few minutes figuring it out yourself.  I'm not
hating on your few minutes, but saving a few milliseconds or bytes at
runtime on a few million devices is worth a periodic few minutes of my
time.  You can use SQL to generate pretty complicated schema and queries,
so in the limit this kind of problem can be pretty deep to resolve!

Maybe it would make sense to have some sort of
SQLITE_WITH_EXTENDED_FK_ERRORS type define so that you can get a better
edit/compile/run loop going in development, without being expensive in
production.

-scott


[sqlite] Retrieving the table info fails

2015-11-16 Thread Scott Hess
On Mon, Nov 16, 2015 at 11:20 AM, R Smith  wrote:

> On 2015/11/16 7:59 PM, Igor Korot wrote:
>>
>> BTW, are only name, type and pk fields are guaranteed to have a value?
>>
>
> Nothing is guaranteed to have a value unless created with NOT NULL in the
> field specification in the CREATE TABLE schema SQL.
>
> Usually (in most DBs) a Primary key is guaranteed to not have NULL values,
> but in SQLite it is permissible if the PK is not specifically stated to be
> NOT NULL.


I'll go one further - since these are the results from a PRAGMA call, IMHO
you should make no assumptions at all about whether any of the return
values can be NULL.  PRAGMA are entirely non-standard, and unless something
is specifically guaranteed the outcome is open to changes due to future
implementation changes.
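
For reference only, the shape of the output in current versions - and, per
the above, none of these columns should be assumed NOT NULL:

   sqlite> CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age);
   sqlite> PRAGMA table_info(t);  -- cid|name|type|notnull|dflt_value|pk
   0|id|INTEGER|0||1
   1|name|TEXT|1||0
   2|age||0||0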

-scott


[sqlite] Array or set type or some other indexable data?

2015-11-12 Thread Scott Hess
On Thu, Nov 12, 2015 at 4:52 PM, J Decker  wrote:

> > So something like "select value from option_map_view where path is
>  > set type>"?
> > A path name like '/system/device/com port/1' is used as an array of names
> > here. Only the indexing with intst and substr is laborious. Maybe some
> > future SQLite version includes a group_split function to make this
> easier.
> > It seems impossible to me in plain SQL to write an efficient view for a
> > query like this. As you say below the whole map need to be walked at each
> > query to find a match.
> > User defined functions, virtual tables or function based indexes may
> offer
> > an efficient solution.
> >
>
> Hmm a user function could be interesting; hard to see a threadsafe
> version but could keep the array of values internally and and return
> them as 'getOption(n)' as I descend each layer...
>

I think this case wants something like a table-valued user function:
   https://www.sqlite.org/vtab.html#tabfunc2
as used by:
   https://www.sqlite.org/json1.html

That's basically what you just said, except in a way which isn't kludging
things together.

[I think, apologies if I mis-read things.]
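
For a flavor of the interface, json1's json_each() is a table-valued
function you can query like an ordinary table (a small illustration):

   sqlite> SELECT key, value FROM json_each('{"com port":1,"baud":9600}');
   com port|1
   baud|9600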

-scott


[sqlite] Random performance issues with SQLite

2015-11-10 Thread Scott Hess
Also look for failures at the bus level.  I have had cases where some
component was mucking up the bus, and got long pauses because the OS kept
resetting the bus.  These caused huge pauses, like 30 seconds or more, and
due to how various OS components mostly do synchronous disk access, that
kind of thing can have pretty wide-ranging effects.

[For Linux, usually the errors will show up in the dmesg output.  No idea
where to look for other operating systems.]

-scott


On Tue, Nov 10, 2015 at 1:52 AM, Simon Slavin  wrote:

>
> On 10 Nov 2015, at 8:18am, OBones  wrote:
>
> > However, using tools to read the SMART values for this harddrive, I
> noticed that it had been eating up spare sectors for quite a while.
>
> That's actually a good suggestion for you: look at the SMART values, run
> any hard disk diagnostics you have, etc..  Once you've backed-up, of course.
>
> > So to answer, yes, it is definitely possible that the hardware is
> crashing and windows does not notice it.
>
> You will see errors that components report to Windows.  But a dodgy sector
> on a drive may just cause the driver to retry reading repeatedly (mystery
> delays) at the driver level, and be reported as an error only if reading
> fails completely.
>
> Simon.
>


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Scott Hess
On Thu, Oct 29, 2015 at 10:59 AM, Jason H  wrote:
>
> The documentation does not go into the detail of the engine is able to
> skip the reading of unneeded interior rows. In theory, it can because the
> length is contained in the header. So instead of read() on every column in
> the row, it can call seek() if it knows it doesn't need that column. My
> question is now simply: does it seek past unneeded columns, or does
> everything get sent through read once data has started being read? It's a
> minor detail with big performance implications.


Everything is page-structured, so you can't skip anything smaller than a
page (even if you could, there may be no benefit, since seeks cost way more
than reads).  Rows which are larger than fit in a page use a singly-linked
list of overflow pages, so you can't get to page N+1 without reading page N
first.

-scott


[sqlite] How would sqlite read this from disk?

2015-10-29 Thread Scott Hess
On Thu, Oct 29, 2015 at 10:20 AM, Jason H  wrote:
>
> If I could ask a followup question. You made the statement "SQLite reads
> that row of the table from storage, from the first column to the last
> column needed by the SELECT, but perhaps not all the way to the end of the
> columns in the row.", Given an example select that requires columns 2,3,5,
> does this mean that column 4 is also read, or is column 4 be skipped? I
> guess the question is I am assuming that a row is serialization of one or
> more strings that are a serialization of a string preceded by their lengths
> (i.e. [L|data] ), would SQLite do something akin to (using my 2,3,5
> example):
> begin row read, read L1, seek L1, read L2, read L2*byte, read L3, read
> L3*byte, read L4, seek L4, read L5, read L5*byte
> or would the 'read L4, seek L4' be changed to 'read L4, read L4*byte' ?


You should consider reading https://www.sqlite.org/fileformat2.html ,
especially sections 2.3 "Representation Of SQL Tables" and 1.5 "B-tree
Pages".  If your project _really_ needs to know this level of detail, then
you really should read up on the underlying system.  Also maybe throw in
https://www.sqlite.org/arch.html to get a broad feel of how things break
down.

-scott


[sqlite] Simple Math Question

2015-10-23 Thread Scott Hess
You're right, any base-2 representation right of the decimal should be
precise to represent in base-10.  But it's the kind of thing where if you
find yourself counting on it, you probably made a grave error earlier in
your design :-).

-scott


On Fri, Oct 23, 2015 at 10:40 AM, Alessandro Marzocchi <
alessandro.marzocchi at gmail.com> wrote:

> Sorry, i replied to wrong Scott Hess... mine was meant to be a reply to his
> message...
> " Internally, they are base-2 scientific notation,
> so asking for more significant digits in the base-10 representation won't
> help - base-10 fractional numbers cannot always be represented precisely in
> base-2, ALSO base-2 fractional numbers cannot always be represented
> precisely in base-10".
> Sorry for my error.
> Il 23/ott/2015 19:18, "Scott Robison"  ha
> scritto:
>
> On Fri, Oct 23, 2015 at 10:45 AM, Alessandro Marzocchi <
> alessandro.marzocchi at gmail.com> wrote:
>
> > Scott actually all base2 fractions are rapresentable as base10
> > fractions... 10 is divisable by 2.
> > As for Richard try to think about this... computer does 2 base
> > calculations as we usually do calculation in base 10. But the same would
> > happend to us when talking about different bases..
> > Let's say we want to add 3 times 20 minutes (expressed in decimal of
> > hours up to 6th digit). 0h20' is 0.333333 hours. If you multiply that
> > by 3 you get 0.999999... not 1 as you expect.
> >
>
> Yes, they are. What did I write that leads you to believe I don't
> understand that? Or are you addressing a different Scott (as there are
> three in this conversation)? I'm not upset or angry or demanding
> satisfaction, just trying to understand where my commentary went wrong and
> if I need to correct anything for the record.
>
> When I say floating point calculations are "inexact" what I mean is
> "potentially inexact" because conversion from decimal in source code to
> binary floating point in memory is potentially inexact, the arithmetic
> performed will potentially involve rounding of some form, and the final
> conversion back from binary floating point to decimal representation for
> humans can only work with what is left over after those previous potential
> approximations.
>
>
> > Il 23/ott/2015 18:31, "Scott Robison"  ha
> > scritto:
> >
> > > On Fri, Oct 23, 2015 at 9:34 AM, Rousselot, Richard A <
> > > Richard.A.Rousselot at centurylink.com> wrote:
> > >
> > > > Scott,
> > > >
> > > > I agree with everything you said but...  To me if a program/CPU
> > evaluates
> > > > something internally, then when it reports the result it should be
> the
> > > > result as it sees it.  It shouldn't report something different.
> > > >
> > >
> > > This is true to an extent, and there are ways to display something
> "more
> > > exact". But the library programmers wrote code to format floating point
> > > numbers in a way that is appropriate for display to humans. Knowing
> that
> > > floating point calculations are inexact, they round values after a
> > certain
> > > number of decimal places, as most applications expect to see something
> > like
> > > "25" not "24.9995" (numbers made up).
> > >
> > >
> > > >
> > > > So using your analogy, I ask a English speaking person a two
> > interrelated
> > > > questions, they translate the questions to Japanese in their head,
> then
> > > > answers one question in Japanese and another in English.  I say pick
> a
> > > > language and stick with it.  Either answer my question all in English
> > or
> > > > all in Japanese don't mix it.
> > > >
> > > > I think we are getting to hung up on the details of what is going on
> > > > internally.  The real question is why don't the two results, which
> are
> > > > coming from the same program, agree?  (i.e. return 22.99
> > not
> > > > 23.0)
> > > >
> > > > Richard
> > > >
> > > > -Original Message-
> > > > From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:
> > > > sqlite-users-bounces at mailinglists.sqlite.org] On Behalf Of Scott
> Hess
> > > > Sent: Friday, October 23, 2015 10:05 AM
> > > > To: General Discussion of SQLite Database
> > > > Subject: Re: [sqlite] Simple Math Question
> > > >
> > > >

[sqlite] Simple Math Question

2015-10-23 Thread Scott Hess
In one case, you asked "When I add these imprecise values together, do they
equal this other precise value?"  In the other case you asked "When I add
these imprecise values together, what is the decimal expansion?" and then
you noticed that the decimal expansion did not equal that precise value.

My point is that what is going on internally is all there is.  It's not
reporting something different from the result it sees, it is very literally
reporting what it has.  In the language metaphor, you're asking the
questions in English (base-10 in this case), and the computer only knows
how to think in Japanese (base-2 in this case), so you can't avoid the
translation back and forth, and when you give it little bits and pieces
then ask it to put them together, it can't understand your intention from
the bits and pieces.

In your example, the computer didn't at some point think "I had a 23, here,
but I'm going to report 22.9 just for the heck of it".  What probably
happened on the other case is that it had a near-25 value which was closer
to 25 than to 24.99, so it printed 25, whereas on the near-23 case
it was closer to 22.9 than 23, so it went with that.  When you have a
bunch of base-2 representations of base-10 fractional numbers, sometimes
they're slightly too small, sometimes slightly too large.  When you add
them together, sometimes you're lucky and the errors cancel out and you
happen to get what you hoped for, but sometimes the errors go against you
and you end up slightly too small or slightly too large.
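
You can see this directly in SQLite (printf() here is SQLite's built-in
SQL function; the exact digits are what IEEE doubles give you):

   sqlite> SELECT 0.1 + 0.2 = 0.3;
   0
   sqlite> SELECT printf('%.17g', 0.1 + 0.2);
   0.30000000000000004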

-scott


On Fri, Oct 23, 2015 at 8:34 AM, Rousselot, Richard A <
Richard.A.Rousselot at centurylink.com> wrote:

> Scott,
>
> I agree with everything you said but...  To me if a program/CPU evaluates
> something internally, then when it reports the result it should be the
> result as it sees it.  It shouldn't report something different.
>
> So using your analogy, I ask a English speaking person a two interrelated
> questions, they translate the questions to Japanese in their head, then
> answers one question in Japanese and another in English.  I say pick a
> language and stick with it.  Either answer my question all in English or
> all in Japanese don't mix it.
>
> I think we are getting to hung up on the details of what is going on
> internally.  The real question is why don't the two results, which are
> coming from the same program, agree?  (i.e. return 22.99 not
> 23.0)
>
> Richard
>
> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:
> sqlite-users-bounces at mailinglists.sqlite.org] On Behalf Of Scott Hess
> Sent: Friday, October 23, 2015 10:05 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Simple Math Question
>
> On Fri, Oct 23, 2015 at 7:39 AM, Dominique Devienne 
> wrote:
>
> > On Fri, Oct 23, 2015 at 4:16 PM, Rousselot, Richard A <
> > Richard.A.Rousselot at centurylink.com> wrote:
> > > So I decided to output 1000 digits, because why not?  So now I am
> > > more perplexed with all these digits showing it is working the
> > > opposite of
> > how I
> > > expected it.  Why is the second set of equations evaluating to a "yes"
> > when
> > > it is the only one that is obviously NOT equal to the expression???
> >
> > Indeed, that's puzzling :)
>
>
> Just to be clear, though, how floating-point numbers work is breaking your
> expectations because your expectations are wrong when applied to
> floating-point numbers.  Internally, they are base-2 scientific notation,
> so asking for more significant digits in the base-10 representation won't
> help - base-10 fractional numbers cannot always be represented precisely in
> base-2, ALSO base-2 fractional numbers cannot always be represented
> precisely in base-10, so it's like a game of telephone where you can end up
> slightly removed from where you started out, even though it seems like it's
> a simple round trip.  Since each individual digit cannot be represented
> perfectly, it doesn't matter how many digits of precision you ask for,
> you'll always be able to find cases where it doesn't line up like you
> expect.
>
> Think of it this way: Find an English sentence, and find an English to
> Japanese translator.  Translate each individual word of the sentence from
> English to Japanese, then concatenate the results together.  Then translate
> the entire original sentence to Japanese.  The results will almost never be
> the same.  Then do the same process translating the Japanese back to
> English.  Again, the two routes will provide different results, _and_ both
> of those results will almost certainly not match the original English
> sentence.  This isn't a reflection of the translator's abilities at all.

[sqlite] Simple Math Question

2015-10-23 Thread Scott Hess
On Fri, Oct 23, 2015 at 8:19 AM, Jim Callahan <
jim.callahan.orlando at gmail.com> wrote:

> Pocket calculators and COBOL used binary coded decimal (bcd) numbers to
> avoid the representation/round off issues. But this meant another entire
> number type (supported with addition, subtraction and having to be type
> checked in functions) in addition to integer and floating point; most found
> it easier to use integers to keep track on pennies...


Indeed, and the cost was the need to have two completely independent math
systems, one precise and one fast.  For obvious reasons over time people
who did a lot of math just figured out how to make the fast one precise
enough for their needs, so mostly nobody wanted the precise one.  The
obvious choice at that point was to reclaim that silicon space to make
everything else faster, and we ended up here.
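
The shifted approach, as a sketch (hypothetical table, amounts kept in
integer cents so every sum is exact):

  CREATE TABLE ledger (amount_cents INTEGER);  -- $12.34 stored as 1234
  INSERT INTO ledger VALUES (1234), (66);
  SELECT printf('$%d.%02d', SUM(amount_cents) / 100, SUM(amount_cents) % 100)
    FROM ledger;  -- $13.00, no floating-point involved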

-scott


[sqlite] Simple Math Question

2015-10-23 Thread Scott Hess
On Fri, Oct 23, 2015 at 7:39 AM, Dominique Devienne 
wrote:

> On Fri, Oct 23, 2015 at 4:16 PM, Rousselot, Richard A <
> Richard.A.Rousselot at centurylink.com> wrote:
> > So I decided to output 1000 digits, because why not?  So now I am more
> > perplexed with all these digits showing it is working the opposite of
> how I
> > expected it.  Why is the second set of equations evaluating to a "yes"
> when
> > it is the only one that is obviously NOT equal to the expression???
>
> Indeed, that's puzzling :)


Just to be clear, though, how floating-point numbers work is breaking your
expectations because your expectations are wrong when applied to
floating-point numbers.  Internally, they are base-2 scientific notation,
so asking for more significant digits in the base-10 representation won't
help - base-10 fractional numbers cannot always be represented precisely in
base-2, ALSO base-2 fractional numbers cannot always be represented
precisely in base-10, so it's like a game of telephone where you can end up
slightly removed from where you started out, even though it seems like it's
a simple round trip.  Since each individual digit cannot be represented
perfectly, it doesn't matter how many digits of precision you ask for,
you'll always be able to find cases where it doesn't line up like you
expect.

Think of it this way: Find an English sentence, and find an English to
Japanese translator.  Translate each individual word of the sentence from
English to Japanese, then concatenate the results together.  Then translate
the entire original sentence to Japanese.  The results will almost never be
the same.  Then do the same process translating the Japanese back to
English.  Again, the two routes will provide different results, _and_ both
of those results will almost certainly not match the original English
sentence.  This isn't a reflection of the translator's abilities at all.

I'm not saying the computer is always right, just that the computer is
following a very strict recipe with reproducible results.  I don't mean
reproducible like your three examples make logical sense to you, the user,
I mean reproducible like my Intel box gives the same results as my AMD box
as my ARM box.  If you want to be able to deal with fractional decimal
values with high fidelity, you either need to arrange for base-10
representation (slow, because computers have to simulate it), or you have
to do your math in shifted fashion (fast, but can be error prone).

-scott


[sqlite] FTS5 issue on OS X

2015-10-22 Thread Scott Hess
Dollars to donuts you're compiling SQLite but then linking against the
system version.
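
One quick way to confirm that is to compare the header you compiled
against with the library you actually linked:

#include <stdio.h>
#include <sqlite3.h>

int main(void){
  /* SQLITE_VERSION comes from the header at compile time;
  ** sqlite3_libversion() comes from the linked library.  If they
  ** disagree, you're picking up a different copy at link time. */
  printf("header: %s  library: %s\n", SQLITE_VERSION, sqlite3_libversion());
  return 0;
}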

-scott


On Thu, Oct 22, 2015 at 7:51 AM, Gergely Lukacsy (glukacsy) <
glukacsy at cisco.com> wrote:

> Hi Simon,
>
> Thanks for coming back to me.
>
> I ran sqlite3_compileoption_get in a loop to enumerate all options (I
> assume this is the same as the pragma query suggested below) and I got the
> following:
>
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_FTS3
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_FTS3_PARENTHESIS
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_LOCKING_STYLE=1
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomENABLE_RTREE
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomOMIT_AUTORESET
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomOMIT_BUILTIN_TEST
> 2015-10-22T14:46:24.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomOMIT_LOAD_EXTENSION
> 2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomSYSTEM_MALLOC
> 2015-10-22T14:46:26.000Z  [0x7fff7679d300] DBWrapper.cpp:33
> DBInit:OptiomTHREADSAFE=2
>
>
> I am really puzzled by this as it's not that FTS is set to version 3, but
> THREADSAFE=2 which is not what I am giving clang as an option. If I omit
> -DSQLITE_ENABLE_FTS5 I see I can put garbage in the FTS5 specific parts of
> the amalgamation and it still compiles; with -DSQLITE_ENABLE_FTS5 it fails
> - indicating that the FTS5 specific parts are compiled and the
> preprocessor flags honoured.
>
>
> Regards,
> Gergely
>
>
> On 22/10/2015 14:51, "Simon Slavin"  wrote:
>
> >
> >On 22 Oct 2015, at 2:38pm, Gergely Lukacsy (glukacsy)
> > wrote:
> >
> >> ... I get an "error: no module: fts5" error message. This indicates
> >>that I didn't compile SQLite with FTS5 support, but I am pretty sure
> >>that I am using the right preprocessor flags
> >
> >Using that version of SQLite, can you execute the command
> >
> >PRAGMA compile_options;
> >
> >and tell us what it reports back ?  It will reply with a series of rows
> >just like a SELECT command would.
> >
> >See  for
> >further details.
> >
> >Simon.


[sqlite] SQLITE_CANTOPEN after days of operation

2015-10-14 Thread Scott Hess
errno 24 is EMFILE "Too many open files".  You almost certainly have a
file-descriptor leak.

-scott


On Wed, Oct 14, 2015 at 12:52 AM, Andrew Miles  wrote:

> Fully opening the directory failed to fix the issue.  So in summary the
> program works for days then dies with this in the log:
>
> (14) cannot open file at line 28488 of [f5b5a13f73]
> (14) os_unix.c:28488: (24) open(/usr/share/cm160) -
> (14) cannot open file at line 28488 of [f5b5a13f73]
> (14) os_unix.c:28488: (24) open(/usr/share/cm160) -
> (14) cannot open file at line 30285 of [f5b5a13f73]
> (14) os_unix.c:30285: (24) open(/usr/share/cm160/eagleowl_stat.db-journal)
> -
> (14) statement aborts at 29: [UPDATE energy_hour_stat SET status = 1 WHERE
> record_count = 60]
>
> Any advice on what could be causing this?
>
> Andy
>
> On 11 October 2015 at 20:56, Andrew Miles  wrote:
>
> > Thanks
> >
> > Log showed it unable to open the directory and then unable to write the
> > journal file.  The directory is root writable and the process is run as
> > root so I didn't expect a problem here.  I've now modified the directory
> > access to be writable by all to see if that changes things.
> >
> > Andy
> >
> > On 5 October 2015 at 11:39, Richard Hipp  wrote:
> >
> >> On 10/5/15, Andrew Miles  wrote:
> >> > Hi
> >> >
> >> > I have a strange issue with a piece of C code using sqlite3.  The code
> >> runs
> >> > for days and then stops on an SQL insert into db with the error code
> >> > SQLITE_CANTOPEN.
> >> >
> >> > Further info:
> >> >
> >> > 1) The program is a status monitoring app - it writes values into the
> >> SQL
> >> > db once every 60 seconds.  Average duration is about 2 days before
> >> dying so
> >> > roughly 3000 successful writes before failure.
> >> > 2) The SQL db is kept open for the entire time i.e. I don't open/close
> >> on
> >> > every transaction.
> >> > 3) There is another process accessing the same db but read only
> >> > 4) I tried waiting 5 secs on the first CANTOPEN and trying again in
> case
> >> > the other process had locked the db but same message.  Even closing
> the
> >> db
> >> > and reopening won't free the error.  The only method is to stop the
> >> program
> >> > and restart, then it works again every time.
> >> > 5) I ran lsof on the db, only one process (this one) had the file open
> >> > 6) The extended error code reported is 14 i.e. no further information
> >> > 7) Disk has 5GB free
> >> >
> >> > Any advice on how to debug further?
> >> >
> >>
> >> Activate the error and warning log.  https://www.sqlite.org/errlog.html
> >>
> >> --
> >> D. Richard Hipp
> >> drh at sqlite.org


[sqlite] SQLITE_CANTOPEN after days of operation

2015-10-13 Thread Scott Hess
On Sun, Oct 11, 2015 at 12:56 PM, Andrew Miles  wrote:

> Log showed it unable to open the directory and then unable to write the
> journal file.  The directory is root writable and the process is run as
> root so I didn't expect a problem here.  I've now modified the directory
> access to be writable by all to see if that changes things.


This implies that the process was previously able to open journal files in
that directory.  Having access rights to the database files change for no
reason is concerning, but so is having them not change but it just doesn't
work!  You might want to log access() and/or stat() results for the
directory when you see this error.  Also, you might want to check the
rights up the directory tree, and whether you have interesting errors being
reported at the OS level.

Based on your statement that it happens after a few days, and the problem
doesn't go away if you retry, but it _does_ go away if you restart the
process, you might also run lsof against the process.  Maybe the process
has a file-descriptor leak or something of that sort.

Only other things I can think of is to try PRAGMA journal_mode=TRUNCATE.
This truncates the journal file rather than deleting it.  Since the journal
file sticks around between transactions, this may work around the issue.

-scott


[sqlite] [sqlite-dev] SQLite version 3.8.12 enters testing

2015-10-07 Thread Scott Hess
On Wed, Oct 7, 2015 at 9:05 AM, Dominique Devienne 
wrote:

> On Wed, Oct 7, 2015 at 5:39 PM, Richard Hipp  wrote:
> > On 10/7/15, Jaroslaw Staniek  wrote:
> > would you elaborate what is the
> > > benefit of using x.y.z versioning scheme if so many new features come
> to
> > > the "z" release?
> >
> > [...] The community seems to want the second number (current 8) to
> > increment
> > every time a new feature is added to SQLite.  I will take your request
> > under advisement.  Realize, however, that had the current preferred
> > number scheme been used for SQLite from the beginning, the next
> > release would be called 3.112.
> >
>
> That 3.112 version is a better reflection of all the changes in a way.
>
> Minor version bumps are kinda arbitrary,


Having been involved in open source and commercial software for 25 years,
by now I've figured out that version numbers are themselves kinda
arbitrary.  I honestly don't think that the SQLite mailing list having this
tired old discussion is going to shed any new light on the issue or result
in a system which pleases everyone.  There is no chance that the version
number is going to become a standardized reliable form of out-of-band
messaging.  If you care about what's in there, you're going to have to pay
attention to what's in there.

-scott


[sqlite] Multiple connections to in-memory DB and virtual tables

2015-10-06 Thread Scott Hess
Your thread made me ponder what might be up, so I wrote a test using fts3:
   http://pastebin.com/AKP2yHuM
and AFAICT, it works alright.  I haven't specifically verified each of the
flags to sqlite3_open_v2(), I just spammed what looked relevant in there.

Hmm, should have commented the #if's before posting.  The ":memory:" case
is to verify that it doesn't work right with plain in-memory database.  The
real-file case doesn't work if run twice, to verify that I'm not
accidentally working right with a shared in-memory database.  The third
version should be a real shared in-memory database, as best I can tell.
The other #if verifies that it works the same for regular table or virtual
table.

If I were debugging this, the first place I'd look is my virtual table's
xConnect/xCreate implementation.

-scott


On Tue, Oct 6, 2015 at 7:53 AM, Dominique Devienne 
wrote:

> On Tue, Oct 6, 2015 at 4:39 PM, Simon Slavin  wrote:
>
> > On 6 Oct 2015, at 2:51pm, Dominique Devienne 
> wrote:
> > > Is this use case supported by SQLite?
> >
> > Whether this works correctly depends on how your Virtual Module is
> > written.  It should be possible to implement this correctly, but I can
> > imagine a virtual system written quickly which does not take the care to
> > get this right.
> >
>
> It was not written "quickly" but it was definitely not written with that
> use-case in mind. And writing a virtual module correctly is not easy.
>
>
> > There are also things Virtual Tables can't do.  For instance you cannot
> > index a Virtual Table using SQL commands.
> >
>
> That's a bit beside the point. Our vtable already provide indexing support.
> Just not dynamically added indexing via SQL commands.
>
> From Gunter's answer, it seems at least that this use case is possible,
> even though his is multi-process and DB-file based, I think I see the
> parallel he's trying to make to our multi-threaded (1 connection / thread)
> and in-memory DB use case.
>
> I just wish I could get a bit more hand-holding on how to proceed, if
> possible :). As it is not quite crystal clear to me yet. Especially since
> that vmodule code was written a few years back, so I don't recall the
> details. Thanks, --DD


[sqlite] Speed of count distinct rows if unique index on all fields

2015-10-02 Thread Scott Hess
On Fri, Oct 2, 2015 at 8:41 AM, Bart Smissaert 
wrote:

> > you're just throwing random terms around and hoping something sticks.
>
> Not sure where you got that idea from, but let me explain better:
>

AFAICT this is the first posting where you said "I want to count all the
unique rows of this table".  Previously, you said that you wanted to count
all the rows of the table (without the qualifier "unique") and you also
said this was faster if you had a unique index.  That didn't make sense,
because the only case where an index should make that faster is if the
index narrows the table rather than containing the table.

I agree, if this is what you want:


> SELECT COUNT(*) AS UNIQUE_ROWS
> FROM (SELECT DISTINCT * FROM TABLE1)
>

Then it could in some cases be reduced to just counting the rows in the
table:

SELECT COUNT(*) FROM TABLE1
>

and that would be faster.  But the latter statement would not itself become
faster with a schema change.

This is why I suggested posting schema and query examples.  Then we're
talking about the same thing.  This could have been a two- or three-post
thread.

-scott


[sqlite] Speed of count distinct rows if unique index on all fields

2015-10-02 Thread Scott Hess
Why does any of that matter?  SELECT COUNT(*) FROM table; already knows all
of that information.

If you have a question about why one query is faster/slower than another
query given one schema versus another schema, then post representative
schema and queries.  Right now you're just throwing random terms around and
hoping something sticks.  Earlier you said that you weren't talking about
DISTINCT as applied to result sets, but just now you're re-introducing
COUNT DISTINCT for some reason.  SQL code will make your question concrete.

-scott


On Fri, Oct 2, 2015 at 7:54 AM, Bart Smissaert 
wrote:

> It is faster because if it knows there is no where or join or whatever row
> limiting condition and it also knows there is
> a unique index on all fields it can simply do select count(rowid) from
> table1 and not do any count distinct.
>
> RBS
>
>
> On Fri, Oct 2, 2015 at 3:51 PM, Scott Hess  wrote:
>
> > On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert  >
> > wrote:
> >
> > > > The Uniqueness of the output depends on which fields are included,
> > JOINs,
> > > UNIONs, etc. etc.
> > >
> > > I am not talking about that situation. I am only referring to a
> situation
> > > where you want to count all
> > > rows in a table. I know it will be uncommon to have an index on all
> > fields
> > > and this is not really a practical
> > > question. I suppose as it so uncommon it is not worth it to put this
> > > optimization in.
> >
> >
> > Is your case of having the unique index on all fields faster than having
> a
> > unique index on a single field?
> >
> > Maybe you should include an example of your schema.  I can't think of how
> > scanning an index on all fields could be smaller than the underlying
> table,
> > so it's unclear how that could be faster.  But a unique index on a subset
> > of the data could be faster simply from being smaller.
> >
> > Also, 10x faster makes me wonder about whether you're accounting for
> > caching effects.  A blunt way to test for that is to run your queries a
> > couple times.  If the first time is slow and the second and later times
> are
> > much faster, then it's likely the cache is causing the speedup.
> >
> > -scott


[sqlite] Speed of count distinct rows if unique index on all fields

2015-10-02 Thread Scott Hess
On Fri, Oct 2, 2015 at 7:43 AM, Bart Smissaert 
wrote:

> > The Uniqueness of the output depends on which fields are included, JOINs,
> UNIONs, etc. etc.
>
> I am not talking about that situation. I am only referring to a situation
> where you want to count all
> rows in a table. I know it will be uncommon to have an index on all fields
> and this is not really a practical
> question. I suppose as it so uncommon it is not worth it to put this
> optimization in.


Is your case of having the unique index on all fields faster than having a
unique index on a single field?

Maybe you should include an example of your schema.  I can't think of how
scanning an index on all fields could be smaller than the underlying table,
so it's unclear how that could be faster.  But a unique index on a subset
of the data could be faster simply from being smaller.

Also, 10x faster makes me wonder about whether you're accounting for
caching effects.  A blunt way to test for that is to run your queries a
couple times.  If the first time is slow and the second and later times are
much faster, then it's likely the cache is causing the speedup.

-scott


[sqlite] Any database unique ID across multiple connections ?

2015-09-24 Thread Scott Hess
On Thu, Sep 24, 2015 at 4:56 AM, ALBERT Aurélien <
aurelien.albert at alyotech.fr> wrote:

> @ Stephan Beal
>
> "Every instance of a :memory: db is a unique instance, so you cannot have
> multiple connections to a single :memory: db."
>
> >> I know, this is one of the various reasons that made my solution not
> really ideal
>
> @ Simon Slavin


You can share in-memory databases using URI names.
   https://www.sqlite.org/inmemorydb.html
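
For example (following that page's approach; SQLITE_OPEN_URI makes the
URI interpretation explicit):

  sqlite3 *db1 = 0, *db2 = 0;
  int flags = SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI;
  sqlite3_open_v2("file:memdb1?mode=memory&cache=shared", &db1, flags, 0);
  sqlite3_open_v2("file:memdb1?mode=memory&cache=shared", &db2, flags, 0);
  /* db1 and db2 are distinct connections to the same in-memory database. */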

"There's a PRAGMA for storing and retrieving a 'user-version' in the
> database:
>
> 
>
> It's not used by SQLite itself, you can use it to store any 32-bit signed
> integer.  So you could store a different number to each of your databases
> and check to see whether the user-version of one connection is the same as
> that from a different connection."
>
> >> That's a good idea, but this 'user-version' pragma is persistent and
> stored on disk. So it's difficult to use in my case : if the user quit the
> application and restart it without loading the same databases (or in the
> same order) assigning 'user-version' id will be really difficult.
>

I'm not exactly sure what your concern is.  If you assign it a random
number, each database will still see a different value next time you open
things, regardless of ordering.  If you're greatly concerned about birthday
paradox, you could do something like:

   CREATE VIEW IF NOT EXISTS GUID AS SELECT '<some-random-guid>';

and then use that to differentiate databases.

In fact, I expected something like a unique memory address (void*) per
> database.


For the most part they're entirely distinct connections each accessing the
same underlying file, so there's no reason to have such a unique memory
address.

You may be able to do something using shared-cache mode:
   https://www.sqlite.org/sharedcache.html
I can't immediately tell if this would mean a single underlying handle to
the file or not.  If so, you could maybe use sqlite3_file_control() to dig
into things.  But shared-cache mode changes how things work a bit, so it
might not be appropriate for this use case.

-scott


[sqlite] json1.c: isalnum(), isspace(), and isdigit() usage

2015-09-17 Thread Scott Hess
On Thu, Sep 17, 2015 at 1:24 PM, Ralf Junker  wrote:

> On 17.09.2015 20:14, Scott Hess wrote:
>
>> The problem is that there are LOCALE settings where tolower() does things
>> C
>> programmers don't expect.  I think tr_TR was one case, the handling of 'I'
>> (Google "tr_tr locale bug" and you'll see lots of people hitting the same
>> general problem).  It isn't a problem of type safety, it's a problem that
>> the same inputs might have different outputs for certain library functions
>> when you change environment variables.  I don't remember whether there
>> were
>> specific problems with other ctype functions, or if I just thought it was
>> a
>> good idea to be careful, once I realized the class of problem.
>>
>
> And this check-in therefore misses the point as it does not address this
> LOCALE problem IMHO:
>
> http://www.sqlite.org/src/info/6713e35b8a8c997a


Hmm.  Well, it might miss _a_ point, while solidly landing some other point
:-).

Current fts3 seems to spread this code differently, under fts2.c it was
like:

https://code.google.com/p/chromium/codesearch#chromium/src/third_party/sqlite/src/ext/fts2/fts2.c=336
where it's not a wrapper around the library, instead it was a direct
implementation of the functions which only acted on 7-bit values in the
ASCII set.

I think these should really be in terms of sqlite3UpperToLower
and sqlite3CtypeMap, though exposing those to an extension sensibly might
be an issue.

-scott


[sqlite] Suggestion: Regularize output of setting pragmas.

2015-09-17 Thread Scott Hess
Often, PRAGMA are documented the way mmap_size is:

> Query or change the maximum number of bytes that are set aside
> for memory-mapped I/O on a single database. The first
> form (without an argument) queries the current limit. The
> second form (with a numeric argument) sets the limit for the
> specified database, or for all databases if the optional
> database name is omitted.

I notice that mmap_size _also_ returns the set value, which is useful to
query since it could be constrained by compile-time options or
sqlite3_config settings.  Like:

   sqlite> pragma mmap_size = 3221225472;  -- 3GB
   2147418112  -- 2GB - 64KB

Sometimes pragma don't do this:

   sqlite> pragma auto_vacuum = 1;
   sqlite> pragma auto_vacuum;
   1

Might it be useful to have a general policy that all PRAGMA calls return a
result consistent with the first form, even if an argument is passed?  As
things currently stand, you really shouldn't depend on behavior like
mmap_size's current implementation to do set-and-query, but you also can't
write code as if setting pragma do not return results.

-scott


[sqlite] json1.c: isalnum(), isspace(), and isdigit() usage

2015-09-17 Thread Scott Hess
The problem is that there are LOCALE settings where tolower() does things C
programmers don't expect.  I think tr_TR was one case, the handling of 'I'
(Google "tr_tr locale bug" and you'll see lots of people hitting the same
general problem).  It isn't a problem of type safety, it's a problem that
the same inputs might have different outputs for certain library functions
when you change environment variables.  I don't remember whether there were
specific problems with other ctype functions, or if I just thought it was a
good idea to be careful, once I realized the class of problem.

[My run-in with this issue was in development of fts2.c/fts3.c.  I'm sure
the same general series of events leads to similar local implementations in
other projects :-).]

-scott


On Thu, Sep 17, 2015 at 7:03 AM, Jan Nijtmans 
wrote:

> 2015-08-26 19:03 GMT+02:00 Ralf Junker :
> > ext/misc/json1.c uses the following functions from the C library:
> >
> > isalnum(): http://www.sqlite.org/src/artifact/541004e47235cefc?ln=564
> > isspace(): http://www.sqlite.org/src/artifact/541004e47235cefc?ln=635
> > isdigit(): http://www.sqlite.org/src/artifact/541004e47235cefc?ln=829
>
> > Shouldn't json1.c avoid them for the same reasons?
>
> Simpler is: cast the argument to (unsigned char), that
> has the same effect (but is more efficient).
>
> Proposed patch below.
>
> Thanks!
>
> Regards,
>   Jan Nijtmans
> ==
> --- ext/misc/json1.c
> +++ ext/misc/json1.c
> @@ -583,18 +583,18 @@
>char c;
>u32 j;
>int iThis;
>int x;
>JsonNode *pNode;
> -  while( isspace(pParse->zJson[i]) ){ i++; }
> +  while( isspace((unsigned char)pParse->zJson[i]) ){ i++; }
>if( (c = pParse->zJson[i])==0 ) return 0;
>if( c=='{' ){
>  /* Parse object */
>  iThis = jsonParseAddNode(pParse, JSON_OBJECT, 0, 0);
>  if( iThis<0 ) return -1;
>  for(j=i+1;;j++){
> -  while( isspace(pParse->zJson[j]) ){ j++; }
> +  while( isspace((unsigned char)pParse->zJson[j]) ){ j++; }
>x = jsonParseValue(pParse, j);
>if( x<0 ){
>  if( x==(-2) && pParse->nNode==(u32)iThis+1 ) return j+1;
>  return -1;
>}
> @@ -601,17 +601,17 @@
>if( pParse->oom ) return -1;
>pNode = &pParse->aNode[pParse->nNode-1];
>if( pNode->eType!=JSON_STRING ) return -1;
>pNode->jnFlags |= JNODE_LABEL;
>j = x;
> -  while( isspace(pParse->zJson[j]) ){ j++; }
> +  while( isspace((unsigned char)pParse->zJson[j]) ){ j++; }
>if( pParse->zJson[j]!=':' ) return -1;
>j++;
>x = jsonParseValue(pParse, j);
>if( x<0 ) return -1;
>j = x;
> -  while( isspace(pParse->zJson[j]) ){ j++; }
> +  while( isspace((unsigned char)pParse->zJson[j]) ){ j++; }
>c = pParse->zJson[j];
>if( c==',' ) continue;
>if( c!='}' ) return -1;
>break;
>  }
> @@ -620,18 +620,18 @@
>}else if( c=='[' ){
>  /* Parse array */
>  iThis = jsonParseAddNode(pParse, JSON_ARRAY, 0, 0);
>  if( iThis<0 ) return -1;
>  for(j=i+1;;j++){
> -  while( isspace(pParse->zJson[j]) ){ j++; }
> +  while( isspace((unsigned char)pParse->zJson[j]) ){ j++; }
>x = jsonParseValue(pParse, j);
>if( x<0 ){
>  if( x==(-3) && pParse->nNode==(u32)iThis+1 ) return j+1;
>  return -1;
>}
>j = x;
> -  while( isspace(pParse->zJson[j]) ){ j++; }
> +  while( isspace((unsigned char)pParse->zJson[j]) ){ j++; }
>c = pParse->zJson[j];
>if( c==',' ) continue;
>if( c!=']' ) return -1;
>break;
>  }
> @@ -656,21 +656,21 @@
>  jsonParseAddNode(pParse, JSON_STRING, j+1-i, &pParse->zJson[i]);
>  if( !pParse->oom ) pParse->aNode[pParse->nNode-1].jnFlags = jnFlags;
>  return j+1;
>}else if( c=='n'
>   && strncmp(pParse->zJson+i,"null",4)==0
> - && !isalnum(pParse->zJson[i+4]) ){
> + && !isalnum((unsigned char)pParse->zJson[i+4]) ){
>  jsonParseAddNode(pParse, JSON_NULL, 0, 0);
>  return i+4;
>}else if( c=='t'
>   && strncmp(pParse->zJson+i,"true",4)==0
> - && !isalnum(pParse->zJson[i+4]) ){
> + && !isalnum((unsigned char)pParse->zJson[i+4]) ){
>  jsonParseAddNode(pParse, JSON_TRUE, 0, 0);
>  return i+4;
>}else if( c=='f'
>   && strncmp(pParse->zJson+i,"false",5)==0
> - && !isalnum(pParse->zJson[i+5]) ){
> + && !isalnum((unsigned char)pParse->zJson[i+5]) ){
>  jsonParseAddNode(pParse, JSON_FALSE, 0, 0);
>  return i+5;
>}else if( c=='-' || (c>='0' && c<='9') ){
>  /* Parse number */
>  u8 seenDP = 0;
> @@ -729,11 +729,11 @@
>if( zJson==0 ) return 1;
>pParse->zJson = zJson;
>i = jsonParseValue(pParse, 0);
>if( pParse->oom ) i = -1;
>if( i>0 ){
> -while( isspace(zJson[i]) ) i++;
> +while( isspace((unsigned char)zJson[i]) ) i++;
>  if( zJson[i] ) i = -1;
>}
>if( i<=0 ){
>  

[sqlite] What is the best page cache size when the database is larger than system RAM?

2015-09-14 Thread Scott Hess
On Sun, Sep 13, 2015 at 8:18 PM, David Barrett 
wrote:

> Hello!  If I have a database that is larger than the system's physical RAM,
> am I correct in thinking I should actually set a very *small* page cache so
> as to avoid "double caching" the same pages in both sqlite and the file
> cache?
>
> To clarify, if the database is *smaller* than system RAM, I imagine the
> best case scenario (for read performance at least) would be to set a page
> cache size equal to or larger than the database size.  This would ensure
> there is always enough RAM set aside to store up to the entire database in
> the page cache.  (Some subset of the database would also be stored in the
> file cache, but once the page cache is warmed up, you would never touch the
> file cache.)
>
> However, if the database is *larger* than system RAM, even assigning all of
> it to the page cache wouldn't enable it to store the whole database in RAM,
> no matter how "warmed up" it gets.  Furthermore, every time you "miss" the
> page cache, you would always *also* miss the file system cache (because the
> only way a page makes it into the page cache is by first reading it from
> the file system).  So every page read into the page cache requires not just
> that memory, but the same amount of memory in the file system cache.  So
> the larger the ratio between database and system size, the more likely you
> spend 2x the RAM for each page in the page cache.
>
> Accordingly, it would seem that the best configuration for a database
> significantly larger than the system RAM (assuming a perfectly uniform
> access pattern where every page is accessed with equal probability) would
> be to disable the page cache entirely -- and thereby preserve as much RAM
> as possible for the file cache.  Yes, it would be better to have it in the
> page cache than the file cache, but I imagine it's vastly better to read
> from the file cache than to go to the disk.  Is this right?
>

For a read-mostly database, enabling memory-mapped I/O would work well, if
your system supports it.  In that case, SQLite handles pages using a small
wrapper structure which references the memory-mapped data which is in the
filesystem buffers directly.

If that isn't an option, you shouldn't disable the page cache entirely.
SQLite uses b-trees to organize data, so even if every piece of data has
equal likelihood for access, the b-tree interior node pages have _much_
greater likelihood of access than leaf-node pages.  So your cache should be
big enough to hold all/most of the interior pages, plus enough extra space
to make sure they aren't being accidentally evicted too often.  The OS may
be able to figure out the right caching pattern for these pages, but SQLite
certainly has more knowledge about how they're used.
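
Concretely, something like the following, with sizes purely illustrative
and to be tuned against your actual interior-page working set:

   sqlite> PRAGMA mmap_size = 268435456;  -- allow up to 256MB of mmap I/O
   268435456
   sqlite> PRAGMA cache_size = -65536;    -- negative means KiB, so ~64MB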

There is the opposite option: contrive to have the database opened with an
O_DIRECT type access, so that the filesystem reads into SQLite's buffers
and doesn't keep a cache.  You'd still want to leave enough free memory for
the filesystem to cache things like indirection nodes, but usually those
are very efficiently packed.  Then you can devote lots of memory to
SQLite's cache.

Of course, any of these are just fiddling while the Titanic sinks.  When
faced with a need to do literal random access to a file bigger than memory,
it almost doesn't matter how you structure things, it's going to be
painful.  You'd be much better off to rearrange your access patterns to be
more streaming in nature, then you can just allocate modest cache to SQLite
and let the filesystem do appropriate read-ahead.

-scott


[sqlite] Feedback request: JSON support in SQLite

2015-09-11 Thread Scott Hess
On Fri, Sep 11, 2015 at 8:58 AM, Richard Hipp  wrote:

> Draft documentation for the current design of JSON support in SQLite
> can be seen on-line at
>
> https://www.sqlite.org/draft/json1.html
>
> Your feedback is encouraged.
>
> All features described in the document above are implemented and
> working in the latest trunk version of SQLite, which you can download
> in source-code form from https://www.sqlite.org/src/timeline?y=ci and
> compile yourself using instructions found at
> https://www.sqlite.org/src/doc/trunk/README.md


Given json_each(), you can transform json into a table-alike feature.  I
found myself wondering if there was anything which could easily take a
result set and convert it to json.  I can see that something like this
could convert a result row into a json object:
  SELECT json_object("c", c, "d", d, ...) FROM ...;
but it's not obvious to me how to feed all of those json objects to
json_array().  You could do things like string together a bunch of
sub-queries, I suppose.

Riffing off group_concat(), maybe something like this makes sense:
  SELECT json_group_array(json_object("c", c, "d", d)) FROM t
    WHERE <condition> GROUP BY <columns>;

Now, if you were to ask me what I would use that for ... I would come up
blank.  I mostly suggested it because it feels like there is a piece
missing.

===break===

Someone else mentioned json_array_index() - while I agree that
json_extract() can do the same thing, it does somewhat feel like a natural
thing to have json_array_index().  And it could optionally support slicing:
  json_array_index('[1,2,3,4]', 1) -> 2
  json_array_index('[1,2,3,4]', 2, 2) -> '[3,4]'
Though I suppose something like:
  json_extract('[1,2,3,4]', '$[0,2..3]')
might be a cleaner place to put a slicing syntax.  Of course if I went
there, I'd also wonder about '$.{a,c}' as syntax to generate an object
slice.  Maybe v+1 for this kind of thing.

===break===

In the table for json_insert/replace/set(), I suspect the last row should
have function json_set().

-scott


[sqlite] Thread safety problem encountered by people working on WebKit

2015-09-10 Thread Scott Hess
On Thu, Sep 10, 2015 at 4:58 PM, Scott Hess  wrote:
>
> The same basic logic applies to sqlite3_initialize()'s testing and setting
> of sqlite3GlobalConfig.isInit , in a different thread+core the test can see
> "true" before that core sees the setup implied by isInit being set to
> "true".
>

BTW, I _think_ this is broadly the problem in double-checked locking
documents where the reference to the instance is correctly accessed, but
the instance itself is not guaranteed to be fully constructed in the case
of the unlocked check.
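
The classic broken shape, as a made-up C sketch (not SQLite's code):

#include <pthread.h>
#include <stdlib.h>

typedef struct Instance { int ready; } Instance;

static pthread_mutex_t gMu = PTHREAD_MUTEX_INITIALIZER;
static Instance *gInstance = 0;

/* Broken double-checked locking: the unlocked read can observe a
** non-NULL gInstance before the stores that constructed *gInstance
** become visible on this core. */
Instance *GetInstance(void){
  if( gInstance==0 ){
    pthread_mutex_lock(&gMu);
    if( gInstance==0 ){
      Instance *p = (Instance *)malloc(sizeof(*p));
      p->ready = 1;
      gInstance = p;  /* pointer can become visible before p->ready */
    }
    pthread_mutex_unlock(&gMu);
  }
  return gInstance;
}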

-scott


[sqlite] Feature request for sqlite3_initialize().

2015-09-10 Thread Scott Hess
On Sat, Sep 5, 2015 at 6:42 PM, Darin Adler  wrote:

> Michael is planning a workaround in WebKit that will call
> sqlite3_initialize manually exactly once before WebKit uses sqlite, using
> std::once to deal with the thread safety issue.
>

This reminds me ... I was recently working on a patch which used
sqlite3_config(), which needs to run before sqlite3_initialize().  Since we
were already calling sqlite3_initialize(), I put it in the obvious place.
Then I found a case where someone was making a sqlite3_*() call before
calling into the normal code path.  That worked fine for lazy
initialization, but caused my sqlite3_config() to start failing depending
on what happened in which order.

It would be modestly interesting to have a macro SQLITE_REQUIRE_INIT (or
similar) which caused SQLite to throw errors if an API is being called
before sqlite3_initialize().  My pencil implementation would probably
piggyback on SQLITE_OMIT_AUTOINIT, changing those callsites from
sqlite3_initialize() to sqlite3Initialize(), then having that
implementation look something like:

#ifndef SQLITE_OMIT_AUTOINIT
int sqlite3Initialize() {
#ifdef SQLITE_REQUIRE_INIT
  return sqlite3GlobalConfig.isInit ? SQLITE_OK : SQLITE_MISUSE;
#else
  return sqlite3_initialize();
#endif
}
#endif

Mostly I'd want this in debug and testing builds, so that I could have
confidence in setting SQLITE_OMIT_AUTOINIT  for release builds.

Thanks,
scott


[sqlite] Thread safety problem encountered by people working on WebKit

2015-09-10 Thread Scott Hess
On Sat, Sep 5, 2015 at 8:02 PM, Richard Hipp  wrote:

> On 9/5/15, Darin Adler  wrote:
> > Hi folks.
> >
> > I'm sending this on behalf of Michael Catanzaro, a contributor to the
> WebKit
> > open source project, who is working on a WebKit bug report, "Crash when
> > WebCore::SQLiteFileSystem::openDatabase is called from multiple threads"
> > , which seems to be
> caused
> > by an issue in SQLite. In short, we've noticed many applications that use
> > WebKit crash when sqlite3_initialize is called simultaneously in multiple
> > threads in the Fedora platform
> > 
> > 
> > 
> >   despite the fact
> that
> > sqlite3_initialize is documented to be thread-safe and called
> automatically
> > by the library when needed < https://sqlite.org/c3ref/initialize.html>.
> >
> > Michael is planning a workaround in WebKit that will call
> sqlite3_initialize
> > manually exactly once before WebKit uses sqlite, using std::once to deal
> > with the thread safety issue.
> >
> > We'd like to file an SQLite bug report about this, and as I understand
> it,
> > the proper way for a newcomer to do that is to send mail here.
> >
> > In the process of trying to report this, Michael discovered that the page
> > explaining how to report bugs against SQLite
> >  lists an incorrect
> email
> > address, . Mail to that address is rejected.
> >
>
> Thanks for the bug report.  Code which might fix this has been checked
> into trunk.  I have also corrected the email address on the bug-report
> procedures page.
>

Chromium had a similar issue awhile back at http://crbug.com/248101 , with
the solution of putting sqlite3_initialize() at an appropriate place.
Looks like filing a bug upstream dropped through the cracks, sorry about
that.

If the fix was http://www.sqlite.org/src/info/11a9a786ec06403a , I suspect
it may not be sufficient.  I have built a local Chromium against current
SQLite checked out of fossil (it includes your fix), and when I run the
tests in the above Chromium bug under tsan (*), I still get tsan alerts
about races in sqlite3_initialize() vs itself and sqlite3MallocInit() vs
sqlite3Malloc() (**).  It's possible that tsan is flagging something
distinct from Michael's report, but it seems reasonable to suspect it's all
related.

As best I can tell (and I admit to being rusty on this), the memory barrier
you added prevents re-ordering across that line of code, so that the other
items are all written before xMutexAlloc is written ... for the current
thread on the current CPU core.  But since there is no memory barrier
before the code checking xMutexAlloc, a thread running on a different core
can see xMutexAlloc as being set even though the setup implied by the
memory barrier has not yet been done from the viewpoint of that core.

The same basic logic applies to sqlite3_initialize()'s testing and setting
of sqlite3GlobalConfig.isInit , in a different thread+core the test can see
"true" before that core sees the setup implied by isInit being set to
"true".

[I _think_ xMutexAlloc being 64 bits is probably safe on most current
processors and compilers, but that did have me nervous for a bit.]

I don't think these can be resolved by adding memory barriers, because the
barriers can race each other.  I was able to satisfy at least tsan with
pthread_mutex locks covering sqlite3_initialize() and sqlite3MutexInit()
(plausibly the latter was overkill given the former), but that's obviously
a non-starter given the goal of lazy initialization for
sqlite3_initialize().  Having a per-thread flag in thread-local storage
might be reasonable, assuming your TLS isn't slower than just using a mutex.

Apologies if I'm analyzing this incorrectly.  This kind of thing is often
subtle, and as I mentioned it's possible we're seeing different problems.

-scott

(*) https://www.chromium.org/developers/testing/threadsanitizer-tsan-v2

(**) Apologies for being vague with that - it's a release-mode compile
because otherwise it can take too long to run, and there's some inlining
making it harder to interpret the results.  I might try to repro in a
reduced context where I can relax the optimizations.


[sqlite] FTS4 - simulate phrase with NEAR/0

2015-08-27 Thread Scott Hess
On Thu, Aug 27, 2015 at 11:29 AM, Martin Kucej <
i.librarian.software at gmail.com> wrote:

> On Thu, Aug 27, 2015 at 1:02 PM, Scott Hess  wrote:
> > NEAR/0 will probably not care about ordering.
>
> Ah, yes. You are correct. This match expression:
>
> MATCH 'column:word1 NEAR/0 column:word2 NEAR/0 column:word3'
>
> matches both "word1 word2 word3" and "word3 word2 word1" phrases. So,
> it is a no go.


Due to grammar rules, for most languages the query's given order will
probably dominate the results, so performance-wise it might be fine to
post-process using a sub-query or in the client code.  It will only really
lose if the reverse ordering is common relative to the intended ordering.

-scott


[sqlite] FTS4 - simulate phrase with NEAR/0

2015-08-27 Thread Scott Hess
On Thu, Aug 27, 2015 at 10:50 AM, Martin Kucej <
i.librarian.software at gmail.com> wrote:

> Recently, I was asked to implement full-text search in an application
> with up to 1 million items, each with several columns having AND, OR
> and a phrase search capabilities. I can only work with FTS4, which
> does not allow a phrase search with column specifiers, e. g. this is
> invalid:
>
> MATCH 'column1:"lorem ipsum" column2:"gaudeamus igitur"'
>
> My question is - can I simulate phrases with the NEAR keyword? Like so:
>
> MATCH 'column1:lorem NEAR/0 column1:ipsum column2:gaudeamus NEAR/0
> column2:igitur'
>
> The way NEAR keyword is explained, it should work, and all my tests
> give me correct results. I just want to make sure it is always going
> to be correct.
>

NEAR/0 will probably not care about ordering.

It might be easier to just put the columns in separate fts4 tables and join
on rowid.  Or phrase-match the columns separately in sub-queries against
the same table (one sub-query does column1 MATCH "lorem ipsum" the other
does the other match, and you join them together).

I think performance-wise, joining on rowid shouldn't be a problem unless
you have a very large number of hits for both queries.
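
As a sketch, with made-up names (one fts4 table per column, rows
inserted with matching rowids):

static const char *kSchema =
    "CREATE VIRTUAL TABLE col1_fts USING fts4(content1);"
    "CREATE VIRTUAL TABLE col2_fts USING fts4(content2);";
static const char *kPhraseQuery =
    "SELECT col1_fts.rowid"
    "  FROM col1_fts JOIN col2_fts ON col1_fts.rowid = col2_fts.rowid"
    " WHERE col1_fts MATCH '\"lorem ipsum\"'"
    "   AND col2_fts MATCH '\"gaudeamus igitur\"'";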

-scott


[sqlite] order by not working in combination with random()

2015-08-27 Thread Scott Hess
I keep thinking I remember a thread from years ago where a lot of this was
hashed out, but I cannot find it.

I seem to remember one point which made sense was that while most functions
with no parameters were reasonably considered static across the entire
statement's execution, RANDOM() needed to be a special case to be useful
for much.  For instance:
   SELECT RANDOM();
would be an obvious way to select a random value, with no ambiguity, while
in either of these two cases:
   SELECT *, RANDOM() FROM t;
   SELECT * FROM t ORDER BY RANDOM() LIMIT 1;
the statements make no sense if RANDOM() is lifted as constant across the
entire execution.  But only in the second case is SQLite really in a
privileged position - in the first case, the client code could trivially
augment the result set with random values, while in the second case SQLite
can internally optimize to return a random subset efficiently.

Now, something like:
   SELECT RANDOM(), * FROM t ORDER BY 1 LIMIT 1;
is ambiguous.  On the one hand, the author's intent seems obvious, to pick
a random row and keep the value used to pick the row.  On the other hand,
you have to stretch a bit to construct a logical reason why the client code
cares what random values were being used by SQLite for this.  Likewise for
using RANDOM() in the WHERE clause - what possible difference could it make
to the client?

Where I think that ends up is a principle-of-least-surprise position.  If a
statement uses RANDOM() in multiple places, each of those should be
independent each time within a row, and also each time across rows.  But
RANDOM() AS r should allow r to be logically consistent when used in a
WHERE clause or ORDER BY.

There is already some precedent for this, because ORDER BY RANDOM() must
internally be holding the random values used fixed during the course of the
sort, rather than regenerating them each time it looks the value up.

-scott


On Thu, Aug 27, 2015 at 4:11 AM, R.Smith  wrote:

>
>
> On 2015-08-27 03:03 AM, James K. Lowden wrote:
>
>> On Wed, 26 Aug 2015 13:39:09 +0100
>> Simon Slavin  wrote:
>>
>> On 26 Aug 2015, at 11:27am, tonyp at acm.org wrote:
>>>
>>> Plus, it apparently recognizes if the random() expression in the
>>> ORDER BY is the same as the SELECT one and again sort correctly
>>> (without re-evaluating) and without needing an alias.

>>> Ah, but I would call /that/ a bug !
>>>
>> I think I can convince you Postgres is right and SQLite is not.  :-)
>> I'll give you 4 reasons:
>>
>> 1.  Theory
>> 2.  Authority
>> 3.  Atomicity
>> 4.  Consistency
>>
>
> Thanks for the informative post, and I do agree with it except for
> specifically the Random() function.
>
> Random() is not a function you use when you expect reproducible results.
> The very nature of the word suggests the opposite.
>
> Further to this, among other posts there seems to be some confusion about
> the purpose of aliasing - especially since people think in programming
> terms. An alias isn't a variable or an array, it doesn't magically hold a
> state - it simply is a nick-name for a function.
>
> This statement:
>   SELECT Random() AS rr ORDER BY rr
>
> cannot ever be functionally different to:
>   SELECT Random() ORDER BY Random()
>
> To use programming terms: rr is not a variable - it does not "hold" a
> value for the Random() function, it simply is an alphanumeric "pointer" to
> the Random() function (in this case).
>
> I wouldn't go so far as to call the behavior in Postgres a "bug" but just
> an implementation detail - I mean the values do end up being random, do
> they not? They just chose to make the sorting possible by keeping the state
> information - a good thing since it produces what most people (especially
> programmers) seem to expect, but in no way (I think) an absolute adherence
> to a standard.
>
> By James' standard, the Query:
> SELECT Random() FROM t;  -- imagine table t holds 10 tuples
>
> Should provide:
> Random()
> --
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
> 76398763987097830
>
> And I happen to agree if we do apply strict relational theory. The thing
> is though, when people invoke the Random() function, they really do not
> wish to see the above (as opposed to when the reference the Now() or
> similar functions, the above would be exactly what they wish to see). So
> SQLite, like PostGres, offer an alternate implementation detail in this
> regard.
>
> The debate on whether either is absolutely "right" or "wrong" is just a
> silly one.  Postgres has the luxury of not being "Lite" nor needing to run
> on a phone, so the added code and memory to deal with saving a state of
> randomness for an entire result-set is not really problematic in Postgres,
> and not really in the design goals of SQLite.
>
>
> FWIW,
> Ryan
>
>
> ___
> sqlite-users mailing list
> 

[sqlite] Is this a bug ? How to rename a table and all dependencies ?

2015-08-21 Thread Scott Hess
On Fri, Aug 21, 2015 at 1:02 PM, Simon Slavin  wrote:

> On 21 Aug 2015, at 8:13pm, Scott Hess  wrote:
> > Since renameTriggerFunc()
> > follows renameParentFunc(), my guess is that triggers are also handled.
>
> The documentation says that statements within TRIGGERs are not changed
> correctly:
>
> <https://www.sqlite.org/lang_altertable.html>
>
> Perhaps renameParentFunc() suggests that you can rename a TRIGGER, not
> that the commands within TRIGGERs are changed.
>
> I do not know whether the current version of SQLite correctly changes
> TRIGGERs or not.  Since it would require complete parsing of the action
> text, I suspect not.


Hmm.  My impression of both triggers and views is that they don't really
have independent existence; instead, they are inlined into relevant
statements.  So that would make sense as an omission.

That said, writing code to do this manually has potential to be error
prone.  It might make sense under an ENABLE flag.  It feels like an obvious
thing to have, not something out-of-scope like embedding a new language or
support for a new storage methodology.

-scott


[sqlite] Is this a bug ? How to rename a table and all dependencies ?

2015-08-21 Thread Scott Hess
On Fri, Aug 21, 2015 at 11:06 AM, Simon Slavin  wrote:

> On 21 Aug 2015, at 7:02pm, sqlite-mail  wrote:
> > I'm pointing this here because postgresql do manage this case properly !
>
> If you want postgres, you know where to find it.
>
> Please don't forget that SQLite has to run on your smartphone and your
> SatNav device and your TV.  It has different design objectives to the big
> client/server database systems.


SQLite already handles this correctly for index definitions.  It also
handles it correctly for foreign keys IFF you have run "PRAGMA foreign_keys
= ON;" implemented by renameParentFunc().  Since renameTriggerFunc()
follows renameParentFunc(), my guess is that triggers are also handled.  So
having views not be handled maybe is just an unintended omission?

-scott


[sqlite] Is this a bug ? Pragmas do not accept qualified names !

2015-08-21 Thread Scott Hess
I think you wanted:
  PRAGMA attached_db.table_info(one_table);

-scott

On Fri, Aug 21, 2015 at 7:35 AM, sqlite-mail 
wrote:

> Hello !
>
> Today I'm working with sqlite3 with attached databases and when I tried to
> get info about a table using "PRAGMA TABLE_INFO" I discovered that pragmas
> do
> not work with qualified/prefixed names like:
>
> PRAGMA table_info(attached_db.one_table)
>
>
>
> Is this a bug ?
>
> Cheers !
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Compile warnings

2015-08-20 Thread Scott Hess
Yeah, we saw this with Chromium, too.  The patch we use is below.

I'm with Dr Hipp that this is really more of a GCC issue.  If it was
literally a 0 constant, it would make sense to warn so that the code can be
removed.  But it's only a 0 if you optimize a certain way.

-scott


diff --git a/third_party/sqlite/src/src/expr.c b/third_party/sqlite/src/src/expr.c
index 4012f6c..65f211e 100644
--- a/third_party/sqlite/src/src/expr.c
+++ b/third_party/sqlite/src/src/expr.c
@@ -856,7 +856,9 @@ static Expr *exprDup(sqlite3 *db, Expr *p, int flags, u8 **pzBuffer){
   }else{
     int nSize = exprStructSize(p);
     memcpy(zAlloc, p, nSize);
-    memset(&zAlloc[nSize], 0, EXPR_FULLSIZE-nSize);
+    if( EXPR_FULLSIZE>nSize ){
+      memset(&zAlloc[nSize], 0, EXPR_FULLSIZE-nSize);
+    }
   }

   /* Set the EP_Reduced, EP_TokenOnly, and EP_Static flags appropriately. */


On Thu, Aug 20, 2015 at 3:13 AM, Bernhard Schommer <
bernhardschommer at gmail.com> wrote:

> Hi,
>
> the warning which is mentioned in the ticket
> f51d9501800de5a0fb69d5048ce6662981b461ec still occurs also with newer gcc
> versions. The ticket was closed after a gcc bug report was opened. The gcc
> bug was closed due to a missing testcase.
> I had a quick look at the problem and it seems that the warning is right,
> since in certain contexts and with optimizations enabled gcc can prove
> that EXPR_FULLSIZE - nSize == 0, and thus the triggered warning is correct.
> Replacing
> memset(&zAlloc[nSize], 0, EXPR_FULLSIZE-nSize);
> By
> if(EXPR_FULLSIZE-nSize > 0)
>   memset(&zAlloc[nSize], 0, EXPR_FULLSIZE-nSize);
> would remove the warning,
>
> Cheers,
> -Bernhard
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Determine query type

2015-08-06 Thread Scott Hess
Also consider https://www.sqlite.org/c3ref/stmt_readonly.html

-scott

On Thu, Aug 6, 2015 at 10:08 AM, Stephan Beal  wrote:

> On Thu, Aug 6, 2015 at 6:57 PM, Ben Newberg  wrote:
>
> > Excellent. This is exactly what I was looking for.
> >
>
> Great :). Now that i have some code in front of me i can say, more
> concretely, that sqlite3_column_count() is what you want.
>
> https://www.sqlite.org/c3ref/column_count.html
>
> --
> - stephan beal
> http://wanderinghorse.net/home/stephan/
> http://gplus.to/sgbeal
> "Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
> those who insist on a perfect world, freedom will have to do." -- Bigby
> Wolf
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Read data from database slow the first time but then very fast thereafter. How can I improve the speed for the first time read?

2015-08-04 Thread Scott Hess
On Tue, Aug 4, 2015 at 9:23 AM, John McKown 
wrote:

> On Tue, Aug 4, 2015 at 10:45 AM, Simon Slavin 
> wrote:
> > On 3 Aug 2015, at 1:58pm, Linquan Bai  wrote:
> > > I am trying to read large data from the database about 1 million
> records.
> > > It takes around 1min for the first time read. But if I do the same
> > process
> > > thereafter, the time is significantly reduced to 3 seconds. How can I
> > get a
> > > fast speed for the first time read?
> >
> > You can't.  Some part of your computer has pulled that data into cache,
> > and it's still in the cache when you run the process again, so it the
> data
> > doesn't need to be fetched from disk again.
> >
>
> That sounds correct to me. I don't know which OS the OP is running (likely
> Windows, which I don't know well). But I wonder if there is some way, like
> running a program before he runs his application which can tell the OS to
> "preload" the file into RAM cache. On Linux, I might do something like: "dd
> if=/path/to/sqlite-database.sqlite3 of=/dev/null bs=1024 count=100" which
> would, as a side effect, pull the first 100KiB of the file into RAM.


You could also write code to get the file underlying the database
(sqlite3_file_control with SQLITE_FCNTL_FILE_POINTER), then use the VFS
layer to manually page through the file.
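
The shape of that, as a sketch (error handling and the final short read
at EOF are glossed over):

#include <sqlite3.h>

/* Sketch: fetch the sqlite3_file behind "main" and stream through it
** via the VFS, purely to warm the OS cache. */
static void PreloadDatabase(sqlite3 *db){
  sqlite3_file *pFile = 0;
  sqlite3_int64 nByte = 0, iOff;
  char aBuf[65536];
  sqlite3_file_control(db, "main", SQLITE_FCNTL_FILE_POINTER, &pFile);
  if( pFile==0 || pFile->pMethods==0 ) return;
  pFile->pMethods->xFileSize(pFile, &nByte);
  for(iOff=0; iOff<nByte; iOff+=sizeof(aBuf)){
    pFile->pMethods->xRead(pFile, aBuf, (int)sizeof(aBuf), iOff);
  }
}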

BUT, keep in mind that you might find that you've just moved the 1min time
from query time to preload time, in which case you've just complexified
without adding anything.

If preloading makes the overall operation faster, then you could probably
see some benefit from running VACUUM.  You might also try different page
sizes (note that changing page size requires VACUUM, so make sure that
you're actually measuring page-size differences and not VACUUM
differences).  Changing to memory-mapped may change timings, too, since the
OS may predict reads differently for memory-mapped I/O versus POSIX I/O.
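
(E.g., a page-size experiment has to include the rebuild; a sketch:)

#include <sqlite3.h>

/* Sketch: a new page size only takes effect once the file is rebuilt. */
static void RebuildWithPageSize(sqlite3 *db){
  sqlite3_exec(db, "PRAGMA page_size = 8192", 0, 0, 0);
  sqlite3_exec(db, "VACUUM", 0, 0, 0);
}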

-scott


[sqlite] pragma temp_store_directory is deprecated, what is the alternative?

2015-07-30 Thread Scott Hess
Passing NULL to xOpen()'s zName parameter opens a temp file.

-scott


On Thu, Jul 30, 2015 at 4:29 PM, Howard Kapustein <
Howard.Kapustein at microsoft.com> wrote:

> >There cannot be a fully portable way, because path specifications are not
> portable
> Which begs the question, why isn't there an xGetTempFilename VFS function
> in sqlite3_vfs?
>
> Wouldn't that simplify things? Have the VFS handle it apropos, with the
> default VFS' in SQLite (or an older VFS lacking it) use the current hunt
> sequences for compat
>
> - Howard
>
> This email may contain confidential and privileged information.  Any
> unauthorized use is prohibited.  If you are not the intended recipient,
> please contact the  sender by reply email and destroy all copies of the
> original message.
>
> -Original Message-
> From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:
> sqlite-users-bounces at mailinglists.sqlite.org] On Behalf Of Simon Slavin
> Sent: Tuesday, July 28, 2015 4:31 AM
> To: General Discussion of SQLite Database <
> sqlite-users at mailinglists.sqlite.org>
> Subject: Re: [sqlite] pragma temp_store_directory is deprecated, what is
> the alternative?
>
>
> On 28 Jul 2015, at 9:55am, Paolo Bolzoni 
> wrote:
>
> > From the C interface, the way to decide the directory is setting the
> > value of sqlite3_temp_directory char pointer. As explained here:
> > https://www.sqlite.org/c3ref/temp_directory.html
> > This is also the first place sqlite3 checks.
>
> The documentation is clear:
>
> "Applications are strongly discouraged from using this global variable. It
> is required to set a temporary folder on Windows Runtime (WinRT). But for
> all other platforms, it is highly recommended that applications neither
> read nor write this variable. This global variable is a relic that exists
> for backwards compatibility of legacy applications and should be avoided in
> new projects."
>
> Also, sqlite3_temp_directory is accessed by the VFS level, by choice of
> the VFS.  You might someday switch to another VFS which ignores it.
>
> > From anywhere else sqlite3 checks the value of those env variables:
> > SQLITE_TMPDIR, TMPDIR, TMP, TEMP, USERPROFILE.
>
> All uses of SQLite are 'the C interface'.  That's all SQLite is: C code.
> The C code checks the above places, depending on which VFS you're using,
> which usually comes down to which OS you're using.
>
> The C code checks whatever environment variable the OS expects a user to
> use to set a default location for their temporary files.  (It also checks a
> lot of other places, but it will definitely check whatever the
> documentation for the OS says is the correct environment variable.)
> Generally speaking you should let the computer's user or administrator set
> this variable, since they know which drives they have attached to the
> computer, and which one they want temporary files on. You're just a
> programmer of one application, running on a computer you've never seen.
>
> The exceptions are for embedded controllers, where the programmer is the
> administrator, and that on all platforms SQLite can create huge temporary
> files and some systems don't have that much space available for temporary
> files.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] Mozilla wiki 'avoid SQLite'

2015-06-14 Thread Scott Hess
On Sun, Jun 14, 2015 at 1:19 PM, Richard Hipp  wrote:
> On 6/14/15, Scott Hess  wrote:
>> SQLite essentially gives you a set of
>> b-trees with syntactic sugar over them,
>
> SQL (and I speak in general terms here, not just of SQLite) provides
> way more than syntactic sugar over b-trees.  The syntactic sugar over
> b-trees part is the *least* of the many advantages of SQL.  Other more
> important features include:

To be clear, what I mean by this is that if you have a set of
operations which you could do in your native language, and you decide
to instead push them into SQLite, and you are concerned about
performance and memory use, then it really behooves you to be able to
describe how to accomplish those operations using SQLite's basic data
structures.  Many developers build their SQL assumptions in an
environment with a memory-constrained app server calling out to a
database server running on the beefiest machine their ops people can
acquire.  The set of things-that-are-appropriate are going to be
different in an environment where "Map a multi-gigabyte index into
memory" is a valid approach.

In case it isn't clear from my post (I have a bit of a head cold going
on, so I probably shouldn't be writing posts on public mailing lists),
IMHO SQLite has done a really solid job covering the bases for the
area of operations it has staked out, and has done a good job of not
adding features inappropriate to those goals.  It's just that IMHO
developers often make dubious assumptions about what _else_ SQLite can
do.

-scott


[sqlite] Mozilla wiki 'avoid SQLite'

2015-06-14 Thread Scott Hess
For various reasons I've ended up as the SQLite rep w/in Chromium, and
I bookmarked that page awhile back to periodically revisit.  People
often seem to believe that SQLite magically solves tons of problems
with their persistence layer, without realizing that many of their
assumptions are based on things like having a DBA to take care of
things behind the scenes, or having a distinct database server process
probably running on pretty beefy hardware.

Also, to be honest, I find that developers often don't really _think_
about what they're doing.  SQLite essentially gives you a set of
b-trees with syntactic sugar over them, so if you cannot adequately
describe how your schema works in terms of raw b-trees, then SQLite
isn't going to be able to magically make things awesome behind your
back.  They also don't think about the evolution of their schema over
time.  So you see developers copy/paste some other code and make it
worse, and then never float things by an expert to see if they are
making poor decisions, and three years later someone has to
disentangle the horror.

Also, heaven defend me from people writing ad-hoc array-based query
systems that automatically generate bad SQL.  My experience with ORM
systems over the years has convinced me that if you spin up an entire
dedicated team to work really hard on that problem alone for years, it
will still be quite unsatisfying in the end.  One developer making a
bunch of stuff up and then abandoning it is just a recipe for later
pain.

There are many places where I think developers would have been better
served with a shared nosql-type store (like leveldb) mapping keys to
something in a decent serialization system (like protocol buffers), and
which would have some well-defined and automated procedures for
handling things like transaction scheduling and recovery from
filesystem corruption and other errors.  It wouldn't be as good as
SQLite can be, but it wouldn't be as bad as the way many people use
SQLite in practice.

[To be fair to SQLite, much of my position on this isn't much
different than my position on MySQL in a past position.  SQL is a
challenge for many developers.]

-scott


On Sun, Jun 14, 2015 at 6:08 AM, Jean Chevalier  wrote:
> Somewhat contradictory: the Mozilla Foundation is a member of the SQLite
> Consortium while their performance wiki prominently features a warning to
> developers against using SQLite, allegedly for performance reasons. Guard me
> from my friends...
>
> http://wiki.mozilla.org/Performance/Avoid_SQLite_In_Your_Next_Firefox_Feature
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] User-defined types

2015-06-04 Thread Scott Hess
On Thu, Jun 4, 2015 at 1:54 AM, Dominique Devienne  
wrote:
> On Thu, Jun 4, 2015 at 10:20 AM, Christopher Vance  
> wrote:
>> If you really want your own types, you could always bundle with ASN.1 and
>> store the result as a blob.
>
> Or Protobuf, or ... But you're back to option 1, you must store somewhere
> that knowledge, and it's an app-convention, SQL and SQLite don't cooperate.
>
> That's far from first-class UDTs.

Who else other than your app is going to read the data directly, though?

Many other SQL systems end up becoming the de facto integration point
between disparate backend and frontend systems.  So you end up growing
additional features to let everyone play in the database using the
same set of conventions put in place by the DBA.  That doesn't make it
a really strong design for the overall system, though, even if it is
convenient and incremental to build.  I don't think the argument is as
strong for things like this in SQLite, simply because often enough
SQLite isn't really in a position to leverage any of it any better
than your app code can, and just delegating it to your app code works
out pretty well for SQLite in terms of support burden and getting
bogged down by complexity.

-scott


[sqlite] Should journal_mode=PERSIST plus secure_delete=ON be allowed?

2015-06-01 Thread Scott Hess
Someone just pointed something out which basically comes down to if
you're using these two together:
  PRAGMA secure_delete=ON;
  PRAGMA journal_mode=PERSIST;
then the first makes sure that evidence of deleted data should be
missing from the main database file, but the second can leak such
evidence in the journal file.  For now I'm switching to TRUNCATE, but
it did make me wonder.  Should PERSIST devolve to TRUNCATE
automatically in this combination?  Or maybe PERSIST could overwrite
the first journal_size_limit bytes of the file?  [Obviously that could
be optimized to be smaller in many cases.]

This isn't so much a feature request as wondering about what is
correct and the least surprising outcome.

-scott


[sqlite] Determine type of prepared statement via C Interface?

2015-05-08 Thread Scott Hess
On Fri, May 8, 2015 at 11:20 AM, Simon Slavin  wrote:
> On 8 May 2015, at 6:43pm, Peter Aronson  wrote:
>> Well, there's sqlite3_stmt_readonly which appears to do pretty much what 
>> you're asking for: https://www.sqlite.org/c3ref/stmt_readonly.html.
>
> Suppose you have this statement
>
> DELETE FROM myTable WHERE id=600
>
> and there are no rows with id=600.  I presume that the function will return 
> FALSE but a literal reading of the description says that it will return TRUE. 
>  If I'm right then it might be useful to rewrite the documentation a little.

Why would you presume that it would return FALSE?  That would
essentially require executing the statement, which definitely _would_
be something useful to add to the documentation (but I doubt it does
that).

-scott


[sqlite] Determine type of prepared statement via C Interface?

2015-05-08 Thread Scott Hess
sqlite3_stmt_readonly(stmt)?  This hits INSERT/UPDATE/DELETE, but not
BEGIN/COMMIT/ROLLBACK.

Or sqlite3_sql(stmt) if you want to do it heuristically by inspecting
the statement.
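
Combining the two might look like this sketch (public API only; the
text sniffing is a crude heuristic that assumes no leading comments):

#include <sqlite3.h>

/* Sketch: decide whether a prepared statement needs a write lock.
** sqlite3_stmt_readonly() reports transaction-control statements as
** read-only, so sniff those out of the SQL text by hand. */
int StatementNeedsWriteLock(sqlite3_stmt *pStmt){
  const char *zSql;
  if( !sqlite3_stmt_readonly(pStmt) ) return 1;
  zSql = sqlite3_sql(pStmt);
  if( zSql==0 ) return 1;  /* unknown: be conservative */
  while( *zSql==' ' || *zSql=='\t' || *zSql=='\n' || *zSql=='\r' ) zSql++;
  return sqlite3_strnicmp(zSql, "BEGIN", 5)==0
      || sqlite3_strnicmp(zSql, "COMMIT", 6)==0
      || sqlite3_strnicmp(zSql, "END", 3)==0
      || sqlite3_strnicmp(zSql, "ROLLBACK", 8)==0;
}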

I think a "BEGIN READONLY" would be a sensible transaction type.
Having a wrapper API force the developer to select read-only or
immediate for the transaction type up front has some benefits.  My
experience is that most developers don't write their code in a way
which allows them to correctly handle SQLITE_BUSY after the
transaction starts, and once they've written it it can be hard to
refactor things to allow the overall transaction to be correctly
restarted.

-scott


On Fri, May 8, 2015 at 10:01 AM, Stephen Broberg  
wrote:
> Using versions 3.8.5 and 3.7.7.1:
>
> Is there something in the SQLite C API that will tell what type of statement 
> a prepared sqlite3_stmt is (e.g., insert, update, delete, select, etc.).  
> Specifically, I'd like to know whether a statement that either has run or is 
> about to be run (via sqlite3_step) is a read-only (select) or write (pretty 
> much everything else) operation.  I'm implementing a reader/writer lock 
> mechanism around our access to the sqlite database, and as the consumer of 
> the interface will need to specify the lock type when they construct a query 
> object (save point objects would always be write, of course), I'd want to 
> runtime sanity-check their choice after preparing the statement, and log or 
> throw an error if they flagged a non-select statement as a reader.  Perhaps 
> it's possible to interrogate the connection object after a statement runs to 
> determine if the database was changed?
>
> A possible approach might be to use sqlite3_update_hook or 
> sqlite3_commit_hook to identify the statements that modify the db.  Does 
> sqlite3_commit_hook fire when a statement is implicitly committed (e.g., an 
> update executed outside the context of a savepoint when in autocommit mode)?  
> Another option might be to use two database connections, one read-only, the 
> other normal, but the design of our current system makes this option a major 
> redesign (this is legacy code I?ve inherited.  If I had my druthers, I?d be 
> using a dedicated sqlite connection per thread and running in multi-thread 
> mode, but right now that?s too great a refactoring task).
>
> Or am I doing this wrong?  Our multithreaded app produces a lot of 
> SQLITE_BUSY results when concurrency increases; serializing our transactions, 
> both read & write, globally solved this problem and increased throughput, but 
> I?m interested in moving beyond a simple mutex to a read/write one.  We?re 
> running Serialized mode, but is there some other sqlite configuration that 
> automatically does what I?m trying to accomplish: serialize all write 
> operations and save points for all threads, allow selects to run concurrently?
>
> Thanks.
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] saving :memory:database to disk

2015-05-07 Thread Scott Hess
On Thu, May 7, 2015 at 11:03 AM, Scott Hess  wrote:
> On Thu, May 7, 2015 at 10:53 AM, Paul Sanderson wrote:
>> I am contemplating a change to a program whereby a database is
>> initially created in memory and then later if my users choose they can
>> save it to disk and then switch to using the disk based DB.
>>
>> I can obviously create a new disk based db, iterate through
>> sqlite_master and then populate each table. But I just wondered
>> whether anyone had any suggestions of a better way of doing this?
>
> You could use the backup API to backup the in-memory database to disk.

Additionally, VACUUM is implemented essentially as "Create a new
database, copy everything over using SQL commands, replace the
original database".  If you decide to go the route of scanning
sqlite_master and populating things that way, I strongly recommend
using the VACUUM implementation as a guide, since it will have already
considered various interesting edge cases and optimizations.

-scott


[sqlite] saving :memory:database to disk

2015-05-07 Thread Scott Hess
On Thu, May 7, 2015 at 10:53 AM, Paul Sanderson
 wrote:
> I am contemplating a change to a program whereby a database is
> initially created in memory and then later if my users choose they can
> save it to disk and then switch to using the disk based DB.
>
> I can obviously create a new disk based db, iterate through
> sqlite_master and then populate each table. But I just wondered
> whether anyone had any suggestions of a better way of doing this?

You could use the backup API to backup the in-memory database to disk.
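
A sketch, following the example in the backup documentation:

#include <sqlite3.h>

/* Sketch: copy the open in-memory database into a file database. */
int SaveToDisk(sqlite3 *pInMem, const char *zFilename){
  sqlite3 *pFile = 0;
  int rc = sqlite3_open(zFilename, &pFile);
  if( rc==SQLITE_OK ){
    sqlite3_backup *pBackup =
        sqlite3_backup_init(pFile, "main", pInMem, "main");
    if( pBackup ){
      sqlite3_backup_step(pBackup, -1);  /* copy everything in one pass */
      sqlite3_backup_finish(pBackup);
    }
    rc = sqlite3_errcode(pFile);
  }
  sqlite3_close(pFile);
  return rc;
}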

-scott


[sqlite] Thoughts on storing arrays of complex numbers (Solved)

2015-04-24 Thread Scott Hess
On Fri, Apr 24, 2015 at 12:01 PM, Drago, William @ CSG - NARDA-MITEQ
 wrote:
> Since the data is received from the analyzer as an array of
> real/imaginary pairs (R,I,R,I,R,I,R,I...), 3202 elements total,
> that's how I will blob and store it. This is the simplest way
> to add it to the database. It's just one more field along with
> all the other data. If I ever need to operate on that trace
> data again it's a simple matter of pulling out of the database
> and un-blobbing it.
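
(For reference, the store-as-a-blob step being described is just a
bind; a sketch with a made-up table:)

#include <sqlite3.h>

/* Sketch: store the interleaved (R,I,R,I,...) double array as one blob. */
static int StoreTrace(sqlite3 *db, const double *aPairs, int nDoubles){
  sqlite3_stmt *pStmt = 0;
  int rc = sqlite3_prepare_v2(db,
      "INSERT INTO traces(data) VALUES(?1)", -1, &pStmt, 0);
  if( rc==SQLITE_OK ){
    sqlite3_bind_blob(pStmt, 1, aPairs, nDoubles*(int)sizeof(double),
                      SQLITE_TRANSIENT);
    sqlite3_step(pStmt);
    rc = sqlite3_finalize(pStmt);
  }
  return rc;
}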

In a case like this, I don't think I've ever come to regret suggesting
the use of a serialization library, like protobuf (or cap'n proto or
thrift or avro or ...).  When you make your ad-hoc serialization
strategy, it works swell for six months, then a new requirement comes
downstream and you have to figure out a new format plus how to convert
all the old data.  If that happens two or three times, you start to
get a combinatoric problem which makes it hard to reason about how a
change is going to affect existing installs.  Most such requirements
are for an additional field per array index, which many serialization
libraries can support pretty transparently.

I don't have any specific recommendations along this front, as usually
projects already have one (or two or three) already linked in to cover
unrelated cases (reading config files, importing data from another
tool, etc).

-scott

