[sqlite] Prevent database file from being overwritten by other processes

2015-04-08 Thread Fabian Pröbstl
Hi there!

Currently, we are using SQLite as our application file format for a Windows 
7/C#/System.Data.SQLite based desktop application. We only allow one instance 
to open the file by running "set locking_mode=EXCLUSIVE;BEGIN EXCLUSIVE;COMMIT" 
when connecting to the database.
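
For reference, that sequence spelled out as SQL (locking_mode is set via
PRAGMA; the EXCLUSIVE transaction forces SQLite to actually take the lock,
and exclusive locking mode then keeps holding it after the COMMIT):

  PRAGMA locking_mode = EXCLUSIVE;
  BEGIN EXCLUSIVE;
  COMMIT;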

This all works fine; however, a user can still open Windows Explorer and 
copy-paste a file with the same name but different content (e.g. an empty file) over 
an existing, exclusively locked database. From what I found out with the 
OpenedFilesView tool, SQLite seems to open the file with SHARED_WRITE, which 
explains why *any* process can overwrite the contents.

Is there an easy way of configuring / changing this so that SHARED_WRITE is not 
acquired? Will SQLite even function? Is it just easier to create a hidden copy 
and work on that?

Thanks for the advice
Fabian


[sqlite] Attach via prepared statement / Multiple attach

2015-03-11 Thread Fabian Stumpf
Thanks for your reply, James!

> Parameterized queries  in general let you replace any data value
> (and not metadata).  For instance, you can replace a column value, but
> not a column name.  Using that as a guide, I'm a little surprised that
>
> ATTACH DATABASE ? AS ?
>
> works, because the second parameter is metadata.

To be honest, by 'works' I currently mean "The statement prepares
alright, I can bind both parameters and execute it without ever being
presented with anything but SQLITE_OK or SQLITE_DONE". I have not
actually yet checked if the database was attached using the bound
name, so it might just have been attached with a name of '?'. I'll
check ASAP.
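
For the record, one quick way to check is to list the attached databases
right after the statement executes (output sketched; file names hypothetical):

  sqlite> PRAGMA database_list;
  0|main|/path/to/main.db
  2|aux1|/path/to/attached.db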

As for my second question (Attaching the same file multiple times
under different names), I found the answer in the documentation for
the DETACH statement[0].
It might be useful to also include that fact in the ATTACH documentation :)

Thanks for your time!

[0] https://www.sqlite.org/lang_detach.html

--
FJS


[sqlite] Attach via prepared statement / Multiple attach

2015-03-09 Thread Fabian Stumpf
Hi everyone,

I am currently using
> ATTACH DATABASE ? AS ?;
to dynamically attach databases to a connection.
In some cases, the same physical database file is attached under
different names.

This all seems to work perfectly, although the documentation for the
ATTACH statement specifies the name as type "database-name", not as
"expr", so I'm not sure if specifying a bind parameter is "legal". So
I'd like to know if this behaviour is intended and thus safe to rely
on (and will it continue to be supported)? Same goes for attaching the
same file multiple times (under different names).
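
Concretely, what I am doing amounts to the following (file name hypothetical):

  ATTACH DATABASE 'data.db' AS snap_a;
  ATTACH DATABASE 'data.db' AS snap_b;  -- same file, second name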

--
Kind regards,
FJS


Re: [sqlite] In-memory DB slower than disk-based?

2014-08-06 Thread Fabian Giesen

On 8/6/2014 11:51 AM, Jensen, Vern wrote:

*bump*

Anyone?


Not sure if this is your problem, but this kind of load puts a lot of 
stress on SQLite's page cache, which might well perform worse than your 
OS's page cache (used for "real" disk I/O) does, especially under high 
contention.


Also, SQLite's defaults wrt. threading are set up to be safe, which 
means that when in doubt about what your code might be doing, SQLite 
will serialize.


I did a bit of SQLite perf tuning recently and here's two things you 
might want to try:


1. What's your value for "flags"? SQLite by default (pessimistically) 
assumes that you might be sharing a DB connection between multiple 
threads, and thus wraps essentially all API functions in locks to make 
sure this is safe. In a heavily multi-threaded scenario, I would 
recommend that you manually ensure that each connection is only used by 
one thread at a time, and then add SQLITE_OPEN_NOMUTEX to your flags. 
This reduces lock traffic substantially. (If your connection was 
per-thread already, doing so will not reduce contention or have any 
impact on your asymptotic perf or scalability, but locks aren't free.)


2. Before sqlite3_initialize, try

  "sqlite3_config(SQLITE_CONFIG_MEMSTATUS, 0);"

This disables some API functions that provide access to memory stats, 
but avoids a global lock on every memory allocation/free, and SQLite 
does a lot of them. If you don't need these memory statistics, turning 
this off can make a noticeable difference in scalability.


I'm not sure if this will affect page cache throughput specifically, but 
these two things are both fairly easy to try.


-Fabian


Re: [sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Fabian Giesen

On 6/10/2014 2:25 PM, Simon Slavin wrote:


On 10 Jun 2014, at 7:22pm, Fabian Giesen <fabi...@radgametools.com> wrote:


  1. Bulk insertion of relatively large amounts of data (low hundreds of 
megabytes to tens of gigabytes); a typical profile generates on the order of 
10MB of sampling data per second.


I presume you do your INSERTs inside a transaction.  If not, you should.


Yes, it's all wrapped in transactions; typically on the order of 10-100
transactions a second. We're talking (order of magnitude) about 100k rows
inserted per second of capture here; it would be nowhere near fast enough if
every INSERT were its own transaction.


Each of these statements ends up scanning over the entire table once. Since the 
tables in question are quite large, I would love to have some way to create 
multiple indexes in one pass.


Not possible in current SQLite.  No standard syntax for doing it in SQL.  A 
quick google suggests that no other popular DBMS implements it.


I beg to differ.

MySQL's InnoDB has it, for example:

  http://dev.mysql.com/doc/refman/5.5/en/innodb-create-index-examples.html


And ... I'm not sure it would be that much faster.  The majority of the work 
it's doing is writing.  Reading the existing table data is fast and easy, but 
writing new data and figuring out the pointers takes the majority of the time.  
Because of this I don't think it likely that this feature will be added to 
standard SQLite.  Of course, if you really care about this you could add it 
yourself.


The majority of the work is sorting, as Richard correctly points out,
not writing. The indexes in question are all on one or two integer keys
so they don't end up particularly big.

The average sample record comes out at ~100 bytes. All the record fields are
integer values; anything more complicated associated with a sample is
stored as unique records in separate tables and referenced via foreign
keys.


To optimize this, when you create the table put the columns you're going to 
index near the beginning of the rows.  SQLite stops reading the data of a table 
row once it has all the columns it needs.  I don't know whether this will make 
a lot of difference but it won't do any harm.


When the dataset is small enough to fit in memory (this is on
workstation PCs, so this means "less than a few gigabytes") we're CPU
bound (doing sorting, mostly, as Richard correctly suggested).

However, looking at the sorting time vs. number of records, sorting (in
that scenario, for our indices) typically proceeds at well over 1M
records/s (depends on the machine, of course; on my work PC it's
closer to 1.4Mrecs/s).

At that rate, in an out-of-core situation, we would need to be reading
~140MB/s of records, sustained, during index build time, plus the write
bandwidth for the resulting index (a small fraction; the indexes on the
large tables are all on one or two integer keys), to not become I/O
bound. That's possible with a good hard drive (and most SSDs), but it's
closer than I'm fully comfortable with.

The reason I'm asking is because we're seeing customers running into
disproportionately slow indexing times at the end of long (~1h!)
capture runs; since the dataset size is the only thing that's
fundamentally different for longer runs, I started looking into it. I'm
well aware this is an extreme case in various ways though. I just wanted
to ask to make sure there wasn't a way I was missing!


Also, store your time as a numeric value (Unix epoch or Julian day) rather than 
a string.  Numbers are faster to handle than strings.  INTEGERs are faster to 
handle than REALs.


:-)

The time fields are 64-bit ints. The exact meaning depends on the
capture target platform (we support several) but usually it's either
a system-synchronized CPU clock cycle counter (when available) or
ticks for a high-frequency bus/event timer.

-Fabian


[sqlite] Building multiple indexes in one scan over a table

2014-06-10 Thread Fabian Giesen

Hi,

I'm working on a profiler that uses SQLite to store and index the results.

Due to the nature of the task, the workload ends up splitting into 
essentially three phases:


1. Bulk insertion of relatively large amounts of data (low hundreds of 
megabytes to tens of gigabytes); a typical profile generates on the 
order of 10MB of sampling data per second.


2. Indexing. The first phase normally runs without any indexes on the 
sample tables, to maximize insertion throughput. In the second phase we 
create the indexes we need to retrieve the samples efficiently. (We 
found "batch-indexing" after the fact to generally lead to shorter 
overall processing times.)


3. Browse the results. Once the samples are indexed, the user can look 
around, analyze the results using queries, and so forth. This is 
essentially read-only.


Phase 2 is crucial for quick retrieval, but at the same time it 
essentially amounts to wait time for the user, so we'd like to get it 
over with as quickly as possible. Now, looking at what happens in there, 
I noticed that we often end up creating several indexes on the same 
table back-to-back:


  CREATE INDEX samples_by_time ON samples(time);
  CREATE INDEX samples_by_name ON samples(name_id);
  CREATE INDEX samples_by_type ON samples(type_id);

Each of these statements ends up scanning over the entire table once. 
Since the tables in question are quite large, I would love to have some 
way to create multiple indexes in one pass. A cursory examination of the 
VDBE code produced for the CREATE INDEX statements indicates that 
SQLite's backend should be capable of doing so (though I'm no expert), 
but I don't see a way to express what I want in standard SQL.


So, is there any way to create multiple indexes for the same table in 
one operation in SQLite? If not, how hard would it be to add?


Thanks,
-Fabian


Re: [sqlite] ORDER BY DESC after GROUP BY not using INDEX - intentionally?

2013-10-21 Thread Fabian Büttner
Thanks. I think using GROUP BY without aggregates is a strange way to 
remove duplicates, anyway.

> Not intentional.  SQLite simply fails to recognize that by using the GROUP
> BY in descending order it could avoid the ORDER BY clause.  This is an
> optimization that we have never considered because it has never come up
> before.





[sqlite] ORDER BY DESC after GROUP BY not using INDEX - intentionally?

2013-10-21 Thread Fabian Büttner

Hi,

I have been thinking about a question on stackoverflow 
(http://stackoverflow.com/questions/19236363/select-distinct-faster-than-group-by), 
where some SQL framework removes duplicates from results using GROUP BY 
instead of DISTINCT.
I don't want to debate whether that is a good idea. However, the 
core of the problem is the creation of temp b-trees when using ORDER BY 
... DESC after GROUP BY.
I wondered if the construction of a temp b-tree in the third query is 
intentional / by design?


I am using sqlite 3.8.1.

sqlite> PRAGMA legacy_file_format=OFF;

sqlite> create table test1 (x INTEGER);
sqlite> create index test1_idx on test1(x);
sqlite> explain query plan select x from test1 group by x order by x;
selectid  order  from  detail
--------  -----  ----  ------------------------------------------------
0         0      0     SCAN TABLE test1 USING COVERING INDEX test1_idx


sqlite> create table test2 (x INTEGER);
sqlite> create index test2_idx on test2(x);
sqlite> explain query plan select x from test2 group by x order by x desc;
selectid  order  from  detail
--------  -----  ----  ------------------------------------------------
0         0      0     SCAN TABLE test2 USING COVERING INDEX test2_idx
0         0      0     USE TEMP B-TREE FOR ORDER BY

sqlite> create table test3 (x INTEGER);
sqlite> create index test3_idx on test3(x desc);
sqlite> explain query plan select x from test3 group by x order by x desc;
selectid  order  from  detail
--------  -----  ----  ------------------------------------------------
0         0      0     SCAN TABLE test3 USING COVERING INDEX test3_idx
0         0      0     USE TEMP B-TREE FOR ORDER BY

To double check:

sqlite> explain query plan select x from test3 order by x desc;
selectid  order  from  detail
--------  -----  ----  ------------------------------------------------
0         0      0     SCAN TABLE test3 USING COVERING INDEX test3_idx



Regards
Fabian


Re: [sqlite] SQLite keeps on searching endlessly

2013-10-19 Thread Fabian Büttner



I cannot definitely solve your problem but I can think of some things to try.  
First, do these:

ANALYZE;
CREATE INDEX map_dsn ON map (d, s, n);
CREATE INDEX map_dns ON map (d, n, s);

then execute the same SELECT.  Does it have the same problem?  Does the 
EXPLAIN QUERY PLAN tell you which of the (now) three indexes SQLite has chosen?


Shouldn't ANALYZE be run _after_ creating the indexes?

sqlite> create table t(x int, y int);
sqlite> insert into t values (1,1), (2,1), (3,3);
sqlite> analyze t;
sqlite> select * from sqlite_stat1;
t||3
sqlite> create index i on t(x,y);
sqlite> select * from sqlite_stat1;
t||3
sqlite> analyze t;
sqlite> select * from sqlite_stat1;
t|i|3 1 1

Fabian


Re: [sqlite] SQLite keeps on searching endlessly

2013-10-19 Thread Fabian Büttner
You may want to put the columns with the highest selectivity first in 
your index.

The device 15 has nearly 10 entries in the table, while the remainder of
the 600 million records belong to another device.

E.g., CREATE INDEX map_index ON map (d, ...);

Also, you should run ANALYZE map so that the query planner has accurate 
information about your index.


Fabian



Re: [sqlite] Comparing two tables column by column

2013-07-29 Thread Fabian Klebert
Wouldn't 

SELECT * FROM table1
EXCEPT
SELECT * FROM table2

solve this problem?
I think it does for the example provided. Not sure if it would work in a 
real-world environment.
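
Note that EXCEPT only reports rows of the first SELECT that are missing from
the second, so a full two-way comparison also needs the opposite direction:

  SELECT * FROM table2
  EXCEPT
  SELECT * FROM table1;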


-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Stephen Chrzanowski
Sent: Monday, 29 July 2013 13:01
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Comparing two tables column by column

To be fair to me, the example had the same column names.  If the two tables 
have the same column names, then having a bit of extra code to tag on the 
column name + "_1" might have worked.  As my first reply answered, untested. ;)


On Mon, Jul 29, 2013 at 6:46 AM, Clemens Ladisch  wrote:

> Simon Slavin wrote:
> > On 29 Jul 2013, at 4:03am, Fehmi Noyan ISI  wrote:
> >> One point I forgot to mention; the number of columns is unknown.
> >
> > There is no way in SQL to say "Give me the contents of all the columns
> > of a row of a table in an unambiguous format."
>
> Well, just "give me" could be done with "SELECT *", but it is almost 
> impossible to compare such columns in SQL.
>
> If the tables have the same number _and_ names of columns, it would be 
> possible to use a NATURAL JOIN, and combine this with an outer join to 
> get non-matching records:
>
> SELECT table1.* FROM table1 NATURAL LEFT JOIN table2 WHERE 
> table2.rowid IS NULL
>
> This will return those records from table1 that do not have a matching 
> record in table2.  For the other direction, run the same query with 
> "table1" and "table2" exchanged.
>
>
> Regards,
> Clemens


Re: [sqlite] FTS Find Tokens at Record Start

2013-04-05 Thread Fabian Klebert
You might want to check the following:

SELECT word FROM fts WHERE fts MATCH '^token'

Beginning with 3.7.9 this should only return records that have 'token' at the 
beginning of the record.
See changelog of 3.7.9:

If a search token (on the right-hand side of the MATCH operator) in FTS4 begins 
with "^" then that token must be the first in its field of the document. ** 
Potentially Incompatible Change **
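
A quick illustration (a sketch; table and contents hypothetical):

  CREATE VIRTUAL TABLE fts USING fts4(word);
  INSERT INTO fts VALUES('token first here');
  INSERT INTO fts VALUES('here token is second');
  SELECT word FROM fts WHERE fts MATCH '^token';  -- returns only the first row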



-----Original Message-----
From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] 
On Behalf Of Clemens Ladisch
Sent: Friday, 5 April 2013 15:41
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] FTS Find Tokens at Record Start

Paul Vercellotti wrote:
> using FTS, how do you match records that contain certain tokens beginning at 
> the start of the record

Apparently, this information is not stored in the FTS index.

Search for the tokens, then manually check with LIKE or something like that.


Regards,
Clemens


[sqlite] FTS: Phrase queries

2012-07-13 Thread Fabian
Ever since I started using FTS extensively, I frequently ran into this
limitation:

  ** TODO: Strangely, it is not possible to associate a column specifier
  ** with a quoted phrase, only with a single token. Not sure if this was
  ** an implementation artifact or an intentional decision when fts3 was
  ** first implemented. Whichever it was, this module duplicates the
  ** limitation.

Is this ever planned to be fixed? Removing the limitation doesn't seem like
it would break any backwards compatibility.


Re: [sqlite] FTS: Phrase queries

2011-11-27 Thread Fabian
2011/11/14 nobre 

>
> Comment from the source:
>
>  ** TODO: Strangely, it is not possible to associate a column specifier
>  ** with a quoted phrase, only with a single token. Not sure if this was
>  ** an implementation artifact or an intentional decision when fts3 was
>  ** first implemented. Whichever it was, this module duplicates the
>  ** limitation.
>
> So, it seems it's really not possible.
> Rafael


Thanks, I hope this limitation will be lifted someday. Fixing it will not
break any existing queries and is fully backwards compatible, so I don't
understand why FTS4 duplicates that (faulty) behaviour.


[sqlite] FTS: Phrase queries

2011-11-13 Thread Fabian
When I have a basic FTS query that needs to be restricted to a column, I
can write it in two ways:

1.) WHERE column MATCH 'apple'
2.) WHERE table MATCH 'column:apple'

But when I have a phrase query, I can only write it in one way:

1.) WHERE column MATCH '"apple juice"'

The problem is that when I want to combine the queries (search for 'apple'
in column1 and for "apple juice" in column2) I cannot write the query like:

WHERE column1 MATCH 'apple' AND column2 MATCH '"apple juice"'

Nor can I write it like:

WHERE table MATCH 'column1:apple column2:"apple juice"'

So this fairly simple query seems impossible in FTS? Or does anyone know
how to work around this (without doing two separate queries)?


[sqlite] JOIN vs IN

2011-11-11 Thread Fabian
Suppose I have two tables, and I want to look for a value in the first
table, but display the columns from the second table. The most obvious way
would be joining them on rowid. But I don't need to SELECT any columns from
the first table, and it's an FTS4 table (which always joins a bit slower
than real tables), so I currently do:

SELECT * FROM table2 WHERE rowid IN
  (SELECT rowid FROM table1 WHERE amount > 500)

It returns the same results, but it doesn't seem much faster. Is there any
performance difference to be expected from using IN instead of JOIN, or
does SQLite internally rewrite JOIN queries to something similar to IN,
which would explain why they perform nearly the same?
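
For reference, the join form being compared is a sketch like this (assuming
the rowids line up between the two tables):

  SELECT table2.* FROM table2
  JOIN table1 ON table1.rowid = table2.rowid
  WHERE table1.amount > 500;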


Re: [sqlite] VACUUM

2011-11-10 Thread Fabian
2011/11/10 Richard Hipp 

>
> Because when you are inserting the Nth row, SQLite has no idea of how many
> more rows will follow or how big the subsequent rows will be, so it has no
> way to reserve contiguous space sufficient to hold them all.  The result is
> that parts of the table and parts of the indices become interleaved in the
> file.
>
>
But nothing is written to disk until I call commit (I'm using
journal_mode=memory), so when SQLite has to start writing the data, it
knows the exact total number of rows, and also that no other rows will
follow.

But then again, maybe the format of the journal in memory is an exact copy
of the bytes it will write to disk, and in that case I understand that it
would be very inefficient to start shuffling things around instead of just
dumping it. I pictured it like a temporary table, in which case it would be
fairly easy to restructure things before writing.

Thanks for your explanation!


[sqlite] VACUUM

2011-11-10 Thread Fabian
I'm trying to understand the following: when I create a new database and
fill it with rows, why can it not be committed to disk the same way VACUUM
would write it? Currently I'm trying to manually "vacuum" the database by
inserting rows sorted by table and by rowid, but as soon as a table
contains INDEXes I cannot reproduce anywhere near the same output as a real
VACUUM, because VACUUM writes the indexes first instead of last, for
example. Is there some trick to force VACUUM-like output when building the
database, and avoid having to build it twice?


Re: [sqlite] FTS4: Datatypes

2011-11-09 Thread Fabian
It seems I finally have some sort of a solution:

"As of SQLite version 3.7.9, FTS4 supports a new option - "content"
-designed to extend FTS4 to support the creation of full-text indexes
where:

+
+* The indexed documents are not stored within the SQLite database
+  at all (a "contentless" FTS4 table), or
+
+* The indexed documents are stored in a database table created and
+  managed by the user (an "external content" FTS4 table).


Using this new option I can keep all data together in a regular table,
and use FTS purely as an index.
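
A minimal sketch of the external-content form (table and column names
hypothetical):

  -- the regular table holds everything, including the INTEGER columns
  CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT, amount INTEGER);
  -- the FTS4 table indexes the text column without storing it again
  CREATE VIRTUAL TABLE docs_fts USING fts4(content="docs", body);
  -- the index is not kept in sync automatically; insert with matching rowids
  INSERT INTO docs(id, body, amount) VALUES(1, 'some text', 42);
  INSERT INTO docs_fts(docid, body) VALUES(1, 'some text');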

I wonder why this new option wasn't mentioned in the changelog of
3.7.9; maybe because it's still experimental?


[sqlite] FTS4: Datatypes

2011-11-09 Thread Fabian
Ever since I started using FTS, I'm always confronted with the problem that
I need two tables: one FTS table with the TEXT columns, and one normal
table with the INTEGER columns for numerical values. This causes all kinds
of problems (keeping the rowids in sync, complex queries, etc.).

From a previous post from the author of FTS, I understand that it will be
hard to implement datatypes for the virtual tables that FTS uses, so I
already ruled that out as a solution.

I could just store the numerical values as TEXT in the FTS table, but even
the simplest operation (sorting for example) will be a challenge, since
I have to format the numbers in such a way that they sort right (prepending
zeroes to make them all the same length). But suppose I take all that into
account, there still is another problem: FTS will unnecessarily index all
those extra columns.

So would it be an idea to have a simple flag (NOINDEX for example) which
dictates that a certain column shouldn't be indexed by FTS, just stored? It
may be a lot simpler to implement than actual datatypes, and I could work
around the other limitations myself.


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 GB 

> Maybe you could try to use a pagesize that matches the size of a disk
> allocation unit or memory page. For Windows since a typical NTFS partition
> has a clustersize of 4KiB - which happens to also be the size of a memory
> page - a pagesize of 4096 Byte seems to be a good compromise between
> pleasing the cache manager and waste of space.


I already used a page_size of 4096 for all the benchmarks, together with a
large (300MB) cache_size, but thanks for the suggestion anyway!


> You could also investigate the effect of VACUUMing and ANALYZEing the
> Database. If you have multiple indexes on tables try compiling sqlite
> defining the SQLITE_ENABLE_STAT3 switch to get even better results from
> ANALYZE.
>

I'm running these tests on a very simple database: 1 table and 1 column, so
ANALYZE shouldn't have any effect. And I already tested running VACUUM on
the database, but it didn't help (which seems logical, because I start with
a fresh db each time).


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Black, Michael (IS) 

OK...you're right...a reboot kills it.
>

I'm glad someone was able to reproduce this on Linux, ruling out the
possibility that it's a Windows issue.


> However, reboot again and add "select count(*) from a;" as the first line
> of gendat2.sql
>

So if a simple SELECT COUNT(*) can speed up the same insert from 90 seconds
to 7 seconds (including the count), does this confirm it's a sequential vs
random reads problem?


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Nico Williams 

>
> I don't get it.  You're reading practically the whole file in a random
> manner, which is painfully slow, so why can't you read the file in one
> fell swoop (i.e., sequential reads)??
>

I'm only reading the whole file when the number of additional inserts is
high enough to cause the whole index to be read from disk. But if I always
pre-cache the database, it will downgrade performance for cases when only
10 inserts need to be done. And I'd like to avoid having some fuzzy logic
that tries to predict which of the two methods is going to be faster.

Besides, pre-caching the file sounds easier than it is to accomplish,
because none of the methods suggested on this list worked on Windows (for
example copying the file to null). Windows and the hard drive have their own
logic to decide which data to cache, and I haven't found a simple way to
force a certain file into the cache.

> Or, if FTS really works better, then use that.


I will, but I'm trying to understand the issue that I'm facing, not just
work around it. It seems that FTS doesn't need to read the whole index from
disk, so I'm trying to pinpoint the difference. My best guess is that it
creates a fresh b-tree for the additional inserts, causing the boost in
performance.


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Nico Williams 

>
> What's wrong with reading the whole file into memory at boot time as a
> way to prime the cache?  Rebooting always takes some time, mostly the
> time to read all sorts of files.
>
>
It's a desktop application; I cannot pre-cache anything before the user
launches the app. After the launch a couple of inserts need to be done, but
having to read the whole database file into memory just to be able to
perform those inserts doesn't seem like a scalable solution.


[sqlite] SQLite Error (19)

2011-11-09 Thread Fabian
I'm in the process of converting a normal table to an FTS virtual table, and
I'm noticing different behaviour if I try to insert duplicate rowids. I'm
using the 'INSERT OR IGNORE' statement, and on the normal table I can try
to insert millions of duplicates very quickly, but on the FTS table it's
very slow, and for every failed insert I get:

SQLite error (19): abort at 10 in [INSERT INTO 'main'.'table_content'
VALUES(?,(?))]: PRIMARY KEY must be unique

So a possible explanation of the performance difference would be that the
virtual table is sending a TRACE message for each row, but how can I
disable/prevent that? Or is there another reason the virtual table is slow
with duplicates?


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Luuk 

> On 09-11-2011 17:23, Black, Michael (IS) wrote:
>
>> time sqlite3
>> 19.307u 0.082s 0:19.40 99.8%  0+0k 0+0io 0pf+0w
>>
>> time sqlite3 <  index2.sql
>> 19.266u 0.092s 0:19.37 99.8%  0+0k 0+0io 0pf+0w
>>
>>
>
> $ time sqlite3 
> real    0m21.094s
> user    0m20.989s
> sys     0m0.104s
> $ time sqlite3 
> real    0m20.898s
> user    0m20.813s
> sys     0m0.084s
> $ time sqlite3 gendat1.sqlite 
> real    2m32.701s
> user    0m26.038s
> sys     0m5.256s
> $ time sqlite3 gendat2.sqlite 
> real    1m50.452s
> user    0m25.534s
> sys     0m5.360s
>
>
> The difference when creating a db on disk (size: 35 MB) is
> (152-110)/110 = 38%
>
>
Did you reboot before the second insert? The difference I'm seeing is much
larger than 38%. Did you test it on Linux or Windows?


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Black, Michael (IS) 

> Are you sure you're using BEGIN/COMMIT on your transactions?
>

Yes


> I just used my benchmark data and inserted another 100,000 rows into the
> database in 2.3 seconds.


That is because you insert those additional rows immediately after creating
the database. I get the same excellent performance as you when I do that.
But if you make sure the database is not in cache (for example, reboot the
machine), you will see that the additional inserts take MUCH longer than
can reasonably be expected.

Could you repeat your benchmark with a reboot between the two transactions?


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Simon Slavin 

>
> Didn't someone recently note that entering the first million records was
> fast, but if he then closed and reopened the database, entering the next
> 100,000 records was slow ?
>
>
Yes, and there is still no real explanation for it, other than slow disk
reads. But even with very slow random disk I/O, 30 seconds still seems way
too slow for a 100 MB file.

But today I made a new observation: if I create the same table as a virtual
FTS4 table, I can add the additional rows within 1 second (even on an
un-cached database file).

So if the reason for the slowness is disk reads, does the FTS4 way of
creating/updating the index require far fewer reads? Maybe because it
allows for separate tree branches?

FTS is overkill for my simple requirements, but if it's so much faster than
a regular index, why not? The only things holding me back from
switching to FTS for this table are:

 - I cannot use the UNIQUE constraint to disallow duplicate values.
 - I cannot search efficiently for rows that DON'T match a certain value,
because FTS doesn't allow a single NOT operator.

So can someone explain what FTS is doing behind the scenes that makes these
additional inserts so much faster?


Re: [sqlite] INDEX Types

2011-11-09 Thread Fabian
2011/11/9 Black, Michael (IS) 

> Hmmm...appears to be the same for this case which, I must say, I find
> surprising.
>
>
Thanks for actually benchmarking it. I'm also a bit surprised, because I
always thought SQLite handled INTEGER more efficiently than TEXT.

I also did some new benchmarks on the inserts-slowdown I experience, and to
rule out VirtualBox, I performed them on real hardware, but the results are
the same:

10 secs to create a database with 1 million rows, and 30 secs to add an
additional 10.000 rows, if the database file is un-cached.

Maybe this is normal behaviour, but how can it possibly be that it's faster
to delete the whole database and re-create it than to just add a
(relatively) small number of additional rows?


[sqlite] INDEX Types

2011-11-09 Thread Fabian
I'm having an issue where inserts on an un-cached database are very slow.
The reason probably is that a large part of the existing index needs to be
read from disk to be able to insert new rows into the index. The values in
the indexed column are around 60 bytes long, so I'm thinking about adding
an extra column containing a shorter (64-bit) hash of the actual value, and
moving the index to that column instead. This way the total size of the
index (and the database) will be much smaller, hopefully resulting in
faster inserts.
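
A sketch of the layout I have in mind (names hypothetical):

  CREATE TABLE t(value TEXT, hash INTEGER);  -- hash = 64-bit hash of value
  CREATE INDEX t_hash ON t(hash);
  -- lookups probe the small integer index, then confirm the full value:
  SELECT * FROM t WHERE hash = :h AND value = :v;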

But I'm wondering if SQLite can deal more efficiently with an INTEGER index
(64 bits) vs an 8-byte TEXT column (also 64 bits). I know the INTEGERs
require less disk space because SQLite can store smaller values in fewer
bytes, but are there any other differences that make them preferable
as an INDEX? If there is no difference in performance, I could just take
the first 8 characters of the TEXT column as a hash-value, instead of
calculating a CRC64 each time.


Re: [sqlite] Slow INDEX

2011-11-03 Thread Fabian
I just tested it, and it made no difference. The root cause of the problem
is most likely not slow writes, because inserting duplicate values (which
are ignored instead of written to disk) is just as slow.

But your suggestion may help me with another problem: when I fill an empty
database with million of rows, they are written to disk in the order I
inserted them, not in the order a VACUUM would put them (sequential per
table). If I can force an atomic commit by enabling a journal_mode (MEMORY
for example), I would possibly end up with a much better structured database
file.

So your idea is appreciated, but for the problem reported in this thread,
it had no effect.

2011/11/3 nobre <rafael.ro...@novaprolink.com.br>

>
> Have you tried setting journal_mode to the default DELETE option ?
> Without atommic commits, maybe your inserts are going to disk one by one
> instead of in a single step, when commiting your transactions, thus slowing
> down disk writes.
>
>
> Fabian-40 wrote:
> >
> > 2011/11/2 Black, Michael (IS)
> > I do not use WAL, since I have turned 'journal_mode' off (to improve
> > insert performance), and as far as I know WAL is only useful when you
> > need to keep a journal?
> >
> > I also have 'synchronous' off, so SQLite shouldn't be waiting for any
> > filesystem flushes. I hoped it was writing all the newly inserted rows to
> > disk using a single operation, as soon as I call 'commit'. But I observed
> > it, and found out it's already writing the rows as soon as they are
> > inserted, not batching them for when I call commit.
> >
> > So that could be part of the problem. I don't have a real Windows machine
> > at hand, so I will build one tomorrow, but if your expectations are
> > correct, then it will be even slower than inside a virtual machine,
> > because it will do individual writes for the 1 million rows too, making
> > performance even worse than it is now.
> >
> > Another thing is that I don't expect the slow performance to have anything
> > to do with slow disk writes, only with disk reads. I know this because when
> > I make the index UNIQUE, and try to insert 10.000 duplicate rows (which are
> > all ignored), it has the same bad performance, even though there are zero
> > bytes written to disk. So it points in the direction of the reads making
> > it slow, not the writes.


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Nico Williams 

>
> Incidentally, it pays to make the SQLite3 page size match the
> filesystem preferred block size.
>
>
I already have the page_size set to 4096, which should match the default
NTFS cluster size.

> But note that this can still fail you when the file is larger than
> available RAM.  In that case such a flag would be very bad.  And
> SQLite3 can't know how much RAM is available.  The OS can know
> (sortof) and the user can know, but SQLite3 can't.  So I take the
> above back -- such a flag would probably result in posts about how
> SQLite3 startup causes thrashing...
>

If the flag respected the cache_size pragma, it could work very well. I
currently set the cache_size to 300MB (72k x 4096-byte pages), and the
database size on disk is only 125 MB, so it would fit in perfectly. People
who never changed the default cache_size will never experience any
thrashing, because their database will not be pre-cached, since it doesn't
fit into the cache.


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Nico Williams 

>
> But that's NOT what SQLite3 is doing.  SQLite3 is doing random I/O.
> And the OS sees the random I/O pattern and concludes it's better to
> not read the whole file in.  So for those 10K inserts you pay -worst
> case- 10K I/Os.  At ~12ms per random I/O (likely the seek times for
> your disks) you're talking 120s, so you're actually far from the worst
> case -- even at 7ms seek time you're talking about twice the time
> you've seen in the worst case.
>
>
Linux will not read the whole file in, but Windows eventually does. The
inserts go progressively faster once they get about halfway, and Windows
reads very large pages from disk, even if you request only 10 bytes. So in
reality a very large percentage of these 10K I/Os will come from a buffer
(either Windows's or the hard drive's), and will not result in any
physical reads from disk. Of course you're right that these random reads
will be slower than a sequential file copy, because they are random, and
not large, continuous blocks.

>Actually, it might be nice if SQLite3 had a function or open
>flag by which to request that the whole thing be read into memory,
>because the OS certainly won't know to do it.

I completely agree, because all the current methods (copy the file to
'null', etc.) didn't work well.


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Black, Michael (IS)

> Then, when you restart the file is on system disk and it is flushing each
> insert to system disk on the WAL file, slowing things down dramatically.
>

I do not use WAL, since I have turned 'journal_mode' off (to improve insert
performance), and as far as I know WAL is only useful when you need to
keep a journal?

I also have 'synchronous' off, so SQLite shouldn't be waiting for any
filesystem flushes. I hoped it was writing all the newly inserted rows to
disk using a single operation, as soon as I call 'commit'. But I observed
it, and found out it's already writing the rows as soon as they are
inserted, not batching them for when I call commit.

So that could be part of the problem. I don't have a real Windows machine
at hand, so I will build one tomorrow, but if your expectations are
correct, then it will be even slower than inside a virtual machine, because
it will do individual writes for the 1 million rows too, making
performance even worse than it is now.

Another thing is that I don't expect the slow performance to have anything
to do with slow disk writes, only with disk reads. I know this because when I
make the index UNIQUE, and try to insert 10.000 duplicate rows (which are
all ignored), it has the same bad performance, even though there are zero
bytes written to disk. So it points in the direction of the reads making it
slow, not the writes.


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Simon Slavin 

>
> So compare your 'restart-and-INSERT' test with one where you restart, log
> in, then do a few random operations for a minute: start your web browser
> and load a page.  Open and close a few documents or control panels.  Sit
> and do nothing for a minute.  /Then/ carry on with the test.
>
>
To rule out your suggestion of background processes slowing down the
operation, or any VirtualBox performance issues, I rebooted, and
immediately copied the DB file (125 MB) to another location, and it
completed within 5 seconds. So it doesn't seem VirtualBox or background
processes are to blame. In the past I already did some tests with waiting
for 15 minutes, and it had zero effect. Thanks for the suggestion anyway!


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Paul Corke 

>
> A stateful antivirus that does lots of heavy processing when you first
> open the file?
>
>
I'm running these tests on a completely clean Win7 install, without any
anti-virus (or other software) installed. I should note that it runs
virtualized (in VirtualBox) so I can quickly switch back to the clean state
every time, but if VirtualBox performance was the root cause of these
performance issues, I would expect it to slow down the initial insert (> 1M
rows) too.


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Mr. Puneet Kishor 

>
> ahh, so you *are* getting expected behavior, just not what *you* expected.
> Did you have a different number in mind instead of a factor of 300? And, if
> so, why?


Reading an (un-cached) 150 MB file from disk doesn't take 30 seconds. And
SQLite shouldn't have to read the whole file (because of the INDEX), only
the relevant rows in the INDEX, so theoretically it should even be faster
than reading the whole file. Because Windows has an aggressive caching
technique (it reads much larger blocks from disk, even if you request only
10 bytes from the file), I'm not expecting SQLite to be faster than reading
the whole file, but that it's so much slower just surprises me. Maybe there
is a very simple explanation, I just can't think of any.


Re: [sqlite] Slow INDEX

2011-11-02 Thread Fabian
2011/11/2 Mr. Puneet Kishor 

>
> Others will have better answers, but methinks that when you reboot the
> computer, the operating system's caches are flushed out, which slows the
> operation. Try working with the db for a bit (SELECT, repeat INSERTs, etc.)
> and notice if the speed increases again to what you expect.
>
>
The reason I reboot the PC for the test is that I want to have the
caches flushed out, and I fully expect it to make things slower, but not by
the degree (factor 300) I'm experiencing.


[sqlite] Slow INDEX

2011-11-02 Thread Fabian
I asked a similar question before, but received no response, so maybe it's
a very stupid question, but if so, feel free to say so.

I create a database, create one table with a TEXT column, insert 1 million
rows in 10 secs, create an index in 10 secs, VACUUM the database, and close
the database.

Now if I re-open the database, I can add an additional 10.000 rows very
fast (<1 sec). But if I reboot the (Windows) PC, and insert an additional
10.000 rows, it takes at least 30 secs, which seems very slow given that I
can add the first 1 million in under 10 seconds.

It probably has to do with inefficient disk seeks, but even if SQLite has
to read the whole database into memory to do the inserts, 30 secs is still
slow for a 150 MB file (the database).

So is there anyone who can explain these slow inserts, or has ideas to
optimize for a scenario like the above? Because currently this renders my
application almost unusable.

The only optimization I can think of would be dropping the index, inserting
the rows, and re-creating the index. But it would require some smart logic
about when to do it (because for just 3 inserts it would make things slower
instead of faster), and it feels like more of a work-around than a solution.
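
For clarity, the workaround I have in mind (index and table names hypothetical):

  DROP INDEX t_txt;
  -- ... bulk INSERTs into t ...
  CREATE INDEX t_txt ON t(txt);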

Please help me out, thanks in advance!


Re: [sqlite] Slow inserts with UNIQUE

2011-10-31 Thread Fabian
2011/10/30 Black, Michael (IS) 

>
> #1 What version?
>

3.7.8, using the System.Data.SQLite wrapper


> #2 How long to insert the 1M?
>

10 seconds


>
> #3 What's the average string size?
>

55 characters


>
> #5 How long to create the index?
>

10 seconds


> #6 How long to insert the next 10,000?
>
>
34 seconds. But... only 1 second if I do it immediately after filling the
database, so it seems to be related to whether the file is in the filesystem
cache or not. Server apps will always have the file cached, but this is a
desktop app, where the db will not be in the filesystem cache most of the
time.

> One more thing...show us the EXPLAIN of your insert.  It sounds like your
> insert is not using the index for some reason (bug in
> sqlite?).


0 Trace 0 0 0  00
1 Goto 0 18 0  00
2 OpenWrite 0 2 0 1 00
3 OpenWrite 1 14345 0 keyinfo(1,BINARY) 00
4 NewRowid 0 2 0  00
5 String8 0 3 0 test 00
6 SCopy 3 4 0  00
7 SCopy 2 5 0  00
8 MakeRecord 4 2 1 ab 00
9 SCopy 2 6 0  00
10 IsUnique 1 12 6 4 00
11 Goto 0 15 0  00
12 IdxInsert 1 1 0  10
13 MakeRecord 3 1 6 a 00
14 Insert 0 6 2 table 1b
15 Close 0 0 0  00
16 Close 1 0 0  00
17 Halt 0 0 0  00
18 Transaction 0 1 0  00
19 VerifyCookie 0 2 0  00
20 TableLock 0 2 1 table 00
21 Goto 0 2 0  00

The resulting database is about 125 MB. So 34 seconds seems way too
long; even if the whole db file had to be read from disk and stored into
memory, it shouldn't take that long.

Pragma's used:

PRAGMA page_size = 4096;
PRAGMA synchronous = OFF;
PRAGMA journal_mode = OFF;
PRAGMA temp_store = MEMORY;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA cache_size = 72500;

If you need any more info, let me know!


Re: [sqlite] Slow inserts with UNIQUE

2011-10-29 Thread Fabian
2011/10/29 Simon Slavin 

>
> When you insert the 10,000 strings are you doing it inside a transaction ?
>
> BEGIN TRANSACTION;
> INSERT ...
> INSERT ...
> INSERT ...
> COMMIT;
>
>
Yes, I use transactions, prepared statements, cache_size, journal_mode and
synchronous PRAGMAs, almost everything you can think of to make it faster.

But most importantly: I use the exact same code to fill the initial million
rows, so if there was anything wrong, the initial filling would be slow too,
I suppose?


[sqlite] Slow inserts with UNIQUE

2011-10-29 Thread Fabian
I have a table with one TEXT column. I insert 1 million rows of short
strings, and then create a UNIQUE INDEX. The speed is very acceptable. Then
I insert another 10.000 short strings, and the performance is very bad; it
almost takes longer than inserting the initial million(!) rows to fill the
table.

Is this by design, or what can be causing this? I also observed that
inserting the additional 10.000 rows goes progressively faster when nearing
the end, I suppose because by that time the index is completely in cache.
But what is the reason for the bad performance of these inserts? I read
things about b-tree page splitting; could it be related?


Re: [sqlite] Database file structure

2011-10-23 Thread Fabian
2011/10/23 Simon Slavin 

>
> In that case, try defragging your file sometime.  May make a big
> difference.
>
>
If you mean Windows defrag, it would be pointless, since it doesn't change
the database structure? If you mean VACUUM, it will generate the exact same
structure as 'method 2', so I might as well use that method in the first
place?


Re: [sqlite] UNIQUE constraint

2011-10-23 Thread Fabian
2011/10/23 Simon Slavin 

>
> In this example, the indexed column is a text column.  The text fields
> could have been very long, and checking long text fields for uniqueness can
> involve comparing every byte.  Nevertheless, I do not understand the results
> you quoted.  I wonder whether some use of transactions would have vastly
> reduced the problem.
>
>
In my case, the indexed column is a text column too, and the original author
already stated he batched the inserts into transactions, and made use of
PRAGMA statements. Only after pre-sorting the data did the problem disappear,
but I'd rather avoid having to pre-sort everything (because SQLite's C code
should be able to do it faster than my C# code).


Re: [sqlite] Database file structure

2011-10-23 Thread Fabian
2011/10/23 Simon Slavin 

>
> My immediate question is why this is two rows in two separate tables rather
> than one row in one table.  After all, if tables always have the same rows
> in, they might as well be the same row in one table.


I would love to have those rows in a single table, because those joins
slow down the queries, but I have a mix of TEXT and INTEGER columns, and I
had to move the TEXT columns to a different table (an FTS virtual table), and
I could not move the INTEGERs too, because FTS doesn't support them.


> So we could ask you for your OS and disk format.  But even then the answer
> will be useful only for your exact current setup.  The next time you get an
> OS update things might change.  Manufacturers tweak this stuff all the time.
>

It's Windows/NTFS, but I get the point.


Re: [sqlite] UNIQUE constraint

2011-10-23 Thread Fabian
>
>
> No, a UNIQUE index and a regular index are implemented the exact same way.
> It's just that, at INSERT and UPDATE time, after finding a proper place to
> insert the new value, an additional check is made that the place isn't
> already occupied.
>

So the only overhead for UNIQUE is that extra check?


> I suspect the way you are going to manually filter duplicates will involve
> inserting them into a moral equivalent of a UNIQUE index. The performace is
> probably going to be similar to that of just creating a UNIQUE index up
> front. But only measurements with your particular data can tell for sure.
>

I was planning on using the HashList from the .NET framework, which should
be the fastest way to do it.

If SQLite only has to check if a certain position is occupied, it should
outperform creating a hash for each value.

What got me worried was the post "Slow insertion for Unique Text Column" on
this mailing list. I quote:

---

I was creating a new table and populating it with 100,000 rows of data (as
a
test case; I really wanted to populate it with over a million rows).

[Insertion A] When a Text Column was NOT Unique it would take:
8875 ms = ~9 seconds

[Insertion B] When a Text Column was Unique it would take:
155781 ms = ~156 seconds

-

The difference between 9 and 156 seconds is too large for me. The original
author solved this by pre-sorting the data for the indexed column, which
narrowed the gap to 9 vs 12 seconds (including the pre-sort), which is very
acceptable.

So why does SQLite not pre-sort the data itself when doing such large batch
inserts inside a transaction?
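
One way to get that effect today, as a sketch (staging-table name
hypothetical): bulk-load into an unindexed staging table first, then let
SQLite sort while copying into the indexed table.

  CREATE TEMP TABLE staging(txt TEXT);
  -- ... bulk INSERTs into staging ...
  INSERT INTO t(txt) SELECT txt FROM staging ORDER BY txt;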


[sqlite] Database file structure

2011-10-23 Thread Fabian
I have two tables, both containing 1 million rows, which frequently need to
be joined by rowid. Right now, the insert loop is like this:

For I = 1 to 1000000
 INSERT INTO TABLE1 ...
 INSERT INTO TABLE2 ...
Next

When I look at the structure of the created database file, the rows for the
two tables are in an alternating pattern. At first I thought this was a good
sign, because when two rows need to be joined, they are very close to
each other on disk.

Another way to do the inserts would be:

For I = 1 to 1000000
 INSERT INTO TABLE1 ...
Next

For I = 1 to 1000000
 INSERT INTO TABLE2 ...
Next

Now, the matching rows of the two tables are not close to each other on
disk, but the structure looks very clean and not fragmented.

Are there any significant performance differences to be expected when
choosing the first method vs the second? I guess that with the first method
JOINs will be faster (because the data is close), but SELECTs on a single
table will be slower (because the rows are scattered around the file), and I
don't know enough about the internals of SQLite to know if that's true.

So which is generally preferable, or doesn't it make any difference?


[sqlite] UNIQUE constraint

2011-10-23 Thread Fabian
I have a column with a normal INDEX, and I would like to turn it into a
UNIQUE index, but I'm a bit worried about the performance implications for
inserts. Can someone give some insight into how UNIQUE is implemented in
SQLite? Does it create extra tables compared to a normal index? Are there
many extra checks?

And a related question: I need to insert 1 million rows, and currently I
create the INDEX afterwards, because that should be faster. I cannot create
the UNIQUE INDEX afterwards, because there's a slight possibility there will
be 1 or 2 duplicates, which would make the creation fail. I now have the
choice to specify UNIQUE upfront (in the TABLE definition) or to manually
filter out any doubles before the insert (which will also take CPU time).
Would there be any advantage to doing this manually, or will SQLite do it
just as efficiently?
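
For what it's worth, a third option is to declare UNIQUE up front and let
SQLite skip the duplicates itself, so a stray duplicate no longer makes
anything fail (a sketch; table and column names hypothetical):

  CREATE TABLE t(hash TEXT UNIQUE);
  -- duplicates are silently skipped instead of raising an error:
  INSERT OR IGNORE INTO t(hash) VALUES ('...');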


[sqlite] Stand-Alone INDEX

2011-10-22 Thread Fabian
I have a very simple table: it just contains one single (text) column with
an index. This table contains millions of hash values, and because of the
index, SQLite is storing all the data twice. Behind the scenes it creates a
second table, containing all the same values in sorted order, causing the
database size to double.

Because all the data I need is also in this second index-table, is there
some way to get rid of my original table and still be able to
insert new items? My initial thought was to change the schema of my table so
that it only has a TEXT PRIMARY KEY and no other columns, but SQLite
internally still creates an INTEGER rowid, so the end result made no
difference.

So is there some way to have a 'stand-alone index', which doesn't store
everything twice?
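For what it's worth, SQLite releases newer than this post (3.8.2 and later)
added WITHOUT ROWID tables, which store the rows only once, inside the
primary-key b-tree itself; that is essentially the 'stand-alone index' asked
for here. A minimal sketch (table name assumed):

CREATE TABLE hashes(hash TEXT PRIMARY KEY) WITHOUT ROWID;
-- inserts and lookups go against the PK b-tree; no duplicate copy is kept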


[sqlite] FTS: Reduce tokens

2011-10-19 Thread Fabian
Using the default tokenizer, any character that is not alphanumeric
or an underscore starts a new token.

I have a lot of columns that contain e-mail addresses or URLs, and most of
them have characters like '.', '@' and '/'. Is there a simple way to make
FTS see them as one single token, instead of splitting those strings into
many small ones? I know it's possible to develop a custom tokenizer, but
that's way over my head I'm afraid :)
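As a pointer: SQLite 3.7.13 (released after this post) added the unicode61
tokenizer, which accepts extra token characters without writing a custom
tokenizer. A sketch, with an assumed table and column name:

CREATE VIRTUAL TABLE docs USING fts4(
  body,
  tokenize=unicode61 "tokenchars=.@/"
);
-- '.', '@' and '/' now count as token characters, so an e-mail address
-- or URL is indexed as one token instead of several fragments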


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Fabian
2011/10/19 Scott Hess 

>
> To be clear, how it works is that new insertions are batched into a
> new index tree, with index trees periodically aggregated to keep
> selection efficient and to keep the size contained.  So while the
> speed per insert should remain pretty stable constant, periodically an
> insert will require index maintenance, so that insert will be slower.
> If you have a lot of documents (or a small page cache) these
> maintenance events can get pretty expensive relative to the cost of a
> non-maintenance insert.  So it's not a clear-cut win, but it probably
> would be interesting as an alternative sort of index for some tables.
>
>
I always do inserts in batches of 100,000 rows, and after each batch I
manually merge the b-trees using:

 INSERT INTO table(table) VALUES('optimize');

Is there a possibility that it will do automatic maintenance halfway through
a batch? Or will it always wait until the transaction is finished?
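As far as I can tell from the FTS docs, segment merges are triggered by the
INSERT statements themselves whenever an index level fills up, so they can
indeed land mid-transaction, regardless of the transaction boundaries. Later
SQLite releases (around 3.7.15, so after this thread) also added an
incremental merge command that bounds the work done per call; a hedged
sketch:

INSERT INTO table(table) VALUES('merge=100,8');
-- do a limited slice of merge work (roughly 100 units, combining at least
-- 8 segments at a time) instead of one big pause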


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Fabian
2011/10/19 Alexey Pechnikov 

> FTS use index multi-tree and de-facto has _no_ insert speed degradation.
>

Thanks, that's good to hear! It makes me wonder why SQLite doesn't use that
same multi-tree mechanism for regular indexes, but that's a whole different
question.


Re: [sqlite] FTS vs INDEX

2011-10-19 Thread Fabian
Very interesting benchmarks! However, they seem to focus mainly on the speed
of SELECT queries, and the total size of the resulting database on disk. My
main concern is the speed of INSERT queries versus normal tables. Any
chance you compared that too?

2011/10/19 Black, Michael (IS) <michael.bla...@ngc.com>

> I recently benchmarked this...FTS4 has a prefix option that can make it
> slightly faster than TEXT.  Other than that it's about the same speed.
>
> http://www.mail-archive.com/sqlite-users@sqlite.org/msg64591.html
>
>
>
> The older part of the thread has the benchmark data
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> Advanced Analytics Directorate
>
> Advanced GEOINT Solutions Operating Unit
>
> Northrop Grumman Information Systems
>
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
> behalf of Fabian [fabianpi...@gmail.com]
> Sent: Wednesday, October 19, 2011 9:20 AM
> To: General Discussion of SQLite Database
> Subject: EXT :[sqlite] FTS vs INDEX
>
> Did anyone do benchmarks on how the insert speed of FTS compares to a TEXT
> INDEX column? I don't need many of the extra features of FTS, because I
> always need to look up rows by prefix or exact match, and both can be
> implemented efficiently via TEXT INDEX too. But if the overhead is
> comparable, I'd rather use FTS.


[sqlite] FTS vs INDEX

2011-10-19 Thread Fabian
Did anyone do benchmarks on how the insert speed of FTS compares to a TEXT
INDEX column? I don't need many of the extra features of FTS, because I
always need to look up rows by prefix or exact match, and both can be
implemented efficiently via TEXT INDEX too. But if the overhead is
comparable, I'd rather use FTS.


Re: [sqlite] Fast JOIN

2011-10-18 Thread Fabian
2011/10/19 Fabian <fabianpi...@gmail.com>
>
>
> Maybe there is just no easy way to do what I want; I will sleep on it ;)
>
>
I think the correct query would be:

SELECT table1.data1, table2.data2
FROM
(
SELECT table1.rowid, table1.data1
FROM table1
WHERE table1.data1 = 10
) AS table1
JOIN table2
ON table1.rowid = table2.rowid
ORDER BY table2.data2 DESC
LIMIT 250 OFFSET 0

Which should make sure that only the rows matching the WHERE clause get
sorted (instead of all rows).

Since I moved the TEXT columns out of my main table to an FTS table, I have
all these complex and inefficient joins, because the data is divided between
two tables with identical rowids. I hope some day FTS supports INTEGER
columns, so I can merge the two back together and get rid of the joins :)


Re: [sqlite] Fast JOIN

2011-10-18 Thread Fabian
>
>
> Assuming that is a typo, does the following not do what you want?
>
> SELECT table1.data1, table2.data2
> FROM
> (
> SELECT table1.rowid, table1.data1
> FROM table1
> WHERE table1.data1 = 10
> ORDER BY table1.rowid DESC
> LIMIT 250 OFFSET 0
> ) AS table1
> JOIN table2
> ON table1.rowid = table2.rowid
> ORDER BY table2.rowid;
>
>
No, because it only sorts the current page. When the user scrolls down using
the scrollbar and the second page is fetched (LIMIT 250 OFFSET 250), it
does not match up with the sorting of the previous page. Suppose it was
sorted by a TEXT column of table2; it would look like this:

Page 1:

aaa
abb
ddd

Page 2:

bba
cca
dda

Maybe there is just no easy way to do what I want; I will sleep on it ;)


[sqlite] Fast JOIN

2011-10-18 Thread Fabian
I'm working on a pagination system where two tables need to be joined. At
first my query looked like this:

SELECT table1.data1, table2.data2
FROM table1
JOIN table2 ON table1.rowid = table2.rowid
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250 OFFSET 0

I got very good advice on this mailing list to change the query to this:

SELECT table1.data1, table2.data2
FROM
(

SELECT table1.rowid, table1.data1
FROM table1
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250 OFFSET 0

) AS table1
JOIN table2
ON table1.rowid = table2.rowid

This returns exactly the same results, but the performance is much better
because it only has to JOIN the rows that match the WHERE clause.

Soon, the first problem appeared: in some cases the WHERE clause was on a
column from table2 (for example WHERE table2.data2 = 10), for which the
above query wouldn't work. I changed my code so that when it detects the
WHERE is on table2, it uses this query:

SELECT table1.data1, table2.data2
FROM
(

SELECT table2.rowid, table2.data2
FROM table2
WHERE table2.data2 = 10
ORDER BY table2.rowid DESC
LIMIT 250 OFFSET 0

) AS table2
JOIN table1
ON table2.rowid = table1.rowid

Which is nothing more than the exact reverse of the previous query. When my
code detects that WHERE needs BOTH table1 and table2, it uses the old (slow)
JOIN. So far, so good.

Then the second problem appeared: the user is free to sort on any column of
the output. So you can have the situation where the WHERE is on table1 but
the ORDER BY is on table2. At first I wanted to solve this by using the old
(slow) JOIN for cases like this. But it turned out to be unworkable: even if
the query only returned 3 rows, it still spent precious seconds sorting
all the other rows in the table.

Is there any solution for this? Everything I tried completely breaks the
pagination system, because it fully depends on having ORDER BY and OFFSET in
the same sub-query.


Re: [sqlite] Limit COUNT

2011-10-17 Thread Fabian
2011/10/16 Frank Missel <i...@missel.sg>

>
> But it sounds a bit like Fabian both wants to have the total number of
> records available and at the same time limit the count.
>
>
No, I only want to have a capped total available.

If I went with Simon's solution, I would have to read the rows for the first
100 pages (or whatever the cap is) into a temporary table, just to show the
first page. I don't need a cache for all those other pages, so that seems a
lot of overhead. I only want to know whether there are 100 pages or fewer
(without copying data around).

Maybe COUNT() is also creating a temporary table behind the scenes; then the
performance of Simon's solution would be comparable with what I have now,
and I would have the advantage that I can re-use that table to show
subsequent pages without reading from disk.

But I always assumed COUNT() was faster than copying between tables; maybe I
should just benchmark it.


Re: [sqlite] Limit COUNT

2011-10-16 Thread Fabian
2011/10/16 Frank Missel 

>
> What do you want to attain with the count?
>
>
I want to allow users to paginate through a result set. The pages are
retrieved through LIMIT/OFFSET, but to calculate the total number of pages,
I have to execute a separate COUNT() query (without LIMIT) once.

Because I'm basically executing the same query twice just to get a total
count, I'm trying to optimize this. Restricting the maximum number of pages
to 10 should improve performance, if there were some way to make COUNT()
respect the specified LIMIT.


[sqlite] Limit COUNT

2011-10-16 Thread Fabian
How can you limit a count-query? I tried:

SELECT COUNT(*) FROM table LIMIT 5000

But it ignores the LIMIT clause. I think the workaround would be counting
the results of a sub-query, but I'm trying to understand what's wrong with
the syntax above. The goal is to make the engine stop iterating after it
reaches the upper limit, in order to save performance.
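For reference, the sub-query workaround mentioned above looks like this; the
LIMIT applies inside the sub-query, so the scan stops after 5000 rows and
COUNT(*) merely counts what the sub-query produced:

SELECT COUNT(*) FROM (SELECT 1 FROM "table" LIMIT 5000);
-- "table" is quoted here only because it is a reserved word in SQL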


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Simon Slavin 

>
> So that should never happen, right ?  ANALYZE is meant to make things
> faster, not slower.  So is that an actual fixable bug or is it one of those
> extremely unlikely situations that is hard to fix ?
>
> So, OP, try issuing these commands
>
> DROP TABLE sqlite_stat1;
> DROP TABLE sqlite_stat2;
>
> then see whether anything improves.
>
>
I already tried that, but it returned:

Error: table sqlite_stat1 may not be dropped


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
>
> Do you have an sqlite_stat1 table in the database (created by
> running ANALYZE)? What is the output of the shell command
> How about the contents of the "sqlite_stat1" table? What does
> the shell command ".dump sqlite_stat1" show?
>
>
This is the output with a fresh database, where ANALYZE hasn't been run:

EXPLAIN SELECT * FROM table1 WHERE data=1 LIMIT 250 OFFSET 5;

0|Trace|0|0|0||00|
1|Integer|250|1|0||00|
2|Integer|5|2|0||00|
3|MustBeInt|2|0|0||00|
4|IfPos|2|6|0||00|
5|Integer|0|2|0||00|
6|Add|1|2|3||00|
7|IfPos|1|9|0||00|
8|Integer|-1|3|0||00|
9|Integer|1|4|0||00|
10|Goto|0|39|0||00|
11|OpenRead|0|2|0|13|00|
12|OpenRead|1|31079|0|keyinfo(1,BINARY)|00|
13|SeekGe|1|36|4|1|00|
14|IdxGE|1|36|4|1|01|
15|IdxRowid|1|5|0||00|
16|Seek|0|5|0||00|
17|AddImm|2|-1|0||00|
18|IfNeg|2|20|0||00|
19|Goto|0|35|0||00|
20|IdxRowid|1|6|0||00|
21|Column|1|0|7||00|
22|Column|0|2|8||00|
23|Column|0|3|9||00|
24|Column|0|4|10||00|
25|Column|0|5|11||00|
26|Column|0|6|12||00|
27|Column|0|7|13||00|
28|Column|0|8|14||00|
29|Column|0|9|15||00|
30|Column|0|10|16||00|
31|Column|0|11|17||00|
32|Column|0|12|18||00|
33|ResultRow|6|13|0||00|
34|IfZero|1|36|-1||00|
35|Next|1|14|0||00|
36|Close|0|0|0||00|
37|Close|1|0|0||00|
38|Halt|0|0|0||00|
39|Transaction|0|0|0||00|
40|VerifyCookie|0|10|0||00|
41|TableLock|0|2|0|table1|00|
42|Goto|0|11|0||00|

SELECT * FROM table1 WHERE data=1 ORDER BY rowid DESC LIMIT 250 OFFSET
5;

0|Trace|0|0|0||00|
1|Noop|0|0|0||00|
2|Integer|250|1|0||00|
3|Integer|5|2|0||00|
4|MustBeInt|2|0|0||00|
5|IfPos|2|7|0||00|
6|Integer|0|2|0||00|
7|Add|1|2|3||00|
8|IfPos|1|10|0||00|
9|Integer|-1|3|0||00|
10|Integer|1|4|0||00|
11|Goto|0|40|0||00|
12|OpenRead|0|2|0|13|00|
13|OpenRead|2|31079|0|keyinfo(1,BINARY)|00|
14|SeekLe|2|37|4|1|00|
15|IdxLT|2|37|4|1|00|
16|IdxRowid|2|5|0||00|
17|Seek|0|5|0||00|
18|AddImm|2|-1|0||00|
19|IfNeg|2|21|0||00|
20|Goto|0|36|0||00|
21|IdxRowid|2|6|0||00|
22|Column|2|0|7||00|
23|Column|0|2|8||00|
24|Column|0|3|9||00|
25|Column|0|4|10||00|
26|Column|0|5|11||00|
27|Column|0|6|12||00|
28|Column|0|7|13||00|
29|Column|0|8|14||00|
30|Column|0|9|15||00|
31|Column|0|10|16||00|
32|Column|0|11|17||00|
33|Column|0|12|18||00|
34|ResultRow|6|13|0||00|
35|IfZero|1|37|-1||00|
36|Prev|2|15|0||00|
37|Close|0|0|0||00|
38|Close|2|0|0||00|
39|Halt|0|0|0||00|
40|Transaction|0|0|0||00|
41|VerifyCookie|0|10|0||00|
42|TableLock|0|2|0|table1|00|
43|Goto|0|12|0||00|

Is this the expected output? If so, ANALYZE was to blame. The query
containing 'ORDER BY rowid DESC' is still slower than the one which doesn't
specify any order, but the timings are closer to each other now.


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Dan Kennedy 

>
> Your EXPLAIN output shows that it is doing a linear scan of
> table1. Which is different from what I get here with the same
> schema and query. When I run them here, both queries (with and
> without the "ORDER BY rowid") use the same query plan.
>
> Do you have an sqlite_stat1 table in the database (created by
> running ANALYZE)? What is the output of the shell command
> How about the contents of the "sqlite_stat1" table? What does
> the shell command ".dump sqlite_stat1" show?


You are right, I issued an ANALYZE a couple of days ago; I completely forgot
about that. Maybe that's causing a negative impact on the performance.

sqlite> .dump sqlite_stat1
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
ANALYZE sqlite_master;
INSERT INTO "sqlite_stat1" VALUES('table1','ididx','463923 66275');
COMMIT;


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Dan Kennedy 

>
> Good question. Can you enter the following commands into
> the shell tool and post the complete output (no "QUERY PLAN"
> this time):
>
>  .version
>  .schema
>  .explain
>
>  EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
> 50;
>
>  EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;
>
>
sqlite> .version
SQLite 3.7.8 2011-09-19 14:49:19 3e0da808d2f5b4d12046e05980ca04578f581177
sqlite> .schema
CREATE TABLE table1(id INTEGER PRIMARY KEY, data INT);
CREATE INDEX ididx ON table1(data);
sqlite> .explain
sqlite>

EXPLAIN SELECT * FROM table1 WHERE data=10 ORDER BY rowid LIMIT 250 OFFSET
50;

addr  opcode         p1    p2    p3    p4    p5  comment
----  -------------  ----  ----  ----  ----  --  -------
0 Trace  0 0 000
1 Noop   0 0 000
2 Integer250   1 000
3 Integer5  2 000
4 MustBeInt  2 0 000
5 IfPos  2 7 000
6 Integer0 2 000
7 Add1 2 300
8 IfPos  1 10000
9 Integer-13 000
10Integer1 4 000
11Goto   0 36000
12OpenRead   0 2 0 12 00
13Rewind 0 34000
14Column 0 1 500
15Ne 4 335 collseq(BINARY)  6c
16AddImm 2 -1000
17IfNeg  2 19000
18Goto   0 33000
19Rowid  0 7 000
20Column 0 1 800
21Column 0 2 900
22Column 0 3 10   00
23Column 0 4 11   00
24Column 0 5 12   00
25Column 0 6 13   00
26Column 0 7 14   00
27Column 0 8 15   00
28Column 0 9 16   00
29Column 0 1017   00
30Column 0 1118   00
31ResultRow  7 12000
32IfZero 1 34-1   00
33Next   0 14001
34Close  0 0 000
35Halt   0 0 000
36Transaction0 0 000
37VerifyCookie   0 10000
38TableLock  0 2 0 table1  00
39Goto   0 12000

EXPLAIN SELECT * FROM table1 WHERE data=10 LIMIT 250 OFFSET 50;

addr  opcode         p1    p2    p3    p4    p5  comment
----  -------------  ----  ----  ----  ----  --  -------
0 Trace  0 0 000
1 Integer250   1 000
2 Integer5  2 000
3 MustBeInt  2 0 000
4 IfPos  2 6 000
5 Integer0 2 000
6 Add1 2 300
7 IfPos  1 9 000
8 Integer-13 000
9 Integer1 4 000
10Goto   0 38000
11OpenRead   0 2 0 12 00
12OpenRead   1 123405  0 keyinfo(1,BINARY)  00
13SeekGe 1 354 1  00
14IdxGE  1 354 1  01
15IdxRowid   1 5 000
16Seek   0 5 000
17AddImm 2 -1000
18IfNeg  2 20000
19Goto   0 34000
20IdxRowid   1 6 000
21Column 1 0 700
22Column 0 2 800
23Column 0 3 900
24Column 0 4 10   00
25Column 0 5 11   00
26Column 0 6 12   00
27Column 0 7 13   00
28Column 0 8 14   00
29Column 

Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Dan Kennedy 

>
> If SQLite cannot use an index to for an ORDER BY in a SELECT
> query and has to do an external sort, the EXPLAIN QUERY PLAN
> output will have something like this in it:
>
>  0|0|0|USE TEMP B-TREE FOR ORDER BY
>
>
Since my EXPLAIN does not show this, what else could be the cause of the
large decrease in performance when ordering by rowid?


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Petite Abeille 

>
> Hurray! Now you must have the finest query ever to grace the intraweb! A
> true work of beauty :))
>
>
Here in Belgium we have a saying: who doesn't honor the petite, is not worth
the big ;)


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Simon Slavin 

>
> If you have an index on the data column then that's the index SQLite would
> used for that query.  Once it has picked that index it no longer has access
> to the rowid index.
>
> CREATE INDEX tdr ON table (data, rowid)
>
>
Thanks! That explains everything! Can I remove the index on 'data' if I
create an index like that, or do I need to keep both?
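For what it's worth, a sketch of what that could look like against the
schema shown elsewhere in this thread (CREATE TABLE table1(id INTEGER
PRIMARY KEY, data INT)); the index names here are assumptions. Because
'data' is the leftmost column of the composite index, the old single-column
index becomes redundant for these lookups and can be dropped:

DROP INDEX ididx;                      -- the old index on data alone
CREATE INDEX tdr ON table1(data, id);  -- id aliases rowid in this schema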


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
2011/10/14 Petite Abeille 

>
> Much? Really? I get the broadly same execution time for either variant:
>
> 0|0|0|SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~2192503 rows)
>
> 0|0|0|SCAN TABLE mail_header (~2192503 rows)
>
>
I get

SELECT mail_header.rowid
FROM mail_header
WHERE data = 1
ORDER BY mail_header.rowid DESC  LIMIT 250 OFFSET 142750

0 0 0 SCAN TABLE mail_header USING INTEGER PRIMARY KEY (~46392 rows)

VS

SELECT mail_header.rowid
FROM mail_header
WHERE data = 1
LIMIT 250 OFFSET 142750

0 0 0 SEARCH TABLE mail_header USING INDEX ididx (id=?) (~66275 rows)


So you seem to get the same speed in both queries because you have no index
on the column in the WHERE clause, which makes my EXPLAIN differ from yours?
But when the column is indexed, the ORDER BY makes the query much slower.


Re: [sqlite] Slow JOIN on ROWID

2011-10-14 Thread Fabian
>
>
> Exactly.
>
>
I still don't have optimal performance in the query (although it's much
better now), and it seems to be related to ORDER BY.

When I execute:

SELECT * FROM table WHERE data = 10 LIMIT 250 OFFSET 50

It's very fast, but it gets much slower (10 times) when I add an ORDER BY
clause, like rowid ASC or rowid DESC.

I'm trying to understand why this is. It seems like SQLite is performing an
actual sort behind the scenes, while I expected it to just iterate in
reverse order (because rowids are always incremental), which should give
performance comparable to the first query?


Re: [sqlite] Slow JOIN on ROWID

2011-10-13 Thread Fabian
2011/10/12 Petite Abeille 

>
> Now the join is performed only 250 times, adding just a small overhead
> compared to the bare-bones query without the join.
>
> The short of it: minimize the amount of work upfront :)
>
>
Thank you very much! This approach solved the problem. However, in my
situation I need to select a lot more columns than just 'id' from
'mail_header', and when I look at the resulting query it appears it is
selecting all those columns twice.

Would it be smart to change the query so that the inner loop only selects
mail_header.id, and to add a third join that fetches the extra columns from
'mail_header'? Or would the performance penalty of adding a third join
outweigh the advantage of selecting fewer columns in the inner loop?


[sqlite] Slow JOIN on ROWID

2011-10-12 Thread Fabian
Why is this very fast (20 ms):



SELECT table1.data1, table1.data2
FROM table1
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250
OFFSET 50

0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)



And this very slow (3500 ms):



SELECT table1.data1, table2.data2
FROM table1 JOIN table2 ON table1.rowid=table2.rowid
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250
OFFSET 50

0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)



The values in table1.data2 and table2.data2 are 100% identical, because my
app has always duplicated all columns of table2 (an fts4 table) into table1
to work around this issue. But I would really like to solve it some day,
because it doubles my database size.


Re: [sqlite] FTS4: OFFSET Performance

2011-10-12 Thread Fabian
2011/10/12 Dan Kennedy 

>
> (INDEX 1) == lookup by rowid.
>
>
Okay! Then maybe my problem is unrelated to FTS; I will create a new
question.


Re: [sqlite] FTS4: OFFSET Performance

2011-10-12 Thread Fabian
2011/10/12 Dan Kennedy 

>
> Can we see the output of EXPLAIN for this query?
>


Without selecting table2.data2:

0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)

Including selecting table2.data2:

0 0 0 SEARCH TABLE table1 USING INDEX data1idx (data1=?) (~3 rows)
0 1 1 SCAN TABLE table2 VIRTUAL TABLE INDEX 1: (~0 rows)

So it looks like the whole problem is caused by the SCAN on table2. Is
there any reason why SQLite does this SCAN instead of a SEARCH? Since I JOIN
the two tables on ROWID, why does it have to do a full table scan?


Re: [sqlite] OFFSET Performance

2011-10-12 Thread Fabian
2011/10/12 Igor Tandetnik 

>
> See if this helps: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor


I tried to implement the method suggested in the article, but it only works
for pagination where the user can go just one page back or one page forward
(since you have to remember the last rowid). In my case, the user is allowed
to jump to the last page without visiting any of the previous pages, making
the suggested method impossible to implement.
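One partial workaround for the jump-to-last-page case (a sketch with assumed
table and column names): flip the scan direction, fetch the first page of
the reversed scan, and reverse those rows client-side. This ignores the
wrinkle that the true last page normally holds COUNT modulo 250 rows:

SELECT *
FROM table1
WHERE data1 = 10
ORDER BY rowid ASC   -- reversed scan direction, so no large OFFSET needed
LIMIT 250;
-- reverse the returned rows in application code before display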


Re: [sqlite] OFFSET Performance

2011-10-12 Thread Fabian
2011/10/12 Igor Tandetnik 

>
> See if this helps: http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor


Thanks! Very interesting! I was already aware that using a large OFFSET
could potentially be slow, because SQLite internally reads all preceding
rows and just discards them. But I do my offsets exclusively on rowid, and
it appears there is some optimization taking place, since I can specify very
large offsets without a change in performance. So I assumed that SQLite
takes some shortcut when using rowid for OFFSET, and I was only wondering
why this 'shortcut' is disabled when joining an FTS table.

I'm going to implement the method described in the above article and see if
it makes any difference.


Re: [sqlite] FTS4: OFFSET Performance

2011-10-12 Thread Fabian
2011/10/12 Dan Kennedy 

>
> Are you using 3.7.7 or newer? And specifying the order=DESC
> option when creating the FTS4 table? If not, it might be a big
> help in this case.
>
> http://www.sqlite.org/fts3.html#fts4order
>
>
Yes, I'm using 3.7.7, and I already considered specifying order=DESC upon
creation, but I suspect it would just reverse the problem: slow queries with
OFFSET=0 and fast queries with a large OFFSET, which would be no solution.


Re: [sqlite] FTS4: OFFSET Performance

2011-10-12 Thread Fabian
2011/10/12 Petite Abeille 

>
> In any case, you can easily refactor the query into two steps:
>
> (1) First get your pagination
> (2) Then join to your fts
>
>
I tried implementing your refactoring, but immediately ran into a problem:
the user must be able to sort the data on any of the columns returned.
So 'ORDER BY table1.rowid DESC' is really a variable, which could easily be
'ORDER BY table2.data2 DESC', in which case the query fails, because .data2
is not selected in the inner query. And I cannot move the 'ORDER BY' into
the outer query, because then it will only sort the 250 results returned,
not the underlying data.

So the question remains: why does increasing OFFSET kill performance when
selecting table2.data2 in addition to the columns of table1?


Re: [sqlite] FTS4: OFFSET Performance

2011-10-12 Thread Fabian
>
>
> > How would I optimize the above queries, to take advantage of the
> > LIMIT/OFFSET values, making them suitable for fast pagination?
>
> Are you sure the issue is with the fts table? Or is it the order by desc?
>
>
If I omit the FTS table in the query, like so:

SELECT table1.data1
FROM table1
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250
OFFSET 0

The query always executes fast, independent of how large OFFSET is. The only
difference is that I don't include 'table2.data2' in the output, so that
leads me to think that the issue is related to FTS. Or is there another
possibility? And in the FTS docs I read:

"Because of the way SQLite's virtual table interface works, retrieving the
value of the "title" column requires loading the entire row from disk
(including the "content" field, which may be quite large). This means that
if the users query matches several thousand documents, many megabytes of
"title" and "content" data may be loaded from disk into memory even though
they will never be used for any purpose."

Which seems to confirm that the query is so slow because I include the value
of 'table2.data2' in the output?


[sqlite] FTS4: Use Cases

2011-10-12 Thread Fabian
I have three types of columns in my table, which I want to move to an FTS4
table, but I'm not sure whether that would make sense.

1.) This column contains hash values. I always need to look up 100% exact
matches. Normally you would store them with 'TEXT INDEX' in a standard
table. Would there be any advantage in moving this column to FTS? Or would
it perform worse than 'TEXT INDEX'?

2.) This column also contains hash values, but I always need to look them up
by prefix (LIKE 'start%'), never by exact match. And the size of this prefix
is unknown in advance, so I can't use the FTS4 optimization for fixed-size
prefix searches. Is there any advantage over a traditional LIKE query? I
suspect FTS only indexes full words, which would mean no advantage.

3.) This column contains a list of short tags separated by spaces, for
example 'v45 s12 h65', which I currently look up using LIKE '%tag%' to
search for a certain combination of tags. In a good database design, these
tags would be stored in a second table, referencing the rowid of the first
table. But every row can have many tags, and storing them in normalized form
would make the INSERTs very slow, because I would have to insert an extra
row for each tag instead of just one row. Since my app does many more
inserts than actual selects, I chose to store them in a single column.
Would there be any advantage in storing these tags in an FTS column, so I
can do quick lookups? One concern I have is that they're not unique enough
(just like stopwords), eventually bloating the index. Or is the only right
option to create a second table with a TEXT INDEX column on these tags?
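A note on case 2: a plain indexed TEXT column can already serve
arbitrary-length prefix lookups without FTS, because SQLite rewrites
LIKE 'abc%' into an index range scan when the column has BINARY collation
and case-sensitive LIKE is enabled. A sketch with assumed names:

PRAGMA case_sensitive_like = ON;  -- lets LIKE 'ab12%' use a BINARY index
SELECT * FROM t WHERE hash LIKE 'ab12%';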


[sqlite] FTS4: OFFSET Performance

2011-10-12 Thread Fabian
I have two tables: one normal table (table1) and one FTS4 table (table2).
The ROWIDs of the two tables are in sync. I often run queries on table1,
but every row returned must also include a value from table2 (data2), so I
create a query like this:

SELECT table1.data1, table2.data2
FROM table1 JOIN table2 ON table1.rowid=table2.rowid
WHERE table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250
OFFSET 0

This query returns the results as expected and performs well. But as soon
as I raise the OFFSET to a large value (for pagination), the performance
drops drastically. After a long search through the documentation, I found
out the reason: the SELECT on table2.data2 causes FTS4 to load the entire
column from disk, without taking the LIMIT/OFFSET clauses into account.

I have a related query, where I do a MATCH on table2.data2:

SELECT table1.data1, table2.data2
FROM table2 JOIN table1 ON table2.rowid=table1.rowid
WHERE table2.data2 MATCH 'value' AND table1.data1 = 10
ORDER BY table1.rowid DESC
LIMIT 250
OFFSET 0

Here the problem is even worse. When MATCH 'value' returns a lot of results,
the query is very slow (even with OFFSET 0) because it ignores the fact that
it only has to return the first 250 results.

How would I optimize the above queries to take advantage of the
LIMIT/OFFSET values, making them suitable for fast pagination?
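For reference, this is the shape of the two-step refactor discussed in the
replies (quoted above in this archive): paginate on table1 first, then join
only the 250 surviving rowids against the FTS table, so FTS loads just the
rows actually shown. A sketch (names as above; the alias is an assumption):

SELECT p.data1, table2.data2
FROM (
  SELECT rowid, data1
  FROM table1
  WHERE data1 = 10
  ORDER BY rowid DESC
  LIMIT 250 OFFSET 0
) AS p
JOIN table2 ON table2.rowid = p.rowid;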


[sqlite] Exclusive transaction severly slows down the queries increasingly in time

2005-08-17 Thread Fabian Matyas
Hi,

I use SQLite 3.0.8 with Quicklite 1.5.0 as a wrapper for
it on Mac OS X 10.3.

I made a simple test which creates a table with 1
rows. The table contains an indexed column named
'Path'.

This is the query I ran:

for i=0 to 1000 do
begin
 begin exclusive/immediate transaction

 select where Path= 

 end transaction
end

If I use a simple beginTransaction, the for loop
finishes in 8 seconds, which is a good time.

The problem:
If I use beginImmediateTransaction or
beginExclusiveTransaction, the for loop takes 9
minutes. The first 10-20 queries run OK, but after
that, the beginExclusive/ImmediateTransaction
calls eat more and more time. At the end of the
loop, one single select takes between 1/2 and
1 second, which IS FAR TOO MUCH.

If you'll argue that I should include the whole loop
in a transaction, the reason why I don't do that is
that my application executes some queries on the table
from time to time.

Please help,
Fabian
