[sqlite] Fastest way to add many simple rows to a table?

2016-05-31 Thread Gabriel Corneanu
Hi Eric,

As I know you from Delphi-related projects, I thought it would be nice to
share my ideas.
I am using a home-grown framework for SQLite persistence, optimized for
storing records into tables.
Main ideas:
- using virtual tables (with the same structure) to speed up SQLite inserts;
a write translates to: insert into test("ID") select "ID" from temp.tmp_test
- using RTTI to generate optimized code for accessing/returning data
- supporting all common data types (ints, floats, string, blob/TBytes)

I tried your example and got ~0.45s for JSON and ~0.37s for my
implementation; I also tried a simple recursive CTE (see the code
below), but it was slower, ~0.66s.
I think most of the time is spent packing the integers and building the
btree(s) (i.e. it is CPU bound). Enabling the "name" field below (its
"Storage" attribute) increased the processing time (my implementation only)
to ~0.47s, which I think confirms my interpretation.

Interesting: when NOT using "primary key" (I have a flag for this), the
times are ~0.26s vs 0.18s vs 0.47s!
It looks like SQLite is optimized for generating the rowid itself, and is
slower when it is set explicitly (even if it's the same value).

I agree JSON works fast enough for such a simple example, but I don't know
how you could use it for, e.g., two fields; perhaps something like the
sketch below.
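For illustration only, a hypothetical way to carry two fields through
json_each (untested here; it assumes the json1 extension and the "name"
column being enabled), passing an array of objects instead of plain values:

insert into test(id, name)
select json_extract(value, '$.id'), json_extract(value, '$.name')
from json_each('[{"id":1,"name":"1"},{"id":2,"name":"2"}]');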


Hope it helps,
Gabriel

Code:

type
  TTestRecord = record
    [Storage('ID', [sfPrimaryKey])]
    //[Storage('ID')]
    ID : Int32;
    //[Storage('name')]
    name: string;
  end;

procedure TForm2.Button2Click(Sender: TObject);
var
  db : TSQLiteDatabase;
  tb : TVirtualTableDataArray;
  r : TTestRecord;
  i : integer;
  pf, t, t1, t2, t3: int64;
  json: TStringBuilder;
begin
  tb := TVirtualTableDataArray.Create;
  json := TStringBuilder.Create;
  db := TSQLiteDatabase.Create;
  db.OpenWrite(':memory:');

  tb.Setup(db.DB.DB, 'test');
  tb.CreateTable;
  tb.VTInit;

  // build the test data twice: as records for the virtual table
  // and as a JSON array for json_each
  json.Append('[');
  for i := 1 to 100 do
  begin
    r.ID := i;
    r.name := i.ToString;
    tb.Items.Add(r);
    if i > 1 then
      json.Append(',');
    json.Append(i);
  end;
  json.Append(']');

  // variant 1: insert via json_each
  QueryPerformanceCounter(t);
  db.db.ExecSQL('insert into test(id) select value from json_each(?)',
    [json.ToString]);
  QueryPerformanceCounter(t1);
  t1 := t1 - t;

  // variant 2: insert from the virtual table
  tb.CreateTable;  // recreated, so each variant starts from an empty table
  QueryPerformanceCounter(t);
  tb.WriteData;
  QueryPerformanceCounter(t2);
  t2 := t2 - t;

  // variant 3: insert from a recursive CTE
  tb.CreateTable;
  QueryPerformanceCounter(t);
  db.db.ExecSQL('WITH RECURSIVE ' +
    'cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<' +
    tb.Count.ToString +
    ') insert into test(id) select x from cnt');
  QueryPerformanceCounter(t3);
  t3 := t3 - t;

  QueryPerformanceFrequency(pf);
  ShowMessageFmt('N=%d  T1=%.2fs  T2=%.2fs  T3=%.2fs',
    [tb.Count, t1/pf, t2/pf, t3/pf]);

  tb.Free;
  db.Free;
  json.Free;
end;


[sqlite] Select count(*)

2014-12-11 Thread Gabriel Corneanu
I asked a similar question some time ago...
See here:
https://www.mail-archive.com/sqlite-users@sqlite.org/msg77488.html

Gabriel


Re: [sqlite] YACC/Bison vs. Lemon vs. standard input

2014-07-23 Thread Gabriel Corneanu

Hi,

From my school days: Bison and Lemon are both LALR(1) parsers, but they
differ in when they reduce.
That's it: Bison reduces as soon as it can, while Lemon shifts (defers the
reduction) as long as it can.


You have to force Lemon to reduce the "prog" rule, i.e. the newline should
be part of another rule:

E.g. modified ladd.y:
-->
start   ::= loop .
loop::= prog NL loop .

prog::= expr(a) prog .   { printf("%d\n", a); }
prog::= .

expr(a) ::= INTEGER(b) .{ a = b; }
expr(a) ::= expr(b) PLUS expr(c) .  { a = b + c; }

<--

Regards,
Gabriel



Re: [sqlite] SQLite4 release date and how to compile on windows platform

2013-10-06 Thread Gabriel Corneanu

Simon,
I have asked similar questions several times, and got similar replies.
This answer is simply NOT correct.
While I understand nobody wants to commit to a release/stable version, I
really wanted to evaluate how it fits my needs.

Therefore I took some time to understand how I could compile it.
I found the unpleasant truth: at this time (ok, a few weeks ago) Windows
is not supported at all.
The file lsm_unix.c is not ported to Windows (it contains the low-level
file access, shared memory, memory-mapped files, etc).


Regards,
Gabriel


Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-05 Thread Gabriel Corneanu

Hi James,

About complexity: I'm not sure it's N*logN; for each of the N rows you need
to count up to N-1 other rows, so it's O(N^2) IMO.

And you get an EXTRA temporary B-TREE. Doesn't that matter??
Although I don't really understand why, since the table has an index on it.

My original concern is indeed simplicity and efficiency.
But I'm NOT advocating any complex RANK function, rather exposing
(as pseudo-data) the row index IN THE CURRENT RESULT SET; the result set
is usually ordered, but it doesn't really matter. This information
has NO other technical meaning (unlike the rowid).


It's like having a list (on paper) containing (pseudo-)RANDOM data and
telling someone "look at the 13th row"; by 13th I mean exactly as the list
is printed.

The other person can spot "13" much more easily, if it is printed, than by
counting quickly.
If instead of the row number I say "the row having xxx in it", the other
person has to do a visual search (binary, if the data happens to be sorted).


Now all I say is: if I print the list from my code, I can simply generate
such info myself.
But it would be greatly simplified if I could get it from SQL without
complex/inefficient constructs, especially because SQLite already does
something similar for LIMIT.


Of course I have to accept "no" as an answer if most people are against 
this idea.


Regards,
Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-03 Thread Gabriel Corneanu

Keith,
I find this a little arrogant. I could say exactly the opposite: not
caring about efficiency is ... (insert whatever words you like).


Why else is this ranking addressed here?
http://www.schemamania.org/sql/#rank.rows

Why may you use "order by" in SQL, but not ask for the rank under a given
order??

Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-03 Thread Gabriel Corneanu

James,

I reply from the web interface, so I can't easily quote.

I don't really want to argue whether it's a workaround or not; I
understand perfectly that it's valid standard SQL.
However, please accept that the given SQL is quite complex; you have to
duplicate the ordering in the join clause...


About the complexity, a practical example (simplified from my case):
CREATE TABLE temp.params (Chan INTEGER unique);
insert into params values(2);
insert into params values(4);
insert into params values(3);
insert into params values(1);

explain query plan
select count(lesser.rowid) as RANK, S.rowid, S.chan
from params as S
left outer join params as lesser
  on S.chan >= lesser.chan
group by S.rowid
order by S.chan

This gives:
SCAN TABLE params AS S USING INTEGER PRIMARY KEY
SEARCH TABLE params AS lesser USING COVERING INDEX 
sqlite_autoindex_params_1 (Chan
USE TEMP B-TREE FOR ORDER BY

Compare to the simple usage:
explain query plan
select S.rowid, S.chan from params as S order by S.chan

returns:
SCAN TABLE params AS S USING COVERING INDEX sqlite_autoindex_params_1

I didn't even try it, but AFAIK standard SQL does NOT allow selecting
additional columns if they are not in the group by clause (which would
break the logic here), would it? (though SQLite accepts it).
Of course, there would be another SQL solution for that too, using yet
another join :) .


Given all that, I will NEVER use the pure-SQL approach (if I can use any
other solution).


Regards,
Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-02 Thread Gabriel Corneanu

James,

Indeed, that's the case.
However, I can't imagine this being efficient. It's just a pure-SQL
workaround for a counter.

The only advantage is that it is standard SQL and should work with any
engine.

I wonder if SQLite can make some optimizations out of it; otherwise
it's O(n^2) even WHEN there is an index on the sort field.
Update: with no index, SQLite uses only one temp b-tree, which would be
needed anyway for sorting.


Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-02 Thread Gabriel Corneanu

Simon,

Of course the rowid is THE field to use to uniquely identify a row (for
edit, delete, etc...).
I am talking about ranking a result set. The ranking can be used
for displaying (in HTML or a desktop GUI) or as a source for insert/update
SQL.
Of course in code you can keep your own counter; but what about writing a
simple SQL statement in the shell tool or any SQL editor (I use SQLiteSpy)?

There were lots of other questions about "re-assigning" the rowid to
"natural" numbers (1..n); a solution would be to assign the rank to a
field (another field, not the rowid).


See the reply from James for the problem and an SQL solution:
http://www.schemamania.org/sql/#rank.rows
How can this be efficient??

I don't use MySQL, but Tony writes that it can apparently do something like:
SELECT @ROW:=@ROW+1 as ROW, * FROM table

Regards,
Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-01 Thread Gabriel Corneanu

Hi Simon,

No problem, but I don't understand what this has to do with
multi-processing / users (the query program has a lock on the table anyway).
The counter can either be reused (see the other "LIMIT" discussion) or
it needs a different implementation.


As I don't think the implementation itself is the problem, the real issues
are:

a) is it "lite" enough for the core? (I believe yes)
b) is a specific (non-standard) feature desired? (I don't have a
problem with that, but the maintainers might)


While I agree non-standard features should be kept to the minimum possible,
I have needed this so often that I simply think there is a larger potential
benefit.
Another example: I don't do web design, but I remember some WYSIWYG
editors (Macromedia Dreamweaver) where it was easily possible to
generate HTML tables (paged listings) from result sets. There was no
code to write, just marking some fields from the result set.

I'm sure other people can give more details about this...

Regards,
Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-01 Thread Gabriel Corneanu

Hi,

My opinion is a little different:

a) of course the compiler needs to change the query program (only if
"nrow" is requested/used)
b) I don't know the internals, but I just can't believe that such a
value could not be exported somehow
c) I understand it would be non-standard; however, there are lots of
other features specific to sqlite
d) in certain scenarios, the alternatives are quite difficult to
achieve; e.g. you have a simple batch file making some updates, no
programming involved...


I believe it's quite easy to do (technically); it's just a matter of
whether it's "lite" enough :) for the source code...


Regards,
Gabriel


Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-01 Thread Gabriel Corneanu

Simon,
As I wrote, it's easy to do when you control the loop.
Sometimes there is either no explicit loop (direct SQL, no programming
code) or the loop is out of your reach (3rd-party library).


Gabriel



Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-01 Thread Gabriel Corneanu

Then this register value is exactly the needed result.
There is also the other syntax, "limit n, m"; there you have to somehow
skip "m" rows.


Gabriel


Re: [sqlite] Is there a way to return the row number? (NOT the rowid)

2013-07-01 Thread Gabriel Corneanu

I also needed this for some special update queries.
Without going into many details, it was some kind of "insert into xxx
select <rank>, otherfields from source order by <fields>".
For this case there is a workaround: selecting first into a temporary
table with an auto-generated rowid, and using that afterwards for the
insert.


There are lots of other cases where this would be handy, e.g. showing
the ordinal of some results.
Of course it is simple to implement when the loop is under your control.
But sometimes there are libraries (components) which are interfaced only
through a cursor.
E.g. a listing/grid showing cursor results, where you can NOT control the
implementation but want to show "row" as a column.


I currently use another workaround for this: declaring a user function
which simply increments and returns a counter (a sketch follows below).
As long as the queries are simple it is OK, but wrong usage is easily
possible (e.g. the function being evaluated multiple times per row).
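For reference, a minimal sketch of such a counter function against the C
API (function and names invented here, not from any library):

#include "sqlite3.h"

/* rownum(): returns 1, 2, 3, ... on successive calls. Deliberately NOT
   registered as deterministic, so it is evaluated again for each call. */
static void rownum_func(sqlite3_context *ctx, int argc, sqlite3_value **argv){
  sqlite3_int64 *n = (sqlite3_int64*)sqlite3_user_data(ctx);
  (void)argc; (void)argv;
  sqlite3_result_int64(ctx, ++*n);
}

int register_rownum(sqlite3 *db, sqlite3_int64 *counter){
  *counter = 0;   /* the caller must reset this before every query */
  return sqlite3_create_function(db, "rownum", 0, SQLITE_UTF8,
                                 counter, rownum_func, 0, 0);
}

/* usage:  SELECT rownum(), * FROM t ORDER BY a;
   the wrong-usage trap mentioned above: nothing stops the engine from
   evaluating rownum() more or fewer times than once per output row. */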


As the sqlite query program already has an internal loop, it would be
very easy to increment a counter for each "next" opcode and return it
via a special function or pseudo-column.
I believe such a counter already exists for the "LIMIT" clause, so most of
the work is already done...


Regards,
Gabriel


Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Gabriel Corneanu

This is not at all my case ...

I obviously don't write rows 1 by 1, but in blocks of data (arrays of
structs), via virtual table wrappers and "insert ... select".
This way I can achieve >200k rec/s, or at least 100k when there are some
more fields.
Right now I'm completely CPU bound; it's 100% load at a high rate. IO is
almost out of the question, at <10MB/s; and I use an 8k page size and of
course synchronous off, WAL mode...
Another type of data (fewer fields, but with a 2-32kB blob inside) easily
reaches ~40MB/s but only a few thousand rec/s.
The performance drops abruptly with more fields (I don't remember
the magic threshold); it seems most of the load goes into the field
coding? I use only integers, for space optimization (varint; a sketch
below); this is also good as I have a high dynamic range.
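For illustration of that field coding, a sketch of a varint encoder
following the documented 1-9 byte record format (my own sketch, not
SQLite's code):

#include <stdint.h>

/* Encode v into p[]; returns the number of bytes written (1-9).
   Big-endian, 7 bits per byte with the high bit as continuation,
   except the 9-byte form whose last byte keeps all 8 bits. */
int put_varint(unsigned char *p, uint64_t v){
  unsigned char buf[9];
  int n = 0, i;
  if( v > 0x00ffffffffffffffULL ){   /* needs the full 9-byte form */
    p[8] = (unsigned char)v;         /* low 8 bits go into the last byte */
    v >>= 8;
    for(i=7; i>=0; i--){ p[i] = (unsigned char)((v & 0x7f) | 0x80); v >>= 7; }
    return 9;
  }
  do{ buf[n++] = (unsigned char)((v & 0x7f) | 0x80); v >>= 7; }while( v );
  buf[0] &= 0x7f;                    /* final (lowest) byte: no continuation */
  for(i=0; i<n; i++) p[i] = buf[n-1-i];  /* reverse into big-endian order */
  return n;
}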


Multi-core surely helps to have enough CPU power for the rest (hardware
connection, pre-processing, etc).


I would definitely like to be able to get more performance, but I can
live with the current numbers. One can use some high-end CPUs if one
really wants such high rates (the hardware around it costs ~100x more :) ).
BTW I have asked a few times already: is it possible to get/compile a
Windows dll for sqlite4 (just for evaluation)?

Last time I checked, it didn't compile on Windows at all.

Gabriel



Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Gabriel Corneanu
I admit I didn't think about (or even read in detail on) the technical
implementation.


This is an extract from sqlite3_analyzer:

*** Table AE_DATA ***

Percentage of total database.......   99.89%
Number of entries.................. 1030371
Bytes of storage consumed.......... 67846144
Bytes of payload................... 61186719     90.2%
Average payload per entry.......... 59.38
Average unused bytes per entry..... 0.34
Average fanout..................... 752.00
Fragmentation......................    0.35%
Maximum payload per entry.......... 65
Entries that use overflow.......... 0             0.0%
Index pages used................... 11
Primary pages used................. 8271
Overflow pages used................ 0
Total pages used................... 8282
Unused bytes on index pages........ 15678        17.4%
Unused bytes on primary pages...... 337429        0.50%
Unused bytes on overflow pages..... 0
Unused bytes on all pages.......... 353107        0.52%

So I understand that the 11 index pages are pure btree interior pages, but
the leaves are actually the ~8000 data pages.
And it probably needs to visit (i.e. load) all the data pages to count the
leaves...
Even if there were some counter in the header of each page, it would still
need to load the pages, which is bad for IO...


BTW I found this by opening a file over the network, which of course made
everything worse.
For my use case (a file format) the data is append (write) only, so
max(rowid) works equally well.
As a note, I actually HAVE the record count stored somewhere else, but I
had this query in a generic copy routine which was also used for some
other small tables.
I agree it's a kind of corner case; usually tables have some kind of
indices. But in this case I need high speed, and indices would bring
the performance down.
Not that I really need it, but I have to support specified data rates of
up to 100k records/second.

And I only access the data sequentially, by rowid.

Just for the sake of discussion, I can imagine some hacks to the btree to
optimize this special case.
The btree nodes could store the number of leaves, just for the data pages
(e.g. 0: unknown, >0: valid count); the info would need to propagate up
only until it reaches a parent on an index page. And this info needs
updating only when a node changes from a leaf to having children.


Thanks for all your time,
Gabriel


Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Gabriel Corneanu
Again, sorry for count(a), I wrote too fast. I understand of course about
null values.


Otherwise, by rowid I mean the auto-generated primary key. In my actual
case, I have a field as an alias for it.


CREATE TABLE t(id integer primary key, a);
explain query plan select count(*) from t -> SCAN TABLE

create index ia on t(id);
explain query plan select count(*) from t -> SCAN TABLE t USING COVERING
INDEX...


1. Does this mean the primary key is not as good as a covering index??

2. Is there NO WAY to quickly get the row count WITHOUT a full scan if I
only have the auto primary key??

Thanks,
Gabriel



Re: [sqlite] select count(*) does not use primary key??

2013-06-06 Thread Gabriel Corneanu
OK, I understand; then the question remains why it does not use the
primary key??


Thanks,
Gabriel


[sqlite] select count(*) does not use primary key??

2013-06-06 Thread Gabriel Corneanu
I was surprised to find that the simple query "select count(*) from table"
took too much time, and found that it does NOT use the primary key index??
e.g.
CREATE TABLE t(a);
explain query plan select count(*) from t

I get: SCAN TABLE t (~100 rows)

If I use CREATE TABLE t(a unique), then it uses the auto-generated covering
index.
Even if I write
select count(rowid) from t
it still scans the table...
However, I would expect that it could also use the primary key for
counting, or not??

In my opinion, count(*) is the same as count(rowid) (I see that even
count() is accepted); I could say it's even the same as count(x) (any other
field).
Strange: count(*) uses the covering index on a, but "select count(a)" does
NOT use the same covering index...

Am I making any mistake here??
Thanks,
Gabriel


Re: [sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Gabriel Corneanu

That's why I asked for feedback.
My implementation is ~100 lines longer, so I think it's still "lite".
There is nothing complex in it; apart from the heap implementation, it
even brings quite a few simplifications to the original code.

Gabriel



Re: [sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Gabriel Corneanu
I had thought more about a "minus" (subtracting the minimum), but this
might be a better option.


Regards,
Gabriel



Re: [sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Gabriel Corneanu

I'm not sure we're talking about the same thing.
For me, caching here means avoiding IO, not saving memory and/or locks.
The heap itself also needs some work, but it is logarithmic.
The default cache of 2000 pages should be enough for the most useful
pages (indices, roots...), with an overhead of at most ~11 heap entries
to update.


Regards,
Gabriel


[sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Gabriel Corneanu
Following a few other discussions, I had the feeling that sqlite could
benefit from a cache which discards pages in least-frequently-used
order.
In general this means that index pages, often-used data pages, etc.,
are preferred (i.e. kept in memory) over infrequently used pages.
This helps in my case, where I have big files which are mostly written
once, but also some small tables with summaries; these are better kept
cached, and the same goes for indices.


I first implemented a custom cache in Delphi (Pascal) using some
high-level (generic) containers (a hash for the keys, a heap for the usage
data); there is a significant overhead due to the classes I used and maybe
also compiler differences.
My own usage shows some visible improvements, so I took the time
to implement it directly in the core (pcache1).


I would like to ask anyone who finds this interesting to try it and give
some feedback about the benefits (if any :)).

Feedback / results / benchmarks are welcome.

If it is useful, I would be happy to contribute it.
The diff is made against 3.7.15.2; I'm not sure an attachment makes it to
the list, so here is the diff text:

http://pastebin.com/RrzqWjWv

Technical details:
- each page has a fetch counter
- the LRU list is changed to a heap, arranged according to this counter
- when discarding pages, the page with the minimum fetch counter is selected
Apart from the heap operations, the other changes are quite
straightforward (a sketch of the eviction idea follows below).
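Not the actual patch (that is the pastebin diff above), just a minimal
sketch of the eviction idea: a binary min-heap ordered by the fetch
counter, whose root is the page to discard:

typedef struct Page Page;
struct Page { unsigned nFetch; /* ...the rest of the page... */ };

typedef struct Heap { Page **a; int n; } Heap;

/* Restore the min-heap property downwards from slot i. */
static void heap_sift_down(Heap *h, int i){
  for(;;){
    int l = 2*i + 1, r = l + 1, m = i;
    if( l < h->n && h->a[l]->nFetch < h->a[m]->nFetch ) m = l;
    if( r < h->n && h->a[r]->nFetch < h->a[m]->nFetch ) m = r;
    if( m == i ) break;
    Page *t = h->a[i]; h->a[i] = h->a[m]; h->a[m] = t;
    i = m;
  }
}

/* Discard candidate: the root always holds the minimum fetch counter. */
static Page *heap_pop_min(Heap *h){
  Page *min;
  if( h->n == 0 ) return 0;
  min = h->a[0];
  h->a[0] = h->a[--h->n];
  heap_sift_down(h, 0);
  return min;
}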

I ran some tests to check for errors; maybe someone can check whether the
initialization is done in the proper place (especially for a shared-cache
group).
There is one important catch: fetch counter overflow. I don't yet have
a definitive idea how/when to limit or correct it.
So this problem is postponed until the tests show actual
benefit / interest.


Regards,
Gabriel





Re: [sqlite] query optimization with "order by" in a view

2013-02-20 Thread Gabriel Corneanu
I admit I didn't check what the standard says about "select"; I just
wanted to make sure the potential users (who are by no means
"developers") get the data properly ordered.
But you misread my example: I had "order by id" everywhere (no mixed
sorting).
I expected the optimizer to "see" that it's the same order and avoid
sorting twice.
Otherwise I can accept that multiple/mixed "order by" clauses would be
problematic.


Regards,
Gabriel


Re: [sqlite] SQLite 4

2013-02-20 Thread Gabriel Corneanu

The problem is, it is not ported to Windows at all...
I made a quick (and dirty) port of lsm_unix, but other things (e.g. the
environment) are also missing.


Gabriel



Re: [sqlite] query optimization with "order by" in a view

2013-02-19 Thread Gabriel Corneanu

It's hard to accept this conclusion... it seems like a simple justification.
If you say so, why is "select from v order by id" not doing a sort (with
the data from the view)?

Obviously it "sees" that id is the primary key and uses it for sorting.

I read lots of messages here about complex query optimizations, and
there is a whole chapter about this topic here:

http://www.sqlite.org/optoverview.html#flattening

I will obviously handle this somehow, but I hoped it was either a slip
or would be relatively simple to implement.
My feeling (at least for this case) is that a simple rule for the
optimizer would be that only the last sort should be honored.
That means ignoring the sort from the view; and that case already seems
to be handled correctly.


Regards,
Gabriel


Re: [sqlite] SQLite 4

2013-02-19 Thread Gabriel Corneanu
I understand, but I wanted to make a performance comparison. I have read
some good news, but I need to test it for my case.
I am in a situation where the bottleneck is the CPU (sqlite), not IO.
Therefore I'm very interested in an early idea about the performance.
Even if it's not ready, I could at least prepare my code better for a
later switch.


Gabriel



[sqlite] query optimization with "order by" in a view

2013-02-19 Thread Gabriel Corneanu

I need some clarification on this issue. Here is a simplified example.
There is a table:
CREATE TABLE t(id integer primary key, data integer);
and a (simplified) view:
CREATE VIEW v as SELECT * FROM "t" order by id;

I included the "order by" in the view because it's meant for some
end-users and I wanted to avoid mistakes.
BUT I also queried the view myself; I wanted to be "safe" and included
another "order by":


explain query plan SELECT * FROM "v" order by id;

As one can see, the plan uses 2 scans and an extra sort (temp btree)!!
Which can be very expensive, of course...

I found the following:
- if I query the view w/o "order by", it works as expected (uses the
primary key)
- if I define the view w/o "order by" and use "order by" in the query, it
also works as expected
- if I use the query with "order by rowid" (instead of id), the query plan
is a little different; it still has 2 scans, but it doesn't use a
temporary btree anymore


In summary, it seems that having multiple "order by" clauses disturbs the
query builder; of course, I expected the "optimizer" to recognize that it
was the same order and avoid the extra sorting.

Am I making a mistake??

Thanks,
Gabriel




Re: [sqlite] SQLite 4

2013-02-19 Thread Gabriel Corneanu
I am also in the final steps of a new project, and I would love to compare
sqlite4 before release (the data files will be public, therefore a later
switch would be problematic).

Is there any chance to get it for Windows? I usually need the dll.
Last time I could not compile it (mingw); there are some memory-mapping
operations implemented only for unix/linux (lsm_unix).

They have equivalents on Windows, so it should be possible to port it.

Thanks,
Gabriel



Re: [sqlite] Deletion slow?

2013-02-18 Thread Gabriel Corneanu

I wonder why nobody suggested it: did you try "PRAGMA synchronous = OFF"?
For me that is always the default... I can imagine how slow such a
combination can be otherwise.

Just my 2c...

Gabriel



Re: [sqlite] force read schema after "delete from sqlite_master"

2012-07-20 Thread Gabriel Corneanu

Yuriy Kaminskiy wrote:
> Check
> PRAGMA secure_delete;
> (and disable if it was enabled by default; it would be rendered
> ineffective by your trick anyway).


It is not enabled.

> Probably there is a way to make it work (altering PRAGMA schema_version,
> or something), but I think it is way too hackish and unsafe to use such
> tricks in anything resembling production code.
> Basically, you break the consistency of your database image (and then
> kind-of-"fix" it with vacuum;).

Yes, and that's why I asked here.
"The Truncate Optimization" talks about something related, but it's still
quite slow.
To understand what I'm saying, please just try it: fill a dummy db until
it's a few GB in size, then try to "clear" it *quickly*.


-----
Michael wrote:

> Why don't you just attach another database and switch your user
> connections to that one?
> Then you can just delete the old file and not worry about vacuum at all.

Unfortunately this is not an option at this time; the readers are
different, independent processes, which are just prepared to handle a
truncate...


> Sounds though like sqlite3 could use a "truncate" command like Oracle
> has, which is the speedy way to zero out a table there.


This is what I asked for; but even with the "Truncate Optimization" it
still takes quite some time. That might be hard to avoid, because it
probably needs to walk the btree to free the pages.


But I really believe that a "clear" of the whole db should be relatively
easy to do *fast*, even if it's done without journaling, because it makes
no real sense to journal a full delete (it amounts to a full backup).
This is what I did via "delete from sqlite_master" / "vacuum", but I
needed to be safe, therefore everything inside a transaction...



Gabriel




Re: [sqlite] force read schema after "delete from sqlite_master"

2012-07-19 Thread Gabriel Corneanu


> Rather than dropping all tables, why not just use
>
> DELETE FROM myTable
>
> with no clauses?
>
> Also, I would guess that the thing which is taking most of the time is
> the VACUUM command.  Do you really need it?  Are you very short of
> filespace?  Are you about to make special backup copies of the newly
> empty tables?
>
> Simon.


As I wrote, delete / drop takes a lot of time (try it on a big db). I need
a way to say "quick, drop everything, there is nothing to worry about".

Vacuum is instant (after dropping the tables), which is normal for an
empty db.

While the wasted space is not a major problem, in my model it is at least
unusual.
I use sqlite as a file format in an append-only model; the file size
(pages) is a direct measure of how much data is available.

In this model, I need the "truncate" operation.

I can live with my workaround...

Regards,
Gabriel


Re: [sqlite] force read schema after "delete from sqlite_master"

2012-07-19 Thread Gabriel Corneanu
I poked through the sqlite source code and tried to force
"sqlite3ResetOneSchema" or "sqlite3SchemaClear"... and found no clean way
to do it...


I found however a workaround: before clearing the file, I get the table
list and execute a rename (alter) for each table, using fake names. Then I
delete from sqlite_master.

After this, recreating the tables works fast, as expected (a sketch
follows below).
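A hypothetical shape of the whole sequence (table names and columns
invented, untested as written):

BEGIN;
ALTER TABLE data RENAME TO zz_dead_data;   -- one rename per real table
PRAGMA writable_schema=1;
DELETE FROM sqlite_master WHERE type IN ('table','view','index');
PRAGMA writable_schema=0;
CREATE TABLE data(id INTEGER PRIMARY KEY, payload);  -- recreate the schema
COMMIT;
VACUUM;  -- afterwards, to actually trim the space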

After closing the transaction, the database structure should be valid for
any reader; I also call vacuum/checkpoint afterwards to trim the space.

The fake tables/names are simply lost after the whole process.

While this works for me, I still think it is a weakness.
Regards,
Gabriel



[sqlite] force read schema after "delete from sqlite_master"

2012-07-19 Thread Gabriel Corneanu
Hello,

I have the following scenario: I need to "clear"/"initialize" a db file
while potential readers are active (polling for data).
The "normal" way to do it is: begin a transaction, drop all tables,
recreate the tables, commit (and vacuum to regain the space).

The biggest problem is that dropping a "very large" table (many GB) takes
a lot of time.
I could very well delete the file, but that fails if any reader has it
open.

I tried
delete from sqlite_master where type in ("table","view","index")
(of course after setting writable_schema).
This works very fast, but then I have a different problem...
I can't recreate the tables, because it looks like sqlite still knows
about them?? Even though "select * from sqlite_master" returns nothing??

Dropping them takes the same long time as it originally did...

It WORKS if I do a "vacuum" first, but that won't work inside a
transaction...
I had to wrap everything in a transaction to prevent readers from seeing
an "empty" file. Therefore I also can't close/reopen etc...

Test this from the shell in a test db:


create table test(a);
insert into test values(1);

select * from sqlite_master;

pragma writable_schema=1;
delete from sqlite_master where type="table";
pragma writable_schema=0;

select * from sqlite_master;

create table test(b);

vacuum;
create table test(b);


How can I solve this??

Any help appreciated.
Gabriel


Re: [sqlite] SQLITE3 64-bit version

2012-04-27 Thread Gabriel Corneanu

I read about it, but I prefer dll binding.
With the amalgamation it's also quite easy to compile to one obj and link
directly into Delphi (similar to jpeg; that's probably what you also do).
No offense intended, but why should I pay for it?


Off-topic: what do I need to do for a proper reply (threading)??

Gabriel


Re: [sqlite] SQLITE3 64-bit version

2012-04-27 Thread Gabriel Corneanu

There is one more reason to use DLLs; I'm surprised no one mentioned it.
What if you don't use C?? (I myself use Delphi with a header conversion.)

Gabriel



Re: [sqlite] Using "attach database" to work around DB locking

2012-04-13 Thread Gabriel Corneanu


> Or you can do your immediate writing to a database in memory, and have
> another process dump memory to disk in the background.  Depending on how
> recent you need the reading, you can read the one in memory or the one
> on disk.


It seems I have reached the CPU boundary (>90% on 1 core), not waiting
for the disk anymore...
Plus, I'm not using fsync ("pragma synchronous"), so the disk cache is
in effect anyway.


If I need more throughput, I might go multi-threaded for writing (if
possible).


However, it would be interesting to know what it's really doing; I have an
"append"-only usage.
I think most of the time is spent updating/maintaining the primary key
btree, which is a simple "INTEGER PRIMARY KEY" with null on inserts - so
the values are auto-generated.


I am using a virtual table with a block of values (all fields except the
rowid) and an "insert into <table> select * from <virtual table>".


Is there a possibility to optimize this simple case (since the number of
records is known, all the new rowids are virtually known in advance)?


Gabriel


Re: [sqlite] Using "attach database" to work around DB locking

2012-04-12 Thread Gabriel Corneanu

Hi,

I have a similar problem; I need to reach writing 10 records/s with
parallel reading, without blocking the writer.
While previously this was not possible (I was considering HDF5 for it),
I have now decided to go back to sqlite.


Using WAL mode and lots of optimizations, I am able to write >15
records/s (one table with ~20 fields) and at the same time
read (process) >30 records/s.


There are lots of tricks/variables (number of fields, page size, cache
size, wal checkpoint size...) involved in achieving this.


Regards,
Gabriel


[sqlite] non-blocking concurrent read questions

2010-03-09 Thread Gabriel Corneanu
Hello,
Thanks for your attention,


> Although speaking generally such a method could be used in some
> situations, I don't think it's good to allow it even with an "I know
> what I'm doing" pragma. Any structured file (sqlite is an example) has
> internal dependencies. One of the reasons to block is to write different
> parts of structured data together, without intervention from other
> parties, in order to keep the data integral. Imagine a writing cache
> that kept changes for your writer and finally needed to flush the data,
> and at the same time your "anytime" reader started to perform some query
> in the middle of this multi-page writing process. I can't predict
> whether the reader will end up with some valid result or it will return
> with a "database malformed" error.
>
> Instead consider changing your own logic. You wrote "*without* any
> possibility to be blocked". I suppose you already have a perfect writer
> that fits your needs, but if you post some info about the nature of your
> writer and reader (records per second and something like this), it would
> help to be more specific with the answers.
>
> Max


The writer application must be failsafe, as much as possible (acoustic
emission recording devices); I simply cannot afford that a reader makes a
select and, because of a programming error, the acquisition gets blocked.
I had exactly this happen just by opening sqliteman.

The recording rate is variable; using a test structure (~14 fields in 1
table, all 32/64-bit integers) I was able to achieve ~9 records/sec with
the sqlite API, which was not really good enough (there is no journal; I
don't need any rollback / consistency check).
Then I was able to make a virtual table wrapper plus insert/select and
reached ~17 rec/sec, which is already a big step forward. I think it is
not planned, but I would like to have some bulk insert API (not sql) to
speed things up; hdf5 with the packet table API reached ~7-800 000 rec/sec
and is essentially IO bound. I do not expect this kind of performance from
sqlite soon... Maybe cache settings might also help; suggestions are
welcome.
For reading it is much better; there I also used a workaround: a fake
aggregate function ("store(...)") works much faster (and is simpler to
implement) than a virtual table.
So the reader can read fast enough anyway; it just needs to "follow" the
data, and small delays are normal and acceptable.


Back to the technical problem. Roughly described, I would expect that a
table is first expanded with new data, and then (on transaction end; I
used blocks of ~1000 records) some metadata about the table pages is
updated (which pages are used for the data). I imagine that this could be
made somehow safe (I'm sorry, but I have no idea about the actual
implementation): write the new data (the old data remains valid),
invalidate the page list (if necessary; one single value, which would be
atomic on most systems if properly aligned), write the new page list, then
mark it as valid again (nothing affects the current behavior when using
the existing locking anyway).
The reader could read the "old" data without any problem; it would only
need to detect a "dirty" read if the page list has changed incompatibly
(if new pages are only appended, the page list might always be valid). In
that case the reader would work in a short loop (similar to the current
behavior when locking), trying to get a "clean" page list.

A trivial example:
page count = 5, page list = 1,2,3,4,5
After writing 2 new pages:
append 6,7 to the page list (this does not change the old list entries),
then set page count = 7 (atomically).
The reader reads either 5 or 7; in either case, the page list is valid.
If necessary, a (transaction) counter might be used to detect "dirty"
reads (when really incompatible).

So the questions are: how does expanding work? Would it fit such a model?
Sure, the indexes might create problems... I only use the built-in rowid
during writing.

Thanks again,
Gabriel


[sqlite] non-blocking concurrent read questions

2010-03-09 Thread Gabriel Corneanu
Hello again,

I start with your final words: "it's a general database engine".
The main page says:
"Think of SQLite not as a replacement for Oracle but as a replacement for
fopen()"
That's why I am trying sqlite and not another database (I actually tried
embedded innodb, but sqlite was muuuch better / faster from the first try).

Now the rest.
I'm not saying that I need 800 000 rec/s, just that I reached this speed
with hdf5. Our devices are currently rated at "only" >3 "hits"/sec;
therefore my results at >15 rec/sec are good enough for a start.
I MIGHT need a high data rate (>20MB/s), but that includes blob (waveform)
data. When using blobs, sqlite can also achieve high numbers in MB/s (with
correspondingly fewer records).

Obviously we do have a proprietary format now; but as with any format,
changes are required to make room for new features. We tried to design a
much better, self-describing, extensible file format; in the end I
realized that it's not that far from existing solutions like hdf5 or
sqlite. That's why I'm here...
Currently I have only 1 binary structure (a union), but I would change
that to 2 or 3 tables, one per record type.
I don't need any specific indexes during writing (other than native order);
for later analysis optimizations, I might create some other indexes.
Similar to what you wrote about B-trees for pages, I assumed that indexes
are equally problematic on writing.

I do understand that it is not feasible to change much about how it works
right now, to make it valid for concurrent access.

But I still have the feeling that it could work pretty well using this
kind of model (let's call it "optimistic" locking; see the sketch after
the list):
For the writer:
1. on write begin (write lock), increase a special counter (n -> n+1, an
odd value); this marks pending changes
2. write new data/pages/references as usual
3. on write end (release write lock), increase the counter again (even
value); this marks the page structure as valid again
For the reader:
4. on begin read (shared lock), read (and store) the counter; if odd, a
writer is active and it should return "busy" immediately (just like now)
5. read the page list into the cache (I assume it is doing this right now)
6. read the counter again and compare with the initial value; if no change
is detected, then the page list is valid and it can read the existing
data. If a change is detected, it should signal "busy" just like in 4.
7. when releasing the shared lock, the counter can be checked again
(against the value read in 4) to signal whether the data was changed or
not. It is up to the user to decide what to do with "dirty" data.

For points 4 and 6 it should either fail, or retry like now in a loop for
a certain period.
Even better, the change counter could (or actually "should") be per table.
So for writing there would only be trivial changes (a simple counter
increment); for reading, a simple read/check of a value before using the
page list.
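A minimal sketch of steps 1-6 in C (illustrative only; it assumes, as the
post does, that the counter lives in shared memory and that aligned
loads/stores are atomic):

#include <stdatomic.h>

static atomic_uint change_counter;          /* per database or per table */

/* writer, steps 1-3 */
void write_begin(void){ atomic_fetch_add(&change_counter, 1); }  /* -> odd  */
void write_end(void)  { atomic_fetch_add(&change_counter, 1); }  /* -> even */

/* reader, steps 4-6: returns 0 on a clean snapshot, 1 for "busy" */
int read_try(void (*read_page_list)(void)){
  unsigned before = atomic_load(&change_counter);
  if( before & 1u ) return 1;               /* step 4: a writer is active */
  read_page_list();                         /* step 5: read the page list */
  return atomic_load(&change_counter) != before;  /* step 6: dirty? busy. */
}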

These relatively simple changes would open it up for other applications;
by searching I saw that I'm not the only one who needs this kind of
behavior.

Regards,
Gabriel Corneanu


[sqlite] non-blocking concurrent read questions

2010-03-09 Thread Gabriel Corneanu
Sure, I could have some kind of intermediate storage, but that would mean
unnecessary data moving / copying.
I really hope that I'll find some time to study the source and
eventually implement my ideas (maybe others find them interesting and/or
useful too).

You said that only references are changed, right? That means that during
appends the page content is still valid, even if the B-tree structure is
changed through the references.

PS: what should I add to make my messages show up as replies (I'm using
the web interface for email)?

Regards,
Gabriel


[sqlite] non-blocking concurrent read questions

2010-03-09 Thread Gabriel Corneanu
Hello everybody,

I have the following situation:
1. a writer needs to continuously append data to 1 or 2 tables, *without*
any possibility of being blocked.
2. one (or possibly more) reader needs to read the data for analysis.

Point 1 is very important; therefore I use "PRAGMA locking_mode =
EXCLUSIVE" in the writer.
But then of course the reader can NOT read anything.

As I understand it, there is currently no standard way to have truly
concurrent read/write in sqlite.

I understand that in normal database operation a "read uncommitted" is
not possible with sqlite (I am talking about different processes).
In my situation I (try to) use sqlite as a data storage format. During
writing there are only appends, no inserts (in the middle) or updates
(after the writing, standard database operations apply). The reader
needs to poll and try to read new data from the file.

I hope that, with some usage restrictions, this might be possible.
Therefore I tried to enable reading without locking.
I found the "SQLITE_ENABLE_LOCKING_STYLE" define, which might allow (with
value 2) exactly this behavior; unfortunately it seems to be used only for
Unix files.

Is there a reason not to enable this behavior via a special pragma (only
together with read-only)? I would prefer a run-time switch, to be able to
use the same code everywhere.

Otherwise, I modified the winLock function with this code (the second
condition is mine):
...
if( pFile->locktype>=locktype ){
  return SQLITE_OK;
}

// gc: always allow a shared lock!
if( locktype==SHARED_LOCK ){
  pFile->locktype = SHARED_LOCK;
  return SQLITE_OK;
}

I did not change winUnlock, because it doesn't care if unlocking (from a
shared lock) fails. Of course this is not a full implementation, just a
quick change to test whether it would work.

Then I tried, in a loop, with 2 programs writing / reading in parallel,
and it seems to work without problems. Can anyone advise whether this has
any chance to work (or say that it would definitely NOT work)?

As a short summary: would it be interesting for anyone to enable read-only
open with a special pragma that allows reading without locking (that is,
shared locks becoming a no-op)?

PS. I also tried HDF5; it is faster, but I like the simplicity of
sqlite (and my data is not complex enough to require hdf5).

Thanks for your attention,
Gabriel