Re: [sqlite] Virtual table function calls

2020-02-13 Thread David Jones
Jens Alfke asks:
>Maybe pass the column name as a string, i.e. `attr('H',3)`

Two problems with that idea. First, the first argument has to refer to a value in
the virtual table in order to invoke the overridden version (overrides are per
table, so I use the ppArg to bind the function invocation to the associated
virtual table). Second, if I created a view that gives column H the name Total,
I'd want the function call to be attr(Total,3). I don't know how to make the
function figure out that attr('Total',3) really means column H in some table.
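
As an aside, a minimal sketch of the xFindFunction/ppArg binding being described
(illustrative names, not the poster's actual code; assumes sqlite3.h):

 static void xlsxAttrFunc(sqlite3_context*, int, sqlite3_value**);  /* override impl */

 static int xlsxFindFunction(
   sqlite3_vtab *pVtab, int nArg, const char *zName,
   void (**pxFunc)(sqlite3_context*,int,sqlite3_value**),
   void **ppArg
 ){
   if( nArg==2 && sqlite3_stricmp(zName, "attr")==0 ){
     *pxFunc = xlsxAttrFunc;  /* table-specific implementation */
     *ppArg = pVtab;          /* attr() receives this vtab as its user data */
     return 1;                /* override attr() for this table only */
   }
   return 0;                  /* anything else: use the default function */
 }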
Dave
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table function calls

2020-02-13 Thread Jens Alfke


> On Feb 13, 2020, at 12:52 PM, David Jones  wrote:
> 
>   sqlite> select F,G,H,attr(H,3) from summary;   # show formula used 
> to calculate column H.

Maybe pass the column name as a string, i.e. `attr('H',3)`? It sounds like your 
`attr` function needs to know the _identity_ of the column, not its contents, 
and the name is basically the identity.
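
If the name were passed as a string, the overridden function would still have to
map it back to a column, e.g. by scanning the column names the module recorded at
xCreate time. A minimal sketch with illustrative names (it also shows exactly
where a view alias like 'Total' would fail):

 typedef struct xlsx_vtab {
   sqlite3_vtab base;
   int nCol;           /* number of declared columns */
   char **azColName;   /* column names captured at xCreate time */
 } xlsx_vtab;

 static int xlsxColumnIndex(xlsx_vtab *v, const char *zName){
   int i;
   for(i=0; i<v->nCol; i++){
     if( sqlite3_stricmp(v->azColName[i], zName)==0 ) return i;
   }
   return -1;  /* unknown name, e.g. a view alias such as "Total" */
 }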

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual table function calls

2020-02-13 Thread David Jones
I've written a virtual table to view spreadsheet data inside Excel workbook
(.xlsx) files as SQLite tables. I want to implement an SQL function, attr(),
which the virtual table overrides to give access to the raw attributes of the
cell being returned. I'm looking for a robust way to find the actual column
referred to by an overridden function, not just the value returned by the
xColumn call.

Example:
   sqlite> .load xlsxvtab.so                       # creates xlsxvtab module and attr() function
   sqlite> create virtual table summary using xlsxvtab('expenses.xlsx','sheet1','F20','H32');
   sqlite> select F,G,H,attr(H,3) from summary;    # show formula used to calculate column H.

The issue is that attr gets called with 2 values and I need to divine that the 
first value came from column H of the current row of the cursor opened by this 
select statement. The hack I’m using now is to give the values returned by the 
xColumn method a subtype equal to the column number. In the attr() function I 
retrieve the subtype and re-fetch that column from the current row of the last 
cursor open on that table.
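
A sketch of the subtype hack as described, with illustrative names (whether the
subtype survives all the way to the function call is exactly the fragile part):

 static const char *xlsxCellText(sqlite3_vtab_cursor*, int);  /* hypothetical helper */

 /* xColumn tags each value with its column number... */
 static int xlsxColumn(sqlite3_vtab_cursor *pCur, sqlite3_context *ctx, int iCol){
   sqlite3_result_text(ctx, xlsxCellText(pCur, iCol), -1, SQLITE_TRANSIENT);
   sqlite3_result_subtype(ctx, (unsigned)iCol);  /* only room for 0-255 */
   return SQLITE_OK;
 }

 /* ...and attr() recovers the column number from its first argument: */
 static void xlsxAttrFunc(sqlite3_context *ctx, int argc, sqlite3_value **argv){
   unsigned iCol = sqlite3_value_subtype(argv[0]);
   int iAttr = sqlite3_value_int(argv[1]);
   /* re-fetch attribute iAttr of column iCol from the current row of the
      last cursor opened on the table bound via sqlite3_user_data(ctx) */
 }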

Various pitfalls with this technique:
- Subtype numbers are limited to the range 0-255.
- It assumes all function calls happen before the xNext() method is called again.
- Complex queries may open multiple cursors on the same table.

Any suggestions?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table OR constraint bug

2020-01-29 Thread Lalit Maganti
Thank you for the fast fix!

Best,
Lalit

On Wed, 29 Jan 2020 at 15:07, Dan Kennedy  wrote:

>
> On 29/1/63 20:09, Lalit Maganti wrote:
> > Hi folks,
> >
> > Just wanted to check up on this to see if this issue is something which
> is
> > being tracked and if there was a potential fix in the works?
>
> Thanks for the excellent bug report and minimal reproduction case. Now
> fixed here:
>
>https://www.sqlite.org/src/info/dcb4838757ca49cf
>
> None of us saw your post last month. We think it must have been filtered
> as spam by gmail. Sorry about that.
>
> Dan.
>
>
>
>
>
> >
> > Thanks,
> > Lalit
> >
> > On Thu, 2 Jan 2020 at 15:13, Lalit Maganti  wrote:
> >
> >> Hi all,
> >>
> >> I believe that I have found a bug in the virtual table bytecode
> generation
> >> when OR constraints are present and argvIndex is set in xBestIndex but
> the
> >> application does not actually filter fully.
> >>
> >> The problem seems to be in not setting/unsetting the non-null flag
> >> correctly (SQLITE_JUMPIFNULL) on the instruction which does the
> comparison;
> >> this leads to NULLs slipping through the filter.
> >>
> >> To fully repro the problem, please find attached a small C file which
> >> shows the unexpected behaviour. By adding an EXPLAIN in front of the
> SELECT
> >> query, you should be able to see the incorrect bytecode on the NE
> opcode.
> >>
> >> If the attachment doesn't come through for some reason, I've also put
> the
> >> same file as a GitHub gist
> >> .
> >>
> >> Thank you!
> >>
> >> Regards,
> >> Lalit
> >>
> > ___
> > sqlite-users mailing list
> > sqlite-users@mailinglists.sqlite.org
> > http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table OR constraint bug

2020-01-29 Thread Dan Kennedy


On 29/1/63 20:09, Lalit Maganti wrote:

Hi folks,

Just wanted to check up on this to see if this issue is something which is
being tracked and if there was a potential fix in the works?


Thanks for the excellent bug report and minimal reproduction case. Now 
fixed here:


  https://www.sqlite.org/src/info/dcb4838757ca49cf

None of us saw your post last month. We think it must have been filtered 
as spam by gmail. Sorry about that.


Dan.







Thanks,
Lalit

On Thu, 2 Jan 2020 at 15:13, Lalit Maganti  wrote:


Hi all,

I believe that I have found a bug in the virtual table bytecode generation
when OR constraints are present and argvIndex is set in xBestIndex but the
application does not actually filter fully.

The problem seems to be in not setting/unsetting the non-null flag
correctly (SQLITE_JUMPIFNULL) on the instruction which does the comparison;
this leads to NULLs slipping through the filter.

To fully repro the problem, please find attached a small C file which
shows the unexpected behaviour. By adding an EXPLAIN in front of the SELECT
query, you should be able to see the incorrect bytecode on the NE opcode.

If the attachment doesn't come through for some reason, I've also put the
same file as a GitHub gist
.

Thank you!

Regards,
Lalit


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table OR constraint bug

2020-01-29 Thread Lalit Maganti
Hi folks,

Just wanted to check up on this to see if this issue is something which is
being tracked and if there was a potential fix in the works?

Thanks,
Lalit

On Thu, 2 Jan 2020 at 15:13, Lalit Maganti  wrote:

> Hi all,
>
> I believe that I have found a bug in the virtual table bytecode generation
> when OR constraints are present and argvIndex is set in xBestIndex but the
> application does not actually filter fully.
>
> The problem seems to be in not setting/unsetting the non-null flag
> correctly (SQLITE_JUMPIFNULL) on the instruction which does the comparison;
> this leads to NULLs slipping through the filter.
>
> To fully repro the problem, please find attached a small C file which
> shows the unexpected behaviour. By adding an EXPLAIN in front of the SELECT
> query, you should be able to see the incorrect bytecode on the NE opcode.
>
> If the attachment doesn't come through for some reason, I've also put the
> same file as a GitHub gist
> .
>
> Thank you!
>
> Regards,
> Lalit
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual table OR constraint bug

2020-01-02 Thread Lalit Maganti
Hi all,

I believe that I have found a bug in the virtual table bytecode generation
when OR constraints are present and argvIndex is set in xBestIndex but the
application does not actually filter fully.

The problem seems to be in not setting/unsetting the non-null flag
correctly (SQLITE_JUMPIFNULL) on the instruction which does the comparison;
this leads to NULLs slipping through the filter.

To fully repro the problem, please find attached a small C file which shows
the unexpected behaviour. By adding an EXPLAIN in front of the SELECT
query, you should be able to see the incorrect bytecode on the NE opcode.

If the attachment doesn't come through for some reason, I've also put the
same file as a GitHub gist
.

Thank you!

Regards,
Lalit
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] virtual table xBestIndex and SQLITE_CONSTRAINT and zErrMsg

2019-02-19 Thread dave
> On 2/19/19, dave  wrote:
> > addition, but I have lost a capability relative to the prior scheme of using
> > high query cost along with a special flag communicated in pIdxInfo->idxNum,
> > that being the ability to emit contextual info as to why the query failed.
> 
> Yeah.  There is no way to report an error out of xBestIndex.  And, in
> fact, you would not want to do that because one or more xBestIndex
> calls might actually work.  Or, there might be multiple xBestIndex
> calls that all fail for different reasons, in which case it is unclear
> which error should be reported.
> 
> I will ponder your request.  In the meantime, you can continue to use
> the old method, which still works like it always has.
> 
> -- 
> D. Richard Hipp


OK, well the theory being that the message would be emitted only when all
the candidate plans were tried and still no solution was found (I guess at
the same spot where the current message is emitted).  But maybe that is too
late, and any messages set along the way are already gone.

As for multiple messages, even just emitting an arbitrary one is useful.
These failures happen at design time, and the developer incrementally refines
his/her query until there are no such errors.  I'm not sure whether this can
happen once a working query has been created.  I would think that if you had
defined a query that was demonstrably solvable once, any subsequent execution
would at worst gravitate to that known working solution even if the planner
tried to do things differently that time (maybe based on data values).

OK, for now I will revert to the old method.

Cheers!

-dave


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] virtual table xBestIndex and SQLITE_CONSTRAINT and zErrMsg

2019-02-19 Thread Richard Hipp
On 2/19/19, dave  wrote:
> I noticed that in 3.26 a feature was added whereby a proposed execution plan
> can be rejected in vtables by returning SQLITE_CONSTRAINT.  I welcome this
> addition, but I have lost a capability relative to the prior scheme of using
> high query cost along with a special flag communicated in pIdxInfo->idxNum,
> that being the ability to emit contextual info as to why the query failed.

Yeah.  There is no way to report an error out of xBestIndex.  And, in
fact, you would not want to do that because one or more xBestIndex
calls might actually work.  Or, there might be multiple xBestIndex
calls that all fail for different reasons, in which case it is unclear
which error should be reported.

I will ponder your request.  In the meantime, you can continue to use
the old method, which still works like it always has.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] virtual table xBestIndex and SQLITE_CONSTRAINT and zErrMsg

2019-02-19 Thread dave
I noticed that in 3.26 a feature was added whereby a proposed execution plan
can be rejected in vtables by returning SQLITE_CONSTRAINT.  I welcome this
addition, but I have lost a capability relative to the prior scheme of using
high query cost along with a special flag communicated in pIdxInfo->idxNum,
that being the ability to emit contextual info as to why the query failed.

Under the new scheme, a failed query is met with:

  Error: no query solution

But under the old scheme I was able to emit:

  Error: GROUPACCTS: There must be equality constraints on GNAME and ISLOCAL

The context info is handy for developers building the query so they can know
what they are missing, since required constraints like this are non-obvious
from a pure SQL standpoint.  This is especially true in the context of
joins, since you otherwise wouldn't even know which table is problematic.
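
For contrast, a sketch of the 3.26 scheme being discussed: xBestIndex rejects any
plan lacking the required equality constraints by returning SQLITE_CONSTRAINT.
Column numbers and names here are illustrative, not the poster's actual code:

 #define COL_GNAME   0   /* hypothetical column numbers */
 #define COL_ISLOCAL 1

 static int grpBestIndex(sqlite3_vtab *pVtab, sqlite3_index_info *pInfo){
   int i, haveGname = 0, haveIslocal = 0;
   for(i=0; i<pInfo->nConstraint; i++){
     const struct sqlite3_index_constraint *p = &pInfo->aConstraint[i];
     if( !p->usable || p->op!=SQLITE_INDEX_CONSTRAINT_EQ ) continue;
     if( p->iColumn==COL_GNAME ){
       haveGname = 1;
       pInfo->aConstraintUsage[i].argvIndex = 1;  /* pass value to xFilter */
     }else if( p->iColumn==COL_ISLOCAL ){
       haveIslocal = 1;
       pInfo->aConstraintUsage[i].argvIndex = 2;
     }
   }
   if( !haveGname || !haveIslocal ) return SQLITE_CONSTRAINT;  /* reject plan */
   pInfo->estimatedCost = 10.0;
   return SQLITE_OK;
 }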

Under the old scheme I would have to fail my query in xFilter, and I would
set the error text like this:

 if ( IDXVAL_FAILQUERYPLAN == idxNum )
 {
  sqlite3_free( pThis->pVtab->zErrMsg );
  pThis->pVtab->zErrMsg = sqlite3_mprintf( VTBLA4GNAME
      ": There must be equality constraints on GNAME and ISLOCAL" );
  return SQLITE_CONSTRAINT;
 }

I did try setting the error text in a similar manner in the xFilter method;
however, it seems this text is ignored in that case, and I only get the
'no query solution' message.

My suggestion would be to not ignore it in the case of failing xBestIndex
for no query plan, and to emit it if it has been set.

If this is done, I imagine some additional consideration would have to be
made for the case where one proposed query plan is rejected, and another
plan has been accepted.  In that case, maybe the eventually successful plan
would still carry error text from the previously rejected plan?  I don't know
if this would cause a problem or not.

Cheers!
-dave


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] virtual table module not found needs better logging

2018-08-03 Thread Ben Manashirov
  if( pMod==0 || pMod->pModule->xCreate==0 || pMod->pModule->xDestroy==0 ){
    *pzErr = sqlite3MPrintf(db, "no such module: %s", zMod);
    rc = SQLITE_ERROR;
  }else{
    rc = vtabCallConstructor(db, pTab, pMod, pMod->pModule->xCreate, pzErr);
  }


That code doesn't tell me that I haven't implemented xDestroy. I forgot to
implement xDestroy and wondered for a while why it was returning 'no such
module'. It would be better to check xCreate and xDestroy separately, so as to
report that the module was found but is missing a required method. The same
goes for the other mandatory methods: ideally sqlite3_create_module would
verify that the module implements all the mandatory methods and report any
omission to the log. A missing xDestroy is not reported in version 3.18. I
didn't check the latest version of sqlite, but I looked at the release notes
all the way up to the latest and didn't see anything mentioned regarding this.
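
A sketch of the separate check being suggested; this is not actual SQLite
source, just the shape of the proposed diagnostic:

  if( pMod==0 ){
    *pzErr = sqlite3MPrintf(db, "no such module: %s", zMod);
    rc = SQLITE_ERROR;
  }else if( pMod->pModule->xCreate==0 || pMod->pModule->xDestroy==0 ){
    *pzErr = sqlite3MPrintf(db, "module missing xCreate or xDestroy: %s", zMod);
    rc = SQLITE_ERROR;
  }else{
    rc = vtabCallConstructor(db, pTab, pMod, pMod->pModule->xCreate, pzErr);
  }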

Thank you.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table row deletition on trigger

2017-12-09 Thread Tibor Balog

Hi,

I have realized that this question went awry, so I am giving it a second run.

My scenario:

   Environment:

DotNet, no 3rd-party library, just P/Invoke.
SQLite:
Sqlite library version:3022000
-COMPILER=msvc-1911
-ENABLE_FTS5
-LIKE_DOESNT_MATCH_BLOBS
-MAX_EXPR_DEPTH=0
-OMIT_DECLTYPE
-OMIT_DEPRECATED
-OMIT_PROGRESS_CALLBACK
-OMIT_SHARED_CACHE
-TEMP_STORE=3
-THREADSAFE=1

Using:Synchronous connection.
"PRAGMA journal_mode=MEMORY"

   My assumption:
Having opened just one (Read/Write|Create) connection, SQLite should never
signal busy to that connection after it was successfully acquired.


Since you can use tight loops, and it is up to SQLite how long the individual
command execution takes and how many threads it opens up to accomplish the job
at hand, everything should come back synchronously; otherwise it will break the
loop, forcing the application to launch a signal-handling thread.


   My task setup:
A table "ART"
+
A virtual fts5 contentless table:
CREATE VIRTUAL TABLE "OCR" using fts5 (content='',FullText)
+
Trigger:
CREATE TRIGGER "ART_AD" AFTER DELETE ON "ART" BEGIN INSERT INTO "OCR" 
("OCR",rowid) VALUES('DELETE',old.rowid);END

to keep the virtual table in sync with row deletion on "ART"

   My task:
0 Collecting user input from grid selection into a list of SQL DELETE range 
statements (BETWEEN value1 AND value2;)


1 Open connection
2 BEGIN_TRANSACTION
3 the delete loop on the list.
4 COMMIT

SQLite breaks the loop with signal "busy" if one or more of the delete 
ranges are bigger than ~20.000 rows.
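
For reference, a minimal sketch of the failing pattern as described, against the
C API (assuming an open sqlite3 *db; the aRange array stands in for the
collected grid selection):

 int i, rc = SQLITE_OK;
 struct { sqlite3_int64 lo, hi; } aRange[64];  /* filled from user input */
 int nRanges = 0;                              /* number of collected ranges */
 /* ... populate aRange/nRanges from the grid selection ... */
 sqlite3_exec(db, "BEGIN", 0, 0, 0);
 for( i=0; i<nRanges && rc==SQLITE_OK; i++ ){
   char *zSql = sqlite3_mprintf(
       "DELETE FROM \"ART\" WHERE rowid BETWEEN %lld AND %lld",
       aRange[i].lo, aRange[i].hi);
   /* the ART_AD trigger fires per deleted row, inserting into OCR */
   rc = sqlite3_exec(db, zSql, 0, 0, 0);  /* reportedly SQLITE_BUSY for
                                             ranges over ~20.000 rows */
   sqlite3_free(zSql);
 }
 sqlite3_exec(db, rc==SQLITE_OK ? "COMMIT" : "ROLLBACK", 0, 0, 0);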


Without the transaction frame it seems to be OK.

   My question:
Is my "assumption" above wrong, and do I always have to take care of possible
interruptions from SQLite?
Is this a limitation because the trigger forces a separate thread for the
virtual table, which is maybe not that tightly integrated?

Is this a bug?

KR,
Tibor

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table row deletition on trigger

2017-12-08 Thread Tibor Balog

All right,

I think I was missing a potentially important piece of info.

I am deleting rows as ranges.

It is up to SQLite how to do the deletion.

I am pushing ranges in a loop in the transaction.

Then the loop breaks, signaling a "busy" or "overwhelmed" state.

Without the transaction frame it is working fine.

-Original Message-
From: Tibor Balog

Sent: Friday, December 8, 2017 7:11 PM
To: SQLite
Subject: [sqlite] Virtual table row deletition on trigger

Hello,

I ran into a little annoyance regarding a virtual table.

I have a contentless table:

CREATE VIRTUAL TABLE "OCR" using fts5 (content='',FullText)

bound with a trigger for row deletion:

CREATE TRIGGER "ART_AD" AFTER DELETE ON "ART" BEGIN INSERT INTO "OCR" 
("OCR",rowid) VALUES('DELETE',old.rowid);END


It works as advertised; however, deleting more than ~20.000 rows in a
transaction puts SQLite itself in a "busy" state.


Is this a known limitation? That the virtual table row deletion, triggered by
the normal table deletion, interferes with the normal table in a transaction?


Info:
I'm not using WAL; the journal is in memory.

Latest build.

Thank You,
Tibor
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users 


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual table row deletition on trigger

2017-12-08 Thread Tibor Balog
Hello,

I ran into a little annoyance regarding a virtual table.

I have a contentless table:

CREATE VIRTUAL TABLE "OCR" using fts5 (content='',FullText)

bound with a trigger for row deletion:

CREATE TRIGGER "ART_AD" AFTER DELETE ON "ART" BEGIN INSERT INTO "OCR" 
("OCR",rowid) VALUES('DELETE',old.rowid);END

It works as advertised; however, deleting more than ~20.000 rows in a
transaction puts SQLite itself in a "busy" state.

Is this a known limitation? That the virtual table row deletion, triggered by
the normal table deletion, interferes with the normal table in a transaction?

Info:
I'm not using WAL; the journal is in memory.

Latest build.

Thank You,
Tibor
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Dimitris Bil
I see, so in the native implementation you already have the whole table in
memory and only use the clustered b-tree index to search for tuples. So I would
not expect a large improvement from the virtual table implementation, but the
virtual table being 5 times slower is strange. Maybe the wrong data structure
is being used?


By the way, I had tried adding a virtual table in the sqlite amalgamation and I
did not see an observable difference. On the other hand, I have seen improvement
in virtual table utilization using the latest version of sqlite (in comparison
to a release from about a year ago).



From: sqlite-users  on behalf of 
Bob Friesenhahn 
Sent: Wednesday, February 8, 2017 4:09 PM
To: SQLite mailing list
Subject: Re: [sqlite] Virtual table vs real table query performance

On Wed, 8 Feb 2017, Dimitris Bil wrote:

> Do you perform the benchmark on the native database table using cold
> cache or warm cache? Also, can you briefly describe what the
> benchmark does and give the schema for the native database table?

My benchmark repeatedly reads all of the columns one by one given row
id and column name.  The table is read many (e.g. 100) times so this
is a warm cache test.

The schema is not terribly important but the table we are trying to
optimize (with 1800 or less rows) contains a 64-bit rowid, five
integer values, and two short text string values.

   int64, uint32, uint32, text[16], uint8, text[16], text[18], uint8,
   uint32

What I am looking for is expected average virtual table performance vs
native table performance for repeated column reads.

Due to being a generic implementation (supporting many virtual
tables), our virtual implementation uses programmed/dynamic
marshalling rather than compiled marshalling.  The schema definition
is also dynamically generated.

There are implementation overheads and it is useful to know what
performance is possible (e.g. compared to native table performance)
in order to know when the implementation is about as good as it can
be.

Bob
--

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Dominique Devienne
On Wed, Feb 8, 2017 at 4:50 PM, Hick Gunter  wrote:

> This is with a generic, user definable table and index structure capable
> virtual table implementation; it is not a "one module per table" statically
> typed heavily optimized implementation.
>

Ah, that makes complete sense then. I didn't want the OP to think virtual
tables were slower
than native tables in the general case, especially since he mentioned
memory arrays in C code.

And indeed the virtual-table advantage I mentioned is with a different
statically-typed vtable impl/module per vtable,
with statically defined indexes, where the table structure is hard-coded in
the vtable impl itself, and corresponds
to a native "row" data structure. In that config one leverages the
"front-end" of SQLite (parser and VDBE engine)
and very little of the "back-end" (pager and btree), except when SQLite
decides to make temporary tables for
query processing I guess. FWIW. --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Bob Friesenhahn

On Wed, 8 Feb 2017, Dimitris Bil wrote:

Do you perform the benchmark on the native database table using cold 
cache or warm cache? Also, can you briefly describe what the 
benchmark does and give the schema for the native database table?


My benchmark repeatedly reads all of the columns one by one given row 
id and column name.  The table is read many (e.g. 100) times so this 
is a warm cache test.


The schema is not terribly important but the table we are trying to 
optimize (with 1800 or less rows) contains a 64-bit rowid, five 
integer values, and two short text string values.


  int64, uint32, uint32, text[16], uint8, text[16], text[18], uint8,
  uint32

What I am looking for is expected average virtual table performance vs 
native table performance for repeated column reads.


Due to being a generic implementation (supporting many virtual 
tables), our virtual implementation uses programmed/dynamic 
marshalling rather than compiled marshalling.  The schema definition 
is also dynamically generated.


There are implementation overheads and it is useful to know what 
performance is possible (e.g. compared to native table performance) 
in order to know when the implementation is about as good as it can 
be.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Hick Gunter
This is with a generic, user definable table and index structure capable 
virtual table implementation; it is not a "one module per table" statically 
typed heavily optimized implementation.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dominique Devienne
Sent: Wednesday, 08 February 2017 16:42
To: SQLite mailing list 
Subject: Re: [sqlite] Virtual table vs real table query performance

On Wed, Feb 8, 2017 at 4:30 PM, Hick Gunter  wrote:

> Values are for retrieving 100.000 rows with a where clause not
> satisfiable from the index but true for all rows
>
> asql> select count() from ;
> CPU Time: user 0.092986 sys 0.00
>
> asql> select count() from  where =4;
> CPU Time: user 0.189971 sys 0.00
> CPU Time: user 0.199969 sys 0.00
> CPU Time: user 0.199970 sys 0.00
>
> asql> select count() from  where =4;
> CPU Time: user 0.086987 sys 0.010998
> CPU Time: user 0.085987 sys 0.00
> CPU Time: user 0.076988 sys 0.002000
>


Frankly I'm surprised it's slower than "native" SQLite.

In bulk-insert, random lookup, and table-delete timings we did in 2009 between 
native in-memory SQLite, and pure-C++ virtual tables accessing pure C++ data 
structures (i.e. vm/reflection/introspection/dynamic lookup as in Python for 
example, but direct addressing of statically typed data), the virtual tables 
was always faster, and not by a small margin.

Admittedly it was a long time ago, and SQLite is getting faster all the time 
for sure, but you can't beat static typing of memory addressable structures, vs 
scanning pages of table data and dynamically/serially decoding variable-size 
rows within those pages.

So something like "non-native" code or something "dynamic" is hiding in the 
virtual table impl, no? --DD ___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Dominique Devienne
On Wed, Feb 8, 2017 at 4:30 PM, Hick Gunter  wrote:

> Values are for retrieving 100.000 rows with a where clause not satisfiable
> from the index but true for alle rows
>
> asql> select count() from ;
> CPU Time: user 0.092986 sys 0.00
>
> asql> select count() from  where =4;
> CPU Time: user 0.189971 sys 0.00
> CPU Time: user 0.199969 sys 0.00
> CPU Time: user 0.199970 sys 0.00
>
> asql> select count() from  where =4;
> CPU Time: user 0.086987 sys 0.010998
> CPU Time: user 0.085987 sys 0.00
> CPU Time: user 0.076988 sys 0.002000
>


Frankly I'm surprised it's slower than "native" SQLite.

In bulk-insert, random lookup, and table-delete timings we did in 2009
between native in-memory SQLite, and pure-C++ virtual tables accessing
pure C++ data structures (i.e. vm/reflection/introspection/dynamic lookup
as in Python for example, but direct addressing of statically typed data),
the virtual tables was always faster, and not by a small margin.

Admittedly it was a long time ago, and SQLite is getting faster all the time
for sure, but you can't beat static typing of memory addressable structures,
vs scanning pages of table data and dynamically/serially decoding variable-size
rows within those pages.

So something like "non-native" code or something "dynamic" is hiding in the
virtual table impl, no? --DD
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Hick Gunter
Values are for retrieving 100.000 rows with a where clause not satisfiable from 
the index but true for all rows

asql> select count() from ;
CPU Time: user 0.092986 sys 0.00

asql> select count() from  where =4;
CPU Time: user 0.189971 sys 0.00
CPU Time: user 0.199969 sys 0.00
CPU Time: user 0.199970 sys 0.00

asql> select count() from  where =4;
CPU Time: user 0.086987 sys 0.010998
CPU Time: user 0.085987 sys 0.00
CPU Time: user 0.076988 sys 0.002000

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Bob Friesenhahn
Sent: Wednesday, 08 February 2017 15:39
To: SQLite mailing list 
Subject: Re: [sqlite] Virtual table vs real table query performance

On Wed, 8 Feb 2017, Hick Gunter wrote:

> Having implemented a memory-based virtual table complete with indices,
> full table scan and direct access via rowid (which happens to be the
> memory address of the row) I can do a batch delete of
> 100.000 rows (in a table with 1 composite index) in about 2 seconds

The case I am interested in is pure read performance of a single column element at 
a time given properly implemented xBestIndex and xFilter support.  Rows are not 
being added/removed using sqlite.

It is possible that native tables can be faster since the implementation is not 
limited to the rigid set of callback functions provided for virtual tables to 
use and of course the amalgamation is optimized by the compiler as one source 
module.

By tracing the callbacks, we do see that our implementation is not invoking the 
callbacks more times than necessary (which was not the case before xBestIndex 
and xFilter support was added).  Due to the requirements of the implementation, 
POSIX reader/writer locks are used so there is some low-contention locking 
overhead.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Dimitris Bil
Hello,


Do you perform the benchmark on the native database table using cold cache or 
warm cache? Also, can you briefly describe what the benchmark does and give the 
schema for the native database table?


thanks



From: sqlite-users  on behalf of 
Bob Friesenhahn 
Sent: Wednesday, February 8, 2017 2:39 PM
To: SQLite mailing list
Subject: Re: [sqlite] Virtual table vs real table query performance



The case I am interested in is pure read performance of a single column
element at a time given properly implemented xBestIndex and xFilter
support.  Rows are not being added/removed using sqlite.

It is possible that native tables can be faster since the
implementation is not limited to the rigid set of callback functions
provided for virtual tables to use and of course the amalgamation is
optimized by the compiler as one source module.

By tracing the callbacks, we do see that our implementation is not
invoking the callbacks more times than necessary (which was not the
case before xBestIndex and xFilter support was added).  Due to the
requirements of the implementation, POSIX reader/writer locks are used
so there is some low-contention locking overhead.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Bob Friesenhahn

On Wed, 8 Feb 2017, Hick Gunter wrote:

Having implemented a memory-based virtual table complete with 
indices, full table scan and direct access via rowid (which happens 
to be the memory address of the row) I can do a batch delete of 
100.000 rows (in a table with 1 composite index) in about 2 seconds


The case I am interested in is pure read performance of a single column 
element at a time given properly implemented xBestIndex and xFilter 
support.  Rows are not being added/removed using sqlite.


It is possible that native tables can be faster since the 
implementation is not limited to the rigid set of callback functions 
provided for virtual tables to use and of course the amalgamation is 
optimized by the compiler as one source module.


By tracing the callbacks, we do see that our implementation is not 
invoking the callbacks more times than necessary (which was not the 
case before xBestIndex and xFilter support was added).  Due to the 
requirements of the implementation, POSIX reader/writer locks are used 
so there is some low-contention locking overhead.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table vs real table query performance

2017-02-08 Thread Hick Gunter
Having implemented a memory-based virtual table complete with indices, full
table scan and direct access via rowid (which happens to be the memory address
of the row), I can do a batch delete of 100.000 rows (in a table with 1
composite index) in about 2 seconds (3.7 seconds with the condition) while
running linux (RH 5.6 x86_64 VM) on a virtual machine. Deleting all rows of a
native SQLite table (while checking for the value of a non-indexed field to
avoid SQLite just dropping and re-creating the table) takes about 1 second.

Note that both operations require a full table scan to fill a "rowset" (=
SQLite internal temporary table) and that the virtual table function VUpdate
expects the virtual table code to handle index deletion, which is explicitly
coded in the native table case.



asql> explain delete from ;
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Trace  0 0 000  NULL
1 Goto   0 18000  NULL
2 Integer0 1 000  NULL
3 Null   0 2 000  NULL
4 VOpen  0 0 0 vtab:187BE588:2ACC1FDC4990  00  NULL
5 Integer1 4 000  NULL
6 Integer0 5 000  NULL
7 VFilter0 12400  NULL
8 Rowid  0 3 000  NULL
9 RowSetAdd  2 3 000  NULL
10AddImm 1 1 000  NULL
11VNext  0 8 000  NULL
12Close  0 0 000  NULL
13RowSetRead 2 16300  NULL
14VUpdate0 1 3 vtab:187BE588:2ACC1FDC4990  02  NULL
15Goto   0 13000  NULL
16ResultRow  1 1 000  NULL
17Halt   0 0 000  NULL
18VBegin 0 0 0 vtab:187BE588:2ACC1FDC4990  00  NULL
19Goto   0 2 000  NULL

asql> explain delete from  where =4;
addr  opcode p1p2p3p4 p5  comment
  -        -  --  -
0 Trace  0 0 000  NULL
1 Goto   0 31000  NULL
2 Integer0 1 000  NULL
3 Null   0 2 000  NULL
4 OpenRead   0 215   0 7  00  
5 Rewind 0 13000  NULL
6 Column 0 6 400  .
7 Integer4 5 000  NULL
8 Ne 5 124   collseq(BINARY)  6c  NULL
9 Rowid  0 3 000  NULL
10RowSetAdd  2 3 000  NULL
11AddImm 1 1 000  NULL
12Next   0 6 001  NULL
13Close  0 0 000  NULL
14OpenWrite  0 215   0 8  00  
15OpenWrite  1 1362  0  Keyinfo(5,BINARY,BINARY)  00  
16RowSetRead 2 27300  NULL
17NotExists  0 26300  NULL
18Rowid  0 11000  NULL
19Column 0 1 600  .
20Column 0 2 700  .
21Column 0 3 800  .
22Column 0 4 900  .
23Column 0 5 10   00  .
24IdxDelete  1 6 600  NULL
25Delete 0 1 000  NULL
26Goto   0 16000  NULL
27Close  1 1362  000  NULL
28Close  0 0 000  NULL
29ResultRow  1 1 000  NULL
30Halt   0 0 000  NULL
31Transaction0 1 000  NULL
32VerifyCookie   0 1191  000  NULL
33TableLock  0 215   100  NULL
34Goto   0 2 000  NULL



-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Bob Friesenhahn
Sent: Tuesday, 07 February 2017 22:06
To: SQLite mailing list 
Subject: [sqlite] Virtual table vs real table query performance



We are trying to improve the query performance of our virtual table implementation ...

[sqlite] Virtual table vs real table query performance

2017-02-07 Thread Bob Friesenhahn
We are trying to improve the query performance of our virtual table 
implementation (which is implemented in C).  Due to requirements of 
external code, a specified column of a specified row (by rowid) is 
queried at a time (the least efficient means of access).  Our virtual 
table is accessing entries in a memory-based array.


I have implemented a benchmark script written in Python using the APSW 
wrapper.  The benchmark script reveals that access to a native 
database table is 5 times faster than access to our virtual table.


Intuitively, I would think that access to a memory-based virtual table 
could be faster than native tables.  Our developer has implemented 
xBestIndex and xFilter support which is intended to result in direct 
access to the requested row rather than scanning the whole table.
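
For reference, a minimal sketch of the kind of xBestIndex logic described
(recognizing a rowid-equality constraint so that xFilter can position the
cursor directly); names are illustrative, not the actual implementation:

 static int memBestIndex(sqlite3_vtab *pVtab, sqlite3_index_info *pInfo){
   int i;
   for(i=0; i<pInfo->nConstraint; i++){
     const struct sqlite3_index_constraint *p = &pInfo->aConstraint[i];
     if( p->usable && p->iColumn<0 && p->op==SQLITE_INDEX_CONSTRAINT_EQ ){
       /* iColumn<0 means the rowid; have SQLite pass the value to xFilter */
       pInfo->aConstraintUsage[i].argvIndex = 1;
       pInfo->aConstraintUsage[i].omit = 1;   /* no need to re-check it */
       pInfo->idxNum = 1;                     /* flag: rowid point lookup */
       pInfo->estimatedCost = 1.0;            /* direct access is cheap */
       return SQLITE_OK;
     }
   }
   pInfo->idxNum = 0;                         /* fall back to a full scan */
   pInfo->estimatedCost = 1000000.0;
   return SQLITE_OK;
 }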


What is the expected performance of a properly implemented virtual 
table (assuming little additional overhead) vs a native table?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table acting as wrapper of a regular table

2016-10-24 Thread Dimitris Bil
Hick,


sorry, I think I misunderstood your first reply.


Yes, of course I am calling each method twice when I use the wrapper: one time
in the wrapper and one time in B, so an overhead is absolutely expected. But I
would not expect this overhead to be equal to the time of the total execution
without the wrapper, as the wrapper's methods do pretty much nothing more than
call the methods of table B that do the actual work. I would only expect the
time to double if, apart from calling the wrapper's methods, I were also
scanning table A twice and searching table B twice as many times as in the
normal execution.


Best,

Dimitris



From: sqlite-users  on behalf of 
Hick Gunter 
Sent: Monday, October 24, 2016 1:59 PM
To: 'SQLite mailing list'
Subject: Re: [sqlite] Virtual table acting as wrapper of a regular table

When you prepare() your statement, SQLite translates this into 2 nested loops:

Outer loop:
Rewind table A (retrieve first row)
Extract id from table A row
...
Retrieve next row from table A

Inner loop:
Set table B to A.id
Extract id from B
Check for match
(return result row)
Retrieve next row from table B

When you are using your wrapper, the inner loop gets replaced by:
Call xFilter() -> reset(), bind(), step(), column_value() -> Set table B to
A.id, Extract id from table B row, check for match
Call xEof()
Call xColumn() -> return result value
Call xNext() -> step(), column_value() -> Retrieve next row from table B

This is about twice as much work.


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dimitris Bil
Sent: Monday, 24 October 2016 15:10
To: SQLite mailing list 
Subject: Re: [sqlite] Virtual table acting as wrapper of a regular table

Indeed I am not checking argc. This code is only meant to serve as a test for
comparing execution speed. I have stripped it of any other functionality.

Regarding prepare, I am only preparing the statement in xConnect (which also
acts as xCreate). This is only executed once, when the connection to the
virtual table is opened, right?

Regarding the step method, I am executing a step the first time inside xFilter
in order to advance the cursor to the first result. Then each time xNext is
called, I am executing step one more time to continue advancing the cursor. I
think this is in accordance with what the sqlite page says: "If the virtual
table contains one or more rows that match the search criteria, then the cursor
must be left pointing at the first row. Subsequent calls to xEof must return
false (zero). If there are no rows that match, then the cursor must be left in
a state that will cause the xEof to return true (non-zero). The SQLite engine
will use the xColumn and xRowid methods to access that row content. The xNext
method will be used to advance to the next row."

Also, I am using sqlite3_result_int64 only one time per result tuple inside
xColumn. In my case, as each result tuple is the same, this indeed could be
avoided, but I don't think this is the cause of the overhead.

thanks again,
Dimitris




From: sqlite-users  on behalf of 
Hick Gunter 
Sent: Monday, October 24, 2016 10:39 AM
To: 'SQLite mailing list'
Subject: Re: [sqlite] Virtual table acting as wrapper of a regular table

Your xFilter method is blindly assuming that there is always an argv[0] without 
checking argc first.

You are incurring an extra "prepare" in your xConnect method, an extra "step" 
and "column" in your xFilter/xNext methods and an extra "result" in your 
xColumn function. Doing twice as much work taking twice as long seems quite 
reasonable.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dimitris Bil
Sent: Saturday, 22 October 2016 15:31
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Virtual table acting as wrapper of a regular table

Hello,

I am trying to create a simple virtual table that acts as a wrapper for a normal 
database table for a specific query. For example, consider that I have tables A 
and B in my database and the query:
select count(A.id) from A CROSS JOIN B where A.id=B.id

Now I am using a virtual table acting as a wrapper for table B, so I can execute 
the query select count(A.id) from A CROSS JOIN w where A.id=w.id, where w is 
the virtual table instance, and get the same result. My problem is that the 
second query is about 2 times slower. I would normally expect some overhead, 
but this seems quite slow, so I was wondering if there is something wrong with 
my code.

I am using a prepared statement in connect method for query "select id from B 
where id=?1" and reset/bind/step in filter method and 

Re: [sqlite] Virtual table acting as wrapper of a regular table

2016-10-24 Thread Hick Gunter
When you prepare() your statement, SQLite translates this into 2 nested loops:

Outer loop:
Rewind table A (retrieve first row)
Extract id from table A row
...
Retrieve next row from table A

Inner loop:
Set table B to A.id
Extract id from B
Check for match
(return result row)
Retrieve next row from table B

When you are using your wrapper, the inner loop gets replaced by:
Call xFilter() -> reset(), bind(), step(), column_value() -> Set table B to
A.id, Extract id from table B row, check for match
Call xEof()
Call xColumn() -> return result value
Call xNext() -> step(), column_value() -> Retrieve next row from table B

This is about twice as much work.


-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dimitris Bil
Sent: Monday, 24 October 2016 15:10
To: SQLite mailing list 
Subject: Re: [sqlite] Virtual table acting as wrapper of a regular table

Indeed I am not checking argc. This code is only meant to serve as a test for
comparing execution speed. I have stripped it of any other functionality.

Regarding prepare, I am only preparing the statement in xConnect (which also
acts as xCreate). This is only executed once, when the connection to the
virtual table is opened, right?

Regarding the step method, I am executing a step the first time inside xFilter
in order to advance the cursor to the first result. Then each time xNext is
called, I am executing step one more time to continue advancing the cursor. I
think this is in accordance with what the sqlite page says: "If the virtual
table contains one or more rows that match the search criteria, then the cursor
must be left pointing at the first row. Subsequent calls to xEof must return
false (zero). If there are no rows that match, then the cursor must be left in
a state that will cause the xEof to return true (non-zero). The SQLite engine
will use the xColumn and xRowid methods to access that row content. The xNext
method will be used to advance to the next row."

Also, I am using sqlite3_result_int64 only one time per result tuple inside
xColumn. In my case, as each result tuple is the same, this indeed could be
avoided, but I don't think this is the cause of the overhead.

thanks again,
Dimitris




From: sqlite-users  on behalf of 
Hick Gunter 
Sent: Monday, October 24, 2016 10:39 AM
To: 'SQLite mailing list'
Subject: Re: [sqlite] Virtual table acting as wrapper of a regular table

Your xFilter method is blindly assuming that there is always an argv[0] without 
checking argc first.

You are incurring an extra "prepare" in your xConnect method, an extra "step" 
and "column" in your xFilter/xNext methods and an extra "result" in your 
xColumn function. Doing twice as much work taking twice as long seems quite 
reasonable.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dimitris Bil
Sent: Saturday, 22 October 2016 15:31
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Virtual table acting as wrapper of a regular table

Hello,

I am trying to create a simple virtual table that acts as a wrapper for a normal 
database table for a specific query. For example, consider that I have tables A 
and B in my database and the query:
select count(A.id) from A CROSS JOIN B where A.id=B.id

Now I am using a virtual table acting as a wrapper for table B, so I can execute 
the query select count(A.id) from A CROSS JOIN w where A.id=w.id, where w is 
the virtual table instance, and get the same result. My problem is that the 
second query is about 2 times slower. I would normally expect some overhead, 
but this seems quite slow, so I was wondering if there is something wrong with 
my code.

I am using a prepared statement in connect method for query "select id from B 
where id=?1" and reset/bind/step in filter method and (if there are more 
results) step in next method.

Here's my code (I am just setting id=0 to denote eof) 
http://pastebin.com/ce8b4aLL

Do you think there's something wrong or it's an unavoidable overhead that comes 
with the virtual table usage? Are there any chances to improve performance?

thanks,
Dimitris

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this
communication is strictly prohibited. If you have received this communication
in error, please immediately notify the sender by return e-mail message and
delete all copies of the original communication. Thank you for your cooperation.

Re: [sqlite] Virtual table acting as wrapper of a regular table

2016-10-24 Thread Dimitris Bil
Indeed I am not checking argc. This code is only meant to serve as a test for
comparing execution speed. I have stripped it of any other functionality.

Regarding prepare, I am only preparing the statement in xConnect (which also
acts as xCreate). This is only executed once, when the connection to the
virtual table is opened, right?

Regarding the step method, I am executing a step the first time inside xFilter
in order to advance the cursor to the first result. Then each time xNext is
called, I am executing step one more time to continue advancing the cursor. I
think this is in accordance with what the sqlite page says: "If the virtual
table contains one or more rows that match the search criteria, then the cursor
must be left pointing at the first row. Subsequent calls to xEof must return
false (zero). If there are no rows that match, then the cursor must be left in
a state that will cause the xEof to return true (non-zero). The SQLite engine
will use the xColumn and xRowid methods to access that row content. The xNext
method will be used to advance to the next row."

Also, I am using sqlite3_result_int64 only one time per result tuple inside
xColumn. In my case, as each result tuple is the same, this indeed could be
avoided, but I don't think this is the cause of the overhead.

thanks again,
Dimitris




From: sqlite-users  on behalf of 
Hick Gunter 
Sent: Monday, October 24, 2016 10:39 AM
To: 'SQLite mailing list'
Subject: Re: [sqlite] Virtual table acting as wrapper of a regular table

Your xFilter method is blindly assuming that there is always an argv[0] without 
checking argc first.

You are incurring an extra "prepare" in your xConnect method, an extra "step" 
and "column" in your xFilter/xNext methods and an extra "result" in your 
xColumn function. Doing twice as much work taking twice as long seems quite 
reasonable.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dimitris Bil
Sent: Saturday, 22 October 2016 15:31
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Virtual table acting as wrapper of a regular table

Hello,

I am trying to create a simple virtual table that acts as a wrapper for a normal 
database table for a specific query. For example, consider that I have tables A 
and B in my database and the query:
select count(A.id) from A CROSS JOIN B where A.id=B.id

Now I am using a virtual table acting as a wrapper for table B, so I can execute 
the query select count(A.id) from A CROSS JOIN w where A.id=w.id, where w is 
the virtual table instance, and get the same result. My problem is that the 
second query is about 2 times slower. I would normally expect some overhead, 
but this seems quite slow, so I was wondering if there is something wrong with 
my code.

I am using a prepared statement in connect method for query "select id from B 
where id=?1" and reset/bind/step in filter method and (if there are more 
results) step in next method.

Here's my code (I am just setting id=0 to denote eof) 
http://pastebin.com/ce8b4aLL

Do you think there's something wrong or it's an unavoidable overhead that comes 
with the virtual table usage? Are there any chances to improve performance?

thanks,
Dimitris

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table acting as wrapper of a regular table

2016-10-24 Thread Hick Gunter
Your xFilter method is blindly assuming that there is always an argv[0] without 
checking argc first.

You are incurring an extra "prepare" in your xConnect method, an extra "step" 
and "column" in your xFilter/xNext methods and an extra "result" in your 
xColumn function. Doing twice as much work taking twice as long seems quite 
reasonable.

-Original Message-
From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On
Behalf Of Dimitris Bil
Sent: Saturday, 22 October 2016 15:31
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Virtual table acting as wrapper of a regular table

Hello,

I am trying to create a simple virtual table that acts as a wrapper for a normal 
database table for a specific query. For example, consider that I have tables A 
and B in my database and the query:
select count(A.id) from A CROSS JOIN B where A.id=B.id

Now I am using a virtual table acting as a wrapper for table B, so I can execute 
the query select count(A.id) from A CROSS JOIN w where A.id=w.id, where w is 
the virtual table instance, and get the same result. My problem is that the 
second query is about 2 times slower. I would normally expect some overhead, 
but this seems quite slow, so I was wondering if there is something wrong with 
my code.

I am using a prepared statement in connect method for query "select id from B 
where id=?1" and reset/bind/step in filter method and (if there are more 
results) step in next method.

Here's my code (I am just setting id=0 to denote eof) 
http://pastebin.com/ce8b4aLL

Do you think there's something wrong or it's an unavoidable overhead that comes 
with the virtual table usage? Are there any chances to improve performance?

thanks,
Dimitris

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This communication (including any attachments) is intended for the use of the 
intended recipient(s) only and may contain information that is confidential, 
privileged or legally protected. Any unauthorized use or dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, please immediately notify the sender by return e-mail message and 
delete all copies of the original communication. Thank you for your cooperation.


___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual table acting as wrapper of a regular table

2016-10-22 Thread Dimitris Bil
Hello,

I am trying to create a simple virtual table that acts as a wrapper for a normal 
database table for a specific query. For example, consider that I have tables A 
and B in my database and the query:
select count(A.id) from A CROSS JOIN B where A.id=B.id

Now I am using a virtual table acting as a wrapper for table B, so I can execute 
the query
select count(A.id) from A CROSS JOIN w where A.id=w.id,
where w is the virtual table instance, and get the same result. My problem is 
that the second query is about 2 times slower. I would normally expect some 
overhead, but this seems quite slow, so I was wondering if there is something 
wrong with my code.

I am using a prepared statement in the connect method for the query "select id from B 
where id=?1", with reset/bind/step in the filter method and (if there are more 
results) step in the next method.

Here's my code (I am just setting id=0 to denote eof)
http://pastebin.com/ce8b4aLL
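
(For readers who cannot reach the pastebin, a minimal sketch of the pattern 
described above; the struct and function names are illustrative assumptions, 
not the actual code.)

#include <string.h>
#include "sqlite3.h"

typedef struct wrap_vtab {
  sqlite3_vtab base;      /* base class - must be first */
  sqlite3_stmt *pStmt;    /* prepared once here, reset/bind/step in xFilter */
} wrap_vtab;

static int wrapConnect(
  sqlite3 *db, void *pAux, int argc, const char *const*argv,
  sqlite3_vtab **ppVtab, char **pzErr
){
  wrap_vtab *pNew = sqlite3_malloc(sizeof(*pNew));
  int rc;
  if( pNew==0 ) return SQLITE_NOMEM;
  memset(pNew, 0, sizeof(*pNew));
  /* Prepare the lookup once; every scan then reuses the same statement. */
  rc = sqlite3_prepare_v2(db, "SELECT id FROM B WHERE id=?1", -1,
                          &pNew->pStmt, 0);
  if( rc==SQLITE_OK ){
    rc = sqlite3_declare_vtab(db, "CREATE TABLE x(id INTEGER)");
  }
  if( rc!=SQLITE_OK ){
    sqlite3_finalize(pNew->pStmt);
    sqlite3_free(pNew);
    return rc;
  }
  *ppVtab = &pNew->base;
  return SQLITE_OK;
}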

Do you think there's something wrong, or is it an unavoidable overhead that comes 
with virtual table usage? Is there any chance to improve performance?

thanks,
Dimitris



Re: [sqlite] Virtual table

2016-09-18 Thread Eduardo Morras
On Wed, 31 Aug 2016 10:34:05 -0300
Maria de Jesus Philadelpho  wrote:

> Hi,
> 
> I implemented the SQLite extension, virtcsv, which allows attaching a
> CSV file as a virtual table.  At the command line everything works just
> fine; why can't I see the result of a select * from table using
> SQLiteStudio 3.0.7?

Because SQLite Studio must load the SQLite CSV virtual table extension before
use, or be compiled with it.

Read the SQLite Studio manual on how to achieve that; I don't use SQLite Studio.

Perhaps you can get better answers from the SQLite Studio developers.

> See the attached snapshot.

The SQLite mailing list doesn't support attachments.

> regards,
> 
> Maria Azevedo
> --


---   ---
Eduardo Morras 


Re: [sqlite] Virtual table

2016-09-18 Thread John G
1) Can't see the 'attachment' - this list does not accept them.
Can you copy/paste sample code into an email?

John G



On 31 August 2016 at 14:34, Maria de Jesus Philadelpho <
jesus.ph...@gmail.com> wrote:

> Hi,
>
> I implemented the SQLite extension, virtcsv, which allows attaching a CSV
> file as a virtual table.  At the command line everything works just fine; why
> can't I see the result of a select * from table using SQLiteStudio
> 3.0.7?
> See the attached snapshot.
>
> regards,
>
> Maria Azevedo
> --


[sqlite] Virtual table

2016-08-31 Thread Maria de Jesus Philadelpho
Hi,

I implemented the SQLite extension, virtcsv, which allows attaching a CSV
file as a virtual table.  At the command line everything works just fine; why
can't I see the result of a select * from table using SQLiteStudio
3.0.7?
See the attached snapshot.

regards,

Maria Azevedo
--


Re: [sqlite] Virtual Table BestIndex Being Passed LIKE constraints in SQLite 3.13.0?

2016-06-24 Thread Richard Hipp
On Sat, Jun 25, 2016 at 2:55 AM, Mike Nicolino
 wrote:
> Hey Everyone,
>
> I've got a system with virtual tables using System.Data.SQLite 
> 1.0.102.0/SQLite version 3.13.0, and the LIKE constraint for queries is now 
> being passed to the virtual table BestIndex functions (BestIndex is getting 
> 65 as its constraint.op).
>
> Is the behavior in 3.13.0 a bug or has something changed that now makes it 
> possible for the LIKE constraint to be sent to virtual tables?
>

This is a feature introduced in version 3.10.0.
https://www.sqlite.org/releaselog/3_10_0.html
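
For reference, a minimal sketch (function name illustrative) of how an 
xBestIndex implementation can recognize the new opcode; sqlite3.h defines 
SQLITE_INDEX_CONSTRAINT_LIKE as 65:

static int vtabBestIndex(sqlite3_vtab *tab, sqlite3_index_info *pIdxInfo){
  int i;
  for(i=0; i<pIdxInfo->nConstraint; i++){
    const struct sqlite3_index_constraint *pC = &pIdxInfo->aConstraint[i];
    if( !pC->usable ) continue;
    if( pC->op==SQLITE_INDEX_CONSTRAINT_LIKE ){
      /* A table that cannot exploit LIKE simply leaves argvIndex at 0;
      ** SQLite will then apply the LIKE filter itself. */
      continue;
    }
    /* handle SQLITE_INDEX_CONSTRAINT_EQ and the other ops as before */
  }
  pIdxInfo->estimatedCost = 1000000;  /* placeholder full-scan cost */
  return SQLITE_OK;
}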


-- 
D. Richard Hipp
d...@sqlite.org


[sqlite] Virtual Table BestIndex Being Passed LIKE constraints in SQLite 3.13.0?

2016-06-24 Thread Mike Nicolino
Hey Everyone,

I've got a system with virtual tables using System.Data.SQLite 1.0.102.0/SQLite 
version 3.13.0, and the LIKE constraint for queries is now being passed to the 
virtual table BestIndex functions (BestIndex is getting 65 as its 
constraint.op).  I had asked a question regarding LIKE constraints and virtual 
tables a while back, and the response was that LIKE constraints were intended 
never to be passed to virtual table BestIndex, due to LIKE being able to be 
overridden by the application (conversation pasted at the end of this mail).

Is the behavior in 3.13.0 a bug or has something changed that now makes it 
possible for the LIKE constraint to be sent to virtual tables?

Thanks,
Mike Nicolino


On 3/5/15, Mike Nicolino  wrote:

> I'm using System.Data.SQLite version 1.0.95.0 and have what appears to
> be a bug with Virtual Tables.  Queries using 'like' in the where
> clause are not getting the like clause passed to BestIndex as a query 
> constraint.
> Specifically:
>
> -  Simple query: select * from foo where name like 'a%'

The LIKE operator can be overridden by the application to mean anything the 
application wants - it is not compelled to follow standard SQL semantics.  For 
that reason, virtual tables are unable to optimize using LIKE since they have 
no way of knowing what it will do.

Works as designed.





[sqlite] Virtual Table Example?

2016-02-06 Thread John Stegeman
Hello,
Does anyone know where to find an example project that creates a SQLite virtual 
table module using a web service as the data source?  Or if anyone has done 
this type of work, please contact me directly.

Thank you,
John Stegeman


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-06-08 Thread Eric Hill
Surely enough, this SQL:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title, 
t4.category_id, t5."name" 
FROM rental  t2 
LEFT OUTER JOIN inventory t1 
ON  ( t1.inventory_id = t2.inventory_id )  
LEFT OUTER JOIN film t3 
ON  ( t3.film_id = t1.film_id )  
LEFT OUTER JOIN film_category t4 
ON  ( t4.film_id = t1.film_id )  
LEFT OUTER JOIN category t5 
ON  ( t5.category_id = t4.category_id )  ;

generates the desired xBestIndex call:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

allowing the "first constraint" heuristic to be successful.

So, in the case where my application is generating SQL, I can fix it so that 
the ON is generated in the helpful order.  But I also let users type in their 
own SQL.

Would it be too much to ask SQLite to determine <lhs> vs. <rhs> based 
on context rather than inferring based on position?  I'm looking at SQL In a 
Nutshell, 3rd Edition, page 353, and the JOIN syntax example shows the ON 
statements in the order SQLite dislikes.

Thanks very much!

Eric


-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Hick Gunter
Sent: Monday, June 08, 2015 10:21 AM
To: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Can you try changing LHS and RHS in the first ON expression?

The older, larger query has inventory_id on the LHS and film_id on the RHS. Now 
you have all fields on the RHS.

It would seem the QP may be inferring <lhs> LEFT OUTER JOIN <rhs> ON 
(<rhs> = <lhs>), placing the <rhs> fields first in the argument 
list, and adding the <lhs> fields only for symmetry reasons (they can never 
be used, because they point the "wrong way").

-Original Message-
From: Eric Hill [mailto:Eric.Hill at jmp.com]
Sent: Monday, 08 June 2015 15:55
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Revisiting a thread from about 3 weeks back, I have another xBestIndex puzzler. 
 The example query from that thread was:

SELECT t1.rental_date, t1.inventory_id, t8.film_id, t5.title AS Title,
t3."name" AS Category, t4.customer_id, t4.store_id, t4.email,
t5.length
FROM rental  t1
LEFT OUTER JOIN inventory t8
ON  ( t8.inventory_id = t1.inventory_id )
LEFT OUTER JOIN customer t4
ON  ( t4.customer_id = t1.customer_id )
LEFT OUTER JOIN film_category t7
ON  ( t7.film_id = t8.film_id )
LEFT OUTER JOIN film t5
ON  ( t5.film_id = t8.film_id )
LEFT OUTER JOIN category t3
ON  ( t3.category_id = t7.category_id )
LEFT OUTER JOIN film_actor t6
ON  ( t6.film_id = t5.film_id )
LEFT OUTER JOIN actor t2
ON  ( t2.actor_id = t6.actor_id )  ;

The "money" call to xBestIndex in that case looked like this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

In response to this, I originally promised to create an index using all 3 
constraints (two of which are the same), but SQLite rejected that and did a 
table scan instead, because it really only needed an index based on 
inventory_id.  So the heuristic we ended up with, which did fix this specific 
query, was (from Gunter):

> If you know nothing about a table except for the names of the fields 
> and the number of rows, then you are best off choosing the first 
> constraint only. (rows = cost = log n)

Well, now I have a counter-example.  This time, the SQL looks like this:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title,
t4.category_id, t5."name"
FROM rental  t2
LEFT OUTER JOIN inventory t1
ON  ( t2.inventory_id = t1.inventory_id )
LEFT OUTER JOIN film t3
ON  ( t3.film_id = t1.film_id )
LEFT OUTER JOIN film_category t4
ON  ( t4.film_id = t1.film_id )
LEFT OUTER JOIN category t5
ON  ( t5.category_id = t4.category_id )  ;

For whatever reason, the "money" call to xBestIndex in this case looks like 
this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 0 (inventory_id) = Usable

The order of the constraints is different!  So, using the "first constraint" 
heuristic, I commit to indexing based on film_id, but indexing inventory on 
film_id is not helpful for this query.  SQLite sees that inventory is indexed on 
film_id and decides to use a table scan for inventory, and it's game over.

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-06-08 Thread Hick Gunter
Can you try changing LHS and RHS in the first ON expression?

The older, larger query has inventory_id on the LHS and film_id on the RHS. Now 
you have all fields on the RHS.

It would seem the QP may be inferring <lhs> LEFT OUTER JOIN <rhs> ON 
(<rhs> = <lhs>),
placing the <rhs> fields first in the argument list,
and adding the <lhs> fields only for symmetry reasons (they can never be 
used, because they point the "wrong way").

-Original Message-
From: Eric Hill [mailto:Eric.Hill at jmp.com]
Sent: Monday, 08 June 2015 15:55
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Revisiting a thread from about 3 weeks back, I have another xBestIndex puzzler. 
 The example query from that thread was:

SELECT t1.rental_date, t1.inventory_id, t8.film_id, t5.title AS Title,
t3."name" AS Category, t4.customer_id, t4.store_id, t4.email,
t5.length
FROM rental  t1
LEFT OUTER JOIN inventory t8
ON  ( t8.inventory_id = t1.inventory_id )
LEFT OUTER JOIN customer t4
ON  ( t4.customer_id = t1.customer_id )
LEFT OUTER JOIN film_category t7
ON  ( t7.film_id = t8.film_id )
LEFT OUTER JOIN film t5
ON  ( t5.film_id = t8.film_id )
LEFT OUTER JOIN category t3
ON  ( t3.category_id = t7.category_id )
LEFT OUTER JOIN film_actor t6
ON  ( t6.film_id = t5.film_id )
LEFT OUTER JOIN actor t2
ON  ( t2.actor_id = t6.actor_id )  ;

The "money" call to xBestIndex in that case looked like this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

In response to this, I originally promised to create an index using all 3 
constraints (two of which are the same), but SQLite rejected that and did a 
table scan instead, because it really only needed an index based on 
inventory_id.  So the heuristic we ended up with, which did fix this specific 
query, was (from Gunter):

> If you know nothing about a table except for the names of the fields
> and the number of rows, then you are best off choosing the first
> constraint only. (rows = cost = log n)

Well, now I have a counter-example.  This time, the SQL looks like this:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title,
t4.category_id, t5."name"
FROM rental  t2
LEFT OUTER JOIN inventory t1
ON  ( t2.inventory_id = t1.inventory_id )
LEFT OUTER JOIN film t3
ON  ( t3.film_id = t1.film_id )
LEFT OUTER JOIN film_category t4
ON  ( t4.film_id = t1.film_id )
LEFT OUTER JOIN category t5
ON  ( t5.category_id = t4.category_id )  ;

For whatever reason, the "money" call to xBestIndex in this case looks like 
this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 0 (inventory_id) = Usable

The order of the constraints is different!  So, using the "first constraint" 
heuristic, I commit to indexing based on film_id, but indexing inventory on 
film_id is not helpful for this query.  SQLite sees that inventory is indexed on 
film_id and decides to use a table scan for inventory, and it's game over.

If SQLite calls to xBestIndex do not in some way convey which constraints 
matter and which ones don't, I don't see how I can use virtual tables.

Here is the WhereTrace, in case it might help:

*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
 begin solver.  (nRowEst=0)
New0 cost=216,200 order=0
 after round 0 
 0 cost=216 nrow=200 order=0
 begin solver.  (nRowEst=201)
New0 cost=216,200 order=1
 after round 0 
 0 cost=216 nrow=200 order=1 rev=0x0
 Solution nRow=200 ORDERBY=1,0x0
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cos

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-06-08 Thread Eric Hill
Revisiting a thread from about 3 weeks back, I have another xBestIndex puzzler. 
 The example query from that thread was:

SELECT t1.rental_date, t1.inventory_id, t8.film_id, t5.title AS Title, 
t3."name" AS Category, t4.customer_id, t4.store_id, t4.email, 
t5.length 
FROM rental  t1 
LEFT OUTER JOIN inventory t8 
ON  ( t8.inventory_id = t1.inventory_id )  
LEFT OUTER JOIN customer t4 
ON  ( t4.customer_id = t1.customer_id )  
LEFT OUTER JOIN film_category t7 
ON  ( t7.film_id = t8.film_id )  
LEFT OUTER JOIN film t5 
ON  ( t5.film_id = t8.film_id )  
LEFT OUTER JOIN category t3 
ON  ( t3.category_id = t7.category_id )  
LEFT OUTER JOIN film_actor t6 
ON  ( t6.film_id = t5.film_id )  
LEFT OUTER JOIN actor t2 
ON  ( t2.actor_id = t6.actor_id )  ;

The "money" call to xBestIndex in that case looked like this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable

In response to this, I originally promised to create an index using all 3 
constraints (two of which are the same), but SQLite rejected that and did a 
table scan instead, because it really only needed an index based on 
inventory_id.  So the heuristic we ended up with, which did fix this specific 
query, was (from Gunter):

> If you know nothing about a table except for the names of the fields 
> and the number of rows, then you are best off choosing 
> the first constraint only. (rows = cost = log n)

Well, now I have a counter-example.  This time, the SQL looks like this:

SELECT t2.rental_id, t2.rental_date, t1.film_id, t3.title, 
t4.category_id, t5."name" 
FROM rental  t2 
LEFT OUTER JOIN inventory t1 
ON  ( t2.inventory_id = t1.inventory_id )  
LEFT OUTER JOIN film t3 
ON  ( t3.film_id = t1.film_id )  
LEFT OUTER JOIN film_category t4 
ON  ( t4.film_id = t1.film_id )  
LEFT OUTER JOIN category t5 
ON  ( t5.category_id = t4.category_id )  ;

For whatever reason, the "money" call to xBestIndex in this case looks like 
this:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 3
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 0 (inventory_id) = Usable

The order of the constraints is different!  So, using the "first constraint" 
heuristic, I commit to indexing based on film_id, but indexing inventory on 
film_id is not helpful for this query.  SQLite sees that inventory is indexed on 
film_id and decides to use a table scan for inventory, and it's game over.

If SQLite calls to xBestIndex do not in some way convey which constraints 
matter and which ones don't, I don't see how I can use virtual tables.

Here is the WhereTrace, in case it might help:

*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
 begin solver.  (nRowEst=0)
New0 cost=216,200 order=0
 after round 0 
 0 cost=216 nrow=200 order=0
 begin solver.  (nRowEst=201)
New0 cost=216,200 order=1
 after round 0 
 0 cost=216 nrow=200 order=1 rev=0x0
 Solution nRow=200 ORDERBY=1,0x0
0 0.01.00 sqlite_master f 00100 N 0 cost 0,216,200
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increases cost 271 to 272
New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread Eric Hill
Sorry, scratch that.  T2 was not a virtual table when I ran this query.  My 
bad.  Told you I was a noob.

Eric

-Original Message-
From: Eric Hill 
Sent: Wednesday, May 20, 2015 12:32 PM
To: 'General Discussion of SQLite Database'
Subject: RE: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Eric had said:

>> But then what about a query like this:
>>
>>SELECT * FROM T1
>>LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = 
>> T1.b ) AND ( T2.c = T1.c );
>>
>> xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in 
>> that order.  

To which Gunter replied in part:

>  an Index on the LHS Table of a LEFT OUTER join never helps, because 
> you are requesting to retrieve all rows of the LHS anyway. SQLite is calling 
> xBestIndex with constraints on T1 for symmetry purposes only (this may even 
> be considered a bug).
> Try again with indexes on T2.

Very good point, so this is what puzzles me:  xBestIndex is *never called* for 
T2 for this query (!).  To my undoubtedly noob mind, *that* seems like a bug.

Thanks again!

Eric




[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread Eric Hill
Eric had said:

>> But then what about a query like this:
>>
>>SELECT * FROM T1
>>LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = T1.b ) AND 
>> ( T2.c = T1.c );
>>
>> xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in 
>> that order.  

To which Gunter replied in part:

>  an Index on the LHS Table of a LEFT OUTER join never helps, because you are 
> requesting to retrieve all rows of the LHS 
> anyway. SQLite is calling xBestIndex with constraints on T1 for symmetry 
> purposes only (this may even be considered a bug). 
> Try again with indexes on T2.

Very good point, so this is what puzzles me:  xBestIndex is *never called* for 
T2 for this query (!).  To my undoubtedly noob mind, *that* seems like a bug.

Thanks again!

Eric




[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread Hick Gunter
-Original Message-
From: Eric Hill [mailto:Eric.Hill at jmp.com]
Sent: Tuesday, 19 May 2015 22:44
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Hey, Gunter,

...
But then what about a query like this:

SELECT * FROM T1
LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = T1.b ) AND ( 
T2.c = T1.c );

xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in that 
order.  In this case, though, it seems (to the uninitiated, at least ;-) that 
the "best index" would be:

CREATE INDEX T1_all ON T1 (c, b, a);

(or would it?) but the xBestIndex call does not contain any information that 
tells me that.  To be able to know that, I would have to know independently 
what the join conditions are.  And I could make that knowledge available to my 
xBestIndex implementation.  But how much better would that be than an index on 
c and partial scans to find the others?

(I tried this exact example, once creating an index on just T1.c and once 
creating an index on T1 (c, b, a), and in both cases, my index was not used by 
SQLite.  So I don't think I understand what SQLite wants.)

  Eric,

  an Index on the LHS Table of a LEFT OUTER join never helps, because you are 
requesting to retrieve all rows of the LHS anyway. SQLite is calling xBestIndex 
with constraints on T1 for symmetry purposes only (this may even be considered 
a bug). Try again with indexes on T2.

  The more xBestIndex knows about a table, the better its answers to SQLite 
will be. Unfortunately, xBestIndex has no way of knowing how many joins the 
passed constraints are coming from (apart from the obvious giveaway of repeated 
fields).

  If you know nothing about a table except for the names of the fields and the 
number of rows, then you are best off choosing the first constraint only. (rows 
= cost = log n)

  If you know the cardinality of each field, you should choose the one with the 
highest cardinality c. (rows = n/c, cost = rows + log n)

  If you at least know which sets of fields are guaranteed to be unique, choose 
the "most fulfilled" set (tie break: least number of fields, highest 
cardinality).
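
A minimal sketch of the first rule above (take the first usable equality 
constraint, rows = cost = log n); my_vtab and its nRow member are assumptions 
standing in for whatever row-count knowledge the table has:

#include <math.h>

static int myBestIndex(sqlite3_vtab *tab, sqlite3_index_info *p){
  my_vtab *pTab = (my_vtab*)tab;     /* hypothetical subclass carrying nRow */
  double logN = log((double)(pTab->nRow>1 ? pTab->nRow : 2));
  int i;
  for(i=0; i<p->nConstraint; i++){
    if( p->aConstraint[i].usable
     && p->aConstraint[i].op==SQLITE_INDEX_CONSTRAINT_EQ ){
      p->aConstraintUsage[i].argvIndex = 1; /* xFilter receives the value */
      p->aConstraintUsage[i].omit = 1;      /* we return only matching rows */
      p->idxNum = 1;                        /* "constrained scan" plan */
      p->estimatedCost = logN;              /* rows = cost = log n */
      return SQLITE_OK;
    }
  }
  p->idxNum = 0;                            /* full table scan */
  p->estimatedCost = (double)pTab->nRow;
  return SQLITE_OK;
}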

  Gunter









[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-20 Thread James K. Lowden
On Tue, 19 May 2015 20:44:17 +
Eric Hill  wrote:

> But then what about a query like this:
> 
>   SELECT * FROM T1
>   LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b =
> T1.b ) AND ( T2.c = T1.c );
> 
> xBestIndex will get called here for T1 with 3 constraints, c, b, and
> a, in that order.  In this case, though, it seems (to the
> uninitiated, at least ;-) that the "best index" would be:
> 
>   CREATE INDEX T1_all ON T1 (c, b, a);

Given that query, any index that includes a, b, and c would serve
equally well.  The order of the criteria in the ON clause is
immaterial.  

> I guess xBestIndex is saying "Tell me about indexes that you already
> have available so I can take advantage of them", 

Right.  

> I have *no* indexes, but I am willing to make whatever indexes would
> be most helpful, if I could just figure that out.

Depending on the physical characteristics of the thing you're
representing as a virtual table, you might not want any indexes. If
every access requires either a sequential scan or a random access to a
computed location, all search criteria are either perfect or
inconsequential.  If that's fast enough, it's fast enough.  If you want
to make it faster, indexes are certainly an option.  

Don't start by guessing, though.  Rather than beginning by trying to
anticipate what the query-generation tool will produce (and thus what
xBestIndex combinations will be interrogated), it's better to put the
horse before the cart and consider the data.  Once you've characterized
your data, you apply that information to that supplied by xBestIndex.  

If you have a table in 1NF, you have a key.  What are the columns that
identify a row?  That's your primary key, and your first index. There
may be other sets of columns that uniquely identify a row; these also
could use an index to enforce/verify uniqueness.  

Suppose your primary key is {a,b,c} and you want to construct an index
for it.  At this point I will start talking out of school, because I
don't know anything about the SQLite query planner.  But if I go astray
I'm sure others will correct me.  

The choice of the first column in the index is the main concern, and it
is influenced by the dominant search criteria i.e., the kinds of queries
that will be submitted.  

Queries in general are of two kinds: point queries, returning one row
(specifying values for every column in the index), and range queries
(specifying an incomplete set of values for the index).  Point queries
are indifferent to column order: find row in index and return. 
Range queries are the interesting ones.  

In a B+ tree such as SQLite uses for its indexes, rows will be sorted
in column order.  If your index is {a,b,c}, all the "a" rows will be
together and all the "b" values sorted within those "a"'s, and so on.  When
the query says, 

where a between x and y

the I/O engine can scoop all the row references out of contiguous
pages in the index.  Whereas if the query says, 

where b between x and y

the index is all but useless.  Because it's sorted by "a", it would have
to be scanned from beginning to end to find all the "b"'s that meet the
criteria.  

The other important criterion to answering xBestIndex, as Hick
mentioned, is cardinality.  You can create for each column a histogram
of the frequency distribution for the values in that column.  If the
column domain is two values (e.g., 'male' and 'female'), for example,
the probability of it matching a queried value X might be 50%.  (I think
I read that at Google 4 employees in 5 are men, in which case there a
value of 'female' would be a much more useful 20%.)  You can use the
combination of selective cardinality and your intuition (or experience)
for probable search criteria to create other indexes.  Your answer to
xBestIndex will be based on the product of the selectivity of the
contiguous columns (starting from the first) in the applicable index.  

Cardinality is only one part of the equation, however; the other is the
operator.  Equality is at least an opportunity for a point query, but
inequality implies a range.  How big a range?  If we're talking about
ages, for example, 

where age > x

If X is 0, that's not very useful, whereas if X is 100, it's very
selective of human beings (assuming age is in years).  Problem is, X
isn't supplied (for good reason) to xBestIndex.  How to answer?  

You can only work with what you know: the cardinality of the column,
the operator, and the rowcount.  Rows and cardinality you know, and
they apply directly to equality comparisons.  For inequality, you need
a heuristic.  A quarter of a century ago, during the Late Bronze Age,
Sybase used a simple heuristic: every inequality selected 25% of the
table.  That meant a BETWEEN query (two inequalities) was 6.25% = 0.25
* 0.25.  I doubt they pulled that number out of a hat.  Absent better
information, that might be your best option.  
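
As a worked sketch of those estimates (the 25% figure is the quoted 
Sybase-era assumption, not anything SQLite prescribes):

/* Estimated rows returned by one constraint, given the table's total
** row count n and the column's cardinality c (count of distinct values). */
static double estRows(double n, double c, unsigned char op){
  switch( op ){
    case SQLITE_INDEX_CONSTRAINT_EQ:      /* equality: n/c */
      return n/(c>1.0 ? c : 1.0);
    case SQLITE_INDEX_CONSTRAINT_GT:      /* one inequality: 25% of n */
    case SQLITE_INDEX_CONSTRAINT_GE:
    case SQLITE_INDEX_CONSTRAINT_LT:
    case SQLITE_INDEX_CONSTRAINT_LE:
      return n*0.25;
    default:                              /* no information: whole table */
      return n;
  }
}
/* A BETWEEN arrives as two inequalities, so its combined estimate is
** n * 0.25 * 0.25, i.e. the 6.25% mentioned above. */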

I hope that's of some use.  Complicated, for sure, and no small amount
of

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-19 Thread Eric Hill
Hey, Gunter,

Thanks for taking the time to make such a thorough response to my question.

Yes, your query is arguably more correct, though our queries do produce 
identical results.  The SQL I included was generated by a general-purpose query 
tool that automatically joins tables based on key relationships first and then 
common variable names, so it doesn't always do what makes the most sense.

Your explanation of xBestIndex is very helpful.  The data that I am projecting 
into SQLite via virtual tables does not have the concept of keys, constraints, 
or indexes, at least not currently.  So, when SQLite presents me with multiple 
constraints and asks for the best choice, I'm flying a bit blind.  Also, I'm 
using SQLite as part of the aforementioned general-purpose query tool; I'm not 
just trying to optimize queries into this DVD rental sample data; I'm trying to 
come up with general rules that will work reasonably well for arbitrary queries 
against data that I don't know much about.  I have implemented the strategy I 
mentioned at the end of my last post (choose the first constraint, constructing 
an index for it in the xFilter call), and it's doing orders of magnitude better 
than what I had before (SQLite is using my indexes) and performs as well as 
copying the data into SQLite and letting SQLite do everything, at least for the 
small set of cases I've tried.

But then what about a query like this:

SELECT * FROM T1
LEFT OUTER JOIN T2 ON ( T2.a = T1.a ) AND ( T2.b = T1.b ) AND ( 
T2.c = T1.c );

xBestIndex will get called here for T1 with 3 constraints, c, b, and a, in that 
order.  In this case, though, it seems (to the uninitiated, at least ;-) that 
the "best index" would be:

CREATE INDEX T1_all ON T1 (c, b, a);

(or would it?) but the xBestIndex call does not contain any information that 
tells me that.  To be able to know that, I would have to know independently 
what the join conditions are.  And I could make that knowledge available to my 
xBestIndex implementation.  But how much better would that be than an index on 
c and partial scans to find the others?

(I tried this exact example, once creating an index on just T1.c and once 
creating an index on T1 (c, b, a), and in both cases, my index was not used by 
SQLite.  So I don't think I understand what SQLite wants.)

I guess xBestIndex is saying "Tell me about indexes that you already have 
available so I can take advantage of them", but in my case, I have *no* 
indexes, but I am willing to make whatever indexes would be most helpful, if I 
could just figure that out.

Thanks again,

Eric

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Hick Gunter
Sent: Tuesday, May 19, 2015 3:20 AM
To: 'General Discussion of SQLite Database'
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Eric

Working backwards from your query, I think your schema would be similar to 
(foreign keys omitted)

CREATE TABLE rental ( rentalID INTEGER PRIMARY KEY NOT NULL, rental_date TEXT, inventory_id INTEGER, customer_id INTEGER, ...);
CREATE TABLE inventory ( inventory_id INTEGER PRIMARY KEY NOT NULL, film_id INTEGER, ...);
CREATE TABLE customer ( customer_id INTEGER PRIMARY KEY NOT NULL, first_name TEXT, last_name TEXT, ...);
CREATE TABLE film ( film_id INTEGER PRIMARY KEY NOT NULL, title TEXT, release_year INTEGER, length INTEGER, ...);
CREATE TABLE actor ( actor_id INTEGER PRIMARY KEY NOT NULL, first_name TEXT, last_name TEXT, ...);
CREATE TABLE category ( category_id INTEGER PRIMARY KEY NOT NULL, name TEXT, ...);
CREATE TABLE film_actor ( film_id INTEGER, actor_id INTEGER, UNIQUE(film_id,actor_id) );
CREATE TABLE film_category ( film_id INTEGER, category_id INTEGER, UNIQUE (film_id,category_id) );

And your query should really be (note that film_actor and film_category are 
joined to the film table)

SELECT t1.rental_date, t1.inventory_id, t1.customer_id, t2.film_id,
t2.store_id, t3.first_name AS cust_firstname, 
t3.last_name AS cust_lastname, t3.email,
t6.category_id, t4.title, t4.release_year, 
t4.length,
t4.rating, t4.rental_rate, t5.actor_id, t8.name 
AS category,
t7.first_name AS actor_firstname, t7.last_name 
AS actor_lastname
FROM rental10  t1
LEFT OUTER JOIN inventory t2
ON  ( t2.inventory_id = 
t1.inventory_id )
LEFT OUTER JOIN customer t3
ON  ( t3.customer_id = 
t1.customer_id )
LEFT OUTER JOIN film t4
ON  ( t4.film

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-19 Thread Hick Gunter
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

Per Richard's request, I have produced the WhereTrace for the query that uses 
virtual tables and the same query that is not using virtual tables.  That 
output is at the end of this message.  Ultimately, I think my question is, how 
should I respond to xBestIndex such that SQLite will perform the query as 
efficiently with virtual tables as it does when I copy all the data into SQLite?

Hick asked:

> Are you setting the constraintUsage return parameters correctly?

I suspect not.  I think I don't understand what to do when xBestIndex is passed 
multiple constraints, because those are the cases when SQLite rejects my 
indexes.  And, from debugging, my indexes are being rejected because this 
condition (at the tail end of whereLoopFindLesser()):

if( (p->prereq & pTemplate->prereq)==pTemplate->prereq   /* (1)  */

is false, so the costs I am returning never get considered.

Here is an example.  For the inventory table, xBestIndex gets called twice, 
each time with 4 constraints, 3 of which happen to be the same, film_id, 
presumably because inventory.film_id is used in 3 different join constraints:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 440393770484721

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 1

The first time, all the constraints are marked unusable, so I set argvIndex and 
omit to 0 for all four constraints and set the estimatedCost to a ridiculously 
large number.  The second time, all of the constraints are marked as usable.  
My data does not have any pre-existing indexes; I'm willing to create whatever 
indexes are needed to speed up the query.  So I set omit to 1 and argvIndex to 
1, 2, 3, and 4 respectively for the four constraints.  This is clearly where I 
am confusing SQLite.

Here is a simpler example, for the film_actor table.  xBestIndex is again 
called twice:

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Unusable
   CONST[1]: 0 (actor_id) = Unusable
   Index NOT created: est. cost: 890034380901136

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 0 (actor_id) = Usable
   Index created: est. cost: 1

Here again, in the second case, I set omit to 1 for both and set argvIndex to 1 
for film_id and 2 for actor_id.  What I am trying to tell SQLite is that I am 
willing to make an index that sorts first by film_id and then by actor_id for 
rows with the same film_id.  But I'm thinking that's not what SQLite wants.

Since I'm going to have to create an index for whichever constraint that SQLite 
is going to filter on, maybe I should just take the first constraint, set omit 
to 1 and argvIndex to 1, and set omit and argvIndex to 0 for all other 
constraints?  And then set the cost to n(log n) where n is number of rows, 
since that is the cost of making an index?

Thanks very much for your help.  WhereTrace follows:

=
WhereTrace when USING virtual tables:
=

New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increases cost 271 to 272
New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-18 Thread Eric Hill
Per Richard's request, I have produced the WhereTrace for the query that uses 
virtual tables and the same query that is not using virtual tables.  That 
output is at the end of this message.  Ultimately, I think my question is, how 
should I respond to xBestIndex such that SQLite will perform the query as 
efficiently with virtual tables as it does when I copy all the data into SQLite?

Hick asked:

> Are you setting the constraintUsage return parameters correctly?

I suspect not.  I think I don't understand what to do when xBestIndex is passed 
multiple constraints, because those are the cases when SQLite rejects my 
indexes.  And, from debugging, my indexes are being rejected because this 
condition (at the tail end of whereLoopFindLesser()):

if( (p->prereq & pTemplate->prereq)==pTemplate->prereq   /* (1)  */

is false, so the costs I am returning never get considered.

Here is an example.  For the inventory table, xBestIndex gets called twice, 
each time with 4 constraints, 3 of which happen to be the same, film_id, 
presumably because inventory.film_id is used in 3 different join constraints:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 440393770484721

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 1

The first time, all the constraints are marked unusable, so I set argvIndex and 
omit to 0 for all four constraints and set the estimatedCost to a ridiculously 
large number.  The second time, all of the constraints are marked as usable.  
My data does not have any pre-existing indexes; I'm willing to create whatever 
indexes are needed to speed up the query.  So I set omit to 1 and argvIndex to 
1, 2, 3, and 4 respectively for the four constraints.  This is clearly where I 
am confusing SQLite.  

Here is a simpler example, for the film_actor table.  xBestIndex is again 
called twice:

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Unusable
   CONST[1]: 0 (actor_id) = Unusable
   Index NOT created: est. cost: 890034380901136

jmpvtab BEST INDEX:  Table: film_actor  nConstraints: 2
   CONST[0]: 1 (film_id) = Usable
   CONST[1]: 0 (actor_id) = Usable
   Index created: est. cost: 1

Here again, in the second case, I set omit to 1 for both and set argvIndex to 1 
for film_id and 2 for actor_id.  What I am trying to tell SQLite is that I am 
willing to make an index that sorts first by film_id and then by actor_id for 
rows with the same film_id.  But I'm thinking that's not what SQLite wants.

Since I'm going to have to create an index for whichever constraint that SQLite 
is going to filter on, maybe I should just take the first constraint, set omit 
to 1 and argvIndex to 1, and set omit and argvIndex to 0 for all other 
constraints?  And then set the cost to n(log n) where n is number of rows, 
since that is the cost of making an index?

Thanks very much for your help.  WhereTrace follows:

=
WhereTrace when USING virtual tables:
=

New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
*** Optimizer Finished ***
*** Optimizer Start ***
add: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
   skip: * 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
add: * 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
0 0.01.00 sqlite_master f 04000 N 1 cost 271,53,43
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
 begin solver.  (nRowEst=0)
New0 cost=271, 43 order=0
Update 1 cost=216,180 order=0  was 0 cost=271, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=0
 begin solver.  (nRowEst=181)
 sort cost=239 (1/1) increases cost 271 to 272
New0 cost=272, 43 order=0
Update 1 cost=216,180 order=1  was 0 cost=272, 43 order=0
 after round 0 
 1 cost=216 nrow=180 order=1 rev=0x0
 Solution nRow=180 ORDERBY=1,0x0
1 0.01.00 sqlite_master f 00100 N 0 cost 0,216,180
*** Optimizer Finished ***
*** Optimizer Start ***
 Solution nRow=1
0 0.01.00 sqlite_master f 01101 N 1 cost 0,33,1
***

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-18 Thread Richard Hipp
Eric:  Can you please recompile the SQLite shell (sqlite3.exe) with
-DSQLITE_DEBUG and -DSQLITE_ENABLE_WHERETRACE and then run your query
after first running

   .wheretrace 0xff

And then send us the output?  (In order for this to work, you'll have
to build your virtual table implementation as a shared library/DLL and
load it using the ".load" command.)

If you cannot easily run your query from sqlite3.exe, then please
recompile your application using -DSQLITE_DEBUG and
-DSQLITE_ENABLE_WHERETRACE and then run:

extern int sqlite3WhereTrace;
sqlite3WhereTrace = 0xff;

Right before you run the problem query.


-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-18 Thread Hick Gunter
Are you setting the constraintUsage return parameters correctly?

SQLite is asking two questions:

1) What is the cost of a full table scan on table inventory (your answer is 
4581; I personally would leave idxNum=0 for this case)

2) What is the least cost of doing a partial table scan with any combination of 
the four constraints (resulting from the joins of t1->t2 on inventory_id and 
T4/5/6 -> t2 on film_id)

In the second case, you should not only set rows and cost, but also indicate 
which constraint is to be used, e.g. by setting constraintUsage[0].argvIndex = 
1. This allows SQLite to pass the actual constraint value to the xFilter 
function.
Also, if you can guarantee that your xFilter/xNext implementation will only 
return matching rows, you should set constraintUsage[0].omit = 1. This allows 
SQLite to skip the value check that would be coded to eliminate unwanted rows.
Personally, I would be using small positive numbers for the indexes.

Again, you should not be creating Indexes in xBestIndex; if you are actually 
creating indexes on request in xFilter, the cost of doing that should be 
included in the cost return value.

Looking at your query, you will be returning the cartesian product of all 
actors and categories. I also find it strange to join inventory records to 
actors and categories instead of joining via the film table, as would be 
suggested by the n:m relational table names too.
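
Putting the two answers together, a hedged sketch (the 4581 row count is 
taken from the thread; names are illustrative) of an xBestIndex that reports 
both plans and folds the cost of building the index on demand into the 
constrained case:

#include <math.h>

static int jmpBestIndex(sqlite3_vtab *tab, sqlite3_index_info *p){
  const double nRow = 4581.0;             /* known/estimated table size */
  if( p->nConstraint>0 && p->aConstraint[0].usable ){
    p->aConstraintUsage[0].argvIndex = 1; /* xFilter gets the value as argv[0] */
    p->aConstraintUsage[0].omit = 1;      /* only matching rows are returned */
    p->idxNum = 1;                        /* small positive plan number */
    /* Building the index in xFilter costs ~n*log(n); add one lookup. */
    p->estimatedCost = nRow*log(nRow) + log(nRow);
  }else{
    p->idxNum = 0;                        /* full table scan */
    p->estimatedCost = nRow;
  }
  return SQLITE_OK;
}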

-Original Message-
From: Eric Hill [mailto:Eric.Hill at jmp.com]
Sent: Friday, 15 May 2015 18:03
To: sqlite-users at mailinglists.sqlite.org
Subject: [sqlite] Virtual table query - why isn't SQLite using my indexes?

Hey,

Let me say up front that I'm sure this is my fault.  I have SQLite version 
3.8.9.

I am using virtual tables, and I am trying to get xBestIndex and xFilter doing 
the right things so that I get optimal queries.  Currently, I am working on the 
following query:

SELECT t1.rental_date, t1.inventory_id, t1.customer_id, 
t2.film_id,
t2.store_id, t3.first_name AS cust_firstname, 
t3.last_name AS cust_lastname, t3.email,
t6.category_id, t4.title, t4.release_year, 
t4.length,
t4.rating, t4.rental_rate, t5.actor_id, t8.name 
AS category,
t7.first_name AS actor_firstname, t7.last_name 
AS actor_lastname
FROM rental10  t1
LEFT OUTER JOIN inventory t2
ON  ( t2.inventory_id = 
t1.inventory_id )
LEFT OUTER JOIN customer t3
ON  ( t3.customer_id = 
t1.customer_id )
LEFT OUTER JOIN film_category t6
ON  ( t6.film_id = t2.film_id )
LEFT OUTER JOIN film t4
ON  ( t4.film_id = t2.film_id )
LEFT OUTER JOIN film_actor t5
ON  ( t5.film_id = t2.film_id )
LEFT OUTER JOIN category t8
ON  ( t8.category_id = 
t6.category_id )
LEFT OUTER JOIN actor t7
ON  ( t7.actor_id = t5.actor_id 
);

When I execute this query, the result is correct, but it is taking too long by 
an order of magnitude or two.  It seems to be doing full table scans despite 
the fact that I am creating indexes as requested.

For example, xBestIndex gets called for the inventory table twice, with four 
constraints, once with usable set to false for all four constraints, and once 
with usable set to true for all four.  Here is my printf debugging spew:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 4581
jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 3.66096029177608

So, in the first case, I do not create an index (which I signify by setting 
idxNum to -999), and I set the cost (and, est. rows) to 4581.  In the second 
case, I create an index (0) and set cost to log10(4581) = 3.66 and est. rows to 
4.  Yet, later, whenever xFilter is called for the inventory table, SQLite 
passes in idxNum = 999 and nConstraints = 0.  The index I dutifully created is 
never asked for.  In cases where there is a single constraint, SQLite does 
request the index in the xFilter call, but it seems that for all the cases 
where multiple constraints are involved, the index is not being used.

[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-17 Thread Eric Hill
Thanks, Dominique, for the tip on multifile amalgamation. After a few internet 
searches and the acquisition of gawk.exe and tclsh85.exe, I was able to build 
the multi-file amalgamation for 3.8.10.  Tomorrow, I will see if I can tell why 
SQLite is choosing table scans over using my indexes.  If anyone has tips on 
where to set breakpoints, that would be helpful.

One thing I did do:  Since I can copy the data into SQLite and do the same join 
as I was trying using virtual tables, I did that with EXPLAIN QUERY PLAN, and I 
got this:

0   0   0   SCAN TABLE rental10 AS t1
0   1   1   SEARCH TABLE inventory AS t2 USING AUTOMATIC COVERING 
INDEX (inventory_id=?)
0   2   2   SEARCH TABLE customer AS t3 USING AUTOMATIC COVERING 
INDEX (customer_id=?)
0   3   3   SEARCH TABLE film_category AS t6 USING AUTOMATIC 
COVERING INDEX (film_id=?)
0   4   4   SEARCH TABLE film AS t4 USING AUTOMATIC COVERING INDEX 
(film_id=?)
0   5   5   SEARCH TABLE film_actor AS t5 USING AUTOMATIC COVERING 
INDEX (film_id=?)
0   6   6   SEARCH TABLE category AS t8 USING AUTOMATIC COVERING 
INDEX (category_id=?)
0   7   7   SEARCH TABLE actor AS t7 USING AUTOMATIC COVERING INDEX 
(actor_id=?)

So, when SQLite has all the data, it figures out that it needs to use indexes, 
but when using my virtual tables, for some reason it doesn't.  Still working 
under the assumption that it is my fault.

Thanks,

Eric



[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-16 Thread Eduardo Morras
On Fri, 15 May 2015 18:34:33 +
Eric Hill  wrote:

> Thanks for your reply.
> 
> I went as high as using (number of rows)^4, resulting in a cost of
> ~440 trillion for the unindexed case, along with setting the cost to
> 1 for the indexed case, and it still won't use my index.
> 
> I'd like to step out of my xBestIndex implementation into SQLite
> code to see if I can tell what is going on there, but when I step
> out, the call stack knows where I'm supposed to be, but the debugger
> does not find the right line in sqlite3.c.  I have built sqlite3.c
> simply by adding the amalgamation to my Microsoft Visual Studio 2013 C
> ++ project.  Is there something I can do to make the debugger work?
> 
> I will postpone index creation until the call to xFilter, I reckon,
> once I work out these other issues.  Thanks for the tip!

Perhaps you can disable the unindexed path, forcing it to always use the index. 
This way your virtual table always uses your index.

What I don't know is: 

SQLite3 can use only one index per query; I don't know if indexes from virtual 
tables count toward this limit. I suppose not, because it should be an internal 
virtual table implementation decision how to show the selected data the fastest way. 

If it counts toward the index limit, perhaps SQLite3 found a faster index when 
joining your virtual table data with other table data.

Perhaps I misread something in the documentation.

Dr. Hipp, can you explain if virtual table indexes count toward this limit?

> Eric

---   ---
Eduardo Morras 


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Dominique Devienne
On Fri, May 15, 2015 at 8:34 PM, Eric Hill  wrote:

> Is there something I can do to make the debugger work?
>

See this thread:
http://sqlite.1065341.n5.nabble.com/Windows-Specific-2-c-files-Amalgamation-td67626.html

Basically, use a multi-file amalgamation that uses files with fewer than
64K lines. --DD

PS: Richard or someone else might provide a link to the current-release
such amalgamation.


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Eric Hill
Thanks for your reply.

I went as high as using (number of rows)^4, resulting in a cost of ~440 
trillion for the unindexed case, along with setting the cost to 1 for the 
indexed case, and it still won't use my index.

I'd like to step out of my xBestIndex implementation into SQLite code to see 
if I can tell what is going on there, but when I step out, the call stack knows 
where I'm supposed to be, but the debugger does not find the right line in 
sqlite3.c.  I have built sqlite3.c simply by adding the amalgamation to my 
Microsoft Visual Studio 2013 C++ project.  Is there something I can do to make 
the debugger work?

I will postpone index creation until the call to xFilter, I reckon, once I work 
out these other issues.  Thanks for the tip!

Eric

-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Richard Hipp
Sent: Friday, May 15, 2015 12:23 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table query - why isn't SQLite using my indexes?

On 5/15/15, Eric Hill  wrote:
>
> So, in the first case, I do not create an index (which I signify by 
> setting idxNum to -999), and I set the cost (and, est. rows) to 4581.  
> In the second case, I create an index (0) and set cost to log10(4581) 
> = 3.66 and est. rows to 4.  Yet, later, whenever xFilter is called for 
> the inventory table, SQLite passes in idxNum = 999 and nConstraints = 
> 0.  The index I dutifully created is never asked for.  In cases where 
> there is a single constraint, SQLite does request the index in the 
> xFilter call, but it seems that for all the cases where multiple 
> constraints are involved, the index is not being used.
>

Two things:

(1) You probably shouldn't be "creating an index" in response to an xBestIndex 
call.  xBestIndex should be thought of as a "what-if"
function.  It is asking your virtual table what it could do with a query given 
certain constraints.  SQLite makes no guarantees that it will actually ever 
call your virtual table that way - it is merely exploring possibilities.

(2) The query planner looks at many different cost factors and tries to pick 
the best overall query plan.  You've told it that running your virtual table 
without an index is 1252 times slower than running it with an index.  And it 
takes this into consideration.  That SQLite is not choosing to use the virtual 
table index indicates that some other part or parts of the join would be more 
than 1252 times slower if the virtual table index were in fact used, and so the 
overall query plan is faster even without the virtual table index.  If these 
estimates are incorrect, then an obvious work-around is merely to increase the 
cost of not using the indexing mode on the virtual table.  Have it return 10x 
or 100x the cost (45810 or 458100) when not using an index, and see if that 
helps.

--
D. Richard Hipp
drh at sqlite.org


[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Eric Hill
Hey,

Let me say up front that I'm sure this is my fault.  I have SQLite version 
3.8.9.

I am using virtual tables, and I am trying to get xBestIndex and xFilter doing 
the right things so that I get optimal queries.  Currently, I am working on the 
following query:

SELECT t1.rental_date, t1.inventory_id, t1.customer_id, 
t2.film_id,
t2.store_id, t3.first_name AS cust_firstname, 
t3.last_name AS cust_lastname, t3.email,
t6.category_id, t4.title, t4.release_year, 
t4.length,
t4.rating, t4.rental_rate, t5.actor_id, t8.name 
AS category,
t7.first_name AS actor_firstname, t7.last_name 
AS actor_lastname
FROM rental10  t1
LEFT OUTER JOIN inventory t2
ON  ( t2.inventory_id = 
t1.inventory_id )
LEFT OUTER JOIN customer t3
ON  ( t3.customer_id = 
t1.customer_id )
LEFT OUTER JOIN film_category t6
ON  ( t6.film_id = t2.film_id )
LEFT OUTER JOIN film t4
ON  ( t4.film_id = t2.film_id )
LEFT OUTER JOIN film_actor t5
ON  ( t5.film_id = t2.film_id )
LEFT OUTER JOIN category t8
ON  ( t8.category_id = 
t6.category_id )
LEFT OUTER JOIN actor t7
ON  ( t7.actor_id = t5.actor_id 
);

When I execute this query, the result is correct, but it is taking too long by 
an order of magnitude or two.  It seems to be doing full table scans despite 
the fact that I am creating indexes as requested.

For example, xBestIndex gets called for the inventory table twice, with four 
constraints, once with usable set to false for all four constraints, and once 
with usable set to true for all four.  Here is my printf debugging spew:

jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Unusable
   CONST[1]: 1 (film_id) = Unusable
   CONST[2]: 1 (film_id) = Unusable
   CONST[3]: 1 (film_id) = Unusable
   Index NOT created: est. cost: 4581
jmpvtab BEST INDEX:  Table: inventory  nConstraints: 4
   CONST[0]: 0 (inventory_id) = Usable
   CONST[1]: 1 (film_id) = Usable
   CONST[2]: 1 (film_id) = Usable
   CONST[3]: 1 (film_id) = Usable
   Index created: est. cost: 3.66096029177608

So, in the first case, I do not create an index (which I signify by setting 
idxNum to -999), and I set the cost (and est. rows) to 4581.  In the second 
case, I create an index (0) and set cost to log10(4581) = 3.66 and est. rows to 
4.  Yet, later, whenever xFilter is called for the inventory table, SQLite 
passes in idxNum = -999 and nConstraints = 0.  The index I dutifully created is 
never asked for.  In cases where there is a single constraint, SQLite does 
request the index in the xFilter call, but it seems that for all the cases 
where multiple constraints are involved, the index is not being used.

I did EXPLAIN QUERY PLAN for the query and got this, consistent with what I'm 
seeing:

0  0  0  SCAN TABLE rental10 AS t1 VIRTUAL TABLE INDEX -999:
0  1  1  SCAN TABLE inventory AS t2 VIRTUAL TABLE INDEX -999:
0  2  2  SCAN TABLE customer AS t3 VIRTUAL TABLE INDEX 0:
0  3  3  SCAN TABLE film_category AS t6 VIRTUAL TABLE INDEX -999:
0  4  4  SCAN TABLE film AS t4 VIRTUAL TABLE INDEX 0:
0  5  5  SCAN TABLE film_actor AS t5 VIRTUAL TABLE INDEX -999:
0  6  6  SCAN TABLE category AS t8 VIRTUAL TABLE INDEX -999:
0  7  7  SCAN TABLE actor AS t7 VIRTUAL TABLE INDEX 0:

Now, I know that SQLite is capable of efficiently performing this query, 
because I also have the ability to copy these tables into SQLite so that I am 
querying real tables instead of virtual tables.  SQLite can perform the query 
in under 1 second with real tables, but with virtual tables, it is taking > 25 
seconds.

Any thoughts on what I can do to convince SQLite to use my indexes?

Thanks,

Eric



[sqlite] Virtual Table query - why isn't SQLite using my indexes?

2015-05-15 Thread Richard Hipp
On 5/15/15, Eric Hill  wrote:
>
> So, in the first case, I do not create an index (which I signify by setting
> idxNum to -999), and I set the cost (and est. rows) to 4581.  In the second
> case, I create an index (0) and set cost to log10(4581) = 3.66 and est. rows
> to 4.  Yet, later, whenever xFilter is called for the inventory table,
> SQLite passes in idxNum = -999 and nConstraints = 0.  The index I dutifully
> created is never asked for.  In cases where there is a single constraint,
> SQLite does request the index in the xFilter call, but it seems that for
> all the cases where multiple constraints are involved, the index is not
> being used.
>

Two things:

(1) You probably shouldn't be "creating an index" in response to an
xBestIndex call.  xBestIndex should be thought of as a "what-if"
function.  It is asking your virtual table what it could do with a
query given certain constraints.  SQLite makes no guarantees that it
will actually ever call your virtual table that way - it is merely
exploring possibilities.

(2) The query planner looks at many different cost factors and tries
to pick the best overall query plan.  You've told it that running your
virtual table without an index is 1252 times slower than running it
with an index.  And it takes this into consideration.  That SQLite is
not choosing to use the virtual table index indicates that some other
part or parts of the join would be more than 1252 times slower if the
virtual table index were in fact used, and so the overall query plan
is faster even without the virtual table index.  If these estimates
are incorrect, then an obvious work-around is merely to increase the
cost of not using the indexing mode on the virtual table.  Have it
return 10x or 100x the cost (45810 or 458100) when not using an index,
and see if that helps.

-- 
D. Richard Hipp
drh at sqlite.org
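
[A minimal sketch of what this advice could look like in code, assuming a
hypothetical vtable whose column 0 is inventory_id. The idxNum values follow
the -999 convention from the original post; invSeekEq() and invFullScan() are
invented placeholder helpers, not part of any real module. Note that
xBestIndex only *reports* plans and costs; nothing is built until xFilter is
actually called with the chosen idxNum.]

    #include "sqlite3ext.h"
    SQLITE_EXTENSION_INIT3

    #define IDX_NONE -999   /* full table scan */
    #define IDX_EQ      0   /* equality lookup on inventory_id (column 0) */

    /* Hypothetical helpers, implemented elsewhere in the module. */
    static int invSeekEq(sqlite3_vtab_cursor*, sqlite3_int64);
    static int invFullScan(sqlite3_vtab_cursor*);

    static int invBestIndex(sqlite3_vtab *pVTab, sqlite3_index_info *pInfo){
      int i;
      for(i=0; i<pInfo->nConstraint; i++){
        const struct sqlite3_index_constraint *pC = &pInfo->aConstraint[i];
        if( pC->usable && pC->iColumn==0
         && pC->op==SQLITE_INDEX_CONSTRAINT_EQ ){
          pInfo->aConstraintUsage[i].argvIndex = 1; /* pass value to xFilter */
          pInfo->aConstraintUsage[i].omit = 1;      /* no re-check needed */
          pInfo->idxNum = IDX_EQ;
          pInfo->estimatedCost = 3.66;              /* ~log10(4581) */
          return SQLITE_OK;
        }
      }
      pInfo->idxNum = IDX_NONE;
      /* Following the advice above: inflate the cost of the unindexed
      ** path (4581 rows x 100) so the planner prefers indexed plans. */
      pInfo->estimatedCost = 458100.0;
      return SQLITE_OK;
    }

    static int invFilter(sqlite3_vtab_cursor *pCur, int idxNum,
                         const char *idxStr, int argc, sqlite3_value **argv){
      if( idxNum==IDX_EQ && argc==1 ){
        /* argv[0] holds the right-hand side of "inventory_id = ?" */
        return invSeekEq(pCur, sqlite3_value_int64(argv[0]));
      }
      return invFullScan(pCur);
    }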


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread R.Smith

On 2015-03-06 03:19 PM, Richard Hipp wrote:
> Yikes! Actually (1) cannot be determined for normal (non-virtual) 
> tables either because the value could be a BLOB even if the column 
> affinity is TEXT. And so the current LIKE optimization is not valid 
> *ever*. See ticket https://www.sqlite.org/src/tktview/05f43be8fdda9f I 
> don't yet know how we will fix this... 

True, but who uses LIKE on a BLOB?  Any such usage has to be by accident 
- though granted, this can happen. Maybe a simple documentation note 
stating that the LIKE operator on BLOB values will have undefined results?



[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread Hick Gunter
Using LIKE on a BLOB is not the problem.

It is the LIKE optimization that is broken: the range test it substitutes can 
never match a BLOB (a BLOB sorts after every text value), while the LIKE 
function compares an expression that may contain wildcards to the raw data, 
which may well match.

-Ursprüngliche Nachricht-
Von: R.Smith [mailto:rsmith at rsweb.co.za]
Gesendet: Freitag, 06. März 2015 14:25
An: sqlite-users at mailinglists.sqlite.org
Betreff: Re: [sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite 
with 'like'


On 2015-03-06 03:19 PM, Richard Hipp wrote:
> Yikes! Actually (1) cannot be determined for normal (non-virtual)
> tables either because the value could be a BLOB even if the column
> affinity is TEXT. And so the current LIKE optimization is not valid
> *ever*. See ticket https://www.sqlite.org/src/tktview/05f43be8fdda9f I
> don't yet know how we will fix this...

True, but who uses LIKE on a BLOB?  Any such usage has to be by accident
- though granted, this can happen. Maybe a simple documentation note stating 
that the LIKE operator on BLOB values will have undefined results?

___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at





[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread Dominique Devienne
On Fri, Mar 6, 2015 at 1:21 PM, Hick Gunter  wrote:

> And then there remain to be considered the effects of the pragma
> CASE_SENSITIVE_LIKE
>

Good point.

But that's no different from the case when an application overrides the
LIKE behavior via a custom function, and the vtable can similarly decide to
ignore it or not;
or it can even decide to change the semantics of LIKE, again just as an
application override does.

Which is why I think the point you raise is tangential to SQLite not giving
vtables the opportunity to optimize LIKE. --DD


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread Hick Gunter
And then there remain to be considered the effects of the pragma 
CASE_SENSITIVE_LIKE

-Ursprüngliche Nachricht-
Von: Dominique Devienne [mailto:ddevienne at gmail.com]
Gesendet: Freitag, 06. März 2015 10:30
An: General Discussion of SQLite Database
Betreff: Re: [sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite 
with 'like'

On Fri, Mar 6, 2015 at 10:12 AM, R.Smith  wrote:

> On 2015-03-06 09:42 AM, Dominique Devienne wrote:
>
>> On Thu, Mar 5, 2015 at 7:45 PM, Richard Hipp  wrote:
>>
>>
>>> The LIKE operator can be overridden by the application to mean
>>> anything the application wants - it is not compelled to follow
>>> standard SQL semantics.  For that reason, virtual tables are unable
>>> to optimize using LIKE since they have no way of knowing what it will do.
>>>
>>> Works as designed.
>>>
>>>  Sure. But SQLite knows whether LIKE is overridden by the application
>>> or
>> not,
>> so shouldn't it only hide LIKE from the virtual table when it detects
>> it, instead of wholesale preventing the vtable from optimizing the
>> "normal semantic" LIKE case? My $0.02. --DD
>>
>
> I'm sure this is possible, but how would it be feasible?  If this were
> implemented, how would you know, as a programmer designing a virtual
> table (or many other interface things), whether the target you
> operate on will be able to run some bit of code?  Even if you
> could query the API to find out whether it is possible (i.e. the
> normal LIKE is used), you still need to use that result as a specifier
> to decide which code block to implement. And if you are going to have
> to do two code blocks... might as well do the one where LIKE isn't supported 
> right from the start.
>
> Now if there was a setting where you could /force/ the standard LIKE
> to be used along with supporting the v-table interface, that might be
> something, but that might break a whole other universe of
> possibilities for v-table users and end-user customization.


That's not what I was saying. DRH is saying the vtable *never* sees the LIKE 
and SQLite itself a posteriori filters out the rows based on LIKE. The vtable 
is not aware it's been denied seeing the LIKE. I'm saying that in the (vast 
majority of) cases when LIKE is not overridden by the application (globally or 
for that vtable), something SQLite can know, SQLite could pass the LIKE 
constraint to the xBestIndex, and let the vtable decide whether it can handle 
(i.e. optimize) LIKE or not, and whether SQLite should double-check it or not 
(as usual for vtable indexing). When it is overridden, it behaves as now. But 
that way the vtable has at least the opportunity to optimize LIKE, unlike now.

Whether that's a priority, obviously not given what we've read so far, but 
whether that's technically possible, from 30,000 ft I don't see why not given 
the above. My $0.02. --DD ___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: hick at scigames.at





[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread R.Smith


On 2015-03-06 09:42 AM, Dominique Devienne wrote:
> On Thu, Mar 5, 2015 at 7:45 PM, Richard Hipp  wrote:
>
>>
>> The LIKE operator can be overridden by the application to mean
>> anything the application wants - it is not compelled to follow
>> standard SQL semantics.  For that reason, virtual tables are unable to
>> optimize using LIKE since they have no way of knowing what it will do.
>>
>> Works as designed.
>>
> Sure. But SQLite knows whether LIKE is overridden by the application or not,
> so shouldn't it only hide LIKE from the virtual table when it detects it,
> instead of wholesale preventing the vtable from optimizing the "normal
> semantic" LIKE case? My $0.02. --DD

I'm sure this is possible, but how would it be feasible?  If this were 
implemented, how would you know, as a programmer designing a virtual 
table (or many other interface things), whether the target you 
operate on will be able to run some bit of code?  Even if you 
could query the API to find out whether it is possible (i.e. the normal 
LIKE is used), you still need to use that result as a specifier to 
decide which code block to implement. And if you are going to have to do 
two code blocks... might as well do the one where LIKE isn't supported 
right from the start.

Now if there was a setting where you could /force/ the standard LIKE to 
be used along with supporting the v-table interface, that might be 
something, but that might break a whole other universe of possibilities 
for v-table users and end-user customization.




[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread Dominique Devienne
On Fri, Mar 6, 2015 at 10:12 AM, R.Smith  wrote:

> On 2015-03-06 09:42 AM, Dominique Devienne wrote:
>
>> On Thu, Mar 5, 2015 at 7:45 PM, Richard Hipp  wrote:
>>
>>
>>> The LIKE operator can be overridden by the application to mean
>>> anything the application wants - it is not compelled to follow
>>> standard SQL semantics.  For that reason, virtual tables are unable to
>>> optimize using LIKE since they have no way of knowing what it will do.
>>>
>>> Works as designed.
>>>
>>>  Sure. But SQLite knows whether LIKE is overridden by the application or
>> not,
>> so shouldn't it only hide LIKE from the virtual table when it detects it,
>> instead of wholesale preventing the vtable from optimizing the "normal
>> semantic" LIKE case? My $0.02. --DD
>>
>
> I'm sure this is possible, but how would it be feasible?  If this were
> implemented, how would you know, as a programmer designing a virtual table
> (or many other interface things), whether the target you operate on
> will be able to run some bit of code?  Even if you could query the
> API to find out whether it is possible (i.e. the normal LIKE is used), you
> still need to use that result as a specifier to decide which code block to
> implement. And if you are going to have to do two code blocks... might as
> well do the one where LIKE isn't supported right from the start.
>
> Now if there was a setting where you could /force/ the standard LIKE to be
> used along with supporting the v-table interface, that might be something,
> but that might break a whole other universe of possibilities for v-table
> users and end-user customization.


That's not what I was saying. DRH is saying the vtable *never* sees the
LIKE and SQLite itself a posteriori filters out the rows based on LIKE. The
vtable is not aware it's been denied seeing the LIKE. I'm saying that in
the (vast majority of) cases when LIKE is not overridden by the application
(globally or for that vtable), something SQLite can know, SQLite could pass
the LIKE constraint to the xBestIndex, and let the vtable decide whether it
can handle (i.e. optimize) LIKE or not, and whether SQLite should
double-check it or not (as usual for vtable indexing). When it is
overridden, it behaves as now. But that way the vtable has at least the
opportunity to optimize LIKE, unlike now.

Whether that's a priority, obviously not given what we've read so far, but
whether that's technically possible, from 30,000 ft I don't see why not
given the above. My $0.02. --DD


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread Dominique Devienne
On Thu, Mar 5, 2015 at 7:45 PM, Richard Hipp  wrote:

> On 3/5/15, Mike Nicolino  wrote:
> > I'm using System.Data.SQLite version 1.0.95.0 and have what appears to
> be a
> > bug with Virtual Tables.  Queries using 'like' in the where clause are
> not
> > getting the like clause passed to BestIndex as a query constraint.
> > Specifically:
> >
> >
> > -  Simple query: select * from foo where name like 'a%'
>
> The LIKE operator can be overridden by the application to mean
> anything the application wants - it is not compelled to follow
> standard SQL semantics.  For that reason, virtual tables are unable to
> optimize using LIKE since they have no way of knowing what it will do.
>
> Works as designed.
>

Sure. But SQLite knows whether LIKE is overridden by the application or not,
so shouldn't it only hide LIKE from the virtual table when it detects it,
instead of wholesale preventing the vtable from optimizing the "normal
semantic" LIKE case? My $0.02. --DD


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-06 Thread Richard Hipp
On 3/6/15, Dominique Devienne  wrote:
>
> Sure. But SQLite knows whether LIKE is overridden by the application or not,
> so shouldn't it only hide LIKE from the virtual table when it detects it,
> instead of wholesale preventing the vtable from optimizing the "normal
> semantic" LIKE case?

It's more complicated than that.

The optimization in question is to convert WHERE clause terms of the form:

xyz LIKE 'abc%'

Into:

xyz>='abc' AND xyz<'abd'

But this optimization is only valid if (1) xyz is a text value, not a
numeric value and (2) xyz has the "nocase" collating sequence.  We
typically do not know either for a virtual table.  You might know (2)
if the expression has an explicit COLLATE clause:

xyz LIKE 'abc%' COLLATE nocase

But SQLite does not have a mechanism whereby a virtual table can tell
the query planner that the value of a column will never be numeric.

Yikes!  Actually (1) cannot be determined for normal (non-virtual)
tables either because the value could be a BLOB even if the column
affinity is TEXT.  And so the current LIKE optimization is not valid
*ever*.  See ticket https://www.sqlite.org/src/tktview/05f43be8fdda9f

I don't yet know how we will fix this...
-- 
D. Richard Hipp
drh at sqlite.org
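
[To make the rewrite concrete, here is a small self-contained sketch of the
bound computation described above, for the simple case of an ASCII prefix
with a single trailing '%'. It deliberately ignores the collation and
affinity caveats that the message explains make the optimization unsafe in
general, and the helper name is invented for illustration.]

    #include <stdio.h>
    #include <string.h>

    /* Derive lo/hi so that "xyz LIKE 'abc%'" becomes "xyz>=lo AND xyz<hi".
    ** Returns 0 when the pattern is not a plain prefix pattern. */
    static int like_prefix_bounds(const char *pat, char *lo, char *hi,
                                  size_t n){
      size_t len = strlen(pat);
      if( len<2 || len>=n || pat[len-1]!='%' ) return 0;
      if( memchr(pat, '_', len-1) || memchr(pat, '%', len-1) ) return 0;
      memcpy(lo, pat, len-1);
      lo[len-1] = 0;          /* lower bound: the literal prefix "abc" */
      memcpy(hi, lo, len);
      hi[len-2]++;            /* upper bound: last byte bumped, "abd" */
      return 1;
    }

    int main(void){
      char lo[32], hi[32];
      if( like_prefix_bounds("abc%", lo, hi, sizeof lo) ){
        printf("xyz>='%s' AND xyz<'%s'\n", lo, hi);
      }
      return 0;
    }

[Running this prints: xyz>='abc' AND xyz<'abd'. The subtlety in the thread is
not this computation, but whether the range test and the LIKE function agree
for every possible value and collation.]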


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-05 Thread Mike Nicolino
Hmm, yes this will work.  Though re-educating users writing their own queries 
to avoid LIKE in this case will be an ongoing challenge. :)  But I do 
understand that generic 'like' support for Virtual Tables, given the ability 
to override, would be very challenging to implement.

MikeN


-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Jay Kreibich
Sent: Thursday, March 05, 2015 10:55 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite 
with 'like'


On Mar 5, 2015, at 12:30 PM, Mike Nicolino  
wrote:

> I'm using System.Data.SQLite version 1.0.95.0 and have what appears to be a 
> bug with Virtual Tables.  Queries using 'like' in the where clause are not 
> getting the like clause passed to BestIndex as a query constraint.  
> Specifically:
> 


"LIKE" is not a supported virtual table index constraint.  See: 
http://www.sqlite.org/c3ref/c_index_constraint_eq.html   As such, it cannot be 
passed to a virtual table implementation, and the constraint is handled by the 
SQLite engine above the virtual table implementation.

Given that the LIKE expression is translated into an SQL function, which the 
application can override, it would be difficult for a virtual table to 
correctly implement a LIKE operation internally, while matching the exact 
functionality of the current LIKE function.

Consider a statement like this:

SELECT * FROM myVirtualTable AS vt WHERE returnThisRow_CustomFunction( 
vt.col1 );

If returnThisRow_CustomFunction() is a function that returns true or false 
based on... well, who knows what... there is no way for a virtual table 
implementation to understand the inner workings of that function and pre-filter 
the rows.  LIKE is no different.



It should be noted that MATCH is a supported virtual table index constraint.  
Together with the virtual table xFindFunction() method, it allows a virtual 
table to implement a table-specific filter function.  This is how the FTS 
extensions implement searches.

Consider providing a virtual table specific MATCH function, over-ride use on 
your table with xFindFunction(), and rewrite statements using MATCH rather than 
LIKE.

See the FTS modules as examples.  You might want to start here: 
https://www.sqlite.org/fts3.html


 -j



--  
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but showing 
it to the wrong people has the tendency to make them feel uncomfortable." -- 
Angela Johnson





___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
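
[A rough sketch of the MATCH route suggested above, loosely modeled on how
FTS overloads the match() function; every name here is hypothetical, and
foo_prefix_match() is an invented helper. With this in place, a query such
as "SELECT * FROM foo WHERE name MATCH 'a*'" reaches xBestIndex as a
SQLITE_INDEX_CONSTRAINT_MATCH constraint that the vtable may optimize.]

    #include "sqlite3ext.h"
    SQLITE_EXTENSION_INIT3

    /* Hypothetical pattern matcher, implemented elsewhere. */
    static int foo_prefix_match(const char *zPat, const char *zVal);

    /* For "x MATCH y" SQLite calls match(y, x): pattern first, value second. */
    static void fooMatchFunc(sqlite3_context *ctx, int argc,
                             sqlite3_value **argv){
      const char *zPat = (const char*)sqlite3_value_text(argv[0]);
      const char *zVal = (const char*)sqlite3_value_text(argv[1]);
      sqlite3_result_int(ctx, zPat!=0 && zVal!=0
                              && foo_prefix_match(zPat, zVal));
    }

    /* xFindFunction: tell SQLite that match() is overloaded for this table. */
    static int fooFindFunction(
      sqlite3_vtab *pVtab, int nArg, const char *zName,
      void (**pxFunc)(sqlite3_context*,int,sqlite3_value**),
      void **ppArg
    ){
      if( nArg==2 && sqlite3_stricmp(zName, "match")==0 ){
        *pxFunc = fooMatchFunc;
        *ppArg = 0;
        return 1;   /* non-zero: the function is overloaded for this table */
      }
      return 0;
    }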


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-05 Thread Mike Nicolino
> I'm using System.Data.SQLite version 1.0.95.0 and have what appears to 
> be a bug with Virtual Tables.  Queries using 'like' in the where 
> clause are not getting the like clause passed to BestIndex as a query 
> constraint.
> Specifically:
>
>
> -  Simple query: select * from foo where name like 'a%'

The LIKE operator can be overridden by the application to mean anything the 
application wants - it is not compelled to follow standard SQL semantics.  For 
that reason, virtual tables are unable to optimize using LIKE since they have 
no way of knowing what it will do.

Works as designed.


Not communicating the like constraint to virtual tables makes it impossible for 
a virtual table to do query optimization in that case.  I realize like behavior 
can be overridden, but the resulting 'query' still needs to filter down to the 
virtual table in some way to avoid full table scans.  Otherwise any queries 
using 'like' against a virtual table of a substantial size become potentially 
unusable, depending on the time required for a full table scan.

Incidentally, this used to 'work' in a much older version of SQLite, though the 
semantics may not have been correct in all cases.  Version 3.7.7.1 ended up 
transforming LIKE into a pair of constraints in the "like 'a%'" case, which 
were passed to BestIndex.

Thanks,
MikeN



-Original Message-
From: sqlite-users-bounces at mailinglists.sqlite.org 
[mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Richard Hipp
Sent: Thursday, March 05, 2015 10:46 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite 
with 'like'

On 3/5/15, Mike Nicolino  wrote:
> I'm using System.Data.SQLite version 1.0.95.0 and have what appears to 
> be a bug with Virtual Tables.  Queries using 'like' in the where 
> clause are not getting the like clause passed to BestIndex as a query 
> constraint.
> Specifically:
>
>
> -  Simple query: select * from foo where name like 'a%'

The LIKE operator can be overridden by the application to mean anything the 
application wants - it is not compelled to follow standard SQL semantics.  For 
that reason, virtual tables are unable to optimize using LIKE since they have 
no way of knowing what it will do.

Works as designed.

>
> -  Break inside module BestIndex
>
> -  SQLiteIndex.Inputs.Constraints has 0 length (no constraints)
>
> The above causes a full table scan of the virtual table for queries 
> using 'like', which is very bad for any virtual table of a substantial size.
> Virtual tables need to be able to use 'like' clauses to restrict 
> result set size.
>
> Before I bug this issue, is anyone aware of it and have any workaround?
> Currently, the only workaround I've got, is telling users don't use 'like'
> in their queries (obviously not a good thing).
>
> Thanks,
> MikeN
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


--
D. Richard Hipp
drh at sqlite.org
___
sqlite-users mailing list
sqlite-users at mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-05 Thread Mike Nicolino
I'm using System.Data.SQLite version 1.0.95.0 and have what appears to be a bug 
with Virtual Tables.  Queries using 'like' in the where clause are not getting 
the like clause passed to BestIndex as a query constraint.  Specifically:


-  Simple query: select * from foo where name like 'a%'

-  Break inside module BestIndex

-  SQLiteIndex.Inputs.Constraints has 0 length (no constraints)

The above causes a full table scan of the virtual table for queries using 
'like', which is very bad for any virtual table of a substantial size.  Virtual 
tables need to be able to use 'like' clauses to restrict result set size.

Before I bug this issue, is anyone aware of it and have any workaround?  
Currently, the only workaround I've got, is telling users don't use 'like' in 
their queries (obviously not a good thing).

Thanks,
MikeN



[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-05 Thread Richard Hipp
On 3/5/15, Mike Nicolino  wrote:
> Incidentally, this used to 'work' in a much older version of SQLite, though
> the semantics may not have been correct in all cases.  Version 3.7.7.1
> ended up transforming LIKE into a pair of constraints in the "like 'a%'"
> case, which were passed to BestIndex.
>

That was a bug - it could lead to incorrect answers depending on the
virtual table and its content.  The bug was fixed on 2012-03-29.
https://www.sqlite.org/src/timeline?c=2012-03-29+14:29:07

-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-05 Thread Richard Hipp
On 3/5/15, Mike Nicolino  wrote:
> I'm using System.Data.SQLite version 1.0.95.0 and have what appears to be a
> bug with Virtual Tables.  Queries using 'like' in the where clause are not
> getting the like clause passed to BestIndex as a query constraint.
> Specifically:
>
>
> -  Simple query: select * from foo where name like 'a%'

The LIKE operator can be overridden by the application to mean
anything the application wants - it is not compelled to follow
standard SQL semantics.  For that reason, virtual tables are unable to
optimize using LIKE since they have no way of knowing what it will do.

Works as designed.

>
> -  Break inside module BestIndex
>
> -  SQLiteIndex.Inputs.Constraints has 0 length (no constraints)
>
> The above causes a full table scan of the virtual table for queries using
> 'like', which is very bad for any virtual table of a substantial size.
> Virtual tables need to be able to use 'like' clauses to restrict result set
> size.
>
> Before I bug this issue, is anyone aware of it and have any workaround?
> Currently, the only workaround I've got, is telling users don't use 'like'
> in their queries (obviously not a good thing).
>
> Thanks,
> MikeN
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
D. Richard Hipp
drh at sqlite.org


[sqlite] Virtual Table BestIndex Bug in system.data.sqlite/sqlite with 'like'

2015-03-05 Thread Jay Kreibich

On Mar 5, 2015, at 12:30 PM, Mike Nicolino  
wrote:

> I'm using System.Data.SQLite version 1.0.95.0 and have what appears to be a 
> bug with Virtual Tables.  Queries using 'like' in the where clause are not 
> getting the like clause passed to BestIndex as a query constraint.  
> Specifically:
> 


"LIKE" is not a supported virtual table index constraint.  See: 
http://www.sqlite.org/c3ref/c_index_constraint_eq.html   As such, it cannot be 
passed to a virtual table implementation, and the constraint is handled by the 
SQLite engine above the virtual table implementation.

Given that the LIKE expression is translated into an SQL function, which the 
application can override, it would be difficult for a virtual table to 
correctly implement a LIKE operation internally, while matching the exact 
functionality of the current LIKE function.

Consider a statement like this:

SELECT * FROM myVirtualTable AS vt WHERE returnThisRow_CustomFunction( 
vt.col1 );

If returnThisRow_CustomFunction() is a function that returns true or false 
based on... well, who knows what... there is no way for a virtual table 
implementation to understand the inner workings of that function and pre-filter 
the rows.  LIKE is no different.



It should be noted that MATCH is a supported virtual table index constraint.  
Together with the virtual table xFindFunction() method, it allows a virtual 
table to implement a table-specific filter function.  This is how the FTS 
extensions implement searches.

Consider providing a virtual table specific MATCH function, over-ride use on 
your table with xFindFunction(), and rewrite statements using MATCH rather than 
LIKE.

See the FTS modules as examples.  You might want to start here: 
https://www.sqlite.org/fts3.html


 -j



--  
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but showing 
it to the wrong people has the tendency to make them feel uncomfortable." -- 
Angela Johnson







[sqlite] Sqlite Virtual Table Example with non-trivial xBestIndex (was: Porting SQLite to another operating system (not supported out of the box))

2015-02-11 Thread Hick Gunter
Having personally written about a dozen virtual table implementations I can 
confirm that those implementations needing a nontrivial xBestIndex function 
are all based on building an SQLite interface on substantial proprietary 
storage subsystems like an in-memory ISAM table (with configurable record and 
key structure), a Faircom CTree driver (adding configurable record and key 
structure), a partition provider (allowing storage to be split between several 
tables of identical structure by configurable record fields), etc.

One of the more challenging tasks involved adding a fastbit based index to a 
variable record length event logfile for an OLTP application.

"Simple" virtual tables require retrieval by "rowid" (e.g. memory address or 
file offset) at best.

-Ursprüngliche Nachricht-
Von: Jay Kreibich [mailto:j...@kreibi.ch]
Gesendet: Mittwoch, 11. Februar 2015 04:04
An: Peter Aronson; General Discussion of SQLite Database
Betreff: Re: [sqlite] Porting SQLite to another operating system (not supported 
out of the box)





On Feb 10, 2015, at 11:21 AM, Peter Aronson  wrote:

> You could add VFS creation if you ever do a revised edition (along with a 
> virtual table example that actually used xBestIndex and xFilter…)

Given that the book is over four years old and covers through the end of SQLite3 
3.6.x, there are a lot of things that would need to go into a revised edition… 
including a lot more examples of everything, according to reviews.  We greatly 
underestimated the number of SQLite developers that were touching SQL for the 
first time, and I would have never guessed people would have considered yet 
another SQL lesson to be so important, given that there are a million books and 
a bazillion websites on learning SQL basics.  You can literally find books on 
“SQL For Dummies” (Allen Taylor) to “SQL For Smarties” (Joe Celko), and 
everything in-between.  That last book (or books, actually) is awesome, BTW, 
and the “Advanced SQL Programming” one should be on the shelf of every database 
programmer doing anything more advanced than an address book.

Regardless, if we do a second edition (and at this point that’s an extremely, 
exceptionally big “if”), VFS is unlikely to make the cut.  Consider that out of 
the thousands of SQLite applications and billions of installed databases, there 
are likely less than 100 production VFS modules in the whole world.  Spending a 
lot of time and pages, driving up the cost of the book, covering an extremely 
advanced and obscure topic is a poor trade-off (every page averages about a day 
to write/edit/prep, and adds about $0.10 to the price of the book).  If you 
need that level of integration and detail, working in the guts of the I/O and 
locking system, you should likely hand the task to a systems engineer that is 
familiar with the problem domain and isn’t afraid of looking through a few 
headers and examples to figure it all out.  It’s advanced, custom stuff that 
is, almost by definition, not textbook work.  It is the kind of work that 
requires digging through nitty-gritty code, documentation, and examples from 
both SQLite and your environment.  This is the kind of thing that’s learned 
from years of experience, not by reading it in a book.

That isn’t meant to be a criticism of the original poster— there is a huge 
difference between asking if anyone knows where to start looking, and asking 
for detailed step-by-step instructions.  In fact, if we did decide to put some 
information about VFS modules in a book, it would likely be a discussion of how 
the structures and APIs fit together, what they’re used for, and the types of 
things that can be done with them— exactly the kind of info you need to get 
started, but not much beyond that.  After all, what goes in those functions is 
going to be extremely dependent on the environment the VFS is trying to use.

I might say similar things about the xBestIndex() and xFilter() functions.  
While the APIs and how they are used is a tad confusing, their purpose and 
function should be reasonably straight forward to someone comfortable with 
relational data management and design.  While the book attempts to cover how 
the APIs are meant to perform their tasks (and has a six page discussion on 
their purpose and use), actually writing such a function is extremely dependent 
on understanding the virtual table being design— and the data in it. I feel it 
is something that just needs to be done by a skilled engineer, with a lot of 
detailed knowledge about the problem that’s trying to be solved.  Again, there 
aren’t any real textbook examples here; yes, I could write a contrived example, 
but if they didn’t understand from a general description, a single specific 
example is unlikely to help anyone in their specific case.  At the end of the 
day, both functions are an optimizations anyways.  You can write a functional 
virtual table without them, it might just run a tad slower.  If you really need 
that last bit 

Re: [sqlite] Virtual table implementation details

2014-11-20 Thread Alessandro Marzocchi
Sorry, but I did not make my question clear... my problem is not only
restoring data to its old values (easy, sqlite already does that at some
point during rollback) but doing other operations during rollback (e.g.
notifying the object that its data has changed, so the object will read it
back). What I have in mind is an architecture similar to
model-view-controller... in which the sqlite database acts as the model.
So it's important for me what the state of the db will be when I query it
during rollback... I hope I could clarify my question enough. In the
evening I'll try to write a more accurate description of the full
architecture I have in mind to see if anybody may have suggestions. Thank
you again

You seem to be forgetting one important detail…  if a Rollback is called,
the underlying data and undo tables will be rolled back automatically.
They are, after all “real” tables in the database, and will be managed as
such.  Unless you’re storing data outside the SQLite database, there is
nothing for your function to do… the SQLite engine will rollback all the
tables and put them into their prior state all on its own.

 -j


On Nov 20, 2014, at 3:26 AM, Alessandro Marzocchi <
alessandro.marzoc...@gmail.com> wrote:

> Good day,
> I'm implementing a serialization&undo/redo engine based on sqlite. My
> implementation (particulars omitted) will work by adding a layer of
virtual
> tables over real sqlite data tables plus a physical table to store the
list
> of modifications made to them (to implement undo functionality). Also a
> callback to the real object is provided to notify when values are changed
> (eg for a rollback or undo). My problem comes on implementing rollback
> function. When rollback is called my thought is to restore old data
directly
> from the data table or the undo table. For example (still simplified)
>
> 1) Virtual table: dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER)
>Data table: _dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER)
>Undo table: _undo(step INTEGER PRIMARY KEY, STRING class, id INTEGER,
> variable STRING, oldValue VARIANT, newValue VARIANT),
>objectA[27]->Foo=3; so _dataObjectA contains a single entry (27,3)
> 2) BEGIN
> 3) User calls objectA[27]->setFoo(4): c++ from this calls make an UPDATE
> dataObjectA SET Foo=4 WHERE id=27;
> 4) My xUpdate gets called. Following queries are made: UPDATE _dataObjectA
> SET Foo=4 WHERE id=27; INSERT INTO _undo VALUES (0 /*incremental step
number
> */, 'objectA', 27, 'Foo', 3, 4)
> 5) ROLLBACK
> 6) My xRollback gets called and has to restore previous state (and notify
it
> through the callback)
>
> Now what database state will xRollback see? The one before (4) or the one
> after? Looking at code I see xRollback in sqlite3.c (3.8.7.1 amalgamation)
> is called with following call stack:
> 112789: static void callFinaliser(sqlite3 *db, int offset)
> 112838: sqlite3VtabRollback
>
> Which itself may be called from:
> 125934 sqlite3RollbackAll /* After sqlite3BtreeRollback is called */
> 125744 sqlite3Close /* Not meaningful in my case */
>
> So it seems that when my xRollback is called it will see the old data. Can
> anybody confirm for me that there are no corner cases in which something
> different could happen, and may I assume that this behavior will be kept in
> future releases too?
>
> Also, even if I'm afraid the answer is no: is there a way to force
> "reloading" of virtual table(s)? In my implementation columns can be added
> (in virtual as in real data table) through another special virtual table
> (following the example before user may do a INSERT INTO classDef VALUES
> ('ObjectA', 'Bar', 'STRING') and this would add a column to _dataObjectA.
> Have I any ways to force (other than forced DROP TABLE/CREATE VIRTUAL
TABLE)
> reloading so I can notify sqlite of new schema with sqlite3_declare_vtab?
>
> Thank you for your help
> Alessandro
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but
showing it to the wrong people has the tendency to make them feel
uncomfortable." -- Angela Johnson





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table implementation details

2014-11-20 Thread Jay Kreibich

You seem to be forgetting one important detail…  if a Rollback is called, the 
underlying data and undo tables will be rolled back automatically.  They are, 
after all “real” tables in the database, and will be managed as such.  Unless 
you’re storing data outside the SQLite database, there is nothing for your 
function to do… the SQLite engine will rollback all the tables and put them 
into their prior state all on its own.

 -j


On Nov 20, 2014, at 3:26 AM, Alessandro Marzocchi 
 wrote:

> Good day,
> I'm implementing a serialization&undo/redo engine based on sqlite. My
> implementation (particulars omitted) will work by adding a layer of virtual
> tables over real sqlite data tables plus a physical table to store the list
> of modifications made to them (to implement undo functionality). Also a
> callback to the real object is provided to notify when values are changed
> (eg for a rollback or undo). My problem comes on implementing rollback
> function. When rollback is called my thought is to restore old data directly
> from the data table or the undo table. For example (still simplified)
> 
> 1) Virtual table: dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER)
>Data table: _dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER)
>Undo table: _undo(step INTEGER PRIMARY KEY, STRING class, id INTEGER,
> variable STRING, oldValue VARIANT, newValue VARIANT),
>objectA[27]->Foo=3; so _dataObjectA contains a single entry (27,3)
> 2) BEGIN
> 3) User calls objectA[27]->setFoo(4): c++ from this calls make an UPDATE
> dataObjectA SET Foo=4 WHERE id=27;
> 4) My xUpdate gets called. Following queries are made: UPDATE _dataObjectA
> SET Foo=4 WHERE id=27; INSERT INTO _undo VALUES (0 /*incremental step number
> */, 'objectA', 27, 'Foo', 3, 4)
> 5) ROLLBACK
> 6) My xRollback gets called and has to restore previous state (and notify it
> through the callback)
> 
> Now what database state will xRollback see? The one before (4) or the one
> after? Looking at code I see xRollback in sqlite3.c (3.8.7.1 amalgamation)
> is called with following call stack:
> 112789: static void callFinaliser(sqlite3 *db, int offset)
> 112838: sqlite3VtabRollback
> 
> Which itself may be called from:
> 125934 sqlite3RollbackAll /* After sqlite3BtreeRollback is called */
> 125744 sqlite3Close /* Not meaningful in my case */
> 
> So it seems that when my xRollback is called it will see the old data. Can
> anybody confirm for me that there are no corner cases in which something
> different could happen, and may I assume that this behavior will be kept in
> future releases too?
> 
> Also, even if I'm afraid the answer is no: is there a way to force
> "reloading" of virtual table(s)? In my implementation columns can be added
> (in virtual as in real data table) through another special virtual table
> (following the example before user may do a INSERT INTO classDef VALUES
> ('ObjectA', 'Bar', 'STRING') and this would add a column to _dataObjectA.
> Have I any ways to force (other than forced DROP TABLE/CREATE VIRTUAL TABLE)
> reloading so I can notify sqlite of new schema with sqlite3_declare_vtab?
> 
> Thank you for your help
> Alessandro
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

--  
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but showing 
it to the wrong people has the tendency to make them feel uncomfortable." -- 
Angela Johnson





___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
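
[The point is easy to verify with plain tables and the ordinary C API. This
small self-contained program, an illustration rather than code from the
thread, mirrors the example from the question and prints Foo = 3 after the
ROLLBACK, with no virtual table involved at all.]

    #include <stdio.h>
    #include "sqlite3.h"

    int main(void){
      sqlite3 *db;
      sqlite3_stmt *pStmt;
      sqlite3_open(":memory:", &db);
      sqlite3_exec(db,
        "CREATE TABLE _dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER);"
        "INSERT INTO _dataObjectA VALUES(27, 3);"
        "BEGIN;"
        "UPDATE _dataObjectA SET Foo=4 WHERE id=27;"
        "ROLLBACK;", 0, 0, 0);            /* error handling elided */
      sqlite3_prepare_v2(db, "SELECT Foo FROM _dataObjectA WHERE id=27",
                         -1, &pStmt, 0);
      if( sqlite3_step(pStmt)==SQLITE_ROW ){
        printf("Foo after ROLLBACK = %d\n", sqlite3_column_int(pStmt, 0));
      }
      sqlite3_finalize(pStmt);
      sqlite3_close(db);
      return 0;
    }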


[sqlite] Virtual table implementation details

2014-11-20 Thread Alessandro Marzocchi
Good day,
 I'm implementing a serialization&undo/redo engine based on sqlite. My
implementation (particulars omitted) will work by adding a layer of virtual
tables over real sqlite data tables plus a physical table to store the list
of modifications made to them (to implement undo functionality). Also a
callback to the real object is provided to notify when values are changed
(eg for a rollback or undo). My problem comes on implementing rollback
function. When rollback is called my thought is to restore old data directly
from the data table or the undo table. For example (still simplified)

1) Virtual table: dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER)
Data table: _dataObjectA(id INTEGER PRIMARY KEY, Foo INTEGER)
Undo table: _undo(step INTEGER PRIMARY KEY, STRING class, id INTEGER,
variable STRING, oldValue VARIANT, newValue VARIANT),
objectA[27]->Foo=3; so _dataObjectA contains a single entry (27,3)
2) BEGIN
3) User calls objectA[27]->setFoo(4): c++ from this calls make an UPDATE
dataObjectA SET Foo=4 WHERE id=27;
4) My xUpdate gets called. Following queries are made: UPDATE _dataObjectA
SET Foo=4 WHERE id=27; INSERT INTO _undo VALUES (0 /*incremental step number
*/, 'objectA', 27, 'Foo', 3, 4)
5) ROLLBACK
6) My xRollback gets called and has to restore previous state (and notify it
through the callback)

Now what database state will xRollback see? The one before (4) or the one
after? Looking at code I see xRollback in sqlite3.c (3.8.7.1 amalgamation)
is called with following call stack:
112789: static void callFinaliser(sqlite3 *db, int offset)
112838: sqlite3VtabRollback

Which itself may be called from:
125934 sqlite3RollbackAll /* After sqlite3BtreeRollback is called */
125744 sqlite3Close /* Not meaningful in my case */

So it seems that when my xRollback is called it will see the old data. Can
anybody confirm for me that there are no corner cases in which something
different could happen, and may I assume that this behavior will be kept in
future releases too?

Also, even if I'm afraid the answer is no: is there a way to force
"reloading" of virtual table(s)? In my implementation columns can be added
(in virtual as in real data table) through another special virtual table
(following the example before user may do a INSERT INTO classDef VALUES
('ObjectA', 'Bar', 'STRING') and this would add a column to _dataObjectA.
Have I any ways to force (other than forced DROP TABLE/CREATE VIRTUAL TABLE)
reloading so I can notify sqlite of new schema with sqlite3_declare_vtab?

Thank you for your help
 Alessandro
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
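
[For reference, a sketch of the xUpdate step described in point 4, with
hypothetical names throughout: ObjVtab and its db field are assumptions, and
only the single-column UPDATE case from the example is handled. Both
statements run inside the caller's transaction, which is what lets a later
ROLLBACK undo them together.]

    #include "sqlite3.h"

    typedef struct ObjVtab {
      sqlite3_vtab base;   /* required base class */
      sqlite3 *db;         /* connection, saved at xConnect time (assumed) */
    } ObjVtab;

    static int objUpdate(sqlite3_vtab *pVTab, int argc, sqlite3_value **argv,
                         sqlite3_int64 *pRowid){
      ObjVtab *p = (ObjVtab*)pVTab;
      /* argc>1 with argv[0] non-NULL means an UPDATE of that rowid;
      ** argv[2] is the new value of the first declared column (Foo). */
      if( argc>1 && sqlite3_value_type(argv[0])!=SQLITE_NULL ){
        sqlite3_int64 id = sqlite3_value_int64(argv[0]);
        int newFoo = sqlite3_value_int(argv[2]);
        char *zSql = sqlite3_mprintf(
          "INSERT INTO _undo(class,id,variable,oldValue,newValue)"
          " SELECT 'objectA',%lld,'Foo',Foo,%d FROM _dataObjectA WHERE id=%lld;"
          "UPDATE _dataObjectA SET Foo=%d WHERE id=%lld;",
          id, newFoo, id, newFoo, id);
        int rc = sqlite3_exec(p->db, zSql, 0, 0, 0);
        sqlite3_free(zSql);
        return rc;
      }
      return SQLITE_ERROR;  /* INSERT and DELETE cases omitted in this sketch */
    }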


Re: [sqlite] Virtual table API performance

2014-03-05 Thread Alek Paunov

On 05.03.2014 11:02, RSmith wrote:


On 2014/03/05 10:41, Dominique Devienne wrote:

On Tue, Mar 4, 2014 at 9:41 PM, Elefterios Stamatogiannakis

One thing that IMHO long term might improve the situation would be if
SQLite's own "native" tables would use the same Virtual Table API,//...

...//Of course, the above is a "naive" abstract reflection which ignores
the realities of VDBE, so it may sound rubbish to actual SQLite
developers. Apologies for that. --DD


I don't think it is rubbish at all, but maybe idealistic.  The biggest
problem I can see from making API's pov is that you can at any time
alter, update, change the way SQLite (or any other API) works with the
base check that the input values produce the same (or maybe
more-correct) results.  Once you let the VT use the same API, any change
is a potential change to how other people's programmed interfaces need
to talk to - or get data from - the SQLite engine. This cannot simply
change on a whim, so the levels of separation remain needed.

That said, I'm all for making a more efficient VT API, but it would
probably need to be "new" functionality since I cannot see how the
existing interface could implement any of the mentioned enhancements
without breaking existing behaviour. The OP's xNextRow suggestion seems
a good idea, but opens up a whole can of what-ifs which other posters
have alluded to, but something to that effect might be worthwhile if the
efficiency bonus is significant.



The whole thread so far is based on the OP's observations in mixed C/PyPy 
and apsw/CPython environments (as has been said already, we are suffering 
the noise in both cases).


To be helpful to the SQLite team, before proposing any changes, please 
let someone show some well designed, pure C vtable implementation 
demonstrating the possible vtable interface inefficiency.


Let's remember that all xNextRow, xNextPage optimizations are 
applicable only for "select *" cases, not in the general "select f(x), 
y" case.


Kind regards,
Alek

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-05 Thread RSmith


On 2014/03/05 10:41, Dominique Devienne wrote:

On Tue, Mar 4, 2014 at 9:41 PM, Elefterios Stamatogiannakis

One thing that IMHO long term might improve the situation would be if
SQLite's own "native" tables would use the same Virtual Table API,//...

...//Of course, the above is a "naive" abstract reflection which ignores
the realities of VDBE, so it may sound rubbish to actual SQLite
developers. Apologies for that. --DD


I don't think it is rubbish at all, but maybe idealistic.  The biggest problem I can see from making API's pov is that you can at 
any time alter, update, change the way SQLite (or any other API) works with the base check that the input values produce the same 
(or maybe more-correct) results.  Once you let the VT use the same API, any change is a potential change to how other people's 
programmed interfaces need to talk to - or get data from - the SQLite engine. This cannot simply change on a whim, so the levels of 
separation remain needed.


That said, I'm all for making a more efficient VT API, but it would probably need to be "new" functionality since I cannot see how 
the existing interface could implement any of the mentioned enhancements without breaking existing behaviour. The OP's xNextRow 
suggestion seems a good idea, but opens up a whole can of what-ifs which other posters have alluded to, but something to that effect 
might be worthwhile if the efficiency bonus is significant.


Cheers
Ryan
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-05 Thread Dominique Devienne
On Tue, Mar 4, 2014 at 9:41 PM, Elefterios Stamatogiannakis
 wrote:
> [...] Nevertheless, having people do crazy/clever hacks like that to avoid the
> inefficiencies of an API, gives a hint that something might be problematic.
>
> In a previous email of mine, i had said half seriously/half joking about the
> cost in Watts of SQLite's worldwide usage. For something that is so widely
> used, even some % of efficiency improvement really makes a difference. It is
> not an "SQLite destroys/saves civilization" kind of difference, but IMHO it
> would be measurable in G/M Watts.

The SQLite developers care about performance, that much is clear.

But they also care about the "lite"-ness aspects too (and so does part
of its community, judging by this list).

They try to find the right balance for them, with ease of
implementation/testing/maintenance as the last variable of this
equation.

One thing that IMHO long term might improve the situation would be if
SQLite's own "native" tables would use the same Virtual Table API,
because then any bottleneck would apply to all "tables", not just the
VT ones, and would be more likely to be removed. And as recently
discussed, the VT API would be forced to acquire ways to communicate
with Explain Query Plan (for example) to notice Full Scans or Covering
Indexes not just for the "native" tables. VTs came after the "native"
tables, so the current situation is normal, but if all table accesses,
"native" or otherwise, could go thru the same API in the future (in
SQLite3 or Sqlite4), it would further decouple the "front-end" from
the "back-end", and ensure no performance differences between "native"
and "virtual" tables.

Of course, the above is a "naive" abstract reflection which ignores
the realities of VDBE, so it may sound rubbish to actual SQLite
developers. Apologies for that. --DD
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Elefterios Stamatogiannakis
Also, I want to note the pretty clever "hack" from Hick Gunter that 
works by passing a "proxy" for the row (instead of the whole row's 
values) to SQLite. This proxy gets expanded, using the xColumn API, by a 
virtual table outside the whole query.


In this way a query using multiple VTs needs to incur the xColumn 
performance penalty only once.


Nevertheless, having people do crazy/clever hacks like that to avoid the 
inefficiencies of an API, gives a hint that something might be problematic.


In a previous email of mine, i had said half seriously/half joking about 
the cost in Watts of SQLite's worldwide usage. For something that is so 
widely used, even some % of efficiency improvement really makes a 
difference. It is not an "SQLite destroys/saves civilization" kind of 
difference, but IMHO it would be measurable in G/M Watts.


l.

On 4/3/2014 11:33 μμ, Alek Paunov wrote:

On 04.03.2014 20:25, Eleytherios Stamatogiannakis wrote:

On 04/03/14 20:11, Alek Paunov wrote:

On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:

On 03/03/14 03:01, Alek Paunov wrote:

It seems that the easiest optimization for this (very often) VT use
case
(bulk streaming) is SQLite add-on in _C_ to be written, implementing
vtable interface specialization containing xNextPage "buffering" let's
say 4K rows or even better 16KB data (in addition to your initial
proposal of xNextRow).

The technical question is: how the rows to be encoded? You said
initially that you use some compressed format. But for such extension,
to gain more traction in the future, it would be better probably a
more
standard format to be chosen.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, It would be very nice if some common conventions for binary
record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited only to
vtables ;-).


SQLite doesn't need any special format for the records to be passed
over
to it. It already has the "bind" API which would be very suitable for
the xNextRow function too.



It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?


In the link that you posted above, look for all the sqlite3_bind_x
functions. In SQLite the bind API is used to pass parameters to prepared
statements. The way the bind API works is that you have a statement
parameter "row", and you fill it by saying:

Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to a float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API
"way", the program calls SQLite to fill a row's values.



Ah, OK. Let see if I finally understood your idea:

- exec "insert into t select * from vt"
- VDBE calls xNextRow
- in xNextRow implementation, the Python code calls something like
bind_xxx for each column with the scalar addresses (allocated by you)
- xNextRow returns, VDBE inserts the row, you clean on next step

Questions:

What stops you to make this wrapper right now (e.g. as apsw patch or
standalone sqlite add-on loaded by PyPy FFI)?

How do you expect this model (managing one scalar allocation per cell
during the query) to perform in comparison with passing
encoded row pages (memory chunks) between sqlite and the script engine,
especially when it is not PyPy or LuaJIT?

Regards,
Alek



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Elefterios Stamatogiannakis

On 4/3/2014 11:33 μμ, Alek Paunov wrote:

On 04.03.2014 20:25, Eleytherios Stamatogiannakis wrote:

On 04/03/14 20:11, Alek Paunov wrote:

On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:

On 03/03/14 03:01, Alek Paunov wrote:

It seems that the easiest optimization for this (very often) VT use
case
(bulk streaming) is SQLite add-on in _C_ to be written, implementing
vtable interface specialization containing xNextPage "buffering" let's
say 4K rows or even better 16KB data (in addition to your initial
proposal of xNextRow).

The technical question is: how the rows to be encoded? You said
initially that you use some compressed format. But for such extension,
to gain more traction in the future, it would be better probably a
more
standard format to be chosen.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, It would be very nice if some common conventions for binary
record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited only to
vtables ;-).


SQLite doesn't need any special format for the records to be passed
over
to it. It already has the "bind" API which would be very suitable for
the xNextRow function too.



It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?


In the link that you posted above, look for all the sqlite3_bind_x
functions. In SQLite the bind API is used to pass parameters to prepared
statements. The way the bind API works is that you have a statement
parameter "row", and you fill it by saying:

Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to an float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API
"way", the program calls SQLite to fill a row's values.



Ah, OK. Let's see if I finally understood your idea:

- exec "insert into t select * from vt"
- VDBE calls xNextRow
- in xNextRow implementation, the Python code calls something like
bind_xxx for each column with the scalar addresses (allocated by you)
- xNextRow returns, VDBE inserts the row, you clean up on the next step

Questions:

What stops you from making this wrapper right now (e.g. as an apsw patch or
a standalone sqlite add-on loaded by PyPy FFI)?


Yes, these things can be done. We have written our own APSW-compatible 
FFI for PyPy, in which we have tried techniques like the one you 
describe. The end result wasn't that much faster, because no matter how 
many wrappers there are above the VT API, its efficiency characteristics 
are still there.


How do you expect this model (managing one scalar allocation per cell
during the query) to perform in comparison with passing
encoded row pages (memory chunks) between sqlite and the script engine,
especially when it is not PyPy or LuaJIT?



Essentially, in my first post about the VT API's efficiency, the comparison 
was between the bind API (similar to your "encoded pages" idea but for a 
single row) and the current xColumn-based VT API.


In our tests with Python, the difference between the two APIs was 3x. 
I've seen another post here by somebody else who was using a natively 
compiled VT implementation saying that the difference was 2x. Both of 
these observations say that there is some inefficiency in the current 
VT API.


If this API were also used within SQLite's VM, maybe adding an xNextRow 
opcode would speed up SQLite's VM too?


Best,

l.


Regards,
Alek



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Alek Paunov

On 04.03.2014 20:25, Eleytherios Stamatogiannakis wrote:

On 04/03/14 20:11, Alek Paunov wrote:

On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:

On 03/03/14 03:01, Alek Paunov wrote:

It seems that the easiest optimization for this (very common) VT use case
(bulk streaming) is for an SQLite add-on to be written in _C_, implementing
a vtable interface specialization containing an xNextPage method "buffering"
let's say 4K rows or, even better, 16KB of data (in addition to your initial
proposal of xNextRow).

The technical question is: how are the rows to be encoded? You said
initially that you use some compressed format. But for such an extension
to gain more traction in the future, it would probably be better to
choose a more standard format.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary
record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited only to
vtables ;-).


SQLite doesn't need any special format for the records to be passed over
to it. It already has the "bind" API which would be very suitable for
the xNextRow function too.



It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?


In the link that you posted above, look for all the sqlite3_bind_x
functions. In SQLite the bind API is used to pass parameters to prepared
statements. The way the bind API works is that you have a statement
parameter "row", and you fill it by saying:

Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to a float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API
"way", the program calls SQLite to fill a row's values.



Ah, OK. Let's see if I finally understood your idea:

- exec "insert into t select * from vt"
- VDBE calls xNextRow
- in xNextRow implementation, the Python code calls something like 
bind_xxx for each column with the scalar addresses (allocated by you)

- xNextRow returns, VDBE inserts the row, you clean up on the next step
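
(Purely for illustration, one possible shape for such an xNextRow entry
point; every name below is hypothetical - nothing like it exists in
SQLite's module interface today:)

/* HYPOTHETICAL -- not part of SQLite. One call advances the cursor and
** delivers the whole next row: the module fills each column through the
** usual sqlite3_result_* routines on a per-column context, returning
** SQLITE_DONE when the cursor is exhausted. Compared with xNext plus
** one xColumn call per column, this crosses the module (and VM)
** boundary once per row instead of once per value. */
int (*xNextRow)(sqlite3_vtab_cursor *pCur,
                sqlite3_context **apCtx,  /* one context per column */
                int nCol);                /* number of columns to fill */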

Questions:

What stops you from making this wrapper right now (e.g. as an apsw patch or
a standalone sqlite add-on loaded by PyPy FFI)?


How do you expect this model (managing one scalar allocation per cell
during the query) to perform in comparison with passing
encoded row pages (memory chunks) between sqlite and the script engine,
especially when it is not PyPy or LuaJIT?


Regards,
Alek

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Eleytherios Stamatogiannakis

On 04/03/14 20:11, Alek Paunov wrote:

On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:

On 03/03/14 03:01, Alek Paunov wrote:

It seems that the easiest optimization for this (very common) VT use case
(bulk streaming) is for an SQLite add-on to be written in _C_, implementing
a vtable interface specialization containing an xNextPage method "buffering"
let's say 4K rows or, even better, 16KB of data (in addition to your initial
proposal of xNextRow).

The technical question is: how are the rows to be encoded? You said
initially that you use some compressed format. But for such an extension
to gain more traction in the future, it would probably be better to
choose a more standard format.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited only to
vtables ;-).


SQLite doesn't need any special format for the records to be passed over
to it. It already has the "bind" API which would be very suitable for
the xNextRow function too.



It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?


In the link that you posted above, look for all the sqlite3_bind_x 
functions. In SQLite the bind API is used to pass parameters to prepared 
statements. The way the bind API works is that you have a statement 
parameter "row", and you fill it by saying:


Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to a float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API 
"way", the program calls SQLite to fill a row's values.


Regards.

l.




For a paging API (which IMHO is too complex ATM), the bind API could be
extended with a row number parameter.




___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Alek Paunov

On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:

On 03/03/14 03:01, Alek Paunov wrote:

It seems that the easiest optimization for this (very common) VT use case
(bulk streaming) is for an SQLite add-on to be written in _C_, implementing
a vtable interface specialization containing an xNextPage method "buffering"
let's say 4K rows or, even better, 16KB of data (in addition to your initial
proposal of xNextRow).

The technical question is: how are the rows to be encoded? You said
initially that you use some compressed format. But for such an extension
to gain more traction in the future, it would probably be better to
choose a more standard format.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited only to
vtables ;-).


SQLite doesn't need any special format for the records to be passed over
to it. It already has the "bind" API which would be very suitable for
the xNextRow function too.



It seems that I do not know the sqlite3_ API very well.

http://www.sqlite.org/c3ref/funclist.html

Would you like to point me to the "bind" API page?


For a paging API (which IMHO is too complex ATM), the bind API could be
extended with a row number parameter.


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Eleytherios Stamatogiannakis

Thank you for the explanation.

Answers inline.

On 04/03/14 16:16, Hick Gunter wrote:

Your VT1 table already has an xColumn implementation, possibly doing something 
like

 switch( p_column )
 {
 case 1: sqlite3_result_xxx( p_ctx, v_rec->f1, ...); break;
 ...
 case n: sqlite3_result_xxx( p_ctx, v_rec->fn, ...); break;
 }

This needs to have two cases added:

 case n+1: sqlite3_result_int64( p_ctx, (uintptr_t)v_rec );
 case n+2: sqlite3_result_int64( p_ctx, (uintptr_t)func );

where

     static int func( void *p_rec, sqlite3_context *p_ctx, int p_column );

calls

 xColumn( v_cursor, p_ctx, p_column );

with a dummy cursor structure as defined for your table.

The VT2 table can then prepare "select __rec,__func from VT1", and in its 
xColumn implementation it calls

   v_rec  = (void *)sqlite3_column_int64( v_stmt, 0 ); // this can be 
stored and cleared in the xNext function
 v_func = (func *)sqlite3_column_int64( v_stmt, 1 ); // this can be 
stored
 v_func( v_rec, p_ctx, p_column );


I see, so you do a similar trick to what we do when passing Python's 
generators as values in SQLite.




As for your second example, as written it does not suffer from the effect 
because you are already selecting c1, c2 and c3 at the bottom level.

Rewritten as

Select processrow(c1,c2,c3) from VT2(select * from VT1);


Without knowing what VT2 will do, I don't think that this rewriting can 
happen. For example, "processrow" might return generators (nested 
tables) that get expanded by VT2. If you moved it outside VT2, then the 
generators would not be expanded.


Regards,

l.



results in the VT1 xColumn function getting called (via the VT2 xColumn 
function) just 3 times per row.

Additionally, you may like to "select __func from VT1 limit 1" and store that in your 
xFilter implementation; and then "select __rec from VT1" in your xNext implementation to 
have sqlite3_result_int64() called half as often.

HTH

-Original Message-
From: Eleytherios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Tuesday, 04 March 2014 14:15
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

Could you explain some more your solution?

Does it work in this following case:

select * from VT2(select * from VT1);

by directly passing the rows from VT1 to VT2 (short-circuiting SQLite)?

What would happen in the following case?:

select * from VT2(select processrow(c1,c2,c3) from VT1);

Regards,

l.

On 03/03/14 14:17, Hick Gunter wrote:

We have gotten around this problem by defining "virtual" fields that contain a reference 
to the "current record" and the entrypoint of a wrapper around the xColumn function. That 
way only two fields get passed upwards through the virtual table stack and the top level virtual 
table's xColumn implementation calls straight through to the bottom layer's wrapper.

It does take some care to avoid sorting in between the layers and 
re-preparation of statements on schema changes.

-Original Message-
From: Elefterios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Sunday, 02 March 2014 20:39
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

We have both input and output virtual tables that avoid hitting the hard disk 
and are also able to compress the incoming and outgoing data.

We have a virtual table that takes as input a query and sends the data to a port on another 
machine. This virtual table is called "OUTPUT". And another virtual table that takes as 
input data from another port and forwards it into SQLite. Let's call it "INPUT". A query 
that uses these two virtual tables would look like this in madIS:

OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');

We actually use queries like above (actually we don't do it directly to ports 
but to buffered named pipes that are then forwarded via netcat) to run 
distributed queries on clusters, connecting all the local SQLite/madIS 
instances on the different machines together.

The main point that I want to make with the above explanation is that we don't view 
SQLite only as a traditional database. We also view it as a data stream 
processing machine, that doesn't have the requirement for the data to be stored 
on a hard disk.

Under this view, the efficiency of the virtual table api is very important. 
Above query only uses 2 VTs in it, but we have other queries that use a lot 
more VTs than that.

estama


On 2/3/2014 9:34 μμ, Max Vlasov wrote:

On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
 wrote:


Our main test case is TPCH, a standard DB benchmark. The "lineitem"
table of TPCH contains 16 columns, which for 10M rows would require
160M xColumn callbacks, to pass it through the virtual table API.

Re: [sqlite] Virtual table API performance

2014-03-04 Thread Eleytherios Stamatogiannakis
In our tests the bind API can acquire from the Python side more than 20 
values in a single call, in the same time that xColumn acquires 2 values.


Most of the cost is in the callback and not in submitting a row's values 
through the bind API.


So with the exception of queries that need only 1 column, IMHO 
everything else should go through the xNextRow API.


To keep the complexity to a minimum, my proposal is to use the 
xNextRow API only for queries that simply "scan" over a virtual table (no 
filtering).


l.

On 04/03/14 18:23, Hick Gunter wrote:

My guess: Yes.

It would require implementing a new opcode, either for virtual tables only or 
for native tables too, that accepts a list of field numbers (currently 
there are only 5 parameters possible for an opcode and some of them have fixed 
meanings).

And the logic to generate these opcodes based on the capabilities of the loaded table 
module combined with the requirements of the subject query (fields required for JOIN are 
fetched separately from those required for the result set) and the result of the 
xBestIndex calls (where it is possible to set the "omit" flag to suppress 
generation of a comparison). This also adds to the complexity of register allocation.

Take for example a join that needs 3 fields for the comparison, 2 of which are 
also required for the result set of 7 fields total. Do you request all 10 
fields at once (wastes up to 9 fields worth of effort if the JOIN is not met)? 
Or the 3 fields first and the 5 others only if the join matches (must allocate 
consecutive registers to build a result set)? Or 3 first and then 7 (which 
approximates the current behavior, as the 2 common fields are fetched twice on 
a match)?

And a set of new sqlite3_result routines that specify which of the 
requested fields' values is being set.

-Original Message-
From: J. Merrill [mailto:j.merr...@enlyton.com]
Sent: Tuesday, 04 March 2014 16:23
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

Eleytherios Stamatogiannakis wrote

Our main test case is TPCH, a standard DB benchmark. The "lineitem"
table of TPCH contains 16 columns, which for 10M rows would require
160M xColumn callbacks, to pass it through the virtual table API.
These callbacks are very expensive, especially when at the other end
sits a VM (CPython or PyPy) handling them.


Would it be very difficult to arrange for an option that would request that 
SQLite issue a single more-complex xMultiColumns (a sample name) callback 
request, with a way for multiple results to be returned, rather than many 
xColumn callbacks? This would reduce the number of calls across the VM boundary.

Applications that don't implement xMultiColumns (and request its use) would see 
no change; those that do would get the performance boost.

J. Merrill



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Why-does-type-affinity-declared-on-a-foreign-key-column-affect-join-speed-tp74183p74295.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


---
Gunter Hick
Software Engineer

Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna,
Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then
delete this message from your system. Please do not copy it or use it for any 
purposes, or disclose its contents to any person as to do so could be a breach 
of confidence. Thank you for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Hick Gunter
My guess: Yes.

It would require implementing a new opcode, either for virtual tables only or 
for native tables too, that accepts a list of field numbers (currently 
there are only 5 parameters possible for an opcode and some of them have fixed 
meanings).

And the logic to generate these opcodes based on the capabilities of the 
loaded table module combined with the requirements of the subject query (fields 
required for JOIN are fetched separately from those required for the result 
set) and the result of the xBestIndex calls (where it is possible to set the 
"omit" flag to suppress generation of a comparison). This also adds to the 
complexity of register allocation.

Take for example a join that needs 3 fields for the comparison, 2 of which are 
also required for the result set of 7 fields total. Do you request all 10 
fields at once (wastes up to 9 fields worth of effort if the JOIN is not met)? 
Or the 3 fields first and the 5 others only if the join matches (must allocate 
consecutive registers to build a result set)? Or 3 first and then 7 (which 
approximates the current behavior, as the 2 common fields are fetched twice on 
a match)?

And a set of new sqlite3_result routines that specify which of the 
requested fields' values is being set.

-Original Message-
From: J. Merrill [mailto:j.merr...@enlyton.com]
Sent: Tuesday, 04 March 2014 16:23
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

Eleytherios Stamatogiannakis wrote
> Our main test case is TPCH, a standard DB benchmark. The "lineitem"
> table of TPCH contains 16 columns, which for 10M rows would require
> 160M xColumn callbacks, to pass it through the virtual table API.
> These callbacks are very expensive, especially when at the other end
> sits a VM (CPython or PyPy) handling them.

Would it be very difficult to arrange for an option that would request that 
SQLite issue a single more-complex xMultiColumns (a sample name) callback 
request, with a way for multiple results to be returned, rather than many 
xColumn callbacks? This would reduce the number of calls across the VM boundary.

Applications that don't implement xMultiColumns (and request its use) would see 
no change; those that do would get the performance boost.

J. Merrill



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Why-does-type-affinity-declared-on-a-foreign-key-column-affect-join-speed-tp74183p74295.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


---
Gunter Hick
Software Engineer

Scientific Games International GmbH
Klitschgasse 2 – 4, A - 1130 Vienna,
Austria
FN 157284 a, HG Wien
Tel: +43 1 80100 0
E-Mail: h...@scigames.at

This e-mail is confidential and may well also be legally privileged. If you 
have received it in error, you are on notice as to its status and accordingly 
please notify us immediately by reply e-mail and then
delete this message from your system. Please do not copy it or use it for any 
purposes, or disclose its contents to any person as to do so could be a breach 
of confidence. Thank you for your cooperation.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread J. Merrill
Eleytherios Stamatogiannakis wrote
> Our main test case is TPCH, a standard DB benchmark. The "lineitem" 
> table of TPCH contains 16 columns, which for 10M rows would require 160M 
> xColumn callbacks, to pass it through the virtual table API. These 
> callbacks are very expensive, especially when at the other end sits a VM 
> (CPython or PyPy) handling them.

Would it be very difficult to arrange for an option that would request that
SQLite issue a single more-complex xMultiColumns (a sample name) callback
request, with a way for multiple results to be returned, rather than many
xColumn callbacks? This would reduce the number of calls across the VM
boundary.

Applications that don't implement xMultiColumns (and request its use) would
see no change; those that do would get the performance boost.
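
(For illustration, such a hypothetical xMultiColumns entry point might
look like the following; all names are invented here, mirroring the
existing xColumn signature:)

/* HYPOTHETICAL -- xColumn generalized to a set of columns: the module
** writes each requested column's value into the matching context, so a
** single callback returns many values instead of one per call. */
int (*xMultiColumns)(sqlite3_vtab_cursor *pCur,
                     sqlite3_context **apCtx,  /* one per requested column */
                     const int *aiCol,         /* requested column numbers */
                     int nCol);                /* how many are requested */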

J. Merrill



--
View this message in context: 
http://sqlite.1065341.n5.nabble.com/Why-does-type-affinity-declared-on-a-foreign-key-column-affect-join-speed-tp74183p74295.html
Sent from the SQLite mailing list archive at Nabble.com.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-04 Thread Hick Gunter
Your VT1 table already has an xColumn implementation, possibly doing something 
like

switch( p_column )
{
case 1: sqlite3_result_xxx( p_ctx, v_rec->f1, ...); break;
...
case n: sqlite3_result_xxx( p_ctx, v_rec->fn, ...); break;
}

This needs to have two cases added:

case n+1: sqlite3_result_int64( p_ctx, (uintptr_t)v_rec );
case n+2: sqlite3_result_int64( p_ctx, (uintptr_t)func );

where

static int func( void *p_rec, sqlite3_context *p_ctx, int p_column );

calls

xColumn( v_cursor, p_ctx, p_column );

with a dummy cursor structure as defined for your table.

The VT2 table can then prepare "select __rec,__func from VT1", and in its 
xColumn implementation it calls

  v_rec  = (void *)sqlite3_column_int64( v_stmt, 0 ); // this can be stored 
and cleared in the xNext function
v_func = (func *)sqlite3_column_int64( v_stmt, 1 ); // this can be 
stored
v_func( v_rec, p_ctx, p_column );
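
(Condensed, a sketch of the two halves of this trick; the colfunc_t
typedef, the vt1_cursor type and the VT1_NCOL constant are illustrative
names, and the integer casts assume pointers fit in 64 bits, as the
original does:)

typedef int (*colfunc_t)(void *p_rec, sqlite3_context *p_ctx, int p_col);

/* Bottom layer: wrapper around VT1's xColumn, handed upwards via __func */
static int vt1_wrap(void *p_rec, sqlite3_context *p_ctx, int p_col){
  vt1_cursor v_cur;                /* dummy cursor, illustrative type */
  v_cur.v_rec = p_rec;             /* point it at the passed record   */
  return vt1Column(&v_cur.base, p_ctx, p_col);
}

/* Two extra cases in VT1's xColumn switch expose record + entrypoint */
case VT1_NCOL:
  sqlite3_result_int64(p_ctx, (sqlite3_int64)(uintptr_t)v_rec);    break;
case VT1_NCOL+1:
  sqlite3_result_int64(p_ctx, (sqlite3_int64)(uintptr_t)vt1_wrap); break;

/* In VT2's xColumn: fetch both once per row, then call straight down */
void     *v_rec  = (void *)(uintptr_t)sqlite3_column_int64(v_stmt, 0);
colfunc_t v_func = (colfunc_t)(uintptr_t)sqlite3_column_int64(v_stmt, 1);
return v_func(v_rec, p_ctx, p_col);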


As for your second example, as written it does not suffer from the effect 
because you are already selecting c1, c2 and c3 at the bottom level.

Rewritten as

Select processrow(c1,c2,c3) from VT2(select * from VT1);

results in the VT1 xColumn function getting called (via the VT2 xColumn 
function) just 3 times per row.

Additionally, you may like to "select __func from VT1 limit 1" and store that 
in your xFilter implementation; and then "select __rec from VT1" in your xNext 
implementation to have sqlite3_result_int64() called half as often.

HTH

-Original Message-
From: Eleytherios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Tuesday, 04 March 2014 14:15
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

Could you explain some more your solution?

Does it work in this following case:

select * from VT2(select * from VT1);

by directly passing the rows from VT1 to VT2 (short-circuiting SQLite)?

What would happen in the following case?:

select * from VT2(select processrow(c1,c2,c3) from VT1);

Regards,

l.

On 03/03/14 14:17, Hick Gunter wrote:
> We have gotten around this problem by defining "virtual" fields that contain 
> a reference to the "current record" and the entrypoint of a wrapper around 
> the xColumn function. That way only two fields get passed upwards through the 
> virtual table stack and the top level virtual table's xColumn implementation 
> calls straight through to the bottom layer's wrapper.
>
> It does take some care to avoid sorting in between the layers and 
> re-preparation of statements on schema changes.
>
> -Original Message-
> From: Elefterios Stamatogiannakis [mailto:est...@gmail.com]
> Sent: Sunday, 02 March 2014 20:39
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Virtual table API performance
>
> We have both input and output virtual tables that avoid hitting the hard disk 
> and are also able to compress the incoming and outgoing data.
>
> We have a virtual table that takes as input a query and sends the data to a 
> port on another machine. This virtual table is called "OUTPUT". And another 
> virtual table that takes as input data from another port and forwards it into 
> SQLite. Let's call it "INPUT". A query that uses these two virtual tables 
> would look like this in madIS:
>
> OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');
>
> We actually use queries like above (actually we don't do it directly to ports 
> but to buffered named pipes that are then forwarded via netcat) to run 
> distributed queries on clusters, connecting all the local SQLite/madIS 
> instances on the different machines together.
>
> The main point that I want to make with the above explanation is that we don't 
> view SQLite only as a traditional database. We also view it as a data stream 
> processing machine, that doesn't have the requirement for the data to be 
> stored on a hard disk.
>
> Under this view, the efficiency of the virtual table api is very important. 
> Above query only uses 2 VTs in it, but we have other queries that use a lot 
> more VTs than that.
>
> estama
>
>
> On 2/3/2014 9:34 μμ, Max Vlasov wrote:
>> On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
>>  wrote:
>>>
>>> Our main test case is TPCH, a standard DB benchmark. The "lineitem"
>>> table of TPCH contains 16 columns, which for 10M rows would require
>>> 160M xColumn callbacks, to pass it through the virtual table API.
>>> These callbacks are very expensive, especially when at the other end
>>> sits a VM (CPython or PyPy) handling them.
>>>
>>
>> Ok, not stating that the performance improvement is impossible, I will
>> explain why I'm a little sceptical about it.

Re: [sqlite] Virtual table API performance

2014-03-04 Thread Eleytherios Stamatogiannakis

Could you explain some more your solution?

Does it work in this following case:

select * from VT2(select * from VT1);

by directly passing the rows from VT1 to VT2 (short-circuiting SQLite)?

What would happen in the following case?:

select * from VT2(select processrow(c1,c2,c3) from VT1);

Regards,

l.

On 03/03/14 14:17, Hick Gunter wrote:

We have gotten around this problem by defining "virtual" fields that contain a reference 
to the "current record" and the entrypoint of a wrapper around the xColumn function. That 
way only two fields get passed upwards through the virtual table stack and the top level virtual 
table's xColumn implementation calls straight through to the bottom layer's wrapper.

It does take some care to avoid sorting in between the layers and 
re-preparation of statements on schema changes.

-Original Message-
From: Elefterios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Sunday, 02 March 2014 20:39
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

We have both input and output virtual tables that avoid hitting the hard disk 
and are also able to compress the incoming and outgoing data.

We have a virtual table that takes as input a query and sends the data to a port on another 
machine. This virtual table is called "OUTPUT". And another virtual table that takes as 
input data from another port and forwards it into SQLite. Let's call it "INPUT". A query 
that uses these two virtual tables would look like this in madIS:

OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');

We actually use queries like above (actually we don't do it directly to ports 
but to buffered named pipes that are then forwarded via netcat) to run 
distributed queries on clusters, connecting all the local SQLite/madIS 
instances on the different machines together.

The main point that I want to make with the above explanation is that we don't view 
SQLite only as a traditional database. We also view it as a data stream 
processing machine, that doesn't have the requirement for the data to be stored 
on a hard disk.

Under this view, the efficiency of the virtual table api is very important. 
Above query only uses 2 VTs in it, but we have other queries that use a lot 
more VTs than that.

estama


On 2/3/2014 9:34 μμ, Max Vlasov wrote:

On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
 wrote:


Our main test case is TPCH, a standard DB benchmark. The "lineitem"
table of TPCH contains 16 columns, which for 10M rows would require
160M xColumn callbacks, to pass it through the virtual table API.
These callbacks are very expensive, especially when at the other end
sits a VM (CPython or PyPy) handling them.



Ok, not stating that the performance improvement is impossible, I will
explain why I'm a little sceptical about it.

For every bulk insert we have a theoretical maximum we'd all be glad to
see sqlite perform with - the speed of simple file copying.
Sqlite can't be faster than that, but to be on par is a good goal.
This is not possible when an insert means also modification of other
parts of the file, for example when there's an index involved. But
let's forget about it. Finally when new data is added, sqlite should
write a number of database pages, the cost of this part is absolutely
in the hands of the media (driver) and OS (driver).  But for every
database page write there's also price to pay in CPU units, for many
actions sqlite should do before actual value is translated from what
the developer provided to what actually appears on disk.

The illustration of the CPU price is the following example
   CREATE TABLE t(Value)

on my ssd drive multiple inserts (thousands)
insert into t (Value) values ('123456689...')  // this string
contains many symbols, for example 1024
performed with the speed
30 MB/Sec

but the query
insert into t (Value) values (10)  // this is a small integer
value only
3 MB/Sec

Both show almost full cpu load. Why such a difference? Because with
latter query the system can do more than 30 MB of writes in 1 second,
but it should wait for sqlite spending 10 seconds in preparations.
The former is better because CPU cost of passing a large text value to
sqlite is comparatively low compared to the time spent in I/O in
writing this on disk.

So CPU price to pay isn't avoidable and notice that in example this is
not virtual table API, this is bind API. I suppose that the price we
pay for CPU spent in virtual table API is on par with an average price
paid in sqlite as a whole. This means that if I transform the above
queries into inserts from virtual tables, the final speed difference
will be similar. And this also means that for your comparison tests
(when you get x3 difference), the CPU price sqlite pays inside bind
api and in its code wrapping xColumn call is probably similar. The
rest is the share your code pays.

Re: [sqlite] Virtual table API performance

2014-03-04 Thread Eleytherios Stamatogiannakis

On 03/03/14 03:01, Alek Paunov wrote:

On 02.03.2014 21:38, Elefterios Stamatogiannakis wrote:

Under this view, the efficiency of the virtual table api is very
important. Above query only uses 2 VTs in it, but we have other queries
that use a lot more VTs than that.


Max's tests in C show 2x CPU work, but he explains that the test is not
very sound, so let's say somewhere between 1x-2x. Your tests - 3x the time.

As you have already identified, the real reason probably is the million
scale callback quantity across the VM barrier - I do not follow PyPy,
but see these notes [1] by Mike Pall - the LuaJIT author (LuaJIT is the
leading project in the trace compilers field):

[1] http://luajit.org/ext_ffi_semantics.html#callback_performance

Also from one of the dozens of threads touching the subject:

[2] http://www.freelists.org/post/luajit/Yielding-across-C-boundaries,3

```
Entering the VM needs a lot of state setup and leaving it isn't
free either. Constantly entering and leaving the VM via a callback
from C *to* Lua has a high overhead. For short callbacks, the
switching overhead between C and Lua may completely dominate the
total CPU time.

Calling an iterator written in C via the FFI *from* a Lua program
is much cheaper -- this compiles down to a simple call instruction.
```


I remember that I had seen the above quote from Mike Pall, but I couldn't 
find it.


Thank you for unearthing it.


Unfortunately, for your "insert into t select * from vt" case the
callback/iterator transformation is not possible (we do not have a
repetitive _step call to invert the control somehow). What to do?

It seems that the easiest optimization for this (very common) VT use case
(bulk streaming) is for an SQLite add-on to be written in _C_, implementing
a vtable interface specialization containing an xNextPage method "buffering"
let's say 4K rows or, even better, 16KB of data (in addition to your initial
proposal of xNextRow).

The technical question is: how are the rows to be encoded? You said
initially that you use some compressed format. But for such an extension
to gain more traction in the future, it would probably be better to
choose a more standard format.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record
streaming could be discussed and adopted across the SQLite binding and
add-on developers. The possible applications are not limited only to
vtables ;-).


SQLite doesn't need any special format for the records to be passed over 
to it. It already has the "bind" API which would be very suitable for 
the xNextRow function too.


For a paging API (which IMHO is too complex ATM), the bind API could be 
extended with a row number parameter.


Regards,

estama



Kind regards,
Alek



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-03 Thread Hick Gunter
We have gotten around this problem by defining "virtual" fields that contain a 
reference to the "current record" and the entrypoint of a wrapper around the 
xColumn function. That way only two fields get passed upwards through the 
virtual table stack and the top level virtual table's xColumn implementation 
calls straight through to the bottom layer's wrapper.

It does take some care to avoid sorting in between the layers and 
re-preparation of statements on schema changes.

-Original Message-
From: Elefterios Stamatogiannakis [mailto:est...@gmail.com]
Sent: Sunday, 02 March 2014 20:39
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Virtual table API performance

We have both input and output virtual tables that avoid hitting the hard disk 
and are also able to compress the incoming and outgoing data.

We have a virtual table that takes as input a query and sends the data to a 
port on another machine. This virtual table is called "OUTPUT". And another 
virtual table that takes as input data from another port and forwards it into 
SQLite. Let's call it "INPUT". A query that uses these two virtual tables would 
look like this in madIS:

OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');

We actually use queries like above (actually we don't do it directly to ports 
but to buffered named pipes that are then forwarded via netcat) to run 
distributed queries on clusters, connecting all the local SQLite/madIS 
instances on the different machines together.

The main point that I want to make with the above explanation is that we don't view 
SQLite only as a traditional database. We also view it as a data stream 
processing machine, that doesn't have the requirement for the data to be stored 
on a hard disk.

Under this view, the efficiency of the virtual table api is very important. 
Above query only uses 2 VTs in it, but we have other queries that use a lot 
more VTs than that.

estama


On 2/3/2014 9:34 μμ, Max Vlasov wrote:
> On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
>  wrote:
>>
>> Our main test case is TPCH, a standard DB benchmark. The "lineitem"
>> table of TPCH contains 16 columns, which for 10M rows would require
>> 160M xColumn callbacks, to pass it through the virtual table API.
>> These callbacks are very expensive, especially when at the other end
>> sits a VM (CPython or PyPy) handling them.
>>
>
> Ok, not stating that the performance improvement is impossible, I will
> explain why I'm a little sceptical about it.
>
> For every bulk insert we have a theoretical maximum we'd all be glad to
> see sqlite perform with - the speed of simple file copying.
> Sqlite can't be faster than that, but to be on par is a good goal.
> This is not possible when an insert means also modification of other
> parts of the file, for example when there's an index involved. But
> let's forget about it. Finally when new data is added, sqlite should
> write a number of database pages, the cost of this part is absolutely
> in the hands of the media (driver) and OS (driver).  But for every
> database page write there's also price to pay in CPU units, for many
> actions sqlite should do before actual value is translated from what
> the developer provided to what actually appears on disk.
>
> The illustration of the CPU price is the following example
>   CREATE TABLE t(Value)
>
> on my ssd drive multiple inserts (thousands)
>    insert into t (Value) values ('123456689...')  // this string
> contains many symbols, for example 1024
> performed with the speed
>    30 MB/Sec
>
> but the query
>    insert into t (Value) values (10)  // this is a small integer
> value only
>    3 MB/Sec
>
> Both show almost full cpu load. Why such a difference? Because with
> latter query the system can do more than 30 MB of writes in 1 second,
> but it should wait for sqlite spending 10 seconds in preparations.
> The former is better because CPU cost of passing a large text value to
> sqlite is comparatively low compared to the time spent in I/O in
> writing this on disk.
>
> So CPU price to pay isn't avoidable and notice that in example this is
> not virtual table API, this is bind API. I suppose that the price we
> pay for CPU spent in virtual table API is on par with an average price
> paid in sqlite as a whole. This means that if I transform the above
> queries into inserts from virtual tables, the final speed difference
> will be similar. And this also means that for your comparison tests
> (when you get x3 difference), the CPU price sqlite pays inside bind
> api and in its code wrapping xColumn call is probably similar. The
> rest is the share your code pays.
>
> Well, I know that there are differences in CPU architectures and
> probably there are platforms where compiled code for bind api and
> virtual tables api behaves a little differently, making the costs more
> different.

Re: [sqlite] Virtual table API performance

2014-03-02 Thread Alek Paunov

On 02.03.2014 21:38, Elefterios Stamatogiannakis wrote:

Under this view, the efficiency of the virtual table api is very
important. Above query only uses 2 VTs in it, but we have other queries
that use a lot more VTs than that.


Max's tests in C show 2x CPU work, but he explains that the test is not 
very sound, so let's say somewhere between 1x-2x. Your tests - 3x the time.


As you have already identified, the real reason probably is the million 
scale callback quantity across the VM barrier - I do not follow PyPy, 
but see these notes [1] by Mike Pall - the LuaJIT author (LuaJIT is the 
leading project in the trace compilers field):


[1] http://luajit.org/ext_ffi_semantics.html#callback_performance

Also from one of the dozens of threads touching the subject:

[2] http://www.freelists.org/post/luajit/Yielding-across-C-boundaries,3

```
Entering the VM needs a lot of state setup and leaving it isn't
free either. Constantly entering and leaving the VM via a callback
from C *to* Lua has a high overhead. For short callbacks, the
switching overhead between C and Lua may completely dominate the
total CPU time.

Calling an iterator written in C via the FFI *from* a Lua program
is much cheaper -- this compiles down to a simple call instruction.
```

Unfortunately, for your "insert into t select * from vt" case the 
callback/iterator transformation is not possible (we do not have a 
repetitive _step call to invert the control somehow). What to do?


It seems that the easiest optimization for this (very common) VT use case 
(bulk streaming) is for an SQLite add-on to be written in _C_, implementing 
a vtable interface specialization containing an xNextPage method "buffering" 
let's say 4K rows or, even better, 16KB of data (in addition to your initial 
proposal of xNextRow).
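
(For illustration only, such a buffering entry point might look like
this; the names are hypothetical, nothing of the sort exists in SQLite:)

/* HYPOTHETICAL -- one call fills a caller-supplied page buffer with
** encoded rows; *pnRow reports how many rows were written, 0 meaning
** end of data. The row encoding is whatever convention gets adopted. */
int (*xNextPage)(sqlite3_vtab_cursor *pCur,
                 void *pBuf,   /* e.g. a 16KB page buffer    */
                 int nBuf,     /* buffer size in bytes       */
                 int *pnRow);  /* OUT: rows encoded in pBuf  */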


The technical question is: how are the rows to be encoded? You said 
initially that you use some compressed format. But for such an extension 
to gain more traction in the future, it would probably be better to 
choose a more standard format.


a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some wide used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record 
streaming could be discussed and adopted across the SQLite binding and 
add-on developers. The possible applications are not limited only to 
vtables ;-).


Kind regards,
Alek

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-02 Thread Elefterios Stamatogiannakis
We have both input and output virtual tables that avoid hitting the hard 
disk and are also able to compress the incoming and outgoing data.


We have a virtual table that takes as input a query and sends the data 
to a port on another machine. This virtual table is called "OUTPUT". And 
another virtual table that takes as input data from another port and 
forwards it into SQLite. Let's call it "INPUT". A query that uses these 
two virtual tables would look like this in madIS:


OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');

We actually use queries like above (actually we don't do it directly to 
ports but to buffered named pipes that are then forwarded via netcat) to 
run distributed queries on clusters, connecting all the local 
SQLite/madIS instances on the different machines together.


The main point that I want to make with the above explanation is that we 
don't view SQLite only as a traditional database. We also view it as a 
data stream processing machine, that doesn't have the requirement for 
the data to be stored on a hard disk.


Under this view, the efficiency of the virtual table api is very 
important. Above query only uses 2 VTs in it, but we have other queries 
that use a lot more VTs than that.


estama


On 2/3/2014 9:34 μμ, Max Vlasov wrote:

On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
 wrote:


Our main test case is TPCH, a standard DB benchmark. The "lineitem" table of
TPCH contains 16 columns, which for 10M rows would require 160M xColumn
callbacks, to pass it through the virtual table API. These callbacks are
very expensive, especially when at the other end sits a VM (CPython or PyPy)
handling them.



Ok, not stating that the performance improvement is impossible, I will
explain why I'm a little sceptical about it.

For every bulk insert we have a theoretical maximum we'd all be glad to
see sqlite perform with - the speed of simple file copying.
Sqlite can't be faster than that, but to be on par is a good goal.
This is not possible when an insert means also modification of other
parts of the file, for example when there's an index involved. But
let's forget about it. Finally when new data is added, sqlite should
write a number of database pages, the cost of this part is absolutely
in the hands of the media (driver) and OS (driver).  But for every
database page write there's also price to pay in CPU units, for many
actions sqlite should do before actual value is translated from what
the developer provided to what actually appears on disk.

The illustration of the CPU price is the following example
  CREATE TABLE t(Value)

on my ssd drive multiple inserts (thousands)
   insert into t (Value) values ('123456689...')  // this string
contains many symbols, for example 1024
performed with the speed
   30 MB/Sec

but the query
   insert into t (Value) values (10)  // this is a small integer value
only
   3 MB/Sec

Both show almost full cpu load. Why such a difference? Because with
latter query the system can do more than 30 MB of writes in 1 second,
but it should wait for sqlite spending 10 seconds in preparations.
The former is better because CPU cost of passing a large text value to
sqlite is comparatively low compared to the time spent in I/O in
writing this on disk.

So CPU price to pay isn't avoidable and notice that in example this is
not virtual table API, this is bind API. I suppose that the price we
pay for CPU spent in virtual table API is on par with an average price
paid in sqlite as a whole. This means that if I transform the above
queries into inserts from virtual tables, the final speed difference
will be similar. And this also means that for your comparison tests
(when you get x3 difference), the CPU price sqlite pays inside bind
api and in its code wrapping xColumn call is probably similar. The
rest is the share your code pays.

Well, I know that there are differences in CPU architectures and
probably there are platforms where compiled code for bind api and
virtual tables api behaves a little differently, making the costs more
different. But imagine the hard task of fine tuning and refactoring
just to get a noticeable difference for a particular platform.


Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-02 Thread Max Vlasov
On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis
 wrote:
>
> Our main test case is TPCH, a standard DB benchmark. The "lineitem" table of
> TPCH contains 16 columns, which for 10M rows would require 160M xColumn
> callbacks, to pass it through the virtual table API. These callbacks are
> very expensive, especially when at the other end sits a VM (CPython or PyPy)
> handling them.
>

Ok, not stating that the performance improvement is impossible, I will
explain why I'm a little sceptical about it.

For every bulk insert we have a theoretical maximum we'd all be glad to
see sqlite perform with - the speed of simple file copying.
Sqlite can't be faster than that, but to be on par is a good goal.
This is not possible when an insert means also modification of other
parts of the file, for example when there's an index involved. But
let's forget about it. Finally when new data is added, sqlite should
write a number of database pages, the cost of this part is absolutely
in the hands of the media (driver) and OS (driver).  But for every
database page write there's also price to pay in CPU units, for many
actions sqlite should do before actual value is translated from what
the developer provided to what actually appears on disk.

The illustration of the CPU price is the following example
 CREATE TABLE t(Value)

on my ssd drive multiple inserts (thousands)
  insert into t (Value) values ('123456689...')  // this string
contains many symbols, for example 1024
performed with the speed
  30 MB/Sec

but the query
  insert into t (Value) values (10)  // this is a small integer value
only
  3 MB/Sec

Both show almost full cpu load. Why such a difference? Because with
latter query the system can do more than 30 MB of writes in 1 second,
but it should wait for sqlite spending 10 seconds in preparations.
The former is better because CPU cost of passing a large text value to
sqlite is comparatively low compared to the time spent in I/O in
writing this on disk.

So CPU price to pay isn't avoidable and notice that in example this is
not virtual table API, this is bind API. I suppose that the price we
pay for CPU spent in virtual table API is on par with an average price
paid in sqlite as a whole. This means that if I transform the above
queries into inserts from virtual tables, the final speed difference
will be similar. And this also means that for your comparison tests
(when you get x3 difference), the CPU price sqlite pays inside bind
api and in its code wrapping xColumn call is probably similar. The
rest is the share your code pays.

Well, I know that there are differences in CPU architectures and
probably there are platforms where compiled code for bind api and
virtual tables api behaves a little differently, making the costs more
different. But imagine the hard task of fine tuning and refactoring
just to get a noticeable difference for a particular platform.


Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual table API performance

2014-03-02 Thread Elefterios Stamatogiannakis
In our performance tests we try to work with data and queries that are 
representative of what we would find in a typical DB.


This means a lot of "small" values (ints, floats, small strings), and 
5-20 columns.


Our main test case is TPCH, a standard DB benchmark. The "lineitem" 
table of TPCH contains 16 columns, which for 10M rows would require 160M 
xColumn callbacks, to pass it through the virtual table API. These 
callbacks are very expensive, especially when at the other end sits a VM 
(CPython or PyPy) handling them.


For PyPy in particular, which is able to produce JIT compiled and 
optimized UDF code, adapted to the "shape" (type and value distribution) 
of the data flows as they pass through it, every time it faces the 
virtual table API it slows down to a crawl, becoming more than 2x slower 
than interpreted Python. This happens because it cannot see the data/row 
"shape" from the many small, single-value xColumn callbacks, which are 
unrelated to each other.


Changing the subject, I've seen some requests in previous emails from 
people asking for windowing functions to be added to SQLite. I want to 
propose an alternative that we have been using for years, and that is a 
lot more generic than adding specific functions for very "narrow" use 
cases in SQLite.


We have added the "EXPAND" VT function in madIS, which "emulates" nested 
tables in SQLite, making it possible to have row and aggregate functions 
that return (in a streaming fashion) multiple values on multiple columns. 
The "EXPAND" function takes as input a table containing as values (in our 
case Python) generators, and then it calls the generators, "expanding" 
the input table to its final form. "EXPAND" is automatically inserted 
wherever it is required, so it isn't visible. An example follows:


> select strsplit('one and other');
one|and|other <-- 3 columns

or

> select strsplitV('one and other');
one
and      <-- 3 individual rows
other

So by adding a single VT function and some syntactic sugar (auto-
inserting the EXPAND VT), we were able to have functionality that is not 
case specific, allowing us to run all kinds of analytics inside SQLite.


The performance of the above functionality is already very good. But it 
could be a lot better with a more efficient VT API.


Regards,

estama

On 2/3/2014 9:15 πμ, Max Vlasov wrote:

Hi,
thanks for explaining your syntax in another post. Now about virtual
tables if you don't mind.

On Fri, Feb 28, 2014 at 8:24 PM, Eleytherios Stamatogiannakis
 wrote:


If we load into SQLite, 

create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');

it takes: 55 sec


If we create an external program 

it takes: 19 sec (~3x faster than using the virtual table API)




Looking at your numbers, as a user (and fan :) of virtual tables I
decided to do some tests.

I have a virtual table "all values"; it was designed to enumerate
all tables' values into one single virtual table, so finally it is
a long list of

   TableName, TableRowId, FieldName, Value

so you get the idea. As an example of what it may do, you may open
places.sqlite of mozilla browser and do

   Select * from AllValues where Value Like "%sqlite.org%"

and see actual results even not knowing how they planned their schema.

Internally this virtual table simply uses general selects for all
other tables met in sqlite_master. This is a good (but probably not
the best) test for measuring virtual tables performance, because

   SELECT * FROM AllValues

is equivalent to reading all conventional tables of this database.
Besides
- the tool I use has a tweaker implemented with VFS that allows
measuring speed and other characteristics of the query performed while
the query is in effect.
- I have an option that forces resetting windows cache for the
database file when it is reopened. So with it we exclude the windows
cache from consideration so pure I/O reading is used. Btw, when you do
your comparison, it's very important to reset system cache before
every measurement that involves I/O.


So I took a comparatively large (500 Mb) database consisting of
several small and one big table (Posts) and compared two queries.

(Query1)

   Select sum(length(Body) + length(Title)) from Posts

This one effectively reads the table data and uses
- length() to force sqlite reading texts that don't fit into single db page
- sum() to exclude accumulating results on my side from comparison, so
we have a single row, single column result from the work completely
done by sqlite.

(Query2)

   Select Sum(Length(Value)) from AllValues

This one performs basically the same but using sqlite virtual tables
api. It also touches other tables, but since they're small, we can
forget about this.

Query1 (General):
   Read: 540MB,
   Time: 24.2 sec,
   CPU Time: 6 Sec (25%)
   Speed: 22.31 MB/Sec

Query2 (Virtual):
   Read: 540MB,
   Time: 27.3 Sec,
   CPU Time: 13 sec (51%)
   Speed: 20 MB/Sec

In my particular test the noticeable difference is in the part of the
CPU time spent with the virtual table.

Re: [sqlite] Virtual table API performance

2014-03-01 Thread Max Vlasov
Hi,
thanks for explaining your syntax in another post. Now about virtual
tables if you don't mind.

On Fri, Feb 28, 2014 at 8:24 PM, Eleytherios Stamatogiannakis
 wrote:
>
> If we load into SQLite, 
>
> create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');
>
> it takes: 55 sec
>
>
> If we create an external program 
>
> it takes: 19 sec (~3x faster than using the virtual table API)
>
>

Looking at your numbers, as a user (and fan :) of virtual tables I
decided to do some tests.

I have a virtual table "all values"; it was designed to enumerate
all tables' values into one single virtual table, so finally it is
a long list of

  TableName, TableRowId, FieldName, Value

so you get the idea. As an example of what it may do, you may open
places.sqlite of mozilla browser and do

  Select * from AllValues where Value Like "%sqlite.org%"

and see actual results even not knowing how they planned their schema.

Internally this virtual table simply uses general selects for all
other tables found in sqlite_master. This is a good (but probably not
the best) test for measuring virtual table performance, because

  SELECT * FROM AllValues

is equivalent to reading all conventional tables of this database.
Besides
- the tool I use has a tweaker implemented with VFS that allows
measuring speed and other characteristics of the query performed while
the query is in effect.
- I have an option that forces resetting the windows cache for the
database file when it is reopened. With it we exclude the windows
cache from consideration, so pure I/O reading is measured. Btw, when
you do your comparison, it's very important to reset the system cache
before every measurement that involves I/O.


So I took a comparatively large (500 MB) database consisting of
several small tables and one big table (Posts) and compared two queries.

(Query1)

  Select sum(length(Body) + length(Title)) from Posts

This one effectively reads the table data and uses
- length() to force sqlite to read texts that don't fit into a single db page
- sum() to exclude accumulating results on my side from the comparison, so
we have a single-row, single-column result from the work completely
done by sqlite.

(Query2)

  Select Sum(Length(Value)) from AllValues

This one performs basically the same work, but through the sqlite
virtual table api. It also touches other tables, but since they're
small, we can forget about them.

Query1 (General):
  Read: 540MB,
  Time: 24.2 sec,
  CPU Time: 6 sec (25%)
  Speed: 22.31 MB/sec

Query2 (Virtual):
  Read: 540MB,
  Time: 27.3 sec,
  CPU Time: 13 sec (51%)
  Speed: 20 MB/sec

In my particular test the noticeable difference is the extra CPU time
spent with the virtual table. I assume this can be related to my own
implementation of this virtual table, since I have to retrieve values,
store them temporarily somewhere and talk to sqlite. But this may also
shed light on your performance drop. If your virtual implementation
spends much time processing a value, you may end up with a big drop.

You may say that this test is not fair because it does not involve
creating a table from the values of a virtual table. Unfortunately I
can't create a good enough test comparing the Posts and AllValues
tables as sources, because the geometry of the two tables is different
(Posts has more columns and fewer rows, AllValues fewer columns and
more rows). The closest approximation was possible when I created an
intermediate physical table containing the results from AllValues and
compared table creation from this table and from the virtual table.
The virtual one took longer, but the values - 56 seconds vs 43
seconds - are not different enough to conclude anything.

I'm not sure my tests defend sqlite virtual tables sufficiently, but
currently I don't have evidence of significant inefficiency either.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual Table "Functions"

2014-03-01 Thread Elefterios Stamatogiannakis

Please excuse me for not explaining.

The syntax that you puzzle about is supported by madIS [*], which 
translates it into SQLite. Having been using madIS for many years (we 
created it in 2008), it comes naturally to me, and I forget that SQLite 
doesn't support it.


What madIS essentially does is create and destroy the virtual tables 
in a madIS/SQLite query. It also supports an "inverted" syntax that 
permits virtual table pipelines (like pipes in unix):


select * from (XMLPARSE select * from FILE("data.xml"));

The above query is the same as writing:

select * from XMLPARSE(' select * from FILE("data.xml") ')

but without the thorny quote escaping problems.

The "select * from " part is optional (it is autocompleted) so above 
query could be rewritten to:


XMLPARSE FILE "data.xml";

Both XMLPARSE and FILE are regular SQLite virtual table functions coded 
in Python (the whole of madIS is in Python). Also, thanks to the design 
quality of SQLite's virtual table API, both functions are streaming (XML 
parsing works on continuous multi-GB streams of XML input coming from 
the internal query that uses FILE).


You may think that it is crazy to do this pipelining through a 
relational engine (SQLite), but the whole thing is very fast and very 
versatile. We have been processing hundreds of GB of data (in various 
forms) using such "enhanced" SQLite queries for many years.


Having said all of the above, I hope that you can see why we care so 
much about SQLite's virtual table API efficiency.


estama.

[*] https://code.google.com/p/madis/

On 1/3/2014 10:35 AM, Max Vlasov wrote:

On Fri, Feb 28, 2014 at 10:14 PM, Dominique Devienne
 wrote:

Can someone tell me how the statement below works?


Thanks for any help on this. This is really puzzling to me. --DD



Very puzzling for me too.
For any statement like this

   select * from blablabla(123)

sqlite (3.8.3.1) primarily reports a syntax error...

near "(": syntax error

... before complaining about anything else.

I'd also be glad to see a shortcut where a virtual table is created
just for a select statement and dies automatically, but I suspect it
is not supported and that they used their own modified version of
sqlite. I hope the OP reads this post as well and has an answer for
us; more information about his issues would probably help.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual Table "Functions"

2014-03-01 Thread Max Vlasov
On Fri, Feb 28, 2014 at 10:14 PM, Dominique Devienne
 wrote:
> Can someone tell me how the statement below works?
> 
>
> Thanks for any help on this. This is really puzzling to me. --DD


Very puzzling for me too.
For any statement like this

  select * from blablabla(123)

sqlite (3.8.3.1) primarily reports a syntax error...

   near "(": syntax error

... before complaining about anything else.

I'd also be glad to see a shortcut where a virtual table is created
just for a select statement and dies automatically, but I suspect it
is not supported and that they used their own modified version of
sqlite. I hope the OP reads this post as well and has an answer for
us; more information about his issues would probably help.

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual Table "Functions"

2014-02-28 Thread Dominique Devienne
Can someone tell me how the statement below works?

> From Eleytherios Stamatogiannakis :
> create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');

I'm using virtual tables extensively in my application, to expose
runtime C++ objects, and I'm declaring them as shown in
http://www.sqlite.org/vtab.html 1.1 Usage, i.e. using

create virtual table vt using module(args...)

How does one create such Virtual Table *Functions*? It looks like it
creates a temporary table, but I don't see how to achieve the above
using either the registered custom SQL functions API or the VTable API.

Can those functions be used in joins?

And if so, can the arguments fed to the VTable "Function" be columns
from the preceding "tables" participating in the join?

Thanks for any help on this. This is really puzzling to me. --DD
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual table API performance

2014-02-28 Thread Eleytherios Stamatogiannakis

Hello,

First of all, I would like to say that, IMHO, SQLite's overall design 
and code quality is top notch. Having said that, SQLite's virtual table 
API, while beautifully designed, is not very efficient.


We have been hitting our heads on the virtual table API for the last 
few years, trying to make it work efficiently. During that time, we 
thought that the problem was on our side, in the code that feeds the 
virtual table API. It turns out that the virtual table API itself isn't 
very efficient.


The main cause of inefficiency is that it is extremely "chatty". For an 
external stream that contains many columns, "xColumn" can be called back 
hundreds of millions of times for the stream to be consumed by SQLite. 
These callbacks have a very big cost. Let me describe a test that we did.


For our work, we use compressed streams that are fed into SQLite 
through the virtual table API.


If we load into SQLite, the external compressed stream (containing 3M 
rows) through the virtual table API:


create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');

it takes: 55 sec


If we create an external program that inserts the rows of the 
compressed stream into SQLite one by one, using "insert into newtable 
values " and the SQLite bind API:


it takes: 19 sec (~3x faster than using the virtual table API)
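
To make the "chattiness" concrete: a full table scan through the VT API 
drives roughly the following calling pattern (a simplified sketch of the 
callback sequence, not actual SQLite source code):

    rc = pModule->xFilter(pCur, idxNum, idxStr, nArg, apArg);
    while( rc==SQLITE_OK && !pModule->xEof(pCur) ){
      for(i=0; i<nCol; i++){
        rc = pModule->xColumn(pCur, pCtx, i);  /* one callback per cell */
      }
      rc = pModule->xNext(pCur);               /* one callback per row */
    }

With 3M rows and a stream of many columns, the xColumn line alone runs 
into the hundreds of millions of calls, which is where the difference 
between the two timings above comes from.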


Another problem with the virtual table API is that it wreaks havoc with 
VM JIT engines. Especially for tracing JIT engines, the many "small" 
per-column callbacks do not permit them to specialize at all, compared to 
the specialization that a tracing JIT could achieve with one "big" 
get_an_entire_row callback.


A suggestion for improving the efficiency of the virtual table API 
naturally arises when we look at all the virtual table functions that we 
have already created. We have ~15 VT functions dealing with importing 
"all" from external sources (files, http streams, xml, clipboard, etc), 
and only one filtering VT function (a multidimensional index) "picking" 
columns to return.


So most of our queries that use VTs look like this:

create table cliptab as select * from clipboard();

, these queries most of the time select all columns from an external stream.

Based on the above, an addition that would improve the efficiency of the 
VT API is an "xRow" function that SQLite could call to get an entire 
row back (like the bind API).


Even better, and to reduce the callback count even further, would be an 
"xNextRow" function that returns the contents of the next row or EOF.


Regards,

estama.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual Table xRowid

2014-02-05 Thread Kevin Martin

On 5 Feb 2014, at 18:40, Dan Kennedy  wrote:

> But is it the case that your virtual table doesn't handle
> anything other than full scans of the entire table contents? If
> so, it's probably not a problem if rowids are inconsistent.

Thanks for the heads up on the 'or' case; I didn't know that. I don't use 
xBestIndex, so the table only supports full scans.

I think I'll try returning SQLITE_ERROR in xRowid, because I'd rather see 
the error than have strange results happen silently, and I'll watch what 
happens. One of the columns in the underlying table/view is unique, so one 
solution is to order by that column when doing the internal select; then 
the auto-generated rowids will be consistent. But that just feels like it 
adds a lot of calculation I don't want to do, and it won't be necessary 
most of the time.

Thanks,
Kevin
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual Table xRowid

2014-02-05 Thread Dan Kennedy

On 02/06/2014 01:22 AM, Kevin Martin wrote:

Hi,

My questions are basically:

1) What causes sqlite3 to call xRowid
2) If I don't want to update my virtual table, or do a select ROWID, can I just 
use an incremental counter, increased on every call to xNext (bearing in mind 
the order of my data is not guaranteed, so this won't necessarily return the 
same rowid for the same row on separate calls to xFilter), or even better just 
return SQLITE_ERROR?


One more case. If you have a query like this:

  SELECT ... FROM vtable WHERE vtable.a = ? OR vtable.b = ?;

and the implementation of vtable can handle queries for both
(vtable.a=?) and (vtable.b=?) efficiently, then SQLite might
make two separate sets of calls to xFilter/xNext - one to obtain
the set of rows for (a=?) and a second to obtain the set that
match (b=?). It then merges the two sets together using the
rowids as a unique key. If the rowids in the two sets are not
consistent, then you might end up returning duplicate rows to
the user or omitting rows from the result.

But is it the case that your virtual table doesn't handle
anything other than full scans of the entire table contents? If
so, it's probably not a problem if rowids are inconsistent.

Dan.







The problem itself is detailed below.

Thanks,
Kevin

---

I am creating a virtual table that unpacks the contents of another table/view.

As a simple example, consider the following:

create table t(week, performance_csv)

Where each row contains a csv file for the performance of the given week, and 
each csv file has two columns, person and percentage.

I want to create a virtual table as follows:

create virtual table t2 using unpack_performance(t)

The module will declare the table as follows:

create table t2(week, person, percentage)

and it will iterate over each row of each csv in the table t.

Implementing xRowid is easy because I can base it off the ROWID of t and the 
number of lines in the csv file.

However, the problem comes when t is a view; I've only just discovered that 
views don't have ROWIDs!

I can quite happily just increment a simple counter, but the view isn't 
necessarily ordered, so the rowid won't match on multiple filters.

In my actual application the view is the union of a few selects, so actually 
including the rowid in the view is going to be a complete pain, and is not an 
option I want to consider.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Virtual Table xRowid

2014-02-05 Thread Kevin Martin
Hi,

My questions are basically:

1) What causes sqlite3 to call xRowid
2) If I don't want to update my virtual table, or do a select ROWID, can I just 
use an incremental counter, increased on every call to xNext (bearing in mind 
the order of my data is not guaranteed, so this won't necessarily return the 
same rowid for the same row on separate calls to xFilter), or even better just 
return SQLITE_ERROR?

The problem itself is detailed below.

Thanks,
Kevin

---

I am creating a virtual table that unpacks the contents of another table/view.

As a simple example, consider the following:

create table t(week, performance_csv)

Where each row contains a csv file for the performance of the given week, and 
each csv file has two columns, person and percentage.

I want to create a virtual table as follows:

create virtual table t2 using unpack_performance(t)

The module will declare the table as follows:

create table t2(week, person, percentage)

and it will iterate over each row of each csv in the table t.

Implementing xRowid is easy because I can base it off the ROWID of t and the 
number of lines in the csv file.
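
For example (a minimal sketch; "unpack_cursor" and its members are 
hypothetical names, and it assumes the line number fits in 32 bits):

    /* Pack the source row's ROWID and the csv line number into one
    ** 64-bit rowid, so a given csv line always maps to the same rowid. */
    static int unpackRowid(sqlite3_vtab_cursor *cur, sqlite3_int64 *pRowid){
      unpack_cursor *p = (unpack_cursor*)cur;
      *pRowid = (p->tRowid << 32) | (p->iLine & 0xffffffff);
      return SQLITE_OK;
    }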

However, the problem comes when t is a view; I've only just discovered that 
views don't have ROWIDs!

I can quite happily just increment a simple counter, but the view isn't 
necessarily ordered, so the rowid won't match on multiple filters.

In my actual application the view is the union of a few selects, so actually 
including the rowid in the view is going to be a complete pain, and is not an 
option I want to consider.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Virtual Table: misuse error on sqlite3_declare_vtab()

2013-10-04 Thread Simon
GOT IT.

Well, I found the mistake that led to all this.  I'm sorry to have wasted
your time.  Here's what happened for those who struggle with similar
issues...

I originally opted to make sqlite3_module a member of my C++ "module"
class.  My C++ constructor was therefore assigning all the xFunctions to
that module's members.  Later, without realizing the impact, my
implementation changed so that my class derived from sqlite3_module.
However, I failed to switch the xFunc assignments to this->xFunc = foo;
In the end, it was exactly as if I had never assigned anything to them.
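
For anyone hitting the same assertion: a pattern that avoids it is a
static, fully initialized sqlite3_module, so that unused members are
visibly zero rather than garbage (a sketch; the my* functions stand in
for your own implementations):

    static sqlite3_module myModule = {
      0,             /* iVersion */
      myCreate,      /* xCreate */
      myConnect,     /* xConnect */
      myBestIndex,   /* xBestIndex */
      myDisconnect,  /* xDisconnect */
      myDestroy,     /* xDestroy */
      myOpen,        /* xOpen */
      myClose,       /* xClose */
      myFilter,      /* xFilter */
      myNext,        /* xNext */
      myEof,         /* xEof */
      myColumn,      /* xColumn */
      myRowid,       /* xRowid */
      0,             /* xUpdate */
      0,             /* xBegin */
      0,             /* xSync */
      0,             /* xCommit */
      0,             /* xRollback */
      0,             /* xFindFunction */
      0              /* xRename */
    };

    /* registered with: */
    sqlite3_create_module(db, "vtable_module1", &myModule, 0);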

Thank you all very much!  :)

Simon


On Fri, Oct 4, 2013 at 3:14 PM, Simon  wrote:

> The option was already on for my own stuff but not sqlite.  However, I
> just had to recompile libSqlite3 with the debug option to get more details and
> it's a bit scary...
>
> SQL query using sqlite3_exec() and stmt: create virtual table foo using
> vtable_module1;
> -Calling sqlite3_prepare_v2()...
> -Calling sqlite3_step()...
> staging.bin: sqlite3.c:103774: vtabCallConstructor: Assertion `xConstruct'
> failed.
>
> What's that xConstruct?!  I can't even grep that in the sqlite source
> files!  :(
>
> After having browsed around, I think it might actually be a macro
> (somehow) which aliases either xCreate or xConnect depending on the
> situation, but this is just a guess...  and still, it doesn't really help.
>  I reviewed my xCreate and xConnect and they look fine at first glance...
>
> Thanks,
>   Simon
>
>
> On Fri, Oct 4, 2013 at 2:26 PM, Richard Hipp  wrote:
>
>>
>>
>>
>> On Fri, Oct 4, 2013 at 2:22 PM, Simon  wrote:
>>
>>> Hi Richard,
>>>
>>> That helped a lot and got me past that point.  However, I didn't go
>>> very far from where I were...
>>>
>>> It now compiles fine, however the program segfaults within the
>>> sqlite3_step() of the "create virtual table foo using vtable_module1;"
>>> statement.
>>>
>>> Using valgrind, I get this interesting message:
>>>
>>> ==31748== Jump to the invalid address stated on the next line
>>> ==31748== at 0x0: ???
>>> ==31748== by 0x5A0D119: vtabCallConstructor (in
>>> /usr/lib64/libsqlite3.so.0.8.6)
>>> ==31748== by 0x5A3D499: sqlite3VdbeExec (in
>>> /usr/lib64/libsqlite3.so.0.8.6)
>>> ==31748== by 0x5A3E969: sqlite3_step (in /usr/lib64/libsqlite3.so.0.8.6)
>>> [...]
>>>
>>> This looks (to me) like one of the NULL function pointers in struct
>>> sqlite3_module is being called.  I added a valid xRename() function as I
>>> had initially thought it was optionnal, but still not working.  I then
>>> tried creating a dummy function for every one of these pointers, but I get
>>> the same result.  I even tried changing the "create virtual..." statement
>>> to include module params (which my module ignores) but again it didn't
>>> change anything.
>>>
>>> Where should I look at next?
>>>
>>>
>> Recompile with line-number information (-g is it?) so that you know
>> exactly which line of code tried to jump to the NULL pointer.  Then you'll
>> know exactly which method you need to add.
>>
>>
>> --
>> D. Richard Hipp
>> d...@sqlite.org
>>
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

