[sqlite] Ordered, union all of ordered views
Hello,

We have two virtual tables whose results are ordered, and we do a "union all" of them, asking SQLite to keep the result ordered. The query looks like this:

select * from (select x from ordered_VT1 UNION ALL select x from ordered_VT2) order by x;

Both ordered_VT1 and ordered_VT2 report back (via xBestIndex) that their results are ordered on x. Note that when SQLite negotiates with ordered_VT1 and ordered_VT2 (via xBestIndex), it doesn't even ask them whether an ordering on x already exists.

Right now SQLite does a full scan of ordered_VT1, and then of ordered_VT2, before starting to produce results. Shouldn't it do a merge union all of the two? Is there some way to help SQLite's planner "see" that such a possibility exists?

Kind regards,

l.

___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
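A minimal sketch of the query shape in question, using Python's built-in sqlite3 module with ordinary tables standing in for the two ordered virtual tables (the table names and data are illustrative only). The point of the thread is that SQLite produces the correct ordered result, but does so by sorting the combined output rather than merging the two already-ordered scans:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ordered_VT1(x INTEGER);  -- stands in for the first ordered VT
    CREATE TABLE ordered_VT2(x INTEGER);  -- stands in for the second ordered VT
""")
con.executemany("INSERT INTO ordered_VT1 VALUES (?)", [(1,), (3,), (5,)])
con.executemany("INSERT INTO ordered_VT2 VALUES (?)", [(2,), (4,), (6,)])

rows = con.execute(
    "SELECT * FROM (SELECT x FROM ordered_VT1"
    " UNION ALL SELECT x FROM ordered_VT2) ORDER BY x"
).fetchall()
print(rows)  # → [(1,), (2,), (3,), (4,), (5,), (6,)]
```

Running EXPLAIN QUERY PLAN on the same statement shows the temporary sort step that the merge strategy would avoid.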
Re: [sqlite] Merge join in SQLite
Thank you for the answer, as well as for the merge-join complexity remark. What about a union-all with ordered inputs and an order-by on the whole query? Does SQLite use a "merge" algorithm for that case?

Kind regards,

l.

On 26/06/14 21:45, Richard Hipp wrote:

On Thu, Jun 26, 2014 at 11:41 AM, Eleytherios Stamatogiannakis <est...@gmail.com> wrote:

Hi, I've seen that SQLite supports the "SorterOpen" op to do merge sorting. So, based on that, does SQLite support merge joins when both inputs of the join are sorted?

No, not at this time. Note that a merge-join is more complicated than it appears at first glance for the common case where the join key is not unique in one or the other of the two tables being joined. -- D. Richard Hipp d...@sqlite.org
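Richard's remark about non-unique keys can be made concrete. A textbook sort-merge join must detect the full run of equal keys on each side and emit their cross product, which is exactly what makes it trickier than the unique-key case. A hedged Python sketch (the function and variable names are mine, not SQLite internals):

```python
def merge_join(left, right, key=lambda r: r[0]):
    """Sort-merge join of two lists already sorted on `key`.
    Handles duplicate keys by pairing every run of equal keys on the
    left with the matching run on the right."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Collect the full run of equal keys on each side...
            i2 = i
            while i2 < len(left) and key(left[i2]) == kl:
                i2 += 1
            j2 = j
            while j2 < len(right) and key(right[j2]) == kl:
                j2 += 1
            # ...and emit their cross product.
            for l in left[i:i2]:
                for r in right[j:j2]:
                    out.append(l + r)
            i, j = i2, j2
    return out

A = [(1, 'a'), (2, 'b'), (2, 'c')]
B = [(2, 'x'), (2, 'y'), (3, 'z')]
print(merge_join(A, B))
# → [(2, 'b', 2, 'x'), (2, 'b', 2, 'y'), (2, 'c', 2, 'x'), (2, 'c', 2, 'y')]
```

Note that with duplicates on both sides the equal-key runs must be replayed, so a pure streaming implementation needs to buffer at least one side's run.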
[sqlite] Merge join in SQLite
Hi, I've seen that SQLite supports the "SorterOpen" Op, to do merge sorting. So based on that, does SQLite support merge joins when both inputs of the join are sorted? Kind regards, l.
Re: [sqlite] RPAD/LPAD
IMHO, SQLite is targeted towards being a relational "core" with very wide extensibility. Adding specific scripting engines to it would be detrimental to its main purpose (being a very good relational "core").

In our group, we use SQLite + UDFs written in Python. Creating new functions is very simple (most of the functions are a couple of Python lines). The API of SQLite permits very advanced functionality to be tied to it (infinite streaming virtual tables, indexes written in Python, etc.). And (in our benchmarks) the speed is better than Postgres and a LOT better than MySQL. Finally, the simplicity and extensibility of SQLite has permitted us to create hundreds of functions in a very short time [*].

l.

[*] http://doc.madis.googlecode.com/hg/index.html

On 07/03/14 16:59, Dominique Devienne wrote:

On Fri, Mar 7, 2014 at 3:39 PM, Clemens Ladisch wrote:

Max Vlasov wrote:

On Fri, Mar 7, 2014 at 12:49 PM, Dominique Devienne wrote:

I think what SQLite lacks is a syntax to define custom functions like it does for virtual tables. Something like: create function rpad(x, y) using scripty_module as "return PRINTF('%-*s',y,x)";

Nice suggestion. This probably falls into the case where a small new part is needed on the sqlite side.

Actually, no change to SQLite itself would be needed. It's possible to create an extension that provides a function that allows registering another function that executes a custom SQL expression: SELECT register_simple_function('rpad', 2, 'SELECT printf(''%-*s'', ?, ?)');

I've already mentioned that, Clemens. But I also pointed out that you then don't have a standard way to register those dynamically generated functions. FWIW. --DD
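For illustration, registering such a UDF from Python takes one call with the standard sqlite3 module (this is the stdlib API; the madIS registration layer mentioned above is separate):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Register a two-argument rpad() UDF; SQLite calls back into Python
# each time the function appears in a query.
con.create_function("rpad", 2, lambda x, n: str(x).ljust(n))

padded = con.execute("SELECT rpad('ab', 5)").fetchone()[0]
print(repr(padded))  # → 'ab   '
```

Functions registered this way exist only for the lifetime of the connection, which is exactly the "no standard way to persist dynamically generated functions" gap discussed in the thread.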
Re: [sqlite] Virtual table API performance
On 04/03/14 20:11, Alek Paunov wrote:

On 04.03.2014 15:05, Eleytherios Stamatogiannakis wrote:

On 03/03/14 03:01, Alek Paunov wrote:

It seems that the easiest optimization for this (very often) VT use case (bulk streaming) is an SQLite add-on in _C_ to be written, implementing a vtable interface specialization containing an xNextPage "buffering", let's say, 4K rows or, even better, 16KB of data (in addition to your initial proposal of xNextRow). The technical question is: how are the rows to be encoded? You said initially that you use some compressed format. But for such an extension to gain more traction in the future, it would probably be better for a more standard format to be chosen. a) Rows represented in native SQLite3 format [3] b) ... native SQLite4 format c) Some widely used encoding near to SQLite types [4] d) ... [3] http://www.sqlite.org/fileformat.html#record_format [4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats IMHO, it would be very nice if some common conventions for binary record streaming could be discussed and adopted across the SQLite binding and add-on developers. The possible applications are not limited only to vtables ;-).

SQLite doesn't need any special format for the records to be passed over to it. It already has the "bind" API, which would be very suitable for the xNextRow function too.

It seems that I do not know the sqlite3_ API very well. http://www.sqlite.org/c3ref/funclist.html Would you like to point me to the "bind" API page?

In the link that you posted above, look for all the sqlite3_bind_x functions. In SQLite the bind API is used to pass parameters to prepared statements. The way the bind API works is that you have a statement parameter "row", and you fill it by saying:

Set column 1 of statement parameter "row" to an int with value 10
Set column 2 of statement parameter "row" to a float with value 3.5
...

So instead of SQLite calling back for each column, in the bind API "way" the program calls SQLite to fill a row's values.
Regards,

l.

For a paging API (which IMHO is too complex ATM), the bind API could be extended with a row number parameter.
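The bind-API style described above, where the program pushes a whole row of values into SQLite in one call instead of SQLite calling back once per column, looks like this from Python (each parameter tuple is one complete bound row):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INTEGER, b REAL, c TEXT)")

# One bind per row: the caller hands SQLite all column values at once,
# rather than SQLite issuing one callback per column.
rows = [(10, 3.5, "x"), (20, 7.25, "y")]
con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

count = con.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)  # → 2
```

Under the hood each tuple element becomes one sqlite3_bind_* call on the prepared statement, which is the inversion of control the proposed xNextRow would give virtual tables.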
Re: [sqlite] Virtual table API performance
Thank you for the explanation. Answers inline.

On 04/03/14 16:16, Hick Gunter wrote:

Your VT1 table already has an xColumn implementation, possibly doing something like:

switch( p_column )
{
    case 1: sqlite3_result_xxx( p_ctx, v_rec->f1, ...); break;
    ...
    case n: sqlite3_result_xxx( p_ctx, v_rec->fn, ...); break;
}

This needs to have two cases added:

case n+1: sqlite3_result_int64( p_ctx, (uintptr_t)v_rec );
case n+2: sqlite3_result_int64( p_ctx, (uintptr_t)func );

where static int func( p_rec, p_ctx, p_column ); calls xColumn( v_cursor, p_ctx, p_column ); with a dummy cursor structure as defined for your table.

The VT2 table can then prepare "select __rec,__func from VT1", and in its xColumn implementation it calls:

v_rec = (void *)sqlite3_column_int64( v_stmt, 0 ); // this can be stored and cleared in the xNext function
v_func = (func *)sqlite3_column_int64( v_stmt, 1 ); // this can be stored
v_func( v_rec, p_ctx, p_column );

I see, so you do a similar trick to what we do with passing Python's generators as values in SQLite.

As for your second example, as written it does not suffer from the effect because you are already selecting c1, c2 and c3 at the bottom level. Rewritten as:

Select processrow(c1,c2,c3) from VT2(select * from VT1);

Without knowing what VT2 will do, I don't think that this rewriting can happen. For example, "processrow" might return generators (nested tables) that get expanded by VT2. If you moved it outside VT2, then the generators would not be expanded.

Regards,

l.

This results in the VT1 xColumn function getting called (via the VT2 xColumn function) just 3 times per row. Additionally, you may like to "select __func from VT1 limit 1" and store that in your xFilter implementation; and then "select __rec from VT1" in your xNext implementation to have sqlite3_result_int64() called half as often. HTH

-Ursprüngliche Nachricht- Von: Eleytherios Stamatogiannakis [mailto:est...@gmail.com] Gesendet: Dienstag, 04.
März 2014 14:15 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] Virtual table API performance

Could you explain your solution some more? Does it work in the following case: select * from VT2(select * from VT1); by directly passing the rows from VT1 to VT2 (short-circuiting SQLite)? What would happen in the following case: select * from VT2(select processrow(c1,c2,c3) from VT1);

Regards,

l.

On 03/03/14 14:17, Hick Gunter wrote:

We have gotten around this problem by defining "virtual" fields that contain a reference to the "current record" and the entry point of a wrapper around the xColumn function. That way only two fields get passed upwards through the virtual table stack, and the top level virtual table's xColumn implementation calls straight through to the bottom layer's wrapper. It does take some care to avoid sorting in between the layers and re-preparation of statements on schema changes.

-Ursprüngliche Nachricht- Von: Elefterios Stamatogiannakis [mailto:est...@gmail.com] Gesendet: Sonntag, 02. März 2014 20:39 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] Virtual table API performance

We have both input and output virtual tables that avoid hitting the hard disk and are also able to compress the incoming and outgoing data. We have a virtual table that takes as input a query and sends the data to a port on another machine. This virtual table is called "OUTPUT". And another virtual table that takes as input data from another port and forwards it into SQLite. Let's call it "INPUT". A query that uses these two virtual tables would look like this in madIS:

OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');

We actually use queries like the above (actually we don't do it directly to ports but to buffered named pipes that are then forwarded via netcat) to run distributed queries on clusters, connecting all the local SQLite/madIS instances on the different machines together.

The main point that I want to make with the above explanation is that we don't view SQLite only as a traditional database. We also view it as a data stream processing machine that doesn't have the requirement for the data to be stored on a hard disk. Under this view, the efficiency of the virtual table API is very important. The above query only uses 2 VTs in it, but we have other queries that use a lot more VTs than that.

estama

On 2/3/2014 9:34 μμ, Max Vlasov wrote:

On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis <est...@gmail.com> wrote:

Our main test case is TPCH, a standard DB benchmark. The "lineitem" table of TPCH contains 16 columns, which for 10M rows would require 160M xColumn callbacks to pass it through the virtual table API. These callbacks are very expensive, especially when at the other end sits a VM (CPython or PyPy) handling them.
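Hick's trick, passing an opaque record pointer plus a dispatch function through a single column so that upper layers pull any column straight from the bottom layer, can be sketched in Python, with plain tuples and a function standing in for the C pointers (all names here are mine, not SQLite's):

```python
# Bottom layer: holds the real rows.
rows = [("alpha", 1, 2.5), ("beta", 2, 7.0)]

def column_of(rec, i):
    """Stand-in for the bottom layer's xColumn wrapper."""
    return rec[i]

# What travels up through the VT stack: just (record_handle, dispatch_fn)
# per row, instead of every column value.
stream = [(rec, column_of) for rec in rows]

# Top layer: pulls only the columns it actually needs, directly from
# the bottom layer's records, bypassing the intermediate layers.
result = [(fn(rec, 0), fn(rec, 2)) for rec, fn in stream]
print(result)  # → [('alpha', 2.5), ('beta', 7.0)]
```

The care Hick mentions about sorting and re-preparation corresponds here to the handle staying valid only as long as the bottom layer's current row does.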
Re: [sqlite] Virtual table API performance
In our tests the bind API can acquire from the Python side more than 20 values in a single call, in the same time that xColumn acquires 2 values. Most of the cost is in the callback and not in submitting a row's values through the bind API. So, with the exception of queries that need only 1 column, IMHO everything else should go through the xNextRow API. To keep the complexity to the lowest minimum, my proposal is to use the xNextRow API only for queries that only "scan" over a virtual table (no filtering).

l.

On 04/03/14 18:23, Hick Gunter wrote:

My guess: Yes. It would require implementing a new opcode, either only for virtual tables or for native tables too, that accepts a list of field numbers (currently there are only 5 parameters possible for an opcode, and some of them have fixed meanings). And the logic to generate these opcodes based on the capabilities of the loaded table module, combined with the requirements of the subject query (fields required for a JOIN are fetched separately from those required for the result set) and the result of the xBestIndex calls (where it is possible to set the "omit" flag to suppress generation of a comparison). This also adds to the complexity of register allocation.

Take for example a join that needs 3 fields for the comparison, 2 of which are also required for the result set of 7 fields total. Do you request all 10 fields at once (wastes up to 9 fields' worth of effort if the JOIN is not met)? Or the 3 fields first and the 5 others only if the join matches (must allocate consecutive registers to build a result set)? Or 3 first and then 7 (which approximates the current behavior, as the 2 common fields are fetched twice on a match)?

And a set of new sqlite3_result routines that specify which of the various requested fields' value is being set.

-Ursprüngliche Nachricht- Von: J. Merrill [mailto:j.merr...@enlyton.com] Gesendet: Dienstag, 04.
März 2014 16:23 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] Virtual table API performance

Eleytherios Stamatogiannakis wrote:

Our main test case is TPCH, a standard DB benchmark. The "lineitem" table of TPCH contains 16 columns, which for 10M rows would require 160M xColumn callbacks to pass it through the virtual table API. These callbacks are very expensive, especially when at the other end sits a VM (CPython or PyPy) handling them.

Would it be very difficult to arrange for an option that would request that SQLite issue a single more-complex xMultiColumns (a sample name) callback request, with a way for multiple results to be returned, rather than many xColumn callbacks? This would reduce the number of calls across the VM boundary. Applications that don't implement xMultiColumns (and request its use) would see no change; those that do would get the performance boost.

J. Merrill

--- Gunter Hick Software Engineer Scientific Games International GmbH Klitschgasse 2 – 4, A - 1130 Vienna, Austria FN 157284 a, HG Wien Tel: +43 1 80100 0 E-Mail: h...@scigames.at This e-mail is confidential and may well also be legally privileged. If you have received it in error, you are on notice as to its status and accordingly please notify us immediately by reply e-mail and then delete this message from your system. Please do not copy it or use it for any purposes, or disclose its contents to any person as to do so could be a breach of confidence. Thank you for your cooperation.
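The effect J. Merrill describes is easy to quantify: with per-column callbacks the VM-boundary crossing count is rows × columns, while a per-row callback needs only rows. A toy Python count of the two styles over the same data (no SQLite involved; `x_column`/`x_next_row` are stand-ins for the real callbacks):

```python
rows = [(r, r * 2, r * 3) for r in range(1000)]
ncols = 3

calls = {"xColumn": 0, "xNextRow": 0}

def x_column(row, i):          # current API: one callback per cell
    calls["xColumn"] += 1
    return rows[row][i]

def x_next_row(row):           # proposed API: one callback per row
    calls["xNextRow"] += 1
    return rows[row]

via_columns = [tuple(x_column(r, i) for i in range(ncols))
               for r in range(len(rows))]
via_rows = [x_next_row(r) for r in range(len(rows))]

assert via_columns == via_rows   # identical results either way
print(calls)  # → {'xColumn': 3000, 'xNextRow': 1000}
```

Scaled to the thread's TPCH example (10M rows × 16 columns), the per-column style costs 160M crossings against 10M for the per-row style.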
Re: [sqlite] Virtual table API performance
Could you explain your solution some more? Does it work in the following case: select * from VT2(select * from VT1); by directly passing the rows from VT1 to VT2 (short-circuiting SQLite)? What would happen in the following case: select * from VT2(select processrow(c1,c2,c3) from VT1);

Regards,

l.

On 03/03/14 14:17, Hick Gunter wrote:

We have gotten around this problem by defining "virtual" fields that contain a reference to the "current record" and the entry point of a wrapper around the xColumn function. That way only two fields get passed upwards through the virtual table stack, and the top level virtual table's xColumn implementation calls straight through to the bottom layer's wrapper. It does take some care to avoid sorting in between the layers and re-preparation of statements on schema changes.

-Ursprüngliche Nachricht- Von: Elefterios Stamatogiannakis [mailto:est...@gmail.com] Gesendet: Sonntag, 02. März 2014 20:39 An: sqlite-users@sqlite.org Betreff: Re: [sqlite] Virtual table API performance

We have both input and output virtual tables that avoid hitting the hard disk and are also able to compress the incoming and outgoing data. We have a virtual table that takes as input a query and sends the data to a port on another machine. This virtual table is called "OUTPUT". And another virtual table that takes as input data from another port and forwards it into SQLite. Let's call it "INPUT". A query that uses these two virtual tables would look like this in madIS:

OUTPUT ip:192.168.0.1 port:8080 select * from INPUT('port:8081');

We actually use queries like the above (actually we don't do it directly to ports but to buffered named pipes that are then forwarded via netcat) to run distributed queries on clusters, connecting all the local SQLite/madIS instances on the different machines together. The main point that I want to make with the above explanation is that we don't view SQLite only as a traditional database.
We also view it as a data stream processing machine that doesn't have the requirement for the data to be stored on a hard disk. Under this view, the efficiency of the virtual table API is very important. The above query only uses 2 VTs in it, but we have other queries that use a lot more VTs than that.

estama

On 2/3/2014 9:34 μμ, Max Vlasov wrote:

On Sun, Mar 2, 2014 at 5:21 PM, Elefterios Stamatogiannakis wrote:

Our main test case is TPCH, a standard DB benchmark. The "lineitem" table of TPCH contains 16 columns, which for 10M rows would require 160M xColumn callbacks to pass it through the virtual table API. These callbacks are very expensive, especially when at the other end sits a VM (CPython or PyPy) handling them.

Ok, while not stating that the performance improvement is impossible, I will explain why I'm a little sceptical about it. For every bulk insert we have a theoretical maximum we'd all be glad to see sqlite perform at - the speed of simple file copying. Sqlite can't be faster than that, but to be on par is a good goal. This is not possible when an insert also means modification of other parts of the file, for example when there's an index involved. But let's forget about that. Finally, when new data is added, sqlite has to write a number of database pages; the cost of this part is absolutely in the hands of the media (driver) and OS (driver). But for every database page write there's also a price to pay in CPU units, for the many actions sqlite must perform before the value the developer provided is translated into what actually appears on disk.

An illustration of the CPU price is the following example. With CREATE TABLE t(Value) on my ssd drive, multiple inserts (thousands) of the form

insert into t (Value) values ('123456689...') -- this string contains many symbols, for example 1024

performed at a speed of 30 MB/sec, but the query

insert into t (Value) values (10) -- this is a small integer value

ran at only 3 MB/sec. Both show almost full cpu load. Why such a difference?
Because with the latter query the system can do more than 30 MB of writes in 1 second, but it has to wait for sqlite spending 10 seconds in preparations. The former is better because the CPU cost of passing a large text value to sqlite is comparatively low compared to the time spent in I/O writing it to disk. So the CPU price isn't avoidable, and notice that in this example it is not the virtual table API, it is the bind API. I suppose that the price we pay for CPU spent in the virtual table API is on par with the average price paid in sqlite as a whole. This means that if I transform the above queries into inserts from virtual tables, the final speed difference will be similar. And this also means that for your comparison tests (where you get a 3x difference), the CPU price sqlite pays inside the bind API and in its code wrapping the xColumn call is probably similar. The rest is the share your code pays. Well, I know that there are differences in CPU architectures
Re: [sqlite] Virtual table API performance
On 03/03/14 03:01, Alek Paunov wrote:

On 02.03.2014 21:38, Elefterios Stamatogiannakis wrote:

Under this view, the efficiency of the virtual table api is very important. Above query only uses 2 VTs in it, but we have other queries that use a lot more VTs than that.

Max's tests in C show 2x CPU work, but he explains that the test is not very sound, so let's say somewhere between 1x-2x. Your tests - 3x time. As you have already identified, the real reason probably is the million-scale callback quantity across the VM barrier - I do not follow PyPy, but see these notes [1] by Mike Pall - the LuaJIT author (LuaJIT is the leading project in the trace compilers field):

[1] http://luajit.org/ext_ffi_semantics.html#callback_performance

Also from one of the dozens of threads touching the subject:

[2] http://www.freelists.org/post/luajit/Yielding-across-C-boundaries,3

```
Entering the VM needs a lot of state setup and leaving it isn't free either. Constantly entering and leaving the VM via a callback from C *to* Lua has a high overhead. For short callbacks, the switching overhead between C and Lua may completely dominate the total CPU time. Calling an iterator written in C via the FFI *from* a Lua program is much cheaper -- this compiles down to a simple call instruction.
```

I remember that I had seen the above quote from Mike Pall, but I couldn't find it. Thank you for unearthing it.

Unfortunately, for your "insert into t select * from vt" case the callback/iterator transformation is not possible (we do not have a repetitive _step call to invert the control somehow). What to do? It seems that the easiest optimization for this (very often) VT use case (bulk streaming) is an SQLite add-on in _C_ to be written, implementing a vtable interface specialization containing an xNextPage "buffering", let's say, 4K rows or, even better, 16KB of data (in addition to your initial proposal of xNextRow). The technical question is: how are the rows to be encoded?
You said initially that you use some compressed format. But for such an extension to gain more traction in the future, it would probably be better for a more standard format to be chosen.

a) Rows represented in native SQLite3 format [3]
b) ... native SQLite4 format
c) Some widely used encoding near to SQLite types [4]
d) ...

[3] http://www.sqlite.org/fileformat.html#record_format
[4] https://github.com/msgpack/msgpack/blob/master/spec.md#formats

IMHO, it would be very nice if some common conventions for binary record streaming could be discussed and adopted across the SQLite binding and add-on developers. The possible applications are not limited only to vtables ;-).

SQLite doesn't need any special format for the records to be passed over to it. It already has the "bind" API, which would be very suitable for the xNextRow function too. For a paging API (which IMHO is too complex ATM), the bind API could be extended with a row number parameter.

Regards,

estama

Kind regards,

Alek
[sqlite] Virtual table API performance
Hello,

First of all, I would like to say that, IMHO, SQLite's overall design and code quality is top notch. Having said that, SQLite's virtual table API, while beautifully designed, is not very efficient. We have been hitting our heads for the last few years on the virtual table API, trying to make it work efficiently. During that time, we were thinking that the problem was on our side, in the code that feeds the virtual table API. It turns out that the virtual table API itself isn't very efficient.

The main cause of inefficiency is that it is extremely "chatty". For an external stream that contains many columns, "xColumn" can be called back hundreds of millions of times for the stream to be consumed by SQLite. These callbacks have a very big cost.

Let me describe a test that we did. For our work, we use compressed streams that are fed into SQLite through the virtual table API.

If we load into SQLite the external compressed stream (containing 3M rows) through the virtual table API:

create table newtable as select * from READCOMPRESSEDFILE('ctable.rc');

it takes: 55 sec.

If we create an external program that inserts the rows in the compressed stream into SQLite one by one, using "insert into newtable values ..." and the SQLite bind API, it takes: 19 sec (~3x faster than using the virtual table API).

Another problem with the virtual table API is that it wreaks havoc with VM JIT engines. Especially for tracing JIT engines, the many "small" per-column callbacks do not permit them to specialize at all, compared to the specialization that a tracing JIT could achieve with one "big" get_an_entire_row callback.

A suggestion for improving the efficiency of the virtual table API naturally arises when we look at all the virtual table functions that we have already created. We have ~15 VT functions dealing with importing "all" from external sources (files, http streams, xml, clipboard, etc.), and only one filtering VT function (a multidimensional index) "picking" columns to return.
So most of our queries that use VTs look like this:

create table cliptab as select * from clipboard();

i.e., these queries most of the time select all columns from an external stream.

Based on the above, an addition that would improve the efficiency of the VT API is an "xRow" function that SQLite could call to get an entire row back (like the bind API). Even better, and to reduce the callback count even more, would be an "xNextRow" function that returns the contents of the next row or EOF.

Regards,

estama.
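The bind-API path from the 19-second measurement can be sketched with Python's sqlite3 module. The generator below is a stand-in for the real compressed-file reader (READCOMPRESSEDFILE is a madIS virtual table, not reproduced here); the point is that each row enters SQLite through one bind+step rather than per-column callbacks:

```python
import sqlite3

def compressed_stream():
    # Stand-in for decoding rows out of a compressed external file.
    for i in range(3000):
        yield (i, "payload-%d" % i)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE newtable(a INTEGER, b TEXT)")

# One prepared statement; each row is pushed with a single bind+step,
# instead of SQLite pulling it column by column via xColumn.
con.executemany("INSERT INTO newtable VALUES (?, ?)", compressed_stream())

n = con.execute("SELECT count(*) FROM newtable").fetchone()[0]
print(n)  # → 3000
```

This is the "external program inserting one row at a time" shape from the test above, just scaled down to 3K rows so it runs instantly.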
Re: [sqlite] Why must WITHOUT ROWID tables have PRIMARY KEYs?
First of all, please excuse me for bringing forth again a conversation that is months old. I've just discovered the "without rowid" ability of SQLite, and I have to say many, many thanks for it.

Let me describe a use case where a non-unique key and without rowid are most welcome. We have a distributed big data system here which uses SQLite for the partitions. To be able to efficiently execute join queries on split partitions, we need to assemble the partitions of one side of the query to create an index on them. So in the query:

select y,z from A, B where A.x = B.x;

we assemble the pieces of B using this query:

create table B as select * from ALL("b1.db", "b2.db", "b3.db");

and then we create an index on them:

create index idx_b_x on B(x,y);

The above essentially creates 2 copies of table B's data (because all our indexes are always covering indexes). So, to avoid these 2 copies, what we would like to have are indexes *without backing tables*. These are essentially tables "without rowid" having plain keys (as opposed to primary keys). We could work around the primary key limitation of "without rowid" tables by creating a dummy rowid column and putting it at the right of the table's key that we care about:

CREATE TABLE B( x, y, rowid, PRIMARY KEY(x, rowid) ) WITHOUT ROWID;

and then inserting the data into it:

insert into B select * from ALLWITHDUMMYROWID("b1.db", "b2.db", "b3.db");

but having this capability inside SQLite would be a lot better. Notwithstanding the above comments, thank you again for "without rowid". It will be very useful for our use case.

Regards,

l.

On 19/11/13 21:53, Richard Hipp wrote:

On Mon, Nov 18, 2013 at 5:45 PM, Nico Williams wrote:

Obviously a B-Tree-based table will need *some* key, but it won't need that key to be UNIQUE.

Yeah, it does. Every storage system has a unique key, be it an address in memory or a filename and offset on disk. There must be some way to distinguish between two records.
Two records with the same key are indistinguishable from one another and are hence the same record. The question is whether or not the unique identifier is exposed to the application. SQLite allows the unique identifier to be a PRIMARY KEY ... WITHOUT ROWID. Or it allows it to be the rowid. You get to choose. But every table has to have one or the other. You can argue that it is theoretically possible to create a table where the key is something other than the PRIMARY KEY or rowid (perhaps it is the filename+offset suggested above) that is simply not exposed to the application layer. Yes, that is theoretically possible. But SQLite does not do that, and I do not see any reason to add the capability.
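The dummy-rowid workaround from the post can be tried directly (ALL/ALLWITHDUMMYROWID are madIS virtual tables and are not reproduced here, so a plain insert stands in for them). The composite PRIMARY KEY (x, rowid) lets the WITHOUT ROWID table accept duplicate x values, so it behaves like a covering index with a non-unique key:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# WITHOUT ROWID tables have no implicit rowid, so "rowid" is free to
# use as an ordinary column name for the dummy tie-breaker.
con.execute(
    "CREATE TABLE B(x, y, rowid, PRIMARY KEY(x, rowid)) WITHOUT ROWID")

rows = [(1, "a", 1), (1, "b", 2), (2, "c", 3)]  # note the duplicate x=1
con.executemany("INSERT INTO B VALUES (?, ?, ?)", rows)

dups = con.execute("SELECT count(*) FROM B WHERE x = 1").fetchone()[0]
print(dups)  # → 2
```

Lookups on x still use the b-tree's leading key column, which is the behavior the post wants from an "index without a backing table".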
Re: [sqlite] Limit of attached databases
On 04/06/13 17:37, Simon Slavin wrote:

On 4 Jun 2013, at 3:09pm, Eleytherios Stamatogiannakis <est...@gmail.com> wrote:

Is there any way to go beyond the SQLITE_MAX_ATTACHED limit for *read only* attached DBs?

See section 11 of <http://www.sqlite.org/limits.html>. It's a 64-bit value, and two bits are already taken up.

Yes, I have seen it in SQLite's code. I considered changing it to a bitfield, but the problem is that this long int is used in various other places in SQLite's internals for transactions' metadata.

You can attach databases, copy data from them to the main database, then detach those and attach some others. Or you can create a hierarchy of shards (each of 62 shards can point to up to 62 others). Or you can rewrite your code so it never uses more than 62 shards no matter how many nodes are available.

Attaching and detaching is only useful when materializing the shards into a single table:

create table T
attach T1
insert into T select * from T1.T
detach T1
attach T2
insert into T select * from T2.T
...

How would a hierarchy of shards work? You cannot attach a DB onto another attached DB, which the hierarchy idea would require. Also, using only 62 shards at a time can be very constraining/slow in our use case (see below).

None of them good solutions, I'm afraid.

Yes :-(.

Also, is there any way for SQLite to create an automatic index on a view (or virtual table), without having to first materialize the view (or VT)?

I believe that SQLite needs the data to be in one place (i.e. at least a virtual table) for the indexing routine to work.

We have tried with both views and VTs, but SQLite does not create automatic indexes on them at all. So right now, to be able to have automatic indexes from SQLite's side, we materialize all virtual tables into plain tables:

create temp table T as select * from UnionAllVT1;
create temp table G as select * from UnionAllVT2;
...

which doubles our I/O to process a single sharded table.
- 1 full read + 1 full write of all data to materialize the UnionAllVT into a plain table.
- 1 full read + 1 full write of the data in the materialized table to create the automatic index.

It would be very nice if the automatic index could be created directly from the UnionAllVT, but we haven't found a way to do it.

If you're willing to put a bit of SQLite-only effort in, you could implement your own virtual table implementation that consulted data on each of your nodes. This would be quite highly customised for your own application's requirements, but it would mean you didn't have to do any attaching or detaching at all. Your SQLite API calls could address your data as if it was all in one database file, but SQLite would understand how data is partitioned between nodes and automatically gather it from all the necessary nodes.

We have already done this (creating VTs is very easy in madIS [*]). We have the UnionAll virtual table that scans over all the DB shards. The above UnionAll VT only supports scans, and it is only used to materialize the shards into a regular table. It would be very costly to have per-shard indexes, because each Filter on the UnionAll VT would need to be passed to all of the shards. A single automatic index works best.

Another way to do it would be to implement your own VFS which would distribute over the nodes not at the row level but as if they were all one huge storage medium (i.e. like a RAID).

Each shard that we use is already a self-contained SQLite DB. We would need to change our whole approach to convert to a disk-page-based sharding scheme.

Thanks for your ideas.

l.

[*] https://code.google.com/p/madis/source/browse/src/functions/vtable/unionalldb.py
[sqlite] Limit of attached databases
Hi, During our work on a distributed processing system (which uses SQLite shards), we have hit the SQLITE_MAX_ATTACHED limit of attached DBs. The way we use SQLite for distributed processing [*] is the following:

- Each table is sharded into multiple SQLite DBs on different nodes of the cluster.
- To process a query, we run on each shard a query which produces multiple sharded SQLite result DBs.
- We redistribute the result DBs in the cluster, and the next set of cluster nodes attaches all the input shard SQLite DBs and creates a temp view that unions all the input DB shards into a single view.
- It then executes a query on the views that produces new result DB shards, and so on.

We recently got access to a cluster of 64 nodes and it is very easy now to hit the SQLITE_MAX_ATTACHED limit (1 DB shard gets produced per node). So the question I have is: is there any way to go beyond the SQLITE_MAX_ATTACHED limit for *read only* attached DBs? Also, is there any way for SQLite to create an automatic index on a view (or virtual table), without having to first materialize the view (or VT)? Thanks in advance. Lefteris Stamatogiannakis. [*] The same processing ideas are used in hadapt: http://hadapt.com/ which uses Postgres for the DB shards.
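The attach-and-union step described above can be sketched in a few lines. This is a hedged illustration: the shard files, the table name "t", and the two-shard setup are all invented for the sketch; a real deployment would hit the SQLITE_MAX_ATTACHED limit once the loop passes the configured maximum.

```python
import os
import sqlite3
import tempfile

# Build two demo shard DBs, each a self-contained SQLite file with an
# identically-shaped table "t" (names invented for this sketch).
tmpdir = tempfile.mkdtemp()
shards = []
for i in range(2):
    path = os.path.join(tmpdir, "shard%d.db" % i)
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE t(x)")
    db.executemany("INSERT INTO t VALUES (?)", [(i * 10 + j,) for j in range(3)])
    db.commit()
    db.close()
    shards.append(path)

# Reader side: attach every shard and expose them behind one UNION ALL view.
con = sqlite3.connect(":memory:")
selects = []
for i, path in enumerate(shards):
    con.execute("ATTACH DATABASE ? AS shard%d" % i, (path,))
    selects.append("SELECT x FROM shard%d.t" % i)
con.execute("CREATE TEMP VIEW all_t AS " + " UNION ALL ".join(selects))

rows = con.execute("SELECT count(*) FROM all_t").fetchone()[0]
print(rows)  # 6 rows across both shards
```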
Re: [sqlite] Query optimizer suboptimal planning with virtual tables
On 14/03/13 17:05, Clemens Ladisch wrote: Eleytherios Stamatogiannakis wrote:

explain query plan select * from t1, t2 where t1.c1=t2.c1;
0 |0 |1 | SCAN TABLE t2 (~100 rows)
0 |1 |0 | SCAN TABLE t1 VIRTUAL TABLE INDEX 0: (~0 rows)

Even though I have put VT t1 first in the join list, SQLite will do a nested loop join (putting it on the right). How have you defined index 0 of your virtual table? The "INDEX 0:" is always there (even when no index is defined in the VT function). I don't know what it means. l.
[sqlite] Query optimizer suboptimal planning with virtual tables
Hi, I have come across a glitch in how SQLite's query optimizer plans virtual tables. An example follows: I have a virtual table function named "range" that produces all numbers from 1 to range's arg. This virtual table does not have any index functionality. With this I'll create the virtual table instance "t1":

> create virtual table t1 using range('100');
> select * from t1;
1
2
...
99
100
--Column names--
C1

Let's create a real table now:

> create table t2 as select * from t1;

The plan that the optimizer produces when I join these two tables is this:

> explain query plan select * from t1, t2 where t1.c1=t2.c1;
0 |0 |1 | SCAN TABLE t2 (~100 rows)
0 |1 |0 | SCAN TABLE t1 VIRTUAL TABLE INDEX 0: (~0 rows)

Even though I have put VT t1 first in the join list, SQLite will do a nested loop join with it on the right. Wouldn't it have made more sense for SQLite to create an automatic index on the real table t2 and do the join as such?

0 |0 |0 | SCAN TABLE t1 VIRTUAL TABLE INDEX 0: (~0 rows)
0 |1 |1 | SEARCH TABLE t2 USING AUTOMATIC COVERING INDEX idx (C1=?) (~10 rows)

Putting the VT on the right by default doesn't make much sense to me, since SQLite cannot create an automatic index on it. In general it seems to me a better default to always put the entities that cannot be automatically indexed (views, virtual tables) on the left of joins, and whatever can be automatically indexed on the right. I also think it would be even better if SQLite had the ability to scan the virtual table and build a temporary automatic covering index on it to do the join (why isn't this case allowed?). Thank you, lefteris.
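The join order complained about above can at least be constrained by hand: SQLite documents CROSS JOIN as a join-order hint that forces the left table into the outer loop. A virtual table is not available in this sketch, so two ordinary tables stand in; only the planner hint itself is being demonstrated.

```python
import sqlite3

# SQLite treats CROSS JOIN as a join-order constraint: the left operand is
# forced into the outer loop.  Two plain tables stand in for the VT here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1(c1)")
con.execute("CREATE TABLE t2(c1)")
con.executemany("INSERT INTO t1 VALUES (?)", [(i,) for i in range(100)])
con.executemany("INSERT INTO t2 VALUES (?)", [(i,) for i in range(100)])

plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1 CROSS JOIN t2 WHERE t1.c1 = t2.c1"
).fetchall()
details = [row[-1] for row in plan]
print(details)  # t1 appears in the first (outer) step of the plan
```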
Re: [sqlite] Full covering index without table
I'm not so sure how fast SQLite4 will be with respect to full scans. IMHO row stores have an advantage when scanning over multiple columns. Concerning dropping the index and recreating it on the other side: it's doable, but very bad from a performance point of view. If you know how the other side will access the data, then the best option is to build the indexes in parallel over all the data chunks (on every cluster node), and then send the indexed chunks to the other side. Having doubled data (table + full covering index) in these chunks halves the I/O bandwidth of the whole cluster. Also, I should point out that our main use case is OLAP processing and not OLTP. lefteris. On 05/03/13 10:51, Nico Williams wrote: SQLite4 gets this right... Of course, it's not been released.
Re: [sqlite] Full covering index without table
On 04/03/13 18:44, Simon Slavin wrote: On 4 Mar 2013, at 4:13pm, Eleytherios Stamatogiannakis <est...@gmail.com> wrote: Is there a way in SQLite to have a full covering index on a table without also storing the duplicate table? Can we ask why you care about this? Do you have a huge table which is taking a huge amount of space, and you're trying to fit it on a flash drive? Can you distinguish between "I think it could be smaller." and "It's just a little too big and that means I can't use SQLite for this."? We are creating a distributed processing system in the spirit of Hadapt [1], but instead of using PostgreSQL we are using SQLite. For the intermediate result tables (each one inside an SQLite DB) whose access patterns we know in advance (and so prepare their indexes for), it is very wasteful to have to transfer the data twice (index + full table). These kinds of systems live and die by their I/O. The most compact way of carrying SQLite databases around is to use the shell tool to dump the database to a SQL text file, then use a compression utility (e.g. ZIP) to compress that text file. But without knowing your situation I can't tell if that would help you. For streaming processing we have our own serialization format that is compressed on the fly with LZ4. These streams are opened on the other side as SQLite virtual tables. For store-and-forward type processing, we use SQLite DBs, also compressed on the fly with LZ4. On the other side we simply "attach" these DBs. A first shot toward a partial solution would be to declare all the columns on the table as primary keys: create table t(a,b,c, primary key(a,b,c)); Sorry, but it doesn't help. Even fields in the primary key are stored twice. I'm saddened to hear that. I thought that at least we had a partial solution in declaring all columns as a primary key... Thank you for answering. l.
[1] http://hadapt.com
[sqlite] Full covering index without table
Hi, I have a question: is there a way in SQLite to have a full covering index on a table without also storing the duplicate table? So if I have a table:

create table t(a,b,c);

and an index covering all its columns:

create index idx_t on t(a,b,c);

SQLite will store everything twice: first in the table and then in the index. The problem is that if all access to the table happens through the covering index, then the information in the table is redundant. A first shot toward a partial solution would be to declare all the columns of the table as primary keys:

create table t(a,b,c, primary key(a,b,c));

The above assumes that, due to the nature of how SQLite always stores tables in a B-tree, this would in essence be like an index. But this wouldn't work if columns a,b,c are not unique. Is there any way to declare a primary key without the uniqueness constraint? Kind regards, lefteris.
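For readers of the archive: SQLite versions released after this thread (3.8.2+) added WITHOUT ROWID tables, which store the table as the primary-key B-tree itself and remove exactly this duplication. The primary key must still be unique, though; the sketch below uses an invented tie-breaker column ("seq") to allow duplicate (a,b,c) triples.

```python
import sqlite3

# WITHOUT ROWID (SQLite 3.8.2+) stores the table as the PK B-tree -- no
# separate rowid table to duplicate.  The PK must still be unique, so a
# tie-breaker column "seq" (invented for this sketch) handles non-unique
# (a, b, c) values.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE t(
        a, b, c, seq INTEGER,
        PRIMARY KEY (a, b, c, seq)
    ) WITHOUT ROWID
""")
con.execute("INSERT INTO t VALUES (1, 2, 3, 0)")
con.execute("INSERT INTO t VALUES (1, 2, 3, 1)")  # duplicate (a,b,c) is fine
rows = con.execute("SELECT count(*) FROM t WHERE a=1 AND b=2 AND c=3").fetchone()[0]
print(rows)
```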
Re: [sqlite] Unql
Warning: Only my opinion too :-) . In madIS, we ended up faking some of the features that document-based databases have. Without these features, it would be extremely hard to build queries that describe advanced processing workflows. The main features that we ended up faking are:

- JSON "packs" (or jpacks in madIS), which are in essence JSON lists that contain many values. The main difference between madIS's jpacks and regular JSON lists is that jpack(base_type) = base_type. Examples:

select jpack(1) as C;
C
1

select jpack(1,2) as C;
C
[1,2]

- Nested tables, without which SQL is severely deficient in describing any advanced processing. Example:

select jsplit("[1,2]");
C1 | C2
1  | 2

- Virtual tables having queries as parameters:

select * from (XMLPARSE select "<a><b>row1val1</b></a><a><b>row1val1b</b></a>");
{"a/b":"row1val1"}
{"a/b":"row1val1b"}

From my standpoint, UnQL is a very nice first try towards solving the above DB deficiencies. It would be a shame if this effort is abandoned. Concerning the other query languages: Cassandra's SQL-inspired queries are very limiting. There is a company that merged Cassandra with Hadoop/Hive, providing Cassandra with more advanced querying abilities, but the last time I looked at it, it wasn't working so well. Riak and a number of other key-value stores that do not have rich query interfaces are useful only for OLTP workloads and do not cover OLAP workloads at all. I don't like MongoDB's approach very much. It seems to me that extending/expanding SQL to cover MongoDB's use cases would be a better solution than starting from JSON and working towards SQL. While I like JSON very much, whenever I needed to do a MongoDB query I had to translate from SQL to MongoDB through the use of a translation table. For better or worse, SQL is the de facto lingua franca of query languages. IMHO, UnQL has a lot of promise in expanding SQL into domains that right now are served poorly by existing querying solutions. l.
On 11/12/12 05:28, Roger Binns wrote: On 10/12/12 19:01, dcharno wrote: What ever happened to UnQL and is there any chance it will be revived? It seemed like it would have been incredibly useful. Warning: My opinion only. UnQL seemed to be especially pointless. Being somewhat compatible with SQL syntax made no sense - SQL syntax and all the tools using it assume the relational model and simple table-like data layouts. NoSQL databases vary, but generally they are non-relational and schemaless, and JSON-interoperable ones have a structure completely different from SQL tables (e.g. item nesting). Every NoSQL database already has its own query mechanism, so UnQL would always be a second-class citizen if any of them ever adopted it. NoSQL databases also tend to have looser consistency, have multiple servers available, and various other semantics very different from SQL servers. The native NoSQL interfaces expose and work with those semantics. Cassandra did actually end up with something SQL-inspired: http://cassandra.apache.org/doc/cql/CQL.html Riak uses something like Google advanced searches (field:value): http://docs.basho.com/riak/latest/cookbooks/Riak-Search---Querying/ MongoDB uses JSON with extra magic operators: http://docs.mongodb.org/manual/applications/read/ Personally I like the MongoDB approach, where queries strongly resemble the underlying stored data, which means very little translation between the two. Roger
[sqlite] Prefix joins
Hello, Is there a version of "LIKE", in SQLite, which makes it clear that we only care to have a pattern on the postfix of a column? An example follows: select * from a,b where POSTFIX_LIKE(a.c1, b.c1 ,'%'); In above example, POSTFIX_LIKE works in the same way as if we had written: select * from a,b where a.c1 LIKE b.c1||'%'; but with the additional guarantee for the optimizer that all pattern matching will happen on the postfix and not on the prefix, so the optimizer will be able to use an index to do the join. Thanks in advance, Lefteris. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
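The condition a.c1 LIKE b.c1||'%' asks whether b.c1 is a leading part of a.c1, and the classic workaround is to rewrite it as a range comparison that an ordinary index on a.c1 can serve (this is essentially the transformation SQLite's own LIKE optimization performs when its preconditions hold). A hedged sketch: the prefix_upper_bound helper is invented here and ignores some Unicode edge cases.

```python
import sqlite3

# Rewrite "c1 LIKE prefix || '%'" as "prefix <= c1 < upper_bound", which an
# ordinary index on c1 can satisfy.  The upper bound is built by bumping the
# last character of the prefix (a simplification; invented for this sketch).
def prefix_upper_bound(prefix):
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE a(c1 TEXT)")
con.execute("CREATE INDEX idx_a_c1 ON a(c1)")
con.executemany("INSERT INTO a VALUES (?)",
                [("apple",), ("applet",), ("apricot",), ("banana",)])

prefix = "appl"
rows = con.execute(
    "SELECT c1 FROM a WHERE c1 >= ? AND c1 < ? ORDER BY c1",
    (prefix, prefix_upper_bound(prefix)),
).fetchall()
result = [r[0] for r in rows]
print(result)  # ['apple', 'applet']
```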
[sqlite] Union all patch, preliminary tests
In my preliminary tests, the patch has worked wonderfully. In these first tests I have only tested plain SQLite functionality (without virtual tables or additional functions) on a 1.5GB DB. Some remarks:

- It is lightning fast. Queries that fit in the free space (some of them could not finish before due to space constraints) and previously took minutes to finish are instantaneous with the patch, and they do not touch the drive. The results are the same.
- I was expecting that only simple scans over unioned queries would be affected by the patch. When I tried a filter on a unioned table composition and the query was again instantaneous, I was flabbergasted. The patch really pushes index accesses down into the unioned tables. I didn't expect the patch to go that far. I'm amazed.
- Group by on union-all-ed tables was also instantaneous, without filling the hard disk.
- Oddly, order by works the same both with the patch and without it (SQLite 3.7.14.1). I assume that order by had a similar optimization in previous versions of SQLite too?
- In all of the above tests, the free space on the hard disk wasn't affected by the running query.

More tests will be done in the context of madIS, but they will take more time. I want to test plain SQLite first, because this is what most people will use in practice. Best regards, lefteris.
Re: [sqlite] Union all writting on /var/tmp
Hello, Mr. Hipp, please excuse my attitude in my first email (the one you replied to). It came after 3 days of intense pressure trying to find out why machines came to a crawl whenever a particular query with sufficiently large union-ed all tables was run. Due to the quality of SQLite, it literally was the last thing that I looked at. I even had prepared emails to other people asking them to look at their software, and not to buffer on the hard disk so much. Concerning your effort to fix it: I also suspected the "path of least resistance" explanation concerning "union all". This is why in my next emails I mainly asked for a switch to change the "/var/tmp" path. The other thing I asked for was for the documentation to have a warning about union all's behaviour. I believed that warning could save a lot of time and effort for other people who tripped on the same thing as me. Thank you very, very much for your fix. I'm glad that you put the considerable effort into it. My PhD was in databases, so I can understand how much effort this fix required. I have already downloaded the patch and I'll test it ASAP. Since I also work (or have worked) on open source software (madIS, Rigs of Rods), I know your second point first-hand. Nevertheless, sometimes I also slide into this kind of behaviour. I'll try to be on guard against it in the future. Best regards, lefteris. On 30/10/12 03:08, Richard Hipp wrote: On Fri, Oct 26, 2012 at 2:16 PM, Eleytherios Stamatogiannakis <est...@gmail.com> wrote: I have been observing the following freaky behaviour of SQLite. When I run:

select count(*) from (select * from huge_table union all select * from huge_table);

SQLite starts writing in /var/tmp/ a file like /var/tmp/etilqs_gblRd6vUPcx91Hl, the root partition fills up, and an error is raised. Why does SQLite 3.7.14.1 need to write at all when doing union all? It seems to me that there is no reason for doing so.
The reason for using a temp table for UNION ALL in a subquery is because that is the path of least resistance. The same identical code can handle UNION ALL, UNION, INTERSECT, EXCEPT and various other cases. Some queries (such as your UNION ALL) can in fact do without the temp table. But those are special cases that have to be coded separately. Adding, testing, and maintaining that extra code involves a lot of work. And the extra code risks introducing bugs that might appear even for people who are not doing a UNION ALL in a subquery. And in over 12 years of use, in over a million different applications, with over 2 billion deployments, nobody has ever before requested this optimization. At http://www.sqlite.org/src/info/7af3acbbd4 there is a patch that adds the optimization to avoid using a temp table for your UNION ALL queries. This one small patch represents about 12 hours of intense work, so far. Much more work will be required to get the patch performing to our release standards. All of this effort on your behalf you are receiving for free. In return, we ask two things: (1) Please download and test the patch and report any problems, including performance problems. (2) Please learn to be less grumpy, demanding, and condescending when requesting help with software towards which you have contributed nothing. You have received this latest patch, and indeed all of SQLite, by grace. Therefore, please extend the same grace toward others. Best regards, lefteris. -- D. Richard Hipp d...@sqlite.org
Re: [sqlite] Union all writting on /var/tmp documentation warning
My understanding (and what my experiments have shown) is that in both cases "COMPOUND SUBQUERIES 1 AND 2" will write a temp file to /var/tmp. IMHO, the documentation should warn about this writing behaviour, because for the second case (union all) it isn't expected/predictable, since full buffering is not needed. lefteris. On 29/10/12 20:41, Clemens Ladisch wrote: Eleytherios Stamatogiannakis wrote: "union all" works exactly like plain "union". It always materializes its input.

sqlite> explain query plan select 1 union select 2;
selectid|order|from|detail
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
sqlite> explain query plan select 1 union all select 2;
selectid|order|from|detail
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

It does not. (In the full "explain" output, "OpenEphemeral" is missing.) Neither with real tables. What particular query behaves unexpectedly for you? Regards, Clemens
Re: [sqlite] Union all writting on /var/tmp documentation warning
Look at what that page says about "union all" (emphasis on *all*):

--SNIP--
"Note that the UNION ALL operator for compound queries does not use transient indices by itself (though of course the right and left subqueries of the UNION ALL might use transient indices depending on how they are composed.)"
--/SNIP--

At least to my eyes, the above says what I was expecting before I realized what actually happens: that "union all" tries not to materialize its results when possible. The truth concerning materialization is that in SQLite "union all" works exactly like plain "union": it always materializes its input. lefteris. On 29/10/12 16:37, Clemens Ladisch wrote: Eleytherios Stamatogiannakis wrote: Can a warning about "union all"'s behaviour of buffering everything in /var/tmp be added to SQLite's documentation? Like this? http://www.sqlite.org/tempfiles.html Regards, Clemens
[sqlite] Union all writting on /var/tmp documentation warning
Can a warning about "union all"'s behaviour of buffering everything in /var/tmp be added to SQLite's documentation? I think such a warning could save a lot of time for other SQLite users who trip over the same thing as I did. Thank you, lefteris.
[sqlite] Union all writting on /var/tmp
I have been observing the following freaky behaviour of SQLite. When I run:

select count(*) from (select * from huge_table union all select * from huge_table);

SQLite starts writing in /var/tmp/ a file like /var/tmp/etilqs_gblRd6vUPcx91Hl, the root partition fills up, and an error is raised. Why does SQLite 3.7.14.1 need to write at all when doing union all? It seems to me that there is no reason for doing so. Best regards, lefteris.
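As a side note on steering where these temporary files go: SQLite's temp behaviour can be influenced without patching. PRAGMA temp_store forces temporary tables and indices into memory, and on Unix the SQLITE_TMPDIR (or TMPDIR) environment variable relocates any temp files that are still written to disk. A minimal sketch:

```python
import sqlite3

# PRAGMA temp_store = MEMORY keeps temporary B-trees off the disk entirely;
# SQLITE_TMPDIR / TMPDIR (on Unix) relocate whatever temp files remain.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA temp_store = MEMORY")
mode = con.execute("PRAGMA temp_store").fetchone()[0]
print(mode)  # 2 == MEMORY
```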
Re: [sqlite] Please test the latest SQLite enhancements
Is the covering index scan optimization enabled in the mentioned amalgamation? For large datasets it can speed up the queries a lot. l. On 09/10/12 20:19, Richard Hipp wrote: SUMMARY: If you have a complex application that uses SQLite, please test your application with the SQLite amalgamation found at http://www.sqlite.org/sqlite3-20121009.zip and report to me (via private email) whether or not you encounter any problems. DETAILS: We have recently made a number of significant enhancements to the query optimizer in SQLite. In particular, the latest code does a much better job of recognizing when ORDER BY clauses can be satisfied by careful selection of indices and scan orders and without having to do any sorting. This optimization can result in significant performance improves for queries with large result sets. The current implementation has already been well tested: (1) All legacy tests pass (2) 100% branch test coverage (3) The Fossil server that hosts the SQLite source code is using the latest SQLite (4) This email is being composed on an instance of Firefox Nightly that is running the latest SQLite code However, with such extensive changes to the query optimizer there is a heightened risk of missing obscure corner cases in which SQLite omits sorting for an ORDER BY clause when the sort is actually needed, resulting in output appearing in the wrong order. You can help reduce this risk, and help us to ensure that the forthcoming SQLite version 3.7.15 is trouble-free, by compiling the SQLite amalgamation snapshot found at http://www.sqlite.org/sqlite3-20121009.zip into your application and then testing your application to verify that it is still working correctly. If you find any problems, please let me know. If your application continues to work normally (though perhaps a little faster) I would appreciate hearing from you then too. 
I am especially interested in hearing from people whose applications contain large schemas, descending indices, and complex joins containing ORDER BY clauses with multiple terms and a mixture of ASC and DESC. Thank you for your help.
Re: [sqlite] Unofficial poll
Sqlite's dynamic typing made it a natural fit for using it with Python UDFs in madIS: https://code.google.com/p/madis/ Absence of the feature would have complicated the whole "functional relational" [*] workflow that madIS uses a *lot*. l. [*] Instead of Python functions calling SQL, have SQL call Python functions. On 23/09/12 13:37, Baruch Burstein wrote: I am curious about the usefulness of sqlite's "unique" type handling, and so would like to know if anyone has ever actually found any practical use for it/used it in some project? I am referring to the typeless handling, e.g. storing strings in integer columns etc., not to the non-truncating system e.g. storing any size number or any length string (which is obviously very useful in many cases). Has anyone ever actually taken advantage of this feature? In what case? ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] SQL:2003 -- Window Functions
While I haven't needed this functionality, I have been thinking about how it could be solved in a performant way. IMHO, the solution would be to create a virtual table that takes as input, for example, a query that produces the following rows:

C1 | C2
a1 | b1
a2 | b2
a3 | b3

and outputs the following:

C1 | C2 | C1prev | C2prev
a1 | b1 | null   | null
a2 | b2 | a1     | b1
a3 | b3 | a2     | b2

So in essence the window gets transformed to a single row. In a similar way, bigger windows could also be transformed. l. On 20/09/12 14:46, Gabor Grothendieck wrote: On Wed, Sep 19, 2012 at 12:51 PM, joe.fis...@tanguaylab.com wrote: Too bad SQLite doesn't yet support SQL window functions. Are there any SQLite extension libraries that support "SQL:2003 type window functions"? I specifically need LEAD and LAG to calculate an event integer timestamp delta between consecutive rows. I've played with some self-join code but that's proving to be complicated. SQL window functions are the number one feature that I could use as well. In R, SQLite can be used for manipulating R data frames via the sqldf package, and this sort of functionality would be very useful. (sqldf also handles PostgreSQL, which does have windowing functions, but PostgreSQL requires more setup than SQLite so it's not as accessible to users.)
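The previous-row transform described above can also be emulated in plain SQL with a correlated subquery (native LAG/LEAD arrived only later, in SQLite 3.25). A hedged sketch with an invented event table, computing the timestamp delta the original poster asks for:

```python
import sqlite3

# Emulate LAG(ts) with a correlated subquery: for each row, the previous
# timestamp is the largest ts strictly smaller than the current one.
# Table and column names ("ev", "ts") are invented for this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ev(ts INTEGER)")
con.executemany("INSERT INTO ev VALUES (?)", [(10,), (13,), (19,)])

rows = con.execute("""
    SELECT ts,
           ts - (SELECT max(p.ts) FROM ev AS p WHERE p.ts < ev.ts) AS delta
    FROM ev ORDER BY ts
""").fetchall()
print(rows)  # [(10, None), (13, 3), (19, 6)]
```

The first row has no predecessor, so its delta is NULL, mirroring LAG's default behaviour.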
Re: [sqlite] Javascript API for SQLite?
There is a version of SQLite compiled in Javascript: http://www.syntensity.com/static/sqlite_wip.html But as others said, it is not possible to do block access on files from inside a Web browser's contained Javascript VM. Nevertheless, theoretically you could load the full DB file in memory [*], do your operations on it in memory and then write it as a whole back to the disk, all on the client side. lefteris. [*] https://github.com/eligrey/FileSaver.js On 14/09/12 16:05, Simon Slavin wrote: On 13 Sep 2012, at 3:13pm, Jean-Denis Muyswrote: I work on an application that uses an SQLite database as it's binary document format (with a custom extension). Now we want to develop a Web App variation of that application, that should be able to read and write those documents/databases. Since the Web App is written in Javascript, I am now looking for a Javascript implementation of the SQLite library. This can't be done entirely in JavaScript, since JavaScript running in a web browser has no way of getting at files on your hard disk. This is to prevent the programmers of a web site spying on your computer's files. I have used the C SQLite library in the past, so I know about using SQLite from C. However, I am just starting with Javascript and Web Apps and I am quite a newbie on that side (Other people on the team are experienced, but I have been asked to work on the SQLite integration). There are ways you can allow JavaScript to access data inside a file on a web server. The standard way is to write a shim in PHP or some similar language. The PHP code runs on the web server and uses PHP's SQLite3 library to access databases. You ask it to execute a SQLite command, and it returns the results in JSON (or some other) format. 
So for instance https://myserver.com/databases/doSQLCommand.php?file=money.sqlite=SELECT * FROM transactions WHERE id=123 might return a page of application/json type containing {id: 123, trandate: "20030205", amount: 274.53} Of course in real life you're more likely to pass the parameters using POST than GET. Your JavaScript code asks the shim for the data using an XMLHttpRequest and uses JSON.parse() to turn the results into an array or an object. There are, of course, many security concerns with such a setup, so most shim programs check to see that they're being called only from their own server, by a program they recognise, running on a computer they recognise. I sometimes use a setup like this, though my shim returns the requested results as only a small part of the stuff it returns, the rest being things like error messages and last-insert-id and stuff like that. Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] Covering index scan optimization
Hello, I've just wanted to ask about using covering indexes for scans. A very rudimentary test:

create table t (c1, c2, c3, c4);
create index idxtc1 on t(c1);

A simple "select" scans the full table:

explain query plan select c1 from t;
SCAN TABLE t (~100 rows)

A select with a dummy "order by" uses the covering index:

explain query plan select c1 from t order by c1;
SCAN TABLE t USING COVERING INDEX idxtc1 (~100 rows)

It seems to me that using a covering index scan would always be faster in both cases (fewer disk page reads). Am I wrong? Is there a reason for SQLite not to use a covering index for scans? Thank you in advance, lefteris
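The experiment is easy to reproduce from Python's sqlite3 module. Note that a "covering index scan" optimization was added to SQLite shortly after this thread (around 3.7.15), so a recent library may already pick the index even without the ORDER BY:

```python
import sqlite3

# Reproduction sketch of the observation above.  On post-3.7.15 SQLite the
# bare scan may already use the covering index; the ORDER BY case always does.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(c1, c2, c3, c4)")
con.execute("CREATE INDEX idxtc1 ON t(c1)")

def plan(sql):
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

print(plan("SELECT c1 FROM t"))            # may or may not mention idxtc1
ordered = plan("SELECT c1 FROM t ORDER BY c1")
print(ordered)                             # satisfies the ORDER BY via idxtc1
```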
[sqlite] madIS v1.4 release
madIS is an extensible relational database system built upon the SQLite database, with extensions written in Python (via the APSW SQLite wrapper). It is developed at: http://madis.googlecode.com Due to madIS’ SQLite core, the database format of madIS is exactly the same as SQLite’s. This means that all SQLite database files are directly usable with madIS. In usage, madIS feels like a lightweight personal Hive+Hadoop programming environment, without the distributed processing capabilities of Hadoop. Nevertheless, due to its low overhead while running on a single computer (compared to Hadoop), madIS can easily handle tens of millions of rows on a single desktop/laptop computer. In version 1.4 of madIS:

- XMLPARSE can now work without any provided prototype, producing JSON dicts which contain all parsed XML data. XMLPARSE now accepts Jdicts and Jlists in addition to XML snippets as prototypes.
- FILE works with gzip-compressed files and HTTP and FTP streams directly, in a streaming way.
- New functions which work with Jdicts were added (jdictkeys, jdictvals, jdictsplit, jdictgroupunion).
- APACHELOGSPLIT parses and splits Apache log lines.
- Optimizations in virtual tables (up to 3 times faster). XMLPARSE is up to 2x faster (using the fast:1 switch).

About XMLPARSE: XMLPARSE can now work without any prototype.
In this mode it produces JSON dict entries containing all the paths:data below the provided root tag: -- Example mterm> select * from (XMLPARSE root:entry FILE 'http://code.google.com/feeds/p/madis/hgchanges/basic') limit 1; {"entry/updated":"2011-12-09T17:18:43Z","entry/id":"http://code.google.com/feeds/p/madis/hgchanges/basic/92bc61a496b3a34c21c5aed6d9c6cde5ac63121e","entry/link/@/href":"http://code.google.com/p/madis/source/detail?r=92bc61a496b3a34c21c5aed6d9c6cde5ac63121e","entry/link/@/type":"text/html","entry/link/@/rel":"alternate","entry/title":"Revision 92bc61a496: Fixed help formatting","entry/author/name":"est...@servum","entry/content/@/type":"html","entry/content":"Changed Paths:\n Modify/src/functions/vtable/xmlparse.py\n \n \n Fixed help formatting"} -- If one wishes to find the all the paths that appear in above feed he could use the jgroupunion aggregate function, producing the union of all XML paths: -- Example mterm> select JGROUPUNION(c1) from (XMLPARSE root:entry FILE 'http://code.google.com/feeds/p/madis/hgchanges/basic'); ["entry/updated","entry/id","entry/link/@/href","entry/link/@/type","entry/link/@/rel","entry/title","entry/author/name","entry/content/@/type","entry/content"] -- If one wishes to find the common set of XML paths that appear in all of ATOM feed's entries then he could do: --Example mterm> select JGROUPINTERSECTION(c1) from (XMLPARSE root:entry FILE 'http://code.google.com/feeds/p/madis/hgchanges/basic'); ["entry/updated","entry/id","entry/link/@/href","entry/link/@/type","entry/link/@/rel","entry/title","entry/author/name","entry/content/@/type","entry/content"] -- Note: Intersection in this example is the same as union -- Finally, to output in a tabular form the contents of the ATOM feed, one simply has to provide the list of paths as a parameter to XMLPARSE: --Example mterm> select * from ( XMLPARSE root:entry 
'["entry/updated","entry/id","entry/link/@/href","entry/link/@/type","entry/link/@/rel","entry/title","entry/author/name","entry/content/@/type","entry/content"]'
FILE 'http://code.google.com/feeds/p/madis/hgchanges/basic') limit 1;
[1|2011-12-09T17:18:43Z
[2|http://code.google.com/feeds/p/madis/hgchanges/basic/92bc61a496b3a34c21c5aed6d9c6cde5ac63121e
[3|http://code.google.com/p/madis/source/detail?r=92bc61a496b3a34c21c5aed6d9c6cde5ac63121e
[4|text/html
[5|alternate
[6|Revision 92bc61a496: Fixed help formatting
[7|est...@servum
[8|html
[9|Changed Paths: Modify/src/functions/vtable/xmlparse.py Fixed help formatting
--- Column names ---
[1|updated
[2|id
[3|link_href
[4|link_type
[5|link_rel
[6|title
[7|author_name
[8|content_type
[9|content
--

-- Lefteris
___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] New madIS v1.3 release
I've mainly used JSON because it is a well-defined and widely used standard. JSON also supports associative arrays (which currently are not used in madIS). From what little I've read about Tcl lists, I believe that JSON lists are easier on the eye. Compare this:

["this is the first", "second", "and third sentence"]

to this:

"this is the first" second "and third sentence"

In the top example the commas help the eye distinguish between the values. Nevertheless, one could devise an alternative example with a lot of commas inside the strings, which would make JSON lists more difficult to read. So in the end I think which of the two is preferable is a matter of taste, and of the kind of data one has to deal with.

l.

On 26/07/11 10:08, Alexey Pechnikov wrote:
> Why JPack? Maybe Tcl lists would be more useful? The Tcl dictionary (also known as associative array) can be stored as a list too.
[sqlite] New madIS v1.3 release
madIS is an extensible relational database system built upon the SQLite database, with extensions written in Python (via the APSW SQLite wrapper). It is developed at: http://madis.googlecode.com

Due to madIS's SQLite core, the database format of madIS is exactly the same as SQLite's, which means that all SQLite database files are directly usable with madIS. In usage, madIS feels like a lightweight personal Hive+Hadoop programming environment, without the distributed processing capabilities of Hadoop. Nevertheless, due to its low overhead when running on a single computer (compared to Hadoop), madIS can easily handle tens of millions of rows on a single desktop/laptop computer.

In version 1.3 of madIS:

- A great deal of testing has been done on Linux, Windows and Mac OSX.
- madIS now only depends on APSW on all these systems, so it is easier to install.
- XMLPARSE was added. XMLPARSE processes its input in a streaming fashion, and has been tested with very large (~20+ GB) XML source files without problems.
- JPACK functions were added. Jpacks are now the preferred way to store a set of values in a single tuple. For easy viewing and exporting of jpacks, their format was based on the JSON format.
- Heavy testing under Windows and Mac OSX. The CLIPBOARD and CLIPOUT virtual tables work under all OSes.
- CLIPOUT and CLIPBOARD have been tested with Excel, Libre/Open Office Calc, and iWork Numbers.
- Functions that return tables can now easily be coded by using Python's generators (yield).
- A lot of completions (via TAB) have been added to mterm. Mterm's completion engine can automatically complete tables, column names, table index names, and database filenames in attach database.

In detail:

MTERM changes:

While using madIS's terminal (mterm), mterm completes (via TAB) column names, table names, etc. of the opened and attached databases.
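The release notes above mention that table-returning functions can now be coded with Python's generators. A minimal sketch of that pattern is shown below; the function name and row shape are illustrative assumptions, and madIS's actual registration machinery is not shown:

```python
def tolines(text):
    """Hypothetical madIS-style table function: each yielded
    tuple becomes one row of the function's result table."""
    for lineno, line in enumerate(text.splitlines(), 1):
        yield (lineno, line)

# Consuming the generator produces the table's rows one at a time,
# which keeps memory use low even for large inputs.
for row in tolines("first line\nsecond line"):
    print(row)
```

Because generators produce rows lazily, such functions fit naturally with madIS's streaming processing of virtual table chains.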
Also, by default mterm colours column separators, and if more than 4 columns are returned, mterm "tags" the columns with numbers:

mterm> select * from deficit;
[1|1 | People's Republic of China [3|272.500 |2010
[1|2 | Japan [3|166.500 |2010
[1|3 | Germany [3|162.300 |2010
--- Column names ---
[1|Rank [2|Country [3|CAB [4|Year

Note: in mterm the column number tags are coloured red in the above example.

XMLPARSE:

If I wished to retrieve the date and author of the madIS project's Source Changes ATOM feed:

mterm> select * from (XMLPARSE 'tt' select * from file('http://code.google.com/feeds/p/madis/hgchanges/basic') ) limit 3;
2011-07-25T14:07:07Z|est...@servum
2011-07-25T14:04:09Z|est...@servum
2011-07-22T14:08:11Z|est...@servum
--- Column names ---
[1|updated [2|author_name

Query executed in 0 min. 0 sec 543 msec
mterm>

In the above query, XMLPARSE is used in an "inverted form", which is easier to write when chaining virtual tables.

JPACKS:

Frequently, the need to store multiple values in a single tuple arises while processing data. Previously in madIS, many formats were used to store such multiple values (space separated, comma separated, tab separated). Now JPACKs are the recommended way to store multiple values. JPACKs are based on the JSON format, with the exception that a JPACK of a single value is the value itself.
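The single-value exception can be illustrated with a rough Python re-implementation. This is only a sketch of the convention, not madIS's actual jpack code:

```python
import json

def jpack(*values):
    # A single value packs to itself; multiple values pack to a
    # compact JSON list (the JPACK convention described above).
    if len(values) == 1:
        return str(values[0])
    return json.dumps(list(values), separators=(',', ':'))

def jsplit(packed):
    # Unpack: a valid JSON list splits into its elements, while a
    # bare single value unpacks to itself.
    try:
        loaded = json.loads(packed)
    except ValueError:
        return [packed]
    return loaded if isinstance(loaded, list) else [packed]

print(jpack('a'))       # a
print(jpack('a', 'b'))  # ["a","b"]
print(jsplit(jpack('a', 'b', 'c')))
```

The payoff of the exception is that single values stay readable and directly comparable in queries, while multiple values still round-trip through a well-defined JSON encoding.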
Some examples are presented below:

mterm> select jpack('a');
a
mterm> select jpack('a', 'b');
["a","b"]
mterm> select jsplit(jpack('a','b','c'));
a|b|c
mterm> select jsplitv(jpack('a','b','c'));
a
b
c
mterm> select jgroup(c1) from (select 'a' as c1 union select 'b');
["a","b"]

CLIPBOARD and CLIPOUT:

If a selection of data has been made in a web browser or a spreadsheet program, then by executing the following in mterm we can access the clipboard data as a table:

mterm> select * from CLIPBOARD();
Rank↓ |Country↓ |CAB (billion US dollars)↓ |Year↓
1 | People's Republic of China |272.500 |2010
2 | Japan |166.500 |2010
3 | Germany |162.300 |2010
--- Column names ---
[1|C1 [2|C2 [3|C3 [4|C4

Query executed in 0 min. 0 sec 204 msec

* For the above data I selected the top 3 rows while browsing http://en.wikipedia.org/wiki/List_of_sovereign_states_by_current_account_balance .

Similarly, if I wished to process the data and send them to a spreadsheet program, I could execute the following in mterm:

mterm> CLIPOUT select * from deficit;
1
--- Column names ---
[1|return_value

Query executed in 0 min. 0 sec 111 msec

and paste in the spreadsheet program. Both CLIPBOARD and CLIPOUT are virtual tables. CLIPOUT is used in an "inverted form", which is easier to write when chaining queries.

-- Lefteris