Re: [sqlite] Limit of attached databases

2013-06-04 Thread Roger Binns

On 04/06/13 08:14, Eleytherios Stamatogiannakis wrote:
> We have tried with both views and VTs but SQLite does not create
> automatic indexes on them at all. So right now, to be able to have
> automatic indexes from SQLite's side we materialize all Virtual Tables
> into plain tables:

There is no reason that the virtual table implementation can't create
automatic indexes.  In response to the xBestIndex calls, the indices don't
actually have to exist.  You just need to ensure the estimated cost is
relatively useful.  If SQLite then decides to use the index you can then
create it on the fly.
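The pattern Roger describes can be sketched outside SQLite in a few lines of Python. This is a toy model, not the real xBestIndex/xFilter C API: all names here are hypothetical, but it shows the idea that the planner-facing cost estimate can advertise an index that does not exist yet, and the index is only built the first time a chosen plan actually needs it.

```python
# Toy model of a "lazy index": best_index() quotes a low cost for a
# non-existent index; filter() builds the index on first use if the
# planner's chosen plan calls for it.  Hypothetical names throughout.

class LazyIndexedTable:
    def __init__(self, rows):
        self.rows = rows          # list of (key, value) tuples
        self.index = None         # not built until a plan needs it

    def best_index(self, constrained_column):
        # Pretend the index exists: quote a low cost so the planner picks it.
        if constrained_column == 0:
            return {"estimated_cost": len(self.rows) ** 0.5, "use_index": True}
        return {"estimated_cost": float(len(self.rows)), "use_index": False}

    def filter(self, plan, key):
        if plan["use_index"]:
            if self.index is None:            # create the index on the fly
                self.index = {}
                for k, v in self.rows:
                    self.index.setdefault(k, []).append((k, v))
            return self.index.get(key, [])
        return [(k, v) for k, v in self.rows if k == key]

t = LazyIndexedTable([(1, "a"), (2, "b"), (1, "c")])
plan = t.best_index(0)
rows = t.filter(plan, 1)    # the index is built here, on first use
```

In the real C implementation the same split applies: xBestIndex only reports estimated costs, and xFilter (or the first xFilter call per index) does the actual index construction.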

Roger

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Limit of attached databases

2013-06-04 Thread Eleytherios Stamatogiannakis

On 04/06/13 17:37, Simon Slavin wrote:


> On 4 Jun 2013, at 3:09pm, Eleytherios Stamatogiannakis wrote:
> 
>> Is there any way to go beyond the SQLITE_MAX_ATTACHED limit for *read only*
>> attached DBs?
> 
> See section 11 of
> 
> 
> 
> It's a 64-bit value, and two bits are already taken up.


Yes, I have seen it in SQLite's code. I considered changing it to a 
bitfield, but the problem is that this long int is used in various other 
places in SQLite's internals for transaction metadata.




> You can attach databases, copy data from them to the main database, then detach
> those and attach some others.  Or you can create a hierarchy of shards (each of
> 62 shards can point to up to 62 others).  Or you can rewrite your code so it
> never uses more than 62 shards no matter how many nodes are available.


Attaching and detaching is only useful when materializing the shards 
into a single table:


create table T (...);
attach 'shard1.db' as T1;
insert into T select * from T1.T;
detach T1;
attach 'shard2.db' as T2;
insert into T select * from T2.T;
...
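The attach/copy/detach loop above can be exercised with Python's stdlib sqlite3 module. The shard files, table name, and row counts below are invented for illustration; note the connection must be in autocommit mode, since SQLite does not allow ATTACH or DETACH inside an open transaction.

```python
# Materialize many shard DBs into one table by attaching them one at a
# time, staying under SQLITE_MAX_ATTACHED.  Shard names and contents
# are made up for the example; only the stdlib is used.
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
shards = []
for i in range(5):                       # pretend these came from 5 nodes
    path = os.path.join(tmpdir, f"shard{i}.db")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE T(x INTEGER)")
    con.executemany("INSERT INTO T VALUES (?)",
                    [(i * 10 + j,) for j in range(3)])
    con.commit()
    con.close()
    shards.append(path)

# Autocommit mode: ATTACH/DETACH are not allowed mid-transaction.
main = sqlite3.connect(":memory:", isolation_level=None)
main.execute("CREATE TABLE T(x INTEGER)")
for path in shards:                      # attach, copy, detach -- one at a time
    main.execute("ATTACH ? AS shard", (path,))
    main.execute("INSERT INTO T SELECT * FROM shard.T")
    main.execute("DETACH shard")

total = main.execute("SELECT count(*) FROM T").fetchone()[0]
print(total)    # 5 shards x 3 rows = 15
```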

How would a hierarchy of shards work? You cannot attach a DB onto 
another attached DB, which the hierarchy idea would require.


Also, only using 62 shards at a time can be very constrained/slow in our 
use case (see below).



> None of them good solutions, I'm afraid.


Yes :-(.


>> Also, is there any way for SQLite to create an automatic index on a view (or
>> Virtual Table), without having to first materialize the view (or VT)?
> 
> I believe that SQLite needs the data to be in one place (i.e. at least a
> virtual table) for the indexing routine to work.


We have tried with both views and VTs but SQLite does not create 
automatic indexes on them at all. So right now, to be able to have 
automatic indexes from SQLite's side we materialize all Virtual Tables 
into plain tables:


create temp table T as select * from UnionAllVT1;
create temp table G as select * from UnionAllVT2;
...

which doubles our I/O to process a single sharded table.

 - 1 full read + 1 full write of all data to materialize the UnionAllVT 
into a plain table.
 - 1 full read + 1 full write of the data in the materialized table to 
create the automatic index.


It would be very nice if the automatic index could be created directly 
from the UnionAllVT, but we haven't found a way to do it.



> If you're willing to put a bit of SQLite-only effort in, you could implement
> your own virtual table implementation that consulted data on each of your
> nodes.  This would be quite highly customised for your own application's
> requirements but it would mean you didn't have to do any attaching or detaching
> at all.  Your SQLite API calls could address your data as if it was all in one
> database file but SQLite would understand how data is partitioned between nodes
> and automatically gather it from all the necessary nodes.


We have already done this (creating VTs is very easy in madIS [*]). We 
have the UnionALL virtual table that scans over all the DB shards.


The above UnionALL VT only supports scans, and it is only used to 
materialize the shards into a regular table. Having per-shard indexes 
would be very costly, because each Filter on the UnionALL VT would 
need to be passed to all of the shards. A single automatic index works 
best.



> Another way to do it would be to implement your own VFS which would distribute
> over the nodes not at the row level but as if they were all one huge storage
> medium (i.e. like a RAID).


Each shard that we use is already a self-contained SQLite DB. We would 
need to change our whole approach to convert to disk-page-based 
sharding.


Thanks for your ideas.

l.

[*] 
https://code.google.com/p/madis/source/browse/src/functions/vtable/unionalldb.py




Re: [sqlite] Limit of attached databases

2013-06-04 Thread Michael Black
Oops...make that an unsigned int.

Change this declaration
#if SQLITE_MAX_ATTACHED>30
  typedef __uint128_t yDbMask;



Re: [sqlite] Limit of attached databases

2013-06-04 Thread Michael Black
Gcc does have a __int128_t and __uint128_t available if you're on 64-bit and
have a current enough gcc (I'm using 4.4.4 and this works on Linux and
Windows)
Looks like a fairly easy change in the code.
Unless somebody already knows that this won't work?

#include <stdio.h>

int main(void)
{
    __uint128_t i128;
    printf("i128=%zu\n", sizeof(i128));
    return 0;
}

which prints: i128=16

Change this to 126
#ifndef SQLITE_MAX_ATTACHED
# define SQLITE_MAX_ATTACHED 126
#endif

Change this declaration
#if SQLITE_MAX_ATTACHED>30
  typedef __int128_t yDbMask;

Change this to 126 instead of 62
#if SQLITE_MAX_ATTACHED<0 || SQLITE_MAX_ATTACHED>62
# error SQLITE_MAX_ATTACHED must be between 0 and 62
#endif

And see if it works OK for you.  I have no way to test this but it does
compile for me.

I don't see why it wouldn't work.




Re: [sqlite] Limit of attached databases

2013-06-04 Thread Simon Slavin

On 4 Jun 2013, at 3:09pm, Eleytherios Stamatogiannakis  wrote:

> Is there any way to go beyond the SQLITE_MAX_ATTACHED limit for *read only* 
> attached DBs?

See section 11 of



It's a 64-bit value, and two bits are already taken up.

You can attach databases, copy data from them to the main database, then detach 
those and attach some others.  Or you can create a hierarchy of shards (each of 
62 shards can point to up to 62 others).  Or you can rewrite your code so it 
never uses more than 62 shards no matter how many nodes are available.

None of them good solutions, I'm afraid.

> Also, is there any way for SQLite to create an automatic index on a view (or 
> Virtual Table), without having to first materialize the view (or VT)?

I believe that SQLite needs the data to be in one place (i.e. at least a 
virtual table) for the indexing routine to work.

If you're willing to put a bit of SQLite-only effort in, you could implement 
your own virtual table implementation that consulted data on each of your 
nodes.  This would be quite highly customised for your own application's 
requirements but it would mean you didn't have to do any attaching or detaching 
at all.  Your SQLite API calls could address your data as if it was all in one 
database file but SQLite would understand how data is partitioned between nodes 
and automatically gather it from all the necessary nodes.



Another way to do it would be to implement your own VFS which would distribute 
over the nodes not at the row level but as if they were all one huge storage 
medium (i.e. like a RAID).



I don't know which, if either, to recommend.  This kind of programming is 
beyond me, but someone into C and with a good understanding of your farm should 
be able to do it.

Simon.


[sqlite] Limit of attached databases

2013-06-04 Thread Eleytherios Stamatogiannakis

Hi,

During our work on a distributed processing system (which uses SQLite 
shards), we have hit the SQLITE_MAX_ATTACHED limit of attached DBs.


The way we use SQLite for distributed processing [*], is the following:
 - Each table is sharded into multiple SQLite DBs on different nodes of 
the cluster.
 - To process a query, we run on each shard a query which produces 
multiple sharded SQLite result DBs.
 - We redistribute the result DBs across the cluster; the next set of 
cluster nodes attaches all the input shard SQLite DBs and creates a 
temp view that unions all the input DB shards.
 - It then executes a query on the views that produces new result DB shards
 - and so on

We recently got access to a cluster of 64 nodes and it is very easy now 
to hit the SQLITE_MAX_ATTACHED limit (1 DB shard gets produced per node).
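The limit is a compile-time setting, so its value varies by build (the default is well below 62). A small stdlib-only probe, with invented file names, finds the local build's limit empirically rather than assuming it:

```python
# Attach databases until SQLite refuses, to discover the effective
# attached-DB limit of the local build without assuming its value.
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
con = sqlite3.connect(":memory:", isolation_level=None)

attached = 0
try:
    for i in range(200):                 # more than any build allows
        p = os.path.join(tmpdir, f"db{i}.db")
        con.execute(f"ATTACH ? AS a{i}", (p,))
        attached += 1
except sqlite3.OperationalError as e:    # "too many attached databases"
    print(f"hit the limit after {attached} attaches: {e}")
```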


So the question that I have is:

Is there any way to go beyond the SQLITE_MAX_ATTACHED limit for *read 
only* attached DBs?


Also, is there any way for SQLite to create an automatic index on a view 
(or Virtual Table), without having to first materialize the view (or VT)?


Thanks in advance.

Lefteris Stamatogiannakis.

[*] The same processing ideas are used in hadapt:

http://hadapt.com/

which uses Postgres for the DB shards.