Re: [sqlite] Feature request: extend the IN operator

2011-08-08 Thread Alexey Pechnikov
2011/8/8 Simon Slavin :
> You're quite right.  If someone was going to write that functionality into a 
> SQLite app in real life they'd probably use 'LIKE' or 'GLOB' and store all 
> the possibilities in one row.

FTS3/FTS4 is better as index for lists. See my test script for Igor.
We can fast search any id by using "match" operator on FTS table.

P.S. FTS table has nice scalability. Check insertion a lot of records
and insertion speed is constant. I did  try 400 millions of records
(and did get database size > 100 Gb).

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Edzard Pasma
On 7-aug-2011, om 13:17, Alexey Pechnikov wrote:

> Is it possible to support construction like to
>
> where x IN "1 2 3"
>
> There are a lot of situations when is very useful to store list of
> identifiers in table field.
> I know about the intarray virtual table but it's need
> constructor/destructor calls (and
> produce segfaults with the autorizer function).
>
> Does somebody interesting in this functionality too?
>
> --  
> Best regards, Alexey Pechnikov.
> http://pechnikov.tel/

Hello,

I would be interested in this functionality too, or more generalized  
in an inverse function for group_concat. Earlier, group_split was  
proposed. For SQLite I imagine this to become a virtual table. I  
created one, through APSW, with the following schema:

 group_split (line, words)

(when I use .schema group_split in the APSW shell, it  only display  
the module name)

The dialog below shows its usage. I'm not completely happy about it  
yet. The predefined column names (line and words) seem artificial. It  
lacks the option to specify a seperator character. I imagine a third  
column but that will meke the definition of the filter much more  
difficult. A built-in soultion would be welcome.

Best regards,  Edzard Pasma.

 SQLite version 3.7.7.1 (APSW 3.7.7.1-r1)
 Enter ".help" for instructions
 Enter SQL statements terminated with a ";"
 sqlite> .schema group_split
 CREATE VIRTUAL TABLE group_split USING mymod();
 sqlite> create table t1 (grp, elem);
 sqlite> insert into t1 values ('g1', 'aap');
 sqlite> insert into t1 values ('g1', 'noot');
 sqlite> insert into t1 values ('g1', 'mies');
 sqlite> insert into t1 values ('g2', 'wim');
 sqlite> insert into t1 values ('g2', 'zus');
 sqlite> insert into t1 values ('g2', 'jet');
 sqlite> select grp, group_concat (elem) from t1 group by grp;
 g1|aap,noot,mies
 g2|wim,zus,jet
 sqlite> select grp, word
 from (select grp, group_concat (elem) elems from t1 group by grp)
 join group_split on line = elems;
 g1|aap
 g1|noot
 g1|mies
 g2|wim
 g2|zus
 g2|jet

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Simon Slavin

On 7 Aug 2011, at 10:07pm, P Kishor wrote:

> I will let you heavyweights duke it out, but re. the above point,
> SQLite (and more databases) have a per row system overhead that can
> very quickly overweight the actual data if the data are too granular
> with each row storing just a tiny amount. For example, the overhead
> for the cells in a raster dataset, if stored one per row, will quickly
> surpass the size of the actual data.

You're quite right.  If someone was going to write that functionality into a 
SQLite app in real life they'd probably use 'LIKE' or 'GLOB' and store all the 
possibilities in one row.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread P Kishor
On Mon, Aug 8, 2011 at 2:19 AM, Igor Tandetnik  wrote:
> Alexey Pechnikov  wrote:
>> 2011/8/7 Simon Slavin :
>>> You don't need to. The SQLite expressions I listed tell you how to achieve 
>>> the result without doing that.
>>
>> Really? And how can you perform the query like to:
>>
>> sqlite> create table t1(ids text);
>> sqlite> insert into t1 (ids) values ('1 2 3');
>> sqlite> insert into t1 (ids) values ('2 3 4');
>> sqlite> insert into t1 (ids) values ('3 4 5');
>
> I can normalize this table, then use joins.
>
>> A simple calculation: if each list of identifiers have about 1000
>> items and there are
>> 1 000 000 lists than the table of relations (t1.rowid, t2.rowid) will
>> have 1 000 000 000
>> rows!
>
> One way or the other, you need to store 1,000,000,000 pieces of information. 
> Why is it that storing them in 1,000,000 rows holding 1000 items each is 
> unremarkable, but storing them in 1,000,000,000 rows holding one item each is 
> exclamation point-worthy?
>


I will let you heavyweights duke it out, but re. the above point,
SQLite (and more databases) have a per row system overhead that can
very quickly overweight the actual data if the data are too granular
with each row storing just a tiny amount. For example, the overhead
for the cells in a raster dataset, if stored one per row, will quickly
surpass the size of the actual data.


> If reducing the number of rows is your ultimate goal, why don't you create a 
> table with one row, holding the whole data structure encoded into one huge 
> string or blob? That'll best optimize the one metric you seem to believe 
> matters the most.
>
>> It's too slow
>
> ... when compared to what alternative? Linearly scanning all those lists?
>
>> Of cource all
>> systems store lists of
>> identifiers in similar situations.
>
> I find it hard to believe that every single system does - surely systems 
> exist that do not denormalize their data this way. In fact, I doubt the 
> design you describe is common, let alone universally accepted.
> --
> Igor Tandetnik
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Puneet Kishor http://www.punkish.org
Carbon Model http://carbonmodel.org
Charter Member, Open Source Geospatial Foundation http://www.osgeo.org
Science Commons Fellow, http://sciencecommons.org/about/whoweare/kishor
Nelson Institute, UW-Madison http://www.nelson.wisc.edu
---
Assertions are politics; backing up assertions with evidence is science
===
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Alexey Pechnikov
Igor, how about simple test? Normalized database is big and slow.

$ time ./test_norm_idx.tcl
real32m54.978s
user32m14.885s
sys 0m39.842s

$ time ./test_idx.tcl
real7m19.005s
user6m55.226s
sys 0m11.717s

$ ls -lh *db
2,1G test_idx.db
7,1G test_norm_idx.db


test_idx.tcl
=
#!/usr/bin/tclsh8.5
package require sqlite3

sqlite3 db [string map {.tcl .db} $argv0]
db eval {PRAGMA page_size=8192}

db eval {
create table parent(dt DATETIME, user_id INTEGER);
create index parent_dt_idx on parent(dt);
create index parent_user_id_idx on parent(user_id);
create virtual table parent_fts using fts4(childs TEXT);}
db transaction {
for {set i 1} {$i<=10} {incr i} {
set time [clock microseconds]
set childs ""
for {set j [expr {$i*1000}]} {$j<=[expr {$i*1000+1000}]} {incr j} {
lappend childs $j
}
db eval {insert into parent(dt, user_id) values ($time, 1)}
db eval {insert into parent_fts(childs) values ($childs)}
}
}


test_norm_idx.tcl
=
#!/usr/bin/tclsh8.5
package require sqlite3

# test normalized
sqlite3 db [string map {.tcl .db} $argv0]
db eval {PRAGMA page_size=8192}
db eval {create table link(dt DATETIME, user_id INTEGER, parent_id
INTEGER, child_id INTEGER);
create index link_child_id_idx on link(child_id);
create index link_dt_idx on link(dt);
create index link_user_id_idx on link(user_id);}
db transaction {
for {set i 1} {$i<=1} {incr i} {
set time [clock microseconds]
db eval {insert into link(dt, user_id, parent_id, child_id)
values ($time, 1, $i%1000, $i)}
}
}


P.S. With versioning of all records we need some additional fields and
normalized database
is very big and very slow.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Igor Tandetnik
Alexey Pechnikov  wrote:
> 2011/8/7 Simon Slavin :
>> You don't need to. The SQLite expressions I listed tell you how to achieve 
>> the result without doing that.
> 
> Really? And how can you perform the query like to:
> 
> sqlite> create table t1(ids text);
> sqlite> insert into t1 (ids) values ('1 2 3');
> sqlite> insert into t1 (ids) values ('2 3 4');
> sqlite> insert into t1 (ids) values ('3 4 5');

I can normalize this table, then use joins.

> A simple calculation: if each list of identifiers have about 1000
> items and there are
> 1 000 000 lists than the table of relations (t1.rowid, t2.rowid) will
> have 1 000 000 000
> rows!

One way or the other, you need to store 1,000,000,000 pieces of information. 
Why is it that storing them in 1,000,000 rows holding 1000 items each is 
unremarkable, but storing them in 1,000,000,000 rows holding one item each is 
exclamation point-worthy?

If reducing the number of rows is your ultimate goal, why don't you create a 
table with one row, holding the whole data structure encoded into one huge 
string or blob? That'll best optimize the one metric you seem to believe 
matters the most.

> It's too slow

... when compared to what alternative? Linearly scanning all those lists?

> Of cource all
> systems store lists of
> identifiers in similar situations.

I find it hard to believe that every single system does - surely systems exist 
that do not denormalize their data this way. In fact, I doubt the design you 
describe is common, let alone universally accepted.
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Alexey Pechnikov
2011/8/7 Simon Slavin :
> You don't need to.  The SQLite expressions I listed tell you how to achieve 
> the result without doing that.

Really? And how can you perform the query like to:

sqlite> create table t1(ids text);
sqlite> insert into t1 (ids) values ('1 2 3');
sqlite> insert into t1 (ids) values ('2 3 4');
sqlite> insert into t1 (ids) values ('3 4 5');
sqlite> create table t2(name text);
sqlite> insert into t2 (name) values ('name1');
sqlite> insert into t2 (name) values ('name2');
sqlite> insert into t2 (name) values ('name3');
sqlite> insert into t2 (name) values ('name4');
sqlite> insert into t2 (name) values ('name5');
sqlite> select * from t2 where rowid in (select ids from t1 where rowid=2);

A simple calculation: if each list of identifiers have about 1000
items and there are
1 000 000 lists than the table of relations (t1.rowid, t2.rowid) will
have 1 000 000 000
rows! It's too slow and is not useful in real world. Of cource all
systems store lists of
identifiers in similar situations.

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Simon Slavin

On 7 Aug 2011, at 3:32pm, Alexey Pechnikov wrote:

> 2011/8/7 Stephan Beal :
> 
>> Also keep in mind that sqlite3 tries, to a large degree, to be compatible
>> with ANSI SQL, and (IN "A B C"), in the form you describe, is not
>> ANSI-specified.
> 
> Yes, but in PostgreSQL (as example) we can create user-defined
> function returns table
> from the list. SQLite can't do it and is impossible to use user extension too.

You don't need to.  The SQLite expressions I listed tell you how to achieve the 
result without doing that.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Alexey Pechnikov
2011/8/7 Stephan Beal :

> Also keep in mind that sqlite3 tries, to a large degree, to be compatible
> with ANSI SQL, and (IN "A B C"), in the form you describe, is not
> ANSI-specified.

Yes, but in PostgreSQL (as example) we can create user-defined
function returns table
from the list. SQLite can't do it and is impossible to use user extension too.

P.S. Example for PostgreSQL:

select list2items('a b c');

CREATE OR REPLACE FUNCTION public.list2items(in_list text)
  RETURNS SETOF text AS
$BODY$

SELECT lindex($1, s) FROM generate_series(0,llength($1)-1) AS s;

$BODY$
  LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION public.lindex(list text, index int4)
  RETURNS text AS
$BODY$

  return [lindex $1 $2]

$BODY$
  LANGUAGE 'pltcl' IMMUTABLE;

CREATE OR REPLACE FUNCTION public.llength(list text)
  RETURNS int4 AS
$BODY$

  return [llength $1]

$BODY$
  LANGUAGE 'pltcl' IMMUTABLE;

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Alexey Pechnikov
2011/8/7 Simon Slavin :
> For example,
>
> SELECT * FROM cars WHERE cars.name IN ('Corolla', 'Fiesta')

create table t (names TEXT);
insert into t(name) values ('Corolla Fiesta');
SELECT * FROM cars WHERE cars.name IN (select names from t where rowid=1);


-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Simon Slavin

On 7 Aug 2011, at 12:17pm, Alexey Pechnikov wrote:

> Is it possible to support construction like to
> 
> where x IN "1 2 3"

How does this differ on the IN operator ?



For example,

SELECT * FROM cars WHERE cars.name IN ('Corolla', 'Fiesta')

> There are a lot of situations when is very useful to store list of
> identifiers in table field.

SQLite supports this:

SELECT * FROM cars WHERE cars.name IN (SELECT names FROM models WHERE 
type='obsolete')

or even

SELECT * FROM cars WHERE cars.name IN models.names

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Stephan Beal
On Sun, Aug 7, 2011 at 1:17 PM, Alexey Pechnikov wrote:

> There are a lot of situations when is very useful to store list of
> identifiers in table field.
>

The problem with the concept of "list as a string" is that it implies a
specific token separator, and there is no single universal solution to that
problem. As soon as someone implements the example you've shown, someone
else will say, "but i have a semicolon-separated list..." Ad nauseum.

Also keep in mind that sqlite3 tries, to a large degree, to be compatible
with ANSI SQL, and (IN "A B C"), in the form you describe, is not
ANSI-specified.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: extend the IN operator

2011-08-07 Thread Max Vlasov
On Sun, Aug 7, 2011 at 3:17 PM, Alexey Pechnikov wrote:

> Is it possible to support construction like to
>
> where x IN "1 2 3"
> ...
>
> Does somebody interesting in this functionality too?
>
>

I needed one, I just implemented mysql find_in_set as a user function. there
it was comma-delimited, but it's not that important, I'm sure you can change
it to any other symbol in your data

Max
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Feature request: extend the IN operator

2011-08-07 Thread Alexey Pechnikov
Is it possible to support construction like to

where x IN "1 2 3"

There are a lot of situations when is very useful to store list of
identifiers in table field.
I know about the intarray virtual table but it's need
constructor/destructor calls (and
produce segfaults with the autorizer function).

Does somebody interesting in this functionality too?

-- 
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users