Re: [sqlite] Union all writting on /var/tmp

2012-10-30 Thread Nico Williams
Also, as far as I'm concerned, if the choice is "more optimizations in
SQLite3" or "more work on SQLite4", then put me down for the latter.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-30 Thread Nico Williams
On Mon, Oct 29, 2012 at 8:08 PM, Richard Hipp  wrote:
> The reason for using a temp table for UNION ALL in a subquery is because
> that is the path of least resistance.  [...]

For what it's worth, UNION ALL is generally understood to be a major
optimization over UNION because no temp table should be required to
implement UNION ALL;.  Also, IIUC, INTERSECT and EXCEPT could also be
run without temp tables by converting them to slightly different
queries: SELECT ... FROM source1 ... WHERE EXISTS (SELECT ... FROM
source2 ...).  And UNION ALL queries could be re-written into a UNION
where the second query is an EXCEPT query (all the rows in the second
that are not in the first).  INTERSECT and EXCEPT are not used very
frequently, but UNION ALL is used quite frequently, and probably much
more so than UNION: UNION ALL is the logical operator for building
VIEWs to implement a SQL equivalent of virtual classes, with each
virtual class implemented as the UNION ALL of several queries.  It
would be quite worthwhile to have UNION ALL not use a temp table.

Incidentally, there are a lot of possible optimizations where a query
could be re-written differently where the optimizer could be external
to SQLite, using a parser that objectifies SQL, an analyzer, and a
re-writer that outputs SQL to be parsed by SQLite.  Optimizations like
GCSE, OR -> UNION, INTERSET/EXECPT -> SELECT ... WHERE EXISTS / NOT
EXISTS, and others.  But UNION ALL cannot be so re-written, and that,
I think, is a good argument for having this optimization in SQLite
while not necessarily having the others.  I've been tempted before to
write at least the objectifying parser for this.

> At http://www.sqlite.org/src/info/7af3acbbd4 there is a patch that adds the
> optimization to avoid using a temp table for your UNION ALL queries.  This
> one small patch represents about 12 hours of intense work, so far.  Much
> more work will be required to get the patch performing to our release
> standards.  All of this effort on your behalf you are receiving for free.
> In return, we ask two things:

We all appreciate this very much.  It is a lot of work, I know.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-30 Thread Richard Hipp
On Tue, Oct 30, 2012 at 7:39 AM, Eleytherios Stamatogiannakis <
est...@gmail.com> wrote:

>
> Mr. Hipp please excuse my attitude on my first email
>

All is forgotten.  Thank you for bringing this optimization opportunity to
our attention!

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-30 Thread Eleytherios Stamatogiannakis

Hello,

Mr. Hipp please excuse my attitude on my first email (the one your 
replied to). It came after 3 days of intense pressure to find out what 
the problem of machines coming to a crawl whenever a particular query 
with sufficiently large union-ed all tables was run.


Due to the quality of SQLite it literally was the last thing that i 
looked at. I even had prepared emails to other people asking them to 
look at their software, and to not to buffer on the hard disk so much.


Concerning your effort to fix it. I also suspected about the "path of 
least resistance", considering "union all". This is why in my next 
emails i mainly asked for a switch to change the "/var/tmp" path. The 
other thing that i've asked for, was for the documentation to have a 
warning about union all's behaviour. I believed that warning could save 
a lot of time and effort for other people that tripped on the same thing 
as me.


Thank you very very much for your fix. I'm glad that you put the 
considerable effort to it. My Phd was in databases, so i can understand 
how much effort this fix required. I have already downloaded the patch 
and i'll test it asap.


Due to me also working/having worked on Open Source software (madIS, 
Rigs of Rods), i realize your second point in my skin. Nevertheless 
sometimes, i also slide to this kind of behaviour. I'll try to be on 
guard against it in the future.


Best regards,

lefteris.

On 30/10/12 03:08, Richard Hipp wrote:



On Fri, Oct 26, 2012 at 2:16 PM, Eleytherios Stamatogiannakis
> wrote:

I have been observing the following freaky behaviour of SQLite. When
i run:

select count(*) from (select * from huge_table union all select *
from huge_table);

Sqlite starts writting in /var/tmp/ a file like:

  /var/tmp/etilqs___gblRd6vUPcx91Hl, the root partition of fills up
and an error is raised.

Why does SQLite 3.7.14.1 need to write at all when doing union all?
It seems to me that there is no reason for doing so.


The reason for using a temp table for UNION ALL in a subquery is because
that is the path of least resistance.  The same identical code can
handle UNION ALL, UNION, INTERSECT, EXCEPT and various other cases.
Some queries (such as your UNION ALL) can in fact do without the temp
table.  But those are special cases that have to coded separately.
Adding, testing, and maintaining that extra code involves a lot of
work.  And the extra code risks introducing bugs that might appear even
for people who are not doing a UNION ALL in a subquery.  And in over 12
years of use, in over a million different applications, with over 2
billion deployments, nobody has ever before requested this optimization.

At http://www.sqlite.org/src/info/7af3acbbd4 there is a patch that adds
the optimization to avoid using a temp table for your UNION ALL
queries.  This one small patch represents about 12 hours of intense
work, so far.  Much more work will be required to get the patch
performing to our release standards.  All of this effort on your behalf
you are receiving for free.  In return, we ask two things:

(1) Please download and test the patch and report any problems,
including performance problems.

(2) Please learn to be less grumpy, demanding, and condescending when
requesting help with software towards which you have contributed
nothing.  You have received this latest patch, and indeed all of SQLite,
by grace.  Therefore, please extend the same grace toward others.


Best regards,

lefteris.
_
sqlite-users mailing list
sqlite-users@sqlite.org 
http://sqlite.org:8080/cgi-__bin/mailman/listinfo/sqlite-__users





--
D. Richard Hipp
d...@sqlite.org 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-29 Thread Richard Hipp
On Fri, Oct 26, 2012 at 2:16 PM, Eleytherios Stamatogiannakis <
est...@gmail.com> wrote:

> I have been observing the following freaky behaviour of SQLite. When i run:
>
> select count(*) from (select * from huge_table union all select * from
> huge_table);
>
> Sqlite starts writting in /var/tmp/ a file like:
>
>  /var/tmp/etilqs_**gblRd6vUPcx91Hl, the root partition of fills up and an
> error is raised.
>
> Why does SQLite 3.7.14.1 need to write at all when doing union all? It
> seems to me that there is no reason for doing so.
>

The reason for using a temp table for UNION ALL in a subquery is because
that is the path of least resistance.  The same identical code can handle
UNION ALL, UNION, INTERSECT, EXCEPT and various other cases.  Some queries
(such as your UNION ALL) can in fact do without the temp table.  But those
are special cases that have to coded separately.  Adding, testing, and
maintaining that extra code involves a lot of work.  And the extra code
risks introducing bugs that might appear even for people who are not doing
a UNION ALL in a subquery.  And in over 12 years of use, in over a million
different applications, with over 2 billion deployments, nobody has ever
before requested this optimization.

At http://www.sqlite.org/src/info/7af3acbbd4 there is a patch that adds the
optimization to avoid using a temp table for your UNION ALL queries.  This
one small patch represents about 12 hours of intense work, so far.  Much
more work will be required to get the patch performing to our release
standards.  All of this effort on your behalf you are receiving for free.
In return, we ask two things:

(1) Please download and test the patch and report any problems, including
performance problems.

(2) Please learn to be less grumpy, demanding, and condescending when
requesting help with software towards which you have contributed nothing.
You have received this latest patch, and indeed all of SQLite, by grace.
Therefore, please extend the same grace toward others.


>
> Best regards,
>
> lefteris.
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Elefterios Stamatogiannakis

Sorry i didn't realize before that you had:

select * from (... union all ...)

Try with a count(*) as such:

select count(*) from (... union all ...)

And you'll see that both union and "union all" will create a temp file. 
Union needs the temp file to remove the duplicates. Union All doesn't 
need it at all.


l.

On 29/10/2012 10:37 μμ, Clemens Ladisch wrote:

Eleytherios Stamatogiannakis wrote:

My understanding (and what my experiments have shown) is that in both
cases "COMPOUND SUBQUERIES 1 AND 2" will write a temp file to /var/tmp.


Okay, let's create a test database ...

   $ strace -e trace=open sqlite3 test.db
   [...]
   sqlite> create table t(x);
   open("/tmp/test.db-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 4
   open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
   open("/tmp", O_RDONLY|O_CLOEXEC)= 5
   sqlite> begin;
   sqlite> insert into t values('long string');
   open("/tmp/test.db-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 4
   sqlite> insert into t select * from t;
   sqlite> insert into t select * from t;
   sqlite> insert into t select * from t;

... that becomes so big that the subquery overflows the cache:

   [...]
   sqlite> insert into t select * from t;
   open("/var/tmp/etilqs_Oekg82a6826YGdz", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 5
   sqlite> commit;

Only the UNION query claims to use temporary storage for the subquery:
(These two queries are constructed so that they have no result records.)

   sqlite> explain query plan select * from (select rowid, x from t union all 
select rowid, x from t) where x = '';
   1|0|0|SCAN TABLE t (~10 rows)
   2|0|0|SCAN TABLE t (~10 rows)
   0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
   sqlite> explain query plan select * from (select rowid, x from t union 
select rowid, x from t) where x = '';
   2|0|0|SCAN TABLE t (~100 rows)
   3|0|0|SCAN TABLE t (~100 rows)
   1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (UNION)
   0|0|0|SCAN SUBQUERY 1 (~20 rows)

When executing them, only UNION uses temporary files:

   sqlite> select * from (select rowid, x from t union all select rowid, x from 
t) where x = '';
   sqlite> select * from (select rowid, x from t union select rowid, x from t) 
where x = '';
   open("/var/tmp/etilqs_QNvTpzSHSedfFFM", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 4
   open("/var/tmp/etilqs_RiTrAL6vrIxpnOu", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 5


Now, what UNION ALL query wants to use a temporary table?
I'd guess that SQLite needs to save the result for some other reasons.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Clemens Ladisch
Eleytherios Stamatogiannakis wrote:
> My understanding (and what my experiments have shown) is that in both
> cases "COMPOUND SUBQUERIES 1 AND 2" will write a temp file to /var/tmp.

Okay, let's create a test database ...

  $ strace -e trace=open sqlite3 test.db
  [...]
  sqlite> create table t(x);
  open("/tmp/test.db-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 4
  open("/dev/urandom", O_RDONLY|O_CLOEXEC) = 5
  open("/tmp", O_RDONLY|O_CLOEXEC)= 5
  sqlite> begin;
  sqlite> insert into t values('long string');
  open("/tmp/test.db-journal", O_RDWR|O_CREAT|O_CLOEXEC, 0644) = 4
  sqlite> insert into t select * from t;
  sqlite> insert into t select * from t;
  sqlite> insert into t select * from t;

... that becomes so big that the subquery overflows the cache:

  [...]
  sqlite> insert into t select * from t;
  open("/var/tmp/etilqs_Oekg82a6826YGdz", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 5
  sqlite> commit;

Only the UNION query claims to use temporary storage for the subquery:
(These two queries are constructed so that they have no result records.)

  sqlite> explain query plan select * from (select rowid, x from t union all 
select rowid, x from t) where x = '';
  1|0|0|SCAN TABLE t (~10 rows)
  2|0|0|SCAN TABLE t (~10 rows)
  0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
  sqlite> explain query plan select * from (select rowid, x from t union select 
rowid, x from t) where x = '';
  2|0|0|SCAN TABLE t (~100 rows)
  3|0|0|SCAN TABLE t (~100 rows)
  1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (UNION)
  0|0|0|SCAN SUBQUERY 1 (~20 rows)

When executing them, only UNION uses temporary files:

  sqlite> select * from (select rowid, x from t union all select rowid, x from 
t) where x = '';
  sqlite> select * from (select rowid, x from t union select rowid, x from t) 
where x = '';
  open("/var/tmp/etilqs_QNvTpzSHSedfFFM", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 4
  open("/var/tmp/etilqs_RiTrAL6vrIxpnOu", 
O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0600) = 5


Now, what UNION ALL query wants to use a temporary table?
I'd guess that SQLite needs to save the result for some other reasons.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Eleytherios Stamatogiannakis
My understanding (and what my experiments have shown) is that in both 
cases "COMPOUND SUBQUERIES 1 AND 2" will write a temp file to /var/tmp.


IMHO, the documentation should warn about this writing behaviour, 
because for the second case (union all) it isn't expected/predictable 
because fully buffering is not needed.


lefteris.

On 29/10/12 20:41, Clemens Ladisch wrote:

Eleytherios Stamatogiannakis wrote:

"union all" works exactly like plain "union". It always materializes its input.


sqlite> explain query plan select 1 union select 2;
sele  order  from  deta
  -    
0 0  0 COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
sqlite> explain query plan select 1 union all select 2;
sele  order  from  deta
  -    
0 0  0 COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

It does not.  (In the full "explain" output, "OpenEphemeral" is missing.)
Neither with real tables.

What particular query behaves unexpectedly for you?


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Clemens Ladisch
Eleytherios Stamatogiannakis wrote:
> "union all" works exactly like plain "union". It always materializes its 
> input.

sqlite> explain query plan select 1 union select 2;
sele  order  from  deta
  -    
0 0  0 COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
sqlite> explain query plan select 1 union all select 2;
sele  order  from  deta
  -    
0 0  0 COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

It does not.  (In the full "explain" output, "OpenEphemeral" is missing.)
Neither with real tables.

What particular query behaves unexpectedly for you?


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Eleytherios Stamatogiannakis

Look at what that page says about "union all" (emphasis on *all*)

--SNIP--
"Note that the UNION ALL operator for compound queries does not use 
transient indices by itself (though of course the right and left 
subqueries of the UNION ALL might use transient indices depending on how 
they are composed.)"

--/SNIP--

At least to my eyes, above says what i was expecting before realizing 
what actually happens, that "union all" tries to not materialize its 
results when possible.


What the truth is, concerning materialization, is that in SQLite "union 
all" works exactly like plain "union". It always materializes its input.


lefteris.

On 29/10/12 16:37, Clemens Ladisch wrote:

Eleytherios Stamatogiannakis wrote:

Can a warning about "union all"'s behaviour of buffering everything in /var/tmp 
be added in SQLite's documentation?


Like this?  http://www.sqlite.org/tempfiles.html


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Clemens Ladisch
Eleytherios Stamatogiannakis wrote:
> Can a warning about "union all"'s behaviour of buffering everything in 
> /var/tmp be added in SQLite's documentation?

Like this?  http://www.sqlite.org/tempfiles.html


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Union all writting on /var/tmp documentation warning

2012-10-29 Thread Eleytherios Stamatogiannakis
Can a warning about "union all"'s behaviour of  buffering everything in 
/var/tmp be added in SQLite's documentation?


I think that such a warning could save a lot of time for other SQLite 
users that trip over the same thing as i did.


Thank you,

lefteris.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-27 Thread Elefterios Stamatogiannakis
Thank you Simon. Your solution would work for the example i gave. 
Nevertheless my problem is still more complex because i also use SQLite 
as a generic streaming engine (yes i know, SQLite wasn't designed for 
doing things like that).


Appart from input VTs (FILE) we also have output VTs like so [*]:

OUTPUT '10.0.0.1:8080' select upper(c1), c2, lower(c3) from
(
select * from file('http://www.foo.com/list1.tsv.gz')
union all
select * from file('http://www.foo.com/list2.tsv.gz')
)
where c2!=c4;

Internally this is implemented as a VT (OUTPUT) that takes as input a 
query and sends its results to another SQLite. The beauty of the concept 
is that a streaming query flow can go from machine to machine and still 
the transactional properties hold. If anything happens anywhere in the 
query's path, all of the machines will rollback due to the transactional 
properties of SQLite.


In addition to above we are developing a parallel processing engine 
(like Hadoop) where we use SQLite "chunks" for processing and data 
storage. So tables are split (or collected) into multiple SQLite DBs and 
transferred to multiple machines, where the queries will be executed in 
parallel [**].


For above we heavilly use UNION ALLs over the attached DBs to scan over 
the many chunks of a table.


A UNION ALL that unnecessarily buffers to the disk its inputs (for a 
single scan over them), is very nasty, performance wise, to all of the 
above machinations.


I can guess that changing UNION ALL processing for single scans, isn't 
so easy, nevertheless it would be nice if the /var/tmp buffering 
directory could be at least changed.


lefteris.

[*] In practise we pipe to named pipes that point to other machines.

[**] Other projects based on the same idea are:

http://hadoopdb.sourceforge.net/

and the company that sprang from above:

http://hadapt.com/


On 27/10/2012 3:05 πμ, Simon Slavin wrote:


Does this work instead ?

CREATE TABLE t ([[whatever columns you want]]);
INSERT INTO t SELECT upper(c1),c2,lower(c3) FROM 
file('http://www.foo.com/list1.tsv.gz') WHERE c2!=c4;
INSERT INTO t SELECT upper(c1),c2,lower(c3) FROM 
file('http://www.foo.com/list2.tsv.gz') WHERE c2!=c4;

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-26 Thread Simon Slavin

On 26 Oct 2012, at 10:28pm, Elefterios Stamatogiannakis  
wrote:

> create table t as
> select upper(c1), c2, lower(c3) from
> (
> select * from file('http://www.foo.com/list1.tsv.gz')
> union all
> select * from file('http://www.foo.com/list2.tsv.gz')
> )
> where c2!=c4;

Does this work instead ?

CREATE TABLE t ([[whatever columns you want]]);
INSERT INTO t SELECT upper(c1),c2,lower(c3) FROM 
file('http://www.foo.com/list1.tsv.gz') WHERE c2!=c4;
INSERT INTO t SELECT upper(c1),c2,lower(c3) FROM 
file('http://www.foo.com/list2.tsv.gz') WHERE c2!=c4;

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-26 Thread Igor Tandetnik

On 10/26/2012 5:28 PM, Elefterios Stamatogiannakis wrote:

The real query in madIS looks like the following:

create table t as
select upper(c1), c2, lower(c3) from
(
select * from file('http://www.foo.com/list1.tsv.gz')
union all
select * from file('http://www.foo.com/list2.tsv.gz')
)
where c2!=c4;


Perhaps something like this instead:

create table t as
select upper(c1), c2, lower(c3)
from file('http://www.foo.com/list1.tsv.gz')
where c2!=c4;

insert into t
select upper(c1), c2, lower(c3)
from file('http://www.foo.com/list2.tsv.gz')
where c2!=c4;


Could the /var/tmp position that it writes to, be changed to another directory?


http://sqlite.org/c3ref/temp_directory.html

--
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-26 Thread Elefterios Stamatogiannakis
Thank you for answering Mr. Hipp. The general problem doesn't have to do 
with counting the rows (it was there as a placeholder). I just want to 
merge 2 (and more) table/streams.


The real query in madIS looks like the following:

create table t as
select upper(c1), c2, lower(c3) from
(
select * from file('http://www.foo.com/list1.tsv.gz')
union all
select * from file('http://www.foo.com/list2.tsv.gz')
)
where c2!=c4;

"list1.tsv.gz" and "list2.tsv.gz" are two enormous streams which i would 
like to process and put into table "t".


I have gone to great lengths to make the virtual table "file" being 
fully streamed. So both of the network files (list1, list2) arrive from 
the network packet by packet, get decompressed without touching the disk 
and then are broken into multiple columns ('tsv' is assumed to mean tab 
separated).


I admire SQLite very much for its predictability. So for the above query 
i would expect from it to scan first over the first file (list1) and 
then over the next (list2), and row by row put them in table "t".


This assumption was so strong that i've been searching all over the 
"file" VT code for a *very* long time to find out the bug in it that 
caused it to grid to a halt my system whenever i executed above query ( 
/ partition was filled ).


I have a request. If "union all" cannot be changed to not write on the 
hard disk when scanning just once over tables/streams. Could the 
/var/tmp position that it writes to, be changed to another directory? I 
prefer to keep a small root (/) partition and right now i'm unable to do 
any "union all" on anything that is bigger than the free space on it.


Thank you again,

lefteris.

On 26/10/2012 9:23 μμ, Richard Hipp wrote:



On Fri, Oct 26, 2012 at 2:16 PM, Eleytherios Stamatogiannakis
> wrote:

I have been observing the following freaky behaviour of SQLite. When
i run:

select count(*) from (select * from huge_table union all select *
from huge_table);

Sqlite starts writting in /var/tmp/ a file like:

  /var/tmp/etilqs___gblRd6vUPcx91Hl, the root partition of fills up
and an error is raised.

Why does SQLite 3.7.14.1 need to write at all when doing union all?
It seems to me that there is no reason for doing so.


The only way SQLite knows to evaluate the query is to (1) compute the
UNION ALL into a temporary table then (2) scan the temporary table to
count the rows.  /var/tmp space is used to hold the temporary table.

Try instead:

SELECT (select count(*) from huge_table)+(select count(*) from huge_table);




Best regards,

lefteris.
_
sqlite-users mailing list
sqlite-users@sqlite.org 
http://sqlite.org:8080/cgi-__bin/mailman/listinfo/sqlite-__users





--
D. Richard Hipp
d...@sqlite.org 


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Union all writting on /var/tmp

2012-10-26 Thread Richard Hipp
On Fri, Oct 26, 2012 at 2:16 PM, Eleytherios Stamatogiannakis <
est...@gmail.com> wrote:

> I have been observing the following freaky behaviour of SQLite. When i run:
>
> select count(*) from (select * from huge_table union all select * from
> huge_table);
>
> Sqlite starts writting in /var/tmp/ a file like:
>
>  /var/tmp/etilqs_**gblRd6vUPcx91Hl, the root partition of fills up and an
> error is raised.
>
> Why does SQLite 3.7.14.1 need to write at all when doing union all? It
> seems to me that there is no reason for doing so.
>

The only way SQLite knows to evaluate the query is to (1) compute the UNION
ALL into a temporary table then (2) scan the temporary table to count the
rows.  /var/tmp space is used to hold the temporary table.

Try instead:

SELECT (select count(*) from huge_table)+(select count(*) from huge_table);





>
> Best regards,
>
> lefteris.
> __**_
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-**bin/mailman/listinfo/sqlite-**users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Union all writting on /var/tmp

2012-10-26 Thread Eleytherios Stamatogiannakis

I have been observing the following freaky behaviour of SQLite. When i run:

select count(*) from (select * from huge_table union all select * from 
huge_table);


Sqlite starts writting in /var/tmp/ a file like:

 /var/tmp/etilqs_gblRd6vUPcx91Hl, the root partition of fills up and an 
error is raised.


Why does SQLite 3.7.14.1 need to write at all when doing union all? It 
seems to me that there is no reason for doing so.


Best regards,

lefteris.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users