Re: [sqlite] replace many rows with one

2014-12-11 Thread Simon Slavin

On 10 Dec 2014, at 3:40pm, RSmith  wrote:

> INSERT INTO s2merged SELECT a, b, sum(theCount) FROM s2 GROUP BY a,b;

Thanks to Martin, Hick and R for this solution.  It was just what I was looking 
for.

> Not sure if your theCount field already contains totals or if it just has 
> 1's...  how did duplication happen? 

The existing rows contain totals.  Or maybe I should call them subtotals.  The 
data is being massaged from one format to another.  I did a bunch of stuff when 
it was text files, then imported it into SQLite and did a bunch more on it as 
rows and columns.  Eventually it'll end up in SQLite.
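Since the existing rows hold subtotals, one quick way to gain confidence in the merge is to check that the grand total is unchanged and that no (a, b) pair is left duplicated. The following is only a sketch on a toy in-memory table using Python's sqlite3 module; the sample values are made up for illustration and are not from the real data.

```python
import sqlite3

# Toy stand-in for the real table (values are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)")
conn.execute("CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)")
conn.executemany(
    "INSERT INTO s2 VALUES (?, ?, ?)",
    [("x", "y", 3), ("x", "y", 4), ("x", "z", 10), ("w", "y", 1), ("x", "y", 5)],
)

# The merge statement from the thread.
conn.execute(
    "INSERT INTO s2merged SELECT a, b, sum(theCount) FROM s2 GROUP BY a, b"
)

# Check 1: the grand total survives the merge.
total_before = conn.execute("SELECT sum(theCount) FROM s2").fetchone()[0]
total_after = conn.execute("SELECT sum(theCount) FROM s2merged").fetchone()[0]

# Check 2: no (a, b) pair appears more than once in the merged table.
leftover_dupes = conn.execute(
    "SELECT count(*) FROM "
    "(SELECT 1 FROM s2merged GROUP BY a, b HAVING count(*) > 1)"
).fetchone()[0]

print(total_before, total_after, leftover_dupes)
```

On the real table both checks are full scans, so they are worth running once after the merge rather than routinely.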

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] replace many rows with one

2014-12-10 Thread RSmith



On 2014/12/10 13:39, Simon Slavin wrote:

> Dear folks,
>
> A little SQL question for you.  The database file concerned is purely for data
> manipulation at the moment.  I can do anything I like to it, even at the schema
> level, without inconveniencing anyone.
>
> I have a TABLE with about 300 million (sic.) entries in it, as follows:
>
> CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)
>
> There are numerous cases where two or more rows (up to a few thousand in some
> cases) have the same values for a and b.  I would like to merge those rows into
> one row with a 'theCount' which is the total of all the merged rows.
> Presumably I do something like
>
> CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)
>
> INSERT INTO s2merged SELECT DISTINCT ... FROM s2


I think the one you are looking for is:

INSERT INTO s2merged SELECT a, b, sum(theCount) FROM s2 GROUP BY a,b;

Not sure if your theCount field already contains totals or if it just has
1's...  how did duplication happen? If it just has 1's, you might also be able
to use simply:


INSERT INTO s2merged SELECT a, b, count() FROM s2 GROUP BY a,b;

Either way, the last query will obviously show the duplication counts (if 
needed as an exercise).
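To make the difference between the two queries concrete, here is a small sketch using Python's sqlite3 module on an in-memory table. The sample rows are made up, and count(*) is used as the standard spelling of the row count.

```python
import sqlite3

# Tiny in-memory table with invented rows; two of them share the same (a, b).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)")
conn.executemany(
    "INSERT INTO s2 VALUES (?, ?, ?)",
    [("x", "y", 3), ("x", "y", 4), ("x", "z", 10)],
)

# sum(theCount) adds up the stored subtotals for each (a, b) group.
by_sum = conn.execute(
    "SELECT a, b, sum(theCount) FROM s2 GROUP BY a, b ORDER BY a, b"
).fetchall()

# count(*) only counts how many rows fell into each group.
by_count = conn.execute(
    "SELECT a, b, count(*) FROM s2 GROUP BY a, b ORDER BY a, b"
).fetchall()

print(by_sum)    # [('x', 'y', 7), ('x', 'z', 10)]
print(by_count)  # [('x', 'y', 2), ('x', 'z', 1)]
```

The two only agree when every theCount is 1, which is why the question about how the duplication happened matters.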

For 300 mil rows this will be rather quick if it's going to be a once-off thing
and not something running often. I'd say it will take under an hour, depending
on hardware and how much duplication happened in s2.  Making an index will take
a lot longer, so you are better off just running the merge as above, unless of
course the eventual use of s2merged includes being a look-up attached DB or
such, in which case making an index from the start will be worthwhile.
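One way to see what SQLite intends to do with and without an index is EXPLAIN QUERY PLAN. The sketch below, again on a made-up toy table via Python's sqlite3 module, prints the plan for the merge query before and after creating an index on (a, b); the index name s2_ab is hypothetical. It only shows the planner's strategy and says nothing about timing, which on 300 million rows depends on hardware and on the cost of building the index in the first place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)")
conn.executemany(
    "INSERT INTO s2 VALUES (?, ?, ?)",
    [("x", "y", 3), ("x", "y", 4), ("x", "z", 10), ("w", "y", 1)],
)

query = "SELECT a, b, sum(theCount) FROM s2 GROUP BY a, b"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index SQLite has to group the rows on its own.
plan_without_index = plan(query)
result_without_index = sorted(conn.execute(query).fetchall())

# s2_ab is a made-up index name, used only for this illustration.
conn.execute("CREATE INDEX s2_ab ON s2 (a, b)")
plan_with_index = plan(query)
result_with_index = sorted(conn.execute(query).fetchall())

print(plan_without_index)
print(plan_with_index)
```

Without the index the plan typically mentions a temporary b-tree for the GROUP BY, which is the "own temporary index" from the original question. Either way the merged rows come out the same.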




Re: [sqlite] replace many rows with one

2014-12-10 Thread Hick Gunter

Both, I guess

Insert into ... select a,b,sum(theCount) from s2 group by a,b;

-Original Message-
From: Simon Slavin [mailto:slav...@bigfraud.org]
Sent: Wednesday, 10 December 2014 12:39
To: General Discussion of SQLite Database
Subject: [sqlite] replace many rows with one

Dear folks,

A little SQL question for you.  The database file concerned is purely for data 
manipulation at the moment.  I can do anything I like to it, even at the schema 
level, without inconveniencing anyone.

I have a TABLE with about 300 million (sic.) entries in it, as follows:

CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)

There are numerous cases where two or more rows (up to a few thousand in some 
cases) have the same values for a and b.  I would like to merge those rows into 
one row with a 'theCount' which is the total of all the merged rows.  
Presumably I do something like

CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)

INSERT INTO s2merged SELECT DISTINCT ... FROM s2

and there'll be a TOTAL() in there somewhere.  Or is it GROUP BY ?  I can't 
seem to get the right phrasing.
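On the TOTAL() versus sum() part of the question: both are SQLite aggregates and both are used together with GROUP BY, but they differ in result type. A minimal sketch with Python's sqlite3 module and two invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)")
conn.executemany(
    "INSERT INTO s2 VALUES (?, ?, ?)",
    [("x", "y", 3), ("x", "y", 4)],
)

# Compare the value and storage type each aggregate produces for one group.
row = conn.execute(
    "SELECT sum(theCount), typeof(sum(theCount)), "
    "       total(theCount), typeof(total(theCount)) "
    "FROM s2 GROUP BY a, b"
).fetchone()

print(row)  # (7, 'integer', 7.0, 'real')
```

For integer subtotals, sum() keeps the result an integer, while total() always returns a floating-point value.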

Also, given that this is the last operation I'll be doing on table s2, will it 
speed things up to create an index on s2 (a,b), or will the SELECT just spend 
the same time making its own temporary index ?

Simon.


___
 Gunter Hick
Software Engineer
Scientific Games International GmbH
FN 157284 a, HG Wien
Klitschgasse 2-4, A-1130 Vienna, Austria
Tel: +43 1 80100 0
E-Mail: h...@scigames.at


Re: [sqlite] replace many rows with one

2014-12-10 Thread Martin Engelschalk


Hi Simon,

On 10.12.2014 12:39, Simon Slavin wrote:

> Dear folks,
>
> A little SQL question for you.  The database file concerned is purely for data
> manipulation at the moment.  I can do anything I like to it, even at the schema
> level, without inconveniencing anyone.
>
> I have a TABLE with about 300 million (sic.) entries in it, as follows:
>
> CREATE TABLE s2 (a TEXT, b TEXT, theCount INTEGER)
>
> There are numerous cases where two or more rows (up to a few thousand in some
> cases) have the same values for a and b.  I would like to merge those rows into
> one row with a 'theCount' which is the total of all the merged rows.
> Presumably I do something like
>
> CREATE TABLE s2merged (a TEXT, b TEXT, theCount INTEGER)
>
> INSERT INTO s2merged SELECT DISTINCT ... FROM s2

insert into s2merged (a, b, theCount) select a, b, sum(theCount) from s2
group by a, b;

> and there'll be a TOTAL() in there somewhere.  Or is it GROUP BY ?  I can't
> seem to get the right phrasing.
>
> Also, given that this is the last operation I'll be doing on table s2, will it
> speed things up to create an index on s2 (a,b), or will the SELECT just spend
> the same time making its own temporary index ?

Creating the index and then selecting with it will probably be slower than
selecting without an index.

> Simon.
