Re: [sqlite] DB rotate / incremental

2010-10-28 Thread Eric Smith

> I currently use a single database and it works fine.  
> The trouble is replication.  
> Or the sharing of an ever growing database.  

I think your question is more architectural than sqlite-specific.

If your user base / data set is going to be getting more serious and 
if you have some time on your hands, it doesn't seem like sqlite is 
the right tool for you.  You may be better off with a nice fat 
traditional rdbms with built-in replication, load balancing, etc.  

So what follows is a not-as-good hack, but you could probably get it 
up in a few hours.  



If you're only doing inserts and selects then (as far as I understand) 
existing pages of the database should largely remain constant, 
regardless of your schema.  I'm sure the sqlite experts can expound on 
that or explain in what circumstances it's wrong.

If that's the case, then you could leverage an incremental file transfer 
program like rsync(1) to do replication.  

Keep two copies of your database around, one with all the latest stuff 
(call it 'current') and one replication candidate that you'll only 
update every once in a while (call it 'historical').  When you want,
attach 'historical' and sync the latest stuff from 'current' into it.
Obviously you want that to be fast (not scan the whole table), so 
hopefully your schema permits that.  Then detach 'historical' and rsync
'historical' to wherever it needs to go.  Only the new bytes will go over 
the wire.

You implied you'd already tried something like that -- just wanted to
point out rsync in particular if you hadn't looked into it. 



Another point is that, depending on the specifics of your data set and 
the metrics you're calculating on it, you may be able to accumulate the 
answers to your big 'select' queries as you go along, in a separate metrics
table.  E.g. use triggers, or run the 'select' only on each new chunk of 
data and update the metrics table accordingly.

Then you can replace your big select on the main table with a simple
(hopefully constant-time) select on the metrics table.

This only works if your metrics have the nice property that they can be 
incrementally calculated using O(1) space.  E.g. sum, count, mean, stddev, 
but not (afaik) median.

In fact, if you are only inserting (never deleting), and you already know
what all your metrics are, and they have that property, then you don't
need to keep the main table at all -- just the metrics table :)

Eric

-- 
Eric A. Smith

Keeping Young #1:
Avoid fried meats which angry up the blood.
-- Satchel Paige
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] DB rotate / incremental

2010-10-28 Thread Simon
There is just one table, data is simple, fits well in CSV, but I like
to do SQL query on this data and sqlite3 packs it pretty well.  The
data comes from external source and changes constantly, I just grab a
snapshot of it every minute.

I currently use a single database and it works fine.
The trouble is replication.
  Or the sharing of an ever growing database.

This data is accumulated by my server, and I have several computers on
which would like to analyse the data.  I don't need up-to-the-minute
freshness of the data I analyse, I'm alright at analysing "all i got
until yesterday at midnight".

Being able to rotate the database or use it in an incremental manner
would allow me to copy only what has changed since last time without
the complication of writting a replication protocol  (right now it
appears as if the whole db changes all the time, so the only safe
alternative I have is perhaps some kind of delta patch, but I still
need to compare 2 states of the DB file all the time, and I would need
to patch the target machine properly also).

As for grepping, I could do that, but I would loose the power SQL
offers to help at some analysis and calculations.  So this comes back
to the idea of accumulation in rotation every day, then when reading,
just copy the result of many selects (like a recursive grep) and
insert that into a temp table for further analysis.  I think this is
the simplest option I have for now, but I wonder how viable it can be
when the DB grows bigger.

Simon


On Thu, Oct 28, 2010 at 12:03 PM, Black, Michael (IS)
 wrote:
> My first question is:
>
> Why are you using a database?  Sounds to me like grep might do what you want. 
>  All you're doing is inserts and selects.
>
> Can you show what kind of selects you're doing or your database schema?
>
> And why do you need to rotate the database?  Why can't you just keep it all 
> in one?
>
>
> Michael D. Black
> Senior Scientist
> Advanced Analytics Directorate
> Northrop Grumman Information Systems
>
>
> 
>
> From: sqlite-users-boun...@sqlite.org on behalf of Simon
> Sent: Thu 10/28/2010 10:38 AM
> To: General Discussion of SQLite Database
> Subject: EXTERNAL:[sqlite] DB rotate / incremental
>
>
>
> Hi, i'm not sure what exact vocabulary I should use for this...
>
> I'm looking for a way or technique, that allows rotating the database
> just like some tools will rotate my /var/log/messages on Linux.  So,
> to rotate a database, I'm guessing the current DB needs to be renamed
> and kept nearby, new DB will contain additions and modification to the
> previous DB.  But this would need to take care of many many older DBs
> like this.
>
> This would be complicated if the rotation/increments need to take care
> of all scenarios... but my scenario is very similar to the Linux
> system log.  The calls that modify the DB are just inserts and rare
> but huge selects, zero deletes and zero updates.  With rotation I can
> simply copy all "previous" files that will never change on different
> computers, and only "today" will change constantly (but it's currently
> less than 1MB per day).  Writing to this "rotated DB" is fine, reading
> from it is the catch.
>
> I've looked into ATTACH, but the doc says "the number of attached
> databases cannot be increased above 30 on a machines with a 32-bit
> integer".  I would have to merge the "old" database at some point, but
> this is less optimal than writting a sync protocol i think.
>
> I've looked into using virtual tables on the reader end.  Inserter
> would just open a new db file each day, with no regards to the past.
> But the reader could use vtables to go around the ATTACH limit.  I
> think this approach is overly complicated though...
>
> Maybe I should program my own means of "attaching" the results of
> multiple queries on multiple databases.  I guess I could use a
> :MEMORY: database to accumulate the result, but then perhaps I should
> just use a temp table and "copy" the result of a series of
> attach-select-detach operations on all the databases.
>
> I'm looking for suggestions.  I don't have much experience with sqlite
> so I believe it's likely I misunderstood a few things and will
> probably run into some other limits.  Let me know if you think some of
> the above is good or just can't work at all, and let me know if you
> have any other ideas.
>
> Thanks!
>  Simon
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] DB rotate / incremental

2010-10-28 Thread Black, Michael (IS)
My first question is:
 
Why are you using a database?  Sounds to me like grep might do what you want.  
All you're doing is inserts and selects.
 
Can you show what kind of selects you're doing or your database schema?
 
And why do you need to rotate the database?  Why can't you just keep it all in 
one?
 
 
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Northrop Grumman Information Systems
 



From: sqlite-users-boun...@sqlite.org on behalf of Simon
Sent: Thu 10/28/2010 10:38 AM
To: General Discussion of SQLite Database
Subject: EXTERNAL:[sqlite] DB rotate / incremental



Hi, i'm not sure what exact vocabulary I should use for this...

I'm looking for a way or technique, that allows rotating the database
just like some tools will rotate my /var/log/messages on Linux.  So,
to rotate a database, I'm guessing the current DB needs to be renamed
and kept nearby, new DB will contain additions and modification to the
previous DB.  But this would need to take care of many many older DBs
like this.

This would be complicated if the rotation/increments need to take care
of all scenarios... but my scenario is very similar to the Linux
system log.  The calls that modify the DB are just inserts and rare
but huge selects, zero deletes and zero updates.  With rotation I can
simply copy all "previous" files that will never change on different
computers, and only "today" will change constantly (but it's currently
less than 1MB per day).  Writing to this "rotated DB" is fine, reading
from it is the catch.

I've looked into ATTACH, but the doc says "the number of attached
databases cannot be increased above 30 on a machines with a 32-bit
integer".  I would have to merge the "old" database at some point, but
this is less optimal than writting a sync protocol i think.

I've looked into using virtual tables on the reader end.  Inserter
would just open a new db file each day, with no regards to the past.
But the reader could use vtables to go around the ATTACH limit.  I
think this approach is overly complicated though...

Maybe I should program my own means of "attaching" the results of
multiple queries on multiple databases.  I guess I could use a
:MEMORY: database to accumulate the result, but then perhaps I should
just use a temp table and "copy" the result of a series of
attach-select-detach operations on all the databases.

I'm looking for suggestions.  I don't have much experience with sqlite
so I believe it's likely I misunderstood a few things and will
probably run into some other limits.  Let me know if you think some of
the above is good or just can't work at all, and let me know if you
have any other ideas.

Thanks!
  Simon
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users