Re: [discuss] SQL Database version control tool recommendations?

2018-08-29 Thread Byron Smith
No recommendation here, but I just came across this tool for
including/diffing sqlite binaries in git repos
, which is clearly relevant.

(It came up in a HN discussion for another related application
.)

On Fri, Aug 10, 2018 at 1:28 PM Carl Boettiger via discuss <
discuss@lists.carpentries.org> wrote:

> Hi Tiffany!, list,
>
> I think dumps are reasonable for regular backups, but not a good choice
> for creating more long-term archives; so guess it depends a bit on the
> goal.  Changes in versions, options, encoding, and database engines can
> make it difficult to import SQL dumps accurately.  I think text-file
> formats (csv) are still the best long-term archive option -- they are easy
> to version and compress and ubiquitous, but far from a perfect option -- in
> particular, a round-trip db -> csv -> db may likely not preserve data types
> (boolean / int / char etc) accurately. Storing this as 'metadata' can help
> but is somewhat manual. I'm not convinced that we have a good performant,
> compressable, cross-platform, widely established file-based exchange format
> available at this time (queue comments about json, hdf5, or parquet).
>
> A somewhat separate issue is whether such files need a git-like tool to
> manage versions.  IMHO the goal is really to preserve each dump in a way
> that doesn't risk accidental overwriting of a previous version and captures
> some basic metadata (timestamp); something a file-naming convention can
> provide and git may not be necessary (given both the potentially large size
> of data dumps and the often compelling case to compress these files in a
> binary format).
>
> I'm really no expert in any of this though, so sharing this as much to
> learn where it goes wrong rather than as solid advice!
>
> Cheers,
>
> Carl
>
> On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber  wrote:
>
>> Tiffany,
>>
>> You might experiment with some smallish databases.  The order of
>> records may well change significantly from dump to dump, making the
>> apparent differences and the actual differences between any two dumps
>> appear much larger than they really are.
>>
>> Good luck!
>>
>>
>> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
>>  wrote:
>> >
>> > Thanks all for your input - very helpful! Dav - happy for you to
>> questions the general strategy. As I said, I know very little about this.
>> In my case its a smallish, simple SQLite database with ~ 8 tables. So
>> dumping/transaction logs/etc might work well and easily. But if there's a
>> better and different strategy for checkpointing SQLite databases, I'd love
>> to learn.
>> >
>> > Thanks!
>> > Tiffany
>> > The Carpentries / discuss / see discussions + participants + delivery
>> options Permalink
>>
>> --
>> The Carpentries: discuss
>> Permalink:
>> https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mb0ae3c22005b6cfbf5866889
>> Delivery options:
>> https://carpentries.topicbox.com/groups/discuss/subscription
>>
> --
>
> http://carlboettiger.info
> *The Carpentries * / discuss /
> see discussions  +
> participants  + 
> delivery
> options 
> Permalink
> 
>

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-M5d95bbf9e8825d3a09fe7cb8
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] SQL Database version control tool recommendations?

2018-08-10 Thread Carl Boettiger via discuss
Hi Tiffany!, list,

I think dumps are reasonable for regular backups, but not a good choice for
creating more long-term archives; so guess it depends a bit on the goal.
Changes in versions, options, encoding, and database engines can make it
difficult to import SQL dumps accurately.  I think text-file formats (csv)
are still the best long-term archive option -- they are easy to version and
compress and ubiquitous, but far from a perfect option -- in particular, a
round-trip db -> csv -> db may likely not preserve data types (boolean /
int / char etc) accurately. Storing this as 'metadata' can help but is
somewhat manual. I'm not convinced that we have a good performant,
compressable, cross-platform, widely established file-based exchange format
available at this time (queue comments about json, hdf5, or parquet).

A somewhat separate issue is whether such files need a git-like tool to
manage versions.  IMHO the goal is really to preserve each dump in a way
that doesn't risk accidental overwriting of a previous version and captures
some basic metadata (timestamp); something a file-naming convention can
provide and git may not be necessary (given both the potentially large size
of data dumps and the often compelling case to compress these files in a
binary format).

I'm really no expert in any of this though, so sharing this as much to
learn where it goes wrong rather than as solid advice!

Cheers,

Carl

On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber  wrote:

> Tiffany,
>
> You might experiment with some smallish databases.  The order of
> records may well change significantly from dump to dump, making the
> apparent differences and the actual differences between any two dumps
> appear much larger than they really are.
>
> Good luck!
>
>
> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
>  wrote:
> >
> > Thanks all for your input - very helpful! Dav - happy for you to
> questions the general strategy. As I said, I know very little about this.
> In my case its a smallish, simple SQLite database with ~ 8 tables. So
> dumping/transaction logs/etc might work well and easily. But if there's a
> better and different strategy for checkpointing SQLite databases, I'd love
> to learn.
> >
> > Thanks!
> > Tiffany
> > The Carpentries / discuss / see discussions + participants + delivery
> options Permalink
>
> --
> The Carpentries: discuss
> Permalink:
> https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mb0ae3c22005b6cfbf5866889
> Delivery options:
> https://carpentries.topicbox.com/groups/discuss/subscription
>
-- 

http://carlboettiger.info

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Md590217388b52aadab77edf2
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] SQL Database version control tool recommendations?

2018-08-10 Thread Bennet Fauber
Tiffany,

You might experiment with some smallish databases.  The order of
records may well change significantly from dump to dump, making the
apparent differences and the actual differences between any two dumps
appear much larger than they really are.

Good luck!


On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
 wrote:
>
> Thanks all for your input - very helpful! Dav - happy for you to questions 
> the general strategy. As I said, I know very little about this. In my case 
> its a smallish, simple SQLite database with ~ 8 tables. So 
> dumping/transaction logs/etc might work well and easily. But if there's a 
> better and different strategy for checkpointing SQLite databases, I'd love to 
> learn.
>
> Thanks!
> Tiffany
> The Carpentries / discuss / see discussions + participants + delivery options 
> Permalink

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mb0ae3c22005b6cfbf5866889
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] SQL Database version control tool recommendations?

2018-08-10 Thread Dav Clark via discuss
I also question the general strategy here... databases are fairly complete
ecosystems and have their own approaches for checkpointing and so on. But
whatever, that's not what you asked! This is the closest thing to a
universal dump data from this into that:

http://odo.pydata.org/en/latest/

It's easy to add another node to the network if your desired destination is
not implemented.

You'd have to (I think) write some logic to walk the tables.

Databases also tend to have good tools for dumps and the options may make
your life easier. For example, msql:

mysqldump -u root -p --tab=/var/lib/mysql-files --compatible=postgresql


pg_dump is a similar tool, and the options may make the output more
interpretable for whatever schemes you have in mind.

Cheers,
D

D

On Fri, Aug 10, 2018 at 9:50 AM Bennet Fauber  wrote:

> Where possible, we used to keep transaction logs in addition to
> database dumps.  Those can be replayed from the dumped database state
> to replay the transactions up to any point by copying and editing the
> transaction log(s).
>
> My best recollection is that they are text and would be suitable for
> entry into Git.  They are also smaller, and in the case of needing to
> recover because of an entry that causes later corruption or problems,
> much easier to modify to make corrections or deletions.
>
> What is the purpose of putting the database into Git?  If you create
> and keep transaction logs (possibly in Git), and have something akin
> to a rakefile that creates the original database structure, then the
> database itself becomes a derived file, akin to a .pyc file, and can
> be recreated at will, so tracking the database, per se, isn't
> necessary.
>
> Just a thought.
>
> -- bennet
>
>
>
>
>
> On Fri, Aug 10, 2018 at 8:49 AM Greg Wilson 
> wrote:
> >
> > Hi Tiffany,
> >
> > For small SQLite databases, the simplest thing is to dump as SQL text
> and put that under version control. I've done this with DBs up to 100kb or
> so, and it allows diff and merge to work as they usually do.  It's...not
> horrible.  For larger databases, I've seen groups create a database backup
> using the DBMS's native tool and then use something like Git LFS to manage
> that backup as a binary blob.  It works, but you then have to use the
> DBMS's own tools for finding differences and reconciling them.
> >
> > Cheers,
> >
> > Greg
> >
> > On 2018-08-09 10:47 PM, Tiffany A. Timbers via discuss wrote:
> >
> > Hi folks,
> >
> > I am looking for SQL Database version control tool recommendations? Ones
> that work with Git and are open source are ideal. I have never tread in
> this territory before, so all opinions/options welcome!
> >
> > Thanks!
> > Tiffany
> >
> >
> > --
> > If you cannot be brave – and it is often hard to be brave – be kind.
> >
> > The Carpentries / discuss / see discussions + participants + delivery
> options Permalink
>
> --
> The Carpentries: discuss
> Permalink:
> https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mbf1e4c2ac14b8dfcb9781826
> Delivery options:
> https://carpentries.topicbox.com/groups/discuss/subscription
>

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Ma84029e1188f98a30f8988a2
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] SQL Database version control tool recommendations?

2018-08-10 Thread Bennet Fauber
Where possible, we used to keep transaction logs in addition to
database dumps.  Those can be replayed from the dumped database state
to replay the transactions up to any point by copying and editing the
transaction log(s).

My best recollection is that they are text and would be suitable for
entry into Git.  They are also smaller, and in the case of needing to
recover because of an entry that causes later corruption or problems,
much easier to modify to make corrections or deletions.

What is the purpose of putting the database into Git?  If you create
and keep transaction logs (possibly in Git), and have something akin
to a rakefile that creates the original database structure, then the
database itself becomes a derived file, akin to a .pyc file, and can
be recreated at will, so tracking the database, per se, isn't
necessary.

Just a thought.

-- bennet





On Fri, Aug 10, 2018 at 8:49 AM Greg Wilson  wrote:
>
> Hi Tiffany,
>
> For small SQLite databases, the simplest thing is to dump as SQL text and put 
> that under version control. I've done this with DBs up to 100kb or so, and it 
> allows diff and merge to work as they usually do.  It's...not horrible.  For 
> larger databases, I've seen groups create a database backup using the DBMS's 
> native tool and then use something like Git LFS to manage that backup as a 
> binary blob.  It works, but you then have to use the DBMS's own tools for 
> finding differences and reconciling them.
>
> Cheers,
>
> Greg
>
> On 2018-08-09 10:47 PM, Tiffany A. Timbers via discuss wrote:
>
> Hi folks,
>
> I am looking for SQL Database version control tool recommendations? Ones that 
> work with Git and are open source are ideal. I have never tread in this 
> territory before, so all opinions/options welcome!
>
> Thanks!
> Tiffany
>
>
> --
> If you cannot be brave – and it is often hard to be brave – be kind.
>
> The Carpentries / discuss / see discussions + participants + delivery options 
> Permalink

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mbf1e4c2ac14b8dfcb9781826
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] SQL Database version control tool recommendations?

2018-08-10 Thread Greg Wilson

Hi Tiffany,

For small SQLite databases, the simplest thing is to dump as SQL text 
and put that under version control. I've done this with DBs up to 100kb 
or so, and it allows diff and merge to work as they usually do.  
It's...not horrible.  For larger databases, I've seen groups create a 
database backup using the DBMS's native tool and then use something like 
Git LFS to manage that backup as a binary blob.  It works, but you then 
have to use the DBMS's own tools for finding differences and reconciling 
them.


Cheers,

Greg


On 2018-08-09 10:47 PM, Tiffany A. Timbers via discuss wrote:

Hi folks,

I am looking for SQL Database version control tool recommendations? 
Ones that work with Git and are open source are ideal. I have never 
tread in this territory before, so all opinions/options welcome!


Thanks!
Tiffany
*The Carpentries * / discuss 
/ see discussions  + 
participants  
+ delivery options 
 
Permalink 
 



--
If you cannot be brave – and it is often hard to be brave – be kind.


--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-M6caef8c352c65a5011f9bc27
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription


Re: [discuss] SQL Database version control tool recommendations?

2018-08-10 Thread Noam Ross via discuss
If you're working in R, Carl Boettiger has a package for quickly dumping
and restoring a database to/from plain-text formats:
https://cboettig.github.io/arkdb/

On Fri, Aug 10, 2018 at 5:54 AM Tiffany A. Timbers via discuss <
discuss@lists.carpentries.org> wrote:

> Hi folks,
>
> I am looking for SQL Database version control tool recommendations? Ones
> that work with Git and are open source are ideal. I have never tread in
> this territory before, so all opinions/options welcome!
>
> Thanks!
> Tiffany
> *The Carpentries * / discuss /
> see discussions  +
> participants  + 
> delivery
> options 
> Permalink
> 
>

--
The Carpentries: discuss
Permalink: 
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mde796e3dfcbe88a00e58b039
Delivery options: https://carpentries.topicbox.com/groups/discuss/subscription