Re: [galaxy-dev] Postgresql database cleaning

2013-09-01 Thread Christophe Antoniewski
That's right, I exaggerated: the complete database dump is currently
125 MB.
Yet that does not really change my question about the Galaxy database
philosophy. With time and instance upscaling, 1 TB is not so
unbelievable, don't you think? On the other hand, what is the point of
keeping deleted histories, users, datasets, etc. in the database? Is it just
that the db structure is so complicated that real deletion would be too
risky? (Indeed, I am not a PostgreSQL guru at all.)
Apart from the lack of esthetics in the adopted solution of keeping
everything until the end of time (just my humble opinion), there are other
aspects that are a bit irritating: for instance, when you manage users, as
already discussed in previous posts, you get confused by the many users who
have not existed for a long time. Just an example.

Sometimes, when I try to imagine the future of our Galaxy instance over the
next 5 years, let's say, I get the feeling that the only solution would be
to restart a Galaxy instance from scratch, asking users to register again,
reimport their datasets, etc., which again goes against my sense of
esthetics.

I would be curious to know what the plans are for the future of
https://main.g2.bx.psu.edu/, for instance.

Chris



Christophe Antoniewski


Drosophila Genetics and Epigenetics
Laboratoire de Biologie du Développement
9, Quai St Bernard, Boîte courrier 24
75252 Paris Cedex 05

Tel +33 1 44 27 34 39
Fax +33 1 44 27 34 45
Mobile +33 6 68 60 51 50

http://drosophile.org


2013/8/29 Dannon Baker dannon.ba...@gmail.com

 Can you get a dump of table sizes for us to compare with?

 http://wiki.postgresql.org/wiki/Disk_Usage


 On Thu, Aug 29, 2013 at 12:05 PM, Nate Coraor n...@bx.psu.edu wrote:

 On Aug 29, 2013, at 11:50 AM, Nate Coraor wrote:

  On Aug 26, 2013, at 5:03 AM, Christophe Antoniewski wrote:
 
  Hi everybody,
 
  The python scripts to clean histories, datasets, users, etc. are
 fine...
  However, the records are not really removed from the postgresql
 database and as a result, it gets bigger and bigger with unused
 records. Ours is above 1 TB after 2 years of production.
 
  Is there a safe way to clean unused records and their dependencies
 out of the database to reduce its size, without being a postgresql guru?
 
  Hi Chris,
 
  The database maintains a permanent record of everything that was done,
 even though the underlying data can be removed.  There are a lot of
 dependencies between objects in Galaxy and removing records, especially
 anything with a foreign key, could easily result in a lot of problems with
 all kinds of things, from the UI to running jobs.  Because of this, records
 cannot be removed from the database.

 Somehow I missed that you said your database is 1 TB - that should not be
 the case.  Unless your database is not being vacuumed or you create objects
 at an extreme rate, it seems as though something has been stored in it that
 should not have.

 --nate

 



___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Postgresql database cleaning

2013-08-29 Thread Nate Coraor
On Aug 26, 2013, at 5:03 AM, Christophe Antoniewski wrote:

 Hi everybody,
 
 The python scripts to clean histories, datasets, users, etc. are fine...
 However, the records are not really removed from the postgresql database and
 as a result, it gets bigger and bigger with unused records. Ours is
 above 1 TB after 2 years of production.
 
 Is there a safe way to clean unused records and their dependencies out of
 the database to reduce its size, without being a postgresql guru?

Hi Chris,

The database maintains a permanent record of everything that was done, even 
though the underlying data can be removed.  There are a lot of dependencies 
between objects in Galaxy and removing records, especially anything with a 
foreign key, could easily result in a lot of problems with all kinds of things, 
from the UI to running jobs.  Because of this, records cannot be removed from 
the database.

--nate

 


Re: [galaxy-dev] Postgresql database cleaning

2013-08-29 Thread Nate Coraor
On Aug 29, 2013, at 11:50 AM, Nate Coraor wrote:

 On Aug 26, 2013, at 5:03 AM, Christophe Antoniewski wrote:
 
 Hi everybody,
 
 The python scripts to clean histories, datasets, users, etc. are fine...
 However, the records are not really removed from the postgresql database and
 as a result, it gets bigger and bigger with unused records. Ours is
 above 1 TB after 2 years of production.
 
 Is there a safe way to clean unused records and their dependencies out of
 the database to reduce its size, without being a postgresql guru?
 
 Hi Chris,
 
 The database maintains a permanent record of everything that was done, even 
 though the underlying data can be removed.  There are a lot of dependencies 
 between objects in Galaxy and removing records, especially anything with a 
 foreign key, could easily result in a lot of problems with all kinds of 
 things, from the UI to running jobs.  Because of this, records cannot be 
 removed from the database.

Somehow I missed that you said your database is 1 TB - that should not be the 
case.  Unless your database is not being vacuumed or you create objects at an 
extreme rate, it seems as though something has been stored in it that should 
not have.
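
One way to check the vacuum hypothesis is to look at the dead-row counts
that PostgreSQL keeps in its pg_stat_user_tables statistics view. The
following is only a minimal sketch: it assumes a DB-API cursor (for
example one from psycopg2, which is an assumption, not something stated in
this thread), and the function name and the 20% threshold are arbitrary
choices for illustration.

```python
# Sketch: list tables whose dead-row share suggests vacuum is not keeping up.
# pg_stat_user_tables and its columns are standard PostgreSQL; the helper
# name and the default threshold below are hypothetical.
DEAD_TUPLE_SQL = """
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT %s
"""


def vacuum_candidates(cursor, limit=20, min_dead_ratio=0.2):
    """Return (table, dead_ratio, never_vacuumed) for suspicious tables."""
    cursor.execute(DEAD_TUPLE_SQL, (limit,))
    flagged = []
    for relname, live, dead, last_vac, last_autovac in cursor.fetchall():
        total = live + dead
        ratio = dead / total if total else 0.0
        never_vacuumed = last_vac is None and last_autovac is None
        # Flag tables with many dead rows, or dead rows and no vacuum on record.
        if ratio >= min_dead_ratio or (never_vacuumed and dead > 0):
            flagged.append((relname, round(ratio, 2), never_vacuumed))
    return flagged
```

An empty result would suggest that bloat from missing vacuums is probably
not the explanation, and that table sizes are the next thing to look at.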

--nate

 


Re: [galaxy-dev] Postgresql database cleaning

2013-08-29 Thread Dannon Baker
Can you get a dump of table sizes for us to compare with?

http://wiki.postgresql.org/wiki/Disk_Usage
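
As a rough illustration of the kind of dump being asked for, the sketch
below wraps a table-size query in the style of the one on the wiki page
above in a small Python helper. It assumes a DB-API cursor such as one
from psycopg2 (an assumption; the thread does not say which driver is in
use), and largest_relations and human_size are hypothetical names.

```python
# Sketch: report the largest relations in the current database.
# The query follows the general pattern from the PostgreSQL wiki
# "Disk Usage" page; the helper names are illustrative only.
TABLE_SIZE_SQL = """
SELECT n.nspname || '.' || c.relname AS relation,
       pg_total_relation_size(c.oid) AS total_bytes
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND c.relkind <> 'i'
  AND n.nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT %s
"""


def human_size(num_bytes):
    """Format a byte count using 1024-based units."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if size < 1024:
            return "%.1f %s" % (size, unit)
        size /= 1024
    return "%.1f TB" % size


def largest_relations(cursor, limit=20):
    """Return (relation, readable_size) pairs, largest first."""
    cursor.execute(TABLE_SIZE_SQL, (limit,))
    return [(name, human_size(size)) for name, size in cursor.fetchall()]
```

With psycopg2 installed, passing the cursor from something like
psycopg2.connect(dbname="galaxy") should work; the database name "galaxy"
is a placeholder. Total size includes indexes and TOAST data, so a single
oversized relation should stand out at the top of the list.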


On Thu, Aug 29, 2013 at 12:05 PM, Nate Coraor n...@bx.psu.edu wrote:

 On Aug 29, 2013, at 11:50 AM, Nate Coraor wrote:

  On Aug 26, 2013, at 5:03 AM, Christophe Antoniewski wrote:
 
  Hi everybody,
 
  The python scripts to clean histories, datasets, users, etc. are fine...
  However, the records are not really removed from the postgresql
 database and as a result, it gets bigger and bigger with unused
 records. Ours is above 1 TB after 2 years of production.
 
  Is there a safe way to clean unused records and their dependencies out
 of the database to reduce its size, without being a postgresql guru?
 
  Hi Chris,
 
  The database maintains a permanent record of everything that was done,
 even though the underlying data can be removed.  There are a lot of
 dependencies between objects in Galaxy and removing records, especially
 anything with a foreign key, could easily result in a lot of problems with
 all kinds of things, from the UI to running jobs.  Because of this, records
 cannot be removed from the database.

 Somehow I missed that you said your database is 1 TB - that should not be
 the case.  Unless your database is not being vacuumed or you create objects
 at an extreme rate, it seems as though something has been stored in it that
 should not have.

 --nate

 

[galaxy-dev] Postgresql database cleaning

2013-08-26 Thread Christophe Antoniewski
Hi everybody,

The python scripts to clean histories, datasets, users, etc. are fine...
However, the records are not really removed from the postgresql database
and as a result, it gets bigger and bigger with unused records. Ours
is above 1 TB after 2 years of production.

Is there a safe way to clean unused records and their dependencies out of
the database to reduce its size, without being a postgresql guru?

Chris
-- 

Christophe Antoniewski


Drosophila Genetics and Epigenetics
Laboratoire de Biologie du Développement
9, Quai St Bernard, Boîte courrier 24
75252 Paris Cedex 05

Tel +33 1 44 27 34 39
Fax +33 1 44 27 34 45
Mobile +33 6 68 60 51 50

http://drosophile.org