Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-07 Thread Nate Coraor
Sergei Ryazansky wrote:
> Thanks for the reply!
> 
> set_dataset_sizes.py works fine for me. By the way, what is the difference 
> between the file_size and total_size fields? Their values seem to be equal.

Datasets with external metadata and composite datasets may contain extra
files, and total_size will be greater if this is the case.
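
To illustrate the distinction, here is a minimal sketch (not Galaxy's actual code): total size is taken as the primary file's size plus the size of everything under an extra-files directory. The function name dataset_total_size and the file names in the demo are hypothetical.

```python
import os
import tempfile


def dataset_total_size(primary_file, extra_files_dir=None):
    """Size of the primary file plus everything under the extra-files directory."""
    total = os.path.getsize(primary_file)
    if extra_files_dir and os.path.isdir(extra_files_dir):
        for root, _dirs, names in os.walk(extra_files_dir):
            for name in names:
                total += os.path.getsize(os.path.join(root, name))
    return total


# Demo with throwaway files: a 10-byte primary file and a 5-byte extra file.
with tempfile.TemporaryDirectory() as tmp:
    primary = os.path.join(tmp, "dataset_1.dat")        # hypothetical name
    extra_dir = os.path.join(tmp, "dataset_1_files")    # hypothetical name
    os.mkdir(extra_dir)
    with open(primary, "wb") as fh:
        fh.write(b"x" * 10)
    with open(os.path.join(extra_dir, "index.bin"), "wb") as fh:
        fh.write(b"y" * 5)

    file_size = os.path.getsize(primary)
    plain_total = dataset_total_size(primary)               # no extra files
    composite_total = dataset_total_size(primary, extra_dir)

print(file_size, plain_total, composite_total)
```

With no extra files the two numbers match, which is consistent with the equal values you are seeing.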

> The 'allow_user_dataset_purge' setting seems very useful, but it is absent 
> from my universe_wsgi.ini file (galaxy_dist).

Have a look in the universe_wsgi.ini.sample file for it.

--nate

> 
> 
> 
> On 06.07.2011 22:28, Nate Coraor wrote:
> >Also, datasets created prior to the addition of the total_size column in
> >changeset 5700:70e2b1c95a69 will have this unset - it can be set by
> >running the script:
> >
> > % python ./scripts/set_dataset_sizes.py
> >
> >Also, Sergei, it's possible to allow users to force datasets to be
> >removed from disk after they "delete" them.  See the
> >'allow_user_dataset_purge' option in universe_wsgi.ini.  If set to
> >True, users can select "Show Deleted Datasets" from the History's
> >"Options" menu and then choose datasets to purge.  Entire histories can
> >be purged from the history list.
> >
> >--nate
> 
> 
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-06 Thread Sergei Ryazansky

Thanks for the reply!

set_dataset_sizes.py works fine for me. By the way, what is the difference between 
the file_size and total_size fields? Their values seem to be equal.
The 'allow_user_dataset_purge' setting seems very useful, but it is 
absent from my universe_wsgi.ini file (galaxy_dist).



On 06.07.2011 22:28, Nate Coraor wrote:

Also, datasets created prior to the addition of the total_size column in
changeset 5700:70e2b1c95a69 will have this unset - it can be set by
running the script:

 % python ./scripts/set_dataset_sizes.py

Also, Sergei, it's possible to allow users to force datasets to be
removed from disk after they "delete" them.  See the
'allow_user_dataset_purge' option in universe_wsgi.ini.  If set to
True, users can select "Show Deleted Datasets" from the History's
"Options" menu and then choose datasets to purge.  Entire histories can
be purged from the history list.

--nate




Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-06 Thread Sergei Ryazansky

Hello Assaf,
thank you for the excellent explanation; the situation has become much 
clearer to me.



On 06.07.2011 21:19, Assaf Gordon wrote:

Hello Sergei,

I'm experimenting with the clean-up scripts myself, so perhaps I can offer some 
information (the Galaxy team is welcome to correct me and/or explain better).


1. If you look at the output of your query, you'll notice that the "purged" 
field is 0 for all datasets (I assume 0 is "false" in MySQL).
This means that the actual files were *not* purged (i.e. physically deleted) - 
at least not by the "purge_datasets.sh" or "cleanup_datasets.py -3" step.
Since you did use the "-r" parameter, it means those datasets were not picked up 
as possible deletion candidates by this script.


2. (I found the following by reading the source code; it's not really well 
explained, so correct me if I'm wrong.)
The "dataset" table has an "update_time" field, and this field is updated 
automatically whenever the dataset record changes.
This means that when you run the first cleanup script and it sets the "deleted" 
flag to true, the update_time is updated to "now".
When you run the next clean-up script and ask for anything that is older than 1 
day ("-d 1"), it looks for an update_time older than one day - so it will 
*not* find the dataset that was just marked as "deleted" in the first step 
(because the update_time is "now"). Only if you run the next clean-up script 
tomorrow will that dataset be deleted.

So, for example, running the following in succession:
cleanup_datasets.py universe_wsgi.ini -d 1 -6     ( => delete datasets )
cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r  ( => purge datasets + delete physical files )

both run with "-d 1" - but by design, files from yesterday (1 day old) will not 
be physically deleted.

Files that the user deleted yesterday (1 day old) will be marked as "deleted", 
but their update_time will be "now".
Only files that were marked as deleted yesterday will be deleted today 
(meaning: they are 2 days old).

To really delete files now, use "-d 0" with all the scripts.
Since this is quite scary, the "-i" (info only) mode will show what will 
be deleted (but that requires a recent version, 5770:a5e0a5d3c0a1).


3. The file_size=NULL issue happens when a job fails - on some occasions (I 
couldn't pinpoint exactly when) Galaxy does not pick up the fact that an output 
file was generated even though the job failed, and so you get "ghost" files 
which exist on the disk but are NULL in the database.
The "discarded" state means the job was discarded (by the Galaxy user?) - not 
that the dataset was deleted/purged by the clean-up scripts.


Hope this helps,
  -gordon



Sergei Ryazansky wrote, On 07/06/2011 12:15 PM:

Hi,
thank you for the answer.
I have tried the scripts mentioned, but it seems that the first time I ran 
them in the wrong order. As a result, the metadata in the database tables was 
modified, but the files corresponding to datasets deleted from the history 
were not removed. Running the scripts afterwards in the right order (as 
indicated in the wiki) did not delete the unused dataset files either. 
Is there any way to update the metadata in the tables to match the real state 
of the files?
I think the scripts were first called in the following order:
cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
cleanup_datasets.py universe_wsgi.ini -d 6 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 2 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 3 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 4 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 5 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -2 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -4 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -5 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r

Also, there are some strange things (imho) in the galaxy.dataset table: there 
are a lot of dataset ids with a NULL total size:

mysql>  select * from dataset where (id="148" or id="53" or id="86" or id="146" or 
id="330");
+-+-+-+---+-++--+---+---+---++
| id | create_time | update_time | state | deleted | purged | purgable | 
external_filename | _extra_files_path | file_size | total_size |
+-+-+-+---+-++--+---+---+---++
| 53 | 2011-03-29 16:21:58 | 2011-07-06 14:17:49 | error | 1 | 0 | 1 | NULL | 
NULL | 0 | NULL |
| 86 | 2011-03-29 20:35:44 | 2011-07-06 14:17:52 | discarded | 1 | 0 | 1 | NULL 
| NULL | NULL | NULL |
| 146 | 2011-05-26 01:38:14 | 2011-07-06 14:18:00 | error | 1 | 0 | 1 | NULL | 
NULL | NULL | NULL |
| 148 | 2011-05-26 02:20:44 | 2011-07-06 14:18:00 | discarded | 1 | 0 | 1 | 
NULL | NULL | NULL | NULL |
| 330 | 2011-07-05 00:4

Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-06 Thread Nate Coraor
Assaf Gordon wrote:
> Hello Sergei,
> 
> I'm experimenting with the clean-up scripts myself, so perhaps I can offer 
> some information (the Galaxy team is welcome to correct me and/or explain 
> better).
> 
> 
> 1. If you look at the output of your query, you'll notice that the "purged" 
> field is 0 for all datasets (I assume 0 is "false" in MySQL).
> This means that the actual files were *not* purged (i.e. physically deleted) 
> - at least not by the "purge_datasets.sh" or "cleanup_datasets.py -3" step.
> Since you did use the "-r" parameter, it means those datasets were not picked 
> up as possible deletion candidates by this script.
> 
> 
> 2. (I found the following by reading the source code; it's not really well 
> explained, so correct me if I'm wrong.)
> The "dataset" table has an "update_time" field, and this field is updated 
> automatically whenever the dataset record changes.
> This means that when you run the first cleanup script and it sets the 
> "deleted" flag to true, the update_time is updated to "now".
> When you run the next clean-up script and ask for anything that is older than 
> 1 day ("-d 1"), it looks for an update_time older than one day - so it will 
> *not* find the dataset that was just marked as "deleted" in the first step 
> (because the update_time is "now"). Only if you run the next clean-up script 
> tomorrow will that dataset be deleted.
> 
> So, for example, running the following in succession:
> cleanup_datasets.py universe_wsgi.ini -d 1 -6     ( => delete datasets )
> cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r  ( => purge datasets + delete physical files )
> 
> both run with "-d 1" - but by design, files from yesterday (1 day old) will 
> not be physically deleted.
> 
> Files that the user deleted yesterday (1 day old) will be marked as 
> "deleted", but their update_time will be "now".
> Only files that were marked as deleted yesterday will be deleted today 
> (meaning: they are 2 days old).
> 
> To really delete files now, use "-d 0" with all the scripts.
> Since this is quite scary, the "-i" (info only) mode will show what will 
> be deleted (but that requires a recent version, 5770:a5e0a5d3c0a1).
> 
> 
> 3. The file_size=NULL issue happens when a job fails - on some occasions (I 
> couldn't pinpoint exactly when) Galaxy does not pick up the fact that an 
> output file was generated even though the job failed, and so you get "ghost" 
> files which exist on the disk but are NULL in the database.
> The "discarded" state means the job was discarded (by the Galaxy user?) - not 
> that the dataset was deleted/purged by the clean-up scripts.

Also, datasets created prior to the addition of the total_size column in
changeset 5700:70e2b1c95a69 will have this unset - it can be set by
running the script:

% python ./scripts/set_dataset_sizes.py

Also, Sergei, it's possible to allow users to force datasets to be
removed from disk after they "delete" them.  See the
'allow_user_dataset_purge' option in universe_wsgi.ini.  If set to
True, users can select "Show Deleted Datasets" from the History's
"Options" menu and then choose datasets to purge.  Entire histories can
be purged from the history list.
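
If you want to check from a script whether the option is switched on, something like the following sketch could work. It uses Python 3's standard configparser and assumes the option sits in an "[app:main]" section; that section name, the sample text and the helper name purge_enabled are assumptions for illustration, so check them against your own universe_wsgi.ini.

```python
import configparser

# Hypothetical excerpt of a config file; the "[app:main]" section name
# is an assumption and may differ in your installation.
SAMPLE = """\
[app:main]
allow_user_dataset_purge = True
"""


def purge_enabled(ini_text, section="app:main"):
    """Return True if allow_user_dataset_purge is set to a true value."""
    # interpolation=None so that literal '%' characters elsewhere in the
    # file are not treated as interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.getboolean(section, "allow_user_dataset_purge", fallback=False)


print(purge_enabled(SAMPLE))
```

A missing option (or missing section) is treated as "off" here, which is only a choice made for this sketch.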

--nate

> 
> 
> Hope this helps,
>  -gordon
> 
> 
> 
> Sergei Ryazansky wrote, On 07/06/2011 12:15 PM:
> > Hi,
> > thank you for the answer.
> > I have tried the scripts mentioned, but it seems that the first time I ran 
> > them in the wrong order. As a result, the metadata in the database tables 
> > was modified, but the files corresponding to datasets deleted from the 
> > history were not removed. Running the scripts afterwards in the right order 
> > (as indicated in the wiki) did not delete the unused dataset files either. 
> > Is there any way to update the metadata in the tables to match the real 
> > state of the files?
> > I think the scripts were first called in the following order:
> > cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
> > cleanup_datasets.py universe_wsgi.ini -d 6 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 2 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 3 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 4 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 5 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -1 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -2 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -4 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -5 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r
> > cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
> > 
> > Also, there are some strange things (imho) in the galaxy.dataset table: 
> > there are a lot of dataset ids with a NULL total size:
> > 
> > mysql> select * from dataset where (id="148" or id="53" or id="86" or 
> > id="146" or id="330");
> > +-+-+-+---+-++--+---+---+---++
> > | id | create_time | update_time | 

Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-06 Thread Assaf Gordon
Hello Sergei,

I'm experimenting with the clean-up scripts myself, so perhaps I can offer some 
information (the Galaxy team is welcome to correct me and/or explain better).


1. If you look at the output of your query, you'll notice that the "purged" 
field is 0 for all datasets (I assume 0 is "false" in MySQL).
This means that the actual files were *not* purged (i.e. physically deleted) - 
at least not by the "purge_datasets.sh" or "cleanup_datasets.py -3" step.
Since you did use the "-r" parameter, it means those datasets were not picked up 
as possible deletion candidates by this script.


2. (I found the following by reading the source code; it's not really well 
explained, so correct me if I'm wrong.)
The "dataset" table has an "update_time" field, and this field is updated 
automatically whenever the dataset record changes.
This means that when you run the first cleanup script and it sets the "deleted" 
flag to true, the update_time is updated to "now".
When you run the next clean-up script and ask for anything that is older than 1 
day ("-d 1"), it looks for an update_time older than one day - so it will 
*not* find the dataset that was just marked as "deleted" in the first step 
(because the update_time is "now"). Only if you run the next clean-up script 
tomorrow will that dataset be deleted.

So, for example, running the following in succession:
cleanup_datasets.py universe_wsgi.ini -d 1 -6     ( => delete datasets )
cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r  ( => purge datasets + delete physical files )

both run with "-d 1" - but by design, files from yesterday (1 day old) will not 
be physically deleted.

Files that the user deleted yesterday (1 day old) will be marked as "deleted", 
but their update_time will be "now".
Only files that were marked as deleted yesterday will be deleted today 
(meaning: they are 2 days old).

To really delete files now, use "-d 0" with all the scripts.
Since this is quite scary, the "-i" (info only) mode will show what will 
be deleted (but that requires a recent version, 5770:a5e0a5d3c0a1).
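
To make the "-d" behaviour described above concrete, here is a small sketch of the cutoff comparison as I understand it. It is not the actual cleanup_datasets.py code; the function name is_candidate and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def is_candidate(update_time, days, now):
    """True if the record was last updated more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return update_time < cutoff


now = datetime(2011, 7, 6, 14, 30)
just_marked = datetime(2011, 7, 6, 14, 17)      # flagged "deleted" minutes ago
marked_yesterday = datetime(2011, 7, 5, 12, 0)  # flagged more than a day ago

print(is_candidate(just_marked, 1, now))       # False: update_time is too recent
print(is_candidate(marked_yesterday, 1, now))  # True: older than the 1-day cutoff
print(is_candidate(just_marked, 0, now))       # True: "-d 0" moves the cutoff to now
```

The first case is the one that bites: marking a dataset as deleted refreshes its update_time, so a purge run with the same "-d 1" right afterwards skips it.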


3. The file_size=NULL issue happens when a job fails - on some occasions (I 
couldn't pinpoint exactly when) Galaxy does not pick up the fact that an output 
file was generated even though the job failed, and so you get "ghost" files 
which exist on the disk but are NULL in the database.
The "discarded" state means the job was discarded (by the Galaxy user?) - not 
that the dataset was deleted/purged by the clean-up scripts.


Hope this helps,
 -gordon



Sergei Ryazansky wrote, On 07/06/2011 12:15 PM:
> Hi,
> thank you for the answer.
> I have tried the scripts mentioned, but it seems that the first time I ran 
> them in the wrong order. As a result, the metadata in the database tables was 
> modified, but the files corresponding to datasets deleted from the history 
> were not removed. Running the scripts afterwards in the right order (as 
> indicated in the wiki) did not delete the unused dataset files either. 
> Is there any way to update the metadata in the tables to match the real 
> state of the files?
> I think the scripts were first called in the following order:
> cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
> cleanup_datasets.py universe_wsgi.ini -d 6 -1 -r
> cleanup_datasets.py universe_wsgi.ini -d 2 -1 -r
> cleanup_datasets.py universe_wsgi.ini -d 3 -1 -r
> cleanup_datasets.py universe_wsgi.ini -d 4 -1 -r
> cleanup_datasets.py universe_wsgi.ini -d 5 -1 -r
> cleanup_datasets.py universe_wsgi.ini -d 1 -1 -r
> cleanup_datasets.py universe_wsgi.ini -d 1 -2 -r
> cleanup_datasets.py universe_wsgi.ini -d 1 -4 -r
> cleanup_datasets.py universe_wsgi.ini -d 1 -5 -r
> cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r
> cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
> 
> Also, there are some strange things (imho) in the galaxy.dataset table: there 
> are a lot of dataset ids with a NULL total size:
> 
> mysql> select * from dataset where (id="148" or id="53" or id="86" or 
> id="146" or id="330");
> +-+-+-+---+-++--+---+---+---++
> | id | create_time | update_time | state | deleted | purged | purgable | 
> external_filename | _extra_files_path | file_size | total_size |
> +-+-+-+---+-++--+---+---+---++
> | 53 | 2011-03-29 16:21:58 | 2011-07-06 14:17:49 | error | 1 | 0 | 1 | NULL | 
> NULL | 0 | NULL |
> | 86 | 2011-03-29 20:35:44 | 2011-07-06 14:17:52 | discarded | 1 | 0 | 1 | 
> NULL | NULL | NULL | NULL |
> | 146 | 2011-05-26 01:38:14 | 2011-07-06 14:18:00 | error | 1 | 0 | 1 | NULL 
> | NULL | NULL | NULL |
> | 148 | 2011-05-26 02:20:44 | 2011-07-06 14:18:00 | discarded | 1 | 0 | 1 | 
> NULL | NULL | NULL | NULL |
> | 330 | 2011-07-05 00:44:44 | 2011-07-05 00:44:44 | NULL | 0 | 0 | 1 | NULL | 
> NULL |

Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-06 Thread Sergei Ryazansky
Hi,
thank you for the answer.
I have tried the scripts mentioned, but it seems that the first time I ran
them in the wrong order. As a result, the metadata in the database tables was
modified, but the files corresponding to datasets deleted from the history
were not removed. Running the scripts afterwards in the right order (as
indicated in the wiki) did not delete the unused dataset files either. Is
there any way to update the metadata in the tables to match the real state
of the files?
I think the scripts were first called in the following order:
cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r
cleanup_datasets.py universe_wsgi.ini -d 6 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 2 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 3 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 4 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 5 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -1 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -2 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -4 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -5 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -3 -r
cleanup_datasets.py universe_wsgi.ini -d 1 -6 -r

Also, there are some strange things (imho) in the galaxy.dataset table: there
are a lot of dataset ids with a NULL total size:

mysql> select * from dataset where (id="148" or id="53" or id="86" or
id="146" or id="330");
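+-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
| id  | create_time         | update_time         | state     | deleted | purged | purgable | external_filename | _extra_files_path | file_size | total_size |
+-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+
| 53  | 2011-03-29 16:21:58 | 2011-07-06 14:17:49 | error     | 1       | 0      | 1        | NULL              | NULL              | 0         | NULL       |
| 86  | 2011-03-29 20:35:44 | 2011-07-06 14:17:52 | discarded | 1       | 0      | 1        | NULL              | NULL              | NULL      | NULL       |
| 146 | 2011-05-26 01:38:14 | 2011-07-06 14:18:00 | error     | 1       | 0      | 1        | NULL              | NULL              | NULL      | NULL       |
| 148 | 2011-05-26 02:20:44 | 2011-07-06 14:18:00 | discarded | 1       | 0      | 1        | NULL              | NULL              | NULL      | NULL       |
| 330 | 2011-07-05 00:44:44 | 2011-07-05 00:44:44 | NULL      | 0       | 0      | 1        | NULL              | NULL              | NULL      | NULL       |
+-----+---------------------+---------------------+-----------+---------+--------+----------+-------------------+-------------------+-----------+------------+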

I don't know what these records looked like before the cleanup scripts were
called, but could this be caused by calling them in the wrong order? Does the
"discarded" state mean that the corresponding file should be deleted? In my
case all of these files are still in the database folder.
Please let me know if you need any further clarification of my questions.


2011/7/6 Hans-Rudolf Hotz 

> Hi Sergei
>
> This is a question better asked on 'galaxy-...@bx.psu.edu' since you refer
> to your local Galaxy installation.
>
>
> In order to remove the data from your file system, you need to run the
> 'cleanup scripts', as described on this wiki page:
>
>
>
> https://bitbucket.org/galaxy/galaxy-central/wiki/Config/PurgeHistoriesAndDatasets
>
>
>
> Regards, Hans
>
>
>
> On 07/06/2011 03:33 PM, Sergei Ryazansky wrote:
>
>>
>>
>>  Original Message 
>> Subject:   deleting datasets from history
>> Date:   Tue, 5 Jul 2011 19:58:45 +0300
>> From: Sergei Ryazansky 
>> To:   galaxy-user-requ...@lists.bx.psu.edu
>>
>>
>>
>> Hello all,
>>
>>
>> After deleting datasets from the history panel in our Galaxy mirror,
>> the indicator at the top right corner shows the same amount of used
>> space as before the deletion. Also, the files corresponding to the datasets
>> remain in the Galaxy database/files/000 directory. It seems that
>> deleting datasets from the history only deletes the link to the file but
>> not the file itself. How can I configure the Galaxy mirror to delete not
>> only the records in the history panel but also the corresponding files?
>> Thank you in advance!
>>
>>
>>
>> ___
>> The Galaxy User list should be used for the discussion of
>> Galaxy analysis and other features on the public server
>> at usegalaxy.org.  Please keep all replies on the list by
>> using "reply all" in your mail client.  For discussion of
>> local Galaxy instances and the Galaxy source code, please
>> use the Galaxy Development list:
>>
>>   http://lists.bx.psu.edu/listinfo/galaxy-dev
>>
>> To manage your subscriptions to this and other Galaxy lists,
>> please use the interface at:
>>
>>   http://lists.bx.psu.edu/
>>
>

Re: [galaxy-dev] [galaxy-user] Fwd: deleting datasets from history

2011-07-06 Thread Hans-Rudolf Hotz

Hi Sergei

This is a question better asked on 'galaxy-...@bx.psu.edu' since you 
refer to your local Galaxy installation.



In order to remove the data from your file system, you need to run the 
'cleanup scripts', as described on this wiki page:



https://bitbucket.org/galaxy/galaxy-central/wiki/Config/PurgeHistoriesAndDatasets



Regards, Hans


On 07/06/2011 03:33 PM, Sergei Ryazansky wrote:



 Original Message 
Subject:   deleting datasets from history
Date:   Tue, 5 Jul 2011 19:58:45 +0300
From: Sergei Ryazansky 
To:   galaxy-user-requ...@lists.bx.psu.edu



Hello all,


After deleting datasets from the history panel in our Galaxy mirror,
the indicator at the top right corner shows the same amount of used
space as before the deletion. Also, the files corresponding to the datasets
remain in the Galaxy database/files/000 directory. It seems that
deleting datasets from the history only deletes the link to the file but
not the file itself. How can I configure the Galaxy mirror to delete not
only the records in the history panel but also the corresponding files?
Thank you in advance!



