Re: [galaxy-dev] Datasets incorrectly flagged as deleted

2016-12-05 Thread Lance Parsons
I do use that script, though I hope it’s not causing this. The intent 
behind it was to simulate deletion of the HDA by the user and then 
let the cleanup process proceed as normal. Also, I only run it on certain 
datasets, and many (all?) of the specific examples of this issue I’ve 
seen were not ones touched by the script, but datasets copied from a 
Data Library.


Looking a bit further:

 * `d.deleted and not hda.purged` = 66845
   * `d.deleted and d.purged and not hda.purged` = 54818
   * `d.deleted and not d.purged and not hda.purged` = 12027
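These queries join `dataset` to `history_dataset_association`, so `count(d.id)` counts HDA rows rather than datasets: a dataset with several HDAs (for example, one copied from a Data Library into more than one history) is counted once per HDA. The sketch below computes the same breakdown by `d.purged` with both `COUNT` and `COUNT(DISTINCT ...)`, against an in-memory SQLite database holding a simplified, hypothetical version of the two tables. Flags are 0/1 integers there; in PostgreSQL they are booleans.

```python
import sqlite3
from typing import Dict, Tuple

# Simplified, hypothetical schema with only the columns discussed in this thread.
# SQLite has no boolean type, so the flags are stored as 0/1 integers.
SCHEMA = """
CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted INTEGER, purged INTEGER);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY, dataset_id INTEGER, purged INTEGER);
"""

# d.deleted and not hda.purged, split by d.purged.
BREAKDOWN = """
SELECT d.purged,
       COUNT(d.id),          -- one row per HDA, like the thread queries
       COUNT(DISTINCT d.id)  -- each dataset counted once
FROM dataset d
JOIN history_dataset_association hda ON d.id = hda.dataset_id
WHERE d.deleted = 1 AND hda.purged = 0
GROUP BY d.purged
ORDER BY d.purged
"""


def breakdown(conn: sqlite3.Connection) -> Dict[int, Tuple[int, int]]:
    """Map d.purged -> (HDA rows, distinct datasets)."""
    return {purged: (rows, datasets) for purged, rows, datasets in conn.execute(BREAKDOWN)}


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Dataset 1: deleted and purged; dataset 2: deleted only; dataset 3: live.
    conn.executemany("INSERT INTO dataset VALUES (?, ?, ?)",
                     [(1, 1, 1), (2, 1, 0), (3, 0, 0)])
    # Dataset 1 is shared by two unpurged HDAs, e.g. copies in two histories.
    conn.executemany("INSERT INTO history_dataset_association VALUES (?, ?, ?)",
                     [(10, 1, 0), (11, 1, 0), (12, 2, 0), (13, 3, 0)])
    return conn


if __name__ == "__main__":
    print(breakdown(demo_connection()))  # -> {0: (1, 1), 1: (2, 1)}
```

When the two columns differ on a real database, the gap is the number of extra HDA rows pointing at already-counted datasets.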


--
Lance Parsons - Scientific Programmer
Carl C. Icahn Laboratory - Room 136
Lewis-Sigler Institute for Integrative Genomics
Princeton University

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Datasets incorrectly flagged as deleted

2016-12-02 Thread Nate Coraor
Lance,

Do you use the administrative deletion script you wrote? That would
probably cause d.deleted and not hda.purged.

d.deleted isn't for queries; it's sort of a deletion buffer. Once every hda
with hda.dataset_id = d.id is hda.purged, the next cleanup step is to mark
d.deleted. After d.deleted, it's marked d.purged and the file(s) are
actually removed from disk. It's a bit of a failsafe so that datasets can
be intercepted if unintentionally deleted.
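The sequence described above can be modeled as two separate cleanup passes. This is a hypothetical sketch rather than Galaxy's cleanup code: `Dataset`, `mark_deleted_pass`, `purge_pass` and `consistent` are invented names, and library dataset associations are left out of the model. It encodes the ordering that `d.deleted` is set only after every HDA is purged, and that files disappear only at the later `d.purged` step.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for one `dataset` row and its HDAs."""
    deleted: bool = False
    purged: bool = False
    file_on_disk: bool = True
    # hda.purged for every HDA whose dataset_id points at this dataset.
    # Library associations are ignored in this simplified model.
    hda_purged: List[bool] = field(default_factory=list)


def mark_deleted_pass(d: Dataset) -> None:
    """First pass: set d.deleted only once every associated HDA is purged."""
    if d.hda_purged and all(d.hda_purged):
        d.deleted = True


def purge_pass(d: Dataset) -> None:
    """Later pass: purge datasets already marked deleted and remove the file."""
    if d.deleted and not d.purged:
        d.purged = True
        d.file_on_disk = False


def consistent(d: Dataset) -> bool:
    """Buffer invariant: a deleted dataset has no unpurged HDA left."""
    return not d.deleted or all(d.hda_purged)


if __name__ == "__main__":
    d = Dataset(hda_purged=[True, False])
    mark_deleted_pass(d)
    print(d.deleted)                  # False: one HDA is still user-recoverable
    d.hda_purged = [True, True]
    mark_deleted_pass(d)
    print(d.deleted, d.file_on_disk)  # True True: buffered, file still intact
    purge_pass(d)
    print(d.purged, d.file_on_disk)   # True False
```

The time between the two passes is the failsafe window, and the state reported in this thread (`d.deleted` with an unpurged HDA) is exactly what `consistent` rejects.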

--nate


Re: [galaxy-dev] Datasets incorrectly flagged as deleted

2016-12-02 Thread Lance Parsons

Thanks Nate, that makes sense. However it seems I still have an issue:

```
select count(d.id)
from dataset d
join history_dataset_association hda on d.id = hda.dataset_id
where d.deleted = 't' and hda.purged = 'f';
 count
---
 67464
(1 row)
```

Perhaps I'll need to write some script to check each of these to see if 
the data does, indeed, exist, and then set the flag appropriately... Hrmm.
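A read-only first pass at such a script might only report which flagged datasets still have a file on disk, leaving any flag changes for later review. In this sketch the candidate IDs would come from the query above; `hash_dirs` and `dataset_path` are hypothetical helpers that assume Galaxy's default disk layout (`<file_path>/<hash dirs>/dataset_<id>.dat`, with 1000 files per directory). Verify that against your own object store configuration before relying on it.

```python
import os
from typing import Iterable, List, Tuple


def hash_dirs(dataset_id: int) -> List[str]:
    """Directory components for a dataset ID, assuming 1000 files per directory."""
    s = str(dataset_id)
    if len(s) < 4:
        return ["000"]  # IDs 0-999 all live in 000/
    padded = "0" * (3 - len(s) % 3) + s  # left-pad to a multiple of three digits
    padded = padded[:-3]  # the last three digits select the file, not a directory
    return [padded[i:i + 3] for i in range(0, len(padded), 3)]


def dataset_path(file_path: str, dataset_id: int) -> str:
    """Hypothetical helper: expected path of a dataset under the assumed layout."""
    return os.path.join(file_path, *hash_dirs(dataset_id), f"dataset_{dataset_id}.dat")


def audit(file_path: str, dataset_ids: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Read-only check: split candidate IDs into (present on disk, missing)."""
    present: List[int] = []
    missing: List[int] = []
    for dataset_id in dataset_ids:
        if os.path.exists(dataset_path(file_path, dataset_id)):
            present.append(dataset_id)
        else:
            missing.append(dataset_id)
    return present, missing


if __name__ == "__main__":
    # "/galaxy/database/files" is a placeholder for your configured file_path.
    print(dataset_path("/galaxy/database/files", 12345))
    # -> /galaxy/database/files/012/dataset_12345.dat (on POSIX systems)
```

Keeping the audit separate from any update means the "missing" list can be reviewed by hand before deciding whether those datasets should stay flagged.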


Does anyone know what the `dataset.deleted` flag is used for? Is it 
just supposed to be set when all `hda.purged` are `t`, sort of like a 
shortcut for queries?


- Lance


Re: [galaxy-dev] Datasets incorrectly flagged as deleted

2016-12-02 Thread Nate Coraor
Lance,

usegalaxy.org has 4,652,912 datasets matching your query. The cause here is
that deleting an entire history does not mark its HDAs deleted (so that if
you view a deleted history you can see which datasets were deleted and which
were not at the time of deletion). There is a separate hda.purged column that
indicates that an HDA is no longer recoverable by the user. I have 699
datasets that are d.deleted but not hda.purged; this number should be 0.
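In query form, the two points above look roughly like this: an `hda.deleted = 'f'` check also matches HDAs sitting in deleted histories, so adding a condition on the history narrows it, while `d.deleted` with an unpurged HDA is the invariant that should count 0. The sketch runs all three counts against an in-memory SQLite database with a simplified, hypothetical schema (0/1 flags). Galaxy links HDAs to the `history` table through `hda.history_id`, but check the exact columns against your version before adapting the SQL.

```python
import sqlite3

# Simplified, hypothetical schema; flags are 0/1 integers because SQLite has no booleans.
SCHEMA = """
CREATE TABLE history (id INTEGER PRIMARY KEY, deleted INTEGER);
CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted INTEGER);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY, history_id INTEGER, dataset_id INTEGER,
    deleted INTEGER, purged INTEGER);
"""

BASE = """
SELECT COUNT(DISTINCT d.id)
FROM dataset d
JOIN history_dataset_association hda ON d.id = hda.dataset_id
JOIN history h ON h.id = hda.history_id
WHERE d.deleted = 1 AND {condition}
"""

# The original check: also matches HDAs sitting in deleted histories.
ORIGINAL = BASE.format(condition="hda.deleted = 0")
# The same check restricted to histories that are not deleted.
LIVE_HISTORIES = BASE.format(condition="hda.deleted = 0 AND h.deleted = 0")
# The invariant that should count 0: deleted datasets with an unpurged HDA.
INVARIANT = BASE.format(condition="hda.purged = 0")


def count(conn: sqlite3.Connection, query: str) -> int:
    return conn.execute(query).fetchone()[0]


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO history VALUES (?, ?)", [(1, 0), (2, 1)])
    conn.executemany("INSERT INTO dataset VALUES (?, ?)", [(1, 1), (2, 1), (3, 1)])
    conn.executemany(
        "INSERT INTO history_dataset_association VALUES (?, ?, ?, ?, ?)",
        [
            (10, 1, 1, 0, 0),  # live history, HDA not deleted: the problem case
            (11, 2, 2, 0, 1),  # deleted history, HDA purged: the expected state
            (12, 2, 3, 0, 0),  # deleted history, HDA never purged: invariant broken
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for name, query in [("original", ORIGINAL), ("live histories", LIVE_HISTORIES),
                        ("invariant", INVARIANT)]:
        print(name, count(conn, query))  # original 3, live histories 1, invariant 2
```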

--nate


Re: [galaxy-dev] Datasets incorrectly flagged as deleted

2016-12-01 Thread Hans-Rudolf Hotz

Hi Lance


I don't remember seeing the problem you describe on our servers. If I 
run your query, I get:


...
 count
---
55
(1 row)


Looking at the actual IDs, they are all in the low three-digit range 
and many years old.


Regards, Hans-Rudolf





[galaxy-dev] Datasets incorrectly flagged as deleted

2016-11-30 Thread Lance Parsons
I've run into issues over the past year where some jobs would 
occasionally fail to start (stuck in a `new` state). I tracked them down 
to a situation where `dataset.deleted` is set to `t` yet the 
`history_dataset_association.deleted` is `f`. Simply setting 
`dataset.deleted` to `f` in those instances resolved the issue and the 
jobs ran. In every case the datasets were still on disk.
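To check whether jobs stuck in `new` are hitting this, one could list their inputs whose HDA looks live while the underlying dataset is flagged deleted. This is a sketch against a simplified, hypothetical SQLite schema; in Galaxy the job-to-input link lives in a table along the lines of `job_to_input_dataset` (a job ID plus an HDA ID), but verify the table and column names against your own database before adapting the query.

```python
import sqlite3
from typing import List, Tuple

# Simplified, hypothetical schema; verify real table and column names in your database.
SCHEMA = """
CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted INTEGER);
CREATE TABLE history_dataset_association (
    id INTEGER PRIMARY KEY, dataset_id INTEGER, deleted INTEGER);
-- job_to_input_dataset.dataset_id refers to an HDA id, not a dataset id
CREATE TABLE job_to_input_dataset (job_id INTEGER, dataset_id INTEGER);
"""

# Jobs in 'new' with an input whose HDA is live but whose dataset is flagged deleted.
STUCK_INPUTS = """
SELECT j.id, hda.id, d.id
FROM job j
JOIN job_to_input_dataset jtid ON jtid.job_id = j.id
JOIN history_dataset_association hda ON hda.id = jtid.dataset_id
JOIN dataset d ON d.id = hda.dataset_id
WHERE j.state = 'new' AND hda.deleted = 0 AND d.deleted = 1
ORDER BY j.id, hda.id
"""


def stuck_inputs(conn: sqlite3.Connection) -> List[Tuple[int, int, int]]:
    """Return (job_id, hda_id, dataset_id) for each inconsistent input of a 'new' job."""
    return conn.execute(STUCK_INPUTS).fetchall()


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO job VALUES (?, ?)", [(1, "new"), (2, "ok")])
    conn.executemany("INSERT INTO dataset VALUES (?, ?)", [(100, 1), (101, 0)])
    conn.executemany("INSERT INTO history_dataset_association VALUES (?, ?, ?)",
                     [(10, 100, 0), (11, 101, 0)])
    # Both jobs read HDA 10 (flagged dataset 100); job 1 also reads HDA 11.
    conn.executemany("INSERT INTO job_to_input_dataset VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 10)])
    return conn


if __name__ == "__main__":
    print(stuck_inputs(demo_connection()))  # -> [(1, 10, 100)]
```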


Since this is a pretty annoying situation, I thought I'd check to see if 
there are other datasets with this problem. Shockingly, I found many 
thousands of such datasets:


```
select count(d.id)
from dataset d
join history_dataset_association hda on d.id = hda.dataset_id
where d.deleted = 't' and hda.deleted = 'f';
 count
---
 76977
(1 row)
```

I'm hesitant to update so many rows in my database, so I thought I'd put 
this out there for comment. What do others see when running the above 
query? Has anyone run into this or a similar issue? Thanks.

