[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-18 Marco van Tol
Hi Mark,

Thanks for all your help on this topic so far.

On Fri, 18 Oct 2024 at 05:38, Mark Sapiro wrote:

[...]

> > (I agree removing messages from an archive is far from optimal, but I'm
> > not doing this for myself :D )
> >
> > It would be a minor thing if I hadn't told people this is the way to
> > find out the number of messages in a list, and also to find the most
> > recent post to a list.
>
> What happens if you search for * and sort earliest first. Does it find
> deleted messages and are there errors?


That's the strange, but also the good part.  With the current version of my
script, which refreshes the Email.objects.filter() result on every iteration,
this works fine too.  The top message is now the earliest one that was not
deleted, as expected.

The only thing that is not correct is the message count on the "search for
*" result, after I delete a bunch from the beginning.

With the first version of my script, the one that runs into errors, the web
page gives the server error.
I don't intend to ever use that version any more.

Thanks!

Marco van Tol
RIPE NCC
___
Mailman-Developers mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/mailman-developers.python.org/
Mailman FAQ: https://wiki.list.org/x/AgA3

Security Policy: https://wiki.list.org/x/QIA9


[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-18 Marco van Tol
Hi Stephen,

Thank you so much for your email.

On Thu, 17 Oct 2024 at 17:43, Stephen J. Turnbull wrote:

> Marco van Tol writes:
>
>  > I tried this, and everything works fine with the last version of my script,
>  > except for one sort-of minor thing,
>
> Please don't deprecate your requirements.  If you need it, and you do:
>

Fair enough :)


>  > It would be a minor thing if I hadn't told people this is the way to find
>  > out the number of messages in a list, and also to find the most recent post
>  > to a list.
>
> we want to give it to you.  Sure, sometimes it is harder than you imagine
> or in our judgment it's not worth as much as something else we could do,
> but you needn't be shy about asking for it.  I'm also pretty sure it's
> hardly a unique requirement, at least it won't be for long, between GDPR
> and other worries about privacy of archived data in many contexts.
>

I hadn't thought about that one yet, but indeed, thank you.


>  > [And what's not working right is] the message count if you search
>  > for "*".  It won't update to the right message count in the top
>  > middle of the page until I do a "rebuild_index".
>
> How much time does it cost to do that?  If it's expensive enough that on
> "monthly cleaning day" you've got some lists that stay unsynced for many
> minutes or hours, we might need to rearchitect the index to be per list.
>

At the moment the entire list server has roughly 300,000 messages.
From memory, last time it took slightly under an hour.

A count of 300,000 is a lot less than others have, which makes a full
rebuild sort of okay for our server, today, if we time it right.

It is a 24x7 service though, so even with the best timing there's a risk of
a few people getting degraded service on the archives while the index
rebuilds.  But right now I think that, if we time it right, once per month
is probably okay.
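To put Stephen's cost question in rough numbers: taking the figures above (about 300,000 messages, a full rebuild_index in slightly under an hour) and assuming, without having measured it, that indexing time grows roughly linearly with message count, a back-of-envelope estimate for reindexing a subset could look like this. The function name and the 10,000-message list are hypothetical.

```python
def estimate_reindex_seconds(n_messages: int,
                             total_messages: int = 300_000,
                             total_seconds: float = 3600.0) -> float:
    """Scale the observed full-rebuild time linearly to n_messages.

    Defaults come from the figures in this thread (about 300,000 messages,
    slightly under an hour), so 3600 s is an upper bound for the full run.
    Linear scaling is an assumption, not a measurement.
    """
    if total_messages <= 0:
        raise ValueError("total_messages must be positive")
    return n_messages * total_seconds / total_messages


if __name__ == "__main__":
    rate = 300_000 / 3600.0
    print(f"about {rate:.0f} messages/second")  # about 83 messages/second
    # A hypothetical list holding 10,000 messages:
    print(f"{estimate_reindex_seconds(10_000):.0f} seconds")  # 120 seconds
```

If the linear assumption roughly holds, a per-list index as Stephen suggests could shrink the degraded window for small lists considerably; real timings will depend on the search backend.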

It would be nice if an improvement ended up somewhere on a list of
nice-to-haves.
Or perhaps that list only ever gets longer; it may well. :-)

>  > I'm afraid "update" index only looks at the messages changed since the last
>  > time update was run, and misses the fact that messages have disappeared
>  > from the beginning.
>
> Have you looked at the code to verify this?  I agree it's consistent with
> the Mailman behavior you see.  Unfortunately I'm not sure that all the
> indexers we claim to support would be able to support such deletions
> without a full rebuild.
>

I have not verified it in the code; I aim to have a look at some point.
Indeed, I was writing this with the observed behaviour in mind.

Thanks Stephen!

Marco van Tol
RIPE NCC


[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-17 Mark Sapiro


> I tried this, and everything works fine with the last version of my
> script, except for one sort-of minor thing, and that's the message count
> if you search for "*".  It won't update to the right message count in
> the top middle of the page until I do a "rebuild_index".
>
> I'm afraid "update" index only looks at the messages changed since the
> last time update was run, and misses the fact that messages have
> disappeared from the beginning.


You are probably correct. It probably doesn't remove index entries for 
messages no longer in the archive. I'm not sure, but this may depend on 
the particular haystack backend.
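A toy model may make the suspected behaviour concrete. It is an in-memory sketch, not Haystack's or HyperKitty's code: an incremental pass re-indexes only rows modified since the previous run, so a row deleted from the database is never revisited, and its index entry (and with it the "*" count) stays stale until a separate reconciliation pass or a full rebuild removes it. The ToyIndex class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """In-memory stand-in for a search index; not Haystack's implementation."""
    docs: dict[int, str] = field(default_factory=dict)
    last_run: int = 0

    def incremental_update(self, db: dict[int, tuple[str, int]], now: int) -> None:
        # db maps primary key -> (text, last-modified timestamp).
        # Only rows modified since the previous run are re-indexed. A row that
        # was deleted is never "modified", so this pass never notices it.
        for pk, (text, modified) in db.items():
            if modified > self.last_run:
                self.docs[pk] = text
        self.last_run = now

    def remove_stale(self, db: dict[int, tuple[str, int]]) -> int:
        # Reconciliation pass: drop index entries with no matching row.
        stale = [pk for pk in self.docs if pk not in db]
        for pk in stale:
            del self.docs[pk]
        return len(stale)
```

For what it's worth, django-haystack's own update_index command documents a --remove option for dropping index entries whose objects are no longer in the database; whether it behaves well with the backend a given HyperKitty install uses, and how long it takes on a large archive, has not been checked here.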


> (I agree removing messages from an archive is far from optimal, but I'm
> not doing this for myself :D )
>
> It would be a minor thing if I hadn't told people this is the way to
> find out the number of messages in a list, and also to find the most
> recent post to a list.


What happens if you search for * and sort earliest first. Does it find 
deleted messages and are there errors?


--
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-17 Stephen J. Turnbull
Marco van Tol writes:

 > I tried this, and everything works fine with the last version of my script,
 > except for one sort-of minor thing,

Please don't deprecate your requirements.  If you need it, and you do:

 > It would be a minor thing if I hadn't told people this is the way to find
 > out the number of messages in a list, and also to find the most recent post
 > to a list.

we want to give it to you.  Sure, sometimes it is harder than you imagine
or in our judgment it's not worth as much as something else we could do,
but you needn't be shy about asking for it.  I'm also pretty sure it's
hardly a unique requirement, at least it won't be for long, between GDPR
and other worries about privacy of archived data in many contexts.
 
 > [And what's not working right is] the message count if you search
 > for "*".  It won't update to the right message count in the top
 > middle of the page until I do a "rebuild_index".

How much time does it cost to do that?  If it's expensive enough that on
"monthly cleaning day" you've got some lists that stay unsynced for many
minutes or hours, we might need to rearchitect the index to be per list.

 > I'm afraid "update" index only looks at the messages changed since the last
 > time update was run, and misses the fact that messages have disappeared
 > from the beginning.

Have you looked at the code to verify this?  I agree it's consistent with
the Mailman behavior you see.  Unfortunately I'm not sure that all the
indexers we claim to support would be able to support such deletions
without a full rebuild.

Steve




[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-17 Marco van Tol
On Thu, 17 Oct 2024 at 09:38, Marco van Tol wrote:

> Hi Mark,
>
> Thank you so much for spending some of your scarce time on this for us.
>
> On Wed, 16 Oct 2024 at 22:58, Mark Sapiro wrote:
>

[...]


>> If the massive count is distributed over multiple lists with not so many
>> messages you can run the Django update_index_one_list job to just do one
>> list.
>
>
> This is very helpful, thank you so much!
>

I tried this, and everything works fine with the last version of my script,
except for one sort-of minor thing, and that's the message count if you
search for "*".  It won't update to the right message count in the top
middle of the page until I do a "rebuild_index".

I'm afraid "update" index only looks at the messages changed since the last
time update was run, and misses the fact that messages have disappeared
from the beginning.
(I agree removing messages from an archive is far from optimal, but I'm not
doing this for myself :D )

It would be a minor thing if I hadn't told people this is the way to find
out the number of messages in a list, and also to find the most recent post
to a list.

Marco van Tol
RIPE NCC


[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-17 Marco van Tol
Hi Mark,

Thank you so much for spending some of your scarce time on this for us.

On Wed, 16 Oct 2024 at 22:58, Mark Sapiro wrote:

> On 10/16/24 07:02, Marco van Tol wrote:
> > On Wed, 16 Oct 2024 at 14:45, Marco van Tol wrote:
> >
> > The script went on its way for a bit, and then blew up.
> > (attachment: error-run-1.txt)
>
> I suspect the issue there is in the order of deletion and a message
> which is a parent in a thread gets deleted before the child.
>

That's very much what it looks like, yeah.
Perhaps I could somehow sort the returned messages by date and then delete
them from newest to oldest.
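Sorting by date is a reasonable heuristic, but Date: headers come from senders' clocks, so a reply can carry an earlier date than the message it answers. An ordering that depends only on the parent links (the traceback's KeyError: 'parent' suggests each archived message refers to its parent) is a post-order walk of each thread, which emits every reply before the message it replies to. This is a sketch, not HyperKitty code; the parent_of mapping and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def children_first_order(parent_of: dict[int, Optional[int]]) -> list[int]:
    """Order message ids so that every reply precedes the message it answers.

    parent_of maps a message id to its parent's id, or None for a thread
    root. A parent that is not in the mapping (e.g. it is being kept) is
    treated like a root. Ids caught in a parent cycle are never reached, so
    malformed data like that would need separate handling.
    """
    children: dict[Optional[int], list[int]] = defaultdict(list)
    for msg_id, parent in parent_of.items():
        key = parent if parent in parent_of else None
        children[key].append(msg_id)

    order: list[int] = []
    # Iterative post-order walk: a node is emitted only after all its children.
    stack: list[tuple[int, bool]] = [(root, False) for root in sorted(children[None])]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        else:
            stack.append((node, True))
            stack.extend((child, False) for child in sorted(children[node]))
    return order


if __name__ == "__main__":
    print(children_first_order({1: None, 2: 1, 3: 2, 4: 1, 5: None}))  # [5, 4, 3, 2, 1]
```

With Django, such a mapping could presumably be built with dict(Email.objects.filter(mailinglist=4, date__lt=cutoff).values_list("id", "parent_id")), assuming the foreign key is named parent; that, and whether HyperKitty's post-delete processing is happy with this order, would need checking on a test server.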


> > I made a change to the script that's really blunt but does work.
> >
> > See the attached change.  I can make it more efficient by calling
> > Email.objects.filter() once per loop instead of the current two times,
> > but if you have any other improvements they'd be welcome.
>
> I don't have any suggestions.
>

That's okay, thanks for thinking about it.


> > And secondary: can I avoid the call to rebuild_index?  The actual
> > production server has a massive count of messages.
>
> If the massive count is distributed over multiple lists with not so many
> messages you can run the Django update_index_one_list job to just do one
> list.


This is very helpful, thank you so much!

Marco van Tol
RIPE NCC


[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-16 Mark Sapiro

On 10/16/24 07:02, Marco van Tol wrote:
> On Wed, 16 Oct 2024 at 14:45, Marco van Tol wrote:
>
> The script went on its way for a bit, and then blew up.
> (attachment: error-run-1.txt)

I suspect the issue there is in the order of deletion and a message
which is a parent in a thread gets deleted before the child.

> I made a change to the script that's really blunt but does work.
>
> See the attached change.  I can make it more efficient by calling
> Email.objects.filter() once per loop instead of the current two times,
> but if you have any other improvements they'd be welcome.

I don't have any suggestions.

> And secondary: can I avoid the call to rebuild_index?  The actual
> production server has a massive count of messages.

If the massive count is distributed over multiple lists with not so many
messages you can run the Django update_index_one_list job to just do one
list.


--
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan



[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-16 Marco van Tol
On Wed, 16 Oct 2024 at 14:45, Marco van Tol wrote:

> Hi Mark,
>
> Thanks again for your answer.
>
> First of all, all of what I write below happened on a test list server, so
> no real harm was done.
> I'm just curious how to recover from this, and how to get to what I need.
> :-)
>
>
> On Wed, 11 Sept 2024 at 21:16, Mark Sapiro wrote:
>
>> On 9/11/24 00:23, Marco van Tol wrote:
>> >
>> > We're interested in the development of a feature in mailman3 where we can
>> > configure it to automatically expire/remove threads in/from the archive
>> > older than x number of days.
>>
>> The script at https://www.msapiro.net/scripts/prune_arch3 could be
>> easily modified to do this and then be run periodically by cron.
>>
>
> I made a modified version of the script, and ran it.
> (attachment: script-1.py)
>
> The script went on its way for a bit, and then blew up.
> (attachment: error-run-1.txt)
>

[...]

I made a change to the script that's really blunt but does work.

See the attached change.  I can make it more efficient by calling
Email.objects.filter() once per loop instead of the current two times, but
if you have any other improvements they'd be welcome.


> And secondary: can I avoid the call to rebuild_index?  The actual
> production server has a massive count of messages.
>

This one very much still stands; hopefully I can integrate the message
deletion more directly with the archive search index.

Thank you very much in advance!

Marco van Tol
RIPE NCC
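import os
import sys
from datetime import datetime, timezone

import django

# Point Django at the mailman-web settings before importing any models.
sys.path.insert(0, '/opt/mailman-web')
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

django.setup()
from hyperkitty.models.email import Email  # must come after django.setup()


def usage():
    print(f'Usage: {sys.argv[0]} yyyy-mm-dd', file=sys.stderr)
    sys.exit(1)


# Parse the cutoff date (UTC midnight) from the single command-line argument.
if len(sys.argv) != 2:
    usage()
try:
    year, month, day = sys.argv[1].split('-')
except ValueError:
    usage()
try:
    cutoff = datetime(int(year), int(month), int(day), tzinfo=timezone.utc)
except (TypeError, ValueError):
    usage()

# Delete one message at a time, re-running the query before each deletion so
# every pass works from the current state of the database rather than a
# stale result.  mailinglist=4 selects the list to prune by its database id.
count = 0
while Email.objects.filter(mailinglist=4, date__lt=cutoff).count() > 0:
    msg = Email.objects.filter(mailinglist=4, date__lt=cutoff)[0]
    print(msg.mailinglist, msg.date, msg.sender)
    msg.delete()
    count += 1

print(f"Deleted {count} messages")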
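The loop above runs the same filter() twice per pass: once for .count() and once to fetch [0]. Below is a minimal sketch of the one-query-per-pass version mentioned above, written against plain callables so it can be tried without a Django setup. The function name and the limit safety valve are hypothetical additions; the commented wiring relies on Django's QuerySet.first(), which returns None when nothing matches.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def delete_until_empty(fetch_first: Callable[[], Optional[T]],
                       delete: Callable[[T], None],
                       limit: Optional[int] = None) -> int:
    """Re-run the query before every deletion, issuing one query per pass.

    `limit` guards against looping forever if a delete silently does nothing.
    """
    count = 0
    while limit is None or count < limit:
        item = fetch_first()
        if item is None:  # nothing left that matches
            break
        delete(item)
        count += 1
    return count


# Hypothetical wiring into the script above (untested here):
#   delete_until_empty(
#       lambda: Email.objects.filter(mailinglist=4, date__lt=cutoff).first(),
#       lambda msg: msg.delete(),
#   )

if __name__ == "__main__":
    pending = ["a", "b", "c"]
    deleted = delete_until_empty(lambda: pending[0] if pending else None, pending.remove)
    print(f"Deleted {deleted} messages")  # Deleted 3 messages
```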


[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-10-16 Marco van Tol
Hi Mark,

Thanks again for your answer.

First of all, all of what I write below happened on a test list server, so
no real harm was done.
I'm just curious how to recover from this, and how to get to what I need.
:-)


On Wed, 11 Sept 2024 at 21:16, Mark Sapiro wrote:

> On 9/11/24 00:23, Marco van Tol wrote:
> >
> > We're interested in the development of a feature in mailman3 where we can
> > configure it to automatically expire/remove threads in/from the archive
> > older than x number of days.
>
> The script at https://www.msapiro.net/scripts/prune_arch3 could be
> easily modified to do this and then be run periodically by cron.
>

I made a modified version of the script, and ran it.
(attachment: script-1.py)

The script went on its way for a bit, and then blew up.
(attachment: error-run-1.txt)

Now the archive for the list is broken in the sense that when I search for
"*", and order it by "earliest first", it will show a server error.
The mailmanweb.log also gives errors when this happens.
(attachment: mailman-web.log)

I tried to run `./manage.py update_index` which did not fix the issue for
the archive.
I then ran `./manage.py rebuild_index` which did fix the issue for the
archive.

Following this I ran the script again, and it showed similar output,
including an error message after a while, but every run would delete more
messages.
I could keep running it until all the messages that I needed to be gone
were gone, and then do a final `rebuild_index` to get the server back in
shape.

The mailmanweb.log would show the occasional message like this while I was
doing this:
WARNING 2024-10-16 12:31:52,043 41 hyperkitty.tasks Cannot rebuild the
thread cache: thread 28 does not exist.

These did not re-appear after I rebuilt the index.

Is there some call I need to make to refresh the Email.objects list between
runs?
Do I just blindly rerun `Email.objects.filter()` after every call to
msg.delete()?

And secondary: can I avoid the call to rebuild_index?  The actual
production server has a massive count of messages.

Thanks!

Marco van Tol
RIPE NCC
f04331d81855:/opt/mailman-web-data$ python tol-test-3.py 2006-01-01
276
chair.ripe.net 2005-05-04 12:23:11+00:00 
12:04:15 [Q] INFO Enqueued [default] 56
12:04:15 [Q] INFO Enqueued [default] 57
chair.ripe.net 2005-04-28 09:38:04+00:00 
12:04:15 [Q] INFO Enqueued [default] 58
12:04:15 [Q] INFO Enqueued [default] 59
chair.ripe.net 2005-04-26 20:02:54+00:00 
12:04:15 [Q] INFO Enqueued [default] 60
12:04:15 [Q] INFO Enqueued [default] 61
chair.ripe.net 2005-04-26 23:37:03+00:00 
12:04:15 [Q] INFO Enqueued [default] 62
chair.ripe.net 2005-04-27 00:01:24+00:00 
12:04:15 [Q] INFO Enqueued [default] 63
chair.ripe.net 2005-04-27 10:51:32+00:00 
12:04:15 [Q] INFO Enqueued [default] 64
chair.ripe.net 2005-04-27 07:44:01+00:00 
12:04:15 [Q] INFO Enqueued [default] 65
12:04:15 [Q] INFO Enqueued [default] 66
chair.ripe.net 2005-05-04 12:32:51+00:00 
12:04:15 [Q] INFO Enqueued [default] 67
12:04:15 [Q] INFO Enqueued [default] 68
chair.ripe.net 2005-04-30 19:16:23+00:00 
12:04:15 [Q] INFO Enqueued [default] 69
chair.ripe.net 2005-05-02 13:25:15+00:00 
12:04:15 [Q] INFO Enqueued [default] 70
chair.ripe.net 2005-05-04 12:51:34+00:00 
12:04:15 [Q] INFO Enqueued [default] 71
chair.ripe.net 2005-05-04 12:16:24+00:00 
12:04:15 [Q] INFO Enqueued [default] 72
chair.ripe.net 2005-09-23 07:33:52+00:00 
12:04:15 [Q] INFO Enqueued [default] 73
chair.ripe.net 2005-05-05 09:30:44+00:00 
12:04:15 [Q] INFO Enqueued [default] 74
chair.ripe.net 2005-05-05 12:05:55+00:00 
12:04:15 [Q] INFO Enqueued [default] 75
chair.ripe.net 2005-05-20 14:22:36+00:00 
12:04:15 [Q] INFO Enqueued [default] 76
12:04:15 [Q] INFO Enqueued [default] 77
chair.ripe.net 2005-05-20 15:04:23+00:00 
12:04:15 [Q] INFO Enqueued [default] 78
chair.ripe.net 2005-05-24 23:16:32+00:00 
12:04:15 [Q] INFO Enqueued [default] 79
chair.ripe.net 2005-07-28 01:09:13+00:00 
12:04:15 [Q] INFO Enqueued [default] 80
chair.ripe.net 2005-07-25 13:43:37+00:00 
12:04:15 [Q] INFO Enqueued [default] 81
12:04:15 [Q] INFO Enqueued [default] 82
chair.ripe.net 2005-07-27 07:35:36+00:00 
12:04:15 [Q] INFO Enqueued [default] 83
chair.ripe.net 2005-08-25 12:24:57+00:00 
12:04:16 [Q] INFO Enqueued [default] 84
12:04:16 [Q] INFO Enqueued [default] 85
chair.ripe.net 2005-08-26 22:50:05+00:00 
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/django/db/models/fields/related_descriptors.py", line 218, in __get__
    rel_obj = self.field.get_cached_value(instance)
  File "/usr/lib/python3.12/site-packages/django/db/models/fields/mixins.py", line 15, in get_cached_value
    return instance._state.fields_cache[cache_name]
KeyError: 'parent'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/mailman-web-data/tol-test-3.py", line 31, in
    msg.delete()
  File "/

[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-09-12 Marco van Tol
On Wed, 11 Sept 2024 at 21:16, Mark Sapiro wrote:

> On 9/11/24 00:23, Marco van Tol wrote:
> >
> > We're interested in the development of a feature in mailman3 where we can
> > configure it to automatically expire/remove threads in/from the archive
> > older than x number of days.
>
>
> The script at https://www.msapiro.net/scripts/prune_arch3 could be
> easily modified to do this and then be run periodically by cron.
>
>
Hi! Thanks for this suggestion, I'll have a look!

Marco van Tol
RIPE NCC


[Mailman-Developers] Re: Custom feature request - ripe ncc - archive expires

2024-09-11 Mark Sapiro

On 9/11/24 00:23, Marco van Tol wrote:


> We're interested in the development of a feature in mailman3 where we can
> configure it to automatically expire/remove threads in/from the archive
> older than x number of days.



The script at https://www.msapiro.net/scripts/prune_arch3 could be 
easily modified to do this and then be run periodically by cron.
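A cron job driven by "older than x number of days" needs to turn that day count into a cutoff date. A minimal sketch, assuming a thread should expire only once its newest message passes the cutoff, so a thread that is still active is never cut in half. That policy, the Message dataclass, and both function names are hypothetical, not part of HyperKitty or prune_arch3.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Message:
    # Hypothetical stand-in for an archived message, not HyperKitty's Email model.
    thread_id: int
    date: datetime


def cutoff_for(days: int, now: datetime) -> datetime:
    """Return the instant before which archived mail counts as expired."""
    return now - timedelta(days=days)


def threads_to_expire(messages: list[Message], cutoff: datetime) -> set[int]:
    """Return ids of threads whose newest message is older than the cutoff.

    Expiring whole threads this way never removes just the start of a thread
    that is still receiving replies.
    """
    newest: dict[int, datetime] = {}
    for msg in messages:
        if msg.thread_id not in newest or msg.date > newest[msg.thread_id]:
            newest[msg.thread_id] = msg.date
    return {tid for tid, latest in newest.items() if latest < cutoff}


if __name__ == "__main__":
    now = datetime(2024, 10, 18, tzinfo=timezone.utc)
    cutoff = cutoff_for(365, now)
    # Date-only form, e.g. for a script that takes its cutoff as year-month-day.
    print(cutoff.date().isoformat())  # 2023-10-19
    msgs = [
        Message(1, datetime(2005, 4, 26, tzinfo=timezone.utc)),
        Message(1, datetime(2005, 5, 4, tzinfo=timezone.utc)),
        Message(2, datetime(2023, 1, 1, tzinfo=timezone.utc)),
        Message(2, datetime(2024, 6, 1, tzinfo=timezone.utc)),
    ]
    print(sorted(threads_to_expire(msgs, cutoff)))  # [1]
```

In a real cron job `now` would be datetime.now(timezone.utc); expiring per message instead of per thread is equally possible and is a policy choice for the list owners.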


--
Mark Sapiro                           The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
