Re: Discourse proposal status

2023-03-06 Thread Wol

On 06/03/2023 02:52, David Kastrup wrote:
> Andrew Bernard writes:
>
> > I should add that a normal fresh install does not require large
> > RAM. It's only the mbox import script. I think it's poorly written and
> > does not constrain itself when it comes to memory.
>
> Is it necessary to run the import on the same machine that is going to
> run the server?

What sort of VM does your hosting provider use? Can you set up the VM at 
home, punch a hole through the firewall/router for testing purposes, and 
then move it to the hosting provider as/when you think it could go live?


Cheers,
Wol



Re: Discourse proposal status

2023-03-06 Thread David Wright
On Mon 06 Mar 2023 at 04:18:17 (+0100), David Kastrup wrote:
> Andrew Bernard  writes:
> 
> > Well you can dynamically increase CPU or RAM or both on Digitalocean
> > that I use. You can do it on a temporary basis - but I'm not sure if
> > you get charged for a month or on a strict time basis, it's hard to
> > find out! It's not a matter of needing a separate system. My only
> > issue is that I am very financially constrained and I can't afford the
> > experiment.
> >
> > But the bigger fish to fry is the issue with the irregularities in the
> > mbox archives. I need to study this in depth before trying a load. I
> > did have the same problem with similar erratic mbox archives quite
> > some years ago but I can't easily recall the solution. Probably just a
> > more refined regex to pick up the 'From:' delimiters.
> 
> There isn't really much finesse involved.  Messages start at the pattern
> "^From ".  Any "From " inside of a message that would end up at the
> start of a line is changed to ">From ", so the pattern "^From " should
> be foolproof regarding splitting into messages.

I think this is rather dated. Most modern MUAs, including your own
from Sat, 25 Feb 2023 16:29:18 +0100, aren't escaping Froms any more.
The cached copy (sent via IMAP) is clean, and any mboxes I copy it to,
all contain:

  > =E2=80=A6 and I=E2=80=99ll probably get yelled at for top-posting as well=
   ;-)
  
  From a practical workflow perspective, I would much rather do all of my
  reading using a single keyboard driven interface and application than

OTOH, the digests contain:

  > … and I’ll probably get yelled at for top-posting as well ;-)
  
  >From a practical workflow perspective, I would much rather do all of my
  reading using a single keyboard driven interface and application than

as does the web page:

https://lists.gnu.org/archive/html/lilypond-user/2023-02/msg00500.html

I haven't checked what the ?monthly mboxes contain, but I would try
using a regex that includes matching the date at the end of the
From line (here deliberately offset):

  From lil…ser-bounces+lilylis=l….u...@gnu.org Sat Feb 25 09:29:53 2023

because whatever is writing these mboxes should be using a consistent
format for these.
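
A rough Python sketch of that stricter match, for illustration only. It
assumes every genuine separator has the shape "From <sender> <weekday>
<month> <day> <time> <year>", as in the sample line above. The names
FROM_LINE and split_messages are made up here, and the pattern would need
checking against what the archive files really contain.

```python
import re
from typing import Iterable, Iterator, List

# Assumed separator shape (not verified against the real archives):
#   From someone@example.org Sat Feb 25 09:29:53 2023
FROM_LINE = re.compile(
    r"^From \S+ +"
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def split_messages(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of lines per message, reading the input lazily."""
    current: List[str] = []
    for line in lines:
        # Only a line with a sender AND a trailing date starts a new message,
        # so an unescaped "From a practical ..." body line is left alone.
        if FROM_LINE.match(line) and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current
```

Because it walks the file line by line, it holds only one message in
memory at a time. It can be fed an open file object directly, for example
`split_messages(open(path, encoding="latin-1"))`, where `path` is whichever
mbox file is being checked.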

Cheers,
David.



Re: Discourse proposal status

2023-03-06 Thread David Kastrup
Andrew Bernard  writes:

> Well you can dynamically increase CPU or RAM or both on Digitalocean
> that I use. You can do it on a temporary basis - but I'm not sure if
> you get charged for a month or on a strict time basis, it's hard to
> find out! It's not a matter of needing a separate system. My only
> issue is that I am very financially constrained and I can't afford the
> experiment.

You could experiment with adding enough swap space to import if you are
charged based on a fixed configuration rather than processing time.  It
may take tremendous amounts of time, depending on how non-local the
memory accesses are.

-- 
David Kastrup



Re: Discourse proposal status

2023-03-06 Thread Valentin Petzel
But that is exactly my point. A system that requires far more resources
to set up than to run is a bit fishy in my eyes.

Also my point still stands: 

As far as I can see, Discourse uses Docker containers for deployment. Wouldn’t it
be possible to set up the container on a local machine, export the set-up
container and load that on the server? I believe that is more or less what
David enquired about.

Setting everything up on a local server might also be a good idea before 
spending money just to try out how it works.

Cheers,
Valentin

Am Montag, 6. März 2023, 03:29:43 CET schrieb Andrew Bernard:
> No. You misunderstand. Discourse is quite compact. The 8GB of RAM is
> only required temporarily for importing 20+ years worth of mbox files.
> My Discourse servers all run fine in 2GB of RAM, with unlimited posts,
> which are just in a database on disk.
> 
> On 6/03/2023 3:04 am, Valentin Petzel wrote:
> > I suppose a system that requires by a large factor more resources for
> > installation than it requires to run is not really a good way to do it.





Re: Discourse proposal status

2023-03-05 Thread David Kastrup
Andrew Bernard  writes:

> Well you can dynamically increase CPU or RAM or both on Digitalocean
> that I use. You can do it on a temporary basis - but I'm not sure if
> you get charged for a month or on a strict time basis, it's hard to
> find out! It's not a matter of needing a separate system. My only
> issue is that I am very financially constrained and I can't afford the
> experiment.
>
> But the bigger fish to fry is the issue with the irregularities in the
> mbox archives. I need to study this in depth before trying a load. I
> did have the same problem with similar erratic mbox archives quite
> some years ago but I can't easily recall the solution. Probably just a
> more refined regex to pick up the 'From:' delimiters.

There isn't really much finesse involved.  Messages start at the pattern
"^From ".  Any "From " inside of a message that would end up at the
start of a line is changed to ">From ", so the pattern "^From " should
be foolproof regarding splitting into messages.

I don't remember what happens to "^>From " but consider it most likely
that any "^>*From " inside of a message gets one ">" prepended when put
into an mbox file, and one taken out again when displayed/processed.
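
The scheme guessed at above (one ">" added on writing, one removed on
reading) can be sketched in Python as follows. This is an illustration of
that guess, not a statement of what the list software actually does. The
function names are invented, and the functions are meant for body lines
only, never for the real "From " separator lines.

```python
import re

# A body line consisting of zero or more ">" followed by "From ".
_NEEDS_QUOTING = re.compile(r"^>*From ")
# The same line after at least one ">" has been prepended.
_IS_QUOTED = re.compile(r"^>+From ")


def escape_line(line: str) -> str:
    """Prepend one '>' to a body line that could be mistaken for a separator."""
    return ">" + line if _NEEDS_QUOTING.match(line) else line


def unescape_line(line: str) -> str:
    """Remove exactly one '>' from a line that was quoted by escape_line."""
    return line[1:] if _IS_QUOTED.match(line) else line
```

With this variant the round trip is lossless: a body line that already
read ">From " is stored as ">>From " and comes back unchanged. If only
bare "From " lines were quoted instead, an original ">From " could not be
told apart from a quoted "From " when reading.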

-- 
David Kastrup



Re: Discourse proposal status

2023-03-05 Thread Andrew Bernard
Well you can dynamically increase CPU or RAM or both on Digitalocean 
that I use. You can do it on a temporary basis - but I'm not sure if you 
get charged for a month or on a strict time basis, it's hard to find 
out! It's not a matter of needing a separate system. My only issue is
that I am very financially constrained and I can't afford the experiment.


But the bigger fish to fry is the issue with the irregularities in the 
mbox archives. I need to study this in depth before trying a load. I did 
have the same problem with similar erratic mbox archives quite some 
years ago but I can't easily recall the solution. Probably just a more 
refined regex to pick up the 'From:' delimiters.



Andrew


On 6/03/2023 1:52 pm, David Kastrup wrote:
> Is it necessary to run the import on the same machine that is going to
> run the server?





Re: Discourse proposal status

2023-03-05 Thread David Kastrup
Andrew Bernard  writes:

> I should add that a normal fresh install does not require large
> RAM. It's only the mbox import script. I think it's poorly written and
> does not constrain itself when it comes to memory.

Is it necessary to run the import on the same machine that is going to
run the server?

-- 
David Kastrup



Re: Discourse proposal status

2023-03-05 Thread Andrew Bernard
I should add that a normal fresh install does not require large RAM. 
It's only the mbox import script. I think it's poorly written and does 
not constrain itself when it comes to memory.


Andrew


On 6/03/2023 1:29 pm, Andrew Bernard wrote:
> On 6/03/2023 3:04 am, Valentin Petzel wrote:
> > I suppose a system that requires by a large factor more resources for
> > installation than it requires to run is not really a good way to do it.






Re: Discourse proposal status

2023-03-05 Thread Andrew Bernard
No. You misunderstand. Discourse is quite compact. The 8GB of RAM is 
only required temporarily for importing 20+ years worth of mbox files. 
My Discourse servers all run fine in 2GB of RAM, with unlimited posts, 
which are just in a database on disk.


On 6/03/2023 3:04 am, Valentin Petzel wrote:
> I suppose a system that requires by a large factor more resources for
> installation than it requires to run is not really a good way to do it.




Re: Discourse proposal status

2023-03-05 Thread Valentin Petzel
Hello Andrew,

I suppose a system that requires by a large factor more resources for 
installation than it requires to run is not really a good way to do it.
Renting an 8GB Server when you only need that 8GB for setup sounds a bit daft. 
I do not know if renting a server dedicated to managing a handful of messages 
per day is really a good use of resources. In my opinion the best way would be 
to have Lilypond use a small share of the resources of a company or community
stable enough to guarantee that the system will be around for some time. This is
exactly what is currently done by GNU infrastructure.

Renting a dedicated server would really only make sense if we had a handful of 
web services that really make use of these resources. Such a server might also 
be used for example to host something like Paolo’s Spontini editor. I mean, 
sure, it wouldn’t cost much individually if a few of us rented a Server 
together, but I think if we do something like this we should make full use of 
that server.

Cheers,
Valentin

Am Sonntag, 5. März 2023, 06:04:40 CET schrieb Andrew Bernard:
> I'm still keen on Discourse for our community and have been giving it
> some attention. People have rightly said the full historical archive is
> important. David Kastrup usefully pointed out all the mbox archives are
> freely available, back to the beginning. Discourse can import mbox
> archives and there is a special import instance with scripts to load them.
> 
> So I downloaded the complete set of archives and attempted to load them
> into Discourse. One problem that arose is that the documentation for the
> import script explicitly mentions that you need a system with 8GB of RAM
> (only for the import, not for normal running) and my Linux servers are
> only configured with 2GB. Consequently the import fails. I use Vultr and
> Digitalocean and unfortunately upping the specification to 8GB plus is
> too expensive for me in my situation. Servers with extra RAM get
> expensive very quickly at these companies. So this is a problem. I am
> working towards a solution to work around this, perhaps doing it on my
> home server and uploading to a 2GB server later (but there are
> complexities with this).
> 
> But, the next current issue is this, and I have come across this before
> when importing mbox archives into GNU Mailman 2 and 3. The mbox format
> is rather loosely defined, and there are lots of small variations. But
> worse, even when you look at our archive set the format seems to vary
> slightly over the years. Some messages don't get unpacked properly due
> to inconsistencies in headers and my import of a sizeable batch showed
> quite a few messages with the initial message fine, but then follow up
> replies just appended in raw mailbox format rather than separate
> messages. I'll have to study this in more depth and write some scripts
> to pre-process the mbox files. Tedious, but I've had to do this in the past.
> 
> If I can load the whole history, then people can have a play with it to
> see if they like it. I also think I can get the Discourse instance to be
> a subscriber to the present list so it would keep up to date. Of course,
> if people replied on the Discourse interface the lists would diverge,
> but this is just for proof of concept.
> 
> As to operational matters, there is cost involved by going outside the
> GNU infrastructure. An adequate server costs around USD$20 per month to
> run. The way I handle this in the communities that I support is to ask
> for donations. Generally if a moderate number of people chip in it can
> be the cost of a cup of coffee a month. It's not excessive.
> 
> I think there have been some comments here indicating some
> dissatisfaction with this aging and somewhat limited email
> infrastructure, so I am encouraged to set up my proof of concept to
> demonstrate a modern system. I am also compelled to say that in
> practical terms the email list functionality of Discourse from the end
> user point of view is identical to what we have now with GNU Mailman 2,
> and you'd be hard pressed to find any operational/functional difference.
> So for people who reject web interfaces, in effect very little would change.
> 
> Finally, I'd just mention that Discourse of course supports plain text,
> but also HTML, and has full support for Markdown, both from email
> initiated files and web initiated topics. This means you can do nice
> formatting if required, and in particular you can make clear tables. And
> as with most forums now, you can mark code blocks so that they stand out
> clearly.
> 
> I'll post regular updates as I make progress towards a full proof of
> concept for people. And of course, if this was Discourse these posts
> would be in a distinct and separate category so as not to be noise in
> the main flow of the stream. :-)
> 
> Andrew





Discourse proposal status

2023-03-04 Thread Andrew Bernard
I'm still keen on Discourse for our community and have been giving it 
some attention. People have rightly said the full historical archive is
important. David Kastrup usefully pointed out all the mbox archives are 
freely available, back to the beginning. Discourse can import mbox 
archives and there is a special import instance with scripts to load them.


So I downloaded the complete set of archives and attempted to load them 
into Discourse. One problem that arose is that the documentation for the 
import script explicitly mentions that you need a system with 8GB of RAM 
(only for the import, not for normal running) and my Linux servers are
only configured with 2GB. Consequently the import fails. I use Vultr and 
Digitalocean and unfortunately upping the specification to 8GB plus is 
too expensive for me in my situation. Servers with extra RAM get 
expensive very quickly at these companies. So this is a problem. I am 
working towards a solution to work around this, perhaps doing it on my 
home server and uploading to a 2GB server later (but there are 
complexities with this).


But, the next current issue is this, and I have come across this before 
when importing mbox archives into GNU Mailman 2 and 3. The mbox format 
is rather loosely defined, and there are lots of small variations. But 
worse, even when you look at our archive set the format seems to vary 
slightly over the years. Some messages don't get unpacked properly due 
to inconsistencies in headers and my import of a sizeable batch showed 
quite a few messages with the initial message fine, but then follow up 
replies just appended in raw mailbox format rather than separate 
messages. I'll have to study this in more depth and write some scripts 
to pre-process the mbox files. Tedious, but I've had to do this in the past.
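
As a possible first pre-processing step, something like the Python sketch
below could report which "From " lines in a file look like real message
separators and which look like stray body text. It rests on two
assumptions that may not hold for every year of the archive: a real
separator ends in a time and a four-digit year, and it follows a blank
line (or starts the file). The names are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a genuine separator ends in "HH:MM:SS YYYY".
LOOKS_LIKE_SEPARATOR = re.compile(r"^From \S+ .*\d{2}:\d{2}:\d{2} \d{4}\s*$")


def audit_from_lines(
    lines: Iterable[str],
) -> Tuple[Counter, List[Tuple[int, str]]]:
    """Count plausible separators and collect the 'From ' lines that are not."""
    counts: Counter = Counter()
    suspects: List[Tuple[int, str]] = []
    previous_blank = True  # the start of the file counts as a boundary
    for number, line in enumerate(lines, start=1):
        if line.startswith("From "):
            if LOOKS_LIKE_SEPARATOR.match(line) and previous_blank:
                counts["separator"] += 1
            else:
                counts["suspect"] += 1
                suspects.append((number, line.rstrip("\n")))
        previous_blank = line.strip() == ""
    return counts, suspects
```

Running it over each monthly file and comparing the "separator" count
with the number of messages the importer produced would show which files
need attention, and the list of suspects gives the line numbers to look at.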


If I can load the whole history, then people can have a play with it to 
see if they like it. I also think I can get the Discourse instance to be 
a subscriber to the present list so it would keep up to date. Of course,
if people replied on the Discourse interface the lists would diverge,
but this is just for proof of concept.


As to operational matters, there is cost involved by going outside the 
GNU infrastructure. An adequate server costs around USD$20 per month to 
run. The way I handle this in the communities that I support is to ask 
for donations. Generally if a moderate number of people chip in it can 
be the cost of a cup of coffee a month. It's not excessive.


I think there have been some comments here indicating some 
dissatisfaction with this aging and somewhat limited email 
infrastructure, so I am encouraged to set up my proof of concept to
demonstrate a modern system. I am also compelled to say that in 
practical terms the email list functionality of Discourse from the end 
user point of view is identical to what we have now with GNU Mailman 2, 
and you'd be hard pressed to find any operational/functional difference. 
So for people who reject web interfaces, in effect very little would change.


Finally, I'd just mention that Discourse of course supports plain text, 
but also HTML, and has full support for Markdown, both from email 
initiated files and web initiated topics. This means you can do nice 
formatting if required, and in particular you can make clear tables. And 
as with most forums now, you can mark code blocks so that they stand out 
clearly.


I'll post regular updates as I make progress towards a full proof of 
concept for people. And of course, if this was Discourse these posts 
would be in a distinct and separate category so as not to be noise in 
the main flow of the stream. :-)


Andrew