Re: Discourse proposal status
On 06/03/2023 02:52, David Kastrup wrote:
> Andrew Bernard writes:
>> I should add that a normal fresh install does not require large RAM. It's only the mbox import script. I think it's poorly written and does not constrain itself when it comes to memory.
> Is it necessary to run the import on the same machine that is going to run the server?

What sort of VM does your hosting provider use? Can you set up the VM at home, punch a hole through the firewall/router for testing purposes, and then move it to the hosting provider as/when you think it could go live?

Cheers, Wol
Re: Discourse proposal status
On Mon 06 Mar 2023 at 04:18:17 (+0100), David Kastrup wrote: > Andrew Bernard writes: > > > Well you can dynamically increase CPU or RAM or both on Digitalocean > > that I use. You can do it on a temporary basis - but I'm not sure if > > you get charged for a month or on a strict time basis, it's hard to > > find out!. It's not a matter of needing a separate system. My only > > issue is that I am very financially constrained and I can't afford the > > experiment. > > > > But the bigger fish to fry is the issue with the irregularities in the > > mbox archives. I need to study this in depth before trying a load. I > > did have the same problem with similar erratic mbox archives quite > > some years ago but I can't easily recall the solution. Probably just a > > more refined regex to pick up the 'From:' delimiters. > > There isn't really much finesse involved. Messages start at the pattern > "^From ". Any "From " inside of a message that would end up at the > start of a line is changed to ">From ", so the pattern "^From " should > be foolproof regarding splitting into messages. I think this is rather dated. Most modern MUAs, including your own from Sat, 25 Feb 2023 16:29:18 +0100, aren't escaping Froms any more. 
The cached copy (sent via IMAP) is clean, and any mboxes I copy it to all contain:

> =E2=80=A6 and I=E2=80=99ll probably get yelled at for top-posting as well= ;-)
From a practical workflow perspective, I would much rather do all of my reading using a single keyboard driven interface and application than

OTOH, the digests contain:

> … and I’ll probably get yelled at for top-posting as well ;-)
>From a practical workflow perspective, I would much rather do all of my reading using a single keyboard driven interface and application than

as does the web page: https://lists.gnu.org/archive/html/lilypond-user/2023-02/msg00500.html

I haven't checked what the ?monthly mboxes contain, but I would try using a regex that includes matching the date at the end of the From line (here deliberately offset):

  From lil…ser-bounces+lilylis=l….u...@gnu.org Sat Feb 25 09:29:53 2023

because whatever is writing these mboxes should be using a consistent format for these.

Cheers, David.
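A minimal Python sketch of the date-anchored match suggested above, assuming the separator lines follow the `From <sender> <weekday> <month> <day> <hh:mm:ss> <year>` shape of the example line and end with the year. The pattern and the name `is_separator` are illustrative, not taken from Discourse's importer; archives whose From lines carry a timezone or other trailing field would need a looser pattern.

```python
import re

# Illustrative pattern (an assumption, not Discourse's): "From ", an envelope
# sender with no spaces, then a ctime-style date ending in the year, e.g.
# "From someone@example.org Sat Feb 25 09:29:53 2023".
FROM_LINE = re.compile(
    r"^From \S+ "
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +"
    r"\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)


def is_separator(line: str) -> bool:
    """Return True if `line` looks like a genuine mbox message separator.

    An unescaped body line such as "From a practical workflow ..." does not
    match, because it lacks the trailing date.
    """
    return FROM_LINE.match(line.rstrip("\r\n")) is not None
```

The ` +` after the month accepts both "Feb 25" and the space-padded "Feb  5" form that ctime-style dates use for single-digit days.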
Re: Discourse proposal status
Andrew Bernard writes: > Well you can dynamically increase CPU or RAM or both on Digitalocean > that I use. You can do it on a temporary basis - but I'm not sure if > you get charged for a month or on a strict time basis, it's hard to > find out!. It's not a matter of needing a separate system. My only > issue is that I am very financially constrained and I can't afford the > experiment. You could experiment with adding enough swap space to run the import if you are charged based on a fixed configuration rather than processing time. It may take a tremendous amount of time, depending on how non-local the memory accesses are. -- David Kastrup
Re: Discourse proposal status
But that is exactly my point. A system that requires far more resources to set up than to run is a bit fishy in my eyes. Also, my point still stands: as far as I can see, Discourse uses Docker containers for deployment. Wouldn’t it be possible to set up the container on a local machine, export the set-up container and load that on the server? I believe that is more or less what David enquired about. Setting everything up on a local server might also be a good idea before spending money just to try out how it works. Cheers, Valentin On Monday, 6 March 2023, 03:29:43 CET, Andrew Bernard wrote: > No. You misunderstand. Discourse is quite compact. The 8GB of RAM is > only required temporarily for importing 20+ years worth of mbox files. > My Discourse servers all run fine in 2GB of RAM, with unlimited posts, > which are just in a database on disk. > > On 6/03/2023 3:04 am, Valentin Petzel wrote: > > I suppose a system that requires by a large factor more resources for > > installation than it requires to run is not really a good way to do it.
Re: Discourse proposal status
Andrew Bernard writes: > Well you can dynamically increase CPU or RAM or both on Digitalocean > that I use. You can do it on a temporary basis - but I'm not sure if > you get charged for a month or on a strict time basis, it's hard to > find out!. It's not a matter of needing a separate system. My only > issue is that I am very financially constrained and I can't afford the > experiment. > > But the bigger fish to fry is the issue with the irregularities in the > mbox archives. I need to study this in depth before trying a load. I > did have the same problem with similar erratic mbox archives quite > some years ago but I can't easily recall the solution. Probably just a > more refined regex to pick up the 'From:' delimiters. There isn't really much finesse involved. Messages start at the pattern "^From ". Any "From " inside of a message that would end up at the start of a line is changed to ">From ", so the pattern "^From " should be foolproof regarding splitting into messages. I don't remember what happens to "^>From " but consider it most likely that any "^>*From " inside of a message gets one ">" prepended when put into an mbox file, and one taken out again when displayed/processed. -- David Kastrup
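The scheme described above matches what is usually called the mboxrd convention. A minimal Python sketch of it follows, assuming these archives were in fact written that way (the older mboxo variant only quotes bare `From ` lines and cannot be reversed exactly). `escape_body_line` and `unescape_body_line` are hypothetical names.

```python
import re

# mboxrd-style quoting (assumed, not confirmed for these archives): any body
# line matching ^>*From  gains one ">" when written into the mbox file and
# loses exactly one ">" when read back out.
_QUOTED = re.compile(r"^>*From ")


def escape_body_line(line: str) -> str:
    """Quote a body line before writing it into an mbox file."""
    return ">" + line if _QUOTED.match(line) else line


def unescape_body_line(line: str) -> str:
    """Undo the quoting when reading a body line back out of the mbox."""
    if line.startswith(">") and _QUOTED.match(line[1:]):
        return line[1:]
    return line
```

Because escaping adds exactly one `>` and unescaping removes exactly one, a round trip leaves every body line unchanged, while ordinary quoted replies such as `> some text` are never touched.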
Re: Discourse proposal status
Well, you can dynamically increase CPU or RAM or both on DigitalOcean, which I use. You can do it on a temporary basis, but I'm not sure whether you get charged for a month or on a strict time basis; it's hard to find out! It's not a matter of needing a separate system. My only issue is that I am very financially constrained and I can't afford the experiment. But the bigger fish to fry is the issue with the irregularities in the mbox archives. I need to study this in depth before trying a load. I did have the same problem with similarly erratic mbox archives quite some years ago, but I can't easily recall the solution. Probably just a more refined regex to pick up the 'From:' delimiters. Andrew On 6/03/2023 1:52 pm, David Kastrup wrote: > Is it necessary to run the import on the same machine that is going to run the server?
Re: Discourse proposal status
Andrew Bernard writes: > I should add that a normal fresh install does not require large > RAM. It's only the mbox import script. I think it's poorly written and > does not constrain itself when it comes to memory. Is it necessary to run the import on the same machine that is going to run the server? -- David Kastrup
Re: Discourse proposal status
I should add that a normal fresh install does not require large RAM. It's only the mbox import script. I think it's poorly written and does not constrain itself when it comes to memory. Andrew On 6/03/2023 1:29 pm, Andrew Bernard wrote: > On 6/03/2023 3:04 am, Valentin Petzel wrote: >> I suppose a system that requires by a large factor more resources for installation than it requires to run is not really a good way to do it.
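On the memory question, one way to keep an import bounded is to stream the archive and hand over one message at a time instead of loading a whole mbox. A minimal Python sketch, assuming body lines beginning with `From ` were escaped when the archive was written (as discussed elsewhere in this thread); `iter_mbox_messages` is a hypothetical helper, not part of Discourse's import script.

```python
from typing import Iterable, Iterator, List


def iter_mbox_messages(lines: Iterable[bytes]) -> Iterator[List[bytes]]:
    """Yield one message at a time as a list of raw byte lines.

    Only the message currently being assembled is kept in memory, so peak
    usage is bounded by the largest single message, not by the archive size.
    Assumes every line starting with b"From " is a separator, i.e. that
    "From " at the start of a body line was escaped to ">From ".
    Any text before the first separator is yielded as a chunk of its own.
    """
    current: List[bytes] = []
    for line in lines:
        if line.startswith(b"From ") and current:
            yield current  # the previous message is complete
            current = []
        current.append(line)
    if current:
        yield current  # last message in the file
```

A file opened in binary mode (`open(path, "rb")`) can be passed directly, since iterating over it yields raw byte lines without reading the whole file first.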
Re: Discourse proposal status
No. You misunderstand. Discourse is quite compact. The 8GB of RAM is only required temporarily for importing 20+ years' worth of mbox files. My Discourse servers all run fine in 2GB of RAM, with unlimited posts, which are just in a database on disk. On 6/03/2023 3:04 am, Valentin Petzel wrote: > I suppose a system that requires by a large factor more resources for installation than it requires to run is not really a good way to do it.
Re: Discourse proposal status
Hello Andrew, I suppose a system that requires by a large factor more resources for installation than it requires to run is not really a good way to do it. Renting an 8GB server when you only need that 8GB for setup sounds a bit daft. I do not know if renting a server dedicated to managing a handful of messages per day is really a good use of resources. In my opinion the best way would be to have LilyPond use a small share of the resources of a company or community stable enough to guarantee that the system will be around for some time. This is exactly what is currently done by the GNU infrastructure. Renting a dedicated server would really only make sense if we had a handful of web services that really make use of these resources. Such a server might also be used, for example, to host something like Paolo’s Spontini editor. I mean, sure, it wouldn’t cost much individually if a few of us rented a server together, but I think if we do something like this we should make full use of that server. Cheers, Valentin On Sunday, 5 March 2023, 06:04:40 CET, Andrew Bernard wrote: > I'm still keen on Discourse for our community and have been giving it > some attention. [...]
Discourse proposal status
I'm still keen on Discourse for our community and have been giving it some attention. People have rightly said the full historical archive is important. David Kastrup usefully pointed out that all the mbox archives are freely available, back to the beginning. Discourse can import mbox archives, and there is a special import instance with scripts to load them.

So I downloaded the complete set of archives and attempted to load them into Discourse. One problem that arose is that the documentation for the import script explicitly mentions that you need a system with 8GB of RAM (only for the import, not for normal running), and my Linux servers are only configured with 2GB. Consequently the import fails. I use Vultr and DigitalOcean, and unfortunately upping the specification to 8GB or more is too expensive for me in my situation. Servers with extra RAM get expensive very quickly at these companies. So this is a problem. I am working towards a workaround, perhaps doing the import on my home server and uploading to a 2GB server later (but there are complexities with this).

But the next issue is this, and I have come across it before when importing mbox archives into GNU Mailman 2 and 3. The mbox format is rather loosely defined, and there are lots of small variations. Worse, even within our own archive set the format seems to vary slightly over the years. Some messages don't get unpacked properly due to inconsistencies in headers, and my import of a sizeable batch showed quite a few cases where the initial message was fine but follow-up replies were just appended in raw mailbox format rather than imported as separate messages. I'll have to study this in more depth and write some scripts to pre-process the mbox files. Tedious, but I've had to do this in the past.

If I can load the whole history, then people can have a play with it to see if they like it. I also think I can get the Discourse instance to be a subscriber to the present list so it would keep up to date.
Of course, if people replied on the Discourse interface the lists would diverge, but this is just for proof of concept.

As to operational matters, there is a cost involved in going outside the GNU infrastructure. An adequate server costs around US$20 per month to run. The way I handle this in the communities that I support is to ask for donations. Generally, if a moderate number of people chip in, it comes to about the cost of a cup of coffee a month each. It's not excessive.

I think there have been some comments here indicating some dissatisfaction with this aging and somewhat limited email infrastructure, so I am encouraged to set up my proof of concept to demonstrate a modern system. I am also compelled to say that, in practical terms, the email list functionality of Discourse from the end user's point of view is identical to what we have now with GNU Mailman 2, and you'd be hard pressed to find any operational or functional difference. So for people who reject web interfaces, in effect very little would change.

Finally, I'd just mention that Discourse of course supports plain text, but also HTML, and has full support for Markdown, both in email-initiated messages and web-initiated topics. This means you can do nice formatting if required, and in particular you can make clear tables. And as with most forums now, you can mark code blocks so that they stand out clearly.

I'll post regular updates as I make progress towards a full proof of concept for people. And of course, if this were Discourse, these posts would be in a distinct and separate category so as not to be noise in the main flow of the stream. :-)

Andrew
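For the glued follow-up replies described above, a pre-processing pass could flag suspect messages before import. A minimal Python sketch using the standard library `mailbox` module; the heuristic (a `Message-ID:` header at the start of a body line means an unsplit reply) is an assumption for illustration and will produce false positives, for example on inline-forwarded messages. `looks_glued` and `suspicious_messages` are hypothetical names.

```python
import mailbox
import re
from typing import Iterator, Tuple

# Heuristic (an assumption): a "Message-ID:" header at the start of a line in
# the *body* suggests a reply that was never split into its own message.
_EMBEDDED_MSGID = re.compile(rb"^Message-ID:", re.IGNORECASE | re.MULTILINE)


def looks_glued(raw: bytes) -> bool:
    """Return True if a Message-ID header line reappears in the body."""
    raw = raw.replace(b"\r\n", b"\n")  # normalise CRLF line endings
    _head, sep, body = raw.partition(b"\n\n")  # headers end at first blank line
    return bool(sep) and _EMBEDDED_MSGID.search(body) is not None


def suspicious_messages(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (key, subject) for each message in the mbox that looks glued."""
    box = mailbox.mbox(path, create=False)
    try:
        for key in box.iterkeys():
            raw = box.get_bytes(key)  # headers and body, without the From_ line
            if looks_glued(raw):
                yield key, str(box[key].get("Subject", ""))
    finally:
        box.close()
```

Note that `mailbox.mbox` does its own splitting on `From ` lines, so this only catches replies that lost their separator entirely; it does not fix them, it just lists candidates for a closer look.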