Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Henrique de Moraes Holschuh
On Tue, 03 Apr 2018, Michael Lange wrote: > I believe (please anyone correct me if I am wrong) that "text" files > won't contain any null byte; many text editors even refuse to open such a Depends on the encoding. For ASCII, ISO-8859-* and UTF-8 (and any other modern encoding AFAIK, other than mo
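Henrique's point, that whether a text file contains NUL bytes depends on the encoding, can be checked directly. The Python sketch below encodes the same ASCII-only text several ways and counts the zero bytes. `nul_count` is a hypothetical helper. The little-endian UTF-16/UTF-32 codecs are used only to keep a byte-order mark out of the count.

```python
def nul_count(text: str, encoding: str) -> int:
    """Return how many zero bytes `text` produces when encoded with `encoding`."""
    return text.encode(encoding).count(0)


sample = "Hello, world\n"  # ASCII-only text, no U+0000 in it

for enc in ("ascii", "iso-8859-1", "utf-8", "utf-16-le", "utf-32-le"):
    print(f"{enc:11} {nul_count(sample, enc):2} NUL bytes")

# In UTF-8 a zero byte can only come from the character U+0000 itself,
# because every byte of a multi-byte sequence has its high bit set.
print(nul_count("na\u00efve caf\u00e9 \u20ac", "utf-8"))  # 0
```

The 8-bit encodings and UTF-8 produce no NULs for this text. UTF-16 produces one per character and UTF-32 three per character, which is why "no NUL bytes" only works as a text heuristic for the first group.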

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
rhkra...@gmail.com (2018-04-03): > and the data is stored in mbox formatted files. DO NOT DO THAT. This is the only good advice you can have for that project. Store your data in a decent format. Regards, -- Nicolas George

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
Sorry, I already have 300 MB plus stored in that format. Where were you in 2000 when I started the project? On Wednesday, April 04, 2018 07:23:25 AM Nicolas George wrote: > rhkra...@gmail.com (2018-04-03): > > and the data is stored in mbox formatted files. > > DO NO

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
rhkra...@gmail.com (2018-04-04): > Sorry, I already have 300 MB plus stored in that format. Then convert. Small extra work now. Many less headaches later. Regards, -- Nicolas George
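The reply does not say which format to convert to. As one possibility, assume Maildir is the target; the thread discusses that format, and criticises it, further down. Python's standard `mailbox` module can then copy an mbox file message by message. `mbox_to_maildir` and the paths are illustrative only, not something any poster proposed.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; return the count."""
    source = mailbox.mbox(mbox_path, create=False)        # must already exist
    target = mailbox.Maildir(maildir_path, create=True)   # created if missing
    copied = 0
    source.lock()  # keep other mbox writers out while we read
    try:
        for message in source:
            # Maildir.add converts the mboxMessage (including its flags).
            target.add(message)
            copied += 1
    finally:
        source.unlock()
        source.close()
    return copied
```

For example, `mbox_to_maildir("notes.mbox", "notes-maildir")` (hypothetical names) leaves the original file untouched, so the mbox-based tools can keep working on it until the conversion has been checked.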

Re: utf

2018-04-04 Thread Henrique de Moraes Holschuh
On Tue, 03 Apr 2018, Darac Marjal wrote: > If these things matter to you, it's better to convert from UTF-8 to Unicode, UTF-8 *is* Unicode :p What you mean is either UCS-4 or UTF-32 (which are just another encoding for Unicode). But all of them are Unicode. UTF-* are only used for Unicode encod
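Henrique's correction can be made concrete. UTF-8, UTF-16 and UTF-32 are three encodings of the same sequence of Unicode code points, and only UTF-32 (UCS-4) is fixed-width. The sketch below encodes one string all three ways and checks that each one decodes back to the same text. `encoded_sizes` is a hypothetical helper.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded size in bytes of `text` under three Unicode encodings."""
    sizes = {}
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        data = text.encode(enc)
        # Each encoding round-trips to exactly the same code points.
        assert data.decode(enc) == text
        sizes[enc] = len(data)
    return sizes


sample = "Gr\u00fc\u00dfe, \u4e16\u754c \U0001F642"  # "Grüße, 世界 🙂"
print(len(sample), "code points:", [f"U+{ord(c):04X}" for c in sample])
print(encoded_sizes(sample))
```

For this sample the same 11 code points take 20 bytes in UTF-8, 24 in UTF-16 and 44 in UTF-32.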

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 01:23:25PM +0200, Nicolas George wrote: > rhkra...@gmail.com (2018-04-03): > > and the data is stored in mbox formatted files. > > DO NOT DO THAT. > > This is the only good advice you can have for that project. Store your > data in a decent form

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
I'll convert the file format after you convert the programs to work with the different file format. Those programs include kmail, nail, (essentially all email programs that use mbox as the file format), recoll (conversion should not be difficult), various editors (nedit, kate, for which I've wr

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
rhkra...@gmail.com (2018-04-04): > I'll convert the file format after you convert the programs to work with the > different file format. Those programs include kmail, nail, (essentially all > email programs that use mbox as the file format), recoll (conversion should > not > be difficult), var

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 08:26:41 AM Greg Wooledge wrote: > On Wed, Apr 04, 2018 at 01:23:25PM +0200, Nicolas George wrote: > > rhkra...@gmail.com (2018-04-03): > > > and the data is stored in mbox formatted files. > > > > DO NOT DO THAT. > > > > This is the only goo

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread tomas
On Wed, Apr 04, 2018 at 08:18:23AM -0300, Henrique de Moraes Holschuh wrote: > On Tue, 03 Apr 2018, Michael Lange wrote: > > I believe (please anyone correct me if I am wrong) that "text" files > > won't contain any null byte; many text editors even re

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Andre Majorel
On 2018-04-04 14:55 +0200, Nicolas George wrote: > I have given you advice (for free), you are not taking it. Too bad for > you. Good day. Is advice that comes with condescension truly free ? -- André Majorel I trust bugs.debian.org to not publish my email addr

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 04:15:48PM +0200, Andre Majorel wrote: > On 2018-04-04 14:55 +0200, Nicolas George wrote: > > > I have given you advice (for free), you are not taking it. Too bad for > > you. Good day. > > Is advice that comes with condescension truly free ? Any advice that stops the OP

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 10:15:48 AM Andre Majorel wrote: > On 2018-04-04 14:55 +0200, Nicolas George wrote: > > I have given you advice (for free), you are not taking it. Too bad for > > you. Good day. > > Is advice that comes with condescension truly free ? Thank you!

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 10:24:06 AM Greg Wooledge wrote: > On Wed, Apr 04, 2018 at 04:15:48PM +0200, Andre Majorel wrote: > > On 2018-04-04 14:55 +0200, Nicolas George wrote: > > > I have given you advice (for free), you are not taking it. Too bad for > > > you. Good day. > > > > Is advice th

Re: utf

2018-04-04 Thread deloptes
Nicolas George wrote: > No, the length of the string is hardly relevant, and when it is it is > not enough anyway. @Nicolas, I think OP does not understand you - perhaps it is not worth the effort. My impression is that you refer to a string (properly) as sequence of bytes and others refer to it a

Re: utf

2018-04-04 Thread Nicolas George
deloptes (2018-04-04): > @Nicolas, I think OP does not understand you - perhaps it is not worth the > effort. My impression is that you refer to a string (properly) as sequence > of bytes and others refer to it as number of chars, which is not consistent > with utf. Not at all, I am well speaking o

Re: utf

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 07:07:01PM +0200, Nicolas George wrote: > Find me a case where you need to access the n-th char of a string, with > n completely out of the blue, and I will explain how somebody botched > their design. Does it count if we want the 1st char, then the 2nd char, then the 3rd c

Re: utf

2018-04-04 Thread Nicolas George
Greg Wooledge (2018-04-04): > Does it count if we want the 1st char, then the 2nd char, then the 3rd > char, then the 4th char, and so on? Or is that not blue enough? It is not out of the blue, it is in sequence. > How about the last char? Or the last two chars? Ditto. >
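Nicolas's distinction between "in sequence" and "out of the blue" maps onto how UTF-8 is laid out. The lead byte of each character gives its length, and every continuation byte has the form 10xxxxxx. Walking forwards, or stepping back from the end to find the last character, therefore needs no character index at all. This is a minimal sketch that assumes the input is already valid UTF-8; the function names are hypothetical.

```python
def iter_utf8_chars(data: bytes):
    """Yield (byte_offset, character) pairs by walking UTF-8 bytes front to back."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte alone tells how many bytes this character occupies.
        if lead < 0x80:
            size = 1
        elif lead >= 0xF0:
            size = 4
        elif lead >= 0xE0:
            size = 3
        else:
            size = 2
        yield i, data[i:i + size].decode("utf-8")
        i += size


def last_utf8_char(data: bytes) -> str:
    """Return the last character by stepping back over continuation bytes (10xxxxxx)."""
    i = len(data) - 1
    while i > 0 and data[i] & 0xC0 == 0x80:
        i -= 1
    return data[i:].decode("utf-8")


text = "na\u00efve \u4e16\u754c".encode("utf-8")
for offset, ch in iter_utf8_chars(text):
    print(offset, ch)
print("last:", last_utf8_char(text))
```

Both operations touch only the bytes they need. The "last two chars" case is the same backward step repeated.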

Re: utf

2018-04-04 Thread deloptes
Nicolas George wrote: > Find me a case where you need to access the n-th char of a string, with > n completely out of the blue, and I will explain how somebody botched > their design. ok, thanks. I understood the part above, but not sure if I understand this part. A standard text editing operatio

Re: utf

2018-04-04 Thread Nicolas George
deloptes (2018-04-04): > ok, thanks. I understood the part above, but not sure if I understand this > part. A standard text editing operation is find and replace, where you get > the start and end point in the string. Of course it is not "n completely > out of the blue". I am not sure exactly what
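The find-and-replace case illustrates Nicolas's argument: the start and end points come from a search, so they can just as well be byte offsets. UTF-8 is self-synchronizing, because lead bytes and continuation bytes use disjoint ranges. A valid UTF-8 needle therefore can only match a valid haystack at a character boundary, and a plain byte search is safe. `replace_first` is a hypothetical helper written for this note.

```python
def replace_first(haystack: bytes, old: str, new: str) -> bytes:
    """Replace the first occurrence of `old`, working purely on UTF-8 byte offsets."""
    needle = old.encode("utf-8")
    start = haystack.find(needle)  # a byte offset, not a character index
    if start == -1:
        return haystack
    end = start + len(needle)
    return haystack[:start] + new.encode("utf-8") + haystack[end:]


doc = "na\u00efve caf\u00e9 society".encode("utf-8")
print(doc.find("caf\u00e9".encode("utf-8")))  # 7, although "café" starts at character index 6
print(replace_first(doc, "caf\u00e9", "bistro").decode("utf-8"))
```

The code never computes a character count, yet the replacement is correct.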

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Don Armstrong
On Tue, 03 Apr 2018, rhkra...@gmail.com wrote: > I am building (have built several iterations) of a free format > database to work something like askSam. It is a mashup of several > applications, things like recol, kmail, nail, kate and the data is > stored in mbox formatted files. > > Each record

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Nicolas George
Don Armstrong (2018-04-04): > You should consider looking at using Maildir with notmuch and using > things which integrate notmuch.[1] Maildir is not that much better than mbox. Sure, it eliminates most of its worse flaws, but it brings flaws of its own, like trashing the inode and dentries caches

Re: utf

2018-04-04 Thread Greg Wooledge
On Wed, Apr 04, 2018 at 07:35:37PM +0200, Nicolas George wrote: > I am not sure exactly what is your example, but you got its flaw right: > n is not out of the blue, it was obtained by previously walking the > string. And in that case, you have all freedom to express n as a more > convenient entity

Re: Unknown Systemd version

2018-04-04 Thread Laurent Lyaudet
2018-04-03 22:14 GMT+02:00 Abdullah Ramazanoglu : > On Tue, 3 Apr 2018 14:24:54 -0500 David Wright said: > >> On Tue 03 Apr 2018 at 19:58:23 (+0200), Laurent Lyaudet wrote: >> >>> I don't understand why I have apache2-bin installed but apache is >>> not there??? >> >> $ aptitude why apache2-bin > >

Re: utf

2018-04-04 Thread Nicolas George
Greg Wooledge (2018-04-04): > The problem is, you reject every single example that everyone gives > you. I do not reject them, I refute them. > I don't know what you expect from us. Acknowledge that I am right once I have refuted all your examples and you have eventually understood my point. At

mbox vs maildir vs better formats [Re: Invalid UTF-8 byte? (was: Re: utf)]

2018-04-04 Thread Don Armstrong
On Wed, 04 Apr 2018, Nicolas George wrote: > Don Armstrong (2018-04-04): > > You should consider looking at using Maildir with notmuch and using > > things which integrate notmuch.[1] > > Maildir is not that much better than mbox. Sure, it eliminates most of > its worse flaws, but it brings flaws

Re: mbox vs maildir vs better formats [Re: Invalid UTF-8 byte? (was: Re: utf)]

2018-04-04 Thread Nicolas George
Don Armstrong (2018-04-04): > There are definitely better formats than Maildir, like Dovecot's > multi-dbox.[1] > > These issues are why almost everyone who uses Maildir just uses it as > the backing message store and uses the index on top to avoid ever > reading all of the messages in the Mail

Re: utf

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 12:58:57 PM deloptes wrote: > And regarding the mbox thing, well mbox was deprecated for many reasons. I > guess if it was that good it wouldn't be deprecated. Oh, I wasn't aware that mbox was deprecated--can you shed more light on that. AFAIK, it is not defined in

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Jonathan de Boyne Pollard
rhkramer: The reason I wanted such a byte was to use it as a record separator in a set of text files (that I use as an askSam "workalike" (or "worksimilar") so that I could use msort (which depends on a 1 byte record separator to --separate the records ;-) while sorting. Some of the files alr
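The original question was whether some byte value can never appear in UTF-8 text. Under RFC 3629 the answer is yes: 0xC0, 0xC1 and 0xF5 through 0xFF never occur in well-formed UTF-8. The sketch below uses 0xFF as a record separator. This choice is an assumption made for illustration; it makes no claim about what msort itself expects. The helper names are hypothetical.

```python
from typing import List

# Byte values that can never occur anywhere in well-formed UTF-8 (RFC 3629).
NEVER_IN_UTF8 = {0xC0, 0xC1} | set(range(0xF5, 0x100))

SEPARATOR = b"\xff"  # hypothetical choice; any value from NEVER_IN_UTF8 would do


def join_records(records: List[str]) -> bytes:
    """Encode each record as UTF-8 and glue them together with SEPARATOR."""
    return SEPARATOR.join(r.encode("utf-8") for r in records)


def split_records(blob: bytes) -> List[str]:
    """Split on SEPARATOR and decode each record; raises on non-UTF-8 data."""
    return [part.decode("utf-8") for part in blob.split(SEPARATOR)]


blob = join_records(["first record", "caf\u00e9", "\u4e16\u754c"])
print(blob)
print(split_records(blob))
```

The guarantee only holds if every file really is UTF-8. A Latin-1 file containing "ÿ" (0xFF) would be split in the wrong place, which is one more reason to validate the input first.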

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 01:36:15 PM Don Armstrong wrote: > On Tue, 03 Apr 2018, rhkra...@gmail.com wrote: > > I am building (have built several iterations) of a free format > > database to work something like askSam. It is a mashup of several > > applications, things like recol, kmail, nail, k

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Henrique de Moraes Holschuh
On Wed, 04 Apr 2018, to...@tuxteam.de wrote: > On Wed, Apr 04, 2018 at 08:18:23AM -0300, Henrique de Moraes Holschuh wrote: > > On Tue, 03 Apr 2018, Michael Lange wrote: > > > I believe (please anyone correct me if I am wrong) that "text" files > > > won't contain any null byte; many text editors e

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Jonathan de Boyne Pollard
rhkramer: Where were you in 2000 when I started the project? I cannot speak for anyone else, but I was probably once again giving a frequently given answer that I eventually put up on a WWW page. http://jdebp.eu./FGA/mail-mbox-formats.html
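The page linked above is about mbox formats, and mbox does come in several incompatible dialects. One concrete difference is how a body line that begins with "From " is protected from being read as a message separator. In the "mboxrd" dialect, a writer prepends one ">" to every line matching `>*From ` and a reader strips one ">" from every line matching `>+From `, so the quoting is reversible. The older "mboxo" dialect only quotes bare "From " lines and is not reversible. This is a minimal sketch of the mboxrd rule alone; the function names are hypothetical, and a full writer also has to emit the separator lines.

```python
import re

_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
_QUOTED_FROM_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)


def mboxrd_escape(body: bytes) -> bytes:
    """Add one '>' to every line that starts with zero or more '>' then 'From '."""
    return _FROM_LINE.sub(rb">\1", body)


def mboxrd_unescape(body: bytes) -> bytes:
    """Remove one '>' from every line that starts with one or more '>' then 'From '."""
    return _QUOTED_FROM_LINE.sub(rb"\1", body)


body = b"Hi\nFrom here\n>From there\nnot From\n"
print(mboxrd_escape(body))
print(mboxrd_unescape(mboxrd_escape(body)) == body)
```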

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Don Armstrong
On Wed, 04 Apr 2018, rhkra...@gmail.com wrote: > I've considered maildir--it meets some of my requirements (that is, to > make something close to an askSam workalike), but one drawback is that > it is essentially one email (i.e., my "record"). One of the desirable > features of askSam is that you d

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread Jonathan de Boyne Pollard
Henrique de Moraes Holschuh: Also, a text file MAY contain NULs (the character), it is just considered bad practice (nowadays?). Don't assume you won't see any. For example, received e-mail is *more* likely to have NULs in it than normal text due to the quality of some mail agents out there.

Re: utf

2018-04-04 Thread Joel Roth
On Wed, Apr 04, 2018 at 02:20:17PM -0400, rhkra...@gmail.com wrote: > On Wednesday, April 04, 2018 12:58:57 PM deloptes wrote: > > And regarding the mbox thing, well mbox was deprecated for many reasons. I > > guess if it was that good it wouldn't be deprecated. > Oh, I wasn't aware that mbox wa

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread tomas
On Wed, Apr 04, 2018 at 03:44:23PM -0300, Henrique de Moraes Holschuh wrote: [...] > That said, it is always safe to break valid "modified UTF-8" into > records using zeroes, as long as you don't expect the result to be valid > UTF-8 (it isn't valid
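The remark quoted here rests on one detail of "modified UTF-8", the variant Java uses in class files and in `DataOutput.writeUTF`. In that variant the character U+0000 is written as the overlong pair 0xC0 0x80, so a zero byte never appears inside a record. The sketch below reproduces only that NUL rule. Real modified UTF-8 also writes supplementary characters as surrogate pairs, and that part is deliberately left out. All function names are hypothetical.

```python
from typing import List


def encode_nul_safe(text: str) -> bytes:
    """UTF-8 encode, but write U+0000 as 0xC0 0x80 (the modified-UTF-8 NUL rule)."""
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_nul_safe(data: bytes) -> str:
    """Undo encode_nul_safe; 0xC0 never occurs in standard UTF-8, so the pair is unambiguous."""
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


def join_with_zero(records: List[str]) -> bytes:
    """Use a real zero byte as the record separator."""
    return b"\x00".join(encode_nul_safe(r) for r in records)


def split_on_zero(blob: bytes) -> List[str]:
    """Split on zero bytes and restore any embedded NUL characters."""
    return [decode_nul_safe(part) for part in blob.split(b"\x00")]


blob = join_with_zero(["a\x00b", "caf\u00e9"])
print(blob)
print(split_on_zero(blob))
```

As the quoted message says, the joined stream is not valid UTF-8, because 0xC0 is never legal there. It is best treated as a private on-disk format rather than fed to ordinary text tools.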

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 02:10:16 PM Jonathan de Boyne Pollard wrote: > rhkramer: > > The reason I wanted such a byte was to use it as a record separator in > > a set of text files (that I use as an askSam "workalike" (or > > "worksimilar") so that I could use msort (which depends on a 1 byte >

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread rhkramer
On Wednesday, April 04, 2018 01:36:15 PM Don Armstrong wrote: > On Tue, 03 Apr 2018, rhkra...@gmail.com wrote: > > I am building (have built several iterations) of a free format > > database to work something like askSam. It is a mashup of several > > applications, things like recol, kmail, nail, k

tcp_probe module missing

2018-04-04 Thread Ireneusz Szcześniak
Hi, I'm running an up-to-date Debian Stretch on an AMD64 computer. I would like to use the tcpprobe module, and so I'm trying to do: sudo modprobe tcp_probe But I get: modprobe: FATAL: Module tcp_probe not found in directory ... Why is this module missing? Is there a quick way of getting

Re: utf

2018-04-04 Thread deloptes
Nicolas George wrote: >> What if the question is "Find all the English words that have an E >> in the 5th position and a U in the 7th"? > > Yes, what? Who would ever ask such a question? What is the point of such > a question? > > The point of such a question is only to try and disprove my point

Re: utf

2018-04-04 Thread deloptes
rhkra...@gmail.com wrote: > Oh, I wasn't aware that mbox was deprecated--can you shed more light on > that. AFAIK, it is not defined in an RFC and is used by quite a few email > programs. yes but Maildir format was introduced for couple of reasons (as well as other formats). I wouldn't store my m

Re: Invalid UTF-8 byte?

2018-04-04 Thread Ben Caradoc-Davies
On 05/04/18 02:09, to...@tuxteam.de wrote: Try UTF-16, what Microsoft (and a couple of years ago Apple) love to call "Unicode": in more "Western" contexts every second byte is NULL! The Java platform uses UTF-16 internally: "The char data type (and therefore the value that a Character object
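Ben's point about Java has a practical consequence. A Java `char` is one UTF-16 code unit, so `String.length()` counts code units, and any character outside the Basic Multilingual Plane counts twice because it is stored as a surrogate pair. The Python sketch below builds the code units by hand to show where the pair comes from. `utf16_code_units` is a hypothetical helper.

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units for `text`, building surrogate pairs by hand."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)                    # BMP: one 16-bit unit
        else:
            v = cp - 0x10000                    # 20 bits left to split
            units.append(0xD800 + (v >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (v & 0x3FF))  # low (trail) surrogate
    return units


sample = "A\U0001F642"  # "A" followed by U+1F642 SLIGHTLY SMILING FACE
units = utf16_code_units(sample)
print([f"{u:04X}" for u in units], len(units), "UTF-16 units for", len(sample), "code points")
```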

Re: utf

2018-04-04 Thread Stefan Monnier
> You just seem to have Decided, for reasons known only to you, that > The Character Length Of A String Is Not Useful. Despite literally > decades of programs that have used strlen() in various ways. strlen was mostly used in a context where char-length = byte-length = display-width. Most of tho
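Stefan's observation is that strlen() worked because byte length, character length and display width used to coincide. With Unicode text they are three different numbers. The sketch below compares them for a few strings. `rough_display_width` is a hypothetical approximation: it counts combining marks as 0 columns, East Asian wide/fullwidth characters as 2, and everything else, including "ambiguous" width characters, as 1. Real terminals use fuller wcwidth tables that also treat control characters, emoji and joiners specially.

```python
import unicodedata


def rough_display_width(text: str) -> int:
    """Approximate terminal columns: 0 for combining marks, 2 for wide/fullwidth, else 1."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for s in ("cafe", "caf\u00e9", "cafe\u0301", "\u4e16\u754c"):
    print(repr(s), "bytes:", len(s.encode("utf-8")),
          "code points:", len(s), "columns:", rough_display_width(s))
```

All four strings occupy four columns, yet they need 4, 5, 6 and 6 bytes and contain 4, 4, 5 and 2 code points. No single "length" serves every purpose.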

Re: tcp_probe module missing

2018-04-04 Thread deloptes
Ireneusz Szcześniak wrote: > Hi, > > I'm running an up-to-date Debian Stretch on an AMD64 computer. I > would like to use the tcpprobe module, and so I'm trying to do: > > sudo modprobe tcp_probe > > But I get: > > modprobe: FATAL: Module tcp_probe not found in directory ... > > Why is this

Re: Invalid UTF-8 byte? (was: Re: utf)

2018-04-04 Thread deloptes
rhkra...@gmail.com wrote: > I'll probably look into notmuch, just for kicks. > > I've considered maildir--it meets some of my requirements (that is, to > make something close to an askSam workalike), but one drawback is that it > is essentially one email (i.e., my "record").  One of the desirable

Re: Invalid UTF-8 byte?

2018-04-04 Thread Michael Stone
On Thu, Apr 05, 2018 at 09:42:19AM +1200, Ben Caradoc-Davies wrote: On 05/04/18 02:09, to...@tuxteam.de wrote: Try UTF-16, what Microsoft (and a couple of years ago Apple) love to call "Unicode": in more "Western" contexts every second byte is NULL! The Java platform uses UTF-16 internally:

Re: utf

2018-04-04 Thread Richard Hector
On 05/04/18 05:53, Nicolas George wrote: >> What if the question is "Find all the English words that have an E >> in the 5th position and a U in the 7th"? > > Yes, what? Who would ever ask such a question? What is the point of such > a question? Solving a crossword puzzle? Richard
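Richard's crossword is a case where the question really is about the n-th character. Python's `str` indexes by code point, so the check can be written directly. Indexing a UTF-8 byte buffer instead would give wrong answers as soon as an accented letter appears before position n. This sketch uses a hypothetical helper and a toy word list, and it assumes precomposed (NFC) text; decomposed accents would shift the positions.

```python
def match_pattern(words, constraints):
    """Keep words whose letters match {1-based position: letter} constraints."""
    return [
        w for w in words
        if all(pos <= len(w) and w[pos - 1].lower() == ch.lower()
               for pos, ch in constraints.items())
    ]


words = ["procedure", "measure", "adventure", "pressure"]
print(match_pattern(words, {5: "E", 7: "U"}))
```

A real solver would read its words from a dictionary file, which is not shown here.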

Re: utf

2018-04-04 Thread tomas
On Wed, Apr 04, 2018 at 11:33:13PM +0200, deloptes wrote: [...] > other formats). I wouldn't store my mail in mbox anyway. For local > system/user mails as a simple default storage perhaps yes - it might be OK, > but for public mail, where you have 10