On Wed, Jun 01, 2016 at 07:23:49AM -0000, Karl Kastner wrote:
> Apparently the issue is not the umlauts (at least on my machine), but
> ligatures, &c. I've a script to rename files, but some always slip.
> Especially the incapability of the system to properly handle Russian
> file names and contents, due to different encodings, is a nuisance. And
> when processing strings in Perl scripts, it is a nightmare.

I had the impression that the Russian papers were encoded in KOI8; it
was odd that some of my tools showed the Cyrillic without trouble and
others showed the usual garbage you get from the wrong encoding. Maybe
recoding those into UTF-8 would also help.

Almost nothing handles multiple file name encodings well, and ligatures
in file names are an extremely unpleasant thing to find. The iconv(1)
tool may help with renaming files, but I doubt it can help with
ligatures.

> I was not aware that grep switches from text to binary mode while
> parsing, and that it only does so if a grepped line contains a binary
> character. It would be good if the warning were sent to stderr, so that
> it does not get lost in pipes. Anyway, I already added the alias grep
> --text to my ~/.bashrc.

Yes, I was surprised that it went to stdout too; however, it's an old
enough tool that they may not be able to make changes of that scope even
if they wanted to, for fear of what else might break.

> Just to continue the discussion, is there a similar switch for locate?
>
> locate comparison-of-turbulence-models
> Binary file (standard input) matches

That's a real bummer. :/ I don't know of any similar tool. If it's just
built on grep, it may take an -a flag there too... It's probably best to
take that one up with the upstream locate developers.

Thanks

--
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to grep in Ubuntu.
https://bugs.launchpad.net/bugs/1587101

Title:
  Grep silently discards tails of long text streams

Status in grep package in Ubuntu:
  Invalid

Bug description:
  Grep silently discards tails of long streams on my machine:

    grep -n discharge_calculate_.m 0.txt
    64264:/home/pia/phd/src/discharge/discharge_calculate_.m

  So far, so good: "discharge_calculate_.m" is found on line 64264.

    grep -n discharge 0.txt | grep calculate

  Apparently, grep gobbles the tail of the stream and fails to find the
  line.

  Some tests:

    tail -n +64264 0.txt | grep discharge | grep calculate_
    /home/pia/phd/src/discharge/discharge_calculate_.m

    grep -a -n discharge 0.txt | grep calculate_
    64264:/home/pia/phd/src/discharge/discharge_calculate_.m

    file 0.txt
    0.txt: ISO-8859 text

  I noticed this when I could not find files that I knew to exist in the
  directory tree, and at first thought it was a bug in locate or find.

  I could not reproduce this on the fly when the lines leading up to the
  matching line were filled with arbitrary characters, so the behaviour
  also depends on the content of the stream, not only on its length.
  Grep seems to interpret the text stream as binary.

  This looks very much like a buffer overflow, which is why I marked it
  as a security vulnerability. If this is intended behaviour, grep should
  not silently discard the tail but should print a warning to stderr. A
  limit of less than 60k lines also seems a bit too low in the 64-bit era.

  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: grep 2.25-1~16.04.1
  ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
  Uname: Linux 4.4.0-22-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2
  Architecture: amd64
  CurrentDesktop: GNOME
  Date: Mon May 30 15:41:35 2016
  InstallationDate: Installed on 2015-11-05 (207 days ago)
  InstallationMedia: Ubuntu 14.04.3 LTS "Trusty Tahr" - Beta amd64 (20150805)
  SourcePackage: grep
  UpgradeStatus: No upgrade log present (probably fresh install)
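
The effect discussed in this thread can be reproduced in miniature. This is a minimal sketch, not a reconstruction of 0.txt: the sample lines are hypothetical, and a NUL byte stands in for the non-text bytes. Because the sample file is tiny, GNU grep classifies it as binary up front rather than partway through the stream as in the report, but the matching line is lost from the pipe in the same way.

```shell
#!/bin/sh
# Minimal sketch: grep's binary-file handling hides matches from a pipe.
# Assumption: a NUL byte stands in for the non-text bytes in 0.txt;
# the sample lines are hypothetical.
tmp=$(mktemp)
printf 'discharge_a.m\nbinary\000line\ndischarge_calculate_.m\n' > "$tmp"

# Without -a, GNU grep treats the file as binary and reports only that it
# matches (on stdout in grep 2.25; GNU grep 3.5 and later use stderr)
# instead of printing the matching lines, so the second grep never sees
# line 3. "|| true" because finding nothing is the expected outcome here.
plain=$(grep -n discharge "$tmp" | grep calculate_ || true)

# With -a (--text), every line is treated as text and the match survives.
text=$(grep -a -n discharge "$tmp" | grep calculate_)

printf 'without -a: [%s]\nwith -a:    [%s]\n' "$plain" "$text"
rm -f "$tmp"
```

With GNU grep, the "without -a" result should be empty and the "with -a" result should be "3:discharge_calculate_.m". The ~/.bashrc workaround mentioned above corresponds to a line such as alias grep='grep --text'.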

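For the KOI8-encoded file names mentioned above, a rename loop around iconv(1) could look like this minimal sketch. It assumes that every name in the target directory is KOI8-R (a name that is already UTF-8 would be converted a second time and turned into garbage), and recode_names is a hypothetical helper name. It only recodes bytes and, as noted above, does nothing about ligatures.

```shell
#!/bin/sh
# Minimal sketch: rename KOI8-R encoded file names in a directory to UTF-8.
# Assumptions: every name in the directory is KOI8-R (UTF-8 names would be
# double-encoded) and no name ends in a newline. The glob skips dot files.
# recode_names is a hypothetical helper name.
recode_names() {
    dir=$1
    for path in "$dir"/*; do
        [ -e "$path" ] || continue          # empty directory: glob did not match
        old=${path##*/}
        # Recode the name's bytes; leave the file alone if iconv reports an error.
        new=$(printf '%s' "$old" | iconv -f KOI8-R -t UTF-8) || continue
        [ "$new" = "$old" ] && continue     # pure ASCII names come out unchanged
        if [ -e "$dir/$new" ]; then
            printf 'skipping %s: target already exists\n' "$old" >&2
        else
            mv -- "$path" "$dir/$new"
        fi
    done
}
```

For example, recode_names ./papers (a hypothetical directory). Trying it on a copy first is advisable, since nothing reverts the renames if the encoding assumption turns out to be wrong.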
