Bug#891086: grep: Previous versios took -a as default if files where binary or part binary.
On 23/02/18 at 08:42, Caronte Estigia wrote:
> I have a text file, identified as an HTML document by the "file" command, which only
> contains (from what I can see in the file) text characters. In that file there

Could you share that file (privately if you prefer)?

> are numerous strings containing "2018", but when I use grep to find that
> string I get:
>
> Calendario
> anterior
> siguiente
> Sumario BOE-S-2018-47:
> title
> ="BOE-S-2018-47 en formato PDF firmado " onclick="javascript:
> pageTracker._trackPageview('/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf');">PDF
> "Sumario jueves 22 de febrero de 2018 como documento XML">XML
> Notificaciones
> --->Coincidencia en el fichero binario ayer.html<

Could you try grepping the file after setting LC_ALL='C' (and without the -a option)?
What is the output of `locale -a`?

> Using the previous grep version all strings were found, but now if I want grep to
> work as before I need to use "grep -a".
>
> I guess the previous version of grep took the "-a" behaviour as the default one,

That is not quite accurate. Take a look at /usr/share/doc/grep/NEWS.gz, at the
changes made in versions 2.21 and 2.23. You will find some explanations there.

> which treated all files as text unless specified otherwise (which in my
> opinion is the right way to go). I fail to see the security issues in this
> behaviour and how those security issues disappear if I specify the "-a"
> parameter. It looks to me (without reviewing grep's code) that it is trying to
> identify what kind of file it is checking while searching the file (a couple
> of lines are found before the binary message), and I guess it shouldn't do that.
> I think it just has to treat files as text unless specified otherwise with the
> --binary-files parameter.
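Since the file is not available in the thread, one way to check the suspicion behind the LC_ALL='C' question is to look for bytes that are not valid in the locale's encoding. The following is only a minimal Python sketch, assuming the relevant encoding is UTF-8 (the reporter's locale is es_ES.UTF-8); `find_invalid_utf8` is a hypothetical helper name, not part of grep or of any Debian tool.

```python
def find_invalid_utf8(data: bytes, limit: int = 5):
    """Return up to `limit` (offset, byte value) pairs for bytes that
    cannot be decoded as UTF-8. An empty list means the data is valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            # Strict decoding raises at the first undecodable byte.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            bad = pos + err.start          # absolute offset of the bad byte
            problems.append((bad, data[bad]))
            pos = bad + 1                  # resume right after it
    return problems


# Illustrative data only: "año 2018" encoded as ISO-8859-1, not the real file.
sample = "año 2018\n".encode("iso-8859-1")
print(find_invalid_utf8(sample))
```

To run it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to the function. Any offsets it reports are candidates for what makes grep classify the file as binary in a UTF-8 locale.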
Bug#891086: grep: Previous versios took -a as default if files where binary or part binary.
Good morning Santiago.

You're right, I upgraded from jessie to stretch, and the grep package is:

ii  grep  2.27-2  amd64  GNU grep, egrep and fgrep

I have a text file, identified as an HTML document by the "file" command, which only contains (from what I can see in the file) text characters. In that file there are numerous strings containing "2018", but when I use grep to find that string I get:

Calendario
anterior
siguiente
Sumario BOE-S-2018-47:
PDF
XML
Notificaciones
--->Coincidencia en el fichero binario ayer.html<

Using the previous grep version all strings were found, but now if I want grep to work as before I need to use "grep -a".

I guess the previous version of grep took the "-a" behaviour as the default one, which treated all files as text unless specified otherwise (which in my opinion is the right way to go). I fail to see the security issues in this behaviour and how those security issues disappear if I specify the "-a" parameter. It looks to me (without reviewing grep's code) that it is trying to identify what kind of file it is checking while searching the file (a couple of lines are found before the binary message), and I guess it shouldn't do that. I think it just has to treat files as text unless specified otherwise with the --binary-files parameter.

Regards.
Francisco

On Thursday, 22 February 2018 at 15:33, Santiago R.R. wrote:

On 22/02/18 at 11:18, rodrifra wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
>
> Dear Maintainer,
>
>    * What led up to the situation?
>
> Scripts working with grep stopped working after the update. No patterns
> were detected and the message reporting matches in the binary file
> was displayed. The file is a downloaded HTML page and the "file" command returns:
>
> selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators
>
>    * What exactly did you do (or not do) that was effective (or
>      ineffective)?
>
> Explicitly telling grep to treat the file as text solved the problem:
> "grep -a "
>
> -- System Information:
> Debian Release: 9.3
>   APT prefers stable-updates
>   APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
> Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8),
> LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: sysvinit (via /sbin/init)
>

I suppose you upgraded from jessie to stretch.

I am not sure I fully understand your message. Could you please clarify which version of grep didn't detect the patterns?

Anyway, as far as I understand from upstream's comments, grep's previous behaviour when detecting "binary files" was not suitable. The change was made to avoid security issues, or undefined behaviour, that could be related to invalid characters. In your case, the .html file could include invalid chars at the beginning, or the encoding may be wrong.

This is probably not a bug.

Cheers,

-- Santiago
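The observation that a couple of lines are printed before the binary message (in English, "Binary file ayer.html matches") can be illustrated with a toy model. This is a simplified Python sketch of one reading of the 2.23 NEWS entry mentioned elsewhere in this thread, not grep's actual code: it assumes a UTF-8 locale, fixed-string matching, and that output stops at the first matching line containing an encoding error. `grep_model` is a hypothetical name.

```python
def grep_model(data: bytes, pattern: bytes, name: str = "FILE"):
    """Toy model: return matching lines as text until a matching line
    contains bytes that are not valid UTF-8, then report a binary match
    and stop. Non-matching lines are never inspected for validity."""
    out = []
    for line in data.split(b"\n"):
        if pattern not in line:
            continue
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            # Assumed behaviour: suppress the line and any further output.
            out.append(f"Binary file {name} matches")
            break
        out.append(text)
    return out


# Illustrative data only: the second matching line carries an ISO-8859-1 byte.
sample = b"Sumario BOE-S-2018-47\n22 de febrero de 2018 a\xf1o\nm\xe1s 2018\n"
for row in grep_model(sample, b"2018", "ayer.html"):
    print(row)
```

Under these assumptions the first matching line is printed as text and the search then ends with the binary message, which resembles the output quoted above. The model ignores NUL-byte detection, regular expressions, and every option other than the default behaviour.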
Bug#891086: grep: Previous versios took -a as default if files where binary or part binary.
On 22/02/18 at 11:18, rodrifra wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
>
> Dear Maintainer,
>
>    * What led up to the situation?
>
> Scripts working with grep stopped working after the update. No patterns
> were detected and the message reporting matches in the binary file
> was displayed. The file is a downloaded HTML page and the "file" command returns:
>
> selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators
>
>    * What exactly did you do (or not do) that was effective (or
>      ineffective)?
>
> Explicitly telling grep to treat the file as text solved the problem:
> "grep -a "
>
> -- System Information:
> Debian Release: 9.3
>   APT prefers stable-updates
>   APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
> Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8),
> LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: sysvinit (via /sbin/init)
>

I suppose you upgraded from jessie to stretch.

I am not sure I fully understand your message. Could you please clarify which version of grep didn't detect the patterns?

Anyway, as far as I understand from upstream's comments, grep's previous behaviour when detecting "binary files" was not suitable. The change was made to avoid security issues, or undefined behaviour, that could be related to invalid characters. In your case, the .html file could include invalid chars at the beginning, or the encoding may be wrong.

This is probably not a bug.

Cheers,

-- Santiago
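If the encoding is indeed the problem, an alternative to "grep -a" is to convert the file to the locale's encoding before searching it. The following is a minimal Python sketch, assuming the source is ISO-8859-1: "file" only reported "ISO-8859 text", so the exact variant is a guess, and ISO-8859-15 or Windows-1252 are equally possible. `to_utf8` is a hypothetical helper name.

```python
def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return `data` re-encoded as UTF-8.

    Data that already decodes as UTF-8 is returned unchanged, so the
    function is safe to apply to a mix of files. `source_encoding` is an
    assumption about the input and must be chosen by the caller."""
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(source_encoding).encode("utf-8")


# Illustrative data only, not the reporter's file.
legacy = "Sumario jueves 22 de febrero de 2018, año\r\n".encode("iso-8859-1")
converted = to_utf8(legacy)
print(converted.decode("utf-8"), end="")
```

Line terminators are left untouched, so the CRLF/LF mix reported by "file" is preserved. Note that ISO-8859-1 maps every byte value to some character, so a wrong guess produces wrong characters rather than an error.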
Bug#891086: grep: Previous versios took -a as default if files where binary or part binary.
Package: grep
Version: 2.27-2
Severity: normal

Dear Maintainer,

   * What led up to the situation?

Scripts working with grep stopped working after the update. No patterns were detected and the message reporting matches in the binary file was displayed. The file is a downloaded HTML page and the "file" command returns:

selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators

   * What exactly did you do (or not do) that was effective (or ineffective)?

Explicitly telling grep to treat the file as text solved the problem: "grep -a "

-- System Information:
Debian Release: 9.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8), LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages grep depends on:
ii  dpkg          1.18.24
ii  install-info  6.3.0.dfsg.1-1+b2
ii  libc6         2.24-11+deb9u1
ii  libpcre3      2:8.39-3

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3  2:8.39-3

-- no debconf information