Bug#891086: grep: Previous versions took -a as default if files were binary or partly binary.

2018-02-23 Thread Santiago R.R.
On 23/02/18 at 08:42, Caronte Estigia wrote:
> I have a text file, identified as an HTML document by the "file" command, which
> contains only text characters (as far as I can see). In that file there

Could you share that file (privately, if you prefer)?

> are numerous strings containing "2018", but when I use grep to search for that
> string I get:
> 
>   Calendario
> 
> anterior
> 
> siguiente
>   Sumario BOE-S-2018-47:
>  title
> ="BOE-S-2018-47 en formato PDF firmado " onclick="javascript:
> pageTracker._trackPageview('/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf');">PDF
> 
>  "Sumario jueves 22 de febrero de 2018 como documento XML">XML
> Notificaciones
> --->Coincidencia en el fichero binario ayer.html<

Could you try grepping the file with LC_ALL='C' set beforehand (and
without the -a option)?

What is the output of `locale -a`?
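The LC_ALL='C' test can isolate the problem. According to grep's manual, bytes that are improperly encoded for the current locale count as binary data, and in the C locale grep treats every byte as a character. A file in a single-byte encoding such as ISO-8859-1 therefore has no encoding errors under LC_ALL='C'. The same bytes can be invalid in a UTF-8 locale such as es_ES.UTF-8.

Below is a minimal Python sketch of that difference. Latin-1 decoding stands in for the C locale's byte-per-character view, which is an approximation. The helper name and the sample word are hypothetical.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding`, False on an encoding error."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: "Notificación" saved as ISO-8859-1, where "ó" is the single byte 0xF3.
sample = "Notificación".encode("iso-8859-1")

# In a UTF-8 locale the lone 0xF3 byte is an encoding error.
print("UTF-8:", decodes_cleanly(sample, "utf-8"))      # UTF-8: False
# Latin-1 maps every byte to a character, roughly like the C locale's view of bytes.
print("Latin-1:", decodes_cleanly(sample, "latin-1"))  # Latin-1: True
```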

> With the previous grep version all strings were found, but now I need to use
> "grep -a" if I want grep to work as before.
> 
> I guess the previous version of grep used the "-a" behaviour by default,

That is not quite accurate. Take a look at the changes made in versions
2.21 and 2.23 in /usr/share/doc/grep/NEWS.gz; you will find some
explanations there.

> which treated all files as text unless told otherwise (which in my opinion
> is the right way to go). I fail to see the security issues in this behaviour,
> or how those issues disappear if I specify the "-a" option. It looks to me
> (without having reviewed grep's code) as if grep tries to identify what kind
> of file it is while it is already searching it (a couple of lines are printed
> before the binary-file message), and I don't think it should do that. I think
> it should simply treat files as text unless told otherwise with the
> --binary-files option.
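The observation above is that some matching lines are printed, then the binary-file message. That is consistent with a simplified model: before printing a matching line, grep checks it for bytes that are invalid in the current locale. At the first such line it stops printing and reports a binary match instead.

The Python sketch below models only that rule, assuming a UTF-8 locale as in the report. It ignores NUL-byte detection and other details of real grep. The function name and sample data are hypothetical.

```python
from typing import List


def grep_like(data: bytes, pattern: bytes, name: str) -> List[str]:
    """Simplified model: print matching lines until a matching line is not valid UTF-8."""
    out = []
    for raw in data.split(b"\n"):
        if pattern not in raw:
            continue  # non-matching lines are never printed, so this model never checks them
        try:
            out.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # First matching line with an encoding error: stop printing and report a binary match.
            out.append(f"Binary file {name} matches")
            break
    return out


# Hypothetical ISO-8859-1 data: the third line contains "ó" as the single byte 0xF3.
data = "Sumario 2018\nPDF 2018\nNotificación 2018\nXML 2018\n".encode("iso-8859-1")
for line in grep_like(data, b"2018", "ayer.html"):
    print(line)
# Sumario 2018
# PDF 2018
# Binary file ayer.html matches
```

In this model, `-a` (`--binary-files=text`) simply skips the check and prints every matching line, which would restore the old output. A file whose invalid bytes sit only in non-matching lines would also be searched as ordinary text.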





2018-02-23 Thread Caronte Estigia
Good morning, Santiago.
You're right: I upgraded from jessie to stretch, and the grep package is:
ii  grep  2.27-2  amd64  GNU grep, egrep and fgrep
I have a text file, identified as an HTML document by the "file" command, which
contains only text characters (as far as I can see). In that file there are
numerous strings containing "2018", but when I use grep to search for that
string I get:
  Calendario
anterior
siguiente
  Sumario BOE-S-2018-47:
    PDF
    XML
    Notificaciones
--->Coincidencia en el fichero binario ayer.html<

With the previous grep version all strings were found, but now I need to use
"grep -a" if I want grep to work as before.
I guess the previous version of grep used the "-a" behaviour by default,
which treated all files as text unless told otherwise (which in my opinion
is the right way to go). I fail to see the security issues in this behaviour,
or how those issues disappear if I specify the "-a" option. It looks to me
(without having reviewed grep's code) as if grep tries to identify what kind
of file it is while it is already searching it (a couple of lines are printed
before the binary-file message), and I don't think it should do that. I think
it should simply treat files as text unless told otherwise with the
--binary-files option.
Regards,
Francisco
 

On Thursday, 22 February 2018 15:33, Santiago R.R. wrote:
 

 On 22/02/18 at 11:18, rodrifra wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
> 
> Dear Maintainer,
> 
> 
>    * What led up to the situation?
> 
>    Scripts that use grep stopped working after the update. No patterns
>    were detected, and the message reporting a match in a binary file
>    was displayed. The file is a downloaded HTML page and the "file" command returns:
>    
>    selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators
> 
>    * What exactly did you do (or not do) that was effective (or
>      ineffective)?
> 
>    Explicitly telling grep to treat the file as text solved the problem:
>    "grep -a"
> 
> -- System Information:
> Debian Release: 9.3
>  APT prefers stable-updates
>  APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
> Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8), 
> LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: sysvinit (via /sbin/init)
> 

I suppose you upgraded from jessie to stretch.
I am not sure I fully understand your message. Could you please
clarify which version of grep did not detect the patterns?

Anyway, as far as I understand from upstream's comments, grep's previous
behaviour when detecting "binary files" was not suitable. The change
was made to avoid security issues, or undefined behaviour, that
could be related to invalid characters. In your case, the .html file
could include invalid characters at the beginning, or its encoding may be
wrong.

This is probably not a bug.

Cheers,

 -- Santiago

   


2018-02-22 Thread Santiago R.R.
On 22/02/18 at 11:18, rodrifra wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
> 
> Dear Maintainer,
> 
> 
>* What led up to the situation?
> 
>Scripts that use grep stopped working after the update. No patterns
> were detected, and the message reporting a match in a binary file
> was displayed. The file is a downloaded HTML page and the "file" command returns:
>
>selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators
> 
>* What exactly did you do (or not do) that was effective (or
>  ineffective)?
> 
>Explicitly telling grep to treat the file as text solved the problem:
> "grep -a"
> 
> -- System Information:
> Debian Release: 9.3
>   APT prefers stable-updates
>   APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
> Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8), 
> LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: sysvinit (via /sbin/init)
> 

I suppose you upgraded from jessie to stretch.
I am not sure I fully understand your message. Could you please
clarify which version of grep did not detect the patterns?

Anyway, as far as I understand from upstream's comments, grep's previous
behaviour when detecting "binary files" was not suitable. The change
was made to avoid security issues, or undefined behaviour, that
could be related to invalid characters. In your case, the .html file
could include invalid characters at the beginning, or its encoding may be
wrong.
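One way to test the invalid-characters hypothesis is to find two things: the first byte that is not valid UTF-8 (the encoding of the reporter's es_ES.UTF-8 locale), and the line it falls on. Below is a minimal Python sketch, assuming the whole file fits in memory. The function name and sample data are hypothetical.

```python
from typing import Optional, Tuple


def first_encoding_error(data: bytes, encoding: str = "utf-8") -> Optional[Tuple[int, int]]:
    """Return (byte offset, 1-based line number) of the first invalid sequence, or None."""
    try:
        data.decode(encoding)
        return None
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        line_number = data.count(b"\n", 0, err.start) + 1
        return err.start, line_number


# Hypothetical contents: plain ASCII on line 1, an ISO-8859-1 "ñ" (byte 0xF1) on line 2.
data = b"<title>BOE</title>\nEspa\xf1a 2018\n"
print(first_encoding_error(data))  # (23, 2)
```

For the real file, the bytes would come from `open("ayer.html", "rb").read()`. A result of `None` means the file decodes cleanly as UTF-8, so encoding errors would not explain the binary-file message, although NUL bytes still could.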

This is probably not a bug.

Cheers,

 -- Santiago





2018-02-22 Thread rodrifra
Package: grep
Version: 2.27-2
Severity: normal

Dear Maintainer,


   * What led up to the situation?

   Scripts that use grep stopped working after the update. No patterns
   were detected, and the message reporting a match in a binary file was
   displayed. The file is a downloaded HTML page and the "file" command returns:
   
   selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators

   * What exactly did you do (or not do) that was effective (or
 ineffective)?

   Explicitly telling grep to treat the file as text solved the problem:
"grep -a"

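`file` reports ISO-8859 text while the locale is UTF-8. Converting the file to UTF-8 before searching is therefore another workaround besides `-a`. On the command line, `iconv -f ISO-8859-1 -t UTF-8` does this.

Below is a minimal Python sketch of the same conversion. It assumes the file is specifically ISO-8859-1; `file` only names the ISO-8859 family, so ISO-8859-15 or Windows-1252 are also possible. The function name is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (assumes the input really is ISO-8859-1)."""
    # Every byte value is a valid ISO-8859-1 character, so this decode cannot fail.
    return data.decode("iso-8859-1").encode("utf-8")


original = "Notificación 2018\n".encode("iso-8859-1")
converted = latin1_to_utf8(original)
print(original)   # b'Notificaci\xf3n 2018\n'
print(converted)  # b'Notificaci\xc3\xb3n 2018\n'
print(converted.decode("utf-8") == "Notificación 2018\n")  # True: no encoding errors left
```

If the file turns out to use a different single-byte encoding, only the codec name changes (for example `"iso-8859-15"` or `"cp1252"`). A wrong Latin-family codec usually decodes without complaint, so the converted text is worth checking by eye.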
-- System Information:
Debian Release: 9.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8), 
LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)

Versions of packages grep depends on:
ii  dpkg  1.18.24
ii  install-info  6.3.0.dfsg.1-1+b2
ii  libc6 2.24-11+deb9u1
ii  libpcre3  2:8.39-3

grep recommends no packages.

Versions of packages grep suggests:
ii  libpcre3  2:8.39-3

-- no debconf information