Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-22 Thread Daniel Shahaf
Control: clone -1 -2
Control: retitle -2 diffoscope: readelf(1): Ignore data/instruction addresses 
that are de facto line numbers
Control: retitle -1 diffoscope: readelf(1): Reduce noise from deltas of offsets 
embedded in instructions

Daniel Shahaf wrote on Tue, Sep 20, 2016 at 18:47:31 +:
> However, in the .text section, each disassembled instruction is preceded
> by its address.  I think it would make sense to have the diff ignore
> those addresses: they serve a purpose similar to line numbers, and
> ignoring them cannot cause a difference to be missed.

flexc++ has a difference on *every* line of several sections (.rodata,
.eh_frame, others) because the sections start 0xc0 bytes later in the
second build than in the first build:

│   │   │   │   │ -  0x0043a320 01000200  623a423a 633a433a b:B:c:C:
│   │   │   │   │ -  0x0043a330 64663a46 68693a49 3a4b6c3a 4c3a6d3a df:Fhi:I:Kl:L:m:
⋮
│   │   │   │   │ +  0x0043a3e0 01000200  623a423a 633a433a b:B:c:C:
│   │   │   │   │ +  0x0043a3f0 64663a46 68693a49 3a4b6c3a 4c3a6d3a df:Fhi:I:Kl:L:m:

Hence, filing this as a separate issue.  #838260 can remain about
offsets embedded in instructions.

(It's safe to ignore these addresses because the start/end of the
section already appear elsewhere in the diffed output.)
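
A minimal sketch of the idea in Python, not taken from diffoscope. The helper name strip_dump_address and the regular expression are assumptions about what a hex-dump row looks like; the two sample rows are copied from the output quoted above.

```python
import re

# Hypothetical helper, not part of diffoscope: matches the leading
# "0x0043a320 " style address column of a hex-dump row.
_ADDRESS = re.compile(r"^(\s*)0x[0-9a-fA-F]+ ")

def strip_dump_address(line: str) -> str:
    """Return the row without its leading 0x... address, if it has one."""
    return _ADDRESS.sub(r"\1", line, count=1)

first = "  0x0043a330 64663a46 68693a49 3a4b6c3a 4c3a6d3a df:Fhi:I:Kl:L:m:"
second = "  0x0043a3f0 64663a46 68693a49 3a4b6c3a 4c3a6d3a df:Fhi:I:Kl:L:m:"

# The rows differ only in their address column, so they compare equal
# once that column is dropped.
same = strip_dump_address(first) == strip_dump_address(second)
```

Rows without an address column pass through unchanged, so the helper could be applied to every line of a section dump before diffing.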

___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Processed: Re: Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-22 Thread Debian Bug Tracking System
Processing control commands:

> clone -1 -2
Bug #838260 [diffoscope] diffoscope: Reduce noise from offsets deltas in 
readelf(1) diffs
Bug 838260 cloned as bug 838569
> retitle -2 diffoscope: readelf(1): Ignore data/instruction addresses that are 
> de facto line numbers
Bug #838569 [diffoscope] diffoscope: Reduce noise from offsets deltas in 
readelf(1) diffs
Changed Bug title to 'diffoscope: readelf(1): Ignore data/instruction addresses 
that are de facto line numbers' from 'diffoscope: Reduce noise from offsets 
deltas in readelf(1) diffs'.
> retitle -1 diffoscope: readelf(1): Reduce noise from deltas of offsets 
> embedded in instructions
Bug #838260 [diffoscope] diffoscope: Reduce noise from offsets deltas in 
readelf(1) diffs
Changed Bug title to 'diffoscope: readelf(1): Reduce noise from deltas of 
offsets embedded in instructions' from 'diffoscope: Reduce noise from offsets 
deltas in readelf(1) diffs'.

-- 
838260: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=838260
838569: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=838569
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-21 Thread Chris Lamb
> I was thinking of something like the HTML  tag.  In my browser,
> foo renders «foo» with a dotted underline
> whose raison d'être is your concern (a)

Even so, you can't search the page with CTRL+F and, of course, it makes the
output too different between --text and --html :)

Anyway, small issue ...


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-21 Thread Daniel Shahaf
Chris Lamb wrote on Tue, Sep 20, 2016 at 20:58:29 +0100:
> Daniel Shahaf wrote:
> 
> > Example output:
> 
> Alas I'm not very learned in ELF, so I will trust the specifics are fine,
> but just to check:
> 

I'm not too familiar with ELF either.  I know a little about which
C variables live in which section, e.g., .rodata is storage for string
literals.

> > 
> > .rodata#1 is 0xA70
> > .rodata#2 is 0xA80
> 
> … would be displayed (when different, of course!) as *something* like:
> 
>  - .rodata#1 is 0xA70
>  + .rodata#1 is 0xA71

Yes.

> > The actual hex values could be displayed as a tooltip on the 'lea' line,
> > or appended to that line as a '# comment'
> 
> So, tooltips are not only HTML-specific but would also hide data,
> particularly for a) users who do not even know they need to run their
> mouse over something, b) users who generally drive their browser via a
> keyboard (probably more common for users of diffoscope!) and c) users
> with accessibility requirements.

I was thinking of something like the HTML  tag.  In my browser,
foo renders «foo» with a dotted underline
whose raison d'être is your concern (a).  I assume the user agents of
people in categories (b) and (c) have similar solutions.

In any case, displaying the values in a comment is probably better since
it makes the information available without a user action.  (As I said in
my last email, the comment should be exempted from being diffed.)


Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Chris Lamb
Daniel Shahaf wrote:

> Example output:

Alas I'm not very learned in ELF, so I will trust the specifics are fine,
but just to check:

> 
> .rodata#1 is 0xA70
> .rodata#2 is 0xA80

… would be displayed (when different, of course!) as *something* like:

 - .rodata#1 is 0xA70
 + .rodata#1 is 0xA71

> The actual hex values could be displayed as a tooltip on the 'lea' line,
> or appended to that line as a '# comment'

So, tooltips are not only HTML-specific but would also hide data, particularly
for a) users who do not even know they need to run their mouse over something,
b) users who generally drive their browser via a keyboard (probably more common
for users of diffoscope!) and c) users with accessibility requirements.

Anyway, great idea - love it.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-


Re: Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Chris Lamb
Jérémy Bobbio wrote:

> Initially, I thought of this as a way to add image comparison as I
> felt sad not knowing any free software that could easily provide
> similar features to what GitHub offers

Pff, you don't like my existing image comparison? ;-)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-



Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Jérémy Bobbio
Daniel Shahaf:
> Jérémy Bobbio wrote on Tue, Sep 20, 2016 at 13:18:49 +:
>> But why stop with images? In the precise case of the readelf output,
>> having line-oriented diff means we are carrying around useless and
>> confusing information: the line numbers are not helpful in any way to
>> locate and understand the differences.
>>
>> But what if we could replace the line numbers by the instruction
>> addresses? Then the noise mentioned by Daniel disappears. Meanwhile, the
>> actual output will become even more relevant.
> 
> In the example in the OP, the (source code) line numbers and instruction
> addresses are the same between both builds.  It is the .rodata addresses
> embedded into the instructions that differ.

Thanks for pointing this out, I had actually misunderstood the problem
at hand. :)

-- 
Lunar.''`.
lunar at debian.org : :Ⓐ :  # apt-get install anarchism
`. `'`
  `-




Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Daniel Shahaf
Jérémy Bobbio wrote on Tue, Sep 20, 2016 at 13:18:49 +:
> But why stop with images? In the precise case of the readelf output,
> having line-oriented diff means we are carrying around useless and
> confusing information: the line numbers are not helpful in any way to
> locate and understand the differences.
> 
> But what if we could replace the line numbers by the instruction
> addresses? Then the noise mentioned by Daniel disappears. Meanwhile, the
> actual output will become even more relevant.

In the example in the OP, the (source code) line numbers and instruction
addresses are the same between both builds.  It is the .rodata addresses
embedded into the instructions that differ.

However, in the .text section, each disassembled instruction is preceded
by its address.  I think it would make sense to have the diff ignore
those addresses: they serve a purpose similar to line numbers, and
ignoring them cannot cause a difference to be missed.
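
As a rough illustration (a sketch in Python, not diffoscope code), the address prefix can be dropped before the lines are handed to a unified diff. The helper name without_addresses and the regular expression are assumptions, and the second listing is a made-up build in which the same two instructions sit 4 bytes later.

```python
import difflib
import re

# Assumed shape of a disassembly line: optional indent, hex address, colon.
_INSN_ADDRESS = re.compile(r"^\s*[0-9a-f]+:\s*")

def without_addresses(lines):
    """Drop the leading 'address:' prefix from each disassembly line."""
    return [_INSN_ADDRESS.sub("", line, count=1) for line in lines]

first = [
    "   46eea:\t48 8b 3c 24 \tmov    (%rsp),%rdi",
    "   46eee:\t48 8d 35 7f 50 1c 00 \tlea    0x1c507f(%rip),%rsi",
]
# Hypothetical second build: identical instructions, shifted by 4 bytes.
second = [
    "   46eee:\t48 8b 3c 24 \tmov    (%rsp),%rdi",
    "   46ef2:\t48 8d 35 7f 50 1c 00 \tlea    0x1c507f(%rip),%rsi",
]

# With addresses, every line shows up as changed.
raw = list(difflib.unified_diff(first, second, lineterm=""))
# Without them, the two listings are identical and the diff is empty.
cleaned = list(
    difflib.unified_diff(without_addresses(first), without_addresses(second), lineterm="")
)
```

Source-location lines such as ./build/../src/nvim/main.c:749 do not start with a bare hex number followed by a colon, so this pattern leaves them alone.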

Cheers,

Daniel


Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Daniel Shahaf
Chris Lamb wrote on Tue, Sep 20, 2016 at 10:26:18 +0100:
> > > Perhaps there is a perfect solution whereby we would normalise these two
> > > offsets to — making it up here! — relative values, but simply need […]
> >
> > I'm not sure I understand what your idea is.  Could you give an example
> > of how the output might look?
> 
> Apologies for not explaining myself better - I don't actually have a
> concrete idea for the output, but I was just expressing a wish to avoid
> a flag to ignore certain things so was using a hypothetical solution.

Perhaps the output could replace all offsets into the .rodata section by
sequential numbers?  For example, if the .rodata section starts at 0xA00
and ends at 0xC00, and the output references 0xA80, 0xA70, 0xB80,
and 0xB70, then those could be translated to .rodata#2, .rodata#1,
.rodata#4, and .rodata#3 respectively.  To make this lossless, the
(.rodata#42 ↦ 0xB53) mapping could be appended to the file and included
in the diff.

Example output:

lea    «.rodata#1»(%rip),%rsi
⋮

.rodata#1 is 0xA70
.rodata#2 is 0xA80

The actual hex values could be displayed as a tooltip on the 'lea' line,
or appended to that line as a '# comment' that will be considered equal
by the unidiff (like 'diff -w' considers space and tab equal).
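
The renumbering could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a diffoscope implementation: the function name label_offsets is made up, every hex literal that falls inside the section is treated as an offset into it, and the section end is taken as exclusive.

```python
import re

HEX = re.compile(r"0x[0-9a-fA-F]+")

def label_offsets(lines, start, end, section=".rodata"):
    """Swap hex values in [start, end) for sequential section labels.

    Returns (rewritten_lines, legend). Labels are numbered in ascending
    address order; the legend keeps the rewrite lossless.
    """
    offsets = sorted({
        int(m.group(), 16)
        for line in lines
        for m in HEX.finditer(line)
        if start <= int(m.group(), 16) < end
    })
    labels = {off: f"{section}#{n}" for n, off in enumerate(offsets, start=1)}

    def swap(match):
        # Values outside the section are left untouched.
        return labels.get(int(match.group(), 16), match.group())

    rewritten = [HEX.sub(swap, line) for line in lines]
    legend = [f"{label} is 0x{off:X}" for off, label in labels.items()]
    return rewritten, legend

# The four references from the example above, in the order they appear.
lines = [
    "lea 0xA80(%rip),%rsi",
    "lea 0xA70(%rip),%rsi",
    "lea 0xB80(%rip),%rsi",
    "lea 0xB70(%rip),%rsi",
]
rewritten, legend = label_offsets(lines, 0xA00, 0xC00)
```

Here the legend would be appended to the output and diffed like any other text, so a change in the underlying offsets still shows up once rather than on every referencing line.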

Cheers,

Daniel


> We already have a few and I wish they would/could disappear! :)
> 
> 
> Regards,
> 
> -- 
>   ,''`.
>  : :'  : Chris Lamb
>  `. `'`  la...@debian.org / chris-lamb.co.uk
>`-


Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Jérémy Bobbio
Chris Lamb:
>> Could these offset differences in readelf(1) output be ignored, at least
>> optionally?
> 
> Love the idea! However, my gut cautions against ignoring them, even with an
> option.
> 
> Perhaps there is a perfect solution whereby we would normalise these two
> offsets to — making it up here! — relative values, but simply need to 
> include that we have done that once in the diff. That way, we have a) still
> captured the underlying issue, b) reduced the noise, and c) avoided a
> cumbersome option flag.

One idea that crossed my mind at some point that might be able to solve
this as well: be able to record other kinds of differences than just
line-oriented ones. Initially, I thought of this as a way to add image
comparison as I felt sad not knowing any free software that could easily
provide similar features to what GitHub offers [1].

But why stop with images? In the precise case of the readelf output,
having line-oriented diff means we are carrying around useless and
confusing information: the line numbers are not helpful in any way to
locate and understand the differences.

But what if we could replace the line numbers by the instruction
addresses? Then the noise mentioned by Daniel disappears. Meanwhile, the
actual output will become even more relevant.

Such an approach would require some structural changes to the code, but
could have benefits on many fronts.
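
To make the idea a little more concrete, here is a small Python sketch of a comparison keyed by instruction address instead of line number. It is only an illustration: the function name diff_by_address and the dictionary representation are assumptions, and nothing here reflects how diffoscope stores differences.

```python
def diff_by_address(first, second):
    """Compare two {address: instruction} mappings.

    Returns (address, first_text, second_text) tuples for every address
    whose instruction differs; None marks an address missing on one side.
    """
    changes = []
    for address in sorted(first.keys() | second.keys()):
        before, after = first.get(address), second.get(address)
        if before != after:
            changes.append((address, before, after))
    return changes

# Two instructions from the original report, keyed by their address.
first = {
    0x46EEA: "mov (%rsp),%rdi",
    0x46EEE: "lea 0x1c507f(%rip),%rsi",
}
second = {
    0x46EEA: "mov (%rsp),%rdi",
    0x46EEE: "lea 0x1c5076(%rip),%rsi",
}
changes = diff_by_address(first, second)
```

Each reported change then carries the address it belongs to, which is the locating information a line number cannot provide for disassembly.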

 [1]: https://help.github.com/articles/rendering-and-diffing-images/

Hope that's useful,
-- 
Lunar.''`.
lunar at debian.org : :Ⓐ :  # apt-get install anarchism
`. `'`
  `-




Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Chris Lamb
Daniel,

> > Perhaps there is a perfect solution whereby we would normalise these two
> > offsets to — making it up here! — relative values, but simply need […]
>
> I'm not sure I understand what your idea is.  Could you give an example
> of how the output might look?

Apologies for not explaining myself better - I don't actually have a
concrete idea for the output, but I was just expressing a wish to avoid
a flag to ignore certain things so was using a hypothetical solution.

We already have a few and I wish they would/could disappear! :)


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-


Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-20 Thread Daniel Shahaf
Chris Lamb wrote on Mon, Sep 19, 2016 at 09:47:10 +0100:
> Perhaps there is a perfect solution whereby we would normalise these two
> offsets to — making it up here! — relative values,

I'm not sure I understand what your idea is.  Could you give an example
of how the output might look?

Do you mean, for example,
.
@@ -1,2 +3,4 @@
-0x42
+«0x42 + 0x10»
.
where the original files read "0x42" (first file) and "0x52" (second file)?
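
For illustration only, that rendering could be produced by a helper like the following Python sketch. The name relative_form is made up; it simply writes the second value as the first value plus or minus a delta.

```python
def relative_form(old: int, new: int) -> str:
    """Render `new` as `old` plus or minus a hex delta, e.g. «0x42 + 0x10»."""
    delta = new - old
    sign = "+" if delta >= 0 else "-"
    return f"«{old:#x} {sign} {abs(delta):#x}»"

# The example above: 0x42 in the first file, 0x52 in the second.
shown = relative_form(0x42, 0x52)
```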

> but simply need to include that we have done that once in the diff.
> That way, we have a) still captured the underlying issue, b) reduced
> the noise, and c) avoided a cumbersome option flag.


Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-19 Thread Chris Lamb
Hi Daniel,

> Could these offset differences in readelf(1) output be ignored, at least
> optionally?

Love the idea! However, my gut cautions against ignoring them, even with an
option.

Perhaps there is a perfect solution whereby we would normalise these two
offsets to — making it up here! — relative values, but simply need to 
include that we have done that once in the diff. That way, we have a) still
captured the underlying issue, b) reduced the noise, and c) avoided a
cumbersome option flag.


Regards,

-- 
  ,''`.
 : :'  : Chris Lamb
 `. `'`  la...@debian.org / chris-lamb.co.uk
   `-


Bug#838260: diffoscope: Reduce noise from offsets deltas in readelf(1) diffs

2016-09-19 Thread Daniel Shahaf
Package: diffoscope
Version: 60
Severity: wishlist

Dear Maintainer,

A difference in an ELF binary file can cause offsets throughout the
file to shift, usually all of them by the same amount.

Typical example:

│   │   │   │   │  ./build/../src/nvim/indent_c.c:658
│   │   │   │   │ -   44436: 48 8d 35 01 77 1c 00  lea    0x1c7701(%rip),%rsi
│   │   │   │   │ +   44436: 48 8d 35 f8 76 1c 00  lea    0x1c76f8(%rip),%rsi
│   │   │   │   │  ./build/../src/nvim/main.c:749
│   │   │   │   │     46eea: 48 8b 3c 24           mov    (%rsp),%rdi
│   │   │   │   │ -   46eee: 48 8d 35 7f 50 1c 00  lea    0x1c507f(%rip),%rsi
│   │   │   │   │ +   46eee: 48 8d 35 76 50 1c 00  lea    0x1c5076(%rip),%rsi

Here, 0x1c7701-0x1c76f8 = 0x1c507f-0x1c5076 = 9.  There are several
screenfuls of such differences, which reduces the signal-to-noise ratio
of the output, since all of these differences are secondary; the primary
difference is whatever caused the 9-byte shift in the first place.

(In this instance, the 9-byte offset was caused by a string literal
being present in the first build but not in the second build.)
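
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name common_shift is made up, and the pairs are the displacements copied from the two lea lines in the diff.

```python
# Displacements from the diff above, as (first build, second build).
pairs = [
    (0x1C7701, 0x1C76F8),  # indent_c.c:658
    (0x1C507F, 0x1C5076),  # main.c:749
]

def common_shift(pairs):
    """Return the shared delta if every pair moved by the same amount, else None."""
    deltas = {old - new for old, new in pairs}
    return deltas.pop() if len(deltas) == 1 else None

shift = common_shift(pairs)  # 9 for the two pairs above
```

A single shared delta across all changed operands is what suggests one primary difference with many secondary ones.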

Could these offset differences in readelf(1) output be ignored, at least
optionally?  This would make it easier to find the root cause by reading
the diff.

Cheers,

Daniel
