Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-21 Thread Lawrence Crowl
On 10/20/11, Gabriel Charette gcharet...@gmail.com wrote:
 I just thought about something..

 Earlier I said that ALL line_table issues were resolved after this
 patch (as it ignores the re-included headers that were guarded, as the
 non-pph compiler does naturally).

 One problem remains however, I'm pretty sure that re-included
 non-pph'ed header's line_table entries are still showing up multiple
 times (as the direct non-pph childs of a given pph_include have their
 line_table entries copied one by one from the pph file).

 I think we were talking about somehow remembering guards context in
 which DECLs were declared and then ignoring DECLs streamed in if they
 belong to a given header guard type that was previously seen in a
 prior include using the same non-pph header, allowing us to ignore
 those DECLs that are re-declared when they should have been guarded
 out the second time.

 I'm not sure whether there is machinery to handle non-pph re-includes
 yet... but... at the very least, I'm pretty sure those non-pph entries
 still show up multiple times in the line_table.

 Now, we can't just remove/ignore those entries either... doing so
 would alter the expected location offset (pph_loc_offset) applied to
 all tokens streamed in directly from the pph header.

 What we could potentially do is:
 - ignore the repeated non-pph entry
 - remember the number of locations this entry was supposed to take
 (call that pph_loc_ignored_offset)
 - then for DECLs imported after it we would then need an offset of
 pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing
 entries in the line_table

 The problem here obviously is that I don't think we have a way of
 knowing which DECLs come before, inside, and after a given non-pph
 header included in the parent pph header which we are currently
 reading.

 Furthermore, a DECL coming after the non-pph header could potentially
 refer to something inside the ignored non-pph header and the
 source_location of the referred token would now be invalid (although
 that might already be fixed by the cache hit which would redirect that
 token reference to the same token in the first included copy of that
 same header which wasn't actually skipped as it was first and which is
 valid)


 On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote:
 @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
   int entries_offset = line_table-used -
 PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker
 (stream);

 -  pph_reading_includes++;
 -
   for (first = true; next_lt_marker != PPH_LINETABLE_END;
next_lt_marker = pph_in_linetable_marker (stream))
 {
 @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
  else
lm-included_from += entries_offset;


 Also, if we do ignore some non-pph entries, the included_from
 calculation is going to need some trickier logic as well (it's fine
 for the pph includes though as each child calculates it's own offset)

 - gcc_assert (lm-included_from  (int) line_table-used);
 -

 Also, I think this slipped in my previous comment, but I don't see how
 this assert could trigger in the current code. If it did trigger
 something was definitely wrong as it asserts the offseted
 included_from is referring to an entry that is actually in the
 line_table...

  lm-start_location += pph_loc_offset;

I'm wondering if we shouldn't just whitelist the problematic cases
that we know about in the system/standard headers.  It seems that all
others we could reasonably complain to the maintainers of the code.

-- 
Lawrence Crowl


Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-20 Thread Gabriel Charette
I just thought about something..

Earlier I said that ALL line_table issues were resolved after this
patch (as it ignores the re-included headers that were guarded, as the
non-pph compiler does naturally).

One problem remains however, I'm pretty sure that re-included
non-pph'ed header's line_table entries are still showing up multiple
times (as the direct non-pph childs of a given pph_include have their
line_table entries copied one by one from the pph file).

I think we were talking about somehow remembering guards context in
which DECLs were declared and then ignoring DECLs streamed in if they
belong to a given header guard type that was previously seen in a
prior include using the same non-pph header, allowing us to ignore
those DECLs that are re-declared when they should have been guarded
out the second time.

I'm not sure whether there is machinery to handle non-pph re-includes
yet... but... at the very least, I'm pretty sure those non-pph entries
still show up multiple times in the line_table.

Now, we can't just remove/ignore those entries either... doing so
would alter the expected location offset (pph_loc_offset) applied to
all tokens streamed in directly from the pph header.

What we could potentially do is:
- ignore the repeated non-pph entry
- remember the number of locations this entry was supposed to take
(call that pph_loc_ignored_offset)
- then for DECLs imported after it we would then need an offset of
pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing
entries in the line_table

The problem here obviously is that I don't think we have a way of
knowing which DECLs come before, inside, and after a given non-pph
header included in the parent pph header which we are currently
reading.

Furthermore, a DECL coming after the non-pph header could potentially
refer to something inside the ignored non-pph header and the
source_location of the referred token would now be invalid (although
that might already be fixed by the cache hit which would redirect that
token reference to the same token in the first included copy of that
same header which wasn't actually skipped as it was first and which is
valid)


On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote:
 @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
   int entries_offset = line_table-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream);

 -  pph_reading_includes++;
 -
   for (first = true; next_lt_marker != PPH_LINETABLE_END;
        next_lt_marker = pph_in_linetable_marker (stream))
     {
 @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
          else
            lm-included_from += entries_offset;


Also, if we do ignore some non-pph entries, the included_from
calculation is going to need some trickier logic as well (it's fine
for the pph includes though as each child calculates it's own offset)

 - gcc_assert (lm-included_from  (int) line_table-used);
 -

Also, I think this slipped in my previous comment, but I don't see how
this assert could trigger in the current code. If it did trigger
something was definitely wrong as it asserts the offseted
included_from is referring to an entry that is actually in the
line_table...

  lm-start_location += pph_loc_offset;

Cheers,
Gab


 --
 This patch is available for review at http://codereview.appspot.com/5235061



Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-17 Thread Diego Novillo
On Fri, Oct 14, 2011 at 20:27, Gabriel Charette gcharet...@gmail.com wrote:
 Yes, I understand that.

 But when the second 2.pph is skipped when reading foo.pph, the reading
 of its line_table is also skipped (as foo.pph doesn't contain the
 line_table information for 2.h, 2.pph does and adds it when its
 included as a child, but if it's skipped, the line_table info for 2.h
 should never make it in the line_table), so I don't see why this is an
 issue for the line_table (other than the assert about the number of
 line table entries read). What I was suggesting is that as far as the
 assert is concerned it would be stronger to count the number of
 skipped child headers on read and assert num_read+num_skipped ==
 num_expected_childs basically (it is still only an assert so no big
 deal I guess).

Ah, I see what you are saying.  I didn't really bother too much with
that assert.  Since I was not reading the line table again, I figured
both asserts were triggering because of the different values coming
from the skipped file, so I left them out.


 Essentially this patch fixes the last bug we had in the line_table
 merging (i.e. that guarded out headers in the non-pph version weren't
 guarded out in the pph version) and this is a good thing. I'm just
 being picky about weakening asserts!


 I still think it would be nice to have a way to test constructs like
 the line_table at the end of parsing (maybe a new flag, as I was
 suggesting in my previous email, as gcc doesn't allow for modular
 testing) and compare pph and non-pph versions. Testing at this level
 would potentially be much better than trying to understand tricky test
 failures from the ground up. This is beyond the scope of this patch of
 course, but something to keep in mind I think.

Yeah.  I'll come back to it at a later point.


Diego.


Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-14 Thread Diego Novillo

On 11-10-13 17:55 , Gabriel Charette wrote:


I'm not sure exactly how you skip headers already parsed now (we
didn't used to when I wrote this code and that was the only problem
remaining in the line_table (i.e. duplicate entries for guarded
headers in the non-pph compile)), but couldn't you count the number of
skipped entries and assert (line_table-used - used_before) +
numSkipped == expected_in) ?


The problem is that the compilation process of foo.h - foo.pph may 
generate different line tables than a compile that includes foo.pph. 
For instance,


foo.h:
#include 1.pph
#include 2.pph
#include 3.pph

foo.cc:
#include 2.pph
#include foo.pph


When we compile foo.h, the line table incorporates the effects of 
including 2.pph, and that's what we save to foo.pph.  However, when 
compiling foo.cc, the first thing we do is include 2.pph, so when 
processing the include for foo.pph, we will completely skip over 2.pph.


That's why we cannot really have the same line table that we had when we 
generated foo.pph.



Diego.


Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-14 Thread Gabriel Charette
Yes, I understand that.

But when the second 2.pph is skipped when reading foo.pph, the reading
of its line_table is also skipped (as foo.pph doesn't contain the
line_table information for 2.h, 2.pph does and adds it when its
included as a child, but if it's skipped, the line_table info for 2.h
should never make it in the line_table), so I don't see why this is an
issue for the line_table (other than the assert about the number of
line table entries read). What I was suggesting is that as far as the
assert is concerned it would be stronger to count the number of
skipped child headers on read and assert num_read+num_skipped ==
num_expected_childs basically (it is still only an assert so no big
deal I guess).

Essentially this patch fixes the last bug we had in the line_table
merging (i.e. that guarded out headers in the non-pph version weren't
guarded out in the pph version) and this is a good thing. I'm just
being picky about weakening asserts!


I still think it would be nice to have a way to test constructs like
the line_table at the end of parsing (maybe a new flag, as I was
suggesting in my previous email, as gcc doesn't allow for modular
testing) and compare pph and non-pph versions. Testing at this level
would potentially be much better than trying to understand tricky test
failures from the ground up. This is beyond the scope of this patch of
course, but something to keep in mind I think.

Gab

On Fri, Oct 14, 2011 at 8:16 AM, Diego Novillo dnovi...@google.com wrote:

 On 11-10-13 17:55 , Gabriel Charette wrote:

 I'm not sure exactly how you skip headers already parsed now (we
 didn't used to when I wrote this code and that was the only problem
 remaining in the line_table (i.e. duplicate entries for guarded
 headers in the non-pph compile)), but couldn't you count the number of
 skipped entries and assert (line_table-used - used_before) +
 numSkipped == expected_in) ?

 The problem is that the compilation process of foo.h - foo.pph may generate 
 different line tables than a compile that includes foo.pph. For instance,

 foo.h:
 #include 1.pph
 #include 2.pph
 #include 3.pph

 foo.cc:
 #include 2.pph
 #include foo.pph


 When we compile foo.h, the line table incorporates the effects of including 
 2.pph, and that's what we save to foo.pph.  However, when compiling foo.cc, 
 the first thing we do is include 2.pph, so when processing the include for 
 foo.pph, we will completely skip over 2.pph.

 That's why we cannot really have the same line table that we had when we 
 generated foo.pph.


 Diego.


Re: [pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-13 Thread Gabriel Charette
Just looked at the line_table related sections, but see comments below:

On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote:

 Currently, the consistency check done on pre-processor symbols is
 triggering on symbols that are not really problematic (e.g., symbols
 used for double-include guards).

 The problem is that in the testsuite, we are refusing to process PPH
 images that fail that test, which means we don't get to test other
 issues.  To avoid this, I changed the error() call to warning().  Seemed
 innocent enough, but there were more problems behind that one:

 1- We do not really try to avoid reading PPH images more than once.
   This problem is different than the usual double-inclusion guard.
   For instance, suppose a file foo.pph includes 1.pph, 2.pph and
   3.pph.  When generating foo.pph, we read all 3 files just once and
   double-include guards do not need to trigger.  However, if we are
   later building a TU with:
        #include 2.pph
        #include foo.pph
   we first read 2.pph and when reading foo.pph, we try to read 2.pph
   again, because it is mentioned in foo.pph's line map table.

   I added a guard in pph_stream_open() so it doesn't try to open the
   same file more than once, but that meant adjusting some of the
   assertions while reading the line table.  We should not expect to
   find foo.pph's line map table exactly like the one we wrote.

That makes sense.

 @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
   int entries_offset = line_table-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
   enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream);

 -  pph_reading_includes++;
 -
   for (first = true; next_lt_marker != PPH_LINETABLE_END;
        next_lt_marker = pph_in_linetable_marker (stream))
     {
 @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
          else
            lm-included_from += entries_offset;

 -         gcc_assert (lm-included_from  (int) line_table-used);
 -

This should still hold, it is impossible that included_from points to
an entry that doesn't exist (i.e. beyond line_table-used), but since
we recalculate it on the previous line, adding entries_offset, this
was just a safe check to make sure everything read makes sense.

          lm-start_location += pph_loc_offset;

          line_table-used++;
        }
     }

 -  pph_reading_includes--;
 +  /* We used to expect exactly the same number of entries, but files
 +     included from this PPH file may sometimes not be needed.  For
 +     example,
 +
 +       #include 2.pph
 +       #include foo.pph
 +         +--  #include 1.pph
 +               #include 2.pph
 +               #include 3.pph
 +
 +     When foo.pph was originally created, the line table was built
 +     with inclusions of 1.pph, 2.pph and 3.pph.  But when compiling
 +     the main translation unit, we include 2.pph before foo.pph, so
 +     the inclusion of 2.pph from foo.pph does nothing.  Leaving the
 +     line table in a different shape than the original compilation.

 +     Instead of insisting on getting EXPECTED_IN entries, we expect at
 +     most EXPECTED_IN entries.  */
   {
     unsigned int expected_in = pph_in_uint (stream);
 -    gcc_assert (line_table-used - used_before == expected_in);
 +    gcc_assert (line_table-used - used_before = expected_in);

I'm not sure exactly how you skip headers already parsed now (we
didn't used to when I wrote this code and that was the only problem
remaining in the line_table (i.e. duplicate entries for guarded
headers in the non-pph compile)), but couldn't you count the number of
skipped entries and assert (line_table-used - used_before) +
numSkipped == expected_in) ?

I'd have to re-download the code, I've bee following through patches,
but I'm not so sure now exactly how the guarded headers skipping is
done, my memorized knowledge of the codebase has diverged I feel..!

A more important note: I think it could be worth having a new flag
that outputs the line_table when done parsing (as a mean to robustly
test it). My way to test it usually was to breakpoint on
varpool_assemble_decl (a random choice, but it was only called after
parsing was done...), both in pph and non-pph compiles and compare the
line_table in gdb However, to have a stable test in the long run,
it could be nice to have a flag that asks for an output of the
line_table and then we could checksum and compare the line_table
outputted by the pph and non-pph compiles.

A good test I had found to break in and analyze the line_table was
p4eabi.h as it pretty much had all the problems that I fixed regarding
the line_table (it also has re-includes if I remember correctly, but
that wasn't a problem before as we would not guard out re-includes as
I just mentioned above).

Having such a robust test would be important I feel as, as we saw with
previous bugs, discrepancies in the line_table can result in tricky
unique ID diffs.

If you are 

[pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-11 Thread Diego Novillo

Currently, the consistency check done on pre-processor symbols is
triggering on symbols that are not really problematic (e.g., symbols
used for double-include guards).

The problem is that in the testsuite, we are refusing to process PPH
images that fail that test, which means we don't get to test other
issues.  To avoid this, I changed the error() call to warning().  Seemed
innocent enough, but there were more problems behind that one:

1- We do not really try to avoid reading PPH images more than once.
   This problem is different than the usual double-inclusion guard.
   For instance, suppose a file foo.pph includes 1.pph, 2.pph and
   3.pph.  When generating foo.pph, we read all 3 files just once and
   double-include guards do not need to trigger.  However, if we are
   later building a TU with:
#include 2.pph
#include foo.pph
   we first read 2.pph and when reading foo.pph, we try to read 2.pph
   again, because it is mentioned in foo.pph's line map table.

   I added a guard in pph_stream_open() so it doesn't try to open the
   same file more than once, but that meant adjusting some of the
   assertions while reading the line table.  We should not expect to
   find foo.pph's line map table exactly like the one we wrote.
   
2- We cannot keep a global list of included files to consult external
   caches.  In the example above, if foo.pph needs to resolve an
   external reference to 2.pph, it needs to index into the 2nd slot in
   its include vector.  However, the file including foo.pph needs to
   index into the 1st slot in its include vector (since 2.pph is
   included first).  This meant moving the includes field inside
   struct pph_stream.

3- When reading a type, we should not try to access the method vector
   for it, unless the type is a class.

There's some more consequences of this, but the patch was starting to
become too big, so I'm submitting it now.  This fixes a couple of
files and changes the expected error on another two.  I'll be fixing
them separately.

Tested on x86_64.  Committed to branch.


Diego.


* pph-streamer-in.c (pph_reading_includes): Remove.  Update all users.
(pph_in_include): Call pph_add_include with the newly
materialized stream.
(pph_in_line_table_and_includes): Document differences between
non-PPH compiles and PPH compiles wrt line table behaviour.
Modify assertions accordingly.
(pph_read_tree_header): Tidy.
(report_validation_error): Change error() call to warning().
(pph_image_already_read): Remove.  Update all users.
(pph_read_file_1): If STREAM-IN_MEMORY_P is set, return.
Call pph_mark_strea_read.
(pph_add_read_image): Remove.
(pph_read_file): Change return type to pph_stream *.  Update all
users.
(pph_reader_finish): Remove.
* pph-streamer-out.c (pph_writer_init): Tidy.
(pph_add_include): Remove.
(pph_get_marker_for): Always consult the pre-loaded cache first.
(pph_writer_add_include): New.
* pph-streamer.c (pph_read_images): Make static.
(pph_init_preloaded_cache): Make static.
(pph_streamer_init): New.
(pph_streamer_finish): New.
(pph_find_stream_for): New.
(pph_mark_stream_read): New.
(pph_stream_open): Call pph_find_stream_for.  If the stream
already exists, return it.
(pph_add_include): Move from pph-streamer-in.c.  Add new
argument STREAM.
(pph_cache_lookup_in_includes): Add new argument STREAM.
Update all users.
* pph-streamer.h (pph_read_images): Remove extern declaration.
Move field INCLUDES out of union W.  Update all users.
Add field IN_MEMORY_P.
(pph_streamer_init): Declare.
(pph_streamer_finish): Declare.
(pph_mark_stream_read): Declare.
(pph_add_include): Declare.
(pph_writer_add_include): Declare.
* pph.c (pph_include_handler): Call pph_writer_add_include.
(pph_init): Call pph_streamer_init.
(pph_finish): Call pph_streamer_finish.

testsuite/ChangeLog.pph

* g++.dg/pph/d1symnotinc.cc: Change expected error.
* g++.dg/pph/x7dynarray6.cc: Likewise.
* g++.dg/pph/x7dynarray7.cc: Likewise.
* g++.dg/pph/x5dynarray7.h: Mark fixed.
* g++.dg/pph/x6dynarray6.h: Mark fixed.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index fbd78d0..ffa1433 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -47,12 +47,6 @@ DEF_VEC_ALLOC_P(char_p,heap);
   memory will remain allocated until the end of compilation.  */
 static VEC(char_p,heap) *string_tables = NULL;
 
-/* Increment when we are in the process of reading includes as we do not want
-   to add those to the parent pph stream's list of includes to be written out.
-   Decrement when done. We cannot use a simple true/false flag as read includes
-   will call pph_in_includes as well.  */