Re: [pph] Make libcpp symbol validation a warning (issue5235061)
On 10/20/11, Gabriel Charette gcharet...@gmail.com wrote: I just thought about something.. Earlier I said that ALL line_table issues were resolved after this patch (as it ignores the re-included headers that were guarded, as the non-pph compiler does naturally). One problem remains however, I'm pretty sure that re-included non-pph'ed header's line_table entries are still showing up multiple times (as the direct non-pph childs of a given pph_include have their line_table entries copied one by one from the pph file). I think we were talking about somehow remembering guards context in which DECLs were declared and then ignoring DECLs streamed in if they belong to a given header guard type that was previously seen in a prior include using the same non-pph header, allowing us to ignore those DECLs that are re-declared when they should have been guarded out the second time. I'm not sure whether there is machinery to handle non-pph re-includes yet... but... at the very least, I'm pretty sure those non-pph entries still show up multiple times in the line_table. Now, we can't just remove/ignore those entries either... doing so would alter the expected location offset (pph_loc_offset) applied to all tokens streamed in directly from the pph header. What we could potentially do is: - ignore the repeated non-pph entry - remember the number of locations this entry was supposed to take (call that pph_loc_ignored_offset) - then for DECLs imported after it we would then need an offset of pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing entries in the line_table The problem here obviously is that I don't think we have a way of knowing which DECLs come before, inside, and after a given non-pph header included in the parent pph header which we are currently reading. Furthermore, a DECL coming after the non-pph header could potentially refer to something inside the ignored non-pph header and the source_location of the referred token would now be invalid (although that might already be fixed by the cache hit which would redirect that token reference to the same token in the first included copy of that same header which wasn't actually skipped as it was first and which is valid) On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote: @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream) int entries_offset = line_table-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES; enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream); - pph_reading_includes++; - for (first = true; next_lt_marker != PPH_LINETABLE_END; next_lt_marker = pph_in_linetable_marker (stream)) { @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream) else lm-included_from += entries_offset; Also, if we do ignore some non-pph entries, the included_from calculation is going to need some trickier logic as well (it's fine for the pph includes though as each child calculates it's own offset) - gcc_assert (lm-included_from (int) line_table-used); - Also, I think this slipped in my previous comment, but I don't see how this assert could trigger in the current code. If it did trigger something was definitely wrong as it asserts the offseted included_from is referring to an entry that is actually in the line_table... lm-start_location += pph_loc_offset; I'm wondering if we shouldn't just whitelist the problematic cases that we know about in the system/standard headers. It seems that all others we could reasonably complain to the maintainers of the code. -- Lawrence Crowl
Re: [pph] Make libcpp symbol validation a warning (issue5235061)
I just thought about something.. Earlier I said that ALL line_table issues were resolved after this patch (as it ignores the re-included headers that were guarded, as the non-pph compiler does naturally). One problem remains however, I'm pretty sure that re-included non-pph'ed header's line_table entries are still showing up multiple times (as the direct non-pph childs of a given pph_include have their line_table entries copied one by one from the pph file). I think we were talking about somehow remembering guards context in which DECLs were declared and then ignoring DECLs streamed in if they belong to a given header guard type that was previously seen in a prior include using the same non-pph header, allowing us to ignore those DECLs that are re-declared when they should have been guarded out the second time. I'm not sure whether there is machinery to handle non-pph re-includes yet... but... at the very least, I'm pretty sure those non-pph entries still show up multiple times in the line_table. Now, we can't just remove/ignore those entries either... doing so would alter the expected location offset (pph_loc_offset) applied to all tokens streamed in directly from the pph header. What we could potentially do is: - ignore the repeated non-pph entry - remember the number of locations this entry was supposed to take (call that pph_loc_ignored_offset) - then for DECLs imported after it we would then need an offset of pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing entries in the line_table The problem here obviously is that I don't think we have a way of knowing which DECLs come before, inside, and after a given non-pph header included in the parent pph header which we are currently reading. Furthermore, a DECL coming after the non-pph header could potentially refer to something inside the ignored non-pph header and the source_location of the referred token would now be invalid (although that might already be fixed by the cache hit which would redirect that token reference to the same token in the first included copy of that same header which wasn't actually skipped as it was first and which is valid) On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote: @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream) int entries_offset = line_table-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES; enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream); - pph_reading_includes++; - for (first = true; next_lt_marker != PPH_LINETABLE_END; next_lt_marker = pph_in_linetable_marker (stream)) { @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream) else lm-included_from += entries_offset; Also, if we do ignore some non-pph entries, the included_from calculation is going to need some trickier logic as well (it's fine for the pph includes though as each child calculates it's own offset) - gcc_assert (lm-included_from (int) line_table-used); - Also, I think this slipped in my previous comment, but I don't see how this assert could trigger in the current code. If it did trigger something was definitely wrong as it asserts the offseted included_from is referring to an entry that is actually in the line_table... lm-start_location += pph_loc_offset; Cheers, Gab -- This patch is available for review at http://codereview.appspot.com/5235061
Re: [pph] Make libcpp symbol validation a warning (issue5235061)
On Fri, Oct 14, 2011 at 20:27, Gabriel Charette gcharet...@gmail.com wrote: Yes, I understand that. But when the second 2.pph is skipped when reading foo.pph, the reading of its line_table is also skipped (as foo.pph doesn't contain the line_table information for 2.h, 2.pph does and adds it when its included as a child, but if it's skipped, the line_table info for 2.h should never make it in the line_table), so I don't see why this is an issue for the line_table (other than the assert about the number of line table entries read). What I was suggesting is that as far as the assert is concerned it would be stronger to count the number of skipped child headers on read and assert num_read+num_skipped == num_expected_childs basically (it is still only an assert so no big deal I guess). Ah, I see what you are saying. I didn't really bother too much with that assert. Since I was not reading the line table again, I figured both asserts were triggering because of the different values coming from the skipped file, so I left them out. Essentially this patch fixes the last bug we had in the line_table merging (i.e. that guarded out headers in the non-pph version weren't guarded out in the pph version) and this is a good thing. I'm just being picky about weakening asserts! I still think it would be nice to have a way to test constructs like the line_table at the end of parsing (maybe a new flag, as I was suggesting in my previous email, as gcc doesn't allow for modular testing) and compare pph and non-pph versions. Testing at this level would potentially be much better than trying to understand tricky test failures from the ground up. This is beyond the scope of this patch of course, but something to keep in mind I think. Yeah. I'll come back to it at a later point. Diego.
Re: [pph] Make libcpp symbol validation a warning (issue5235061)
On 11-10-13 17:55 , Gabriel Charette wrote: I'm not sure exactly how you skip headers already parsed now (we didn't used to when I wrote this code and that was the only problem remaining in the line_table (i.e. duplicate entries for guarded headers in the non-pph compile)), but couldn't you count the number of skipped entries and assert (line_table-used - used_before) + numSkipped == expected_in) ? The problem is that the compilation process of foo.h - foo.pph may generate different line tables than a compile that includes foo.pph. For instance, foo.h: #include 1.pph #include 2.pph #include 3.pph foo.cc: #include 2.pph #include foo.pph When we compile foo.h, the line table incorporates the effects of including 2.pph, and that's what we save to foo.pph. However, when compiling foo.cc, the first thing we do is include 2.pph, so when processing the include for foo.pph, we will completely skip over 2.pph. That's why we cannot really have the same line table that we had when we generated foo.pph. Diego.
Re: [pph] Make libcpp symbol validation a warning (issue5235061)
Yes, I understand that. But when the second 2.pph is skipped when reading foo.pph, the reading of its line_table is also skipped (as foo.pph doesn't contain the line_table information for 2.h, 2.pph does and adds it when its included as a child, but if it's skipped, the line_table info for 2.h should never make it in the line_table), so I don't see why this is an issue for the line_table (other than the assert about the number of line table entries read). What I was suggesting is that as far as the assert is concerned it would be stronger to count the number of skipped child headers on read and assert num_read+num_skipped == num_expected_childs basically (it is still only an assert so no big deal I guess). Essentially this patch fixes the last bug we had in the line_table merging (i.e. that guarded out headers in the non-pph version weren't guarded out in the pph version) and this is a good thing. I'm just being picky about weakening asserts! I still think it would be nice to have a way to test constructs like the line_table at the end of parsing (maybe a new flag, as I was suggesting in my previous email, as gcc doesn't allow for modular testing) and compare pph and non-pph versions. Testing at this level would potentially be much better than trying to understand tricky test failures from the ground up. This is beyond the scope of this patch of course, but something to keep in mind I think. Gab On Fri, Oct 14, 2011 at 8:16 AM, Diego Novillo dnovi...@google.com wrote: On 11-10-13 17:55 , Gabriel Charette wrote: I'm not sure exactly how you skip headers already parsed now (we didn't used to when I wrote this code and that was the only problem remaining in the line_table (i.e. duplicate entries for guarded headers in the non-pph compile)), but couldn't you count the number of skipped entries and assert (line_table-used - used_before) + numSkipped == expected_in) ? The problem is that the compilation process of foo.h - foo.pph may generate different line tables than a compile that includes foo.pph. For instance, foo.h: #include 1.pph #include 2.pph #include 3.pph foo.cc: #include 2.pph #include foo.pph When we compile foo.h, the line table incorporates the effects of including 2.pph, and that's what we save to foo.pph. However, when compiling foo.cc, the first thing we do is include 2.pph, so when processing the include for foo.pph, we will completely skip over 2.pph. That's why we cannot really have the same line table that we had when we generated foo.pph. Diego.
Re: [pph] Make libcpp symbol validation a warning (issue5235061)
Just looked at the line_table related sections, but see comments below: On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote: Currently, the consistency check done on pre-processor symbols is triggering on symbols that are not really problematic (e.g., symbols used for double-include guards). The problem is that in the testsuite, we are refusing to process PPH images that fail that test, which means we don't get to test other issues. To avoid this, I changed the error() call to warning(). Seemed innocent enough, but there were more problems behind that one: 1- We do not really try to avoid reading PPH images more than once. This problem is different than the usual double-inclusion guard. For instance, suppose a file foo.pph includes 1.pph, 2.pph and 3.pph. When generating foo.pph, we read all 3 files just once and double-include guards do not need to trigger. However, if we are later building a TU with: #include 2.pph #include foo.pph we first read 2.pph and when reading foo.pph, we try to read 2.pph again, because it is mentioned in foo.pph's line map table. I added a guard in pph_stream_open() so it doesn't try to open the same file more than once, but that meant adjusting some of the assertions while reading the line table. We should not expect to find foo.pph's line map table exactly like the one we wrote. That makes sense. @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream) int entries_offset = line_table-used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES; enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream); - pph_reading_includes++; - for (first = true; next_lt_marker != PPH_LINETABLE_END; next_lt_marker = pph_in_linetable_marker (stream)) { @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream) else lm-included_from += entries_offset; - gcc_assert (lm-included_from (int) line_table-used); - This should still hold, it is impossible that included_from points to an entry that doesn't exist (i.e. beyond line_table-used), but since we recalculate it on the previous line, adding entries_offset, this was just a safe check to make sure everything read makes sense. lm-start_location += pph_loc_offset; line_table-used++; } } - pph_reading_includes--; + /* We used to expect exactly the same number of entries, but files + included from this PPH file may sometimes not be needed. For + example, + + #include 2.pph + #include foo.pph + +-- #include 1.pph + #include 2.pph + #include 3.pph + + When foo.pph was originally created, the line table was built + with inclusions of 1.pph, 2.pph and 3.pph. But when compiling + the main translation unit, we include 2.pph before foo.pph, so + the inclusion of 2.pph from foo.pph does nothing. Leaving the + line table in a different shape than the original compilation. + Instead of insisting on getting EXPECTED_IN entries, we expect at + most EXPECTED_IN entries. */ { unsigned int expected_in = pph_in_uint (stream); - gcc_assert (line_table-used - used_before == expected_in); + gcc_assert (line_table-used - used_before = expected_in); I'm not sure exactly how you skip headers already parsed now (we didn't used to when I wrote this code and that was the only problem remaining in the line_table (i.e. duplicate entries for guarded headers in the non-pph compile)), but couldn't you count the number of skipped entries and assert (line_table-used - used_before) + numSkipped == expected_in) ? I'd have to re-download the code, I've bee following through patches, but I'm not so sure now exactly how the guarded headers skipping is done, my memorized knowledge of the codebase has diverged I feel..! A more important note: I think it could be worth having a new flag that outputs the line_table when done parsing (as a mean to robustly test it). My way to test it usually was to breakpoint on varpool_assemble_decl (a random choice, but it was only called after parsing was done...), both in pph and non-pph compiles and compare the line_table in gdb However, to have a stable test in the long run, it could be nice to have a flag that asks for an output of the line_table and then we could checksum and compare the line_table outputted by the pph and non-pph compiles. A good test I had found to break in and analyze the line_table was p4eabi.h as it pretty much had all the problems that I fixed regarding the line_table (it also has re-includes if I remember correctly, but that wasn't a problem before as we would not guard out re-includes as I just mentioned above). Having such a robust test would be important I feel as, as we saw with previous bugs, discrepancies in the line_table can result in tricky unique ID diffs. If you are
[pph] Make libcpp symbol validation a warning (issue5235061)
Currently, the consistency check done on pre-processor symbols is triggering on symbols that are not really problematic (e.g., symbols used for double-include guards). The problem is that in the testsuite, we are refusing to process PPH images that fail that test, which means we don't get to test other issues. To avoid this, I changed the error() call to warning(). Seemed innocent enough, but there were more problems behind that one: 1- We do not really try to avoid reading PPH images more than once. This problem is different than the usual double-inclusion guard. For instance, suppose a file foo.pph includes 1.pph, 2.pph and 3.pph. When generating foo.pph, we read all 3 files just once and double-include guards do not need to trigger. However, if we are later building a TU with: #include 2.pph #include foo.pph we first read 2.pph and when reading foo.pph, we try to read 2.pph again, because it is mentioned in foo.pph's line map table. I added a guard in pph_stream_open() so it doesn't try to open the same file more than once, but that meant adjusting some of the assertions while reading the line table. We should not expect to find foo.pph's line map table exactly like the one we wrote. 2- We cannot keep a global list of included files to consult external caches. In the example above, if foo.pph needs to resolve an external reference to 2.pph, it needs to index into the 2nd slot in its include vector. However, the file including foo.pph needs to index into the 1st slot in its include vector (since 2.pph is included first). This meant moving the includes field inside struct pph_stream. 3- When reading a type, we should not try to access the method vector for it, unless the type is a class. There's some more consequences of this, but the patch was starting to become too big, so I'm submitting it now. This fixes a couple of files and changes the expected error on another two. I'll be fixing them separately. Tested on x86_64. Committed to branch. Diego. * pph-streamer-in.c (pph_reading_includes): Remove. Update all users. (pph_in_include): Call pph_add_include with the newly materialized stream. (pph_in_line_table_and_includes): Document differences between non-PPH compiles and PPH compiles wrt line table behaviour. Modify assertions accordingly. (pph_read_tree_header): Tidy. (report_validation_error): Change error() call to warning(). (pph_image_already_read): Remove. Update all users. (pph_read_file_1): If STREAM-IN_MEMORY_P is set, return. Call pph_mark_strea_read. (pph_add_read_image): Remove. (pph_read_file): Change return type to pph_stream *. Update all users. (pph_reader_finish): Remove. * pph-streamer-out.c (pph_writer_init): Tidy. (pph_add_include): Remove. (pph_get_marker_for): Always consult the pre-loaded cache first. (pph_writer_add_include): New. * pph-streamer.c (pph_read_images): Make static. (pph_init_preloaded_cache): Make static. (pph_streamer_init): New. (pph_streamer_finish): New. (pph_find_stream_for): New. (pph_mark_stream_read): New. (pph_stream_open): Call pph_find_stream_for. If the stream already exists, return it. (pph_add_include): Move from pph-streamer-in.c. Add new argument STREAM. (pph_cache_lookup_in_includes): Add new argument STREAM. Update all users. * pph-streamer.h (pph_read_images): Remove extern declaration. Move field INCLUDES out of union W. Update all users. Add field IN_MEMORY_P. (pph_streamer_init): Declare. (pph_streamer_finish): Declare. (pph_mark_stream_read): Declare. (pph_add_include): Declare. (pph_writer_add_include): Declare. * pph.c (pph_include_handler): Call pph_writer_add_include. (pph_init): Call pph_streamer_init. (pph_finish): Call pph_streamer_finish. testsuite/ChangeLog.pph * g++.dg/pph/d1symnotinc.cc: Change expected error. * g++.dg/pph/x7dynarray6.cc: Likewise. * g++.dg/pph/x7dynarray7.cc: Likewise. * g++.dg/pph/x5dynarray7.h: Mark fixed. * g++.dg/pph/x6dynarray6.h: Mark fixed. diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index fbd78d0..ffa1433 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -47,12 +47,6 @@ DEF_VEC_ALLOC_P(char_p,heap); memory will remain allocated until the end of compilation. */ static VEC(char_p,heap) *string_tables = NULL; -/* Increment when we are in the process of reading includes as we do not want - to add those to the parent pph stream's list of includes to be written out. - Decrement when done. We cannot use a simple true/false flag as read includes - will call pph_in_includes as well. */