[PATCH] libcpp: Fix ICE on #include after a line marker directive [PR61474]

2023-09-15 Thread Lewis Hyatt via Gcc-patches
Hello-

This fixes an old PR. Bootstrapped and regtested on x86-64 Linux. Please let
me know if it's OK. Thanks!

-Lewis

-- >8 --

As noted in the PR, GCC will segfault if a file name is first seen in a
linemarker directive, and then later seen in a normal #include.  This is
because the fake include process adds the file to the cache with a null PATH
member. The normal #include finds this file in the cache and then attempts
to use the null PATH.  Resolve by adding the file to the cache with a unique
starting directory, so that the fake entry will only be found by a
subsequent fake include, not by a real one.

libcpp/ChangeLog:

PR preprocessor/61474
* files.cc (_cpp_find_file): Set DONT_READ to TRUE for fake
include files.
(_cpp_fake_include): Pass a unique cpp_dir* address so
the fake file will not be found when looked up for real.

gcc/testsuite/ChangeLog:

PR preprocessor/61474
* c-c++-common/cpp/pr61474-2.h: New test.
* c-c++-common/cpp/pr61474.c: New test.
* c-c++-common/cpp/pr61474.h: New test.
---
 libcpp/files.cc                            | 11 +++++++++--
 gcc/testsuite/c-c++-common/cpp/pr61474-2.h |  1 +
 gcc/testsuite/c-c++-common/cpp/pr61474.c   |  5 +++++
 gcc/testsuite/c-c++-common/cpp/pr61474.h   |  6 ++++++
 4 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr61474-2.h
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr61474.c
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr61474.h

diff --git a/libcpp/files.cc b/libcpp/files.cc
index 43a8894b7de..27301d79fa4 100644
--- a/libcpp/files.cc
+++ b/libcpp/files.cc
@@ -541,7 +541,9 @@ _cpp_find_file (cpp_reader *pfile, const char *fname, cpp_dir *start_dir,
 = (kind == _cpp_FFK_PRE_INCLUDE
|| (pfile->buffer && pfile->buffer->file->implicit_preinclude));
 
-  if (kind != _cpp_FFK_FAKE)
+  if (kind == _cpp_FFK_FAKE)
+file->dont_read = true;
+  else
 /* Try each path in the include chain.  */
 for (;;)
   {
@@ -1490,7 +1492,12 @@ cpp_clear_file_cache (cpp_reader *pfile)
 void
 _cpp_fake_include (cpp_reader *pfile, const char *fname)
 {
-  _cpp_find_file (pfile, fname, pfile->buffer->file->dir, 0, _cpp_FFK_FAKE, 0);
+  /* It does not matter what are the contents of fake_source_dir, it will never
+ be inspected; we just use its address to uniquely signify that this file
+ was added as a fake include, so a later call to _cpp_find_file (to include
+ the file for real) won't find the fake one in the hash table.  */
+  static cpp_dir fake_source_dir;
+  _cpp_find_file (pfile, fname, &fake_source_dir, 0, _cpp_FFK_FAKE, 0);
 }
 
 /* Not everyone who wants to set system-header-ness on a buffer can
diff --git a/gcc/testsuite/c-c++-common/cpp/pr61474-2.h b/gcc/testsuite/c-c++-common/cpp/pr61474-2.h
new file mode 100644
index 00000000000..6f70f09beec
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr61474-2.h
@@ -0,0 +1 @@
+#pragma once
diff --git a/gcc/testsuite/c-c++-common/cpp/pr61474.c b/gcc/testsuite/c-c++-common/cpp/pr61474.c
new file mode 100644
index 00000000000..f835a40fc7a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr61474.c
@@ -0,0 +1,5 @@
+/* { dg-do preprocess } */
+#include "pr61474.h"
+/* Make sure that the file can be included for real, after it was
+   fake-included by the linemarker directives in pr61474.h.  */
+#include "pr61474-2.h"
diff --git a/gcc/testsuite/c-c++-common/cpp/pr61474.h b/gcc/testsuite/c-c++-common/cpp/pr61474.h
new file mode 100644
index 00000000000..d9e8c3a1fec
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr61474.h
@@ -0,0 +1,6 @@
+/* Create a fake include for pr61474-2.h and exercise looking it up.  */
+/* Use #pragma once to check also that the fake-include entry in the file
+   cache does not cause a problem in libcpp/files.cc:has_unique_contents().  */
+#pragma once
+# 1 "pr61474-2.h" 1
+# 2 "pr61474-2.h" 1
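
As an illustration of the approach in the commit message, here is a minimal
C++ sketch, not libcpp's actual code. It assumes a cache keyed on both the
file name and the starting directory; Dir, Entry and FileCache are
hypothetical stand-ins invented for this example, and an empty path string
models the null PATH member. The only idea taken from the patch is that the
address of a function-local static object serves as a unique key for fake
entries.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for a search directory and a cached file entry.
struct Dir { std::string name; };
struct Entry {
  std::string path;  // empty for fake entries (models the null PATH)
  bool dont_read;    // true for fake entries: the file is never opened
};

class FileCache {
public:
  // Look up NAME as searched from START_DIR, creating the entry if needed.
  Entry &find(const std::string &name, const Dir *start_dir, bool fake) {
    assert(start_dir != nullptr);
    std::map<const Dir *, Entry> &by_dir = m_entries[name];
    auto it = by_dir.find(start_dir);
    if (it != by_dir.end())
      return it->second;
    Entry e;
    e.dont_read = fake;
    if (!fake)
      e.path = start_dir->name + "/" + name;
    return by_dir.emplace(start_dir, e).first->second;
  }

  // Record a file that was only named by a line marker.
  Entry &fake_include(const std::string &name) {
    // Only the address matters; the contents are never inspected.
    static Dir fake_source_dir;
    return find(name, &fake_source_dir, true);
  }

private:
  // Entries are keyed by file name first, then by starting directory.
  std::map<std::string, std::map<const Dir *, Entry>> m_entries;
};
```

With this layout, a real lookup of "a.h" from an ordinary directory creates
its own entry and never sees the path-less fake one, while a second fake
include of "a.h" finds the entry made by the first.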


Re: [PATCH] preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

2023-09-12 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 8, 2023 at 5:53 PM Jason Merrill  wrote:
>
> On 7/31/23 22:22, Lewis Hyatt via Gcc-patches wrote:
> > `#pragma GCC target' is not currently handled in preprocess-only mode (e.g.,
> > when running gcc -E or gcc -save-temps). As noted in the PR, this means that
> > if the target pragma defines any macros, those macros are not effective in
> > preprocess-only mode. Similarly, such macros are not effective when
> > compiling with C++ (even when compiling without -save-temps), because C++
> > does not process the pragma until after all tokens have been obtained from
> > libcpp, at which point it is too late for macro expansion to take place.
> >
> > Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas
> > under these conditions as well, so resolve the PR by using the new "early
> > pragma" support.
> >
> > toplev.cc required some changes because the target-specific handlers for
> > `#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting
> > that function to be called in preprocess-only mode.
> >
> > I added some additional testcases from the PR for x86. The other targets
> > that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
> > already had tests verifying that the pragma sets macros as expected; here I
> > have added -save-temps to some of them, to test that it now works in
> > preprocess-only mode as well.
> >
> > gcc/c-family/ChangeLog:
> >
> >   PR preprocessor/87299
> >   * c-pragma.cc (init_pragma): Register `#pragma GCC target' and
> >   related pragmas in preprocess-only mode, and enable early handling.
> >   (c_reset_target_pragmas): New function refactoring code from...
> >   (handle_pragma_reset_options): ...here.
> >   * c-pragma.h (c_reset_target_pragmas): Declare.
> >
> > gcc/cp/ChangeLog:
> >
> >   PR preprocessor/87299
> >   * parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas ()
> >   after preprocessing is complete, before starting compilation.
> >
> > gcc/ChangeLog:
> >
> >   PR preprocessor/87299
> >   * toplev.cc (no_backend): New static global.
> >   (finalize): Remove argument no_backend, which is now a
> >   static global.
> >   (process_options): Likewise.
> >   (do_compile): Likewise.
> >   (target_reinit): Don't do anything in preprocess-only mode.
> >   (toplev::main): Adapt to no_backend change.
> >   (toplev::finalize): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR preprocessor/87299
> >   * c-c++-common/pragma-target-1.c: New test.
> >   * c-c++-common/pragma-target-2.c: New test.
> >   * g++.target/i386/pr87299-1.C: New test.
> >   * g++.target/i386/pr87299-2.C: New test.
> >   * gcc.target/i386/pr87299-1.c: New test.
> >   * gcc.target/i386/pr87299-2.c: New test.
> >   * gcc.target/s390/target-attribute/tattr-2.c: Add -save-temps to the
> >   options, to test preprocess-only mode as well.
> >   * gcc.target/aarch64/pragma_cpp_predefs_1.c: Likewise.
> >   * gcc.target/arm/pragma_arch_attribute.c: Likewise.
> >   * gcc.target/nios2/custom-fp-2.c: Likewise.
> >   * gcc.target/powerpc/float128-3.c: Likewise.
> > ---
> >
> > Notes:
> >  Hello-
> >
> >  This patch fixes the PR by enabling early pragma handling for `#pragma GCC
> >  target' and related pragmas such as `#pragma GCC push_options'. I did not
> >  need to touch any target-specific code, however I did need to make a change
> >  to toplev.cc, affecting all targets, to make it safe to call target_reinit()
> >  in preprocess-only mode. (Otherwise, it would be necessary to modify the
> >  implementation of target pragmas in every target, to avoid this code path.)
> >  That was the only complication I ran into.
> >
> >  Regarding testing, I did: (thanks to GCC compile farm for the non-x86
> >  targets)
> >
> >  bootstrap + regtest all languages - x86_64-pc-linux-gnu
> >  bootstrap + regtest c/c++ - powerpc64le-unknown-linux-gnu,
> >  aarch64-unknown-linux-gnu
> >
> >  The following backends also implement this pragma so ought to be tested:
> >  arm
> >  nios2
> >  s390
> >
> >  I am not able to test those directly. I did add coverage to their testsuites
> &

Ping: [PATCH] testsuite: Add test for already-fixed issue with _Pragma expansion [PR90400]

2023-09-08 Thread Lewis Hyatt via Gcc-patches
Hello-

May I please ping this one? It's adding a testcase prior to closing
the PR. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628488.html

-Lewis

On Fri, Aug 25, 2023 at 4:46 PM Lewis Hyatt  wrote:
>
> Hello-
>
> This is adding a testcase for a PR that was already incidentally fixed. OK
> to commit please? Thanks...
>
> -Lewis
>
> -- >8 --
>
> The PR was fixed by r12-5454. Since the fix was somewhat incidental,
> although related, add a testcase from PR90400 too before closing it out.
>
> gcc/testsuite/ChangeLog:
>
> PR preprocessor/90400
> * c-c++-common/cpp/pr90400.c: New test.
> ---
>  gcc/testsuite/c-c++-common/cpp/pr90400.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/cpp/pr90400.c
>
> diff --git a/gcc/testsuite/c-c++-common/cpp/pr90400.c b/gcc/testsuite/c-c++-common/cpp/pr90400.c
> new file mode 100644
> index 00000000000..4f2cab8d6ab
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/cpp/pr90400.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-save-temps" } */
> +/* PR preprocessor/90400 */
> +
> +#define OUTER(x) x
> +#define FOR(x) _Pragma ("GCC unroll 0") for (x)
> +void f ()
> +{
> +/* If the pragma were to be seen prior to the expansion of FOR, as was
> +   the case before r12-5454, then the unroll pragma would complain
> +   because the immediately following statement would be ";" rather than
> +   a loop.  */
> +OUTER (; FOR (int i = 0; i != 1; ++i);) /* { dg-bogus {statement expected before ';' token} } */
> +}


[PATCH] testsuite: Add test for already-fixed issue with _Pragma expansion [PR90400]

2023-08-25 Thread Lewis Hyatt via Gcc-patches
Hello-

This is adding a testcase for a PR that was already incidentally fixed. OK
to commit please? Thanks...

-Lewis

-- >8 --

The PR was fixed by r12-5454. Since the fix was somewhat incidental,
although related, add a testcase from PR90400 too before closing it out.

gcc/testsuite/ChangeLog:

PR preprocessor/90400
* c-c++-common/cpp/pr90400.c: New test.
---
 gcc/testsuite/c-c++-common/cpp/pr90400.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr90400.c

diff --git a/gcc/testsuite/c-c++-common/cpp/pr90400.c b/gcc/testsuite/c-c++-common/cpp/pr90400.c
new file mode 100644
index 00000000000..4f2cab8d6ab
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr90400.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-save-temps" } */
+/* PR preprocessor/90400 */
+
+#define OUTER(x) x
+#define FOR(x) _Pragma ("GCC unroll 0") for (x)
+void f ()
+{
+/* If the pragma were to be seen prior to the expansion of FOR, as was
+   the case before r12-5454, then the unroll pragma would complain
+   because the immediately following statement would be ";" rather than
+   a loop.  */
+OUTER (; FOR (int i = 0; i != 1; ++i);) /* { dg-bogus {statement expected before ';' token} } */
+}


Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-23 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 03:39:40PM -0400, David Malcolm wrote:
> On Tue, 2023-08-15 at 13:58 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > Class file_cache_slot in input.cc is used to query specific lines
> > > > of source
> > > > code from a file when needed by diagnostics infrastructure. This
> > > > will be
> > > > extended in a subsequent patch to support obtaining the source
> > > > code from
> > > > in-memory generated buffers rather than from a file. The present
> > > > patch
> > > > refactors class file_cache_slot, putting most of the logic into a
> > > > new base
> > > > class cache_data_source, in preparation for reusing that code in
> > > > the next
> > > > patch. There is no change in functionality yet.
> > > > 
> 
> [...snip...]
> 
> > > 
> > > I confess I had to reread both this and patch 4/8 to make sense of
> > > this; this is probably one of those cases where it's harder to read
> > > in
> > > patch form than as source, but I think I now understand the new
> > > implementation.
> > 
> > Yes, sorry about that. I hope at least splitting into two patches
> > here made it
> > a little easier.
> > 
> > > 
> > > > Did you try testing this with valgrind (e.g. "make selftest-valgrind")?
> > > 
> > 
> > Oh interesting, was not aware of this. I think it shows that new
> > leaks were
> > not introduced with the patch series.
> > 
> 
> [...snip...]
> 
> > 
> > 
> > > > I don't think we have any selftest coverage for "\r" in the line-break
> > > > handling; that would be good to add.
> > > 
> > > This patch is OK for trunk once the rest of the kit is approved.
> > 
> > Thank you. To be clear, were you suggesting to add selftest coverage
> > for \r
> > endings now, or in a follow up?
> 
> The former, please, so that we can be sure that the patch doesn't
> introduce any buffer overreads etc.
> 
> Thanks
> Dave
>

The following (incremental to patch 5/8 or after) adds selftest coverage for
alternate line endings. I hope things aren't too unclear this way; I can
resend updated versions of some or all of the patches from scratch, if useful.

AFAIK this is the current status of things:

Patch 1/8: Reviewed; the updated version incorporating feedback has not been
acked yet: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627250.html

Patch 2/8: OKed, pending tweak to reject fixit hints in generated data, which
was sent incrementally here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627405.html

Patch 3/8: OKed, pending new selftest attached to this email.

Patch 4/8: OKed, pending tweak to assert on non-NULL buffers which was sent
incrementally here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628283.html

Patch 5/8: OKed

Patch 6/8: OKed

Patch 7/8: Not reviewed yet

Patch 8/8: Awaiting additional feedback from you; perhaps SARIF need not worry
about this for now and should just ignore generated data locations.

Thanks again for taking the time to go through this, I hope it will prove
worth it.

-Lewis

-- >8 --

gcc/ChangeLog:

* input.cc (test_reading_source_line): Test additional cases,
including generated data and alternate line endings.
(input_cc_tests): Adapt to test_reading_source_line() changes.

diff --git a/gcc/input.cc b/gcc/input.cc
index 4c99df7a205..72274732c6c 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2392,30 +2392,51 @@ test_make_location_nonpure_range_endpoints (const line_table_case &case_)
 /* Verify reading of input files (e.g. for caret-based diagnostics).  */
 
 static void
-test_reading_source_line ()
+test_reading_source_line (bool generated, const char *e1, const char *e2)
 {
   /* Create a tempfile and write some text to it.  */
+  const char *line1 = "01234567890123456789";
+  const char *line2 = "This is the test text";
+  const char *line3 = "This is the 3rd line";
+  char content[72];
+  const int content_len = snprintf (content, sizeof (content),
+   "%s%s%s%s%s",
+   line1, e1, line2, e2, line3);
+  ASSERT_LT (content_len, (int)sizeof (content));
   temp_source_file tmp (SELFTEST_LOCATION, ".txt",
-   "01234567890123456789\n"
-   "This is the test text\n"
-   "This is the 3rd line");
+   content, content_len, generated);
 
-  /* Read back a specific line from the tempfile.  */
-  char_span source_line = location_get_source_line (tmp.get_filename (), 3);
+  /* Read back some specific lines from the tempfile, not all in order.  */
+  const source_id src = generated
+? source_id (tmp.content_buf, tmp.content_len)
+: source_id (tmp.get_filename ());
+
+  char_span source_line = location_get_source_line (src, 1);
+  ASSERT_TRUE (source_line);
+  ASSERT_TRUE (source_line.get_buffer () != NULL);
+  /* N.B. If the line terminator is \r\n, the returned char_span will 

Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-23 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 04:08:47PM -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 3:46 PM David Malcolm  wrote:
> >
> > On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > > This patch enhances location_get_source_line(), which is the
> > > > > primary
> > > > > interface provided by the diagnostics infrastructure to obtain
> > > > > the line of
> > > > > source code corresponding to a given location, so that it
> > > > > understands
> > > > > generated data locations in addition to normal file-based
> > > > > locations. This
> > > > > involves changing the argument to location_get_source_line() from
> > > > > a plain
> > > > > file name, to a source_id object that can represent either type
> > > > > of location.
> > > > >
> >
> > [...]
> >
> > > > >
> > > > >
> > > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > > index 9377020b460..790279d4273 100644
> > > > > --- a/gcc/input.cc
> > > > > +++ b/gcc/input.cc
> > > > > @@ -207,6 +207,28 @@ private:
> > > > >void maybe_grow ();
> > > > >  };
> > > > >
> > > > > +/* This is the implementation of cache_data_source for generated
> > > > > +   data that is already in memory.  */
> > > > > +class data_cache_slot final : public cache_data_source
> > > >
> > > > It occurred to me: why are we caching accessing a buffer that's
> > > > already
> > > > in memory - but we're also caching the line-splitting information,
> > > > and
> > > > providing the line-splitting algorithm with a consistent interface
> > > > to
> > > > the data, right?
> > > >
> > >
> > > Yeah, for the current _Pragma use case, multi-line buffers are not
> > > going to
> > > be common, but they can occur. I was mainly motivated by the
> > > consistent
> > > interface, and by the assumption that the overhead is not critical
> > > given a
> > > diagnostic is being issued.
> >
> > (nods)
> >
> > >
> > > > [...snip...]
> > > >
> > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > > (const char *file_path)
> > > > >global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > > >  }
> > > > >
> > > > > +void
> > > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > > > +   unsigned int data_len)
> > > > > +{
> > > > > +  if (!global_dc->m_file_cache)
> > > > > +return;
> > > > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > > >
> > > > Maybe we should rename diagnostic_context's m_file_cache to
> > > > m_source_cache?  (and class file_cache for that matter?)  But if
> > > > so,
> > > > that can/should be a followup/separate patch.
> > > >
> > >
> > > Yes, we should. Believe it or not, I was trying to minimize the size
> > > of the
> > > patch :)
> >
> > :)
> >
> > Thanks for splitting it up, BTW.
> >
> > [...]
> >
> >
> > > >
> > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > > line_num,
> > > > > If the function fails, a NULL char_span is returned.  */
> > > > >
> > > > >  char_span
> > > > > -location_get_source_line (const char *file_path, int line)
> > > > > +location_get_source_line (source_id src, int line)
> > > > >  {
> > > > > -  const char *buffer = NULL;
> > > > > -  ssize_t len;
> > > > > -
> > > > > -  if (line == 0)
> > > > > -return char_span (NULL, 0);
> > > > > -
> > > > > -  if (file_path == NULL)
> > > > > -return char_span (NULL, 0);
> > > > > +  const char_span fail (nullptr, 0);
> > > > > +  if (!src || line <= 0)
> > > > > +return fail;
> > > >
> > > > Looking at source_id's operator bool, are there effectively three
> > > > kinds
> > > > of source_id?
> > > >
> > > > (a) file names
> > > > (b) generated buffer
> > > > (c) NULL == m_filename_or_buffer
> > > >
> > > > What does (c) mean?  Is it a "something's gone wrong/error" state?
> > > > Or
> > > > is this more a special-case of (a)? (in that the m_len for such a
> > > > case
> > > > would be zero)
> > > >
> > > > > Should source_id's 2-param ctor have an assert that the ptr is
> > > > > non-NULL?
> > > >
> > > > [...snip...]
> > > >
> > > > The patch is OK for trunk as-is, but note the question about the
> > > > source_id ctor above.
> > > >
> > >
> > > Thanks. (c) has the same meaning as a NULL file name currently does,
> > > so a
> > > default-constructed source_id is not an in-memory buffer, but is
> > > rather a
> > > NULL filename. linemap_add() for instance, will interpret a NULL
> > > filename
> > > for an LC_LEAVE map, as a request to copy it from the natural values
> > > being
> > > returned to. I think the source_id constructor needs to accept a NULL
> > > filename to remain backwards compatible. With the current design of
> > > source_id, it is safe always to change a 'const char*' file name
> > > argument to
> > > a source_id argument instead; it will work just how it did before
> > > 

Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 3:46 PM David Malcolm  wrote:
>
> On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > This patch enhances location_get_source_line(), which is the
> > > > primary
> > > > interface provided by the diagnostics infrastructure to obtain
> > > > the line of
> > > > source code corresponding to a given location, so that it
> > > > understands
> > > > generated data locations in addition to normal file-based
> > > > locations. This
> > > > involves changing the argument to location_get_source_line() from
> > > > a plain
> > > > file name, to a source_id object that can represent either type
> > > > of location.
> > > >
>
> [...]
>
> > > >
> > > >
> > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > index 9377020b460..790279d4273 100644
> > > > --- a/gcc/input.cc
> > > > +++ b/gcc/input.cc
> > > > @@ -207,6 +207,28 @@ private:
> > > >void maybe_grow ();
> > > >  };
> > > >
> > > > +/* This is the implementation of cache_data_source for generated
> > > > +   data that is already in memory.  */
> > > > +class data_cache_slot final : public cache_data_source
> > >
> > > It occurred to me: why are we caching accessing a buffer that's
> > > already
> > > in memory - but we're also caching the line-splitting information,
> > > and
> > > providing the line-splitting algorithm with a consistent interface
> > > to
> > > the data, right?
> > >
> >
> > Yeah, for the current _Pragma use case, multi-line buffers are not
> > going to
> > be common, but they can occur. I was mainly motivated by the
> > consistent
> > interface, and by the assumption that the overhead is not critical
> > given a
> > diagnostic is being issued.
>
> (nods)
>
> >
> > > [...snip...]
> > >
> > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > (const char *file_path)
> > > >global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > >  }
> > > >
> > > > +void
> > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > > +   unsigned int data_len)
> > > > +{
> > > > +  if (!global_dc->m_file_cache)
> > > > +return;
> > > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > >
> > > Maybe we should rename diagnostic_context's m_file_cache to
> > > m_source_cache?  (and class file_cache for that matter?)  But if
> > > so,
> > > that can/should be a followup/separate patch.
> > >
> >
> > Yes, we should. Believe it or not, I was trying to minimize the size
> > of the
> > patch :)
>
> :)
>
> Thanks for splitting it up, BTW.
>
> [...]
>
>
> > >
> > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > line_num,
> > > > If the function fails, a NULL char_span is returned.  */
> > > >
> > > >  char_span
> > > > -location_get_source_line (const char *file_path, int line)
> > > > +location_get_source_line (source_id src, int line)
> > > >  {
> > > > -  const char *buffer = NULL;
> > > > -  ssize_t len;
> > > > -
> > > > -  if (line == 0)
> > > > -return char_span (NULL, 0);
> > > > -
> > > > -  if (file_path == NULL)
> > > > -return char_span (NULL, 0);
> > > > +  const char_span fail (nullptr, 0);
> > > > +  if (!src || line <= 0)
> > > > +return fail;
> > >
> > > Looking at source_id's operator bool, are there effectively three
> > > kinds
> > > of source_id?
> > >
> > > (a) file names
> > > (b) generated buffer
> > > (c) NULL == m_filename_or_buffer
> > >
> > > What does (c) mean?  Is it a "something's gone wrong/error" state?
> > > Or
> > > is this more a special-case of (a)? (in that the m_len for such a
> > > case
> > > would be zero)
> > >
> > > > Should source_id's 2-param ctor have an assert that the ptr is
> > > > non-NULL?
> > >
> > > [...snip...]
> > >
> > > The patch is OK for trunk as-is, but note the question about the
> > > source_id ctor above.
> > >
> >
> > Thanks. (c) has the same meaning as a NULL file name currently does,
> > so a
> > default-constructed source_id is not an in-memory buffer, but is
> > rather a
> > NULL filename. linemap_add() for instance, will interpret a NULL
> > filename
> > for an LC_LEAVE map, as a request to copy it from the natural values
> > being
> > returned to. I think the source_id constructor needs to accept a NULL
> > filename to remain backwards compatible. With the current design of
> > source_id, it is safe always to change a 'const char*' file name
> > argument to
> > a source_id argument instead; it will work just how it did before
> > because it
> > has an implicit constructor. But if the constructor would assert on a
> > non-NULL pointer, that would necessitate changing all call sites that
> > currently expect they can pass a NULL pointer there. (For example,
> > there are
> > several calls to _cpp_do_file_change() within libcpp that take
> > advantage of
> > being able to pass a 
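
The three kinds of source_id discussed in this exchange can be sketched in
C++ as follows. This is a hedged illustration, not the class from the patch
series: the name source_id_sketch, the is_buffer() accessor and the describe()
helper are invented here, and using a nonzero m_len to mark a generated
buffer is an assumption based only on the remark about m_len being zero for
file names. The member names m_filename_or_buffer and m_len come from the
discussion.

```cpp
#include <cassert>
#include <string>

class source_id_sketch {
public:
  // (c) Default: behaves like a NULL file name.
  source_id_sketch() = default;

  // (a) Implicit on purpose, so existing callers that pass a
  // 'const char *' file name, possibly NULL, keep working unchanged.
  source_id_sketch(const char *filename)
      : m_filename_or_buffer(filename), m_len(0) {}

  // (b) Generated in-memory data.  Asserting here does not affect the
  // callers that pass a NULL file name to the one-argument constructor.
  source_id_sketch(const char *buffer, unsigned len)
      : m_filename_or_buffer(buffer), m_len(len) {
    assert(buffer != nullptr && len > 0);
  }

  explicit operator bool() const { return m_filename_or_buffer != nullptr; }
  bool is_buffer() const { return m_len != 0; }

private:
  const char *m_filename_or_buffer = nullptr;
  unsigned m_len = 0;
};

// Classify a source_id_sketch; shows the implicit conversion at work.
std::string describe(source_id_sketch src) {
  if (!src)
    return "none";
  return src.is_buffer() ? "buffer" : "file";
}
```

Calling describe("foo.c") compiles without any change at the call site,
which is the backwards-compatibility property argued for in the reply.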

Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > This patch enhances location_get_source_line(), which is the primary
> > interface provided by the diagnostics infrastructure to obtain the line of
> > source code corresponding to a given location, so that it understands
> > generated data locations in addition to normal file-based locations. This
> > involves changing the argument to location_get_source_line() from a plain
> > file name, to a source_id object that can represent either type of location.
> > 
> > gcc/ChangeLog:
> > 
> > * input.cc (class data_cache_slot): New class.
> > (file_cache::lookup_data): New function.
> > (diagnostics_file_cache_forcibly_evict_data): New function.
> > (file_cache::forcibly_evict_data): New function.
> > (file_cache::evicted_cache_tab_entry): Generalize (via a template)
> > to work for both file_cache_slot and data_cache_slot.
> > (file_cache::add_file): Adapt for new interface to
> > evicted_cache_tab_entry.
> > (file_cache::add_data): New function.
> > (data_cache_slot::create): New function.
> > (file_cache::file_cache): Support the new m_data_slots member.
> > (file_cache::~file_cache): Likewise.
> > (file_cache::lookup_or_add_data): New function.
> > (file_cache::lookup_or_add): New function that calls either
> > lookup_or_add_data or lookup_or_add_file as appropriate.
> > (location_get_source_line): Change the FILE_PATH argument to a
> > source_id SRC, and use it to support obtaining source lines from
> > generated data as well as from files.
> > (location_compute_display_column): Support generated data using the
> > new features of location_get_source_line.
> > (dump_location_info): Likewise.
> > * input.h (location_get_source_line): Adjust prototype. Add a new
> > convenience overload taking an expanded_location.
> > (class cache_data_source): Declare.
> > (class data_cache_slot): Declare.
> > (class file_cache): Declare new members.
> > (diagnostics_file_cache_forcibly_evict_data): Declare.
> > ---
> >  gcc/input.cc | 171 ---
> >  gcc/input.h  |  23 +--
> >  2 files changed, 153 insertions(+), 41 deletions(-)
> > 
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 9377020b460..790279d4273 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> > @@ -207,6 +207,28 @@ private:
> >void maybe_grow ();
> >  };
> >  
> > +/* This is the implementation of cache_data_source for generated
> > +   data that is already in memory.  */
> > +class data_cache_slot final : public cache_data_source
> 
> It occurred to me: why are we caching accessing a buffer that's already
> in memory - but we're also caching the line-splitting information, and
> providing the line-splitting algorithm with a consistent interface to
> the data, right?
>

Yeah, for the current _Pragma use case, multi-line buffers are not going to
be common, but they can occur. I was mainly motivated by the consistent
interface, and by the assumption that the overhead is not critical given a
diagnostic is being issued.
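
To illustrate the "consistent interface" point, here is a minimal C++ sketch
in which the line-splitting logic and the cached line boundaries live in a
base class, and a derived class only says where the bytes are. It is not the
cache_data_source or data_cache_slot code from the patch; line_source,
memory_source and their members are hypothetical names, and only "\n" is
handled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical base class: splits lines lazily and remembers the result.
class line_source {
public:
  virtual ~line_source() = default;

  // Store the 1-based LINE in *OUT; return false if there is no such line.
  bool read_line(std::size_t line, std::string_view *out) {
    assert(out != nullptr);
    if (line == 0)
      return false;
    while (m_lines.size() < line && scan_next_line()) {
    }
    if (m_lines.size() < line)
      return false;
    *out = m_lines[line - 1];
    return true;
  }

  std::size_t lines_scanned() const { return m_lines.size(); }

protected:
  // Derived classes provide the bytes; the base class does the splitting.
  virtual std::string_view data() = 0;

private:
  bool scan_next_line() {
    const std::string_view d = data();
    if (m_pos >= d.size())
      return false;
    std::size_t end = d.find('\n', m_pos);
    if (end == std::string_view::npos)
      end = d.size();
    m_lines.push_back(d.substr(m_pos, end - m_pos));
    m_pos = end + 1;
    return true;
  }

  std::vector<std::string_view> m_lines;  // cached line boundaries
  std::size_t m_pos = 0;                  // where the next scan starts
};

// Data that is already in memory: nothing to read, just a pointer pair.
class memory_source final : public line_source {
public:
  memory_source(const char *begin, const char *end)
      : m_begin(begin), m_end(end) {
    assert(begin != nullptr && end != nullptr && begin <= end);
  }

protected:
  std::string_view data() override {
    return std::string_view(m_begin, static_cast<std::size_t>(m_end - m_begin));
  }

private:
  const char *m_begin;
  const char *m_end;
};
```

A file-backed source would override data() to return what has been read so
far. In this sketch the cached views assume the underlying storage does not
move, so a source that grows its buffer would need to cache offsets instead.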

> [...snip...]
> 
> > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
> >global_dc->m_file_cache->forcibly_evict_file (file_path);
> >  }
> >  
> > +void
> > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > +   unsigned int data_len)
> > +{
> > +  if (!global_dc->m_file_cache)
> > +return;
> > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> 
> Maybe we should rename diagnostic_context's m_file_cache to
> m_source_cache?  (and class file_cache for that matter?)  But if so,
> that can/should be a followup/separate patch.
>

Yes, we should. Believe it or not, I was trying to minimize the size of the
patch :) So I didn't make such changes, but they will make things more
clear.

> [...snip...]
>  
> > @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
> >return true;
> >  }
> >  
> > +void
> > +data_cache_slot::create (const char *data, unsigned int data_len,
> > +unsigned int highest_use_count)
> > +{
> > +  reset ();
> > +  on_create (highest_use_count + 1,
> > +total_lines_num (source_id {data, data_len}));
> > +  m_data_begin = data;
> > +  m_data_end = data + data_len;
> > +}
> > +
> >  /* file_cache's ctor.  */
> >  
> >  file_cache::file_cache ()
> > -: m_file_slots (new file_cache_slot[num_file_slots])
> > +  : m_file_slots (new file_cache_slot[num_file_slots]),
> > +m_data_slots (new data_cache_slot[num_file_slots])
> 
> Should "num_file_slots" be renamed to "num_slots"?
> 
> I assume you're using the same value for both kinds of 

Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > Class file_cache_slot in input.cc is used to query specific lines of source
> > code from a file when needed by diagnostics infrastructure. This will be
> > extended in a subsequent patch to support obtaining the source code from
> > in-memory generated buffers rather than from a file. The present patch
> > refactors class file_cache_slot, putting most of the logic into a new base
> > class cache_data_source, in preparation for reusing that code in the next
> > patch. There is no change in functionality yet.
> > 
> > gcc/ChangeLog:
> > 
> > * input.cc (class file_cache_slot): Refactor functionality into a
> > new base class...
> > (class cache_data_source): ...here.
> > (file_cache::forcibly_evict_file): Adapt for refactoring.
> > (file_cache_slot::evict): Renamed to...
> > (file_cache_slot::reset): ...this, and partially refactored into
> > base class...
> > (cache_data_source::reset): ...here.
> > (file_cache_slot::get_full_file_content): Moved into base class...
> > (cache_data_source::get_full_file_content): ...here.
> > (file_cache_slot::create): Adapt for refactoring.
> > (file_cache_slot::file_cache_slot): Refactor partially into...
> > (cache_data_source::cache_data_source): ...here.
> > (file_cache_slot::~file_cache_slot): Refactor partially into...
> > (cache_data_source::~cache_data_source): ...here.
> > (file_cache_slot::needs_read_p): Remove.
> > (file_cache_slot::needs_grow_p): Remove.
> > (file_cache_slot::maybe_grow): Adapt for refactoring.
> > (file_cache_slot::read_data): Refactored, along with...
> > (file_cache_slot::maybe_read_data): this, into...
> > (file_cache_slot::get_more_data): ...here.
> > (find_end_of_line): Change interface to take a pair of pointers,
> > rather than a pointer + length.
> > (file_cache_slot::get_next_line): Refactored into...
> > (cache_data_source::get_next_line): ...here.
> > (file_cache_slot::goto_next_line): Refactored into...
> > (cache_data_source::goto_next_line): ...here.
> > (file_cache_slot::read_line_num): Refactored into...
> > (cache_data_source::read_line_num): ...here.
> > (location_get_source_line): Fix const-correctness as necessitated by
> > new interface.
> > ---
> >  gcc/input.cc | 513 +++
> >  1 file changed, 235 insertions(+), 278 deletions(-)
> > 
> 
> I confess I had to reread both this and patch 4/8 to make sense of
> this; this is probably one of those cases where it's harder to read in
> patch form than as source, but I think I now understand the new
> implementation.

Yes, sorry about that. I hope at least splitting into two patches here made it
a little easier.

> 
> Did you try testing this with valgrind (e.g. "make selftest-valgrind")?
>

Oh, interesting; I was not aware of this. I think it shows that the patch
series did not introduce any new leaks.

BEFORE patch series:
==1572278==
-fself-test: 7634593 pass(es) in 22.799240 seconds
==1572278==
==1572278== HEAP SUMMARY:
==1572278==     in use at exit: 1,083,255 bytes in 2,394 blocks
==1572278==   total heap usage: 2,704,869 allocs, 2,702,475 frees, 1,257,334,536 bytes allocated
==1572278==
==1572278== 8,032 bytes in 1 blocks are possibly lost in loss record 639 of 657
==1572278==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1572278==    by 0x21FE1CB: xmalloc (xmalloc.c:149)
==1572278==    by 0x21B02E0: new_buff (lex.cc:4767)
==1572278==    by 0x21B02E0: _cpp_get_buff (lex.cc:4800)
==1572278==    by 0x21ACC80: cpp_create_reader(c_lang, ht*, line_maps*) (init.cc:289)
==1572278==    by 0xA64282: c_common_init_options(unsigned int, cl_decoded_option*) (c-opts.cc:237)
==1572278==    by 0x95E479: toplev::main(int, char**) (toplev.cc:2241)
==1572278==    by 0x960B2D: main (main.cc:39)
==1572278==
==1572278== LEAK SUMMARY:
==1572278==    definitely lost: 0 bytes in 0 blocks
==1572278==    indirectly lost: 0 bytes in 0 blocks
==1572278==      possibly lost: 8,032 bytes in 1 blocks
==1572278==    still reachable: 1,075,223 bytes in 2,393 blocks
==1572278==         suppressed: 0 bytes in 0 blocks
==1572278== Reachable blocks (those to which a pointer was found) are not shown.
==1572278== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1572278==
==1572278== For lists of detected and suppressed errors, rerun with: -s
==1572278== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

AFTER patch series:
==1594840==
-fself-test: 7638403 pass(es) in 23.671784 seconds
==1594840==
==1594840== HEAP SUMMARY:
==1594840==     in use at exit: 1,081,759 bytes in 2,367 blocks
==1594840==   total heap usage: 2,728,561 

Re: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output

2023-08-15 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 15, 2023 at 01:04:04PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The diagnostics routines for SARIF output need to read the source code back
> > in, so that they can generate "snippet" and "content" records, so they need
> > to be able to cope with generated data locations.  Add support for that in
> > diagnostic-format-sarif.cc.
> > 
> > gcc/ChangeLog:
> > 
> > * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
> > to support generated data locations.
> > (sarif_builder::maybe_make_physical_location_object): Change the
> > m_filenames hash_set to support generated data.
> > (sarif_builder::make_artifact_location_object): Use a source_id
> > rather than a plain file name.
> > (sarif_builder::maybe_make_region_object): Adapt to
> > expanded_location interface changes.
> > (sarif_builder::maybe_make_region_object_for_context): Likewise.
> > (sarif_builder::make_artifact_object): Likewise.
> > (sarif_builder::make_run_object): Handle generated data.
> > (sarif_builder::maybe_make_artifact_content_object): Likewise.
> > (get_source_lines): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * c-c++-common/diagnostic-format-sarif-file-5.c: New test.
> 
> I'm not sure if generated data is allowed as part of a SARIF artefact,
> or if there's a more standard-compliant way of representing this; SARIF
> says an artefact is a "sequence of bytes addressable via a URI".
> 
> Can you post a simple example of the generated .sarif JSON please? 
> e.g. from the new test, so that we can see what it looks like.
> 
> You could run it through:
> 
>   python -m json.tool 
> 
> to format it for easier reading.

For a simple example like:

_Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")

for which the normal output is:

=
In buffer generated from t.cpp:1:
:1:24: warning: unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
1 | GCC diagnostic ignored "-Wnot-an-option"
  |^
t.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")
  | ^~~
=

The SARIF output does not end up referencing any generated data locations,
because those are logically part of the "expansion" of the _Pragma
directive, and it doesn't output macro expansions.  In order for SARIF to
currently do something with generated data, it needs to see a generated data
location in a non-macro context. The only way to get GCC to do that, right
now, is with -fdump-internal-locations, which is what the new test case
does. That just unfortunately generates a larger amount of output. I attached
it, in case that's still helpful, for the following program:

=
_Pragma("GCC diagnostic push")
=

I guess there's potentially already a problem here because 'python -m
json.tool' is unhappy with this output and refuses to process it:

=
Invalid \escape: line 1 column 3436 (char 3435)
=

The related text is:
=
{"location": {"uri": "", "uriBaseId": "PWD"},
"contents":{"text": "GCC diagnostic push\n\0"}
=

And the \0 is not allowed it seems?
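
For reference, JSON (RFC 8259) only permits the escapes \", \\, \/, \b, \f,
\n, \r, \t and \uXXXX inside a string, so a NUL byte has to be written as
\u0000 rather than \0.  A minimal sketch of an escaper that would produce
output acceptable to 'python -m json.tool' could look like the following;
json_escape is a hypothetical helper for illustration only, not the code in
GCC's JSON writer:

```cpp
#include <cstdio>
#include <string>

/* Hypothetical helper (not GCC code): escape IN so that it can be placed
   between double quotes in a JSON document.  */
std::string
json_escape (const std::string &in)
{
  std::string out;
  for (unsigned char c : in)
    switch (c)
      {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:
        if (c < 0x20)
          {
            /* Remaining control characters, including NUL, must use the
               \uXXXX form; "\0" is not a valid JSON escape.  */
            char buf[8];
            std::snprintf (buf, sizeof buf, "\\u%04x", (unsigned) c);
            out += buf;
          }
        else
          out += static_cast<char> (c);
      }
  return out;
}
```

With this approach the generated buffer "GCC diagnostic push\n\0" would be
emitted with a trailing \u0000, assuming the terminating NUL is meant to be
part of the artifact contents at all.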

I also attached the output of 'python -m json.tool' anyway, after manually
removing the \0.

Is it better to just skip these locations for now?

-Lewis
{"$schema": 
"https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
 "version": "2.1.0", "runs": [{"tool": {"driver": {"name": "GNU C++17", 
"fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) 
(x86_64-pc-linux-gnu)", "version": "14.0.0 20230811 (experimental)", 
"informationUri": "https://gcc.gnu.org/gcc-14/", "rules": []}}, "invocations": 
[{"executionSuccessful": true, "toolExecutionNotifications": []}], 
"originalUriBaseIds": {"PWD": {"uri": "file:///home/lewis/"}}, "artifacts": 
[{"location": {"uri": "t.cpp", "uriBaseId": "PWD"}, "contents": {"text": 
"_Pragma(\"GCC diagnostic push\")\n"}, "sourceLanguage": "cplusplus"}, 
{"location": {"uri": "/usr/include/stdc-predef.h"}, "contents": {"text": "/* 
Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of 
the GNU C Library.\n\n   The GNU C Library is free software; you can 
redistribute it and/or\n   modify it under the terms of the GNU Lesser General 
Public\n 
   License as published by the Free Software Foundation; either\n   version 2.1 
of the License, or (at your option) any later version.\n\n   The GNU C Library 
is distributed in the hope that it will be useful,\n   but WITHOUT ANY 
WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS 
FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for 
more details.\n\n   You should have received a copy of the GNU Lesser General 
Public\n   License along with the GNU C Library; if not, see\n   
.  

Re: [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations

2023-08-14 Thread Lewis Hyatt via Gcc-patches
On Fri, Aug 11, 2023 at 07:02:49PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The previous patch in this series introduced the concept of LC_GEN line
> > maps. This patch continues on the path to using them to improve _Pragma
> > diagnostics, by adding a new source_id SRC member to struct
> > expanded_location, which is populated by linemap_expand_location. This
> > member allows call sites to detect and handle when a location refers to
> > generated data rather than a plain file name.
> > 
> > The previous FILE member of expanded_location is preserved (although
> > redundant with SRC), so that call sites which do not and never will care
> > about generated data do not need to be concerned about it. Call sites that
> > will care are modified here, to use SRC rather than FILE for comparing
> > locations.
> 
> Thanks; this seems like a good approach.
> 
> 
> [...snip...]
> 
> > diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
> > index 6f5bc6b9d8f..15052aec417 100644
> > --- a/gcc/edit-context.cc
> > +++ b/gcc/edit-context.cc
> > @@ -295,7 +295,7 @@ edit_context::apply_fixit (const fixit_hint *hint)
> >  {
> >expanded_location start = expand_location (hint->get_start_loc ());
> >expanded_location next_loc = expand_location (hint->get_next_loc ());
> > -  if (start.file != next_loc.file)
> > +  if (start.src != next_loc.src || start.src.is_buffer ())
> >  return false;
> >if (start.line != next_loc.line)
> >  return false;
> 
> Thinking about fix-it hints, it makes sense to reject attempts to
> create fix-it hints within generated strings, as we can't apply them or
> visualize them.
> 
> Does anywhere in the patch kit do that?  Either of 
>   rich_location::maybe_add_fixit
> or
>   rich_location::reject_impossible_fixit
> would be good places to do that.
>

So rich_location::reject_impossible_fixit does reject them for _Pragmas now,
because what the frontend sees and passes to it is a virtual location, and it
always rejects virtual locations. But it doesn't reject arbitrary generated
data locations that may be created in an ordinary non-virtual location. I
think it's this one-line change to reject those:

-- >8 --

diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index 835e8e1b8cd..382594637ad 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -2545,7 +2545,8 @@ rich_location::maybe_add_fixit (location_t start,
 = linemap_client_expand_location_to_spelling_point (next_loc,
LOCATION_ASPECT_START);
   /* They must be within the same file...  */
-  if (exploc_start.src != exploc_next_loc.src)
+  if (exploc_start.src != exploc_next_loc.src
+  || exploc_start.src.is_buffer ())
 {
   stop_supporting_fixits ();
   return;

-- >8 --

However, there are many selftests in diagnostic-show-locus.cc that actually
verify we generate the fixit hints for generated data, so I would also need to
change those to skip the fixit tests in this case. That looks like this:

-- >8 --

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 62c60645e88..884c55e91e9 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -3824,6 +3824,8 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
   test_one_liner_simple_caret ();
   test_one_liner_caret_and_range ();
   test_one_liner_multiple_carets_and_ranges ();
+  if (!ltt.m_generated_data)
+{
   test_one_liner_fixit_insert_before ();
   test_one_liner_fixit_insert_after ();
   test_one_liner_fixit_remove ();
@@ -3835,6 +3837,7 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
   test_one_liner_many_fixits_2 ();
   test_one_liner_labels ();
 }
+}

 /* Version of all one-liner tests exercising multibyte awareness.  For
simplicity we stick to using two multibyte characters in the test, U+1F602
@@ -4419,6 +4422,8 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
   test_one_liner_simple_caret_utf8 ();
   test_one_liner_caret_and_range_utf8 ();
   test_one_liner_multiple_carets_and_ranges_utf8 ();
+  if (!ltt.m_generated_data)
+{
   test_one_liner_fixit_insert_before_utf8 ();
   test_one_liner_fixit_insert_after_utf8 ();
   test_one_liner_fixit_remove_utf8 ();
@@ -4428,6 +4433,7 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
   test_one_liner_fixit_validation_adhoc_locations_utf8 ();
   test_one_liner_many_fixits_1_utf8 ();
   test_one_liner_many_fixits_2_utf8 ();
+}
   test_one_liner_labels_utf8 ();
   test_one_liner_colorized_utf8 ();
 }
@@ -5726,15 +5732,15 @@ diagnostic_show_locus_cc_tests ()
   for_each_line_table_case (test_diagnostic_show_locus_one_liner, true);
   for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8, true);
   for_each_line_table_case (test_add_location_if_nearby, true);
-  for_each_line_table_case 

Re: [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-08-13 Thread Lewis Hyatt via Gcc-patches
On Fri, Aug 11, 2023 at 06:45:31PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> 
> Hi Lewis, thanks for the patch...
> 
> > Add a new linemap reason LC_GEN which enables encoding the location of data
> > that was generated during compilation and does not appear in any source
> > file.  There could be many use cases, such as, for instance, referring to
> > the content of builtin macros (not yet implemented, but an easy lift after
> > this one.)  The first intended application is to create a place to store
> > the input to a _Pragma directive, so that proper locations can be assigned
> > to those tokens. This will be done in a subsequent commit.
> > 
> > The TO_FILE member of struct line_map_ordinary has been changed to a union
> > named SRC which can be either a file name, or a pointer to a line_map_data
> > struct describing the data. There is no space overhead added to the line
> > maps data structures.
> > 
> > Outside libcpp, this patch includes only the minimal changes implied by the
> > adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
> > patches will implement the new functionality.
> > 
> > libcpp/ChangeLog:
> > 
> > * include/line-map.h (enum lc_reason): Add LC_GEN.
> > (struct line_map_data): New struct.
> > (struct line_map_ordinary): Change TO_FILE from a char* to a union,
> > and rename to SRC.
> > (class source_id): New class.
> > (ORDINARY_MAP_GENERATED_DATA_P): New function.
> > (ORDINARY_MAP_GENERATED_DATA): New function.
> > (ORDINARY_MAP_GENERATED_DATA_LEN): New function.
> > (ORDINARY_MAP_SOURCE_ID): New function.
> > (ORDINARY_MAPS_SAME_FILE_P): New function.
> > (ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
> > (LINEMAP_FILE): Adapt to struct line_map_ordinary change.
> > (linemap_get_file_highest_location): Likewise.
> > * line-map.cc (source_id::operator==): New function.
> > (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
> > (linemap_add): Support creating LC_GEN maps.
> > (linemap_line_start): Support LC_GEN maps.
> > (linemap_check_files_exited): Likewise.
> > (linemap_position_for_loc_and_offset): Likewise.
> > (linemap_get_expansion_filename): Likewise.
> > (linemap_dump): Likewise.
> > (linemap_dump_location): Likewise.
> > (linemap_get_file_highest_location): Likewise.
> > * directives.cc (_cpp_do_file_change): Likewise.
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c-common.cc (try_to_locate_new_include_insertion_point): Recognize
> > and ignore LC_GEN maps.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (module_state::write_ordinary_maps): Recognize and
> > ignore LC_GEN maps, and adapt to interface change in struct
> > line_map_ordinary.
> > (module_state::read_ordinary_maps): Likewise.
> > 
> > gcc/ChangeLog:
> > 
> > * diagnostic-show-locus.cc (compatible_locations_p): Adapt to
> > interface change in struct line_map_ordinary.
> > * input.cc (special_fname_generated): New function.
> > (dump_location_info): Support LC_GEN maps.
> > (get_substring_ranges_for_loc): Adapt to interface change in struct
> > line_map_ordinary.
> > * input.h (special_fname_generated): Declare.
> > 
> > gcc/go/ChangeLog:
> > 
> > * go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
> > LC_GEN maps.
> > ---
> >  gcc/c-family/c-common.cc |  11 ++-
> >  gcc/cp/module.cc |   8 +-
> >  gcc/diagnostic-show-locus.cc |   2 +-
> >  gcc/go/go-linemap.cc |   3 +-
> >  gcc/input.cc |  27 +-
> >  gcc/input.h  |   1 +
> >  libcpp/directives.cc |   4 +-
> >  libcpp/include/line-map.h    | 144 
> >  libcpp/line-map.cc   | 181 +--
> >  9 files changed, 299 insertions(+), 82 deletions(-)
> 
> [...snip...]
> 
> > 
> > diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
> > index 0514815b51f..a2aa6b4e0b5 100644
> > --- a/gcc/diagnostic-show-locus.cc
> > +++ b/gcc/diagnostic-show-locus.cc
> > @@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
> >  are in the same file.  */
> >    const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
> >    const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
> > -  return ord_map_a->to_file == ord_map_b->to_file;
> > +  return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);
> 
> My first thought here was: are buffers supported here, or does it have
> to be a file?
> 
> It turns out that ORDINARY_MAPS_SAME_FILE_P works on both files and
> buffers.
> 
> This suggests that it would be better named as
> ORDINARY_MAPS_SAME_SOURCE_ID_P, but note the 

[PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp during the lexing of the tokens, as opposed to being
issued by the frontend during the processing of the pragma, then the
patched-up location is not yet in place, and the user rather sees an invalid
location that is near to the location of the _Pragma string in some cases, or
potentially very far away, depending on the macro expansion history.  For
example:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
1 | _Pragma("GCC diagnostic ignored \"oops")
  |^

with the caret in a nonsensical location, while this one:

=
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=

produces:

file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^

with the caret in a nonsensical location and the actual relevant context
completely absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

==
In buffer generated from file.cpp:1:
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"oops")
  | ^~~
==

and

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

The carets now point to something meaningful, and all relevant context
appears in the diagnostic.  For the second example, it would be nice if the
macro expansion trace also output "in expansion of macro S"; however, doing
that for the general case of macro expansions makes the logic very
complicated, since it has to be done after the fact, when the macro maps have
already been constructed.  It doesn't seem worth it for this case, given that
the _Pragma string has already been output once on the first line.
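
To make the column numbers above easier to follow, here is a minimal sketch of
the destringizing step as the C standard describes it for _Pragma: drop an
optional L prefix, drop the surrounding double quotes, and replace \" by " and
\\ by \.  The function name destringize is a hypothetical stand-in; the real
destringize_and_run in libcpp does considerably more (including the raw
string support added by this patch), and this sketch handles neither.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

/* Hypothetical illustration (not libcpp code): turn the spelling of a
   _Pragma string literal into the text that gets lexed as a pragma.  */
std::string
destringize (std::string_view lit)
{
  /* Drop an L prefix if one precedes the opening quote.  */
  if (lit.size () >= 2 && lit[0] == 'L' && lit[1] == '"')
    lit.remove_prefix (1);

  /* Drop the leading and trailing double quotes.  */
  if (lit.size () >= 2 && lit.front () == '"' && lit.back () == '"')
    {
      lit.remove_prefix (1);
      lit.remove_suffix (1);
    }

  /* Replace \" with " and \\ with \; everything else is copied as is.  */
  std::string out;
  for (std::size_t i = 0; i < lit.size (); ++i)
    {
      if (lit[i] == '\\' && i + 1 < lit.size ()
          && (lit[i + 1] == '"' || lit[i + 1] == '\\'))
        ++i;
      out += lit[i];
    }
  return out;
}
```

Applied to the first example, the result is the buffer text
GCC diagnostic ignored "oops, in which the unterminated quote sits at column
24, matching the location reported inside the generated buffer.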

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

* directives.cc (get_token_no_padding): Add argument to receive the
virtual location of the token.
(get__Pragma_string): Likewise.
(do_pragma): Set pfile->directive_result->src_loc properly, it should
not be a virtual location.
(destringize_and_run): Update to provide proper locations for the
_Pragma string tokens.  Support raw strings.
(_cpp_do__Pragma): Adapt to changes to the helper functions.
* errors.cc (cpp_diagnostic_at): Support
cpp_reader::diagnostic_rebase_loc.
(cpp_diagnostic_with_line): Likewise.
* include/line-map.h (class rich_location): Add new member
forget_cached_expanded_locations().
* internal.h (struct _cpp__Pragma_state): Define new struct.
(_cpp_rebase_diagnostic_location): Declare new function.
(struct cpp_reader): Add diagnostic_rebase_loc member.
(_cpp_push__Pragma_token_context): Declare new function.
(_cpp_do__Pragma): Adjust prototype.
* macro.cc (pragma_str): New static var.
(builtin_macro): Adapt to new implementation of _Pragma processing.
(_cpp_pop_context): Fix the logic for resetting
pfile->top_most_macro_node, which previously was never triggered,
although the error seems to have been harmless.
(_cpp_push__Pragma_token_context): New function.
(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
macro tracking output for _Pragma directives.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
tracking output for _Pragma directives.
* c-c++-common/cpp/pr57580.c: Likewise.
* c-c++-common/gomp/pragma-3.c: Likewise.

[PATCH v4 6/8] diagnostics: Full support for generated data locations

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Previous patches in this series have laid the groundwork for supporting
source code locations in memory ("generated data") rather than ordinary
files. This patch completes the support by adding awareness of such
locations to all places that need to support them. The main changes are to
diagnostic-show-locus.cc; the others are primarily small tweaks such as
changing from the FILE to the SRC member when inspecting an
expanded_location.
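
As a rough illustration of why call sites switch from FILE to SRC, the sketch
below models a source identifier that is either a file name or an in-memory
buffer.  Everything here is an assumption made for illustration: the class
name source_id_sketch, its layout, the use of a zero length to mean "file",
the comparison rules, and the helper can_apply_fixit are not taken from the
patch; only the idea of an is_buffer () query and an equality comparison
comes from the series.

```cpp
#include <cstring>

/* Hypothetical model of a source identifier (not the patch's source_id).  */
class source_id_sketch
{
public:
  /* A plain file; FILENAME is assumed to be non-null.  */
  explicit source_id_sketch (const char *filename)
    : m_ptr (filename), m_len (0) {}

  /* Generated data of LEN bytes; LEN is assumed to be nonzero.  */
  source_id_sketch (const char *data, unsigned len)
    : m_ptr (data), m_len (len) {}

  bool is_buffer () const { return m_len != 0; }

  friend bool operator== (const source_id_sketch &a,
                          const source_id_sketch &b)
  {
    if (a.is_buffer () != b.is_buffer ())
      return false;
    if (a.is_buffer ())
      /* Buffers are the same only if they are the same memory.  */
      return a.m_ptr == b.m_ptr && a.m_len == b.m_len;
    /* Files are compared by name in this sketch.  */
    return std::strcmp (a.m_ptr, b.m_ptr) == 0;
  }

  friend bool operator!= (const source_id_sketch &a,
                          const source_id_sketch &b)
  { return !(a == b); }

private:
  const char *m_ptr;
  unsigned m_len;
};

/* A fix-it can only be applied when both ends are in the same real file.  */
bool
can_apply_fixit (const source_id_sketch &start, const source_id_sketch &next)
{
  return !(start != next || start.is_buffer ());
}
```

A call site that only ever prints a name can keep using FILE; one that
compares locations or reads source text needs the buffer-aware comparison.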

gcc/c-family/ChangeLog:

* c-format.cc (get_corrected_substring): Use the new overload of
location_get_source_line() to support generated data.
* c-indentation.cc (get_visual_column): Likewise.
(get_first_nws_vis_column): Change argument from a plain file name
to a source_id.
(detect_intervening_unindent): Likewise.
(should_warn_for_misleading_indentation): Pass
detect_intervening_unindent() the SRC field rather than the FILE
field from the expanded_location.

gcc/ChangeLog:

* gcc-rich-location.cc (blank_line_before_p): Use the new overload
of location_get_source_line() to support generated data.
* input.cc (get_source_text_between): Likewise.
(get_substring_ranges_for_loc): Likewise.
(get_source_file_content): Change the argument from a plain filename
to a source_id.
(location_missing_trailing_newline): Likewise.
* input.h (get_source_file_content): Adjust prototype.
(location_missing_trailing_newline): Likewise.
* diagnostic-show-locus.cc (layout::calculate_x_offset_display): Use
the new overload of location_get_source_line() to support generated
data.
(layout::print_line): Likewise.
(class line_corrections): Change m_filename from a plain filename to
a source_id.
(source_line::source_line): Change argument from a plain filename to
a source_id.
(line_corrections::add_hint): Adapt to source_line change.
(layout::print_trailing_fixits): Adapt to line_corrections change.
(test_layout_x_offset_display_utf8): Test generated data too.
(test_layout_x_offset_display_tab): Likewise.
(test_diagnostic_show_locus_one_liner): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_add_location_if_nearby): Likewise.
(test_diagnostic_show_locus_fixit_lines): Likewise.
(test_fixit_consolidation): Likewise.
(test_overlapped_fixit_printing): Likewise.
(test_overlapped_fixit_printing_utf8): Likewise.
(test_overlapped_fixit_printing_2): Likewise.
(test_fixit_insert_containing_newline): Likewise.
(test_fixit_insert_containing_newline_2): Likewise.
(test_fixit_replace_containing_newline): Likewise.
(test_fixit_deletion_affecting_newline): Likewise.
(test_tab_expansion): Likewise.
(test_escaping_bytes_1): Likewise.
(test_escaping_bytes_2): Likewise.
(test_line_numbers_multiline_range): Likewise.
(diagnostic_show_locus_cc_tests): Likewise.
---
 gcc/c-family/c-format.cc  |   2 +-
 gcc/c-family/c-indentation.cc |   8 +-
 gcc/diagnostic-show-locus.cc  | 227 ++
 gcc/gcc-rich-location.cc  |   2 +-
 gcc/input.cc  |  21 ++--
 gcc/input.h   |   6 +-
 6 files changed, 136 insertions(+), 130 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 529b1408179..929ec24622c 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4537,7 +4537,7 @@ get_corrected_substring (const substring_loc _loc,
   if (caret.column > finish.column)
 return NULL;
 
-  char_span line = location_get_source_line (start.file, start.line);
+  char_span line = location_get_source_line (start);
   if (!line)
 return NULL;
 
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index fce74991aae..27a90d9cc15 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
   unsigned int *first_nws,
   unsigned int tab_width)
 {
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   if (!line)
 return false;
   if ((size_t)exploc.column > line.length ())
@@ -87,7 +87,7 @@ get_visual_column (expanded_location exploc,
Otherwise, return false, leaving *FIRST_NWS untouched.  */
 
 static bool
-get_first_nws_vis_column (const char *file, int line_num,
+get_first_nws_vis_column (source_id file, int line_num,
  unsigned int *first_nws,
  unsigned int tab_width)
 {
@@ -158,7 +158,7 @@ get_first_nws_vis_column (const char *file, int line_num,
Return true if such an unindent/outdent is detected.  */
 
 static bool
-detect_intervening_unindent (const char *file,

[PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Add selftests for the new capabilities in input.cc related to source code
locations that are stored in memory rather than ordinary files.

gcc/ChangeLog:

* input.cc (temp_source_file::do_linemap_add): New function.
(line_table_case::line_table_case): Add GENERATED_DATA argument.
(line_table_test::line_table_test): Implement new M_GENERATED_DATA
argument.
(for_each_line_table_case): Optionally include generated data
locations in the set of cases.
(test_accessing_ordinary_linemaps): Test generated data locations.
(test_make_location_nonpure_range_endpoints): Likewise.
(test_line_offset_overflow): Likewise.
(input_cc_tests): Likewise.
* selftest.cc (named_temp_file::named_temp_file): Interpret a null
SUFFIX argument as a request to use in-memory data.
(named_temp_file::~named_temp_file): Support in-memory data.
(temp_source_file::temp_source_file): Likewise.
(temp_source_file::~temp_source_file): Likewise.
* selftest.h (struct line_map_ordinary): Forward declare.
(class named_temp_file): Add missing explicit to the constructor.
(class temp_source_file): Add new members to support in-memory data.
(class line_table_test): Likewise.
(for_each_line_table_case): Adjust prototype.
---
 gcc/input.cc| 81 +
 gcc/selftest.cc | 53 +---
 gcc/selftest.h  | 19 ++--
 3 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index 790279d4273..8c4e40aaf23 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2066,6 +2066,20 @@ get_num_source_ranges_for_substring (cpp_reader *pfile,
 
 /* Selftests of location handling.  */
 
+/* Wrapper around linemap_add to handle transparently adding either a tmp file,
+   or in-memory generated content.  */
+const line_map_ordinary *
+temp_source_file::do_linemap_add (int line)
+{
+  const line_map *map;
+  if (content_buf)
+map = linemap_add (line_table, LC_GEN, false, content_buf,
+  line, content_len);
+  else
+map = linemap_add (line_table, LC_ENTER, false, get_filename (), line);
+  return linemap_check_ordinary (map);
+}
+
 /* Verify that compare() on linenum_type handles comparisons over the full
range of the type.  */
 
@@ -2144,13 +2158,16 @@ assert_loceq (const char *exp_filename, int exp_linenum, int exp_colnum,
 class line_table_case
 {
 public:
-  line_table_case (int default_range_bits, int base_location)
+  line_table_case (int default_range_bits, int base_location,
+  bool generated_data)
   : m_default_range_bits (default_range_bits),
-m_base_location (base_location)
+m_base_location (base_location),
+m_generated_data (generated_data)
   {}
 
   int m_default_range_bits;
   int m_base_location;
+  bool m_generated_data;
 };
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2167,6 +2184,7 @@ line_table_test::line_table_test ()
   gcc_assert (saved_line_table->round_alloc_size);
   line_table->round_alloc_size = saved_line_table->round_alloc_size;
   line_table->default_range_bits = 0;
+  m_generated_data = false;
 }
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2188,6 +2206,7 @@ line_table_test::line_table_test (const line_table_case &case_)
   line_table->highest_location = case_.m_base_location;
   line_table->highest_line = case_.m_base_location;
 }
+  m_generated_data = case_.m_generated_data;
 }
 
 /* Destructor.  Restore the old value of line_table.  */
@@ -2207,7 +2226,10 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
   line_table_test ltt (case_);
 
   /* Build a simple linemap describing some locations. */
-  linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
+  if (ltt.m_generated_data)
+linemap_add (line_table, LC_GEN, false, "some data", 0, 10);
+  else
+linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
 
   linemap_line_start (line_table, 1, 100);
   location_t loc_a = linemap_position_for_column (line_table, 1);
@@ -2257,21 +2279,23 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
   linemap_add (line_table, LC_LEAVE, false, NULL, 0);
 
   /* Verify that we can recover the location info.  */
-  assert_loceq ("foo.c", 1, 1, loc_a);
-  assert_loceq ("foo.c", 1, 23, loc_b);
-  assert_loceq ("foo.c", 2, 1, loc_c);
-  assert_loceq ("foo.c", 2, 17, loc_d);
-  assert_loceq ("foo.c", 3, 700, loc_e);
-  assert_loceq ("foo.c", 4, 100, loc_back_to_short);
+  const auto fname
+= (ltt.m_generated_data ? special_fname_generated () : "foo.c");
+  assert_loceq (fname, 1, 1, loc_a);
+  assert_loceq (fname, 1, 23, loc_b);
+  assert_loceq (fname, 2, 1, loc_c);
+  assert_loceq (fname, 2, 17, loc_d);
+  assert_loceq (fname, 3, 700, loc_e);
+  assert_loceq (fname, 4, 100, loc_back_to_short);
 
   /* In the very wide line, the 

[PATCH v4 3/8] diagnostics: Refactor class file_cache_slot

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Class file_cache_slot in input.cc is used to query specific lines of source
code from a file when needed by diagnostics infrastructure. This will be
extended in a subsequent patch to support obtaining the source code from
in-memory generated buffers rather than from a file. The present patch
refactors class file_cache_slot, putting most of the logic into a new base
class cache_data_source, in preparation for reusing that code in the next
patch. There is no change in functionality yet.
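
To illustrate the shape of this refactoring, here is a minimal sketch of the "simplified std::istream" idea: a base class does the line splitting with plain pointer arithmetic and makes a virtual call only when it runs out of data. This is not the code from the patch; the names line_source, memory_source, chunked_source and read_all_lines are hypothetical, and the chunked class only simulates incremental file reads with a std::string.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

/* Hypothetical, simplified analogue of the new base class.  */
class line_source
{
public:
  virtual ~line_source () {}

  /* Store the next line (without its '\n') in *LINE / *LEN.  Return false
     once all data has been consumed.  */
  bool get_next_line (const char **line, std::size_t *len)
  {
    assert (line && len);
    std::size_t scanned = 0;  /* Bytes already known to contain no '\n'.  */
    for (;;)
      {
        /* Recompute from the offset: get_more_data () may move the buffer.  */
        const char *start = m_begin + m_pos;
        const std::size_t avail = static_cast<std::size_t> (m_end - start);
        const void *nl = (avail > scanned
                          ? std::memchr (start + scanned, '\n', avail - scanned)
                          : nullptr);
        if (nl)
          {
            *line = start;
            *len = static_cast<std::size_t>
              (static_cast<const char *> (nl) - start);
            m_pos += *len + 1;
            return true;
          }
        scanned = avail;
        if (!get_more_data ())
          {
            if (avail == 0)
              return false;
            /* Final line with no trailing newline.  */
            *line = start;
            *len = avail;
            m_pos += avail;
            return true;
          }
      }
  }

protected:
  line_source () : m_begin (nullptr), m_end (nullptr), m_pos (0) {}
  const char *m_begin;
  const char *m_end;
  /* The only virtual call on the hot path; return true if data was added.  */
  virtual bool get_more_data () = 0;

private:
  std::size_t m_pos;
};

/* All data is already in memory and is not owned by this object.  */
class memory_source final : public line_source
{
public:
  memory_source (const char *data, std::size_t len)
  { m_begin = data; m_end = data + len; }
protected:
  bool get_more_data () override { return false; }
};

/* Simulates a file that is read a few bytes at a time into an owned buffer.  */
class chunked_source final : public line_source
{
public:
  chunked_source (const std::string &backing, std::size_t chunk)
    : m_backing (backing), m_chunk (chunk ? chunk : 1), m_consumed (0) {}
protected:
  bool get_more_data () override
  {
    if (m_consumed == m_backing.size ())
      return false;
    const std::size_t n = std::min (m_chunk, m_backing.size () - m_consumed);
    m_buf.append (m_backing, m_consumed, n);
    m_consumed += n;
    m_begin = m_buf.data ();  /* The buffer may have been reallocated.  */
    m_end = m_begin + m_buf.size ();
    return true;
  }
private:
  std::string m_backing, m_buf;
  std::size_t m_chunk, m_consumed;
};

/* Copy each line out immediately, since later reads may invalidate it.  */
std::vector<std::string> read_all_lines (line_source &src)
{
  std::vector<std::string> out;
  const char *line;
  std::size_t len;
  while (src.get_next_line (&line, &len))
    out.push_back (std::string (line, len));
  return out;
}
```

Both derived classes yield the same lines for the same text; only the way the data arrives differs, which is the property the refactoring relies on.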

gcc/ChangeLog:

* input.cc (class file_cache_slot): Refactor functionality into a
new base class...
(class cache_data_source): ...here.
(file_cache::forcibly_evict_file): Adapt for refactoring.
(file_cache_slot::evict): Renamed to...
(file_cache_slot::reset): ...this, and partially refactored into
base class...
(cache_data_source::reset): ...here.
(file_cache_slot::get_full_file_content): Moved into base class...
(cache_data_source::get_full_file_content): ...here.
(file_cache_slot::create): Adapt for refactoring.
(file_cache_slot::file_cache_slot): Refactor partially into...
(cache_data_source::cache_data_source): ...here.
(file_cache_slot::~file_cache_slot): Refactor partially into...
(cache_data_source::~cache_data_source): ...here.
(file_cache_slot::needs_read_p): Remove.
(file_cache_slot::needs_grow_p): Remove.
(file_cache_slot::maybe_grow): Adapt for refactoring.
(file_cache_slot::read_data): Refactored, along with...
(file_cache_slot::maybe_read_data): this, into...
(file_cache_slot::get_more_data): ...here.
(find_end_of_line): Change interface to take a pair of pointers,
rather than a pointer + length.
(file_cache_slot::get_next_line): Refactored into...
(cache_data_source::get_next_line): ...here.
(file_cache_slot::goto_next_line): Refactored into...
(cache_data_source::goto_next_line): ...here.
(file_cache_slot::read_line_num): Refactored into...
(cache_data_source::read_line_num): ...here.
(location_get_source_line): Fix const-correctness as necessitated by
new interface.
---
 gcc/input.cc | 513 +++
 1 file changed, 235 insertions(+), 278 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index c2559614a99..9377020b460 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -55,34 +55,88 @@ file_cache::initialize_input_context (diagnostic_input_charset_callback ccb,
   in_context.should_skip_bom = should_skip_bom;
 }
 
-/* This is a cache used by get_next_line to store the content of a
-   file to be searched for file lines.  */
-class file_cache_slot
+/* This is an abstract interface for a class that provides data which we want to
+   look up by line number.  Concrete implementations will follow, which handle
+   the cases of reading the data from the input source files, or of reading it
+   from in-memory generated data buffers.  The design is driven with reading
+   from files in mind, in particular it is desirable to read only as much of a
+   file from disk as necessary.  It works like a simplified std::istream, i.e.
+   virtual function calls are only needed when we need to retrieve more data
+   from the underlying source.  */
+
+class cache_data_source
 {
-public:
-  file_cache_slot ();
-  ~file_cache_slot ();
 
-  bool read_line_num (size_t line_num,
- char ** line, ssize_t *line_len);
-
-  /* Accessors.  */
-  const char *get_file_path () const { return m_file_path; }
+public:
+  bool read_line_num (size_t line_num, const char **line, ssize_t *line_len);
   unsigned get_use_count () const { return m_use_count; }
+  void inc_use_count () { m_use_count++; }
+  bool get_next_line (const char **line, ssize_t *line_len);
+  bool goto_next_line ();
   bool missing_trailing_newline_p () const
   {
 return m_missing_trailing_newline;
   }
   char_span get_full_file_content ();
+  bool unused () const { return !m_data_begin; }
+  virtual void reset ();
+
+protected:
+  cache_data_source ();
+  virtual ~cache_data_source ();
+
+  /* These pointers delimit the data that we are processing.  They are
+ maintained by the derived classes, we only ask for more by calling
+ get_more_data().  That function should return TRUE if more data was
+ obtained.  Calling get_more_data () may invalidate these pointers
+ (i.e. reallocating them to a larger buffer).  */
+  const char *m_data_begin;
+  const char *m_data_end;
+  virtual bool get_more_data () = 0;
+
+  /* This is to be called by the derived classes when this object is
+ being activated.  */
+  void on_create (unsigned int use_count, size_t total_lines)
+  {
+m_use_count = use_count;
+m_total_lines = total_lines;
+  }
 
-  void inc_use_count () { m_use_count++; }
+private:
+  /* Non-copyable.  */
+  cache_data_source (const 

[PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers

2023-08-09 Thread Lewis Hyatt via Gcc-patches
This patch enhances location_get_source_line(), which is the primary
interface provided by the diagnostics infrastructure to obtain the line of
source code corresponding to a given location, so that it understands
generated data locations in addition to normal file-based locations. This
involves changing the argument to location_get_source_line() from a plain
file name, to a source_id object that can represent either type of location.
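
As a rough sketch of how a cache for generated buffers can work, the code below looks a buffer up by pointer identity (as represents_data does in this patch) and, on a miss, reuses the first empty slot or else the least-used one. The names data_slot and tiny_data_cache are hypothetical, and the policy of giving a new entry a use count one above the current highest is an assumption for illustration, not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

/* Hypothetical slot describing an in-memory buffer.  It does not own the
   data, so the buffer must outlive the slot.  */
struct data_slot
{
  const char *begin = nullptr;
  unsigned len = 0;
  unsigned use_count = 0;

  /* Pointer equality is enough: a buffer lives in one persistent place.  */
  bool represents (const char *data) const { return begin && begin == data; }
};

class tiny_data_cache
{
public:
  static constexpr unsigned num_slots = 4;

  /* Return the slot for DATA and bump its use count, or null on a miss.  */
  data_slot *lookup (const char *data)
  {
    for (unsigned i = 0; i != num_slots; ++i)
      if (m_slots[i].represents (data))
        {
          ++m_slots[i].use_count;
          return &m_slots[i];
        }
    return nullptr;
  }

  /* Return the slot for DATA, creating it if needed.  */
  data_slot *lookup_or_add (const char *data, unsigned len)
  {
    assert (data && len > 0);
    if (data_slot *hit = lookup (data))
      return hit;

    /* Pick the first empty slot, else the least-used one, while also
       tracking the highest use count seen.  */
    data_slot *victim = &m_slots[0];
    unsigned highest = 0;
    bool found_empty = false;
    for (unsigned i = 0; i != num_slots; ++i)
      {
        data_slot *s = &m_slots[i];
        if (s->use_count > highest)
          highest = s->use_count;
        if (found_empty)
          continue;
        if (!s->begin)
          {
            victim = s;
            found_empty = true;
          }
        else if (s->use_count < victim->use_count)
          victim = s;
      }

    /* Assumption: a fresh entry starts just above the current maximum so
       that it is not evicted immediately.  */
    victim->begin = data;
    victim->len = len;
    victim->use_count = highest + 1;
    return victim;
  }

private:
  data_slot m_slots[num_slots];
};
```

Two buffers with identical contents but different addresses occupy two slots here, which matches the trade-off accepted in the comment on represents_data.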

gcc/ChangeLog:

* input.cc (class data_cache_slot): New class.
(file_cache::lookup_data): New function.
(diagnostics_file_cache_forcibly_evict_data): New function.
(file_cache::forcibly_evict_data): New function.
(file_cache::evicted_cache_tab_entry): Generalize (via a template)
to work for both file_cache_slot and data_cache_slot.
(file_cache::add_file): Adapt for new interface to
evicted_cache_tab_entry.
(file_cache::add_data): New function.
(data_cache_slot::create): New function.
(file_cache::file_cache): Support the new m_data_slots member.
(file_cache::~file_cache): Likewise.
(file_cache::lookup_or_add_data): New function.
(file_cache::lookup_or_add): New function that calls either
lookup_or_add_data or lookup_or_add_file as appropriate.
(location_get_source_line): Change the FILE_PATH argument to a
source_id SRC, and use it to support obtaining source lines from
generated data as well as from files.
(location_compute_display_column): Support generated data using the
new features of location_get_source_line.
(dump_location_info): Likewise.
* input.h (location_get_source_line): Adjust prototype. Add a new
convenience overload taking an expanded_location.
(class cache_data_source): Declare.
(class data_cache_slot): Declare.
(class file_cache): Declare new members.
(diagnostics_file_cache_forcibly_evict_data): Declare.
---
 gcc/input.cc | 171 ---
 gcc/input.h  |  23 +--
 2 files changed, 153 insertions(+), 41 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index 9377020b460..790279d4273 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -207,6 +207,28 @@ private:
   void maybe_grow ();
 };
 
+/* This is the implementation of cache_data_source for generated
+   data that is already in memory.  */
+class data_cache_slot final : public cache_data_source
+{
+public:
+  void create (const char *data, unsigned int data_len,
+  unsigned int highest_use_count);
+  bool represents_data (const char *data, unsigned int) const
+  {
+/* We can just use pointer equality here since the generated data lives in
+   memory in one persistent place.  It isn't anticipated there would be
+   several generated data buffers with the same content, so we don't mind
+   that in such a case we will store it twice.  */
+return m_data_begin == data;
+  }
+
+protected:
+  /* In contrast to file_cache_slot, we do not own a buffer.  The buffer
+ passed to create() needs to outlive this object.  */
+  bool get_more_data () override { return false; }
+};
+
 /* Current position in real source file.  */
 
 location_t input_location = UNKNOWN_LOCATION;
@@ -382,6 +404,21 @@ file_cache::lookup_file (const char *file_path)
   return r;
 }
 
+data_cache_slot *
+file_cache::lookup_data (const char *data, unsigned int data_len)
+{
+  for (unsigned int i = 0; i != num_file_slots; ++i)
+{
+  const auto slot = m_data_slots + i;
+  if (slot->represents_data (data, data_len))
+   {
+ slot->inc_use_count ();
+ return slot;
+   }
+}
+  return nullptr;
+}
+
 /* Purge any mention of FILENAME from the cache of files used for
printing source code.  For use in selftests when working
with tempfiles.  */
@@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
   global_dc->m_file_cache->forcibly_evict_file (file_path);
 }
 
+void
+diagnostics_file_cache_forcibly_evict_data (const char *data,
+   unsigned int data_len)
+{
+  if (!global_dc->m_file_cache)
+return;
+  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
+}
+
 void
 file_cache::forcibly_evict_file (const char *file_path)
 {
@@ -410,36 +456,36 @@ file_cache::forcibly_evict_file (const char *file_path)
   r->reset ();
 }
 
+void
+file_cache::forcibly_evict_data (const char *data, unsigned int data_len)
+{
+  if (auto r = lookup_data (data, data_len))
+r->reset ();
+}
+
 /* Return the cache that has been less used, recently, or the
first empty one.  If HIGHEST_USE_COUNT is non-null,
*HIGHEST_USE_COUNT is set to the highest use count of the entries
in the cache table.  */
 
-file_cache_slot*
-file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
+template 
+Slot *
+file_cache::evicted_cache_tab_entry (Slot 

[PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output

2023-08-09 Thread Lewis Hyatt via Gcc-patches
The diagnostics routines for SARIF output need to read the source code back
in, so that they can generate "snippet" and "content" records, so they need to
be able to cope with generated data locations.  Add support for that in
diagnostic-format-sarif.cc.

gcc/ChangeLog:

* diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
to support generated data locations.
(sarif_builder::maybe_make_physical_location_object): Change the
m_filenames hash_set to support generated data.
(sarif_builder::make_artifact_location_object): Use a source_id rather
than a plain file name.
(sarif_builder::maybe_make_region_object): Adapt to
expanded_location interface changes.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
(sarif_builder::make_artifact_object): Likewise.
(sarif_builder::make_run_object): Handle generated data.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc| 88 +++
 .../diagnostic-format-sarif-file-5.c  | 31 +++
 2 files changed, 82 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 1eff71962d7..c7c0e5d4b0a 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -174,7 +174,7 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+  json::object *make_artifact_location_object (source_id src);
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -197,9 +197,9 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
+  json::object *make_artifact_object (source_id src);
+  json::object *maybe_make_artifact_content_object (source_id src) const;
+  json::object *maybe_make_artifact_content_object (source_id src,
int start_line,
int end_line) const;
   json::object *make_fix_object (const rich_location _loc);
@@ -220,7 +220,11 @@ private:
  diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set  m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+ with that length, not a filename.  */
+  hash_set ,
+  int_hash  >
+   > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set  m_rule_id_set;
   json::array *m_rules_arr;
@@ -787,7 +791,8 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  const auto src = LOCATION_SRC (loc);
+  m_filenames.add ({src.get_filename_or_buffer (), src.get_buffer_len ()});
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -811,7 +816,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (LOCATION_SRC (loc));
 }
 
 /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4)
@@ -823,10 +828,13 @@ sarif_builder::make_artifact_location_object (location_t loc)
or return NULL.  */
 
 json::object *
-sarif_builder::make_artifact_location_object (const char *filename)
+sarif_builder::make_artifact_location_object (source_id src)
 {
   json::object *artifact_loc_obj = new json::object ();
 
+  const auto filename = src.is_buffer ()
+? special_fname_generated () : src.get_filename_or_buffer ();
+
   /* "uri" property (SARIF v2.1.0 section 3.4.3).  */
   artifact_loc_obj->set ("uri", new json::string (filename));
 
@@ -912,9 +920,9 @@ sarif_builder::maybe_make_region_object (location_t loc) 

[PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-08-09 Thread Lewis Hyatt via Gcc-patches
Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as, for instance, referring to the content
of builtin macros (not yet implemented, but an easy lift after this one.) The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The TO_FILE member of struct line_map_ordinary has been changed to a union
named SRC which can be either a file name, or a pointer to a line_map_data
struct describing the data. There is no space overhead added to the line
maps data structures.
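
The following is a minimal sketch of the idea, not the definitions from line-map.h. The types gen_data, map_src, source_ref and ordinary_map_sketch are hypothetical stand-ins; the accessor names mirror ones used elsewhere in this series (is_buffer, get_filename_or_buffer, get_buffer_len). The comparison rule (strcmp for file names, pointer identity for buffers) is an assumption based on the comments in the patches. The static_assert shows why a pointer-sized union adds no space on typical platforms.

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical description of a generated buffer.  */
struct gen_data
{
  const char *data;
  unsigned len;
};

/* Either a file name or a pointer to the buffer description.  */
union map_src
{
  const char *file;
  const gen_data *data;
};
static_assert (sizeof (map_src) == sizeof (const char *),
               "the union is no larger than the pointer it replaces");

/* A value naming either a file or an in-memory buffer.  */
class source_ref
{
public:
  source_ref (const char *filename) : m_ptr (filename), m_len (0) {}
  source_ref (const char *buffer, unsigned len) : m_ptr (buffer), m_len (len)
  {
    assert (len > 0);  /* A length of 0 is reserved to mean "file name".  */
  }

  bool is_buffer () const { return m_len != 0; }
  const char *get_filename_or_buffer () const { return m_ptr; }
  unsigned get_buffer_len () const { return m_len; }

  /* Assumed semantics: file names compare by content, buffers by identity.  */
  friend bool operator== (const source_ref &a, const source_ref &b)
  {
    if (a.is_buffer () != b.is_buffer ())
      return false;
    if (a.is_buffer ())
      return a.m_ptr == b.m_ptr && a.m_len == b.m_len;
    if (!a.m_ptr || !b.m_ptr)
      return a.m_ptr == b.m_ptr;
    return std::strcmp (a.m_ptr, b.m_ptr) == 0;
  }
  friend bool operator!= (const source_ref &a, const source_ref &b)
  {
    return !(a == b);
  }

private:
  const char *m_ptr;
  unsigned m_len;
};

/* Hypothetical cut-down map: IS_GENERATED stands in for "reason == LC_GEN"
   and selects which union member is active.  */
struct ordinary_map_sketch
{
  bool is_generated;
  map_src src;
};

inline source_ref map_source (const ordinary_map_sketch &m)
{
  return (m.is_generated
          ? source_ref (m.src.data->data, m.src.data->len)
          : source_ref (m.src.file));
}
```

Because the file-name constructor is implicit, a call site can compare a source_ref directly against a plain string, in the same way the c-common.cc hunk compares ORDINARY_MAP_SOURCE_ID (from) with file.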

Outside libcpp, this patch includes only the minimal changes implied by the
adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
patches will implement the new functionality.

libcpp/ChangeLog:

* include/line-map.h (enum lc_reason): Add LC_GEN.
(struct line_map_data): New struct.
(struct line_map_ordinary): Change TO_FILE from a char* to a union,
and rename to SRC.
(class source_id): New class.
(ORDINARY_MAP_GENERATED_DATA_P): New function.
(ORDINARY_MAP_GENERATED_DATA): New function.
(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
(ORDINARY_MAP_SOURCE_ID): New function.
(ORDINARY_MAPS_SAME_FILE_P): New function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
(LINEMAP_FILE): Adapt to struct line_map_ordinary change.
(linemap_get_file_highest_location): Likewise.
* line-map.cc (source_id::operator==): New function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
(linemap_add): Support creating LC_GEN maps.
(linemap_line_start): Support LC_GEN maps.
(linemap_check_files_exited): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
* directives.cc (_cpp_do_file_change): Likewise.

gcc/c-family/ChangeLog:

* c-common.cc (try_to_locate_new_include_insertion_point): Recognize
and ignore LC_GEN maps.

gcc/cp/ChangeLog:

* module.cc (module_state::write_ordinary_maps): Recognize and
ignore LC_GEN maps, and adapt to interface change in struct
line_map_ordinary.
(module_state::read_ordinary_maps): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (compatible_locations_p): Adapt to
interface change in struct line_map_ordinary.
* input.cc (special_fname_generated): New function.
(dump_location_info): Support LC_GEN maps.
(get_substring_ranges_for_loc): Adapt to interface change in struct
line_map_ordinary.
* input.h (special_fname_generated): Declare.

gcc/go/ChangeLog:

* go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
LC_GEN maps.
---
 gcc/c-family/c-common.cc |  11 ++-
 gcc/cp/module.cc |   8 +-
 gcc/diagnostic-show-locus.cc |   2 +-
 gcc/go/go-linemap.cc |   3 +-
 gcc/input.cc |  27 +-
 gcc/input.h  |   1 +
 libcpp/directives.cc |   4 +-
 libcpp/include/line-map.h| 144 
 libcpp/line-map.cc   | 181 +--
 9 files changed, 299 insertions(+), 82 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9fbaeb437a1..ecfc2efc29f 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9206,19 +9206,22 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
   const line_map_ordinary *ord_map
= LINEMAPS_ORDINARY_MAP_AT (line_table, i);
 
+  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+   continue;
+
   if (const line_map_ordinary *from
  = linemap_included_from_linemap (line_table, ord_map))
/* We cannot use pointer equality, because with preprocessed
   input all filename strings are unique.  */
-   if (0 == strcmp (from->to_file, file))
+   if (ORDINARY_MAP_SOURCE_ID (from) == file)
  {
last_include_ord_map = from;
last_ord_map_after_include = NULL;
  }
 
-  /* Likewise, use strcmp, and reject any line-zero introductory
-map.  */
-  if (ord_map->to_line && 0 == strcmp (ord_map->to_file, file))
+  /* Likewise, use strcmp (via the source_id comparison), and reject any
+line-zero introductory map.  */
+  if (ord_map->to_line && ORDINARY_MAP_SOURCE_ID (ord_map) == file)
{
  if (!first_ord_map_in_file)
first_ord_map_in_file = ord_map;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ea362bdffa4..ff17cd57016 100644
--- 

[PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations

2023-08-09 Thread Lewis Hyatt via Gcc-patches
The previous patch in this series introduced the concept of LC_GEN line
maps. This patch continues on the path to using them to improve _Pragma
diagnostics, by adding a new source_id SRC member to struct
expanded_location, which is populated by linemap_expand_location. This
member allows call sites to detect and handle when a location refers to
generated data rather than a plain file name.

The previous FILE member of expanded_location is preserved (although
redundant with SRC), so that call sites which do not and never will care
about generated data do not need to be concerned about it. Call sites that
will care are modified here, to use SRC rather than FILE for comparing
locations.
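
A small sketch may help show why such call sites compare SRC rather than FILE. It assumes, as the input.cc ChangeLog entry below suggests, that the FILE member of a location in generated data holds one shared placeholder name, so FILE alone cannot tell two buffers apart. The names exploc_sketch, src_key, same_source and placeholder_fname are hypothetical, and the placeholder string is invented for illustration; it is not the value returned by special_fname_generated ().

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical stand-in for the name reported for generated data.  */
static const char placeholder_fname[] = "(hypothetical placeholder name)";

/* LEN > 0 means PTR is a buffer; LEN == 0 means PTR is a file name.  */
struct src_key
{
  const char *ptr;
  unsigned len;
};

inline bool same_source (const src_key &a, const src_key &b)
{
  if ((a.len != 0) != (b.len != 0))
    return false;
  if (a.len != 0)
    return a.ptr == b.ptr && a.len == b.len;
  return std::strcmp (a.ptr, b.ptr) == 0;
}

/* Cut-down analogue of an expanded location carrying both members.  */
struct exploc_sketch
{
  const char *file;
  int line;
  int column;
  src_key src;
};

inline exploc_sketch expand_in_file (const char *fname, int line, int column)
{
  exploc_sketch e = { fname, line, column, { fname, 0 } };
  return e;
}

inline exploc_sketch expand_in_buffer (const char *buf, unsigned len,
                                       int line, int column)
{
  assert (len > 0);
  /* FILE gets the shared placeholder; only SRC identifies the buffer.  */
  exploc_sketch e = { placeholder_fname, line, column, { buf, len } };
  return e;
}
```

With this layout, two locations in different buffers have equal FILE pointers but different SRC values, so a check such as "are these in the same file?" gives the right answer only when it uses SRC.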

libcpp/ChangeLog:

* include/line-map.h (struct expanded_location): Add SRC member. Add
zero-initializers for all members, since source_id is not a POD
type.
(class fixit_hint): Adjust prototype.
* line-map.cc (linemap_expand_location): Populate the new SRC member
in the expanded_location.
(rich_location::maybe_add_fixit): Compare explocs with the new SRC
field instead of the FILE field.
(fixit_hint::affects_line_p): Accept a source_id instead of a file
name, and use it for the comparisons.

gcc/c-family/ChangeLog:

* c-format.cc (get_corrected_substring): Compare explocs with the
new SRC field instead of the FILE field.
* c-indentation.cc (should_warn_for_misleading_indentation): Likewise.
(assert_get_visual_column_succeeds): Initialize the SRC field in the
test expanded_location.
(assert_get_visual_column_fails): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (make_range): Adapt to the new
constructor semantics for struct expanded_location.
(layout::maybe_add_location_range): Compare explocs with the new SRC
field instead of the FILE field.
(layout::validate_fixit_hint_p): Likewise.
(layout::print_leading_fixits): Use the SRC field in struct
expanded_location to query fixit_hint::affects_line_p.
(layout::print_trailing_fixits): Likewise.
* diagnostic.cc (diagnostic_report_current_module): Use the new SRC
field in expanded_location to detect LC_GEN locations and identify
them as such.
(assert_location_text): Adapt to the new constructor semantics for
struct expanded_location.
* input.cc (expand_location_1): Likewise. And when libcpp's
linemap_expand_location returns a null FILE for generated data,
replace it with special_fname_generated ().
(total_lines_num): Handle a generic source_id argument rather than a
file name only.
(get_source_text_between): Compare explocs with the new SRC field
instead of the FILE field.
(get_substring_ranges_for_loc): Likewise.
* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.
* input.h (LOCATION_SRC): New accessor macro.
---
 gcc/c-family/c-format.cc  |  4 ++--
 gcc/c-family/c-indentation.cc | 10 +-
 gcc/diagnostic-show-locus.cc  | 30 +-
 gcc/diagnostic.cc | 19 ---
 gcc/edit-context.cc   |  2 +-
 gcc/input.cc  | 21 +++--
 gcc/input.h   |  1 +
 libcpp/include/line-map.h | 24 ++--
 libcpp/line-map.cc| 15 +++
 9 files changed, 70 insertions(+), 56 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index b4eeebcb30e..529b1408179 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4522,9 +4522,9 @@ get_corrected_substring (const substring_loc &fmt_loc,
 = expand_location_to_spelling_point (fmt_substring_range.m_start);
   expanded_location finish
 = expand_location_to_spelling_point (fmt_substring_range.m_finish);
-  if (caret.file != start.file)
+  if (caret.src != start.src)
 return NULL;
-  if (start.file != finish.file)
+  if (start.src != finish.src)
 return NULL;
   if (caret.line != start.line)
 return NULL;
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index e8d3dece770..fce74991aae 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -334,7 +334,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
   const unsigned int tab_width = global_dc->tabstop;
 
   /* They must be in the same file.  */
-  if (next_stmt_exploc.file != body_exploc.file)
+  if (next_stmt_exploc.src != body_exploc.src)
 return false;
 
   /* If NEXT_STMT_LOC and BODY_LOC are on the same line, consider
@@ -363,7 +363,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
   ^ DON'T WARN HERE.  */
   if (next_stmt_exploc.line == body_exploc.line)
 {
-  if (guard_exploc.file != 

[PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-08-09 Thread Lewis Hyatt via Gcc-patches
On Mon, Jul 31, 2023 at 06:39:15PM -0400, Lewis Hyatt wrote:
> On Fri, Jul 28, 2023 at 6:58 PM David Malcolm wrote:
> >
> > On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > > Add a new linemap reason LC_GEN which enables encoding the location of
> > > data that was generated during compilation and does not appear in any
> > > source file. There could be many use cases, such as, for instance,
> > > referring to the content of builtin macros (not yet implemented, but an
> > > easy lift after this one.) The first intended application is to create
> > > a place to store the input to a _Pragma directive, so that proper
> > > locations can be assigned to those tokens. This will be done in a
> > > subsequent commit.
> > >
> > > The actual change needed to the line-maps API in libcpp is not too
> > > large and requires no space overhead in the line map data structures
> > > (on 64-bit systems that is; one newly added data member to class
> > > line_map_ordinary sits inside former padding bytes.) An LC_GEN map is
> > > just an ordinary map like any other, but the TO_FILE member that
> > > normally points to the file name points instead to the actual data.
> > > This works automatically with PCH as well, for the same reason that the
> > > file name makes its way into a PCH.  In order to avoid confusion, the
> > > member has been renamed from TO_FILE to DATA, and associated accessors
> > > adjusted.
> > >
> > > Outside libcpp, there are many small changes but most of them are to
> > > selftests, which are necessarily more sensitive to implementation
> > > details. From the perspective of the user (the "user", here, being a
> > > frontend using line maps or else the diagnostics infrastructure), the
> > > chief visible change is that the function location_get_source_line()
> > > should be passed an expanded_location object instead of a separate
> > > filename and line number.  This is not a big change because in most
> > > cases, this information came anyway from a call to expand_location and
> > > the needed expanded_location object is readily available. The new
> > > overload of location_get_source_line() uses the extra information in
> > > the expanded_location object to obtain the data from the in-memory
> > > buffer when it originated from an LC_GEN map.
> > >
> > > Until the subsequent patch that starts using LC_GEN maps, none are yet
> > > generated within GCC, hence nothing is added to the testsuite here; but
> > > all relevant selftests have been extended to cover generated data maps
> > > in addition to normal files.
> >
> > [..snip...]
> >
> > Thanks for the updated patch.
> >
> > Reading this patch, it felt a bit unnatural to me to have an
> >   (expanded location, source line)
> > pair where the expanded location seems to be representing "which source
> > file or generated buffer", but the line/column info in that
> > expanded_location is to be ignored in favor of the 2nd source line.
> >
> > I think we're missing a class: something that identifies either a
> > specific source file, or a specific generated buffer.
> >
> > How about something like either:
> >
> > class source_id
> > {
> > public:
> >   source_id (const char *filename)
> >   : m_filename_or_buffer (filename),
> > m_len (0)
> >   {
> >   }
> >
> >   explicit source_id (const char *buffer, unsigned buffer_len)
> >   : m_filename_or_buffer (buffer),
> > m_len (buffer_len)
> >   {
> > linemap_assert (buffer_len > 0);
> >   }
> >
> > private:
> >   const char *m_filename_or_buffer;
> >   unsigned m_len;  // where 0 means "it's a filename"
> > };
> >
> > or:
> >
> > class source_id
> > {
> > public:
> >   source_id (const char *filename)
> >   : m_ptr (filename),
> > m_is_buffer (false)
> >   {
> >   }
> >
> >   explicit source_id (const line_map_ordinary *buffer_linemap)
> >   : m_ptr (buffer_linemap),
> > m_is_buffer (true)
> >   {
> >   }
> >
> > private:
> >   const void *m_ptr;
> >   bool m_is_buffer;
> > };
> >
> > and use one of these "source_id file" in place of "const char *file",
> > rather than replacing such things with expanded_location?
> >
> > > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > > index e8d3dece770..4164fa0b1ba 100644
> > > --- a/gcc/c-family/c-indentation.cc
> > > +++ b/gcc/c-family/c-indentation.cc
> > > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> > >  unsigned int *first_nws,
> > >  unsigned int tab_width)
> > >  {
> > > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > > +  char_span line = location_get_source_line (exploc);
> >
> > ...so this might continue to be:
> >
> >   char_span line = location_get_source_line (exploc.file, exploc.line);
> >
> > ...but expanded_location's "file" field would become a source_id,
> > rather than a const char *.  It 

Re: [PATCH] preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

2023-08-09 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 1, 2023 at 11:01 AM Joseph Myers wrote:
>
> On Mon, 31 Jul 2023, Lewis Hyatt via Gcc-patches wrote:
>
> > I added some additional testcases from the PR for x86. The other targets
> > that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
> > already had tests verifying that the pragma sets macros as expected; here I
> > have added -save-temps to some of them, to test that it now works in
> > preprocess-only mode as well.
>
> It would seem better to have copies of the tests with and without
> -save-temps, to test in both modes, rather than changing what's tested by
> an existing test here.  Or a test variant that #includes the original test
> but uses different options, if the original test isn't doing anything that
> would fail to work with that approach.

Thank you, I will adjust this.

-Lewis


[PATCH] preprocessor: c++: Support `#pragma GCC target' macros [PR87299]

2023-07-31 Thread Lewis Hyatt via Gcc-patches
`#pragma GCC target' is not currently handled in preprocess-only mode (e.g.,
when running gcc -E or gcc -save-temps). As noted in the PR, this means that
if the target pragma defines any macros, those macros are not effective in
preprocess-only mode. Similarly, such macros are not effective when
compiling with C++ (even when compiling without -save-temps), because C++
does not process the pragma until after all tokens have been obtained from
libcpp, at which point it is too late for macro expansion to take place.

Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas
under these conditions as well, so resolve the PR by using the new "early
pragma" support.
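
For illustration, here is a small sketch of the kind of code affected; it is not one of the testcases added by this patch. The function name avx2_macro_visible is hypothetical, and the pragma is guarded so that the snippet only exercises it with GCC on x86-64. It relies on the documented behavior that `#pragma GCC target ("avx2")' makes GCC define __AVX2__ from that point on. Whether the macro is actually visible depends on the compiler version and mode, which is the subject of this PR, so no particular result is claimed here.

```cpp
#include <cassert>

/* Only GCC on x86-64 is assumed to understand this target string; other
   compilers and targets skip the pragma entirely.  */
#if defined (__GNUC__) && !defined (__clang__) && defined (__x86_64__)
#pragma GCC push_options
#pragma GCC target ("avx2")
#define SKETCH_USED_TARGET_PRAGMA 1
#endif

/* Report whether the preprocessor saw __AVX2__ at this point in the file.
   When the pragma is handled too late for macro expansion, this can return
   false even though the function itself is compiled for AVX2.  */
bool avx2_macro_visible ()
{
#ifdef __AVX2__
  return true;
#else
  return false;
#endif
}

#ifdef SKETCH_USED_TARGET_PRAGMA
#pragma GCC pop_options
#endif
```

Running the same file through gcc -E and inspecting which branch of the #ifdef survives is one way to observe the preprocess-only case by hand.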

toplev.cc required some changes because the target-specific handlers for
`#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting
that function to be called in preprocess-only mode.

I added some additional testcases from the PR for x86. The other targets
that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390)
already had tests verifying that the pragma sets macros as expected; here I
have added -save-temps to some of them, to test that it now works in
preprocess-only mode as well.

gcc/c-family/ChangeLog:

PR preprocessor/87299
* c-pragma.cc (init_pragma): Register `#pragma GCC target' and
related pragmas in preprocess-only mode, and enable early handling.
(c_reset_target_pragmas): New function refactoring code from...
(handle_pragma_reset_options): ...here.
* c-pragma.h (c_reset_target_pragmas): Declare.

gcc/cp/ChangeLog:

PR preprocessor/87299
* parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas ()
after preprocessing is complete, before starting compilation.

gcc/ChangeLog:

PR preprocessor/87299
* toplev.cc (no_backend): New static global.
(finalize): Remove argument no_backend, which is now a
static global.
(process_options): Likewise.
(do_compile): Likewise.
(target_reinit): Don't do anything in preprocess-only mode.
(toplev::main): Adapt to no_backend change.
(toplev::finalize): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/87299
* c-c++-common/pragma-target-1.c: New test.
* c-c++-common/pragma-target-2.c: New test.
* g++.target/i386/pr87299-1.C: New test.
* g++.target/i386/pr87299-2.C: New test.
* gcc.target/i386/pr87299-1.c: New test.
* gcc.target/i386/pr87299-2.c: New test.
* gcc.target/s390/target-attribute/tattr-2.c: Add -save-temps to the
options, to test preprocess-only mode as well.
* gcc.target/aarch64/pragma_cpp_predefs_1.c: Likewise.
* gcc.target/arm/pragma_arch_attribute.c: Likewise.
* gcc.target/nios2/custom-fp-2.c: Likewise.
* gcc.target/powerpc/float128-3.c: Likewise.
---

Notes:
Hello-

This patch fixes the PR by enabling early pragma handling for `#pragma GCC
target' and related pragmas such as `#pragma GCC push_options'. I did not
need to touch any target-specific code; however, I did need to make a change
to toplev.cc, affecting all targets, to make it safe to call target_reinit()
in preprocess-only mode. (Otherwise, it would be necessary to modify the
implementation of target pragmas in every target, to avoid this code path.)
That was the only complication I ran into.

Regarding testing, I did the following (thanks to the GCC compile farm for the
non-x86 targets):

bootstrap + regtest all languages - x86_64-pc-linux-gnu
bootstrap + regtest c/c++ - powerpc64le-unknown-linux-gnu,
aarch64-unknown-linux-gnu

The following backends also implement this pragma so ought to be tested:
arm
nios2
s390

I am not able to test those directly. I did add coverage to their testsuites
(basically, adding -save-temps to an existing test causes it to test the
pragma in preprocess-only mode as well). Then I verified on x86_64, with a
cross compiler, that the modified testcases fail before the patch and pass
afterwards. nios2 is an exception: it does not set any libcpp macros when
handling the pragma, so there is nothing to test, but I did verify that
processing the pragma in preprocess-only mode does not cause any problems.
The cross compilers tested were for targets arm-unknown-linux-gnueabi,
nios2-unknown-linux, and s390-ibm-linux.

Please let me know if it looks OK? Thanks!

-Lewis

 gcc/c-family/c-pragma.cc  | 49 ---
 gcc/c-family/c-pragma.h   |  2 +-
 gcc/cp/parser.cc  |  6 +++
 gcc/testsuite/c-c++-common/pragma-target-1.c  | 19 +++
 gcc/testsuite/c-c++-common/pragma-target-2.c  | 27 ++
 gcc/testsuite/g++.target/i386/pr87299-1.C |  8 +++
 

Re: [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-07-31 Thread Lewis Hyatt via Gcc-patches
On Fri, Jul 28, 2023 at 6:58 PM David Malcolm wrote:
>
> On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > Add a new linemap reason LC_GEN which enables encoding the location of data
> > that was generated during compilation and does not appear in any source file.
> > There could be many use cases, such as, for instance, referring to the content
> > of builtin macros (not yet implemented, but an easy lift after this one.) The
> > first intended application is to create a place to store the input to a
> > _Pragma directive, so that proper locations can be assigned to those
> > tokens. This will be done in a subsequent commit.
> >
> > The actual change needed to the line-maps API in libcpp is not too large and
> > requires no space overhead in the line map data structures (on 64-bit systems
> > that is; one newly added data member to class line_map_ordinary sits inside
> > former padding bytes.) An LC_GEN map is just an ordinary map like any other,
> > but the TO_FILE member that normally points to the file name points instead to
> > the actual data.  This works automatically with PCH as well, for the same
> > reason that the file name makes its way into a PCH.  In order to avoid
> > confusion, the member has been renamed from TO_FILE to DATA, and associated
> > accessors adjusted.
> >
> > Outside libcpp, there are many small changes but most of them are to
> > selftests, which are necessarily more sensitive to implementation
> > details. From the perspective of the user (the "user", here, being a frontend
> > using line maps or else the diagnostics infrastructure), the chief visible
> > change is that the function location_get_source_line() should be passed an
> > expanded_location object instead of a separate filename and line number.  This
> > is not a big change because in most cases, this information came anyway from a
> > call to expand_location and the needed expanded_location object is readily
> > available. The new overload of location_get_source_line() uses the extra
> > information in the expanded_location object to obtain the data from the
> > in-memory buffer when it originated from an LC_GEN map.
> >
> > Until the subsequent patch that starts using LC_GEN maps, none are yet
> > generated within GCC, hence nothing is added to the testsuite here; but all
> > relevant selftests have been extended to cover generated data maps in addition
> > to normal files.
>
> [..snip...]
>
> Thanks for the updated patch.
>
> Reading this patch, it felt a bit unnatural to me to have an
>   (expanded location, source line)
> pair where the expanded location seems to be representing "which source
> file or generated buffer", but the line/column info in that
> expanded_location is to be ignored in favor of the 2nd source line.
>
> I think we're missing a class: something that identifies either a
> specific source file, or a specific generated buffer.
>
> How about something like either:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_filename_or_buffer (filename),
> m_len (0)
>   {
>   }
>
>   explicit source_id (const char *buffer, unsigned buffer_len)
>   : m_filename_or_buffer (buffer),
> m_len (buffer_len)
>   {
> linemap_assert (buffer_len > 0);
>   }
>
> private:
>   const char *m_filename_or_buffer;
>   unsigned m_len;  // where 0 means "it's a filename"
> };
>
> or:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_ptr (filename),
> m_is_buffer (false)
>   {
>   }
>
>   explicit source_id (const line_map_ordinary *buffer_linemap)
>   : m_ptr (buffer_linemap),
> m_is_buffer (true)
>   {
>   }
>
> private:
>   const void *m_ptr;
>   bool m_is_buffer;
> };
>
> and use one of these "source_id file" in place of "const char *file",
> rather than replacing such things with expanded_location?
>
> > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > index e8d3dece770..4164fa0b1ba 100644
> > --- a/gcc/c-family/c-indentation.cc
> > +++ b/gcc/c-family/c-indentation.cc
> > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> >  unsigned int *first_nws,
> >  unsigned int tab_width)
> >  {
> > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > +  char_span line = location_get_source_line (exploc);
>
> ...so this might continue to be:
>
>   char_span line = location_get_source_line (exploc.file, exploc.line);
>
> ...but expanded_location's "file" field would become a source_id,
> rather than a const char *.  It looks like doing so might turn a lot of
> "is this the same file or buffer?" checks into comparisons of source_id
> instances.
>
> So I think expanded_location would become:
>
> typedef struct
> {
>   /* Either the name of the source file involved, or the
>  specific generated buffer.  */
>   source_id file;
>
>   /* The line-location in 

Re: [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-07-29 Thread Lewis Hyatt via Gcc-patches
On Fri, Jul 28, 2023 at 6:22 PM David Malcolm wrote:
>
> On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > Hello-
> >
> > This is an update to the v2 patch series last sent in January:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html
> >
> > While I did not receive any feedback on the v2 patches yet, they did need some
> > rebasing on top of other recent commits to input.cc, so I thought it would be
> > helpful to send them again now. The patches have not otherwise changed from
> > v2, and the above-linked message explains how all the patches fit in with the
> > original v1 series sent last November.
> >
> > Dave, I would appreciate it very much if you could please let me know what you
> > think of this approach? I feel like the diagnostics we currently
> > output for _Pragmas are worth improving. As a reminder, say for this
> > example:
> >
> > =
> >  #define S "GCC diagnostic ignored \"oops"
> >  _Pragma(S)
> > =
> >
> > We currently output:
> >
> > =
> > file.cpp:2:24: warning: missing terminating " character
> > 2 | _Pragma(S)
> >   |^
> > =
> >
> > While after these patches, we would output:
> >
> > ==
> > :1:24: warning: missing terminating " character
> > 1 | GCC diagnostic ignored "oops
> >   |^
> > file.cpp:2:1: note: in <_Pragma directive>
> > 2 | _Pragma(S)
> >   | ^~~
> > ==
> >
> > Thanks!
>
> Hi Lewis; sorry for not responding to the v2 patches.
>
> I've started looking at the v3 patches in detail, but I have some high-
> level questions about memory usage:
>
> Am I right in thinking that the effect of this patch is that for every
> _Pragma in the source we will create a new line_map_ordinary, and a new
> buffer for the stringified content of that _Pragma, and that these
> allocations will persist for the rest of the compilation?  (plus a
> little extra allocation within the "location_t" space from 0 to
> 0x7fff).
>
> It sounds like this will probably be a rounding error that won't be
> noticeable in profiling, but did you attempt any such measurement of the
> memory usage before/after this patch on some real-world projects?
>
> Thanks
> Dave
>

Thanks for looking at the patches, I appreciate it whenever you have
time to get to them.

This is a fair point about the memory usage. Basically, it means that
each instance of a _Pragma has a memory footprint comparable to that of a
macro definition. (In addition to the overheads you mentioned, it also
creates a macro map to generate a virtual location for the tokens, so
that it's able to output the "in expansion of _Pragma" note. That part
can be disabled with -ftrack-macro-expansion=0 at least.)

I had the sense that _Pragma isn't used often enough for that to be a
problem, but agreed it is worth checking. (I really hope this memory
usage isn't an issue since there are also numerous PRs complaining
about 32-bit limitations in location tracking, that make it tempting
to explore 64-bit line maps or some other option someday too.)

I tried one thing just now: wxWidgets uses a lot of diagnostic pragmas
wrapped up inside macros that use _Pragma (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578). The testsuite
contains a file allheaders.cpp which includes the whole library, so I
tried compiling this into a PCH, which I believe measures the entire
memory footprint including the ordinary and macro line maps and the
_Pragma strings. The resulting PCH sizes were:

279000173 bytes before the changes
279491345 bytes after the changes

So about 0.18% bigger (491172 bytes). Happy to check other projects too; do you
have any standard go-tos? Maybe Firefox or something, I take it.
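
For reference, the percentage is just the size difference relative to the
original size. A minimal sketch of that arithmetic (the function name is
hypothetical, not anything in GCC):

```cpp
#include <cassert>

// Relative growth, in percent, when a size goes from BEFORE to AFTER bytes.
double growth_percent (long long before, long long after)
{
  assert (before > 0);
  return 100.0 * static_cast<double> (after - before)
         / static_cast<double> (before);
}
```

Calling growth_percent (279000173LL, 279491345LL) gives the figure for the two
PCH sizes listed above.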

I see your other response on patch #1, I am thinking about that and
will reply later. Thanks again!

-Lewis


Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-28 Thread Lewis Hyatt via Gcc-patches
On Thu, Jul 27, 2023 at 06:18:33PM -0700, Jason Merrill wrote:
> On 7/27/23 18:59, Lewis Hyatt wrote:
> > In order to support processing #pragma in preprocess-only mode (-E or
> > -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> > libcpp. In full compilation modes, this is accomplished by calling
> > pragma_lex (), which is a symbol that must be exported by the frontend, and
> > which is currently implemented for C and C++. Neither of those frontends
> > initializes its parser machinery in preprocess-only mode, and consequently
> > pragma_lex () does not work in this case.
> > 
> > Address that by adding a new function c_init_preprocess () for the frontends
> > to implement, which arranges for pragma_lex () to work in preprocess-only
> > mode, and adjusting pragma_lex () accordingly.
> > 
> > In preprocess-only mode, the preprocessor is accustomed to controlling the
> > interaction with libcpp, and it only knows about tokens that it has called
> > into libcpp itself to obtain. Since it still needs to see the tokens
> > obtained by pragma_lex () so that they can be streamed to the output, also
> > adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
> > inform the preprocessor about any tokens it won't be aware of.
> > 
> > Currently, there is one place where we are already supporting #pragma in
> > preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> > was done by directly interfacing with libcpp, rather than making use of
> > pragma_lex (). Now that pragma_lex () works, that code is no longer
> > necessary; remove it.
> > 
> > gcc/c-family/ChangeLog:
> > 
> > * c-common.h (c_init_preprocess): Declare.
> > (c_lex_enable_token_streaming): Declare.
> > * c-opts.cc (c_common_init): Call c_init_preprocess ().
> > * c-lex.cc (stream_tokens_to_preprocessor): New static variable.
> > (c_lex_enable_token_streaming): New function.
> > (cb_def_pragma): Add a comment.
> > (get_token): New function wrapping cpp_get_token.
> > (c_lex_with_flags): Use the new wrapper function to support
> > obtaining tokens in preprocess_only mode.
> > (lex_string): Likewise.
> > * c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
> > when needed.
> > * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> > (pragma_diagnostic_lex): ...this.
> > (pragma_diagnostic_lex_pp): Remove.
> > (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> > all modes.
> > (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> > usage.
> > * c-pragma.h (pragma_lex_discard_to_eol): Declare.
> > 
> > gcc/c/ChangeLog:
> > 
> > * c-parser.cc (pragma_lex_discard_to_eol): New function.
> > (c_init_preprocess): New function.
> > 
> > gcc/cp/ChangeLog:
> > 
> > * parser.cc (c_init_preprocess): New function.
> > (maybe_read_tokens_for_pragma_lex): New function.
> > (pragma_lex): Support preprocess-only mode.
> > (pragma_lex_discard_to_eol): New function.
> > ---
> > 
> > Notes:
> >  Hello-
> >  Here is version 2 of the patch, incorporating Jason's feedback from
> >  https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
> >  Thanks again, please let me know if it's OK? Bootstrap + regtest all
> >  languages on x86-64 Linux looks good.
> >  -Lewis
> > 
> >   gcc/c-family/c-common.h|  4 +++
> >   gcc/c-family/c-lex.cc  | 49 +
> >   gcc/c-family/c-opts.cc |  1 +
> >   gcc/c-family/c-ppoutput.cc | 17 +---
> >   gcc/c-family/c-pragma.cc   | 56 ++
> >   gcc/c-family/c-pragma.h|  2 ++
> >   gcc/c/c-parser.cc  | 21 ++
> >   gcc/cp/parser.cc   | 45 ++
> >   8 files changed, 138 insertions(+), 57 deletions(-)
> > 
> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index b5ef5ff6b2c..2fe2f194660 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -990,6 +990,9 @@ extern void c_parse_file (void);
> >   extern void c_parse_final_cleanups (void);
> > +/* This initializes for preprocess-only mode.  */
> > +extern void c_init_preprocess (void);
> > +
> >   /* These macros provide convenient access to the various _STMT nodes.  */
> >   /* Nonzero if a given STATEMENT_LIST represents the outermost binding
> > @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
> >   /* In c-lex.cc.  */
> >   extern enum cpp_ttype
> >   conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
> > +extern void c_lex_enable_token_streaming (bool enabled);
> >   /* In c-pch.cc  */
> >   extern void pch_init (void);
> > diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> > index dcd061c7cb1..ac4c018d863 100644
> > --- a/gcc/c-family/c-lex.cc
> > +++ b/gcc/c-family/c-lex.cc
> > @@ -57,6 +57,17 @@ static void 

[PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-27 Thread Lewis Hyatt via Gcc-patches
In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.
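
To make the idea concrete, here is a minimal sketch of the wrapper approach.
It is not the code from the patch: the types and every name ending in _sketch
are hypothetical stand-ins for cpp_reader, cpp_token and the real functions,
and the "output" is just a vector of spellings. The point is only that all
token fetches go through one wrapper, which also reports each token to the
output side when streaming has been enabled:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a libcpp token.
struct token_sketch { std::string spelling; };

// Hypothetical stand-in for the reader: a fixed list of tokens to hand out.
struct lexer_sketch
{
  std::vector<token_sketch> pending;
  std::size_t next = 0;
};

// Spellings of the tokens the output side has been told about.
static std::vector<std::string> streamed_output;

// Plays the role of the flag toggled by c_lex_enable_token_streaming ().
static bool stream_tokens = false;

void enable_token_streaming_sketch (bool enabled)
{
  stream_tokens = enabled;
}

// Plays the role of the get_token () wrapper: fetch the next token and, if
// streaming is enabled, also report it so it can appear in the output.
const token_sketch *get_token_sketch (lexer_sketch &lex)
{
  if (lex.next >= lex.pending.size ())
    return nullptr;  // End of input.
  const token_sketch *tok = &lex.pending[lex.next++];
  if (stream_tokens)
    streamed_output.push_back (tok->spelling);
  return tok;
}
```

A pragma handler that pulls tokens through the wrapper therefore never hides
them from the preprocessed output, as long as streaming was enabled first.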

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare.
(c_lex_enable_token_streaming): Declare.
* c-opts.cc (c_common_init): Call c_init_preprocess ().
* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
(c_lex_enable_token_streaming): New function.
(cb_def_pragma): Add a comment.
(get_token): New function wrapping cpp_get_token.
(c_lex_with_flags): Use the new wrapper function to support
obtaining tokens in preprocess_only mode.
(lex_string): Likewise.
* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
when needed.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
---

Notes:
Hello-

Here is version 2 of the patch, incorporating Jason's feedback from
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html

Thanks again, please let me know if it's OK? Bootstrap + regtest all
languages on x86-64 Linux looks good.

-Lewis

 gcc/c-family/c-common.h|  4 +++
 gcc/c-family/c-lex.cc  | 49 +
 gcc/c-family/c-opts.cc |  1 +
 gcc/c-family/c-ppoutput.cc | 17 +---
 gcc/c-family/c-pragma.cc   | 56 ++
 gcc/c-family/c-pragma.h|  2 ++
 gcc/c/c-parser.cc  | 21 ++
 gcc/cp/parser.cc   | 45 ++
 8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
 /* In c-lex.cc.  */
 extern enum cpp_ttype
 conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
 
 /* In c-pch.cc  */
 extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
 static void cb_def_pragma (cpp_reader *, unsigned int);
 static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
 static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
+
+/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
+   tokens obtained here need to be streamed to the preprocessor.  */

Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-26 Thread Lewis Hyatt via Gcc-patches
On Wed, Jul 26, 2023 at 5:36 PM Jason Merrill wrote:
>
> On 6/30/23 18:59, Lewis Hyatt wrote:
> > In order to support processing #pragma in preprocess-only mode (-E or
> > -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> > libcpp. In full compilation modes, this is accomplished by calling
> > pragma_lex (), which is a symbol that must be exported by the frontend, and
> > which is currently implemented for C and C++. Neither of those frontends
> > initializes its parser machinery in preprocess-only mode, and consequently
> > pragma_lex () does not work in this case.
> >
> > Address that by adding a new function c_init_preprocess () for the frontends
> > to implement, which arranges for pragma_lex () to work in preprocess-only
> > mode, and adjusting pragma_lex () accordingly.
> >
> > In preprocess-only mode, the preprocessor is accustomed to controlling the
> > interaction with libcpp, and it only knows about tokens that it has called
> > into libcpp itself to obtain. Since it still needs to see the tokens
> > obtained by pragma_lex () so that they can be streamed to the output, also
> > add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> > sees these tokens too.
> >
> > Currently, there is one place where we are already supporting #pragma in
> > preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> > was done by directly interfacing with libcpp, rather than making use of
> > pragma_lex (). Now that pragma_lex () works, that code is no longer
> > necessary; remove it.
> >
> > gcc/c-family/ChangeLog:
> >
> >   * c-common.h (c_init_preprocess): Declare new function.
> >   * c-opts.cc (c_common_init): Call it.
> >   * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> >   (pragma_diagnostic_lex): ...this.
> >   (pragma_diagnostic_lex_pp): Remove.
> >   (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> >   all modes.
> >   (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> >   usage.
> >   * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
> >
> > gcc/c/ChangeLog:
> >
> >   * c-parser.cc (pragma_lex): Support preprocess-only mode.
> >   (pragma_lex_discard_to_eol): New function.
> >   (c_init_preprocess): New function.
> >
> > gcc/cp/ChangeLog:
> >
> >   * parser.cc (c_init_preprocess): New function.
> >   (maybe_read_tokens_for_pragma_lex): New function.
> >   (pragma_lex): Support preprocess-only mode.
> >   (pragma_lex_discard_to_eol): New function.
> >
> > libcpp/ChangeLog:
> >
> >   * include/cpplib.h (struct cpp_callbacks): Add new callback
> >   on_token_lex.
> >   * macro.cc (cpp_get_token_1): Support new callback.
> > ---
> >
> > Notes:
> >  Hello-
> >
> >  In r13-1544, I added support for processing `#pragma GCC diagnostic' in
> >  preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
> >  that patch I called into libcpp directly to obtain the tokens needed to
> >  process the pragma. As part of the review, Jason noted that it would
> >  probably be better to make pragma_lex () usable in preprocess-only mode, and
> >  we decided just to add a comment about that for the time being, and to go
> >  ahead and implement that in the future, if it became necessary to support
> >  other pragmas during preprocessing.
> >
> >  I think now is a good time to proceed with that plan, because I would like
> >  to fix PR87299, which is about another pragma (#pragma GCC target) not
> >  working in preprocess-only mode. This patch makes the necessary changes for
> >  pragma_lex () to work in preprocess-only mode.
> >
> >  I have also added a new callback, on_token_lex (), to libcpp. This is so the
> >  preprocessor can see and stream out all the tokens that pragma_lex () gets
> >  from libcpp, since it won't otherwise see them.  This seemed the simplest
> >  approach to me. Another possibility would be to add a wrapper function in
> >  c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
> >  also stream the token in preprocess-only mode, and then change all calls
> >  into libcpp in that file to use the wrapper function.  The libcpp callback
> >  seemed cleaner to me FWIW.
>
> I think the other way sounds better to me; there are only three calls to
> cpp_get_... in c_lex_with_flags.
>
> The rest of the patch looks good.

Thank you very much for the feedback. I will test it this way and send
the updated version.

-Lewis


Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-07-26 Thread Lewis Hyatt via Gcc-patches
May I please ping this?
I am just about ready with the followup patch that fixes PR87299, but
it depends on this one. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623364.html

-Lewis

On Fri, Jun 30, 2023 at 6:59 PM Lewis Hyatt wrote:
>
> In order to support processing #pragma in preprocess-only mode (-E or
> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> libcpp. In full compilation modes, this is accomplished by calling
> pragma_lex (), which is a symbol that must be exported by the frontend, and
> which is currently implemented for C and C++. Neither of those frontends
> initializes its parser machinery in preprocess-only mode, and consequently
> pragma_lex () does not work in this case.
>
> Address that by adding a new function c_init_preprocess () for the frontends
> to implement, which arranges for pragma_lex () to work in preprocess-only
> mode, and adjusting pragma_lex () accordingly.
>
> In preprocess-only mode, the preprocessor is accustomed to controlling the
> interaction with libcpp, and it only knows about tokens that it has called
> into libcpp itself to obtain. Since it still needs to see the tokens
> obtained by pragma_lex () so that they can be streamed to the output, also
> add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> sees these tokens too.
>
> Currently, there is one place where we are already supporting #pragma in
> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> was done by directly interfacing with libcpp, rather than making use of
> pragma_lex (). Now that pragma_lex () works, that code is no longer
> necessary; remove it.
>
> gcc/c-family/ChangeLog:
>
> * c-common.h (c_init_preprocess): Declare new function.
> * c-opts.cc (c_common_init): Call it.
> * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> (pragma_diagnostic_lex): ...this.
> (pragma_diagnostic_lex_pp): Remove.
> (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> all modes.
> (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> usage.
> * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
>
> gcc/c/ChangeLog:
>
> * c-parser.cc (pragma_lex): Support preprocess-only mode.
> (pragma_lex_discard_to_eol): New function.
> (c_init_preprocess): New function.
>
> gcc/cp/ChangeLog:
>
> * parser.cc (c_init_preprocess): New function.
> (maybe_read_tokens_for_pragma_lex): New function.
> (pragma_lex): Support preprocess-only mode.
> (pragma_lex_discard_to_eol): New function.
>
> libcpp/ChangeLog:
>
> * include/cpplib.h (struct cpp_callbacks): Add new callback
> on_token_lex.
> * macro.cc (cpp_get_token_1): Support new callback.
> ---
>
> Notes:
> Hello-
>
> In r13-1544, I added support for processing `#pragma GCC diagnostic' in
> preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
> that patch I called into libcpp directly to obtain the tokens needed to
> process the pragma. As part of the review, Jason noted that it would
> probably be better to make pragma_lex () usable in preprocess-only mode, and
> we decided just to add a comment about that for the time being, and to go
> ahead and implement that in the future, if it became necessary to support
> other pragmas during preprocessing.
>
> I think now is a good time to proceed with that plan, because I would like
> to fix PR87299, which is about another pragma (#pragma GCC target) not
> working in preprocess-only mode. This patch makes the necessary changes for
> pragma_lex () to work in preprocess-only mode.
>
> I have also added a new callback, on_token_lex (), to libcpp. This is so the
> preprocessor can see and stream out all the tokens that pragma_lex () gets
> from libcpp, since it won't otherwise see them.  This seemed the simplest
> approach to me. Another possibility would be to add a wrapper function in
> c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
> also stream the token in preprocess-only mode, and then change all calls
> into libcpp in that file to use the wrapper function.  The libcpp callback
> seemed cleaner to me FWIW.
>
> There are no new tests added here, since it's just a change of
> implementation covered by existing tests. Bootstrap + regtest all languages
> looks good on x86-64 Linux.
>
> Please let me know what you think? Thanks!
>
> -Lewis
>
>  gcc/c-family/c-common.h  |  3 +++
>  gcc/c-family/c-opts.cc   |  1 +
>  gcc/c-family/c-pragma.cc | 56 ++--
>  gcc/c-family/c-pragma.h  |  2 ++
>  gcc/c/c-parser.cc| 34 
>  gcc/cp/parser.cc | 50 +++
>  libcpp/include/cpplib.h  |  4 

[PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as, for instance, referring to the content
of builtin macros (not yet implemented, but an easy lift after this one.) The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The actual change needed to the line-maps API in libcpp is not too large and
requires no space overhead in the line map data structures (on 64-bit systems
that is; one newly added data member to class line_map_ordinary sits inside
former padding bytes.) An LC_GEN map is just an ordinary map like any other,
but the TO_FILE member that normally points to the file name points instead to
the actual data.  This works automatically with PCH as well, for the same
reason that the file name makes its way into a PCH.  In order to avoid
confusion, the member has been renamed from TO_FILE to DATA, and associated
accessors adjusted.

Outside libcpp, there are many small changes but most of them are to
selftests, which are necessarily more sensitive to implementation
details. From the perspective of the user (the "user", here, being a frontend
using line maps or else the diagnostics infrastructure), the chief visible
change is that the function location_get_source_line() should be passed an
expanded_location object instead of a separate filename and line number.  This
is not a big change because in most cases, this information came anyway from a
call to expand_location and the needed expanded_location object is readily
available. The new overload of location_get_source_line() uses the extra
information in the expanded_location object to obtain the data from the
in-memory buffer when it originated from an LC_GEN map.
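
To make the in-memory case concrete, here is a minimal sketch of fetching one line from a generated buffer. It assumes a simplified location record; toy_exploc and toy_line_from_buffer are hypothetical names, and the real location_get_source_line() interface is not reproduced here:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the extra fields in expanded_location.
struct toy_exploc {
  const char *file;            // Used when generated_data is null.
  int line;                    // 1-based line number.
  const char *generated_data;  // Non-null for generated content.
  std::size_t generated_data_len;
};

// Return line number x.line of the generated buffer (without its newline),
// or an empty view if this is not a generated location or the line is
// out of range.
inline std::string_view toy_line_from_buffer (const toy_exploc &x)
{
  if (!x.generated_data || x.line < 1)
    return {};
  std::string_view rest (x.generated_data, x.generated_data_len);
  for (int cur = 1; ; ++cur)
    {
      std::size_t nl = rest.find ('\n');
      if (cur == x.line)
        return rest.substr (0, nl);
      if (nl == std::string_view::npos)
        return {};
      rest.remove_prefix (nl + 1);
    }
}
```

A caller that already holds the expanded location needs no separate file name and line number, which is why passing the whole object is a small change at most call sites.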

Until the subsequent patch that starts using LC_GEN maps, none are yet
generated within GCC, hence nothing is added to the testsuite here; but all
relevant selftests have been extended to cover generated data maps in addition
to normal files.

libcpp/ChangeLog:

* include/line-map.h (enum lc_reason): Add LC_GEN.
(struct line_map_ordinary): Add new members to support LC_GEN concept.
(ORDINARY_MAP_FILE_NAME): Assert that map really does encode a file
and not generated data.
(ORDINARY_MAP_GENERATED_DATA_P): New function.
(ORDINARY_MAP_GENERATED_DATA): New function.
(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
(ORDINARY_MAP_FILE_NAME_OR_DATA): New function.
(ORDINARY_MAPS_SAME_FILE_P): Declare new function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare new function.
(LINEMAP_FILE): This was always a synonym for ORDINARY_MAP_FILE_NAME;
make this explicit.
(linemap_get_file_highest_location): Adjust prototype.
(linemap_add): Adjust prototype.
(class expanded_location): Add new members to store generated content.
* line-map.cc (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
(ORDINARY_MAPS_SAME_FILE_P): New function.
(linemap_add): Add new argument DATA_LEN. Support generated data in
LC_GEN maps.
(linemap_check_files_exited): Adapt to API changes supporting LC_GEN.
(linemap_line_start): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_expand_location): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
* directives.cc (_cpp_do_file_change): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (make_range): Initialize new fields in
expanded_location.
(compatible_locations_p): Use new ORDINARY_MAPS_SAME_FILE_P ()
function.
(layout::calculate_x_offset_display): Use the new expanded_location
overload of location_get_source_line(), so as to support LC_GEN maps.
(layout::print_line): Likewise.
(source_line::source_line): Likewise.
(line_corrections::add_hint): Likewise.
(class line_corrections): Store the location as an exploc rather than
individual filename, so as to support LC_GEN maps.
(layout::print_trailing_fixits): Use the new exploc constructor for
class line_corrections.
(test_layout_x_offset_display_utf8): Test LC_GEN maps as well as normal.
(test_layout_x_offset_display_tab): Likewise.
(test_diagnostic_show_locus_one_liner): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_add_location_if_nearby): Likewise.
(test_diagnostic_show_locus_fixit_lines): Likewise.
(test_fixit_consolidation): Likewise.
(test_overlapped_fixit_printing): Likewise.

[PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp during the lexing of the tokens, as opposed to being
issued by the frontend during the processing of the pragma, then the
patched-up location is not yet in place, and the user rather sees an invalid
location that is near to the location of the _Pragma string in some cases, or
potentially very far away, depending on the macro expansion history.  For
example:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
1 | _Pragma("GCC diagnostic ignored \"oops")
  |^

with the caret in a nonsensical location, while this one:

=
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=

produces:

file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^

with both the caret in a nonsensical location, and the actual relevant context
completely absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

==
In buffer generated from file.cpp:1:
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"oops")
  | ^~~
==

and

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

So that carets are pointing to something meaningful and all relevant context
appears in the diagnostic.  For the second example, it would be nice if the
macro expansion also output "in expansion of macro S", however doing that for
a general case of macro expansions makes the logic very complicated, since it
has to be done after the fact when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.
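
For readers unfamiliar with the destringizing step mentioned at the top, the following sketch shows how the generated buffer content in the examples above relates to the original _Pragma string. It handles only plain, unprefixed string literals; toy_destringize is a hypothetical helper and is not libcpp's destringize_and_run ():

```cpp
#include <cstddef>
#include <string>

// Destringize the spelling of a plain (non-raw, unprefixed) string literal:
// drop the surrounding quotes, turn \" into " and \\ into \.
// Hypothetical helper for illustration only.
inline std::string toy_destringize (const std::string &lit)
{
  std::string out;
  if (lit.size () < 2 || lit.front () != '"' || lit.back () != '"')
    return out;  // Not a string literal spelling; return empty.
  for (std::size_t i = 1; i + 1 < lit.size (); ++i)
    {
      if (lit[i] == '\\' && i + 2 < lit.size ()
          && (lit[i + 1] == '"' || lit[i + 1] == '\\'))
        ++i;  // Skip the backslash, keep the escaped character.
      out += lit[i];
    }
  return out;
}
```

Applied to the spelling "GCC diagnostic ignored \"oops", this yields the text GCC diagnostic ignored "oops, which is the line shown in the new diagnostic output.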

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

* directives.cc (get_token_no_padding): Add argument to receive the
virtual location of the token.
(get__Pragma_string): Likewise.
(do_pragma): Set pfile->directive_result->src_loc properly, it should
not be a virtual location.
(destringize_and_run): Update to provide proper locations for the
_Pragma string tokens.  Support raw strings.
(_cpp_do__Pragma): Adapt to changes to the helper functions.
* errors.cc (cpp_diagnostic_at): Support
cpp_reader::diagnostic_rebase_loc.
(cpp_diagnostic_with_line): Likewise.
* include/line-map.h (class rich_location): Add new member
forget_cached_expanded_locations().
* internal.h (struct _cpp__Pragma_state): Define new struct.
(_cpp_rebase_diagnostic_location): Declare new function.
(struct cpp_reader): Add diagnostic_rebase_loc member.
(_cpp_push__Pragma_token_context): Declare new function.
(_cpp_do__Pragma): Adjust prototype.
* macro.cc (pragma_str): New static var.
(builtin_macro): Adapt to new implementation of _Pragma processing.
(_cpp_pop_context): Fix the logic for resetting
pfile->top_most_macro_node, which previously was never triggered,
although the error seems to have been harmless.
(_cpp_push__Pragma_token_context): New function.
(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
macro tracking output for _Pragma directives.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
tracking output for _Pragma directives.
* c-c++-common/cpp/pr57580.c: Likewise.
* c-c++-common/gomp/pragma-3.c: Likewise.

[PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output

2023-07-21 Thread Lewis Hyatt via Gcc-patches
The diagnostics routines for SARIF output need to read the source code back
in, so that they can generate "snippet" and "content" records, so they need to
be able to cope with generated data locations.  Add support for that in
diagnostic-format-sarif.cc.

gcc/ChangeLog:

* diagnostic-format-sarif.cc (sarif_builder::xloc_to_fb): New function.
(sarif_builder::maybe_make_physical_location_object): Support
generated data locations.
(sarif_builder::make_artifact_location_object): Likewise.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
(sarif_builder::make_artifact_object): Likewise.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc| 115 +++---
 .../diagnostic-format-sarif-file-5.c  |  31 +
 2 files changed, 99 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 5e483988027..29f614124b2 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -173,7 +173,10 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+
+  typedef std::pair filename_or_buffer;
+  json::object *make_artifact_location_object (filename_or_buffer fb);
+
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -196,16 +199,17 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
-   int start_line,
+  json::object *make_artifact_object (filename_or_buffer fb);
+  json::object *
+  maybe_make_artifact_content_object (filename_or_buffer fb) const;
+  json::object *maybe_make_artifact_content_object (expanded_location xloc,
int end_line) const;
   json::object *make_fix_object (const rich_location _loc);
   json::object *make_artifact_change_object (const rich_location );
   json::object *make_replacement_object (const fixit_hint ) const;
   json::object *make_artifact_content_object (const char *text) const;
   int get_sarif_column (expanded_location exploc) const;
+  static filename_or_buffer xloc_to_fb (expanded_location xloc);
 
   diagnostic_context *m_context;
 
@@ -219,7 +223,11 @@ private:
  diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set  m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+ with that length, not a filename.  */
+  hash_set ,
+  int_hash  >
+   > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set  m_rule_id_set;
   json::array *m_rules_arr;
@@ -749,6 +757,15 @@ sarif_builder::make_location_object (const diagnostic_event )
   return location_obj;
 }
 
+/* Populate a filename_or_buffer pair from an expanded location.  */
+sarif_builder::filename_or_buffer
+sarif_builder::xloc_to_fb (expanded_location xloc)
+{
+  if (xloc.generated_data_len)
+return filename_or_buffer (xloc.generated_data, xloc.generated_data_len);
+  return filename_or_buffer (xloc.file, 0);
+}
+
 /* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC,
or return NULL;
Add any filename to the m_artifacts.  */
@@ -764,7 +781,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  m_filenames.add (xloc_to_fb (expand_location (loc)));
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -788,7 +805,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (xloc_to_fb (expand_location (loc)));
 }
 
 /* The 

[PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Class edit_context handles outputting fixit hints in diff form that could be
manually or automatically applied by the user. This will not make sense for
generated data locations, such as the contents of a _Pragma string, because
the text to be modified does not appear in the user's input files. We do not
currently ever generate fixit hints in such a context, but for future-proofing
purposes, ignore such locations in edit context now.

gcc/ChangeLog:

* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.
---
 gcc/edit-context.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6f5bc6b9d8f..ae11b6f2e00 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint)
 return false;
   if (start.column == 0)
 return false;
+  if (start.generated_data)
+return false;
   if (next_loc.column == 0)
 return false;
+  if (next_loc.generated_data)
+return false;
 
   edited_file  = get_or_insert_file (start.file);
   if (!m_valid)


[PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-07-21 Thread Lewis Hyatt via Gcc-patches
Hello-

This is an update to the v2 patch series last sent in January:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html

While I did not receive any feedback on the v2 patches yet, they did need some
rebasing on top of other recent commits to input.cc, so I thought it would be
helpful to send them again now. The patches have not otherwise changed from
v2, and the above-linked message explains how all the patches fit in with the
original v1 series sent last November.

Dave, I would appreciate it very much if you could please let me know what you
think of this approach? I feel like the diagnostics we currently
output for _Pragmas are worth improving. As a reminder, say for this example:

=
 #define S "GCC diagnostic ignored \"oops"
 _Pragma(S)
=

We currently output:

=
file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^
=

While after these patches, we would output:

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

Thanks!

-Lewis


[committed] testsuite: Fix C++ UDL tests failing on 32-bit arch [PR103902]

2023-07-19 Thread Lewis Hyatt via Gcc-patches
These tests need to use "size_t" rather than "unsigned long"
for the user-defined literal function arguments.
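
A minimal sketch of the rule involved, using a hypothetical suffix _len that is not part of the tests: the length parameter of a string literal operator must be spelled as std::size_t, which is the same type as "unsigned long" on typical 64-bit Linux targets but not on typical 32-bit ones.

```cpp
#include <cstddef>

// A string literal operator must take (const char *, std::size_t).
// Spelling the second parameter as "unsigned long" only compiles on
// targets where std::size_t happens to be that exact type.
constexpr std::size_t operator ""_len (const char *, std::size_t n)
{
  return n;
}

static_assert ("abc"_len == 3, "length excludes the terminating null");
```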

gcc/testsuite/ChangeLog:

PR preprocessor/103902
* g++.dg/cpp0x/udlit-extended-id-1.C: Change "unsigned long" to
"size_t" throughout.
* g++.dg/cpp0x/udlit-extended-id-3.C: Likewise.
---

Notes:
Hello-

As noted on the PR, these newly added tests fail on 32-bit architectures
because they said "unsigned long" where they are supposed to say "size_t".
Committed this fix as obvious.

-Lewis

 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C | 9 +
 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C | 6 --
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
index 5ea5ef09db6..c7091e9e8a2 100644
--- a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
@@ -1,6 +1,7 @@
 // { dg-do run { target c++11 } }
 // { dg-additional-options "-Wno-error=normalized" }
 #include 
+#include 
 using namespace std;
 
 constexpr unsigned long long operator "" _π (unsigned long long x)
@@ -21,22 +22,22 @@ char x2[2_Π2];
 static_assert (sizeof x1 == 3, "test1");
 static_assert (sizeof x2 == 8, "test2");
 
-const char * operator "" _1σ (const char *s, unsigned long)
+const char * operator "" _1σ (const char *s, size_t)
 {
   return s + 1;
 }
 
-const char * operator ""_Σ2 (const char *s, unsigned long)
+const char * operator ""_Σ2 (const char *s, size_t)
 {
   return s + 2;
 }
 
-const char * operator "" _\U00e61 (const char *s, unsigned long)
+const char * operator "" _\U00e61 (const char *s, size_t)
 {
   return "ae";
 }
 
-const char* operator ""_\u01532 (const char *s, unsigned long)
+const char* operator ""_\u01532 (const char *s, size_t)
 {
   return "oe";
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C
index 11292e476e3..cb8a957947a 100644
--- a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C
@@ -1,9 +1,11 @@
 // { dg-do compile { target c++11 } }
+#include 
+using namespace std;
 
 // Check that we do not look for poisoned identifier when it is a suffix.
 int _ħ;
 #pragma GCC poison _ħ
-const char * operator ""_ħ (const char *, unsigned long); // { dg-bogus "poisoned" }
+const char * operator ""_ħ (const char *, size_t); // { dg-bogus "poisoned" }
 bool operator ""_ħ (unsigned long long x); // { dg-bogus "poisoned" }
 bool b = 1_ħ; // { dg-bogus "poisoned" }
 const char *x = "hbar"_ħ; // { dg-bogus "poisoned" }
@@ -11,5 +13,5 @@ const char *x = "hbar"_ħ; // { dg-bogus "poisoned" }
 /* Ideally, we should not warn here either, but this is not implemented yet.  This
    syntax has been deprecated for C++23.  */
 #pragma GCC poison _ħ2
-const char * operator "" _ħ2 (const char *, unsigned long); // { dg-bogus "poisoned" "" { xfail *-*-*} }
+const char * operator "" _ħ2 (const char *, size_t); // { dg-bogus "poisoned" "" { xfail *-*-*} }
 const char *x2 = "hbar2"_ħ2; // { dg-bogus "poisoned" }


Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-07-11 Thread Lewis Hyatt via Gcc-patches
May I please ping this patch again? I think it would be worthwhile to
close this gap in the support for UTF-8 sources. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html

-Lewis

On Fri, Jun 2, 2023 at 9:45 AM Lewis Hyatt  wrote:
>
> Hello-
>
> Ping please? Thanks.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html
>
> -Lewis
>
> On Tue, May 2, 2023 at 9:27 AM Lewis Hyatt  wrote:
> >
> > May I please ping this one? Thanks...
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html
> >
> > On Thu, Mar 2, 2023 at 6:21 PM Lewis Hyatt  wrote:
> > >
> > > > The PR complains that we do not handle UTF-8 in the suffix for a
> > > > user-defined literal, such as:
> > > >
> > > > bool operator ""_π (unsigned long long);
> > > >
> > > > In fact we don't handle any extended identifier characters there, whether
> > > > UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space
> > > > after the "" tokens is included, since then the identifier is lexed in the
> > > > "normal" way as its own token. But when it is lexed as part of the string
> > > > token, this is handled in lex_string() with a one-off loop that is not
> > > > aware of extended characters.
> > >
> > > > This patch fixes it by adding a new function scan_cur_identifier() that
> > > > can be used to lex an identifier while in the middle of lexing another
> > > > token.
> > > >
> > > > BTW, the other place that has been mis-lexing identifiers is
> > > > lex_identifier_intern(), which is used to implement #pragma push_macro
> > > > and #pragma pop_macro. This does not support extended characters either.
> > > > I will add that in a subsequent patch, because it can't directly reuse the
> > > > new function, but rather needs to lex from a string instead of a
> > > > cpp_buffer.
> > >
> > > With scan_cur_identifier(), we do also correctly warn about bidi and
> > > normalization issues in the extended identifiers comprising the suffix.
> > >
> > > libcpp/ChangeLog:
> > >
> > > PR preprocessor/103902
> > > * lex.cc (identifier_diagnostics_on_lex): New function refactoring
> > > some common code.
> > > (lex_identifier_intern): Use the new function.
> > > > (lex_identifier): Don't run identifier diagnostics here, rather let
> > > > the call site do it when needed.
> > > > (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
> > > > accordingly.
> > > (struct scan_id_result): New struct.
> > > (scan_cur_identifier): New function.
> > > (create_literal2): New function.
> > > (lit_accum::create_literal2): New function.
> > > (is_macro): Folded into new function...
> > > (maybe_ignore_udl_macro_suffix): ...here.
> > > (is_macro_not_literal_suffix): Folded likewise.
> > > > (lex_raw_string): Handle UTF-8 in UDL suffix via
> > > > scan_cur_identifier ().
> > > (lex_string): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR preprocessor/103902
> > > * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
> > > * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
> > > * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
> > > * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
> > > ---
> > >
> > > Notes:
> > > Hello-
> > >
> > > > This is the updated version of the patch, incorporating feedback from
> > > > Jakub and Jason, most recently discussed here:
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html
> > > >
> > > > Please let me know how it looks? It is simpler than before with the new
> > > > approach. Thanks!
> > >
> > > One thing to note. As Jason clarified for me, a usage like this:
> > >
> > >  #pragma GCC poison _x
> > > const char * operator "" _x (const char *, unsigned long);
> > >
> > > > The space between the "" and the _x is currently allowed but will be
> > > > deprecated in C++23. GCC currently will complain about the poisoned use
> > > > of _x in this case, and this patch, which is just focused on handling
> > > > UTF-8 properly, does not change this. But it seems that it would be
> > > > correct not to apply poison in this case. I can try to follow up with a
> > > > patch to do so, if it seems worthwhile? Given the syntax is deprecated,
> > > > maybe it's not worth it...
> > > >
> > > > For the time being, this patch does add a testcase for the above and
> > > > xfails it. For the case where no space is present, which is the part
> > > > touched by the present patch, existing behavior is preserved correctly and
> > > > no diagnostics such as poison are issued for the UDL suffix. (Contrary to
> > > > v1 of this patch.)
> > >
> > > Thanks! bootstrap + regtested all languages on x86-64 Linux with
> > > no regressions.
> > >
> > > -Lewis
> > >
> > >  

Re: 'unsigned int len' field in 'libcpp/include/symtab.h:struct ht_identifier' (was: [PATCH] pch: Fix streaming of strings with embedded null bytes)

2023-07-04 Thread Lewis Hyatt via Gcc-patches
On Tue, Jul 4, 2023 at 11:50 AM Thomas Schwinge  wrote:
>
> Hi!
>
> I came across this one here on my way working through another (somewhat
> related) GTY issue.  I generally do understand the issue here, but do
> have a question about 'unsigned int len' field in
> 'libcpp/include/symtab.h:struct ht_identifier':
>
> On 2022-10-18T18:14:54-0400, Lewis Hyatt via Gcc-patches wrote:
> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it
> > contains (whether they live in GC-controlled memory or not) will be marked
> > for PCH output by the routine gt_pch_note_object in ggc-common.cc. This
> > routine special-cases plain char* strings, and in particular it uses
> > strlen() to get their length.
>
> Oh, wow, this special casing for strings...  8-|
>
> > Thus it does not handle strings with embedded null bytes, but it
> > is possible for something PCH cares about (such as a string literal token
> > in a macro definition) to contain such embedded nulls. To fix that up, add
> > a new GTY option "string_length" so that gt_pch_note_object can be informed
> > the actual length it ought to use, and use it in the relevant libcpp
> > structs (cpp_string and ht_identifier) accordingly.
>
> For your test case:
>
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.C
> > @@ -0,0 +1,3 @@
> > +// { dg-do compile { target c++11 } }
> > +#include "pch-string-nulls.H"
> > +static_assert (X[4] == '[' && X[5] == '!' && X[6] == ']', "error");
>
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs
> > @@ -0,0 +1,2 @@
> > +/* Note that there is a null byte following "ABC".  */
> > +#define X R"(ABC^@[!])"
>
> ..., I understand how the following is necessary:
>
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -179,7 +179,11 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11, CLK_GNUC17, CLK_GNUC2X,
> >  /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
> >  struct GTY(()) cpp_string {
> >unsigned int len;
> > -  const unsigned char *text;
> > +
> > +  /* TEXT is always null terminated (terminator not included in len); but this
> > + GTY markup arranges that PCH streaming works properly even if there is a
> > + null byte in the middle of the string.  */
> > +  const unsigned char * GTY((string_length ("1 + %h.len"))) text;
> >  };
>
> (That is, the test case FAILs with that one reverted.)
>
> However, this one did confuse me:
>
> > --- a/libcpp/include/symtab.h
> > +++ b/libcpp/include/symtab.h
> > @@ -29,7 +29,10 @@ along with this program; see the file COPYING3.  If not see
> >  typedef struct ht_identifier ht_identifier;
> >  typedef struct ht_identifier *ht_identifier_ptr;
> >  struct GTY(()) ht_identifier {
> > -  const unsigned char *str;
> > +  /* This GTY markup arranges that the null-terminated identifier would still
> > + stream to PCH correctly, if a null byte were to make its way into an
> > + identifier somehow.  */
> > +  const unsigned char * GTY((string_length ("1 + %h.len"))) str;
> >unsigned int len;
> >unsigned int hash_value;
> >  };
>
> I did wonder whether that's an actual or just a theoretical concern, to
> have 'ht_identifier's with embedded NULs?  If an actual concern, can we
> get a test case constructed?  Otherwise, should we revert this hunk,
> given that we have this auto-'strlen' handling, ignorant of embedded
> NULs, in a lot of other places?
>
> But then I realized that possibly we do maintain 'len' here not for
> correctness but as an optimization (trading an 'unsigned int' for
> repeated 'strlen' calls)?  My quick testing with the attached
> "[RFC] Verify no embedded NULs in 'struct ht_identifier'" might confirm
> this theory: no regressions (..., but I didn't bootstrap, and ran only
> parts of the testsuite).  (Not proposing that RFC for 'git push', of
> course.)
>
> If that's indeed the intention here, I shall change/add source code
> commentary to describe this rationale for this dedicated 'len' field
> (plus, that handling of embedded NULs falls out of that, automatically).
>
> For reference, this 'len' field has existed "forever".  Before
> 'struct ht_identifier' was added in
> commit 2a967f3d3a45294640e155381ef549e0b8090ad4 (Subversion r42334), we
> had in 'gcc/cpplib.h:struct cpp_hashnode': 'unsigned short len', or
> earlier 'length', earlier in '

[PATCH] c-family: Implement pragma_lex () for preprocess-only mode

2023-06-30 Thread Lewis Hyatt via Gcc-patches
In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
add a new libcpp callback, on_token_lex (), that ensures the preprocessor
sees these tokens too.
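
The callback arrangement described above can be sketched as follows. This is a simplified model under stated assumptions, not the libcpp API: toy_token, toy_reader and the std::function member are hypothetical, and only the name on_token_lex is taken from the patch description.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the libcpp API.
struct toy_token { std::string spelling; };

struct toy_reader {
  std::vector<toy_token> pending;  // Tokens still to be handed out.
  std::size_t next = 0;
  // Invoked for every token handed out, whoever asked for it.
  std::function<void (const toy_token &)> on_token_lex;

  // Return the next token, or nullptr at end of input.
  const toy_token *get_token ()
  {
    if (next >= pending.size ())
      return nullptr;
    const toy_token *tok = &pending[next++];
    if (on_token_lex)
      on_token_lex (*tok);
    return tok;
  }
};
```

With a hook of this shape, a pragma handler can pull tokens directly from the reader while the code that writes the preprocessed output still observes each one, so no call site needs to be routed through a wrapper.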

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

* c-common.h (c_init_preprocess): Declare new function.
* c-opts.cc (c_common_init): Call it.
* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
(pragma_diagnostic_lex): ...this.
(pragma_diagnostic_lex_pp): Remove.
(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
all modes.
(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
usage.
* c-pragma.h (pragma_lex_discard_to_eol): Declare new function.

gcc/c/ChangeLog:

* c-parser.cc (pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.
(c_init_preprocess): New function.

gcc/cp/ChangeLog:

* parser.cc (c_init_preprocess): New function.
(maybe_read_tokens_for_pragma_lex): New function.
(pragma_lex): Support preprocess-only mode.
(pragma_lex_discard_to_eol): New function.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_callbacks): Add new callback
on_token_lex.
* macro.cc (cpp_get_token_1): Support new callback.
---

Notes:
Hello-

In r13-1544, I added support for processing `#pragma GCC diagnostic' in
preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
that patch I called into libcpp directly to obtain the tokens needed to
process the pragma. As part of the review, Jason noted that it would
probably be better to make pragma_lex () usable in preprocess-only mode, and
we decided just to add a comment about that for the time being, and to go
ahead and implement that in the future, if it became necessary to support
other pragmas during preprocessing.

I think now is a good time to proceed with that plan, because I would like
to fix PR87299, which is about another pragma (#pragma GCC target) not
working in preprocess-only mode. This patch makes the necessary changes for
pragma_lex () to work in preprocess-only mode.

I have also added a new callback, on_token_lex (), to libcpp. This is so the
preprocessor can see and stream out all the tokens that pragma_lex () gets
from libcpp, since it won't otherwise see them.  This seemed the simplest
approach to me. Another possibility would be to add a wrapper function in
c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
also stream the token in preprocess-only mode, and then change all calls
into libcpp in that file to use the wrapper function.  The libcpp callback
seemed cleaner to me FWIW.

There are no new tests added here, since it's just a change of
implementation covered by existing tests. Bootstrap + regtest all languages
looks good on x86-64 Linux.

Please let me know what you think? Thanks!

-Lewis

 gcc/c-family/c-common.h  |  3 +++
 gcc/c-family/c-opts.cc   |  1 +
 gcc/c-family/c-pragma.cc | 56 ++--
 gcc/c-family/c-pragma.h  |  2 ++
 gcc/c/c-parser.cc| 34 
 gcc/cp/parser.cc | 50 +++
 libcpp/include/cpplib.h  |  4 +++
 libcpp/macro.cc  |  3 +++
 8 files changed, 105 insertions(+), 48 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..78fc5248ba6 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void 

Re: ping: [PATCH] libcpp: Improve location for macro names [PR66290]

2023-06-19 Thread Lewis Hyatt via Gcc-patches
May I please ping this one? FWIW, it's 10 months old now without any feedback.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html

Most of the changes are just adapting the testsuite to look for the
improved diagnostic location. Otherwise it's a handful of lines in
libcpp and it just changes this:

t.cpp:1: warning: macro "X" is not used [-Wunused-macros]
1 | #define X 1
  |

to this:

t.cpp:1:9: warning: macro "X" is not used [-Wunused-macros]
1 | #define X 1
  |         ^

which closes out PR66290. Thank you!
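
As a rough illustration of where the 9 comes from: it is the 1-based position
of the macro name on the directive line. The sketch below only demonstrates
that arithmetic and is not the libcpp change; macro_name_column and caret_line
are hypothetical helpers, and they count bytes, ignoring tab expansion and
multibyte characters.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not part of libcpp.
// Returns the 1-based byte column of the macro name on a "#define" line,
// or 0 if the line is not a #define directive followed by a name.
inline std::size_t macro_name_column (const std::string &line)
{
  std::size_t i = 0;
  auto skip_blanks = [&] {
    while (i < line.size () && (line[i] == ' ' || line[i] == '\t'))
      ++i;
  };

  skip_blanks ();
  if (i >= line.size () || line[i] != '#')
    return 0;
  ++i;
  skip_blanks ();
  if (line.compare (i, 6, "define") != 0)
    return 0;
  i += 6;
  const std::size_t before = i;
  skip_blanks ();
  // Require at least one blank after "define" and something following it.
  if (i == before || i >= line.size ())
    return 0;
  const unsigned char c = line[i];
  if (!(std::isalpha (c) || c == '_'))
    return 0;
  return i + 1;
}

// Build the caret line that sits under the source line in a diagnostic.
// COLUMN must be at least 1.
inline std::string caret_line (std::size_t column)
{
  return std::string (column - 1, ' ') + '^';
}
```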

-Lewis

On Thu, Jan 12, 2023 at 6:31 PM Lewis Hyatt  wrote:
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html
> May I please ping this one again? It will enable closing out the PR. Thanks!
>
> -Lewis
>
> On Thu, Dec 1, 2022 at 9:22 AM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> >
> > May I please ping this one? Thanks!
> > I have also re-attached the rebased patch here.
> >
> > -Lewis
> >
> > On Wed, Oct 12, 2022 at 06:37:50PM -0400, Lewis Hyatt wrote:
> > > Hello-
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > >
> > > Since Jeff was kind enough to ack one of my other preprocessor patches
> > > today, I have become emboldened to ping this one again too :). Would
> > > anyone have some time to take a look at it please? Thanks!
> > >
> > > -Lewis
> > >
> > > On Thu, Sep 15, 2022 at 6:31 PM Lewis Hyatt  wrote:
> > > >
> > > > Hello-
> > > >
> > > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > > > May I please ping this patch? Thank you.
> > > >
> > > > -Lewis
> > > >
> > > > On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
> > > > >
> > > > >
> > > > > > When libcpp reports diagnostics whose locus is a macro name (such as for
> > > > > > -Wunused-macros), it uses the location in the cpp_macro object that was
> > > > > > stored by _cpp_new_macro. This is currently set to pfile->directive_line,
> > > > > > which contains the line number only and no column information. This patch
> > > > > > changes the stored location to the src_loc for the token defining the macro
> > > > > > name, which includes the location and range information.
> > > > >
> > > > > libcpp/ChangeLog:
> > > > >
> > > > > PR c++/66290
> > > > > * macro.cc (_cpp_create_definition): Add location argument.
> > > > > * internal.h (_cpp_create_definition): Adjust prototype.
> > > > > * directives.cc (do_define): Pass new location argument to
> > > > > _cpp_create_definition.
> > > > > > (do_undef): Stop passing inferior location to cpp_warning_with_line;
> > > > > > the default from cpp_warning is better.
> > > > > (cpp_pop_definition): Pass new location argument to
> > > > > _cpp_create_definition.
> > > > > * pch.cc (cpp_read_state): Likewise.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > PR c++/66290
> > > > > * c-c++-common/cpp/macro-ranges.c: New test.
> > > > > > * c-c++-common/cpp/line-2.c: Adapt to check for column information
> > > > > > on macro-related libcpp warnings.
> > > > > * c-c++-common/cpp/line-3.c: Likewise.
> > > > > * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> > > > > * c-c++-common/cpp/pr58844-1.c: Likewise.
> > > > > * c-c++-common/cpp/pr58844-2.c: Likewise.
> > > > > * c-c++-common/cpp/warning-zero-location.c: Likewise.
> > > > > * c-c++-common/pragma-diag-14.c: Likewise.
> > > > > * c-c++-common/pragma-diag-15.c: Likewise.
> > > > > * g++.dg/modules/macro-2_d.C: Likewise.
> > > > > * g++.dg/modules/macro-4_d.C: Likewise.
> > > > > * g++.dg/modules/macro-4_e.C: Likewise.
> > > > > * g++.dg/spellcheck-macro-ordering.C: Likewise.
> > > > > * gcc.dg/builtin-redefine.c: Likewise.
> > > > > * gcc.dg/cpp/Wunused.c: Likewise.
> > > > > * gcc.dg/cpp/redef2.c: Likewise.
> > > > > * gcc.dg/cpp/redef3.c: Likewise.
> > > > > * gcc.dg/cpp/redef4.c: Likewise.
> > > > > * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> > > > > * gcc.dg/cpp/ucnid-11.c: Likewise.
> > > > > * gcc.dg/cpp/undef2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-redefined.c: Likewise.
> > > > > * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> > > > > * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> > > > > ---
> > > > >
> > > > > Notes:
> > > > > Hello-
> > > > >
> > > > > > The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was originally
> > > > > > about the entirely wrong location for -Wunused-macros in C++ mode, which
> > > > > > behavior was fixed by r13-1903, but before closing it out I wanted to also
> > > 

Re: [pushed] diagnostics: ensure that .sarif files are UTF-8 encoded [PR109098]

2023-06-11 Thread Lewis Hyatt via Gcc-patches
On Fri, Mar 24, 2023 at 9:04 PM David Malcolm via Gcc-patches
 wrote:
>
> PR analyzer/109098 notes that the SARIF spec mandates that .sarif
> files are UTF-8 encoded, but -fdiagnostics-format=sarif-file naively
> assumes that the source files are UTF-8 encoded when quoting source
> artefacts in the .sarif output, which can lead to us writing out
> .sarif files with non-UTF-8 bytes in them (which break my reporting
> scripts).
>
> The root cause is that sarif_builder::maybe_make_artifact_content_object
> was using maybe_read_file to load the file content as bytes, and
> assuming they were UTF-8 encoded.
>
> This patch reworks both overloads of this function (one used for the
> whole file, the other for snippets of quoted lines) so that they go
> through input.cc's file cache, which attempts to decode the input files
> according to the input charset, and then encode as UTF-8.  They also
> check that the result actually is UTF-8, for cases where the input
> charset is missing, or incorrectly specified, and omit the quoted
> source for such awkward cases.
>
> Doing so fixes all of the cases I've encountered.
>
> The patch adds a new:
>   { dg-final { verify-sarif-file } }
> directive to all SARIF test cases in the test suite, which verifies
> that the output is UTF-8 encoded, and is valid JSON.  In particular
> it verifies that when we complain about encoding problems, the .sarif
> report we emit is itself correctly encoded.
>
> Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.
> Integration testing shows no regressions, and a fix for the case
> seen in haproxy-2.7.1.
> Pushed to trunk as r13-6861-gd495ea2b232f3e.

Hi David-

Regarding the patch series I had about _Pragma locations (most
recently https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609472.html
and https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html):
that one will need some work now in order to apply on top of these
changes to input.cc. Happy to do that, but I thought I had better check in
first to see if you have any feedback on the new approach to
input.cc that's in the v2 patch. Do you think it's a worthwhile
feature, or would you rather I just drop it? Thanks!

-Lewis


Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-06-02 Thread Lewis Hyatt via Gcc-patches
Hello-

Ping please? Thanks.
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html

-Lewis

On Tue, May 2, 2023 at 9:27 AM Lewis Hyatt  wrote:
>
> May I please ping this one? Thanks...
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html
>
> On Thu, Mar 2, 2023 at 6:21 PM Lewis Hyatt  wrote:
> >
> > The PR complains that we do not handle UTF-8 in the suffix for a user-defined
> > literal, such as:
> >
> > bool operator ""_π (unsigned long long);
> >
> > In fact we don't handle any extended identifier characters there, whether
> > UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
> > the "" tokens is included, since then the identifier is lexed in the "normal"
> > way as its own token. But when it is lexed as part of the string token, this
> > is handled in lex_string() with a one-off loop that is not aware of extended
> > characters.
> >
> > This patch fixes it by adding a new function scan_cur_identifier() that can be
> > used to lex an identifier while in the middle of lexing another token.
> >
> > BTW, the other place that has been mis-lexing identifiers is
> > lex_identifier_intern(), which is used to implement #pragma push_macro
> > and #pragma pop_macro. This does not support extended characters either.
> > I will add that in a subsequent patch, because it can't directly reuse the
> > new function, but rather needs to lex from a string instead of a cpp_buffer.
> >
> > With scan_cur_identifier(), we do also correctly warn about bidi and
> > normalization issues in the extended identifiers comprising the suffix.
> >
> > libcpp/ChangeLog:
> >
> > PR preprocessor/103902
> > * lex.cc (identifier_diagnostics_on_lex): New function refactoring
> > some common code.
> > (lex_identifier_intern): Use the new function.
> > (lex_identifier): Don't run identifier diagnostics here, rather let
> > the call site do it when needed.
> > (_cpp_lex_direct): Adjust the call sites of lex_identifier ()
> > accordingly.
> > (struct scan_id_result): New struct.
> > (scan_cur_identifier): New function.
> > (create_literal2): New function.
> > (lit_accum::create_literal2): New function.
> > (is_macro): Folded into new function...
> > (maybe_ignore_udl_macro_suffix): ...here.
> > (is_macro_not_literal_suffix): Folded likewise.
> > (lex_raw_string): Handle UTF-8 in UDL suffix via scan_cur_identifier ().
> > (lex_string): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR preprocessor/103902
> > * g++.dg/cpp0x/udlit-extended-id-1.C: New test.
> > * g++.dg/cpp0x/udlit-extended-id-2.C: New test.
> > * g++.dg/cpp0x/udlit-extended-id-3.C: New test.
> > * g++.dg/cpp0x/udlit-extended-id-4.C: New test.
> > ---
> >
> > Notes:
> > Hello-
> >
> > This is the updated version of the patch, incorporating feedback from Jakub
> > and Jason, most recently discussed here:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html
> >
> > Please let me know how it looks? It is simpler than before with the new
> > approach. Thanks!
> >
> > One thing to note. As Jason clarified for me, a usage like this:
> >
> >  #pragma GCC poison _x
> > const char * operator "" _x (const char *, unsigned long);
> >
> > The space between the "" and the _x is currently allowed but will be
> > deprecated in C++23. GCC currently will complain about the poisoned use of
> > _x in this case, and this patch, which is just focused on handling UTF-8
> > properly, does not change this. But it seems that it would be correct
> > not to apply poison in this case. I can try to follow up with a patch to do
> > so, if it seems worthwhile? Given the syntax is deprecated, maybe it's not
> > worth it...
> >
> > For the time being, this patch does add a testcase for the above and xfails
> > it. For the case where no space is present, which is the part touched by the
> > present patch, existing behavior is preserved correctly and no diagnostics
> > such as poison are issued for the UDL suffix. (Contrary to v1 of this
> > patch.)
> >
> > Thanks! bootstrap + regtested all languages on x86-64 Linux with
> > no regressions.
> >
> > -Lewis
> >
> >  .../g++.dg/cpp0x/udlit-extended-id-1.C|  68 
> >  .../g++.dg/cpp0x/udlit-extended-id-2.C|   6 +
> >  .../g++.dg/cpp0x/udlit-extended-id-3.C|  15 +
> >  .../g++.dg/cpp0x/udlit-extended-id-4.C|  14 +
> >  libcpp/lex.cc | 382 ++
> >  5 files changed, 317 insertions(+), 168 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
> >  create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-2.C
> >  create mode 100644 

Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-05-02 Thread Lewis Hyatt via Gcc-patches
May I please ping this one? Thanks...
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html


Ping: [PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-04-04 Thread Lewis Hyatt via Gcc-patches
May I please ping this one?
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613247.html

Thanks!

-Lewis


Re: [PATCH] libcpp: Update to Unicode 15

2023-03-09 Thread Lewis Hyatt via Gcc-patches
On Fri, Nov 04, 2022 at 10:03:13AM +0100, Jakub Jelinek via Gcc-patches wrote:
> Hi!
> 
> The following pseudo-patch (for uname2c.h part
> just a pseudo patch with a lot of changes replaced with ...
> because it is too large but the important changes like
> -static const char uname2c_dict[59418] =
> +static const char uname2c_dict[59891] =
> -static const unsigned char uname2c_tree[208765] = {
> +static const unsigned char uname2c_tree[210697] = {
> are shown, full patch xz compressed will be posted separately
> due to mail limit) regenerates the libcpp tables with Unicode 15.0.0
> which added 4489 new characters.
> 
> As mentioned previously, this isn't just a matter of running the
> two libcpp/make*.cc programs on the new Unicode files, but one needs
> to manually update a table inside of makeuname2c.cc according to
> a table in Unicode text (which is partially reflected in the text
> files, but e.g. in Unicode 14.0.0 not 100% accurately, in 15.0.0
> actually accurately).
> I've also added some randomly chosen subset of those 4489 new
> characters to a testcase.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hi Jakub-

In addition to these files you updated last year for Unicode 15, we also need
to update generated_cpp_wcwidth.h, which implements cpp_wcwidth() for
diagnostics so we can output correct column numbers. There is a procedure
outlined in the file contrib/unicode/README that accomplishes this. Is it OK
to push the attached patch (gzipped since it is large and uninformative),
which is the result of following the procedure? It went straightforwardly as
expected, and bootstrap+regtest on x86-64 Linux is clean. Thanks!
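
For context on why the width data affects column numbers, here is a rough
sketch of the general technique: a sorted range table searched with binary
search, then summed over a line. The table is a tiny hand-written toy, not the
contents or layout of generated_cpp_wcwidth.h; toy_wcwidth and display_column
are hypothetical names; and code points outside the listed ranges are simply
treated as width 1 here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// One inclusive range of code points sharing a display width.
struct WidthRange
{
  std::uint32_t first;
  std::uint32_t last;
  int width;
};

// Toy table for illustration only, sorted by code point.
static const WidthRange toy_ranges[] = {
  { 0x0300, 0x036F, 0 },  // combining diacritical marks
  { 0x1100, 0x115F, 2 },  // Hangul Jamo leading consonants
  { 0x4E00, 0x9FFF, 2 },  // CJK unified ideographs
};

// Look up the display width of code point C; default to 1 in this toy.
inline int toy_wcwidth (std::uint32_t c)
{
  const WidthRange *end = std::end (toy_ranges);
  // Find the first range whose last code point is >= C.
  const WidthRange *it
    = std::lower_bound (std::begin (toy_ranges), end, c,
                        [] (const WidthRange &r, std::uint32_t v) {
                          return r.last < v;
                        });
  if (it != end && it->first <= c)
    return it->width;
  return 1;
}

// 1-based display column of the code point at INDEX within LINE.
inline int display_column (const std::u32string &line, std::size_t index)
{
  int col = 1;
  for (std::size_t i = 0; i < index && i < line.size (); ++i)
    col += toy_wcwidth (line[i]);
  return col;
}
```

If the table predates a Unicode release, newly added wide or zero-width
characters fall through to the default, and a caret placed after them lands in
the wrong column.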

-Lewis
[PATCH] libcpp: Update cpp_wcwidth() to Unicode 15

Updates cpp_wcwidth() to Unicode 15, following the procedure in
contrib/unicode/README mechanically without incident.

contrib/ChangeLog:

* unicode/DerivedCoreProperties.txt: Update to Unicode 15.
* unicode/DerivedNormalizationProps.txt: Likewise.
* unicode/EastAsianWidth.txt: Likewise.
* unicode/PropList.txt: Likewise.
* unicode/README: Likewise.
* unicode/UnicodeData.txt: Likewise.

libcpp/ChangeLog:

* generated_cpp_wcwidth.h: Regenerated for Unicode 15.


unicode_15_wcwidth-1.txt.gz
Description: application/gunzip


Ping: [PATCH] libcpp: Fix ICE on directive inside _Pragma() operator [PR67046]

2023-03-07 Thread Lewis Hyatt via Gcc-patches
Hello-

May I please ping this short patch that fixes an old bug? Thanks...

-Lewis

On Sat, Jan 14, 2023 at 1:46 PM Lewis Hyatt  wrote:
>
> get__Pragma_string() in directives.cc is responsible for lexing the parens
> and the string argument from a _Pragma("...") operator. This function does
> not handle the case when the closing paren is not on the same line as the
> string; in that case, libcpp will by default reuse the token buffer it
> previously used for the string, so that the string token returned by
> get__Pragma_string() may be corrupted, as shown in the testcase. Fix using
> the existing keep_tokens mechanism that temporarily disables the reuse of
> token buffers.
>
> libcpp/ChangeLog:
>
> PR preprocessor/67046
> * directives.cc (_cpp_do__Pragma): Increment pfile->keep_tokens to
> ensure the returned string token is valid.
>
> gcc/testsuite/ChangeLog:
>
> PR preprocessor/67046
> * c-c++-common/cpp/pr67046.c: New test.
> ---
>
> Notes:
> Hello-
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67046
>
> This fixes an old ICE in libcpp that can happen when lexing the tokens from a
> _Pragma operator. Bootstrapped+tested on x86-64 Linux with no
> regressions. Please let me know if it's OK? Thanks...
>
> -Lewis
>
>  gcc/testsuite/c-c++-common/cpp/pr67046.c | 10 ++
>  libcpp/directives.cc |  5 +
>  2 files changed, 15 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/cpp/pr67046.c
>
> diff --git a/gcc/testsuite/c-c++-common/cpp/pr67046.c b/gcc/testsuite/c-c++-common/cpp/pr67046.c
> new file mode 100644
> index 000..f37f20c624e
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/cpp/pr67046.c
> @@ -0,0 +1,10 @@
> +/* { dg-do preprocess } */
> +
> +_Pragma(
> +"message(\"msg\")"
> +)
> +
> +_Pragma(
> +"message(\"msg\")"
> +#
> +)
> diff --git a/libcpp/directives.cc b/libcpp/directives.cc
> index 9dc4363c65a..ffd262bce7d 100644
> --- a/libcpp/directives.cc
> +++ b/libcpp/directives.cc
> @@ -1996,7 +1996,12 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
>  int
>  _cpp_do__Pragma (cpp_reader *pfile, location_t expansion_loc)
>  {
> +  /* Make sure we don't invalidate the string token, if the closing parenthesis
> +   ended up on a different line.  */
> +  ++pfile->keep_tokens;
>    const cpp_token *string = get__Pragma_string (pfile);
> +  --pfile->keep_tokens;
> +
>    pfile->directive_result.type = CPP_PADDING;
>
>    if (string)
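
For readers unfamiliar with the hazard, the following standalone sketch models
it with a toy lexer. It is an assumption-laden simplification, not libcpp's
real token storage: ToyLexer, start_line, lex and pragma_string_seen are
hypothetical, and only the keep_tokens counter name comes from the patch. The
toy rewinds its storage at every new line unless the counter is nonzero, so a
pointer saved on one line can end up referring to a token lexed on a later
line.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Toy model only; libcpp's real token storage is more involved.
struct ToyLexer
{
  std::deque<std::string> slots;  // token storage, reused across lines
  std::size_t cur = 0;            // next slot to fill
  int keep_tokens = 0;            // nonzero: do not rewind at a new line

  // Called when lexing moves on to a new source line.
  void start_line ()
  {
    if (keep_tokens == 0)
      cur = 0;
  }

  // Store TEXT in the next slot and return a pointer to the stored token.
  const std::string *lex (const std::string &text)
  {
    if (cur == slots.size ())
      slots.push_back (text);   // deque::push_back keeps old references valid
    else
      slots[cur] = text;        // overwrite a reused slot
    return &slots[cur++];
  }
};

// Lex `_Pragma(`, a string, and `)` spread over three lines, then report what
// the saved string-token pointer refers to afterwards.
inline std::string pragma_string_seen (bool guard)
{
  ToyLexer lx;
  lx.start_line ();
  lx.lex ("_Pragma");
  if (guard)
    ++lx.keep_tokens;
  lx.lex ("(");
  lx.start_line ();
  const std::string *str = lx.lex ("\"message\"");
  lx.start_line ();
  lx.lex (")");
  if (guard)
    --lx.keep_tokens;
  return *str;
}
```

In this toy, the unguarded call finds the closing parenthesis where the string
used to be, while the guarded call still sees the string.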


[PATCH v2] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-03-02 Thread Lewis Hyatt via Gcc-patches
The PR complains that we do not handle UTF-8 in the suffix for a user-defined
literal, such as:

bool operator ""_π (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
the "" tokens is included, since then the identifier is lexed in the "normal"
way as its own token. But when it is lexed as part of the string token, this
is handled in lex_string() with a one-off loop that is not aware of extended
characters.

This patch fixes it by adding a new function scan_cur_identifier() that can be
used to lex an identifier while in the middle of lexing another token.
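
The difference between an ASCII-only loop and one that is aware of extended
characters can be sketched as below. This is only an illustration under a
strong simplification: it treats every byte >= 0x80 as part of the identifier,
whereas a real lexer has to decode UTF-8 and check that the code point is
allowed in identifiers. The function names are hypothetical and are not the
ones in lex.cc.

```cpp
#include <cstddef>
#include <string>

inline bool is_ascii_idchar (unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_';
}

// ASCII-only scan: length in bytes of the suffix starting at POS.
inline std::size_t suffix_len_ascii (const std::string &s, std::size_t pos)
{
  std::size_t n = 0;
  while (pos + n < s.size () && is_ascii_idchar (s[pos + n]))
    ++n;
  return n;
}

// Simplified extended scan: additionally accepts any non-ASCII byte.
// (Assumption for illustration; no UTF-8 validation is done here.)
inline std::size_t suffix_len_extended (const std::string &s, std::size_t pos)
{
  std::size_t n = 0;
  while (pos + n < s.size ())
    {
      const unsigned char c = s[pos + n];
      if (!is_ascii_idchar (c) && c < 0x80)
        break;
      ++n;
    }
  return n;
}
```

For the bytes of `""_π (`, with π encoded as 0xCF 0x80 and scanning from the
underscore, the ASCII-only loop stops after one byte while the extended loop
consumes all three bytes of the suffix.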

BTW, the other place that has been mis-lexing identifiers is
lex_identifier_intern(), which is used to implement #pragma push_macro
and #pragma pop_macro. This does not support extended characters either.
I will add that in a subsequent patch, because it can't directly reuse the
new function, but rather needs to lex from a string instead of a cpp_buffer.

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix.

libcpp/ChangeLog:

PR preprocessor/103902
* lex.cc (identifier_diagnostics_on_lex): New function refactoring
some common code.
(lex_identifier_intern): Use the new function.
(lex_identifier): Don't run identifier diagnostics here, rather let
the call site do it when needed.
(_cpp_lex_direct): Adjust the call sites of lex_identifier ()
accordingly.
(struct scan_id_result): New struct.
(scan_cur_identifier): New function.
(create_literal2): New function.
(lit_accum::create_literal2): New function.
(is_macro): Folded into new function...
(maybe_ignore_udl_macro_suffix): ...here.
(is_macro_not_literal_suffix): Folded likewise.
(lex_raw_string): Handle UTF-8 in UDL suffix via scan_cur_identifier ().
(lex_string): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/103902
* g++.dg/cpp0x/udlit-extended-id-1.C: New test.
* g++.dg/cpp0x/udlit-extended-id-2.C: New test.
* g++.dg/cpp0x/udlit-extended-id-3.C: New test.
* g++.dg/cpp0x/udlit-extended-id-4.C: New test.
---

Notes:
Hello-

This is the updated version of the patch, incorporating feedback from Jakub
and Jason, most recently discussed here:

https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612073.html

Please let me know how it looks? It is simpler than before with the new
approach. Thanks!

One thing to note. As Jason clarified for me, a usage like this:

 #pragma GCC poison _x
const char * operator "" _x (const char *, unsigned long);

The space between the "" and the _x is currently allowed but will be
deprecated in C++23. GCC currently will complain about the poisoned use of
_x in this case, and this patch, which is just focused on handling UTF-8
properly, does not change this. But it seems that it would be correct
not to apply poison in this case. I can try to follow up with a patch to do
so, if it seems worthwhile? Given the syntax is deprecated, maybe it's not
worth it...

For the time being, this patch does add a testcase for the above and xfails
it. For the case where no space is present, which is the part touched by the
present patch, existing behavior is preserved correctly and no diagnostics
such as poison are issued for the UDL suffix. (Contrary to v1 of this
patch.)

Thanks! bootstrap + regtested all languages on x86-64 Linux with
no regressions.

-Lewis

 .../g++.dg/cpp0x/udlit-extended-id-1.C|  68 
 .../g++.dg/cpp0x/udlit-extended-id-2.C|   6 +
 .../g++.dg/cpp0x/udlit-extended-id-3.C|  15 +
 .../g++.dg/cpp0x/udlit-extended-id-4.C|  14 +
 libcpp/lex.cc | 382 ++
 5 files changed, 317 insertions(+), 168 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-3.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-4.C

diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
new file mode 100644
index 000..411d4fdd0ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
@@ -0,0 +1,68 @@
+// { dg-do run { target c++11 } }
+// { dg-additional-options "-Wno-error=normalized" }
+#include 
+using namespace std;
+
+constexpr unsigned long long operator "" _π (unsigned long long x)
+{
+  return 3 * x;
+}
+
+/* Historically we didn't parse properly as part of the "" token, so check that
+   as well.  */
+constexpr unsigned long long operator ""_Π2 

Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-02-15 Thread Lewis Hyatt via Gcc-patches
On Wed, Feb 15, 2023 at 1:39 PM Jason Merrill  wrote:
>
> On 9/26/22 15:27, Lewis Hyatt wrote:
> > On Wed, Jun 15, 2022 at 03:06:16PM -0400, Lewis Hyatt wrote:
> >> On Tue, Jun 14, 2022 at 05:26:49PM -0400, Lewis Hyatt wrote:
> >>> Hello-
> >>>
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902
> >>>
> >>> The attached patch resolves PR preprocessor/103902 as described in the patch
> >>> message inline below. bootstrap + regtest all languages was successful on
> >>> x86-64 Linux, with no new failures:
> >>>
> >>> FAIL 103 103
> >>> PASS 542338 542371
> >>> UNSUPPORTED 15247 15250
> >>> UNTESTED 136 136
> >>> XFAIL 4166 4166
> >>> XPASS 17 17
> >>>
> >>> Please let me know if it looks OK?
> >>>
> >>> A few questions I have:
> >>>
> >>> - A difference introduced with this patch is that after lexing something
> >>> like `operator ""_abc', then `_abc' is added to the identifier hash map,
> >>> whereas previously it was not. I feel like this must be OK because with
> >>> the optional space as in `operator "" _abc', it would be added with or
> >>> without the patch.
> >>>
> >>> - The behavior of `#pragma GCC poison' is not consistent (including prior
> >>>   to my patch). I tried to make it more so but there is still one thing I
> >>>   want to ask about. Leaving aside extended characters for now, the
> >>>   inconsistency is that currently the poison is only checked when the
> >>>   suffix appears as a standalone token.
> >>>
> >>>    #pragma GCC poison _X
> >>>    bool operator ""_X (unsigned long long);   // accepted before the patch,
> >>>                                               // rejected after it
> >>>    bool operator "" _X (unsigned long long);  // rejected either before or
> >>>                                               // after
> >>>    const char * operator ""_X (const char *, unsigned long);
> >>>                                               // accepted before,
> >>>                                               // rejected after
> >>>    const char * operator "" _X (const char *, unsigned long);
> >>>                                               // rejected either
> >>>
> >>>    const char * s = ""_X; // accepted before the patch, rejected after it
> >>>    const bool b = 1_X;    // accepted before or after
> >>>
> >>> I feel like after the patch, the behavior is the expected behavior for all
> >>> cases but the last one. Here, we allow the poisoned identifier because
> >>> it's not lexed as an identifier, it's lexed as part of a pp-number. Does
> >>> it seem OK like this or does it need to be addressed?
> >>
> >> Sorry, that version actually did not handle the case of -Wc++11-compat in
> >> c++98 mode correctly. This updated version fixes that and adds the missing
> >> test coverage for that, if you could please review this one instead?
> >>
> >> By the way, the pipermail archive seems to permanently mangle UTF-8 in
> >> inline attachments. I attached the patch also gzipped to address that for
> >> the archive, since the new testcases do use non-ASCII characters.
> >>
> >> Thanks for taking a look!
> >
> > Hello-
> >
> > May I please ping this patch again? Joseph suggested that it would be best
> > if a C++ maintainer has a look at it. This is one of just a few places left
> > where we don't handle UTF-8 properly in libcpp, it would be really nice to
> > get them fixed up if there is time to review this patch. Thanks!
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html
> >
> > I re-attached it here as it required some trivial rebasing on top of
> > recently pushed changes. As before, I also attached the gzipped version so
> > that the UTF-8 testcases show up OK in the online archive, in case that's
> > still an issue. Thanks for taking a look!
>
> Thank you for the patch, sorry it slipped off my radar.
>

Thanks for taking a look at it. It's certainly an edge case that is
not bothering anyone too much, so no rush with it.

> > This patch fixes it by adding a new function scan_cur_identifier() that can
> > be used to lex an identifier while in the middle of lexing another token. It
> > is somewhat duplicative of the code in lex_identifier(), which handles the
> > normal case, but I think there's no good way to avoid that without
> > pessimizing the usual case, since lex_identifier() takes advantage of the
> > fact that the first character of the identifier has already been analyzed.
>
> So could you analyze the first character and then call lex_identifier?
>

Yes, it can be done this way. lex_identifier may need some adaptations
though, since it does some other work like tracking the original
spelling of the identifier. Plus per your comments below, it would
need to avoid the poison and other checks too.
I think it's pretty straightforward to refactor a bit so that it works
out. I kinda thought it may not be desirable to touch lex_identifier,
which is called on everything, just to handle this rare case, however
I am happy to do it this way after confirming it won't hurt

Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2023-02-10 Thread Lewis Hyatt via Gcc-patches
On Fri, Feb 10, 2023 at 11:30 AM Jakub Jelinek  wrote:
>
> On Mon, Sep 26, 2022 at 06:27:25PM -0400, Lewis Hyatt via Gcc-patches wrote:
> > May I please ping this patch again? Joseph suggested that it would be best
> > if a C++ maintainer has a look at it. This is one of just a few places left
> > where we don't handle UTF-8 properly in libcpp, it would be really nice to
> > get them fixed up if there is time to review this patch. Thanks!
>
> CCing them.
>
> Just some nits from me, but I agree C++ maintainers are the best reviewers
> for this.

Thanks so much for looking it over, I really appreciate it. I'll be
sure to incorporate all your feedback along with that from the full
review.

Is this for stage 1 at this point BTW?

One note, the patch as-is doesn't quite apply to master branch
nowadays, it just needs a small tweak since warn_about_normalization()
has acquired a new argument in the meantime. If it's helpful I can
resend it with this addressed, as well as the rest of your comments?

Finally one comment here:

> > +  if (const auto sr = scan_cur_identifier (pfile))
> > + {
> > +   /* If a string format macro, say from inttypes.h, is placed touching
> > +  a string literal it could be parsed as a C++11 user-defined
> > +  string literal thus breaking the program.  User-defined literals
> > +  outside of namespace std must start with a single underscore, so
> > +  assume anything of that form really is a UDL suffix.  We don't
> > +  need to worry about UDLs defined inside namespace std because
> > +  their names are reserved, so cannot be used as macro names in
> > +  valid programs.  */
> > +   if ((suffix_begin[0] != '_' || suffix_begin[1] == '_')
> > +   && cpp_macro_p (sr.node))
>
> What is the advantage of dropping is_macro_not_literal_suffix and
> hand-inlining it in two different spots?
> Couldn't even the actual warning be moved into an inline function?

The is_macro() function was doing two jobs, first lexing the
identifier and looking it up in the hash table, and then calling
cpp_macro_p(). This was a bit duplicative because the identifier was
then immediately lexed again after the check. Since lexing it became
more complicated with UTF-8 support, I changed it not to duplicate
that effort and instead scan_cur_identifier() does the job once. With
that done, all that's left for is_macro() to do is just the one line
check so I got rid of it. However, I agree that the check about
suffix_begin is not really trivial and so factoring this out into one
place instead of two makes sense. I'll try to move the whole warning
into its own function in the next iteration.
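
For illustration only, the suffix test from the quoted hunk could be
factored out roughly as follows. This is a standalone sketch, not the
libcpp code: suffix_needs_macro_check is a hypothetical name, and in
libcpp the result would still have to be combined with the
cpp_macro_p () check on the scanned identifier.

```cpp
#include <cassert>

// Hypothetical helper, not part of libcpp.  Returns true when the text at
// SUFFIX_BEGIN does NOT have the form of a user-defined literal suffix
// outside namespace std (exactly one leading underscore), i.e. when the
// caller still needs to ask whether the identifier names a macro.
static bool
suffix_needs_macro_check (const char *suffix_begin)
{
  // The second test is only reached when the first character is '_', so
  // reading suffix_begin[1] is safe for any NUL-terminated string.
  return suffix_begin[0] != '_' || suffix_begin[1] == '_';
}
```

With this shape, both call sites would share one predicate, and a name
such as PRId64 placed touching a string literal is still routed to the
macro check, while _abc is assumed to be a real UDL suffix.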

> Otherwise it looks reasonable to me, but I'd still prefer Jason or Nathan
> to review this.
>
> Jakub
>

Thanks again.

-Lewis


[PATCH] libcpp: Fix ICE on directive inside _Pragma() operator [PR67046]

2023-01-14 Thread Lewis Hyatt via Gcc-patches
get__Pragma_string() in directives.cc is responsible for lexing the parens
and the string argument from a _Pragma("...") operator. This function does
not handle the case when the closing paren is not on the same line as the
string; in that case, libcpp will by default reuse the token buffer it
previously used for the string, so that the string token returned by
get__Pragma_string() may be corrupted, as shown in the testcase. Fix using
the existing keep_tokens mechanism that temporarily disables the reuse of
token buffers.
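
To illustrate the failure mode, here is a toy model of a lexer that
reuses its token slots at the start of each line unless a keep counter
is nonzero. It is a sketch under simplifying assumptions: toy_lexer,
start_line and string_survives are hypothetical names, and the real
token run management in libcpp is more involved than this.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; the names echo the description above, not libcpp's
// actual data structures.
struct toy_lexer
{
  std::vector<std::string> buf = std::vector<std::string> (8); // fixed slots
  std::size_t next = 0;
  int keep_tokens = 0;

  // Called when lexing moves on to a new source line.
  void start_line ()
  {
    if (keep_tokens == 0)
      next = 0;   // reuse slots: earlier tokens may now be overwritten
  }

  // Store SPELLING in the next slot and return a pointer to it.
  const std::string *lex (const std::string &spelling)
  {
    buf[next] = spelling;
    return &buf[next++];
  }
};

// Lex a string on one line and ")" on the next; report whether the string
// token still holds its original spelling afterwards.
static bool
string_survives (bool guard)
{
  toy_lexer lx;
  if (guard)
    ++lx.keep_tokens;
  const std::string *str = lx.lex ("\"message(\\\"msg\\\")\"");
  const std::string expected = *str;
  lx.start_line ();
  lx.lex (")");
  if (guard)
    --lx.keep_tokens;
  return *str == expected;
}
```

In this model the string token is clobbered when the guard is off and
preserved when it is on, which is the effect the patch relies on.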

libcpp/ChangeLog:

PR preprocessor/67046
* directives.cc (_cpp_do__Pragma): Increment pfile->keep_tokens to
ensure the returned string token is valid.

gcc/testsuite/ChangeLog:

PR preprocessor/67046
* c-c++-common/cpp/pr67046.c: New test.
---

Notes:
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67046

This fixes an old ICE in libcpp that can happen when lexing the tokens from
a _Pragma operator. Bootstrapped+tested on x86-64 Linux with no
regressions. Please let me know if it's OK? Thanks...

-Lewis

 gcc/testsuite/c-c++-common/cpp/pr67046.c | 10 ++
 libcpp/directives.cc |  5 +
 2 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr67046.c

diff --git a/gcc/testsuite/c-c++-common/cpp/pr67046.c b/gcc/testsuite/c-c++-common/cpp/pr67046.c
new file mode 100644
index 000..f37f20c624e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr67046.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+
+_Pragma(
+"message(\"msg\")"
+)
+
+_Pragma(
+"message(\"msg\")"
+#
+)
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 9dc4363c65a..ffd262bce7d 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1996,7 +1996,12 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
 int
 _cpp_do__Pragma (cpp_reader *pfile, location_t expansion_loc)
 {
+  /* Make sure we don't invalidate the string token, if the closing parenthesis
+   ended up on a different line.  */
+  ++pfile->keep_tokens;
   const cpp_token *string = get__Pragma_string (pfile);
+  --pfile->keep_tokens;
+
   pfile->directive_result.type = CPP_PADDING;
 
   if (string)


ping: [PATCH] libcpp: Improve location for macro names [PR66290]

2023-01-12 Thread Lewis Hyatt via Gcc-patches
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607647.html
May I please ping this one again? It will enable closing out the PR. Thanks!

-Lewis

On Thu, Dec 1, 2022 at 9:22 AM Lewis Hyatt  wrote:
>
> Hello-
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
>
> May I please ping this one? Thanks!
> I have also re-attached the rebased patch here.
>
> -Lewis
>
> On Wed, Oct 12, 2022 at 06:37:50PM -0400, Lewis Hyatt wrote:
> > Hello-
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> >
> > Since Jeff was kind enough to ack one of my other preprocessor patches
> > today, I have become emboldened to ping this one again too :). Would
> > anyone have some time to take a look at it please? Thanks!
> >
> > -Lewis
> >
> > On Thu, Sep 15, 2022 at 6:31 PM Lewis Hyatt  wrote:
> > >
> > > Hello-
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > > May I please ping this patch? Thank you.
> > >
> > > -Lewis
> > >
> > > On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
> > > >
> > > >
> > > > > When libcpp reports diagnostics whose locus is a macro name (such as for
> > > > > -Wunused-macros), it uses the location in the cpp_macro object that was
> > > > > stored by _cpp_new_macro. This is currently set to
> > > > > pfile->directive_line, which contains the line number only and no column
> > > > > information. This patch changes the stored location to the src_loc for
> > > > > the token defining the macro name, which includes the location and range
> > > > > information.
> > > >
> > > > libcpp/ChangeLog:
> > > >
> > > > PR c++/66290
> > > > * macro.cc (_cpp_create_definition): Add location argument.
> > > > * internal.h (_cpp_create_definition): Adjust prototype.
> > > > * directives.cc (do_define): Pass new location argument to
> > > > _cpp_create_definition.
> > > > > (do_undef): Stop passing inferior location to
> > > > > cpp_warning_with_line; the default from cpp_warning is better.
> > > > (cpp_pop_definition): Pass new location argument to
> > > > _cpp_create_definition.
> > > > * pch.cc (cpp_read_state): Likewise.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > PR c++/66290
> > > > * c-c++-common/cpp/macro-ranges.c: New test.
> > > > > * c-c++-common/cpp/line-2.c: Adapt to check for column
> > > > > information on macro-related libcpp warnings.
> > > > * c-c++-common/cpp/line-3.c: Likewise.
> > > > * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> > > > * c-c++-common/cpp/pr58844-1.c: Likewise.
> > > > * c-c++-common/cpp/pr58844-2.c: Likewise.
> > > > * c-c++-common/cpp/warning-zero-location.c: Likewise.
> > > > * c-c++-common/pragma-diag-14.c: Likewise.
> > > > * c-c++-common/pragma-diag-15.c: Likewise.
> > > > * g++.dg/modules/macro-2_d.C: Likewise.
> > > > * g++.dg/modules/macro-4_d.C: Likewise.
> > > > * g++.dg/modules/macro-4_e.C: Likewise.
> > > > * g++.dg/spellcheck-macro-ordering.C: Likewise.
> > > > * gcc.dg/builtin-redefine.c: Likewise.
> > > > * gcc.dg/cpp/Wunused.c: Likewise.
> > > > * gcc.dg/cpp/redef2.c: Likewise.
> > > > * gcc.dg/cpp/redef3.c: Likewise.
> > > > * gcc.dg/cpp/redef4.c: Likewise.
> > > > * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> > > > * gcc.dg/cpp/ucnid-11.c: Likewise.
> > > > * gcc.dg/cpp/undef2.c: Likewise.
> > > > * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> > > > * gcc.dg/cpp/warn-redefined.c: Likewise.
> > > > * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> > > > * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> > > > ---
> > > >
> > > > Notes:
> > > > Hello-
> > > >
> > > > > The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was
> > > > > originally about the entirely wrong location for -Wunused-macros in
> > > > > C++ mode, which behavior was fixed by r13-1903, but before closing it
> > > > > out I wanted to also address a second point brought up in the PR
> > > > > comments, namely that we do not include column information when
> > > > > emitting diagnostics for macro names, such as is done for
> > > > > -Wunused-macros. The attached patch updates the location stored in
> > > > > the cpp_macro object so that it includes the column and range
> > > > > information for the token comprising the macro name; previously, the
> > > > > location was just the generic one pointing to the whole line.
> > > > >
> > > > > The change to libcpp is very small, the reason for all the testsuite
> > > > > changes is that I have updated all tests explicitly looking for the
> > > > > columnless diagnostics (with the "-:" syntax to dg-warning et al) so
> > > > > that they expect a column
> 

[PATCH v2 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

2023-01-05 Thread Lewis Hyatt via Gcc-patches
Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp during the lexing of the tokens, as opposed to being
issued by the frontend during the processing of the pragma, then the
patched-up location is not yet in place, and the user rather sees an invalid
location that is near to the location of the _Pragma string in some cases, or
potentially very far away, depending on the macro expansion history.  For
example:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
    1 | _Pragma("GCC diagnostic ignored \"oops")
      |                        ^

with the caret in a nonsensical location, while this one:

=
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=

produces:

file.cpp:2:24: warning: missing terminating " character
    2 | _Pragma(S)
      |                        ^

with both the caret in a nonsensical location, and the actual relevant context
completely absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

==
In buffer generated from file.cpp:1:
:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"oops")
      | ^~~
==

and

==
:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:2:1: note: in <_Pragma directive>
    2 | _Pragma(S)
      | ^~~
==

So that carets are pointing to something meaningful and all relevant context
appears in the diagnostic.  For the second example, it would be nice if the
macro expansion also output "in expansion of macro S", however doing that for
a general case of macro expansions makes the logic very complicated, since it
has to be done after the fact when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.
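
As a point of reference for the column numbers above, here is a minimal
sketch of destringizing a _Pragma operand in the way the C standard
describes it: drop the surrounding quotes, then replace \" with " and
\\ with \. The function is a standalone illustration, not libcpp's
destringize_and_run; it deliberately ignores encoding prefixes and raw
strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Illustrative only.  LIT is the spelling of a plain string literal,
// including its surrounding double quotes.  Returns the destringized text,
// or an empty string if LIT is not of that form.
static std::string
destringize (const std::string &lit)
{
  std::string out;
  if (lit.size () < 2 || lit.front () != '"' || lit.back () != '"')
    return out;
  // Walk the characters strictly between the two quotes.
  for (std::size_t i = 1; i + 1 < lit.size (); ++i)
    {
      if (lit[i] == '\\' && i + 2 < lit.size ()
	  && (lit[i + 1] == '"' || lit[i + 1] == '\\'))
	++i;   // drop the backslash, keep the escaped character
      out += lit[i];
    }
  return out;
}
```

Applied to the operand from the first example, this yields
GCC diagnostic ignored "oops, in which the unterminated quote sits at
column 24, the column reported against the generated buffer.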

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

* directives.cc (get_token_no_padding): Add argument to receive the
virtual location of the token.
(get__Pragma_string): Likewise.
(do_pragma): Set pfile->directive_result->src_loc properly, it should
not be a virtual location.
(destringize_and_run): Update to provide proper locations for the
_Pragma string tokens.  Support raw strings.
(_cpp_do__Pragma): Adapt to changes to the helper functions.
* errors.cc (cpp_diagnostic_at): Support
cpp_reader::diagnostic_rebase_loc.
(cpp_diagnostic_with_line): Likewise.
* include/line-map.h (class rich_location): Add new member
forget_cached_expanded_locations().
* internal.h (struct _cpp__Pragma_state): Define new struct.
(_cpp_rebase_diagnostic_location): Declare new function.
(struct cpp_reader): Add diagnostic_rebase_loc member.
(_cpp_push__Pragma_token_context): Declare new function.
(_cpp_do__Pragma): Adjust prototype.
* macro.cc (pragma_str): New static var.
(builtin_macro): Adapt to new implementation of _Pragma processing.
(_cpp_pop_context): Fix the logic for resetting
pfile->top_most_macro_node, which previously was never triggered,
although the error seems to have been harmless.
(_cpp_push__Pragma_token_context): New function.
(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
macro tracking output for _Pragma directives.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
tracking output for _Pragma directives.
* c-c++-common/cpp/pr57580.c: Likewise.
* c-c++-common/gomp/pragma-3.c: Likewise.

[PATCH v2 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-01-05 Thread Lewis Hyatt via Gcc-patches
Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as, for instance, referring to the content
of builtin macros (not yet implemented, but an easy lift after this one.) The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The actual change needed to the line-maps API in libcpp is not too large and
requires no space overhead in the line map data structures (on 64-bit systems
that is; one newly added data member to class line_map_ordinary sits inside
former padding bytes.) An LC_GEN map is just an ordinary map like any other,
but the TO_FILE member that normally points to the file name points instead to
the actual data.  This works automatically with PCH as well, for the same
reason that the file name makes its way into a PCH.  In order to avoid
confusion, the member has been renamed from TO_FILE to DATA, and associated
accessors adjusted.

Outside libcpp, there are many small changes but most of them are to
selftests, which are necessarily more sensitive to implementation
details. From the perspective of the user (the "user", here, being a frontend
using line maps or else the diagnostics infrastructure), the chief visible
change is that the function location_get_source_line() should be passed an
expanded_location object instead of a separate filename and line number.  This
is not a big change because in most cases, this information came anyway from a
call to expand_location and the needed expanded_location object is readily
available. The new overload of location_get_source_line() uses the extra
information in the expanded_location object to obtain the data from the
in-memory buffer when it originated from an LC_GEN map.

Until the subsequent patch that starts using LC_GEN maps, none are yet
generated within GCC, hence nothing is added to the testsuite here; but all
relevant selftests have been extended to cover generated data maps in addition
to normal files.
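
The idea of one member doing double duty, guarded by predicates, can be
sketched with a toy structure. Everything below is hypothetical (the
toy_ names, the field layout, and in particular the comparison rule in
toy_same_file_p are assumptions made for illustration); the real
declarations are the ones listed in the ChangeLog for line-map.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy model only; not the declarations from line-map.h.
enum toy_reason { TOY_LC_ENTER, TOY_LC_GEN };

struct toy_ordinary_map
{
  toy_reason reason;
  const char *data;      // file name, or generated bytes for TOY_LC_GEN
  std::size_t data_len;  // only meaningful for TOY_LC_GEN
};

static toy_ordinary_map
toy_file_map (const char *name)
{
  return toy_ordinary_map { TOY_LC_ENTER, name, 0 };
}

static toy_ordinary_map
toy_generated_map (const char *bytes, std::size_t len)
{
  return toy_ordinary_map { TOY_LC_GEN, bytes, len };
}

static bool
toy_generated_data_p (const toy_ordinary_map &map)
{
  return map.reason == TOY_LC_GEN;
}

// File-name accessor: asserts that the map really encodes a file.
static const char *
toy_file_name (const toy_ordinary_map &map)
{
  assert (!toy_generated_data_p (map));
  return map.data;
}

// Assumed rule: file maps match by name, generated maps by buffer identity,
// and a file map never matches a generated map.
static bool
toy_same_file_p (const toy_ordinary_map &a, const toy_ordinary_map &b)
{
  if (toy_generated_data_p (a) != toy_generated_data_p (b))
    return false;
  if (toy_generated_data_p (a))
    return a.data == b.data && a.data_len == b.data_len;
  return std::strcmp (a.data, b.data) == 0;
}
```

The point of the sketch is only that callers go through a predicate
before interpreting the shared member, rather than comparing it as a
file name unconditionally.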

libcpp/ChangeLog:

* include/line-map.h (enum lc_reason): Add LC_GEN.
(struct line_map_ordinary): Add new members to support LC_GEN concept.
(ORDINARY_MAP_FILE_NAME): Assert that map really does encode a file
and not generated data.
(ORDINARY_MAP_GENERATED_DATA_P): New function.
(ORDINARY_MAP_GENERATED_DATA): New function.
(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
(ORDINARY_MAP_FILE_NAME_OR_DATA): New function.
(ORDINARY_MAPS_SAME_FILE_P): Declare new function.
(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare new function.
(LINEMAP_FILE): This was always a synonym for ORDINARY_MAP_FILE_NAME;
make this explicit.
(linemap_get_file_highest_location): Adjust prototype.
(linemap_add): Adjust prototype.
(class expanded_location): Add new members to store generated content.
* line-map.cc (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
(ORDINARY_MAPS_SAME_FILE_P): New function.
(linemap_add): Add new argument DATA_LEN. Support generated data in
LC_GEN maps.
(linemap_check_files_exited): Adapt to API changes supporting LC_GEN.
(linemap_line_start): Likewise.
(linemap_position_for_loc_and_offset): Likewise.
(linemap_get_expansion_filename): Likewise.
(linemap_expand_location): Likewise.
(linemap_dump): Likewise.
(linemap_dump_location): Likewise.
(linemap_get_file_highest_location): Likewise.
* directives.cc (_cpp_do_file_change): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (make_range): Initialize new fields in
expanded_location.
(compatible_locations_p): Use new ORDINARY_MAPS_SAME_FILE_P ()
function.
(layout::calculate_x_offset_display): Use the new expanded_location
overload of location_get_source_line(), so as to support LC_GEN maps.
(layout::print_line): Likewise.
(source_line::source_line): Likewise.
(line_corrections::add_hint): Likewise.
(class line_corrections): Store the location as an exploc rather than
individual filename, so as to support LC_GEN maps.
(layout::print_trailing_fixits): Use the new exploc constructor for
class line_corrections.
(test_layout_x_offset_display_utf8): Test LC_GEN maps as well as normal.
(test_layout_x_offset_display_tab): Likewise.
(test_diagnostic_show_locus_one_liner): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_add_location_if_nearby): Likewise.
(test_diagnostic_show_locus_fixit_lines): Likewise.
(test_fixit_consolidation): Likewise.
(test_overlapped_fixit_printing): Likewise.

[PATCH v2 4/4] diagnostics: Support generated data locations in SARIF output

2023-01-05 Thread Lewis Hyatt via Gcc-patches
The diagnostics routines for SARIF output need to read the source code back
in, so that they can generate "snippet" and "content" records, so they need to
be able to cope with generated data locations.  Add support for that in
diagnostic-format-sarif.cc.

gcc/ChangeLog:

* diagnostic-format-sarif.cc (sarif_builder::xloc_to_fb): New function.
(sarif_builder::maybe_make_physical_location_object): Support
generated data locations.
(sarif_builder::make_artifact_location_object): Likewise.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
(sarif_builder::make_artifact_object): Likewise.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc| 102 +++---
 .../diagnostic-format-sarif-file-5.c  |  31 ++
 2 files changed, 93 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index f8fdd586ff0..99aba1414ea 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -125,7 +125,10 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+
+  typedef std::pair filename_or_buffer;
+  json::object *make_artifact_location_object (filename_or_buffer fb);
+
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -146,16 +149,17 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
-   int start_line,
+  json::object *make_artifact_object (filename_or_buffer fb);
+  json::object *
+  maybe_make_artifact_content_object (filename_or_buffer fb) const;
+  json::object *maybe_make_artifact_content_object (expanded_location xloc,
int end_line) const;
   json::object *make_fix_object (const rich_location _loc);
   json::object *make_artifact_change_object (const rich_location );
   json::object *make_replacement_object (const fixit_hint ) const;
   json::object *make_artifact_content_object (const char *text) const;
   int get_sarif_column (expanded_location exploc) const;
+  static filename_or_buffer xloc_to_fb (expanded_location xloc);
 
   diagnostic_context *m_context;
 
@@ -166,7 +170,11 @@ private:
  diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set  m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+ with that length, not a filename.  */
+  hash_set ,
+  int_hash  >
+   > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set  m_rule_id_set;
   json::array *m_rules_arr;
@@ -588,6 +596,15 @@ sarif_builder::make_location_object (const diagnostic_event )
   return location_obj;
 }
 
+/* Populate a filename_or_buffer pair from an expanded location.  */
+sarif_builder::filename_or_buffer
+sarif_builder::xloc_to_fb (expanded_location xloc)
+{
+  if (xloc.generated_data_len)
+return filename_or_buffer (xloc.generated_data, xloc.generated_data_len);
+  return filename_or_buffer (xloc.file, 0);
+}
+
 /* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC,
or return NULL;
Add any filename to the m_artifacts.  */
@@ -603,7 +620,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  m_filenames.add (xloc_to_fb (expand_location (loc)));
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -627,7 +644,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (xloc_to_fb (expand_location (loc)));
 }
 
 /* 

[PATCH v2 2/4] diagnostics: Handle generated data locations in edit_context

2023-01-05 Thread Lewis Hyatt via Gcc-patches
Class edit_context handles outputting fixit hints in diff form that could be
manually or automatically applied by the user. This will not make sense for
generated data locations, such as the contents of a _Pragma string, because
the text to be modified does not appear in the user's input files. We do not
currently ever generate fixit hints in such a context, but for future-proofing
purposes, ignore such locations in edit context now.

gcc/ChangeLog:

* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.
---
 gcc/edit-context.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6f5bc6b9d8f..ae11b6f2e00 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint)
 return false;
   if (start.column == 0)
 return false;
+  if (start.generated_data)
+return false;
   if (next_loc.column == 0)
 return false;
+  if (next_loc.generated_data)
+return false;
 
   edited_file  = get_or_insert_file (start.file);
   if (!m_valid)


[PATCH v2 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2023-01-05 Thread Lewis Hyatt via Gcc-patches
Hello-

This series contains the four remaining patches in the series originally
sent here:

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605029.html

which implements improved locations for tokens lexed from a string inside a
_Pragma directive.

v2 1/4: diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

This was formerly v1 4/6. It has been rewritten in line with that review,
most recently discussed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606616.html

v2 2/4: diagnostics: Handle generated data locations in edit_context

This was formerly v1 5a/6. It has been approved already conditional on
v2 1/4 as a prerequisite.

v2 3/4: diagnostics: libcpp: Assign real locations to the tokens inside
_Pragma strings

This was formerly v1 6/6 and is unchanged from that one. It has not been
reviewed yet.

v2 4/4: diagnostics: Support generated data locations in SARIF output

This was formerly v1 5c/6. It has not been fully reviewed yet.

Thanks for taking a look!

-Lewis


Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2023-01-05 Thread Lewis Hyatt via Gcc-patches
On Thu, Nov 17, 2022 at 4:21 PM Lewis Hyatt  wrote:
>
> On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote:
> > On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> > [...snip...]
> > >
> > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > > index 5890c18bdc3..2935d7fb236 100644
> > > --- a/gcc/c-family/c-common.cc
> > > +++ b/gcc/c-family/c-common.cc
> > > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
> > >const line_map_ordinary *ord_map
> > > = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> > >
> > > +  if (ord_map->reason == LC_GEN)
> > > +   continue;
> > > +
> > >if (const line_map_ordinary *from
> > >   = linemap_included_from_linemap (line_table, ord_map))
> > > /* We cannot use pointer equality, because with preprocessed
> > >input all filename strings are unique.  */
> > > -   if (0 == strcmp (from->to_file, file))
> > > +   if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
> > >   {
> > > last_include_ord_map = from;
> > > last_ord_map_after_include = NULL;
> >
> > [...snip...]
> >
> > I'm not a fan of having the "to_file" field change meaning based on
> > whether reason is LC_GEN.
> >
> > How involved would it be to split line_map_ordinary into two
> > subclasses, so that we'd have this hierarchy (with indentation showing
> > inheritance):
> >
> > line_map
> >   line_map_ordinary
> > line_map_ordinary_file
> > line_map_ordinary_generated
> >   line_map_macro
> >
> > Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
> > to emphasize that it might not be a filename, and that we have to check
> > everywhere we access that field.
> >
> > Please can all those checks for LC_GEN go into an inline function so we
> > can write e.g.
> >   map->generated_p ()
> > or somesuch.
> >
> > > If I'm reading things right, patch 6 adds the sole usage of this in
> > destringize_and_run.  Would we ever want to discriminate between
> > different kinds of generated buffers?
> >
> > [...snip...]
> >
> > > @@ -796,10 +798,13 @@ diagnostic_report_current_module 
> > > (diagnostic_context *context, location_t where)
> > >  N_("of module"),
> > >  N_("In module imported at"),   /* 6 */
> > >  N_("imported at"),
> > > +N_("In buffer generated from"),   /* 8 */
> > > };
> >
> > We use the wording "destringized" in:
> >
> > so maybe this should be "In buffer destringized from" ???  (I'm not
> > sure)
> >
> > [...snip...]
> >
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 483cb6e940d..3cf5480551d 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> >
> > [..snip...]
> >
> > > @@ -58,7 +64,7 @@ public:
> > >~file_cache_slot ();
> >
> > My initial thought reading the input.cc part of this patch was that I
> > want it to be very clear when a file_cache_slot is for a real file vs
> > when we're replaying generated data.  I'd hoped that this could have
> > been expressed via inheritance, but we preallocate all the cache slots
> > once in an array in file_cache's ctor and the slots get reused over
> > time.  So instead of that, can we please have some kind of:
> >
> >bool file_slot_p () const;
> >bool generated_slot_p () const;
> >
> > or somesuch, so that we can have clear assertions and conditionals
> > about the current state of a slot (I think the discriminating condition
> > is that generated_data_len > 0, right?)
> >
> > If I'm reading things right, it looks like file_cache_slot::m_file_path
> > does double duty after this patch, and is either a filename, or a
> > pointer to the generated data.  If so, please can the patch rename it,
> > and have all usage guarded appropriately.  Can it be a union? (or does
> > the ctor prevent that?)
> >
> > [...snip...]
> >
> > > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned 
> > > *highest_use_count)
> > > num_file_slots files are cached.  */
> &

[PATCH] preprocessor: Don't register pragmas in directives-only mode [PR108244]

2022-12-30 Thread Lewis Hyatt via Gcc-patches
libcpp's directives-only mode does not expect deferred pragmas to be
registered, but to date the c-family registration process has not checked for
this case. That issue became more visible since r13-1544, which added the
commonly used GCC diagnostic pragmas to the set of those registered in
preprocessing modes. Fix it by checking for directives-only mode in
c-family/c-pragma.cc.

gcc/c-family/ChangeLog:

PR preprocessor/108244
* c-pragma.cc (c_register_pragma_1): Don't attempt to register any
deferred pragmas if -fdirectives-only.
(init_pragma): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/pr108244-1.c: New test.
* c-c++-common/cpp/pr108244-2.c: New test.
* c-c++-common/cpp/pr108244-3.c: New test.
---

Notes:
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108244

The PR notes a regression in GCC 13 which is fixed by the attached
patch. Bootstrap+regtest all languages on x86-64 Linux looks good. Please
let me know if it is OK? Thanks.

-Lewis

 gcc/c-family/c-pragma.cc| 54 -
 gcc/testsuite/c-c++-common/cpp/pr108244-1.c |  5 ++
 gcc/testsuite/c-c++-common/cpp/pr108244-2.c |  5 ++
 gcc/testsuite/c-c++-common/cpp/pr108244-3.c |  6 +++
 4 files changed, 46 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr108244-1.c
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr108244-2.c
 create mode 100644 gcc/testsuite/c-c++-common/cpp/pr108244-3.c

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 142a46441ac..91fabf0a513 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1647,7 +1647,8 @@ c_register_pragma_1 (const char *space, const char *name,
 
   if (flag_preprocess_only)
 {
-  if (!(allow_expansion || ihandler.early_handler.handler_1arg))
+  if (cpp_get_options (parse_in)->directives_only
+ || !(allow_expansion || ihandler.early_handler.handler_1arg))
return;
 
   pragma_pp_data pp_data;
@@ -1811,34 +1812,39 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
 void
 init_pragma (void)
 {
-  if (flag_openacc)
+
+  if (!cpp_get_options (parse_in)->directives_only)
 {
-  const int n_oacc_pragmas = ARRAY_SIZE (oacc_pragmas);
-  int i;
+  if (flag_openacc)
+   {
+ const int n_oacc_pragmas = ARRAY_SIZE (oacc_pragmas);
+ int i;
 
-  for (i = 0; i < n_oacc_pragmas; ++i)
-   cpp_register_deferred_pragma (parse_in, "acc", oacc_pragmas[i].name,
- oacc_pragmas[i].id, true, true);
-}
+ for (i = 0; i < n_oacc_pragmas; ++i)
+   cpp_register_deferred_pragma (parse_in, "acc", oacc_pragmas[i].name,
+ oacc_pragmas[i].id, true, true);
+   }
 
-  if (flag_openmp)
-{
-  const int n_omp_pragmas = ARRAY_SIZE (omp_pragmas);
-  int i;
+  if (flag_openmp)
+   {
+ const int n_omp_pragmas = ARRAY_SIZE (omp_pragmas);
+ int i;
 
-  for (i = 0; i < n_omp_pragmas; ++i)
-   cpp_register_deferred_pragma (parse_in, "omp", omp_pragmas[i].name,
- omp_pragmas[i].id, true, true);
-}
-  if (flag_openmp || flag_openmp_simd)
-{
-  const int n_omp_pragmas_simd = sizeof (omp_pragmas_simd)
-/ sizeof (*omp_pragmas);
-  int i;
+ for (i = 0; i < n_omp_pragmas; ++i)
+   cpp_register_deferred_pragma (parse_in, "omp", omp_pragmas[i].name,
+ omp_pragmas[i].id, true, true);
+   }
+  if (flag_openmp || flag_openmp_simd)
+   {
+ const int n_omp_pragmas_simd
+   = sizeof (omp_pragmas_simd) / sizeof (*omp_pragmas);
+ int i;
 
-  for (i = 0; i < n_omp_pragmas_simd; ++i)
-   cpp_register_deferred_pragma (parse_in, "omp", omp_pragmas_simd[i].name,
- omp_pragmas_simd[i].id, true, true);
+ for (i = 0; i < n_omp_pragmas_simd; ++i)
+   cpp_register_deferred_pragma (parse_in, "omp",
+ omp_pragmas_simd[i].name,
+ omp_pragmas_simd[i].id, true, true);
+   }
 }
 
   if (!flag_preprocess_only)
diff --git a/gcc/testsuite/c-c++-common/cpp/pr108244-1.c b/gcc/testsuite/c-c++-common/cpp/pr108244-1.c
new file mode 100644
index 000..1678004a4d9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr108244-1.c
@@ -0,0 +1,5 @@
+/* { dg-do preprocess } */
+/* { dg-additional-options "-fdirectives-only" } */
+#pragma GCC diagnostic push
+#ifdef t
+#endif
diff --git a/gcc/testsuite/c-c++-common/cpp/pr108244-2.c b/gcc/testsuite/c-c++-common/cpp/pr108244-2.c
new file mode 100644
index 000..017682ad186
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/pr108244-2.c
@@ -0,0 +1,5 @@
+/* { dg-do preprocess } */
+/* { 

Ping^3: [PATCH] libcpp: Improve location for macro names [PR66290]

2022-12-01 Thread Lewis Hyatt via Gcc-patches
Hello-

https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html

May I please ping this one? Thanks!
I have also re-attached the rebased patch here.

-Lewis

On Wed, Oct 12, 2022 at 06:37:50PM -0400, Lewis Hyatt wrote:
> Hello-
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> 
> Since Jeff was kind enough to ack one of my other preprocessor patches
> today, I have become emboldened to ping this one again too :). Would
> anyone have some time to take a look at it please? Thanks!
> 
> -Lewis
> 
> On Thu, Sep 15, 2022 at 6:31 PM Lewis Hyatt  wrote:
> >
> > Hello-
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> > May I please ping this patch? Thank you.
> >
> > -Lewis
> >
> > On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
> > >
> > >
> > > When libcpp reports diagnostics whose locus is a macro name (such as for
> > > -Wunused-macros), it uses the location in the cpp_macro object that was
> > > stored by _cpp_new_macro. This is currently set to pfile->directive_line,
> > > which contains the line number only and no column information. This patch
> > > > changes the stored location to the src_loc for the token defining the
> > > > macro name, which includes the location and range information.
> > >
> > > libcpp/ChangeLog:
> > >
> > > PR c++/66290
> > > * macro.cc (_cpp_create_definition): Add location argument.
> > > * internal.h (_cpp_create_definition): Adjust prototype.
> > > * directives.cc (do_define): Pass new location argument to
> > > _cpp_create_definition.
> > > > (do_undef): Stop passing inferior location to
> > > > cpp_warning_with_line; the default from cpp_warning is better.
> > > (cpp_pop_definition): Pass new location argument to
> > > _cpp_create_definition.
> > > * pch.cc (cpp_read_state): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > PR c++/66290
> > > * c-c++-common/cpp/macro-ranges.c: New test.
> > > * c-c++-common/cpp/line-2.c: Adapt to check for column information
> > > on macro-related libcpp warnings.
> > > * c-c++-common/cpp/line-3.c: Likewise.
> > > * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> > > * c-c++-common/cpp/pr58844-1.c: Likewise.
> > > * c-c++-common/cpp/pr58844-2.c: Likewise.
> > > * c-c++-common/cpp/warning-zero-location.c: Likewise.
> > > * c-c++-common/pragma-diag-14.c: Likewise.
> > > * c-c++-common/pragma-diag-15.c: Likewise.
> > > * g++.dg/modules/macro-2_d.C: Likewise.
> > > * g++.dg/modules/macro-4_d.C: Likewise.
> > > * g++.dg/modules/macro-4_e.C: Likewise.
> > > * g++.dg/spellcheck-macro-ordering.C: Likewise.
> > > * gcc.dg/builtin-redefine.c: Likewise.
> > > * gcc.dg/cpp/Wunused.c: Likewise.
> > > * gcc.dg/cpp/redef2.c: Likewise.
> > > * gcc.dg/cpp/redef3.c: Likewise.
> > > * gcc.dg/cpp/redef4.c: Likewise.
> > > * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> > > * gcc.dg/cpp/ucnid-11.c: Likewise.
> > > * gcc.dg/cpp/undef2.c: Likewise.
> > > * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> > > * gcc.dg/cpp/warn-redefined.c: Likewise.
> > > * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> > > * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> > > ---
> > >
> > > Notes:
> > > Hello-
> > >
> > > > The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was
> > > > originally about the entirely wrong location for -Wunused-macros in
> > > > C++ mode, which behavior was fixed by r13-1903, but before closing it
> > > > out I wanted to also address a second point brought up in the PR
> > > > comments, namely that we do not include column information when
> > > > emitting diagnostics for macro names, such as is done for
> > > > -Wunused-macros. The attached patch updates the location stored in
> > > > the cpp_macro object so that it includes the column and range
> > > > information for the token comprising the macro name; previously, the
> > > > location was just the generic one pointing to the whole line.
> > > >
> > > > The change to libcpp is very small, the reason for all the testsuite
> > > > changes is that I have updated all tests explicitly looking for the
> > > > columnless diagnostics (with the "-:" syntax to dg-warning et al) so
> > > > that they expect a column instead. I also added a new test which
> > > > verifies the expected range information in diagnostics with carets.
> > > >
> > > > Bootstrap + regtest on x86-64 Linux looks good. Please let me know if
> > > > it looks OK? Thanks!
> > >
> > > -Lewis
> > >
> > >  libcpp/directives.cc  |  13 +-
> > >  libcpp/internal.h |   2 +-
> > >  libcpp/macro.cc   

Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2022-11-17 Thread Lewis Hyatt via Gcc-patches
On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote:
> On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> [...snip...]
> > 
> > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > index 5890c18bdc3..2935d7fb236 100644
> > --- a/gcc/c-family/c-common.cc
> > +++ b/gcc/c-family/c-common.cc
> > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
> >const line_map_ordinary *ord_map
> > = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> >  
> > +  if (ord_map->reason == LC_GEN)
> > +   continue;
> > +
> >if (const line_map_ordinary *from
> >   = linemap_included_from_linemap (line_table, ord_map))
> > /* We cannot use pointer equality, because with preprocessed
> >input all filename strings are unique.  */
> > -   if (0 == strcmp (from->to_file, file))
> > +   if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
> >   {
> > last_include_ord_map = from;
> > last_ord_map_after_include = NULL;
> 
> [...snip...]
> 
> I'm not a fan of having the "to_file" field change meaning based on
> whether reason is LC_GEN.
> 
> How involved would it be to split line_map_ordinary into two
> subclasses, so that we'd have this hierarchy (with indentation showing
> inheritance):
> 
> line_map
>   line_map_ordinary
> line_map_ordinary_file
> line_map_ordinary_generated
>   line_map_macro
> 
> Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
> to emphasize that it might not be a filename, and that we have to check
> everywhere we access that field.
> 
> Please can all those checks for LC_GEN go into an inline function so we
> can write e.g.
>   map->generated_p ()
> or somesuch.
> 
> If I'm reading things right, patch 6 adds the sole usage of this in
> destringize_and_run.  Would we ever want to discriminate between
> different kinds of generated buffers?
> 
> [...snip...]
> 
> > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
> >  N_("of module"),
> >  N_("In module imported at"),   /* 6 */
> >  N_("imported at"),
> > +N_("In buffer generated from"),   /* 8 */
> > };
> 
> We use the wording "destringized" in:
> 
> so maybe this should be "In buffer destringized from" ???  (I'm not
> sure) 
> 
> [...snip...]
> 
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 483cb6e940d..3cf5480551d 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> 
> [..snip...]
> 
> > @@ -58,7 +64,7 @@ public:
> >~file_cache_slot ();
> 
> My initial thought reading the input.cc part of this patch was that I
> want it to be very clear when a file_cache_slot is for a real file vs
> when we're replaying generated data.  I'd hoped that this could have
> been expressed via inheritance, but we preallocate all the cache slots
> once in an array in file_cache's ctor and the slots get reused over
> time.  So instead of that, can we please have some kind of:
> 
>bool file_slot_p () const;
>bool generated_slot_p () const;
> 
> or somesuch, so that we can have clear assertions and conditionals
> about the current state of a slot (I think the discriminating condition
> is that generated_data_len > 0, right?)
> 
> If I'm reading things right, it looks like file_cache_slot::m_file_path
> does double duty after this patch, and is either a filename, or a
> pointer to the generated data.  If so, please can the patch rename it,
> and have all usage guarded appropriately.  Can it be a union? (or does
> the ctor prevent that?)
> 
> [...snip...]
>  
> > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
> > num_file_slots files are cached.  */
> >  
> >  file_cache_slot*
> > -file_cache::add_file (const char *file_path)
> > +file_cache::add_file (const char *file_path, unsigned int generated_data_len)
> 
> Can we split this into two functions: one for files, and one for
> generated data?  (add_file vs add_generated_data?)
> 
> >  {
> >  
> > -  FILE *fp = fopen (file_path, "r");
> > -  if (fp == NULL)
> > -return NULL;
> > +  FILE *fp;
> > +  if (generated_data_len)
> > +fp = NULL;
>

Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2022-11-05 Thread Lewis Hyatt via Gcc-patches
Thanks for the comments! I have some replies below.

On Sat, Nov 5, 2022 at 12:23 PM David Malcolm  wrote:
> [...snip...]
> >
> > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > index 5890c18bdc3..2935d7fb236 100644
> > --- a/gcc/c-family/c-common.cc
> > +++ b/gcc/c-family/c-common.cc
> > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
> >const line_map_ordinary *ord_map
> > = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> >
> > +  if (ord_map->reason == LC_GEN)
> > +   continue;
> > +
> >if (const line_map_ordinary *from
> >   = linemap_included_from_linemap (line_table, ord_map))
> > /* We cannot use pointer equality, because with preprocessed
> >input all filename strings are unique.  */
> > -   if (0 == strcmp (from->to_file, file))
> > +   if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
> >   {
> > last_include_ord_map = from;
> > last_ord_map_after_include = NULL;
>
> [...snip...]
>
> I'm not a fan of having the "to_file" field change meaning based on
> whether reason is LC_GEN.
>
> How involved would it be to split line_map_ordinary into two
> subclasses, so that we'd have this hierarchy (with indentation showing
> inheritance):
>
> line_map
>   line_map_ordinary
> line_map_ordinary_file
> line_map_ordinary_generated
>   line_map_macro
>
> Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
> to emphasize that it might not be a filename, and that we have to check
> everywhere we access that field.
>

Yeah, there were definitely a lot of ways to go about this. I settled
on the approach of minimizing the changes to libcpp for a couple
reasons. One is that I didn't want to add any extra overhead to
handling of non-_Pragma lexing, which is of course most of the time. I
think it's nice that lex.cc was not touched at all for this change,
for example. The reason I re-used the to_file field was that this
class seems to be very concerned about minimizing space overhead (cf.
all the comments about pointer alignment boundaries, etc.) I feel like
the reason for that attention was that the addition of macro location
tracking added a lot of overhead when it was implemented and the
authors wanted to minimize that. Nowadays, perhaps the RAM usage is
not as much of a concern. We do create a lot of line_map instances,
though. The other reason is that the line-maps API is already pretty
error-prone to use. A given location_t could be an ordinary location,
or a virtual location, or an ad-hoc location. Going through the
_Pragma location-related bugs that have been fixed over the years, it
seems like most of them stemmed from failing to check one or the other
of these cases when needed. So I was worried that adding yet another
type of location would make things worse.

But I see your point certainly. I feel like adding a new subclass will
require touching many more call sites, so not sure how it will look. I
guess I would be concerned about adding too many new conditional
branches. There are already very many, since almost every use of
line-maps API has to check for ad-hoc location first, etc. At some
point, if there are too many branches, it makes more sense to use
virtual functions instead and would perform better. I guess the
fundamental issue is that it's really a C-like API that has had C++
features added on to it over time; redesigning the API from
scratch would probably yield something cleaner. Given I wasn't proposing that
for now, I thought making the minimal possible change here would be
the way to go.

What do you think about making to_file a union and adjusting the
handful of places that would care? That could be a good improvement
that's in the right direction.
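
To make the union idea concrete, here is a minimal sketch; it is not taken
from the patch. The types and names (map_reason_sketch, ordinary_map_sketch,
to_data, data_len and the helper functions) are hypothetical stand-ins, and
the real line_map_ordinary has more fields. Both union members are pointers,
so the struct does not grow, and each accessor asserts which member is live.

```cpp
#include <cassert>
#include <cstring>

// Simplified stand-ins for illustration only; these are NOT the real
// libcpp declarations, and every name here is hypothetical.
enum map_reason_sketch { REASON_FILE, REASON_GENERATED };

struct ordinary_map_sketch
{
  map_reason_sketch reason;
  unsigned int data_len;  // Meaningful only when reason == REASON_GENERATED.
  union
  {
    const char *to_file;  // File name, when reason == REASON_FILE.
    const char *to_data;  // Start of buffer, when reason == REASON_GENERATED.
  };
};

// Build a map that refers to a real file.
inline ordinary_map_sketch
make_file_map (const char *name)
{
  ordinary_map_sketch m;
  m.reason = REASON_FILE;
  m.data_len = 0;
  m.to_file = name;
  return m;
}

// Build a map that refers to an in-memory buffer of LEN bytes.
inline ordinary_map_sketch
make_generated_map (const char *data, unsigned int len)
{
  ordinary_map_sketch m;
  m.reason = REASON_GENERATED;
  m.data_len = len;
  m.to_data = data;
  return m;
}

// Guarded accessors: each one checks the discriminant before reading.
inline const char *
map_file_name (const ordinary_map_sketch &m)
{
  assert (m.reason == REASON_FILE);
  return m.to_file;
}

inline const char *
map_generated_data (const ordinary_map_sketch &m, unsigned int *len)
{
  assert (m.reason == REASON_GENERATED);
  *len = m.data_len;
  return m.to_data;
}
```

Call sites that only ever want a file name would go through the guarded
accessor, so a generated map reaching them trips the assertion instead of
being silently treated as a path.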

> Please can all those checks for LC_GEN go into an inline function so we
> can write e.g.
>   map->generated_p ()
> or somesuch.
>

Sure. I guess for consistency it has to look something like
LINEMAP_ORDINARY_GENERATED_P (map).
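
As a rough illustration of what such a predicate buys at a call site, here is
a small sketch. It is an assumption-laden stand-in, not libcpp code: the enum,
the map_sketch struct, map_generated_p and last_map_for_file are all
hypothetical names, and the loop only mimics the shape of the c-common.cc
hunk quoted above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified stand-ins; not the real libcpp types.
enum reason_sketch { SK_ENTER, SK_LEAVE, SK_RENAME, SK_GEN };

struct map_sketch
{
  reason_sketch reason;
  const char *to_file;  // Not a file name when reason == SK_GEN.
};

// The single place that knows how a generated map is recognized.
inline bool
map_generated_p (const map_sketch *map)
{
  return map->reason == SK_GEN;
}

// Return the last map in MAPS[0..N) that names FILE, skipping generated
// maps so that their buffer pointer is never compared as a file name.
inline const map_sketch *
last_map_for_file (const map_sketch *maps, int n, const char *file)
{
  assert (maps != nullptr || n == 0);
  const map_sketch *found = nullptr;
  for (int i = 0; i < n; ++i)
    {
      if (map_generated_p (&maps[i]))
        continue;
      if (0 == std::strcmp (maps[i].to_file, file))
        found = &maps[i];
    }
  return found;
}
```

With the check in one helper, a later change to how generated maps are
represented would touch the helper rather than every comparison against
LC_GEN.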

> If I'm reading things right, patch 6 adds the sole usage of this in
> destringize_and_run.  Would we ever want to discriminate between
> different kinds of generated buffers?
>

One other possible use case I had in mind was for builtin macros, e.g.
right now for something like

const char* line = __LINE__;

the diagnostic points just to the __LINE__ token. With an LC_GEN map
it could show the user that __LINE__ has expanded to an integer rather
than a string. Something like that. But anyway that was just an aside,
the way I was envisioning it, just one type of LC_GEN map is needed,
although I can see it might be nice to know further what it was made
for.

I could imagine eventually the static analyzer finding a use of it
also. For instance, you had a recent patch that asks libcpp to lex a
buffer containing a macro token, to get the expanded value. If a
diagnostic could be generated during that process for 

Re: [PATCH 5/6] diagnostics: Support generated data in additional contexts

2022-11-04 Thread Lewis Hyatt via Gcc-patches
On Fri, Nov 04, 2022 at 12:42:29PM -0400, David Malcolm wrote:
> On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> > Add awareness that diagnostic locations may be in generated buffers
> > rather than an actual file to other places in the diagnostics code that
> > may care, most notably SARIF output (which needs to obtain its own
> > snapshots of the code involved). For edit context output, which outputs
> > fixit hints as diffs, for now just make sure we ignore generated data
> > buffers. At the moment, there is no ability for a fixit hint to be
> > generated in such a buffer.
> >
> > Because SARIF uses JSON as well, also add the ability to the
> > json::string class to handle a buffer with nulls in the middle (since we
> > place no restriction on LC_GEN content) by providing the option to
> > specify the data length.
> 
> Please can you split this patch into three parts:
> - the SARIF part
> - the json changes
> - the edit-context.cc changes (I think this at least counts as an
> "obvious" change with respect to the other changes in the kit, though
> I'm still working my way through patch 4 in the kit).
> 
> Please add a DejaGnu testcase to the SARIF part, with a diagnostic that
> references a generated data buffer; see
>   gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-*.c 
> for examples of SARIF testcases.
> 
> Please add a selftest to the json change so that we have a unit test of
> constructing a json::string with an embedded NUL, and how we serialize
> such a string (probably to json.cc's test_writing_strings)
> 
> Thanks
> Dave

Yes, certainly, sorry for not splitting it up more to start with. Regarding
the SARIF testcase, it's not that easy to get SARIF output to actually output
generated data, because as of now it can only appear in a _Pragma, and SARIF
does not output macro definitions currently. The only way I know to do
it is to make use of -fdump-internal-locations, which generates top-level
inform() calls inside the _Pragma that can end up in the SARIF output. So I
wrote a testcase that does this, but not sure how you will feel about having
the testsuite rely on this internal debugging option.

I wasn't sure what's the best way to send the 3 split up patches. I attached
them here as 5a/6, 5b/6, 5c/6, in case that's right, but I wasn't sure if I
should just resend the whole batch (minus perhaps the 2 you have already
acked), and/or if I should wait for feedback on the other patches first.
Happy to do whatever makes it easier for you, and thanks for your time! Note
that the new SARIF patch (5c/6) now needs to come last in the series, after
the patch 6/6 that actually supports _Pragma, so that the new testcase can
make use of that.

-Lewis
[PATCH 5a/6] diagnostics: Handle generated data locations in edit_context

Class edit_context handles outputting fixit hints in diff form that could be
manually or automatically applied by the user. This will not make sense for
generated data locations, such as the contents of a _Pragma string, because
the text to be modified does not appear in the user's input files. We do not
currently ever generate fixit hints in such a context, but for future-proofing
purposes, ignore such locations in edit context now.

gcc/ChangeLog:

* edit-context.cc (edit_context::apply_fixit): Ignore locations in
generated data.

diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6879ddd41b4..aa95bc0834f 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint)
 return false;
   if (start.column == 0)
 return false;
+  if (start.generated_data)
+return false;
   if (next_loc.column == 0)
 return false;
+  if (next_loc.generated_data)
+return false;
 
   edited_file  = get_or_insert_file (start.file);
   if (!m_valid)
[PATCH 5b/6] diagnostics: Remove null-termination requirement for json::string

json::string currently handles null-terminated data and so can't work with
data that may contain embedded null bytes or that is not null-terminated.
Supporting such data will make json::string more robust in some contexts, such
as SARIF output, which uses it to output user source code that may contain
embedded null bytes.
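
For readers who want the idea in isolation, the following is a minimal sketch
of printing a length-delimited buffer as a JSON string. It is not the code
from json.cc: json_quote_sketch is a hypothetical helper, and the choice to
emit embedded NUL (and other control bytes) as \u00XX escapes is an
assumption based on what JSON permits, not on the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not GCC's json::string::print.  Quotes the LEN
// bytes starting at DATA as a JSON string literal.  The buffer need not
// be null-terminated and may contain embedded NUL bytes.
inline std::string
json_quote_sketch (const char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);
  std::string out = "\"";
  for (std::size_t i = 0; i < len; ++i)
    {
      const unsigned char ch = static_cast<unsigned char> (data[i]);
      switch (ch)
        {
        case '"':
          out += "\\\"";
          break;
        case '\\':
          out += "\\\\";
          break;
        case '\n':
          out += "\\n";
          break;
        case '\t':
          out += "\\t";
          break;
        default:
          if (ch < 0x20)
            {
              // Remaining control bytes, including NUL, become \u00XX.
              char buf[8];
              std::snprintf (buf, sizeof buf, "\\u%04x",
                             static_cast<unsigned int> (ch));
              out += buf;
            }
          else
            out += static_cast<char> (ch);
          break;
        }
    }
  out += '"';
  return out;
}
```

The point of the explicit length is that the loop bound comes from the
caller rather than from strlen, so a NUL in the middle of the data is just
another byte to escape.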

gcc/ChangeLog:

* json.h (class string): Add M_LEN member to store the length of
the data.  Add constructor taking an explicit length.
* json.cc (string::string): Implement the new constructor.
(string::print): Support printing strings that are not null-terminated.
Escape embedded null bytes on output.
(test_writing_strings): Test the new null-byte-related features of
json::string.

diff --git a/gcc/json.cc b/gcc/json.cc
index 974f8c36825..3a79cac0

[PATCH 6/6] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings

2022-11-04 Thread Lewis Hyatt via Gcc-patches
Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp during the lexing of the tokens, as opposed to being
issued by the frontend during the processing of the pragma, then the
patched-up location is not yet in place, and the user rather sees an invalid
location that is near to the location of the _Pragma string in some cases, or
potentially very far away, depending on the macro expansion history.  For
example:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
1 | _Pragma("GCC diagnostic ignored \"oops")
  |^

with the caret in a nonsensical location, while this one:

=
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=

produces:

file.cpp:2:24: warning: missing terminating " character
2 | _Pragma(S)
  |^

with both the caret in a nonsensical location, and the actual relevant context
completely absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

==
In buffer generated from file.cpp:1:
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"oops")
  | ^~~
==

and

==
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file.cpp:2:1: note: in <_Pragma directive>
2 | _Pragma(S)
  | ^~~
==

So that carets are pointing to something meaningful and all relevant context
appears in the diagnostic.  For the second example, it would be nice if the
macro expansion also output "in expansion of macro S", however doing that for
a general case of macro expansions makes the logic very complicated, since it
has to be done after the fact when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

* directives.cc (get_token_no_padding): Add argument to receive the
virtual location of the token.
(get__Pragma_string): Likewise.
(do_pragma): Set pfile->directive_result->src_loc properly, it should
not be a virtual location.
(destringize_and_run): Update to provide proper locations for the
_Pragma string tokens.  Support raw strings.
(_cpp_do__Pragma): Adapt to changes to the helper functions.
* errors.cc (cpp_diagnostic_at): Support
cpp_reader::diagnostic_rebase_loc.
(cpp_diagnostic_with_line): Likewise.
* include/line-map.h (class rich_location): Add new member
forget_cached_expanded_locations().
* internal.h (struct _cpp__Pragma_state): Define new struct.
(_cpp_rebase_diagnostic_location): Declare new function.
(struct cpp_reader): Add diagnostic_rebase_loc member.
(_cpp_push__Pragma_token_context): Declare new function.
(_cpp_do__Pragma): Adjust prototype.
* macro.cc (pragma_str): New static var.
(builtin_macro): Adapt to new implementation of _Pragma processing.
(_cpp_pop_context): Fix the logic for resetting
pfile->top_most_macro_node, which previously was never triggered,
although the error seems to have been harmless.
(_cpp_push__Pragma_token_context): New function.
(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
macro tracking output for _Pragma directives.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
tracking output for _Pragma directives.
* c-c++-common/cpp/pr57580.c: Likewise.
* c-c++-common/gomp/pragma-3.c: Likewise.

[PATCH 5/6] diagnostics: Support generated data in additional contexts

2022-11-04 Thread Lewis Hyatt via Gcc-patches
Add awareness that diagnostic locations may be in generated buffers rather
than an actual file to other places in the diagnostics code that may care,
most notably SARIF output (which needs to obtain its own snapshots of the code
involved). For edit context output, which outputs fixit hints as diffs, for
now just make sure we ignore generated data buffers. At the moment, there is
no ability for a fixit hint to be generated in such a buffer.

Because SARIF uses JSON as well, also add the ability to the json::string
class to handle a buffer with nulls in the middle (since we place no
restriction on LC_GEN content) by providing the option to specify the data
length.

gcc/ChangeLog:

* diagnostic-format-sarif.cc (sarif_builder::xloc_to_fb): New function.
(sarif_builder::maybe_make_physical_location_object): Support
generated data locations.
(sarif_builder::make_artifact_location_object): Likewise.
(sarif_builder::maybe_make_region_object_for_context): Likewise.
(sarif_builder::make_artifact_object): Likewise.
(sarif_builder::maybe_make_artifact_content_object): Likewise.
(get_source_lines): Likewise.
* edit-context.cc (edit_context::apply_fixit): Ignore generated
locations if one should make its way this far.
* json.cc (string::string): Support non-null-terminated string.
(string::print): Likewise.
* json.h (class string): Likewise.
---
 gcc/diagnostic-format-sarif.cc | 86 +-
 gcc/edit-context.cc|  4 ++
 gcc/json.cc| 17 +--
 gcc/json.h |  5 +-
 4 files changed, 75 insertions(+), 37 deletions(-)

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 7110db4edd6..c2d18a1a16e 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -125,7 +125,10 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+
+  typedef std::pair filename_or_buffer;
+  json::object *make_artifact_location_object (filename_or_buffer fb);
+
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -146,16 +149,17 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) 
const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
-   int start_line,
+  json::object *make_artifact_object (filename_or_buffer fb);
+  json::object *
+  maybe_make_artifact_content_object (filename_or_buffer fb) const;
+  json::object *maybe_make_artifact_content_object (expanded_location xloc,
int end_line) const;
   json::object *make_fix_object (const rich_location _loc);
   json::object *make_artifact_change_object (const rich_location );
   json::object *make_replacement_object (const fixit_hint ) const;
   json::object *make_artifact_content_object (const char *text) const;
   int get_sarif_column (expanded_location exploc) const;
+  static filename_or_buffer xloc_to_fb (expanded_location xloc);
 
   diagnostic_context *m_context;
 
@@ -166,7 +170,11 @@ private:
  diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set  m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+ with that length, not a filename.  */
+  hash_set ,
+  int_hash  >
+   > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set  m_rule_id_set;
   json::array *m_rules_arr;
@@ -588,6 +596,15 @@ sarif_builder::make_location_object (const 
diagnostic_event )
   return location_obj;
 }
 
+/* Populate a filename_or_buffer pair from an expanded location.  */
+sarif_builder::filename_or_buffer
+sarif_builder::xloc_to_fb (expanded_location xloc)
+{
+  if (xloc.generated_data_len)
+return filename_or_buffer (xloc.generated_data, xloc.generated_data_len);
+  return filename_or_buffer (xloc.file, 0);
+}
+
 /* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC,
or return NULL;
Add any filename to the m_artifacts.  */
@@ -603,7 +620,7 @@ sarif_builder::maybe_make_physical_location_object 
(location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set 

[PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers

2022-11-04 Thread Lewis Hyatt via Gcc-patches
Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as referring to the content
of builtin macros (not yet implemented, but an easy lift after this one). The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The actual change needed to the line-maps API in libcpp is very minimal and
requires no space overhead in the line map data structures (on 64-bit systems
that is; one newly added data member to class line_map_ordinary sits inside
former padding bytes.) An LC_GEN map is just an ordinary map like any other,
but the TO_FILE member that normally points to the file name points instead to
the actual data.  This works automatically with PCH as well, for the same
reason that the file name makes its way into a PCH.

Outside libcpp, there are many small changes but most of them are to
selftests, which are necessarily more sensitive to implementation
details. From the perspective of the user (the "user", here, being a frontend
using line maps or else the diagnostics infrastructure), the chief visible
change is that the function location_get_source_line() should be passed an
expanded_location object instead of a separate filename and line number.  This
is not a big change because in most cases, this information came anyway from a
call to expand_location and the needed expanded_location object is readily
available. The new overload of location_get_source_line() uses the extra
information in the expanded_location object to obtain the data from the
in-memory buffer when it originated from an LC_GEN map.

Until the subsequent patch that starts using LC_GEN maps, none are yet
generated within GCC, hence nothing is added to the testsuite here; but all
relevant selftests have been extended to cover generated data maps in
addition to normal files.
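
To make the idea concrete, here is a simplified sketch, assuming a map entry
whose to_file pointer doubles as the data pointer. The names toy_map and
toy_get_source_line are hypothetical and the real line_map_ordinary and
location_get_source_line() are more involved; only to_file and to_file_len are
taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, much simplified stand-in for an ordinary line map.
struct toy_map
{
  bool generated;          // true for an LC_GEN-style map
  const char *to_file;     // file name, or the generated data itself
  std::size_t to_file_len; // length of the data when generated, else 0
};

// Return line LINE (1-based) of a generated buffer without its trailing
// newline.  Returns an empty string for non-generated maps or when LINE
// is out of range.  The explicit length means NUL bytes are just data.
std::string toy_get_source_line (const toy_map &m, int line)
{
  assert (m.to_file != nullptr);
  if (!m.generated || line < 1)
    return std::string ();
  const char *p = m.to_file;
  const char *const end = p + m.to_file_len;
  for (int cur = 1; p < end; ++cur)
    {
      const char *nl = p;
      while (nl < end && *nl != '\n')
        ++nl;
      if (cur == line)
        return std::string (p, nl);
      p = (nl < end) ? nl + 1 : end;
    }
  return std::string ();
}
```

A caller that already has the expanded location in hand can dispatch on the
generated flag and never needs to open a file for such a map.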

libcpp/ChangeLog:

* include/line-map.h (enum lc_reason): Add LC_GEN.
(struct line_map_ordinary): Add new member to_file_len and update the
GTY markup on to_file to support embedded null bytes.
(class expanded_location): Add new members to store generated content.
* line-map.cc (linemap_add): Add new argument to_file_len to support
generated content. Implement LC_GEN maps.
(linemap_line_start): Pass new to_file_len argument to linemap_add.
(linemap_expand_location): Support LC_GEN locations.
(linemap_dump): Likewise.

gcc/ChangeLog:

* diagnostic-show-locus.cc (make_range): Initialize new fields in
expanded_location.
(layout::calculate_x_offset_display): Use the new expanded_location
overload of location_get_source_line(), so as to support LC_GEN maps.
(layout::print_line): Likewise.
(source_line::source_line): Likewise.
(line_corrections::add_hint): Likewise.
(class line_corrections): Store the location as an exploc rather than
individual filename, so as to support LC_GEN maps.
(layout::print_trailing_fixits): Use the new exploc constructor for
class line_corrections.
(test_layout_x_offset_display_utf8): Test LC_GEN maps as well as normal.
(test_layout_x_offset_display_tab): Likewise.
(test_diagnostic_show_locus_one_liner): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_add_location_if_nearby): Likewise.
(test_diagnostic_show_locus_fixit_lines): Likewise.
(test_fixit_consolidation): Likewise.
(test_overlapped_fixit_printing): Likewise.
(test_overlapped_fixit_printing_utf8): Likewise.
(test_overlapped_fixit_printing_2): Likewise.
(test_fixit_insert_containing_newline): Likewise.
(test_fixit_insert_containing_newline_2): Likewise.
(test_fixit_replace_containing_newline): Likewise.
(test_fixit_deletion_affecting_newline): Likewise.
(test_tab_expansion): Likewise.
(test_escaping_bytes_1): Likewise.
(test_escaping_bytes_2): Likewise.
(test_line_numbers_multiline_range): Likewise.
(diagnostic_show_locus_cc_tests): Likewise.
* diagnostic.cc (diagnostic_report_current_module): Support LC_GEN
maps when outputting include trace.
(assert_location_text): Zero-initialize the expanded_location so as to
cover all fields, including the newly added ones.
* gcc-rich-location.cc (blank_line_before_p): Use the new
expanded_location overload of location_get_source_line().
* input.cc (class file_cache_slot): Add new member m_data_active.
(file_cache_slot::file_cache_slot): Initialize new member.
(special_fname_generated): New function.
(expand_location_1): Recognize LC_GEN locations and output the 

[PATCH 2/6] diagnostics: Use an inline function rather than hardcoding <built-in> string

2022-11-04 Thread Lewis Hyatt via Gcc-patches
The string "<built-in>" is hard-coded in several places throughout the
diagnostics code, and in some of those places it is used incorrectly with
respect to internationalization (a translated string is compared to an
untranslated one). The error is not currently observable in any output GCC
actually produces, hence no testcase is added here, but it is worth fixing.
Also, I am shortly going to add a new such string and want to avoid
hard-coding that one in similar places.
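
The failure mode can be sketched as follows. This is a toy model, not GCC's
gettext setup: toy_translate, the TOY_ macros, the made-up "translation" and
the two is_builtin helpers are all hypothetical, and "<built-in>" is assumed
as the special name.

```cpp
#include <cassert>
#include <cstring>

// Toy stand-in for a message catalog lookup.  The translation is invented.
const char *toy_translate (const char *msgid)
{
  if (std::strcmp (msgid, "<built-in>") == 0)
    return "<eingebaut>";
  return msgid;
}

#define TOY_(msgid) toy_translate (msgid) // translate at this point
#define TOY_N_(msgid) (msgid)             // mark for translation only

// Single point of truth, in the spirit of special_fname_builtin ().
const char *toy_special_fname_builtin ()
{
  return TOY_ ("<built-in>");
}

// Mismatched check: FILE was stored translated, but is compared against
// the untranslated msgid, so the test fails in a non-English locale.
bool is_builtin_mismatched (const char *file)
{
  return std::strcmp (file, TOY_N_ ("<built-in>")) == 0;
}

// Consistent check: both sides come from the same function.
bool is_builtin_consistent (const char *file)
{
  return std::strcmp (file, toy_special_fname_builtin ()) == 0;
}
```

With every producer and consumer going through one function, the two sides
cannot drift apart regardless of the active locale.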

gcc/c-family/ChangeLog:

* c-opts.cc (c_finish_options): Use special_fname_builtin () rather
than a hard-coded string.

gcc/ChangeLog:

* diagnostic.cc (diagnostic_get_location_text): Use
special_fname_builtin () rather than a hardcoded string (which was
also incorrectly left untranslated previously.)
* input.cc (special_fname_builtin): New function.
(expand_location_1): Use special_fname_builtin () rather than a
hard-coded string.
(test_builtins): Likewise.
* input.h (special_fname_builtin): Declare.

gcc/fortran/ChangeLog:

* cpp.cc (gfc_cpp_init): Use special_fname_builtin () rather than a
hardcoded string (which was also incorrectly left untranslated
previously.)
* error.cc (gfc_diagnostic_build_locus_prefix): Likewise.
* f95-lang.cc (gfc_init): Likewise.
---
 gcc/c-family/c-opts.cc  |  2 +-
 gcc/diagnostic.cc   |  2 +-
 gcc/fortran/cpp.cc  |  2 +-
 gcc/fortran/error.cc|  4 ++--
 gcc/fortran/f95-lang.cc |  2 +-
 gcc/input.cc| 10 --
 gcc/input.h |  3 +++
 7 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 32b929e3ece..521797fb7eb 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1476,7 +1476,7 @@ c_finish_options (void)
 {
   const line_map_ordinary *bltin_map
= linemap_check_ordinary (linemap_add (line_table, LC_RENAME, 0,
-  _("<built-in>"), 0));
+  special_fname_builtin (), 0));
   cb_file_change (parse_in, bltin_map);
   linemap_line_start (line_table, 0, 1);
 
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 22f7b0b6d6e..7c7ee6da746 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -470,7 +470,7 @@ diagnostic_get_location_text (diagnostic_context *context,
   const char *file = s.file ? s.file : progname;
   int line = 0;
   int col = -1;
-  if (strcmp (file, N_("<built-in>")))
+  if (strcmp (file, special_fname_builtin ()))
 {
   line = s.line;
   if (context->show_column)
diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
index 364bd0d2a85..0b5755edbb4 100644
--- a/gcc/fortran/cpp.cc
+++ b/gcc/fortran/cpp.cc
@@ -605,7 +605,7 @@ gfc_cpp_init (void)
   if (gfc_option.flag_preprocessed)
 return;
 
-  cpp_change_file (cpp_in, LC_RENAME, _("<built-in>"));
+  cpp_change_file (cpp_in, LC_RENAME, special_fname_builtin ());
   if (!gfc_cpp_option.no_predefined)
 {
   /* Make sure all of the builtins about to be declared have
diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index c9d6edbb923..214fb78ba7b 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -1147,7 +1147,7 @@ gfc_diagnostic_build_locus_prefix (diagnostic_context 
*context,
   const char *locus_ce = colorize_stop (pp_show_color (pp));
   return (s.file == NULL
  ? build_message_string ("%s%s:%s", locus_cs, progname, locus_ce )
- : !strcmp (s.file, N_("<built-in>"))
+ : !strcmp (s.file, special_fname_builtin ())
  ? build_message_string ("%s%s:%s", locus_cs, s.file, locus_ce)
  : context->show_column
  ? build_message_string ("%s%s:%d:%d:%s", locus_cs, s.file, s.line,
@@ -1167,7 +1167,7 @@ gfc_diagnostic_build_locus_prefix (diagnostic_context 
*context,
 
   return (s.file == NULL
  ? build_message_string ("%s%s:%s", locus_cs, progname, locus_ce )
- : !strcmp (s.file, N_("<built-in>"))
+ : !strcmp (s.file, special_fname_builtin ())
  ? build_message_string ("%s%s:%s", locus_cs, s.file, locus_ce)
  : context->show_column
  ? build_message_string ("%s%s:%d:%d-%d:%s", locus_cs, s.file, s.line,
diff --git a/gcc/fortran/f95-lang.cc b/gcc/fortran/f95-lang.cc
index a6750bea787..0d83f3f8b69 100644
--- a/gcc/fortran/f95-lang.cc
+++ b/gcc/fortran/f95-lang.cc
@@ -259,7 +259,7 @@ gfc_init (void)
   if (!gfc_cpp_enabled ())
 {
   linemap_add (line_table, LC_ENTER, false, gfc_source_file, 1);
-  linemap_add (line_table, LC_RENAME, false, "<built-in>", 0);
+  linemap_add (line_table, LC_RENAME, false, special_fname_builtin (), 0);
 }
   else
 gfc_cpp_init_0 ();
diff --git a/gcc/input.cc b/gcc/input.cc
index a28abfac5ac..483cb6e940d 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -29,6 +29,12 @@ along with GCC; see the file COPYING3.  If not see
 #define HAVE_ICONV 0
 #endif
 
+const char *
+special_fname_builtin ()
+{
+  

[PATCH 3/6] libcpp: Fix paste error with unknown pragma after macro expansion

2022-11-04 Thread Lewis Hyatt via Gcc-patches
In directives.cc, do_pragma() contains logic to handle a case such as the new
testcase pragma-omp-unknown.c, where an unknown pragma was the result of macro
expansion (for pragma namespaces that permit expansion). As the testcase
shows, this no longer works correctly; fix it by adding PREV_WHITE to the
flags on the second token to prevent an unwanted paste.  Also fix the memory
leak: since the temporary tokens are pushed on their own context, nothing
prevents freeing the buffer that holds them when the context is eventually
popped.
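
The effect of the missing flag can be sketched with a toy token printer. The
flag values, toy_token and toy_spell are hypothetical; libcpp's real token
flags and output routines differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag bits, named after the libcpp ones they imitate.
enum toy_flags : unsigned
{
  TOY_PREV_WHITE = 1u << 0, // whitespace preceded this token
  TOY_NO_EXPAND = 1u << 1   // do not macro-expand this token
};

struct toy_token
{
  std::string spelling;
  unsigned flags;
};

// Spell out a token sequence, emitting a space only before tokens that
// carry TOY_PREV_WHITE.  Without the flag, adjacent tokens run together.
std::string toy_spell (const std::vector<toy_token> &toks)
{
  std::string out;
  for (const toy_token &t : toks)
    {
      assert (!t.spelling.empty ());
      if (!out.empty () && (t.flags & TOY_PREV_WHITE))
        out += ' ';
      out += t.spelling;
    }
  return out;
}
```

In this model the two re-pushed tokens "omp" and "UNKNOWN1" come out glued
together unless the second one is marked as preceded by whitespace.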

libcpp/ChangeLog:

* directives.cc (do_pragma): Fix memory leak in token buffer.  Fix
unwanted paste between two tokens.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/pragma-omp-unknown.c: New test.
---
 gcc/testsuite/c-c++-common/gomp/pragma-omp-unknown.c | 10 ++
 libcpp/directives.cc | 10 +-
 2 files changed, 15 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/pragma-omp-unknown.c

diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-omp-unknown.c 
b/gcc/testsuite/c-c++-common/gomp/pragma-omp-unknown.c
new file mode 100644
index 000..04881f786ab
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-omp-unknown.c
@@ -0,0 +1,10 @@
+/* { dg-do preprocess } */
+/* { dg-options "-fopenmp" } */
+
+#define X UNKNOWN1
+#pragma omp X
+/* { dg-final { scan-file pragma-omp-unknown.i "#pragma omp UNKNOWN1" } } */
+
+#define Y UNKNOWN2
+_Pragma("omp Y")
+/* { dg-final { scan-file pragma-omp-unknown.i "#pragma omp UNKNOWN2" } } */
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 918752f6b1f..9dc4363c65a 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1565,15 +1565,15 @@ do_pragma (cpp_reader *pfile)
{
  /* Invalid name comes from macro expansion, _cpp_backup_tokens
 won't allow backing 2 tokens.  */
- /* ??? The token buffer is leaked.  Perhaps if def_pragma hook
-reads both tokens, we could perhaps free it, but if it doesn't,
-we don't know the exact lifespan.  */
- cpp_token *toks = XNEWVEC (cpp_token, 2);
+ const auto tok_buff = _cpp_get_buff (pfile, 2 * sizeof (cpp_token));
+ const auto toks = (cpp_token *)tok_buff->base;
  toks[0] = ns_token;
  toks[0].flags |= NO_EXPAND;
  toks[1] = *token;
- toks[1].flags |= NO_EXPAND;
+ toks[1].flags |= NO_EXPAND | PREV_WHITE;
  _cpp_push_token_context (pfile, NULL, toks, 2);
+ /* Arrange to free this buffer when no longer needed.  */
+ pfile->context->buff = tok_buff;
}
   pfile->cb.def_pragma (pfile, pfile->directive_line);
 }


[PATCH 1/6] diagnostics: Fix macro tracking for ad-hoc locations

2022-11-04 Thread Lewis Hyatt via Gcc-patches
The result of linemap_resolve_location() can be an ad-hoc location, if that is
what was stored in a relevant macro map.  maybe_unwind_expanded_macro_loc()
did not previously handle this case, causing it to print the wrong tracking
information for an example such as the new testcase macro-trace-1.c.  Fix that
by checking for ad-hoc locations where needed.
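
A toy model of why the check needs to look through ad-hoc locations first. It
assumes an encoding in which the high bit marks an index into a side table;
all toy_ names and constants are hypothetical and only imitate IS_ADHOC_LOC
and get_location_from_adhoc_loc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint32_t toy_location;
const toy_location TOY_ADHOC_BIT = 0x80000000u;
const toy_location TOY_MAX_LOCATION = 0x7FFFFFFFu;
const toy_location TOY_RESERVED_LOCATION_COUNT = 2;

// Side-table entry: the underlying locus (real entries carry more data).
struct toy_adhoc_entry
{
  toy_location locus;
};

struct toy_line_table
{
  std::vector<toy_adhoc_entry> adhoc;
};

bool toy_is_adhoc (toy_location loc)
{
  return (loc & TOY_MAX_LOCATION) != loc;
}

// Wrap LOCUS in a new ad-hoc location.
toy_location toy_make_adhoc (toy_line_table &t, toy_location locus)
{
  toy_adhoc_entry e = { locus };
  t.adhoc.push_back (e);
  return static_cast<toy_location> (t.adhoc.size () - 1) | TOY_ADHOC_BIT;
}

// Look through an ad-hoc location to the locus it wraps.
toy_location toy_strip_adhoc (const toy_line_table &t, toy_location loc)
{
  if (!toy_is_adhoc (loc))
    return loc;
  const std::size_t ix = loc & TOY_MAX_LOCATION;
  assert (ix < t.adhoc.size ());
  return t.adhoc[ix].locus;
}

// Reserved-location test that works for both plain and ad-hoc values.
bool toy_is_reserved (const toy_line_table &t, toy_location loc)
{
  return toy_strip_adhoc (t, loc) < TOY_RESERVED_LOCATION_COUNT;
}
```

Comparing the raw ad-hoc value against the reserved count misclassifies it,
since the high bit makes it numerically large; stripping first gives the
intended answer.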

gcc/ChangeLog:

* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Handle ad-hoc
location in return value of linemap_resolve_location().

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/macro-trace-1.c: New test.
---
 gcc/testsuite/c-c++-common/cpp/macro-trace-1.c | 4 
 gcc/tree-diagnostic.cc | 7 +--
 2 files changed, 9 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/cpp/macro-trace-1.c

diff --git a/gcc/testsuite/c-c++-common/cpp/macro-trace-1.c 
b/gcc/testsuite/c-c++-common/cpp/macro-trace-1.c
new file mode 100644
index 000..34cfbb3dad3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/cpp/macro-trace-1.c
@@ -0,0 +1,4 @@
+/* This token is long enough to require an ad-hoc location. Make sure that
+   the macro trace still prints properly.  */
+#define X "0123456789012345678901234567689" /* { dg-error {expected .* before 
string constant} } */
+X /* { dg-note {in expansion of macro 'X'} } */
diff --git a/gcc/tree-diagnostic.cc b/gcc/tree-diagnostic.cc
index 0d79fe3c3c1..5cf3a1c17d2 100644
--- a/gcc/tree-diagnostic.cc
+++ b/gcc/tree-diagnostic.cc
@@ -190,14 +190,17 @@ maybe_unwind_expanded_macro_loc (diagnostic_context 
*context,
 location_t l = 
   linemap_resolve_location (line_table, resolved_def_loc,
 LRK_SPELLING_LOCATION,  );
-if (l < RESERVED_LOCATION_COUNT || LINEMAP_SYSP (m))
+   location_t l0 = l;
+   if (IS_ADHOC_LOC (l0))
+ l0 = get_location_from_adhoc_loc (line_table, l0);
+   if (l0 < RESERVED_LOCATION_COUNT || LINEMAP_SYSP (m))
   continue;
 
/* We need to print the context of the macro definition only
   when the locus of the first displayed diagnostic (displayed
   before this trace) was inside the definition of the
   macro.  */
-int resolved_def_loc_line = SOURCE_LINE (m, l);
+   const int resolved_def_loc_line = SOURCE_LINE (m, l0);
 if (ix == 0 && saved_location_line != resolved_def_loc_line)
   {
 diagnostic_append_note (context, resolved_def_loc, 


[PATCH 0/6] diagnostics: libcpp: Overhaul locations for _Pragma tokens

2022-11-04 Thread Lewis Hyatt via Gcc-patches
Hello-

In the past couple years there has been a ton of progress in fixing bugs
related to _Pragma, especially its use in the type of macros that many
projects like to implement for manipulating GCC diagnostic pragmas more
easily. For GCC 13 I have been going through the remaining open PRs, fixing a
couple and adding testcases for several that were already fixed. I felt that
made it a good time to overhaul one of the last remaining issues with _Pragma
processing, which is that we do not currently assign good locations to the
tokens involved. The locations are very important, however, because that is
how GCC diagnostic pragmas will ultimately determine whether a given warning
should or should not apply at a given point. Currently, the tokens inside a
_Pragma string are all assigned the same location as the _Pragma token itself,
which is sufficient to make diagnostic pragmas work correctly. It does produce
somewhat inferior diagnostics, though, since we do not point the user to which
part of the _Pragma string caused the problem; and if the _Pragma string was
expanded from a macro, we do not even point them to the string at all.
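
For reference, the kind of helper macro meant here typically looks like the
following sketch. The macro and function names are hypothetical; the pattern
of stringifying into _Pragma is the usual one.

```cpp
#include <cassert>

// Stringify the argument and hand it to _Pragma.
#define DO_PRAGMA(x) _Pragma (#x)

#if defined (__GNUC__)
# define DIAG_PUSH_IGNORE(w) \
    DO_PRAGMA (GCC diagnostic push) \
    DO_PRAGMA (GCC diagnostic ignored w)
# define DIAG_POP DO_PRAGMA (GCC diagnostic pop)
#else
# define DIAG_PUSH_IGNORE(w)
# define DIAG_POP
#endif

// Truncate D to int, silencing the conversion warning locally.
int truncate_quietly (double d)
{
  // The conversion below is only defined for values in range of int.
  assert (d > -2147483648.0 && d < 2147483647.0);
  DIAG_PUSH_IGNORE ("-Wfloat-conversion")
  int result = d; /* may warn under -Wfloat-conversion without the pragma */
  DIAG_POP
  return result;
}
```

Every token of each pragma here originates inside a macro expansion, which is
exactly the situation where the locations assigned to those tokens decide
whether the suppression applies to the intended statement.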

Further, the assignment of the fake location to the tokens inside the _Pragma
string takes place after all the tokens have been lexed -- consequently, if a
diagnostic is issued by libcpp during that process, it doesn't benefit from the
patched-up location and instead uses a bogus location. As a quick example,
compiling:

=
_Pragma("GCC diagnostic ignored \"oops")
=

produces:

=
file:1:24: warning: missing terminating " character
1 | _Pragma("GCC diagnostic ignored \"oops")
  |^
=

It is surprisingly involved to make that caret point to something
reasonable. The reason it points to the middle of nowhere is that the current
implementation of _Pragma in directives.cc:destringize_and_run() does not
touch the line_maps instance at all, and so does not inform it where the
tokens are coming from. But the line_maps API in fact does not provide any way
to handle this case, so this needs to be added first. With all the changes in
this patch set, we would output instead:

==
In buffer generated from file:1:
:1:24: warning: missing terminating " character
1 | GCC diagnostic ignored "oops
  |^
file:1:1: note: in <_Pragma directive>
1 | _Pragma("GCC diagnostic ignored \"oops")
  | ^~~
==

Treating the _Pragma like a macro expansion makes everything consistent and
solves a ton of problems; all the locations involved will just make sense from
the user's point of view.

Patches 1-3 are tiny bug fixes that I came across while working on the new
testcases. I was a bit surprised that #1 and #3 especially did not have PRs
open, but I guess these small glitches have gone unnoticed so far.

Patch 4 is the largest one. It adds a new reason=LC_GEN for ordinary line
maps. These maps are just like normal ones, except the file name pointer
points not to a file name, but to the actual data in memory instead. This is
how we can issue diagnostics for code that did not appear in the user's input,
such as the de-stringized _Pragma string. The changes needed in libcpp to
support this concept are pretty small and straightforward. Most of the changes
outside of libcpp are in input.cc and diagnostic-show-locus.cc, which need to
learn how to obtain code from LC_GEN maps, and also a lot of the changes are
in selftests that are pretty sensitive to the internal implementation.

Patch 5 is a continuation of 4 that supports LC_GEN maps in less commonly used
places, such as the new SARIF output format, that also need to know how to
read source back from in-memory buffers in addition to files.

Patch 6 updates the implementation of _Pragma handling to use LC_GEN maps and
to create virtual locations for the tokens as in the example above. I have
also added support for the argument of the _Pragma to be a raw string, as
requested by PR83473, since this was easy to do while I was there.

1/6: diagnostics: Fix macro tracking for ad-hoc locations
2/6: diagnostics: Use an inline function rather than hardcoding <built-in>
     string
3/6: libcpp: Fix paste error with unknown pragma after macro expansion
4/6: diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
5/6: diagnostics: Support generated data in additional contexts
6/6: diagnostics: libcpp: Assign real locations to the tokens inside
 _Pragma strings

Bootstrap and regtest all languages on x86-64 Linux looks good.

I realize it's near the end of stage 1 now, but I would very much appreciate it
if this patch set could get reviewed. For GCC 13, there have been several
_Pragma-related bugs fixed (especially PR53431), and addressing this location
issue would tie it together nicely. Thanks very much!

-Lewis


Re: [PATCH] diagnostics: Allow FEs to keep customizations for middle end [PR101551, PR106274]

2022-11-03 Thread Lewis Hyatt via Gcc-patches
On Fri, Oct 28, 2022 at 10:28:21AM +0200, Richard Biener wrote:
> Yes, the idea was also to free up memory but then that part never
> really materialized - the idea was to always run free-lang-data, not
> just when later outputting LTO bytecode.  The reason is probably
> mainly the diagnostic regressions you observe.
> 
> Maybe a better strategy than your patch would be to work towards
> that goal but reduce the number of "freeings", instead adjusting the
> LTO streamer to properly ignore frontend specific bits where clearing
> conflicts with the intent to preserve accurate diagnostics throughout
> the compilation.
> 
> If you see bits that when not freed would fix some of the observed
> issues we can see to replicate the freeing in the LTO output machinery.
> 
> Richard.

Thanks again for the suggestions. I took a look and it seems pretty doable to
just stop resetting all the diagnostics hooks in free-lang-data. Once that's
done, the only problematic part that I have been able to identify is here in
ipa-free-lang-data.cc around line 674:


  /* We need to keep field decls associated with their trees. Otherwise tree
 merging may merge some fields and keep others disjoint which in turn will
 not do well with TREE_CHAIN pointers linking them.

 Also do not drop containing types for virtual methods and tables because
 these are needed by devirtualization.
 C++ destructors are special because C++ frontends sometimes produces
 virtual destructor as an alias of non-virtual destructor.  In
 devirutalization code we always walk through aliases and we need
 context to be preserved too.  See PR89335  */
  if (TREE_CODE (decl) != FIELD_DECL
      && ((TREE_CODE (decl) != VAR_DECL && TREE_CODE (decl) != FUNCTION_DECL)
          || (!DECL_VIRTUAL_P (decl)
              && (TREE_CODE (decl) != FUNCTION_DECL
                  || !DECL_CXX_DESTRUCTOR_P (decl)))))
    DECL_CONTEXT (decl) = fld_decl_context (DECL_CONTEXT (decl));


The C++ implementations of the decl_printable_name langhook and the diagnostic
starter callback do not work as-is when the DECL_CONTEXT for class member
functions disappears.  I did have a patch that changes the C++
implementations to work in this case, but attached here is a new one along the
lines of what you suggested, which instead changes the above part of
free-lang-data so it doesn't activate as often. The patch is pretty complete
(other than missing a commit message) and bootstrap + regtest all languages
looks good with no regressions. I tried the same with
BUILD_CONFIG=bootstrap-lto as well, and that also looked good when it
eventually finished. I added testcases for several frontends to verify that
the diagnostics still work with -flto. I am not sure what the implications are
for LTO itself of changing this part of the pass, so I would have to ask you
to weigh in on that aspect please. Thanks!

-Lewis
[PATCH] middle-end: Preserve frontend diagnostics in free-lang-data [PR101551, 
PR106274]

gcc/ChangeLog:

PR lto/106274
PR middle-end/101551
* ipa-free-lang-data.cc (free_lang_data_in_decl): Preserve
DECL_CONTEXT for class member functions.
(free_lang_data): Do not reset frontend diagnostics customizations.

gcc/testsuite/ChangeLog:

PR lto/106274
PR middle-end/101551
* c-c++-common/diag-after-fld-1.c: New test.
* g++.dg/diag-after-fld-1.C: New test.
* g++.dg/diag-after-fld-2.C: New test.
* gfortran.dg/allocatable_uninitialized_2.f90: New test.
* objc.dg/diag-after-fld-1.m: New test.

diff --git a/gcc/ipa-free-lang-data.cc b/gcc/ipa-free-lang-data.cc
index ccdbf849c25..391b7689639 100644
--- a/gcc/ipa-free-lang-data.cc
+++ b/gcc/ipa-free-lang-data.cc
@@ -682,10 +682,8 @@ free_lang_data_in_decl (tree decl, class free_lang_data_d 
*fld)
  devirutalization code we always walk through aliases and we need
  context to be preserved too.  See PR89335  */
   if (TREE_CODE (decl) != FIELD_DECL
-  && ((TREE_CODE (decl) != VAR_DECL && TREE_CODE (decl) != FUNCTION_DECL)
-  || (!DECL_VIRTUAL_P (decl)
- && (TREE_CODE (decl) != FUNCTION_DECL
- || !DECL_CXX_DESTRUCTOR_P (decl)))))
+  && TREE_CODE (decl) != VAR_DECL
+  && TREE_CODE (decl) != FUNCTION_DECL)
 DECL_CONTEXT (decl) = fld_decl_context (DECL_CONTEXT (decl));
 }
 
@@ -1115,7 +1113,6 @@ free_lang_data (void)
   /* Reset some langhooks.  Do not reset types_compatible_p, it may
  still be used indirectly via the get_alias_set langhook.  */
   lang_hooks.dwarf_name = lhd_dwarf_name;
-  lang_hooks.decl_printable_name = gimple_decl_printable_name;
   lang_hooks.gimplify_expr = lhd_gimplify_expr;
   lang_hooks.overwrite_decl_assembler_name = lhd_overwrite_decl_assembler_name;
   lang_hooks.print_xnode = lhd_print_tree_nothing;
@@ -1141,9 +1138,6 @@ free_lang_data (void)
  make sure we never call decl_assembler_name on local symbols and
  devise a 

[PATCH] c++: libcpp: Support raw strings with newlines in directives [PR55971]

2022-10-27 Thread Lewis Hyatt via Gcc-patches
Hello-

May I please ask for a review of this patch from June? I realize it's a
10-year-old PR that doesn't seem to be bothering people much, but I still feel
like it's an unfortunate gap in C++11 support that is not hard to fix.

Original submission is here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html

But I have attached a new version here that is simplified: all the
_Pragma-related stuff has been removed, and I will handle that in a later patch
instead. I also removed the changes to c-ppoutput.cc that I realized were not
needed after all. Bootstrap+regtest all languages on x86-64 Linux still looks
good. Thanks!

-Lewis

-- >8 --

It's not currently possible to use a C++11 raw string containing a newline as
part of the definition of a macro, or in any other preprocessing directive,
such as:

 #define X R"(two
lines)"

 #error R"(this error has
two lines)"

Add support for that by relaxing the conditions under which
_cpp_get_fresh_line() refuses to get a new line. For the case of lexing a raw
string, it's OK to do so as long as there is another line within the current
buffer. The code in _cpp_get_fresh_line() was refactored into a new function
get_fresh_line_impl(), so that the new logic is applied only when processing a
raw string and not any other times.
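
The relaxed condition can be modeled roughly as below. This is a toy, not the
lex.cc code: toy_reader and toy_get_fresh_line are hypothetical, and the real
function has further conditions that are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reader state: one buffer, pre-split into lines.
struct toy_reader
{
  std::vector<std::string> lines; // the current buffer
  std::size_t next_line = 0;      // index of the next unread line
  bool in_directive = false;      // true while processing a # directive
};

// Try to advance to a fresh line; on success, store it in OUT.
bool toy_get_fresh_line (toy_reader &r, bool lexing_raw_string,
                         std::string &out)
{
  assert (r.next_line <= r.lines.size ());
  // Ordinary tokens must not run past the end of the directive's line.
  if (r.in_directive && !lexing_raw_string)
    return false;
  // A raw string may continue, but only onto a line of this same buffer.
  if (r.next_line >= r.lines.size ())
    return false;
  out = r.lines[r.next_line++];
  return true;
}
```

Inside a directive, the ordinary path still refuses to cross the line end,
while the raw-string path is allowed to pull in the continuation line.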

libcpp/ChangeLog:

PR preprocessor/55971
* lex.cc (get_fresh_line_impl): New function refactoring the code
from...
(_cpp_get_fresh_line): ...here.
(lex_raw_string): Use the new version of get_fresh_line_impl() to
support raw strings containing new lines when processing a directive.

gcc/testsuite/ChangeLog:

PR preprocessor/55971
* c-c++-common/raw-string-directive-1.c: New test.
* c-c++-common/raw-string-directive-2.c: New test.

gcc/c-family/ChangeLog:

PR preprocessor/55971
* c-ppoutput.cc (account_for_newlines): Update comment.
---
 gcc/c-family/c-ppoutput.cc| 10 ++-
 .../c-c++-common/raw-string-directive-1.c | 74 +++
 .../c-c++-common/raw-string-directive-2.c | 33 +
 libcpp/lex.cc | 41 +++---
 4 files changed, 148 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/raw-string-directive-1.c
 create mode 100644 gcc/testsuite/c-c++-common/raw-string-directive-2.c

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index a99d9e9c5ca..6e054358e9e 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -433,7 +433,15 @@ scan_translation_unit_directives_only (cpp_reader *pfile)
 lang_hooks.preprocess_token (pfile, NULL, streamer.filter);
 }
 
-/* Adjust print.src_line for newlines embedded in output.  */
+/* Adjust print.src_line for newlines embedded in output.  For example, if a 
raw
+   string literal contains newlines, then we need to increment our notion of 
the
+   current line to keep in sync and avoid outputting a line marker
+   unnecessarily.  If a raw string literal containing newlines is the result of
+   macro expansion, then we have the opposite problem, where the token takes up
+   more lines in the output than it did in the input, and hence a line marker 
is
+   needed to restore the correct state for subsequent lines.  In this case,
+   incrementing print.src_line still does the job, because it will cause us to
+   emit the line marker the next time a token is streamed.  */
 static void
 account_for_newlines (const unsigned char *str, size_t len)
 {
diff --git a/gcc/testsuite/c-c++-common/raw-string-directive-1.c 
b/gcc/testsuite/c-c++-common/raw-string-directive-1.c
new file mode 100644
index 000..d6525e107bc
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/raw-string-directive-1.c
@@ -0,0 +1,74 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu99" { target c } } */
+/* { dg-options "-std=c++11" { target c++ } } */
+
+/* Test that multi-line raw strings are lexed OK for all preprocessing
+   directives where one could appear. Test raw-string-directive-2.c
+   checks that #define is also processed properly.  */
+
+/* Note that in cases where we cause GCC to produce a multi-line error
+   message, we construct the string so that the second line looks enough
+   like an error message for DejaGNU to process it as such, so that we
+   can use dg-warning or dg-error directives to check for it.  */
+
+#warning R"delim(line1 /* { dg-warning "line1" } */
+file:15:1: warning: line2)delim" /* { dg-warning "line2" } */
+
+#error R"delim(line3 /* { dg-error "line3" } */
+file:18:1: error: line4)delim" /* { dg-error "line4" } */
+
+#define X1 R"(line 5
+line 6
+line 7
+line 8
+/*
+//
+line 9)" R"delim(
+line10)delim"
+
+#define X2(a) X1 #a R"(line 11
+/*
+line12
+)"
+
+#if R"(line 13 /* { dg-error "line13" } */
+file:35:1: error: line14)" /* { dg-error "line14\\)\"\" is not valid" } */
+#endif R"(line 15 /* { dg-warning "extra tokens at end of #endif" } */
+\
+line16)" ""
+
+#ifdef 

Re: [PATCH] diagnostics: Allow FEs to keep customizations for middle end [PR101551, PR106274]

2022-10-25 Thread Lewis Hyatt via Gcc-patches
On Tue, Oct 25, 2022 at 7:35 AM Richard Biener
 wrote:
>
> On Thu, Oct 20, 2022 at 1:09 AM Lewis Hyatt via Gcc-patches
>  wrote:
> >
> > Currently, the ipa-free-lang-data pass resets most of the frontend's
> > > diagnostic customizations, such as the diagnostic_finalizer that prints macro
> > > expansion information, which is the subject of the two PRs. In most cases,
> > > however, there is no need to reset these customizations; they still work just
> > > fine after the language-specific data has been freed. (Macro tracking
> > > information, for instance, only depends on the line_maps instance and does not
> > > use the tree data structures at all.)
> > >
> > > Add an interface whereby frontends can convey which of their customizations
> > > should be preserved by ipa-free-lang-data. Only the macro tracking behavior is
> > > changed for now.  Subsequent patches will add further configurations for each
> > > frontend.
>
> One point of the resetting of the hooks is to avoid crashes due to us zapping
> many of the lang specific data structures.  If the hooks were more resilient
> that wouldn't be an issue.
>

Right. The patch I have for C++ (not sent yet) makes the C++ versions
of decl_printable_name and the diagnostic starter able to work
after free_lang_data runs.  I just worry that future changes to the
C++ hooks would need to preserve this property, which could be error
prone since issues are not immediately apparent, and most of the
testsuite does not use -flto.

> Now - as for macro tracking, how difficult is it to replicate that in the
> default hook implementation?  Basically we have similar issues for
> late diagnostics of the LTO compile step where only the LTO (aka default)
> variant of the hooks are present - it would be nice to improve that as well.
>

It is easy enough to make the default diagnostic finalizer print the
macro tracking information stored in the global line_table. (It just
needs to check if the global line_table is set, in which case call
virt_loc_aware_diagnostic_finalizer()). This would remove the need for
C-family frontends to override that callback. Fortran would still do
so, since it does other things in its finalizer. However, this would
not help with the LTO frontend because the line_table is not part of
what gets streamed out. Rather the line_table is rebuilt from scratch
when reading the data back in, but the macro tracking information is
not available at that time, just the basic location info (filename and
source location). I am not that familiar with the LTO streaming
process but I feel like streaming the entire line_table would not mesh
well with it (especially since multiple of them from different
translation units would need to be combined back together).

> Note free_lang_data exists to "simplify" the LTO bytecode output - things
> freed do not need to be output.  Of course the "freeing" logic could be
> wired into the LTO bytecode output machinery directly - simply do not
> output what we'd free.  That way all info would prevail for the non-LTO
> compile and the hooks could continue to work as they do without any
> LTO streaming done.
>

Naively (emphasis on the naive, as I don't have any experience with
this part of GCC), that is how I would have guessed it worked. But I
understood there are some benefits to freeing the lang data earlier
(e.g. reduced resource usage), and even a hope to start doing it in
non-LTO builds as well, so I thought some incremental changes as in
this patch to make diagnostics better after free_lang_data could
perhaps be useful. Thanks for taking a look at it!

-Lewis


[PATCH] diagnostics: Allow FEs to keep customizations for middle end [PR101551, PR106274]

2022-10-19 Thread Lewis Hyatt via Gcc-patches
Currently, the ipa-free-lang-data pass resets most of the frontend's
diagnostic customizations, such as the diagnostic_finalizer that prints macro
expansion information, which is the subject of the two PRs. In most cases,
however, there is no need to reset these customizations; they still work just
fine after the language-specific data has been freed. (Macro tracking
information, for instance, only depends on the line_maps instance and does not
use the tree data structures at all.)

Add an interface whereby frontends can convey which of their customizations
should be preserved by ipa-free-lang-data. Only the macro tracking behavior is
changed for now.  Subsequent patches will add further configurations for each
frontend.

gcc/ChangeLog:

PR lto/106274
PR middle-end/101551
* diagnostic.h (struct diagnostic_context): Add new customization
point diagnostic_context::preserve_on_reset.
* diagnostic.cc (diagnostic_initialize): Initialize it.
* tree-diagnostic.h (tree_diagnostics_defaults): Add new optional
argument enable_preserve.
* tree-diagnostic.cc (tree_diagnostics_defaults): Implement it.
* ipa-free-lang-data.cc (free_lang_data): Use it.

gcc/c-family/ChangeLog:

PR lto/106274
PR middle-end/101551
* c-opts.cc (c_common_diagnostics_set_defaults): Preserve
diagnostic finalizer for the middle end.

libgomp/ChangeLog:

PR lto/106274
PR middle-end/101551
* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Remove
now-unnecessary workaround.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

PR lto/106274
PR middle-end/101551
* c-c++-common/diag-after-fld-1.c: New test.
---

Notes:
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101551
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106274

PR101551 complains that when compiling with offloading enabled, diagnostics
output changes in several different ways. PR106274 notes the same thing (which
has the same cause) when compiling with -flto, for the specific case of macro
expansion tracking, which is no longer output for middle-end diagnostics when
-flto is present. Restoring the macro tracking information, at least, can be
done simply, which is the attached patch. This is straightforward because the
printing of macro tracking information is not really reliant on the sort of
language-specific data structures that are freed by ipa-free-lang-data... it
just needs the line-maps API, which is common to all languages and which is
not impacted by the work done by ipa-free-lang-data.

To be clear, this is not about diagnostics issued *by* the lto frontend. This
is just about diagnostics issued by the language frontends, in case they were
asked to stream the IL for later use by the lto frontend. I think from the
user's perspective, compiling with or without -flto should not change the
quality of diagnostics, since on the face of it, -flto seems to be just a
request for the compiler to do something extra (write out the IL in addition
to all the other stuff it does), and it's not clear why this should change the
way diagnostics look. (I understand there must be some good reasons, why it's
not the case.)  So anyway I was focusing on how to keep the diagnostics as
close as possible to their normal form after ipa-free-lang-data is done.

The approach I took was to add a new variable "preserve_on_reset" in the
diagnostic_context, allowing a frontend to specify whether its diagnostic
context customizations are safe to leave in place for the middle end. There
are currently four customizations for which preservation can be enabled:

* diagnostic starter
* diagnostic finalizer
* format decoder
* decl_printable_name langhook
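
To make the idea concrete, here is a toy model of how a per-customization
"preserve" setting could drive the reset step. Every name below (toy_context,
the PRESERVE_* flags, reset_for_middle_end) is hypothetical and chosen only
for illustration; the actual interface is whatever the patch defines for
diagnostic_context::preserve_on_reset, which may well look different.

```cpp
#include <cassert>

// Hypothetical flags, one per customization listed above.
enum preserve_flags : unsigned
{
  PRESERVE_NONE = 0,
  PRESERVE_STARTER = 1u << 0,
  PRESERVE_FINALIZER = 1u << 1,
  PRESERVE_FORMAT_DECODER = 1u << 2,
  PRESERVE_DECL_PRINTABLE_NAME = 1u << 3
};

// Stand-in for "which implementation of a hook is installed".
enum hook_impl { HOOK_DEFAULT, HOOK_FRONTEND };

// Toy stand-in for the diagnostic context plus the relevant langhook.
struct toy_context
{
  hook_impl starter = HOOK_DEFAULT;
  hook_impl finalizer = HOOK_DEFAULT;
  hook_impl format_decoder = HOOK_DEFAULT;
  hook_impl decl_printable_name = HOOK_DEFAULT;
  unsigned preserve_on_reset = PRESERVE_NONE;
};

// Reset every customization that the frontend did not ask to keep.
inline void
reset_for_middle_end (toy_context &ctx)
{
  if (!(ctx.preserve_on_reset & PRESERVE_STARTER))
    ctx.starter = HOOK_DEFAULT;
  if (!(ctx.preserve_on_reset & PRESERVE_FINALIZER))
    ctx.finalizer = HOOK_DEFAULT;
  if (!(ctx.preserve_on_reset & PRESERVE_FORMAT_DECODER))
    ctx.format_decoder = HOOK_DEFAULT;
  if (!(ctx.preserve_on_reset & PRESERVE_DECL_PRINTABLE_NAME))
    ctx.decl_printable_name = HOOK_DEFAULT;
}
```

In this model, a frontend that sets only PRESERVE_FINALIZER keeps its
finalizer (and hence macro tracking output) while the other three hooks fall
back to their defaults.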

Preserving the diagnostic finalizer is sufficient to restore the output of
macro tracking information, and that's what I did in this patch. But it seems
that it's possible to go a bit farther than this as well. Here are some
examples:

1. Fortran:
Consider the existing testcase
gfortran.dg/allocatable_uninitialized_1.f90. When compiled without -flto,
it outputs:

===
allocatable_uninitialized_1.f90:6:47:

6 |a(1)=2*b(1) ! { dg-warning "uninitialized" }
  |   ^
Warning: 'b.offset' is used uninitialized [-Wuninitialized]
allocatable_uninitialized_1.f90:4:30:

4 |   real,allocatable:: a(:),b(:)
  |  ^
note: 'b' declared here
allocatable_uninitialized_1.f90:6:47:

6 |a(1)=2*b(1) ! { dg-warning "uninitialized" }
 

Re: [PATCH] pch: Fix streaming of strings with embedded null bytes

2022-10-19 Thread Lewis Hyatt via Gcc-patches
On Wed, Oct 19, 2022 at 8:23 AM Jakub Jelinek  wrote:
>
> On Wed, Oct 19, 2022 at 01:17:02PM +0100, Richard Sandiford wrote:
> > Jakub Jelinek  writes:
> > > On Wed, Oct 19, 2022 at 12:54:11PM +0100, Richard Sandiford via Gcc-patches wrote:
> > >> Lewis Hyatt via Gcc-patches  writes:
> > >> > When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
> > >> > (whether they live in GC-controlled memory or not) will be marked for PCH
> > >> > output by the routine gt_pch_note_object in ggc-common.cc. This routine
> > >> > special-cases plain char* strings, and in particular it uses strlen() to get
> > >> > their length. Thus it does not handle strings with embedded null bytes, but it
> > >> > is possible for something PCH cares about (such as a string literal token in a
> > >> > macro definition) to contain such embedded nulls. To fix that up, add a new
> > >> > GTY option "string_length" so that gt_pch_note_object can be informed the
> > >> > actual length it ought to use, and use it in the relevant libcpp structs
> > >> > (cpp_string and ht_identifier) accordingly.
> > >>
> > >> This isn't really my area, as I'm about to demonstrate with this
> > >> question, but: regarding
> > >>
> > >>   if (note_ptr_fn == gt_pch_p_S)
> > >> (*slot)->size = strlen ((const char *)obj) + 1;
> > >>   else
> > >> (*slot)->size = ggc_get_size (obj);
> > >>
> > >> do you know why the PCH code goes out of its way to handle the sizes of
> > >> strings specially?  Are there enough garbage strings in the string pool
> > >> that it's worth optimising the size of the saved memory for strings but
> > >> not for other types of object?  Or is the gt_pch_p_S test needed for
> > >> correctness, rather than just being an optimisation?
> > >
> > > Just guessing, not all GC strings live in the stringpool.
> > > Isn't e.g. ggc_strdup just a GC allocation where the string length
> > > isn't stored anywhere?
> >
> > Is that different from other GC VLA allocations though?  I thought
> > ultimately we just tried to save and restore the containing pages.
>
> I think just the objects in it, not entire pages (ggc_get_size (obj)
> sized chunks for non-strings).
>
> > > And sometimes it isn't even GC allocated, e.g. ggc_strdup ("") just
> > > returns ""; I guess const char * pointers in GC memory can also point
> > > to string literals in .rodata and for PCH we move them.
> >
> > Ah, OK, that would definitely explain it, thanks.  In that case,
> > are you OK with the patch, as a way of continuing to support rodata
> > string pointers while also allowing embedded nuls?
>
> LGTM.
>

Thank you both very much, I will push that then.
My understanding is that a GTY()ed struct can contain arbitrary char*
pointers as a special case, they need not be in the string pool. They
will be silently ignored by GC marking routines if they are not within
ggc's pages (as opposed to any other pointer, which will abort if it
wasn't under ggc's control), and they will make it into PCH by the
gt_pch_note_object mechanism. For example, struct line_maps contains a
char* to store the file name for each map, which is just an ordinary
malloc()ed string owned by the cpp_reader object.

-Lewis


[PATCH] pch: Fix streaming of strings with embedded null bytes

2022-10-18 Thread Lewis Hyatt via Gcc-patches
When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains
(whether they live in GC-controlled memory or not) will be marked for PCH
output by the routine gt_pch_note_object in ggc-common.cc. This routine
special-cases plain char* strings, and in particular it uses strlen() to get
their length. Thus it does not handle strings with embedded null bytes, but it
is possible for something PCH cares about (such as a string literal token in a
macro definition) to contain such embedded nulls. To fix that up, add a new
GTY option "string_length" so that gt_pch_note_object can be informed of the
actual length it ought to use, and use it in the relevant libcpp structs
(cpp_string and ht_identifier) accordingly.
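
As a small standalone illustration of the problem (not code from the patch),
the sketch below compares the size a strlen()-based scheme would record with
the size obtained from a stored length. The counted_string type and the two
helper functions are hypothetical stand-ins; only strlen() itself is real.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a length-carrying string such as cpp_string.
struct counted_string
{
  std::size_t len;   // number of payload bytes, excluding the terminator
  const char *text;  // may contain embedded '\0' bytes
};

// Size a strlen()-based scheme would save: stops at the first null byte.
inline std::size_t
size_via_strlen (const counted_string &s)
{
  return std::strlen (s.text) + 1;
}

// Size when the stored length is trusted instead.
inline std::size_t
size_via_length (const counted_string &s)
{
  return s.len + 1;
}
```

For the five payload bytes 'a', 'b', '\0', 'c', 'd', the first helper reports
3 bytes and the second reports 6, so anything after the embedded null would
be lost if only strlen() were consulted.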

gcc/ChangeLog:

* gengtype.cc (output_escaped_param): Add missing const.
(get_string_option): Add missing check for option type.
(walk_type): Support new "string_length" GTY option.
(write_types_process_field): Likewise.
* ggc-common.cc (gt_pch_note_object): Add optional length argument.
* ggc.h (gt_pch_note_object): Adjust prototype for new argument.
(gt_pch_n_S2): Declare...
* stringpool.cc (gt_pch_n_S2): ...new function.
* doc/gty.texi: Document new GTY((string_length)) option.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
* include/symtab.h (struct ht_identifier): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/pch/pch-string-nulls.C: New test.
* g++.dg/pch/pch-string-nulls.Hs: New test.
---

Notes:
Hello-

This fixes a small glitch with PCH files that I doubt matters in
practice. However, the new GTY((string_length)) option I think should be also
useful for other things (including for another patch I am working on), and it
seems worth fixing to me anyway.  Please let me know if it looks OK, or if you'd
prefer another approach? I did consider reusing GTY((length)) for this purpose
but it seemed much more straightforward to do it with a new option, and it's
really about something different since it isn't related to marking of
GC-controlled memory.

BTW, the testcase (pch-string-nulls.Hs) needs to have a literal null byte in
it. That wasn't emailing well so I temporarily have it as the string "^@" in
this patch, for illustration.

Bootstrap + regtest all languages looks good on x86-64 Linux. Thanks!

-Lewis

 gcc/doc/gty.texi | 21 +++-
 gcc/gengtype.cc  | 25 
 gcc/ggc-common.cc|  7 --
 gcc/ggc.h|  4 +++-
 gcc/stringpool.cc|  7 ++
 gcc/testsuite/g++.dg/pch/pch-string-nulls.C  |  3 +++
 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs |  2 ++
 libcpp/include/cpplib.h  |  6 -
 libcpp/include/symtab.h  |  5 +++-
 9 files changed, 70 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.C
 create mode 100644 gcc/testsuite/g++.dg/pch/pch-string-nulls.Hs

diff --git a/gcc/doc/gty.texi b/gcc/doc/gty.texi
index 81aafd11ce3..4f791b300ba 100644
--- a/gcc/doc/gty.texi
+++ b/gcc/doc/gty.texi
@@ -196,7 +196,26 @@ static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
 Note that the @code{length} option is only meant for use with arrays of
 non-atomic objects, that is, objects that contain pointers pointing to
 other GTY-managed objects.  For other GC-allocated arrays and strings
-you should use @code{atomic}.
+you should use @code{atomic} or @code{string_length}.
+
+@findex string_length
+@item string_length ("@var{expression}")
+
+In order to simplify production of PCH, a structure member that is a plain
+array of bytes (an optionally @code{const} and/or @code{unsigned} @code{char
+*}) is treated specially by the infrastructure. Even if such an array has not
+been allocated in GC-controlled memory, it will still be written properly into
+a PCH.  The machinery responsible for this needs to know the length of the
+data; by default, the length is determined by calling @code{strlen} on the
+pointer.  The @code{string_length} option specifies an alternate way to
+determine the length, such as by inspecting another struct member:
+
+@smallexample
+struct GTY(()) non_terminated_string @{
+  size_t sz;
+  const char * GTY((string_length ("%h.sz"))) data;
+@};
+@end smallexample
 
 @findex skip
 @item skip
diff --git a/gcc/gengtype.cc b/gcc/gengtype.cc
index 42363439bd8..28bf05e9c57 100644
--- a/gcc/gengtype.cc
+++ b/gcc/gengtype.cc
@@ -2403,7 +2403,7 @@ struct write_types_data
   enum write_types_kinds kind;
 };
 
-static void output_escaped_param (struct walk_type_data *d,
+static void output_escaped_param (const struct walk_type_data *d,
  const char *, const char *);
 static void 

Ping^2: [PATCH] libcpp: Improve location for macro names [PR66290]

2022-10-12 Thread Lewis Hyatt via Gcc-patches
Hello-

https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html

Since Jeff was kind enough to ack one of my other preprocessor patches
today, I have become emboldened to ping this one again too :). Would
anyone have some time to take a look at it please? Thanks!

-Lewis

On Thu, Sep 15, 2022 at 6:31 PM Lewis Hyatt  wrote:
>
> Hello-
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
> May I please ping this patch? Thank you.
>
> -Lewis
>
> On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt  wrote:
> >
> >
> > When libcpp reports diagnostics whose locus is a macro name (such as for
> > -Wunused-macros), it uses the location in the cpp_macro object that was
> > stored by _cpp_new_macro. This is currently set to pfile->directive_line,
> > which contains the line number only and no column information. This patch
> > changes the stored location to the src_loc for the token defining the macro
> > name, which includes the location and range information.
> >
> > libcpp/ChangeLog:
> >
> > PR c++/66290
> > * macro.cc (_cpp_create_definition): Add location argument.
> > * internal.h (_cpp_create_definition): Adjust prototype.
> > * directives.cc (do_define): Pass new location argument to
> > _cpp_create_definition.
> > (do_undef): Stop passing inferior location to cpp_warning_with_line;
> > the default from cpp_warning is better.
> > (cpp_pop_definition): Pass new location argument to
> > _cpp_create_definition.
> > * pch.cc (cpp_read_state): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR c++/66290
> > * c-c++-common/cpp/macro-ranges.c: New test.
> > * c-c++-common/cpp/line-2.c: Adapt to check for column information
> > on macro-related libcpp warnings.
> > * c-c++-common/cpp/line-3.c: Likewise.
> > * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> > * c-c++-common/cpp/pr58844-1.c: Likewise.
> > * c-c++-common/cpp/pr58844-2.c: Likewise.
> > * c-c++-common/cpp/warning-zero-location.c: Likewise.
> > * c-c++-common/pragma-diag-14.c: Likewise.
> > * c-c++-common/pragma-diag-15.c: Likewise.
> > * g++.dg/modules/macro-2_d.C: Likewise.
> > * g++.dg/modules/macro-4_d.C: Likewise.
> > * g++.dg/modules/macro-4_e.C: Likewise.
> > * g++.dg/spellcheck-macro-ordering.C: Likewise.
> > * gcc.dg/builtin-redefine.c: Likewise.
> > * gcc.dg/cpp/Wunused.c: Likewise.
> > * gcc.dg/cpp/redef2.c: Likewise.
> > * gcc.dg/cpp/redef3.c: Likewise.
> > * gcc.dg/cpp/redef4.c: Likewise.
> > * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> > * gcc.dg/cpp/ucnid-11.c: Likewise.
> > * gcc.dg/cpp/undef2.c: Likewise.
> > * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> > * gcc.dg/cpp/warn-redefined.c: Likewise.
> > * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> > * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> > ---
> >
> > Notes:
> > Hello-
> >
> > The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was originally
> > about the entirely wrong location for -Wunused-macros in C++ mode, which
> > behavior was fixed by r13-1903, but before closing it out I wanted to also
> > address a second point brought up in the PR comments, namely that we do not
> > include column information when emitting diagnostics for macro names, such as
> > is done for -Wunused-macros. The attached patch updates the location stored in
> > the cpp_macro object so that it includes the column and range information for
> > the token comprising the macro name; previously, the location was just the
> > generic one pointing to the whole line.
> >
> > The change to libcpp is very small, the reason for all the testsuite changes is
> > that I have updated all tests explicitly looking for the columnless diagnostics
> > (with the "-:" syntax to dg-warning et al) so that they expect a column
> > instead. I also added a new test which verifies the expected range information
> > in diagnostics with carets.
> >
> > Bootstrap + regtest on x86-64 Linux looks good. Please let me know if it looks
> > OK? Thanks!
> >
> > -Lewis
> >
> >  libcpp/directives.cc  |  13 +-
> >  libcpp/internal.h |   2 +-
> >  libcpp/macro.cc   |  12 +-
> >  libcpp/pch.cc |   2 +-
> >  gcc/testsuite/c-c++-common/cpp/line-2.c   |   2 +-
> >  gcc/testsuite/c-c++-common/cpp/line-3.c   |   2 +-
> >  .../c-c++-common/cpp/macro-arg-count-1.c  |   4 +-
> >  gcc/testsuite/c-c++-common/cpp/macro-ranges.c |  52 ++
> >  gcc/testsuite/c-c++-common/cpp/pr58844-1.c|   4 +-
> >  gcc/testsuite/c-c++-common/cpp/pr58844-2.c|   4 +-
> >  

[PATCH] preprocessor: Fix tracking of system header state [PR60014, PR60723]

2022-10-08 Thread Lewis Hyatt via Gcc-patches
The token_streamer class (which implements gcc mode -E and
-save-temps/-no-integrated-cpp) needs to keep track of whether the last tokens
output were in a system header, so that it can generate line marker
annotations as necessary for a downstream consumer to reconstruct the
state. The logic for tracking it, which was added by r5-1863 to resolve
PR60723, has some edge case issues as revealed by the three new test
cases. The first, coming from the original PR60014, was incidentally fixed by
r9-1926 for unrelated reasons. The other two were still failing on master
prior to this commit. Such code paths were not realizable prior to r13-1544,
which made it possible for the token streamer to see CPP_PRAGMA tokens in more
contexts.

The two main issues being corrected here are:

1) print.prev_was_system_token needs to indicate whether the previous token
output was in a system location. However, it was not being set on every token,
only on those that triggered the main code path; specifically it was not
triggered on a CPP_PRAGMA token. Testcase 2 covers this case.

2) The token_streamer uses a variable "line_marker_emitted" to remember
whether a line marker has been emitted while processing a given token, so that
it wouldn't be done more than once in case multiple conditions requiring a
line marker are true. There was no reason for this to be a member variable
that retains its value from token to token, since it is just needed for
tracking the state locally while processing a single given token. The fact
that it could retain its value for a subsequent token is rather difficult to
observe, but testcase 3 demonstrates incorrect behavior resulting from
that. Moving this to a local variable also simplifies understanding the
control flow going forward.
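
The hazard described in point 2 can be shown with a deliberately simplified
model. The two structs below are hypothetical and are not the real
token_streamer; they only assume that a token may need a line marker for two
independent reasons and that at most one marker should be emitted per token.

```cpp
#include <cassert>

// Variant with the flag as a member: its value survives between tokens.
struct member_flag_streamer
{
  bool line_marker_emitted = false;  // persists from token to token
  int markers = 0;

  void stream (bool line_changed, bool system_state_changed)
  {
    if (line_changed)
      {
        ++markers;
        line_marker_emitted = true;
      }
    // A stale 'true' left over from an earlier token suppresses this marker.
    if (system_state_changed && !line_marker_emitted)
      ++markers;
  }
};

// Variant with the flag as a local: it starts out false for every token.
struct local_flag_streamer
{
  int markers = 0;

  void stream (bool line_changed, bool system_state_changed)
  {
    bool line_marker_emitted = false;  // fresh for every token
    if (line_changed)
      {
        ++markers;
        line_marker_emitted = true;
      }
    if (system_state_changed && !line_marker_emitted)
      ++markers;
  }
};
```

Streaming a token for which both conditions hold, followed by a token for
which only the system state changed, makes the member variant drop the second
marker, while the local variant emits one marker for each token.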

gcc/c-family/ChangeLog:

PR preprocessor/60014
PR preprocessor/60723
* c-ppoutput.cc (class token_streamer): Move member
line_marker_emitted to...
(token_streamer::stream): ...a local variable here. Set
print.prev_was_system_token on all code paths.

gcc/testsuite/ChangeLog:

PR preprocessor/60014
PR preprocessor/60723
* gcc.dg/cpp/pr60014-1.c: New test.
* gcc.dg/cpp/pr60014-1.h: New test.
* gcc.dg/cpp/pr60014-2.c: New test.
* gcc.dg/cpp/pr60014-2.h: New test.
* gcc.dg/cpp/pr60014-3.c: New test.
* gcc.dg/cpp/pr60014-3.h: New test.
---

Notes:
Hello-

I tried to describe it all in the commit message, please see also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60014#c8 for more
details. bootstrap+regtest all languages looks good on x86-64 Linux. Please
let me know if it looks OK? Thanks!

-Lewis

 gcc/c-family/c-ppoutput.cc   | 17 ++---
 gcc/testsuite/gcc.dg/cpp/pr60014-1.c |  9 +
 gcc/testsuite/gcc.dg/cpp/pr60014-1.h |  5 +
 gcc/testsuite/gcc.dg/cpp/pr60014-2.c |  5 +
 gcc/testsuite/gcc.dg/cpp/pr60014-2.h |  5 +
 gcc/testsuite/gcc.dg/cpp/pr60014-3.c | 16 
 gcc/testsuite/gcc.dg/cpp/pr60014-3.h |  2 ++
 7 files changed, 52 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr60014-1.c
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr60014-1.h
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr60014-2.c
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr60014-2.h
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr60014-3.c
 create mode 100644 gcc/testsuite/gcc.dg/cpp/pr60014-3.h

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index 98081ccfbb0..a99d9e9c5ca 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -184,15 +184,13 @@ class token_streamer
   bool avoid_paste;
   bool do_line_adjustments;
   bool in_pragma;
-  bool line_marker_emitted;
 
  public:
   token_streamer (cpp_reader *pfile)
 :avoid_paste (false),
 do_line_adjustments (cpp_get_options (pfile)->lang != CLK_ASM
 && !flag_no_line_commands),
-in_pragma (false),
-line_marker_emitted (false)
+in_pragma (false)
 {
   gcc_assert (!print.streamer);
   print.streamer = this;
@@ -227,7 +225,14 @@ token_streamer::stream (cpp_reader *pfile, const cpp_token *token,
   if (token->type == CPP_EOF)
 return;
 
+  /* Keep track when we move into and out of system locations.  */
+  const bool is_system_token = in_system_header_at (loc);
+  const bool system_state_changed
+= (is_system_token != print.prev_was_system_token);
+  print.prev_was_system_token = is_system_token;
+
   /* Subtle logic to output a space if and only if necessary.  */
+  bool line_marker_emitted = false;
   if (avoid_paste)
 {
   unsigned src_line = LOCATION_LINE (loc);
@@ -301,19 +306,17 @@ token_streamer::stream (cpp_reader *pfile, const cpp_token *token,
   if (do_line_adjustments
  && !in_pragma
  && !line_marker_emitted
- && print.prev_was_system_token != !!in_system_header_at (loc)
+ && 

[PATCH] diagnostics: Add test for fixed _Pragma location issue [PR91669]

2022-10-03 Thread Lewis Hyatt via Gcc-patches
This PR related to _Pragma locations and diagnostic pragmas was fixed by a
combination of r10-325 and r13-1596. Add missing test coverage.

gcc/testsuite/ChangeLog:

PR c/91669
* c-c++-common/pr91669.c: New test.
---

Notes:
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91669#c4

The above PR was already fixed, but I'd like to add missing test coverage
before closing it. Does this look OK please? Thanks!

-Lewis

 gcc/testsuite/c-c++-common/pr91669.c | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/pr91669.c

diff --git a/gcc/testsuite/c-c++-common/pr91669.c b/gcc/testsuite/c-c++-common/pr91669.c
new file mode 100644
index 000..1070751ed2e
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr91669.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wreturn-type" } */
+
+/* The location of the right brace within the macro expansion can be an adhoc
+   location, because the frontend attached custom data to it.  In order for the
+   diagnostic pragma to correctly understand that the diagnostic pop occurs
+   after the function and not before, linemap_location_before_p needs to handle
+   adhoc locations within a macro map, which was broken until fixed by r10-325.
+   Verify that we get it right, both when the brace is a macro token and when it
+   is part of the macro expansion.  */
+
+#define ENDFUNC1 \
+  _Pragma("GCC diagnostic push") \
+  _Pragma("GCC diagnostic ignored \"-Wreturn-type\"") \
+  } /* { dg-bogus {-Wreturn-type} } */ \
+  _Pragma("GCC diagnostic pop")
+
+int f1 () {
+ENDFUNC1 /* { dg-bogus {in expansion of macro 'ENDFUNC1' } } */
+
+#define ENDFUNC2(term) \
+  _Pragma("GCC diagnostic push") \
+  _Pragma("GCC diagnostic ignored \"-Wreturn-type\"") \
+  term /* { dg-bogus {in definition of macro 'ENDFUNC2'} } */ \
+  _Pragma("GCC diagnostic pop")
+
+int f2 () {
+ENDFUNC2(}) /* { dg-bogus {-Wreturn-type} } */


[PATCH] diagnostics: Fix virtual location for -Wuninitialized [PR69543]

2022-09-29 Thread Lewis Hyatt via Gcc-patches
Warnings issued for -Wuninitialized have been using the spelling location of
the problematic usage, discarding any information on the location of the macro
expansion point if such usage was in a macro. This makes the warnings
impossible to control reliably with #pragma GCC diagnostic, and also discards
useful context in the diagnostic output. There seems to be no need to discard
the virtual location information, so this patch fixes that.

PR69543 was mostly about _Pragma issues which have been fixed for many years
now. The PR remains open because two of the testcases added in response to it
still have xfails, but those xfails have nothing to do with _Pragma and rather
just with the issue fixed by this patch, so the PR can be closed now as well.

The other testcase modified here, pragma-diagnostic-2.c, was explicitly
testing for the undesirable behavior that was xfailed in pr69543-3.c. I have
adjusted that and also added a new testcase verifying all 3 types of warning
that come from tree-ssa-uninit.cc get the proper location information now.

gcc/ChangeLog:

PR preprocessor/69543
* tree-ssa-uninit.cc (warn_uninit): Stop stripping macro tracking
information away from the diagnostic location.
(maybe_warn_read_write_only): Likewise.
(maybe_warn_operand): Likewise.

gcc/testsuite/ChangeLog:

PR preprocessor/69543
* c-c++-common/pr69543-3.c: Remove xfail.
* c-c++-common/pr69543-4.c: Likewise.
* gcc.dg/cpp/pragma-diagnostic-2.c: Adjust test for new behavior.
* c-c++-common/pragma-diag-16.c: New test.
---

Notes:
Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543#c9

This patch resolves two xfail'ed testcases discussed on the PR. David seems to
have fully analyzed the situation back in 2017, but stopped short of pushing
any changes. I am working my way through resolving the remaining _Pragma
related PRs and it would be nice to close this one too. As David mentioned,
the issue here is that -Wuninitialized warnings are using the wrong location:
they discard the macro tracking information and use only the spelling
point of the uninitialized usage. But '#pragma GCC diagnostic' can never work
reliably if this is done; it needs to know the macro expansion point in
order to look up the diagnostic enablement state as the user would naturally
interpret it. As a quick example:


int g;
 #define SET(a, b) ((a) = (b))
void f ()
{
  int x;
  #pragma GCC diagnostic ignored "-Wuninitialized"
  SET (g, x);
}


The current status without this patch is that because the macro tracking
information is removed from the location when the diagnostic is issued, the
location for the diagnostic is effectively line 2, prior to the #pragma, and
so the diagnostic does not get suppressed. But I think it seems clear that
users expect it should be suppressed in this case. SET could be buried in some
utility header and in any case has nothing to do with the function or the
actual issue, so its location should not impact whether or not the diagnostic
gets issued.
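
The effect can be sketched with a toy lookup of the pragma state by line
number. This is only a model under simplifying assumptions (a single warning,
a single file, locations reduced to line numbers); pragma_change and
warning_enabled_at are hypothetical names, not GCC functions.

```cpp
#include <cassert>
#include <vector>

// One entry per '#pragma GCC diagnostic' that changes the warning's state.
struct pragma_change
{
  int line;      // line on which the pragma appears
  bool enabled;  // state of the warning from that line onward
};

// Return the state in effect at LINE, given CHANGES sorted by line.
// The warning is assumed to be enabled before any pragma is seen.
inline bool
warning_enabled_at (const std::vector<pragma_change> &changes, int line)
{
  bool enabled = true;
  for (const pragma_change &c : changes)
    {
      if (c.line > line)
        break;
      enabled = c.enabled;
    }
  return enabled;
}
```

In the example above the pragma sits on line 6. Querying with the spelling
location (line 2, the macro definition) finds the warning still enabled,
whereas querying with the expansion point (line 7) finds it suppressed, which
is the behavior a user would expect.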

As David also mentioned on the PR, the behavior was changed intentionally by
r186971 in 2012. Dodji's rationale here:

https://gcc.gnu.org/ml/gcc-patches/2012-04/msg00574.html

indicates that this was necessary to avoid some undesirable locations on the
informative notes for the diagnostic, but does not provide any specific
examples of that, and I am not able to find any cases myself where it is worse
with the virtual location restored. Dodji stated it related to cases where the
variable definition (as opposed to the usage) occurs in a macro, but such
cases are unaffected by my patch, since the same virtual location is used
for the note about the declaration either way. I think a lot has changed since
that time, and the original rationale likely no longer applies. Given that
it does definitely cause a real problem, and users seem to be rather
interested in being able to suppress diagnostics with pragmas, I feel it makes
sense to change it back and stop discarding the macro tracking information
when generating the diagnostic.

Please let me know what you think. Bootstrap + regtest for all languages looks
good on x86-64 Linux:

FAIL 105 105
PASS 547685 547801
UNSUPPORTED 15435 15435
UNTESTED 136 136
XFAIL 4149 4129
XPASS 17 17

Thanks!

-Lewis

 gcc/testsuite/c-c++-common/pr69543-3.c|  8 +--
 gcc/testsuite/c-c++-common/pr69543-4.c|  8 +--
 gcc/testsuite/c-c++-common/pragma-diag-16.c   | 63 +++
 .../gcc.dg/cpp/pragma-diagnostic-2.c  |  7 ++-
 gcc/tree-ssa-uninit.cc| 12 +---
 5 files changed, 73 insertions(+), 25 deletions(-)
 create mode 100644 

Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2022-09-26 Thread Lewis Hyatt via Gcc-patches
On Wed, Jun 15, 2022 at 03:06:16PM -0400, Lewis Hyatt wrote:
> On Tue, Jun 14, 2022 at 05:26:49PM -0400, Lewis Hyatt wrote:
> > Hello-
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902
> > 
> > The attached patch resolves PR preprocessor/103902 as described in the patch
> > message inline below. bootstrap + regtest all languages was successful on
> > x86-64 Linux, with no new failures:
> > 
> > FAIL 103 103
> > PASS 542338 542371
> > UNSUPPORTED 15247 15250
> > UNTESTED 136 136
> > XFAIL 4166 4166
> > XPASS 17 17
> > 
> > Please let me know if it looks OK?
> > 
> > A few questions I have:
> > 
> > - A difference introduced with this patch is that after lexing something
> > like `operator ""_abc', then `_abc' is added to the identifier hash map,
> > whereas previously it was not. I feel like this must be OK because with the
> > optional space as in `operator "" _abc', it would be added with or without
> > the patch.
> > 
> > - The behavior of `#pragma GCC poison' is not consistent (including prior to
> >   my patch). I tried to make it more so but there is still one thing I want
> >   to ask about. Leaving aside extended characters for now, the inconsistency
> >   is that currently the poison is only checked when the suffix appears as a
> >   standalone token.
> > 
> >   #pragma GCC poison _X
> >   bool operator ""_X (unsigned long long);   //accepted before the patch,
> >                                              //rejected after it
> >   bool operator "" _X (unsigned long long);  //rejected either before or after
> >   const char * operator ""_X (const char *, unsigned long); //accepted before,
> >                                                             //rejected after
> >   const char * operator "" _X (const char *, unsigned long); //rejected either
> > 
> >   const char * s = ""_X; //accepted before the patch, rejected after it
> >   const bool b = 1_X; //accepted before or after 
> > 
> > I feel like after the patch, the behavior is the expected behavior for all
> > cases but the last one. Here, we allow the poisoned identifier because it's
> > not lexed as an identifier, it's lexed as part of a pp-number. Does it seem
> > OK like this or does it need to be addressed?
> 
> Sorry, that version actually did not handle the case of -Wc++11-compat in
> c++98 mode correctly. This updated version fixes that and adds the missing
> test coverage for that, if you could please review this one instead?
> 
> By the way, the pipermail archive seems to permanently mangle UTF-8 in inline
> attachments. I attached the patch also gzipped to address that for the
> archive, since the new testcases do use non-ASCII characters.
> 
> Thanks for taking a look!

Hello-

May I please ping this patch again? Joseph suggested that it would be best if
a C++ maintainer has a look at it. This is one of just a few places left where
we don't handle UTF-8 properly in libcpp; it would be really nice to get them
fixed up if there is time to review this patch. Thanks!

https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html

I re-attached it here as it required some trivial rebasing on top of recently
pushed changes. As before, I also attached the gzipped version so that the
UTF-8 testcases show up OK in the online archive, in case that's still an
issue. Thanks for taking a look!

-Lewis
[PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

The PR complains that we do not handle UTF-8 in the suffix for a user-defined
literal, such as:

bool operator ""_π (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
the "" tokens is included, since then the identifier is lexed in the "normal"
way as its own token. But when it is lexed as part of the string token, this
is handled in lex_string() with a one-off loop that is not aware of extended
characters.

This patch fixes it by adding a new function scan_cur_identifier() that can be
used to lex an identifier while in the middle of lexing another token. It is
somewhat duplicative of the code in lex_identifier(), which handles the normal
case, but I think there's no good way to avoid that without pessimizing the
usual case, since lex_identifier() takes advantage of the fact that the first
character of the identifier has already been analyzed. The code duplication is
somewhat offset by factoring out the identifier lexing diagnostics (e.g. for
poisoned identifiers), which were formerly duplicated in two places, and have
been factored into their own function that's used in (now) 3 places.
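
As an illustration of what "lexing an identifier while in the middle of another token" involves, here is a simplified sketch. It is not the patch's scan_cur_identifier(): the names is_ident_byte and scan_identifier_at are hypothetical, and the sketch assumes that any byte >= 0x80 belongs to a valid UTF-8 extended character, whereas real code must validate the code point and also handle UCNs.

```cpp
#include <cstddef>
#include <string>

// Simplified stand-in for "may this byte appear in an identifier?".
// Assumption: every byte >= 0x80 is part of a UTF-8 extended character.
static bool
is_ident_byte (unsigned char c, bool first)
{
  if (c >= 0x80 || c == '_' || c == '$')
    return true;
  if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
    return true;
  // Digits are allowed anywhere except the first position.
  return !first && c >= '0' && c <= '9';
}

// Scan an identifier starting at POS in BUF, without assuming that the first
// character has already been examined.  Returns the identifier text, or an
// empty string if no identifier starts there.
std::string
scan_identifier_at (const std::string &buf, std::size_t pos)
{
  if (pos > buf.size ())
    return std::string ();
  std::size_t end = pos;
  while (end < buf.size ()
         && is_ident_byte (static_cast<unsigned char> (buf[end]), end == pos))
    ++end;
  return buf.substr (pos, end - pos);
}
```

Called at the position right after the closing quote of the "" token, this returns the whole suffix, including any UTF-8 bytes, where a loop that only knows ASCII identifier characters would stop early.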

BTW, the other place that was lexing identifiers is lex_identifier_intern(),
which is used to implement #pragma push_macro and #pragma pop_macro. This does
not support extended characters either. I will add that in a subsequent patch,
because it can't directly reuse the new function, but rather needs to 

Ping: [PATCH] libcpp: Improve location for macro names [PR66290]

2022-09-15 Thread Lewis Hyatt via Gcc-patches
Hello-

https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599397.html
May I please ping this patch? Thank you.

-Lewis

On Fri, Aug 5, 2022 at 12:14 PM Lewis Hyatt wrote:
>
>
> When libcpp reports diagnostics whose locus is a macro name (such as for
> -Wunused-macros), it uses the location in the cpp_macro object that was
> stored by _cpp_new_macro. This is currently set to pfile->directive_line,
> which contains the line number only and no column information. This patch
> changes the stored location to the src_loc for the token defining the macro
> name, which includes the location and range information.
>
> libcpp/ChangeLog:
>
> PR c++/66290
> * macro.cc (_cpp_create_definition): Add location argument.
> * internal.h (_cpp_create_definition): Adjust prototype.
> * directives.cc (do_define): Pass new location argument to
> _cpp_create_definition.
> (do_undef): Stop passing inferior location to cpp_warning_with_line;
> the default from cpp_warning is better.
> (cpp_pop_definition): Pass new location argument to
> _cpp_create_definition.
> * pch.cc (cpp_read_state): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/66290
> * c-c++-common/cpp/macro-ranges.c: New test.
> * c-c++-common/cpp/line-2.c: Adapt to check for column information
> on macro-related libcpp warnings.
> * c-c++-common/cpp/line-3.c: Likewise.
> * c-c++-common/cpp/macro-arg-count-1.c: Likewise.
> * c-c++-common/cpp/pr58844-1.c: Likewise.
> * c-c++-common/cpp/pr58844-2.c: Likewise.
> * c-c++-common/cpp/warning-zero-location.c: Likewise.
> * c-c++-common/pragma-diag-14.c: Likewise.
> * c-c++-common/pragma-diag-15.c: Likewise.
> * g++.dg/modules/macro-2_d.C: Likewise.
> * g++.dg/modules/macro-4_d.C: Likewise.
> * g++.dg/modules/macro-4_e.C: Likewise.
> * g++.dg/spellcheck-macro-ordering.C: Likewise.
> * gcc.dg/builtin-redefine.c: Likewise.
> * gcc.dg/cpp/Wunused.c: Likewise.
> * gcc.dg/cpp/redef2.c: Likewise.
> * gcc.dg/cpp/redef3.c: Likewise.
> * gcc.dg/cpp/redef4.c: Likewise.
> * gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
> * gcc.dg/cpp/ucnid-11.c: Likewise.
> * gcc.dg/cpp/undef2.c: Likewise.
> * gcc.dg/cpp/warn-redefined-2.c: Likewise.
> * gcc.dg/cpp/warn-redefined.c: Likewise.
> * gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
> * gcc.dg/cpp/warn-unused-macros.c: Likewise.
> ---
>
> Notes:
> Hello-
>
> The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was originally
> about the entirely wrong location for -Wunused-macros in C++ mode, which
> behavior was fixed by r13-1903, but before closing it out I wanted to also
> address a second point brought up in the PR comments, namely that we do not
> include column information when emitting diagnostics for macro names, such as
> is done for -Wunused-macros. The attached patch updates the location stored in
> the cpp_macro object so that it includes the column and range information for
> the token comprising the macro name; previously, the location was just the
> generic one pointing to the whole line.
>
> The change to libcpp is very small, the reason for all the testsuite changes is
> that I have updated all tests explicitly looking for the columnless diagnostics
> (with the "-:" syntax to dg-warning et al) so that they expect a column
> instead. I also added a new test which verifies the expected range information
> in diagnostics with carets.
>
> Bootstrap + regtest on x86-64 Linux looks good. Please let me know if it looks
> OK? Thanks!
>
> -Lewis
>
>  libcpp/directives.cc  |  13 +-
>  libcpp/internal.h |   2 +-
>  libcpp/macro.cc   |  12 +-
>  libcpp/pch.cc |   2 +-
>  gcc/testsuite/c-c++-common/cpp/line-2.c   |   2 +-
>  gcc/testsuite/c-c++-common/cpp/line-3.c   |   2 +-
>  .../c-c++-common/cpp/macro-arg-count-1.c  |   4 +-
>  gcc/testsuite/c-c++-common/cpp/macro-ranges.c |  52 ++
>  gcc/testsuite/c-c++-common/cpp/pr58844-1.c|   4 +-
>  gcc/testsuite/c-c++-common/cpp/pr58844-2.c|   4 +-
>  .../c-c++-common/cpp/warning-zero-location.c  |   2 +-
>  gcc/testsuite/c-c++-common/pragma-diag-14.c   |   2 +-
>  gcc/testsuite/c-c++-common/pragma-diag-15.c   |   2 +-
>  gcc/testsuite/g++.dg/modules/macro-2_d.C  |   4 +-
>  gcc/testsuite/g++.dg/modules/macro-4_d.C  |   4 +-
>  gcc/testsuite/g++.dg/modules/macro-4_e.C  |   2 +-
>  .../g++.dg/spellcheck-macro-ordering.C|   2 +-
>  gcc/testsuite/gcc.dg/builtin-redefine.c   |  18 +-
>  gcc/testsuite/gcc.dg/cpp/Wunused.c|   6 +-
>  gcc/testsuite/gcc.dg/cpp/redef2.c |  20 +-
>  

[PATCH] pch: Fix the reconstruction of adhoc data hash table

2022-09-07 Thread Lewis Hyatt via Gcc-patches
The function rebuild_location_adhoc_htab() was meant to reconstruct the
adhoc location hash map after restoring a line_maps instance from a
PCH. However, the function has never performed as intended because it
missed the last step of adding the data into the newly reconstructed hash
map. This patch fixes that.

It does not seem possible to construct a test case such that the current
incorrect behavior is observable as a compiler issue. It would be
observable if it were possible for a precompiled header to contain an
data pointers are used only by the middle end to track inlining
information, and this happens later, too late to show up in a PCH.

I also noted that location_adhoc_data_update, which updates the hash map
pointers in a different scenario, was relying on undefined pointer
arithmetic behavior. I'm not aware of this having caused any issue in
practice, but in this patch I have also changed it to use defined pointer
operations instead.

libcpp/ChangeLog:

* line-map.cc (location_adhoc_data_update): Remove reliance on
undefined behavior.
(get_combined_adhoc_loc): Likewise.
(rebuild_location_adhoc_htab): Fix issue where the htab was not
properly updated.
---

Notes:
Hello-

While working on something unrelated in line-map.cc, I noticed that the
function rebuild_location_adhoc_htab(), whose job is to reconstruct the
adhoc data hash table after a line_maps instance is reconstructed from PCH,
doesn't actually rebuild the hash table at all:

void
rebuild_location_adhoc_htab (line_maps *set)
{
  unsigned i;
  set->location_adhoc_data_map.htab =
      htab_create (100, location_adhoc_data_hash, location_adhoc_data_eq, NULL);
  for (i = 0; i < set->location_adhoc_data_map.curr_loc; i++)
    htab_find_slot (set->location_adhoc_data_map.htab,
                    set->location_adhoc_data_map.data + i, INSERT);
 ^^
}

In order to have the intended effect, it needs to store
set->location_adhoc_data_map.data + i into the slot returned by
htab_find_slot; otherwise it doesn't effectively do anything except make the
hash table think it has curr_loc elements set, when in fact it has 0.
Subsequent calls to htab_traverse, for instance, will do nothing, and any
lookups will also fail.
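
The find-slot-then-store idiom can be sketched with a toy table. This is not libiberty's htab: ToyTable, find_slot, and rebuild are hypothetical names, and the table is a fixed-size open-addressing array of pointers. What it shares with the real API is the behavior described above: asking for a slot with insertion enabled reserves the slot and bumps the element count, but the slot stays empty until the caller writes to it.

```cpp
#include <cstddef>
#include <vector>

// Toy open-addressing table of pointers (hypothetical; not libiberty's htab).
struct ToyTable
{
  std::vector<const int *> slots;
  std::size_t n_elements = 0;

  explicit ToyTable (std::size_t size) : slots (size, nullptr) {}

  // Return the slot holding an element equal to *KEY, or, if INSERT is set,
  // an empty slot reserved for it.  Reserving bumps the element count, but
  // the caller is responsible for storing into the returned slot.
  const int **
  find_slot (const int *key, bool insert)
  {
    std::size_t i = static_cast<std::size_t> (*key) % slots.size ();
    for (std::size_t probes = 0; probes < slots.size (); ++probes)
      {
        if (slots[i] == nullptr)
          {
            if (!insert)
              return nullptr;
            ++n_elements;
            return &slots[i];
          }
        if (*slots[i] == *key)
          return &slots[i];
        i = (i + 1) % slots.size ();
      }
    return nullptr;  // Table is full.
  }
};

// Rebuild the table from DATA.  With STORE false this mimics the bug: slots
// are reserved but never filled in.
void
rebuild (ToyTable &t, const std::vector<int> &data, bool store)
{
  for (const int &item : data)
    {
      const int **slot = t.find_slot (&item, true);
      if (store && slot)
        *slot = &item;
    }
}
```

With store set to false, the table reports as many elements as were "inserted" yet every lookup fails; with store set to true, lookups find the stored pointers.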

I tried for some time to construct a test case that would demonstrate an
observable consequence of this issue, but I don't think it's possible... The
nontrivial uses of this hash map are in the middle end (e.g. to produce the
trace of where a given expression was inlined from), and all this happens
after PCH was read in, and doesn't require any state to be read from the
PCH. It would become apparent in the future, however, if the ability to
attach arbitrary data to an adhoc location were used in other ways, perhaps
somewhere in libcpp; in that hypothetical case, the data would be lost when
reading back in the PCH.

There is another kinda related function, location_adhoc_data_update, which
updates all the pointers in the hash map whenever they are invalidated. It
seems to me that this function invokes undefined behavior, since it adds an
arbitrary offset to the pointers, which do not necessarily point into the
same array after they were realloced. I don't think it's led to any problems
in practice but in this patch I also changed that to use well-defined
operations. Not sure how people may feel about that, since it does require
on the surface 2x as many operations with this change, but I can't see how
the current approach is guaranteed to be valid on all architectures?
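
A minimal sketch of rebasing with only well-defined operations follows. It is an illustration under stated assumptions, not the patch: Record, Store, and grow_and_rebase are hypothetical names, and instead of receiving the old and new base pointers in a callback, the sketch converts each stored pointer to an index while the old buffer is still alive and rebuilds the pointers afterwards.

```cpp
#include <cstddef>
#include <vector>

struct Record
{
  int value;
};

// DATA owns the records; SLOTS holds pointers into DATA, as a hash table
// would.  Both names are hypothetical.
struct Store
{
  std::vector<Record> data;
  std::vector<Record *> slots;
};

// Grow DATA to at least NEW_CAPACITY and keep SLOTS valid.
void
grow_and_rebase (Store &s, std::size_t new_capacity)
{
  // Step 1: while the old buffer is still alive, turn each pointer into an
  // index.  Subtracting two pointers into the same array is well defined.
  std::vector<std::ptrdiff_t> index;
  index.reserve (s.slots.size ());
  for (Record *p : s.slots)
    index.push_back (p - s.data.data ());

  // Step 2: reallocate.  The old pointers are invalid from here on and are
  // never used again.
  s.data.reserve (new_capacity);

  // Step 3: rebuild each pointer from the new base plus the saved index.
  for (std::size_t i = 0; i < s.slots.size (); ++i)
    s.slots[i] = s.data.data () + index[i];
}
```

Each pointer costs one subtraction and one addition, which matches the "2x as many operations" trade-off mentioned above, but no arithmetic is ever performed across two unrelated allocations.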

Bootstrap + regtest looks good on x86-64 Linux. Thanks a lot to whoever may
have time to take a look at it.

-Lewis

 libcpp/line-map.cc | 41 +++--
 1 file changed, 27 insertions(+), 14 deletions(-)

diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index 62077c3857c..391f1d4bbc1 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -85,27 +85,38 @@ location_adhoc_data_eq (const void *l1, const void *l2)
  && lb1->data == lb2->data);
 }
 
-/* Update the hashtable when location_adhoc_data is reallocated.  */
+/* Update the hashtable when location_adhoc_data_map::data is reallocated.
+   The param is an array of two pointers, the previous value of the data
+   pointer, and then the new value.  The pointers stored in the hash map
+   are then rebased to be relative to the new data pointer instead of the
+   old one.  */
 
 static int
-location_adhoc_data_update (void **slot, void *data)
+location_adhoc_data_update (void **slot_v, void *param_v)
 {
-  *((char **) slot)
-= (char *) ((uintptr_t) *((char **) slot) + *((ptrdiff_t *) data));
+  const auto slot = reinterpret_cast (slot_v);
+  const auto param 

Re: Ping^2: 2 libcpp patches

2022-08-23 Thread Lewis Hyatt via Gcc-patches
On Tue, Aug 16, 2022 at 9:50 AM Lewis Hyatt wrote:
>
> On Wed, Jul 20, 2022 at 8:56 PM Lewis Hyatt wrote:
> >
> > Hello-
> >
> > May I please ping these two preprocessor patches?
> >
> > For PR103902:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html
> >
> > For PR55971:
> > https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html
> >
> > Thanks!
>
> Hello-
>
> I would very much appreciate feedback on these two patches please?
>
> For the first patch, I think it is a worthwhile goal to fix all the
> places where libcpp fails to support UTF-8 correctly, and this is one
> of two remaining ones that I'm aware of. I can fix the other case
> (handling of #pragma push_macro) once this one is in place.
>
> The second patch is about libcpp not allowing raw strings containing
> newlines in preprocessor directives, which is a nearly decade-old
> glitch that I think is also worth addressing.
>
> Thanks!
>
> -Lewis

Hello-

I CCed Joseph on this ping last week, but he suggested asking other
maintainers to take a look, so I hope it's OK I am doing that now?
Jakub, I thought it could make sense to ask you please, since I saw
you working on lex.cc recently as well, and also since you commented
on PR55971 nine years ago :). With an uptick in development of libcpp
to support new C++23 features, I think the chances that my patches
will eventually conflict with those are increasing, so it would be
great to get them reviewed. As of current master branch, they still
apply and regtest fine. Thanks very much.

-Lewis


Ping^2: 2 libcpp patches

2022-08-16 Thread Lewis Hyatt via Gcc-patches
On Wed, Jul 20, 2022 at 8:56 PM Lewis Hyatt wrote:
>
> Hello-
>
> May I please ping these two preprocessor patches?
>
> For PR103902:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html
>
> For PR55971:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html
>
> Thanks!

Hello-

I would very much appreciate feedback on these two patches please?

For the first patch, I think it is a worthwhile goal to fix all the
places where libcpp fails to support UTF-8 correctly, and this is one
of two remaining ones that I'm aware of. I can fix the other case
(handling of #pragma push_macro) once this one is in place.

The second patch is about libcpp not allowing raw strings containing
newlines in preprocessor directives, which is a nearly decade-old
glitch that I think is also worth addressing.

Thanks!

-Lewis


[PATCH] libcpp: Improve location for macro names [PR66290]

2022-08-05 Thread Lewis Hyatt via Gcc-patches

When libcpp reports diagnostics whose locus is a macro name (such as for
-Wunused-macros), it uses the location in the cpp_macro object that was
stored by _cpp_new_macro. This is currently set to pfile->directive_line,
which contains the line number only and no column information. This patch
changes the stored location to the src_loc for the token defining the macro
name, which includes the location and range information.

libcpp/ChangeLog:

PR c++/66290
* macro.cc (_cpp_create_definition): Add location argument.
* internal.h (_cpp_create_definition): Adjust prototype.
* directives.cc (do_define): Pass new location argument to
_cpp_create_definition.
(do_undef): Stop passing inferior location to cpp_warning_with_line;
the default from cpp_warning is better.
(cpp_pop_definition): Pass new location argument to
_cpp_create_definition.
* pch.cc (cpp_read_state): Likewise.

gcc/testsuite/ChangeLog:

PR c++/66290
* c-c++-common/cpp/macro-ranges.c: New test.
* c-c++-common/cpp/line-2.c: Adapt to check for column information
on macro-related libcpp warnings.
* c-c++-common/cpp/line-3.c: Likewise.
* c-c++-common/cpp/macro-arg-count-1.c: Likewise.
* c-c++-common/cpp/pr58844-1.c: Likewise.
* c-c++-common/cpp/pr58844-2.c: Likewise.
* c-c++-common/cpp/warning-zero-location.c: Likewise.
* c-c++-common/pragma-diag-14.c: Likewise.
* c-c++-common/pragma-diag-15.c: Likewise.
* g++.dg/modules/macro-2_d.C: Likewise.
* g++.dg/modules/macro-4_d.C: Likewise.
* g++.dg/modules/macro-4_e.C: Likewise.
* g++.dg/spellcheck-macro-ordering.C: Likewise.
* gcc.dg/builtin-redefine.c: Likewise.
* gcc.dg/cpp/Wunused.c: Likewise.
* gcc.dg/cpp/redef2.c: Likewise.
* gcc.dg/cpp/redef3.c: Likewise.
* gcc.dg/cpp/redef4.c: Likewise.
* gcc.dg/cpp/ucnid-11-utf8.c: Likewise.
* gcc.dg/cpp/ucnid-11.c: Likewise.
* gcc.dg/cpp/undef2.c: Likewise.
* gcc.dg/cpp/warn-redefined-2.c: Likewise.
* gcc.dg/cpp/warn-redefined.c: Likewise.
* gcc.dg/cpp/warn-unused-macros-2.c: Likewise.
* gcc.dg/cpp/warn-unused-macros.c: Likewise.
---

Notes:
Hello-

The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66290) was originally
about the entirely wrong location for -Wunused-macros in C++ mode, which
behavior was fixed by r13-1903, but before closing it out I wanted to also
address a second point brought up in the PR comments, namely that we do not
include column information when emitting diagnostics for macro names, such as
is done for -Wunused-macros. The attached patch updates the location stored in
the cpp_macro object so that it includes the column and range information for
the token comprising the macro name; previously, the location was just the
generic one pointing to the whole line.

The change to libcpp is very small, the reason for all the testsuite changes is
that I have updated all tests explicitly looking for the columnless diagnostics
(with the "-:" syntax to dg-warning et al) so that they expect a column
instead. I also added a new test which verifies the expected range information
in diagnostics with carets.

Bootstrap + regtest on x86-64 Linux looks good. Please let me know if it looks
OK? Thanks!
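
To show why the range matters for caret diagnostics, here is a small sketch of how an underline can be derived from a column range. It only illustrates the shape of the output and is not GCC's diagnostic printer; ColumnRange and underline are hypothetical names, and the range is assumed to sit on a single line with 1-based columns.

```cpp
#include <cstddef>
#include <string>

// Hypothetical 1-based column range for a token on a single line.
struct ColumnRange
{
  int start;   // column of the first character of the token
  int finish;  // column of the last character of the token
};

// Build the underline printed beneath the source line: spaces up to the
// start column, a caret at the start, then tildes through the finish column.
// Returns an empty string for an invalid range.
std::string
underline (const ColumnRange &r)
{
  if (r.start < 1 || r.finish < r.start)
    return std::string ();
  std::string out (static_cast<std::size_t> (r.start - 1), ' ');
  out += '^';
  out.append (static_cast<std::size_t> (r.finish - r.start), '~');
  return out;
}
```

For a line such as "#define FOO 1", the name FOO occupies columns 9 through 11, so the underline is eight spaces followed by "^~~". With a location that only names the line, there is no start or finish column from which to build this.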

-Lewis

 libcpp/directives.cc  |  13 +-
 libcpp/internal.h |   2 +-
 libcpp/macro.cc   |  12 +-
 libcpp/pch.cc |   2 +-
 gcc/testsuite/c-c++-common/cpp/line-2.c   |   2 +-
 gcc/testsuite/c-c++-common/cpp/line-3.c   |   2 +-
 .../c-c++-common/cpp/macro-arg-count-1.c  |   4 +-
 gcc/testsuite/c-c++-common/cpp/macro-ranges.c |  52 ++
 gcc/testsuite/c-c++-common/cpp/pr58844-1.c|   4 +-
 gcc/testsuite/c-c++-common/cpp/pr58844-2.c|   4 +-
 .../c-c++-common/cpp/warning-zero-location.c  |   2 +-
 gcc/testsuite/c-c++-common/pragma-diag-14.c   |   2 +-
 gcc/testsuite/c-c++-common/pragma-diag-15.c   |   2 +-
 gcc/testsuite/g++.dg/modules/macro-2_d.C  |   4 +-
 gcc/testsuite/g++.dg/modules/macro-4_d.C  |   4 +-
 gcc/testsuite/g++.dg/modules/macro-4_e.C  |   2 +-
 .../g++.dg/spellcheck-macro-ordering.C|   2 +-
 gcc/testsuite/gcc.dg/builtin-redefine.c   |  18 +-
 gcc/testsuite/gcc.dg/cpp/Wunused.c|   6 +-
 gcc/testsuite/gcc.dg/cpp/redef2.c |  20 +-
 gcc/testsuite/gcc.dg/cpp/redef3.c |  14 +-
 gcc/testsuite/gcc.dg/cpp/redef4.c | 520 +-
 gcc/testsuite/gcc.dg/cpp/ucnid-11-utf8.c  |  12 +-
 gcc/testsuite/gcc.dg/cpp/ucnid-11.c   |  12 +-
 gcc/testsuite/gcc.dg/cpp/undef2.c |   6 +-
 gcc/testsuite/gcc.dg/cpp/warn-redefined-2.c   |  10 +-
 

Re: [COMMITTED,gcc12] c: Fix location for _Pragma tokens [PR97498]

2022-08-02 Thread Lewis Hyatt via Gcc-patches
On Mon, Aug 01, 2022 at 07:15:48PM -0400, Lewis Hyatt wrote:
> Hello-
> 
> This backport from r13-1596 to GCC 12 has been committed after
> pre-approval. This was a straightforward cherry-pick from master with no
> adjustments needed. I would like to note that subsequent to r13-1596, Thomas
> made a few commits to the libgomp testsuite to test for new diagnostic notes
> output after this patch; I have not backported these since I was not sure if
> that would be appropriate. I did verify that the libgomp testsuite changes
> work OK as-is on this branch, i.e. do not introduce any new failures,
> including with offloading enabled.

I have done the same for GCC 11 and 10 branches, patches attached. Thanks!

-Lewis
[GCC10] c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:

a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.

With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.

This is just a two-line fix (one for each of a) and b)), the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.
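
The effect of the imprecise location can be sketched with a toy model of the diagnostic state history. This is a hypothetical illustration, not GCC's data structure: State, Pos, Change, and state_at are invented names, positions are plain (line, column) pairs, and the state in effect at a position is taken from the last change at or before it.

```cpp
#include <utility>
#include <vector>

enum class State { Warning, Ignored, Error };

// Hypothetical (line, column) position; std::pair compares lexicographically.
using Pos = std::pair<int, int>;

struct Change
{
  Pos where;    // position at which the pragma takes effect
  State state;  // state of the warning from that point on
};

// Return the state in effect at QUERY: that of the last change located at or
// before it, or INITIAL if there is none.  CHANGES must be sorted by position.
State
state_at (const std::vector<Change> &changes, Pos query, State initial)
{
  State s = initial;
  for (const Change &c : changes)
    if (c.where <= query)
      s = c.state;
  return s;
}
```

In the new test, `#pragma GCC diagnostic ignored` is on line 3, and on line 4 the function f starts at column 13 while the _Pragma starts at column 20. If the _Pragma's change is recorded at the start of line 4, a lookup at (4, 13) yields Error, which is the bug; recorded at (4, 20), the same lookup yields Ignored.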

gcc/c/ChangeLog:

PR preprocessor/97498
* c-parser.c (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.

libcpp/ChangeLog:

PR preprocessor/97498
* directives.c (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:

PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* gcc.dg/pragma-message.c: Adapt for improved warning locations.

(cherry picked from commit 0587cef3d7962a8b0f44779589ba2920dd3d71e5)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 2d347ad927c..b2c7a74b464 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -12328,6 +12328,7 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
   unsigned int id;
   const char *construct = NULL;
 
+  input_location = c_parser_peek_token (parser)->location;
   id = c_parser_peek_token (parser)->pragma_kind;
   gcc_assert (id != PRAGMA_NONE);
 
diff --git a/gcc/testsuite/c-c++-common/pr97498.c b/gcc/testsuite/c-c++-common/pr97498.c
new file mode 100644
index 000..f5fa420415b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr97498.c
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wunused-function" } */
+#pragma GCC diagnostic ignored "-Wunused-function"
+static void f() {} _Pragma("GCC diagnostic error \"-Wunused-function\"") /* { dg-bogus "-Wunused-function" } */
diff --git a/gcc/testsuite/gcc.dg/pragma-message.c b/gcc/testsuite/gcc.dg/pragma-message.c
index 2f44b617710..1b7cf09de0a 100644
--- a/gcc/testsuite/gcc.dg/pragma-message.c
+++ b/gcc/testsuite/gcc.dg/pragma-message.c
@@ -42,9 +42,11 @@
 #pragma message ("Okay " THREE)  /* { dg-message "Okay 3" } */
 
 /* Create a TODO() that prints a message on compilation.  */
-#define DO_PRAGMA(x) _Pragma (#x)
-#define TODO(x) DO_PRAGMA(message ("TODO - " #x))
-TODO(Okay 4) /* { dg-message "TODO - Okay 4" } */
+#define DO_PRAGMA(x) _Pragma (#x) /* { dg-line pragma_loc1 } */
+#define TODO(x) DO_PRAGMA(message ("TODO - " #x)) /* { dg-line pragma_loc2 } */
+TODO(Okay 4) /* { dg-message "in expansion of macro 'TODO'" } */
+/* { dg-message "TODO - Okay 4" "test4.1" { target *-*-* } pragma_loc1 } */
+/* { dg-message "in expansion of macro 'DO_PRAGMA'" "test4.2" { target *-*-* } pragma_loc2 } */
 
 #if 0
 #pragma message ("Not printed")
diff --git a/libcpp/directives.c b/libcpp/directives.c
index cab9aad64d2..0d8545a7df9 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -1887,6 +1887,7 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
   maxcount = 50;
   toks = XNEWVEC (cpp_token, maxcount);
   toks[0] = pfile->directive_result;
+  toks[0].src_loc = expansion_loc;
 
   do
{
[GCC11] c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location 

[COMMITTED,gcc12] c: Fix location for _Pragma tokens [PR97498]

2022-08-01 Thread Lewis Hyatt via Gcc-patches
Hello-

This backport from r13-1596 to GCC 12 has been committed after
pre-approval. This was a straightforward cherry-pick from master with no
adjustments needed. I would like to note that subsequent to r13-1596, Thomas
made a few commits to the libgomp testsuite to test for new diagnostic notes
output after this patch; I have not backported these since I was not sure if
that would be appropriate. I did verify that the libgomp testsuite changes
work OK as-is on this branch, i.e. do not introduce any new failures,
including with offloading enabled.

-Lewis
c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:

a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.

With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.

This is just a two-line fix (one for each of a) and b)), the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.

gcc/c/ChangeLog:

PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.

libcpp/ChangeLog:

PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:

PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations.
* c-c++-common/gomp/pragma-5.c: Likewise.
* gcc.dg/pragma-message.c: Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for
improved warning locations.
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

(cherry picked from commit 0587cef3d7962a8b0f44779589ba2920dd3d71e5)

diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 129dd727ef3..f679d53706a 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -12378,6 +12378,7 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
   unsigned int id;
   const char *construct = NULL;
 
+  input_location = c_parser_peek_token (parser)->location;
   id = c_parser_peek_token (parser)->pragma_kind;
   gcc_assert (id != PRAGMA_NONE);
 
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-3.c b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
index c1dee1bcc62..ae18e9b8886 100644
--- a/gcc/testsuite/c-c++-common/gomp/pragma-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
@@ -1,13 +1,14 @@
 /* { dg-additional-options "-fdump-tree-original" }  */
 /* PR preprocessor/103165  */
 
-#define inner(...) #__VA_ARGS__ ; _Pragma("omp error severity(warning) message (\"Test\") at(compilation)")
+#define inner(...) #__VA_ARGS__ ; _Pragma("omp error severity(warning) message (\"Test\") at(compilation)") /* { dg-line inner_location } */
 #define outer(...) inner(__VA_ARGS__)
 
 void
 f (void)
 {
-  const char *str = outer(inner(1,2));  /* { dg-warning "'pragma omp error' encountered: Test" } */
+  const char *str = outer(inner(1,2));
+  /* { dg-warning "'pragma omp error' encountered: Test" "inner expansion" { target *-*-* } inner_location } */
 }
 
 #if 0
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-5.c b/gcc/testsuite/c-c++-common/gomp/pragma-5.c
index af54b682789..8124f701502 100644
--- a/gcc/testsuite/c-c++-common/gomp/pragma-5.c
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-5.c
@@ -1,13 +1,14 @@
 /* { dg-additional-options "-fdump-tree-original" }  */
 /* PR preprocessor/103165  */
 
-#define inner(...) #__VA_ARGS__ ; _Pragma   (  "   omp error severity   (warning)  message (\"Test\") at(compilation)" )
+#define inner(...) #__VA_ARGS__ ; _Pragma   (  "   omp error severity   (warning)  message (\"Test\") at(compilation)" ) /* { dg-line inner_location } */
 #define outer(...) inner(__VA_ARGS__)
 
 void
 f (void)
 {
-  const char *str = outer(inner(1,2));  /* { dg-warning "'pragma omp error' encountered: Test" } */
+  const char *str = outer(inner(1,2));
+  /* { dg-warning "'pragma omp error' encountered: Test" "inner expansion" { 

Re: [PATCH 1/1] c++/106423: Fix pragma suppression of -Wc++20-compat diagnostics.

2022-07-31 Thread Lewis Hyatt via Gcc-patches
On Sat, Jul 30, 2022 at 7:06 PM Tom Honermann via Gcc-patches wrote:
>
> On 7/27/22 7:09 PM, Joseph Myers wrote:
> > On Sun, 24 Jul 2022, Tom Honermann via Gcc-patches wrote:
> >
> >> Gcc's '#pragma GCC diagnostic' directives are processed in "early mode"
> >> (see handle_pragma_diagnostic_early) for the C++ frontend and, as such,
> >> require that the target diagnostic option be enabled for the preprocessor
> >> (see c_option_is_from_cpp_diagnostics).  This change modifies the
> >> -Wc++20-compat option definition to register it as a preprocessor option
> >> so that its associated diagnostics can be suppressed.  The changes also
> > There are lots of C++ warning options, all of which should support pragma
> > suppression regardless of whether they are relevant to the preprocessor or
> > not.  Do they all need this kind of handling, or is it only -Wc++20-compat
> > that has some kind of problem?
>
> I had only checked -Wc++20-compat when working on the patch.
>
> I did some spot checking now and confirmed that suppression works as
> expected for C++ for at least the following warnings:
>    -Wuninitialized
>    -Warray-compare
>    -Wbool-compare
>    -Wtautological-compare
>    -Wterminate
>
> I don't know the diagnostic framework well. As best I can tell, this
> issue is specific to the -Wc++20-compat option and when the particular
> diagnostic is issued (e.g., during lexing as opposed to during parsing).
> The following call chains appear to be relevant.
>cp_lexer_new_main -> cp_lexer_handle_early_pragma ->
> c_invoke_early_pragma_handler
>cp_parser_* -> cp_parser_pragma -> c_invoke_pragma_handler
>(where * might be "declaration", "toplevel_declaration",
> "class_head", "objc_interstitial_code", ...)
>
> The -Wc++20-compat enabled warning regarding new keywords in C++20 is
> issued from cp_lexer_get_preprocessor_token.
>
> Tom.
>

I have been working on improving the handling of "#pragma GCC
diagnostic" lately. The behavior for C++ changed since r13-1544
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=e46f4d7430c5210465791603735ab219ef263c51).
I have some more comments about the patch's approach on the PR
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53431#c44).

"#pragma GCC diagnostic" formerly did not work in C++ at all, for
diagnostics generated by libcpp, because C++ obtains all the tokens
from libcpp first (including deferred pragmas), and then processes
them afterward, too late to take effect for diagnostics that libcpp
has already emitted. r13-1544 fixed this up by adding an early pragma
handler, which runs as soon as a deferred pragma token is seen and
handles diagnostic pragmas if they pertain to libcpp-controlled
diagnostics. Non-libcpp diagnostics still need to be handled later,
during parsing, or else they get processed too early and it leads to
other problems. Basically, now each diagnostic pragma is handled as
close in time as possible to the time the associated diagnostics might
be generated.

The early pragma handler determines that an option comes from libcpp,
and so should be subject to early processing, if it was marked as such
in the options definition file. Tom's patch points out that
-Wc++20-compat needs to be handled early, and so marking it as a
libcpp diagnostic in c-family/c.opt arranges for that to work as
intended. Now one potential objection here is that -Wc++20-compat
warnings are not technically generated by libcpp. They are generated
by the C++ frontend immediately after lexing an identifier token from
libcpp (cp_lexer_get_preprocessor_token()). But the distinction
between these two steps is rather blurry, and it seems logical to me
to denote this as a libcpp-related option. Also, the same is already
done for -Wc++11-compat. Otherwise, we would need to add some new
option property to indicate which ones need to be handled for pragmas
at lexing time rather than parsing time.
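
To make the early/late split concrete, here is a minimal sketch, not GCC's
actual code. The names toy_option, from_cpp and toy_handled_early are
hypothetical, and the table contents only mirror the options mentioned in
this thread; the real decision is made from the option properties in
c-family/c.opt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the options table.  */
struct toy_option
{
  const char *name;
  bool from_cpp;   /* Marked as a libcpp diagnostic in the .opt file?  */
};

static const struct toy_option toy_options[] = {
  { "-Wunused-macros", true },
  { "-Wc++11-compat", true },
  { "-Wc++20-compat", true },   /* What the patch under discussion arranges.  */
  { "-Wuninitialized", false },
};

/* Return true if a "#pragma GCC diagnostic" naming OPTION would be acted on
   by the early handler, i.e. as soon as the deferred pragma token is seen.
   Options not marked as libcpp diagnostics, and unknown options, are left
   for the later pass that runs during parsing.  */
static bool
toy_handled_early (const char *option)
{
  assert (option != NULL);
  for (size_t i = 0; i < sizeof toy_options / sizeof toy_options[0]; i++)
    if (strcmp (toy_options[i].name, option) == 0)
      return toy_options[i].from_cpp;
  return false;
}
```

In this model, marking -Wc++20-compat as from_cpp is the whole fix: the
pragma then takes effect before cp_lexer_get_preprocessor_token() has a
chance to issue the warning.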

At the moment I don't see any other diagnostics issued from
cp_lexer_get_preprocessor_token() that would need similar adjustments.
Assuming the approach is OK, it might be nice to add a comment to that
function, indicating that any diagnostics emitted there should be
annotated as libcpp options in the .opt file?

-Lewis


Re: [PATCH] c: Fix location for _Pragma tokens [PR97498]

2022-07-31 Thread Lewis Hyatt via Gcc-patches
On Sat, Jul 30, 2022 at 10:43 PM Jeff Law  wrote:
> > There was a request to backport this
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498#c7) since it is
> > relevant to this one:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106267. Is that OK as
> > well for any of the current release branches please? It will work fine
> > as far back as 10. Thanks...
> Generally we try to focus mostly on codegen issues and regressions on
> the release branches, but it's not a strict rule.  Given this has been
> on the trunk for nearly a couple weeks without issues, feel free to go
> ahead and backport per Martin's request.
>
> jeff

Thank you, I'll do that. One question: does a backport need to be an
exact cherry-pick, or is it OK if I need to tweak a few things as
well? I wasn't sure if I need to re-post the patch here in that case.
The patch itself applies to gcc 12 branch fine, however I think I need
a couple small changes to the testsuite parts. Thanks...

-Lewis


[PATCH] c++: Fix location for -Wunused-macros [PR66290]

2022-07-28 Thread Lewis Hyatt via Gcc-patches

In C++, since all tokens are lexed from libcpp up front, diagnostics generated
by libcpp after lexing has completed do not get a valid location from libcpp
(rather, libcpp thinks they all pertain to the end of the file.) This has long
been addressed using the global variable "done_lexing", which the C++ frontend
sets at the appropriate time; when done_lexing is true, then c_cpp_diagnostic(),
which outputs libcpp's diagnostics, uses input_location instead of the wrong
libcpp location. The C++ frontend arranges that input_location will point to the
token it is currently processing, so this generally works fine. However, there
is one exception currently, which is -Wunused-macros. This gets generated at the
end of processing in cpp_finish (), since we need to wait until then to
determine whether a macro was eventually used or not. But the locations it
passes to c_cpp_diagnostic () were remembered from the original lexing and hence
they should not be overridden with input_location, which is now the one
incorrectly pointing to the end of the file.

Fixed by setting done_lexing=false again just prior to calling cpp_finish (). I
also renamed the variable from done_lexing to "override_libcpp_locations", since
it's now not strictly about lexing anymore.
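
The location choice described above can be sketched as follows. This is a toy
model under stated assumptions, not the code in c-common.cc: toy_location,
toy_phase and both functions are hypothetical names, and the three phases are
a simplification of the C++ frontend's actual sequencing.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int toy_location;   /* Hypothetical stand-in for location_t.  */

enum toy_phase
{
  TOY_LEXING,      /* libcpp is still producing tokens.  */
  TOY_PARSING,     /* C++ lexing is done; input_location tracks the parser.  */
  TOY_CPP_FINISH   /* cpp_finish () is running, e.g. for -Wunused-macros.  */
};

/* Model of the renamed flag: libcpp's own locations are overridden only
   after the C++ lexer is done and before cpp_finish () is called.  */
static bool
toy_override_libcpp_locations (enum toy_phase phase)
{
  assert (phase == TOY_LEXING || phase == TOY_PARSING
          || phase == TOY_CPP_FINISH);
  return phase == TOY_PARSING;
}

/* Pick the location a libcpp diagnostic is reported at.  */
static toy_location
toy_diagnostic_location (enum toy_phase phase, toy_location libcpp_loc,
                         toy_location input_loc)
{
  return toy_override_libcpp_locations (phase) ? input_loc : libcpp_loc;
}
```

With the flag cleared for TOY_CPP_FINISH, a -Wunused-macros diagnostic keeps
the macro definition's remembered location instead of the end-of-file
input_location.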

There is no new testcase with this patch, since we already had an xfailed
testcase which is now fixed.

gcc/c-family/ChangeLog:

PR c++/66290
* c-common.h: Rename global done_lexing to
override_libcpp_locations.
* c-common.cc (c_cpp_diagnostic): Likewise.
* c-opts.cc (c_common_finish): Set override_libcpp_locations
(formerly done_lexing) to false immediately prior to calling
cpp_finish ().

gcc/cp/ChangeLog:

PR c++/66290
* parser.cc (cp_lexer_new_main): Rename global done_lexing to
override_libcpp_locations.

gcc/testsuite/ChangeLog:

PR c++/66290
* c-c++-common/pragma-diag-15.c: Remove xfail for C++.
---

Notes:
Hello-

The attached patch fixes PR66290, which is about C++ diagnostics using
the wrong location for -Wunused-macros. Please let me know if it looks OK?

Bootstrap + regtest all languages on x86-64 linux looks good, no changes other
than the un-XFAILed testcases.

Thank you!

-Lewis

 gcc/c-family/c-common.cc| 10 ++
 gcc/c-family/c-common.h |  8 +---
 gcc/c-family/c-opts.cc  |  6 ++
 gcc/cp/parser.cc|  2 +-
 gcc/testsuite/c-c++-common/pragma-diag-15.c |  2 +-
 5 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 655c3aefee6..6e41ceb38e9 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -284,9 +284,11 @@ int c_inhibit_evaluation_warnings;
be generated.  */
 bool in_late_binary_op;
 
-/* Whether lexing has been completed, so subsequent preprocessor
-   errors should use the compiler's input_location.  */
-bool done_lexing = false;
+/* Depending on which phase of processing we are in, we may need
+   to prefer input_location to libcpp's locations.  (Specifically,
+   after the C++ lexer is done lexing tokens, but prior to calling
+   cpp_finish (), we need to do so.  */
+bool override_libcpp_locations;
 
 /* Information about how a function name is generated.  */
 struct fname_var_t
@@ -6681,7 +6683,7 @@ c_cpp_diagnostic (cpp_reader *pfile ATTRIBUTE_UNUSED,
 default:
   gcc_unreachable ();
 }
-  if (done_lexing)
+  if (override_libcpp_locations)
 richloc->set_range (0, input_location, SHOW_RANGE_WITH_CARET);
   diagnostic_set_info_translated (&diagnostic, msg, ap,
   richloc, dlevel);
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index f9064393b4e..c06769b6f0b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -767,10 +767,12 @@ extern int max_tinst_depth;
 
 extern int c_inhibit_evaluation_warnings;
 
-/* Whether lexing has been completed, so subsequent preprocessor
-   errors should use the compiler's input_location.  */
+/* Depending on which phase of processing we are in, we may need
+   to prefer input_location to libcpp's locations.  (Specifically,
+   after the C++ lexer is done lexing tokens, but prior to calling
+   cpp_finish (), we need to do so.  */
 
-extern bool done_lexing;
+extern bool override_libcpp_locations;
 
 /* C types are partitioned into three subsets: object, function, and
incomplete types.  */
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index b9f01a65ed7..4e1463689de 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1281,6 +1281,12 @@ c_common_finish (void)
 	}
 }
 
+  /* When we call cpp_finish (), it may generate some diagnostics using
+ locations it remembered from the preprocessing phase, e.g. for
+ -Wunused-macros.  So inform c_cpp_diagnostic () not to override those
+ locations with input_location, which 

Ping: 2 libcpp patches

2022-07-20 Thread Lewis Hyatt via Gcc-patches
Hello-

May I please ping these two preprocessor patches?

For PR103902:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596704.html

For PR55971:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596820.html

Thanks!

-Lewis


Re: [PATCH] libphobos: Fix instability in the parallelized testsuite

2022-07-17 Thread Lewis Hyatt via Gcc-patches
> Hi Lewis,
> 
> Thanks! Good spot. I think it should be calling dg-runtest however,
> same as what libphobos.cycles/cycles.exp is doing. Could also fix the
> test name so each one is unique, just to hit two birds in one -
> something like the following would suffice (haven't had time to check).
> 
> Kind Regards,
> Iain.
> 
> ---
> 
> --- a/libphobos/testsuite/libphobos.unittest/unittest.exp
> +++ b/libphobos/testsuite/libphobos.unittest/unittest.exp
> @@ -42,8 +42,10 @@ foreach unit_test $unit_test_list {
>  set expected_fail [lindex $unit_test 1]
>  
>  foreach test $tests {
> -set shouldfail $expected_fail
> -dg-test $test "" $test_flags
> + set libphobos_test_name "[dg-trim-dirname $srcdir $test] $test_flags"
> + set shouldfail $expected_fail
> + dg-runtest $test "" $test_flags
> + set libphobos_test_name ""
>  }
>  
>  set shouldfail 0
> 

Thanks for the followup. I tested and can confirm your version works fine:

PASS: libphobos.unittest/customhandler.d -fversion=FailNoPrintout (test for excess errors)
PASS: libphobos.unittest/customhandler.d -fversion=FailNoPrintout execution test
PASS: libphobos.unittest/customhandler.d -fversion=FailedTests (test for excess errors)
PASS: libphobos.unittest/customhandler.d -fversion=FailedTests execution test
PASS: libphobos.unittest/customhandler.d -fversion=GoodTests (test for excess errors)
PASS: libphobos.unittest/customhandler.d -fversion=GoodTests execution test
PASS: libphobos.unittest/customhandler.d -fversion=NoTests (test for excess errors)
PASS: libphobos.unittest/customhandler.d -fversion=NoTests execution test
PASS: libphobos.unittest/customhandler.d -fversion=PassNoPrintout (test for excess errors)
PASS: libphobos.unittest/customhandler.d -fversion=PassNoPrintout execution test

Let me know if you want me to do anything from there please?  By the way, there
are a few other tests that cause some minor glitches with comparing results:

libphobos.sum:PASS: libphobos.shared/link.d -I/home/lewis/gccdev/base/src/libphobos/testsuite/libphobos.shared lib.so -shared-libphobos (test for excess errors)
libphobos.sum:PASS: libphobos.shared/link.d -I/home/lewis/gccdev/base/src/libphobos/testsuite/libphobos.shared lib.so -shared-libphobos execution test
libphobos.sum:PASS: libphobos.shared/link_linkdep.d -I/home/lewis/gccdev/base/src/libphobos/testsuite/libphobos.shared liblinkdep.so lib.so -shared-libphobos (test for excess errors)
libphobos.sum:PASS: libphobos.shared/link_linkdep.d -I/home/lewis/gccdev/base/src/libphobos/testsuite/libphobos.shared liblinkdep.so lib.so -shared-libphobos execution test
libphobos.sum:PASS: libphobos.shared/link_loaddep.d -I/home/lewis/gccdev/base/src/libphobos/testsuite/libphobos.shared libloaddep.so -shared-libphobos (test for excess errors)
libphobos.sum:PASS: libphobos.shared/link_loaddep.d -I/home/lewis/gccdev/base/src/libphobos/testsuite/libphobos.shared libloaddep.so -shared-libphobos execution test

The problem here is that the absolute path to the test dir ends up in
the results summary, since it appears in the options string that is
part of the test name. It's not so hard to work around when doing the
comparisons, but it seems to be the only case where this happens in
the whole testsuite, other than one other similar case from libgo. Is
there a standard way to handle it? Thanks...

-Lewis


Re: [PATCH] c: Fix location for _Pragma tokens [PR97498]

2022-07-17 Thread Lewis Hyatt via Gcc-patches
On Sat, Jul 9, 2022 at 11:59 PM Jeff Law via Gcc-patches wrote:
>
>
>
> On 7/9/2022 2:52 PM, Lewis Hyatt via Gcc-patches wrote:
> > Hello-
> >
> > PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR
> > related to the fact that imprecise locations for _Pragma result in
> > counterintuitive behavior for GCC diagnostic pragmas, which inhibit the
> > ability to make convenient wrapper macros for enabling and disabling
> > diagnostics in specific scopes.
> >
> > It looks like David did a lot of work a few years ago improving this
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular
> > r233637 added a lot of new test coverage for cases that had regressed in the
> > past.
> >
> > I think the main source of problems for all remaining issues is that we use
> > the global input_location for deciding when/if a diagnostic should apply. I
> > think it should be eventually doable to eliminate this, and rather properly
> > resolve the token locations to the place they need to be
> I've long wanted to see our dependency on input_location be diminished
> with the goal of making it go away completely.
> > so that _Pragma
> > type wrapper macros just work the way people expect.
> Certainly desirable since many projects have built wrapper macros which
> use Pragmas to control warnings.  One of the biggest QOI implementation
> details we've had with the warnings has been problems with location data
> leading to an inability to turn them off in specific locations.
>
> So I'm all for improvements, in terms of getting our location data more
> correct.
>
>
>
> >
> > That said, PR97498 can be solved easily with a 2-line fix without removing
> > input_location, and I think the resulting change to input_location's value
> > is an improvement that will benefit other areas, so I thought I'd see what
> > you think about this patch please?
> >
> > Here is a typical testcase. Note the line continuations so it's all one
> > logical line.
> >
> > ===
> > _Pragma("GCC diagnostic push") \
> > _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
> > static void f() {} \
> > _Pragma("GCC diagnostic pop")
> > ===
> >
> > What happens is that in C++ mode, input_location is always updated to the
> > most recently-lexed token, so the above case works fine and does not warn
> > when compiled with "g++ -Wunused-function". However, in C mode, it does
> > warn because input_location in C is almost always set to the start of the
> > line, and is in this case. So the pop is deemed to take place prior to the
> > definition of f().
> >
> > Initially, I thought it best to change input_location for C mode to behave
> > like C++, and always update to the most recently lexed token. Maybe that's
> > still the right way to go, but there was a fair amount of testsuite fallout
> > from that. Most of it was just that we would need to change the tests to look
> > for the new locations, and in many cases, the new locations seemed
> > preferable to the old ones, but it seemed a bit much for now, so I took a
> > more measured approach and just changed input_location in the specific case
> > of processing a pragma, to be the location of the CPP_PRAGMA token.
> >
> > Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides
> > to represent the _Pragma() expression doesn't have a valid location with
> > which input_location could be overridden. Looking into that, in r232893
> > David added logic which sets the location of all tokens inside the
> > _Pragma(...) to a reasonable place (namely it points to "_Pragma" at the
> > expansion point). However, that patch didn't change the location of the
> > CPP_PRAGMA token itself to similarly point there, so the 2nd line of this
> > patch does that.
> >
> > The rest of it is just tweaking a couple tests which were sensitive to the
> > location being output. In all these cases, the new locations seem more
> > informative to me than the old ones. With those tweaks, bootstrap + regtest
> > all languages looks good with no regressions.
> >
> > Please let me know what you think? Thanks!
> > gcc/c/ChangeLog:
> >
> >   PR preprocessor/97498
> >   * c-parser.cc (c_parser_pragma): Set input_location to the
> >   location of the pragma, rather than the start of the line.
> >
> > libcpp/ChangeLog:
> >
> >   PR preprocessor/974

[PATCH] libphobos: Fix instability in the parallelized testsuite

2022-07-14 Thread Lewis Hyatt via Gcc-patches
Hello-

I get a different number of test results from libphobos.unittest/unittest.exp,
depending on server load. I believe it's because this testsuite doesn't check
runtest_file_p:

$ make -j 1 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#'
 # of expected passes   10

$ make -j 2 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#'
 # of expected passes   10
 # of expected passes   10

$ make -j 4 RUNTESTFLAGS='unittest.exp' check-target-libphobos | grep '^#'
 # of expected passes   10
 # of expected passes   10
 # of expected passes   10
 # of expected passes   10

When running in parallel along with other tests, even at a fixed argument
for -j, the number of tests that actually execute will depend on how many of the
parallel sub-makes happened to start prior to the first one finishing, hence
it changes from run to run.
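
As a rough illustration of the counts above, here is a toy model. It is a
sketch under loud assumptions: toy_runtest_file_p uses a fixed round-robin
partition, which is not how the real runtest_file_p decides, and the model
pretends every parallel sub-make starts before any of them finishes (the
worst case described above). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical filter: worker WORKER claims test TEST when the test index
   modulo the number of workers equals the worker index.  The real harness
   uses a different policy; only the "each test is claimed once" property
   matters here.  */
static bool
toy_runtest_file_p (int worker, int workers, int test)
{
  return test % workers == worker;
}

/* Count how many results land in the summary when WORKERS sub-makes each
   walk the same list of TESTS results, with or without the filter.  */
static int
toy_count_results (int workers, int tests, bool use_filter)
{
  assert (workers > 0 && tests >= 0);
  int results = 0;
  for (int w = 0; w < workers; w++)
    for (int t = 0; t < tests; t++)
      if (!use_filter || toy_runtest_file_p (w, workers, t))
        results++;
  return results;
}
```

Without the filter, four workers and ten expected passes give forty results,
matching the four "# of expected passes 10" lines above; with the filter the
total stays at ten regardless of the -j argument.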

The attached patch fixes it for me, if it looks OK? Thanks, this would remove
some noise from before/after test comparisons.

-Lewis
libphobos: Fix instability in the parallelized testsuite

libphobos.unittest/unittest.exp calls bare dg-test rather than dg-runtest, and
so it should call runtest_file_p to determine whether to run each test or
not. Without that call, the tests run too many times in parallel mode (they will
run as many times as the argument to make -j).

libphobos/ChangeLog:

* testsuite/libphobos.unittest/unittest.exp: Call runtest_file_p
prior to running each test.

diff --git a/libphobos/testsuite/libphobos.unittest/unittest.exp b/libphobos/testsuite/libphobos.unittest/unittest.exp
index 2a019caca8c..175decdc333 100644
--- a/libphobos/testsuite/libphobos.unittest/unittest.exp
+++ b/libphobos/testsuite/libphobos.unittest/unittest.exp
@@ -42,6 +42,9 @@ foreach unit_test $unit_test_list {
 set expected_fail [lindex $unit_test 1]
 
 foreach test $tests {
+   if {![runtest_file_p $runtests $test]} {
+continue
+}
 set shouldfail $expected_fail
 dg-test $test "" $test_flags
 }


Re: XFAIL 'offloading_enabled' diagnostics issue in 'libgomp.oacc-c-c++-common/reduction-5.c' [PR101551] (was: Enhance '_Pragma' diagnostics verification in OMP C/C++ test cases)

2022-07-13 Thread Lewis Hyatt via Gcc-patches
On Tue, Jul 12, 2022 at 9:10 AM Tobias Burnus  wrote:
> On 12.07.22 13:50, Lewis Hyatt via Gcc-patches wrote:
> > On Tue, Jul 12, 2022 at 2:33 AM Thomas Schwinge wrote:
> >> On 2022-07-11T11:27:12+0200, I wrote:
> >>> Oh my, PR101551 "[offloading] Differences in diagnostics etc."
> >>> strikes again...  The latter two 'note' diagnostics are currently
> >>> only emitted in non-offloading configurations.  I've now pushed to
> >>> master branch commit 3723aedaad20a129741c2f6f3c22b3dd1220a3fc
> >>> "XFAIL 'offloading_enabled' diagnostics issue in
> >>> 'libgomp.oacc-c-c++-common/reduction-5.c' [PR101551]", see attached.
> > Would you mind please confirming how I need to run configure in order
> > to get this configuration? Then I can look into why the difference in
> > location information there. Thanks.
>
> I think the simplest to replicate it without much effort is to run:
>
> cd ${GCC-SRC}/gcc
> sed -e 's/ENABLE_OFFLOADING/true/' *.cc */*.cc
>
> I think that covers all cases, which do not need the target lto1.
> If they do, then it becomes more difficult, as you need an
> offloading compiler. (But that is rather about diagnostic or
> no diagnostic, and not about having a different diagnostic.)
>
> I think the different diagnostic has the reason stated in
> commit r12-135-gbd7ebe9da745a62184052dd1b15f4dd10fbdc9f4
>
> Namely:
> cut---
>  It turned out that a compiler built without offloading support
>  and one with can produce slightly different diagnostic.
>
>  Offloading support implies ENABLE_OFFLOAD which implies that
>  g->have_offload is set when offloading is actually needed.
>  In cgraphunit.c, the latter causes flag_generate_offload = 1,
>  which in turn affects tree.c's free_lang_data.
>
>  The result is that the front-end specific diagnostic gets reset
>  ('tree_diagnostics_defaults (global_dc)'), which affects in this
>  case 'Warning' vs. 'warning' via the Fortran frontend.
>
>  Result: 'Warning:' vs. 'warning:'.
>  Side note: Other FE also override the diagnostic, leading to
>  similar differences, e.g. the C++ FE outputs mangled function
>  names differently
> cut--
>
> If the message is from the offload-device's lto1 compiler, it
> becomes more difficult to configure+build GCC. See
> https://gcc.gnu.org/wiki/Offloading under
> "How to build an offloading-enabled GCC"
>
> I hope it helps.

Yes, very much, thank you. I am trying something that should improve
it, and also a similar issue that happens with -flto, I made this PR
about the latter: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106274


Re: XFAIL 'offloading_enabled' diagnostics issue in 'libgomp.oacc-c-c++-common/reduction-5.c' [PR101551] (was: Enhance '_Pragma' diagnostics verification in OMP C/C++ test cases)

2022-07-12 Thread Lewis Hyatt via Gcc-patches
On Tue, Jul 12, 2022 at 2:33 AM Thomas Schwinge  wrote:
>
> Hi!
>
> On 2022-07-11T11:27:12+0200, I wrote:
> > [...], I've just pushed to master branch
> > commit 06b2a2abe26554c6f9365676683d67368cbba206
> > "Enhance '_Pragma' diagnostics verification in OMP C/C++ test cases"
>
> > --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
> > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
> > @@ -17,7 +17,7 @@ const int n = 100;
> >  #define check_reduction(gwv_par, gwv_loop)   \
> >{  \
> >s1 = 2; s2 = 5;\
> > -DO_PRAGMA (acc parallel gwv_par copy (s1, s2))   \
> > +DO_PRAGMA (acc parallel gwv_par copy (s1, s2)) /* { dg-line DO_PRAGMA_loc 
> > } */ \
> >  DO_PRAGMA (acc loop gwv_loop reduction (+:s1, s2))   \
> >  for (i = 0; i < n; i++)  \
> >{  \
> > @@ -45,8 +45,10 @@ main (void)
> >
> >/* Nvptx targets require a vector_length or 32 in to allow spinlocks with
> >   gangs.  */
> > -  check_reduction (num_workers (nw) vector_length (vl), worker);
> > -  /* { dg-warning "region is vector partitioned but does not contain 
> > vector partitioned code" "test1" { target *-*-* } pragma_loc } */
> > +  check_reduction (num_workers (nw) vector_length (vl), worker); /* { 
> > dg-line check_reduction_loc }
> > +  /* { dg-warning "22:region is vector partitioned but does not contain 
> > vector partitioned code" "" { target *-*-* } pragma_loc }
> > + { dg-note "1:in expansion of macro 'DO_PRAGMA'" "" { target *-*-* } 
> > DO_PRAGMA_loc }
> > + { dg-note "3:in expansion of macro 'check_reduction'" "" { target 
> > *-*-* } check_reduction_loc } */
>
> Oh my, PR101551 "[offloading] Differences in diagnostics etc."
> strikes again...  The latter two 'note' diagnostics are currently
> only emitted in non-offloading configurations.  I've now pushed to
> master branch commit 3723aedaad20a129741c2f6f3c22b3dd1220a3fc
> "XFAIL 'offloading_enabled' diagnostics issue in
> 'libgomp.oacc-c-c++-common/reduction-5.c' [PR101551]", see attached.
>

Would you mind please confirming how I need to run configure in order
to get this configuration? Then I can look into why the difference in
location information there. Thanks

-Lewis


[PATCH] preprocessor: Set input_location to the most recently seen token

2022-07-11 Thread Lewis Hyatt via Gcc-patches
Hello-

As discussed here:
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598136.html

Here is another short patch that improves "#pragma GCC diagnostic" handling.
Longer term, it will be desirable to make the handling of this pragma
independent of the global input_location. But in the meantime, some glitches
like this one can be readily addressed by making input_location point to
something better. In this case, input_location during preprocessing (-E or
-save-temps) is made to point to the most recently seen token rather than the
beginning of the file. To the best of my knowledge, nothing else besides
"#pragma GCC diagnostic" handling can observe input_location during
token streaming, so this is expected not to have any other
repercussions. Bootstrap + regtest does look clean on x86-64 Linux.

By the way, the new testcase fails when compiled with C++, but it's not
because of pragma handling, it's rather because the C++ frontend changes the
location on the warning to the wrong place. Once done_lexing has been set to
true, it changes the location of all warnings to input_location, however
that's not correct when the location is the cached location of a macro
definition; the original location is preferable. I will file a separate PR
about that, and have xfailed that testcase for now, since I am not quite there
with grokking the reason it behaves this way, and anyway it's not related to
this 1-line fix for gcc -E.

Please let me know how it looks? Thanks!

-Lewis
[PATCH] preprocessor: Set input_location to the most recently seen token

When preprocessing with -E or -save-temps, input_location always points to the
first character of the current file. This was previously irrelevant because
nothing was called during the token streaming process that would inspect
input_location. But since r13-1544, "#pragma GCC diagnostic" is supported in
preprocess-only mode, and that pragma relies on input_location to decide if a
given source code location is subject to a diagnostic or not. Most diagnostics
work fine anyway, because they are handled as soon as they are seen and so
everything is still seen in the expected order even though all the diagnostic
pragmas are treated as if they applied at the start of the file. One example
that doesn't work correctly is the new testcase, since here the warning is not
triggered until the end of the file and so it is necessary to track the location
properly.

Fixed by setting input_location to point to each token as it is being
streamed, similar to how C++ mode sets it.
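
The failure mode can be sketched with a toy model of location-based pragma
lookup. This is an illustration only: toy_pragma_event, toy_warning_enabled_at
and toy_unused_macro_warned are hypothetical names, plain line numbers stand
in for location_t, and the line numbers 8 and 9 are taken from the layout of
the new pragma-diag-14.c test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int toy_location;   /* A line number, for simplicity.  */

/* One recorded "#pragma GCC diagnostic" change for a single warning.  */
struct toy_pragma_event
{
  toy_location where;     /* Location recorded when the pragma was handled.  */
  bool warning_enabled;   /* State of the warning from WHERE onward.  */
};

/* Return whether the warning is enabled at LOC.  EVENTS must be sorted by
   location; the last event at or before LOC wins.  */
static bool
toy_warning_enabled_at (const struct toy_pragma_event *events, size_t n,
                        toy_location loc, bool initially_enabled)
{
  assert (events != NULL || n == 0);
  bool enabled = initially_enabled;
  for (size_t i = 0; i < n; i++)
    if (events[i].where <= loc)
      enabled = events[i].warning_enabled;
  return enabled;
}

/* Model of the new test: "#define X" is on line 8 and the "ignored" pragma
   on line 9.  -Wunused-macros is only issued at the end of the file, so the
   lookup uses the macro's remembered location.  PRAGMA_RECORDED_AT is the
   value input_location had when the pragma was handled.  */
static bool
toy_unused_macro_warned (toy_location pragma_recorded_at)
{
  const struct toy_pragma_event events[] = { { pragma_recorded_at, false } };
  const toy_location macro_definition = 8;
  return toy_warning_enabled_at (events, 1, macro_definition, true);
}
```

If the pragma is recorded at the start of the file (line 1 here), the warning
for line 8 is wrongly suppressed; recorded at its true location (line 9), the
warning is issued as the test expects.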

gcc/c-family/ChangeLog:

* c-ppoutput.cc (token_streamer::stream): Update input_location
prior to streaming each token.

gcc/testsuite/ChangeLog:

* c-c++-common/pragma-diag-14.c: New test.
* c-c++-common/pragma-diag-15.c: New test.

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index cd38c969ea0..98081ccfbb0 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -210,6 +210,10 @@ void
 token_streamer::stream (cpp_reader *pfile, const cpp_token *token,
location_t loc)
 {
+  /* Keep input_location up to date, since it is needed for processing early
+ pragmas such as #pragma GCC diagnostic.  */
+  input_location = loc;
+
   if (token->type == CPP_PADDING)
 {
   avoid_paste = true;
diff --git a/gcc/testsuite/c-c++-common/pragma-diag-14.c b/gcc/testsuite/c-c++-common/pragma-diag-14.c
new file mode 100644
index 000..618e7e1ef27
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pragma-diag-14.c
@@ -0,0 +1,9 @@
+/* { dg-do preprocess } */
+/* { dg-additional-options "-Wunused-macros" } */
+
+/* In the past, the pragma has erroneously disabled the warning because the
+   location was not tracked properly with -E or -save-temps; check that it works
+   now.  */
+
+#define X /* { dg-warning "-:-Wunused-macros" } */
+#pragma GCC diagnostic ignored "-Wunused-macros"
diff --git a/gcc/testsuite/c-c++-common/pragma-diag-15.c b/gcc/testsuite/c-c++-common/pragma-diag-15.c
new file mode 100644
index 000..d8076b4f93a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pragma-diag-15.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wunused-macros" } */
+
+/* In the past, the pragma has erroneously disabled the warning because the
+   location was not tracked properly with -E or -save-temps; check that it works
+   now.
+
+   This test currently fails for C++ but it's not because of the pragma, it's
+   because the location of the macro definition is incorrectly set.  This is a
+   separate issue, will resolve it in a later patch.  */
+
+#define X /* { dg-warning "-:-Wunused-macros" {} { xfail c++ } } */
+#pragma GCC diagnostic ignored "-Wunused-macros"


[COMMITTED] c-family: Fix option check in handle_pragma_diagnostic [PR106252]

2022-07-11 Thread Lewis Hyatt via Gcc-patches
Hello-

PR106252 notes an error found by the address sanitizer from r13-1544. This was
my mistake while refactoring handle_pragma_diagnostic(): the check for the
option arg was moved earlier than it should have been. Committed the obvious
fix putting this back where it was before.

-Lewis
c-family: Fix option check in handle_pragma_diagnostic [PR106252]

In r13-1544, handle_pragma_diagnostic was refactored to support processing
early pragmas. During that process the part looking up option arguments was
inadvertently moved too early, prior to checking the option was valid, causing
PR106252. Fixed by moving the check back where it goes.
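
The ordering issue can be illustrated with a small sketch. It is not the
c-pragma.cc code: the table, the TOY_UNKNOWN sentinel and both functions are
hypothetical, and the way the joined argument is located is simplified
compared to the option_str/opt_len arithmetic in the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table; JOINED options carry an argument after '='.  */
struct toy_option
{
  const char *name;
  bool joined;
};

static const struct toy_option toy_options[] = {
  { "-Wformat=", true },
  { "-Wunused-macros", false },
};

#define TOY_N_OPTS (sizeof toy_options / sizeof toy_options[0])
#define TOY_UNKNOWN TOY_N_OPTS   /* Sentinel: NOT a valid table index.  */

/* Return the table index matching STR, or TOY_UNKNOWN.  */
static size_t
toy_find_option (const char *str)
{
  for (size_t i = 0; i < TOY_N_OPTS; i++)
    {
      size_t len = strlen (toy_options[i].name);
      if (toy_options[i].joined
          ? strncmp (str, toy_options[i].name, len) == 0
          : strcmp (str, toy_options[i].name) == 0)
        return i;
    }
  return TOY_UNKNOWN;
}

/* Return the joined argument of STR, or NULL if there is none.  The
   unknown-option check comes first; indexing the table with the sentinel
   before that check is the kind of out-of-bounds read the sanitizer
   reported.  */
static const char *
toy_option_argument (const char *str)
{
  assert (str != NULL);
  size_t index = toy_find_option (str);
  if (index == TOY_UNKNOWN)
    return NULL;   /* Bail out before touching toy_options[index].  */
  if (toy_options[index].joined)
    return str + strlen (toy_options[index].name);
  return NULL;
}
```

Swapping the two if statements in toy_option_argument would reproduce the
shape of the bug: the table would be read at index TOY_UNKNOWN whenever the
pragma names an option that does not exist.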

gcc/c-family/ChangeLog:

PR preprocessor/106252
* c-pragma.cc (handle_pragma_diagnostic_impl): Don't look up the
option argument prior to verifying the option was found.

diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 62bce2ed0f5..789719e6e6a 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1009,10 +1009,6 @@ handle_pragma_diagnostic_impl ()
   if (early && !c_option_is_from_cpp_diagnostics (option_index))
     return;
 
-  const char *arg = NULL;
-  if (cl_options[option_index].flags & CL_JOINED)
-    arg = data.option_str + 1 + cl_options[option_index].opt_len;
-
   if (option_index == OPT_SPECIAL_unknown)
     {
       if (want_diagnostics)
@@ -1052,6 +1048,10 @@ handle_pragma_diagnostic_impl ()
       return;
     }
 
+  const char *arg = NULL;
+  if (cl_options[option_index].flags & CL_JOINED)
+    arg = data.option_str + 1 + cl_options[option_index].opt_len;
+
   struct cl_option_handlers handlers;
   set_default_handlers (&handlers, NULL);
   /* FIXME: input_location isn't the best location here, but it is


Re: [PATCH] c: Fix location for _Pragma tokens [PR97498]

2022-07-10 Thread Lewis Hyatt via Gcc-patches
On Sat, Jul 9, 2022 at 11:59 PM Jeff Law via Gcc-patches wrote:
>
>
>
> On 7/9/2022 2:52 PM, Lewis Hyatt via Gcc-patches wrote:
> > Hello-
> >
> > PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR
> > related to the fact that imprecise locations for _Pragma result in
> > counterintuitive behavior for GCC diagnostic pragmas, which inhibit the
> > ability to make convenient wrapper macros for enabling and disabling
> > diagnostics in specific scopes.
> >
> > It looks like David did a lot of work a few years ago improving this
> > (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular
> > r233637 added a lot of new test coverage for cases that had regressed in the
> > past.
> >
> > I think the main source of problems for all remaining issues is that we use
> > the global input_location for deciding when/if a diagnostic should apply. I
> > think it should be eventually doable to eliminate this, and rather properly
> > resolve the token locations to the place they need to be
> I've long wanted to see our dependency on input_location be diminished
> with the goal of making it go away completely.
> > so that _Pragma
> > type wrapper macros just work the way people expect.
> Certainly desirable since many projects have built wrapper macros which
> use Pragmas to control warnings.  One of the biggest QOI implementation
> details we've had with the warnings has been problems with location data
> leading to an inability to turn them off in specific locations.
>
> So I'm all for improvements, in terms of getting our location data more
> correct.
>
>
>
> >
> > That said, PR97498 can be solved easily with a 2-line fix without removing
> > input_location, and I think the resulting change to input_location's value
> > is an improvement that will benefit other areas, so I thought I'd see what
> > you think about this patch please?
> >
> > Here is a typical testcase. Note the line continuations so it's all one
> > logical line.
> >
> > ===
> > _Pragma("GCC diagnostic push") \
> > _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
> > static void f() {} \
> > _Pragma("GCC diagnostic pop")
> > ===
> >
> > What happens is that in C++ mode, input_location is always updated to the
> > most recently-lexed token, so the above case works fine and does not warn
> > when compiled with "g++ -Wunused-function". However, in C mode, it does
> > warn because input_location in C is almost always set to the start of the
> > line, and is in this case. So the pop is deemed to take place prior to the
> > definition of f().
> >
> > Initially, I thought it best to change input_location for C mode to behave
> > like C++, and always update to the most recently lexed token. Maybe that's
> > still the right way to go, but there was a fair amount of testsuite fallout
> > from that. Most of it was just that we would need to change the tests to look
> > for the new locations, and in many cases, the new locations seemed
> > preferable to the old ones, but it seemed a bit much for now, so I took a
> > more measured approach and just changed input_location in the specific case
> > of processing a pragma, to be the location of the CPP_PRAGMA token.
> >
> > Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides
> > to represent the _Pragma() expression doesn't have a valid location with
> > which input_location could be overridden. Looking into that, in r232893
> > David added logic which sets the location of all tokens inside the
> > _Pragma(...) to a reasonable place (namely it points to "_Pragma" at the
> > expansion point). However, that patch didn't change the location of the
> > CPP_PRAGMA token itself to similarly point there, so the 2nd line of this
> > patch does that.
> >
> > The rest of it is just tweaking a couple tests which were sensitive to the
> > location being output. In all these cases, the new locations seem more
> > informative to me than the old ones. With those tweaks, bootstrap + regtest
> > all languages looks good with no regressions.
> >
> > Please let me know what you think? Thanks!
> > gcc/c/ChangeLog:
> >
> >   PR preprocessor/97498
> >   * c-parser.cc (c_parser_pragma): Set input_location to the
> >   location of the pragma, rather than the start of the line.
> >
> > libcpp/ChangeLog:
> >
> >   PR preprocessor/974

[PATCH] c: Fix location for _Pragma tokens [PR97498]

2022-07-09 Thread Lewis Hyatt via Gcc-patches
Hello-

PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR
related to the fact that imprecise locations for _Pragma result in
counterintuitive behavior for GCC diagnostic pragmas, which inhibit the
ability to make convenient wrapper macros for enabling and disabling
diagnostics in specific scopes.

It looks like David did a lot of work a few years ago improving this
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular
r233637 added a lot of new test coverage for cases that had regressed in the
past.

I think the main source of problems for all remaining issues is that we use
the global input_location for deciding when/if a diagnostic should apply. I
think it should be eventually doable to eliminate this, and rather properly
resolve the token locations to the place they need to be so that _Pragma
type wrapper macros just work the way people expect.

That said, PR97498 can be solved easily with a 2-line fix without removing
input_location, and I think the resulting change to input_location's value
is an improvement that will benefit other areas, so I thought I'd see what
you think about this patch please?

Here is a typical testcase. Note the line continuations so it's all one
logical line.

===
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
static void f() {} \
_Pragma("GCC diagnostic pop")
===

What happens is that in C++ mode, input_location is always updated to the
most recently-lexed token, so the above case works fine and does not warn
when compiled with "g++ -Wunused-function". However, in C mode, it does
warn because input_location in C is almost always set to the start of the
line, and is in this case. So the pop is deemed to take place prior to the
definition of f().
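
For context, the kind of wrapper macro this affects might look like the
following sketch. The macro and function names are made up for illustration;
only the _Pragma strings come from the testcase above. The whole expansion
lands on one logical line, which is exactly the situation where the location
assigned to the pop matters.

```c
/* Hypothetical wrapper: suppress -Wunused-function for one declaration.  */
#define DIAG_IGNORE_UNUSED_FUNCTION(decl) \
  _Pragma("GCC diagnostic push") \
  _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \
  decl \
  _Pragma("GCC diagnostic pop")

/* Never called; the wrapper is meant to silence the warning for it.  */
DIAG_IGNORE_UNUSED_FUNCTION (static int unused_helper (void) { return 1; })

/* Ordinary code outside the wrapper is unaffected by the push/pop pair.  */
static int used_helper (void) { return 42; }

int call_used_helper (void) { return used_helper (); }
```

Compiling this as C with -Wunused-function is a quick way to see whether the
pop is treated as coming after unused_helper, as intended, or before it.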

Initially, I thought it best to change input_location for C mode to behave
like C++, and always update to the most recently lexed token. Maybe that's
still the right way to go, but there was a fair amount of testsuite fallout
from that. Most of it was just that we would need to change the tests to look
for the new locations, and in many cases, the new locations seemed
preferable to the old ones, but it seemed a bit much for now, so I took a
more measured approach and just changed input_location in the specific case
of processing a pragma, to be the location of the CPP_PRAGMA token.

Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides
to represent the _Pragma() expression doesn't have a valid location with
which input_location could be overridden. Looking into that, in r232893
David added logic which sets the location of all tokens inside the
_Pragma(...) to a reasonable place (namely it points to "_Pragma" at the
expansion point). However, that patch didn't change the location of the
CPP_PRAGMA token itself to similarly point there, so the 2nd line of this
patch does that.

The rest of it is just tweaking a couple tests which were sensitive to the
location being output. In all these cases, the new locations seem more
informative to me than the old ones. With those tweaks, bootstrap + regtest
all languages looks good with no regressions.

Please let me know what you think? Thanks!

-Lewis
[PATCH] c: Fix location for _Pragma tokens [PR97498]

The handling of #pragma GCC diagnostic uses input_location, which is not always
as precise as needed; in particular the relative location of some tokens and a
_Pragma directive will crucially determine whether a given diagnostic is enabled
or suppressed in the desired way. PR97498 shows how the C frontend ends up with
input_location pointing to the beginning of the line containing a _Pragma()
directive, resulting in the wrong behavior if the diagnostic to be modified
pertains to some tokens found earlier on the same line. This patch fixes that by
addressing two issues:

a) libcpp was not assigning a valid location to the CPP_PRAGMA token
generated by the _Pragma directive.
b) C frontend was not setting input_location to something reasonable.

With this change, the C frontend is able to change input_location to point to
the _Pragma token as needed.

This is just a two-line fix (one for each of a) and b)), the testsuite changes
were needed only because the location on the tested warnings has been somewhat
improved, so the tests need to look for the new locations.

gcc/c/ChangeLog:

PR preprocessor/97498
* c-parser.cc (c_parser_pragma): Set input_location to the
location of the pragma, rather than the start of the line.

libcpp/ChangeLog:

PR preprocessor/97498
* directives.cc (destringize_and_run): Override the location of
the CPP_PRAGMA token from a _Pragma directive to the location of
the expansion point, as is done for the tokens lexed from it.

gcc/testsuite/ChangeLog:

PR preprocessor/97498
* c-c++-common/pr97498.c: New test.
* 

[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]

2022-07-07 Thread Lewis Hyatt via Gcc-patches
Hello-

The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91733) points out that,
while libcpp recognizes a lone '\r' as a valid line-ending character, the
infrastructure that obtains source lines to be printed in diagnostics does
not, and hence diagnostics do not output the intended portion of a source
file that uses such line endings. The PR's author suggests that libcpp
should stop accepting '\r' line endings, but that seems rather controversial
and not likely to change. Fixing the diagnostics is easy enough though, and
that's done by the attached patch. Please let me know if it looks OK,
thanks! bootstrap + regtest all languages looks good, with just new PASSes
for the new testcase.

              before    after
FAIL             103      103
PASS          543592   543627
UNSUPPORTED    15298    15298
UNTESTED         136      136
XFAIL           4130     4130
XPASS             20       20


-Lewis
[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]

libcpp recognizes a lone \r as a valid line ending, so the infrastructure
for retrieving source lines to be output in diagnostics needs to do the
same. This patch fixes file_cache_slot::get_next_line() accordingly so that
diagnostics display the correct part of the source when \r line endings are in
use.
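
As a rough illustration of the rule (any of \n, \r\n, or a lone \r ends a
line), here is a small standalone sketch. It is not the helper added by this
patch: count_line_endings is a made-up name, and it assumes the whole buffer is
already in memory, so a trailing \r is unambiguous. The real code reads the
file incrementally and has to be more careful about a \r at the end of what
has been read so far.

```c
#include <stddef.h>

/* Count line terminators in a complete buffer, treating "\n", "\r\n",
   and a lone "\r" as one line ending each.  Hypothetical helper for
   illustration only.  */
static size_t
count_line_endings (const char *buf, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    {
      if (buf[i] == '\r')
        {
          /* Treat "\r\n" as a single terminator by skipping the '\n'.  */
          if (i + 1 < len && buf[i + 1] == '\n')
            i++;
          count++;
        }
      else if (buf[i] == '\n')
        count++;
    }
  return count;
}
```

For example, the 7-byte buffer "a\rb\r\nc\n" contains three line endings under
this rule, whereas a scan for '\n' alone would find only two.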

gcc/ChangeLog:

PR preprocessor/91733
* input.cc (find_end_of_line): New helper function.
(file_cache_slot::get_next_line): Recognize \r as a line ending.
* diagnostic-show-locus.cc (test_escaping_bytes_1): Adapt selftest
since \r will now be interpreted as a line-ending.

gcc/testsuite/ChangeLog:

PR preprocessor/91733
* c-c++-common/pr91733.c: New test.

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 6eafe19785f..d267d2c258d 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -5508,7 +5508,7 @@ test_tab_expansion (const line_table_case &case_)
 static void
 test_escaping_bytes_1 (const line_table_case &case_)
 {
-  const char content[] = "before\0\1\2\3\r\x80\xff""after\n";
+  const char content[] = "before\0\1\2\3\v\x80\xff""after\n";
   const size_t sz = sizeof (content);
   temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
@@ -5523,18 +5523,18 @@ test_escaping_bytes_1 (const line_table_case &case_)
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
 return;
 
-  /* Locations of the NUL and \r bytes.  */
+  /* Locations of the NUL and \v bytes.  */
   location_t nul_loc
 = linemap_position_for_line_and_column (line_table, ord_map, 1, 7);
-  location_t r_loc
+  location_t v_loc
 = linemap_position_for_line_and_column (line_table, ord_map, 1, 11);
   gcc_rich_location richloc (nul_loc);
-  richloc.add_range (r_loc);
+  richloc.add_range (v_loc);
 
   {
 test_diagnostic_context dc;
 diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-ASSERT_STREQ (" before \1\2\3 \x80\xff""after\n"
+ASSERT_STREQ (" before \1\2\3\v\x80\xff""after\n"
  "   ^   ~\n",
  pp_formatted_text (dc.printer));
   }
@@ -5544,7 +5544,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
 dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_UNICODE;
 diagnostic_show_locus (&dc, &richloc, DK_ERROR);
 ASSERT_STREQ
-  (" before<U+0000><U+0001><U+0002><U+0003><U+000D><80><ff>after\n"
+  (" before<U+0000><U+0001><U+0002><U+0003><U+000B><80><ff>after\n"
"   ^~~~\n",
pp_formatted_text (dc.printer));
   }
@@ -5552,7 +5552,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
 test_diagnostic_context dc;
 dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_BYTES;
 diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-ASSERT_STREQ (" before<00><01><02><03><0d><80><ff>after\n"
+ASSERT_STREQ (" before<00><01><02><03><0b><80><ff>after\n"
  "   ^~~~\n",
  pp_formatted_text (dc.printer));
   }
diff --git a/gcc/input.cc b/gcc/input.cc
index 2acbfdea4f8..060ca160126 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -646,6 +646,37 @@ file_cache_slot::maybe_read_data ()
   return read_data ();
 }
 
+/* Helper function for file_cache_slot::get_next_line (), to find the end of
+   the next line.  Returns with the memchr convention, i.e. nullptr if a line
+   terminator was not found.  We need to determine line endings in the same
+   manner that libcpp does: any of \n, \r\n, or \r is a line ending.  */
+
+static char *
+find_end_of_line (char *s, size_t len)
+{
+  for (const auto end = s + len; s != end; ++s)
+    {
+      if (*s == '\n')
+        return s;
+      if (*s == '\r')
+        {
+          const auto next = s + 1;
+          if (next == end)
+            {
+              /* Don't find the line ending if \r is the very last character
+                 in the buffer; we do not know if it's the end of the file or
+                 just the end of what has been read so far, and we wouldn't
+                 want to break in the middle of what's actually a \r\n
+                 sequence.  Instead, we will handle the case of a file ending
+                 in a \r later.  */
+ 
