Re: [Tinycc-devel] [patch] adding path resolution to #line directives

2022-05-07 Thread Raul Hernandez
Hi,

You are indeed correct regarding the symlinks, I did not realize that would’ve 
been an issue.
The patch below contains a fix for the prefixing of absolute paths bug; this 
fixes our original issue as well, so we don’t actually need that 
canonicalization any longer :). 

It also contains new API for getting backtraces as char**’s. The code for it 
isn’t super pretty, though, suggestions are welcome.

Raul




backtrace_str.patch
Description: Binary data
___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


Re: [Tinycc-devel] [patch] adding path resolution to #line directives

2022-05-06 Thread Michael Matz

Hey,

On Fri, 6 May 2022, Raul Hernandez wrote:

It would seem better to canonicalize during generating this, because 
the above and this don't look equivalent anyway (they are equivalent 
only when the above relative path is less that seven levels deep from 
/).


In our case this isn’t an issue, since the .c file is always compiled 
under /tmp/v_*/, so two levels would actually be enough here.


Sure, but TCC also has to work on input not coming from V.

The problem is that we cannot use an absolute path either (/home/…), 
because tcc will always prepend the file’s directory, even if the #line 
directive contains an absolute path. I guess a different fix could be to 
check for absolute paths, and prevent tcc from prepending the file 
surname if that’s the case,


That seems like a fix for what clearly is a bug in TCC, yes.  TCC 
shouldn't prepend something that is already an absolute path.


but I still think resolving relative paths 
is useful (one could generate #line directives like 
/my/lang/stdlib/../thirdparty/foo.c), and tcc would resolve them down to 
/my/lang/thirdparty/foo.c for cleaner backtraces, for example.


Well, IMHO TCC should do as little processing on what the producer put 
into the '#line' directives, simply because one has to assume that the 
producer did whatever it did for good reasons.  Prepending the compile 
directory to relative paths is borderline acceptably IMHO, but more than 
that: I don't think so.


So you simply remove '../' components without a matching directory 
part. That isn't correct in all circumstances.


This is only ever hit when enough ../’s have been parsed so that the 
beginning of the path (inserted by tcc or otherwise) has been completely 
removed, and we’re left at the root of the filesystem. In that case, 
/../ can be collapsed down to / . Could you give me an example of where 
this would be incorrect?


Hmm, you're right, it should be okay.  (It still feels strange, as it 
relies on the fact that '/..' is equal to '/').  But that still leaves the 
issue of symlinked directories, removing "dir/../" components from paths 
simply isn't preserving the path when 'dir' is a symlink to another 
directory:


% mkdir -p sub/sub2; touch sub/sub2/foo
% ln -sf sub/sub2 symlink
% ls symlink/../sub2/foo
symlink/../sub2/foo
% ls sub2/foo
ls: cannot access 'sub2/foo': No such file or directory

So, no, I don't think the patch is okay, it can transform correct input 
into incorrect output, even if it sometimes also transforms into nicer 
output.


You might consider using realpath(3) (which is in POSIX) to do all of 
this for you.


I tried that, but realpath() checks if the path actually exists on the 
filesystem, and returns an error if it does not, unfortunately.


Well, yeah, it has to, due to the above symlink problem.

This might be generally usefull to libtcc users, and the variant 
printing it out can be written in terms of the char** variant; so that 
seems sensible to have.


I can include that function then; the main__foo -> main.foo change is 
indeed very V-specific.


Related: libbacktrace has an API where it accepts a callback, which is 
then called with a backtrace frame's information 
(https://github.com/ianlancetaylor/libbacktrace/blob/master/backtrace.h#L91-L102 
), 
so that might be a useful alternative too.


Maybe, yeah.  I don't have a strong opinion there.


Ciao,
Michael.___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


Re: [Tinycc-devel] [patch] adding path resolution to #line directives

2022-05-06 Thread Raul Hernandez
Hi,


> It would seem better to canonicalize during generating this, because the 
> above and this don't look equivalent anyway (they are equivalent only when 
> the above relative path is less that seven levels deep from /).

In our case this isn’t an issue, since the .c file is always compiled under 
/tmp/v_*/, so two levels would actually be enough here.


The problem is that we cannot use an absolute path either (/home/…), because 
tcc will always prepend the file’s directory, even if the #line directive 
contains an absolute path.
I guess a different fix could be to check for absolute paths, and prevent tcc 
from prepending the file surname if that’s the case, but I still think 
resolving relative paths is useful
(one could generate #line directives like /my/lang/stdlib/../thirdparty/foo.c), 
and tcc would resolve them down to /my/lang/thirdparty/foo.c for cleaner 
backtraces, for example.


> So you simply remove '../' components without a matching directory part. That 
> isn't correct in all circumstances.

This is only ever hit when enough ../’s have been parsed so that the beginning 
of the path (inserted by tcc or otherwise) has been completely removed, and 
we’re left at the root of the filesystem. In that case, /../ can be collapsed 
down to / .
Could you give me an example of where this would be incorrect?


> You might consider using realpath(3) (which is in POSIX) to do all of this 
> for you.

I tried that, but realpath() checks if the path actually exists on the 
filesystem, and returns an error if it does not, unfortunately.


> This might be generally usefull to libtcc users, and the variant printing it 
> out can be written in terms of the char** variant; so that seems sensible to 
> have.

I can include that function then; the main__foo -> main.foo change is indeed 
very V-specific.

Related: libbacktrace has an API where it accepts a callback, which is then 
called with a backtrace frame's information 
(https://github.com/ianlancetaylor/libbacktrace/blob/master/backtrace.h#L91-L102
 
),
 so that might be a useful alternative too.


Raul

___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


Re: [Tinycc-devel] [patch] adding path resolution to #line directives

2022-05-05 Thread Michael Matz

Hey,

On Thu, 5 May 2022, Raul Hernandez wrote:


The code we generate looks something like this:

    #line 29
"../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v"


It would seem better to canonicalize during generating this, because the 
above ...



   /tmp/v_1000/../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c
.v:53: at panic_debug: Backtrace


... and this don't look equivalent anyway (they are equivalent only when 
the above relative path is less that seven levels deep from /).



    @@ -2042,6 +2042,22 @@ include_done:
                    pstrcpy(buf, sizeof buf, file->true_filename);
                    *tcc_basename(buf) = 0;
                    pstrcat(buf, sizeof buf, (char *)tokc.str.data);
    +                // collapse relative `../`s
    +                while ((q = strstr(buf, "../")) != NULL) {
    +                    (q == buf) ? (*q = '\0') : (*(q-1) = '\0');
    +                    last_slash = strrchr(buf, '/');
    +                    if (last_slash == NULL) {
    +                        memmove(buf, q + 2, strlen(q + 2) + 1);


So you simply remove '../' components without a matching directory part. 
That isn't correct in all circumstances.  You might consider using 
realpath(3) (which is in POSIX) to do all of this for you.  But as said, 
you might also be better off to just do that during generating the #line 
directives, so as to be independend of compiler behaviour.



Would you be willing to merge something this?

We’ve also added a function that returns the backtrace as a char** rather
than printing it out,


This might be generally usefull to libtcc users, and the variant printing 
it out can be written in terms of the char** variant; so that seems 
sensible to have.



and modified the backtrace functions to replace
e.g. main__foo with main.foo in symbol names,


But this is quite specific to V, so not too applicable without making it 
more general, at which point it doesn't look too attractive anymore.



Ciao,
Michael.___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


Re: [Tinycc-devel] [patch] adding path resolution to #line directives

2022-05-05 Thread Domingo Alvarez Duarte

Why don't you do the collapse at generation time ?

On 5/5/22 10:50, Raul Hernandez wrote:

Hi, list.


Over at vlang/v, we use #line directives to map line numbers in the 
generated C to the respective locations in the original code in 
backtraces, etc.

The code we generate looks something like this:

    #line 29 
"../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v"

    void panic_debug(void) {
        #line 53 
"../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v"

tcc_backtrace("Backtrace");
    }

    #line 1 "../../../../../../home/spaceface/git/v/v/test.v"
    void main__inner(void) {
        #line 1 "../../../../../../home/spaceface/git/v/v/test.v"
panic_debug();
    }

    #line 2 "../../../../../../home/spaceface/git/v/v/test.v"
    void main__outer(void) {
        #line 2 "../../../../../../home/spaceface/git/v/v/test.v"
main__inner();
    }

    #line 3 "../../../../../../home/spaceface/git/v/v/test.v"
    int main(void) {
        #line 3 "../../../../../../home/spaceface/git/v/v/test.v"
main__outer();
return 0;
    }


Other C compilers collapse the `../`s in the #line paths, but TCC does 
not, so currently the backtraces it produces look like this:


/tmp/v_1000/../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v:53: 
at panic_debug: Backtrace
/tmp/v_1000/../../../../../../home/spaceface/git/v/v/test.v:1: by 
main__inner
/tmp/v_1000/../../../../../../home/spaceface/git/v/v/test.v:2: by 
main__outer

/tmp/v_1000/../../../../../../home/spaceface/git/v/v/test.v:3: by main


I decided to try and fix this, and came up with the following patch:

    diff --git a/tccpp.c b/tccpp.c
    index 2ff5d5e..47ac859 100644
    --- a/tccpp.c
    +++ b/tccpp.c
    @@ -1797,7 +1797,7 @@ ST_FUNC void preprocess(int is_bof)
    {
TCCState *s1 = tcc_state;
        int i, c, n, saved_parse_flags;
    -    char buf[1024], *q;
    +    char buf[1024], *q, *last_slash;
        Sym *s;
saved_parse_flags = parse_flags;
    @@ -2042,6 +2042,22 @@ include_done:
      pstrcpy(buf, sizeof buf, file->true_filename);
      *tcc_basename(buf) = 0;
      pstrcat(buf, sizeof buf, (char *)tokc.str.data);
    +        // collapse relative `../`s
    +        while ((q = strstr(buf, "../")) != NULL) {
    +            (q == buf) ? (*q = '\0') : (*(q-1) = '\0');
    +            last_slash = strrchr(buf, '/');
    +            if (last_slash == NULL) {
    +                memmove(buf, q + 2, strlen(q + 2) + 1);
    +            } else {
    +                memmove(last_slash + 1, q + 2, strlen(q + 2) + 1);
    +            }
    +        }
    +        // collapse `foo//bar` to `foo/bar`
    +        c = strlen(buf);
    +        while((q = strstr(buf, "//")) != NULL) {
    +            memmove(q, q+1, c - (q - buf));
    +            c--;
    +        }
      tcc_debug_putfile(s1, buf);
  } else if (parse_flags & PARSE_FLAG_ASM_FILE)
      break;


With it, that same backtrace now looks like this:

/home/spaceface/git/v/v/vlib/builtin/builtin.c.v:53: at panic_debug: 
Backtrace

/home/spaceface/git/v/v/test.v:1: by main__inner
/home/spaceface/git/v/v/test.v:2: by main__outer
/home/spaceface/git/v/v/test.v:3: by main


Would you be willing to merge something this?

We’ve also added a function that returns the backtrace as a char** 
rather than printing it out, and modified the backtrace functions to 
replace e.g. main__foo with main.foo in symbol names, but I presume 
these patches won’t be as useful outside of V.


Raul


___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel


[Tinycc-devel] [patch] adding path resolution to #line directives

2022-05-05 Thread Raul Hernandez
Hi, list.


Over at vlang/v, we use #line directives to map line numbers in the generated C 
to the respective locations in the original code in backtraces, etc.
The code we generate looks something like this:

#line 29 "../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v"
void panic_debug(void) {
#line 53 
"../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v"
tcc_backtrace("Backtrace");
}

#line 1 "../../../../../../home/spaceface/git/v/v/test.v"
void main__inner(void) {
#line 1 "../../../../../../home/spaceface/git/v/v/test.v"
panic_debug();
}

#line 2 "../../../../../../home/spaceface/git/v/v/test.v"
void main__outer(void) {
#line 2 "../../../../../../home/spaceface/git/v/v/test.v"
main__inner();
}

#line 3 "../../../../../../home/spaceface/git/v/v/test.v"
int main(void) {
#line 3 "../../../../../../home/spaceface/git/v/v/test.v"
main__outer();
return 0;
}


Other C compilers collapse the `../`s in the #line paths, but TCC does not, so 
currently the backtraces it produces look like this:


/tmp/v_1000/../../../../../../home/spaceface/git/v/v/vlib/builtin/builtin.c.v:53:
 at panic_debug: Backtrace
/tmp/v_1000/../../../../../../home/spaceface/git/v/v/test.v:1: by 
main__inner
/tmp/v_1000/../../../../../../home/spaceface/git/v/v/test.v:2: by 
main__outer
/tmp/v_1000/../../../../../../home/spaceface/git/v/v/test.v:3: by main


I decided to try and fix this, and came up with the following patch:

diff --git a/tccpp.c b/tccpp.c
index 2ff5d5e..47ac859 100644
--- a/tccpp.c
+++ b/tccpp.c
@@ -1797,7 +1797,7 @@ ST_FUNC void preprocess(int is_bof)
{
TCCState *s1 = tcc_state;
int i, c, n, saved_parse_flags;
-char buf[1024], *q;
+char buf[1024], *q, *last_slash;
Sym *s;

saved_parse_flags = parse_flags;
@@ -2042,6 +2042,22 @@ include_done:
pstrcpy(buf, sizeof buf, file->true_filename);
*tcc_basename(buf) = 0;
pstrcat(buf, sizeof buf, (char *)tokc.str.data);
+// collapse relative `../`s
+while ((q = strstr(buf, "../")) != NULL) {
+(q == buf) ? (*q = '\0') : (*(q-1) = '\0');
+last_slash = strrchr(buf, '/');
+if (last_slash == NULL) {
+memmove(buf, q + 2, strlen(q + 2) + 1);
+} else {
+memmove(last_slash + 1, q + 2, strlen(q + 2) + 1);
+}
+}
+// collapse `foo//bar` to `foo/bar`
+c = strlen(buf);
+while((q = strstr(buf, "//")) != NULL) {
+memmove(q, q+1, c - (q - buf));
+c--;
+}
tcc_debug_putfile(s1, buf);
} else if (parse_flags & PARSE_FLAG_ASM_FILE)
break;


With it, that same backtrace now looks like this:

/home/spaceface/git/v/v/vlib/builtin/builtin.c.v:53: at panic_debug: 
Backtrace
/home/spaceface/git/v/v/test.v:1: by main__inner
/home/spaceface/git/v/v/test.v:2: by main__outer
/home/spaceface/git/v/v/test.v:3: by main


Would you be willing to merge something this?

We’ve also added a function that returns the backtrace as a char** rather than 
printing it out, and modified the backtrace functions to replace e.g. main__foo 
with main.foo in symbol names, but I presume these patches won’t be as useful 
outside of V.

Raul

___
Tinycc-devel mailing list
Tinycc-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/tinycc-devel