Hey Sergey,

Thanks for this! I am impressed by your efforts! Unfortunately, I think
this is a bit premature. I have more work to do on my exsymtab fork before
I bring it up for inclusion in tcc itself. (Indeed, I am not
entirely sure that it is appropriate for inclusion in tcc.) Here are a
couple of reasons we should wait for a little while longer:

   1. The extended symbol table API is not documented at all.
   2. The symbol table copy takes O(N_sym^2) time to run. It might be possible
   to speed this up, but for the moment I plan to work around it by
   implementing symbol table caching. I have not yet completed that work, so I
   consider the exsymtab project to be incomplete at the moment.
   3. This work is mostly of interest to folks using tcc as an in-memory C
   JIT compiler. In its current form it is nearly useless to those who want a
   fast compiler that produces binaries. After I implement symbol table
   caching, it may prove a bit more useful to the fast-compiler crowd, but
   it's not ready yet. We need to have a wider discussion about the merits of
   the extension before we include it in mob.
   4. As implemented, this cuts the maximum number of token symbols in
   half. (I needed to use one of the bits to indicate "extended symbol".)
   5. The current token lookup is based on a compressed trie that
   explicitly only supports A-Z, a-z, 0-9, and _. It does not support $ in
   identifiers and would need to be revised in order to do so. I chose to
   implement Phil Bagwell's Array Mapped Trie
   <http://lampwww.epfl.ch/papers/triesearches.pdf.gz> in the belief that
   it would perform better than a hash for lookup. Once I add symbol table
   caching, I hope to add (or switch to) Array Compressed Tries for even
   better cache utilization. But currently, I rely on having 63 allowed
   characters in identifiers.
   6. I know absolutely nothing about how the compilation and relocation
   stages modify the members of the symbol tables. It is a black box to me. As
   such, the copy procedure is a pile of ad-hoc data structure tests that is,
   in all likelihood, subtly broken and quite brittle. Adding this in its
   current state to tcc's codebase, especially without sufficient tests, could
   dampen efforts to change the symbol table handling or code generation.
   7. A separate idea that I plan to pursue on my fork is to extend how tcc
   pulls data in from file handles. I would like to make it hookable so that I
   could write hooks in my Perl module and have it interact directly with
   Perl's parser, rather than pulling all of the C code into a temporary
   buffer. This may go beyond the wishes of the community and merits further
   discussion.
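
To make point 5 concrete, here is a rough sketch of how an array mapped
trie can exploit a 63-character alphabet. It is not the exsymtab code. It
assumes ASCII, a 64-bit bitmap per node, and one node per character (no
path compression), and every name in it (amt_node, amt_slot, amt_insert,
amt_lookup) is made up for illustration. The point it shows is that the 63
characters fit in one 64-bit bitmap, so a node stores only the children
that exist and finds them with a popcount. Adding $ would make 64 slots,
which still fits the bitmap here, but the slot mapping would have to change.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map an identifier character to a slot in 0..62, or -1 if unsupported.
 * Assumes ASCII, where each of the three ranges is contiguous. */
static int amt_slot(char c)
{
    if (c >= '0' && c <= '9') return c - '0';         /* slots  0..9  */
    if (c >= 'A' && c <= 'Z') return 10 + (c - 'A');  /* slots 10..35 */
    if (c >= 'a' && c <= 'z') return 36 + (c - 'a');  /* slots 36..61 */
    if (c == '_') return 62;                          /* slot  62     */
    return -1;                                        /* e.g. '$'     */
}

typedef struct amt_node {
    uint64_t bitmap;             /* bit i set => a child exists for slot i */
    struct amt_node **children;  /* dense array with popcount(bitmap) entries */
    int token;                   /* token id if a name ends here, else 0 */
} amt_node;

/* Portable population count (number of set bits). */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Return the child for a slot, or NULL if there is none. */
static amt_node *amt_child(const amt_node *n, int slot)
{
    uint64_t bit = (uint64_t)1 << slot;
    if (!(n->bitmap & bit)) return NULL;
    /* The dense index is the number of existing children in lower slots. */
    return n->children[popcount64(n->bitmap & (bit - 1))];
}

/* Create the (currently missing) child for a slot; NULL on allocation failure. */
static amt_node *amt_add_child(amt_node *n, int slot)
{
    uint64_t bit = (uint64_t)1 << slot;
    int idx = popcount64(n->bitmap & (bit - 1));
    int count = popcount64(n->bitmap);
    amt_node **grown = realloc(n->children, (count + 1) * sizeof *grown);
    amt_node *child;
    if (!grown) return NULL;
    n->children = grown;
    child = calloc(1, sizeof *child);
    if (!child) return NULL;
    /* Shift the higher-slot children up by one to keep the array ordered. */
    memmove(&grown[idx + 1], &grown[idx], (count - idx) * sizeof *grown);
    grown[idx] = child;
    n->bitmap |= bit;
    return child;
}

/* Store a nonzero token id under name; returns 0 on success, -1 on error. */
static int amt_insert(amt_node *root, const char *name, int token)
{
    amt_node *n = root;
    for (; *name; name++) {
        int slot = amt_slot(*name);
        amt_node *next;
        if (slot < 0) return -1;  /* character outside the 63 allowed */
        next = amt_child(n, slot);
        if (!next) next = amt_add_child(n, slot);
        if (!next) return -1;
        n = next;
    }
    n->token = token;
    return 0;
}

/* Return the token id stored under name, or 0 if it is not present. */
static int amt_lookup(const amt_node *root, const char *name)
{
    const amt_node *n = root;
    for (; *name; name++) {
        int slot = amt_slot(*name);
        if (slot < 0) return 0;
        n = amt_child(n, slot);
        if (!n) return 0;
    }
    return n->token;
}
```

Starting from a zero-initialized root node, amt_insert(&root, "foo_1", 42)
followed by amt_lookup(&root, "foo_1") walks five nodes, one per character,
while an identifier such as "a$b" is rejected at the slot-mapping step.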

For these reasons, I do not believe that the exsymtab fork, in its current
state, should be brought into the mob branch. I am more than happy to have
help, but let's wait a few more months until most of these issues have been
ironed out and we all have had a chance to discuss the merits and drawbacks
of extended symbol table support.


David
_______________________________________________
Tinycc-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/tinycc-devel