Bart <b...@freeuk.com> writes:

> On 18/06/2018 11:45, Chris Angelico wrote:
>> On Mon, Jun 18, 2018 at 8:33 PM, Bart <b...@freeuk.com> wrote:
>
>
>>> You're right in that neither task is that trivial.
>>>
>>> I can remove comments by writing a tokeniser which scans Python source and
>>> re-outputs tokens one at a time. Such a tokeniser normally ignores comments.
>>>
>>> But to remove type hints, a deeper understanding of the input is needed. I
>>> would need a parser rather than a tokeniser. So it is harder.
>>
>> They would actually both end up the same. To properly recognize
>> comments, you need to understand enough syntax to recognize them. To
>> properly recognize type hints, you need to understand enough syntax to
>> recognize them. And in both cases, you need to NOT discard important
>> information like consecutive whitespace.
>
> No. If syntax is defined on top of tokens, then at the token level,
> you don't need to know any syntax. The process that scans characters
> looking for the next token, will usually discard comments. Job done.

You don't even need to scan for tokens other than strings.  From what I
read in the documentation, a simple flex scanner like this would do the trick:

  %option noyywrap
  %x sqstr dqstr sqtstr dqtstr
  %%

  \'              ECHO; BEGIN(sqstr);
  \"              ECHO; BEGIN(dqstr);
  \'\'\'          ECHO; BEGIN(sqtstr);
  \"\"\"          ECHO; BEGIN(dqtstr);

  <dqstr>\"       |
  <sqstr>\'       |
  <sqtstr>\'\'\'  |
  <dqtstr>\"\"\"  ECHO; BEGIN(INITIAL);

  <sqstr,dqstr,sqtstr,dqtstr>\\(.|\n)   |
  <sqstr,dqstr,sqtstr,dqtstr,INITIAL>.  ECHO;

  #.*             /* discard the comment; the newline is still echoed */

  %%
  int main(void) { yylex(); return 0; }

and it's only this long because there are four kinds of string.  I'm
not a Python expert, so there may be some corner-case errors.  Also,
some comments should not be removed, such as #! on line 1 and encoding
declarations, but handling them would take only another line or two.

-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list
