Re: Reading a file for lexing, one way to do it...

Daniel Diaz Mon, 22 Jul 2013 22:07:36 -0700

Hello,

Le 15 juil. 2013 à 23:47, Sean Charles <[email protected]> a écrit :


> OK, well I have produced this tiny little program which so far does what I 
> wanted as a first step, but I am a little confused by my somewhat miraculous 
> arrival at this point and I want to make sure that I understand what is going 
> on.
> 
> I ran it with tracing enabled on a small file containing just "ABC\n" and it 
> was reassuring to see it do exactly what I thought it would do in the order 
> that I expected it to BUT even so I would appreciate a little reassurance 
> that I am on the right lines with it.
> 
> Here is the code:
> 
> lexit(Filename, Tokens) :-
>     open(Filename, read, In),
>     lexread(In, Tokens),
>     close(In).
> 
> lexread(In, _) :- at_end_of_stream(In), !.
> 
> lexread(In, [ chr(C, Line, Col) | Tokens ]) :-
>     stream_line_column(In, Line, Col),
>     get_char(In, C),
>     lexread(In, Tokens).
> 
> Running it:
> 
> lexit('small.txt',T).
> 
> T = [chr('A',1,1),chr('B',1,2),chr('C',1,3),chr('\n',1,4)|_]

Remark: your list is ended by a variable (reported as an underscore by the 
top-level). This is totally OK (and often used to further add elements to the 
end of the list). If it is not the wanted behavior (you want a proper list 
ended by the empty list []), you have to replace the first lexread/2 clause by:

lexread(In, []) :- at_end_of_stream(In), !.

> 
> Somehow I managed to figure out that I could put the "chr()" term inside the 
> head as I read somewhere on stack overflow recently that you could do that to 
> save a step or something. See, I am already running on vague, that's my lack 
> of Prolog experience showing already!
> 
> The "confusing" bit is this:
> 
>     lexread(In, [ chr(C, Line, Col) | Tokens ]) :-
> 
> I can see that "Tokens" remains uninstantiated until the end-of-file 
> condition triggers, at which point the complete call stack is picked up but I 
> am unsure of the reasoning as to why the list comes out in the correct order, 
> I think. I am seeing in my head a whole bund go .() "conses" all waiting to 
> go ff one after the other.
> 
> Then this line:
> 
>     stream_line_column(In, Line, Col),
> 
> instantiates Line and Col thus the term cur(C,Line,Col) is now fully 
> instantiated and then when the tail call to lexread() is made, a new 
> temporary variable is created for Tokens because it is still uninstantiated. 
> This continues until EOF at which point the stack frame is unwound and the 
> list is constructed but why does it appear to be "right" i.e. the tokens read 
> left to right in the same order as the characters in the file. I think I know 
> but I am still al title shaky at this point!
> 


Yes, this is the way to write it in Prolog. It is totally equivalent to:

% equivalent version
lexread(In, ListTok) :-
    stream_line_column(In, Line, Col),
    get_char(In, C),
    lexread(In, Tokens),
    ListTok = [ chr(C, Line, Col) | Tokens ].

which could be more in the spirit of C since the list seems created only when 
both the head and the tail are known. However it is not the case: the Prolog = 
is really an equation (over syntactic trees) and not an assignment as in C. So, 
ListTok = [ chr(C, Line, Col) | Tokens ] is the same as replacing all 
occurrences of ListTok by the right hand side [ chr(C, Line, Col) | Tokens ] 
(this is true as long as cuts are not used). From the implementation point of 
view, when [ chr(C, Line, Col) | Tokens ] is encountered in the head, the 
calling argument is linked (with a pointer) with a cons cell whose both the 
head and the tail are fresh variables which will be in turn linked (with 
pointers) to the created args. NB: a free variables is nothing else than an 
self-reference (a pointer to itself).


> I have used Haskell for a few years now and on a memory consumption 
> perspective, I have a hunch this method is very very bad as it would be 
> creating huge swathes of stack frames especially for a very very large file 
> but I am still learning. I have no doubt that there is a cleaner way using 
> DCG-s but for now this is where I am thinking on.

Don't worry, your code benefits from the so called Last Call Optimization (LCO) 
(aka tail call optimization): the stack frame is recovered before calling the 
last predicate of the body. This is important for recursive clauses like the 
second clause of lexread/2. With LCO, the stack is recovered before the 
recursive call, thus the whole predicate runs in constant stack space (I mean 
the control stack, also called "local stack" in the WAM ; the heap obviously 
grows to store the resulting list).

Note that, in the above "equivalent version" the recursion is no longer 
terminal, and last (or tail) call optimization (LCO) no longer applies (a new 
frame is allocated in the local stack at each recursive call). Yet an another 
reason to prefer your initial version.

Daniel


> 
> Thanks,
> Sean.
> 
> 
> 
> 
> -- 
> Ce message a été vérifié par MailScanner pour des virus ou des polluriels et 
> rien de suspect n'a été trouvé.
> _______________________________________________
> Users-prolog mailing list
> [email protected]
> https://lists.gnu.org/mailman/listinfo/users-prolog


-- 
Ce message a ete verifie par MailScanner
pour des virus ou des polluriels et rien de
suspect n'a ete trouve.

_______________________________________________
Users-prolog mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/users-prolog

Re: Reading a file for lexing, one way to do it...

Reply via email to