Reading a file for lexing, one way to do it...

Sean Charles Mon, 15 Jul 2013 13:02:48 -0700

OK, well I have produced this tiny little program which so far does what I 
wanted as a first step, but I am a little confused by my somewhat miraculous 
arrival at this point and I want to make sure that I understand what is going 
on.


I ran it with tracing enabled on a small file containing just "ABC\n" and it 
was reassuring to see it do exactly what I thought it would do in the order 
that I expected it to BUT even so I would appreciate a little reassurance that 
I am on the right lines with it.

Here is the code:

lexit(Filename, Tokens) :-
    open(Filename, read, In),
    lexread(In, Tokens),
    close(In).

lexread(In, _) :- at_end_of_stream(In), !.

lexread(In, [ chr(C, Line, Col) | Tokens ]) :-
    stream_line_column(In, Line, Col),
    get_char(In, C),
    lexread(In, Tokens).

Running it:

lexit('small.txt',T).

T = [chr('A',1,1),chr('B',1,2),chr('C',1,3),chr('\n',1,4)|_]
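
A note on the trailing "|_" in that answer: it is an open tail. The end-of-stream clause accepts any second argument through "_" and never binds it, so the list is never closed. Below is a minimal sketch of a variant that ends the list with [], assuming that is the only change wanted. The names lexit_closed/2 and lexread_closed/2 are hypothetical and not part of the original program.

```prolog
% Same reader as the original, except that the end-of-stream clause
% binds the remaining tail to [] instead of leaving it unbound.
% lexit_closed/2 and lexread_closed/2 are illustrative names only.
lexit_closed(Filename, Tokens) :-
    open(Filename, read, In),
    lexread_closed(In, Tokens),
    close(In).

% End of stream: close the list.
lexread_closed(In, []) :- at_end_of_stream(In), !.

% Otherwise record the position, read one character, and carry on.
lexread_closed(In, [chr(C, Line, Col) | Tokens]) :-
    stream_line_column(In, Line, Col),
    get_char(In, C),
    lexread_closed(In, Tokens).
```

Run against the same small.txt, the answer should then end in "]" rather than "|_]".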

Somehow I managed to figure out that I could put the "chr()" term inside the 
head, as I read somewhere on Stack Overflow recently that you could do that to 
save a step or something. See, I am already running on vague, that's my lack of 
Prolog experience showing already!

The "confusing" bit is this:

    lexread(In, [ chr(C, Line, Col) | Tokens ]) :-

I can see that "Tokens" remains uninstantiated until the end-of-file condition 
triggers, at which point the complete call stack is picked up, but I am unsure 
of the reasoning as to why the list comes out in the correct order. I 
am seeing in my head a whole bunch of .() "conses" all waiting to go off one 
after the other.

Then this line:

    stream_line_column(In, Line, Col),

instantiates Line and Col, and get_char() then binds C, so the term 
chr(C,Line,Col) is now fully instantiated. When the tail call to lexread() is 
made, a new temporary variable is created for Tokens because it is still 
uninstantiated. This continues until EOF, at which point the stack frames are 
unwound and the list is constructed. But why does it appear to be "right", i.e. 
the tokens read left to right in the same order as the characters in the file? 
I think I know but I am still a little shaky at this point!
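
One way to look at the ordering question: each list cell is created by head unification before the recursive call runs, so the list grows front to back with an unbound "hole" at its end, rather than being assembled while the stack unwinds. The sketch below mimics two such steps by hand. The predicate build_by_hand/1 and the variables T1 and T2 are made up for illustration.

```prolog
% Each unification fills the current hole with one cell plus a new hole,
% which is what every lexread/2 call does with its second argument.
build_by_hand(T) :-
    T  = [chr('A', 1, 1) | T1],   % first call: head fixed, tail T1 unbound
    T1 = [chr('B', 1, 2) | T2],   % second call fills the hole T1
    T2 = [].                      % closing step; the original leaves this unbound
```

Querying build_by_hand(T) gives T = [chr('A',1,1),chr('B',1,2)]. The first element is in place before the second is ever read, which is why the tokens follow the order of the file.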

I have used Haskell for a few years now and, from a memory consumption 
perspective, I have a hunch this method is very bad, as it would be 
creating huge swathes of stack frames, especially for a very large file, but 
I am still learning. I have no doubt that there is a cleaner way using DCGs 
but for now this is where my thinking is.
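
On the memory hunch: the recursive call is the last goal of the last clause, so a system that performs last-call optimisation would normally not need one stack frame per character, and the main cost would be the list itself. That is an expectation, not something measured here. As for DCGs, here is a minimal sketch of the same position tagging done over a list of characters that is already in memory. The names positions//3 and next_pos/5 are hypothetical, and the 1-based line and column convention is an assumption chosen to match the output shown earlier.

```prolog
% positions(+Line, +Col, -Tokens)// pairs every character with its position.
% Hypothetical names; works on a character list, not directly on a stream.
positions(Line, Col, [chr(C, Line, Col) | Ts]) -->
    [C],
    { next_pos(C, Line, Col, Line1, Col1) },
    positions(Line1, Col1, Ts).
positions(_, _, []) --> [].

% A newline moves to column 1 of the next line; anything else moves right.
next_pos('\n', Line, _, Line1, 1) :- !, Line1 is Line + 1.
next_pos(_, Line, Col, Line, Col1) :- Col1 is Col + 1.
```

For example, the query atom_chars('AB\nC', Cs), phrase(positions(1, 1, Ts), Cs) gives Ts = [chr('A',1,1),chr('B',1,2),chr('\n',1,3),chr('C',2,1)]. Reading the file into a character list first is left out of the sketch.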

Thanks,
Sean.

_______________________________________________
Users-prolog mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/users-prolog
