Hello venerable list…
Is it more efficient to store the text of each lexeme as a list of character
codes or as a list of single-character atoms?
Without knowing the C code, beyond what I know of the FFI, is it more compact
to store a list of integers, which presumably represent themselves, or is it
more efficient to use single-character atoms?
I am guessing that the Flyweight pattern, or something similar, is used, which
would mean that single-character atoms are actually pointers to the atom. A
list of one hundred 'a's would then be a list of one hundred pointers into the
atom store. But is the pointer size bigger than the character code size?
I ask because my lexer is working and producing output like this:
| ?- feltlex('small.felt',X).
X = [comment(block,pos(1,1),[' ','S',t,r,i,n,g,' ',t,e,s,t,i,n,g,'.','\n','\n',
       ' ',' ',' ','A',l,l,o,w,' ',b,a,c,k,s,l,a,s,h,e,d,' ',
       d,e,l,i,m,t,e,r,' ',i,n,' ',t,h,e,' ',s,e,q,u,e,n,c,e,'.','.','.','.','\n']),
     chr(/),
     comment(single,pos(6,1),[' ','D',o,u,b,l,e,' ',q,u,o,t,e,d,' ',s,t,r,i,n,g,s,'.','.','.']),
     string(double,pos(7,1),[c,h,e,e,s,\,'"',e,b,u,r,g,e,r]),
     string(double,pos(8,1),[c,h,e,e,s,\,'''',e,b,u,r,g,e,r]),
     comment(single,pos(10,1),[' ','S',i,n,g,l,e,' ',q,u,o,t,e,d,' ',s,t,r,i,n,g,s,'.','.','.']),
     string(single,pos(11,1),[c,h,e,e,s,e,\,'"',b,u,r,g,e,r]),
     string(single,pos(12,1),[c,h,e,e,s,e,\,'''',b,u,r,g,e,r])]
That's from this source file:
/* String testing.

   Allow backslashed delimter in the sequence....
*/

; Double quoted strings...
"chees\"eburger"
"chees\'eburger"

; Single quoted strings...
'cheese\"burger'
'cheese\'burger'
Not a brilliant example, but it was written to test comment handling and string
consumption, where a backslashed single or double quote is allowed to be part
of the string. The lexer reads with get_char/peek_char, which gives it LA(1)
(one character of lookahead), and that lets me cope well enough for now. The
language is s-expression based.
For a really large source file, I want to make sure that I am being as
efficient with internal storage as possible. Once I have completed the lexer I
have to be able to create an AST from its output and then translate that into
something else, and I have already found out recently that GNU Prolog segfaults
under OS X when dealing with large amounts of in-memory data.
So, does anybody know which is the more space-compact representation: atoms or
character codes?
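In case it helps frame the question, here is a rough sketch of how I might try
to measure the difference myself. It assumes that statistics/2 with the
global_stack key returns heap usage in bytes as [Used, Free], which is how I
read the GNU Prolog manual; heap_used/1, make_list/3 and cost/3 are names made
up for this illustration only.

```prolog
% heap_used(-Bytes): bytes currently used on the global stack (heap).
% Assumption: statistics(global_stack, [Used, Free]) reports bytes.
heap_used(Bytes) :-
    statistics(global_stack, [Bytes, _Free]).

% make_list(+N, +Elem, -List): List holds N copies of Elem.
make_list(0, _, []) :- !.
make_list(N, Elem, [Elem|Tail]) :-
    N > 0,
    N1 is N - 1,
    make_list(N1, Elem, Tail).

% cost(+Elem, +N, -Bytes): heap growth caused by building a list of
% N copies of Elem (an atom such as a, or a character code such as 0'a).
cost(Elem, N, Bytes) :-
    heap_used(Before),
    make_list(N, Elem, _List),
    heap_used(After),
    Bytes is After - Before.
```

Comparing the two answers from a query such as
| ?- cost(a, 100000, Atoms), cost(0'a, 100000, Codes).
should show whether the element type makes any difference to the size of the
list, although it says nothing about what the atom table itself costs.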
Thanks,
Sean.
_______________________________________________
Users-prolog mailing list
https://lists.gnu.org/mailman/listinfo/users-prolog