Hello venerable list…

Is it more efficient to store lexemes as lists of character codes or as lists of
single-character atoms?

Without knowing the C code, other than what I know of the FFI: is it more
compact to store a list of integers, which presumably represent themselves, or
is it more efficient to use single-character atoms?

I am guessing that the Flyweight pattern, or something similar, is used. That
would mean single-character atoms are really pointers to the atom, so a list of
one hundred 'a's is in fact a list of one hundred pointers into the atom store.
But is a pointer bigger than a character code?
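For reference, here are the two forms side by side using the standard
atom_chars/2 and atom_codes/2 built-ins. This does not settle which per-element
form is smaller; it only shows what each looks like. compact_token/2 is a
hypothetical helper, not part of feltlex: it illustrates a third option, interning
each finished lexeme as a single atom so the token no longer carries one list
cell per character. As far as I know, GNU Prolog's atom table has a fixed
maximum size (tunable through the MAX_ATOM environment variable) and atoms are
not garbage collected, so this has its own cost on very large inputs.

```prolog
% lexeme_forms(+Atom, -Chars, -Codes): the two list forms of one lexeme.
lexeme_forms(Atom, Chars, Codes) :-
    atom_chars(Atom, Chars),    % list of single-character atoms
    atom_codes(Atom, Codes).    % list of integer character codes

% compact_token(+Token0, -Token): HYPOTHETICAL helper (not feltlex code)
% that replaces a token's character list with one interned atom.
compact_token(string(Kind, Pos, Chars), string(Kind, Pos, Atom)) :- !,
    atom_chars(Atom, Chars).
compact_token(comment(Kind, Pos, Chars), comment(Kind, Pos, Atom)) :- !,
    atom_chars(Atom, Chars).
compact_token(Token, Token).    % e.g. chr(/) passes through unchanged
```

For example, lexeme_forms(cheese, Cs, Ks) gives Cs = [c,h,e,e,s,e] and
Ks = [99,104,101,101,115,101].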

I ask because my lexer is working and producing output like this:

| ?- feltlex('small.felt',X).

X = [comment(block,pos(1,1),[' ','S',t,r,i,n,g,' ',t,e,s,t,i,n,g,'.','\n','\n',
       ' ',' ',' ','A',l,l,o,w,' ',b,a,c,k,s,l,a,s,h,e,d,' ',d,e,l,i,m,t,e,r,' ',
       i,n,' ',t,h,e,' ',s,e,q,u,e,n,c,e,'.','.','.','.','\n']),
     chr(/),
     comment(single,pos(6,1),[' ','D',o,u,b,l,e,' ',q,u,o,t,e,d,' ',
       s,t,r,i,n,g,s,'.','.','.']),
     string(double,pos(7,1),[c,h,e,e,s,\,'"',e,b,u,r,g,e,r]),
     string(double,pos(8,1),[c,h,e,e,s,\,'''',e,b,u,r,g,e,r]),
     comment(single,pos(10,1),[' ','S',i,n,g,l,e,' ',q,u,o,t,e,d,' ',
       s,t,r,i,n,g,s,'.','.','.']),
     string(single,pos(11,1),[c,h,e,e,s,e,\,'"',b,u,r,g,e,r]),
     string(single,pos(12,1),[c,h,e,e,s,e,\,'''',b,u,r,g,e,r])]

That's from a source file:

/* String testing.

   Allow backslashed delimter in the sequence....
*/

; Double quoted strings...
"chees\"eburger"
"chees\'eburger"

; Single quoted strings...
'cheese\"burger'
'cheese\'burger'


Not a brilliant example, but it was for testing comment handling and string
consumption, allowing a backslashed single or double quote to be part of the
string. The lexer reads with get_char/peek_char, i.e. one character of
lookahead (LA(1)), which is enough for now. The language is s-expression based.
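A minimal sketch of that LA(1) style, under stated assumptions: the predicate
names (next_token/2, lex_line/2, lex_string/3, lex_all/2) are hypothetical and
this is not the actual feltlex code. Positions (pos(Line,Col)) and /* */ block
comments are left out to keep it short. peek_char/2 inspects the next character
without consuming it; get_char/2 commits. As in the output above, a backslash
and the character after it are both kept in the string.

```prolog
% HYPOTHETICAL sketch, not feltlex: no positions, no block comments.

% next_token(+Stream, -Token): peek one char (LA(1)) to pick a rule.
next_token(Stream, Token) :-
    peek_char(Stream, C),
    next_token_(C, Stream, Token).

next_token_(end_of_file, _, eof) :- !.
next_token_(C, Stream, Token) :-
    layout(C), !,                       % skip whitespace
    get_char(Stream, _),
    next_token(Stream, Token).
next_token_(';', Stream, comment(single, Chars)) :- !,
    get_char(Stream, _),                % consume ';'
    lex_line(Stream, Chars).
next_token_(Q, Stream, string(Kind, Chars)) :-
    quote_kind(Q, Kind), !,
    get_char(Stream, _),                % consume opening quote
    lex_string(Stream, Q, Chars).
next_token_(C, Stream, chr(C)) :-       % anything else: one char token
    get_char(Stream, _).

layout(' ').  layout('\t').  layout('\n').  layout('\r').

quote_kind('"', double).
quote_kind('''', single).

% lex_line(+Stream, -Chars): rest of a ';' comment. The newline is only
% peeked, never consumed, so it stays on the stream.
lex_line(Stream, Chars) :-
    peek_char(Stream, C),
    (   ( C == '\n' ; C == end_of_file )
    ->  Chars = []
    ;   get_char(Stream, C),
        Chars = [C|Rest],
        lex_line(Stream, Rest)
    ).

% lex_string(+Stream, +Quote, -Chars): called after the opening quote;
% collects chars up to the matching unescaped closing quote.
lex_string(Stream, Quote, Chars) :-
    get_char(Stream, C),
    lex_string_(C, Stream, Quote, Chars).

lex_string_(end_of_file, _, _, _) :- !,
    throw(error(syntax_error(unterminated_string), lex_string/3)).
lex_string_(Quote, _, Quote, []) :- !.            % closing quote
lex_string_('\\', Stream, Quote, ['\\', E|Rest]) :- !,
    get_char(Stream, E),                           % keep escaped char
    (   E == end_of_file
    ->  throw(error(syntax_error(unterminated_string), lex_string/3))
    ;   lex_string(Stream, Quote, Rest)
    ).
lex_string_(C, Stream, Quote, [C|Rest]) :-
    lex_string(Stream, Quote, Rest).

% lex_all(+Stream, -Tokens): tokens up to end of file.
lex_all(Stream, Tokens) :-
    next_token(Stream, T),
    (   T == eof
    ->  Tokens = []
    ;   Tokens = [T|Rest],
        lex_all(Stream, Rest)
    ).
```

Usage would be open('small.felt', read, S), lex_all(S, Tokens), close(S). Since
the sketch has no block-comment rule, the /* */ part of small.felt would come
out as a run of chr/1 tokens.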

For a really large source file, I want to be as efficient with internal
storage as possible. Once the lexer is complete I have to build an AST from its
output and then translate that into something else, and I have recently found
that GNU Prolog seg-faults under OS X when dealing with large amounts of
in-memory data.

So, does anybody know which is the more compact representation: atoms or
character codes?

Thanks,
Sean.



_______________________________________________
Users-prolog mailing list
users-prolog@gnu.org
https://lists.gnu.org/mailman/listinfo/users-prolog
