Michael Gold <[email protected]> writes:

> On Tue, May 26, 2009 at 00:53:46 +0200, [email protected] wrote:
> ...
>> We need to be able to determine the boundaries of a token that has
>> been read, for error reporting. We cannot rely on the stm used by the
>> token reader to determine the beginning position of a read token,
>> since it is skipping white characters.
>
> This behaviour could be changed by
> - adding a flag that causes token_read to return whitespace as a token;
>   or,
> - adding a function/flag to advance to the beginning of the next token
>
I think it is preferable that this is not the job of the tokeniser
module. Adding a new function for this seems fine to me. However, we
could simplify the API: if `pdf_token_reader_new' consumes characters
up to the first token, and `pdf_token_read' does the same after
reading each token, then we can assume that the next token always
starts at the current position of the stream. We could therefore use
`pdf_stm_tell' to get the beginning of each token. Finally, I think we
will not need the ending offset of a token.
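
To make the idea concrete, here is a minimal sketch of the eager
whitespace skipping. It does not use the library: every `toy_' name
below is made up for illustration, the stream is a plain in-memory
buffer, and isspace() is only an approximation of the PDF white-space
set. The real `pdf_stm_tell' and `pdf_token_read' have different
signatures.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream; stands in for the real stm.  */
typedef struct
{
  const char *data;
  size_t size;
  size_t pos;
} toy_stm_t;

/* Hypothetical token reader; not the library's reader type.  */
typedef struct
{
  toy_stm_t *stm;
} toy_reader_t;

static void
toy_stm_init (toy_stm_t *stm, const char *text)
{
  stm->data = text;
  stm->size = strlen (text);
  stm->pos = 0;
}

/* Current offset in the stream (plays the role of a "tell").  */
static size_t
toy_stm_tell (const toy_stm_t *stm)
{
  return stm->pos;
}

/* Consume white characters so the stream rests on the next token.
   isspace() approximates the PDF white-space set here.  */
static void
toy_skip_whitespace (toy_stm_t *stm)
{
  while (stm->pos < stm->size
         && isspace ((unsigned char) stm->data[stm->pos]))
    stm->pos++;
}

/* Creating the reader skips up to the first token.  */
static void
toy_reader_init (toy_reader_t *reader, toy_stm_t *stm)
{
  assert (stm != NULL);
  reader->stm = stm;
  toy_skip_whitespace (stm);
}

/* Read one whitespace-delimited token into BUF, storing at most
   BUFSIZE - 1 characters.  Returns the number of characters stored,
   or 0 at end of input.  */
static size_t
toy_token_read (toy_reader_t *reader, char *buf, size_t bufsize)
{
  toy_stm_t *stm = reader->stm;
  size_t len = 0;

  while (stm->pos < stm->size
         && !isspace ((unsigned char) stm->data[stm->pos]))
    {
      if (len + 1 < bufsize)
        buf[len++] = stm->data[stm->pos];
      stm->pos++;
    }
  if (bufsize > 0)
    buf[len] = '\0';

  /* Leave the stream positioned at the start of the next token.  */
  toy_skip_whitespace (stm);
  return len;
}
```

With the input "  /Name  42 obj", calling toy_stm_tell() before each
read gives 2, 9 and 12, which are the starting offsets of the three
tokens. The caller records the offset before reading, so the reader
itself does not need to report positions.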
