On Tue, May 26, 2009 at 03:18:02 +0200, David Vazquez wrote:
>
> Michael Gold <[email protected]> writes:
>
> > On Tue, May 26, 2009 at 00:53:46 +0200, [email protected] wrote:
> >> We need to be able to determine the boundaries of a token that has
> >> been read, for error reporting. We cannot rely on the stm used by the
> >> token reader to determine the beginning position of a read token,
> >> since it is skipping white characters.
> >
> > This behaviour could be changed by
> >  - adding a flag that causes token_read to return whitespace as a token;
> >    or,
> >  - adding a function/flag to advance to the beginning of the next token
>
> It is preferable it is not work of the tokeniser module, I think.
>
> Adding a new function for this seems nice for me. Although we could
> simply the API if `pdf_token_reader_new' function consumes characters
> until the first token, and `pdf_token_read' does same thing after of
> read each token, then we could assume the token always is at the
> current position of the stream, therefore we could use the
> `pdf_stm_tell' function to get the beginning of each token.
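For illustration, here is a minimal sketch of the separate-function
approach on a toy in-memory stream. Every toy_* name is hypothetical and
not part of the library; the real stm and token reader interfaces differ,
and errors such as PDF_EAGAIN are not modelled. It only shows that if the
caller advances to the next token before reading it, the stream offset
gives the token's beginning, and the offset after the read gives its end.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stream; a stand-in for the real stm type. */
struct toy_stm { const char *buf; size_t len; size_t pos; };

/* Half-open byte range [begin, end) occupied by a token. */
struct toy_span { size_t begin; size_t end; };

static void toy_stm_init(struct toy_stm *s, const char *text)
{
  s->buf = text;
  s->len = strlen(text);
  s->pos = 0;
}

/* Current stream offset (plays the role of a "tell" operation). */
static size_t toy_tell(const struct toy_stm *s) { return s->pos; }

/* Advance to the first byte of the next token.
   Returns 0 on success, -1 if only whitespace remains. */
static int toy_skip_to_token(struct toy_stm *s)
{
  while (s->pos < s->len && isspace((unsigned char) s->buf[s->pos]))
    s->pos++;
  return s->pos < s->len ? 0 : -1;
}

/* Read one whitespace-delimited token into out, truncating it to
   outsz - 1 bytes. The next byte is only peeked at, so the stream
   is never advanced past the end of the token. */
static int toy_read_token(struct toy_stm *s, char *out, size_t outsz)
{
  size_t n = 0;
  assert(outsz > 0);
  if (s->pos >= s->len || isspace((unsigned char) s->buf[s->pos]))
    return -1;                      /* not positioned on a token */
  while (s->pos < s->len && !isspace((unsigned char) s->buf[s->pos]))
    {
      if (n + 1 < outsz)
        out[n++] = s->buf[s->pos];
      s->pos++;
    }
  out[n] = '\0';
  return 0;
}

/* Skip to the next token, read it, and record where it began and ended. */
static int toy_next_token(struct toy_stm *s, char *out, size_t outsz,
                          struct toy_span *span)
{
  if (toy_skip_to_token(s) != 0)
    return -1;
  span->begin = toy_tell(s);        /* offset of the token's first byte */
  if (toy_read_token(s, out, outsz) != 0)
    return -1;
  span->end = toy_tell(s);          /* one past the token's last byte */
  return 0;
}
```

With the input "  /Name   42", two calls to toy_next_token yield the
spans [2, 7) and [10, 12), and a third call returns -1.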
Skipping whitespace in the constructor would complicate things, since
the constructor (and presumably pdf_token_reader_reset) would then be
able to return the same error codes as pdf_token_read; and if an error
like PDF_EAGAIN occurred after reading some data, there would be no sane
way to handle it since the constructor doesn't return an object on
failure.

Similarly, skipping whitespace after a token would be awkward because
pdf_token_read would have to return a failure code (and hang onto the
finished token) if it got an error while reading the whitespace.

> Finally, I think we will not need the ending offset of a token.

I don't see why it would be needed, but it's trivial to determine since
the tokeniser never reads past the end of a token (except for peeking
one byte ahead when necessary).

-- 
Michael
