On Wed, 2018-01-17 at 16:12 +0100, Georg Chini wrote:
On 14.01.2018 21:47, Tanu Kaskinen wrote:ry from
the heap are quite heavy operations, and your approach will do a
lot of
those operations. Even reading a simple integer will, I suppose,
involve first allocating a new string. I can't say for sure that the
performance penalty makes any practical difference, but my intuition
says that we should use the approach that I proposed that requires far
fewer heap memory allocations.
I can accept the argument that my solution unnecessarily allocates
strings in some situations. Though I do not think it has much practical
impact, most of it can be avoided by using special functions to
retrieve
numbers or booleans. I am also quite sure that a function similar to
the
*_split_in_place() functions can be implemented to completely avoid
the allocation of unnecessary strings.
An in-place split function seems feasible indeed.
I already provided an example about how I'd like the parsing to
work.
Apparently you didn't like that for some reason? I'll copy the
example
here again:
const char *state = message_parameters;
if (pa_message_read_start_list(&state) < 0)
fail("Expected a list.");
while (pa_message_read_end_list(&state) < 0) {
char *path;
char *description;
if (pa_message_read_path(&state, &path) < 0)
fail("Expected a path.");
if (pa_message_read_string(&state, &description) < 0)
fail("Expected a string.")
/* Do something with path and description. */
pa_xfree(description);
pa_xfree(path);
}
As said above, I would prefer the single parsing function until
the need for more complex functions arises. We should not be
too fine grained, this will lead to confusion which function must
be used when.
Can you elaborate, what risk of confusion do you see?
From my point of view, your solution has several issues.
1) The example above is not extensible. What is
pa_message_read_end_list()
supposed to do? Skip to the next element in the list indicated by { and
return
zero if } is encountered instead? If yes, this would not work if you
add
an item
to the inner list, because at the next iteration it would go to the
additional
element instead of the next list. On the other hand, If the function is
supposed
to skip to the end of a list, you cannot use it to iterate over list
elements.
pa_message_read_end_list() is supposed to consume the closing brace
(and any random characters, since those are now allowed), nothing else.
If it encounters an opening brace or end-of-string, it's supposed to
fail.
My example does not work with the current list-handlers message, that's
true.
To
make your solution extensible you would at least need three functions:
pa_message_find_list_start() - find the beginning of a list
pa_message_find_list_end() - skip to the end of a list
pa_message_find_list_element() - find the next element in a list
All three functions need to handle escaping, nesting and error
checking.
I don't know why you added "find" to the function names. You're
probably guessing wrong the semantics I intended to have with
pa_message_list_start() and pa_message_list_end().
pa_message_read_list_start() should only consume the next opening brace
and any whitespace preceding it (by "whitespace" I mean any ignorable
characters). If it encounters a closing brace or end-of-string, it
should fail. There's no need for escaping or nesting handling.
pa_message_read_list_end() is explained above.
A "skip-to-end" function is indeed needed, and it will have to handle
escaping and nesting.
Here's an updated example that should work with the current list-
handlers (with an "implicit" top-level list):
while (pa_message_read_list_start(&state) > 0) {
read path
read description
pa_message_skip_rest_of_list(&state) /* Consumes the list
closing brace and everything preceding it. */
}
if (!pa_message_eof(&state)) {
/* This can happen if an unexpected closing brace is
encountered. */
fail("Malformed data");
}
As you can see, the updated example is exactly the same length as your
proposal, except for the pa_message_eof() check that you didn't have,
but you should have a similar check, because otherwise you don't know
if the parsing loop ended because of end-of-string or malformed input.