Re: [OPEN-ILS-DEV] PATCH: osrf_json.c (rewrite of jsonObjectToJSON[Raw])
On Mon, 17 Nov 2008 23:32:21 -0500, Scott McKellar [EMAIL PROTECTED] wrote: This patch is a rewrite of the jsonObjectToJSON and jsonObjectToJSONRaw functions. It is dependent on my previous patch to utils.c and utils.h, adding the new buffer_append_uescape function. One purpose is to replace a call to the uescape function with a call to the faster buffer_append_uescape function. The other purpose is to introduce a faster way to translate a jsonObject into a string. (Also in one spot I broke up a very long string literal into several smaller pieces so that it wouldn't wrap around in the editor.) In the existing jsonObjectToJSON function, we receive a pointer to a jsonObject and return a string of JSON. However we don't translate the original jsonObject directly. Instead, we create a modified clone of the original, inserting an additional JSON_HASH node wherever we find a classname. Then we translate the clone, and finally destroy it. It always struck me as an egregious waste to create and destroy a whole parallel object just so that we could turn it into a string. So I looked for a way to eliminate the cloning. The result is a modification of add_json_to_buffer(), a local function that recursively traverses and translates the jsonObject. When it sees a classname (and it has been asked to expand classnames), the new version inserts additional gibberish into the output text and then continues the traversal, without modifying or copying the original jsonObject at all. In my benchmark, this new approach was faster than the original by a factor of about 5. When I combined this change with the use of the new buffer_append_uencode function, it was faster by a factor of about 7.2. The benchmark used a moderately complex jsonObject about 5 or 6 levels deep, with both hashes and arrays, with classnames at several levels. The performance gain will no doubt depend on the contents of the jsonObject,but I haven't tried to isolate the variables. The new version is a bit trickier and harder to read than the old. In my opinion the speedup is worth the obscurity, because a lot of places in Evergreen will benefit. With this change, the jsonObjectEncodeClass function, which had been called only from jsonObjectToJSON, is no longer called from anywhere. It's the function that created the modified copy of the original jsonObject. I see no reason to keep it around. After a decent interval, if no one objects I shall submit a patch to eliminate it. Agreed. Likewise the jsonObjectToJSONRaw function is no longer called from anywhere. It translates a jsonObject to JSON without expanding classnames. In this rewritten version it is identical to jsonObjectToJSON except that it calls add_json_to_buffer() with a different parameter value. If we eliminate jsonObjectToJSONRaw, I can simplify add_json_to_buffer() a bit. However we might someday want to be able to translate a jsonObject without expanding classnames. If nobody minds, I'll leave this function where it is. Also agreed. We (or other code using OpenSRF) may need the Raw() version in the future. This code path is used /a lot/, so the speedups are greatly appreciated. I've run into one issue with these last 2 patches, in particular, there seems to be a subtle difference in uescape() and buffer_append_uescape(). The majority of the JSON parsing succeeds, but parsing certain bibliographic data (MARC records) always results in NULL when attempting to encode the containing object (biblio_record entry objects).If I replace the call to buffer_append_uescape() with a call to uescape() in osrf_json_object.c, then all is well. I would send you some sample data to test, but I've yet to recreate the issue outside of a running system. For now, I'm adding some logging to see if I can track it down. I just wanted to pass this on in case any lightbulbs went off. I'll let you if I find something... Thanks, Scott -b -- Bill Erickson | VP, Software Development Integration | Equinox Software, Inc. / The Evergreen Experts | phone: 877-OPEN-ILS (673-6457) | email: [EMAIL PROTECTED] | web: http://esilibrary.com
Re: [OPEN-ILS-DEV] PATCH: osrf_json.c (rewrite of jsonObjectToJSON[Raw])
--- On Tue, 11/18/08, Bill Erickson [EMAIL PROTECTED] wrote: From: Bill Erickson [EMAIL PROTECTED] snip I've run into one issue with these last 2 patches, in particular, there seems to be a subtle difference in uescape() and buffer_append_uescape(). The majority of the JSON parsing succeeds, but parsing certain bibliographic data (MARC records) always results in NULL when attempting to encode the containing object (biblio_record entry objects).If I replace the call to buffer_append_uescape() with a call to uescape() in osrf_json_object.c, then all is well. I would send you some sample data to test, but I've yet to recreate the issue outside of a running system. For now, I'm adding some logging to see if I can track it down. I just wanted to pass this on in case any lightbulbs went off. I'll let you if I find something... I'm not sure what you mean by always results in NULL. What is NULL? The return from the original uescape()? That would indicate invalid UTF-8 characters in the input string. Maybe you mean that the buffer_append_uescape() seems to work, but the resulting JSON string later fails to parse correctly. In my test harness I translated things both ways and tested with strcmp() to verify that I got the same results. However I didn't test with non-ASCII characters, so that's likely to be the weakest link. It may be that I'm mangling the UTF-8. I'm more inclined to think that you have invalid UTF-8 characters in some of your MARC records. You wouldn't know about it because uescape() silently discards an entire input string if any of it is invalid. The new function translates as far as it can before it hits the invalid characters. I *think* that the result would still be valid JSON, because we only translate the values of strings, which get enclosed by double quotes anyway. However the semantics of that string might be invalid at some higher level. For diagnostic purposes, I suggest either or both of the following: 1. Check the return code from buffer_append_uescape(), and log a message if it's non-zero. 2. Tweak buffer_append_uescape() so that, if it detects invalid UTF-8, it restores the growing_buffer to its original length. In effect, this change would make buffer_append_uescape() behave like uescape() in the face of invalid UTF-8. I can send you a patch for this if you like, but I don't know if you would want it based on the original code or on the already patched version. In the long run we should probably think about how best to respond to invalid UTF-8. For now we just want to identify the problem and fix it. Meanwhile you can get most of the speedup by reverting to the original uescape function but otherwise keeping the rewrite of jsonObjectToJSON(). Scott McKellar http://home.swbell.net/mck9/ct/
Re: [OPEN-ILS-DEV] PATCH: osrf_json.c (rewrite of jsonObjectToJSON[Raw])
--- On Tue, 11/18/08, Bill Erickson [EMAIL PROTECTED] wrote: OK, I tracked down 2 problems, the first makes sense, the second, not so much. The first problem was the result of attempting to escape quotes () and backslashes (\) in the final if/else block of buffer_append_uescape(). Since and \ are both = ' ', they were not getting escaped. Doh! blush/ Sorry. The second was caused by NULL jsonObject's being passed from OSRF_LIST_GET_INDEX (osrf_json_object:337) into add_json_to_buffer(). osrfList allows NULL entries, so it seems the old could should have been dying here, but for some reason only the new code suffers. This one I don't understand. Is it working now, somehow? If it is, then I won't worry about it. Scott McKellar http://home.swbell.net/mck9/ct/
[OPEN-ILS-DEV] PATCH: osrf_json.c (rewrite of jsonObjectToJSON[Raw])
This patch is a rewrite of the jsonObjectToJSON and jsonObjectToJSONRaw functions. It is dependent on my previous patch to utils.c and utils.h, adding the new buffer_append_uescape function. One purpose is to replace a call to the uescape function with a call to the faster buffer_append_uescape function. The other purpose is to introduce a faster way to translate a jsonObject into a string. (Also in one spot I broke up a very long string literal into several smaller pieces so that it wouldn't wrap around in the editor.) In the existing jsonObjectToJSON function, we receive a pointer to a jsonObject and return a string of JSON. However we don't translate the original jsonObject directly. Instead, we create a modified clone of the original, inserting an additional JSON_HASH node wherever we find a classname. Then we translate the clone, and finally destroy it. It always struck me as an egregious waste to create and destroy a whole parallel object just so that we could turn it into a string. So I looked for a way to eliminate the cloning. The result is a modification of add_json_to_buffer(), a local function that recursively traverses and translates the jsonObject. When it sees a classname (and it has been asked to expand classnames), the new version inserts additional gibberish into the output text and then continues the traversal, without modifying or copying the original jsonObject at all. In my benchmark, this new approach was faster than the original by a factor of about 5. When I combined this change with the use of the new buffer_append_uencode function, it was faster by a factor of about 7.2. The benchmark used a moderately complex jsonObject about 5 or 6 levels deep, with both hashes and arrays, with classnames at several levels. The performance gain will no doubt depend on the contents of the jsonObject,but I haven't tried to isolate the variables. The new version is a bit trickier and harder to read than the old. In my opinion the speedup is worth the obscurity, because a lot of places in Evergreen will benefit. With this change, the jsonObjectEncodeClass function, which had been called only from jsonObjectToJSON, is no longer called from anywhere. It's the function that created the modified copy of the original jsonObject. I see no reason to keep it around. After a decent interval, if no one objects I shall submit a patch to eliminate it. Likewise the jsonObjectToJSONRaw function is no longer called from anywhere. It translates a jsonObject to JSON without expanding classnames. In this rewritten version it is identical to jsonObjectToJSON except that it calls add_json_to_buffer() with a different parameter value. If we eliminate jsonObjectToJSONRaw, I can simplify add_json_to_buffer() a bit. However we might someday want to be able to translate a jsonObject without expanding classnames. If nobody minds, I'll leave this function where it is. Scott McKellar http://home.swbell.net/mck9/ct/ Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it; and (d) In the case of each of (a), (b), or (c), I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license indicated in the file.*** ./trunk/src/libopensrf/osrf_json_object.c 2008-11-17 16:12:14.0 -0600 --- ./trunk-mod/src/libopensrf/osrf_json_object.c 2008-11-17 22:25:01.0 -0600 *** *** 61,67 static unusedObj* freeObjList = NULL; ! static void add_json_to_buffer( const jsonObject* obj, growing_buffer * buf ); /** * Return all unused jsonObjects to the heap --- 61,68 static unusedObj* freeObjList = NULL; ! static void add_json_to_buffer( const jsonObject* obj, ! growing_buffer * buf, int do_classname, int second_pass ); /** * Return all unused jsonObjects to the heap *** *** 194,200 unusedObjCapture++; currentListLen++; if (unusedObjCapture 1 !(unusedObjCapture % 1000)) ! osrfLogDebug( OSRF_LOG_MARK, Objects malloc()'d: %d, Reusable objects captured: %d, Objects