Re: [Ohrrpgce] Reload.SerializeXML

Ralph Versteegen Sat, 29 May 2010 23:54:21 -0700

On 28 May 2010 13:04, Mike Caron <[email protected]> wrote:
> On 27/05/2010 8:38 PM, Ralph Versteegen wrote:
>>
>> On 28 May 2010 11:25, James Paige <[email protected]> wrote:
>>>
>>> On Thu, May 27, 2010 at 07:07:38PM -0400, Mike Caron wrote:
>>>>
>>>> On 27/05/2010 6:38 PM, James Paige wrote:
>>>>>
>>>>> Mike, was there any special reason why Reload.SerializeXML uses print
>>>>> statements rather than writing to a file?
>>>>>
>>>>> ---
>>>>> James
>>>>
>>>> I wrote it as a debugging function. If it wrote to a file, then I
>>>> wouldn't be able to see it on screen in reloadtest! :)
>>>> --
>>>> Mike
>>>
>>> I guess what I am really looking for is a reload2xml command-line tool
>>> so I can easily debug reload files on disk..
>>>
>>> and actually it doesn't matter that SerializeXML prints to the console,
>>> because I could just do
>>>
>>>  reload2xml somefile.reld > somefile.xml
>>>
>>> ---
>>> James
>>
>> Writing to standard output is the Unix way anyway!
>>
>> Speaking of xml, Mike mentioned a couple weeks ago that the Reload
>> code grinds to a halt when processing some translated 64MB xml
>> document. Since I enjoy optimisation to a rather evil degree, I'd like
>> to look at it sometime. What are some good testcases?
>
> Right at this moment, I don't feel like touching any of that stuff, so go
> nuts.
>
> A few tips:
>
> - I don't think the private heap has anything to do with potential
> performance issues. They're the same calls being done by the runtime, just
> in a different memory block.
> - One thing I never thought about doing is compiling with the -profile
> switch.
> - I never did any formal performance tuning, being as I subscribe to the
> "make it work, then make it fast" camp. All that ZString crap was to
> alleviate my fears of memory corruption caused by having Strings in UDTs.
>
> This is the document I mentioned: (warning: ~5 Megs compressed, 64 Megs
> uncompressed)
>
> http://taleotc.com/medline08n0059.zip
>
> I haven't looked too closely at the structure of the document, but it seems
> to be a fairly average, if large, dataset.
>
> Other than that, Google isn't being very friendly. Querying "xml test
> documents" lists a bunch of XML tutorials and unit testing stuff, while
> "large xml test documents" seems to focus on *really* big documents (over
> 1 GB), for which I suspect the RELOAD file format would break down :)
>
> --
> Mike


So here are my results (my machine is a 7-year-old 3GHz Pentium 4 with
1GB of RAM):

Beforehand:

(I believe that this version of xml2reload did not include the
attributes, which add about 30% to the .reld size)

bash-3.1$ xml2reload ../medline08n0059.xml medline.reld
Loaded XML document in 4839 ms
Parsed XML document in 74792 ms
Optimised document in 9859 ms
Serialized document in 1022907 ms
Tore down memory in 3891 ms
Finished in 1116305 ms


bash-3.1$ time reload2reload plotdict.reld plotdict2.reld
Loaded document in 30 ms
Serialized document in 2851 ms
Tore down memory in 1 ms
Finished in 2899 ms

real    0m2.919s
user    0m0.112s
sys     0m0.188s

(where reload2reload is obviously just a 10-liner)

Afterwards:

bash-3.1$ time xml2reload ../medline08n0059.xml medline.reld
Loaded XML document in 4207 ms
Parsed XML document in 6023 ms
Optimised document in 10631 ms
Serialized document in 2623 ms
Tore down memory in 3097 ms
Finished in 26596 ms

real    0m26.699s
user    0m24.018s
sys     0m1.100s

(Also, running reload2reload on medline.reld, which is 31MB, required
about 142MB of memory.)

bash-3.1$ reload2reload plotdict.reld plotdict2.reld
Loaded document in 13 ms
Serialized document in 21 ms
Tore down memory in 8 ms
Finished in 60 ms
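
To put the xml2reload numbers side by side, here is a small Python sketch that works out the per-phase speedup from the timings above. The phase names and the speedup() helper are just labels I made up for illustration, and since the "before" run probably didn't include attributes, the ratios are only a rough guide.

```python
# Phase timings in milliseconds, copied from the two xml2reload runs above.
# The dictionary keys are illustrative labels, not names used by the tool.
before = {
    "load": 4839,
    "parse": 74792,
    "optimise": 9859,
    "serialize": 1022907,
    "teardown": 3891,
    "total": 1116305,
}
after = {
    "load": 4207,
    "parse": 6023,
    "optimise": 10631,
    "serialize": 2623,
    "teardown": 3097,
    "total": 26596,
}


def speedup(old_ms, new_ms):
    """Return how many times faster the new timing is than the old one."""
    return old_ms / new_ms


for phase in before:
    ratio = speedup(before[phase], after[phase])
    print(f"{phase:<10} {before[phase]:>8} ms -> {after[phase]:>6} ms  ({ratio:.1f}x)")
```

Serialization and parsing account for nearly all of the improvement; loading, optimising and teardown are roughly unchanged.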