XPath looks pretty sweet; has anyone used it for similar data sizes?
On 06/01/06, ryanm [EMAIL PROTECTED] wrote:
I find myself in a situation where I need to build a tool to analyse
lots of xml data. Thousands of records containing a lot of strings as
well as numeric values.
When I found myself in this situation I did 2 things:
1. Don't use XML; it is way too heavy for this much data. I found that by
using a double-delimited or fixed-width data format, the file size was
reduced by as much as 70%. In the end, I went with fixed-width because I
could parse it faster (by avoiding calling split() thousands of times).
Now, I still used the XML object, but instead of letting it parse the
file, I overrode the onData handler and used my own parsing function, which
generated objects directly instead of building out an XML tree.
Essentially, the XML object just read the data in and handed it to my
parsing function.
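The fixed-width idea can be sketched like this (in JavaScript rather than the AS2 the post describes; the field names and widths here are made up for illustration). Because every record is the same length, each field comes out with a substring() call instead of a split():

```javascript
// Hypothetical field layout: each record is 41 characters, space-padded.
var FIELDS = [
  { name: "id",   width: 6 },
  { name: "name", width: 20 },
  { name: "city", width: 15 }
];

function parseRecord(line) {
  var record = {};
  var pos = 0;
  for (var i = 0; i < FIELDS.length; i++) {
    // substring the fixed span for this field, then strip the space padding
    record[FIELDS[i].name] = line.substring(pos, pos + FIELDS[i].width).trim();
    pos += FIELDS[i].width;
  }
  return record;
}

var raw = "000042Grand Hotel         Dallas         ";
var hotel = parseRecord(raw);
// hotel.id === "000042", hotel.name === "Grand Hotel", hotel.city === "Dallas"
```

The win over split() is that no delimiter scanning happens at all; the offsets are known up front, so each record costs a handful of substring() calls.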
2. Don't try to parse it all at once. What I did was dump it all into a
buffer when it was loaded, and then fire off a parsing function that parsed
250 records per frame. I found that number through trial and error; you can
find your own balance. The important thing was that the application didn't stop
functioning while the records were being parsed: you could go to other areas
of the app and use them normally, and when you went to the section that
required the data, you got a progress bar showing how many records had been
parsed.
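The per-frame batching can be sketched like so (again in JavaScript; the parseOne/onProgress/onDone callback names are mine, and setTimeout stands in for the Flash frame tick that drove the original). Each call consumes one batch, reports progress, and yields before continuing:

```javascript
// 250 per batch is the figure from the post, found by trial and error.
var CHUNK_SIZE = 250;

function parseInChunks(records, parseOne, onProgress, onDone) {
  var buffer = records.slice(); // copy so we can consume it destructively
  function step() {
    // slice one batch off the top of the buffer
    var batch = buffer.splice(0, CHUNK_SIZE);
    for (var i = 0; i < batch.length; i++) parseOne(batch[i]);
    onProgress(records.length - buffer.length, records.length);
    if (buffer.length > 0) setTimeout(step, 0); // yield, then keep going
    else onDone();
  }
  step();
}
```

Because the buffer length is known up front, the onProgress callback has exactly what a progress bar needs: records done versus records total.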
My parsing function was semi-complicated. It took the whole dataset in
as a string and split it on my record delimiter, and the resulting array became my
buffer. That way I knew how many records there were to parse, and
approximately how long it would take to parse them. It then sliced 250
records off the top of the buffer on every frame and passed them to the
serialization function, which serialized them and inserted them
into my database object. My parsing function also built several indexes
while it was parsing the records, to make lookups faster once the database
was ready. My application was a database of hotels, sortable by a
number of criteria, so the parsing routine looked for those attributes of
each hotel as it parsed, and whenever it saw a new value for one of those
criteria, it made a new entry in the appropriate index.
I made very heavy use of the object collection syntax, for example:
Index["Location"]["USA"]["Texas"]["Dallas"]
...referred to an array of ids for hotels in Dallas, Texas, USA,
which could be used to find a hotel like this:
// 0 is the first index in the array of ids
hotelID = Index["Location"]["USA"]["Texas"]["Dallas"][0];
return(Database[hotelID]);
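A sketch of building that kind of nested index while parsing (JavaScript again; the country/state/city field names are my assumption about the record shape). Each level of the path is created the first time a new value appears, and the leaf is an array of record ids:

```javascript
var Database = {};
var Index = { Location: {} };

function addHotel(hotel) {
  Database[hotel.id] = hotel;
  // walk country -> state, creating each level on first occurrence
  var node = Index.Location;
  var path = [hotel.country, hotel.state];
  for (var i = 0; i < path.length; i++) {
    if (!node[path[i]]) node[path[i]] = {};
    node = node[path[i]];
  }
  // the city leaf is an array of hotel ids
  if (!node[hotel.city]) node[hotel.city] = [];
  node[hotel.city].push(hotel.id);
}

addHotel({ id: 1, country: "USA", state: "Texas", city: "Dallas", name: "Grand Hotel" });
addHotel({ id: 2, country: "USA", state: "Texas", city: "Dallas", name: "Plaza" });

// lookup mirrors the post: first id in the Dallas bucket, then the record
var hotelID = Index["Location"]["USA"]["Texas"]["Dallas"][0];
var found = Database[hotelID];
```

The index costs a little work per record at parse time, but every sort/filter criterion then resolves to a plain array lookup instead of a scan over the whole database.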
In the end, it took about 5 times as much code to import, parse, and
index the database as the whole rest of the application, but it worked, it
was relatively fast, and it met the requirements I was given. I would've
preferred for it to work from a web server, selecting what I needed from the
database, but the client required that it work offline from a database that
shipped on the CD, as well as be able to download an updated database from
their website, and this was the best solution I could find in Flash that
worked on both PC and Mac (no third-party wrappers). Unfortunately it had to
parse the whole database every time you ran the app, but it would get the
newest version from the web if you were online, and it gave you the option to
store it (in an ungodly-sized shared object) if you wanted to.
Anyway, that's how I did it, whether or not it was successful is a
matter of opinion. ;-)
ryanm
___
Flashcoders mailing list
Flashcoders@chattyfig.figleaf.com
http://chattyfig.figleaf.com/mailman/listinfo/flashcoders
--
Jonathan Clarke
1976 Ltd
http://19seventysix.co.uk
e: [EMAIL PROTECTED]
m (UK): +44 773 646 1954
m (Barbados): +1246 259 9475