While trying to fix a bug related to name attachments not saving/loading, I worked through the code for the NFIO file handler to try to understand how things work (you may have seen my commits with documentation and comments).
Anyway, I noticed a few things that I wanted to ask about: - There was a lot more string comparison and processing then I expected in a native format. I think I was expecting something more like the bin file handler that streams everything in and out. How much better does the BIN filehandler perform? There may be some things that could be done to improve the performance of the OSB file handler by replacing some of the string processing. - Why are field container names written as strings to the file for every field container? Couldn't we decrease the file size and increase load performance by extending the header to include a map from a fc type id to a field container name? That way every where a field container name is in the current file structure we could just use a Uint32 that we could map directly to a field container type. Note: to write this header we would need the next idea.... - I think the writer could be optimized and simplified quite a bit. This may be a bit tough to explain, but I will do my best. The writer currently starts by writing the first fc. In the process of writing an fc, a list is extend with any other fc's that are pointed to in the system and are not already in the list. So after that first fc is written, the code loops over the list until the end of the list is reached. When it has been reached then all fc's have been written. This is a reasonable way to implement the code, but I see two problems: 1) The reachability traversal is done as a side-effect of the writing process. This is a little inelegant but works. 2) It seems dangerous to me to loop over a list at the same time it is being extended. The interesting thing is that there is already code in here that could simplify this a lot. There is another recursive reachability crawl that is used to get an initial count of nodes fc's that will be written out. This is used for updating the writing progress bar. We could reuse this. So my idea would be to refactor this code to: - Count all fc's to be written and build list of reachable fc's - Write standard header but use the size field to store the number of fc's we will write - Write a header with mapping of all used fc_type_id to fc_type_names - Iterate through the list of reachable fc's and write them one at a time - Close file This would have a couple of benefits: - Simplify the code a bit by having a single method that finds all reachable fc's - Allow use of fc_type_id's instead of names - Allow loader to find the number of expected fc's in the header, and use this for progress CB - List is not growing while we are writing so we could use a vector or even just reuse the counting set to write out the fcs Anyway, this is just a thought. Any comments -Allen ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share your opinions on IT & business topics through brief surveys -- and earn cash http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV _______________________________________________ Opensg-core mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/opensg-core
