A thousand apologies for my last incomplete post; I have a new computer with an awkwardly placed trackpad. Anyway, the post in full is:
Hi all,

I'm in the process of developing an XML plugin for Geany, a lightweight Linux IDE. Part of the plugin is a custom GTK tree model which displays the parsed tree of a document without having to load every row into the tree view. I'm pretty happy with that part and it all seems to be working rather well.

Not content with that, I've been working on enhancing the plugin in order to give Geany some of the same features that you find in commercial XML editing software, such as XPath searching and XSL transformations. Again, so far, so good.

Where I've really come to grief is in trying to tie the model and the Scintilla editor widget together. I am trying to implement a feature which lets the user click on a row in the tree view and have the cursor go to that position, and vice versa. In order to do this I needed to determine the start and end position of each tag and compare it to the position returned from the mouse click in the editor window.

I was able to get the start and end position for each node in the tree by first creating a parser context with xmlCreateURLParserCtxt and then passing that to xmlParseDocument. I then copy the xmlParserNodeInfoSeq node_seq from the parser context into a linked list to enable a binary search for the position returned by the Scintilla editor. So far so good. I load a document, click on the editor and it moves the tree view selection to the right node. Huzzah!

Overcome by my mastery of C and libxml, I continue testing, only to find that I get unexpected results with non-UTF-8 documents, specifically ISO-8859-1. Testing, testing, testing, I determine that the positions returned by the xmlParserNodeInfo for ISO-8859-1 documents are exactly 41 characters less than those from the Scintilla widget. After hacking about in the libxml source code, it appears to me that this has something to do with the way the documents are parsed according to their encoding, and that this could account for the variation.
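
For reference, the position lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the plugin's actual code: NodeSpan and find_innermost_span are hypothetical names, the records are held in an array sorted by start position rather than a linked list, and end_pos is treated as inclusive. Only the begin_pos and end_pos fields are modelled on xmlParserNodeInfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: copies only the begin_pos/end_pos idea from
 * libxml2's xmlParserNodeInfo; `row` stands in for a tree-model row. */
typedef struct {
    unsigned long begin_pos; /* offset where the node starts            */
    unsigned long end_pos;   /* offset where the node ends (inclusive,
                                by assumption)                          */
    int row;                 /* illustrative identifier for the row     */
} NodeSpan;

/* Return the index of the innermost span containing `pos`, or -1 if
 * there is none. `spans` must be sorted by begin_pos in ascending
 * order, which is document order for properly nested nodes. */
static long find_innermost_span(const NodeSpan *spans, size_t count,
                                unsigned long pos)
{
    assert(spans != NULL || count == 0);

    /* Binary search: count the spans whose begin_pos <= pos. */
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (spans[mid].begin_pos <= pos)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Walk back towards the root: the first span that has not already
     * ended before `pos` is the deepest one enclosing it. */
    while (lo > 0) {
        lo--;
        if (spans[lo].end_pos >= pos)
            return (long)lo;
    }
    return -1;
}
```

The backward walk can degrade to a linear scan when the position follows a long run of already-closed siblings, so this is a starting point rather than a tuned implementation. Any encoding-related correction would have to be applied to `pos` before the call.
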
I am assuming it has to do with the position of the input buffer _after_ the encoding declaration has been parsed. For now, I have a dirty, dirty little hack in place which checks whether the encoding is ISO-8859-1 and, if so, subtracts 41 from the position passed. This is not good(tm) imho and I'm looking for a better way, especially since looking through the source code has shown me a far greater variety of document encodings than I had hitherto been aware of.

So I guess it's time to phrase my questions:

* Is there an easier way of determining the correct offset for the start position of a non-UTF-8 document, other than a ghastly switch statement with all of the potential offsets?
* Is it likely that access to the xmlParserNodeInfoSeq via xmlParseDocument will be deprecated, so that my code breaks on future versions?

I'm sure I'll have more questions as I proceed, but I would appreciate some insight into the above from someone more familiar with the internals of libxml than I.

Many thanks,
Chris Daley

--
Chris Daley
Sydney, New South Wales (EDT - UTC/GMT+11)
e: [email protected]
m: +61 437 031 214
s: chebizarro
tw: chebizarro
"There is no way to peace - peace is the way" - A.J. Muste
