Hello.

This is (most probably) not a question but a report about an issue with Omegahat's Sxslt 0.5-2 (and back to 0.5-0), or with R, or with the platform, or an arbitrary mixture of all three ... I can't decide which. Despite various tests and attempts to work around the problem, the result is always the same.

Environment:
------------
Mac OS X 10.4.6
R 2.2.1 ... via R.app *or* the terminal (BTW, for me R 2.3.1 is a no-op at the moment)
R locale set to "de_DE.UTF-8/...", which is the default
Package XML 0.99-8
Package Sxslt 0.5-2

Task:
-----
I have
-- an XML file which contains a *valid* XML data structure (incl. a DTD)
-- an XSLT file with a *valid* style sheet
... both UTF-8, no BOM, with explicit encoding information in the XML header (encoding="utf-8").

I want to read the XML file and transform the data into a suitable R list structure with XSLT. All in all, this *does* work as expected, whichever of the following three strategies I choose, but alternative (3) below yields some strange results.

Three alternative attempts/strategies:
--------------------------------------
(1) Reading the XML file and parsing the XML tree directly with the methods from the XML package (main methods: "xmlTreeParse", "xmlRoot").

> objectAsXML <- xmlRoot( xmlTreeParse( xmlfile, validate=TRUE ));

(2) Reading the XML file, applying the XSLT style sheet to transform the XML tree, then writing the result to a [temporary/utility, whatever] file, and finally sourcing this file and assigning the result to an R object -- voila, not very elegant, but good for testing whether the transformation worked (main methods: "xsltApplyStyleSheet", "saveXML").

> xslsheet <- xsltParseStyleSheet( xslfile );
> objectAsS3list <- xsltApplyStyleSheet( xmlfile, xslsheet );
> check <- saveXML( objectAsS3list, xsltparsed.out );
> objectAsS3list <- source( xsltparsed.out )[[ 1 ]];

(3) Identical to (2), but without saving the transformed data to a temporary file. Instead, I call saveXML without a file name, and assign the XSLT-transformed XML tree directly to an R object.

> xslsheet <- xsltParseStyleSheet( xslfile );
> objectAsS3list <- xsltApplyStyleSheet( xmlfile, xslsheet );
> objectAsS3list <- eval( parse( text=saveXML( objectAsS3list )));

Results and problem:
--------------------
... or 'the funny part'.
One would think that all three strategies yield exactly the same results, but that's not what happens. My XML data contain German umlauts (in the data parts, not in the tags, of course), e.g.

   <name>Jörg Beyer</name>

With alternatives (1)/package XML, and (2)/Sxslt + temp file, this example parses and results in R as

   <name>Jörg Beyer</name>   # (1), or ...
   "Jörg Beyer"              # (2), respectively

which is exactly what I would expect -- *no problems here*. But with method (3)/Sxslt + direct assignment of the result to a variable, this example shows in R as

   "Jörg Beyer"              # ugly, isn't it?

First, it makes no difference whether I store the umlauts as characters or as entities in the XML file; the result is the same (but storing the plain characters is more convenient).

Second, it makes no difference whether I set the encoding of the XML file to, say, "ISO-8859-1" or something else other than "utf-8". Again, the encoding information is an explicit part of the headers of my XML files. Always.

And third, it makes no difference whether I use the terminal or R.app. And of course, trying to change the character mapping with other R commands during the whole process doesn't help.

I informed the developer of the packages on 2006-08-02, and on 2006-08-28.

Thanks for your interest.

Joerg
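
P.S. For readers unfamiliar with this kind of garbling, here is a small sketch that needs neither the XML package nor Sxslt. It rests on an assumption that I have not confirmed for Sxslt: that somewhere in method (3) the UTF-8 bytes are taken for Latin-1 and converted to UTF-8 a second time. The variable names are made up for the illustration; only "as.raw", "rawToChar", "charToRaw" and "iconv" from base R are used.

```r
# "Jörg" written as raw UTF-8 bytes, so the example does not depend on the locale
utf8_bytes <- as.raw(c(0x4a, 0xc3, 0xb6, 0x72, 0x67))
original   <- rawToChar(utf8_bytes)

# Assumed failure: the bytes are taken for Latin-1 and encoded as UTF-8 again,
# which turns the two-byte "ö" into two separate characters
garbled <- iconv(original, from = "latin1", to = "UTF-8")
print(charToRaw(garbled))

# Reversing the assumed step yields the original byte sequence again
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
print(identical(charToRaw(repaired), utf8_bytes))
```

If the assumption holds, the last "iconv" call applied to the string from (3) might serve as a stopgap, but I have not tested that against Sxslt itself.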
