Re: [api-dev] problems reading xml file with com.sun.star.xml.dom.DocumentBuilder
Sorry, it seems to be a platform-independent problem after all. I could reproduce your problem on the Linux machine too. At first I just didn't have a test document that was really encoded as ISO-8859-1.

In the code I sent, the TextInputStream does in fact carry the correct character encoding. However, it turned out that the DocumentBuilder seems to look only inside the stream for the encoding. So it doesn't help to give the stream the correct character encoding: the encoding has to be declared inside the stream itself (here, in the first line of your XML document).

The only ways I can think of to work around this are:

1. Write the declaration into your file (as you said).
2. Somehow write the declaration into your stream first (I don't know yet how to do this).
3. Convert the encoding of your stream (maybe by reading bytes from the input stream and writing UTF-8 to the parser, but how?).

Sorry again for not really helping you. Maybe somebody else can?

By the way: to get the build number without writing code, open the About box from the Help menu and type "sdt" while holding down the Ctrl key for all three letters.
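Workaround 3 (read bytes, hand UTF-8 to the parser) can be sketched outside OpenOffice. This is only an illustration under assumptions: Python's `xml.etree.ElementTree` stands in for the UNO DocumentBuilder, the sample document is made up, and nothing here uses the UNO stream API. A parser that sees no declaration assumes UTF-8 and rejects the raw ISO-8859-1 bytes. The same document parses once the bytes are decoded and re-encoded as UTF-8, chunk by chunk.

```python
import codecs
import io
import xml.etree.ElementTree as ET

# Made-up sample: ISO-8859-1 bytes with no <?xml ...?> declaration,
# like the third-party files described in the thread.
RAW = "<test><test2>åæø</test2></test>".encode("iso-8859-1")


def parse_transcoded(stream, encoding="iso-8859-1", chunk_size=4096):
    """Read raw bytes from `stream`, decode them as `encoding` and feed
    the parser UTF-8, which it assumes when there is no declaration.

    Only meant for input without its own encoding declaration.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parser = ET.XMLParser()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(decoder.decode(chunk).encode("utf-8"))
    tail = decoder.decode(b"", final=True)  # flush any buffered bytes
    if tail:
        parser.feed(tail.encode("utf-8"))
    return parser.close()


if __name__ == "__main__":
    try:
        ET.fromstring(RAW)  # parser assumes UTF-8 and rejects the bytes
    except ET.ParseError as err:
        print("plain parse failed:", err)
    root = parse_transcoded(io.BytesIO(RAW), chunk_size=5)
    print(root.find("test2").text)
```

The incremental decoder is used so the same loop would also work for multi-byte source encodings, where a character can be split across two chunks.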
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [api-dev] problems reading xml file with com.sun.star.xml.dom.DocumentBuilder
Hmm, this is not working for me; I still get a null object from oDB.parse. What system do you test this on? I am running this on Windows Server 2003 and OpenOffice 2.0 (I know there is a way to get the build number, but I keep forgetting it).

-- 
Christian Andersson - [EMAIL PROTECTED]
Configuration and Collaboration for OpenOffice.org
Open Framework Systems AS
http://www.ofs.no
Re: [api-dev] problems reading xml file with com.sun.star.xml.dom.DocumentBuilder
The system I tested it on was a Linux machine ;-) so there might indeed be a difference. I have to wait until this evening to check it on a Windows (XP) machine. But what I found in the IDL reference might help: it says the character encoding name is used according to the document http://www.iana.org/assignments/character-sets. So the name might need a different spelling, and you could try some of these possibilities I found there:

    Name: ISO_8859-1:1987 [RFC1345,KXS2]
    MIBenum: 4
    Source: ECMA registry
    Alias: iso-ir-100
    Alias: ISO_8859-1
    Alias: ISO-8859-1 (preferred MIME name)
    Alias: latin1
    Alias: l1
    Alias: IBM819
    Alias: CP819
    Alias: csISOLatin1

If any of them works, please tell me. Otherwise I'll check it this evening on my Windows machine.
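Whether the UNO TextInputStream accepts every one of these spellings cannot be shown outside OpenOffice. As a rough analogy only, Python's codec registry normalizes the "Alias:" spellings from the IANA entry above and resolves all of them to a single codec. The list below is copied from that entry. The helper name `canonical` is made up for this sketch.

```python
import codecs

# "Alias:" spellings copied from the IANA ISO-8859-1 entry quoted above.
IANA_NAMES = [
    "iso-ir-100", "ISO_8859-1", "ISO-8859-1", "latin1",
    "l1", "IBM819", "CP819", "csISOLatin1",
]


def canonical(name: str) -> str:
    """Return the name of the codec Python resolves `name` to."""
    return codecs.lookup(name).name


if __name__ == "__main__":
    for n in IANA_NAMES:
        print(f"{n!r:16} -> {canonical(n)}")
```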
Re: [api-dev] problems reading xml file with com.sun.star.xml.dom.DocumentBuilder
Thank you, I'll try that at once. And don't worry about the Norwegian, I'm not good at it either.
Re: [api-dev] problems reading xml file with com.sun.star.xml.dom.DocumentBuilder
Dear Christian, for me the following code works:

    oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    oInpStream = oSFA.openFileRead(sUrl)
    oTextInpStream = createUnoService("com.sun.star.io.TextInputStream")
    oTextInpStream.setInputStream(oInpStream)
    oTextInpStream.setEncoding("iso-8859-1")
    oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
    domDoc = oDB.parse(oTextInpStream)
    oInpStream.closeInput()

Sorry for my bad Norwegian, but it's been a long time since I was there.

About the code: you have to use a TextInputStream to be able to set the encoding. Hope it helps.

Take care,
Christoph

Christian Andersson wrote:
> I have a small problem. In StarBasic I'm using (almost) the following code (there might be small mistakes since I'm writing this from memory) to read and parse an XML document:
>
>     oSFA = createUnoService("com.sun.star.ucb.SimpleFileAccess")
>     oInpStream = oSFA.openFileRead(sUrl)
>     oDB = createUnoService("com.sun.star.xml.dom.DocumentBuilder")
>     domDoc = oDB.parse(oInpStream)
>     oInpStream.closeInput()
>
> This works for me almost perfectly, and I say almost, since there are some XML documents that it cannot read.
>
> The problem I am having is that some documents (generated by a third-party system which I cannot change) do not declare that they are XML documents, like this:
>
>     <?xml version="1.0" encoding="utf-8"?>
>
> They just start with the XML tags directly, like this:
>
>     <test><test2>...</test2></test>
>
> This is all fine; I have other XML documents that also look like this, and OpenOffice can read and parse them. However, these problematic documents use national characters (åæø) encoded as ISO-8859-1, and that is the problem. If they were encoded as UTF-8, OpenOffice could read the document without any encoding declaration, but with ISO-8859-1 the oDB.parse function just returns null: no errors, no exceptions, just null. If I manually add
>
>     <?xml version="1.0" encoding="iso-8859-1"?>
>
> at the start of that file, OpenOffice can read it perfectly.
>
> So is there some way I can force the DOM parser to use ISO-8859-1 instead of UTF-8? It would be great if I could write
>
>     domDoc = oDB.parse(oInpStream, "iso-8859-1")
>
> and have it work, but from what I can see there is no such function in the DocumentBuilder, nor is there anything like it in the input stream object or the SimpleFileAccess object.
>
> I should be able to get around this problem by programmatically making a copy of the file, inserting the <?xml ...?> part first, and then reading the modified file, but that is only a last-resort solution.