On Friday 20 Apr 2012 12:48:38 Marek Šurek wrote:
> Dear Ontotext team,
> first, congratulations on the final release of OWLIM 5.
> But back to the problem: I identified a problem with character encoding in
> this new release, OWLIM 5 beta3. The characters that are imported through a
> .ttl file are somehow damaged. A few characters are encoded correctly and
> others are not. I don't have the problem with OWLIM 4.2. When we moved to
> OWLIM 4.3 I had to add -Dfile.encoding=UTF8 to the Tomcat setenv.bat to get
> correct results. Now nothing helps. I'm running Windows 7 ENG, 64-bit.
> I tried to enforce UTF-8 encoding by declaring the XML header
> <?xml version="1.0" encoding="UTF-8"?>
>
> and I also tried using Notepad++ to convert the file to "UTF-8 without
> BOM", but nothing changed. When I converted it in Notepad++ to plain
> UTF-8, parsing crashed with this exception:
> org.openrdf.repository.RepositoryException: Failed to upload data:
> org.openrdf.repository.RepositoryException:
> org.openrdf.rio.RDFParseException: Content is not allowed in prolog. [line
> 1, column 1]
>
> I also changed the encoding of the files to UTF-8 and manually retyped the
> corrupted words in case there were hidden characters, but with no result.
> Then I thought that maybe only Notepad++ displays the files correctly and
> other text editors would show corrupted .owl files, but they interpreted
> them in the same way. I imported the .owl files into Protege and they looked
> good, so I tried to re-save them in Protege to remove possible hidden
> invalid characters, but with no result.
>
> I also tried removing -Dfile.encoding=UTF8 from the Tomcat config, changing
> the XML header to <?xml version="1.0"?> and repeating the whole thing, but
> nothing changed. Then I thought it could be caused by Windows, so I tried to
> run the whole thing on Linux (Red Hat), but the problem remains.
>
> Could you give me a hint as to where the problem might be?
> Best regards,
> Marek

Hi Marek,

I haven't been able to reproduce your issue, unfortunately. I tried loading a .ttl file containing Latin, Cyrillic and CJK characters, and they all seem fine. I suspect the problem might be with Protege incorrectly decoding the data from the Sesame server, or somehow mangling the text when sending it to the server. Could you try to load your file using the Sesame workbench, then query the repository through both the workbench and Protege to check?

--
Dimitar Toshev
_______________________________________________
Owlim-discussion mailing list
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/owlim-discussion
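
Before comparing what the workbench and Protege return, it may help to rule out the file itself by looking at its raw bytes. The Python sketch below is only an illustration and is not part of OWLIM, Sesame or Protege; the function names `inspect_encoding` and `strip_bom` are hypothetical. It assumes that the "Content is not allowed in prolog" error at line 1, column 1 comes from a byte order mark (BOM) ahead of the XML declaration, which is a common cause of that message but is not confirmed anywhere in this thread.

```python
import codecs


def inspect_encoding(data: bytes) -> dict:
    """Report whether raw file bytes start with a UTF-8 BOM and decode as UTF-8.

    Hypothetical helper for illustration only.
    """
    has_bom = data.startswith(codecs.BOM_UTF8)
    # Check the bytes that follow the BOM, if there is one.
    payload = data[len(codecs.BOM_UTF8):] if has_bom else data
    try:
        payload.decode("utf-8")
        valid_utf8 = True
        first_bad_offset = None
    except UnicodeDecodeError as exc:
        valid_utf8 = False
        # Offset (within the payload) of the first byte that is not valid UTF-8.
        first_bad_offset = exc.start
    return {
        "has_bom": has_bom,
        "valid_utf8": valid_utf8,
        "first_bad_offset": first_bad_offset,
    }


def strip_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

To use it, read the file in binary mode (for example `open(path, "rb").read()`, where `path` points at the .ttl or .owl file) and pass the bytes to `inspect_encoding`. If `valid_utf8` is false, the file was probably saved in a legacy code page, and `first_bad_offset` shows where to look. If both checks pass, the file is likely fine and the corruption is more likely introduced somewhere between the client and the server.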
