[Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

2017-03-29 Thread Beppe Mazzola
Hi

I want to use the JDBC driver in order to bulk load data into my Virtuoso
relational DB.
As first thing I would like to see the driver working fine inserting in the
DB non-Latin characters.
So I have used this Java code

import java.sql.Statement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class BulkLoader {

  public static Connection getConnection() throws SQLException {
try {
  Class.forName("virtuoso.jdbc4.Driver");
} catch (ClassNotFoundException e) {
  e.printStackTrace();
}
Connection conn = DriverManager.getConnection("jdbc:virtuoso://
127.0.0.1:/CHARSET=UTF-8/", "dba", "dba");
return conn;
  }

  public static void main(String[] args) {
Connection conn;
try {
  conn = getConnection();
  String sql = "INSERT INTO R2RML.TEST.PRODUCT (id,name) VALUES
(1,N'Спиротетрамат и неговият метаболит')";
  Statement st = conn.createStatement();
  int res = st.executeUpdate(sql);
} catch (SQLException e) {
  e.printStackTrace();
}
  }
}

to insert some text in a table with the following structure

create table R2RML.TEST.PRODUCT
(
  id   INTEGER,
  name NVARCHAR(100),
  PRIMARY KEY (id)
);

and I get the following result

[image: Immagine incorporata 1]

The text is still inserted incorrectly in the table.

I have removed the SQL_UTF8_EXECS setting from virtuoso.ini

;SQL_UTF8_EXECS   = 1

and used the Virtuoso driver virtjdbc4.jar

I have tried to prepare the statement before executing it, but nothing
changes.

Does anyone see something wrong or missing?

Thanks!

Cheers


Beppe

2017-03-23 17:05 GMT+00:00 Beppe Mazzola :

> Hi Sergey
>
> Thank you again for your feedback.
>
> I have noticed now that even the Virtuoso stored procedures
> DB.DBA.csv_register() and DB.DBA.csv_loader_run() are not compatible with
> unicode text. So it seems that my ETL module related to the data loading
> has to be reimplemented using the drivers you suggested, restoring the
> NVARCHAR columns in the RDB.
>
> Thanks again
>
> Cheers
>
> Beppe
>
>
>
> 2017-03-23 15:07 GMT+00:00 Sergey Malinin :
>
>> On 03/23/2017 12:45 AM, Beppe Mazzola wrote:
>>
>>> Thanks Sergey
>>>
>>> Your suggestion inspired me to do some more trials.
>>>
>>> Declaring the text column
>>>
>>> VARCHAR(3000) instead of NVARCHAR(1000)
>>>
>>> the problem disappears, but I have to increase the column size because
>>> somehow Virtuoso recognizes the text as non-Latin and uses UTF-8 that
>>> occupies more space (till 4 times more than the default encoding).
>>> I cannot declare my text columns VARCHAR, because I must be able to host
>>> non-Latin text till 4000 characters long. In order to do this I should
>>> declare the column
>>> VARCHAR(16000), whereas the maximum size is 4082. Unfortunately it seem
>>> the use of VARCHAR columns is not a solution for me...
>>>
>>> Yet as the Virtuoso documentation >> rtuoso/datatypes/>says that NVARCHAR is dedicated to Unicode:
>>>
>>> /"There are 3 additional data types to enable storing of Unicode data: /
>>>
>>>  */NCHAR/
>>>
>>>  **/NVARCHAR/*
>>>
>>>  */LONG NVARCHAR/
>>>
>>> /All the Unicode types are equivalent to their corresponding "narrow"
>>> type - CHAR, VARCHAR and LONG VARCHAR - except that instead of storing data
>>> as one byte they allow Unicode characters. Their lengths are defined and
>>> returned in characters instead of bytes."/
>>>
>>>
>>> The data in my project are loaded with an ETL process implemented using
>>> the Virtuoso bulk load feature (DB.DBA.csv_register() and
>>> DB.DBA.csv_loader_run() procedures).
>>>
>>> Can I still I use the ODBC / JDBC drivers to load my data into the RDB
>>> as somebody suggested?
>>>
>> [Sergey]
>> It is a bad idea to store UTF8 data in VARCHAR() column, it isn't
>> portable and you will need decode UTF-8 data by hand, if you get data from
>> table via JDBC/ODBC drivers.
>> 1) I think, that "SQL_UTF8_EXECS = 1" options will be usefull only, if
>> you want to create tables/columns with Non-Latin names like for example
>>   create table "таблица1" (...
>>
>> 2) If you need to store Unicode data in table, you must to use datatypes:
>> NCHAR, NVARCHAR, LONG NVARCHAR, NCLOB !!!
>> 3) The JDBC/ODBC drivers inserts Unicode data properly to datatypes above.
>>  For ODBC programs the ODBC procedures like SQLExecDirectW / SQLPrepareW
>> must be used for properly send Unicode data to server.
>> 4) If you want insert Unicode data to table or to use Unicode value in
>> WHERE, the N' prefix must be used.
>> Example to insert:
>> insert into table test values(1, N'АБВГД');
>>
>> 5) I write already, that there is a bug in the Virtuoso Conductor UI, so
>> it send Unicode data to server via wrong way.
>> NOW Conductor UI works properly only for select unicode data from tables.
>>
>> So try to use any tools that works via ODBC/JDBC/ADO.NET
>>  Virtuoso 

Re: [Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

2017-03-21 Thread Sergey Malinin
On 03/21/2017 04:42 AM, Beppe Mazzola wrote:
> Related to the problem I have about storing non-Latin character in Virtuoso 
> RDB, I see now that I haven't provided all the info related to it.
>
> So it was not evident that the Hungarian text written in the DB was not the 
> same read from it immediately after, despite the Virtuoso UTF-8 set-up has 
> been 
> performed
>
> I bring here again the test case (now simplified and completed) to reproduce 
> the problem.
>
> It has been performed with Conductor Interactive SQL.
>
It looks, that Conductor Interactive SQL does something wrong, when SQL 
insert(and etc)  queries are executed, it inserts wrong data to table.
But Conductor Interactive SQL shows unicode data properly, if it was inserted 
via ODBC/JDBC/ADO.NET .

Also note:
1) you don't need to set "SQL_UTF8_EXECS = 1" in virtuoso.ini 
http://docs.openlinksw.com/virtuoso/wideidentifiers/
for simple insert unicode data to columns.

2) for insert unicode values, the prefix N must be used
insert into test values(1, N'АБВГ')

3) try insert unicode data via ODBC/JDBC/ADO.NET drivers, it must be inserted 
properly.


>
> 1) log in as dba
>
> 2) Execute the following statements
>
> CREATE TABLE DB.test.HST_FTN
> (
>   FOOTNOTES_ID NUMERIC NOT NULL,
>   FOOTNOTE_TEXT NVARCHAR(1000) NOT NULL,
>   CONSTRAINT PK_V_SEM_MRL_HST_FTN PRIMARY KEY ( FOOTNOTES_ID )
> );
>
> INSERT INTO DB.test.HST_FTN
> (FOOTNOTES_ID,FOOTNOTE_TEXT)
> VALUES (4,'След тази дата МДГОВ ще бъде 0,05(*) mg/kg, освен ако не бъде 
> изменена с директива или регламент.');
>
> SELECT * FROM DB.test.HST_FTN ;
>
> 3) This is the result of the selectdisplayed, no longer the original text
>
> Query result:
> FOOTNOTES_ID
> DECIMAL   FOOTNOTE_TEXT
> NVARCHAR
>  4 След тази дата МДГОВ ще бъде 0,05(*) mg/kg, Ð¾Ñ 
> вен ако не бъде изменена Ñ Ð´Ð¸Ñ€ÐµÐºÑ‚Ð¸Ð²Ð° или 
> регламент.
> No. of rows in result: 1
>
> Here below again is the Virtuoso set up performed for UTF-8 .
>
>   * all the table columns containing non-Latin text declared as NVARCHAR(n)
>
>   * in virtuoso.ini file 
>
> SQL_UTF8_EXECS = 1 ... [HTTPServer] Charset= UTF-8  ; this is the 
> default value
> I do not know if there is still some set up missing or if it's a Virtuoso 
> issue. It's not clear to me as well whether the text is storied corrupted in 
> the DB 
> by the INSERT statement or it is somehow corrupted later, when Conductor 
> reads it from the DB and returns it to the browser.
> Any idea, hint are very welcome!
> Thanks Cheers
> Beppe
> 2017-02-27 10:21 GMT+00:00 Beppe Mazzola  >:
>
> Hi Hugh
> This should reproduce the problem. In my Virtuoso installation the 
> Bulgarian text inserted in the test table is different to the one displayed 
> as a result
> of the select.
> 1) Create a user with name test with the following setting.
> [picture removed]
> 2) log in as user test
> 3) Execute the following statements CREATE TABLE DB.test.HST_FTN (   
> FOOTNOTES_ID NUMERIC NOT NULL,   FOOTNOTE_TEXT NVARCHAR(1000) NOT NULL,   
> CONSTRAINT
> PK_V_SEM_MRL_HST_FTN PRIMARY KEY ( FOOTNOTES_ID ) );
> INSERT INTO DB.test.HST_FTN
> (FOOTNOTES_ID,FOOTNOTE_TEXT) VALUES (4,'След тази дата МДГОВ ще бъде 
> 0,05(*) mg/kg, освен ако не бъде изменена с директива или регламент.'); 
> SELECT * FROM
> DB.test.HST_FTN ;
> Cheers
> Beppe
> 2017-02-25 22:42 GMT+00:00 Hugh Williams  >:
>
> Hi Beppe,
> That should be all that is required on the server for Unicode / UTF-8 
> support.
> Do you have a test case for recreating the problem being observed, 
> which should show how the data is being inserted and the queried ?
> Best Regards
> Hugh Williams
> Professional Services
> OpenLink Software, Inc.  // http://www.openlinksw.com/
> Weblog   -- http://www.openlinksw.com/blogs/ 
> 
> LinkedIn -- http://www.linkedin.com/company/openlink-software/ 
> 
> Twitter  -- http://twitter.com/OpenLink
> Google+  -- http://plus.google.com/100570109519069333827/ 
> 
> Facebook -- http://www.facebook.com/OpenLinkSoftware 
> 
> Universal Data Access, Integration, and Management Technology 
> Providers
>> On 22 Feb 2017, at 14:20, Beppe Mazzola > > wrote:
>> Hi to all
>> In order to have Virtuoso relational DB able to correctly store the 
>> text with the UTF-8 encoding I have performed the following operations.
>>
>>   * all the columns with text declared as NVARCHAR(n)
>>   * in virtuoso.ini file
>>   

Re: [Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

2017-03-20 Thread Beppe Mazzola
Related to the problem I have about storing non-Latin character in Virtuoso
RDB, I see now that I haven't provided all the info related to it.

So it was not evident that the Hungarian text written in the DB was not the
same read from it immediately after, despite the Virtuoso UTF-8 set-up has
been performed

I bring here again the test case (now simplified and completed) to
reproduce the problem.

It has been performed with Conductor Interactive SQL.


1) log in as dba

2) Execute the following statements

CREATE TABLE DB.test.HST_FTN
(
  FOOTNOTES_ID NUMERIC NOT NULL,
  FOOTNOTE_TEXT NVARCHAR(1000) NOT NULL,
  CONSTRAINT PK_V_SEM_MRL_HST_FTN PRIMARY KEY ( FOOTNOTES_ID )
);

INSERT INTO DB.test.HST_FTN
(FOOTNOTES_ID,FOOTNOTE_TEXT)
VALUES (4,'След тази дата МДГОВ ще бъде 0,05(*) mg/kg, освен ако не бъде
изменена с директива или регламент.');

SELECT * FROM DB.test.HST_FTN ;

3) This is the result of the select displayed, no longer the original text

Query result:
FOOTNOTES_ID
DECIMAL FOOTNOTE_TEXT
NVARCHAR
 4  След тази дата МДГОВ ще бъде 0,05(*) mg/kg, оÑ
вен ако не бъде изменена Ñ Ð´Ð¸Ñ€ÐµÐºÑ‚Ð¸Ð²Ð° или
регламент.
No. of rows in result: 1


Here below again is the Virtuoso set up performed for UTF-8 .

   - all the table columns containing non-Latin text declared as NVARCHAR(n)


   - in virtuoso.ini file

SQL_UTF8_EXECS = 1
...
[HTTPServer]
Charset= UTF-8  ; this is the default value


I do not know if there is still some set up missing or if it's a Virtuoso
issue.

It's not clear to me as well whether the text is storied corrupted in the
DB by the INSERT statement or it is somehow corrupted later, when Conductor
reads it from the DB and returns it to the browser.

Any idea, hint are very welcome!

Thanks

Cheers

Beppe





2017-02-27 10:21 GMT+00:00 Beppe Mazzola :

> Hi Hugh
>
> This should reproduce the problem. In my Virtuoso installation the
> Bulgarian text inserted in the test table is different to the one displayed
> as a result of the select.
>
> 1) Create a user with name test with the following setting.
>
> [picture removed]
>
> 2) log in as user test
>
> 3) Execute the following statements
>
> CREATE TABLE DB.test.HST_FTN
> (
>   FOOTNOTES_ID NUMERIC NOT NULL,
>   FOOTNOTE_TEXT NVARCHAR(1000) NOT NULL,
>   CONSTRAINT PK_V_SEM_MRL_HST_FTN PRIMARY KEY ( FOOTNOTES_ID )
> );
>
> INSERT INTO DB.test.HST_FTN
> (FOOTNOTES_ID,FOOTNOTE_TEXT)
> VALUES (4,'След тази дата МДГОВ ще бъде 0,05(*) mg/kg, освен ако не бъде
> изменена с директива или регламент.');
>
> SELECT * FROM DB.test.HST_FTN ;
>
>
> Cheers
>
> Beppe
>
>
> 2017-02-25 22:42 GMT+00:00 Hugh Williams :
>
>> Hi Beppe,
>>
>> That should be all that is required on the server for Unicode / UTF-8
>> support.
>>
>> Do you have a test case for recreating the problem being observed, which
>> should show how the data is being inserted and the queried ?
>>
>> Best Regards
>> Hugh Williams
>> Professional Services
>> OpenLink Software, Inc.  //  http://www.openlinksw.com/
>> Weblog   -- http://www.openlinksw.com/blogs/
>> LinkedIn -- http://www.linkedin.com/company/openlink-software/
>> Twitter  -- http://twitter.com/OpenLink
>> Google+  -- http://plus.google.com/100570109519069333827/
>> Facebook -- http://www.facebook.com/OpenLinkSoftware
>> Universal Data Access, Integration, and Management Technology Providers
>>
>>
>>
>> On 22 Feb 2017, at 14:20, Beppe Mazzola  wrote:
>>
>> Hi to all
>>
>> In order to have Virtuoso relational DB able to correctly store the text
>> with the UTF-8 encoding I have performed the following operations.
>>
>>
>>- all the columns with text declared as NVARCHAR(n)
>>- in virtuoso.ini file
>>- SQL_UTF8_EXECS = 1
>>   - Charset   = UTF-8; this is the
>>   default value
>>
>> This is not enough, since I see that the non-Latin characters are
>> incorrectly stored in the DB yet.
>>
>> Do I miss any further set-up?
>>
>> I am using the following Virtuoso Open Source Version
>>
>> Virtuoso Open Source Edition (Column Store) (multi threaded)
>> Version 7.2.4.2.3217-pthreads as of Feb  3 2017
>> Compiled for Linux (x86_64-unknown-linux-gnu)
>>
>>
>> Thanks!
>>
>> Cheers
>>
>> Beppe
>> 
>> --
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, SlashDot.org !
>> http://sdm.link/slashdot___
>> Virtuoso-users mailing list
>> Virtuoso-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
>>
>>
>>
>
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! 

Re: [Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

2017-03-01 Thread Rob Vesse
Even if the database is configured for UTF-8 it still matters how you insert 
the data. I’ve had similar problems in the past where the issue was that the 
connection to the database was using a different encoding. If memory serves you 
have to explicitly set the character set on your ODBC/JDBC/ADO.net connection 
string to the appropriate character set as otherwise you can get incorrect 
encodings.

Rob

From: Beppe Mazzola <beppemazz...@gmail.com>
Date: Wednesday, 22 February 2017 14:20
To: "virtuoso-users@lists.sourceforge.net" 
<virtuoso-users@lists.sourceforge.net>
Subject: [Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

Hi to all
In order to have Virtuoso relational DB able to correctly store the text with 
the UTF-8 encoding I have performed the following operations.
· all the columns with text declared as NVARCHAR(n)
· in virtuoso.ini file
oSQL_UTF8_EXECS = 1
oCharset   = UTF-8; this is the default value
This is not enough, since I see that the non-Latin characters are incorrectly 
stored in the DB yet.
Do I miss any further set-up?
I am using the following Virtuoso Open Source Version

Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.4.2.3217-pthreads as of Feb  3 2017
Compiled for Linux (x86_64-unknown-linux-gnu)

Thanks!
Cheers
Beppe
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


Re: [Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

2017-02-25 Thread Hugh Williams
Hi Beppe,

That should be all that is required on the server for Unicode / UTF-8 support. 

Do you have a test case for recreating the problem being observed, which should 
show how the data is being inserted and the queried ?

Best Regards
Hugh Williams
Professional Services
OpenLink Software, Inc.  //  http://www.openlinksw.com/ 

Weblog   -- http://www.openlinksw.com/blogs/ 
LinkedIn -- http://www.linkedin.com/company/openlink-software/ 

Twitter  -- http://twitter.com/OpenLink 
Google+  -- http://plus.google.com/100570109519069333827/ 

Facebook -- http://www.facebook.com/OpenLinkSoftware 

Universal Data Access, Integration, and Management Technology Providers



> On 22 Feb 2017, at 14:20, Beppe Mazzola  > wrote:
> 
> Hi to all 
> 
> In order to have Virtuoso relational DB able to correctly store the text with 
> the UTF-8 encoding I have performed the following operations.
> 
> all the columns with text declared as NVARCHAR(n)
> in virtuoso.ini file 
> SQL_UTF8_EXECS = 1
> Charset   = UTF-8; this is the default value
> This is not enough, since I see that the non-Latin characters are incorrectly 
> stored in the DB yet.
> 
> Do I miss any further set-up?
> 
> I am using the following Virtuoso Open Source Version
> 
> Virtuoso Open Source Edition (Column Store) (multi threaded)
> Version 7.2.4.2.3217-pthreads as of Feb  3 2017
> Compiled for Linux (x86_64-unknown-linux-gnu)
> 
> 
> Thanks!
> 
> Cheers
> 
> Beppe
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, SlashDot.org ! 
> http://sdm.link/slashdot___ 
> 
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net 
> 
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users



smime.p7s
Description: S/MIME cryptographic signature
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users


[Virtuoso-users] Unable to setup UTF-8 encoding in the relational DB

2017-02-22 Thread Beppe Mazzola
Hi to all

In order to have Virtuoso relational DB able to correctly store the text
with the UTF-8 encoding I have performed the following operations.


   - all the columns with text declared as NVARCHAR(n)
   - in virtuoso.ini file
   - SQL_UTF8_EXECS = 1
  - Charset   = UTF-8; this is the default
  value

This is not enough, since I see that the non-Latin characters are
incorrectly stored in the DB yet.

Do I miss any further set-up?

I am using the following Virtuoso Open Source Version

Virtuoso Open Source Edition (Column Store) (multi threaded)
Version 7.2.4.2.3217-pthreads as of Feb  3 2017
Compiled for Linux (x86_64-unknown-linux-gnu)


Thanks!

Cheers

Beppe
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users