Hi

I want to use the JDBC driver in order to bulk load data into my Virtuoso
relational DB.
As first thing I would like to see the driver working fine inserting in the
DB non-Latin characters.
So I have used this Java code

import java.sql.Statement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class BulkLoader {

  public static Connection getConnection() throws SQLException {
    try {
      Class.forName("virtuoso.jdbc4.Driver");
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
    }
    Connection conn = DriverManager.getConnection("jdbc:virtuoso://
127.0.0.1:1111/CHARSET=UTF-8/", "dba", "dba");
    return conn;
  }

  public static void main(String[] args) {
    Connection conn;
    try {
      conn = getConnection();
      String sql = "INSERT INTO R2RML.TEST.PRODUCT (id,name) VALUES
(1,N'Спиротетрамат и неговият метаболит')";
      Statement st = conn.createStatement();
      int res = st.executeUpdate(sql);
    } catch (SQLException e) {
      e.printStackTrace();
    }
  }
}

to insert some text in a table with the following structure

create table R2RML.TEST.PRODUCT
(
  id   INTEGER,
  name NVARCHAR(100),
  PRIMARY KEY (id)
);

and I get the following result

[image: Immagine incorporata 1]

The text is still inserted incorrectly in the table.

I have removed the SQL_UTF8_EXECS setting from virtuoso.ini

;SQL_UTF8_EXECS   = 1

and used the Virtuoso driver virtjdbc4.jar

I have tried to prepare the statement before executing it, but nothing
changes.

Does anyone see something wrong or missing?

Thanks!

Cheers


Beppe

2017-03-23 17:05 GMT+00:00 Beppe Mazzola <beppemazz...@gmail.com>:

> Hi Sergey
>
> Thank you again for your feedback.
>
> I have noticed now that even the Virtuoso stored procedures
> DB.DBA.csv_register() and DB.DBA.csv_loader_run() are not compatible with
> unicode text. So it seems that my ETL module related to the data loading
> has to be reimplemented using the drivers you suggested, restoring the
> NVARCHAR columns in the RDB.
>
> Thanks again
>
> Cheers
>
> Beppe
>
>
>
> 2017-03-23 15:07 GMT+00:00 Sergey Malinin <sergmali...@gmail.com>:
>
>> On 03/23/2017 12:45 AM, Beppe Mazzola wrote:
>>
>>> Thanks Sergey
>>>
>>> Your suggestion inspired me to do some more trials.
>>>
>>> Declaring the text column
>>>
>>> VARCHAR(3000) instead of NVARCHAR(1000)
>>>
>>> the problem disappears, but I have to increase the column size because
>>> somehow Virtuoso recognizes the text as non-Latin and uses UTF-8 that
>>> occupies more space (till 4 times more than the default encoding).
>>> I cannot declare my text columns VARCHAR, because I must be able to host
>>> non-Latin text till 4000 characters long. In order to do this I should
>>> declare the column
>>> VARCHAR(16000), whereas the maximum size is 4082. Unfortunately it seem
>>> the use of VARCHAR columns is not a solution for me...
>>>
>>> Yet as the Virtuoso documentation <http://docs.openlinksw.com/vi
>>> rtuoso/datatypes/>says that NVARCHAR is dedicated to Unicode:
>>>
>>> /"There are 3 additional data types to enable storing of Unicode data: /
>>>
>>>  *    /NCHAR/
>>>
>>>  *    */NVARCHAR/*
>>>
>>>  *    /LONG NVARCHAR/
>>>
>>> /All the Unicode types are equivalent to their corresponding "narrow"
>>> type - CHAR, VARCHAR and LONG VARCHAR - except that instead of storing data
>>> as one byte they allow Unicode characters. Their lengths are defined and
>>> returned in characters instead of bytes."/
>>>
>>>
>>> The data in my project are loaded with an ETL process implemented using
>>> the Virtuoso bulk load feature (DB.DBA.csv_register() and
>>> DB.DBA.csv_loader_run() procedures).
>>>
>>> Can I still I use the ODBC / JDBC drivers to load my data into the RDB
>>> as somebody suggested?
>>>
>> [Sergey]
>> It is a bad idea to store UTF8 data in VARCHAR() column, it isn't
>> portable and you will need decode UTF-8 data by hand, if you get data from
>> table via JDBC/ODBC drivers.
>> 1) I think, that "SQL_UTF8_EXECS = 1" options will be usefull only, if
>> you want to create tables/columns with Non-Latin names like for example
>>   create table "таблица1" (...
>>
>> 2) If you need to store Unicode data in table, you must to use datatypes:
>> NCHAR, NVARCHAR, LONG NVARCHAR, NCLOB !!!
>> 3) The JDBC/ODBC drivers inserts Unicode data properly to datatypes above.
>>  For ODBC programs the ODBC procedures like SQLExecDirectW / SQLPrepareW
>> must be used for properly send Unicode data to server.
>> 4) If you want insert Unicode data to table or to use Unicode value in
>> WHERE, the N' prefix must be used.
>> Example to insert:
>>     insert into table test values(1, N'АБВГД');
>>
>> 5) I write already, that there is a bug in the Virtuoso Conductor UI, so
>> it send Unicode data to server via wrong way.
>> NOW Conductor UI works properly only for select unicode data from tables.
>>
>> So try to use any tools that works via ODBC/JDBC/ADO.NET
>> <http://ado.net/> Virtuoso drivers, it will works properly and you will
>> not have problems with use unicode data.
>> For example for JDBC => http://www.sql-workbench.net/downloads.html
>>
>>
>>> Cheers
>>>
>>> Beppe
>>>
>>>
>>> 2017-03-21 11:20 GMT+00:00 Sergey Malinin <sergmali...@gmail.com
>>>  <mailto:sergmali...@gmail.com>>:
>>>
>>>     On 03/21/2017 04:42 AM, Beppe Mazzola wrote:
>>>
>>>         Related to the problem I have about storing non-Latin character
>>> in Virtuoso RDB, I see now that I haven't provided all the info related to
>>> it.
>>>
>>>         So it was not evident that the Hungarian text written in the DB
>>> was not the same read from it immediately after, despite the Virtuoso UTF-8
>>> set-up has
>>>         been performed
>>>
>>>         I bring here again the test case (now simplified and completed)
>>> to reproduce the problem.
>>>
>>>         It has been performed with Conductor Interactive SQL.
>>>
>>>     It looks, that Conductor Interactive SQL does something wrong, when
>>> SQL insert(and etc)  queries are executed, it inserts wrong data to table.
>>>     But Conductor Interactive SQL shows unicode data properly, if it was
>>> inserted via ODBC/JDBC/ADO.NET <http://ado.net/> <http://ADO.NET
>>> <http://ado.net/>> .
>>>
>>>     Also note:
>>>     1) you don't need to set "SQL_UTF8_EXECS = 1" in virtuoso.ini
>>> http://docs.openlinksw.com/virtuoso/wideidentifiers/
>>>     <http://docs.openlinksw.com/virtuoso/wideidentifiers/>
>>>     for simple insert unicode data to columns.
>>>
>>>     2) for insert unicode values, the prefix N must be used
>>>        insert into test values(1, N'АБВГ')
>>>
>>>     3) try insert unicode data via ODBC/JDBC/ADO.NET <http://ado.net/> <
>>> http://ADO.NET <http://ado.net/>> drivers, it must be inserted properly.
>>
>>

>          [...]
>>>
>>

>        3) This is the result of the selectdisplayed, no longer the
>>> original text
>>>
>>>         Query result:
>>>         FOOTNOTES_ID
>>>         DECIMAL         FOOTNOTE_TEXT
>>>         NVARCHAR
>>>          4       След тази дата МДГОВ ще бъде
>>> 0,05(*) mg/kg, Ð¾Ñ Ð²ÐµÐ½ ако не бъде изменена Ñ
>>> директива
>>>         или регламент.
>>>         No. of rows in result: 1
>>>
>>>         Here below again is the Virtuoso set up performed for UTF-8 .
>>>
>>>           * all the table columns containing non-Latin text declared as
>>> NVARCHAR(n)
>>>
>>>           * in virtuoso.ini file
>>>         SQL_UTF8_EXECS = 1 ... [HTTPServer] Charset        = UTF-8  ;
>>> this is the default value
>>>         I do not know if there is still some set up missing or if it's a
>>> Virtuoso issue. It's not clear to me as well whether the text is storied
>>> corrupted in
>>>         the DB by the INSERT statement or it is somehow corrupted later,
>>> when Conductor reads it from the DB and returns it to the browser.
>>>         Any idea, hint are very welcome!
>>>         Thanks Cheers
>>>         Beppe
>>>         2017-02-27 10:21 GMT+00:00 Beppe Mazzola <beppemazz...@gmail.com
>>>  <mailto:beppemazz...@gmail.com> <mailto:beppemazz...@gmail.com
>>>         <mailto:beppemazz...@gmail.com>>>:
>>>
>>>             Hi Hugh
>>>             This should reproduce the problem. In my Virtuoso
>>> installation the Bulgarian text inserted in the test table is different to
>>> the one displayed as
>>>         a result
>>>             of the select.
>>>             1) Create a user with name test with the following setting.
>>>             [picture removed]
>>>             2) log in as user test
>>>             3) Execute the following statements CREATE TABLE
>>> DB.test.HST_FTN (   FOOTNOTES_ID NUMERIC NOT NULL,  FOOTNOTE_TEXT
>>> NVARCHAR(1000) NOT NULL,          CONSTRAINT
>>>             PK_V_SEM_MRL_HST_FTN PRIMARY KEY ( FOOTNOTES_ID ) );
>>>             INSERT INTO DB.test.HST_FTN
>>>             (FOOTNOTES_ID,FOOTNOTE_TEXT) VALUES (4,'След тази дата МДГОВ
>>> ще бъде 0,05(*) mg/kg, освен ако не бъде изменена с директива или
>>> регламент.');
>>>         SELECT * FROM
>>>             DB.test.HST_FTN ;
>>>             Cheers
>>>             Beppe
>>>             2017-02-25 22:42 GMT+00:00 Hugh Williams <
>>> hwilli...@openlinksw.com <mailto:hwilli...@openlinksw.com> <mailto:
>>> hwilli...@openlinksw.com
>>>         <mailto:hwilli...@openlinksw.com>>>:
>>>
>>>                 Hi Beppe,
>>>                 That should be all that is required on the server for
>>> Unicode / UTF-8 support.
>>>                 Do you have a test case for recreating the problem being
>>> observed, which should show how the data is being inserted and the queried ?
>>>                 Best Regards
>>>                 Hugh Williams
>>>
>>>

>                     On 22 Feb 2017, at 14:20, Beppe Mazzola <
>>> beppemazz...@gmail.com>
>>>                     Hi to all
>>>                     In order to have Virtuoso relational DB able to
>>> correctly store the text with the UTF-8 encoding I have performed the
>>> following operations.
>>>
>>>                       * all the columns with text declared as NVARCHAR(n)
>>>                       * in virtuoso.ini file
>>>                           o SQL_UTF8_EXECS     = 1
>>>                           o Charset                       = UTF-8
>>>  ; this is the default value
>>>                     This is not enough, since I see that the non-Latin
>>> characters are incorrectly stored in the DB yet.
>>>                     Do I miss any further set-up?
>>>                     I am using the following Virtuoso Open Source
>>> Version Virtuoso Open Source Edition (Column Store) (multi threaded) Version
>>>             7.2.4.2.3217-pthreads as
>>>                     of Feb  3 2017 Compiled for Linux
>>> (x86_64-unknown-linux-gnu)
>>>                     Thanks!
>>>                     Cheers
>>>                     Beppe
>>>
>>>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users

Reply via email to