Hi
I want to use the JDBC driver in order to bulk load data into my Virtuoso
relational DB.
As first thing I would like to see the driver working fine inserting in the
DB non-Latin characters.
So I have used this Java code
import java.sql.Statement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
public class BulkLoader {
public static Connection getConnection() throws SQLException {
try {
Class.forName("virtuoso.jdbc4.Driver");
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
Connection conn = DriverManager.getConnection("jdbc:virtuoso://
127.0.0.1:1111/CHARSET=UTF-8/", "dba", "dba");
return conn;
}
public static void main(String[] args) {
Connection conn;
try {
conn = getConnection();
String sql = "INSERT INTO R2RML.TEST.PRODUCT (id,name) VALUES
(1,N'Спиротетрамат и неговият метаболит')";
Statement st = conn.createStatement();
int res = st.executeUpdate(sql);
} catch (SQLException e) {
e.printStackTrace();
}
}
}
to insert some text in a table with the following structure
create table R2RML.TEST.PRODUCT
(
id INTEGER,
name NVARCHAR(100),
PRIMARY KEY (id)
);
and I get the following result
[image: Immagine incorporata 1]
The text is still inserted incorrectly in the table.
I have removed the SQL_UTF8_EXECS setting from virtuoso.ini
;SQL_UTF8_EXECS = 1
and used the Virtuoso driver virtjdbc4.jar
I have tried to prepare the statement before executing it, but nothing
changes.
Does anyone see something wrong or missing?
Thanks!
Cheers
Beppe
2017-03-23 17:05 GMT+00:00 Beppe Mazzola <beppemazz...@gmail.com>:
> Hi Sergey
>
> Thank you again for your feedback.
>
> I have noticed now that even the Virtuoso stored procedures
> DB.DBA.csv_register() and DB.DBA.csv_loader_run() are not compatible with
> unicode text. So it seems that my ETL module related to the data loading
> has to be reimplemented using the drivers you suggested, restoring the
> NVARCHAR columns in the RDB.
>
> Thanks again
>
> Cheers
>
> Beppe
>
>
>
> 2017-03-23 15:07 GMT+00:00 Sergey Malinin <sergmali...@gmail.com>:
>
>> On 03/23/2017 12:45 AM, Beppe Mazzola wrote:
>>
>>> Thanks Sergey
>>>
>>> Your suggestion inspired me to do some more trials.
>>>
>>> Declaring the text column
>>>
>>> VARCHAR(3000) instead of NVARCHAR(1000)
>>>
>>> the problem disappears, but I have to increase the column size because
>>> somehow Virtuoso recognizes the text as non-Latin and uses UTF-8 that
>>> occupies more space (till 4 times more than the default encoding).
>>> I cannot declare my text columns VARCHAR, because I must be able to host
>>> non-Latin text till 4000 characters long. In order to do this I should
>>> declare the column
>>> VARCHAR(16000), whereas the maximum size is 4082. Unfortunately it seem
>>> the use of VARCHAR columns is not a solution for me...
>>>
>>> Yet as the Virtuoso documentation <http://docs.openlinksw.com/vi
>>> rtuoso/datatypes/>says that NVARCHAR is dedicated to Unicode:
>>>
>>> /"There are 3 additional data types to enable storing of Unicode data: /
>>>
>>> * /NCHAR/
>>>
>>> * */NVARCHAR/*
>>>
>>> * /LONG NVARCHAR/
>>>
>>> /All the Unicode types are equivalent to their corresponding "narrow"
>>> type - CHAR, VARCHAR and LONG VARCHAR - except that instead of storing data
>>> as one byte they allow Unicode characters. Their lengths are defined and
>>> returned in characters instead of bytes."/
>>>
>>>
>>> The data in my project are loaded with an ETL process implemented using
>>> the Virtuoso bulk load feature (DB.DBA.csv_register() and
>>> DB.DBA.csv_loader_run() procedures).
>>>
>>> Can I still I use the ODBC / JDBC drivers to load my data into the RDB
>>> as somebody suggested?
>>>
>> [Sergey]
>> It is a bad idea to store UTF8 data in VARCHAR() column, it isn't
>> portable and you will need decode UTF-8 data by hand, if you get data from
>> table via JDBC/ODBC drivers.
>> 1) I think, that "SQL_UTF8_EXECS = 1" options will be usefull only, if
>> you want to create tables/columns with Non-Latin names like for example
>> create table "таблица1" (...
>>
>> 2) If you need to store Unicode data in table, you must to use datatypes:
>> NCHAR, NVARCHAR, LONG NVARCHAR, NCLOB !!!
>> 3) The JDBC/ODBC drivers inserts Unicode data properly to datatypes above.
>> For ODBC programs the ODBC procedures like SQLExecDirectW / SQLPrepareW
>> must be used for properly send Unicode data to server.
>> 4) If you want insert Unicode data to table or to use Unicode value in
>> WHERE, the N' prefix must be used.
>> Example to insert:
>> insert into table test values(1, N'АБВГД');
>>
>> 5) I write already, that there is a bug in the Virtuoso Conductor UI, so
>> it send Unicode data to server via wrong way.
>> NOW Conductor UI works properly only for select unicode data from tables.
>>
>> So try to use any tools that works via ODBC/JDBC/ADO.NET
>> <http://ado.net/> Virtuoso drivers, it will works properly and you will
>> not have problems with use unicode data.
>> For example for JDBC => http://www.sql-workbench.net/downloads.html
>>
>>
>>> Cheers
>>>
>>> Beppe
>>>
>>>
>>> 2017-03-21 11:20 GMT+00:00 Sergey Malinin <sergmali...@gmail.com
>>> <mailto:sergmali...@gmail.com>>:
>>>
>>> On 03/21/2017 04:42 AM, Beppe Mazzola wrote:
>>>
>>> Related to the problem I have about storing non-Latin character
>>> in Virtuoso RDB, I see now that I haven't provided all the info related to
>>> it.
>>>
>>> So it was not evident that the Hungarian text written in the DB
>>> was not the same read from it immediately after, despite the Virtuoso UTF-8
>>> set-up has
>>> been performed
>>>
>>> I bring here again the test case (now simplified and completed)
>>> to reproduce the problem.
>>>
>>> It has been performed with Conductor Interactive SQL.
>>>
>>> It looks, that Conductor Interactive SQL does something wrong, when
>>> SQL insert(and etc) queries are executed, it inserts wrong data to table.
>>> But Conductor Interactive SQL shows unicode data properly, if it was
>>> inserted via ODBC/JDBC/ADO.NET <http://ado.net/> <http://ADO.NET
>>> <http://ado.net/>> .
>>>
>>> Also note:
>>> 1) you don't need to set "SQL_UTF8_EXECS = 1" in virtuoso.ini
>>> http://docs.openlinksw.com/virtuoso/wideidentifiers/
>>> <http://docs.openlinksw.com/virtuoso/wideidentifiers/>
>>> for simple insert unicode data to columns.
>>>
>>> 2) for insert unicode values, the prefix N must be used
>>> insert into test values(1, N'АБВГ')
>>>
>>> 3) try insert unicode data via ODBC/JDBC/ADO.NET <http://ado.net/> <
>>> http://ADO.NET <http://ado.net/>> drivers, it must be inserted properly.
>>
>>
> [...]
>>>
>>
> 3) This is the result of the selectdisplayed, no longer the
>>> original text
>>>
>>> Query result:
>>> FOOTNOTES_ID
>>> DECIMAL FOOTNOTE_TEXT
>>> NVARCHAR
>>> 4 След тази дата МДГОВ ще бъде
>>> 0,05(*) mg/kg, Ð¾Ñ Ð²ÐµÐ½ ако не бъде изменена Ñ
>>> директива
>>> или регламент.
>>> No. of rows in result: 1
>>>
>>> Here below again is the Virtuoso set up performed for UTF-8 .
>>>
>>> * all the table columns containing non-Latin text declared as
>>> NVARCHAR(n)
>>>
>>> * in virtuoso.ini file
>>> SQL_UTF8_EXECS = 1 ... [HTTPServer] Charset = UTF-8 ;
>>> this is the default value
>>> I do not know if there is still some set up missing or if it's a
>>> Virtuoso issue. It's not clear to me as well whether the text is storied
>>> corrupted in
>>> the DB by the INSERT statement or it is somehow corrupted later,
>>> when Conductor reads it from the DB and returns it to the browser.
>>> Any idea, hint are very welcome!
>>> Thanks Cheers
>>> Beppe
>>> 2017-02-27 10:21 GMT+00:00 Beppe Mazzola <beppemazz...@gmail.com
>>> <mailto:beppemazz...@gmail.com> <mailto:beppemazz...@gmail.com
>>> <mailto:beppemazz...@gmail.com>>>:
>>>
>>> Hi Hugh
>>> This should reproduce the problem. In my Virtuoso
>>> installation the Bulgarian text inserted in the test table is different to
>>> the one displayed as
>>> a result
>>> of the select.
>>> 1) Create a user with name test with the following setting.
>>> [picture removed]
>>> 2) log in as user test
>>> 3) Execute the following statements CREATE TABLE
>>> DB.test.HST_FTN ( FOOTNOTES_ID NUMERIC NOT NULL, FOOTNOTE_TEXT
>>> NVARCHAR(1000) NOT NULL, CONSTRAINT
>>> PK_V_SEM_MRL_HST_FTN PRIMARY KEY ( FOOTNOTES_ID ) );
>>> INSERT INTO DB.test.HST_FTN
>>> (FOOTNOTES_ID,FOOTNOTE_TEXT) VALUES (4,'След тази дата МДГОВ
>>> ще бъде 0,05(*) mg/kg, освен ако не бъде изменена с директива или
>>> регламент.');
>>> SELECT * FROM
>>> DB.test.HST_FTN ;
>>> Cheers
>>> Beppe
>>> 2017-02-25 22:42 GMT+00:00 Hugh Williams <
>>> hwilli...@openlinksw.com <mailto:hwilli...@openlinksw.com> <mailto:
>>> hwilli...@openlinksw.com
>>> <mailto:hwilli...@openlinksw.com>>>:
>>>
>>> Hi Beppe,
>>> That should be all that is required on the server for
>>> Unicode / UTF-8 support.
>>> Do you have a test case for recreating the problem being
>>> observed, which should show how the data is being inserted and the queried ?
>>> Best Regards
>>> Hugh Williams
>>>
>>>
> On 22 Feb 2017, at 14:20, Beppe Mazzola <
>>> beppemazz...@gmail.com>
>>> Hi to all
>>> In order to have Virtuoso relational DB able to
>>> correctly store the text with the UTF-8 encoding I have performed the
>>> following operations.
>>>
>>> * all the columns with text declared as NVARCHAR(n)
>>> * in virtuoso.ini file
>>> o SQL_UTF8_EXECS = 1
>>> o Charset = UTF-8
>>> ; this is the default value
>>> This is not enough, since I see that the non-Latin
>>> characters are incorrectly stored in the DB yet.
>>> Do I miss any further set-up?
>>> I am using the following Virtuoso Open Source
>>> Version Virtuoso Open Source Edition (Column Store) (multi threaded) Version
>>> 7.2.4.2.3217-pthreads as
>>> of Feb 3 2017 Compiled for Linux
>>> (x86_64-unknown-linux-gnu)
>>> Thanks!
>>> Cheers
>>> Beppe
>>>
>>>
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users