[jira] [Assigned] (IGNITE-8386) SQL: Make sure PK index do not use wrapped object

2018-06-04 Thread Alexander Paschenko (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-8386:
---

Assignee: Alexander Paschenko

> SQL: Make sure PK index do not use wrapped object
> -
>
> Key: IGNITE-8386
> URL: https://issues.apache.org/jira/browse/IGNITE-8386
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.4
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
>  Labels: iep-19, performance
> Fix For: 2.6
>
>
> Currently PK may be built over the whole {{_KEY}} column, i.e. the whole 
> binary object. This could happen in two cases:
> 1) Composite PK
> 2) Plain PK but with {{WRAP_KEY}} option.
> This is a critical performance issue for two reasons:
> 1) This index is effectively useless and cannot be used in any sensible 
> queries; it just wastes space and makes updates slower.
> 2) Binary objects typically share common header bytes, which may lead to an 
> excessive number of comparisons during index update.
> To mitigate the problem we need to ensure that the index is *never* built over 
> {{_KEY}}. Instead, we must always extract the target columns and build a normal 
> index over them.
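The header-comparison cost can be seen in a toy model (illustrative sketch only, not Ignite code; the layout and all names here are invented): every wrapped key shares the same header bytes, so a byte-wise comparison over the whole binary object must wade through the header on every call, while a comparison over the extracted column is a single primitive compare.

```java
import java.util.Arrays;

public class WrappedKeyCompare {
    /** Fake binary layout: 8 identical "header" bytes followed by a 4-byte payload. */
    static byte[] wrap(int id) {
        byte[] buf = new byte[12];
        Arrays.fill(buf, 0, 8, (byte) 0x7F);        // common header, same for every key
        buf[8]  = (byte) (id >>> 24);
        buf[9]  = (byte) (id >>> 16);
        buf[10] = (byte) (id >>> 8);
        buf[11] = (byte) id;
        return buf;
    }

    /** Byte-wise comparison over the whole wrapped object: always scans past the header. */
    static int compareWrapped(byte[] a, byte[] b) {
        return Arrays.compare(a, b);
    }

    /** Comparison over the extracted column: one primitive compare, no header scan. */
    static int compareExtracted(int idA, int idB) {
        return Integer.compare(idA, idB);
    }

    public static void main(String[] args) {
        System.out.println(compareWrapped(wrap(1), wrap(2)));   // scans 8 equal header bytes first
        System.out.println(compareExtracted(1, 2));             // direct field compare
    }
}
```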



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-8384) SQL: Secondary indexes should sort entries by links rather than keys

2018-05-14 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-8384:
---

Assignee: Alexander Paschenko

> SQL: Secondary indexes should sort entries by links rather than keys
> 
>
> Key: IGNITE-8384
> URL: https://issues.apache.org/jira/browse/IGNITE-8384
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.4
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
>  Labels: iep-19, performance
> Fix For: 2.6
>
>
> Currently we sort entries in secondary indexes as {{(idx_cols, KEY)}}. In the 
> general case the key itself is not stored in the index, which means that we 
> need to perform a lookup to the data page to find the correct insertion point 
> for an index entry.
> This could be fixed easily by sorting entries a bit differently - {{(idx_cols, 
> link)}}. This is all we need.
> UPD: If we have affinity keys, then the affinity column will be added to the 
> secondary index as well. 
>  So, we'll have the secondary index as {{(idx_cols, KEY, AFF_COL)}}
> Comparison occurs here: 
> {{org.apache.ignite.internal.processors.query.h2.database.H2Tree#compare}}
> What we need is to avoid adding PK and affinity key columns to every 
> secondary index and compare links instead in this method.
> Probably we need to preserve the old behavior for compatibility purposes.
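The proposed ordering can be sketched with a hypothetical row type (invented names, not Ignite's actual H2Tree classes): ties on the indexed columns are broken by the link, which is stored with every index entry, rather than by the key, which would require dereferencing a data page.

```java
import java.util.Comparator;

/** Toy index row: one indexed long column plus a link (page pointer) to the full row. */
public class LinkOrderedRow {
    final long idxCol;  // the indexed column value (a single long column for brevity)
    final long link;    // reference to the data page holding the full row

    LinkOrderedRow(long idxCol, long link) {
        this.idxCol = idxCol;
        this.link = link;
    }

    /** Order by (idx_cols, link): the tie-break needs no data-page lookup. */
    static final Comparator<LinkOrderedRow> ORDER =
        Comparator.<LinkOrderedRow>comparingLong(r -> r.idxCol)  // indexed columns first
                  .thenComparingLong(r -> r.link);               // then link, never the key
}
```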





[jira] [Updated] (IGNITE-6937) SQL TX: Support SELECT FOR UPDATE

2018-05-02 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-6937:

Attachment: Снимок экрана 2018-05-02 в 11.51.03.png

> SQL TX: Support SELECT FOR UPDATE
> -
>
> Key: IGNITE-6937
> URL: https://issues.apache.org/jira/browse/IGNITE-6937
> Project: Ignite
>  Issue Type: Task
>  Components: cache, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
>  Labels: iep-3
> Fix For: 2.6
>
> Attachments: Снимок экрана 2018-05-02 в 11.51.03.png
>
>
> Normally in the SQL world readers do not block writers. This is how our SELECT 
> operations should work by default. But we need to add support for the {{SELECT 
> ... FOR UPDATE}} read mode, where the reader obtains an exclusive lock on read. 
> In this mode we lock entries as usual, and then send the data back to the 
> caller. The first page can be returned directly in our {{LockResponse}}; next 
> pages should be requested in separate requests. With this approach {{SELECT ... 
> FOR UPDATE}} will require only a single round-trip to both lock and read data 
> in case of small result sets.
> Update {{SELECT}} statement syntax once this feature is supported:
> https://apacheignite-sql.readme.io/docs/select
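The single-round-trip flow above can be modeled with a toy sketch (all names here are invented, not the actual protocol classes): the lock response already carries the first page of locked rows, and a flag tells the client whether follow-up page requests are needed.

```java
import java.util.List;

public class ForUpdateFlow {
    static final int PAGE_SIZE = 2;

    /** Toy stand-in for the lock response: first page of rows plus a "more pages" flag. */
    static class LockResponse {
        final List<String> firstPage;
        final boolean hasMore;   // true => client must send separate page requests

        LockResponse(List<String> firstPage, boolean hasMore) {
            this.firstPage = firstPage;
            this.hasMore = hasMore;
        }
    }

    /** "Server side": lock the rows, then piggyback the first page on the response. */
    static LockResponse lockAndRead(List<String> lockedRows) {
        List<String> first = lockedRows.subList(0, Math.min(PAGE_SIZE, lockedRows.size()));
        return new LockResponse(first, lockedRows.size() > PAGE_SIZE);
    }
}
```

For a result set that fits into one page, the caller gets both the locks and the data from the single response; only larger results pay for extra round-trips.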





[jira] [Created] (IGNITE-8204) Tests hang with lazy query flag on

2018-04-10 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-8204:
---

 Summary: Tests hang with lazy query flag on
 Key: IGNITE-8204
 URL: https://issues.apache.org/jira/browse/IGNITE-8204
 Project: Ignite
  Issue Type: Bug
  Components: sql
Reporter: Alexander Paschenko
Assignee: Alexander Paschenko
 Fix For: 2.5


Affected tests:

{{IgniteCacheClientQueryReplicatedNodeRestartSelfTest.testRestart}}

{{IgniteCacheDistributedPartitionQueryNodeRestartsSelfTest.testJoinQueryUnstableTopology}}





[jira] [Commented] (IGNITE-4193) SQL TX: ODBC driver support

2018-04-03 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423832#comment-16423832
 ] 

Alexander Paschenko commented on IGNITE-4193:
-

[~isapego] the only minor thing to fix is the class javadoc for 
{{OdbcRequestHandlerWorker}} - it mentions JDBC, which clearly needs fixing.

> SQL TX: ODBC driver support
> ---
>
> Key: IGNITE-4193
> URL: https://issues.apache.org/jira/browse/IGNITE-4193
> Project: Ignite
>  Issue Type: Task
>  Components: odbc
>Reporter: Denis Magda
>Assignee: Igor Sapego
>Priority: Major
>  Labels: iep-3
> Fix For: 2.5
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from ODBC driver side.





[jira] [Commented] (IGNITE-4193) SQL TX: ODBC driver support

2018-04-03 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423820#comment-16423820
 ] 

Alexander Paschenko commented on IGNITE-4193:
-

[~isapego] please disregard my previous comment; it seems the code duplication 
is an intentional pattern in the current JDBC/ODBC state of things, so as not to 
couple those together.

> SQL TX: ODBC driver support
> ---
>
> Key: IGNITE-4193
> URL: https://issues.apache.org/jira/browse/IGNITE-4193
> Project: Ignite
>  Issue Type: Task
>  Components: odbc
>Reporter: Denis Magda
>Assignee: Igor Sapego
>Priority: Major
>  Labels: iep-3
> Fix For: 2.5
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from ODBC driver side.





[jira] [Commented] (IGNITE-4193) SQL TX: ODBC driver support

2018-04-03 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423818#comment-16423818
 ] 

Alexander Paschenko commented on IGNITE-4193:
-

[~isapego] thanks for the patch, my sole comment would be about 
{{OdbcRequestHandlerWorker}} - it's an almost complete copy-paste of 
{{JdbcRequestHandlerWorker}}, and JDBC is even still mentioned in the comments. 
Can we generify this piece? Probably we'll have to extract some auxiliary 
interface with a {{doHandle}} method to achieve this.

> SQL TX: ODBC driver support
> ---
>
> Key: IGNITE-4193
> URL: https://issues.apache.org/jira/browse/IGNITE-4193
> Project: Ignite
>  Issue Type: Task
>  Components: odbc
>Reporter: Denis Magda
>Assignee: Igor Sapego
>Priority: Major
>  Labels: iep-3
> Fix For: 2.5
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from ODBC driver side.





[jira] [Assigned] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off

2018-03-27 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-8050:
---

Assignee: Alexander Paschenko

> Throw a meaningful exception when user issues TX SQL keyword with MVCC turned 
> off
> -
>
> Key: IGNITE-8050
> URL: https://issues.apache.org/jira/browse/IGNITE-8050
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> An exception must be thrown when the user issues a TX SQL command (BEGIN, 
> COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and 
> can lead to the SQL engine behaving quite differently from what the user 
> expects, esp. in terms of data consistency.





[jira] [Updated] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off

2018-03-27 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-8050:

Component/s: jdbc

> Throw a meaningful exception when user issues TX SQL keyword with MVCC turned 
> off
> -
>
> Key: IGNITE-8050
> URL: https://issues.apache.org/jira/browse/IGNITE-8050
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> An exception must be thrown when the user issues a TX SQL command (BEGIN, 
> COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and 
> can lead to the SQL engine behaving quite differently from what the user 
> expects, esp. in terms of data consistency.





[jira] [Updated] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off

2018-03-27 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-8050:

Component/s: sql

> Throw a meaningful exception when user issues TX SQL keyword with MVCC turned 
> off
> -
>
> Key: IGNITE-8050
> URL: https://issues.apache.org/jira/browse/IGNITE-8050
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> An exception must be thrown when the user issues a TX SQL command (BEGIN, 
> COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and 
> can lead to the SQL engine behaving quite differently from what the user 
> expects, esp. in terms of data consistency.





[jira] [Created] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off

2018-03-26 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-8050:
---

 Summary: Throw a meaningful exception when user issues TX SQL 
keyword with MVCC turned off
 Key: IGNITE-8050
 URL: https://issues.apache.org/jira/browse/IGNITE-8050
 Project: Ignite
  Issue Type: Task
Reporter: Alexander Paschenko
 Fix For: 2.5


An exception must be thrown when the user issues a TX SQL command (BEGIN, COMMIT, 
ROLLBACK) in the absence of MVCC - ignoring these may be confusing and can lead 
to the SQL engine behaving quite differently from what the user expects, esp. in 
terms of data consistency.





[jira] [Created] (IGNITE-8016) Make sure checks in QueryUtils.typeForQueryEntity are sane

2018-03-22 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-8016:
---

 Summary: Make sure checks in QueryUtils.typeForQueryEntity are sane
 Key: IGNITE-8016
 URL: https://issues.apache.org/jira/browse/IGNITE-8016
 Project: Ignite
  Issue Type: Task
  Components: cache, sql
Reporter: Alexander Paschenko


If one uses {{AffinityUuid}} (and thus, most likely, any {{AffinityKey}} at 
all) as the cache key in a {{QueryEntity}} while not having an actual class for 
the value type, the cache fails to start with the error

 

{{Failed to find value class in the node classpath (use default marshaller to 
enable binary objects)}}

 

Not only is the error message misleading (as {{BinaryMarshaller}} is always on 
now), but the checks themselves must be verified for sanity: ruling them out 
allowed the cache to start, data to be put into it via the cache API, and a 
working SQL query to be performed.





[jira] [Created] (IGNITE-7956) MVCC TX: cache eviction operations for key-value API

2018-03-14 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7956:
---

 Summary: MVCC TX: cache eviction operations for key-value API
 Key: IGNITE-7956
 URL: https://issues.apache.org/jira/browse/IGNITE-7956
 Project: Ignite
  Issue Type: Task
  Components: cache
Reporter: Alexander Paschenko
 Fix For: 2.5


We need to implement MVCC-compatible cache eviction operations for key-value 
API.





[jira] [Created] (IGNITE-7955) MVCC TX: cache peek for key-value API

2018-03-14 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7955:
---

 Summary: MVCC TX: cache peek for key-value API
 Key: IGNITE-7955
 URL: https://issues.apache.org/jira/browse/IGNITE-7955
 Project: Ignite
  Issue Type: Task
  Components: cache
Reporter: Alexander Paschenko
 Fix For: 2.5


We need to implement MVCC-compatible cache peek operation for key-value API.





[jira] [Created] (IGNITE-7954) MVCC TX: cache load routines for key-value API

2018-03-14 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7954:
---

 Summary: MVCC TX: cache load routines for key-value API
 Key: IGNITE-7954
 URL: https://issues.apache.org/jira/browse/IGNITE-7954
 Project: Ignite
  Issue Type: Task
  Components: cache
Reporter: Alexander Paschenko
 Fix For: 2.5


We need to implement MVCC-compatible cache load operations for key-value API.





[jira] [Created] (IGNITE-7953) MVCC TX: continuous queries

2018-03-14 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7953:
---

 Summary: MVCC TX: continuous queries
 Key: IGNITE-7953
 URL: https://issues.apache.org/jira/browse/IGNITE-7953
 Project: Ignite
  Issue Type: Task
  Components: cache
Reporter: Alexander Paschenko
 Fix For: 2.5


We need to implement MVCC-compatible continuous queries.





[jira] [Updated] (IGNITE-7952) MVCC TX: cache clear routines for key-value API

2018-03-14 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-7952:

Description: We need to implement MVCC-compatible cache clear operations 
for Key-Value API.

> MVCC TX: cache clear routines for key-value API
> ---
>
> Key: IGNITE-7952
> URL: https://issues.apache.org/jira/browse/IGNITE-7952
> Project: Ignite
>  Issue Type: Task
>  Components: cache
>Reporter: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> We need to implement MVCC-compatible cache clear operations for Key-Value API.





[jira] [Created] (IGNITE-7952) MVCC TX: cache clear routines for key-value API

2018-03-14 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7952:
---

 Summary: MVCC TX: cache clear routines for key-value API
 Key: IGNITE-7952
 URL: https://issues.apache.org/jira/browse/IGNITE-7952
 Project: Ignite
  Issue Type: Task
  Components: cache
Reporter: Alexander Paschenko
 Fix For: 2.5








[jira] [Created] (IGNITE-7942) Document streaming in thin JDBC driver

2018-03-14 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7942:
---

 Summary: Document streaming in thin JDBC driver
 Key: IGNITE-7942
 URL: https://issues.apache.org/jira/browse/IGNITE-7942
 Project: Ignite
  Issue Type: Task
  Components: documentation, jdbc, sql
Reporter: Alexander Paschenko
 Fix For: 2.5








[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode

2018-03-13 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396829#comment-16396829
 ] 

Alexander Paschenko commented on IGNITE-7253:
-

[~vozerov],

all done. Regarding pt. 6 - there's a test 
{{JdbcStreamingSelfTest#testOnlyInsertsAllowed}} that passes for both the thin 
and thick drivers.

> JDBC thin driver: introduce streaming mode
> --
>
> Key: IGNITE-7253
> URL: https://issues.apache.org/jira/browse/IGNITE-7253
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
> Attachments: IGNITE_7253.patch
>
>
> Should be done after IGNITE-6022. We should allow optional streaming mode for 
> JDBC driver. In this mode only INSERTs without SELECT should be possible. All 
> other DML operations should throw an exception. 
> Design considerations:
> 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable 
> streaming for connection.
> 2) Add command {{STREAMER FLUSH}} which will force data flush.
> 3) Only INSERT without SELECT works, all other DML statements should throw an 
> exception
> 4) It should be possible to stream into several tables simultaneously (i.e. 
> several streamers could be opened)
> 5) Any DDL statement should force flush of all currently opened streamers.
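Point 3 of the design above can be sketched as a tiny statement gate (a hypothetical helper, not the actual driver code; a real implementation would rely on proper SQL parsing rather than this string check):

```java
import java.util.Locale;

public class StreamingGate {
    /**
     * Rejects everything except a plain INSERT while streaming mode is on:
     * INSERT ... SELECT and all other DML statements throw.
     */
    static void check(String sql) {
        String norm = sql.trim().toUpperCase(Locale.ROOT);
        boolean allowed = norm.startsWith("INSERT") && !norm.contains("SELECT");
        if (!allowed)
            throw new IllegalStateException(
                "Only INSERT without SELECT is allowed in streaming mode: " + sql);
    }
}
```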





[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode

2018-03-02 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384100#comment-16384100
 ] 

Alexander Paschenko commented on IGNITE-7253:
-

[~vozerov], all fixed.

> JDBC thin driver: introduce streaming mode
> --
>
> Key: IGNITE-7253
> URL: https://issues.apache.org/jira/browse/IGNITE-7253
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
> Attachments: IGNITE_7253.patch
>
>
> Should be done after IGNITE-6022. We should allow optional streaming mode for 
> JDBC driver. In this mode only INSERTs without SELECT should be possible. All 
> other DML operations should throw an exception. 
> Design considerations:
> 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable 
> streaming for connection.
> 2) Add command {{STREAMER FLUSH}} which will force data flush.
> 3) Only INSERT without SELECT works, all other DML statements should throw an 
> exception
> 4) It should be possible to stream into several tables simultaneously (i.e. 
> several streamers could be opened)
> 5) Any DDL statement should force flush of all currently opened streamers.





[jira] [Updated] (IGNITE-7859) SQL streaming support via native API

2018-03-01 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-7859:

Component/s: sql

> SQL streaming support via native API
> 
>
> Key: IGNITE-7859
> URL: https://issues.apache.org/jira/browse/IGNITE-7859
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> In addition to streaming via thin JDBC driver, ability to run same {{SET 
> STREAMING}} command should be added to cache API.





[jira] [Updated] (IGNITE-7859) SQL streaming support via native API

2018-03-01 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-7859:

Fix Version/s: 2.5

> SQL streaming support via native API
> 
>
> Key: IGNITE-7859
> URL: https://issues.apache.org/jira/browse/IGNITE-7859
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> In addition to streaming via thin JDBC driver, ability to run same {{SET 
> STREAMING}} command should be added to cache API.





[jira] [Created] (IGNITE-7859) SQL streaming support via native API

2018-03-01 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7859:
---

 Summary: SQL streaming support via native API
 Key: IGNITE-7859
 URL: https://issues.apache.org/jira/browse/IGNITE-7859
 Project: Ignite
  Issue Type: Task
Reporter: Alexander Paschenko


In addition to streaming via thin JDBC driver, ability to run same {{SET 
STREAMING}} command should be added to cache API.





[jira] [Commented] (IGNITE-7785) SQL query with COUNT and UNION in sub-query produces JdbcSQLException

2018-02-27 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378534#comment-16378534
 ] 

Alexander Paschenko commented on IGNITE-7785:
-

[~pvinokurov],

The initial query is a bit weird, as it unions a table with itself.

In the absence of the {{ALL}} keyword, that subquery

{{(select id from company union select id from company)}}

should give the same result as a single {{SELECT id FROM Company}}, so in the 
initial case the UNION is redundant - if you remove the UNION, the query works.

If you add ALL to the UNION, the query also works - but in this case the UNION 
keeps duplicates, just like it should.

I will also have a look at the second query.
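The UNION vs. UNION ALL distinction referred to above can be reproduced with plain collections (an illustration of standard SQL set semantics, not Ignite code):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class UnionDemo {
    /** UNION: distinct rows, so a table unioned with itself equals the table. */
    static List<Integer> union(List<Integer> a, List<Integer> b) {
        LinkedHashSet<Integer> s = new LinkedHashSet<>(a);
        s.addAll(b);
        return new ArrayList<>(s);
    }

    /** UNION ALL: plain concatenation, duplicates survive. */
    static List<Integer> unionAll(List<Integer> a, List<Integer> b) {
        List<Integer> r = new ArrayList<>(a);
        r.addAll(b);
        return r;
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3);
        System.out.println(union(ids, ids));     // [1, 2, 3] - same as the single SELECT
        System.out.println(unionAll(ids, ids));  // [1, 2, 3, 1, 2, 3]
    }
}
```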

> SQL query with COUNT and UNION in sub-query produces JdbcSQLException
> -
>
> Key: IGNITE-7785
> URL: https://issues.apache.org/jira/browse/IGNITE-7785
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1, 2.3
>Reporter: Pavel Vinokurov
>Priority: Major
>
> SQL initial script:
> CREATE TABLE Person(id INTEGER PRIMARY KEY, company_id INTEGER);
>  CREATE TABLE Company(id INTEGER PRIMARY KEY, name VARCHAR);
>  INSERT INTO Person(id,company_id) VALUES (1, 1), (2, 2), (3, 3);
>  INSERT INTO Company(id,name) VALUES (1,'n1'), (2,'n2'), (3,'n3');
> SQL Query:
> SELECT count(1) FROM person p
>  LEFT JOIN (select id from company union select id from company) as c on 
> c.id=p.company_id
> JDBC Exception:
> Caused by: org.h2.jdbc.JdbcSQLException: Column "P__Z0.COMPANY_ID" must be in 
> the GROUP BY list; SQL statement:
> SELECT
> P__Z0.COMPANY_ID __C0_0,
> COUNT(1) __C0_1
> FROM PUBLIC.PERSON P__Z0 [90016-195]
>  





[jira] [Created] (IGNITE-7738) Allow 'multiple statements' in thin JDBC streaming mode

2018-02-16 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7738:
---

 Summary: Allow 'multiple statements' in thin JDBC streaming mode
 Key: IGNITE-7738
 URL: https://issues.apache.org/jira/browse/IGNITE-7738
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Alexander Paschenko


We need to update the thin JDBC protocol to let the user run multiple statements 
via Statement.execute(String) when the connection is in streaming mode. 
Currently in streaming mode the server always receives SQL in batches, and its 
batch processing logic does not allow multiple statements at all. If we were 
able to recognize the nature of each statement up front, we would be able to act 
appropriately; for this to be possible, additional information must be passed 
with each query in the batch.





[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode

2018-02-16 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367320#comment-16367320
 ] 

Alexander Paschenko commented on IGNITE-7253:
-

[~vozerov],

all done.

> JDBC thin driver: introduce streaming mode
> --
>
> Key: IGNITE-7253
> URL: https://issues.apache.org/jira/browse/IGNITE-7253
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> Should be done after IGNITE-6022. We should allow optional streaming mode for 
> JDBC driver. In this mode only INSERTs without SELECT should be possible. All 
> other DML operations should throw an exception. 
> Design considerations:
> 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable 
> streaming for connection.
> 2) Add command {{STREAMER FLUSH}} which will force data flush.
> 3) Only INSERT without SELECT works, all other DML statements should throw an 
> exception
> 4) It should be possible to stream into several tables simultaneously (i.e. 
> several streamers could be opened)
> 5) Any DDL statement should force flush of all currently opened streamers.





[jira] [Comment Edited] (IGNITE-7253) JDBC thin driver: introduce streaming mode

2018-02-16 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365795#comment-16365795
 ] 

Alexander Paschenko edited comment on IGNITE-7253 at 2/16/18 11:07 AM:
---

[~vozerov],

2, 3 - the only need for the batch size in these cases is to keep track of the 
number of fake counters that we return. That is, if the user has issued 
"addBatch" thrice, we return that number of counters, each having a value of 
zero. Just a weak attempt to create an appearance of JDBC compliance, nothing else.


was (Author: al.psc):
[~vozerov],

2, 3 - the only need for the batch size in these cases is to keep track of the 
number of fake counters that we remove. That is, if the user has issued 
"addBatch" thrice, we return that number of counters, each having a value of 
zero. Just a weak attempt to create an appearance of JDBC compliance, nothing else.

> JDBC thin driver: introduce streaming mode
> --
>
> Key: IGNITE-7253
> URL: https://issues.apache.org/jira/browse/IGNITE-7253
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> Should be done after IGNITE-6022. We should allow optional streaming mode for 
> JDBC driver. In this mode only INSERTs without SELECT should be possible. All 
> other DML operations should throw an exception. 
> Design considerations:
> 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable 
> streaming for connection.
> 2) Add command {{STREAMER FLUSH}} which will force data flush.
> 3) Only INSERT without SELECT works, all other DML statements should throw an 
> exception
> 4) It should be possible to stream into several tables simultaneously (i.e. 
> several streamers could be opened)
> 5) Any DDL statement should force flush of all currently opened streamers.





[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode

2018-02-15 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365795#comment-16365795
 ] 

Alexander Paschenko commented on IGNITE-7253:
-

[~vozerov],

2, 3 - the only need for the batch size in these cases is to keep track of the 
number of fake counters that we remove. That is, if the user has issued 
"addBatch" thrice, we return that number of counters, each having a value of 
zero. Just a weak attempt to create an appearance of JDBC compliance, nothing else.

> JDBC thin driver: introduce streaming mode
> --
>
> Key: IGNITE-7253
> URL: https://issues.apache.org/jira/browse/IGNITE-7253
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Major
> Fix For: 2.5
>
>
> Should be done after IGNITE-6022. We should allow optional streaming mode for 
> JDBC driver. In this mode only INSERTs without SELECT should be possible. All 
> other DML operations should throw an exception. 
> Design considerations:
> 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable 
> streaming for connection.
> 2) Add command {{STREAMER FLUSH}} which will force data flush.
> 3) Only INSERT without SELECT works, all other DML statements should throw an 
> exception
> 4) It should be possible to stream into several tables simultaneously (i.e. 
> several streamers could be opened)
> 5) Any DDL statement should force flush of all currently opened streamers.





[jira] [Created] (IGNITE-7664) SQL: throw sane exception on unsupported SQL statements

2018-02-09 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7664:
---

 Summary: SQL: throw sane exception on unsupported SQL statements
 Key: IGNITE-7664
 URL: https://issues.apache.org/jira/browse/IGNITE-7664
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Alexander Paschenko
Assignee: Alexander Paschenko
 Fix For: 2.5


Inspired by this SO issue:

[https://stackoverflow.com/questions/48708238/ignite-database-create-schema-assertionerror]

We should handle unsupported stuff more gracefully both in core code and 
drivers.





[jira] [Commented] (IGNITE-7586) SQL COPY: add code examples

2018-02-07 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355509#comment-16355509
 ] 

Alexander Paschenko commented on IGNITE-7586:
-

[~kirill.shirokov],

Thanks, my comments:
 # Please refactor imports - we don't use wildcards. Also the order of imports 
matters; please have a look at the guidelines. (First java.*, then everything 
else.) IDEA may easily be tuned to lay imports out just the way we need.
 # Please remove "create index"; it is not necessary for the scope of this 
example.
 # Please change "JDBC example finished." to "JDBC COPY command example 
finished."

> SQL COPY: add code examples
> ---
>
> Key: IGNITE-7586
> URL: https://issues.apache.org/jira/browse/IGNITE-7586
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 2.4
>Reporter: Kirill Shirokov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
> Fix For: 2.4
>
>






[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-06 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354254#comment-16354254
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

There was some weird stuff (abstract overrides, wrong ancestor, etc.), but I've 
fixed it myself so as not to lose any more time on this - please have a look and 
note what has been changed.

Please create a Jira issue to add tests for parsing and link it to this one, 
thank you.

Will run tests too.

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
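For reference, a concrete invocation of the current syntax might look like the 
following. Note that the archived text above has lost its angle-bracket 
placeholders (e.g. the table name after INTO); the table, columns, file name, 
and batch size below are made-up illustrations, not taken from the original.

```sql
-- Hypothetical COPY invocation; "city", its columns, the file name,
-- and the batch size are all illustrative values.
COPY FROM "city.csv"
INTO city (id, name, population)
FORMAT CSV
BATCH_SIZE 1024
```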
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
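The RFC 4180 quote handling described in the parsing notes above can be sketched 
as follows. This is an illustration only, not Ignite's actual parser; the ',' and 
'"' separators are hardcoded, matching the limitations listed here.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Line {
    /** Splits one CSV record into fields, honoring quoted fields and "" escapes. */
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote inside a quoted field
                        i++;
                    }
                    else
                        quoted = false; // closing quote
                }
                else
                    cur.append(c);
            }
            else if (c == '"')
                quoted = true; // opening quote
            else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            }
            else
                cur.append(c);
        }

        fields.add(cur.toString());

        return fields;
    }
}
```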
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from a single command
> * We don't benefit from any kind of pre-sorting or pre-partitioning of data on 
> the client side.
> * We don't include any metadata, such as line number, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per batch).
> * We may want to implement data compression in future.
> * We connect to single node in JDBC driver (no multi-node connections).
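To make the batching point above concrete, here is a tiny client-side sketch that 
slices file content into fixed-size batches; the class name and batch size are 
illustrative assumptions, not the actual JDBC driver code.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    /** Slices the file content into batches of at most batchSize bytes each. */
    public static List<byte[]> toBatches(byte[] data, int batchSize) {
        List<byte[]> batches = new ArrayList<>();

        for (int off = 0; off < data.length; off += batchSize) {
            int len = Math.min(batchSize, data.length - off);

            byte[] batch = new byte[len];

            System.arraycopy(data, off, batch, 0, len);

            batches.add(batch); // each batch would be sent to the server in turn
        }

        return batches;
    }
}
```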
> h3. Cache/tables/column handling
> * We don't create table in 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-06 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353919#comment-16353919
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

In fact, cache writer *is* called to execute "independent actions on 
independent pieces of data".

However, the issue here is not that big, and if it were the only thing we'd be done 
already. What worries me more is that we've been hanging on these two issues (50, 
51) for a few days already. OK, let's pass on 51.

Still, there's 50, the unused method param, and that is not passable; there 
hasn't been any reaction (neither a fix nor a verbal answer) about it. Are we 
going to argue about that as well?

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from a single command
> * We don't benefit from any kind of pre-sorting or pre-partitioning of data on 
> the client side.
> * We don't include any metadata, such as line number, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-05 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353530#comment-16353530
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

Thanks for the fixes. Still, I don't see pts. {{50}} and {{51}} fixed. Are you 
going to fix those so that we can get this over with?

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from a single command
> * We don't benefit from any kind of pre-sorting or pre-partitioning of data on 
> the client side.
> * We don't include any metadata, such as line number, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per batch).
> * We may want to implement data compression in future.
> * We connect to single node in JDBC driver (no multi-node connections).
> h3. Cache/tables/column handling
> * We don't create the table in the bulk load command
> * We may want to have an option for reading a header row, which contains 
> column names to match columns
> * In future we may wish to support 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-05 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352166#comment-16352166
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

52. {{BulkLoadParser#createParser}} - I think we don't need two factories, one 
for formats is enough. Just create abstract method {{BulkLoadParser 
createParser()}} in {{BulkLoadFormat}}. You've stated above that format and 
parser are 1:1 related, so this seems to be the way to go.
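The single-factory shape suggested in point 52 could be sketched as below. The 
class names echo {{BulkLoadFormat}} and {{BulkLoadParser}} from the comment, but 
the bodies are illustrative stubs, not Ignite's real implementation.

```java
public class FormatFactorySketch {
    /** Stand-in for a parser; the parsing logic would live here. */
    public abstract static class BulkLoadParser {
    }

    /** Each format creates its own parser (formats and parsers are 1:1 related). */
    public abstract static class BulkLoadFormat {
        public abstract BulkLoadParser createParser();
    }

    /** A concrete format supplies its parser via the single factory method. */
    public static class CsvFormat extends BulkLoadFormat {
        @Override public BulkLoadParser createParser() {
            return new BulkLoadParser() { }; // would return the CSV parser
        }
    }
}
```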

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from a single command
> * We don't benefit from any kind of pre-sorting or pre-partitioning of data on 
> the client side.
> * We don't include any metadata, such as line number, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per batch).
> * We may want to implement data compression in future.
> * We connect to single node in JDBC driver (no multi-node connections).
> h3. Cache/tables/column handling
> * We don't create table in the bulk 

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-02 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350605#comment-16350605
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/2/18 4:29 PM:
-

[~kirill.shirokov],

50. {{BulkLoadProcessor#close}}

>See javadoc.

I have seen it. Still, this class is not a member of any hierarchy; the {{close}} 
method neither overrides anything nor is overridden by anything, and thus this 
param is currently neither used nor needed. Please remove it; adding it back once 
we need it won't be a big deal at all. Really, an unused param in this case is 
hardly an architectural point of future extension.

51. {{BulkLoadCacheWriter}}:

>O. But what is the reason for doing that?

At the very least, to maintain consistency in the names of methods that have 
similar or close-to-identical meanings. Say, all our closures have their method 
called {{apply}}, while this class employs {{accept}}. I don't see any reason 
why this class shouldn't be in that hierarchy.
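For illustration, joining the closure hierarchy is what forces the unified method 
name. The {{Closure}} interface below is a local stand-in for an Ignite closure 
interface, and the writer signature is an assumption; this is not the real Ignite 
hierarchy.

```java
public class ClosureNamingSketch {
    /** Local stand-in for a generic one-argument closure with the shared name. */
    public interface Closure<E, R> {
        R apply(E e);
    }

    /** A cache-writer interface in the hierarchy inherits the {@code apply} name. */
    public interface CacheWriter<E> extends Closure<E, Void> { }

    /** Feeds rows to the writer; the writer is invoked with the same verb
     *  ({@code apply}) as every other closure in the hierarchy. */
    public static int write(CacheWriter<String> writer, String... rows) {
        int cnt = 0;

        for (String row : rows) {
            writer.apply(row);

            cnt++;
        }

        return cnt;
    }
}
```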

 


was (Author: al.psc):
[~kirill.shirokov],

50. {{BulkLoadProcessor#close}}

>See javadoc.

I have seen it. Still this class is not a member of any hierarchy, {{close}} 
method neither overrides anything nor is overridden by anyone and thus now this 
param neither is used nor needed. Please remove it, adding it back once we need 
it won't be a big deal at all. Really, unused param in this case hardly is an 
architectural point of extension.

51. {{BulkLoadCacheWriter}}:

>O. But what is the reason for doing that?

At the very least, to maintain consistency about names of methods having 
similar, or close-to-identical meaning. Say, all our closures have their method 
called {{apply}} while this class employs {{accept}}. I don't see any reason 
why this class shouldn't be in that hierarchy.

 

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-02 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350605#comment-16350605
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

50. {{BulkLoadProcessor#close}}

>See javadoc.

I have seen it. Still, this class is not a member of any hierarchy; the {{close}} 
method neither overrides anything nor is overridden by anything, and thus this 
param is currently neither used nor needed. Please remove it; adding it back once 
we need it won't be a big deal at all. Really, an unused param in this case is 
hardly an architectural point of extension.

51. {{BulkLoadCacheWriter}}:

>O. But what is the reason for doing that?

At the very least, to maintain consistency in the names of methods that have 
similar or close-to-identical meanings. Say, all our closures have their method 
called {{apply}}, while this class employs {{accept}}. I don't see any reason 
why this class shouldn't be in that hierarchy.

 

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from single command
> * We don't benefit from any kind of pre-sorting 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-02 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350544#comment-16350544
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov], my comments:

44. {{BulkLoadProcessor#close}} - unused method param.

45. {{BulkLoadDataConverter}} - we don't need this class explicitly anymore, 
just create anonymous closure in-place.

46. About *32.2* - I simply suggest that you add one more ancestor to the 
{{extends}} clause; this ancestor must be one of the Ignite closures. Similar to 
what you did with {{BulkLoadDataConverter}}.

47. {{BulkLoadAckClientParameters#DEFAULT_INPUT_CHARSET}} - unused.

48. {{DmlStatementsProcessor#runDmlStatement(String sql, SqlCommand cmd):}}

48.1 Please rename to {{runNativeDmlStatement}} - it's not exactly an overload 
of another method with the same name existing in this class.

48.2 Please remove braces after {{if (cmd instanceof SqlBulkLoadCommand) {}} - 
one-liners should not be surrounded with braces per Ignite conventions.

49. {{JdbcRequestHandler#processBulkLoadFileBatch}}: we try to actually process 
the batch even when we have received {{CMD_FINISHED_ERROR}} (there's no check 
before attempting to process the batch). Are you sure we want to do this 
regardless of the state flag value?

This is it for now. Thanks!

 

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create quick and dirty prototype to assess potential 
> performance improvement. If speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name" [CHARSET "<charset-name>"]
> INTO <schema>.<table> [CREATE [IF NOT EXISTS]]
> [(col-name [<data-type>] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET "<charset-name>"]
> [ROWS <first>-<last>]
> --or--
> [SKIP ROWS <n>] [MAX ROWS <n>]
> [COLS <first>-<last>]
> --or--
> [SKIP COLS <n>] [MAX COLS <n>]
> [(MATCH | SKIP) HEADER]
> [(REPLACE | IGNORE | ABORT ON [<max-errors>]) DUPLICATE KEYS]
> [BATCH SIZE (<n> ROWS | <n>[K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be 
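The RFC 4180 quoting rules mentioned in the parsing notes above can be sketched as a small state machine. This is an illustrative fragment only, not Ignite's actual parser; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal RFC 4180 field splitter: fields are comma-separated, a field may be
// wrapped in double quotes, and a doubled quote inside a quoted field is an
// escaped quote. Hypothetical sketch; not Ignite's BulkLoadCsvParser.
public class Rfc4180 {
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // escaped quote inside a quoted field
                        i++;
                    }
                    else
                        quoted = false; // closing quote
                }
                else
                    cur.append(c);
            }
            else if (c == '"')
                quoted = true;
            else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            }
            else
                cur.append(c);
        }

        fields.add(cur.toString());

        return fields;
    }
}
```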

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-02 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349007#comment-16349007
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/2/18 12:28 PM:
--

[~kirill.shirokov],

31. {{BulkLoadDataConverter}} is now empty and doesn't convert anything. I 
think we should remove it and simply pass {{UpdatePlan}} to 
{{BulkLoadProcessor}}.

32. {{BulkLoadCacheWriter}}:

32.1 Odd empty line after interface header.

32.2 Leaving this interface in place does not preclude making it extend one of 
the existing Ignite interfaces. Please add another appropriate ancestor to this 
interface.

33. Regarding this:

>Does this anti-pattern makes sense here? Or we're about keeping similarity 
>with existing code and bugward compatibility?

First, providing for resilience is one of the reasons that justify "error 
hiding". We aim not to let JDBC handler or NIO listener crash.

Second, there's no "hiding" per se, as the exception is properly logged and we 
send the user a message.

34. {{BulkLoadProcessor}}:

34.1 odd empty line after ctor header.

34.2 {{processBatch}} is still coupled with JDBC related class.

34.3 I think the class name {{BulkLoadContext}} would reflect what this class 
does more correctly. In fact, it does not process anything; it just holds 
various bits that help the process. The fact that you even use the word 
"context" in the javadoc for this class speaks in favor of this.

35. {{BulkLoadParser}}: is still coupled with JDBC related class. I have 
already suggested that we introduce some JDBC neutral enum to indicate current 
stage of the process.

36. {{JdbcRequestHandler#executeQuery}} - odd empty line after your new "if".

37. {{JdbcBulkLoadBatchAckResult}}: I suggested simply 
{{JdbcBulkLoadAckResult}}, without the word "batch", as this ack response has 
to do with the command overall and *not* with an individual batch. It's sent 
once per bulk load.

38. {{UpdateMode}}: there's a typo in the javadocs you've added to all enum 
members. Please correct: com*m*and.

39. {{UpdatePlanBuilder}}: unused new static import.

40. {{BulkLoadStreamerWriter}}: why is counter internally stored as {{int}}, 
not {{long}}?

41. {{JdbcQueryTask#call}}: I think it would be better to gracefully close 
newly created context before throwing an exception.

42. {{BulkLoadCsvFormat}}: many new members and their getters have a "Re" 
suffix. It's not standard in the Ignite codebase. I would either replace it 
with "regex" or "regexp", or even remove it - the user of this class can 
clearly see the return type of those getters, and those suffixes actually 
don't tell anything new.

43. {{DdlStatementsProcessor#runDdlStatement}}: why employ log throttling here? 
Why would we want to skip some error messages?

This is it for now, thanks. Looking forward to fixes.



[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349007#comment-16349007
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 6:00 PM:
-

[~kirill.shirokov],

31. {{BulkLoadDataConverter}} is now empty and doesn't convert anything. I 
think we should remove it and simply pass {{UpdatePlan}} to 
{{BulkLoadProcessor}}.

32. {{BulkLoadCacheWriter}}:

32.1 Odd empty line after interface header.

32.2 Leaving this interface in place does not preclude making it extend one of 
the existing Ignite interfaces. Please add another appropriate ancestor to this 
interface.

33. Regarding this:

>Does this anti-pattern makes sense here? Or we're about keeping similarity 
>with existing code and bugward compatibility?

First, providing for resilience is one of the reasons that justify "error 
hiding". We aim not to let JDBC handler or NIO listener crash.

Second, there's no "hiding" per se, as the exception is properly logged and we 
send the user a message.

34. {{BulkLoadProcessor}}:

34.1 odd empty line after ctor header.

34.2 {{processBatch}} is still coupled with JDBC related class.

34.3 I think the class name {{BulkLoadContext}} would reflect what this class 
does more correctly. In fact, it does not process anything; it just holds 
various bits that help the process. The fact that you even use the word 
"context" in the javadoc for this class speaks in favor of this.

35. {{BulkLoadParser}}: is still coupled with JDBC related class. I have 
already suggested that we introduce some JDBC neutral enum to indicate current 
stage of the process.

36. {{JdbcRequestHandler#executeQuery}} - odd empty line after your new "if".

37. {{JdbcBulkLoadBatchAckResult}}: I suggested simply 
{{JdbcBulkLoadAckResult}}, without the word "batch", as this ack response has 
to do with the command overall and *not* with an individual batch. It's sent 
once per bulk load.

38. {{UpdateMode}}: there's a typo in the javadocs you've added to all enum 
members. Please correct: com*m*and.

39. {{UpdatePlanBuilder}}: unused new static import.

40. {{BulkLoadStreamerWriter}}: why is counter internally stored as {{int}}, 
not {{long}}?

41. {{JdbcQueryTask#call}}: I think it would be better to gracefully close 
newly created context before throwing an exception.

42. {{BulkLoadCsvFormat}}: many new members and their getters have a "Re" 
suffix. It's not standard in the Ignite codebase. I would either replace it 
with "regex" or "regexp", or even remove it - the user of this class can 
clearly see the return type of those getters, and those suffixes actually 
don't tell anything new.

43. {{DdlStatementsProcessor#runDdlStatement}}: why employ log throttling here? 
Why would we want to skip some error messages?

This is it for now, thanks. Looking forward to fixes.



[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349007#comment-16349007
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

31. {{BulkLoadDataConverter}} is now empty and doesn't convert anything. I 
think we should remove it and simply pass {{UpdatePlan}} to 
{{BulkLoadProcessor}}.

32. {{BulkLoadCacheWriter}}:

32.1 Odd empty line after interface header.

32.2 Leaving this interface in place does not preclude making it extend one of 
the existing Ignite interfaces. Please add another appropriate ancestor to this 
interface.

33. Regarding this:

>Does this anti-pattern makes sense here? Or we're about keeping similarity 
>with existing code and bugward compatibility?

First, providing for resilience is one of the reasons that justify "error 
hiding". We aim not to let JDBC handler or NIO listener crash.

Second, there's no "hiding" per se, as the exception is properly logged and we 
send the user a message.

34. {{BulkLoadProcessor}}:

34.1 odd empty line after ctor header.

34.2 {{processBatch}} is still coupled with JDBC related class.

34.3 I think the class name {{BulkLoadContext}} would reflect what this class 
does more correctly. In fact, it does not process anything; it just holds 
various bits that help the process. The fact that you even use the word 
"context" in the javadoc for this class speaks in favor of this.

35. {{BulkLoadParser}}: is still coupled with JDBC related class. I have 
already suggested that we introduce some JDBC neutral enum to indicate current 
stage of the process.

36. {{JdbcRequestHandler#executeQuery}} - odd empty line after your new "if".

37. {{JdbcBulkLoadBatchAckResult}}: I suggested simply 
{{JdbcBulkLoadAckResult}}, without the word "batch", as this ack response has 
to do with the command overall and *not* with an individual batch. It's sent 
once per bulk load.

38. Point *26* is unanswered and I don't see related code changes either.

39. {{UpdateMode}}: there's a typo in the javadocs you've added to all enum 
members. Please correct: com*m*and.

40. {{UpdatePlanBuilder}}: unused new static import.

41. {{BulkLoadStreamerWriter}}: why is counter internally stored as {{int}}, 
not {{long}}?

42. {{JdbcQueryTask#call}}: I think it would be better to gracefully close 
newly created context before throwing an exception.

43. {{BulkLoadCsvFormat}}: many new members and their getters have a "Re" 
suffix. It's not standard in the Ignite codebase. I would either replace it 
with "regex" or "regexp", or even remove it - the user of this class can 
clearly see the return type of those getters, and those suffixes actually 
don't tell anything new.

44. {{DdlStatementsProcessor#runDdlStatement}}: why employ log throttling here? 
Why would we want to skip some error messages?

This is it for now, thanks. Looking forward to fixes.

 


[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348807#comment-16348807
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov]

28. {{BulkLoadDataConverter}} - why is the column-number manipulation still there?

29. {{BulkLoadCsvFormat}}:

29.1 Odd empty line after class header.

29.2 New members lack javadocs and empty separating lines

29.3 Empty ctor not setting anything.

30. {{BulkLoadCsvParser}}:

30.1 Odd empty line in comment for ctor.

30.2 {{parseBatch}} - please rename {{result}} to {{res}}

I'm still in the process of reviewing your updates, thanks.

 

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the 
> potential performance improvement. If the speedup is confirmed, we should build 
> a base implementation which will accept only files. But at the same time we 
> should understand how it will evolve in the future: multiple file formats 
> (probably including Hadoop formats, e.g. Parquet), escape characters, input 
> streams, etc.
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY
> FROM "file.name"
> INTO <schema>.<table>
> [(col-name, ...)]
> FORMAT <format>  -- Only CSV format is supported in the current release
> [BATCH_SIZE <batch-size>]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name" [CHARSET "<charset-name>"]
> INTO <schema>.<table> [CREATE [IF NOT EXISTS]]
> [(col-name [<data-type>] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET "<charset-name>"]
> [ROWS <first>-<last>]
> --or--
> [SKIP ROWS <n>] [MAX ROWS <n>]
> [COLS <first>-<last>]
> --or--
> [SKIP COLS <n>] [MAX COLS <n>]
> [(MATCH | SKIP) HEADER]
> [(REPLACE | IGNORE | ABORT ON [<max-errors>]) DUPLICATE KEYS]
> [BATCH SIZE (<n> ROWS | <n>[K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from single command
> * We don't benefit from any kind of pre-sorting or pre-partitioning of data on 
> the client side.
> * We don't include any metadata, such as line number, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per batch).
> * We may want to implement data compression in future.
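As a minimal illustration of the currently implemented form of the command described above, the statement text can be assembled like this. The table, column, and file names are made up, and only the string building is shown so it can be checked in isolation; in a real application the resulting string would be handed to {{Statement.execute()}} on a JDBC connection:

```java
// Hypothetical helper that assembles the currently supported form of COPY
// (per the syntax above, only the CSV format exists in the current release).
public class CopyCommand {
    public static String buildCopy(String file, String table, String[] cols, int batchSize) {
        return "COPY FROM \"" + file + "\" INTO " + table +
            " (" + String.join(", ", cols) + ") FORMAT CSV BATCH_SIZE " + batchSize;
    }
}
```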

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348533#comment-16348533
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

Comments:

{{BulkLoadDataConverter}} -
 # No comments for either the class or its members; please add them.
 # I don't think it's a good idea to make new lists and add anything there - 
totally unnecessary GC pressure that can easily be avoided. Just add an equality 
check to the methods performing SQL operations as suggested above, or add to 
{{processRow}} a param that would control whether we're expecting the *exact* 
number of columns or *at least* as many columns as present in the plan.
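The two validation modes suggested in that comment (exactly the expected column count vs. at least that many) can be expressed as a single check without allocating any intermediate lists. The names below are illustrative, not the actual Ignite API:

```java
// Hypothetical column-count validation for a parsed row: either the row must
// have exactly the number of columns the update plan expects, or at least
// that many. No intermediate list is built just to compare sizes.
public class RowChecker {
    public static void checkColumns(int actual, int expected, boolean exact) {
        boolean ok = exact ? actual == expected : actual >= expected;

        if (!ok)
            throw new IllegalArgumentException("Row has " + actual + " columns, expected " +
                (exact ? "exactly " : "at least ") + expected);
    }
}
```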


[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:45 AM:
-

[~kirill.shirokov], please let me continue with comments.

8. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

9. {{BulkLoadParser}}:

9.1 Unused private field.

9.2 Too long line in {{processBatch}} declaration.

9.3 It seems plain wrong that the arguments of {{processBatch}} are tied to 
JDBC. JDBC is a connectivity interface, nothing else, and these references are 
in deeper "guts". We need to decouple these somehow - one day we might want to 
support this feature in ODBC, or somewhere else, and then it'd be inevitable. 
I'd just introduce an enum for the various command types.
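A protocol-neutral enum of the kind suggested here might look like the following. The names are invented for illustration; the enum eventually added to Ignite may differ:

```java
// Hypothetical protocol-neutral parser status: the parser reports one of
// these instead of referencing JDBC request/response classes, so ODBC or
// another front end could drive the same bulk-load machinery.
public enum BulkLoadStatus {
    /** More input batches are expected from the client. */
    CONTINUE,

    /** The client signalled end of file; processing can be finalized. */
    FINISHED_EOF,

    /** Parsing failed; the operation should be aborted. */
    FAILED
}
```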

10. {{BulkLoadFormat}}:

10.1 Unused import

10.2 Odd empty lines in the beginning of class, in switch, etc

11. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably 
they should be a single class, and the implementations could be merged 
accordingly too. Can you come up with a case where we would really need to 
separate these entities? Say, one parser for a few formats, or one format 
yielding more than one type of parser?

12. {{BulkLoadCsvParser}}:

12.1 {{processBatch}} method looks like it could be actually defined in parent 
class, and the logic to be changed in subclasses actually lives in 
{{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are 
defined on higher level. Can we rework hierarchy accordingly?

12.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at 
all?

12.3 Odd empty lines after class header, method declarations

13. {{BulkLoadEntryConverter}}: do we really expect more than one 
implementation of this interface? If no, let's just replace it with 
{{IgniteClosureX}} or something like that. We have more than enough interfaces 
and abstract classes to represent lambdas. *Before refactoring this, please 
look at pt.17.3 as well*

14. {{BulkLoadCsvFormat}}: the amount of useful code in this class approaches 
zero, which further suggests that the parser and its format must be one.

15. {{BulkLoadContext}}: too long comment line.

16. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this 
interface? If no, just remove this class and let {{BulkLoadContext}} know about 
streamer. If yes, make it inherit one of Ignite's provided classes/interfaces 
for lambdas. *Before refactoring this, please look at pt.17.3 as well*

17. {{JdbcRequestHandler}}:

17.1 Unused import

17.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, 
please do the same here.

17.3 {{processBulkLoadFileBatch}}: it looks like doing the actual job here 
(data conversion, feeding to the streamer) is just plain wrong. Not only is it 
logic that could be reused when we implement the same feature for another 
interface, it's also not the area of responsibility of the JDBC request 
handler. This again shows the need for refactoring of {{BulkLoadContext}}: 
probably we need to introduce a concrete class that will *a)* wrap the streamer 
and *b)* do the conversion. It will just accept {{List}} and do with it 
whatever it wants. JDBC must not know about this at all; it must just detect 
the correct context and feed lists of objects to that new class I've mentioned 
above.
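The concrete class proposed in 17.3 - one object that both converts a parsed row and feeds the result onward - could be sketched as below. The streamer is abstracted behind a {{Consumer}} so the sketch stays self-contained; all names are hypothetical, not the classes that ended up in Ignite:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical processor owning both steps moved out of the JDBC handler:
// converting a parsed row into a cache entry and handing it to a writer
// (IgniteDataStreamer in real code, a plain Consumer here).
public class BulkLoadRowProcessor {
    private final Function<List<Object>, Map.Entry<Object, Object>> converter;
    private final Consumer<Map.Entry<Object, Object>> writer;

    public BulkLoadRowProcessor(Function<List<Object>, Map.Entry<Object, Object>> converter,
        Consumer<Map.Entry<Object, Object>> writer) {
        this.converter = converter;
        this.writer = writer;
    }

    /** Converts and forwards every parsed row; the caller knows nothing about JDBC. */
    public void processRows(List<List<Object>> rows) {
        for (List<Object> row : rows)
            writer.accept(converter.apply(row));
    }
}
```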

17.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof 
BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would 
decrease scope depth, make code better readable and decrease overall size of 
diff (never hurts).

17.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to 
do with files.

18. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 
usages), not to mention that field name is very misleading - what does JDBC 
have to do with parsing?

19. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we 
should mix words "request" and "result", we never do so in JDBC. Not to mention 
that we never send {{JdbcBulkLoadBatchRequest}} *before* we get 
{{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to 
{{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} 
command and never else.

This is not all, more comments will follow.



[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:44 AM:
-

[~kirill.shirokov], please let me continue with comments.

8. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

9. {{BulkLoadParser}}:

9.1 Unused private field.

9.2 Too long line in {{processBatch}} declaration.

9.3 It seems plain wrong that the arguments of {{processBatch}} are tied to 
JDBC. JDBC is a connectivity interface, nothing else, and these references are 
in deeper "guts". We need to decouple these somehow - one day we might want to 
support this feature in ODBC, or somewhere else, and then it'd be inevitable. 
I'd just introduce an enum for the various command types.

10. {{BulkLoadFormat}}:

10.1 Unused import

10.2 Odd empty lines in the beginning of class, in switch, etc

11. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably 
they should be single class and implementations could be merged accordingly 
too. Can you come up with a case where we would really need to separate these 
entities? Say, one parser for a few formats, or one format yielding more than 
one type of parser?

12. {{BulkLoadCsvParser}}:

12.1 {{processBatch}} method looks like it could be actually defined in parent 
class, and the logic to be changed in subclasses actually lives in 
{{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are 
defined on higher level. Can we rework hierarchy accordingly?

12.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at 
all?

12.3 Odd empty lines after class header, method declarations

13. {{BulkLoadEntryConverter}}: do we really expect more than one 
implementation of this interface? If no, let's just replace it with 
{{IgniteClosureX}} or something like that. We have more than enough interfaces 
and abstract classes to represent lambdas. *Before refactoring this, please 
look at pt.10.3 as well*

14. {{BulkLoadCsvFormat}}: the amount of useful code in this class approaches 
zero, which further suggests that the parser and its format must be one.

15. {{BulkLoadContext}}: too long comment line.

16. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this 
interface? If no, just remove this class and let {{BulkLoadContext}} know 
about streamer. If yes, make it inherit one of Ignite's provided 
classes/interfaces for lambdas. *Before refactoring this, please look at 
pt.10.3 as well*

17. {{JdbcRequestHandler}}:

17.1 Unused import

17.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, 
please do the same here.

17.3 {{processBulkLoadFileBatch}}: it looks plain wrong that this method does 
the actual job (data conversion, feeding the streamer). Not only is this logic 
reusable when we implement the same feature for another interface such as 
ODBC, it is also not the area of responsibility of the JDBC request handler. 
This again shows the need to refactor {{BulkLoadContext}}: probably we need to 
introduce a concrete class that will *a)* wrap streamer and *b)* do conversion. 
It will just accept {{List}} and do with it whatever it wants. JDBC 
must not know about this at all, it must just detect correct context and feed 
lists of objects to that new class I've mentioned above.
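A rough sketch of such a protocol-agnostic class (the class name and the {{Consumer}}-based stand-in for the streamer are hypothetical, for illustration only):

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a protocol-agnostic processor: it owns conversion and
// feeding, so the JDBC handler (or ODBC later) only hands over raw parsed rows.
class BulkLoadProcessor {
    /** Stand-in for the data streamer; the real class would wrap IgniteDataStreamer. */
    private final Consumer<Object[]> sink;

    /** Number of rows fed to the sink so far. */
    private long updateCnt;

    BulkLoadProcessor(Consumer<Object[]> sink) {
        this.sink = sink;
    }

    /** Converts each raw row and feeds it to the sink; returns rows processed so far. */
    long processBatch(List<String[]> rows) {
        for (String[] row : rows) {
            sink.accept(convert(row));

            updateCnt++;
        }

        return updateCnt;
    }

    /** Trivial stand-in for real column type conversion. */
    private Object[] convert(String[] row) {
        return row.clone();
    }
}
```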

17.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof 
BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would 
decrease scope depth, make code better readable and decrease overall size of 
diff (never hurts).

17.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to 
do with files.

18. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 
usages), not to mention that field name is very misleading - what does JDBC 
have to do with parsing?

19. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we 
should mix words "request" and "result", we never do so in JDBC. Not to mention 
that we never send {{JdbcBulkLoadBatchRequest}} *before* we get 
{{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to 
{{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} 
command and in no other case.

This is not all, more comments will follow.



[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:43 AM:
-

[~kirill.shirokov], please let me continue with comments.

8. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

9. {{BulkLoadParser}}:

9.1 Unused private field.

9.2 Too long line in {{processBatch}} declaration.

9.3 It seems plain wrong that arguments of {{processBatch}} are related with 
JDBC. JDBC is a connectivity interface, nothing else, and these references are 
in deeper "guts". We need to decouple these somehow - one day we might want to 
support this feature in ODBC, or somewhere else, and then it'd be inevitable. 
I'd just introduce an enum for various command types.

10. {{BulkLoadFormat}}:

10.1 Unused import

10.2 Odd empty lines in the beginning of class, in switch, etc

11. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably 
they should be single class and implementations could be merged accordingly 
too. Can you come up with a case where we would really need to separate these 
entities? Say, one parser for a few formats, or one format yielding more than 
one type of parser?

12. {{BulkLoadCsvParser}}:

12.1 {{processBatch}} method looks like it could be actually defined in parent 
class, and the logic to be changed in subclasses actually lives in 
{{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are 
defined on higher level. Can we rework hierarchy accordingly?

12.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at 
all?

12.3 Odd empty lines after class header, method declarations

13. {{BulkLoadEntryConverter}}: do we really expect more than one 
implementation of this interface? If no, let's just replace it with 
{{IgniteClosureX}} or something like that. We have more than enough interfaces 
and abstract classes to represent lambdas. *Before refactoring this, please 
look at pt.10.3 as well*

14. {{BulkLoadCsvFormat}}: the amount of useful code in this class approaches 
zero, which further suggests that the parser and its format must be one.

15. {{BulkLoadContext}}: too long comment line.

16. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this 
interface? If no, just remove this class and let {{BulkLoadContext}} know 
about streamer. If yes, make it inherit one of Ignite's provided 
classes/interfaces for lambdas. *Before refactoring this, please look at 
pt.10.3 as well*

17. {{JdbcRequestHandler}}:

17.1 Unused import

17.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, 
please do the same here.

17.3 {{processBulkLoadFileBatch}}: it looks plain wrong that this method does 
the actual job (data conversion, feeding the streamer). Not only is this logic 
reusable when we implement the same feature for another interface such as 
ODBC, it is also not the area of responsibility of the JDBC request handler. 
This again shows the need to refactor {{BulkLoadContext}}: probably we need to 
introduce a concrete class that will *a)* wrap streamer and *b)* do conversion. 
It will just accept {{List}} and do with it whatever it wants. JDBC 
must not know about this at all, it must just detect correct context and feed 
lists of objects to that new class I've mentioned above.

17.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof 
BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would 
decrease scope depth, make code better readable and decrease overall size of 
diff (never hurts).

17.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to 
do with files.

18. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 
usages), not to mention that field name is very misleading - what does JDBC 
have to do with parsing?

19. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we 
should mix words "request" and "result", we never do so in JDBC. Not to mention 
that we never send {{JdbcBulkLoadBatchRequest}} *before* we get 
{{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to 
{{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} 
command and in no other case.

This is not all, more comments will follow.



[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348211#comment-16348211
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov]

> I think it's OK to silently skip extra columns. 

It's not entirely OK here, as this method ({{processRow}}) is called from other 
contexts too, like INSERT/MERGE. Probably the least dirty way here would be 
just to add an equality check to the other methods (other than {{bulkLoad}}) 
calling {{processRow}}.
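A minimal sketch of the suggested equality check, assuming the caller knows the declared column count (the helper name and signature are illustrative, not from the patch):

```java
// Hypothetical illustration of the suggested check in the non-COPY callers of
// processRow: reject rows whose width differs from the declared column list.
class RowChecks {
    static void checkRowWidth(int expectedCols, Object[] row) {
        if (row.length != expectedCols)
            throw new IllegalArgumentException("Invalid number of values in row: expected " +
                expectedCols + ", got " + row.length);
    }
}
```

COPY would skip this check (silently ignoring extra columns), while INSERT/MERGE would invoke it before processing each row.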

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the 
> potential performance improvement. If the speedup is confirmed, we should 
> build a base implementation which will accept only files. But at the same time 
> we should understand how it will evolve in future: multiple file formats 
> (probably including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from single command
> * We don't benefit from any kind of pre-sorting pre-partitioning data on 
> client side.
> We don't include any metadata, such as line numbers, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per batch).
> * We may want to implement data compression in future.
> * We connect to single node in JDBC driver (no multi-node connections).
> h3. Cache/tables/column handling
> * We don't create 

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347177#comment-16347177
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:19 AM:
-

[~kirill.shirokov], please allow me to proceed.

20. {{SqlParserCopySelfTest}} - no single test for positive cases, only 
negative stuff. We need to cover positive cases extensively as well.

21. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be 
better to keep update counter in {{BulkLoadContext}} as the notion of counter 
is used not just in JDBC driver.

22. {{JdbcThinStatement}} -

22.1 let's catch and wrap any exception in {{sendFile}}

22.2 Odd empty line in the beginning of {{sendFile}}

23. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in 
CSV line contain commas, do we? This clearly does not seem right and I don't 
see any tests for that either. Shouldn't we handle escaped quotes too?
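For reference, RFC 4180 treats commas inside quoted fields as ordinary data and doubled quotes as escapes; a minimal illustrative splitter showing both cases (a sketch only, not the patch's {{CsvLineProcessorBlock}}) could look like:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal RFC 4180-style field splitter illustrating the two cases discussed:
// commas inside quoted fields and doubled ("") escaped quotes.
class CsvFieldSplitter {
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (quoted) {
                if (c == '"') {
                    // Doubled quote inside a quoted field is an escaped quote.
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');

                        i++;
                    }
                    else
                        quoted = false; // Closing quote.
                }
                else
                    cur.append(c); // Comma inside quotes is ordinary data.
            }
            else if (c == '"')
                quoted = true;
            else if (c == ',') {
                fields.add(cur.toString());

                cur.setLength(0);
            }
            else
                cur.append(c);
        }

        fields.add(cur.toString());

        return fields;
    }
}
```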

24. All {{*PipelineBlock}} classes - odd empty line after class header.

25. {{StrListAppenderBlock}} - do we need a list of lists here? Probably just a 
list of arrays would do?

26. {{BulkLoadContextCursor}} - it's unclear to me why we return context in 
fake "row" while all other parameters should be retrieved via getters. Can we 
unify this? (E.g. add getters for everything AND return everything in fake row 
- that would be consistent.)

27. Please add more tests for behavior in case of failure - server or client 
disconnect during file processing, file open error on client, etc.

I think this is it for now. Looking forward to the fixes.


was (Author: al.psc):
[~kirill.shirokov], please allow me to proceed.

 

1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative 
stuff. We need to cover positive cases extensively as well.

2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be 
better to keep update counter in {{BulkLoadContext}} as the notion of counter 
is used not just in JDBC driver.

3. {{JdbcThinStatement}} -

3.1 let's catch and wrap any exception in {{sendFile}}

3.2 Odd empty line in the beginning of {{sendFile}}

4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV 
line contain commas, do we? This clearly does not seem right and I don't see 
any tests for that either. Shouldn't we handle escaped quotes too?

5. All {{*PipelineBlock}} classes - odd empty line after class header.

6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list 
of arrays could do?

7. {{BulkLoadContextCursor}} - it's unclear to me why we return context in 
fake "row" while all other parameters should be retrieved via getters. Can we 
unify this? (E.g. add getters for everything AND return everything in fake row 
- that would be consistent.)

8. Please add more tests for behavior in case of failure - server or client 
disconnect during file processing, file open error on client, etc.

I think this is it for now. Looking forward to the fixes.

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the 
> potential performance improvement. If the speedup is confirmed, we should 
> build a base implementation which will accept only files. But at the same time 
> we should understand how it will evolve in future: multiple file formats 
> (probably including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] 

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-02-01 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:17 AM:
-

[~kirill.shirokov], please let me continue with comments.

8. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

9. {{BulkLoadParser}}:

9.1 Unused private field.

9.2 Too long line in {{processBatch}} declaration.

9.3 It seems plain wrong that arguments of {{processBatch}} are related with 
JDBC. JDBC is a connectivity interface, nothing else, and these references are 
in deeper "guts". We need to decouple these somehow - one day we might want to 
support this feature in ODBC, or somewhere else, and then it'd be inevitable. 
I'd just introduce an enum for various command types.

10. {{BulkLoadFormat}}:

10.1 Unused import

10.2 Odd empty lines in the beginning of class, in switch, etc

11. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably 
they should be single class and implementations could be merged accordingly 
too. Can you come up with a case where we would really need to separate these 
entities? Say, one parser for a few formats, or one format yielding more than 
one type of parser?

12. {{BulkLoadCsvParser}}:

12.1 {{processBatch}} method looks like it could be actually defined in parent 
class, and the logic to be changed in subclasses actually lives in 
{{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are 
defined on higher level. Can we rework hierarchy accordingly?

12.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at 
all?

12.3 Odd empty lines after class header, method declarations

13. {{BulkLoadEntryConverter}}: do we really expect more than one 
implementation of this interface? If no, let's just replace it with 
{{IgniteClosureX}} or something like that. We have more than enough interfaces 
and abstract classes to represent lambdas. *Before refactoring this, please 
look at pt.10.3 as well*

14. {{BulkLoadCsvFormat}}: the amount of useful code in this class approaches 
zero, which further suggests that the parser and its format must be one.

15. {{BulkLoadContext}}: too long comment line.

16. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this 
interface? If no, just remove this class and let {{BulkLoadContext}} know 
about streamer. If yes, make it inherit one of Ignite's provided 
classes/interfaces for lambdas. *Before refactoring this, please look at 
pt.10.3 as well*

17. {{JdbcRequestHandler}}:

17.1 Unused import

17.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, 
please do the same here.

17.3 {{processBulkLoadFileBatch}}: it looks plain wrong that this method does 
the actual job (data conversion, feeding the streamer). Not only is this logic 
reusable when we implement the same feature for another interface such as 
ODBC, it is also not the area of responsibility of the JDBC request handler. 
This again shows the need to refactor {{BulkLoadContext}}: probably we need to 
introduce a concrete class that will *a)* wrap streamer and *b)* do conversion. 
It will just accept {{List}} and do with it whatever it wants. JDBC 
must not know about this at all, it must just detect correct context and feed 
lists of objects to that new class I've mentioned above.

17.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof 
BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would 
decrease scope depth, make code better readable and decrease overall size of 
diff (never hurts).

17.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to 
do with files.

18. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 
usages), not to mention that field name is very misleading - what does JDBC 
have to do with parsing?

19. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we 
should mix words "request" and "result", we never do so in JDBC. Not to mention 
that we never send {{JdbcBulkLoadBatchRequest}} *before* we get 
{{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to 
{{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} 
command and in no other case.

This is not all, more comments will follow.



[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347233#comment-16347233
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

And I'll renumber all points to use a single consecutive numbering, sorry for 
the inconvenience.

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the 
> potential performance improvement. If the speedup is confirmed, we should 
> build a base implementation which will accept only files. But at the same time 
> we should understand how it will evolve in future: multiple file formats 
> (probably including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from single command
> * We don't benefit from any kind of pre-sorting pre-partitioning data on 
> client side.
> We don't include any metadata, such as line numbers, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block size 
> per batch).
> * We may want to implement data compression in future.
> * We connect to single node in JDBC driver (no multi-node connections).
> h3. Cache/tables/column handling
> We don't create a table in the bulk load command
> We may want to have an option for reading a header row, which contains 
> column names to match columns
> * In future we may wish to support COLUMNS (col1, _, col2, _, col3) syntax, 
> where 

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347230#comment-16347230
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 5:28 PM:
--

[~kirill.shirokov],

Regarding 7.2 - yes, please disregard empty columns list. Suppose there's _key 
and _val.

Regarding empty lines - they are always added for readability but we don't add 
them right after line ending with {, be it method or class declaration or loop 
or switch. Yes, with long method declarations this may not look good, but for 
consistency let's stick with what I suggest, thanks.

Regarding 5 - ok, Vlad's always right. Still IMHO main thing is not simpler 
code but better usability here. (Just saying.)


was (Author: al.psc):
[~kirill.shirokov],

Regarding 7.2 - yes, please disregard empty columns list. Suppose there's _key 
and _val.

Regarding empty fields - they are always added for readability but we don't add 
them right after line ending with {, be it method or class declaration or loop 
or switch. Yes, with long method declarations this may not look good, but for 
consistency let's stick with what I suggest, thanks.

Regarding 5 - ok, Vlad's always right. Still IMHO main thing is not simpler 
code but better usability here. (Just saying.)

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the 
> potential performance improvement. If the speedup is confirmed, we should 
> build a base implementation which will accept only files. But at the same time 
> we should understand how it will evolve in future: multiple file formats 
> (probably including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
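For the proposed {{BATCH SIZE ( ROWS | [K|M|G|T|P])}} clause, the size suffixes could be interpreted as powers of 1024; a minimal sketch under that assumption (names are illustrative, not the Ignite API):

```java
// Sketch: interpreting BATCH SIZE suffixes (K|M|G|T|P) as powers of 1024.
// Hypothetical helper, not part of the actual parser.
class BatchSizeParser {
    public static long parseBytes(String s) {
        s = s.trim().toUpperCase();
        char last = s.charAt(s.length() - 1);
        int idx = "KMGTP".indexOf(last);

        if (idx < 0)
            return Long.parseLong(s);           // plain byte count, no suffix

        long mult = 1L;
        for (int i = 0; i <= idx; i++)
            mult *= 1024L;                      // K=1024^1 ... P=1024^5

        return Long.parseLong(s.substring(0, s.length() - 1)) * mult;
    }
}
```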
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
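The RFC 4180 rules from the first bullets (quoted fields may contain the separator; a doubled quote unescapes to a literal quote) can be sketched as a minimal field splitter; this is an illustration only, not the actual Ignite parser:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of RFC 4180 field splitting for a single CSV record:
// commas inside double quotes are preserved, and "" unescapes to ".
class Rfc4180 {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');        // escaped quote inside a quoted field
                        i++;
                    }
                    else
                        quoted = false;         // closing quote
                }
                else
                    cur.append(c);
            }
            else if (c == '"')
                quoted = true;                  // opening quote
            else if (c == ',') {
                fields.add(cur.toString());     // field separator outside quotes
                cur.setLength(0);
            }
            else
                cur.append(c);
        }

        fields.add(cur.toString());

        return fields;
    }
}
```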
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347230#comment-16347230
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov],

Regarding 7.2 - yes, please disregard empty columns list. Suppose there's _key 
and _val.

Regarding empty lines - they are always added for readability, but we don't add 
them right after a line ending with '{', be it a method or class declaration, a 
loop or a switch. Yes, with long method declarations this may not look good, but 
for consistency let's stick with what I suggest, thanks.

Regarding 5 - ok, Vlad's always right. Still, IMHO the main thing here is not 
simpler code but better usability. (Just saying.)

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the potential 
> performance improvement. If the speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc.
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one quote character is present or escape sequence is invalid.
> h2. File handling
> * File character set to be supported in future
> * Skipped/imported row number (or first/last line or skip header option), 
> skipped/imported column number (or first/last column): to be supported in 
> future
> * Line start pattern (as in MySQL): no support planned
> * We currently support only client-side import. No server-side file import.
> * We may want to support client-side stdin import in future.
> * We do not handle importing multiple files from single command
> * We don't benefit from any kind of pre-sorting or pre-partitioning of data on 
> the client side.
> * We don't include any metadata, such as line numbers, from the client side.
> h3. Transferring data
> * We send file data via batches. In future we will support batch size 
> (specified with rows per batch or data block 

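The batched client-side transfer described above can be sketched as simple fixed-size grouping of parsed records before sending (illustrative only; the real wire protocol and batch sizing differ):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: grouping parsed records into fixed-size batches for transfer.
// The batch size here is a placeholder, not the actual default.
class Batcher {
    public static List<List<String>> toBatches(List<String> rows, int batchSize) {
        List<List<String>> batches = new ArrayList<>();

        for (int i = 0; i < rows.size(); i += batchSize)
            batches.add(new ArrayList<>(rows.subList(i, Math.min(i + batchSize, rows.size()))));

        return batches;
    }
}
```

Each batch would then be shipped to the server, which feeds it into the streamer.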
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347177#comment-16347177
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 5:22 PM:
--

[~kirill.shirokov], please allow me to proceed.

 

1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative 
stuff. We need to cover positive cases extensively as well.

2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be 
better to keep update counter in {{BulkLoadContext}} as the notion of counter 
is used not just in JDBC driver.

3. {{JdbcThinStatement}} -

3.1 let's catch and wrap any exception in {{sendFile}}

3.2 Odd empty line in the beginning of {{sendFile}}

4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV 
line contain commas, do we? This clearly does not seem right and I don't see 
any tests for that either. Shouldn't we handle escaped quotes too?

5. All {{*PipelineBlock}} classes - odd empty line after class header.

6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list 
of arrays could do?

7. {{BulkLoadContextCursor}} - it's unclear to me why we return context in 
fake "row" while all other parameters should be retrieved via getters. Can we 
unify this? (E.g. add getters for everything AND return everything in fake row 
- that would be consistent.)

8. Please add more tests for behavior in case of failure - server or client 
disconnect during file processing, file open error on client, etc.

I think this is it for now. Looking forward to the fixes.


was (Author: al.psc):
[~kirill.shirokov], please allow me to proceed.

 

1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative 
stuff. We need to cover positive cases extensively as well.

2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be 
better to keep update counter in {{BulkLoadContext}} as the notion of counter 
is used not just in JDBC driver.

3. {{JdbcThinStatement}} -

3.1 let's catch and wrap any exception in {{sendFile}}

3.2 Odd empty line in the beginning of {{sendFile}}

4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV 
line contain commas, do we? This clearly does not seem right and I don't see 
any tests for that either. Shouldn't we handle escaped quotes too?

5. All {{*PipelineBlock}} classes - odd empty line after class header.

6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list 
of arrays could do?

7. {{BulkLoadContextCursor}} - it's unclear to me why we return context in 
fake "row" while all other parameters should be retrieved via getters. Can we 
unify this? (E.g. add getters for everything AND return everything in fake row 
- that would be consistent.)

8. Please add more tests for behavior in case of failure - server or client 
disconnect during file processing, file open error on client, etc.

I think this is it for now. Looking forward to the fixes.

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the potential 
> performance improvement. If the speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc.
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] 

[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347216#comment-16347216
 ] 

Alexander Paschenko commented on IGNITE-3999:
-

[~aakhmedov] I've started looking at the patch. I'm in progress, meanwhile 
please ensure that:
 * all method overrides have javadoc like this: /** \{@inheritDoc} */
 * all new parameters have matching javadocs.

> Support case insensitive search in SQL
> --
>
> Key: IGNITE-3999
> URL: https://issues.apache.org/jira/browse/IGNITE-3999
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 1.7
>Reporter: Valentin Kulichenko
>Assignee: Amir Akhmedov
>Priority: Critical
> Fix For: 2.5
>
>
> Currently case insensitive search is possible only with the help of 
> {{lower()}} function:
> {code}
> select name from MyValue where lower(name) = 'abc_5'
> {code}
> But this will always be a full scan, even if {{name}} field is indexed.
> We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add 
> a respective property to {{@QuerySqlField}} annotation.
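Outside SQL, the difference between the full scan forced by {{lower()}} and an index-backed case-insensitive lookup can be illustrated with an ordered map built over a case-insensitive comparator (plain Java sketch, not Ignite API; the data is made up):

```java
import java.util.TreeMap;

// Sketch: a sorted structure with a case-insensitive comparator answers
// lower()-style lookups in O(log n), which is what VARCHAR_IGNORECASE
// support would enable for an indexed column. Illustration only.
class CaseInsensitiveIndex {
    public static TreeMap<String, String> sample() {
        TreeMap<String, String> idx = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        idx.put("ABC_5", "row-1");              // stored with original casing
        return idx;
    }

    public static String lookup(TreeMap<String, String> idx, String key) {
        return idx.get(key);                    // tree descent, no full scan
    }
}
```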



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Issue Comment Deleted] (IGNITE-3999) Support case insensitive search in SQL

2018-01-31 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-3999:

Comment: was deleted

(was: Upsource review for the patch)

> Support case insensitive search in SQL
> --
>
> Key: IGNITE-3999
> URL: https://issues.apache.org/jira/browse/IGNITE-3999
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 1.7
>Reporter: Valentin Kulichenko
>Assignee: Amir Akhmedov
>Priority: Critical
> Fix For: 2.5
>
>
> Currently case insensitive search is possible only with the help of 
> {{lower()}} function:
> {code}
> select name from MyValue where lower(name) = 'abc_5'
> {code}
> But this will always be a full scan, even if {{name}} field is indexed.
> We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add 
> a respective property to {{@QuerySqlField}} annotation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347197#comment-16347197
 ] 

Alexander Paschenko commented on IGNITE-3999:
-

Added UpSource review link for the patch.

> Support case insensitive search in SQL
> --
>
> Key: IGNITE-3999
> URL: https://issues.apache.org/jira/browse/IGNITE-3999
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 1.7
>Reporter: Valentin Kulichenko
>Assignee: Amir Akhmedov
>Priority: Critical
> Fix For: 2.5
>
>
> Currently case insensitive search is possible only with the help of 
> {{lower()}} function:
> {code}
> select name from MyValue where lower(name) = 'abc_5'
> {code}
> But this will always be a full scan, even if {{name}} field is indexed.
> We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add 
> a respective property to {{@QuerySqlField}} annotation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347196#comment-16347196
 ] 

Alexander Paschenko commented on IGNITE-3999:
-

Upsource review for the patch

> Support case insensitive search in SQL
> --
>
> Key: IGNITE-3999
> URL: https://issues.apache.org/jira/browse/IGNITE-3999
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 1.7
>Reporter: Valentin Kulichenko
>Assignee: Amir Akhmedov
>Priority: Critical
> Fix For: 2.5
>
>
> Currently case insensitive search is possible only with the help of 
> {{lower()}} function:
> {code}
> select name from MyValue where lower(name) = 'abc_5'
> {code}
> But this will always be a full scan, even if {{name}} field is indexed.
> We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add 
> a respective property to {{@QuerySqlField}} annotation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347191#comment-16347191
 ] 

Alexander Paschenko commented on IGNITE-3999:
-

[~aakhmedov], [~vkulichenko], I will do that, thanks

> Support case insensitive search in SQL
> --
>
> Key: IGNITE-3999
> URL: https://issues.apache.org/jira/browse/IGNITE-3999
> Project: Ignite
>  Issue Type: Improvement
>  Components: sql
>Affects Versions: 1.7
>Reporter: Valentin Kulichenko
>Assignee: Amir Akhmedov
>Priority: Critical
> Fix For: 2.5
>
>
> Currently case insensitive search is possible only with the help of 
> {{lower()}} function:
> {code}
> select name from MyValue where lower(name) = 'abc_5'
> {code}
> But this will always be a full scan, even if {{name}} field is indexed.
> We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add 
> a respective property to {{@QuerySqlField}} annotation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347177#comment-16347177
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov], please allow me to proceed.

 

1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative 
stuff. We need to cover positive cases extensively as well.

2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be 
better to keep update counter in {{BulkLoadContext}} as the notion of counter 
is used not just in JDBC driver.

3. {{JdbcThinStatement}} -

3.1 let's catch and wrap any exception in {{sendFile}}

3.2 Odd empty line in the beginning of {{sendFile}}

4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV 
line contain commas, do we? This clearly does not seem right and I don't see 
any tests for that either. Shouldn't we handle escaped quotes too?

5. All {{*PipelineBlock}} classes - odd empty line after class header.

6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list 
of arrays could do?

7. {{BulkLoadContextCursor}} - it's unclear to me why we return context in 
fake "row" while all other parameters should be retrieved via getters. Can we 
unify this? (E.g. add getters for everything AND return everything in fake row 
- that would be consistent.)

8. Please add more tests for behavior in case of failure - server or client 
disconnect during file processing, file open error on client, etc.

I think this is it for now. Looking forward to the fixes.

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the potential 
> performance improvement. If the speedup is confirmed, we should build a base 
> implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc.
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. Implementation decisions and notes
> h2. Parsing
> * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded
> * Escape sequences, line comment characters are currently not supported
> * We may want to support fixed-length formats (via format descriptors) in 
> future
> * We may want to strip comments from lines (for example, starting with '#')
> * We may want to allow user to either ignore empty lines or treat them as a 
> special case of record having all default values
> * We may allow user to enable whitespace trimming from beginning and end of a 
> line
> * We may want to allow user to specify error handling strategy: e.g., only 
> one 

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345541#comment-16345541
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 2:47 PM:
--

[~kirill.shirokov], my comments:

1. {{JdbcAbstractUpdateStatementSelfTest}} - unused code

2. {{org.apache.ignite.internal.jdbc2.JdbcBulkLoadSelfTest: createConnection}} 
- unused param; {{testBulkLoadThrows}} - please use try-with-resources.

3. {{org.apache.ignite.jdbc.thin.JdbcThinAbstractDmlStatementSelfTest:}}

3.1 {{jdbcRun}}: please remove this method and use inherited method execute 
instead.

3.2 {{testDOA}}: please rename into something more meaningful.

4. {{BulkLoadContextCursor:}}

4.1 Please make field final.

4.2 {{getAll}} - I changed the code to double call of {{singletonList}} just 
fine.

4.3 {{getFieldName}} - I believe this method should throw 
{{NoSuchElementException}} for all {{i}} except {{0}}.

4.4 Odd empty line right after class header.

5. {{SqlBulkLoadCommand}}: let's parse command in a way consistent with how 
it's done in SQL usually - we clearly should not use ugly stuff like 
{{BATCH_SIZE}} (with underscore instead of space) now that we are headed to our 
own parser. Please rework parsing so that user has to write {{BATCH SIZE}}. 
(NB: yes, I'm aware that we already have quite ugly stuff here, like 
{{INLINE_SIZE}} for indexes, but I believe that this must be avoided as much as 
possible.) -Also I think we should support case when the user does not supply 
columns list (so no empty brackets too) - currently it's a parsing error and 
there's a test for it.- _(Please disregard what's been struck through.)_

6. {{UpdatePlan#processRow}}:

6.1 Odd empty line after method declaration.

6.2. Shouldn't we rather check for equality here? {{row.size() < 
colNames.length}} - like this

7. {{UpdatePlanBuilder#planForBulkLoad}}:

7.1 Odd empty line after method declaration.

7.2 Let's add VERY simple test case (like cache of  and bulk 
load to it) - I think this may break on the current implementation as we may 
actually want to insert into the _val column, and this case does not seem to be 
handled - say, like this {{COPY from file to String () format csv}} (String is 
table name, table has 3 columns all of which are system columns.)

 

This is not all, more comments will follow.

 

 


was (Author: al.psc):
[~kirill.shirokov], my comments:

1. {{JdbcAbstractUpdateStatementSelfTest}} - unused code

2. {{org.apache.ignite.internal.jdbc2.JdbcBulkLoadSelfTest: createConnection}} 
- unused param; {{testBulkLoadThrows}} - please use try-with-resources.

3. {{org.apache.ignite.jdbc.thin.JdbcThinAbstractDmlStatementSelfTest:}}

3.1 {{jdbcRun}}: please remove this method and use inherited method execute 
instead.

3.2 {{testDOA}}: please rename into something more meaningful.

4. {{BulkLoadContextCursor:}}

4.1 Please make field final.

4.2 {{getAll}} - I changed the code to double call of {{singletonList}} just 
fine.

4.3 {{getFieldName}} - I believe this method should throw 
{{NoSuchElementException}} for all {{i}} except {{0}}.

4.4 Odd empty line right after class header.

5. {{SqlBulkLoadCommand}}: let's parse command in a way consistent with how 
it's done in SQL usually - we clearly should not use ugly stuff like 
{{BATCH_SIZE}} (with underscore instead of space) now that we are headed to our 
own parser. Please rework parsing so that user has to write {{BATCH SIZE}}. 
(NB: yes, I'm aware that we already have quite ugly stuff here, like 
{{INLINE_SIZE}} for indexes, but I believe that this must be avoided as much as 
possible.) Also I think we should support case when the user does not supply 
columns list (so no empty brackets too) - currently it's a parsing error and 
there's a test for it.

6. {{UpdatePlan#processRow}}:

6.1 Odd empty line after method declaration.

6.2. Shouldn't we rather check for equality here? {{row.size() < 
colNames.length}} - like this

7. {{UpdatePlanBuilder#planForBulkLoad}}:

7.1 Odd empty line after method declaration.

7.2 Let's add VERY simple test case (like cache of  and bulk 
load to it) - I think this may break on the current implementation as we may 
actually want to insert into the _val column, and this case does not seem to be 
handled - say, like this {{COPY from file to String () format csv}} (String is 
table name, table has 3 columns all of which are system columns.)

 

This is not all, more comments will follow.

 

 

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>

[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774
 ] 

Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 1:13 PM:
--

[~kirill.shirokov], please let me continue with comments.

1. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

2. {{BulkLoadParser}}:

2.1 Unused private field.

2.2 Too long line in {{processBatch}} declaration.

2.3 It seems plain wrong that arguments of {{processBatch}} are related with 
JDBC. JDBC is a connectivity interface, nothing else, and these references are 
in deeper "guts". We need to decouple these somehow - one day we might work to 
support this feature in ODBC, or somewhere else, and then it'd be inevitable. 
I'd just introduce an enum for various command types.

3. {{BulkLoadFormat}}:

3.1 Unused import

3.2 Odd empty lines in the beginning of class, in switch, etc

4. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably 
they should be single class and implementations could be merged accordingly 
too. Can you come up with a case where we would really need to separate this 
entities? Say, one parser for few formats, or one format yielding more than one 
type of parser?

5. {{BulkLoadCsvParser}}:

5.1 {{processBatch}} method looks like it could actually be defined in the parent 
class, and the logic to be changed in subclasses actually lives in 
{{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are 
defined on higher level. Can we rework hierarchy accordingly?

5.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at all?

5.3 Odd empty lines after class header, method declarations

6. {{BulkLoadEntryConverter}}: do we really expect more than one implementation 
of this interface? If no, let's just replace it with {{IgniteClosureX}} or 
something like that. We have more than enough interfaces and abstract classes 
to represent lambdas. *Before refactoring this, please look at pt.10.3 as well*

7. {{BulkLoadCsvFormat}}: amount of useful code in this class approaches zero, 
and this further assures that parser and its format must be one.

8. {{BulkLoadContext}}: too long comment line.

9. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this 
interface? If no, just remove this class and let {{BulkLoadContext}} know 
about streamer. If yes, make it inherit one of Ignite's provided 
classes/interfaces for lambdas. *Before refactoring this, please look at 
pt.10.3 as well*

10. {{JdbcRequestHandler}}:

10.1 Unused import

10.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, 
please do the same here.

10.3 {{processBulkLoadFileBatch}}: it looks like doing actual job (data 
conversion, feeding to streamer) is just plain wrong. Not only it's logic that 
could be reused when we'll implement same feature for other interface like 
JDBC, it's also not the area of responsibility of JDBC request handler. This 
again shows the need of refactoring of {{BulkLoadContext}}: probably we need to 
introduce a concrete class that will *a)* wrap streamer and *b)* do conversion. 
It will just accept {{List}} and do with it whatever it wants. JDBC 
must not know about this at all, it must just detect correct context and feed 
lists of objects to that new class I've mentioned above.

10.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof 
BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would 
decrease scope depth, make code better readable and decrease overall size of 
diff (never hurts).

10.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to 
do with files.

11. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 
usages), not to mention that field name is very misleading - what does JDBC 
have to do with parsing?

12. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we 
should mix words "request" and "result", we never do so in JDBC. Not to mention 
that we never send {{JdbcBulkLoadBatchRequest}} *before* we get 
{{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to 
{{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} 
command and never else.

This is not all, more comments will follow.
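The enum suggested in point 2.3 could look roughly like this (hypothetical names, not the actual Ignite API; it only illustrates decoupling parser outcomes from JDBC message types):

```java
// Hypothetical command-type enum for the bulk load parser, so that
// processBatch results are not tied to JDBC request constants.
enum BulkLoadCmd {
    CONTINUE,       // more batches expected from the client
    FINISHED_EOF,   // last batch, end of input reached
    ABORT;          // client requested cancellation

    public boolean isTerminal() {
        return this != CONTINUE;
    }
}
```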


was (Author: al.psc):
[~kirill.shirokov], please let me continue with comments.

1. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

2. {{BulkLoadParser}}:

2.1 Unused private field.

2.2 Too long line in {{processBatch}} declaration.

2.3 It seems plain wrong that arguments of {{processBatch}} are related with 
JDBC. JDBC is a connectivity interface, nothing else, and these 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-31 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov], please let me continue with comments.

1. {{BulkLoadParameters}} - do we really need this class? It just has two 
fields. For now I see no point in it. Do you expect many more new fields to 
come?

2. {{BulkLoadParser}}:

2.1 Unused private field.

2.2 Too long line in {{processBatch}} declaration.

2.3 It seems plain wrong that the arguments of {{processBatch}} are related to 
JDBC. JDBC is a connectivity interface, nothing else, and these references are 
in deeper "guts". We need to decouple these somehow - one day we might work to 
support this feature in ODBC, or somewhere else, and then it'd be inevitable. 
I'd just introduce an enum for various command types.
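A rough sketch of the enum-based decoupling (all names here are illustrative, not actual Ignite classes):

```java
// Illustrative sketch only: a protocol-neutral command enum returned by the
// parser, which JDBC (or, later, ODBC) would map onto its own wire types.
enum BulkLoadCommand {
    CONTINUE,     // more batches expected from the client
    FINISHED_EOF, // last batch received, end of input reached
    ABORTED       // client aborted the load
}

// The parser reports results in protocol-neutral terms; nothing JDBC-specific.
class BulkLoadBatchResult {
    private final BulkLoadCommand cmd;
    private final long processedRows;

    BulkLoadBatchResult(BulkLoadCommand cmd, long processedRows) {
        this.cmd = cmd;
        this.processedRows = processedRows;
    }

    BulkLoadCommand command() { return cmd; }

    long processedRows() { return processedRows; }
}
```

Each protocol front end would then translate {{BulkLoadCommand}} into its own request/response classes.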

3. {{BulkLoadFormat}}:

3.1 Unused import

3.2 Odd empty lines in the beginning of class, in switch, etc

4. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably 
they should be a single class, and implementations could be merged accordingly 
too. Can you come up with a case where we would really need to separate these 
entities? Say, one parser for several formats, or one format yielding more than 
one type of parser?

5. {{BulkLoadCsvParser}}:

5.1 {{processBatch}} method looks like it could actually be defined in the 
parent class, while the logic to be changed in subclasses actually lives in 
{{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are 
defined on a higher level. Can we rework the hierarchy accordingly?

5.2 Unused parameter in ctor ({{params}}) - do we have tests for params at all?

5.3 Odd empty lines after class header, method declarations

6. {{BulkLoadEntryConverter}}: do we really expect more than one implementation 
of this interface? If no, let's just replace it with {{IgniteClosureX}} or 
something like that. We have more than enough interfaces and abstract classes 
to represent lambdas. *Before refactoring this, please look at pt.10.3 as well*

7. {{BulkLoadCsvFormat}}: the amount of useful code in this class approaches 
zero, which further suggests that the parser and its format must be one class.

8. {{BulkLoadContext}}: too long comment line.

9. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this 
interface?quit If no, just remove this class and let {{BulkLoadContext}} know 
about streamer. If yes, make it inherit one of Ignite's provided 
classes/interfaces for lambdas. *Before refactoring this, please look at 
pt.10.3 as well*

10. {{JdbcRequestHandler}}:

10.1 Unused import

10.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, 
please do the same here.

10.3 {{processBulkLoadFileBatch}}: doing the actual job (data conversion, 
feeding the streamer) here looks just plain wrong. Not only is this logic that 
could be reused when we implement the same feature for another interface like 
ODBC, it's also not the area of responsibility of the JDBC request handler. This 
again shows the need to refactor {{BulkLoadContext}}: probably we need to 
introduce a concrete class that will *a)* wrap the streamer and *b)* do conversion. 
It will just accept {{List}} and do with it whatever it wants. JDBC 
must not know about this at all; it must just detect the correct context and feed 
lists of objects to that new class I've mentioned above.
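A hypothetical sketch of the kind of class I mean - it owns conversion and writing, and the JDBC handler would only hand it lists of parsed records. Names and signatures are illustrative, not actual Ignite API; the {{writer}} here stands in for the streamer:

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch: wraps the writer (the streamer in real Ignite) together
// with record-to-entry conversion, so the protocol layer only passes lists of
// parsed records and knows nothing about conversion or the streamer.
class BulkLoadProcessor<R, E> {
    private final Function<R, E> converter; // parsed record -> cache entry
    private final Consumer<E> writer;       // e.g. streamer::addData
    private long processed;

    BulkLoadProcessor(Function<R, E> converter, Consumer<E> writer) {
        this.converter = converter;
        this.writer = writer;
    }

    /** Converts and writes a whole batch; the caller knows nothing about conversion. */
    void processBatch(List<R> records) {
        for (R rec : records) {
            writer.accept(converter.apply(rec));
            processed++;
        }
    }

    long processedRows() { return processed; }
}
```

With something like this in place, the JDBC handler's batch method would shrink to decoding the batch and calling {{processBatch}}.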

10.4 {{executeQuery}}: I'd move the whole {{if (fieldsCur instanceof 
BulkLoadContextCursor)}} block above, before {{if (results.size() == 1)}} - it 
would decrease scope depth, make the code more readable and decrease the overall 
size of the diff (which never hurts).

10.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to 
do with files.

11. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 
usages), not to mention that field name is very misleading - what does JDBC 
have to do with parsing?

12. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we 
should mix words "request" and "result", we never do so in JDBC. Not to mention 
that we never send {{JdbcBulkLoadBatchRequest}} *before* we get 
{{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to 
{{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} 
command and never else.

This is not all, more comments will follow.

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC 

[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading

2018-01-30 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345541#comment-16345541
 ] 

Alexander Paschenko commented on IGNITE-6917:
-

[~kirill.shirokov], my comments:

1. {{JdbcAbstractUpdateStatementSelfTest}} - unused code

2. {{org.apache.ignite.internal.jdbc2.JdbcBulkLoadSelfTest: createConnection}} 
- unused param; {{testBulkLoadThrows}} - please use try-with-resources.

3. {{org.apache.ignite.jdbc.thin.JdbcThinAbstractDmlStatementSelfTest:}}

3.1 {{jdbcRun}}: please remove this method and use the inherited {{execute}} 
method instead.

3.2 {{testDOA}}: please rename into something more meaningful.

4. {{BulkLoadContextCursor:}}

4.1 Please make field final.

4.2 {{getAll}} - the code could be changed to a double call of {{singletonList}} 
just fine.

4.3 {{getFieldName}} - I believe this method should throw 
{{NoSuchElementException}} for all {{i}} except {{0}}.

4.4 Odd empty line right after class header.

5. {{SqlBulkLoadCommand}}: let's parse command in a way consistent with how 
it's done in SQL usually - we clearly should not use ugly stuff like 
{{BATCH_SIZE}} (with underscore instead of space) now that we are headed to our 
own parser. Please rework parsing so that user has to write {{BATCH SIZE}}. 
(NB: yes, I'm aware that we already have quite ugly stuff here, like 
{{INLINE_SIZE}} for indexes, but I believe that this must be avoided as much as 
possible.) Also I think we should support the case where the user does not supply 
a columns list (so no empty brackets either) - currently it's a parsing error and 
there's a test for it.

6. {{UpdatePlan#processRow}}:

6.1 Odd empty line after method declaration.

6.2 Shouldn't we rather check for equality here instead of {{row.size() < 
colNames.length}}?

7. {{UpdatePlanBuilder#planForBulkLoad}}:

7.1 Odd empty line after method declaration.

7.2 Let's add a VERY simple test case (like cache of  and bulk 
load to it) - I think this may break on the current implementation, as we may 
actually want to insert into the _val column, and this case does not seem to be 
handled - say, like this {{COPY from file to String () format csv}} (String is 
the table name, and the table has 3 columns, all of which are system columns.)

 

This is not all, more comments will follow.

 

 

> SQL: implement COPY command for efficient data loading
> --
>
> Key: IGNITE-6917
> URL: https://issues.apache.org/jira/browse/IGNITE-6917
> Project: Ignite
>  Issue Type: New Feature
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Kirill Shirokov
>Priority: Major
>  Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is 
> only possible to execute single commands one by one. We already can batch 
> them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file 
> (or input stream in general case) on the client side, then transfer data to 
> the cluster, and then execute update inside the cluster, e.g. through 
> streamer.
> First of all we need to create a quick and dirty prototype to assess the 
> potential performance improvement. If speedup is confirmed, we should build a 
> base implementation which will accept only files. But at the same time we should 
> understand how it will evolve in future: multiple file formats (probably 
> including Hadoop formats, e.g. Parquet), escape characters, input streams, 
> etc.
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Current implementation:
> {noformat}
> COPY 
> FROM "file.name"
> INTO .
> [(col-name, ...)]
> FORMAT  -- Only CSV format is supported in the current 
> release
> [BATCH_SIZE ]
> {noformat}
> We may want to gradually add features to this command in future to have 
> something like this:
> {noformat}
> COPY
> FROM "file.name"[CHARSET ""]
> INTO . [CREATE [IF NOT EXISTS]]
> [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]]
> FORMAT (csv|tsv|...)
> -- CSV format options:
> [FIELDSEP='column-separators-regexp']
> [LINESEP='row-separators-regexp']
> [QUOTE='quote-chars']
> [ESCAPE='escape-char']
> [NULL='null-sequence']
> [COMMENT='single-line-comment-start-char']
> [TRIM_LINES]
> [IMPORT_EMPTY_LINES]
> [CHARSET ""]
> [ROWS -]
> --or--
> [SKIP ROWS ] [MAX ROWS ]
> [COLS -]
> --or--
> [SKIP COLS ] [MAX COLS ]
> [(MATCH | SKIP) HEADER]
> [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS]
> [BATCH SIZE ( ROWS | [K|M|G|T|P])]
> [COMPRESS "codec-name" [codec options]]
> [LOCK (TABLE|ROWS)]
> [NOLOGGING]
> [BACKEND (DIRECT | STREAMER)]
> {noformat}
> h1. 

[jira] [Comment Edited] (IGNITE-7413) SqlDmlExample: Incorrect result for Delete if run with standalone nodes (Java & .NET)

2018-01-16 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327549#comment-16327549
 ] 

Alexander Paschenko edited comment on IGNITE-7413 at 1/16/18 6:49 PM:
--

[~ptupitsyn], I've done some investigation and here's what I've found.

If you ask me, currently the problem is not {{JOIN}}, but rather {{IN}} clause.

Consider following {{SELECT}} query (generated by DML engine):
{code:java}
SELECT
_KEY,
_VAL
FROM "SqlDmlExamplePersons".PERSON
WHERE ID IN( SELECT
P.ID
FROM "SqlDmlExamplePersons".PERSON P 
INNER JOIN "SqlDmlExampleOrganizations".ORGANIZATION O 
ON TRUE
WHERE (O.NAME <> ?1) AND (P.ORGID = O.ID) ){code}
It won't return a correct result even if you set {{distributedJoins=true}}, 
because the distributed joins flag seems to work only for the top-level query and 
doesn't seem to care about INs and other subqueries. That said, I believe the 
problem is our joins implementation. What can be done about this, and at what 
cost, requires additional investigation.


was (Author: al.psc):
[~ptupitsyn], I've done some investigation and here's what I've found.

I've looked at {{DELETE}} statement as it contains {{JOIN}} too.

Alas, the problem is not {{JOIN}}, but rather {{IN}} clause.

Consider following {{SELECT}} query (generated by DML engine):
{code:java}
SELECT
_KEY,
_VAL
FROM "SqlDmlExamplePersons".PERSON
WHERE ID IN( SELECT
P.ID
FROM "SqlDmlExamplePersons".PERSON P 
INNER JOIN "SqlDmlExampleOrganizations".ORGANIZATION O 
ON TRUE
WHERE (O.NAME <> ?1) AND (P.ORGID = O.ID) ){code}
It won't return correct result even if you set {{distributedJoins=true}} 
because distributed joins flag seems to work only for top-level query and 
doesn't seem to care about stuff like INs and other subqueries. That said, I 
believe that the problem is our joins implementation. What can be done with 
this and at what cost requires additional investigation.

> SqlDmlExample: Incorrect result for Delete if run with standalone nodes (Java 
> & .NET)
> -
>
> Key: IGNITE-7413
> URL: https://issues.apache.org/jira/browse/IGNITE-7413
> Project: Ignite
>  Issue Type: Bug
>  Components: platforms
>Affects Versions: 2.3
>Reporter: Irina Zaporozhtseva
>Assignee: Pavel Tupitsyn
>Priority: Minor
>  Labels: .NET
> Fix For: 2.4
>
>
> Datagrid.QueryDmlExample: Incorrect result for Delete if run with standalone 
> nodes
>  
> without standalone nodes:
> {code}
> >>> Delete non-ASF employees 
> >>> 1: John Doe, ASF, 4400 
> >>> 2: Jane Roe, ASF, 5500
> {code}
>  
> with standalone nodes:
> {code}
> >>> Delete non-ASF employees 
> >>> 1: John Doe, ASF, 4400
> >>> 4: Richard Miles, Eclipse, 3000 
> >>> 2: Jane Roe, ASF, 5500
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-7248) "Schema not found" error when setting streaming mode for JDBC driver

2018-01-16 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327366#comment-16327366
 ] 

Alexander Paschenko commented on IGNITE-7248:
-

[~tledkov-gridgain], my comments:

1. {{JdbcStreamingToPublicCacheTest}}: unused imports; it looks like we don't 
need the {{getConfiguration0}} method either - just override {{getConfiguration}} 
and call {{super.getConfiguration}} there, like you do now.

2. The fix itself is incorrect. You can't just use cache name as schema name - 
they may not match (PUBLIC schema, also schema name from cache 
configuration...). The problem is that we mix caches and schemas in 
{{JdbcStatement}} and {{GridQueryProcessor}}. Correct fix would be like this:
 * fix {{GridQueryProcessor#prepareNativeStatement}} so that it does not worry 
about caches and works only in terms of schema (I reverted your change in 
{{JdbcStatement}} and removed {{idx.schema}} lookup from 
{{GridQueryProcessor#prepareNativeStatement}} and your test passes just fine).

That said, please make the fix as proposed (cache name should not be used 
instead of schema name anywhere) and re-run tests.

> "Schema not found" error when setting streaming mode for JDBC driver
> 
>
> Key: IGNITE-7248
> URL: https://issues.apache.org/jira/browse/IGNITE-7248
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.3
>Reporter: Alexey Kukushkin
>Assignee: Vladimir Ozerov
>Priority: Major
> Fix For: 2.4
>
>
> Using the JDBC "thick" driver in streaming mode fails with a "Schema not found" 
> error:
> {code}
> Connection connection = 
> DriverManager.getConnection("jdbc:ignite:cfg://streaming=true:cache=CACHE@file:/path-to-ignite-config.xml");
> {code}
> Using the connection to create a table works fine, but this exception is 
> generated when using the connection to execute INSERT INTO...
> {code}
> class org.apache.ignite.internal.processors.query.IgniteSQLException: Failed
> to set schema for DB connection for thread [schema=]
> org.h2.jdbc.JdbcSQLException: Schema  not found; SQL statement:
> SET SCHEMA "" [90079-195]
> {code}
> See [User 
> List|http://apache-ignite-users.70518.x6.nabble.com/Cannot-insert-data-into-table-using-JDBC-tc17477.html]
>  for more details



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-6022) SQL: add native batch execution support for DML statements

2018-01-16 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327329#comment-16327329
 ] 

Alexander Paschenko commented on IGNITE-6022:
-

[~rkondakov], my comment:

Class {{DmlBatchSender$Batch}}: please remove direct field usages and replace 
them with getters that you have already introduced. Also method 
{{Batch#rowProcessors}} looks unused.

> SQL: add native batch execution support for DML statements
> --
>
> Key: IGNITE-6022
> URL: https://issues.apache.org/jira/browse/IGNITE-6022
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.1
>Reporter: Vladimir Ozerov
>Assignee: Roman Kondakov
>Priority: Major
>  Labels: iep-1, performance
> Fix For: 2.4
>
>
> We have batch execution support for JDBC and ODBC drivers. This decreases 
> number of network hops. However, we do not have any batch execution support 
> on the server side. It means that for batch of N similar statements, every 
> statement will go through the whole execution chain - parsing, splitting, 
> communication with servers. And while parsing and splitting might be avoided 
> with help of statement cache, the most heavy part - network communication - 
> is still there.
> We need to investigate how to optimize the flow for batch updates. Possible 
> improvements:
> 1) Execute statements with certain degree of parallelism;
> 2) Send several query execution requests to the server at once;
> 3) Ensure that caches are used properly for batching - we should not parse 
> the same request multiple times.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-6022) SQL: add native batch execution support for DML statements

2018-01-12 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323815#comment-16323815
 ] 

Alexander Paschenko commented on IGNITE-6022:
-

[~rkondakov]
Hi, my comments:

1. Do we really need the additional ctor in {{SqlFieldsQueryEx}}? The args field 
is a list, not an array, and we have a null check at {{addBatchedArgs}}, so 
nothing prevents us from using the old ctor and adding as many args as we like. I 
don't think that pre-allocating a list of a given size justifies the existence of 
an additional ctor we clearly can live without.
2. {{DmlStatementsProcessor.doInsertBatched}}: why do we not fail fast on 
unexpected exceptions, and instead try to convert non-{{IgniteSQLException}}s 
to {{SQLException}}s? [~vozerov], what would be the correct behavior here, what 
do you think? I believe we should handle only {{IgniteSQLException}}s at this 
point.
3. {{DmlUtils.isBatched}} can be greatly simplified and turned into a 
one-liner, please do so ({{return qry instanceof SqlFieldsQueryEx && 
((SqlFieldsQueryEx)qry).isBatched();}})
4. {{SqlFieldsQueryEx.isBatched}} - please use {{F.isEmpty}}
5. {{JdbcRequestHandler.executeBatchedQuery}}: you don't need {{return}} in 
{{catch}} clause, instead please move everything after {{catch}} into {{try}}. 
Local var {{qryRes}} will be declared inside {{try}} then too.
6. Why does {{UpdatePlanBuilder.checkPlanCanBeDistributed}} take 
{{DmlUtils.isBatched}} into account unconditionally? Probably there are some 
cases when a batch can be executed in a distributed manner?
7. {{DmlBatchSender.add}}: you can simplify code and get rid of duplicate code 
at the end of this method by rewriting condition to {{if (batch.put(...) != 
null || batch.size() >= size)}}
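A self-contained sketch of the suggested condition (an illustrative stand-in, not the actual {{DmlBatchSender}} code): the duplicate-key case and the full-batch case both trigger the same flush, so the duplicated code at the end of {{add}} disappears:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the suggested condition in DmlBatchSender.add():
// a duplicate key and a full batch both trigger the same flush.
class Batch<K, V> {
    private final int size;
    private final Map<K, V> rows = new HashMap<>();
    private int flushCnt;

    Batch(int size) { this.size = size; }

    void add(K key, V row) {
        // Single condition covers both flush triggers.
        if (rows.put(key, row) != null || rows.size() >= size)
            flush();
    }

    private void flush() {
        flushCnt++; // in real code: send the accumulated rows
        rows.clear();
    }

    int flushCount() { return flushCnt; }
}
```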
8. {{DmlBatchSender.processPage}}: this constant copying of maps on each page 
looks quite suspicious to me. To avoid this, just keep two maps instead of one 
where needed: {{}} and {{Object, EntryProcessor}} (same sets 
of keys) and pass both to {{processPage}}. Method {{countAllRows}} will get 
simpler too (it will need only values of the first map).
9. Multiple violations of coding conventions - please don't put a closing curly 
brace and anything else on the same line, like this: {{} catch {}}; instead, 
move {{catch}} to the next line.
10. In the cases where there are maps or lists with contents that are hard to 
understand intuitively, I would write concise comments about what those tons of 
generic args mean, or what those lists of lists of lists are.
11. {{UpdatePlan}}: {{createRows(Object[])}} and {{extractArgsValues}} contain 
parts of clearly copy-pasted code. Can't we unite those? Probably 
{{createRows(Object[])}} should just call {{extractArgsValues}}?

> SQL: add native batch execution support for DML statements
> --
>
> Key: IGNITE-6022
> URL: https://issues.apache.org/jira/browse/IGNITE-6022
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.1
>Reporter: Vladimir Ozerov
>Assignee: Roman Kondakov
>  Labels: iep-1, performance
> Fix For: 2.4
>
>
> We have batch execution support for JDBC and ODBC drivers. This decreases 
> number of network hops. However, we do not have any batch execution support 
> on the server side. It means that for batch of N similar statements, every 
> statement will go through the whole execution chain - parsing, splitting, 
> communication with servers. And while parsing and splitting might be avoided 
> with help of statement cache, the most heavy part - network communication - 
> is still there.
> We need to investigate how to optimize the flow for batch updates. Possible 
> improvements:
> 1) Execute statements with certain degree of parallelism;
> 2) Send several query execution requests to the server at once;
> 3) Ensure that caches are used properly for batching - we should not parse 
> the same request multiple times.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (IGNITE-5571) Make sure that cache-less execution works as good as cache-based

2017-12-27 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304737#comment-16304737
 ] 

Alexander Paschenko commented on IGNITE-5571:
-

[~vozerov]
All done.

> Make sure that cache-less execution works as good as cache-based
> 
>
> Key: IGNITE-5571
> URL: https://issues.apache.org/jira/browse/IGNITE-5571
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Critical
> Fix For: 2.4
>
>
> Compare the following two methods:
> 1) {{GridQueryProcessor.querySqlFields}} - old good entry point for query 
> execution;
> 2) {{GridQueryProcessor.querySqlFieldsNoCache}} - new method for "cache-less" 
> execution.
> Note how cache context is used in the first method:
> 1) First, it helps determine whether query can be converted to "local"
> 2) Second, it gets query parallelism of current cache, and if it differs from 
> {{1}}, then it turns on {{distributedJoins}}.
> Neither of this happens in the second implementation. Moreover, I had to 
> throw an exception for local queries, as I didn't know how to handle them 
> properly.
> We need to investigate and fix these two deficiencies somehow. Probably some 
> inputs from [~sergi.vladykin] would be required, to understand what is going 
> on.
> Our ultimate goal is to make "cache-less" execution as good as the old one.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-7300) Allow expressions in SQL INSERTs within transactions

2017-12-25 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-7300:

Description: The problem is related to IGNITE-7267 - the latter honors raw 
rows, but drops support for inserts with expressions which yield local 
subqueries. To fix this, {{UpdatePlan.isLocalSubquery()}} must be honored.

> Allow expressions in SQL INSERTs within transactions
> 
>
> Key: IGNITE-7300
> URL: https://issues.apache.org/jira/browse/IGNITE-7300
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Paschenko
>Assignee: Igor Seliverstov
> Fix For: 2.4
>
>
> The problem is related to IGNITE-7267 - the latter honors raw rows, but drops 
> support for inserts with expressions which yield local subqueries. To fix 
> this, {{UpdatePlan.isLocalSubquery()}} must be honored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (IGNITE-7300) Allow expressions in SQL INSERTs within transactions

2017-12-25 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-7300:

Environment: (was: The problem is related to IGNITE-7267 - the latter 
honors raw rows, but drops support for inserts with expressions which yield 
local subqueries. To fix this, {{UpdatePlan.isLocalSubquery()}} must be 
honored.)

> Allow expressions in SQL INSERTs within transactions
> 
>
> Key: IGNITE-7300
> URL: https://issues.apache.org/jira/browse/IGNITE-7300
> Project: Ignite
>  Issue Type: Bug
>Reporter: Alexander Paschenko
>Assignee: Igor Seliverstov
> Fix For: 2.4
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-7300) Allow expressions in SQL INSERTs within transactions

2017-12-25 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7300:
---

 Summary: Allow expressions in SQL INSERTs within transactions
 Key: IGNITE-7300
 URL: https://issues.apache.org/jira/browse/IGNITE-7300
 Project: Ignite
  Issue Type: Bug
 Environment: The problem is related to IGNITE-7267 - the latter honors 
raw rows, but drops support for inserts with expressions which yield local 
subqueries. To fix this, {{UpdatePlan.isLocalSubquery()}} must be honored.
Reporter: Alexander Paschenko
Assignee: Igor Seliverstov
 Fix For: 2.4






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (IGNITE-4192) SQL TX: JDBC driver support

2017-12-19 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296848#comment-16296848
 ] 

Alexander Paschenko commented on IGNITE-4192:
-

[~vozerov]
All done.

> SQL TX: JDBC driver support
> ---
>
> Key: IGNITE-4192
> URL: https://issues.apache.org/jira/browse/IGNITE-4192
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc, sql
>Reporter: Denis Magda
>Assignee: Alexander Paschenko
>  Labels: iep-3
> Fix For: 2.4
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from JDBC driver side.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (IGNITE-4490) Optimize DML for fast INSERT and MERGE

2017-12-15 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292771#comment-16292771
 ] 

Alexander Paschenko commented on IGNITE-4490:
-

[~vozerov]
No, at this point we shouldn't, as this routine comes from how H2 converts dates 
internally, and that logic does not involve Java 8 stuff - see 
{{org.h2.util.DateTimeUtils}} and its imports.

> Optimize DML for fast INSERT and MERGE
> --
>
> Key: IGNITE-4490
> URL: https://issues.apache.org/jira/browse/IGNITE-4490
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 1.8
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
>  Labels: performance
>
> It's possible to avoid any SQL querying and map some INSERT and MERGE 
> statements to cache operations in a way similar to that of UPDATE and DELETE 
> - i.e. don't make queries when there are no expressions to evaluate in the 
> query and enhance update plans to perform direct cache operations when INSERT 
> and MERGE affect columns {{_key}} and {{_val}} only.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (IGNITE-4750) SQL: Support GROUP_CONCAT function

2017-12-12 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287402#comment-16287402
 ] 

Alexander Paschenko commented on IGNITE-4750:
-

[~tledkov-gridgain],
Right, but {{GridSqlType}} is also associated with the SQL keyword for the 
corresponding type, and, although it does not affect the current task, we should 
use the right keyword, which is {{VARCHAR}}.

> SQL: Support GROUP_CONCAT function
> --
>
> Key: IGNITE-4750
> URL: https://issues.apache.org/jira/browse/IGNITE-4750
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Denis Magda
>Assignee: Taras Ledkov
> Fix For: 2.4
>
>
> GROUP_CONCAT function is not supported at the moment. Makes sense to fill 
> this gap:
> http://apache-ignite-users.70518.x6.nabble.com/GROUP-CONCAT-function-is-unsupported-td10757.html
> Presently the function doc is hidden:
> https://apacheignite-sql.readme.io/docs/group_concat
> Open it up once the ticket is released.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (IGNITE-4750) SQL: Support GROUP_CONCAT function

2017-12-12 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287363#comment-16287363
 ] 

Alexander Paschenko commented on IGNITE-4750:
-

[~tledkov-gridgain], my comments:

{{GridSqlAggregateFunction}}:
1. Please don't use string concatenation together with {{StatementBuilder}} - use 
{{append}} instead.
2. Please use {{resetCount()}} before {{appendExceptFirst()}} - these 
operations look at the same counter, so it should be reset before each 
comma-separated list you build with {{appendExceptFirst()}}.
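To illustrate point 2 with a minimal stand-in (illustrative code, not H2's actual {{StatementBuilder}} class): both methods share one counter, and without the reset the first separator of the second list is not suppressed:

```java
// Minimal stand-in mimicking H2's StatementBuilder, to show why resetCount()
// is needed before each appendExceptFirst() loop.
class ListBuilder {
    private final StringBuilder sb = new StringBuilder();
    private int cnt;

    ListBuilder append(String s) { sb.append(s); return this; }

    /** Appends the separator on every call except the first one after resetCount(). */
    ListBuilder appendExceptFirst(String sep) {
        if (cnt++ > 0)
            sb.append(sep);
        return this;
    }

    void resetCount() { cnt = 0; }

    @Override public String toString() { return sb.toString(); }
}
```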
3. Methods {{setGroupConcat*}} have way too wordy param names, like this: 
{{setGroupConcatSeparator(groupConcatSeparator)}} - there's redundancy, you can 
use simply {{separator}} for param name without losing any clarity.

4. {{GridSqlQueryParser}}: Code conventions are violated - long lines.

{{GridSqlQuerySplitter}}:
5. Please throw {{IgniteSqlException}} instead of {{IgniteException}}, error 
code should be {{UNSUPPORTED_OPERATION}}.
6. Here: {{rdcAgg = aggregate(false, agg.type())}} please use {{GROUP_CONCAT}} 
as the second arg for clarity.

7. {{GridSqlType}}: I believe SQL keyword for strings is {{VARCHAR}}, not 
{{STRING}}, isn't it?

8. {{IgniteSqlGroupConcatCollocatedTest}}: please fix long lines.

> SQL: Support GROUP_CONCAT function
> --
>
> Key: IGNITE-4750
> URL: https://issues.apache.org/jira/browse/IGNITE-4750
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Denis Magda
>Assignee: Taras Ledkov
> Fix For: 2.4
>
>
> GROUP_CONCAT function is not supported at the moment. Makes sense to fill 
> this gap:
> http://apache-ignite-users.70518.x6.nabble.com/GROUP-CONCAT-function-is-unsupported-td10757.html
> Presently the function doc is hidden:
> https://apacheignite-sql.readme.io/docs/group_concat
> Open it up once the ticket is released.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (IGNITE-4192) SQL TX: JDBC driver support

2017-12-08 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283683#comment-16283683
 ] 

Alexander Paschenko edited comment on IGNITE-4192 at 12/8/17 2:55 PM:
--

[~vozerov]
My questions/clarifications:

5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}}

{quote}
Most probably transaction should be bound to Connection object.
{quote}
This is effectively what is happening now - each {{Connection}} has its own 
{{JdbcRequestHandler}} on the server side, which in turn has a single-threaded 
worker within whose scope transactions start and end. After all, it's a thin 
client, and there may be no Ignite running at all on the host owning the 
{{Connection}} object. All in all, a connection may not have more than one 
started transaction, just as a single thread cannot.
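A tiny model of that invariant (purely illustrative, not Ignite code): because each connection is served by one single-threaded worker, at most one transaction can be active per connection at a time:

```java
// Purely illustrative model: one connection = one single-threaded worker,
// hence at most one active transaction per connection.
class ConnectionWorker {
    private boolean txActive;

    /** Starts a transaction; fails if one is already active on this connection. */
    void txStart() {
        if (txActive)
            throw new IllegalStateException("Transaction already active on this connection");
        txActive = true;
    }

    /** Commits and allows the next transaction to start. */
    void txCommit() {
        if (!txActive)
            throw new IllegalStateException("No active transaction");
        txActive = false;
    }
}
```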


was (Author: al.psc):
[~vozerov]
My questions/clarifications:

5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}}

{quote}
Most probably transaction should be bound to Connection object.
{quote}
This is what effectively happening now - each {{Connection}} has its own 
{{JdbcRequestHandler}} on server side which in turn has single threaded worker 
within whose scope transactions start and end. After all, it's a thin client, 
and there may be no Ignite running at all on the host owning {{Connection}} 
object.

> SQL TX: JDBC driver support
> ---
>
> Key: IGNITE-4192
> URL: https://issues.apache.org/jira/browse/IGNITE-4192
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc
>Reporter: Denis Magda
>Assignee: Alexander Paschenko
>  Labels: iep-3, sql
> Fix For: 2.4
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from JDBC driver side.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (IGNITE-4192) SQL TX: JDBC driver support

2017-12-08 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283683#comment-16283683
 ] 

Alexander Paschenko edited comment on IGNITE-4192 at 12/8/17 2:54 PM:
--

[~vozerov]
My questions/clarifications:

5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}}

{quote}
Most probably transaction should be bound to Connection object.
{quote}
This is effectively what happens now - each {{Connection}} has its own 
{{JdbcRequestHandler}} on the server side, which in turn has a single-threaded 
worker within whose scope transactions start and end. After all, it's a thin 
client, and there may be no Ignite running at all on the host owning the 
{{Connection}} object.


was (Author: al.psc):
[~vozerov]
My questions/clarifications:

5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}}

{{quote}}
Most probably transaction should be bound to Connection object.
{{quote}}
This is effectively what happens now - each {{Connection}} has its own 
{{JdbcRequestHandler}} on the server side, which in turn has a single-threaded 
worker within whose scope transactions start and end. After all, it's a thin 
client, and there may be no Ignite running at all on the host owning the 
{{Connection}} object.

> SQL TX: JDBC driver support
> ---
>
> Key: IGNITE-4192
> URL: https://issues.apache.org/jira/browse/IGNITE-4192
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc
>Reporter: Denis Magda
>Assignee: Alexander Paschenko
>  Labels: iep-3, sql
> Fix For: 2.4
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from JDBC driver side.





[jira] [Assigned] (IGNITE-7113) "Key is missing from query" when creating table with key_type=java.lang.String

2017-12-07 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-7113:
---

Assignee: Alexander Paschenko

> "Key is missing from query" when creating table with key_type=java.lang.String
> --
>
> Key: IGNITE-7113
> URL: https://issues.apache.org/jira/browse/IGNITE-7113
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.3
>Reporter: Ilya Kasnacheev
>Assignee: Alexander Paschenko
> Attachments: IgniteStringKeyTest.java
>
>
> When creating a table of
> {code}
> CREATE TABLE IF NOT EXISTS TableWithStringKey (
>   ID VARCHAR PRIMARY KEY,
>   DataNodeId VARCHAR
> ) WITH "backups=1, cache_name=TableWithStringKey, atomicity=transactional, 
> key_type=java.lang.String, value_type=TableWithStringKey"
> {code}
> and attempting an insert later
> {code}
> INSERT INTO TableWithStringKey (ID, DataNodeId) VALUES ('ref2', 'src2')
> {code}
> There's suddenly an exception
> {code}
> javax.cache.CacheException: class 
> org.apache.ignite.internal.processors.query.IgniteSQLException: Failed to 
> execute DML statement [stmt=INSERT INTO TableWithStringKey (ID, DataNodeId) 
> VALUES ('ref2', 'src2'), params=null]
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:597)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:560)
>   at 
> org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
>   at 
> com.gridgain.reproducer.IgniteStringKeyTest.insertTest(IgniteStringKeyTest.java:34)
>   ... 24 more
> Caused by: class 
> org.apache.ignite.internal.processors.query.IgniteSQLException: Failed to 
> execute DML statement [stmt=INSERT INTO TableWithStringKey (ID, DataNodeId) 
> VALUES ('ref2', 'src2'), params=null]
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1459)
>   at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1909)
>   at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1907)
>   at 
> org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
>   at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2445)
>   at 
> org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1914)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:585)
>   ... 27 more
> Caused by: class org.apache.ignite.IgniteCheckedException: Key is missing 
> from query
>   at 
> org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.createSupplier(UpdatePlanBuilder.java:369)
>   at 
> org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.planForInsert(UpdatePlanBuilder.java:211)
>   at 
> org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.planForStatement(UpdatePlanBuilder.java:98)
>   at 
> org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.getPlanForStatement(DmlStatementsProcessor.java:473)
>   at 
> org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.updateSqlFields(DmlStatementsProcessor.java:170)
>   at 
> org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.updateSqlFieldsDistributed(DmlStatementsProcessor.java:229)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1453)
>   ... 33 more
> {code}
> The exception goes away if you remove "key_type=java.lang.String".
> I'm attaching a reproducer class.





[jira] [Assigned] (IGNITE-7120) Test cases inherited by AbstractSchemaSelfTest became flaky after the refactoring to use SQL API.

2017-12-06 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-7120:
---

Assignee: Alexander Paschenko

> Test cases inherited by AbstractSchemaSelfTest became flaky after the 
> refactoring to use SQL API.
> --
>
> Key: IGNITE-7120
> URL: https://issues.apache.org/jira/browse/IGNITE-7120
> Project: Ignite
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.3
>Reporter: Roman Kondakov
>Assignee: Alexander Paschenko
> Fix For: 2.4
>
>
> This problem may be caused by the delay between the CREATE INDEX command and 
> the subsequent JDBC metadata update. Therefore, metadata info may be outdated 
> in AbstractSchemaSelfTest#assertIndex() {.. 
> c.getMetaData().getIndexInfo() ..}
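Until the underlying delay is fixed, such assertions are typically de-flaked by 
polling the metadata with a timeout instead of asserting immediately. A sketch; 
the {{PERSON}} table and {{MY_IDX}} index names are illustrative, and {{conn}} 
is assumed to be an open JDBC connection:

```java
// Sketch: poll JDBC metadata until the index shows up or a timeout expires.
// "PERSON" / "MY_IDX" are illustrative; conn is an open java.sql.Connection.
boolean found = false;
long deadline = System.currentTimeMillis() + 5_000;

while (!found && System.currentTimeMillis() < deadline) {
    try (java.sql.ResultSet rs =
             conn.getMetaData().getIndexInfo(null, "PUBLIC", "PERSON", false, false)) {
        while (rs.next())
            found |= "MY_IDX".equalsIgnoreCase(rs.getString("INDEX_NAME"));
    }

    if (!found)
        Thread.sleep(100); // metadata may lag behind CREATE INDEX completion
}
```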





[jira] [Comment Edited] (IGNITE-7001) Refactor dynamic indexes tests to use SQL API

2017-11-29 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270667#comment-16270667
 ] 

Alexander Paschenko edited comment on IGNITE-7001 at 11/29/17 12:25 PM:


[~vozerov] done; the few remaining usages look justified to me (failover tests, 
or things not visible via the public API, like the actual affinity column of a 
table).


was (Author: al.psc):
@ozerov done, few remaining usages look justified to me (failover tests, or 
something not visible via public API like actual affinity column of a table).

> Refactor dynamic indexes tests to use SQL API
> -
>
> Key: IGNITE-7001
> URL: https://issues.apache.org/jira/browse/IGNITE-7001
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
> Fix For: 2.4
>
>
> Tests for dynamic indexes should be refactored like those in IGNITE-6416 to 
> use SQL API instead of relying on internal structures.





[jira] [Commented] (IGNITE-7001) Refactor dynamic indexes tests to use SQL API

2017-11-29 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270667#comment-16270667
 ] 

Alexander Paschenko commented on IGNITE-7001:
-

@ozerov done; the few remaining usages look justified to me (failover tests, 
or things not visible via the public API, like the actual affinity column of a 
table).

> Refactor dynamic indexes tests to use SQL API
> -
>
> Key: IGNITE-7001
> URL: https://issues.apache.org/jira/browse/IGNITE-7001
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
> Fix For: 2.4
>
>
> Tests for dynamic indexes should be refactored like those in IGNITE-6416 to 
> use SQL API instead of relying on internal structures.





[jira] [Assigned] (IGNITE-4192) SQL TX: JDBC driver support

2017-11-28 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-4192:
---

Assignee: Alexander Paschenko

> SQL TX: JDBC driver support
> ---
>
> Key: IGNITE-4192
> URL: https://issues.apache.org/jira/browse/IGNITE-4192
> Project: Ignite
>  Issue Type: Task
>  Components: jdbc
>Reporter: Denis Magda
>Assignee: Alexander Paschenko
>  Labels: iep-3
> Fix For: 2.4
>
>
> To support execution of DML and SELECT statements inside of a transaction 
> started from JDBC driver side.





[jira] [Created] (IGNITE-7001) Refactor dynamic indexes test to use SQL API

2017-11-23 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-7001:
---

 Summary: Refactor dynamic indexes test to use SQL API
 Key: IGNITE-7001
 URL: https://issues.apache.org/jira/browse/IGNITE-7001
 Project: Ignite
  Issue Type: Task
  Components: sql
Reporter: Alexander Paschenko
Assignee: Alexander Paschenko
 Fix For: 2.4


Tests for dynamic indexes should be refactored like those in IGNITE-6416 to use 
SQL API instead of relying on internal structures.





[jira] [Created] (IGNITE-6996) Smarter handling of id fields in SQL values

2017-11-23 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-6996:
---

 Summary: Smarter handling of id fields in SQL values
 Key: IGNITE-6996
 URL: https://issues.apache.org/jira/browse/IGNITE-6996
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Reporter: Alexander Paschenko


Consider the following case:

The user wants a composite value (many value fields in {{QueryEntity}}) with 
one field acting as the value's id (most likely matching the cache key too).

Currently, in order to insert such an object, we have to do something like
{{INSERT INTO Person(_key, id, name) values(1, 1, 'John')}}

and there's no way to avoid this redundant repetition of the same value.

Suggested approach: I believe we should specifically handle the case when the 
user specifies {{keyFieldName}} in the configuration and the specified field is 
a field of the value.

In that case, we could just do {{INSERT INTO Person(id, name) values(1, 
'John')}} and derive the {{_key}} value from the {{id}} column. (And vice 
versa.)

At a glance, this will also require the following tweaks:
- forbid performing SQL {{UPDATE}} on such a column ({{id}} in the above 
example);
- on an {{INSERT}}, check that the {{_key}} and {{id}} values are the same, if 
both are specified.
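For reference, the configuration this proposal targets might look like the 
following sketch (the key type, value type, and field names are illustrative):

```java
// Sketch: a QueryEntity whose value field "id" is declared as the key field.
QueryEntity entity = new QueryEntity("java.lang.Long", "Person");

LinkedHashMap<String, String> fields = new LinkedHashMap<>();
fields.put("id", "java.lang.Long");
fields.put("name", "java.lang.String");
entity.setFields(fields);

// With keyFieldName set, the proposal is that
//   INSERT INTO Person(id, name) VALUES (1, 'John')
// would derive _key = 1 from the "id" column (and vice versa).
entity.setKeyFieldName("id");
```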





[jira] [Commented] (IGNITE-5571) Make sure that cache-less execution works as good as cache-based

2017-11-20 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259088#comment-16259088
 ] 

Alexander Paschenko commented on IGNITE-5571:
-

[~vozerov],
1; 4; 5 - fixed.

2; 3 - stuff like GROUP BY didn't participate in the analysis before; thus, 
this flag was calculated relying solely on characteristics of the cache the 
query was issued on and on the query flags - like this

{{boolean loc = (qry.isReplicatedOnly() && cctx.isReplicatedAffinityNode()) || 
cctx.isLocal() || qry.isLocal();}}

Therefore, I believe that more intelligent, in-depth analysis of the queries 
should be moved to a separate task for now, since the suggested patch does not 
in fact change existing behavior.


> Make sure that cache-less execution works as good as cache-based
> 
>
> Key: IGNITE-5571
> URL: https://issues.apache.org/jira/browse/IGNITE-5571
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Critical
> Fix For: 2.4
>
>
> Compare the following two methods:
> 1) {{GridQueryProcessor.querySqlFields}} - old good entry point for query 
> execution;
> 2) {{GridQueryProcessor.querySqlFieldsNoCache}} - new method for "cache-less" 
> execution.
> Note how cache context is used in the first method:
> 1) First, it helps determine whether query can be converted to "local"
> 2) Second, it gets query parallelism of current cache, and if it differs from 
> {{1}}, then it turns on {{distributedJoins}}.
> Neither of this happens in the second implementation. Moreover, I had to 
> throw an exception for local queries, as I didn't know how to handle them 
> properly.
> We need to investigate and fix these two deficiencies somehow. Probably some 
> inputs from [~sergi.vladykin] would be required, to understand what is going 
> on.
> Our ultimate goal is to make "cache-less" execution as good as the old one.





[jira] [Assigned] (IGNITE-6950) Update documentation for user facing CREATE TABLE params

2017-11-17 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-6950:
---

Assignee: Denis Magda  (was: Alexander Paschenko)

> Update documentation for user facing CREATE TABLE params
> 
>
> Key: IGNITE-6950
> URL: https://issues.apache.org/jira/browse/IGNITE-6950
> Project: Ignite
>  Issue Type: Task
>  Security Level: Public(Viewable by anyone) 
>  Components: documentation, sql
>Affects Versions: 2.4
>Reporter: Alexander Paschenko
>Assignee: Denis Magda
> Fix For: 2.4
>
>
> Changes to documentation should be made to reflect additional spelling for 
> some {{CREATE TABLE}} params added in IGNITE-6270.





[jira] [Created] (IGNITE-6950) Update documentation for user facing CREATE TABLE params

2017-11-17 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-6950:
---

 Summary: Update documentation for user facing CREATE TABLE params
 Key: IGNITE-6950
 URL: https://issues.apache.org/jira/browse/IGNITE-6950
 Project: Ignite
  Issue Type: Task
  Security Level: Public (Viewable by anyone)
  Components: documentation, sql
Affects Versions: 2.4
Reporter: Alexander Paschenko
Assignee: Alexander Paschenko
 Fix For: 2.4


Changes to documentation should be made to reflect additional spelling for some 
{{CREATE TABLE}} params added in IGNITE-6270.





[jira] [Commented] (IGNITE-5571) Make sure that cache-less execution works as good as cache-based

2017-10-19 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210852#comment-16210852
 ] 

Alexander Paschenko commented on IGNITE-5571:
-

[~ptupitsyn], hello, could you please have a look at .NET test failures?
https://ci.ignite.apache.org/viewLog.html?buildId=898285=buildResultsDiv=Ignite20Tests_IgnitePlatformNet

> Make sure that cache-less execution works as good as cache-based
> 
>
> Key: IGNITE-5571
> URL: https://issues.apache.org/jira/browse/IGNITE-5571
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
>Priority: Critical
> Fix For: 2.4
>
>
> Compare the following two methods:
> 1) {{GridQueryProcessor.querySqlFields}} - old good entry point for query 
> execution;
> 2) {{GridQueryProcessor.querySqlFieldsNoCache}} - new method for "cache-less" 
> execution.
> Note how cache context is used in the first method:
> 1) First, it helps determine whether query can be converted to "local"
> 2) Second, it gets query parallelism of current cache, and if it differs from 
> {{1}}, then it turns on {{distributedJoins}}.
> Neither of this happens in the second implementation. Moreover, I had to 
> throw an exception for local queries, as I didn't know how to handle them 
> properly.
> We need to investigate and fix these two deficiencies somehow. Probably some 
> inputs from [~sergi.vladykin] would be required, to understand what is going 
> on.
> Our ultimate goal is to make "cache-less" execution as good as the old one.





[jira] [Commented] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command

2017-10-18 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210112#comment-16210112
 ] 

Alexander Paschenko commented on IGNITE-6283:
-

[~dmagda], the answer is "yes" for both questions.

> Document ALTER TABLE ADD COLUMN command
> ---
>
> Key: IGNITE-6283
> URL: https://issues.apache.org/jira/browse/IGNITE-6283
> Project: Ignite
>  Issue Type: Task
>  Components: documentation, sql
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
> Fix For: 2.3
>
>






[jira] [Updated] (IGNITE-6637) No proper cleanup of statements cache is done on table drop

2017-10-16 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-6637:

Description: 
We should clean up the statement cache whenever a cache is deregistered - 
otherwise it's possible to retrieve from the statement cache a statement that 
is prepared from the H2 perspective but cannot be executed. Such a situation 
may arise for any dynamic cache, not only one created via SQL.
Reproducer attached.

  was:
We should cleanup statements cache whenever cache is deregistered - otherwise 
it's possible to retrieve from statements cache a statement that is prepared 
from H2 perspective but may not be executed.
Reproducer attached.


> No proper cleanup of statements cache is done on table drop
> ---
>
> Key: IGNITE-6637
> URL: https://issues.apache.org/jira/browse/IGNITE-6637
> Project: Ignite
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: sql
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
> Fix For: 2.3
>
> Attachments: IgniteCacheLocalQuerySelfTest.java
>
>
> We should clean up the statement cache whenever a cache is deregistered - 
> otherwise it's possible to retrieve from the statement cache a statement that 
> is prepared from the H2 perspective but cannot be executed. Such a situation 
> may arise for any dynamic cache, not only one created via SQL.
> Reproducer attached.





[jira] [Updated] (IGNITE-6637) No proper cleanup of statements cache is done on table drop

2017-10-16 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-6637:

Fix Version/s: 2.3

> No proper cleanup of statements cache is done on table drop
> ---
>
> Key: IGNITE-6637
> URL: https://issues.apache.org/jira/browse/IGNITE-6637
> Project: Ignite
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: sql
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
> Fix For: 2.3
>
> Attachments: IgniteCacheLocalQuerySelfTest.java
>
>
> We should clean up the statement cache whenever a cache is deregistered - 
> otherwise it's possible to retrieve from the statement cache a statement that 
> is prepared from the H2 perspective but cannot be executed.
> Reproducer attached.





[jira] [Updated] (IGNITE-6637) No proper cleanup of statements cache is done on table drop

2017-10-16 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko updated IGNITE-6637:

Attachment: IgniteCacheLocalQuerySelfTest.java

> No proper cleanup of statements cache is done on table drop
> ---
>
> Key: IGNITE-6637
> URL: https://issues.apache.org/jira/browse/IGNITE-6637
> Project: Ignite
>  Issue Type: Bug
>  Security Level: Public(Viewable by anyone) 
>  Components: sql
>Reporter: Alexander Paschenko
>Assignee: Alexander Paschenko
> Attachments: IgniteCacheLocalQuerySelfTest.java
>
>
> We should clean up the statement cache whenever a cache is deregistered - 
> otherwise it's possible to retrieve from the statement cache a statement that 
> is prepared from the H2 perspective but cannot be executed.
> Reproducer attached.





[jira] [Created] (IGNITE-6637) No proper cleanup of statements cache is done on table drop

2017-10-16 Thread Alexander Paschenko (JIRA)
Alexander Paschenko created IGNITE-6637:
---

 Summary: No proper cleanup of statements cache is done on table 
drop
 Key: IGNITE-6637
 URL: https://issues.apache.org/jira/browse/IGNITE-6637
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
  Components: sql
Reporter: Alexander Paschenko
Assignee: Alexander Paschenko


We should clean up the statement cache whenever a cache is deregistered - 
otherwise it's possible to retrieve from the statement cache a statement that 
is prepared from the H2 perspective but cannot be executed.
Reproducer attached.





[jira] [Commented] (IGNITE-6024) SQL: execute DML statements on the server when possible

2017-10-11 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200080#comment-16200080
 ] 

Alexander Paschenko commented on IGNITE-6024:
-

[~isapego], OK, let's let that be.

> SQL: execute DML statements on the server when possible
> ---
>
> Key: IGNITE-6024
> URL: https://issues.apache.org/jira/browse/IGNITE-6024
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.1
>Reporter: Vladimir Ozerov
>Assignee: Sergey Kalashnikov
>  Labels: important, performance
> Fix For: 2.3
>
>
> Currently we execute DML statements as follows:
> 1) Get query result set to the client
> 2) Construct entry processors and send them to servers in batches
> This approach is inefficient as it causes a lot of unnecessary network 
> communication. Instead, we should execute DML statements directly on server 
> nodes when it is possible.
> Implementation considerations:
> 1) Determine set of queries which could be processed in this way. E.g., 
> {{LIMIT/OFFSET}}, {{GROUP BY}}, {{ORDER BY}}, {{DISTINCT}}, etc. are out of 
> question - they must go through the client anyway. Probably 
> {{skipMergeTable}} flag is a good starting point (good, not precise!)
> 2) Send request to every server and execute local DML right there
> 3) No failover support at the moment - throw "partial update" exception if 
> topology is unstable
> 4) Handle partition reservation carefully
> 5) Transactions: we still have single coordinator - this is a client. When 
> MVCC and TX SQL is ready, client will assign proper counters to server 
> requests.





[jira] [Commented] (IGNITE-6024) SQL: execute DML statements on the server when possible

2017-10-11 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200062#comment-16200062
 ] 

Alexander Paschenko commented on IGNITE-6024:
-

[~isapego], [~skalashnikov], my comments:

1. I don't think we need an internal class inside {{UpdatePlan}}. Please move 
its fields to {{UpdatePlan}}, as it already has many fields that are meaningful 
only for some operations.
In order to return the correct params from 
{{UpdatePlanBuilder#checkPlanCanBeDistributed}}, we could just use a {{T2}}, 
since it's just communication between private methods.

2. Do we really need {{SqlFieldsQueryEx}}? Couldn't we just pass the 
additional params explicitly?

[~vozerov], what do you think?

> SQL: execute DML statements on the server when possible
> ---
>
> Key: IGNITE-6024
> URL: https://issues.apache.org/jira/browse/IGNITE-6024
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.1
>Reporter: Vladimir Ozerov
>Assignee: Sergey Kalashnikov
>  Labels: important, performance
> Fix For: 2.3
>
>
> Currently we execute DML statements as follows:
> 1) Get query result set to the client
> 2) Construct entry processors and send them to servers in batches
> This approach is inefficient as it causes a lot of unnecessary network 
> communication. Instead, we should execute DML statements directly on server 
> nodes when it is possible.
> Implementation considerations:
> 1) Determine set of queries which could be processed in this way. E.g., 
> {{LIMIT/OFFSET}}, {{GROUP BY}}, {{ORDER BY}}, {{DISTINCT}}, etc. are out of 
> question - they must go through the client anyway. Probably 
> {{skipMergeTable}} flag is a good starting point (good, not precise!)
> 2) Send request to every server and execute local DML right there
> 3) No failover support at the moment - throw "partial update" exception if 
> topology is unstable
> 4) Handle partition reservation carefully
> 5) Transactions: we still have single coordinator - this is a client. When 
> MVCC and TX SQL is ready, client will assign proper counters to server 
> requests.





[jira] [Commented] (IGNITE-6573) SQL: Document WRAP_KEY and WRAP_VALUE commands

2017-10-10 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198909#comment-16198909
 ] 

Alexander Paschenko commented on IGNITE-6573:
-

Done, [~dmagda] please have a look. What to look at: 
https://dash.readme.io/project/apacheignite-sql/v2.1/docs/create-table, the 
*Parameters* section (at the very end) and the *Note* section (also at the end).

> SQL: Document WRAP_KEY and WRAP_VALUE commands
> --
>
> Key: IGNITE-6573
> URL: https://issues.apache.org/jira/browse/IGNITE-6573
> Project: Ignite
>  Issue Type: Task
>  Components: documentation
>Reporter: Vladimir Ozerov
>Assignee: Alexander Paschenko
> Fix For: 2.3
>
>






[jira] [Comment Edited] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command

2017-10-10 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198886#comment-16198886
 ] 

Alexander Paschenko edited comment on IGNITE-6283 at 10/10/17 4:25 PM:
---

Done, [~dmagda], [~sergey.puch...@gmail.com] please have a look. Page: 
https://dash.readme.io/project/apacheignite-sql/v2.1/docs/alter-table


was (Author: al.psc):
Done, [~dmagda], [~sergey.puch...@gmail.com] please have a look.

> Document ALTER TABLE ADD COLUMN command
> ---
>
> Key: IGNITE-6283
> URL: https://issues.apache.org/jira/browse/IGNITE-6283
> Project: Ignite
>  Issue Type: Task
>  Components: documentation, sql
>Reporter: Vladimir Ozerov
>Assignee: Denis Magda
> Fix For: 2.3
>
>






[jira] [Comment Edited] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command

2017-10-10 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198886#comment-16198886
 ] 

Alexander Paschenko edited comment on IGNITE-6283 at 10/10/17 4:04 PM:
---

Done, [~dmagda], [~sergey.puch...@gmail.com] please have a look.


was (Author: al.psc):
Done, [~dmagda] [~sergey.puch...@gmail.com] please have a look.

> Document ALTER TABLE ADD COLUMN command
> ---
>
> Key: IGNITE-6283
> URL: https://issues.apache.org/jira/browse/IGNITE-6283
> Project: Ignite
>  Issue Type: Task
>  Components: documentation, sql
>Reporter: Vladimir Ozerov
>Assignee: Denis Magda
> Fix For: 2.3
>
>






[jira] [Assigned] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command

2017-10-10 Thread Alexander Paschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Paschenko reassigned IGNITE-6283:
---

Assignee: Denis Magda  (was: Alexander Paschenko)

Done, [~dmagda] [~sergey.puch...@gmail.com] please have a look.

> Document ALTER TABLE ADD COLUMN command
> ---
>
> Key: IGNITE-6283
> URL: https://issues.apache.org/jira/browse/IGNITE-6283
> Project: Ignite
>  Issue Type: Task
>  Components: documentation, sql
>Reporter: Vladimir Ozerov
>Assignee: Denis Magda
> Fix For: 2.3
>
>






[jira] [Commented] (IGNITE-6024) SQL: execute DML statements on the server when possible

2017-10-10 Thread Alexander Paschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198543#comment-16198543
 ] 

Alexander Paschenko commented on IGNITE-6024:
-

[~skalashnikov], my comments:

1. Why has {{UpdatePlan}} become mutable? As before, it's fully initialized in 
one place. Please refactor to restore immutability.

2. Why does {{GridH2DmlResponse}} carry only the error string? We have already 
had to pull the error code into the thin JDBC response; shouldn't we preserve 
the code here too? [~vozerov], what do you think?

3. What is the semantics of a distributed {{MERGE}} or {{INSERT}} without a 
subquery? If I issue {{MERGE INTO Person(id, name) values (1, 'a'), (2, 'b'), 
(3, 'c')}}, how will it work? I don't see any code that would check this. 
Intuitively, it should be somewhere around 
{{DmlStatementsProcessor#checkPlanCanBeDistributed}}.

4. {{DmlStatementsProcessor#checkPlanCanBeDistributed}}: why don't we care 
about statement caching here? ({{PreparedStatement stmt = 
conn.prepareStatement(plan.selectQry)}}) - it looks like this could introduce 
unavoidable re-parsing.

5. The option name {{updateOnServer}} looks confusing to me. Isn't something 
like {{distributedDml}} better?
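To illustrate the caching concern in item 4, a sketch of what reusing prepared 
statements per SQL string could look like. {{plan.selectQry}} and {{conn}} are 
taken from the snippet quoted above; the cache itself ({{stmtCache}}) is 
hypothetical, not an existing Ignite structure:

```java
// Hypothetical sketch: avoid re-preparing (and thus re-parsing) the same SQL
// on every call by caching prepared statements per query string.
Map<String, PreparedStatement> stmtCache = new HashMap<>();

PreparedStatement stmt = stmtCache.get(plan.selectQry);

if (stmt == null) {
    stmt = conn.prepareStatement(plan.selectQry); // parsed once per SQL string
    stmtCache.put(plan.selectQry, stmt);
}
```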

> SQL: execute DML statements on the server when possible
> ---
>
> Key: IGNITE-6024
> URL: https://issues.apache.org/jira/browse/IGNITE-6024
> Project: Ignite
>  Issue Type: Task
>  Components: sql
>Affects Versions: 2.1
>Reporter: Vladimir Ozerov
>Assignee: Sergey Kalashnikov
>  Labels: important, performance
> Fix For: 2.3
>
>
> Currently we execute DML statements as follows:
> 1) Get query result set to the client
> 2) Construct entry processors and send them to servers in batches
> This approach is inefficient as it causes a lot of unnecessary network 
> communication. Instead, we should execute DML statements directly on server 
> nodes when it is possible.
> Implementation considerations:
> 1) Determine set of queries which could be processed in this way. E.g., 
> {{LIMIT/OFFSET}}, {{GROUP BY}}, {{ORDER BY}}, {{DISTINCT}}, etc. are out of 
> question - they must go through the client anyway. Probably 
> {{skipMergeTable}} flag is a good starting point (good, not precise!)
> 2) Send request to every server and execute local DML right there
> 3) No failover support at the moment - throw "partial update" exception if 
> topology is unstable
> 4) Handle partition reservation carefully
> 5) Transactions: we still have single coordinator - this is a client. When 
> MVCC and TX SQL is ready, client will assign proper counters to server 
> requests.




