[jira] [Assigned] (IGNITE-8386) SQL: Make sure PK index does not use wrapped object
[ https://issues.apache.org/jira/browse/IGNITE-8386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-8386: --- Assignee: Alexander Paschenko > SQL: Make sure PK index does not use wrapped object > - > > Key: IGNITE-8386 > URL: https://issues.apache.org/jira/browse/IGNITE-8386 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 2.4 >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Labels: iep-19, performance > Fix For: 2.6 > > > Currently the PK index may be built over the whole {{_KEY}} column, i.e. the whole > binary object. This can happen in two cases: > 1) Composite PK > 2) Plain PK but with the {{WRAP_KEY}} option. > This is a critical performance issue for two reasons: > 1) This index is effectively useless and cannot be used in any sensible > queries; it just wastes space and makes updates slower > 2) A binary object typically has common header bytes, which may lead to an excessive > number of comparisons during index updates. > To mitigate the problem we need to ensure that the index is *never* built over > {{_KEY}}. Instead, we must always extract the target columns and build a normal > index over them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
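To make the two cases above concrete, a hedged sketch in Ignite SQL (table and column names are illustrative, not taken from the ticket). In the first case the composite key, and in the second case the {{WRAP_KEY}} option, cause the key to be stored as a wrapped binary object, which is what the PK index would currently be built over.

{noformat}
-- Case 1: composite primary key - the key is a wrapped binary object.
CREATE TABLE person (
    first_name VARCHAR,
    last_name  VARCHAR,
    age        INT,
    PRIMARY KEY (first_name, last_name)
);

-- Case 2: single-column primary key, but WRAP_KEY forces wrapping as well.
CREATE TABLE city (
    id   INT PRIMARY KEY,
    name VARCHAR
) WITH "wrap_key=true";
{noformat}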
[jira] [Assigned] (IGNITE-8384) SQL: Secondary indexes should sort entries by links rather than keys
[ https://issues.apache.org/jira/browse/IGNITE-8384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-8384: --- Assignee: Alexander Paschenko > SQL: Secondary indexes should sort entries by links rather than keys > > > Key: IGNITE-8384 > URL: https://issues.apache.org/jira/browse/IGNITE-8384 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 2.4 >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Labels: iep-19, performance > Fix For: 2.6 > > > Currently we sort entries in secondary indexes as {{(idx_cols, KEY)}}. The > key itself is not stored in the index in the general case. This means that we need > to perform a lookup to the data page to find the correct insertion point for an index > entry. > This could be fixed easily by sorting entries a bit differently - {{(idx_cols, > link)}}. This is all we need. > UPD: If we have an affinity key, then the affinity column will be added to the > secondary index as well. > So, we'll have the secondary index as {{(idx_cols, KEY, AFF_COL)}} > Comparison occurs here: > {{org.apache.ignite.internal.processors.query.h2.database.H2Tree#compare}} > What we need is to avoid adding the PK and affinity key columns to every > secondary index and to compare links instead in this method. > Probably we need to preserve the old behavior for compatibility purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
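For illustration, a hedged sketch of the kind of index this affects (names are illustrative, syntax assumed): a secondary index declared on a single column internally gets the PK and affinity columns appended as tie-breakers, i.e. it is effectively sorted as {{(amount, KEY, AFF_COL)}}; the proposal is to sort by {{(amount, link)}} instead.

{noformat}
-- Table with an explicit affinity column (names are made up).
CREATE TABLE orders (
    id          INT,
    customer_id INT,
    amount      DECIMAL,
    PRIMARY KEY (id, customer_id)
) WITH "affinity_key=customer_id";

-- The user declares the index on AMOUNT only; today its entries are effectively
-- ordered as (amount, key columns, affinity column), with a data-page lookup to
-- resolve the key. Sorting by (amount, link) would avoid that lookup.
CREATE INDEX idx_orders_amount ON orders (amount);
{noformat}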
[jira] [Updated] (IGNITE-6937) SQL TX: Support SELECT FOR UPDATE
[ https://issues.apache.org/jira/browse/IGNITE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-6937: Attachment: Снимок экрана 2018-05-02 в 11.51.03.png > SQL TX: Support SELECT FOR UPDATE > - > > Key: IGNITE-6937 > URL: https://issues.apache.org/jira/browse/IGNITE-6937 > Project: Ignite > Issue Type: Task > Components: cache, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Labels: iep-3 > Fix For: 2.6 > > Attachments: Снимок экрана 2018-05-02 в 11.51.03.png > > > Normally, in the SQL world, readers do not block writers. This is how our SELECT > operations should work by default. But we need to add support for the {{SELECT > ... FOR UPDATE}} read mode, where the reader obtains an exclusive lock on read. > In this mode we lock entries as usual, but then send data back to the caller. > The first page can be returned directly in our {{LockResponse}}. Next pages > should be requested in separate requests. With this approach {{SELECT ... FOR > UPDATE}} will require only a single round-trip to both lock and read data in > the case of small updates. > Update the {{SELECT}} statement syntax once this feature is supported: > https://apacheignite-sql.readme.io/docs/select -- This message was sent by Atlassian JIRA (v7.6.3#76005)
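A hedged sketch of the target usage, with an illustrative table name; the exact syntax is whatever ends up documented at the page linked above.

{noformat}
BEGIN;

-- The reader takes an exclusive lock on the matching row instead of reading lock-free.
SELECT balance FROM account WHERE id = 1 FOR UPDATE;

-- The locked row can then be updated safely within the same transaction.
UPDATE account SET balance = balance - 100 WHERE id = 1;

COMMIT;
{noformat}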
[jira] [Created] (IGNITE-8204) Tests hang with lazy query flag on
Alexander Paschenko created IGNITE-8204: --- Summary: Tests hang with lazy query flag on Key: IGNITE-8204 URL: https://issues.apache.org/jira/browse/IGNITE-8204 Project: Ignite Issue Type: Bug Components: sql Reporter: Alexander Paschenko Assignee: Alexander Paschenko Fix For: 2.5 Affected tests: {{IgniteCacheClientQueryReplicatedNodeRestartSelfTest.testRestart}} {{IgniteCacheDistributedPartitionQueryNodeRestartsSelfTest.testJoinQueryUnstableTopology}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-4193) SQL TX: ODBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423832#comment-16423832 ] Alexander Paschenko commented on IGNITE-4193: - [~isapego] the only minor thing to fix is the class javadoc for {{OdbcRequestHandlerWorker}} - it mentions JDBC, which clearly needs fixing. > SQL TX: ODBC driver support > --- > > Key: IGNITE-4193 > URL: https://issues.apache.org/jira/browse/IGNITE-4193 > Project: Ignite > Issue Type: Task > Components: odbc >Reporter: Denis Magda >Assignee: Igor Sapego >Priority: Major > Labels: iep-3 > Fix For: 2.5 > > > To support execution of DML and SELECT statements inside of a transaction > started from the ODBC driver side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-4193) SQL TX: ODBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423820#comment-16423820 ] Alexander Paschenko commented on IGNITE-4193: - [~isapego] please disregard my previous comment; it seems that the code duplication is an intentional pattern in the current JDBC/ODBC code, kept so as not to couple the two together. > SQL TX: ODBC driver support > --- > > Key: IGNITE-4193 > URL: https://issues.apache.org/jira/browse/IGNITE-4193 > Project: Ignite > Issue Type: Task > Components: odbc >Reporter: Denis Magda >Assignee: Igor Sapego >Priority: Major > Labels: iep-3 > Fix For: 2.5 > > > To support execution of DML and SELECT statements inside of a transaction > started from the ODBC driver side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-4193) SQL TX: ODBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16423818#comment-16423818 ] Alexander Paschenko commented on IGNITE-4193: - [~isapego] thanks for the patch, my sole comment would be about {{OdbcRequestHandlerWorker}} - it's an almost complete copy-paste of {{JdbcRequestHandlerWorker}}, and JDBC is even still mentioned in the comments. Can we generify this piece? Probably we'll have to extract an auxiliary interface with a {{doHandle}} method to achieve this. > SQL TX: ODBC driver support > --- > > Key: IGNITE-4193 > URL: https://issues.apache.org/jira/browse/IGNITE-4193 > Project: Ignite > Issue Type: Task > Components: odbc >Reporter: Denis Magda >Assignee: Igor Sapego >Priority: Major > Labels: iep-3 > Fix For: 2.5 > > > To support execution of DML and SELECT statements inside of a transaction > started from the ODBC driver side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off
[ https://issues.apache.org/jira/browse/IGNITE-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-8050: --- Assignee: Alexander Paschenko > Throw a meaningful exception when user issues TX SQL keyword with MVCC turned > off > - > > Key: IGNITE-8050 > URL: https://issues.apache.org/jira/browse/IGNITE-8050 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > An exception must be thrown when the user issues a TX SQL command (BEGIN, > COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and > can lead to the SQL engine behaving quite differently from what the > user expects, especially in terms of data consistency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
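For reference, the statements in question; with MVCC disabled, each of these should fail with a clear error instead of being silently ignored.

{noformat}
-- Transaction-control SQL that must be rejected when MVCC is off:
BEGIN;
COMMIT;
ROLLBACK;
{noformat}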
[jira] [Updated] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off
[ https://issues.apache.org/jira/browse/IGNITE-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-8050: Component/s: jdbc > Throw a meaningful exception when user issues TX SQL keyword with MVCC turned > off > - > > Key: IGNITE-8050 > URL: https://issues.apache.org/jira/browse/IGNITE-8050 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > An exception must be thrown when the user issues a TX SQL command (BEGIN, > COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and > can lead to the SQL engine behaving quite differently from what the > user expects, especially in terms of data consistency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off
[ https://issues.apache.org/jira/browse/IGNITE-8050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-8050: Component/s: sql > Throw a meaningful exception when user issues TX SQL keyword with MVCC turned > off > - > > Key: IGNITE-8050 > URL: https://issues.apache.org/jira/browse/IGNITE-8050 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > An exception must be thrown when the user issues a TX SQL command (BEGIN, > COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and > can lead to the SQL engine behaving quite differently from what the > user expects, especially in terms of data consistency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8050) Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off
Alexander Paschenko created IGNITE-8050: --- Summary: Throw a meaningful exception when user issues TX SQL keyword with MVCC turned off Key: IGNITE-8050 URL: https://issues.apache.org/jira/browse/IGNITE-8050 Project: Ignite Issue Type: Task Reporter: Alexander Paschenko Fix For: 2.5 An exception must be thrown when the user issues a TX SQL command (BEGIN, COMMIT, ROLLBACK) in the absence of MVCC - ignoring these may be confusing and can lead to the SQL engine behaving quite differently from what the user expects, especially in terms of data consistency. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-8016) Make sure checks in QueryUtils.typeForQueryEntity are sane
Alexander Paschenko created IGNITE-8016: --- Summary: Make sure checks in QueryUtils.typeForQueryEntity are sane Key: IGNITE-8016 URL: https://issues.apache.org/jira/browse/IGNITE-8016 Project: Ignite Issue Type: Task Components: cache, sql Reporter: Alexander Paschenko If one uses {{AffinityUuid}} (and thus, most likely, any {{AffinityKey}} at all) as the cache key in a {{QueryEntity}} while not having an actual class for the value type, the cache fails to start with the error {{Failed to find value class in the node classpath (use default marshaller to enable binary objects)}}. Not only is the error message misleading (as {{BinaryMarshaller}} is always on now), but the checks themselves must also be verified for sanity: removing them made it possible to start the cache, put data into it via the cache API, and perform a SQL query that worked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7956) MVCC TX: cache eviction operations for key-value API
Alexander Paschenko created IGNITE-7956: --- Summary: MVCC TX: cache eviction operations for key-value API Key: IGNITE-7956 URL: https://issues.apache.org/jira/browse/IGNITE-7956 Project: Ignite Issue Type: Task Components: cache Reporter: Alexander Paschenko Fix For: 2.5 We need to implement MVCC-compatible cache eviction operations for key-value API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7955) MVCC TX: cache peek for key-value API
Alexander Paschenko created IGNITE-7955: --- Summary: MVCC TX: cache peek for key-value API Key: IGNITE-7955 URL: https://issues.apache.org/jira/browse/IGNITE-7955 Project: Ignite Issue Type: Task Components: cache Reporter: Alexander Paschenko Fix For: 2.5 We need to implement MVCC-compatible cache peek operation for key-value API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7954) MVCC TX: cache load routines for key-value API
Alexander Paschenko created IGNITE-7954: --- Summary: MVCC TX: cache load routines for key-value API Key: IGNITE-7954 URL: https://issues.apache.org/jira/browse/IGNITE-7954 Project: Ignite Issue Type: Task Components: cache Reporter: Alexander Paschenko Fix For: 2.5 We need to implement MVCC-compatible cache load operations for key-value API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7953) MVCC TX: continuous queries
Alexander Paschenko created IGNITE-7953: --- Summary: MVCC TX: continuous queries Key: IGNITE-7953 URL: https://issues.apache.org/jira/browse/IGNITE-7953 Project: Ignite Issue Type: Task Components: cache Reporter: Alexander Paschenko Fix For: 2.5 We need to implement MVCC-compatible continuous queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7952) MVCC TX: cache clear routines for key-value API
[ https://issues.apache.org/jira/browse/IGNITE-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-7952: Description: We need to implement MVCC-compatible cache clear operations for Key-Value API. > MVCC TX: cache clear routines for key-value API > --- > > Key: IGNITE-7952 > URL: https://issues.apache.org/jira/browse/IGNITE-7952 > Project: Ignite > Issue Type: Task > Components: cache >Reporter: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > We need to implement MVCC-compatible cache clear operations for Key-Value API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7952) MVCC TX: cache clear routines for key-value API
Alexander Paschenko created IGNITE-7952: --- Summary: MVCC TX: cache clear routines for key-value API Key: IGNITE-7952 URL: https://issues.apache.org/jira/browse/IGNITE-7952 Project: Ignite Issue Type: Task Components: cache Reporter: Alexander Paschenko Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7942) Document streaming in thin JDBC driver
Alexander Paschenko created IGNITE-7942: --- Summary: Document streaming in thin JDBC driver Key: IGNITE-7942 URL: https://issues.apache.org/jira/browse/IGNITE-7942 Project: Ignite Issue Type: Task Components: documentation, jdbc, sql Reporter: Alexander Paschenko Fix For: 2.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode
[ https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396829#comment-16396829 ] Alexander Paschenko commented on IGNITE-7253: - [~vozerov], all done. Regarding pt. 6 - there's a test {{JdbcStreamingSelfTest#testOnlyInsertsAllowed}} that passes for both the thin and thick drivers. > JDBC thin driver: introduce streaming mode > -- > > Key: IGNITE-7253 > URL: https://issues.apache.org/jira/browse/IGNITE-7253 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > Attachments: IGNITE_7253.patch > > > Should be done after IGNITE-6022. We should allow optional streaming mode for > JDBC driver. In this mode only INSERTs without SELECT should be possible. All > other DML operations should throw an exception. > Design considerations: > 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable > streaming for connection. > 2) Add command {{STREAMER FLUSH}} which will force data flush. > 3) Only INSERT without SELECT works, all other DML statements should throw an > exception > 4) It should be possible to stream into several tables simultaneously (i.e. > several streamers could be opened) > 5) Any DDL statement should force flush of all currently opened streamers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
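A hedged sketch of a streaming session built from the design considerations above (table name is illustrative; the final command set may differ from this proposal):

{noformat}
-- Switch the connection into streaming mode; only plain INSERTs are allowed from here on.
SET STREAMING ON;

INSERT INTO person (id, name) VALUES (1, 'Ann');
INSERT INTO person (id, name) VALUES (2, 'Bob');

-- Proposed command to force a flush of buffered data without leaving streaming mode.
STREAMER FLUSH;

-- Turning streaming off flushes any remaining buffered rows.
SET STREAMING OFF;
{noformat}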
[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode
[ https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384100#comment-16384100 ] Alexander Paschenko commented on IGNITE-7253: - [~vozerov], all fixed. > JDBC thin driver: introduce streaming mode > -- > > Key: IGNITE-7253 > URL: https://issues.apache.org/jira/browse/IGNITE-7253 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > Attachments: IGNITE_7253.patch > > > Should be done after IGNITE-6022. We should allow optional streaming mode for > JDBC driver. In this mode only INSERTs without SELECT should be possible. All > other DML operations should throw an exception. > Design considerations: > 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable > streaming for connection. > 2) Add command {{STREAMER FLUSH}} which will force data flush. > 3) Only INSERT without SELECT works, all other DML statements should throw an > exception > 4) It should be possible to stream into several tables simultaneously (i.e. > several streamers could be opened) > 5) Any DDL statement should force flush of all currently opened streamers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7859) SQL streaming support via native API
[ https://issues.apache.org/jira/browse/IGNITE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-7859: Component/s: sql > SQL streaming support via native API > > > Key: IGNITE-7859 > URL: https://issues.apache.org/jira/browse/IGNITE-7859 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > In addition to streaming via thin JDBC driver, ability to run same {{SET > STREAMING}} command should be added to cache API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (IGNITE-7859) SQL streaming support via native API
[ https://issues.apache.org/jira/browse/IGNITE-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-7859: Fix Version/s: 2.5 > SQL streaming support via native API > > > Key: IGNITE-7859 > URL: https://issues.apache.org/jira/browse/IGNITE-7859 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > In addition to streaming via thin JDBC driver, ability to run same {{SET > STREAMING}} command should be added to cache API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7859) SQL streaming support via native API
Alexander Paschenko created IGNITE-7859: --- Summary: SQL streaming support via native API Key: IGNITE-7859 URL: https://issues.apache.org/jira/browse/IGNITE-7859 Project: Ignite Issue Type: Task Reporter: Alexander Paschenko In addition to streaming via thin JDBC driver, ability to run same {{SET STREAMING}} command should be added to cache API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7785) SQL query with COUNT and UNION in sub-query produces JdbcSQLException
[ https://issues.apache.org/jira/browse/IGNITE-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378534#comment-16378534 ] Alexander Paschenko commented on IGNITE-7785: - [~pvinokurov], The initial query is a bit odd, since it unions a table with itself. Without the {{ALL}} keyword, the subquery {{(select id from company union select id from company)}} should give the same result as a plain {{SELECT id FROM Company}}, so in the initial case that UNION is redundant - if you remove the UNION, the query works. If you add ALL to the UNION, the query works as well, and in that case the UNION keeps duplicates just like it should. I will also have a look at the second query. > SQL query with COUNT and UNION in sub-query produces JdbcSQLException > - > > Key: IGNITE-7785 > URL: https://issues.apache.org/jira/browse/IGNITE-7785 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.1, 2.3 >Reporter: Pavel Vinokurov >Priority: Major > > SQL initial script: > CREATE TABLE Person(id INTEGER PRIMARY KEY, company_id INTEGER); > CREATE TABLE Company(id INTEGER PRIMARY KEY, name VARCHAR); > INSERT INTO Person(id,company_id) VALUES (1, 1), (2, 2), (3, 3); > INSERT INTO Company(id,name) VALUES (1,'n1'), (2,'n2'), (3,'n3'); > SQL Query: > SELECT count(1) FROM person p > LEFT JOIN (select id from company union select id from company) as c on > c.id=p.company_id > JDBC Exception: > Caused by: org.h2.jdbc.JdbcSQLException: Column "P__Z0.COMPANY_ID" must be in > the GROUP BY list; SQL statement: > SELECT > P__Z0.COMPANY_ID __C0_0, > COUNT(1) __C0_1 > FROM PUBLIC.PERSON P__Z0 [90016-195] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
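To make the comment concrete, hedged variants of the reporter's query: the first drops the redundant self-union, the second keeps it but adds {{ALL}}; per the comment, both work.

{noformat}
-- Variant 1: remove the redundant UNION entirely.
SELECT count(1) FROM person p
LEFT JOIN (SELECT id FROM company) AS c ON c.id = p.company_id;

-- Variant 2: keep the UNION but make it UNION ALL (duplicates are preserved).
SELECT count(1) FROM person p
LEFT JOIN (SELECT id FROM company UNION ALL SELECT id FROM company) AS c ON c.id = p.company_id;
{noformat}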
[jira] [Created] (IGNITE-7738) Allow 'multiple statements' in thin JDBC streaming mode
Alexander Paschenko created IGNITE-7738: --- Summary: Allow 'multiple statements' in thin JDBC streaming mode Key: IGNITE-7738 URL: https://issues.apache.org/jira/browse/IGNITE-7738 Project: Ignite Issue Type: Improvement Components: sql Reporter: Alexander Paschenko We need to update the thin JDBC protocol to let the user run multiple statements via Statement.execute(String) when the connection is in streaming mode. Currently, in streaming mode the server always receives all SQL in batches, and its batch processing logic does not allow multiple statements at all. If we're able to recognize the original nature of each statement, we'll be able to act appropriately; for this to be possible, additional information must be passed with each query in the batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
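A hedged illustration of the kind of multi-statement script a user would want to pass to a single Statement.execute(String) call once this is supported (table name is illustrative):

{noformat}
-- Several statements submitted as one string while the connection is in streaming mode.
SET STREAMING ON;
INSERT INTO person (id, name) VALUES (1, 'Ann');
INSERT INTO person (id, name) VALUES (2, 'Bob');
SET STREAMING OFF;
{noformat}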
[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode
[ https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367320#comment-16367320 ] Alexander Paschenko commented on IGNITE-7253: - [~vozerov], all done. > JDBC thin driver: introduce streaming mode > -- > > Key: IGNITE-7253 > URL: https://issues.apache.org/jira/browse/IGNITE-7253 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > Should be done after IGNITE-6022. We should allow optional streaming mode for > JDBC driver. In this mode only INSERTs without SELECT should be possible. All > other DML operations should throw an exception. > Design considerations: > 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable > streaming for connection. > 2) Add command {{STREAMER FLUSH}} which will force data flush. > 3) Only INSERT without SELECT works, all other DML statements should throw an > exception > 4) It should be possible to stream into several tables simultaneously (i.e. > several streamers could be opened) > 5) Any DDL statement should force flush of all currently opened streamers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (IGNITE-7253) JDBC thin driver: introduce streaming mode
[ https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365795#comment-16365795 ] Alexander Paschenko edited comment on IGNITE-7253 at 2/16/18 11:07 AM: --- [~vozerov], 2, 3 - the only need for batch size in these cases is to keep track of the number of fake counters that we return. That is, if the user has issued "addBatch" thrice, we return that number of counters, each having a value of zero. Just a weak attempt to create an appearance of JDBC compliance, nothing else. was (Author: al.psc): [~vozerov], 2, 3 - the only need for batch size in these cases is to keep track of the number of fake counters that we remove. That is, if the user has issued "addBatch" thrice, we return that number of counters, each having a value of zero. Just a weak attempt to create an appearance of JDBC compliance, nothing else. > JDBC thin driver: introduce streaming mode > -- > > Key: IGNITE-7253 > URL: https://issues.apache.org/jira/browse/IGNITE-7253 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > Should be done after IGNITE-6022. We should allow optional streaming mode for > JDBC driver. In this mode only INSERTs without SELECT should be possible. All > other DML operations should throw an exception. > Design considerations: > 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable > streaming for connection. > 2) Add command {{STREAMER FLUSH}} which will force data flush. > 3) Only INSERT without SELECT works, all other DML statements should throw an > exception > 4) It should be possible to stream into several tables simultaneously (i.e. > several streamers could be opened) > 5) Any DDL statement should force flush of all currently opened streamers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-7253) JDBC thin driver: introduce streaming mode
[ https://issues.apache.org/jira/browse/IGNITE-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365795#comment-16365795 ] Alexander Paschenko commented on IGNITE-7253: - [~vozerov], 2, 3 - the only need for batch size in these cases is to keep track of the number of fake counters that we remove. That is, if the user has issued "addBatch" thrice, we return that number of counters, each having a value of zero. Just a weak attempt to create an appearance of JDBC compliance, nothing else. > JDBC thin driver: introduce streaming mode > -- > > Key: IGNITE-7253 > URL: https://issues.apache.org/jira/browse/IGNITE-7253 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Major > Fix For: 2.5 > > > Should be done after IGNITE-6022. We should allow optional streaming mode for > JDBC driver. In this mode only INSERTs without SELECT should be possible. All > other DML operations should throw an exception. > Design considerations: > 1) Add command {{SET STREAMING=1|ON|0|OFF}} which will enable or disable > streaming for connection. > 2) Add command {{STREAMER FLUSH}} which will force data flush. > 3) Only INSERT without SELECT works, all other DML statements should throw an > exception > 4) It should be possible to stream into several tables simultaneously (i.e. > several streamers could be opened) > 5) Any DDL statement should force flush of all currently opened streamers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IGNITE-7664) SQL: throw sane exception on unsupported SQL statements
Alexander Paschenko created IGNITE-7664: --- Summary: SQL: throw sane exception on unsupported SQL statements Key: IGNITE-7664 URL: https://issues.apache.org/jira/browse/IGNITE-7664 Project: Ignite Issue Type: Improvement Components: sql Reporter: Alexander Paschenko Assignee: Alexander Paschenko Fix For: 2.5 Inspired by this SO issue: [https://stackoverflow.com/questions/48708238/ignite-database-create-schema-assertionerror] We should handle unsupported stuff more gracefully both in core code and drivers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
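For illustration, a statement along the lines of the linked Stack Overflow question; today it surfaces an AssertionError, whereas it should produce a clear "unsupported" error (a sketch, assuming CREATE SCHEMA is the statement in question):

{noformat}
-- Not supported by Ignite SQL; should fail with a sane error message instead of an assertion.
CREATE SCHEMA my_schema;
{noformat}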
[jira] [Commented] (IGNITE-7586) SQL COPY: add code examples
[ https://issues.apache.org/jira/browse/IGNITE-7586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355509#comment-16355509 ] Alexander Paschenko commented on IGNITE-7586: - [~kirill.shirokov], Thanks, my comments: # Please refactor imports - we don't use wildcards. Also, the order of imports matters; please have a look at the guidelines. (First java.*, then everything else.) IDEA can easily be tuned to lay imports out just the way we need. # Please remove "create index", it is not necessary for the scope of this example. # Please change "JDBC example finished." to "JDBC COPY command example finished." > SQL COPY: add code examples > --- > > Key: IGNITE-7586 > URL: https://issues.apache.org/jira/browse/IGNITE-7586 > Project: Ignite > Issue Type: Improvement > Components: sql >Affects Versions: 2.4 >Reporter: Kirill Shirokov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > Fix For: 2.4 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
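For reference, a hedged sketch of the COPY invocation such an example would demonstrate (file path, table, and columns are illustrative; the syntax follows the form proposed in IGNITE-6917 and may differ in the released version):

{noformat}
-- Bulk-load a client-side CSV file into an existing table through the JDBC driver.
COPY FROM "people.csv"
INTO person (id, name, age)
FORMAT CSV;
{noformat}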
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354254#comment-16354254 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], There was some weird stuff (abstract overrides, wrong ancestor, etc.), but I've fixed it myself so as not to lose any more time on this - please have a look and note what has been changed. Please create a Jira issue to add tests for parsing and link it to this one, thank you. I will run the tests too. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We can already batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or an input stream in the general case) on the client side, then transfer data to > the cluster, and then execute the update inside the cluster, e.g. through a > streamer. > First of all we need to create a quick and dirty prototype to assess the potential > performance improvement. If the speedup is confirmed, we should build a base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formats, e.g. Parquet), escape characters, input streams, > etc. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Current implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180.
> * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting pre-partitioning data on > client side. > * We don't include any any metadata, such as line number from client side. > h3. Transferring data > * We send file data via batches. In future we will support batch size > (specified with rows per batch or data block size > per batch). > * We may want to implement data compression in future. > * We connect to single node in JDBC driver (no multi-node connections). > h3. Cache/tables/column handling > * We don't create table in
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353919#comment-16353919 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], In fact, cache writer *is* called to execute "independent actions on independent pieces of data". However, issue here is not that big, and if it was the only thing we'd be done already. What worries me more is that we're hanging with these two issues (50, 51) for few days already. Ok, let's pass on 51. Still there's 50, unused method param, and this is not passable, still there hasn't been any reaction (neither fix, nor verbal answer) about that. Are we going to argue about it as well? > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. 
> * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting pre-partitioning data on > client side. > * We don't include any any metadata, such as line number from client side. > h3. Transferring data > * We send file data via batches. In future we will support batch size > (specified with rows per batch or data block size > per
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353530#comment-16353530 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], Thanks for the fixes. Still I don't see pts. {{50}}, {{51}} fixed. Are you going to fix those so that we can get this over with? > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. > * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. 
File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting pre-partitioning data on > client side. > * We don't include any any metadata, such as line number from client side. > h3. Transferring data > * We send file data via batches. In future we will support batch size > (specified with rows per batch or data block size > per batch). > * We may want to implement data compression in future. > * We connect to single node in JDBC driver (no multi-node connections). > h3. Cache/tables/column handling > * We don't create table in the bulk load command > * We may want to have and option for reading header row, which contains > column names to match columns > * In future we may wish to support
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352166#comment-16352166 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], 52. {{BulkLoadParser#createParser}} - I think we don't need two factories, one for formats is enough. Just create abstract method {{BulkLoadParser createParser()}} in {{BulkLoadFormat}}. You've stated above that format and parser are 1:1 related, so this seems to be the way to go. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. 
> * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting pre-partitioning data on > client side. > * We don't include any any metadata, such as line number from client side. > h3. Transferring data > * We send file data via batches. In future we will support batch size > (specified with rows per batch or data block size > per batch). > * We may want to implement data compression in future. > * We connect to single node in JDBC driver (no multi-node connections). > h3. Cache/tables/column handling > * We don't create table in the bulk
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350605#comment-16350605 ] Alexander Paschenko edited comment on IGNITE-6917 at 2/2/18 4:29 PM: - [~kirill.shirokov], 50. {{BulkLoadProcessor#close}} >See javadoc. I have seen it. Still this class is not a member of any hierarchy, {{close}} method neither overrides anything nor is overridden by anyone and thus now this param neither is used nor needed. Please remove it, adding it back once we need it won't be a big deal at all. Really, unused param in this case hardly is an architectural point of future extension. 51. {{BulkLoadCacheWriter}}: >O. But what is the reason for doing that? At the very least, to maintain consistency about names of methods having similar, or close-to-identical meaning. Say, all our closures have their method called {{apply}} while this class employs {{accept}}. I don't see any reason why this class shouldn't be in that hierarchy. was (Author: al.psc): [~kirill.shirokov], 50. {{BulkLoadProcessor#close}} >See javadoc. I have seen it. Still this class is not a member of any hierarchy, {{close}} method neither overrides anything nor is overridden by anyone and thus now this param neither is used nor needed. Please remove it, adding it back once we need it won't be a big deal at all. Really, unused param in this case hardly is an architectural point of extension. 51. {{BulkLoadCacheWriter}}: >O. But what is the reason for doing that? At the very least, to maintain consistency about names of methods having similar, or close-to-identical meaning. Say, all our closures have their method called {{apply}} while this class employs {{accept}}. I don't see any reason why this class shouldn't be in that hierarchy. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) 
> -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. > * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350605#comment-16350605 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], 50. {{BulkLoadProcessor#close}} >See javadoc. I have seen it. Still this class is not a member of any hierarchy, {{close}} method neither overrides anything nor is overridden by anyone and thus now this param neither is used nor needed. Please remove it, adding it back once we need it won't be a big deal at all. Really, unused param in this case hardly is an architectural point of extension. 51. {{BulkLoadCacheWriter}}: >O. But what is the reason for doing that? At the very least, to maintain consistency about names of methods having similar, or close-to-identical meaning. Say, all our closures have their method called {{apply}} while this class employs {{accept}}. I don't see any reason why this class shouldn't be in that hierarchy. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. 
> * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350544#comment-16350544 ] Alexander Paschenko commented on IGNITE-6917:
-
[~kirill.shirokov], my comments:
44. {{BulkLoadProcessor#close}} - unused method param.
45. {{BulkLoadDataConverter}} - we don't need this class explicitly anymore, just create an anonymous closure in place.
46. About *32.2* - I simply suggest that you add one more ancestor to the {{extends}} clause, and this ancestor must be one of the Ignite closures. Similar to what you did with {{BulkLoadDataConverter}}.
47. {{BulkLoadAckClientParameters#DEFAULT_INPUT_CHARSET}} - unused.
48. {{DmlStatementsProcessor#runDmlStatement(String sql, SqlCommand cmd)}}:
48.1 Please rename it to {{runNativeDmlStatement}} - it's not exactly an overload of another method with the same name existing in this class.
48.2 Please remove the braces after {{if (cmd instanceof SqlBulkLoadCommand) {}} - one-liners should not be surrounded with braces per Ignite conventions.
49. {{JdbcRequestHandler#processBulkLoadFileBatch}}: we try to actually process the batch even when we have received {{CMD_FINISHED_ERROR}} (there's no check before attempting to process the batch). Are you sure we want to do this regardless of the state flag value?
This is it for now. Thanks!
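Point 49 as a self-contained illustration of the intended control flow; the command-code names mirror the ones discussed in this review, but the class and method here are made up, not the actual JDBC protocol classes:
{noformat}
/** Illustration only: never feed a batch flagged with an error to the bulk load machinery. */
final class BatchStateGuardSketch {
    static final int CMD_CONTINUE = 0;
    static final int CMD_FINISHED_EOF = 1;
    static final int CMD_FINISHED_ERROR = 2;

    static String onBatch(int cmd, byte[] payload) {
        if (cmd == CMD_FINISHED_ERROR) {
            // Client reported a failure: clean up and acknowledge, but skip processing the payload.
            return "aborted";
        }

        // ... parse payload and feed rows to the writer ...
        return cmd == CMD_FINISHED_EOF ? "finished" : "continue";
    }
}
{noformat}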
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349007#comment-16349007 ] Alexander Paschenko edited comment on IGNITE-6917 at 2/2/18 12:28 PM:
--
[~kirill.shirokov],
31. {{BulkLoadDataConverter}} is now empty and doesn't convert anything. I think we should remove it and simply pass {{UpdatePlan}} to {{BulkLoadProcessor}}.
32. {{BulkLoadCacheWriter}}:
32.1 Odd empty line after the interface header.
32.2 Leaving this interface be does not conflict with making it extend one of the existing Ignite interfaces. Please add another appropriate ancestor to this interface.
33. Regarding this:
>Does this anti-pattern makes sense here? Or we're about keeping similarity with existing code and bugward compatibility?
First, providing for resilience is one of the reasons that justify "error hiding". We aim not to let the JDBC handler or NIO listener crash. Second, there's no "hiding" per se, as the exception is properly logged and we send the user a message.
34. {{BulkLoadProcessor}}:
34.1 Odd empty line after the ctor header.
34.2 {{processBatch}} is still coupled with a JDBC-related class.
34.3 I think the class name {{BulkLoadContext}} would reflect what this class does more correctly. In fact, it does not process anything, it just holds various bits that help the process. The fact that you even use the word "context" in the javadoc for this class speaks in favor of that.
35. {{BulkLoadParser}}: still coupled with a JDBC-related class. I have already suggested that we introduce some JDBC-neutral enum to indicate the current stage of the process.
36. {{JdbcRequestHandler#executeQuery}} - odd empty line after your new "if".
37. {{JdbcBulkLoadBatchAckResult}}: I suggested simply {{JdbcBulkLoadAckResult}}, without the word "batch", as this ack response has to do with the command overall and *not* with an individual batch. It's sent once per bulk load.
38. {{UpdateMode}}: there's a typo in the javadocs you've added to all enum members. Please correct: com*_m_*and.
39. {{UpdatePlanBuilder}}: unused new static import.
40. {{BulkLoadStreamerWriter}}: why is the counter internally stored as an {{int}}, not a {{long}}?
41. {{JdbcQueryTask#call}}: I think it would be better to gracefully close the newly created context before throwing an exception.
42. {{BulkLoadCsvFormat}}: many new members and their getters have a "Re" suffix. It's not standard in the Ignite codebase. I would either replace it with "regex" or "regexp", or even remove it - the user of this class can clearly see the return type of those getters, and those suffixes don't really tell anything new.
43. {{DdlStatementsProcessor#runDdlStatement}}: why employ log throttling here? Why would we want to skip some error messages?
This is it for now, thanks. Looking forward to fixes.
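A minimal sketch of what point 40 suggests, assuming the writer wraps an {{IgniteDataStreamer}} and counts updates; the entry type and accessor name are illustrative, not the actual class:
{noformat}
import java.util.Map;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.lang.IgniteInClosure;

public class BulkLoadStreamerWriter implements IgniteInClosure<Map.Entry<Object, Object>> {
    /** Destination streamer. */
    private final IgniteDataStreamer<Object, Object> streamer;

    /** A long, not an int: a single COPY may well exceed Integer.MAX_VALUE rows. */
    private long updateCnt;

    public BulkLoadStreamerWriter(IgniteDataStreamer<Object, Object> streamer) {
        this.streamer = streamer;
    }

    /** {@inheritDoc} */
    @Override public void apply(Map.Entry<Object, Object> entry) {
        streamer.addData(entry.getKey(), entry.getValue());

        updateCnt++;
    }

    /** @return Number of entries submitted so far. */
    public long updateCnt() {
        return updateCnt;
    }
}
{noformat}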
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349007#comment-16349007 ] Alexander Paschenko commented on IGNITE-6917:
-
[~kirill.shirokov],
31. {{BulkLoadDataConverter}} is now empty and doesn't convert anything. I think we should remove it and simply pass {{UpdatePlan}} to {{BulkLoadProcessor}}.
32. {{BulkLoadCacheWriter}}:
32.1 Odd empty line after the interface header.
32.2 Leaving this interface be does not conflict with making it extend one of the existing Ignite interfaces. Please add another appropriate ancestor to this interface.
33. Regarding this:
>Does this anti-pattern makes sense here? Or we're about keeping similarity with existing code and bugward compatibility?
First, providing for resilience is one of the reasons that justify "error hiding". We aim not to let the JDBC handler or NIO listener crash. Second, there's no "hiding" per se, as the exception is properly logged and we send the user a message.
34. {{BulkLoadProcessor}}:
34.1 Odd empty line after the ctor header.
34.2 {{processBatch}} is still coupled with a JDBC-related class.
34.3 I think the class name {{BulkLoadContext}} would reflect what this class does more correctly. In fact, it does not process anything, it just holds various bits that help the process. The fact that you even use the word "context" in the javadoc for this class speaks in favor of that.
35. {{BulkLoadParser}}: still coupled with a JDBC-related class. I have already suggested that we introduce some JDBC-neutral enum to indicate the current stage of the process.
36. {{JdbcRequestHandler#executeQuery}} - odd empty line after your new "if".
37. {{JdbcBulkLoadBatchAckResult}}: I suggested simply {{JdbcBulkLoadAckResult}}, without the word "batch", as this ack response has to do with the command overall and *not* with an individual batch. It's sent once per bulk load.
38. Point *26* is unanswered and I don't see related code changes either.
39. {{UpdateMode}}: there's a typo in the javadocs you've added to all enum members. Please correct: com*_m_*and.
40. {{UpdatePlanBuilder}}: unused new static import.
41. {{BulkLoadStreamerWriter}}: why is the counter internally stored as an {{int}}, not a {{long}}?
42. {{JdbcQueryTask#call}}: I think it would be better to gracefully close the newly created context before throwing an exception.
43. {{BulkLoadCsvFormat}}: many new members and their getters have a "Re" suffix. It's not standard in the Ignite codebase. I would either replace it with "regex" or "regexp", or even remove it - the user of this class can clearly see the return type of those getters, and those suffixes don't really tell anything new.
44. {{DdlStatementsProcessor#runDdlStatement}}: why employ log throttling here? Why would we want to skip some error messages?
This is it for now, thanks. Looking forward to fixes.
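Point 35 asks for a connectivity-neutral way to describe the bulk load stage. A sketch of such an enum (the name and members are assumptions, not existing Ignite code):
{noformat}
/** Stage of a bulk load transfer, independent of JDBC (or any future ODBC) protocol constants. */
public enum BulkLoadBatchStage {
    /** More batches are expected after this one. */
    CONTINUE,

    /** Last batch: the client reached end of file. */
    FINISHED_EOF,

    /** The client failed: abort and clean up without processing the payload. */
    FINISHED_ERROR
}
{noformat}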
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348807#comment-16348807 ] Alexander Paschenko commented on IGNITE-6917:
-
[~kirill.shirokov]
28. {{BulkLoadDataConverter}} - why is the column number manipulation still there?
29. {{BulkLoadCsvFormat}}:
29.1 Odd empty line after the class header.
29.2 New members lack javadocs and empty separating lines.
29.3 Empty ctor that doesn't set anything.
30. {{BulkLoadCsvParser}}:
30.1 Odd empty line in the comment for the ctor.
30.2 {{parseBatch}} - please rename {{result}} to {{res}}.
I'm still in the process of reviewing your updates, thanks.
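For reference, the member layout points 29.1-29.3 ask for, following the usual Ignite conventions; the field names and types here are illustrative only:
{noformat}
import java.util.regex.Pattern;

/** CSV format options (layout sketch: javadoc on every member, blank line between members, no empty ctor). */
public class BulkLoadCsvFormatSketch {
    /** Field separator pattern. */
    private Pattern fieldSeparator;

    /** Line separator pattern. */
    private Pattern lineSeparator;

    /** Quote characters. */
    private String quoteChars;
}
{noformat}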
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348533#comment-16348533 ] Alexander Paschenko commented on IGNITE-6917:
-
[~kirill.shirokov],
Comments: {{BulkLoadDataConverter}} -
# No comments for the class or for its members - please add them.
# I don't think it's a good idea to create new lists and add anything to them - that's totally unnecessary GC pressure that can easily be avoided. Just add an equality check to the methods performing SQL operations, as suggested above, or add a param to {{processRow}} that controls whether we expect *exactly* the number of columns present in the plan or *at least* that many.
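A self-contained sketch of the second option above (a flag on {{processRow}}); the method shape and names are illustrative, not the actual {{UpdatePlan}} code:
{noformat}
import java.util.List;

final class ProcessRowSketch {
    /**
     * @param row Parsed row values.
     * @param planCols Number of columns the update plan expects.
     * @param exactColCnt True for INSERT/MERGE (exact match), false for COPY (extra trailing values tolerated).
     */
    static void checkRow(List<?> row, int planCols, boolean exactColCnt) {
        if (exactColCnt ? row.size() != planCols : row.size() < planCols)
            throw new IllegalArgumentException("Invalid number of row values: " + row.size());

        // ... build key and value from the first planCols values, with no intermediate list copy ...
    }
}
{noformat}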
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774 ] Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:45 AM:
-
[~kirill.shirokov], please let me continue with comments.
8. {{BulkLoadParameters}} - do we really need this class? It just has two fields. For now I see no point in it. Do you expect many more new fields to come?
9. {{BulkLoadParser}}:
9.1 Unused private field.
9.2 Too long line in the {{processBatch}} declaration.
9.3 It seems plain wrong that the arguments of {{processBatch}} are related to JDBC. JDBC is a connectivity interface, nothing else, and these references are in the deeper "guts". We need to decouple these somehow - one day we might want to support this feature in ODBC, or somewhere else, and then it'd be inevitable. I'd just introduce an enum for the various command types.
10. {{BulkLoadFormat}}:
10.1 Unused import.
10.2 Odd empty lines at the beginning of the class, in the switch, etc.
11. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably they should be a single class, and the implementations could be merged accordingly too. Can you come up with a case where we would really need to separate these entities? Say, one parser for a few formats, or one format yielding more than one type of parser?
12. {{BulkLoadCsvParser}}:
12.1 The {{processBatch}} method looks like it could actually be defined in the parent class, while the logic to be changed in subclasses actually lives in {{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are defined at a higher level. Can we rework the hierarchy accordingly?
12.2 Unused parameter in the ctor ({{params}}) - do we have tests for params at all?
12.3 Odd empty lines after the class header and method declarations.
13. {{BulkLoadEntryConverter}}: do we really expect more than one implementation of this interface? If not, let's just replace it with {{IgniteClosureX}} or something like that. We have more than enough interfaces and abstract classes to represent lambdas. *Before refactoring this, please look at pt.17.3 as well*
14. {{BulkLoadCsvFormat}}: the amount of useful code in this class approaches zero, which further suggests that the parser and its format must be one.
15. {{BulkLoadContext}}: too long comment line.
16. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this interface? If not, just remove this class and let {{BulkLoadContext}} know about the streamer. If yes, make it inherit one of Ignite's provided classes/interfaces for lambdas. *Before refactoring this, please look at pt.17.3 as well*
17. {{JdbcRequestHandler}}:
17.1 Unused import.
17.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, please do the same here.
17.3 {{processBulkLoadFileBatch}}: it looks plain wrong that the actual job (data conversion, feeding the streamer) is done here. Not only is this logic that could be reused when we implement the same feature for another interface like ODBC, it's also not the area of responsibility of the JDBC request handler. This again shows the need to refactor {{BulkLoadContext}}: probably we need to introduce a concrete class that will *a)* wrap the streamer and *b)* do the conversion. It will just accept a {{List}} and do with it whatever it wants. JDBC must not know about this at all - it must just detect the correct context and feed lists of objects to that new class mentioned above (see the sketch after this message).
17.4 {{executeQuery}}: I'd move the whole {{if (fieldsCur instanceof BulkLoadContextCursor)}} block above, before {{if (results.size() == 1)}} - it would decrease scope depth, make the code more readable and decrease the overall size of the diff (never hurts).
17.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to do with files.
18. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (the getter has 0 usages), not to mention that the field name is very misleading - what does JDBC have to do with parsing?
19. {{JdbcBulkLoadBatchRequestResult}}: quite a weird name; I don't think we should mix the words "request" and "result", we never do so in JDBC. Not to mention that we never send {{JdbcBulkLoadBatchRequest}} *before* we get {{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to {{JdbcBulkLoadAckResult}}, as it's sent as an acknowledgement of the {{COPY}} command and never else.
This is not all, more comments will follow.
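A sketch of the concrete class proposed in point 17.3, assuming it owns both conversion and streaming so the protocol layer only hands over parsed rows; the generics, class name and method shapes are assumptions for illustration:
{noformat}
import java.util.List;
import java.util.Map;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.lang.IgniteClosure;

public class BulkLoadRowConsumer {
    /** Converts one parsed row into a cache key-value pair. */
    private final IgniteClosure<List<Object>, Map.Entry<Object, Object>> converter;

    /** Destination streamer. */
    private final IgniteDataStreamer<Object, Object> streamer;

    public BulkLoadRowConsumer(IgniteClosure<List<Object>, Map.Entry<Object, Object>> converter,
        IgniteDataStreamer<Object, Object> streamer) {
        this.converter = converter;
        this.streamer = streamer;
    }

    /** Called by the protocol layer (JDBC today, possibly ODBC later) with already parsed rows. */
    public void consume(List<List<Object>> rows) {
        for (List<Object> row : rows) {
            Map.Entry<Object, Object> e = converter.apply(row);

            streamer.addData(e.getKey(), e.getValue());
        }
    }
}
{noformat}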
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348211#comment-16348211 ] Alexander Paschenko commented on IGNITE-6917:
-
[~kirill.shirokov]
> I think it's OK to silently skip extra columns.
It's not entirely OK here, as this method ({{processRow}}) is called from other contexts too, like INSERT/MERGE. Probably the least dirty way here would be to just add an equality check to the other methods (other than {{bulkLoad}}) calling {{processRow}}.
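The alternative mentioned here, complementing the flag-based sketch earlier: keep {{processRow}} lenient and enforce the exact column count only at the INSERT/MERGE call sites (illustrative names, not the actual DML code):
{noformat}
import java.util.List;

final class InsertRowCheckSketch {
    /** Called from the INSERT/MERGE paths only; COPY skips this and tolerates extra trailing values. */
    static void checkExactWidth(List<?> row, int expectedCols) {
        if (row.size() != expectedCols)
            throw new IllegalArgumentException("Expected " + expectedCols + " values, got " + row.size());
    }
}
{noformat}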
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347177#comment-16347177 ] Alexander Paschenko edited comment on IGNITE-6917 at 2/1/18 8:19 AM:
-
[~kirill.shirokov], please allow me to proceed.
20. {{SqlParserCopySelfTest}} - not a single test for positive cases, only negative stuff. We need to cover positive cases extensively as well.
21. {{JdbcBulkLoadContext}} - probably we don't need this class, and it would be better to keep the update counter in {{BulkLoadContext}}, as the notion of a counter is used not just in the JDBC driver.
22. {{JdbcThinStatement}} -
22.1 Let's catch and wrap any exception in {{sendFile}}.
22.2 Odd empty line at the beginning of {{sendFile}}.
23. {{CsvLineProcessorBlock}} - we don't handle cases where quoted strings in a CSV line contain commas, do we? This clearly does not seem right, and I don't see any tests for that either. Shouldn't we handle escaped quotes too? (See the sketch after this message.)
24. All {{*PipelineBlock}} classes - odd empty line after the class header.
25. {{StrListAppenderBlock}} - do we need a list of lists here? Probably just a list of arrays would do?
26. {{BulkLoadContextCursor}} - it's unclear to me why we return the context in a fake "row" while all other parameters should be retrieved via getters. Can we unify this? (E.g. add getters for everything AND return everything in the fake row - that would be consistent.)
27. Please add more tests for behavior in case of failure - server or client disconnect during file processing, file open error on the client, etc.
I think this is it for now. Looking forward to the fixes.
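A self-contained illustration of the two RFC 4180 corner cases point 23 is about (this is not the Ignite {{CsvLineProcessorBlock}}, just a reference for the expected behavior): a comma inside a quoted field stays in the field, and a doubled quote unescapes to a single quote.
{noformat}
import java.util.ArrayList;
import java.util.List;

public class CsvQuotingExample {
    /** Splits one CSV line per RFC 4180 (embedded line breaks are not handled here). */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (quoted) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // Escaped quote ("") inside a quoted field.
                    i++;
                }
                else if (c == '"')
                    quoted = false;  // Closing quote.
                else
                    cur.append(c);   // Commas are kept verbatim while inside quotes.
            }
            else if (c == '"')
                quoted = true;
            else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            }
            else
                cur.append(c);
        }

        fields.add(cur.toString());

        return fields;
    }

    public static void main(String[] args) {
        // Prints: [1, Smith, John, He said "hi"] - i.e. three fields, the second containing a comma.
        System.out.println(split("1,\"Smith, John\",\"He said \"\"hi\"\"\""));
    }
}
{noformat}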
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347233#comment-16347233 ] Alexander Paschenko commented on IGNITE-6917: - And I'll rename all points to use the same single through numeration, sorry for the inconvenience. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. > * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. 
File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting/pre-partitioning data on > client side. > * We don't include any metadata, such as line number from client side. > h3. Transferring data > * We send file data via batches. In future we will support batch size > (specified with rows per batch or data block size > per batch). > * We may want to implement data compression in future. > * We connect to single node in JDBC driver (no multi-node connections). > h3. Cache/tables/column handling > * We don't create table in the bulk load command > * We may want to have an option for reading header row, which contains > column names to match columns > * In future we may wish to support COLUMNS (col1, _, col2, _, col3) syntax, > where
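For reference, driving the bulk load from the thin JDBC driver would look roughly like this under the current-implementation syntax above (host, file path, table and column names are placeholders, and the path quoting simply follows the sketch above):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CopyUsageSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement()) {
            // The client reads the local CSV file and streams it to the cluster in batches.
            stmt.executeUpdate("COPY FROM \"/path/to/city.csv\" INTO city (id, name) FORMAT CSV");
        }
    }
}
{code}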
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347230#comment-16347230 ] Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 5:28 PM: -- [~kirill.shirokov], Regarding 7.2 - yes, please disregard empty columns list. Suppose there's _key and _val. Regarding empty lines - they are always added for readability but we don't add them right after line ending with {, be it method or class declaration or loop or switch. Yes, with long method declarations this may not look good, but for consistency let's stick with what I suggest, thanks. Regarding 5 - ok, Vlad's always right. Still IMHO main thing is not simpler code but better usability here. (Just saying.) was (Author: al.psc): [~kirill.shirokov], Regarding 7.2 - yes, please disregard empty columns list. Suppose there's _key and _val. Regarding empty fields - they are always added for readability but we don't add them right after line ending with {, be it method or class declaration or loop or switch. Yes, with long method declarations this may not look good, but for consistency let's stick with what I suggest, thanks. Regarding 5 - ok, Vlad's always right. Still IMHO main thing is not simpler code but better usability here. (Just saying.) > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) 
> -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. > * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347230#comment-16347230 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], Regarding 7.2 - yes, please disregard empty columns list. Suppose there's _key and _val. Regarding empty fields - they are always added for readability but we don't add them right after line ending with {, be it method or class declaration or loop or switch. Yes, with long method declarations this may not look good, but for consistency let's stick with what I suggest, thanks. Regarding 5 - ok, Vlad's always right. Still IMHO main thing is not simpler code but better usability here. (Just saying.) > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. 
> * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one quote character is present or escape sequence is invalid. > h2. File handling > * File character set to be supported in future > * Skipped/imported row number (or first/last line or skip header option), > skipped/imported column number (or first/last column): to be supported in > future > * Line start pattern (as in MySQL): no support planned > * We currently support only client-side import. No server-side file import. > * We may want to support client-side stdin import in future. > * We do not handle importing multiple files from single command > * We don't benefit from any kind of pre-sorting pre-partitioning data on > client side. > * We don't include any any metadata, such as line number from client side. > h3. Transferring data > * We send file data via batches. In future we will support batch size > (specified with rows per batch or data block
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347177#comment-16347177 ] Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 5:22 PM: -- [~kirill.shirokov], please allow me to proceed. 1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative stuff. We need to cover positive cases extensively as well. 2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be better to keep update counter in {{BulkLoadContext}} as the notion of counter is used not just in JDBC driver. 3. {{JdbcThinStatement}} - 3.1 let's catch and wrap any exception in {{sendFile}} 3.2 Odd empty line in the beginning of {{sendFile}} 4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV line contain commas, do we? This clearly does not seem right and I don't see any tests for that either. Shouldn't we handle escaped quotes too? 5. All {{*PipelineBlock}} classes - odd empty line after class header. 6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list of arrays could do? 7. {{BulkLoadContextCursor}}{{}} - it's unclear to me why we return context in fake "row" while all other parameters should be retrieved via getters. Can we unify this? (E.g. add getters for everything AND return everything in fake row - that would be consistent.) 8. Please add more tests for behavior in case of failure - server or client disconnect during file processing, file open error on client, etc. I think this is it for now. Looking forward to the fixes. was (Author: al.psc): [~kirill.shirokov], please allow me to proceed. 1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative stuff. We need to cover positive cases extensively as well. 2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be better to keep update counter in {{BulkLoadContext}} as the notion of counter is used not just in JDBC driver. 3. {{JdbcThinStatement}} - 3.1 let's catch and wrap any exception in {{sendFile}} 3.2 Odd empty line in the beginning of {{sendFile}} 4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV line contain commas, do we? This clearly does not seem right and I don't see any tests for that either. Shouldn't we handle escaped quotes too? 5. All {{*PipelineBlock}} classes - odd empty line after class header. 6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list of arrays could do? 7. {{BulkLoadContextCursor}}{{}} - it's unclear to me why we return context in fake "row" while all other parameters should be retrieved via getters. Can we unify this? (E.g. add getters for everything AND return everything in fake row - that would be consistent.) 8. Please add more tests for behavior in case of failure - server or client disconnect during file processing, file open error on client, etc. I think this is it for now. Looking forward for the fixes. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. 
> We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name []
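To make point 4 above concrete: per RFC 4180, a comma inside a quoted field is data and a doubled quote is an escaped quote. A standalone illustration of that behavior (not the {{CsvLineProcessorBlock}} code itself):
{code:java}
import java.util.ArrayList;
import java.util.List;

/** Splits one CSV line into fields per RFC 4180: commas inside quotes are data, "" is an escaped quote. */
public class Rfc4180LineSplitter {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);

            if (quoted) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // Doubled quote inside a quoted field -> literal quote.

                        i++;
                    }
                    else
                        quoted = false; // Closing quote of the field.
                }
                else
                    cur.append(c);
            }
            else if (c == '"')
                quoted = true;
            else if (c == ',') {
                fields.add(cur.toString());

                cur.setLength(0);
            }
            else
                cur.append(c);
        }

        fields.add(cur.toString());

        return fields;
    }

    public static void main(String[] args) {
        // Prints: [1, Doe, John, said "hi"]
        System.out.println(split("1,\"Doe, John\",\"said \"\"hi\"\"\""));
    }
}
{code}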
[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL
[ https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347216#comment-16347216 ] Alexander Paschenko commented on IGNITE-3999: - [~aakhmedov] I've started looking at the patch. I'm in progress, meanwhile please ensure that: * all method overrides have javadoc like this: /** \{@inheritDoc} */ * all new parameters have matching javadocs. > Support case insensitive search in SQL > -- > > Key: IGNITE-3999 > URL: https://issues.apache.org/jira/browse/IGNITE-3999 > Project: Ignite > Issue Type: Improvement > Components: sql >Affects Versions: 1.7 >Reporter: Valentin Kulichenko >Assignee: Amir Akhmedov >Priority: Critical > Fix For: 2.5 > > > Currently case insensitive search is possible only with the help of > {{lower()}} function: > {code} > select name from MyValue where lower(name) = 'abc_5' > {code} > But this will always be a full scan, even if {{name}} field is indexed. > We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add > a respective property to {{@QuerySqlField}} annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
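For clarity, the two conventions requested above, shown on a purely illustrative class:
{code:java}
/** Example illustrating the requested javadoc style; the class and fields are made up. */
public class Person {
    /** Person name. */
    private String name;

    /**
     * @param name Person name.
     * @param upperCase Whether the name should be stored upper-cased.
     */
    public void setName(String name, boolean upperCase) {
        this.name = upperCase ? name.toUpperCase() : name;
    }

    /** {@inheritDoc} */
    @Override public String toString() {
        return "Person [name=" + name + ']';
    }
}
{code}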
[jira] [Issue Comment Deleted] (IGNITE-3999) Support case insensitive search in SQL
[ https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-3999: Comment: was deleted (was: Upsource review for the patch) > Support case insensitive search in SQL > -- > > Key: IGNITE-3999 > URL: https://issues.apache.org/jira/browse/IGNITE-3999 > Project: Ignite > Issue Type: Improvement > Components: sql >Affects Versions: 1.7 >Reporter: Valentin Kulichenko >Assignee: Amir Akhmedov >Priority: Critical > Fix For: 2.5 > > > Currently case insensitive search is possible only with the help of > {{lower()}} function: > {code} > select name from MyValue where lower(name) = 'abc_5' > {code} > But this will always be a full scan, even if {{name}} field is indexed. > We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add > a respective property to {{@QuerySqlField}} annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL
[ https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347197#comment-16347197 ] Alexander Paschenko commented on IGNITE-3999: - Added UpSource review link for the patch. > Support case insensitive search in SQL > -- > > Key: IGNITE-3999 > URL: https://issues.apache.org/jira/browse/IGNITE-3999 > Project: Ignite > Issue Type: Improvement > Components: sql >Affects Versions: 1.7 >Reporter: Valentin Kulichenko >Assignee: Amir Akhmedov >Priority: Critical > Fix For: 2.5 > > > Currently case insensitive search is possible only with the help of > {{lower()}} function: > {code} > select name from MyValue where lower(name) = 'abc_5' > {code} > But this will always be a full scan, even if {{name}} field is indexed. > We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add > a respective property to {{@QuerySqlField}} annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL
[ https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347196#comment-16347196 ] Alexander Paschenko commented on IGNITE-3999: - Upsource review for the patch > Support case insensitive search in SQL > -- > > Key: IGNITE-3999 > URL: https://issues.apache.org/jira/browse/IGNITE-3999 > Project: Ignite > Issue Type: Improvement > Components: sql >Affects Versions: 1.7 >Reporter: Valentin Kulichenko >Assignee: Amir Akhmedov >Priority: Critical > Fix For: 2.5 > > > Currently case insensitive search is possible only with the help of > {{lower()}} function: > {code} > select name from MyValue where lower(name) = 'abc_5' > {code} > But this will always be a full scan, even if {{name}} field is indexed. > We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add > a respective property to {{@QuerySqlField}} annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-3999) Support case insensitive search in SQL
[ https://issues.apache.org/jira/browse/IGNITE-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347191#comment-16347191 ] Alexander Paschenko commented on IGNITE-3999: - [~aakhmedov], [~vkulichenko], I will do that, thanks > Support case insensitive search in SQL > -- > > Key: IGNITE-3999 > URL: https://issues.apache.org/jira/browse/IGNITE-3999 > Project: Ignite > Issue Type: Improvement > Components: sql >Affects Versions: 1.7 >Reporter: Valentin Kulichenko >Assignee: Amir Akhmedov >Priority: Critical > Fix For: 2.5 > > > Currently case insensitive search is possible only with the help of > {{lower()}} function: > {code} > select name from MyValue where lower(name) = 'abc_5' > {code} > But this will always be a full scan, even if {{name}} field is indexed. > We need to correctly support {{VARCHAR_IGNORECASE}} H2 type in Ignite and add > a respective property to {{@QuerySqlField}} annotation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347177#comment-16347177 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], please allow me to proceed. 1. {{SqlParserCopySelfTest}} - no single test for positive cases, only negative stuff. We need to cover positive cases extensively as well. 2. {{JdbcBulkLoadContext}} - probably we don't need this class and it would be better to keep update counter in {{BulkLoadContext}} as the notion of counter is used not just in JDBC driver. 3. {{JdbcThinStatement}} - 3.1 let's catch and wrap any exception in {{sendFile}} 3.2 Odd empty line in the beginning of {{sendFile}} 4. {{CsvLineProcessorBlock}} - we don't handle cases when quoted strings in CSV line contain commas, do we? This clearly does not seem right and I don't see any tests for that either. Shouldn't we handle escaped quotes too? 5. All {{*PipelineBlock}} classes - odd empty line after class header. 6. {{StrListAppenderBlock}} - do we need list of lists here? Probably just list of arrays could do? 7. {{BulkLoadContextCursor}}{{}} - it's unclear to me why we return context in fake "row" while all other parameters should be retrieved via getters. Can we unify this? (E.g. add getters for everything AND return everything in fake row - that would be consistent.) 8. Please add more tests for behavior in case of failure - server or client disconnect during file processing, file open error on client, etc. I think this is it for now. Looking forward for the fixes. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) 
> -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1. Implementation decisions and notes > h2. Parsing > * We support CSV format described in RFC 4180. > * Custom row and column separators, quoting characters are currently hardcoded > * Escape sequences, line comment characters are currently not supported > * We may want to support fixed-length formats (via format descriptors) in > future > * We may want to strip comments from lines (for example, starting with '#') > * We may want to allow user to either ignore empty lines or treat them as a > special case of record having all default values > * We may allow user to enable whitespace trimming from beginning and end of a > line > * We may want to allow user to specify error handling strategy: e.g., only > one
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345541#comment-16345541 ] Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 2:47 PM: -- [~kirill.shirokov], my comments: 1. {{JdbcAbstractUpdateStatementSelfTest}} - unused code 2. {{org.apache.ignite.internal.jdbc2.JdbcBulkLoadSelfTest: createConnection}} - unused param; {{testBulkLoadThrows}} - please use try-with-resources. 3. {{org.apache.ignite.jdbc.thin.JdbcThinAbstractDmlStatementSelfTest:}} 3.1 {{jdbcRun}}: please remove this method and use the inherited method execute instead. 3.2 {{testDOA}}: please rename into something more meaningful. 4. {{BulkLoadContextCursor:}} 4.1 Please make the field final. 4.2 {{getAll}} - I changed the code to a double call of {{singletonList}} just fine. 4.3 {{getFieldName}} - I believe this method should throw {{NoSuchElementException}} for all {{i}} except {{0}}. 4.4 Odd empty line right after class header. 5. {{SqlBulkLoadCommand}}: let's parse the command in a way consistent with how it's usually done in SQL - we clearly should not use ugly stuff like {{BATCH_SIZE}} (with underscore instead of space) now that we are headed to our own parser. Please rework parsing so that the user has to write {{BATCH SIZE}}. (NB: yes, I'm aware that we already have quite ugly stuff here, like {{INLINE_SIZE}} for indexes, but I believe that this must be avoided as much as possible.) -Also I think we should support the case when the user does not supply a columns list (so no empty brackets too) - currently it's a parsing error and there's a test for it.- _(Please disregard what's been struck through.)_ 6. {{UpdatePlan#processRow}}: 6.1 Odd empty line after method declaration. 6.2 Shouldn't we rather check for equality here? {{row.size() < colNames.length}} - like this 7. {{UpdatePlanBuilder#planForBulkLoad}}: 7.1 Odd empty line after method declaration. 7.2 Let's add a VERY simple test case (like a cache of and bulk load to it) - I think this may break on the current implementation as we may actually want to insert into the _val column, and this case does not seem to be handled - say, like this {{COPY from file to String () format csv}} (String is the table name, the table has 3 columns all of which are system columns.) This is not all, more comments will follow. was (Author: al.psc): [~kirill.shirokov], my comments: 1. {{JdbcAbstractUpdateStatementSelfTest}} - unused code 2. {{org.apache.ignite.internal.jdbc2.JdbcBulkLoadSelfTest: createConnection}} - unused param; {{testBulkLoadThrows}} - please use try-with-resources. 3. {{org.apache.ignite.jdbc.thin.JdbcThinAbstractDmlStatementSelfTest:}} 3.1 {{jdbcRun}}: please remove this method and use inherited method execute instead. 3.2 {{testDOA}}: please rename into something more meaningful. 4. {{BulkLoadContextCursor:}} 4.1 Please make field final. 4.2 {{getAll}} - I changed the code to double call of {{singletonList}} just fine. 4.3 {{getFieldName}} - I believe this method should throw {{NoSuchElementException}} for all {{i}} except {{0}}. 4.4 Odd empty line right after class header. 5. {{SqlBulkLoadCommand}}: let's parse command in a way consistent with how it's done in SQL usually - we clearly should not use ugly stuff like {{BATCH_SIZE}} (with underscore instead of space) now that we are headed to our own parser. Please rework parsing so that user has to write {{BATCH SIZE}}.
(NB: yes, I'm aware that we already have quite ugly stuff here, like {{INLINE_SIZE}} for indexes, but I believe that this must be avoided as much as possible.) Also I think we should support case when the user does not supply columns list (so no empty brackets too) - currently it's a parsing error and there's a test for it. 6. {{UpdatePlan#processRow}}: 6.1 Odd empty line after method declaration. 6.2. Shouldn't we rather check for equality here? {{row.size() < colNames.length}} - like this 7. {{UpdatePlanBuilder#planForBulkLoad}}: 7.1 Odd empty line after method declaration. 7.2 Let's add VERY simple test case (like cache of and bulk load to it) - I think this may break on current implementation as we may actually will to insert into _val column, and this case does not seem to be handled - say, like this {{COPY from file to String () format csv}} (String is table name, table has 3 columns all of which are system columns.) This is not all, more comments will follow. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >
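To illustrate 4.2 and 6.2 above; the {{ctx}} field and the {{colNames}} array are stand-ins, not the actual members of the patch:
{code:java}
import java.util.Collections;
import java.util.List;

/** Sketch of the shapes suggested in 4.2 and 6.2; not the actual patch code. */
class ContextCursorSketch {
    /** Stand-in for the bulk load context held by the cursor. */
    private final Object ctx = new Object();

    /** 4.2: the whole result set is a single one-column row wrapping the context. */
    public List<List<?>> getAll() {
        return Collections.singletonList(Collections.singletonList(ctx));
    }

    /** 6.2: reject rows whose width differs from the declared column list. */
    static void checkRowWidth(List<?> row, String[] colNames) {
        if (row.size() != colNames.length)
            throw new IllegalArgumentException("Row width " + row.size() +
                " does not match column list length " + colNames.length);
    }
}
{code}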
[jira] [Comment Edited] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774 ] Alexander Paschenko edited comment on IGNITE-6917 at 1/31/18 1:13 PM: -- [~kirill.shirokov], please let me continue with comments. 1. {{BulkLoadParameters}} - do we really need this class? It just has two fields. For now I see no point in it. Do you expect many more new fields to come? 2. {{BulkLoadParser}}: 2.1 Unused private field. 2.2 Too long line in {{processBatch}} declaration. 2.3 It seems plain wrong that arguments of {{processBatch}} are related with JDBC. JDBC is a connectivity interface, nothing else, and these references are in deeper "guts". We need to decouple these somehow - one day we might work to support this feature in ODBC, or somewhere else, and then it'd be inevitable. I'd just introduce an enum for various command types. 3. {{BulkLoadFormat}}: 3.1 Unused import 3.2 Odd empty lines in the beginning of class, in switch, etc 4. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably they should be single class and implementations could be merged accordingly too. Can you come up with a case where we would really need to separate this entities? Say, one parser for few formats, or one format yielding more than one type of parser? 5. {{BulkLoadCsvParser}}: {{5.1 processBatch}} method looks like it could be actually defined in parent class, and the logic to be changed in subclasses actually lives in {{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF}} are defined on higher level. Can we rework hierarchy accordingly? 5.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at all? 5.3 Odd empty lines after class header, method declarations 6. {{BulkLoadEntryConverter}}: do we really expect more than one implementation of this interface? If no, let's just replace it with {{IgniteClosureX}} or something like that. We have more than enough interfaces and abstract classes to represent lambdas. *Before refactoring this, please look at pt.10.3 as well* 7. {{BulkLoadCsvFormat}}: amount of useful code in this class approaches zero, and this further assures that parser and its format must be one. 8. {{BulkLoadContext}}: too long comment line. 9. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this interface?quit If no, just remove this class and let {{BulkLoadContext}} know about streamer. If yes, make it inherit one of Ignite's provided classes/interfaces for lambdas. *Before refactoring this, please look at pt.10.3 as well* 10. {{JdbcRequestHandler}}: 10.1 Unused import 10.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, please do the same here. 10.3 {{processBulkLoadFileBatch}}: it looks like doing actual job (data conversion, feeding to streamer) is just plain wrong. Not only it's logic that could be reused when we'll implement same feature for other interface like JDBC, it's also not the area of responsibility of JDBC request handler. This again shows the need of refactoring of {{BulkLoadContext}}: probably we need to introduce a concrete class that will *a)* wrap streamer and *b)* do conversion. It will just accept {{List}} and do with it whatever it wants. JDBC must not know about this at all, it must just detect correct context and feed lists of objects to that new class I've mentioned above. 
10.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would decrease scope depth, make code better readable and decrease overall size of diff (never hurts). 10.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to do with files. 11. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 usages), not to mention that field name is very misleading - what does JDBC have to do with parsing? 12. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we should mix words "request" and "result", we never do so in JDBC. Not to mention that we never send {{JdbcBulkLoadBatchRequest}} *before* we get {{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to {{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} command and never else. This is not all, more comments will follow. was (Author: al.psc): [~kirill.shirokov], please let me continue with comments. 1. {{BulkLoadParameters}} - do we really need this class? It just has two fields. For now I see no point in it. Do you expect many more new fields to come? 2. {{BulkLoadParser}}: 2.1 Unused private field. 2.2 Too long line in {{processBatch}} declaration. 2.3 It seems plain wrong that arguments of {{processBatch}} are related with JDBC. JDBC is a connectivity interface, nothing else, and these
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346774#comment-16346774 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], please let me continue with comments. 1. {{BulkLoadParameters}} - do we really need this class? It just has two fields. For now I see no point in it. Do you expect many more new fields to come? 2. {{BulkLoadParser}}: 2.1 Unused private field. 2.2 Too long line in {{processBatch}} declaration. 2.3 It seems plain wrong that arguments of {{processBatch}} are related with JDBC. JDBC is a connectivity interface, nothing else, and these references are in deeper "guts". We need to decouple these somehow - one day we might work to support this feature in ODBC, or somewhere else, and then it'd be inevitable. I'd just introduce an enum for various command types. 3. {{BulkLoadFormat}}: 3.1 Unused import 3.2 Odd empty lines in the beginning of class, in switch, etc 4. {{BulkLoadParser}} and {{BulkLoadFormat}} seem to be related 1:1 - probably they should be single class and implementations could be merged accordingly too. Can you come up with a case where we would really need to separate this entities? Say, one parser for few formats, or one format yielding more than one type of parser? 5. {{BulkLoadCsvParser}}: {{5.1 processBatch}} method looks like it could be actually defined in parent class, and the logic to be changed in subclasses actually lives in {{parseBatch}}. This is so because command types like {{CMD_FINISHED_EOF }}are defined on higher level. Can we rework hierarchy accordingly? 5.2 Unused parameter in ctor ( {{params}} ), do we have tests for params at all? 5.3 Odd empty lines after class header, method declarations 6. {{BulkLoadEntryConverter}}: do we really expect more than one implementation of this interface? If no, let's just replace it with {{IgniteClosureX}} or something like that. We have more than enough interfaces and abstract classes to represent lambdas. *Before refactoring this, please look at pt.10.3 as well* 7. {{BulkLoadCsvFormat}}: amount of useful code in this class approaches zero, and this further assures that parser and its format must be one. 8. {{BulkLoadContext}}: too long comment line. 9. {{BulkLoadCacheWriter}}: do we expect more than one implementation of this interface?quit If no, just remove this class and let {{BulkLoadContext}} know about streamer. If yes, make it inherit one of Ignite's provided classes/interfaces for lambdas. *Before refactoring this, please look at pt.10.3 as well* 10. {{JdbcRequestHandler}}: 10.1 Unused import 10.2 {{processBulkLoadFileBatch}}: all other methods just catch {{Exception}}, please do the same here. 10.3 {{processBulkLoadFileBatch}}: it looks like doing actual job (data conversion, feeding to streamer) is just plain wrong. Not only it's logic that could be reused when we'll implement same feature for other interface like JDBC, it's also not the area of responsibility of JDBC request handler. This again shows the need of refactoring of {{BulkLoadContext}}: probably we need to introduce a concrete class that will *a)* wrap streamer and *b)* do conversion. It will just accept {{List}} and do with it whatever it wants. JDBC must not know about this at all, it must just detect correct context and feed lists of objects to that new class I've mentioned above. 
10.4 {{executeQuery}}: I'd move whole stuff of {{if (fieldsCur instanceof BulkLoadContextCursor)}} above, before {{if (results.size() == 1)}} - it would decrease scope depth, make code better readable and decrease overall size of diff (never hurts). 10.5 {{executeQuery}}: {{filesToSendResult}} - doesn't have anything at all to do with files. 11. {{JdbcBulkLoadContext}}: unused field {{cmdParsingResult}} (getter has 0 usages), not to mention that field name is very misleading - what does JDBC have to do with parsing? 12. {{JdbcBulkLoadBatchRequestResult}}: quite weird name, I don't think we should mix words "request" and "result", we never do so in JDBC. Not to mention that we never send {{JdbcBulkLoadBatchRequest}} *before* we get {{JdbcBulkLoadBatchRequestResult}} :) I'd just rename this to {{JdbcBulkLoadAckResult}} as it's sent as an acknowledgement of {{COPY}} command and never else. This is not all, more comments will follow. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC
[jira] [Commented] (IGNITE-6917) SQL: implement COPY command for efficient data loading
[ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345541#comment-16345541 ] Alexander Paschenko commented on IGNITE-6917: - [~kirill.shirokov], my comments: 1. {{JdbcAbstractUpdateStatementSelfTest}} - unused code 2. {{org.apache.ignite.internal.jdbc2.JdbcBulkLoadSelfTest: createConnection}} - unused param; {{testBulkLoadThrows}} - please use try-with-resources. 3. {{org.apache.ignite.jdbc.thin.JdbcThinAbstractDmlStatementSelfTest:}} 3.1 {{jdbcRun}}: please remove this method and use inherited method execute instead. 3.2 {{testDOA}}: please rename into something more meaningful. 4. {{BulkLoadContextCursor:}} 4.1 Please make field final. 4.2 {{getAll}} - I changed the code to double call of {{singletonList}} just fine. 4.3 {{getFieldName}} - I believe this method should throw {{NoSuchElementException}} for all {{i}} except {{0}}. 4.4 Odd empty line right after class header. 5. {{SqlBulkLoadCommand}}: let's parse command in a way consistent with how it's done in SQL usually - we clearly should not use ugly stuff like {{BATCH_SIZE}} (with underscore instead of space) now that we are headed to our own parser. Please rework parsing so that user has to write {{BATCH SIZE}}. (NB: yes, I'm aware that we already have quite ugly stuff here, like {{INLINE_SIZE}} for indexes, but I believe that this must be avoided as much as possible.) Also I think we should support case when the user does not supply columns list (so no empty brackets too) - currently it's a parsing error and there's a test for it. 6. {{UpdatePlan#processRow}}: 6.1 Odd empty line after method declaration. 6.2. Shouldn't we rather check for equality here? {{row.size() < colNames.length}} - like this 7. {{UpdatePlanBuilder#planForBulkLoad}}: 7.1 Odd empty line after method declaration. 7.2 Let's add VERY simple test case (like cache ofand bulk load to it) - I think this may break on current implementation as we may actually will to insert into _val column, and this case does not seem to be handled - say, like this {{COPY from file to String () format csv}} (String is table name, table has 3 columns all of which are system columns.) This is not all, more comments will follow. > SQL: implement COPY command for efficient data loading > -- > > Key: IGNITE-6917 > URL: https://issues.apache.org/jira/browse/IGNITE-6917 > Project: Ignite > Issue Type: New Feature > Components: sql >Reporter: Vladimir Ozerov >Assignee: Kirill Shirokov >Priority: Major > Labels: iep-1 > > Inspired by Postgres [1] > Common use case - bulk data load through JDBC/ODBC interface. Currently it is > only possible to execute single commands one by one. We already can batch > them to improve performance, but there is still big room for improvement. > We should think of a completely new command - {{COPY}}. It will accept a file > (or input stream in general case) on the client side, then transfer data to > the cluster, and then execute update inside the cluster, e.g. through > streamer. > First of all we need to create quick and dirty prototype to assess potential > performance improvement. It speedup is confirmed, we should build base > implementation which will accept only files. But at the same time we should > understand how it will evolve in future: multiple file formats (probably > including Hadoop formarts, e.g. Parquet), escape characters, input streams, > etc.. > [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html] > h1. 
Proposed syntax > Curent implementation: > {noformat} > COPY > FROM "file.name" > INTO . > [(col-name, ...)] > FORMAT -- Only CSV format is supported in the current > release > [BATCH_SIZE ] > {noformat} > We may want to gradually add features to this command in future to have > something like this: > {noformat} > COPY > FROM "file.name"[CHARSET ""] > INTO . [CREATE [IF NOT EXISTS]] > [(col-name [] [NULLABLE] [ESCAPES], ...) [MATCH HEADER]] > FORMAT (csv|tsv|...) > -- CSV format options: > [FIELDSEP='column-separators-regexp'] > [LINESEP='row-separators-regexp'] > [QUOTE='quote-chars'] > [ESCAPE='escape-char'] > [NULL='null-sequence'] > [COMMENT='single-line-comment-start-char'] > [TRIM_LINES] > [IMPORT_EMPTY_LINES] > [CHARSET ""] > [ROWS -] > --or-- > [SKIP ROWS ] [MAX ROWS ] > [COLS -] > --or-- > [SKIP COLS ] [MAX COLS ] > [(MATCH | SKIP) HEADER] > [(REPLACE|IGNORE|ABORT ON [])) DUPLICATE KEYS] > [BATCH SIZE ( ROWS | [K|M|G|T|P])] > [COMPRESS "codec-name" [codec options]] > [LOCK (TABLE|ROWS)] > [NOLOGGING] > [BACKEND (DIRECT | STREAMER)] > {noformat} > h1.
[jira] [Comment Edited] (IGNITE-7413) SqlDmlExample: Incorrect result for Delete if run with standalone nodes (Java & .NET)
[ https://issues.apache.org/jira/browse/IGNITE-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327549#comment-16327549 ] Alexander Paschenko edited comment on IGNITE-7413 at 1/16/18 6:49 PM: -- [~ptupitsyn], I've done some investigation and here's what I've found. If you ask me, currently the problem is not {{JOIN}}, but rather {{IN}} clause. Consider following {{SELECT}} query (generated by DML engine): {code:java} SELECT _KEY, _VAL FROM "SqlDmlExamplePersons".PERSON WHERE ID IN( SELECT P.ID FROM "SqlDmlExamplePersons".PERSON P INNER JOIN "SqlDmlExampleOrganizations".ORGANIZATION O ON TRUE WHERE (O.NAME <> ?1) AND (P.ORGID = O.ID) ){code} It won't return correct result even if you set {{distributedJoins=true}} because distributed joins flag seems to work only for top-level query and doesn't seem to care about stuff like INs and other subqueries. That said, I believe that the problem is our joins implementation. What can be done with this and at what cost requires additional investigation. was (Author: al.psc): [~ptupitsyn], I've done some investigation and here's what I've found. I've looked at {{DELETE}} statement as it contains {{JOIN}} too. Alas, the problem is not {{JOIN}}, but rather {{IN}} clause. Consider following {{SELECT}} query (generated by DML engine): {code:java} SELECT _KEY, _VAL FROM "SqlDmlExamplePersons".PERSON WHERE ID IN( SELECT P.ID FROM "SqlDmlExamplePersons".PERSON P INNER JOIN "SqlDmlExampleOrganizations".ORGANIZATION O ON TRUE WHERE (O.NAME <> ?1) AND (P.ORGID = O.ID) ){code} It won't return correct result even if you set {{distributedJoins=true}} because distributed joins flag seems to work only for top-level query and doesn't seem to care about stuff like INs and other subqueries. That said, I believe that the problem is our joins implementation. What can be done with this and at what cost requires additional investigation. > SqlDmlExample: Incorrect result for Delete if run with standalone nodes (Java > & .NET) > - > > Key: IGNITE-7413 > URL: https://issues.apache.org/jira/browse/IGNITE-7413 > Project: Ignite > Issue Type: Bug > Components: platforms >Affects Versions: 2.3 >Reporter: Irina Zaporozhtseva >Assignee: Pavel Tupitsyn >Priority: Minor > Labels: .NET > Fix For: 2.4 > > > Datagrid.QueryDmlExample: Incorrect result for Delete if run with standalone > nodes > > without standalone nodes: > {code} > >>> Delete non-ASF employees > >>> 1: John Doe, ASF, 4400 > >>> 2: Jane Roe, ASF, 5500 > {code} > > with standalone nodes: > {code} > >>> Delete non-ASF employees > >>> 1: John Doe, ASF, 4400 > >>> 4: Richard Miles, Eclipse, 3000 > >>> 2: Jane Roe, ASF, 5500 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
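For reference, the flag mentioned above is set per query; a minimal sketch, reusing the cache and table names from the query above and assuming a node that can see both caches:
{code:java}
import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DistributedJoinsSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<?, ?> cache = ignite.cache("SqlDmlExamplePersons");

            // distributedJoins applies to this top-level join; per the comment above it does not
            // propagate into subqueries such as the IN(...) generated by the DML engine.
            SqlFieldsQuery qry = new SqlFieldsQuery(
                "SELECT P.ID FROM \"SqlDmlExamplePersons\".PERSON P " +
                "INNER JOIN \"SqlDmlExampleOrganizations\".ORGANIZATION O ON P.ORGID = O.ID " +
                "WHERE O.NAME <> ?")
                .setDistributedJoins(true)
                .setArgs("ASF");

            List<List<?>> rows = cache.query(qry).getAll();

            System.out.println(rows);
        }
    }
}
{code}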
[jira] [Commented] (IGNITE-7248) "Schema not found" error when setting streaming mode for JDBC driver
[ https://issues.apache.org/jira/browse/IGNITE-7248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327366#comment-16327366 ] Alexander Paschenko commented on IGNITE-7248: - [~tledkov-gridgain], my comments: 1. {{JdbcStreamingToPublicCacheTest}}: unused imports; looks like we don't need {{getConfiguration0}} method either, just override {{getConfiguration}} and call there {{super.getConfiguration}} like you do now. 2. The fix itself is incorrect. You can't just use cache name as schema name - they may not match (PUBLIC schema, also schema name from cache configuration...). The problem is that we mix caches and schemas in {{JdbcStatement}} and {{GridQueryProcessor}}. Correct fix would be like this: * fix {{GridQueryProcessor#prepareNativeStatement}} so that it does not worry about caches and works only in terms of schema (I reverted your change in {{JdbcStatement}} and removed {{idx.schema}} lookup from {{GridQueryProcessor#prepareNativeStatement}} and your test passes just fine). That said, please make the fix as proposed (cache name should not be used instead of schema name anywhere) and re-run tests. > "Schema not found" error when setting streaming mode for JDBC driver > > > Key: IGNITE-7248 > URL: https://issues.apache.org/jira/browse/IGNITE-7248 > Project: Ignite > Issue Type: Bug >Affects Versions: 2.3 >Reporter: Alexey Kukushkin >Assignee: Vladimir Ozerov >Priority: Major > Fix For: 2.4 > > > Using JDBC "thick" driver in streaming mode fails with a "Schema not found" > erorr: > {code} > Connection connection = > DriverManager.getConnection("jdbc:ignite:cfg://streaming=true:cache=CACHE@file:/path-to-ignite-config.xml"); > {code} > using connection to create a table works fine but this exception is generated > when using the connection to execute INSER INTO... > {code} > class org.apache.ignite.internal.processors.query.IgniteSQLException: Failed > to set schema for DB connection for thread [schema=] > org.h2.jdbc.JdbcSQLException: Schema not found; SQL statement: > SET SCHEMA "" [90079-195] > {code} > See [User > List|http://apache-ignite-users.70518.x6.nabble.com/Cannot-insert-data-into-table-using-JDBC-tc17477.html] > for more details -- This message was sent by Atlassian JIRA (v7.6.3#76005)
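For context, the failing scenario from the description boils down to roughly the following (connection URL copied from the description; the table and its columns are placeholders):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class StreamingInsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:ignite:cfg://streaming=true:cache=CACHE@file:/path-to-ignite-config.xml");
             PreparedStatement stmt = conn.prepareStatement("INSERT INTO Person (id, name) VALUES (?, ?)")) {
            stmt.setInt(1, 1);
            stmt.setString(2, "John");

            // With streaming=true this is where the "Schema not found" error was reported.
            stmt.executeUpdate();
        }
    }
}
{code}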
[jira] [Commented] (IGNITE-6022) SQL: add native batch execution support for DML statements
[ https://issues.apache.org/jira/browse/IGNITE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327329#comment-16327329 ] Alexander Paschenko commented on IGNITE-6022: - [~rkondakov], my comment: Class {{DmlBatchSender$Batch}}: please remove direct field usages and replace them with getters that you have already introduced. Also method {{Batch#rowProcessors}} looks unused. > SQL: add native batch execution support for DML statements > -- > > Key: IGNITE-6022 > URL: https://issues.apache.org/jira/browse/IGNITE-6022 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 2.1 >Reporter: Vladimir Ozerov >Assignee: Roman Kondakov >Priority: Major > Labels: iep-1, performance > Fix For: 2.4 > > > We have batch execution support for JDBC and ODBC drivers. This decreases > number of network hops. However, we do not have any batch execution support > on the server side. It means that for batch of N similar statements, every > statement will go through the whole execution chain - parsing, splitting, > communication with servers. And while parsing and splitting might be avoided > with help of statement cache, the most heavy part - network communication - > is still there. > We need to investigate how to optimize the flow for batch updates. Possible > improvements: > 1) Execute statements with certain degree of parallelism; > 2) Send several query execution requests to the server at once; > 3) Ensure that caches are used properly for batching - we should not parse > the same request multiple times. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IGNITE-6022) SQL: add native batch execution support for DML statements
[ https://issues.apache.org/jira/browse/IGNITE-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16323815#comment-16323815 ] Alexander Paschenko commented on IGNITE-6022: - [~rkondakov] Hi, my comments: 1. Do we really need an additional ctor in {{SqlFieldsQueryEx}}? The args field is a list, not an array, and we have a null check in {{addBatchedArgs}}, so nothing prevents us from using the old ctor and adding as many args as we like. I don't think that pre-allocating a list of a given size justifies the existence of an additional ctor we clearly can live without. 2. {{DmlStatementsProcessor.doInsertBatched}}: why do we not fail fast on unexpected exceptions and instead try to convert non-{{IgniteSQLException}}s to {{SQLException}}s? [~vozerov] what would be the correct behavior here, what do you think? I believe we should handle only {{IgniteSQLException}}s at this point. 3. {{DmlUtils.isBatched}} can be greatly simplified and turned into a one-liner, please do so ({{return qry instanceof SqlFieldsQueryEx && ((SqlFieldsQueryEx)qry).isBatched()}}) 4. {{SqlFieldsQueryEx.isBatched}} - please use {{F.isEmpty}} 5. {{JdbcRequestHandler.executeBatchedQuery}}: you don't need the {{return}} in the {{catch}} clause; instead, please move everything after the {{catch}} into the {{try}}. The local var {{qryRes}} will then be declared inside the {{try}} too. 6. Why does {{UpdatePlanBuilder.checkPlanCanBeDistributed}} take {{DmlUtils.isBatched}} into account unconditionally? Probably there are some cases when a batch can be executed in a distributed manner? 7. {{DmlBatchSender.add}}: you can simplify the code and get rid of the duplicated code at the end of this method by rewriting the condition as {{if (batch.put(...) != null || batch.size() >= size)}} 8. {{DmlBatchSender.processPage}}: this constant copying of maps on each page looks quite suspicious to me. To avoid this, just keep two maps instead of one where needed: {{
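Regarding point 7, the single-condition shape being suggested might look roughly like this (a sketch only, not the actual {{DmlBatchSender}} code; field names are illustrative):
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the add() rewrite from point 7: one combined condition instead of duplicated tail code. */
class BatchSketch {
    /** Rows accumulated for the current batch, keyed by cache key. */
    private final Map<Object, List<Object>> rows = new HashMap<>();

    /** Configured batch size. */
    private final int size;

    BatchSketch(int size) {
        this.size = size;
    }

    /** @return {@code true} if the batch must be flushed after this addition. */
    boolean add(Object key, List<Object> row) {
        // Flush either when the same key is added twice (previous mapping replaced)
        // or when the batch has grown to the configured size.
        return rows.put(key, row) != null || rows.size() >= size;
    }
}
{code}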
[jira] [Commented] (IGNITE-5571) Make sure that cache-less execution works as good as cache-based
[ https://issues.apache.org/jira/browse/IGNITE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304737#comment-16304737 ] Alexander Paschenko commented on IGNITE-5571: - [~vozerov] All done. > Make sure that cache-less execution works as good as cache-based > > > Key: IGNITE-5571 > URL: https://issues.apache.org/jira/browse/IGNITE-5571 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Critical > Fix For: 2.4 > > > Compare the following two methods: > 1) {{GridQueryProcessor.querySqlFields}} - old good entry point for query > execution; > 2) {{GridQueryProcessor.querySqlFieldsNoCache}} - new method for "cache-less" > execution. > Note how cache context is used in the first method: > 1) First, it helps determine whether query can be converted to "local" > 2) Second, it gets query parallelism of current cache, and if it differs from > {{1}}, then it turns on {{distributedJoins}}. > Neither of this happens in the second implementation. Moreover, I had to > throw an exception for local queries, as I didn't know how to handle them > properly. > We need to investigate and fix these two deficiencies somehow. Probably some > inputs from [~sergi.vladykin] would be required, to understand what is going > on. > Our ultimate goal is to make "cache-less" execution as good as the old one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (IGNITE-7300) Allow expressions in SQL INSERTs within transactions
[ https://issues.apache.org/jira/browse/IGNITE-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-7300: Description: The problem is related to IGNITE-7267 - the latter honors raw rows, but drops support for inserts with expressions which yield local subqueries. To fix this, {{UpdatePlan.isLocalSubquery()}} must be honored. > Allow expressions in SQL INSERTs within transactions > > > Key: IGNITE-7300 > URL: https://issues.apache.org/jira/browse/IGNITE-7300 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Paschenko >Assignee: Igor Seliverstov > Fix For: 2.4 > > > The problem is related to IGNITE-7267 - the latter honors raw rows, but drops > support for inserts with expressions which yield local subqueries. To fix > this, {{UpdatePlan.isLocalSubquery()}} must be honored. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (IGNITE-7300) Allow expressions in SQL INSERTs within transactions
[ https://issues.apache.org/jira/browse/IGNITE-7300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-7300: Environment: (was: The problem is related to IGNITE-7267 - the latter honors raw rows, but drops support for inserts with expressions which yield local subqueries. To fix this, {{UpdatePlan.isLocalSubquery()}} must be honored.) > Allow expressions in SQL INSERTs within transactions > > > Key: IGNITE-7300 > URL: https://issues.apache.org/jira/browse/IGNITE-7300 > Project: Ignite > Issue Type: Bug >Reporter: Alexander Paschenko >Assignee: Igor Seliverstov > Fix For: 2.4 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7300) Allow expressions in SQL INSERTs within transactions
Alexander Paschenko created IGNITE-7300: --- Summary: Allow expressions in SQL INSERTs within transactions Key: IGNITE-7300 URL: https://issues.apache.org/jira/browse/IGNITE-7300 Project: Ignite Issue Type: Bug Environment: The problem is related to IGNITE-7267 - the latter honors raw rows, but drops support for inserts with expressions which yield local subqueries. To fix this, {{UpdatePlan.isLocalSubquery()}} must be honored. Reporter: Alexander Paschenko Assignee: Igor Seliverstov Fix For: 2.4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-4192) SQL TX: JDBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296848#comment-16296848 ] Alexander Paschenko commented on IGNITE-4192: - [~vozerov] All done. > SQL TX: JDBC driver support > --- > > Key: IGNITE-4192 > URL: https://issues.apache.org/jira/browse/IGNITE-4192 > Project: Ignite > Issue Type: Task > Components: jdbc, sql >Reporter: Denis Magda >Assignee: Alexander Paschenko > Labels: iep-3 > Fix For: 2.4 > > > To support execution of DML and SELECT statements inside of a transaction > started from JDBC driver side. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
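Once the driver support described above is in place, usage from an application is expected to follow the standard JDBC transaction pattern. The sketch below is an assumption of the intended flow (URL, table, and columns are made up), not code from the ticket.
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class JdbcTxSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")) {
            conn.setAutoCommit(false); // statements below run in one transaction

            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("UPDATE Person SET name = 'John' WHERE id = 1");
                stmt.executeQuery("SELECT name FROM Person WHERE id = 1").close();
            }

            conn.commit(); // both the DML and the SELECT are scoped to the same server-side transaction
        }
    }
}
{code}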
[jira] [Commented] (IGNITE-4490) Optimize DML for fast INSERT and MERGE
[ https://issues.apache.org/jira/browse/IGNITE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292771#comment-16292771 ] Alexander Paschenko commented on IGNITE-4490: - [~vozerov] No, at this point we shouldn't as this routine comes from how H2 converts dates internally and that logic does not involve Java 8 stuff - see {{org.h2.util.DateTimeUtils}} and its imports. > Optimize DML for fast INSERT and MERGE > -- > > Key: IGNITE-4490 > URL: https://issues.apache.org/jira/browse/IGNITE-4490 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 1.8 >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko > Labels: performance > > It's possible to avoid any SQL querying and map some INSERT and MERGE > statements to cache operations in a way similar to that of UPDATE and DELETE > - i.e. don't make queries when there are no expressions to evaluate in the > query and enhance update plans to perform direct cache operations when INSERT > and MERGE affect columns {{_key}} and {{_val}} only. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
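The mapping the ticket describes can be illustrated with plain cache operations; the cache name and value type below are made up for the example.
{code}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class FastDmlSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("Person");

            // INSERT INTO Person(_key, _val) VALUES (1, 'John') -- INSERT must not overwrite an existing row
            cache.putIfAbsent(1, "John");

            // MERGE INTO Person(_key, _val) VALUES (1, 'John') -- MERGE has upsert semantics
            cache.put(1, "John");
        }
    }
}
{code}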
[jira] [Commented] (IGNITE-4750) SQL: Support GROUP_CONCAT function
[ https://issues.apache.org/jira/browse/IGNITE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287402#comment-16287402 ] Alexander Paschenko commented on IGNITE-4750: - [~tledkov-gridgain], Right, but {{GridSqlType}} is also associated with SQL keyword for corresponding type, and, although it does not affect current task, we should use right keyword which is {{VARCHAR}}. > SQL: Support GROUP_CONCAT function > -- > > Key: IGNITE-4750 > URL: https://issues.apache.org/jira/browse/IGNITE-4750 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Denis Magda >Assignee: Taras Ledkov > Fix For: 2.4 > > > GROUP_CONCAT function is not supported at the moment. Makes sense to fill > this gap: > http://apache-ignite-users.70518.x6.nabble.com/GROUP-CONCAT-function-is-unsupported-td10757.html > Presently the function doc is hidden: > https://apacheignite-sql.readme.io/docs/group_concat > Open it up once the ticket is released. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-4750) SQL: Support GROUP_CONCAT function
[ https://issues.apache.org/jira/browse/IGNITE-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287363#comment-16287363 ] Alexander Paschenko commented on IGNITE-4750: - [~tledkov-gridgain], my comments: {{GridSqlAggregateFunction}}: 1. Please don't use strings addition together with {{StatementBuilder}} - use {{append}} instead. 2. Please use {{resetCount()}} before {{appendExceptFirst()}} - these operations are looking at the same counter, and therefore it should be reset every time you use {{appendExceptFirst()}}. 3. Methods {{setGroupConcat*}} have way too wordy param names, like this: {{setGroupConcatSeparator(groupConcatSeparator)}} - there's redundancy, you can use simply {{separator}} for param name without losing any clarity. 4. {{GridSqlQueryParser}}: Code conventions are violated - long lines. {{GridSqlQuerySplitter}}: 5. Please throw {{IgniteSqlException}} instead of {{IgniteException}}, error code should be {{UNSUPPORTED_OPERATION}}. 6. Here: {{rdcAgg = aggregate(false, agg.type())}} please use {{GROUP_CONCAT}} as the second arg for clarity. 7. {{GridSqlType}}: I believe SQL keyword for strings is {{VARCHAR}}, not {{STRING}}, isn't it? 8. {{IgniteSqlGroupConcatCollocatedTest}}: please fix long lines. > SQL: Support GROUP_CONCAT function > -- > > Key: IGNITE-4750 > URL: https://issues.apache.org/jira/browse/IGNITE-4750 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Denis Magda >Assignee: Taras Ledkov > Fix For: 2.4 > > > GROUP_CONCAT function is not supported at the moment. Makes sense to fill > this gap: > http://apache-ignite-users.70518.x6.nabble.com/GROUP-CONCAT-function-is-unsupported-td10757.html > Presently the function doc is hidden: > https://apacheignite-sql.readme.io/docs/group_concat > Open it up once the ticket is released. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
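For context, once supported, the aggregate is expected to be used with H2-style syntax along these lines; the table and column names are illustrative, not taken from the patch.
{code}
import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

final class GroupConcatSketch {
    /** Concatenates person names per city; a usage sketch, not a test from the patch. */
    static List<List<?>> namesByCity(IgniteCache<?, ?> cache) {
        return cache.query(new SqlFieldsQuery(
            "SELECT city, GROUP_CONCAT(name SEPARATOR ', ') FROM Person GROUP BY city")).getAll();
    }
}
{code}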
[jira] [Comment Edited] (IGNITE-4192) SQL TX: JDBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283683#comment-16283683 ] Alexander Paschenko edited comment on IGNITE-4192 at 12/8/17 2:55 PM: -- [~vozerov] My questions/clarifications: 5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}} {quote} Most probably transaction should be bound to Connection object. {quote} This is what effectively happening now - each {{Connection}} has its own {{JdbcRequestHandler}} on server side which in turn has single threaded worker within whose scope transactions start and end. After all, it's a thin client, and there may be no Ignite running at all on the host owning {{Connection}} object. All in all, a connection may not have more than one transaction started as long as no single thread can. was (Author: al.psc): [~vozerov] My questions/clarifications: 5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}} {quote} Most probably transaction should be bound to Connection object. {quote} This is what effectively happening now - each {{Connection}} has its own {{JdbcRequestHandler}} on server side which in turn has single threaded worker within whose scope transactions start and end. After all, it's a thin client, and there may be no Ignite running at all on the host owning {{Connection}} object. > SQL TX: JDBC driver support > --- > > Key: IGNITE-4192 > URL: https://issues.apache.org/jira/browse/IGNITE-4192 > Project: Ignite > Issue Type: Task > Components: jdbc >Reporter: Denis Magda >Assignee: Alexander Paschenko > Labels: iep-3, sql > Fix For: 2.4 > > > To support execution of DML and SELECT statements inside of a transaction > started from JDBC driver side. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (IGNITE-4192) SQL TX: JDBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283683#comment-16283683 ] Alexander Paschenko edited comment on IGNITE-4192 at 12/8/17 2:54 PM: -- [~vozerov] My questions/clarifications: 5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}} {quote} Most probably transaction should be bound to Connection object. {quote} This is what effectively happening now - each {{Connection}} has its own {{JdbcRequestHandler}} on server side which in turn has single threaded worker within whose scope transactions start and end. After all, it's a thin client, and there may be no Ignite running at all on the host owning {{Connection}} object. was (Author: al.psc): [~vozerov] My questions/clarifications: 5 - this is done already, see usages of {{IgniteH2Indexing#sqlUserTxStart()}} {{quote}} Most probably transaction should be bound to Connection object. {{quote}} This is what effectively happening now - each {{Connection}} has its own {{JdbcRequestHandler}} on server side which in turn has single threaded worker within whose scope transactions start and end. After all, it's a thin client, and there may be no Ignite running at all on the host owning {{Connection}} object. > SQL TX: JDBC driver support > --- > > Key: IGNITE-4192 > URL: https://issues.apache.org/jira/browse/IGNITE-4192 > Project: Ignite > Issue Type: Task > Components: jdbc >Reporter: Denis Magda >Assignee: Alexander Paschenko > Labels: iep-3, sql > Fix For: 2.4 > > > To support execution of DML and SELECT statements inside of a transaction > started from JDBC driver side. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-7113) "Key is missing from query" when creating table with key_type=java.lang.String
[ https://issues.apache.org/jira/browse/IGNITE-7113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-7113: --- Assignee: Alexander Paschenko > "Key is missing from query" when creating table with key_type=java.lang.String > -- > > Key: IGNITE-7113 > URL: https://issues.apache.org/jira/browse/IGNITE-7113 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.3 >Reporter: Ilya Kasnacheev >Assignee: Alexander Paschenko > Attachments: IgniteStringKeyTest.java > > > When creating a table of > {code} > CREATE TABLE IF NOT EXISTS TableWithStringKey ( > ID VARCHAR PRIMARY KEY, > DataNodeId VARCHAR > ) WITH "backups=1, cache_name=TableWithStringKey, atomicity=transactional, > key_type=java.lang.String, value_type=TableWithStringKey" > {code} > and attempting an insert later > {code} > INSERT INTO TableWithStringKey (ID, DataNodeId) VALUES ('ref2', 'src2') > {code} > There's suddently an exception > {code} > javax.cache.CacheException: class > org.apache.ignite.internal.processors.query.IgniteSQLException: Failed to > execute DML statement [stmt=INSERT INTO TableWithStringKey (ID, DataNodeId) > VALUES ('ref2', 'src2'), params=null] > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:597) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:560) > at > org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382) > at > com.gridgain.reproducer.IgniteStringKeyTest.insertTest(IgniteStringKeyTest.java:34) > ... 24 more > Caused by: class > org.apache.ignite.internal.processors.query.IgniteSQLException: Failed to > execute DML statement [stmt=INSERT INTO TableWithStringKey (ID, DataNodeId) > VALUES ('ref2', 'src2'), params=null] > at > org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1459) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1909) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor$5.applyx(GridQueryProcessor.java:1907) > at > org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2445) > at > org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:1914) > at > org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:585) > ... 
27 more > Caused by: class org.apache.ignite.IgniteCheckedException: Key is missing > from query > at > org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.createSupplier(UpdatePlanBuilder.java:369) > at > org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.planForInsert(UpdatePlanBuilder.java:211) > at > org.apache.ignite.internal.processors.query.h2.dml.UpdatePlanBuilder.planForStatement(UpdatePlanBuilder.java:98) > at > org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.getPlanForStatement(DmlStatementsProcessor.java:473) > at > org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.updateSqlFields(DmlStatementsProcessor.java:170) > at > org.apache.ignite.internal.processors.query.h2.DmlStatementsProcessor.updateSqlFieldsDistributed(DmlStatementsProcessor.java:229) > at > org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.queryDistributedSqlFields(IgniteH2Indexing.java:1453) > ... 33 more > {code} > that goes away if you remove "key_type=java.lang.String" > I'm attaching a reproducer class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-7120) Test cases inherited by AbstractSchemaSelfTest became flaky after the refactoring to use SQL API.
[ https://issues.apache.org/jira/browse/IGNITE-7120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-7120: --- Assignee: Alexander Paschenko > Test cases inherited by AbstractSchemaSelfTest became flaky after the > refactoring to use SQL API. > -- > > Key: IGNITE-7120 > URL: https://issues.apache.org/jira/browse/IGNITE-7120 > Project: Ignite > Issue Type: Bug > Components: sql >Affects Versions: 2.3 >Reporter: Roman Kondakov >Assignee: Alexander Paschenko > Fix For: 2.4 > > > This problem may be caused by the delay between the CREATE INDEX command and the > subsequent JDBC metadata update. Therefore, the metadata info may be > outdated in AbstractSchemaSelfTest#assertIndex() {.. > c.getMetaData().getIndexInfo() ..} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
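One common way to deflake such tests is to poll the JDBC metadata until the freshly created index becomes visible. The helper below is a sketch of that idea, not the actual test code; names and the timeout are illustrative.
{code}
import java.sql.Connection;
import java.sql.ResultSet;

final class IndexMetadataWait {
    /** Returns true once an index with the given name shows up in JDBC metadata, false on timeout. */
    static boolean waitForIndex(Connection c, String tbl, String idxName, long timeoutMs) throws Exception {
        long end = System.currentTimeMillis() + timeoutMs;

        while (System.currentTimeMillis() < end) {
            try (ResultSet rs = c.getMetaData().getIndexInfo(null, null, tbl, false, true)) {
                while (rs.next()) {
                    if (idxName.equalsIgnoreCase(rs.getString("INDEX_NAME")))
                        return true;
                }
            }

            Thread.sleep(100);
        }

        return false;
    }
}
{code}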
[jira] [Comment Edited] (IGNITE-7001) Refactor dynamic indexes tests to use SQL API
[ https://issues.apache.org/jira/browse/IGNITE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270667#comment-16270667 ] Alexander Paschenko edited comment on IGNITE-7001 at 11/29/17 12:25 PM: [~vozerov] done, few remaining usages look justified to me (failover tests, or something not visible via public API like actual affinity column of a table). was (Author: al.psc): @ozerov done, few remaining usages look justified to me (failover tests, or something not visible via public API like actual affinity column of a table). > Refactor dynamic indexes tests to use SQL API > - > > Key: IGNITE-7001 > URL: https://issues.apache.org/jira/browse/IGNITE-7001 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko > Fix For: 2.4 > > > Tests for dynamic indexes should be refactored like those in IGNITE-6416 to > use SQL API instead of relying on internal structures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-7001) Refactor dynamic indexes tests to use SQL API
[ https://issues.apache.org/jira/browse/IGNITE-7001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16270667#comment-16270667 ] Alexander Paschenko commented on IGNITE-7001: - @ozerov done, few remaining usages look justified to me (failover tests, or something not visible via public API like actual affinity column of a table). > Refactor dynamic indexes tests to use SQL API > - > > Key: IGNITE-7001 > URL: https://issues.apache.org/jira/browse/IGNITE-7001 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko > Fix For: 2.4 > > > Tests for dynamic indexes should be refactored like those in IGNITE-6416 to > use SQL API instead of relying on internal structures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-4192) SQL TX: JDBC driver support
[ https://issues.apache.org/jira/browse/IGNITE-4192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-4192: --- Assignee: Alexander Paschenko > SQL TX: JDBC driver support > --- > > Key: IGNITE-4192 > URL: https://issues.apache.org/jira/browse/IGNITE-4192 > Project: Ignite > Issue Type: Task > Components: jdbc >Reporter: Denis Magda >Assignee: Alexander Paschenko > Labels: iep-3 > Fix For: 2.4 > > > To support execution of DML and SELECT statements inside of a transaction > started from JDBC driver side. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-7001) Refactor dynamic indexes test to use SQL API
Alexander Paschenko created IGNITE-7001: --- Summary: Refactor dynamic indexes test to use SQL API Key: IGNITE-7001 URL: https://issues.apache.org/jira/browse/IGNITE-7001 Project: Ignite Issue Type: Task Components: sql Reporter: Alexander Paschenko Assignee: Alexander Paschenko Fix For: 2.4 Tests for dynamic indexes should be refactored like those in IGNITE-6416 to use SQL API instead of relying on internal structures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6996) Smarter handling of id fields in SQL values
Alexander Paschenko created IGNITE-6996: --- Summary: Smarter handling of id fields in SQL values Key: IGNITE-6996 URL: https://issues.apache.org/jira/browse/IGNITE-6996 Project: Ignite Issue Type: Improvement Components: sql Reporter: Alexander Paschenko Consider the following case: the user wants to have a composite value (many value fields in {{QueryEntity}}) with one field associated with the value's id (most likely matching the cache key too). Currently, in order to insert such an object, we have to do something like {{INSERT INTO Person(_key, id, name) values(1, 1, 'John')}}, and there's no way to avoid such a redundant repetition of the same value. Suggested approach: I believe we should specifically handle the case when the user specifies {{keyFieldName}} in the configuration and the specified field is a field of the value. In such a case, we could just do {{INSERT INTO Person(id, name) values(1, 'John')}} and derive the {{_key}} value from the {{id}} column. (And vice versa.) At a glance, this will also require the following tweaks: - forbid performing SQL {{UPDATE}} on such a column ({{id}} in the above example); - on an {{INSERT}}, check that the {{_key}} and {{id}} values are the same if both are specified. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
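A sketch of the configuration the ticket refers to, with {{id}} doubling as the cache key; class and field names are illustrative.
{code}
import java.util.Collections;

import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.configuration.CacheConfiguration;

final class PersonCacheConfigSketch {
    static CacheConfiguration<Integer, Object> personCache() {
        QueryEntity entity = new QueryEntity(Integer.class.getName(), "Person");

        entity.addQueryField("id", Integer.class.getName(), null);
        entity.addQueryField("name", String.class.getName(), null);
        entity.setKeyFieldName("id"); // with the proposed change, _key would be derived from 'id' on INSERT

        return new CacheConfiguration<Integer, Object>("Person")
            .setQueryEntities(Collections.singleton(entity));
    }
}
{code}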
[jira] [Commented] (IGNITE-5571) Make sure that cache-less execution works as good as cache-based
[ https://issues.apache.org/jira/browse/IGNITE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259088#comment-16259088 ] Alexander Paschenko commented on IGNITE-5571: - [~vozerov], 1; 4; 5 - fixed. 2; 3 - stuff like GROUP BY didn't participate in analysis before; thus, this flag was calculated solely relying on characteristics of the cache the query was issued on and query flags - like this {{boolean loc = (qry.isReplicatedOnly() && cctx.isReplicatedAffinityNode()) || cctx.isLocal() || qry.isLocal();}} Therefore, I believe that more intelligent and in-depth analysis of the queries should be moved to a separate task as of now as long as suggested patch does not in fact change existing behavior. > Make sure that cache-less execution works as good as cache-based > > > Key: IGNITE-5571 > URL: https://issues.apache.org/jira/browse/IGNITE-5571 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Critical > Fix For: 2.4 > > > Compare the following two methods: > 1) {{GridQueryProcessor.querySqlFields}} - old good entry point for query > execution; > 2) {{GridQueryProcessor.querySqlFieldsNoCache}} - new method for "cache-less" > execution. > Note how cache context is used in the first method: > 1) First, it helps determine whether query can be converted to "local" > 2) Second, it gets query parallelism of current cache, and if it differs from > {{1}}, then it turns on {{distributedJoins}}. > Neither of this happens in the second implementation. Moreover, I had to > throw an exception for local queries, as I didn't know how to handle them > properly. > We need to investigate and fix these two deficiencies somehow. Probably some > inputs from [~sergi.vladykin] would be required, to understand what is going > on. > Our ultimate goal is to make "cache-less" execution as good as the old one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6950) Update documentation for user facing CREATE TABLE params
[ https://issues.apache.org/jira/browse/IGNITE-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-6950: --- Assignee: Denis Magda (was: Alexander Paschenko) > Update documentation for user facing CREATE TABLE params > > > Key: IGNITE-6950 > URL: https://issues.apache.org/jira/browse/IGNITE-6950 > Project: Ignite > Issue Type: Task > Security Level: Public(Viewable by anyone) > Components: documentation, sql >Affects Versions: 2.4 >Reporter: Alexander Paschenko >Assignee: Denis Magda > Fix For: 2.4 > > > Changes to documentation should be made to reflect additional spelling for > some {{CREATE TABLE}} params added in IGNITE-6270. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6950) Update documentation for user facing CREATE TABLE params
Alexander Paschenko created IGNITE-6950: --- Summary: Update documentation for user facing CREATE TABLE params Key: IGNITE-6950 URL: https://issues.apache.org/jira/browse/IGNITE-6950 Project: Ignite Issue Type: Task Security Level: Public (Viewable by anyone) Components: documentation, sql Affects Versions: 2.4 Reporter: Alexander Paschenko Assignee: Alexander Paschenko Fix For: 2.4 Changes to documentation should be made to reflect additional spelling for some {{CREATE TABLE}} params added in IGNITE-6270. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-5571) Make sure that cache-less execution works as good as cache-based
[ https://issues.apache.org/jira/browse/IGNITE-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210852#comment-16210852 ] Alexander Paschenko commented on IGNITE-5571: - [~ptupitsyn], hello, could you please have a look at .NET test failures? https://ci.ignite.apache.org/viewLog.html?buildId=898285=buildResultsDiv=Ignite20Tests_IgnitePlatformNet > Make sure that cache-less execution works as good as cache-based > > > Key: IGNITE-5571 > URL: https://issues.apache.org/jira/browse/IGNITE-5571 > Project: Ignite > Issue Type: Task > Components: sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko >Priority: Critical > Fix For: 2.4 > > > Compare the following two methods: > 1) {{GridQueryProcessor.querySqlFields}} - old good entry point for query > execution; > 2) {{GridQueryProcessor.querySqlFieldsNoCache}} - new method for "cache-less" > execution. > Note how cache context is used in the first method: > 1) First, it helps determine whether query can be converted to "local" > 2) Second, it gets query parallelism of current cache, and if it differs from > {{1}}, then it turns on {{distributedJoins}}. > Neither of this happens in the second implementation. Moreover, I had to > throw an exception for local queries, as I didn't know how to handle them > properly. > We need to investigate and fix these two deficiencies somehow. Probably some > inputs from [~sergi.vladykin] would be required, to understand what is going > on. > Our ultimate goal is to make "cache-less" execution as good as the old one. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command
[ https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210112#comment-16210112 ] Alexander Paschenko commented on IGNITE-6283: - [~dmagda], the answer is "yes" for both questions. > Document ALTER TABLE ADD COLUMN command > --- > > Key: IGNITE-6283 > URL: https://issues.apache.org/jira/browse/IGNITE-6283 > Project: Ignite > Issue Type: Task > Components: documentation, sql >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko > Fix For: 2.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (IGNITE-6637) No proper cleanup of statements cache is done on table drop
[ https://issues.apache.org/jira/browse/IGNITE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-6637: Description: We should cleanup statements cache whenever cache is deregistered - otherwise it's possible to retrieve from statements cache a statement that is prepared from H2 perspective but may not be executed. Such situation may arise for any dynamic cache, not only for that created via SQL. Reproducer attached. was: We should cleanup statements cache whenever cache is deregistered - otherwise it's possible to retrieve from statements cache a statement that is prepared from H2 perspective but may not be executed. Reproducer attached. > No proper cleanup of statements cache is done on table drop > --- > > Key: IGNITE-6637 > URL: https://issues.apache.org/jira/browse/IGNITE-6637 > Project: Ignite > Issue Type: Bug > Security Level: Public(Viewable by anyone) > Components: sql >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko > Fix For: 2.3 > > Attachments: IgniteCacheLocalQuerySelfTest.java > > > We should cleanup statements cache whenever cache is deregistered - otherwise > it's possible to retrieve from statements cache a statement that is prepared > from H2 perspective but may not be executed. Such situation may arise for any > dynamic cache, not only for that created via SQL. > Reproducer attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
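The scenario behind the attached reproducer is roughly the following sketch; the table name is illustrative, and the real reproducer is attached to the ticket.
{code}
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

final class DropTableReproducerSketch {
    static void run(IgniteCache<?, ?> cache) {
        cache.query(new SqlFieldsQuery("CREATE TABLE T (id INT PRIMARY KEY, v VARCHAR)")).getAll();
        cache.query(new SqlFieldsQuery("SELECT * FROM T")).getAll(); // statement gets prepared and cached by H2
        cache.query(new SqlFieldsQuery("DROP TABLE T")).getAll();
        cache.query(new SqlFieldsQuery("CREATE TABLE T (id INT PRIMARY KEY, v VARCHAR)")).getAll();
        cache.query(new SqlFieldsQuery("SELECT * FROM T")).getAll(); // may pick up the stale cached statement
    }
}
{code}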
[jira] [Updated] (IGNITE-6637) No proper cleanup of statements cache is done on table drop
[ https://issues.apache.org/jira/browse/IGNITE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-6637: Fix Version/s: 2.3 > No proper cleanup of statements cache is done on table drop > --- > > Key: IGNITE-6637 > URL: https://issues.apache.org/jira/browse/IGNITE-6637 > Project: Ignite > Issue Type: Bug > Security Level: Public(Viewable by anyone) > Components: sql >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko > Fix For: 2.3 > > Attachments: IgniteCacheLocalQuerySelfTest.java > > > We should cleanup statements cache whenever cache is deregistered - otherwise > it's possible to retrieve from statements cache a statement that is prepared > from H2 perspective but may not be executed. > Reproducer attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (IGNITE-6637) No proper cleanup of statements cache is done on table drop
[ https://issues.apache.org/jira/browse/IGNITE-6637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko updated IGNITE-6637: Attachment: IgniteCacheLocalQuerySelfTest.java > No proper cleanup of statements cache is done on table drop > --- > > Key: IGNITE-6637 > URL: https://issues.apache.org/jira/browse/IGNITE-6637 > Project: Ignite > Issue Type: Bug > Security Level: Public(Viewable by anyone) > Components: sql >Reporter: Alexander Paschenko >Assignee: Alexander Paschenko > Attachments: IgniteCacheLocalQuerySelfTest.java > > > We should cleanup statements cache whenever cache is deregistered - otherwise > it's possible to retrieve from statements cache a statement that is prepared > from H2 perspective but may not be executed. > Reproducer attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (IGNITE-6637) No proper cleanup of statements cache is done on table drop
Alexander Paschenko created IGNITE-6637: --- Summary: No proper cleanup of statements cache is done on table drop Key: IGNITE-6637 URL: https://issues.apache.org/jira/browse/IGNITE-6637 Project: Ignite Issue Type: Bug Security Level: Public (Viewable by anyone) Components: sql Reporter: Alexander Paschenko Assignee: Alexander Paschenko We should cleanup statements cache whenever cache is deregistered - otherwise it's possible to retrieve from statements cache a statement that is prepared from H2 perspective but may not be executed. Reproducer attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6024) SQL: execute DML statements on the server when possible
[ https://issues.apache.org/jira/browse/IGNITE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200080#comment-16200080 ] Alexander Paschenko commented on IGNITE-6024: - [~isapego], OK, let's let that be. > SQL: execute DML statements on the server when possible > --- > > Key: IGNITE-6024 > URL: https://issues.apache.org/jira/browse/IGNITE-6024 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 2.1 >Reporter: Vladimir Ozerov >Assignee: Sergey Kalashnikov > Labels: important, performance > Fix For: 2.3 > > > Currently we execute DML statements as follows: > 1) Get query result set to the client > 2) Construct entry processors and send them to servers in batches > This approach is inefficient as it causes a lot of unnecessary network > communication Instead, we should execute DML statements directly on server > nodes when it is possible. > Implementation considerations: > 1) Determine set of queries which could be processed in this way. E.g., > {{LIMIT/OFFSET}}, {{GROUP BY}}, {{ORDER BY}}, {{DISTINCT}}, etc. are out of > question - they must go through the client anyway. Probably > {{skipMergeTable}} flag is a good starting point (good, not precise!) > 2) Send request to every server and execute local DML right there > 3) No failover support at the moment - throw "partial update" exception if > topology is unstable > 4) Handle partition reservation carefully > 5) Transactions: we still have single coordinator - this is a client. When > MVCC and TX SQL is ready, client will assign proper counters to server > requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6024) SQL: execute DML statements on the server when possible
[ https://issues.apache.org/jira/browse/IGNITE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200062#comment-16200062 ] Alexander Paschenko commented on IGNITE-6024: - [~isapego], [~skalashnikov], my comments: 1. I don't think we need internal class inside {{UpdatePlan}}. Please move its fields to {{UpdatePlan}} as it already does have many fields that have meaning only for some operations. In order to return correct params from {{UpdatePlanBuilder#checkPlanCanBeDistributed}}, we could just use a {{T2}} as long as it's just communication between private methods. 2. Do we really need {{SqlFieldsQueryEx}}? Probably we could just pass additional params explicitly, couldn't we? [~vozerov], what do you think? > SQL: execute DML statements on the server when possible > --- > > Key: IGNITE-6024 > URL: https://issues.apache.org/jira/browse/IGNITE-6024 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 2.1 >Reporter: Vladimir Ozerov >Assignee: Sergey Kalashnikov > Labels: important, performance > Fix For: 2.3 > > > Currently we execute DML statements as follows: > 1) Get query result set to the client > 2) Construct entry processors and send them to servers in batches > This approach is inefficient as it causes a lot of unnecessary network > communication Instead, we should execute DML statements directly on server > nodes when it is possible. > Implementation considerations: > 1) Determine set of queries which could be processed in this way. E.g., > {{LIMIT/OFFSET}}, {{GROUP BY}}, {{ORDER BY}}, {{DISTINCT}}, etc. are out of > question - they must go through the client anyway. Probably > {{skipMergeTable}} flag is a good starting point (good, not precise!) > 2) Send request to every server and execute local DML right there > 3) No failover support at the moment - throw "partial update" exception if > topology is unstable > 4) Handle partition reservation carefully > 5) Transactions: we still have single coordinator - this is a client. When > MVCC and TX SQL is ready, client will assign proper counters to server > requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6573) SQL: Document WRAP_KEY and WRAP_VALUE commands
[ https://issues.apache.org/jira/browse/IGNITE-6573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198909#comment-16198909 ] Alexander Paschenko commented on IGNITE-6573: - Done, [~dmagda] please have a look. What to see: https://dash.readme.io/project/apacheignite-sql/v2.1/docs/create-table, section *Parameters* (at the very end) and section *Note* (at the end too). > SQL: Document WRAP_KEY and WRAP_VALUE commands > -- > > Key: IGNITE-6573 > URL: https://issues.apache.org/jira/browse/IGNITE-6573 > Project: Ignite > Issue Type: Task > Components: documentation >Reporter: Vladimir Ozerov >Assignee: Alexander Paschenko > Fix For: 2.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
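For readers without access to the readme page, the options being documented are boolean {{CREATE TABLE}} parameters in the {{WITH}} clause. The sketch below shows the assumed spelling; the page referenced above remains the authoritative source.
{code}
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

final class WrapKeyValueSketch {
    static void createTable(IgniteCache<?, ?> cache) {
        // Assumed semantics: wrap single-column key/value into a binary object; see the page above.
        cache.query(new SqlFieldsQuery(
            "CREATE TABLE City (id INT PRIMARY KEY, name VARCHAR) " +
            "WITH \"WRAP_KEY=true, WRAP_VALUE=true\"")).getAll();
    }
}
{code}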
[jira] [Comment Edited] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command
[ https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198886#comment-16198886 ] Alexander Paschenko edited comment on IGNITE-6283 at 10/10/17 4:25 PM: --- Done, [~dmagda], [~sergey.puch...@gmail.com] please have a look. Page: https://dash.readme.io/project/apacheignite-sql/v2.1/docs/alter-table was (Author: al.psc): Done, [~dmagda], [~sergey.puch...@gmail.com] please have a look. > Document ALTER TABLE ADD COLUMN command > --- > > Key: IGNITE-6283 > URL: https://issues.apache.org/jira/browse/IGNITE-6283 > Project: Ignite > Issue Type: Task > Components: documentation, sql >Reporter: Vladimir Ozerov >Assignee: Denis Magda > Fix For: 2.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command
[ https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198886#comment-16198886 ] Alexander Paschenko edited comment on IGNITE-6283 at 10/10/17 4:04 PM: --- Done, [~dmagda], [~sergey.puch...@gmail.com] please have a look. was (Author: al.psc): Done, [~dmagda] [~sergey.puch...@gmail.com] please have a look. > Document ALTER TABLE ADD COLUMN command > --- > > Key: IGNITE-6283 > URL: https://issues.apache.org/jira/browse/IGNITE-6283 > Project: Ignite > Issue Type: Task > Components: documentation, sql >Reporter: Vladimir Ozerov >Assignee: Denis Magda > Fix For: 2.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (IGNITE-6283) Document ALTER TABLE ADD COLUMN command
[ https://issues.apache.org/jira/browse/IGNITE-6283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Paschenko reassigned IGNITE-6283: --- Assignee: Denis Magda (was: Alexander Paschenko) Done, [~dmagda] [~sergey.puch...@gmail.com] please have a look. > Document ALTER TABLE ADD COLUMN command > --- > > Key: IGNITE-6283 > URL: https://issues.apache.org/jira/browse/IGNITE-6283 > Project: Ignite > Issue Type: Task > Components: documentation, sql >Reporter: Vladimir Ozerov >Assignee: Denis Magda > Fix For: 2.3 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (IGNITE-6024) SQL: execute DML statements on the server when possible
[ https://issues.apache.org/jira/browse/IGNITE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198543#comment-16198543 ] Alexander Paschenko commented on IGNITE-6024: - [~skalashnikov], my comments: 1. Why has {{UpdatePlan}} become mutable? As before, it's initialized fully in one place. Please refactor to return immutability. 2. Why {{GridH2DmlResponse}} carries only error string? We already have had to pull error code to thin JDBC response, probably we should preserve code here too? [~vozerov], what do you think? 3. What is semantic of distributed {{MERGE}} or {{INSERT}} without subquery? If I issue {{MERGE INTO Person(id, name) values (1, 'a'), (2, 'b'), (3, 'c')}}, how it will work? I don't see any code that would check this. Intuitively, it should be somewhere around {{DmlStatementsProcessor#checkPlanCanBeDistributed}}. 4. {{DmlStatementsProcessor#checkPlanCanBeDistributed}}: why we don't care about statements caching here? ({{PreparedStatement stmt = conn.prepareStatement(plan.selectQry)}}) - looks like this could introduce an inevitable parsing. 5. Option name {{updateOnServer}} looks confusing to me. Isn't something like {{distributedDml}} better? > SQL: execute DML statements on the server when possible > --- > > Key: IGNITE-6024 > URL: https://issues.apache.org/jira/browse/IGNITE-6024 > Project: Ignite > Issue Type: Task > Components: sql >Affects Versions: 2.1 >Reporter: Vladimir Ozerov >Assignee: Sergey Kalashnikov > Labels: important, performance > Fix For: 2.3 > > > Currently we execute DML statements as follows: > 1) Get query result set to the client > 2) Construct entry processors and send them to servers in batches > This approach is inefficient as it causes a lot of unnecessary network > communication Instead, we should execute DML statements directly on server > nodes when it is possible. > Implementation considerations: > 1) Determine set of queries which could be processed in this way. E.g., > {{LIMIT/OFFSET}}, {{GROUP BY}}, {{ORDER BY}}, {{DISTINCT}}, etc. are out of > question - they must go through the client anyway. Probably > {{skipMergeTable}} flag is a good starting point (good, not precise!) > 2) Send request to every server and execute local DML right there > 3) No failover support at the moment - throw "partial update" exception if > topology is unstable > 4) Handle partition reservation carefully > 5) Transactions: we still have single coordinator - this is a client. When > MVCC and TX SQL is ready, client will assign proper counters to server > requests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
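Point 4 above is about avoiding a repeated parse of {{plan.selectQry}}; a generic per-connection cache along the lines below is what is meant. Field and method names are illustrative, not the actual Ignite internals.
{code}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

final class StatementCacheSketch {
    /** Prepared statements keyed by SQL text, so the same SELECT is parsed only once per connection. */
    private final Map<String, PreparedStatement> cache = new HashMap<>();

    PreparedStatement prepare(Connection conn, String sql) throws SQLException {
        PreparedStatement stmt = cache.get(sql);

        if (stmt == null) {
            stmt = conn.prepareStatement(sql);

            cache.put(sql, stmt);
        }

        return stmt;
    }
}
{code}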