[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2019-04-23 Thread Alexey Serbin (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824440#comment-16824440
 ] 

Alexey Serbin commented on KUDU-1563:
-

[~adar], I don't have any thoughts on this issue yet, but I'll try to take a 
closer look this week to get more context.

> Add support for INSERT IGNORE
> -
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Dan Burkert
>Assignee: Brock Noland
>Priority: Major
>  Labels: backup, newbie
>
> The Java client currently has an [option to ignore duplicate row key errors| 
> https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-],
>  which is implemented by filtering the errors on the client side.  If we are 
> going to continue to support this feature (and the consensus seems to be that 
> we probably should), we should promote it to a first-class operation type 
> that is handled on the server side.  This would yield a modest performance 
> improvement since fewer errors are returned, and it would allow INSERT IGNORE 
> ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.
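
As a rough illustration of the description above, here is a minimal Python sketch of a server-side INSERT IGNORE op mixed into one batch with other op types. It is a toy model only: the dict standing in for a table, the OpType enum, and apply_batch are hypothetical names and are not part of any Kudu client or server API.

```python
from enum import Enum, auto


class OpType(Enum):
    INSERT = auto()
    INSERT_IGNORE = auto()
    UPSERT = auto()
    DELETE = auto()


def apply_batch(table, batch):
    """Apply (op_type, key, value) tuples to a dict standing in for a table.

    Returns the per-row errors a server would send back to the client.
    """
    errors = []
    for op_type, key, value in batch:
        if op_type is OpType.INSERT:
            if key in table:
                errors.append((key, "already present"))
            else:
                table[key] = value
        elif op_type is OpType.INSERT_IGNORE:
            # A duplicate key is skipped on the "server"; no error is returned.
            table.setdefault(key, value)
        elif op_type is OpType.UPSERT:
            table[key] = value
        elif op_type is OpType.DELETE:
            if key in table:
                del table[key]
            else:
                errors.append((key, "not found"))
    return errors


table = {1: "a"}
batch = [
    (OpType.INSERT_IGNORE, 1, "ignored"),  # duplicate, silently skipped
    (OpType.INSERT, 1, "dup"),             # duplicate, reported as an error
    (OpType.UPSERT, 2, "b"),
    (OpType.DELETE, 3, None),              # missing key, reported as an error
]
errors = apply_batch(table, batch)
```

In this model only the plain INSERT and the DELETE produce errors, so nothing has to be filtered on the client for the INSERT IGNORE op.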



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2019-04-22 Thread Adar Dembo (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823678#comment-16823678
 ] 

Adar Dembo commented on KUDU-1563:
--

bq.  I don't know how to make the API compatible.

[~aserbin], you dealt with KuduWriteOperation's non-PIMPLness recently; do you 
have any thoughts on this issue?




[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2019-04-22 Thread Brock Noland (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823344#comment-16823344
 ] 

Brock Noland commented on KUDU-1563:


I'd love to work on this but I don't know how to make the API compatible and am 
resource constrained. 

If someone wants to work on it, go right ahead.



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2019-04-18 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821303#comment-16821303
 ] 

Grant Henke commented on KUDU-1563:
---

This would be a useful optimization for full restores (via Spark). Right now 
we use UPSERT in case a Spark task needs to be retried, but when a Spark task 
fails, that means we UPSERT again all the rows that previously succeeded.
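
A small Python sketch of the retry cost described above, under the assumption that a retried task replays all of its rows. The retry_task helper and the dict used as a table are hypothetical stand-ins, not Kudu or Spark APIs.

```python
def retry_task(table, rows, use_insert_ignore):
    """Replay a task's rows and return how many rows were actually written."""
    written = 0
    for key, value in rows:
        if use_insert_ignore and key in table:
            continue  # row already landed in the failed attempt; skip it
        table[key] = value  # UPSERT semantics: always write
        written += 1
    return written


rows = [(k, f"row-{k}") for k in range(5)]
# Assume the first attempt wrote keys 0..2 before the task failed.
restored = dict(rows[:3])

upsert_writes = retry_task(dict(restored), rows, use_insert_ignore=False)
ignore_writes = retry_task(dict(restored), rows, use_insert_ignore=True)
```

With UPSERT the retry rewrites all five rows; with INSERT IGNORE semantics it writes only the two rows that were missing.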



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-18 Thread Adar Dembo (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16724448#comment-16724448
 ] 

Adar Dembo commented on KUDU-1563:
--

bq. What would the implications be if we could not do this in a backwards 
compatible way? 

It really depends on the incompatibility itself. IIRC, your draft adds a new 
data member to {{KuduWriteOperation}}. This class is allocated by 
libkudu_client.so via calls like {{KuduTable::NewInsert}}, and, most of the 
time, deallocated by libkudu_client.so after it has taken ownership of the 
operation in {{KuduSession::Apply}} and sent it on the wire. However, if the 
operation failed, it'll be assigned to a {{KuduError}} and passed back to the 
third party application, and the application can choose to take ownership of it 
via {{KuduError::release_failed_op}}. At that point, the application is on the 
hook for deallocating it, and if the application and libkudu_client.so don't 
agree on the size and layout of the class, memory will get corrupted by the 
deallocation.
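
To make the size and layout disagreement concrete, here is a rough sketch that uses Python's ctypes to model two C-style layouts of the same class. The field names are invented for illustration and do not reflect the real members of {{KuduWriteOperation}}; the point is only that adding a data member changes the object size that the application and the library each assume.

```python
import ctypes


class WriteOpAsSeenByApp(ctypes.Structure):
    # Layout the application was compiled against (hypothetical fields).
    _fields_ = [
        ("table", ctypes.c_void_p),
        ("row", ctypes.c_void_p),
    ]


class WriteOpInNewLibrary(ctypes.Structure):
    # Same class after the library adds a data member (hypothetical field).
    _fields_ = [
        ("table", ctypes.c_void_p),
        ("row", ctypes.c_void_p),
        ("flags", ctypes.c_uint32),
    ]


app_size = ctypes.sizeof(WriteOpAsSeenByApp)
lib_size = ctypes.sizeof(WriteOpInNewLibrary)
print(f"application expects {app_size} bytes, library allocates {lib_size}")
```

The exact numbers depend on the platform, but the two sizes differ, which is the disagreement that makes cross-boundary deallocation unsafe.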

In short, the severity and impact of the incompatibility varies on a case by 
case basis, and is pretty difficult to assess thoroughly, which is why I'd err 
on the side of either "don't do it", or "do it, and rev the client SONAME's 
major version to express the incompatibility".

bq. I am not sure how to solve the PIMPL'ed issue either but I am happy to 
investigate.

If you could isolate the changes to just new classes/subclasses (i.e. just a 
new {{KuduInsertIgnore}} subclass of {{KuduWriteOperation}}), then you'd be in 
the clear.

Barring that, you could implement new variants of {{KuduWriteOperation}} and 
friends that are PIMPL'ed, and modify the rest of the client APIs to support 
them alongside the existing variant. Users of the C++ client will need to 
change their code to use the new variants, but it'll be completely safe from a 
backwards compatibility perspective.
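
As a rough sketch of why a PIMPL'ed variant is safe, the following uses Python's ctypes to model a public class that holds only an opaque pointer to its implementation. All names and fields are hypothetical; this models C-style layout and is not the Kudu client's actual code.

```python
import ctypes


class ImplV1(ctypes.Structure):
    # Private implementation in an older library version (hypothetical).
    _fields_ = [("row", ctypes.c_void_p)]


class ImplV2(ctypes.Structure):
    # Private implementation after a new data member is added (hypothetical).
    _fields_ = [("row", ctypes.c_void_p), ("flags", ctypes.c_uint32)]


class PimpledWriteOp(ctypes.Structure):
    # The public class exposes a single opaque pointer, so its size does not
    # depend on which implementation version the library allocates behind it.
    _fields_ = [("data", ctypes.c_void_p)]


public_size = ctypes.sizeof(PimpledWriteOp)
impl_growth = ctypes.sizeof(ImplV2) - ctypes.sizeof(ImplV1)
```

The implementation grows between versions while the public object stays one pointer wide, so application code compiled against the older header still agrees with the library on the public layout.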

The [KDE community wiki page on the 
subject|https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B]
 has much more useful insight and some examples too.



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-17 Thread Brock Noland (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723602#comment-16723602
 ] 

Brock Noland commented on KUDU-1563:


[~adar]

> I am nervous about inflating the memory consumption of each operation, and 
> I'm not sure how to preserve backwards compatibility in the C++ client's 
> non-PIMPL'ed KuduWriteOperation class. If you can address both of these 
> concerns, I'd be open to per-operation configuration.

From a memory perspective, I think this can be implemented as a bitmask on an 
integer which would consume little memory on a per-operation basis.
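
A minimal sketch of the bitmask idea, assuming one bit per ignorable error type. The flag names and the should_ignore helper are hypothetical and are not taken from any patch.

```python
from enum import IntFlag


class IgnoreFlags(IntFlag):
    NONE = 0
    DUPLICATE_KEY = 1   # hypothetical: ignore duplicate primary keys
    MISSING_KEY = 2     # hypothetical: ignore missing keys on update/delete
    MISSING_VALUE = 4   # hypothetical: ignore missing required values


def should_ignore(op_flags, violation):
    """Return True if the operation's bitmask has the violation's bit set."""
    return bool(op_flags & violation)


# Several behaviors are combined into a single small integer per operation.
op_flags = IgnoreFlags.DUPLICATE_KEY | IgnoreFlags.MISSING_KEY
```

Here op_flags has the integer value 3, so three independent settings fit comfortably within one byte of per-operation state.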

I am not sure how to solve the PIMPL'ed issue either but I am happy to 
investigate. What would the implications be if we could not do this in a 
backwards compatible way? FWIW, I am sure someone outside Impala is using the 
C++ client, but in my customer base of 25+ Kudu users, we don't have a single 
one. Thus my gut tells me it's a very small number of users.



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-17 Thread Brock Noland (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723598#comment-16723598
 ] 

Brock Noland commented on KUDU-1563:


bq. excuse me, I didn't mean to assign it to myself, assigned it back to you.

No worries! The holidays are typically very productive for my open source 
contributions, so if we can get agreement on the approach, I think I'll have it 
complete by the new year.



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-17 Thread Mike Percy (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723538#comment-16723538
 ] 

Mike Percy commented on KUDU-1563:
--

I know I'm late to this party. I think it's worth modeling what SQL does: in 
that context, INSERT IGNORE operates at a batch or operation level, not a 
session level. Keeping this type of error-handling configuration at the 
operation or batch level in the client API is therefore a better match, and 
it avoids requiring SQL clients to constantly set session options if they are 
caching sessions.
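
To illustrate the concern about cached sessions, here is a toy Python sketch that counts how often a session-level option has to be flipped when plain INSERT and INSERT IGNORE statements alternate on one session. CachedSession and run_statements are hypothetical names, not Kudu client classes.

```python
class CachedSession:
    """Toy session where a single flag governs every insert applied to it."""

    def __init__(self):
        self.ignore_duplicates = False
        self.option_changes = 0

    def set_ignore_duplicates(self, value):
        if value != self.ignore_duplicates:
            self.ignore_duplicates = value
            self.option_changes += 1


def run_statements(session, statements):
    """statements: booleans, True for INSERT IGNORE and False for INSERT."""
    for wants_ignore in statements:
        # With session-level configuration, the client must set the option
        # before each statement whose behavior differs from the previous one.
        session.set_ignore_duplicates(wants_ignore)


session = CachedSession()
run_statements(session, [True, False, True, False])
```

With the behavior carried by the operation itself, none of these option changes would be needed.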



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-17 Thread Attila Bukor (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16723381#comment-16723381
 ] 

Attila Bukor commented on KUDU-1563:


Hi [~brocknoland], excuse me, I didn't mean to assign it to myself, assigned it 
back to you.



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-13 Thread Brock Noland (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16720459#comment-16720459
 ] 

Brock Noland commented on KUDU-1563:


Thanks [~r1pp3rj4ck] for picking this up. While I'd like to contribute it, I am 
more concerned with getting access to the feature. Is this something you have 
bandwidth to work on now?

bq. I agree that operation level is more intuitive and more flexible, though I 
don't really see a use case for that added flexibility. Can you articulate one?

I don't have a use case. It only feels slightly odd to change a session-level 
parameter to define how an operation behaves. I am fine with the suggested 
approach; it'll just take more work to implement.

> Add support for INSERT IGNORE
> -
>
> Key: KUDU-1563
> URL: https://issues.apache.org/jira/browse/KUDU-1563
> Project: Kudu
>  Issue Type: New Feature
>Reporter: Dan Burkert
>Assignee: Attila Bukor
>Priority: Major
>  Labels: newbie


[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-07 Thread Adar Dembo (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713156#comment-16713156
 ] 

Adar Dembo commented on KUDU-1563:
--

bq. Why would we have the configuration at the session level? Why not put it at 
the operation level? I guess databases do have a session level configuration, 
but it feels odd to me that I am setting at the session level how an {{INSERT 
IGNORE}} should behave. How about we add an argument to the new operation which 
specifies the behavior of that operation?

I agree that operation level is more intuitive and more flexible, though I 
don't really see a use case for that added flexibility. Can you articulate one?

In any case, my concerns are implementation-specific: I am nervous about 
inflating the memory consumption of each operation, and I'm not sure how to 
preserve backwards compatibility in the C++ client's non-PIMPL'ed 
KuduWriteOperation class. If you can address both of these concerns, I'd be 
open to per-operation configuration.




[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-07 Thread Grant Henke (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713014#comment-16713014
 ] 

Grant Henke commented on KUDU-1563:
---

I am reading through and catching up on this. I think it would definitely be a 
nice feature to have. 

It looks like [~danburkert] also mentioned both the operation level setting and 
the session level setting: 

bq. I think I'm in favor of merging the current patch which introduces an 
INSERT IGNORE operation to ignore constraint violations of type 1 on the server 
side. Additionally, we should strongly consider adding session-specific 
options to selectively ignore each type of constraint individually. So for 
example, the client could use the INSERT IGNORE operation type if they want to 
selectively ignore some instances of duplicate primary-key constraints, or it 
could call KuduSession::ignoreDuplicatePrimaryKeyViolations to ignore all of 
them for the entire session.

I agree that the intuitive place to define the expected behavior would be on 
the operation. I am not sure if there is a big benefit to having both, but 
having it be session-based only seems to reduce the flexibility of what a 
client can do.

 







[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2018-12-07 Thread Brock Noland (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712794#comment-16712794
 ] 

Brock Noland commented on KUDU-1563:


[~adar] - one thing feels a little odd about this. Why would we have the 
configuration at the session level? Why not put it at the operation level? I 
guess databases do have a session level configuration, but it feels odd to me 
that I am setting at the session level how an {{INSERT IGNORE}} should behave. 
How about we add an argument to the new operation which specifies the behavior 
of that operation?



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2017-05-24 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023757#comment-16023757
 ] 

Brock Noland commented on KUDU-1563:


Useful [link|https://mariadb.com/kb/en/mariadb/insert-on-duplicate-key-update/] 
to understand the differences between {{UPSERT}} and {{ON DUPLICATE KEY 
UPDATE}} 



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2017-05-24 Thread Dan Burkert (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023372#comment-16023372
 ] 

Dan Burkert commented on KUDU-1563:
---

Just learned about a use case that would be well-served by an {{ON DUPLICATE KEY 
UPDATE}} mechanism in Kudu.  In particular, the workload is ingesting batches 
of timestamped records, with each record being quite large.  Individual batches 
routinely contain duplicate records whose contents only differ by collection 
timestamp.  Ideally as new batches are ingested, duplicate records would update 
the collection timestamp column, but skip updating the larger data columns.

To do this effectively, we could have a duplicate-resolution strategy that 
updates individual columns to new values, effectively {{ON DUPLICATE KEY 
UPDATE}} with only constants allowed as the update value.  To be efficient, and 
to map well to SQL, this should probably be specified once on the entire batch 
instead of on individual ops.
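
A minimal Python sketch of the duplicate-resolution strategy described above, assuming the constant column updates are given once for the whole batch. The ingest_batch function, the column names, and the dict used as a table are hypothetical and only model the semantics.

```python
def ingest_batch(table, batch, on_duplicate):
    """Insert rows; for a duplicate key, apply only the constant updates.

    table: dict mapping key -> row dict
    batch: iterable of (key, row dict)
    on_duplicate: dict of column -> constant, specified once per batch
    """
    for key, row in batch:
        if key in table:
            # Duplicate: touch only the listed columns, leave the rest as-is.
            table[key].update(on_duplicate)
        else:
            table[key] = dict(row)


table = {"rec-1": {"collected_at": 100, "payload": "large blob"}}
batch = [
    ("rec-1", {"collected_at": 200, "payload": "large blob"}),
    ("rec-2", {"collected_at": 200, "payload": "other blob"}),
]
ingest_batch(table, batch, on_duplicate={"collected_at": 200})
```

The existing record gets its collection timestamp refreshed without its large data column being rewritten, and the new record is inserted normally.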



[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

2016-11-08 Thread Dan Burkert (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648540#comment-15648540
 ] 

Dan Burkert commented on KUDU-1563:
---

[~mjacobs] brings up a good point that the duplicate-key constraint on insert 
is not the only constraint when writing to Kudu:

# duplicate primary-key constraint on insert
# missing primary-key constraint on delete and update
# missing range partition on any write
# missing column value in column without default on insert

Applications may want to 'ignore' any of these errors when writing to Kudu.  
Some of these errors are reported by the server (1, 2 and 4), and some are 
caught by the client before sending (3, and the client could check 4 but 
currently does not).

Of these constraints, I think type 1 is the most commonly ignored, and that's 
why we decided to add first-class support for it by adding a special operation 
type.  Obviously that approach can't scale to all of the constraint types, much 
less their cross product.

I think I'm in favor of merging the current patch which introduces an INSERT 
IGNORE operation to ignore constraint violations of type 1 on the server side.  
Additionally, we should strongly consider adding session-specific options to 
selectively ignore each type of constraint individually.  So for example, the 
client could use the INSERT IGNORE operation type if they want to selectively 
ignore some instances of duplicate primary-key constraints, or it could call 
{{KuduSession::ignoreDuplicatePrimaryKeyViolations}} to ignore all of them for 
the entire session.  We would also expose flags for the rest of the constraint 
types.

Finally, the client should expose how many violations of each type were ignored 
in the session statistics.
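
The proposal above could be sketched roughly as follows: one flag per constraint type on the session, plus a counter of ignored violations per type. This is a toy Python model; Violation, IgnoringSession, and report are hypothetical names and do not correspond to the existing client API.

```python
from collections import Counter
from enum import Enum


class Violation(Enum):
    DUPLICATE_KEY = 1      # duplicate primary key on insert
    MISSING_KEY = 2        # missing primary key on delete or update
    MISSING_PARTITION = 3  # missing range partition on any write
    MISSING_VALUE = 4      # missing value in a column without a default


class IgnoringSession:
    """Toy session that ignores selected violation types and counts them."""

    def __init__(self, ignored=()):
        self.ignored = set(ignored)
        self.ignored_counts = Counter()  # exposed as session statistics
        self.errors = []

    def report(self, key, violation):
        if violation in self.ignored:
            self.ignored_counts[violation] += 1
        else:
            self.errors.append((key, violation))


session = IgnoringSession(ignored={Violation.DUPLICATE_KEY})
session.report(1, Violation.DUPLICATE_KEY)
session.report(2, Violation.DUPLICATE_KEY)
session.report(3, Violation.MISSING_KEY)
```

Only the violation type that was not selected surfaces as an error, while the ignored duplicates remain visible through the per-type counter.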
