[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085537#comment-15085537 ] Stefania commented on CASSANDRA-9303: - CI still OK, ready to commit. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x > > Attachments: dtest.out > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085424#comment-15085424 ] Stefania commented on CASSANDRA-9303: - dtests are fine now, after rebasing the dtest branch as well. However 2.2+ branches changed overnight, so I've rebased again and restarted CI for 2.2+. In the end to make life easier, I converted the merge commits into simple commits so the rebase can be done without having to re-resolve the original merge conflicts.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083546#comment-15083546 ] Stefania commented on CASSANDRA-9303: - Unit tests are fine but about 30 dtests fail on all branches due to "No such file or directory". They seem to pass locally, so I don't know yet whether it's related to the patch; I will resume tomorrow.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083401#comment-15083401 ] Sylvain Lebresne commented on CASSANDRA-9303: -
bq. I thought a committer could do a squashed merge without necessarily having to rebase as long as the patch applies or is a rebase always necessary?
We don't commit by merging, we pull a squashed version of the patch on top of the current code base, so rebasing is always the preferred way. In any case, we do ideally want to have tests run on a sufficiently rebased version (typically, saying a test failure is fine because the patch is on an old version is potentially dangerous) and we want to avoid having the committer deal with merge conflicts since he's not necessarily familiar with the patch, so always rebasing is a good strategy. Anyway, thanks for doing it and let's wait on CI.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083372#comment-15083372 ] Stefania commented on CASSANDRA-9303: - bq. Can't you squash it? I mean, that's what the committer would have to do anyway so on top of giving accurate test results, it'll also make the committer job easier. I thought a committer could do a squashed merge without necessarily having to rebase as long as the patch applies or is a rebase always necessary? In any case, here are the branches squashed and rebased. I've also updated _CHANGES.txt_ and _NEWS.txt_. I've restarted CI again to rule out any mistakes up-merging. ||2.1||2.2||3.0||3.2||trunk|| |[patch|https://github.com/stef1927/cassandra/commits/9303-2.1]|[patch|https://github.com/stef1927/cassandra/commits/9303-2.2]|[patch|https://github.com/stef1927/cassandra/commits/9303-3.0]|[patch|https://github.com/stef1927/cassandra/commits/9303-3.2]|[patch|https://github.com/stef1927/cassandra/commits/9303]| |[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-testall/]| |[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-dtest/]| Old branches still exist with an {{-old}} suffix. 
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083319#comment-15083319 ] Sylvain Lebresne commented on CASSANDRA-9303: -
bq. A few failures, especially on trunk, but this is due to the lack of a recent rebase (which would be painful without having recorded the merge conflicts with git rerere)
Can't you squash it? I mean, that's what the committer would have to do anyway, so on top of giving accurate test results, it'll also make the committer's job easier.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083217#comment-15083217 ] Stefania commented on CASSANDRA-9303: - CI is fine. A few failures, especially on trunk, but this is due to the lack of a recent rebase (which would be painful without having recorded the merge conflicts with {{git rerere}}). Note for committing:
* to avoid dtest failures [this pull request|https://github.com/riptano/cassandra-dtest/pull/724] should be merged just before committing.
* repeating branches here:
||2.1||2.2||3.0||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/9303-2.1]|[patch|https://github.com/stef1927/cassandra/commits/9303-2.2]|[patch|https://github.com/stef1927/cassandra/commits/9303-3.0]|[patch|https://github.com/stef1927/cassandra/commits/9303]|
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082929#comment-15082929 ] Paulo Motta commented on CASSANDRA-9303: bq. I've already modified.. That should be enough, thanks!
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082912#comment-15082912 ] Stefania commented on CASSANDRA-9303: - Thanks, I will monitor the cassci jobs and update the ticket once completed.
bq. Also, if you could add a simple dtest to check that the unlogged batch warning is only logged if there are non local mutations that would be nice.
I've already modified {{test_client_warnings}}, see [this commit | https://github.com/stef1927/cassandra-dtest/commit/0f8b8850cbf3410cc58bfc9c822502706bc6bf07]. Checking client warnings should be equivalent to checking log messages but I can add that too if needed.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082850#comment-15082850 ] Paulo Motta commented on CASSANDRA-9303: Thanks! New code looks good. Please mark as ready to commit when tests look good as I will be away for the rest of the day. Also, if you could add a simple dtest to check that the unlogged batch warning is only logged if there are non-local mutations, that would be nice. But this can go in independently of the commit, as it's just a dtest PR. Good work!
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082837#comment-15082837 ] Stefania commented on CASSANDRA-9303: - Applicable to 2.2+ only, I fixed a problem with {{ClientWarningsTest}} and restarted CI.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082700#comment-15082700 ] Stefania commented on CASSANDRA-9303: - bq. Could you improve the local mutation check on {{BatchStatement}} Done, sorry I totally missed the existence of those helper methods. bq. Although the fix for CASSANDRA-10938 looks harmless... I reverted it. I agree with your concerns but I am also equally worried about people importing data with CASSANDRA-10938 still not fixed. bq. Did you validate the performance of the new batch-by-replica approach? Yes. Although the hybrid approach may cost us 2-3 seconds with a 1M cassandra-stress benchmark with 3 nodes (~25 vs ~22 seconds), we do not impact batching by partition key because that has priority. So, unlike the discussion on CASSANDRA-9302, batching by partition key is still there and batching by replica is just a backup approach. It seems conceptually wrong to me to send UNLOGGED batches with non-local partitions: * we'll trigger the WARN that we worked towards removing for local partitions * we also increase the risk of timeouts if one node gets overloaded CI pending including unit tests, here are all the links: ||2.1||2.2||3.0||trunk|| |[patch|https://github.com/stef1927/cassandra/commits/9303-2.1]|[patch|https://github.com/stef1927/cassandra/commits/9303-2.2]|[patch|https://github.com/stef1927/cassandra/commits/9303-3.0]|[patch|https://github.com/stef1927/cassandra/commits/9303]| |[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-testall/]| 
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-dtest/]|
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082211#comment-15082211 ] Paulo Motta commented on CASSANDRA-9303: Nice job, we're nearly there! :) Now tests are passing locally on Windows and code looks good. Some minor nits:
* Could you improve the local mutation check on {{BatchStatement}} by using {{StorageService.getLocalRanges}} and {{Range.isInRanges}}, and also skip the {{isMutationLocal()}} evaluation if the {{localMutationsOnly}} variable is {{false}}? Also, you can remove the cqlsh reference in the comment, since even in a non-cqlsh context the warning is not necessary if there are only local mutations in an unlogged batch.
* Although the fix for CASSANDRA-10938 looks harmless, I'm not sure if it could have some unintended consequences, so I'd prefer to commit it separately after discussion on CASSANDRA-10938.
Did you validate the performance of the new batch-by-replica approach? In the end it seems CASSANDRA-10938 was not caused by batching by partition key and there was a lot of back-and-forth between batch-by-replica vs batch-by-partition, so it's not very clear which approach is the best. We could probably do a more thorough evaluation/validation later, but it would be nice to make sure our batching strategy performs well. Since there are also Java code changes, can you also submit unit tests in addition to dtests on cassci? Thanks!
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081092#comment-15081092 ] Stefania commented on CASSANDRA-9303: - I've also performed a little bit more work:
* Removed the WARN for UNLOGGED batches with multiple partitions introduced by CASSANDRA-9399 _if the partitions are only local_.
* Optimized {{split_batches}} to first batch by partition key, if at least two rows have the same partition key, and batch by replica only those rows without common partition keys. This ensures we optimize single insertions server side per partition key and it saves us the cost of accessing the token map to work out the replica if we have common partition keys.
* Ensured that {{DCAwareRoundRobinPolicy}} gets the data center name to avoid a WARN.
* Applied a workaround for CASSANDRA-10938.
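The hybrid strategy described in the comment above (batch by partition key when at least two rows share one, batch the remaining rows by replica) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual cqlsh code: {{split_batches_sketch}}, {{partition_key_of}} and {{replica_of}} are hypothetical names, and the real {{split_batches}} may differ in signature and details.

```python
from collections import defaultdict


def split_batches_sketch(rows, partition_key_of, replica_of):
    """Hypothetical illustration of hybrid batching.

    rows             -- iterable of rows to import
    partition_key_of -- assumed callable: row -> partition key
    replica_of       -- assumed callable: partition key -> replica id
                        (stands in for a token map lookup)
    """
    # Step 1: group rows by partition key.
    by_key = defaultdict(list)
    for row in rows:
        by_key[partition_key_of(row)].append(row)

    batches = []
    by_replica = defaultdict(list)
    for key, group in by_key.items():
        if len(group) >= 2:
            # Rows sharing a partition key form their own batch;
            # no replica lookup is needed for them.
            batches.append(group)
        else:
            # Step 2: rows without a common partition key are
            # grouped by the replica that owns them.
            by_replica[replica_of(key)].append(group[0])

    batches.extend(by_replica.values())
    return batches
```

With rows keyed as a, a, b, c and both b and c mapped to the same replica, the sketch yields one batch for the two "a" rows and one batch holding the "b" and "c" rows; the replica lookup is only paid for rows that have no partition-key companions.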
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074033#comment-15074033 ] Stefania commented on CASSANDRA-9303: - The hanging tests were caused by the way in which we run cqlsh in ccm, this [pull request|https://github.com/pcmanus/ccm/pull/432] fixed it. The remaining failures were caused by the following two things:
* Handling of temporary files is quite different on Windows, the changes to address this are only in the dtest code, see the second commit of the [pull request|https://github.com/riptano/cassandra-dtest/pull/724].
* The path names should have been normalized, see [this commit|https://github.com/stef1927/cassandra/commit/295219dfbcf24ece9729030cce6e9638899b2842].
I've also changed a few more things, mostly discovered whilst trying to reproduce CASSANDRA-10938 on Windows:
* Reverted to batching by replica to avoid Cassandra processes using too much CPU. Batching by replica was changed to batching by partition key during the code review of CASSANDRA-9302 because there is a cost in determining the replicas of each record. However, sending batches with records on different replicas is probably worse than spending a few cycles in Python determining the correct replicas. It also allows us to use LOGGED batching, see next point.
* Changed batch type from UNLOGGED to LOGGED to avoid a WARN in the Cassandra log files and for more consistent failed batch status reporting (even though INSERT should be idempotent; this can be changed back to UNLOGGED if performance is impacted too much, but it shouldn't be since all partitions should be local).
* Fixed a problem with cassandra-stress that only manifested on Windows and on trunk when using a custom profile. However the Windows stress launch scripts were incorrect from 2.1 onwards.
I worked on the 2.2 patch and merged upwards. I also cherry-picked back to 2.1 with manual conflict resolution in bin/cqlsh. Even though we don't support Windows for 2.1 I figured it was best to fix these problems anyway.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072503#comment-15072503 ] Stefania commented on CASSANDRA-9303: - Thanks for running the new tests on Windows [~pauloricardomg], I will set up a Windows environment and take a look at the failures.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071623#comment-15071623 ] Paulo Motta commented on CASSANDRA-9303: There are still quite a few failures on Windows so I think we'll need to set up a cassci Windows run to monitor them. The following tests are hanging, so I created a [dtest branch|https://github.com/pauloricardomg/cassandra-dtest/tree/9303-skipping] skipping them.
I believe these might be somehow related to CASSANDRA-10858:
* test_copy_to_with_fewer_failures_than_max_attempts
* test_copy_to_with_more_failures_than_max_attempts
These might be related to CASSANDRA-10938:
* test_bulk_round_trip_default
* test_bulk_round_trip_blogposts
* test_bulk_round_trip_with_timeouts
* test_bulk_round_trip_with_low_ingestrate
I attached a [dtest run output|https://issues.apache.org/jira/secure/attachment/12779517/dtest.out] with more details about other failures. I will be off until January 6th, so feel free to find another reviewer until then.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15071009#comment-15071009 ] Stefania commented on CASSANDRA-9303: - CI on trunk restarted. DTEST PR: https://github.com/riptano/cassandra-dtest/pull/724 > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070957#comment-15070957 ] Stefania commented on CASSANDRA-9303: - CI for 2.2 and 3.0 is fine. The problems on trunk seem to originate from commit 3c8d87f4324e5ff8bf6b1c3652e9c5eacf03bc20, CASSANDRA-10580.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070880#comment-15070880 ] Stefania commented on CASSANDRA-9303: - 2.1 CI is OK. Found a small merge error in 2.2, fixed it and restarted CI for 2.2 and 3.0. On trunk we will get lots of timeouts, it seems there is a problem on the unpatched branch.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070852#comment-15070852 ] Stefania commented on CASSANDRA-9303: - Thank you for your input, we will commit to 2.1+ then.
--
bq. Only minor nit is to use {{os.linesep}} instead of {{'\n'}} on {{_printmsg(msg, eol='\n')}}.
Nope, it's intentional and in fact I changed the {{os.linesep}} occurrences that I found in other parts of the file as well. See the doc here: https://docs.python.org/2/library/os.html?highlight=linesep#os.linesep - on Windows {{os.linesep}} is '\r\n' which then becomes '\r\r\n' because '\n' is automatically converted to '\r\n' when writing to text files. I assume this includes stdout.
bq. can you just check the failing dtest {{cqlsh_copy_tests.py:CqlshCopyTest.test_read_missing_partition_key}} from CASSANDRA-10854
They pass only on the [dtest 9303 branch|https://github.com/stef1927/cassandra-dtest/commits/9303] since the exception name has changed - I had to fix one more small thing in the code as well. I've also fixed a bug with COPY TO that I discovered when testing with VNODES disabled.
bq. Feel free to squash and up-merge.
Squashed (except for the latest changes) and merged:
||2.1||2.2||3.0||trunk||
|[patch|https://github.com/stef1927/cassandra/commits/9303-2.1]|[patch|https://github.com/stef1927/cassandra/commits/9303-2.2]|[patch|https://github.com/stef1927/cassandra/commits/9303-3.0]|[patch|https://github.com/stef1927/cassandra/commits/9303]|
|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-dtest/]|
We have a problem running the Windows tests, aside from the long time they take: because we cannot parametrize the CASSCI job, we cannot use the 9303 dtest branch, and therefore most of the tests will fail because {{format_value()}} in _formatting.py_ expects more parameters. The master branch tests won't exercise most of the options either. I do not have a working Windows environment available right now, would you be able to run _cqlsh_copy_tests.py_ on your environment and then send me any errors? Alternatively I can create the dtest pull request and run the tests on CASSCI once both the PR and this ticket have been committed, keeping this ticket open until we've verified the Windows tests are also OK.
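The {{os.linesep}} remark in the comment above can be reproduced on any platform with a small sketch. It assumes Python 3's {{io.TextIOWrapper}} with {{newline='\r\n'}} as a stand-in for a Windows text-mode stream; {{written_bytes}} is a hypothetical helper, not part of cqlsh.

```python
import io


def written_bytes(text, newline="\r\n"):
    """Return the raw bytes produced when `text` is written to a text stream.

    newline="\r\n" makes the wrapper translate every '\n' on output,
    which imitates text-mode writes on Windows (an assumption made so
    the example behaves the same on every platform).
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep the wrapper from closing the buffer
    return data


# Terminating a line with '\n' gives a normal Windows line ending.
plain = written_bytes("row\n")        # b"row\r\n"

# Terminating it with the Windows value of os.linesep doubles the '\r'.
doubled = written_bytes("row\r\n")    # b"row\r\r\n"
```

This is why writing {{'\n'}} and letting the text layer translate it is the safer choice for output that goes through text-mode streams.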
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069832#comment-15069832 ] Jonathan Ellis commented on CASSANDRA-9303: --- I'm reluctant to pull 9302 out, so I'd prefer adding this to 2.1 as well. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.0.x, 3.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069743#comment-15069743 ] Stefania commented on CASSANDRA-9303: - bq. I'd agree with Aleksey Yeschenko that this should go only on 3.0+, however, since this is a follow-up/complement to CASSANDRA-9302, which is a new feature and went into an unreleased 2.1 version, I'd advocate for this to go into 2.1 as well, unless CASSANDRA-9302 is removed from 2.1, otherwise the new copy from/to feature would ship half-complete on 2.1, which wouldn't make much sense IMO. I tend to agree that CASSANDRA-9302 is somewhat incomplete without these options. So either we roll it back from 2.1 and 2.2 or we commit this as well. Further, it would be a pain to fix 9302 bugs without this patch since the code changed significantly enough to cause merge conflicts. [~iamaleksey] WDYT? > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.0.x, 3.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069739#comment-15069739 ] Aleksey Yeschenko commented on CASSANDRA-9303: -- This situation is unfortunate, but you are right. Technically we would revert CASSANDRA-9302 and CASSANDRA-9304 from 2.1, but it does seem easier to just go ahead and commit this to 2.1. Actually, I'm fine with either option. Revert the previous commits or commit this patch to 2.1 as well. [~jbellis], as the reporter of this JIRA, what'd be your preference? > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.0.x, 3.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069736#comment-15069736 ] Paulo Motta commented on CASSANDRA-9303: Looks good now. Tested locally and all options look good. Dtests are also passing. The only minor nit is to use {{os.linesep}} instead of {{'\n'}} on {{_printmsg(msg, eol='\n')}}. bq. It doesn't work because stdin is actually set to the file specified with the -f option. Since this is not an issue with COPY but with the way -f is implemented, I would prefer deferring to another ticket if this functionality is required. +1 bq. I've also rebased on the 2.1 branch (since CASSANDRA-9494 will only be on trunk) and applied the fix for CASSANDRA-10854 since it requires extra work on this branch. +1, can you just check the failing dtest {{cqlsh_copy_tests.py:CqlshCopyTest.test_read_missing_partition_key}} from CASSANDRA-10854? bq. I would like to squash the dtest commits as well, let me know if you still need to review some individual commits first. Feel free to squash and up-merge. bq. I'm still waiting to hear about which branches we need to apply this patch to; plus I would like to squash the commits before up-merging. I'd agree with [~iamaleksey] that this should go only on 3.0+, however, since this is a follow-up/complement to CASSANDRA-9302, which is a new feature and went into an unreleased 2.1 version, I'd advocate for this to go into 2.1 as well, unless CASSANDRA-9302 is removed from 2.1, otherwise the *new copy from/to* feature would ship half-complete on 2.1, which wouldn't make much sense.
> Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.0.x, 3.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069701#comment-15069701 ] Aleksey Yeschenko commented on CASSANDRA-9303: -- This is tricky. This *should* only go to 3.x. 2.1 is close to EOL and at this stage should only include critical bug fixes. That said, for pragmatic reasons, committing to 3.0.x as well should outweigh breaking the rules, so I'm fine if we do it (this one time). > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.0.x, 3.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068064#comment-15068064 ] Stefania commented on CASSANDRA-9303: - bq. So just reverting to the previous approach should be fine, Regarding the printing of the read options, I was thinking of something more concise instead of a one-config-per-line which can get too verbose, ... Done, check reverted and we no longer print each option on a separate line but only once per section. bq. Regarding COPY TO STDOUT should we skip printing info messages since a user may want to redirect the output to another script or file? It's done but I had to move the static methods into {{CopyTask}} so the diff is a bit hard to read, sorry about it. bq. If I have an {{import.cql}} file containing {{COPY keyspace1.standard1 from stdin;}} is the following supposed to work: {{cat input.csv | bin/cqlsh -f import.cql}}? It doesn't work because {{stdin}} is actually set to the file specified with the {{-f}} option. Since this is not an issue with COPY but with the way {{-f}} is implemented, I would prefer deferring to another ticket if this functionality is required. I've also rebased on the 2.1 branch (since CASSANDRA-9494 will only be on trunk) and applied the fix for CASSANDRA-10854 since it requires extra work on this branch. I'm still waiting to hear about which branches we need to apply this patch to; plus I would like to squash the commits before up-merging. I would like to squash the dtest commits as well, let me know if you still need to review some individual commits first.
> Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066557#comment-15066557 ] Paulo Motta commented on CASSANDRA-9303: bq. That's correct, copy-to* sections are not read in from executions and vice-versa. I've added a check to explicitly skip invalid or wrong direction options from config files along with more log messages so that it should be easier to see that an option is not read or ignored. Ok, my bad then. I tested with the previous version which did not have exclusive sections. I don't think it's necessary to skip invalid options (within the exclusive sections) as they are harmless and handling them makes the code a bit more complex. So just reverting to the previous approach should be fine, but feel free to keep it the way it is if you think it's OK. Sorry about this confusion! Regarding the printing of the read options, I was thinking of something more concise instead of a one-config-per-line which can get too verbose, something along the lines of: {noformat} Reading options from /home/paulo/.cassandra/cqlshrc:[copy-from]: {chunksize=100, ingestrate=100, wtf=102, numprocesses=5} Reading options from /home/paulo/.cassandra/cqlshrc:[copy-from:keyspace1.standard1] : {ingestrate=200, invalid="true"} Using 5 child processes {noformat} Two more things: * Regarding {{COPY TO STDOUT}} should we skip printing info messages since a user may want to redirect the output to another script or file? Like {{echo "copy keyspace1.standard1 TO STDOUT with SKIPCOLS = 'C2';" | bin/cqlsh | process.sh}} * If I have an {{import.cql}} file containing {{COPY keyspace1.standard1 from stdin;}} is the following supposed to work: {{cat input.csv | bin/cqlsh -f import.cql}}? Because I'm getting the following: {noformat} ➜ cassandra git:(9303-2.1) ✗ cat input.csv | bin/cqlsh -f import.cql Using 3 child processes Starting copy of keyspace1.standard1 with columns ['key', 'C0', 'C1', 'C2', 'C3', 'C4'].
[Use \. on a line by itself to end input] Processed: 0 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s 0 rows imported from 0 files in 0.007 seconds (0 skipped). {noformat} Thanks, we are really close now! :-) > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
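To illustrate the one-line-per-section output proposed in this comment, here is a minimal sketch in Python 3 (the thread's own code is Python 2). The helper name {{format_section_options}} is hypothetical and is not taken from the patch; it only reproduces the message layout shown in the {noformat} example above.

```python
def format_section_options(config_file, section, options):
    """Render a single summary line for all options read from one config section.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    rendered = ", ".join("{}={}".format(name, value) for name, value in options.items())
    # Doubled braces emit literal '{' and '}' around the option list.
    return "Reading options from {}:[{}]: {{{}}}".format(config_file, section, rendered)


line = format_section_options(
    "/home/paulo/.cassandra/cqlshrc",
    "copy-from",
    {"chunksize": "100", "ingestrate": "100", "numprocesses": "5"},
)
print(line)
```

Unknown options would simply show up in the same line, which keeps them visible without a separate message per option.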
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066163#comment-15066163 ] Stefania commented on CASSANDRA-9303: - bq. I tested the new config options and the ingest rate is now working like a charm. Thanks, I made a slight modification to the ingest rate algorithm to give a better chance to the receive meter to show the statistics. The ingest rate should still be pretty accurate. bq. I was initially thinking that while \[copy\] is a global section, \[copy-from*\] and \[copy-to*\] are exclusive sections for these commands, so for example if you define INGESTRATE by mistake in the \[copy-to\] section it's not picked up by a copy-from execution. That's correct, {{copy-to*}} sections are not read in from executions and vice-versa. I've added a check to explicitly skip invalid or wrong direction options from config files along with more log messages so that it should be easier to see that an option is not read or ignored. bq. Can you also add some examples to conf/cqlshrc.sample ? And maybe also update the cql protocol version there which is quite old. Done. bq.Also in the Reading options from /home/paulo/.cqlsh/cqlshrc message, maybe print which options are being read to improve clarity (don't worry if not straightforward) Done. bq. Cool! Since it's an edge-case I guess we can omit in the help and print a message instead in case it happens. Done. bq. Sounds good, it just seems the skipped columns is still being printed on the message Starting copy of keyspace1.standard1 with columns \['key', 'C0', 'C1', 'C2', 'C3', 'C4'\]. (you fixed before, but it came back somehow). It came back because of the changes to SKIPCOLS, it should be OK now. bq. Move csv_dialect_defaults from cqlsh.py to copyutil.py Done, I got rid of it. bq. Move exclusive skip_columns field from CopyTask to ImportTask Done, I've also moved it from ChildProcess to ImportProcess. bq. 
csv_options are a bit misleading since they are not exclusive csv-related options, can we maybe rename the tuple CopyOptions(csv, dialect, unrecognized) to Options(copy, dialect, unrecognized)? Done -- I need to clarify with [~iamaleksey] for which branch we need to commit this since CASSANDRA-9494 was only committed to trunk. I will up-merge later on today once I know for sure. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064115#comment-15064115 ] Paulo Motta commented on CASSANDRA-9303: Looking very good, I tested the new config options and the ingest rate is now working like a charm. Some follow-up comments below: bq. Done, I cleaned up the options a bit as well and removed the helper methods in the main cqlsh files. * I was initially thinking that while \[copy\] is a global section, \[copy-from*\] and \[copy-to*\] are exclusive sections for these commands, so for example if you define INGESTRATE by mistake in the \[copy-to\] section it's not picked up by a copy-from execution. * Can you also add some examples to {{conf/cqlshrc.sample}} ? And maybe also update the cql protocol version there which is quite old. * Also in the {{Reading options from /home/paulo/.cqlsh/cqlshrc}} message, maybe print which options are being read to improve clarity (don't worry if not straightforward) bq. If a file from a previous execution exists it will be renamed to .MMDD_HHMMSS. Cool! Since it's an edge-case I guess we can omit it in the help and print a message instead in case it happens. bq. So, I converted SKIPCOLS to a COPY FROM option and changed its semantic to just skip columns that exist in the file. Sounds good, it just seems the skipped columns are still being printed in the message {{Starting copy of keyspace1.standard1 with columns \['key', 'C0', 'C1', 'C2', 'C3', 'C4'\].}} (you fixed it before, but it came back somehow). Minor code style nits: * Move csv_dialect_defaults from cqlsh.py to copyutil.py * Move exclusive skip_columns field from CopyTask to ImportTask * csv_options are a bit misleading since they are not exclusive csv-related options, can we maybe rename the tuple CopyOptions(csv, dialect, unrecognized) to Options(copy, dialect, unrecognized)? We're getting there, I guess we'll be done by next round.
:) > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
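The section layout discussed in this comment (a global \[copy\] section plus direction-exclusive \[copy-from*\] and \[copy-to*\] sections) can be sketched as below. This is only an illustration in Python 3 using the standard {{configparser}} module; the function name {{read_copy_options}}, the sample values and the exact precedence order (general sections first, per-table sections last) are assumptions, not the patch's actual code.

```python
import configparser

# Sample cqlshrc-style content; section names follow the thread, values are made up.
SAMPLE_CQLSHRC = """
[copy]
numprocesses = 5

[copy-from]
chunksize = 100

[copy-to]
# ingestrate is a COPY FROM option, placed here by mistake
ingestrate = 999

[copy-from:keyspace1.standard1]
chunksize = 50
"""


def read_copy_options(config_text, direction, ks, table):
    """Merge COPY options from the most general to the most specific section.

    direction is 'from' or 'to'; sections for the other direction are never read.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    sections = [
        "copy",
        "copy:{}.{}".format(ks, table),
        "copy-{}".format(direction),
        "copy-{}:{}.{}".format(direction, ks, table),
    ]
    merged = {}
    for section in sections:
        if parser.has_section(section):
            # Later (more specific) sections override earlier ones.
            merged.update(parser.items(section))
    return merged


from_opts = read_copy_options(SAMPLE_CQLSHRC, "from", "keyspace1", "standard1")
to_opts = read_copy_options(SAMPLE_CQLSHRC, "to", "keyspace1", "standard1")
print(from_opts)
print(to_opts)
```

In this sketch the misplaced {{ingestrate}} in \[copy-to\] is never seen by a copy-from execution, which matches the exclusive-section behaviour described above.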
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062302#comment-15062302 ] Stefania commented on CASSANDRA-9303: - bq. I'd suggest the following \[copy(:ks.table)\] (global and per-table copy (to and from) options), \[copy-from(:ks.table)\] (global and per-table copy-from options), \[copy-to(:ks.table)\] (global and per-table copy-to options) where (:ks.table) is optional. so you can have \[copy\], \[copy-to\], \[copy-from\], \[copy-to:ks.table\], \[copy-from:ks.table\]. Done, I cleaned up the options a bit as well and removed the helper methods in the main cqlsh files. bq. maybe we could just add a unique suffix to avoid appending to an existing file from a previous execution? If a file from a previous execution exists it will be renamed to .MMDD_HHMMSS. bq. We can address if it won't take too much time, otherwise we can address it separately. Can we maybe improve it by making batchsize adaptive = min(batchsize, ingest_rate - current_record) or something more complicated will be needed? Done, adaptive chunk size and retries needed changing. bq. Move SKIPCOLS to COPY_COMMON_OPTIONS since it can be used in both copy-to and copy-from. Actually it should be a COPY FROM only option, see more below. bq. Regarding the behavior of SKIPCOLS with COPY FROM, right now it only supports having fewer columns in the CSV. Should we also support actually skipping columns in the CSV even if they are present? I think the semantic I chose, to use SKIPCOLS to subtract from the set of columns specified in the command line, is not as advantageous as the ability to skip columns in the file. Providing both features with the same option would be confusing. So, I converted SKIPCOLS to a COPY FROM option and changed its semantic to just skip columns that exist in the file.
If in future the need arises to specify "all columns except" in the command line, we can introduce a regex-like expression (^col_name) in the columns part of the COPY cmd. bq. Another related feature to have in the future would be to pick only specific columns from the csv and allowing custom orderings of columns, but we can leave that for later if there's a need. I think reordering columns is not as useful as skipping them so I tend to agree to leave this as a future development if the need arises. bq. After those are addressed you can probably start making 2.2+ patches. I changed a lot of code today and I've run out of time anyway, so I'll wait for one more round of review before up-merging. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
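The revised SKIPCOLS semantic described in this comment (drop columns that are present in the file rather than subtracting from the command-line column list) could look roughly like the following. It is a minimal sketch in Python 3 using only the standard {{csv}} module; the function name {{read_rows_skipping}} and its parameters are hypothetical and do not come from copyutil.py.

```python
import csv
import io


def read_rows_skipping(csv_text, file_columns, skip_cols):
    """Yield (kept_column_names, kept_values) for each CSV row.

    file_columns lists the columns in the order they appear in the file;
    any column named in skip_cols is present in the file but is not imported.
    """
    keep = [i for i, name in enumerate(file_columns) if name not in skip_cols]
    kept_names = [file_columns[i] for i in keep]
    for row in csv.reader(io.StringIO(csv_text)):
        yield kept_names, [row[i] for i in keep]


rows = list(read_rows_skipping("k1,a,b\nk2,c,d\n", ["key", "C0", "C1"], {"C1"}))
for names, values in rows:
    print(names, values)
```

The kept column names are what an insert statement would then be built from, so the skipped column never reaches the cluster.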
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060520#comment-15060520 ] Paulo Motta commented on CASSANDRA-9303: Looking good, thanks! Some follow-ups below: bq. CONFIGSECTIONS: this is removed and instead we search the following static sections: \[copy\], \[copy-ks-table\], \[copy-ks-table-from\] or \[copy-ks-table-to\], in this order. sounds good! I'd suggest the following \[copy(:ks.table)\] (global and per-table copy (to and from) options), \[copy-from(:ks.table)\] (global and per-table copy-from options), \[copy-to(:ks.table)\] (global and per-table copy-to options) where (:ks.table) is optional. so you can have \[copy\], \[copy-to\], \[copy-from\], \[copy-to:ks.table\], \[copy-from:ks.table\]. bq. if no error file is specified I've introduced a default error file called import_ks_table.err nice! maybe we could just add a unique suffix to avoid appending to an existing file from a previous execution? bq. Another thing that follows from the CASSANDRA-9302 review is that the INGESTRATE only works if it is much bigger than the CHUNKSIZE. We could address it here if you think this is important. We can address if it won't take too much time, otherwise we can address it separately. Can we maybe improve it by making batchsize adaptive = {{min(batchsize, ingest_rate - current_record)}} or something more complicated will be needed? Some minor things I missed before: * Move {{SKIPCOLS}} to {{COPY_COMMON_OPTIONS}} since it can be used in both copy-to and copy-from. * Regarding the behavior of {{SKIPCOLS}} with COPY FROM, right now it only supports having fewer columns in the CSV. Should we also support actually skipping columns in the CSV even if they are present? ** Another related feature to have in the future would be to pick only specific columns from the csv and allowing custom orderings of columns, but we can leave that for later if there's a need.
After those are addressed you can probably start making 2.2+ patches. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
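The adaptive batch size suggested in this comment, {{min(batchsize, ingest_rate - current_record)}}, can be sketched as follows. This is a minimal Python 3 illustration under one assumption: {{current_record}} is read here as the number of rows already sent in the current one-second window. The function names are hypothetical and the actual patch may implement the throttling differently.

```python
def next_chunk_size(chunk_size, ingest_rate, sent_this_second):
    """Size of the next chunk so that the per-second ingest rate is not exceeded."""
    remaining = max(ingest_rate - sent_this_second, 0)
    return min(chunk_size, remaining)


def plan_one_second(chunk_size, ingest_rate):
    """List the chunk sizes that would be sent within a single one-second window."""
    sent, chunks = 0, []
    while True:
        size = next_chunk_size(chunk_size, ingest_rate, sent)
        if size == 0:
            break
        chunks.append(size)
        sent += size
    return chunks


print(plan_one_second(1000, 2500))
print(plan_one_second(1000, 300))
```

With a fixed chunk size, an ingest rate smaller than the chunk size could never be honoured; shrinking the last chunk in each window is what lets INGESTRATE work even when it is not much bigger than CHUNKSIZE.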
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060229#comment-15060229 ] Stefania commented on CASSANDRA-9303: - {quote} I didn't really get the purpose of the CONFIGSECTIONS option and I think it complicates more than bring us benefits. Is there any particular case you want to achieve with this option? Can't we just have a general \[copy\] section, in addition specific \[ks.table\] sections for custom options per-table? {quote} The purpose was to let people choose multiple sections, for example depending on the direction: they may want to override some options that are common to both directions but require different values depending on the direction. Another purpose was a common copy section as you pointed out. Unfortunately we cannot have hierarchical sections, it doesn't seem to be supported. bq. We could also support those sections on cqlshrc as well, and maybe add example to conf/cqlshrc.sample. But if it's too much additional work just leave it as is. It's not much work since we have access to the {{CONFIG_FILE}} variable. It's just a matter of designing this feature in a sensible way. Here's a proposal, I'll wait for your comments before starting work: * CONFIGFILE: a file from which to read config sections, if not specified we search _.cqlshrc_ * CONFIGSECTIONS: this is removed and instead we search the following static sections: \[copy\], \[copy-ks-table\], \[copy-ks-table-from\] or \[copy-ks-table-to\], in this order. {quote} The ERRFILE option is not present in COPY_FROM_OPTIONS so it does not show up in the auto completer. Also, it seems the default ERRFILE is not being written if one is not explicitly specified.
We could extend this error message on ImportTask.process_records to print the errfile name so the user will know where to look if he didn't specify one: "Failed to process 10 rows (failed rows written to bla.err)" {quote} Done, if no error file is specified I've introduced a default error file called _import_ks_table.err_ since we may have multiple input files now so it was not clear which input file name to pick as a default. This has also the advantage of working for STDIN as well. I've left the default file in the current folder, I didn't try anything too fancy, let me know if you want to enhance this. {quote} In copyutil.py:maybe_read_config_file can you replace {code} ret.update(dict([(k, v,) for k, v in opts.iteritems() if k not in ['configfile', 'configsections']])) {code} with {{ret.update(opts)}} since you already popped 'configfile' and 'configsections' from opts before? (or maybe there's something I'm missing). {quote} You didn't miss anything, it's fixed now thanks. bq. The name {{ExportTask.check_processes}} is a bit misleading, since it sends work and monitors progress, maybe rename to schedule_and_monitor, or coordinate or start_work or even monitor_processes ? I can't find a good name as well as you can see Renamed it to {{export_records}} and renamed the equivalent method in {{ImportTask}} to {{import_records}}. bq. Minor typo in {{ExportTask.get_ranges}} method description: rage -> range Fixed. {quote} On this snippet in ExportTask.get_ranges: {code} # For the last ring interval we query the same replicas that hold the last token in the ring if previous_range and (not end_token or previous < end_token): ranges[(previous, end_token)] = ranges[previous_range].copy() {code} for the last ring interval aren't we supposed to query the replicas that hold the first token in the ring instead (wrap-around)? {quote} Yes technically this would be the correct thing to do. 
I guess so far we did not really care about edge cases, even if we query the wrong replicas for one range it doesn't really matter for performance. I changed it to query the first token replicas now. {quote} On ImportProcess.run_normal, did you find out the reason why the commented snippet below slows down the query? Did you try it again after the review changes of CASSANDRA-9302? {code} # not sure if this is required but it does slow things down three fold # query_statement.consistency_level = self.consistency_level {code} If it still holds that's quite bizarre, as the same consistency is used later in the batch statement. I wonder how the prepared statement CL interacts with the batch CL, if it does at all. {quote} It must have been another problem fixed by 9302 as it makes no difference to performance now, I've re-introduced it. {quote} On ImportReader.get_source you forgot a debug print: print "Returning source {}".format(ret). You should probably remove it or print only on debug mode. {quote} Removed. {quote} On formatting.py you can probably replace the format_integer_with_thousands_sep with a simpler imple
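The wrap-around point discussed for {{ExportTask.get_ranges}} in this comment can be illustrated with a small sketch. This is not the code from copyutil.py: the function name {{build_ranges}}, the input shape and the use of {{None}} for an open range end are assumptions made for illustration, written in Python 3.

```python
def build_ranges(ring):
    """Map token ranges to replicas for a ring given as [(token, replicas), ...].

    The ring must be sorted by token. A range (start, end) covers tokens in
    (start, end]; None stands for an open end (ring minimum or maximum).
    """
    ranges = {}
    previous = None
    for token, replicas in ring:
        # Each range ending at a ring token is served by that token's replicas.
        ranges[(previous, token)] = list(replicas)
        previous = token
    if ring:
        # Wrap-around: tokens above the last ring token belong to the replicas
        # of the FIRST token in the ring, not to those of the last one.
        ranges[(previous, None)] = list(ring[0][1])
    return ranges


ring = [(-100, ["n1"]), (0, ["n2"]), (100, ["n3"])]
ranges = build_ranges(ring)
for token_range, replicas in ranges.items():
    print(token_range, replicas)
```

As noted in the comment, querying the wrong replicas for this one range mostly affects which node coordinates the read, so the difference is small, but using the first token's replicas matches ring ownership.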
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060214#comment-15060214 ] Stefania commented on CASSANDRA-9303: - It's fixed now thanks. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059027#comment-15059027 ] Paulo Motta commented on CASSANDRA-9303: Also, there seems to be a new failure with [cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_round_trip_with_rate_file|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-dtest/lastCompletedBuild/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_round_trip_with_rate_file/] probably due to the review changes of CASSANDRA-9302. > Match cassandra-loader options in COPY FROM > --- > > Key: CASSANDRA-9303 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9303 > Project: Cassandra > Issue Type: New Feature > Components: Tools >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 2.1.x > > > https://github.com/brianmhess/cassandra-loader added a bunch of options to > handle real world requirements, we should match those. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059004#comment-15059004 ]

Paulo Motta commented on CASSANDRA-9303:

Impressive work [~Stefania]. I tested it locally and most options work as expected. Overall I'm very satisfied with the code and tests, except for some minor details listed below:

* I didn't really get the purpose of the {{CONFIGSECTIONS}} option, and I think it adds more complexity than benefit. Is there a particular use case you want to cover with this option? Can't we just have a general {{\[copy\]}} section, in addition to specific {{\[ks.table\]}} sections for custom per-table options?
** We could also support those sections in {{cqlshrc}}, and maybe add an example to {{conf/cqlshrc.sample}}. But if it's too much additional work just leave it as is.
* The {{ERRFILE}} option is not present in {{COPY_FROM_OPTIONS}}, so it does not show up in the auto completer.
** Also, it seems the default {{ERRFILE}} is not being written if one is not explicitly specified.
** We could extend this error message in {{ImportTask.process_records}} to print the errfile name, so users know where to look if they didn't specify one:
*** {{"Failed to process 10 rows (failed rows written to bla.err)"}}
* In {{copyutil.py:maybe_read_config_file}}, can you replace
{noformat}ret.update(dict([(k, v,) for k, v in opts.iteritems() if k not in ['configfile', 'configsections']])){noformat}
with
{noformat}ret.update(opts){noformat}
since you already popped {{'configfile'}} and {{'configsections'}} from {{opts}} before? (Or maybe there's something I'm missing.)
* The name {{ExportTask.check_processes}} is a bit misleading, since it sends work and monitors progress. Maybe rename it to {{schedule_and_monitor}}, {{coordinate}}, {{start_work}} or even {{monitor_processes}}? I can't find a good name either, as you can see :P
* Minor typo in the {{ExportTask.get_ranges}} method description: {{rage}} -> {{range}}
* On this snippet in {{ExportTask.get_ranges}}:
{code}
# For the last ring interval we query the same replicas that hold the last token in the ring
if previous_range and (not end_token or previous < end_token):
    ranges[(previous, end_token)] = ranges[previous_range].copy()
{code}
for the last ring interval, aren't we supposed to query the replicas that hold the first token in the ring instead (wrap-around)?
* On {{ImportProcess.run_normal}}, did you find out why the commented snippet below slows down the query? Did you try it again after the review changes of CASSANDRA-9302?
{noformat}
# not sure if this is required but it does slow things down three fold
# query_statement.consistency_level = self.consistency_level
{noformat}
If it still holds that's quite bizarre, as the same consistency level is used later in the batch statement. I wonder how the prepared statement CL interacts with the batch CL, if it does at all.
* On {{ImportReader.get_source}} you forgot a debug print: {{print "Returning source {}".format(ret)}}. You should probably remove it or print only in debug mode.
* On {{formatting.py}} you can probably replace {{format_integer_with_thousands_sep}} with a simpler implementation that takes advantage of Python's support for thousands-separator formatting (only available with "," though, hence the replace afterwards):
{code}
def format_integer_with_thousands_sep(val, thousands_sep=','):
    return "{:,}".format(val).replace(',', thousands_sep)
{code}
* Suggestion: modify the following messages to include the number of files written/read:
{noformat}
1000 rows exported to N files in 1.257 seconds.
130 rows imported from N files in 0.154 seconds.
{noformat}
* I found two situations where one corrupted row may cause the import of all the other rows to fail, so you should probably cover these in your dtests:
** when there is a parse error in the primary key (a stress-generated blob in this case):
{noformat}
Failed to import 1000 rows: ParseError - non-hexadecimal number found in fromhex() arg at position 0 - given up without retries
Exceeded maximum number of parse errors 10
Failed to process 1000 rows
{noformat}
** when there is a row with fewer columns in the CSV:
{noformat}
Failed to import 20 rows: InvalidRequest - code=2200 [Invalid query] message="There were 6 markers(?) in CQL but 5 bound variables" - will retry later, attempt 1 of 5
Failed to import 20 rows: InvalidRequest - code=2200 [Invalid query] message="There were 6 markers(?) in CQL but 5 bound variables" - will retry later, attempt 2 of 5
Failed to import 20 rows: InvalidRequest - code=2200 [Invalid query] message="There were 6 markers(?) in CQL but 5 bound variables" - will retry later, attempt 3 of 5
Failed to import 20 rows: InvalidRequest - code=2200 [Invalid query] message="There were 6 markers(?) in CQL b
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044613#comment-15044613 ] Stefania commented on CASSANDRA-9303: [~aholmber] : this is the final part of the COPY enhancements and it is also ready for review. The patch is based on CASSANDRA-9494 and CASSANDRA-9302. I'll up-merge to 2.2+ once these tickets have been reviewed. Here are the 2.1 links: |[patch|https://github.com/stef1927/cassandra/commits/9303-2.1]| |[dtest|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-9303-2.1-dtest/]| Also, note the location of the tests that were written for the new options: https://github.com/stef1927/cassandra-dtest/commits/9303.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036020#comment-15036020 ]

Stefania commented on CASSANDRA-9303:

bq. 1. If I set SKIPROWS to 10 say, and also set HEADER to true, will I skip 10 or 11 rows?

10 data rows and the header will be skipped.

bq. 2. Why do you disable the ERRFILE for stdin?

No reason other than the difficulty of coming up with a sensible default name.

bq. 3. If the MAXERRORS/MAXINSERTERRORS is >1, where do you keep the error around? Is it captured anywhere so someone can look back on what type of error occurred? NoHostAvailableException, WriteTimeoutException, bad date format, etc.

Errors get printed to stdout, whilst the failed rows are saved to ERRFILE and not printed to stdout.
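To make the answer to the first question concrete, here is a minimal Python sketch of the skipping order. It is not the cqlsh implementation: the helper name {{read_rows}} and its parameters are hypothetical, and the only assumption taken from the answer above is that the header is dropped first and SKIPROWS then counts data rows.

```python
import csv
import io
from itertools import islice


def read_rows(source, header=False, skiprows=0):
    """Return an iterator over the data rows of a CSV source.

    Hypothetical helper for illustration only, not the cqlsh code.
    """
    reader = csv.reader(source)
    if header:
        # The header line is dropped first and does not count towards skiprows.
        next(reader, None)
    # Skip the requested number of data rows and return the remainder.
    return islice(reader, skiprows, None)


# One header line followed by 15 data rows.
data = "id,val\n" + "".join("{},{}\n".format(i, i * 2) for i in range(15))
rows = list(read_rows(io.StringIO(data), header=True, skiprows=10))
print(len(rows))  # 11 lines were skipped in total, so 5 data rows remain
print(rows[0])
```

With HEADER set and SKIPROWS=10, eleven lines of the file are consumed before the first imported row.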
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15035771#comment-15035771 ] Brian Hess commented on CASSANDRA-9303: 2 questions: 1. If I set SKIPROWS to 10 say, and also set HEADER to true, will I skip 10 or 11 rows? 2. Why do you disable the ERRFILE for stdin?
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1503#comment-1503 ]

Stefania commented on CASSANDRA-9303:

Since the descriptions in the table above are for the loader, here is the corresponding documentation for COPY. There are minor differences, but the options should be pretty much equivalent:

{code}
Available common COPY options and defaults:

  DELIMITER=','           - character that appears between records
  QUOTE='"'               - quoting character to be used to quote fields
  ESCAPE='\'              - character to appear before the QUOTE char when quoted
  HEADER=false            - whether to ignore the first line
  NULL=''                 - string that represents a null value
  DATETIMEFORMAT=         - timestamp strftime format
    '%Y-%m-%d %H:%M:%S%z'   defaults to time_format value in cqlshrc
  JOBS='6'                - the number of jobs each process can work on at a time
  MAXATTEMPTS='5'         - the maximum number of attempts per batch or range
  REPORTFREQUENCY='1'     - the frequency with which we display status updates
  DECIMALSEP='.'          - the separator for decimal values
  THOUSANDSSEP=''         - the separator for thousands digit groups
  BOOLSTYLE='True,False'  - the representation for booleans, case insensitive, specify true followed by false,
                            for example yes,no or 1,0
  NUMPROCESSES='n'        - the number of worker processes, by default the number of cores minus one
                            capped at 16
  CONFIGFILE=''           - a configuration file where you can specify WITH options, which may be overwritten
                            by those specified on the command line. The format of the config file is the same
                            as cqlshrc (see the Python ConfigParser documentation), you can put your options
                            under a section named 'ks.table' where ks and table are the names of the keyspace
                            and table of the COPY command. You can also specify alternative sections with
                            CONFIGSECTIONS. You cannot recursively link multiple configuration files by
                            specifying CONFIGFILE or CONFIGSECTIONS in a configuration file.
  CONFIGSECTIONS=''       - a comma separated list of sections to be read from a config file specified via
                            CONFIGFILE. The order is important since later sections will override values from
                            previous sections if the same key is specified in multiple sections.
  RATEFILE=''             - an optional file where to print the output statistics

Available COPY FROM options and defaults:

  CHUNKSIZE='1000'        - the size of chunks passed to worker processes
  INGESTRATE='5'          - the maximum rate to insert data in rows per second
  MINBATCHSIZE='2'        - the minimum size of an import batch
  MAXBATCHSIZE='20'       - the maximum size of an import batch
  TTL='-1'                - the time to live in seconds, by default data will not expire (neg. ttl)
  MAXROWS='-1'            - the maximum number of rows, -1 means no maximum
  SKIPROWS='0'            - the number of rows to skip
  SKIPCOLS=''             - a comma separated list of column names to skip
  MAXPARSEERRORS='-1'     - the maximum global number of parsing errors, -1 means no maximum
  MAXINSERTERRORS='-1'    - the maximum global number of insert errors, -1 means no maximum
  ERRFILE=''              - a file where to store all rows that could not be imported, by default this is
                            concatenated with ".err", disabled if importing from STDIN

Available COPY TO options and defaults:

  ENCODING='utf8'         - encoding for CSV output
  PAGESIZE='1000'         - the page size for fetching results
  PAGETIMEOUT=10          - the page timeout in seconds for fetching results
  BEGINTOKEN=''           - the minimum token string to consider when exporting data
  ENDTOKEN=''             - the maximum token string to consider when exporting data
{code}
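The precedence rules described for CONFIGFILE and CONFIGSECTIONS can be sketched as follows. This is only an illustration under stated assumptions, not the code in {{copyutil.py}}: the function {{read_copy_options}}, its parameters and the sample section names are hypothetical, and it uses the Python 3 {{configparser}} module rather than the Python 2 {{ConfigParser}} that cqlsh used at the time.

```python
import configparser


def read_copy_options(config_text, ks, table, config_sections=None, cmdline_opts=None):
    """Merge COPY options from config file text (hypothetical helper).

    Sections are read in the order given, so later sections override
    earlier ones, and command-line options override the file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    # Default to the 'ks.table' section unless an explicit list was given.
    sections = config_sections or ["{}.{}".format(ks, table)]

    merged = {}
    for section in sections:
        if parser.has_section(section):
            merged.update(parser.items(section))

    # Options from the WITH clause win over anything read from the file.
    merged.update(cmdline_opts or {})
    return merged


# Sample configuration; the section names are made up for this example.
sample = """
[copy]
delimiter = |
maxattempts = 5

[ks1.users]
maxattempts = 10
"""

opts = read_copy_options(sample, "ks1", "users",
                         config_sections=["copy", "ks1.users"],
                         cmdline_opts={"header": "true"})
print(opts)
```

Here {{maxattempts}} ends up as 10 because {{ks1.users}} is listed after {{copy}}, while {{delimiter}} is inherited from the first section.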
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033320#comment-15033320 ] Stefania commented on CASSANDRA-9303: All options have been completed, refer to the table above. I still have to implement multi-file import however.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033243#comment-15033243 ] Stefania commented on CASSANDRA-9303: I was thinking of a list of Python globs, so we can do things like {{file1, file2, ... fileN}} but also {{*.csv, *.txt, folder/*}} and so forth. Is this what you have in mind?
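A rough sketch of what expanding such a list could look like, using only the standard {{glob}} module. The helper name {{expand_sources}} is hypothetical and this is not the implementation that was eventually committed; it also assumes that file names do not themselves contain commas.

```python
import glob
import os
import tempfile


def expand_sources(spec):
    """Expand a comma-separated list of file names or glob patterns into
    a sorted, de-duplicated list of existing files (hypothetical helper)."""
    files = set()
    for pattern in spec.split(","):
        pattern = pattern.strip()
        if not pattern:
            continue
        # glob.glob only returns paths that exist, so plain file names work too.
        files.update(p for p in glob.glob(pattern) if os.path.isfile(p))
    return sorted(files)


# Small demonstration in a temporary directory.
tmp = tempfile.mkdtemp()
for name in ("a.csv", "b.csv", "notes.txt"):
    open(os.path.join(tmp, name), "w").close()

spec = "{0}/*.csv, {0}/notes.txt, {0}/missing.csv".format(tmp)
print([os.path.basename(p) for p in expand_sources(spec)])
```

Patterns that match nothing, such as {{missing.csv}} here, are silently dropped; a real implementation would probably want to report them.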
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032077#comment-15032077 ] Jonathan Ellis commented on CASSANDRA-9303: Why a directory of files vs a list of any files? (Globbing can turn a directory into a list easily enough.)
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015299#comment-15015299 ]

Stefania commented on CASSANDRA-9303:

Thanks for your feedback, here are the updated versions of the progress tables. I will try and see if we can add support for importing multiple files as well.

h3. Importing

||cassandra-loader||COPY FROM||description||status||
|-configFile filename| |File with configuration options|TODO (extend cqlsh config file)|
|-delim delimiter|delimiter|Delimiter to use|already available|
|-dateFormat dateFormatString|dtformats|Date format|TODO|
|-nullString nullString|nullval|String that signifies NULL|already available|
|-skipRows skipRows| |Number of rows to skip|TODO|
|-skipCols columnsToSkip|column|Comma-separated list of columns to skip|already available, they can specify which columns in the cmd syntax|
|-maxRows maxRows| |Maximum number of rows to read (-1 means all)|TODO|
|-maxErrors maxErrors| |Maximum parse errors to endure|TODO|
|-badDir badDirectory| |Directory where to place badly parsed rows|TODO|
|-port portNumber| |CQL Port Number|already available via cqlsh|
|-user username| |Cassandra username|already available via cqlsh|
|-pw password| |Password for user|already available via cqlsh|
|-ssl-truststore-path path| |Path to SSL truststore|already available via cqlsh|
|-ssl-truststore-pw pwd| |Password for SSL truststore|already available via cqlsh|
|-ssl-keystore-path path| |Path to SSL keystore|already available via cqlsh|
|-ssl-keystore-pw pwd| |Password for SSL keystore|already available via cqlsh|
|-consistencyLevel CL| |Consistency level|already available via cqlsh|
|-numFutures numFutures|jobs|Number of CQL futures to keep in flight|already available|
|-batchSize batchSize|minbatchsize, maxbatchsize|Number of INSERTs to batch together|already available|
|-decimalDelim decimalDelim|decimalsep|Decimal delimiter|done|
| |thousandssep|Thousands delimiter|done|
|-boolStyle boolStyleString|boolstyle|Style for booleans|done|
|-numThreads numThreads|numProcesses|Number of concurrent threads (files) to load|done|
|-queryTimeout # seconds|pageTimeout|Query timeout (in seconds)|already available|
|-numRetries numRetries|maxattempts|Number of times to retry the INSERT|already available|
|-maxInsertErrors # errors| |Maximum INSERT errors to endure|TODO|
|-rate rows-per-second| |Maximum insert rate|TODO (unsure how)|
|-progressRate num txns|reportfrequency|How often to report the insert rate|already available|
|-rateFile filename| |Where to print the rate statistics|TODO|
|-successDir dir| |Directory where to move successfully loaded files|will implement only if adding support for multi-file import|
|-failureDir dir| |Directory where to move files that did not successfully load|will implement only if adding support for multi-file import|

h3. Exporting

||cassandra-unloader||COPY TO||description||status||
|configFile filename| |File with configuration options|TODO (extend cqlsh config file)|
|-delim delimiter|delimiter|Delimiter to use|already available|
|-dateFormat dateFormatString|dtformats|Date format|already available|
|-nullString nullString|nullval|String that signifies NULL|already available|
|-port portNumber| |CQL Port Number|already available via cqlsh|
|-user username| |Cassandra username|already available via cqlsh|
|-pw password| |Password for user|already available via cqlsh|
|-ssl-truststore-path path| |Path to SSL truststore|already available via cqlsh|
|-ssl-truststore-pw pwd| |Password for SSL truststore|already available via cqlsh|
|-ssl-keystore-path path| |Path to SSL keystore|already available via cqlsh|
|-ssl-keystore-pw pwd| |Password for SSL keystore|already available via cqlsh|
|consistencyLevel CL| |Consistency level|already available via cqlsh|
|decimalDelim decimalDelim|decimalsep|Decimal delimiter|done|
| |thousandssep|Thousands delimiter|done|
|boolStyle boolStyleString|boolstyle|Style for booleans|done|
|numThreads numThreads|numprocesses|Number of concurrent threads to unload|done|
|beginToken tokenString|begintoken|Begin token|done|
|endToken tokenString|endtoken|End token|done|

Where it says _done_, I'm actually still working on automated tests.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014773#comment-15014773 ] Stefania commented on CASSANDRA-9303: They are new in the CASSANDRA-9302 patch, not yet committed.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014280#comment-15014280 ]

Brian Hess commented on CASSANDRA-9303:

I'm curious - how do you set the following in CQLSH COPY FROM:
- numFutures (the number of concurrent asynchronous requests "in flight" at a time)
- batchSize (the number of INSERTs to batch and send as one request)
- queryTimeout (the amount of time to wait on queries)
- numRetries (the number of times to retry failed/timed-out queries)
- progressRate (the rate at which progress is reported)

All of these are marked as "already available", but it isn't clear how to set them, and the documentation doesn't say either.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014261#comment-15014261 ] Brian Hess commented on CASSANDRA-9303: Are there no plans to support loading a directory of files? I would say that that is one of the bigger options leveraged by users of cassandra-loader. I'm +1 on not doing the things that CQLSH already handles (username, password, etc).
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013881#comment-15013881 ] Tyler Hobbs commented on CASSANDRA-9303: I don't think we need to repeat the options that are already passed to cqlsh (user, port, ssl stuff, consistency level). Since we only support loading a single file right now, I don't think {{successDir}} and {{failureDir}} are important. However, a single-file version of {{badDirectory}} for storing rows that errored in some way would be good.
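The single-file variant of {{badDirectory}} suggested above could look roughly like the sketch below: rows that fail are appended to one error file while the error itself is printed. Everything here is an assumption made for illustration, including the names {{import_rows}} and {{fake_insert}}, the {{import.err}} file name and the way the error limit is applied; it is not the code from the patch.

```python
import csv
import os
import tempfile


def import_rows(rows, insert, err_path, max_errors=-1):
    """Try to insert each row and append the rows that fail to err_path.

    Hypothetical helper: `insert` is any callable that raises on failure,
    and max_errors=-1 means there is no limit.
    """
    failed = 0
    with open(err_path, "a", newline="") as err_file:
        writer = csv.writer(err_file)
        for row in rows:
            try:
                insert(row)
            except Exception as exc:
                failed += 1
                # The error goes to stdout, the offending row to the error file.
                print("Failed to import row: {}".format(exc))
                writer.writerow(row)
                if 0 <= max_errors < failed:
                    print("Exceeded maximum number of errors {}".format(max_errors))
                    break
    return failed


def fake_insert(row):
    # Stand-in for a real INSERT: reject rows with the wrong number of columns.
    if len(row) != 2:
        raise ValueError("expected 2 columns but got {}".format(len(row)))


err_path = os.path.join(tempfile.mkdtemp(), "import.err")
failed = import_rows([["1", "a"], ["2"], ["3", "c"]], fake_insert, err_path)
print("{} row(s) written to {}".format(failed, err_path))
```

Because the failed rows are written back as CSV, the error file can be fixed by hand and fed to another import.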
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013193#comment-15013193 ] Stefania commented on CASSANDRA-9303: [~jbellis], [~thobbs] : do we want all options or are there some we don't care about? For example those related to moving files to specific folders (successDir, failureDir) or those for specifying options that are already passed to cqlsh (user, port, etc). I think the cassandra-unloader also splits output files.
[jira] [Commented] (CASSANDRA-9303) Match cassandra-loader options in COPY FROM
[ https://issues.apache.org/jira/browse/CASSANDRA-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013182#comment-15013182 ]

Stefania commented on CASSANDRA-9303:

Here is the current status. I will regularly edit the following tables to reflect the progress:

h3. Importing

||cassandra-loader||COPY FROM||description||status||
|-configFile filename| |File with configuration options|TODO|
|-delim delimiter|delimiter|Delimiter to use|already available|
|-dateFormat dateFormatString|dtformats|Date format|TODO but we can parse all valid CQL time formats|
|-nullString nullString|nullval|String that signifies NULL|already available|
|-skipRows skipRows| |Number of rows to skip|TODO|
|-skipCols columnsToSkip|column|Comma-separated list of columns to skip|already available, they can specify which columns in the cmd syntax|
|-maxRows maxRows| |Maximum number of rows to read (-1 means all)|TODO|
|-maxErrors maxErrors| |Maximum parse errors to endure|TODO|
|-badDir badDirectory| |Directory where to place badly parsed rows|TODO|
|-port portNumber| |CQL Port Number|TODO|
|-user username| |Cassandra username|TODO|
|-pw password| |Password for user|TODO|
|-ssl-truststore-path path| |Path to SSL truststore|TODO|
|-ssl-truststore-pw pwd| |Password for SSL truststore|TODO|
|-ssl-keystore-path path| |Path to SSL keystore|TODO|
|-ssl-keystore-pw pwd| |Password for SSL keystore|TODO|
|-consistencyLevel CL| |Consistency level|TODO|
|-numFutures numFutures|jobs|Number of CQL futures to keep in flight|already available|
|-batchSize batchSize|minbatchsize, maxbatchsize|Number of INSERTs to batch together|already available|
|-decimalDelim decimalDelim| |Decimal delimiter|TODO|
|-boolStyle boolStyleString| |Style for booleans|TODO|
|-numThreads numThreads| |Number of concurrent threads (files) to load|TODO (numProcesses)|
|-queryTimeout # seconds|pageTimeout|Query timeout (in seconds)|already available|
|-numRetries numRetries|maxattempts|Number of times to retry the INSERT|already available|
|-maxInsertErrors # errors| |Maximum INSERT errors to endure|TODO|
|-rate rows-per-second| |Maximum insert rate|TODO (unsure how)|
|-progressRate num txns|reportfrequency|How often to report the insert rate|already available|
|-rateFile filename| |Where to print the rate statistics|TODO|
|-successDir dir| |Directory where to move successfully loaded files|TODO|
|-failureDir dir| |Directory where to move files that did not successfully load|TODO|

h3. Exporting

||cassandra-unloader||COPY TO||description||status||
|configFile filename| |File with configuration options|TODO|
|-delim delimiter|delimiter|Delimiter to use|TODO|
|-dateFormat dateFormatString|dtformats|Date format|already available|
|-nullString nullString|nullval|String that signifies NULL|already available|
|-port portNumber| |CQL Port Number|TODO|
|-user username| |Cassandra username|TODO|
|-pw password| |Password for user|TODO|
|-ssl-truststore-path path| |Path to SSL truststore|TODO|
|-ssl-truststore-pw pwd| |Password for SSL truststore|TODO|
|-ssl-keystore-path path| |Path to SSL keystore|TODO|
|-ssl-keystore-pw pwd| |Password for SSL keystore|TODO|
|consistencyLevel CL| |Consistency level|TODO|
|decimalDelim decimalDelim| |Decimal delimiter|TODO|
|boolStyle boolStyleString| |Style for booleans|TODO|
|numThreads numThreads| |Number of concurrent threads to unload|TODO (numProcesses)|
|beginToken tokenString| |Begin token|TODO|
|endToken tokenString| |End token|TODO|