[jira] [Commented] (CASSANDRA-12519) dtest failure in offline_tools_test.TestOfflineTools.sstableofflinerelevel_test
[ https://issues.apache.org/jira/browse/CASSANDRA-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362918#comment-17362918 ] Stefania Alborghetti commented on CASSANDRA-12519: -- The plan proposed above makes sense to me. I also looked at the PR for trunk and it LGTM. Great job [~adelapena] and [~bereng]. > dtest failure in > offline_tools_test.TestOfflineTools.sstableofflinerelevel_test > --- > > Key: CASSANDRA-12519 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12519 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest/python >Reporter: Sean McCarthy >Assignee: Andres de la Peña >Priority: Normal > Fix For: 4.0-rc2, 4.0, 3.0.x, 3.11.x, 4.0-rc, 4.x > > Attachments: node1.log, node1_debug.log, node1_gc.log > > Time Spent: 1h > Remaining Estimate: 0h > > example failure: > http://cassci.datastax.com/job/trunk_offheap_dtest/379/testReport/offline_tools_test/TestOfflineTools/sstableofflinerelevel_test/ > {code} > Stacktrace > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/automaton/cassandra-dtest/offline_tools_test.py", line 209, in > sstableofflinerelevel_test > self.assertGreater(max(final_levels), 1) > File "/usr/lib/python2.7/unittest/case.py", line 942, in assertGreater > self.fail(self._formatMessage(msg, standardMsg)) > File "/usr/lib/python2.7/unittest/case.py", line 410, in fail > raise self.failureException(msg) > "1 not greater than 1 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12519) dtest failure in offline_tools_test.TestOfflineTools.sstableofflinerelevel_test
[ https://issues.apache.org/jira/browse/CASSANDRA-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362042#comment-17362042 ] Stefania Alborghetti commented on CASSANDRA-12519: -- {quote} I'm not familiarized with the lifecycle package so I'm not sure whether skipping the temporary sstables when resetting the levels is right, or whether the validation error that happens after changing the metadata is caused by a deeper problem. {quote} I would need to see the full reason why the transaction rejected a record, and I wasn't able to find a full failure, but it must have failed the checksum verification, because the metadata file is changed by the standalone tools, {{sstablelevelreset}} in our case. The transaction checks whether anything has tampered with a file guarded by it. This is done by {{LogFile.verify()}} and would also prevent a main Cassandra process from starting up, because there is some automated cleanup done on startup when {{LogTransaction.removeUnfinishedLeftovers()}} is called. Since we don't want to mistakenly delete files restored by users, for example, we verify a checksum calculated from the files that existed when the transaction record was created. There are more checks, but this is the main one and the one that I believe must have failed. So if anything changes any of these files, temporary or permanent, the transaction detects it. These two standalone tools change the sstable metadata and hence probably triggered it. I think it's reasonable to change {{sstablelevelreset}} to skip temporary files, because if the transaction did not complete, it's as if these files never existed. However, I don't think this is sufficient to fix the problem, because changing the old existing metadata files could also trigger a checksum error. 
So I may be wrong, but it seems to me that the real fix is to use the cleanup utility in the test, before running {{sstablelevelreset}}, so that there are no leftover transactions. If these two tools are likely to be used directly by users while the process is offline, as they seem to be, I believe they should clean up leftover transactions first, or at least issue a warning if there are any. Otherwise the main process may refuse to start for the same reason explained above. To clean up leftovers we can simply call {{LifecycleTransaction.removeUnfinishedLeftovers(cfs)}} from the tool itself, before doing any work. We should consider a follow-up to do this, or fix it directly in this ticket. If we fix it here, then we don't need to do this in the test. So you can either merge what you have and open a follow-up, or add {{LifecycleTransaction.removeUnfinishedLeftovers(cfs)}} as well as skipping the temporary files (which seems more correct to me), and see if this fixes the issue without changing the test. 
[jira] [Updated] (CASSANDRA-15702) copyutil.py uses Exception.message which doesn't exist in Python 3
[ https://issues.apache.org/jira/browse/CASSANDRA-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15702: - Since Version: 4.x Source Control Link: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commit;h=58015fd681dadbe31068ecfb4f4c0eb506de8efa Resolution: Fixed Status: Resolved (was: Ready to Commit) CI was fine, the only error was a code style compliance issue (missing blank line) introduced by CASSANDRA-15679, which I fixed on commit. > copyutil.py uses Exception.message which doesn't exist in Python 3 > -- > > Key: CASSANDRA-15702 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15702 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: Eduard Tudenhoefner >Assignee: Eduard Tudenhoefner >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-alpha > > Time Spent: 10m > Remaining Estimate: 0h > > Python 3 deprecated and removed the use of Exception.message. The purpose of > this ticket is to convert *Exception.message* to something that is Python 3 > compatible
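For reference, the incompatibility fixed by this ticket can be reproduced in a few lines (a sketch of the general pattern, not necessarily the exact change made in copyutil.py): in Python 3, {{BaseException}} no longer carries a {{message}} attribute, so exception text must be obtained via {{str(e)}} or {{e.args}} instead.

```python
# Python 3: the old Python 2 idiom `e.message` raises AttributeError.
try:
    raise ValueError("bad input")
except ValueError as e:
    assert not hasattr(e, "message")  # attribute removed in Python 3
    text = str(e)                     # portable spelling
    first_arg = e.args[0]             # also portable

assert text == "bad input"
assert first_arg == "bad input"
```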
[jira] [Updated] (CASSANDRA-15702) copyutil.py uses Exception.message which doesn't exist in Python 3
[ https://issues.apache.org/jira/browse/CASSANDRA-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15702: - Status: Ready to Commit (was: Review In Progress)
[jira] [Commented] (CASSANDRA-15702) copyutil.py uses Exception.message which doesn't exist in Python 3
[ https://issues.apache.org/jira/browse/CASSANDRA-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079420#comment-17079420 ] Stefania Alborghetti commented on CASSANDRA-15702: -- Patch looks good, running cqlsh tests here: [53|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/53/]
[jira] [Updated] (CASSANDRA-15702) copyutil.py uses Exception.message which doesn't exist in Python 3
[ https://issues.apache.org/jira/browse/CASSANDRA-15702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15702: - Reviewers: Stefania Alborghetti (was: Stefania Alborghetti) Status: Review In Progress (was: Patch Available)
[jira] [Comment Edited] (CASSANDRA-15679) cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: 'bytearray'"
[ https://issues.apache.org/jira/browse/CASSANDRA-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079338#comment-17079338 ] Stefania Alborghetti edited comment on CASSANDRA-15679 at 4/9/20, 1:32 PM: --- CI runs: [28|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/28/] [30|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/30/] [31|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/31/] [34|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/34/] [35|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/35/] Dtest changes: [88cc70b4d6b598c60d4b567cdceff49a76140312|https://github.com/apache/cassandra-dtest/commit/88cc70b4d6b598c60d4b567cdceff49a76140312] was (Author: stefania): CI runs: [28|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/28/] [30|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/30/] [31|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/31/] [34|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/34/] [[35|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/35/|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/35/]] Dtest changes: [88cc70b4d6b598c60d4b567cdceff49a76140312|https://github.com/apache/cassandra-dtest/commit/88cc70b4d6b598c60d4b567cdceff49a76140312] > cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: > 'bytearray'" > - > > Key: CASSANDRA-15679 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15679 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: Erick Ramirez >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 2.2.17, 3.11.7, 2.1.x, 4.0-alpha > > Time Spent: 2h 20m > Remaining Estimate: 0h > > h2. Background > A user was having issues loading CSV data with the {{COPY FROM}} command into > a {{map}} column with {{blob}} values. > h2. 
Replication steps > I can easily replicate the problem with this simple table: > {noformat} > CREATE TABLE community.blobmaptable ( > id text PRIMARY KEY, > blobmapcol map > ) > {noformat} > I have this CSV file that contains just 1 row: > {noformat} > $ cat blobmap.csv > c3,{3: 0x74776f} > {noformat} > And here's the error when I try to load it: > {noformat} > cqlsh:community> COPY blobmaptable (id, blobmapcol) FROM '~/blobmap.csv' ; > Using 1 child processes > Starting copy of community.blobmaptable with columns [id, blobmapcol]. > Failed to import 1 rows: ParseError - Failed to parse {3: 0x74776f} : > unhashable type: 'bytearray', given up without retries > Failed to process 1 rows; failed rows written to > import_community_blobmaptable.err > Processed: 1 rows; Rate: 2 rows/s; Avg. rate: 3 rows/s > 1 rows imported from 1 files in 0.389 seconds (0 skipped). > {noformat} > I've also logged > [PYTHON-1234|https://datastax-oss.atlassian.net/browse/PYTHON-1234] because I > wasn't sure if it was a Python driver issue. Cheers!
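The parse error in the report above stems from a general Python property, shown in this small sketch (an illustration of the failure mode, not the exact copyutil.py code path): a {{bytearray}} is mutable and therefore unhashable, so it cannot be used anywhere a hashable value is required, while the immutable {{bytes}} type can.

```python
raw = bytearray(b"\x74\x77\x6f")  # the blob 0x74776f, i.e. b"two"

# A bytearray cannot be used as a dict key or set member...
try:
    _ = {raw: "value"}
    unhashable = False
except TypeError:  # "unhashable type: 'bytearray'"
    unhashable = True
assert unhashable

# ...but freezing it to immutable bytes makes it hashable.
frozen = bytes(raw)
d = {frozen: "value"}
assert d[b"two"] == "value"
```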
[jira] [Updated] (CASSANDRA-15679) cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: 'bytearray'"
[ https://issues.apache.org/jira/browse/CASSANDRA-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15679: - Fix Version/s: (was: 3.0.21) (was: 2.1.21) (was: 4.x) 4.0-alpha 2.1.x Since Version: 2.1.21 Source Control Link: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commit;h=f3568c0d50ac7573b53f0043b2567bde3b39bee8 Resolution: Fixed Status: Resolved (was: Ready to Commit) CI runs: [28|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/28/] [30|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/30/] [31|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/31/] [34|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/34/] [35|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/35/] Dtest changes: [88cc70b4d6b598c60d4b567cdceff49a76140312|https://github.com/apache/cassandra-dtest/commit/88cc70b4d6b598c60d4b567cdceff49a76140312]
[jira] [Updated] (CASSANDRA-15679) cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: 'bytearray'"
[ https://issues.apache.org/jira/browse/CASSANDRA-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15679: - Status: Ready to Commit (was: Review In Progress)
[jira] [Updated] (CASSANDRA-14050) Many cqlsh_copy_tests are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-14050: - Fix Version/s: 4.0-alpha Since Version: 4.x Source Control Link: https://github.com/apache/cassandra-dtest/commit/da8abe3cab3fc186a6cfb2e3771f647a0dac120e Resolution: Fixed Status: Resolved (was: Ready to Commit) > Many cqlsh_copy_tests are busted > > > Key: CASSANDRA-14050 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14050 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Testing >Reporter: Michael Kjellman >Assignee: Stefania Alborghetti >Priority: Normal > Fix For: 4.0-alpha > > > Many cqlsh_copy_tests are busted. We should disable the entire suite until > this is resolved as these tests are currently nothing but a waste of time. > test_bulk_round_trip_blogposts - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > test_bulk_round_trip_blogposts_with_max_connections - > cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > test_bulk_round_trip_default - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > Error starting node3. 
> >> begin captured logging << > dtest: DEBUG: cluster ccm directory: /tmp/dtest-S9NfIH > dtest: DEBUG: Done setting configuration options: > { 'initial_token': None, > 'memtable_allocation_type': 'offheap_objects', > 'num_tokens': '256', > 'phi_convict_threshold': 5, > 'range_request_timeout_in_ms': 1, > 'read_request_timeout_in_ms': 1, > 'request_timeout_in_ms': 1, > 'truncate_request_timeout_in_ms': 1, > 'write_request_timeout_in_ms': 1} > - >> end captured logging << - > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 2546, in test_bulk_round_trip_blogposts > stress_table='stresscql.blogposts') > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 2451, in _test_bulk_round_trip > self.prepare(nodes=nodes, partitioner=partitioner, > configuration_options=configuration_options) > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 115, in prepare > self.cluster.populate(nodes, > tokens=tokens).start(wait_for_binary_proto=True) > File > "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", > line 423, in start > raise NodeError("Error starting {0}.".format(node.name), p) > "Error starting node3.\n >> begin captured logging << > \ndtest: DEBUG: cluster ccm directory: > /tmp/dtest-S9NfIH\ndtest: DEBUG: Done setting configuration options:\n{ > 'initial_token': None,\n'memtable_allocation_type': 'offheap_objects',\n > 'num_tokens': '256',\n'phi_convict_threshold': 5,\n > 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': > 1,\n'request_timeout_in_ms': 1,\n > 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': > 1}\n- >> end captured logging << > -"
[jira] [Updated] (CASSANDRA-14050) Many cqlsh_copy_tests are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-14050: - Status: Ready to Commit (was: Review In Progress)
[jira] [Commented] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077539#comment-17077539 ] Stefania Alborghetti commented on CASSANDRA-15229: -- {quote}My view is that having a significant proportion of memory wasted to fragmentation is a serious bug, irregardless of the total amount of memory that is wasted. {quote} That's absolutely true. However, it's also true that none of our users reported any problems when the cache was 512 MB and the default file access mode was mmap. Perhaps there are users in open source that reported problems, I haven't done a Jira search. So my point was simply meant to say that we should be mindful of changing critical code late in a release cycle if the existing code is performing adequately. {quote}It's not poorly suited to long lived buffers its it? Only to buffers with widely divergent lifetimes. {quote} I implied the fact that lifetimes are divergent, since we're trying to support a cache, sorry about the confusion. {quote}Honestly, given chunks are normally the same size, simply re-using the evicted buffer if possible, and if not allocating new system memory, seems probably sufficient to me. {quote} I'm not too sure that chunks are normally the same size. For data files, they depend on the compression parameters or on the partition sizes, both could be different for different tables. Also, indexes would use different chunk sizes surely? We observed that the chunk cache gradually tends to shift from buffers coming from data files to buffers coming from index files, as indexes are accessed more frequently. We have a different index implementation though. {quote}{quote}I'll try to share some code so you can have a clearer picture. {quote} Thanks, that sounds great. I may not get to it immediately, but look forward to taking a look hopefully soon. {quote} I've dropped some files on this [branch|https://github.com/stef1927/cassandra/tree/15229-4.0]. 
The buffer pool is in org.apache.cassandra.utils.memory.buffers. The starting point is the [BufferPool|https://github.com/apache/cassandra/compare/trunk...stef1927:15229-4.0#diff-72046b5d367f6e120594b58c973bed71R24] and its concrete implementations or the [BufferFactory|https://github.com/apache/cassandra/compare/trunk...stef1927:15229-4.0#diff-4fc5fae1de112fc5eb0bd865af532f0aR31]. I've also dropped some related utility classes but not all of them, so clearly the code doesn't compile and the unit tests are also missing. > BufferPool Regression > - > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0, 4.0-beta > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. > With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. 
We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations.
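The "re-use the evicted buffer if possible" idea discussed in the comments above can be sketched as follows (a toy Python model of the allocation policy only — Cassandra's {{BufferPool}} is Java and far more involved, and every name here is invented): keep a per-size free list, recycle a buffer when the chunk cache evicts it, and fall back to fresh system memory only when no buffer of the requested size is available. As the comment notes, this only pays off when chunk sizes actually repeat, which may not hold across tables with different compression parameters or index implementations.

```python
from collections import defaultdict

class RecyclingPool:
    """Toy model: reuse evicted buffers of the same size before allocating."""

    def __init__(self):
        self._free = defaultdict(list)  # size -> stack of reusable buffers
        self.fresh_allocations = 0      # how often we fell back to new memory

    def allocate(self, size):
        if self._free[size]:
            buf = self._free[size].pop()  # reuse an evicted buffer
            buf[:] = bytes(size)          # zero it for the new owner
            return buf
        self.fresh_allocations += 1
        return bytearray(size)            # no size match: new system memory

    def release(self, buf):
        """Called when the chunk cache evicts the entry owning this buffer."""
        self._free[len(buf)].append(buf)
```

Two allocations of the same size with a release in between cost only one fresh allocation, while a request for a different size always misses the free list — which is why divergent chunk sizes blunt this scheme.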
[jira] [Updated] (CASSANDRA-14050) Many cqlsh_copy_tests are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-14050: - Test and Documentation Plan: [https://github.com/apache/cassandra-dtest/pull/62] The dependency of the dtests on cqlshlib's formatting.py has been removed by running {{SELECT *}} queries with cqlsh rather than the driver. This avoids applying the formatting manually. In some cases, where formatting was trivial and the number of rows was significant, the driver is still used and the formatting is done manually (normally a conversion to a string was sufficient). This was done to overcome cqlsh paging: although we can disable paging, parsing a large number of rows could cause memory or speed problems. Status: Patch Available (was: In Progress) I've submitted a [patch|https://github.com/apache/cassandra-dtest/pull/62] based on the third approach, see the documentation plan. CI runs: * [2.1|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/25] * [2.2|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/26] * [3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/27] * [3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/23] * [4.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch-dtest/24]
[jira] [Assigned] (CASSANDRA-14050) Many cqlsh_copy_tests are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti reassigned CASSANDRA-14050: Assignee: Stefania Alborghetti (was: Sam Sriramadhesikan) > Many cqlsh_copy_tests are busted > > > Key: CASSANDRA-14050 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14050 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Testing >Reporter: Michael Kjellman >Assignee: Stefania Alborghetti >Priority: Normal > > Many cqlsh_copy_tests are busted. We should disable the entire suite until > this is resolved as these tests are currently nothing but a waste of time. > test_bulk_round_trip_blogposts - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > test_bulk_round_trip_blogposts_with_max_connections - > cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > test_bulk_round_trip_default - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > Error starting node3. > >> begin captured logging << > dtest: DEBUG: cluster ccm directory: /tmp/dtest-S9NfIH > dtest: DEBUG: Done setting configuration options: > { 'initial_token': None, > 'memtable_allocation_type': 'offheap_objects', > 'num_tokens': '256', > 'phi_convict_threshold': 5, > 'range_request_timeout_in_ms': 1, > 'read_request_timeout_in_ms': 1, > 'request_timeout_in_ms': 1, > 'truncate_request_timeout_in_ms': 1, > 'write_request_timeout_in_ms': 1} > - >> end captured logging << - > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 2546, in test_bulk_round_trip_blogposts > stress_table='stresscql.blogposts') > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 2451, in _test_bulk_round_trip > self.prepare(nodes=nodes, partitioner=partitioner, > configuration_options=configuration_options) > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 115, in prepare > self.cluster.populate(nodes, > 
tokens=tokens).start(wait_for_binary_proto=True) > File > "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", > line 423, in start > raise NodeError("Error starting {0}.".format(node.name), p) > "Error starting node3.\n >> begin captured logging << > \ndtest: DEBUG: cluster ccm directory: > /tmp/dtest-S9NfIH\ndtest: DEBUG: Done setting configuration options:\n{ > 'initial_token': None,\n'memtable_allocation_type': 'offheap_objects',\n > 'num_tokens': '256',\n'phi_convict_threshold': 5,\n > 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': > 1,\n'request_timeout_in_ms': 1,\n > 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': > 1}\n- >> end captured logging << > -" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076347#comment-17076347 ] Stefania Alborghetti commented on CASSANDRA-15229: -- bq. The current implementation isn't really a bump the pointer allocator? It's bitmap based, though with a very tiny bitmap. Sorry, it's been a while. Of course the current implementation is also bitmap based. The point is that it is not suitable for long-lived buffers, similarly to our bump-the-pointer strategy. The transient case is easy to solve; either approach would work. bq. Could you elaborate on how these work, as my intuition is that anything designed for a thread-per-core architecture probably won't translate so well to the present state of the world. Though, either way, I suppose this is probably orthogonal to this ticket as we only need to address the {{ChunkCache}} part. The thread-per-core architecture makes it easy to identify threads that do most of the work and cause most of the contention. However, thread identification can also be achieved with thread pools, or we can simply give all threads a local stash of buffers, provided that we return it when the thread dies. I don't think there is any other dependency on TPC beyond this. The design choice was mostly dictated by the size of the cache: with AIO reads the OS page cache is bypassed, and the chunk cache therefore needs to be very large, which is not the case if we use Java NIO reads or if we eventually implement asynchronous reads with the new io_uring API, bypassing AIO completely (which I do recommend). bq. We also optimized the chunk cache to store memory addresses rather than byte buffers, which significantly reduced heap usage. The byte buffers are materialized on the fly. bq. This would be a huge improvement, and a welcome backport if it is easy - though it might (I would guess) depend on Unsafe, which may be going away soon. 
It's orthogonal to this ticket, though, I think. Yes, it's based on the Unsafe. The addresses come from the slabs, and then we use the Unsafe to create hollow buffers and to set the address. This is an optimization and it clearly belongs in a separate ticket. {quote} We changed the chunk cache to always store buffers of the same size. We have global lists of these slabs, sorted by buffer size where each size is a power-of-two. How do these two statements reconcile? {quote} So let's assume the current workload is mostly on a table with 4k chunks, which translates to 4k buffers in the cache. Let's also assume that the workload is shifting towards another table, with 8k chunks. Alternatively, let's assume compression is ON, and an ALTER TABLE changes the chunk size. So now the chunk cache is slowly evicting 4k buffers and retaining 8k buffers. These buffers come from two different lists: the list of slabs serving 4k and the list serving 8k. Even if we collect all unused 4k slabs, until each slab has every single buffer returned, there will be wasted memory, and we do not control how long that will take. To be fair, it's an extreme case, and we were perhaps overly cautious in addressing this possibility by fixing the size of buffers in the cache. So it's possible that the redesigned buffer pool may work even with the current chunk cache implementation. bq. Is it your opinion that your entire ChunkCache implementation can be dropped wholesale into 4.0? I would assume it is still primarily multi-threaded. If so, it might be preferable to trying to fix the existing ChunkCache The changes to the chunk cache are not trivial and should be left as a follow-up for 4.x or later, in my opinion. The changes to the buffer pool can be dropped in 4.0 if you think that: - they are safe even in the presence of the case described above. 
- they are justified: memory wasted due to fragmentation is perhaps not an issue with a cache as small as 512 MB. I'll try to share some code so you can have a clearer picture. > BufferPool Regression > - > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0, 4.0-beta > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do
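The slab-retention effect behind the 4k-to-8k scenario above can be sketched with a toy model: a slab that serves fixed-size buffers is only reclaimable once every one of its buffers has been returned, so a single straggler pins the whole slab. (Illustrative Python only, not the DSE allocator; names and sizes are made up for the example.)

```python
class Slab:
    """Toy model of a fixed-buffer-size slab: SIZE bytes carved into
    equal buffers, reclaimable only when no buffer is outstanding."""
    SIZE = 64 * 1024

    def __init__(self, buf_size):
        self.buf_size = buf_size
        self.free = list(range(self.SIZE // buf_size))  # free slot ids
        self.outstanding = 0

    def allocate(self):
        self.outstanding += 1
        return self.free.pop()

    def release(self, slot):
        self.free.append(slot)
        self.outstanding -= 1

    def reclaimable(self):
        # The slab's memory can go back to the pool only when every
        # buffer has been returned.
        return self.outstanding == 0

# Workload shifts from 4k to 8k chunks: the 4k slab drains slowly.
slab_4k = Slab(4096)
slots = [slab_4k.allocate() for _ in range(16)]
for slot in slots[:-1]:       # the cache evicts all 4k buffers but one
    slab_4k.release(slot)
# One straggler pins the full 64 KiB even though 60 KiB of it is free.
print(slab_4k.reclaimable())  # False
```

Fixing a single buffer size for the whole cache, as described in the comment, sidesteps this by making every slab interchangeable.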
[jira] [Updated] (CASSANDRA-15672) Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec
[ https://issues.apache.org/jira/browse/CASSANDRA-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15672: - Since Version: 4.0-alpha Source Control Link: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commit;h=b4e640a96e76f8d4a45937b1312b64ddc1aeb8ac Resolution: Fixed Status: Resolved (was: Ready to Commit) CI link: https://jenkins-cm4.apache.org/view/patches/job/Cassandra-devbranch/16/ Committed as [b4e640a96e76f8d4a45937b1312b64ddc1aeb8ac|https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commit;h=b4e640a96e76f8d4a45937b1312b64ddc1aeb8ac] > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > - > > Key: CASSANDRA-15672 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15672 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Ekaterina Dimitrova >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > The following failure was observed: > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > [junit-timeout] > [junit-timeout] Testcase: > testMockedMessagingPrepareFailureP1(org.apache.cassandra.repair.consistent.CoordinatorMessagingTest): >FAILED > [junit-timeout] null > [junit-timeout] junit.framework.AssertionFailedError > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailure(CoordinatorMessagingTest.java:206) > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailureP1(CoordinatorMessagingTest.java:154) > [junit-timeout] > [junit-timeout] > Seen on Java8 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: 
commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15672) Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec
[ https://issues.apache.org/jira/browse/CASSANDRA-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15672: - Status: Ready to Commit (was: Review In Progress) > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > - > > Key: CASSANDRA-15672 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15672 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Ekaterina Dimitrova >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > The following failure was observed: > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > [junit-timeout] > [junit-timeout] Testcase: > testMockedMessagingPrepareFailureP1(org.apache.cassandra.repair.consistent.CoordinatorMessagingTest): >FAILED > [junit-timeout] null > [junit-timeout] junit.framework.AssertionFailedError > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailure(CoordinatorMessagingTest.java:206) > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailureP1(CoordinatorMessagingTest.java:154) > [junit-timeout] > [junit-timeout] > Seen on Java8 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14050) Many cqlsh_copy_tests are busted
[ https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17074829#comment-17074829 ] Stefania Alborghetti commented on CASSANDRA-14050: -- [~sasrira] are you still working on this? If not, [~Ge] and I would like to take over. We would like to fix these tests before merging CASSANDRA-15679. The reason for the failures is that _cqlsh_copy_tests.py_ links to _cqlshlib/formatting.py_. It needs this in order to apply the identical formatting used by cqlsh and determine if the data obtained via {{self.session.execute("SELECT * FROM testtuple")}} matches the data in the CSV files. Since cqlshlib on trunk supports both Python 3 and Python 2, the cqlsh copy tests work for trunk. But for older branches that only support Python 2, the tests no longer work. So to fix the tests we would need to make cqlshlib support both Python 2 and Python 3, at least as far as _formatting.py_. There is a problem with this approach though: this code is mostly tested via dtests, which only support Python 3, and therefore, how would we know if we broke anything for Python 2? Maybe we could run the dtests from before the migration to Python 3, hoping that they still work. Another approach would be to copy formatting.py into the dtests repo for the older branches, but this is quite ugly. Lastly, there is the option of removing the dependency on _formatting.py_. I think we could try replacing {{self.session.execute("SELECT * FROM testtuple")}} with the equivalent cqlsh command and see if that works. > Many cqlsh_copy_tests are busted > > > Key: CASSANDRA-14050 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14050 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Testing >Reporter: Michael Kjellman >Assignee: Sam Sriramadhesikan >Priority: Normal > > Many cqlsh_copy_tests are busted. 
We should disable the entire suite until > this is resolved as these tests are currently nothing but a waste of time. > test_bulk_round_trip_blogposts - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > test_bulk_round_trip_blogposts_with_max_connections - > cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > test_bulk_round_trip_default - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest > Error starting node3. > >> begin captured logging << > dtest: DEBUG: cluster ccm directory: /tmp/dtest-S9NfIH > dtest: DEBUG: Done setting configuration options: > { 'initial_token': None, > 'memtable_allocation_type': 'offheap_objects', > 'num_tokens': '256', > 'phi_convict_threshold': 5, > 'range_request_timeout_in_ms': 1, > 'read_request_timeout_in_ms': 1, > 'request_timeout_in_ms': 1, > 'truncate_request_timeout_in_ms': 1, > 'write_request_timeout_in_ms': 1} > - >> end captured logging << - > File "/usr/lib/python2.7/unittest/case.py", line 329, in run > testMethod() > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 2546, in test_bulk_round_trip_blogposts > stress_table='stresscql.blogposts') > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 2451, in _test_bulk_round_trip > self.prepare(nodes=nodes, partitioner=partitioner, > configuration_options=configuration_options) > File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", > line 115, in prepare > self.cluster.populate(nodes, > tokens=tokens).start(wait_for_binary_proto=True) > File > "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", > line 423, in start > raise NodeError("Error starting {0}.".format(node.name), p) > "Error starting node3.\n >> begin captured logging << > \ndtest: DEBUG: cluster ccm directory: > /tmp/dtest-S9NfIH\ndtest: DEBUG: Done setting configuration options:\n{ > 'initial_token': None,\n'memtable_allocation_type': 'offheap_objects',\n > 'num_tokens': '256',\n'phi_convict_threshold': 5,\n > 
'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': > 1,\n'request_timeout_in_ms': 1,\n > 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': > 1}\n- >> end captured logging << > -" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
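The last option in the comment above — replacing the driver call with the equivalent cqlsh command, so the output is already formatted by cqlsh and _formatting.py_ never needs to be imported — could look roughly like this. (A hypothetical sketch: the helper names are made up, and the real dtest fixture would wire in ccm node addresses and authentication; cqlsh's real {{-e}}/{{--execute}} flag is used.)

```python
import subprocess

def cqlsh_command(host, query, cqlsh_path="cqlsh"):
    """Build the argv for running a query through cqlsh itself, so the
    rows come back formatted exactly as cqlsh formats them and the test
    no longer needs cqlshlib's formatting.py. Hypothetical helper."""
    return [cqlsh_path, host, "-e", query]

def query_via_cqlsh(host, query):
    # check=True raises CalledProcessError if cqlsh exits non-zero;
    # stdout carries the already-formatted rows for comparison
    # against the CSV file.
    proc = subprocess.run(cqlsh_command(host, query),
                          capture_output=True, text=True, check=True)
    return proc.stdout
```

The trade-off noted in the comment still applies: cqlsh pages large result sets, so either paging must be disabled or the test must keep row counts modest.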
[jira] [Updated] (CASSANDRA-15672) Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec
[ https://issues.apache.org/jira/browse/CASSANDRA-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15672: - Status: Review In Progress (was: Changes Suggested) > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > - > > Key: CASSANDRA-15672 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15672 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Ekaterina Dimitrova >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > The following failure was observed: > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > [junit-timeout] > [junit-timeout] Testcase: > testMockedMessagingPrepareFailureP1(org.apache.cassandra.repair.consistent.CoordinatorMessagingTest): >FAILED > [junit-timeout] null > [junit-timeout] junit.framework.AssertionFailedError > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailure(CoordinatorMessagingTest.java:206) > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailureP1(CoordinatorMessagingTest.java:154) > [junit-timeout] > [junit-timeout] > Seen on Java8 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15679) cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: 'bytearray'"
[ https://issues.apache.org/jira/browse/CASSANDRA-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15679: - Reviewers: Stefania Alborghetti, Stefania Alborghetti (was: Stefania Alborghetti) Stefania Alborghetti, Stefania Alborghetti Status: Review In Progress (was: Patch Available) > cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: > 'bytearray'" > - > > Key: CASSANDRA-15679 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15679 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: Erick Ramirez >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 2.1.21, 2.2.17, 3.0.21, 3.11.7, 4.x > > Time Spent: 1h > Remaining Estimate: 0h > > h2. Background > A user was having issues loading CSV data with the {{COPY FROM}} command into > a {{map}} column with {{blob}} values. > h2. Replication steps > I can easily replicate the problem with this simple table: > {noformat} > CREATE TABLE community.blobmaptable ( > id text PRIMARY KEY, > blobmapcol map > ) > {noformat} > I have this CSV file that contains just 1 row: > {noformat} > $ cat blobmap.csv > c3,{3: 0x74776f} > {noformat} > And here's the error when I try to load it: > {noformat} > cqlsh:community> COPY blobmaptable (id, blobmapcol) FROM '~/blobmap.csv' ; > Using 1 child processes > Starting copy of community.blobmaptable with columns [id, blobmapcol]. > Failed to import 1 rows: ParseError - Failed to parse {3: 0x74776f} : > unhashable type: 'bytearray', given up without retries > Failed to process 1 rows; failed rows written to > import_community_blobmaptable.err > Processed: 1 rows; Rate: 2 rows/s; Avg. rate: 3 rows/s > 1 rows imported from 1 files in 0.389 seconds (0 skipped). > {noformat} > I've also logged > [PYTHON-1234|https://datastax-oss.atlassian.net/browse/PYTHON-1234] because I > wasn't sure if it was a Python driver issue. Cheers! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15679) cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: 'bytearray'"
[ https://issues.apache.org/jira/browse/CASSANDRA-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15679: - Test and Documentation Plan: This bug is triggered when we try to deserialize a blob inside a container, such as a list, set, or map. copyutil stores blobs as bytearrays. Python bytearrays are mutable and not hashable data structures. In Python 3 there is the bytes type, which is exactly what we need in this situation. On older versions that do not support Python 3, and in case we run cqlsh with Python 2 on 4.0, we should fall back to str. https://github.com/apache/cassandra/pull/506 https://github.com/apache/cassandra/pull/507 https://github.com/apache/cassandra/pull/508 https://github.com/apache/cassandra/pull/509 https://github.com/apache/cassandra/pull/510 Status: Patch Available (was: Open) > cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: > 'bytearray'" > - > > Key: CASSANDRA-15679 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15679 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: Erick Ramirez >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 2.1.21, 2.2.17, 3.0.21, 3.11.7, 4.x > > Time Spent: 1h > Remaining Estimate: 0h > > h2. Background > A user was having issues loading CSV data with the {{COPY FROM}} command into > a {{map}} column with {{blob}} values. > h2. Replication steps > I can easily replicate the problem with this simple table: > {noformat} > CREATE TABLE community.blobmaptable ( > id text PRIMARY KEY, > blobmapcol map > ) > {noformat} > I have this CSV file that contains just 1 row: > {noformat} > $ cat blobmap.csv > c3,{3: 0x74776f} > {noformat} > And here's the error when I try to load it: > {noformat} > cqlsh:community> COPY blobmaptable (id, blobmapcol) FROM '~/blobmap.csv' ; > Using 1 child processes > Starting copy of community.blobmaptable with columns [id, blobmapcol]. 
> Failed to import 1 rows: ParseError - Failed to parse {3: 0x74776f} : > unhashable type: 'bytearray', given up without retries > Failed to process 1 rows; failed rows written to > import_community_blobmaptable.err > Processed: 1 rows; Rate: 2 rows/s; Avg. rate: 3 rows/s > 1 rows imported from 1 files in 0.389 seconds (0 skipped). > {noformat} > I've also logged > [PYTHON-1234|https://datastax-oss.atlassian.net/browse/PYTHON-1234] because I > wasn't sure if it was a Python driver issue. Cheers! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
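The root cause described in the test plan above is easy to reproduce outside cqlsh: Python's mutable bytearray has no hash and cannot serve as a dict key or set member, while the immutable bytes type can. A minimal illustration (plain Python, independent of copyutil):

```python
# bytearray is mutable, so it does not implement __hash__ and cannot
# be used as a dict key -- the situation COPY FROM hits when a blob
# is a map key (0x74776f is the blob b"two" from the report above).
mutable = bytearray(b"\x74\x77\x6f")
try:
    {mutable: "value"}
except TypeError as e:
    print(e)  # unhashable type: 'bytearray'

# bytes is immutable and hashable, so the same mapping works.
immutable = bytes(mutable)
blob_map = {immutable: "value"}
print(blob_map[b"two"])  # value
```

This is why the fix converts container elements to bytes on Python 3 (str on Python 2) before they are used as map keys or set members.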
[jira] [Updated] (CASSANDRA-15679) cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: 'bytearray'"
[ https://issues.apache.org/jira/browse/CASSANDRA-15679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15679: - Bug Category: Parent values: Code(13163)Level 1 values: Bug - Unclear Impact(13164) Complexity: Normal Discovered By: User Report Severity: Low Status: Open (was: Triage Needed) > cqlsh COPY FROM of map of blobs fails with parse error "unhashable type: > 'bytearray'" > - > > Key: CASSANDRA-15679 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15679 > Project: Cassandra > Issue Type: Bug > Components: Tool/cqlsh >Reporter: Erick Ramirez >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 2.2.17, 3.0.21, 3.11.7, 4.x, 2.1.21 > > Time Spent: 1h > Remaining Estimate: 0h > > h2. Background > A user was having issues loading CSV data with the {{COPY FROM}} command into > a {{map}} column with {{blob}} values. > h2. Replication steps > I can easily replicate the problem with this simple table: > {noformat} > CREATE TABLE community.blobmaptable ( > id text PRIMARY KEY, > blobmapcol map > ) > {noformat} > I have this CSV file that contains just 1 row: > {noformat} > $ cat blobmap.csv > c3,{3: 0x74776f} > {noformat} > And here's the error when I try to load it: > {noformat} > cqlsh:community> COPY blobmaptable (id, blobmapcol) FROM '~/blobmap.csv' ; > Using 1 child processes > Starting copy of community.blobmaptable with columns [id, blobmapcol]. > Failed to import 1 rows: ParseError - Failed to parse {3: 0x74776f} : > unhashable type: 'bytearray', given up without retries > Failed to process 1 rows; failed rows written to > import_community_blobmaptable.err > Processed: 1 rows; Rate: 2 rows/s; Avg. rate: 3 rows/s > 1 rows imported from 1 files in 0.389 seconds (0 skipped). > {noformat} > I've also logged > [PYTHON-1234|https://datastax-oss.atlassian.net/browse/PYTHON-1234] because I > wasn't sure if it was a Python driver issue. Cheers! 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15672) Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec
[ https://issues.apache.org/jira/browse/CASSANDRA-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15672: - Status: Changes Suggested (was: Review In Progress) We can try to synchronize the test by using a latch when messages are intercepted. If this approach fails, we can fall back to relaxing the intermediate checks. > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > - > > Key: CASSANDRA-15672 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15672 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Ekaterina Dimitrova >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > The following failure was observed: > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > [junit-timeout] > [junit-timeout] Testcase: > testMockedMessagingPrepareFailureP1(org.apache.cassandra.repair.consistent.CoordinatorMessagingTest): >FAILED > [junit-timeout] null > [junit-timeout] junit.framework.AssertionFailedError > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailure(CoordinatorMessagingTest.java:206) > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailureP1(CoordinatorMessagingTest.java:154) > [junit-timeout] > [junit-timeout] > Seen on Java8 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
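The latch-based synchronization suggested in the review comment above can be sketched language-agnostically: the test blocks until the message interceptor signals that the message was actually observed, instead of asserting on a timer. (Python sketch with threading.Event standing in for Java's CountDownLatch; the class and method names are illustrative, not the CoordinatorMessagingTest code.)

```python
import threading

class InterceptingSink:
    """Stand-in for a message interceptor that a test can wait on."""
    def __init__(self):
        self.seen = []
        self._latch = threading.Event()  # roughly a CountDownLatch(1)

    def intercept(self, message):
        self.seen.append(message)
        self._latch.set()                # count down: message arrived

    def await_message(self, timeout=5.0):
        # The test blocks here instead of sleeping and hoping the
        # message has been delivered; returns False on timeout.
        return self._latch.wait(timeout)

sink = InterceptingSink()
worker = threading.Thread(target=sink.intercept, args=("PREPARE_MSG",))
worker.start()
assert sink.await_message(), "interceptor never saw the message"
worker.join()
print(sink.seen)  # ['PREPARE_MSG']
```

The fallback mentioned in the comment — relaxing the intermediate checks — gives up this determinism, so the latch approach is worth trying first.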
[jira] [Updated] (CASSANDRA-15672) Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec
[ https://issues.apache.org/jira/browse/CASSANDRA-15672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15672: - Reviewers: Stefania Alborghetti, Stefania Alborghetti (was: Stefania Alborghetti) Stefania Alborghetti, Stefania Alborghetti Status: Review In Progress (was: Patch Available) > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > - > > Key: CASSANDRA-15672 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15672 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: Ekaterina Dimitrova >Assignee: Aleksandr Sorokoumov >Priority: Normal > Labels: pull-request-available > Fix For: 4.0, 4.0-beta > > Time Spent: 10m > Remaining Estimate: 0h > > The following failure was observed: > Testsuite: org.apache.cassandra.repair.consistent.CoordinatorMessagingTest > Tests run: 7, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.878 sec > [junit-timeout] > [junit-timeout] Testcase: > testMockedMessagingPrepareFailureP1(org.apache.cassandra.repair.consistent.CoordinatorMessagingTest): >FAILED > [junit-timeout] null > [junit-timeout] junit.framework.AssertionFailedError > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailure(CoordinatorMessagingTest.java:206) > [junit-timeout] at > org.apache.cassandra.repair.consistent.CoordinatorMessagingTest.testMockedMessagingPrepareFailureP1(CoordinatorMessagingTest.java:154) > [junit-timeout] > [junit-timeout] > Seen on Java8 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072895#comment-17072895 ] Stefania Alborghetti edited comment on CASSANDRA-15229 at 4/1/20, 4:14 PM: --- We hit this buffer pool regression problem in our DSE fork a while ago. Because our chunk cache became much larger when it replaced the OS page cache, off-heap memory was growing significantly beyond the limits configured. This was partly due to some leaks, but the fragmentation in the current design of the buffer pool was a big part of it. This is how we solved it: - a bump-the-pointer slab approach for the transient pool, not too dissimilar from the current implementation. We then exploit our thread-per-core architecture: core threads get a dedicated slab each, other threads share a global slab. - a bitmap-based slab approach for the permanent pool, which is only used by the chunk cache. These slabs can only issue buffers of the same size; one bit is flipped in the bitmap for each buffer issued. When multiple buffers are requested, the slab tries to issue consecutive addresses, but this is not guaranteed since we want to avoid memory fragmentation. We have global lists of these slabs, sorted by buffer size where each size is a power-of-two. Slabs are taken out of these lists when they are full, and they are put back into circulation when they have space available. The lists are global, but core threads get a thread-local stash of buffers, i.e. they request multiple buffers at the same time in order to reduce contention on the global lists. We changed the chunk cache to always store buffers of the same size. If we need to read chunks of a different size, we use an array of buffers in the cache and we request multiple buffers at the same time. If we get consecutive addresses, we optimize for this case by building a single byte buffer over the first address. 
We also optimized the chunk cache to store memory addresses rather than byte buffers, which significantly reduced heap usage. The byte buffers are materialized on the fly. For the permanent case, we chose to constrain the size of the buffers in the cache so that memory in the pool could be fully used. This may or may not be what people prefer. Our choice was due to the large size of the cache, 20+ GB. An approach that allows some memory fragmentation may be sufficient for smaller cache sizes. Please let me know if there is interest in porting this solution to 4.0 or 4.x. I can share the code if needed. was (Author: stefania): We hit this buffer pool regression problem in our DSE fork a while ago. Because our chunk cache became much larger when it replaced the OS page cache, off-heap memory was growing significantly beyond the limits configured. This was partly due to some leaks, but the fragmentation in the current design of the buffer pool was a big part of it. This is how we solved it: - a bump-the-pointer slab approach for the transient pool, not too dissimilar from the current implementation. We then exploit our thread-per-core architecture: core threads each get a dedicated slab, while other threads share a global slab. - a bitmap-based slab approach for the permanent pool, which is only used by the chunk cache. These slabs can only issue buffers of the same size; one bit is flipped in the bitmap for each buffer issued. When multiple buffers are requested, the slab tries to issue consecutive addresses, but this is not guaranteed since we want to avoid memory fragmentation. We have global lists of these slabs, sorted by buffer size, where each size is a power of two. Slabs are taken out of these lists when they are full, and they are put back into circulation when they have space available. The lists are global, but core threads get a thread-local stash of buffers, i.e. 
they request multiple buffers at the same time in order to reduce contention on the global lists. We changed the chunk cache to always store chunks of the same size. If we need to read chunks of a different size, we use an array of buffers in the cache and we request multiple buffers at the same time. If we get consecutive addresses, we optimize for this case by building a single byte buffer over the first address. We also optimized the chunk cache to store memory addresses rather than byte buffers, which significantly reduced heap usage. The byte buffers are materialized on the fly. For the permanent case, we chose to constrain the size of the buffers in the cache so that memory in the pool could be fully used. This may or may not be what people prefer. Our choice was due to the large size of the cache, 20+ GB. An approach that allows some memory fragmentation may be sufficient for smaller cache sizes. Please let me know if there is interest in porting this solution to 4.0 or 4.x. I can share the code if needed.
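The bitmap-based slab described in the comment can be sketched roughly as follows. This is an illustrative reconstruction from the description, not DSE's actual code; all names (`BitmapSlab`, `allocate`, `free`) are hypothetical. Each slab serves buffers of one fixed size, flips one bit per buffer issued, and prefers consecutive slots for multi-buffer requests without guaranteeing them:

```java
import java.util.BitSet;

// Hypothetical sketch of a bitmap-based, fixed-buffer-size slab.
// One bit per slot: set = issued, clear = available.
final class BitmapSlab {
    private final int slotCount;
    private final BitSet used;

    BitmapSlab(int slotCount) {
        this.slotCount = slotCount;
        this.used = new BitSet(slotCount);
    }

    /** Allocate one slot; returns its index, or -1 if the slab is full. */
    int allocate() {
        int slot = used.nextClearBit(0);
        if (slot >= slotCount) return -1;
        used.set(slot);
        return slot;
    }

    /**
     * Try to allocate n consecutive slots (so a single contiguous buffer can
     * be built over them); fall back to scattered slots when no contiguous
     * run exists, as consecutive addresses are preferred but not guaranteed.
     */
    int[] allocate(int n) {
        // First pass: look for a contiguous run of n clear bits.
        for (int start = used.nextClearBit(0); start + n <= slotCount; start = used.nextClearBit(start + 1)) {
            int firstSet = used.nextSetBit(start);
            if (firstSet == -1 || firstSet >= start + n) {
                int[] slots = new int[n];
                for (int i = 0; i < n; i++) { used.set(start + i); slots[i] = start + i; }
                return slots;
            }
        }
        // Fallback: scattered slots, still one bit flipped per buffer issued.
        int[] slots = new int[n];
        for (int i = 0; i < n; i++) {
            slots[i] = allocate();
            if (slots[i] == -1) {                 // roll back on exhaustion
                for (int j = 0; j < i; j++) used.clear(slots[j]);
                return null;
            }
        }
        return slots;
    }

    void free(int slot) { used.clear(slot); }

    boolean isFull() { return used.cardinality() == slotCount; }
}
```

A slab like this would be taken off the global per-size list when `isFull()` turns true and put back into circulation after a `free()`, matching the lifecycle the comment describes.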
[jira] [Commented] (CASSANDRA-15229) BufferPool Regression
[ https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072895#comment-17072895 ] Stefania Alborghetti commented on CASSANDRA-15229: -- We hit this buffer pool regression problem in our DSE fork a while ago. Because our chunk cache became much larger when it replaced the OS page cache, off-heap memory was growing significantly beyond the limits configured. This was partly due to some leaks, but the fragmentation in the current design of the buffer pool was a big part of it. This is how we solved it: - a bump-the-pointer slab approach for the transient pool, not too dissimilar from the current implementation. We then exploit our thread-per-core architecture: core threads each get a dedicated slab, while other threads share a global slab. - a bitmap-based slab approach for the permanent pool, which is only used by the chunk cache. These slabs can only issue buffers of the same size; one bit is flipped in the bitmap for each buffer issued. When multiple buffers are requested, the slab tries to issue consecutive addresses, but this is not guaranteed since we want to avoid memory fragmentation. We have global lists of these slabs, sorted by buffer size, where each size is a power of two. Slabs are taken out of these lists when they are full, and they are put back into circulation when they have space available. The lists are global, but core threads get a thread-local stash of buffers, i.e. they request multiple buffers at the same time in order to reduce contention on the global lists. We changed the chunk cache to always store chunks of the same size. If we need to read chunks of a different size, we use an array of buffers in the cache and we request multiple buffers at the same time. If we get consecutive addresses, we optimize for this case by building a single byte buffer over the first address. 
We also optimized the chunk cache to store memory addresses rather than byte buffers, which significantly reduced heap usage. The byte buffers are materialized on the fly. For the permanent case, we made the choice of constraining the size of the buffers in the cache so that memory in the pool could be fully used. This may or may not be what people prefer. Our choice was due to the large size of the cache, 20+ GB. An approach that allows some memory fragmentation may be sufficient for smaller cache sizes. Please let me know if there is interest in porting this solution to 4.0 or 4.x. I can share the code if needed. > BufferPool Regression > - > > Key: CASSANDRA-15229 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15229 > Project: Cassandra > Issue Type: Bug > Components: Local/Caching >Reporter: Benedict Elliott Smith >Assignee: ZhaoYang >Priority: Normal > Fix For: 4.0, 4.0-beta > > > The BufferPool was never intended to be used for a {{ChunkCache}}, and we > need to either change our behaviour to handle uncorrelated lifetimes or use > something else. This is particularly important with the default chunk size > for compressed sstables being reduced. If we address the problem, we should > also utilise the BufferPool for native transport connections like we do for > internode messaging, and reduce the number of pooling solutions we employ. > Probably the best thing to do is to improve BufferPool’s behaviour when used > for things with uncorrelated lifetimes, which essentially boils down to > tracking those chunks that have not been freed and re-circulating them when > we run out of completely free blocks. We should probably also permit > instantiating separate {{BufferPool}}, so that we can insulate internode > messaging from the {{ChunkCache}}, or at least have separate memory bounds > for each, and only share fully-freed chunks. 
> With these improvements we can also safely increase the {{BufferPool}} chunk > size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce > the amount of global coordination and per-allocation overhead. We don’t need > 1KiB granularity for allocations, nor 16 byte granularity for tiny > allocations. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
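The ticket's core proposal, tracking chunks that still have outstanding allocations and re-circulating them only when no completely free chunk remains, could look roughly like this. A minimal sketch under that stated assumption; all names (`RecirculatingPool`, `Chunk`) are illustrative and not the actual `BufferPool` API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: chunks with live allocations are parked on a
// "partially used" queue and handed out again only when the pool runs
// out of completely free chunks.
final class RecirculatingPool {
    static final class Chunk {
        final int capacity;
        int allocated;      // buffers currently handed out from this chunk
        int freeSpace;      // remaining unallocated space
        Chunk(int capacity) { this.capacity = capacity; this.freeSpace = capacity; }
        boolean fullyFree() { return allocated == 0 && freeSpace == capacity; }
    }

    private final Deque<Chunk> fullyFree = new ArrayDeque<>();
    private final Deque<Chunk> partiallyUsed = new ArrayDeque<>();

    RecirculatingPool(int chunks, int chunkSize) {
        for (int i = 0; i < chunks; i++) fullyFree.add(new Chunk(chunkSize));
    }

    /** Prefer a fully free chunk; recirculate a partially used one only
     *  when none is left, as the ticket proposes. */
    Chunk takeChunk() {
        Chunk c = fullyFree.poll();
        if (c == null) c = partiallyUsed.poll();
        return c;
    }

    /** Return a chunk to the pool, sorting it by whether it is fully free. */
    void release(Chunk c) {
        if (c.fullyFree()) fullyFree.add(c);
        else partiallyUsed.add(c);
    }
}
```

Separate pool instances of this shape would also give internode messaging and the `ChunkCache` independent memory bounds, as the ticket suggests.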
[jira] [Updated] (CASSANDRA-15409) nodetool compactionstats showing extra pending task for TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15409: - Fix Version/s: (was: 4.x) 4.0 Since Version: 3.0.8 Source Control Link: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=122cf57f1134704769cce9daddf882c3ea578905 Resolution: Fixed Status: Resolved (was: Ready to Commit) > nodetool compactionstats showing extra pending task for TWCS > > > Key: CASSANDRA-15409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15409 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.6, 4.0 > > > Summary: nodetool compactionstats showing extra pending task for TWCS > - > The output of {{nodetool compactionstats}} can show "pending tasks: 1" when > there are actually none. This seems to be a consistent problem in testing C* > trunk. In my testing, it looks like the {{nodetool compactionstats}} counter > output is consistently off by 1 as compared to the table output of the tasks > > testing with {{concurrent_compactors: 8}} > In 12 hours it never ended, always showing 1 pending job > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15409) nodetool compactionstats showing extra pending task for TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15409: - Status: Ready to Commit (was: Review In Progress) > nodetool compactionstats showing extra pending task for TWCS > > > Key: CASSANDRA-15409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15409 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.6, 4.x > > > Summary: nodetool compactionstats showing extra pending task for TWCS > - > The output of {{nodetool compactionstats}} can show "pending tasks: 1" when > there are actually none. This seems to be a consistent problem in testing C* > trunk. In my testing, it looks like the {{nodetool compactionstats}} counter > output is consistently off by 1 as compared to the table output of the tasks > > testing with {{concurrent_compactors: 8}} > In 12 hours it never ended, always showing 1 pending job > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15409) nodetool compactionstats showing extra pending task for TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993665#comment-16993665 ] Stefania Alborghetti commented on CASSANDRA-15409: -- CI looked good on our infra, so committed to 3.11 as [122cf57f1134704769cce9daddf882c3ea578905|https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commit;h=122cf57f1134704769cce9daddf882c3ea578905] and merged into trunk. > nodetool compactionstats showing extra pending task for TWCS > > > Key: CASSANDRA-15409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15409 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.6, 4.x > > > Summary: nodetool compactionstats showing extra pending task for TWCS > - > The output of {{nodetool compactionstats}} can show "pending tasks: 1" when > there are actually none. This seems to be a consistent problem in testing C* > trunk. In my testing, it looks like the {{nodetool compactionstats}} counter > output is consistently off by 1 as compared to the table output of the tasks > > testing with {{concurrent_compactors: 8}} > In 12 hours it never ended, always showing 1 pending job > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15409) nodetool compactionstats showing extra pending task for TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993665#comment-16993665 ] Stefania Alborghetti edited comment on CASSANDRA-15409 at 12/11/19 4:05 PM: CI looked good on our infra, so committed to 3.11 as [122cf57f1134704769cce9daddf882c3ea578905|https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=122cf57f1134704769cce9daddf882c3ea578905] and merged into trunk. was (Author: stefania): CI looked good on our infra, so committed to 3.11 as [122cf57f1134704769cce9daddf882c3ea578905|https://gitbox.apache.org/repos/asf?p=cassandra.git;a=commit;h=122cf57f1134704769cce9daddf882c3ea578905] and merged into trunk. > nodetool compactionstats showing extra pending task for TWCS > > > Key: CASSANDRA-15409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15409 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.6, 4.x > > > Summary: nodetool compactionstats showing extra pending task for TWCS > - > The output of {{nodetool compactionstats}} can show "pending tasks: 1" when > there are actually none. This seems to be a consistent problem in testing C* > trunk. In my testing, it looks like the {{nodetool compactionstats}} counter > output is consistently off by 1 as compared to the table output of the tasks > > testing with {{concurrent_compactors: 8}} > In 12 hours it never ended, always showing 1 pending job > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15409) nodetool compactionstats showing extra pending task for TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975175#comment-16975175 ] Stefania Alborghetti commented on CASSANDRA-15409: -- Patch LGTM. Moving forward, it's perhaps easier to post the direct links to the branches: [CASSANDRA-15409-3.11|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15409-3.11] [CASSANDRA-15409-trunk|https://github.com/ekaterinadimitrova2/cassandra/tree/CASSANDRA-15409-trunk] Also, before merging, we need to post the CI results. > nodetool compactionstats showing extra pending task for TWCS > > > Key: CASSANDRA-15409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15409 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.6, 4.x > > > Summary: nodetool compactionstats showing extra pending task for TWCS > - > The output of {{nodetool compactionstats}} can show "pending tasks: 1" when > there are actually none. This seems to be a consistent problem in testing C* > trunk. In my testing, it looks like the {{nodetool compactionstats}} counter > output is consistently off by 1 as compared to the table output of the tasks > > testing with {{concurrent_compactors: 8}} > In 12 hours it never ended, always showing 1 pending job > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15409) nodetool compactionstats showing extra pending task for TWCS
[ https://issues.apache.org/jira/browse/CASSANDRA-15409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-15409: - Reviewers: Stefania Alborghetti (was: Stefania Alborghetti) Status: Review In Progress (was: Patch Available) > nodetool compactionstats showing extra pending task for TWCS > > > Key: CASSANDRA-15409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15409 > Project: Cassandra > Issue Type: Bug > Components: Tool/nodetool >Reporter: Ekaterina Dimitrova >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 3.11.6, 4.x > > > Summary: nodetool compactionstats showing extra pending task for TWCS > - > The output of {{nodetool compactionstats}} can show "pending tasks: 1" when > there are actually none. This seems to be a consistent problem in testing C* > trunk. In my testing, it looks like the {{nodetool compactionstats}} counter > output is consistently off by 1 as compared to the table output of the tasks > > testing with {{concurrent_compactors: 8}} > In 12 hours it never ended, always showing 1 pending job > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-9259) Bulk Reading from Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti reassigned CASSANDRA-9259: --- Assignee: (was: Stefania Alborghetti) > Bulk Reading from Cassandra > --- > > Key: CASSANDRA-9259 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9259 > Project: Cassandra > Issue Type: New Feature > Components: Legacy/CQL, Legacy/Local Write-Read Paths, > Legacy/Streaming and Messaging, Legacy/Testing, Local/Compaction >Reporter: Brian Hess >Priority: Urgent > Fix For: 4.x > > Attachments: 256_vnodes.jpg, before_after.jpg, > bulk-read-benchmark.1.html, bulk-read-jfr-profiles.1.tar.gz, > bulk-read-jfr-profiles.2.tar.gz, no_vnodes.jpg, spark_benchmark_raw_data.zip > > > This ticket is following on from the 2015 NGCC. This ticket is designed to be > a place for discussing and designing an approach to bulk reading. > The goal is to have a bulk reading path for Cassandra. That is, a path > optimized to grab a large portion of the data for a table (potentially all of > it). This is a core element in the Spark integration with Cassandra, and the > speed at which Cassandra can deliver bulk data to Spark is limiting the > performance of Spark-plus-Cassandra operations. This is especially of > importance as Cassandra will (likely) leverage Spark for internal operations > (for example CASSANDRA-8234). > The core CQL to consider is the following: > SELECT a, b, c FROM myKs.myTable WHERE Token(partitionKey) > X AND > Token(partitionKey) <= Y > There are a few approaches that could be considered. First, we consider a new > "Streaming Compaction" approach. The key observation here is that a bulk read > from Cassandra is a lot like a major compaction, though instead of outputting > a new SSTable we would output CQL rows to a stream/socket/etc. This would be > similar to a CompactionTask, but would strip out some unnecessary things in > there (e.g., some of the indexing, etc). 
Predicates and projections could > also be encapsulated in this new "StreamingCompactionTask", for example. > Here, we choose X and Y to be contained within one token range (perhaps > considering the primary range of a node without vnodes, for example). This > query pushes 50K-100K rows/sec, which is not very fast if we are doing bulk > operations via Spark (or other processing frameworks - ETL, etc). There are a > few causes (e.g., inefficient paging). > > Another approach would be an alternate storage format. For example, we might > employ Parquet (just as an example) to store the same data as in the primary > Cassandra storage (aka SSTables). This is akin to Global Indexes (an > alternate storage of the same data optimized for a particular query). Then, > Cassandra can choose to leverage this alternate storage for particular CQL > queries (e.g., range scans). > These are just 2 suggestions to get the conversation going. > One thing to note is that it will be useful to have this storage segregated > by token range so that when you extract via these mechanisms you do not get > replications-factor numbers of copies of the data. That will certainly be an > issue for some Spark operations (e.g., counting). Thus, we will want > per-token-range storage (even for single disks), so this will likely leverage > CASSANDRA-6696 (though, we'll want to also consider the single disk case). > It is also worth discussing what the success criteria is here. It is unlikely > to be as fast as EDW or HDFS performance (though, that is still a good goal), > but being within some percentage of that performance should be set as > success. For example, 2x as long as doing bulk operations on HDFS with > similar node count/size/etc. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
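Bulk readers typically drive the token-range query quoted in the ticket by splitting the ring into sub-ranges and scanning them in parallel. A minimal sketch, assuming the Murmur3Partitioner token domain of [-2^63, 2^63 - 1]; the table and column names are taken from the example query, and the splitter class itself is hypothetical:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: carve the Murmur3 token ring into N sub-ranges and
// emit one "Token(pk) > X AND Token(pk) <= Y" query per sub-range.
// (A real splitter would also account for the ring's wraparound and for
// per-node primary ranges; this ignores both for simplicity.)
final class TokenRangeSplitter {
    // Murmur3Partitioner tokens span the full signed 64-bit range.
    static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    static List<String> splitQueries(String table, String pk, int splits) {
        BigInteger span = MAX.subtract(MIN);   // 2^64 - 1 token values
        List<String> queries = new ArrayList<>();
        BigInteger lower = MIN;
        for (int i = 1; i <= splits; i++) {
            // Last split ends exactly at MAX to avoid rounding gaps.
            BigInteger upper = (i == splits)
                ? MAX
                : MIN.add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(splits)));
            queries.add(String.format(
                "SELECT a, b, c FROM %s WHERE Token(%s) > %s AND Token(%s) <= %s",
                table, pk, lower, pk, upper));
            lower = upper;  // ranges are half-open: (lower, upper]
        }
        return queries;
    }
}
```

Aligning these sub-ranges with per-token-range storage, as the ticket suggests, is what lets each scan read exactly one replica's copy of the data.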
[jira] [Updated] (CASSANDRA-11520) Implement optimized local read path for CL.ONE
[ https://issues.apache.org/jira/browse/CASSANDRA-11520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-11520: - Resolution: Won't Fix Status: Resolved (was: Open) > Implement optimized local read path for CL.ONE > -- > > Key: CASSANDRA-11520 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11520 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/CQL, Legacy/Local Write-Read Paths >Reporter: Stefania Alborghetti >Assignee: Stefania Alborghetti >Priority: Normal > Fix For: 4.x > > > -Add an option to the CQL SELECT statement to- Bypass the coordination layer > when reading locally at CL.ONE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-11521: - Fix Version/s: (was: 4.x) > Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/Local Write-Read Paths >Reporter: Stefania Alborghetti >Assignee: Stefania Alborghetti >Priority: Normal > Labels: protocolv5 > Attachments: final-patch-jfr-profiles-1.zip > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-11521: - Status: Open (was: Patch Available) > Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/Local Write-Read Paths >Reporter: Stefania Alborghetti >Assignee: Stefania Alborghetti >Priority: Normal > Labels: protocolv5 > Fix For: 4.x > > Attachments: final-patch-jfr-profiles-1.zip > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-11521) Implement streaming for bulk read requests
[ https://issues.apache.org/jira/browse/CASSANDRA-11521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania Alborghetti updated CASSANDRA-11521: - Resolution: Won't Fix Status: Resolved (was: Open) > Implement streaming for bulk read requests > -- > > Key: CASSANDRA-11521 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11521 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/Local Write-Read Paths >Reporter: Stefania Alborghetti >Assignee: Stefania Alborghetti >Priority: Normal > Labels: protocolv5 > Fix For: 4.x > > Attachments: final-patch-jfr-profiles-1.zip > > > Allow clients to stream data from a C* host, bypassing the coordination layer > and eliminating the need to query individual pages one by one. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org