[jira] [Commented] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613210#comment-14613210 ] Sam Tunnicliffe commented on CASSANDRA-9694:
---------------------------------------------
Thanks, but that log shows no errors. The auth upgrade process happens as expected: the new tables are created and a conversion is attempted (at 12:59:33,194), which fails in the anticipated way. However, the log also shows no client requests being made to the node once it started up, so even if there were a problem with authentication or permissions, it wouldn't be triggered. Can you restart that node and direct some traffic to it, please? A single connection with cqlsh should be enough to see if there's any problem.

system_auth not upgraded
------------------------
Key: CASSANDRA-9694
URL: https://issues.apache.org/jira/browse/CASSANDRA-9694
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Windows 7 32-bit, 3.2GB RAM, Java 1.7.0_55
Reporter: Andreas Schnitzerling
Assignee: Sam Tunnicliffe
Fix For: 2.2.0 rc2
Attachments: 9694.txt, system.log.1.zip, system.log.2.zip, system_exception.log

After upgrading, authorization exceptions occur. I checked the system_auth keyspace and saw that the tables users, credentials and permissions were not upgraded automatically. I upgraded them manually (I needed two passes per table because of CASSANDRA-9566). After upgrading the system_auth tables I could log in via cql using different users.

{code:title=system.log}
WARN  [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #<User updateprog> for <keyspace logdata>
ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message.
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
	at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
	at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at
{code}
[jira] [Created] (CASSANDRA-9725) CQL docs do not build due to duplicate name
Christopher Batey created CASSANDRA-9725:
--------------------------------------------
Summary: CQL docs do not build due to duplicate name
Key: CASSANDRA-9725
URL: https://issues.apache.org/jira/browse/CASSANDRA-9725
Project: Cassandra
Issue Type: Bug
Components: Documentation & website
Reporter: Christopher Batey

Fix on branch broken-cql-docs in g...@github.com:chbatey/cassandra-1.git
[jira] [Commented] (CASSANDRA-9647) Tables created by cassandra-stress are omitted in DESCRIBE KEYSPACE
[ https://issues.apache.org/jira/browse/CASSANDRA-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613248#comment-14613248 ] Tyler Hobbs commented on CASSANDRA-9647:
-----------------------------------------
Pending test runs:
* [2.1 testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.1-testall/]
* [2.1 dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.1-dtest/]
* [2.2 testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.2-testall/]
* [2.2 dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.2-dtest/]

Tables created by cassandra-stress are omitted in DESCRIBE KEYSPACE
-------------------------------------------------------------------
Key: CASSANDRA-9647
URL: https://issues.apache.org/jira/browse/CASSANDRA-9647
Project: Cassandra
Issue Type: Bug
Reporter: Ryan McGuire
Assignee: Tyler Hobbs
Priority: Minor
Labels: cqlsh, stress
Fix For: 2.2.0 rc2

CASSANDRA-9374 modified cassandra-stress to only use CQL for creating its schema. This seems to work, as I'm testing on a cluster with start_rpc: false. However, when I try to run a DESCRIBE on the schema, it omits the tables, complaining that they were created with a legacy API:

{code}
cqlsh> DESCRIBE KEYSPACE keyspace1 ;

CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

/* Warning: Table keyspace1.counter1 omitted because it has constructs not compatible with CQL (was created via legacy API).
Approximate structure, for reference:
(this should not be used to reproduce this schema)

CREATE TABLE keyspace1.counter1 (
    key blob PRIMARY KEY,
    "C0" counter,
    "C1" counter,
    "C2" counter,
    "C3" counter,
    "C4" counter
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
*/

/* Warning: Table keyspace1.standard1 omitted because it has constructs not compatible with CQL (was created via legacy API).
Approximate structure, for reference:
(this should not be used to reproduce this schema)

CREATE TABLE keyspace1.standard1 (
    key blob PRIMARY KEY,
    "C0" blob,
    "C1" blob,
    "C2" blob,
    "C3" blob,
    "C4" blob
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
*/

cqlsh>
{code}

Note that it attempts to describe them anyway, but they are commented out and shouldn't be used to restore from. [This is the ccm workflow I used to test this|https://gist.githubusercontent.com/EnigmaCurry/e779055c8debf6de8ef9/raw/a894e99725b6df599f3ce1db5012dd6d069b1339/gistfile1.txt]
[jira] [Created] (CASSANDRA-9726) Built in aggregate docs do not display examples due to whitespace error
Christopher Batey created CASSANDRA-9726:
--------------------------------------------
Summary: Built-in aggregate docs do not display examples due to whitespace error
Key: CASSANDRA-9726
URL: https://issues.apache.org/jira/browse/CASSANDRA-9726
Project: Cassandra
Issue Type: Bug
Components: Documentation & website
Reporter: Christopher Batey

Fix on branch aggregate-docs at https://github.com/chbatey/cassandra-1.git
[jira] [Updated] (CASSANDRA-9556) Add newer data types to cassandra stress (e.g. decimal, dates, UDTs)
[ https://issues.apache.org/jira/browse/CASSANDRA-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9556:
---------------------------------------
Assignee: ZhaoYang
Reviewer: Benjamin Lerer  (was: Jeremy Hanna)

[~blerer] to review

Add newer data types to cassandra stress (e.g. decimal, dates, UDTs)
---------------------------------------------------------------------
Key: CASSANDRA-9556
URL: https://issues.apache.org/jira/browse/CASSANDRA-9556
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Jeremy Hanna
Assignee: ZhaoYang
Labels: stress
Attachments: cassandra-2.1-9556.txt, trunk-9556.txt

Currently you can't define a data model with decimal types and use cassandra-stress with it. I imagine that also holds true for other newer data types, such as the new date and time types. Besides that, now that data models are including user-defined types, we should allow users to create those structures with stress as well. Perhaps we could split the UDTs out into a different ticket if it holds the other types up.
[jira] [Updated] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-9591:
---------------------------------
Labels: benedict-to-commit sstablescrub  (was: sstablescrub)

Scrub (recover) sstables even when -Index.db is missing
-------------------------------------------------------
Key: CASSANDRA-9591
URL: https://issues.apache.org/jira/browse/CASSANDRA-9591
Project: Cassandra
Issue Type: Improvement
Reporter: mck
Assignee: mck
Labels: benedict-to-commit, sstablescrub
Fix For: 2.0.x
Attachments: 9591-2.0.txt, 9591-2.1.txt

Today SSTableReader needs at minimum 3 files to load an sstable:
- -Data.db
- -CompressionInfo.db
- -Index.db

But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption in the -Data.db and we want to be able to skip over corrupted rows. Given that there is still a fair chance that there's nothing wrong with the -Data.db file and we're just missing the -Index.db file, this patch addresses that situation.

So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover sstables despite missing -Index.db files. This can happen after a catastrophic incident where data directories have been lost and/or corrupted, or wiped and the backup not healthy. I'm aware that normally one depends on replicas or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild.

I have not tested this patch against normal C* operations and all the other (more critical) ways SSTableReader is used. I'll happily do that and add the needed unit tests if people see merit in accepting the patch. Otherwise the patch can live with the issue, in case anyone else needs it.

There's also a cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz] to make life a little easier for anyone finding themselves in such a bad situation.
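[Editor's note] As an illustration of what the patch relaxes - a hedged sketch only, with hypothetical names, not the actual 9591 code - the component check for an offline scrub might look like:

{code:java}
import java.util.Set;

public final class ScrubComponents
{
    // Hypothetical helper: which file components must be present to attempt
    // a scrub. -Data.db (and -CompressionInfo.db for compressed tables) stay
    // mandatory; the primary index is only needed to skip past corrupted
    // rows, so its absence is tolerated here.
    public static boolean canAttemptScrub(Set<String> presentSuffixes, boolean compressed)
    {
        if (!presentSuffixes.contains("Data.db"))
            return false;
        return !compressed || presentSuffixes.contains("CompressionInfo.db");
    }
}
{code}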
[jira] [Commented] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613019#comment-14613019 ] Robert Stupp commented on CASSANDRA-9723:
------------------------------------------
Thanks for pointing this out! I've always had that in mind, but unfortunately I completely missed opening a JIRA for it.

UDF / UDA execution time in trace
---------------------------------
Key: CASSANDRA-9723
URL: https://issues.apache.org/jira/browse/CASSANDRA-9723
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Minor

I'd like to see how long my UDF/UDAs take in the trace. I checked in 2.2 rc1 and it doesn't appear to be mentioned.
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612937#comment-14612937 ] Stefania commented on CASSANDRA-7392:
--------------------------------------
[~slebresne], I would like to know how best to abort a read operation. I think we have several options:

- Add an abort-requested control field (OpState in the code [here|https://github.com/stef1927/cassandra/commits/7392]) to ReadCommand, but this means changing the constructor chain, which is very complex (the index read command needs to inherit the control field of the main command - unless we make it not final)
- Pass it as an argument to the various methods (executeLocally, queryStorage, search, etc.)
- Add it to ReadOrderGroup, which is passed almost everywhere but isn't really related
- Simply stop the iterator in executeLocally with a wrapper iterator (this is required anyway, I believe, but it would not abort the index reads); a rough sketch of such a wrapper appears after this message

Could you comment on:

- your preferred option - I don't want to spoil your design :)
- whether the code is stable or whether there is some refactoring still missing that I should wait for or help out with

Abort in-progress queries that time out
---------------------------------------
Key: CASSANDRA-7392
URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jonathan Ellis
Assignee: Stefania
Fix For: 3.x

Currently we drop queries that time out before we get to them (because the node is overloaded) but not queries that time out while being processed. (This is particularly common for index queries on data that shouldn't be indexed.) Adding the latter, and logging when we have to interrupt one, gets us a poor man's slow query log for free.
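[Editor's note] To make the wrapper-iterator option concrete, here is a rough sketch (my illustration, not code from the linked branch): it assumes a plain {{java.util.Iterator}} and a relative timeout, where the real thing would wrap Cassandra's partition iterators.

{code:java}
import java.util.Iterator;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: stops iteration once a deadline passes, so a read
// that has already timed out for the client stops consuming resources.
final class AbortingIterator<T> implements Iterator<T>
{
    private final Iterator<T> delegate;
    private final long deadlineNanos;

    AbortingIterator(Iterator<T> delegate, long timeoutNanos)
    {
        this.delegate = delegate;
        this.deadlineNanos = System.nanoTime() + timeoutNanos;
    }

    public boolean hasNext()
    {
        // Checked on every step, so a long scan is interrupted promptly.
        if (System.nanoTime() > deadlineNanos)
            throw new RuntimeException(new TimeoutException("read aborted: timeout exceeded"));
        return delegate.hasNext();
    }

    public T next()
    {
        return delegate.next();
    }
}
{code}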
[jira] [Updated] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-9723:
-------------------------------------
Fix Version/s: 2.2.x
[jira] [Created] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
Christopher Batey created CASSANDRA-9724:
--------------------------------------------
Summary: UDA appears to be causing query to be executed multiple times
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Priority: Critical

Not sure if this is intended behaviour. Example table:

{quote}
CREATE TABLE raw_weather_data (
    wsid text,               // Composite of Air Force Datsav3 station number and NCDC WBAN number
    year int,                // Year collected
    month int,               // Month collected
    day int,                 // Day collected
    hour int,                // Hour collected
    temperature double,      // Air temperature (degrees Celsius)
    dewpoint double,         // Dew point temperature (degrees Celsius)
    pressure double,         // Sea level pressure (hectopascals)
    wind_direction int,      // Wind direction in degrees. 0-359
    wind_speed double,       // Wind speed (meters per second)
    sky_condition int,       // Total cloud cover (coded, see format documentation)
    sky_condition_text text, // Non-coded sky conditions
    one_hour_precip double,  // One-hour accumulated liquid precipitation (millimeters)
    six_hour_precip double,  // Six-hour accumulated liquid precipitation (millimeters)
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
{quote}

1-node cluster, 2.2 rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008;

{quote}
activity | timestamp | source | source_elapsed
---------+-----------+--------+---------------
Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0
Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 109
Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 193
Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 519
Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 544
Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 558
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 600
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 612
Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 | 848
Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680
{quote}

However, once I include the min function, for select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; I get:

{quote}
activity | timestamp | source | source_elapsed
---------+-----------+--------+---------------
Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 0
Parsing select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 108
Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 201
Executing single-partition
{quote}
[jira] [Updated] (CASSANDRA-9715) Secondary index out of sync
[ https://issues.apache.org/jira/browse/CASSANDRA-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hazel Bobrins updated CASSANDRA-9715:
--------------------------------------
Environment: RHEL 6.2 2.6.32-220.13.1.el6.x86_64 / Java 1.7.0_76  (was: RHEL 6.2 2.6.32-220.13.1.el6.x86_64)

Secondary index out of sync
---------------------------
Key: CASSANDRA-9715
URL: https://issues.apache.org/jira/browse/CASSANDRA-9715
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: RHEL 6.2 2.6.32-220.13.1.el6.x86_64 / Java 1.7.0_76
Reporter: Hazel Bobrins

On 2.0.15 (we moved from 2.0.8 hoping this problem would go away) we are seeing intermittent issues where a secondary index is getting out of sync.

The setup is a 6-node cluster with 3 data centers, two nodes in each, and an RF of 2 in each data centre. So far I have been unable to reproduce this synthetically, but we have seen multiple instances across all nodes within the cluster.

The data set is very small: ~40K keys and 100MB of data. We add maybe 1000 records a day, delete ~500 and update ~200 - not a very write-heavy system. Reads we can push out to ~2000/sec. Writes are done at CL ALL and reads at ONE.

All examples so far have been triggered when a record has been deleted and another then added with the same index cardinality; I think it has also always been the last record in the set which was deleted before the addition.

On a flushed keyspace, an sstable2json export of the primary index shows all records correctly; however, an export of the secondary index is missing the records.

- nodetool rebuild_index does not resolve the problem
- Neither does a compact or repair
- A select on the primary key at CL ALL also has no impact
- However, a select at CL ALL on the secondary index does resolve the problem

There is currently a non-critical record which is out of the index on one of our nodes. If another key is added with the same index cardinality, it is added to the index correctly. If this is then removed, it once again returns empty.

We have checked all the obvious OS bits and confirmed our time sync (ntp based). At DEBUG level we see nothing obvious wrong when adding/removing keys to the above broken entry. Due to the very intermittent nature of this problem it has been impossible so far to gather any DEBUG logs of it failing; we have also been unsuccessful so far in reproducing this in our QA.

I know this is not much to go on. If there is anything we can provide to help narrow down what might be the issue, please let me know and we'll provide it asap.
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613012#comment-14613012 ] Sylvain Lebresne commented on CASSANDRA-7392:
----------------------------------------------
At least for range and single-partition slice queries, aborting in {{queryMemtableAndDiskInternal}} won't buy us much: building the iterator won't take much time at all; it's the reading of the iterator that may take time. So we do at least need the ability to abort the iterator reading, and for that, wrapping the result iterator in {{executeLocally}} as you said sounds to me like the best/simplest option.

That leaves single-partition names queries, for which the work is indeed done in {{queryMemtableAndDiskInternal}}. For that, I would avoid adding it as a field of {{ReadCommand}}, as aborting is more a property of the execution than of the command itself. Maybe we could add it to {{ReadOrderGroup}} but rename that class to something more generic (maybe {{ExecutionController}}?), so it doesn't feel out of place, and that could be a convenient place to add more stuff in the future. I'll remark however that for names queries, the proper way to protect against long queries is also to wrap the iterators read inside {{queryMemtableAndDiskInternal}}. Only checking for aborting at the beginning of handling each memtable/sstable (like in the patch you've linked) is probably not fine-grained enough (in the sense that a names query is likely to only take a long time if lots of names are queried, and if that's the case, reading a single sstable could take quite some time).

bq. but it would not abort the index reads

It would actually, in the sense that we don't query the index fully upfront; we do it on-demand when the main iterator requires more data.

bq. whether the code is stable or whether there is some refactoring still missing that I should wait for

As far as I'm concerned, the only missing refactoring is CASSANDRA-9705, and that will almost surely not affect any of the code you will touch in this ticket, so you're clear :)
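[Editor's note] To make the {{ExecutionController}} suggestion above concrete, a minimal sketch follows; the class shape, names, and abort semantics here are assumptions for illustration, not the eventual API:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical ExecutionController (the proposed rename of ReadOrderGroup):
// code executing the read consults it, and the flag is flipped when the
// query exceeds its timeout.
final class ExecutionController
{
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    void abort()
    {
        aborted.set(true);
    }

    // Called periodically from iterator wrappers / per-sstable loops, giving
    // the fine-grained abort points discussed above.
    void checkForAbort()
    {
        if (aborted.get())
            throw new IllegalStateException("query aborted: timed out during execution");
    }
}
{code}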
[jira] [Created] (CASSANDRA-9723) UDF / UDA execution time in trace
Christopher Batey created CASSANDRA-9723:
--------------------------------------------
Summary: UDF / UDA execution time in trace
Key: CASSANDRA-9723
URL: https://issues.apache.org/jira/browse/CASSANDRA-9723
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Christopher Batey
Priority: Minor

I'd like to see how long my UDF/UDAs take in the trace. I checked in 2.2 rc1 and it doesn't appear to be mentioned.
[jira] [Assigned] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reassigned CASSANDRA-9723:
----------------------------------------
Assignee: Robert Stupp
[jira] [Comment Edited] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613019#comment-14613019 ] Robert Stupp edited comment on CASSANDRA-9723 at 7/3/15 8:58 AM:
------------------------------------------------------------------
Thanks for pointing this out! I've always had that in mind, but unfortunately I completely missed opening a JIRA for it.

EDIT: should be a trivial patch - it may make it into 2.2.0 rc2.

was (Author: snazy):
Thanks for pointing this out! I've always had that in mind, but unfortunately I completely missed opening a JIRA for it.
[jira] [Commented] (CASSANDRA-9683) Get much higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612874#comment-14612874 ] Loic Lambiel commented on CASSANDRA-9683:
------------------------------------------
Yes, it is correct.

Get much higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
---------------------------------------------------------------------------------
Key: CASSANDRA-9683
URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
Project: Cassandra
Issue Type: Bug
Environment: Ubuntu 12.04 (3.13 kernel) * 3, JDK: Oracle JDK 7, RAM: 32GB, Cores: 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
Fix For: 2.1.x
Attachments: cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png

After upgrading our cassandra staging cluster from version 2.1.6 to 2.1.7, the average load grew from 0.1-0.3 to 1.8. Latencies increased as well. We see an increase of pending compactions, probably due to CASSANDRA-9592. This cluster has almost no workload (staging environment).
[jira] [Updated] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-7066:
---------------------------------
Labels: benedict-to-commit compaction  (was: compaction)

Simplify (and unify) cleanup of compaction leftovers
----------------------------------------------------
Key: CASSANDRA-7066
URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Stefania
Priority: Minor
Labels: benedict-to-commit, compaction
Fix For: 3.x
Attachments: 7066.txt

Currently we manage a list of in-progress compactions in a system table, which we use to clean up incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily clean up completed files, or conversely not clean up files that have been superseded); and 2) it's only used for regular compaction - no other compaction types are guarded in the same way, so they can result in duplication if we fail before deleting the replacements.

I'd like to see each sstable store its direct ancestors in its metadata, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way, as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about.
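[Editor's note] A sketch of the proposed startup cleanup, assuming sstable metadata exposes the generations of each sstable's direct ancestors (the names here are illustrative, not the actual patch):

{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class LeftoverCleanup
{
    // Hypothetical: given each on-disk sstable's generation mapped to the
    // generations of its direct ancestors, any sstable that appears in the
    // union of ancestor sets was superseded by a completed compaction and
    // can be deleted on startup.
    public static Set<Integer> generationsToDelete(Map<Integer, Set<Integer>> ancestorsByGeneration)
    {
        Set<Integer> obsolete = new HashSet<>();
        for (Set<Integer> ancestors : ancestorsByGeneration.values())
            obsolete.addAll(ancestors);
        // Only delete files that are actually still present on disk.
        obsolete.retainAll(ancestorsByGeneration.keySet());
        return obsolete;
    }
}
{code}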
[jira] [Updated] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Batey updated CASSANDRA-9724:
------------------------------------------
Attachment: data.zip

Dump of rows from the raw_weather_data table

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
Attachments: data.zip
[jira] [Commented] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613035#comment-14613035 ] Christopher Batey commented on CASSANDRA-9724:
-----------------------------------------------
Uploaded - a CSV from the table the queries are run on.

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
Attachments: data.zip
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613065#comment-14613065 ] Sylvain Lebresne commented on CASSANDRA-9471:
----------------------------------------------
bq. If we choose not to include this feature, it would be better to implement these directly

How much better? Thinking out loud here, but we're a database; we're dealing with sorted stuff all the time. So even outside of its use (or not) by {{Columns}}, having a more capable {{BTreeSet}} implementation, one that can act more like an efficient sorted list, feels to me like something that would be useful to have in our tool belt. By that I mean that it sounds from your comments that the indexability doesn't add much complexity to the implementation (disclaimer: I haven't looked at the patch), so if its cost is really small, maybe it's worth getting the flexibility?

Columns should be backed by a BTree, not an array
-------------------------------------------------
Key: CASSANDRA-9471
URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Fix For: 3.0 beta 1

Follow up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear); in at least one location, this results in quadratic performance. We don't, however, want this structure to be either any more expensive to build or to store. Some small modifications to BTree will permit it to serve here, by permitting efficient lookup by index, and calculation _of_ index for a given key.
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613093#comment-14613093 ] Sylvain Lebresne commented on CASSANDRA-9471:
----------------------------------------------
bq. for the normal worries of code atrophy

Yes, that is something to take into account. However, it's also a utility class, one that is meant to be used a lot in the codebase. And the indexability code is already written. So if it doesn't introduce significant complexity, it does feel like a relatively good deal. Basically, I would hate to spend time pulling the already-written functionality out, only to someday have a good use for it and end up doing something less efficient just because it's not there.

Besides, it's totally possible it will be used by {{Columns}} in the end :) Anyway, I don't want to sound insistent; it's not that I absolutely want it. I'm just offering that simply rebasing that ticket now would avoid pushing the work to a time when we might be even shorter on resources than we are, doesn't preclude considering a better alternative for {{Columns}} later, and won't waste all that much work if we do end up changing {{Columns}} but keep the indexability as generally useful.
[jira] [Commented] (CASSANDRA-9686) FSReadError and LEAK DETECTED after upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613039#comment-14613039 ] Stefania commented on CASSANDRA-9686:
--------------------------------------
No one volunteered on IRC, so let's wait for [~krummas] to be back next week regarding handling of corrupt sstables.

FSReadError and LEAK DETECTED after upgrading
---------------------------------------------
Key: CASSANDRA-9686
URL: https://issues.apache.org/jira/browse/CASSANDRA-9686
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Windows 7 32-bit, 3.2GB RAM, Java 1.7.0_55
Reporter: Andreas Schnitzerling
Assignee: Stefania
Fix For: 2.2.x
Attachments: cassandra.bat, cassandra.yaml, compactions_in_progress.zip, sstable_activity.zip, system.log

After upgrading one of 15 nodes from 2.1.7 to 2.2.0-rc1, I get FSReadError and LEAK DETECTED on start. After deleting the listed files, the failure goes away.

{code:title=system.log}
ERROR [SSTableBatchOpen:1] 2015-06-29 14:38:34,554 DebuggableThreadPoolExecutor.java:242 - Error in ThreadPoolExecutor
org.apache.cassandra.io.FSReadError: java.io.IOException: Compressed file with 0 chunks encountered: java.io.DataInputStream@1c42271
	at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:178) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:117) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:86) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:142) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:101) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:178) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:681) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:644) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:443) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:350) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader$4.run(SSTableReader.java:480) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_55]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_55]
	at java.lang.Thread.run(Unknown Source) [na:1.7.0_55]
Caused by: java.io.IOException: Compressed file with 0 chunks encountered: java.io.DataInputStream@1c42271
	at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:174) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	... 15 common frames omitted
ERROR [Reference-Reaper:1] 2015-06-29 14:38:34,734 Ref.java:189 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3e547f) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1926439:D:\Programme\Cassandra\data\data\system\compactions_in_progress\system-compactions_in_progress-ka-6866 was not released before the reference was garbage collected
{code}
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613058#comment-14613058 ] Benedict commented on CASSANDRA-9471:
--------------------------------------
Well, the decision does ultimately affect how certain features within the btree are implemented - or at least the cost/benefit analysis (for the reviewer as much as myself). Right now I've used the indexability feature to make a trivial implementation of lower/higher/floor/ceil, because it permits you to treat the whole btree as though it were an array for indexing, using binarySearch semantics and positional access (see the sketch after this message). If we choose not to include this feature, it would be better to implement these directly - not onerous, of course, but I want to avoid burdening Branimir with unnecessary review. There's also some intertwining on testing (using higher features to help test lower ones).

However, you make a good point, and I will see what minimal set of changes I can extract to get the ball rolling. It's probably still pretty significant and helpful.
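[Editor's note] To illustrate the point about binarySearch semantics: once a structure supports positional lookup, floor/ceil (and similarly higher/lower) reduce to arithmetic on the insertion point. A hedged sketch against a sorted array stand-in for the indexed btree:

{code:java}
import java.util.Arrays;
import java.util.Comparator;

public final class SortedLookup
{
    // floor: index of the greatest element <= key, or -1 if none exists.
    // Relies on binarySearch returning (-(insertionPoint) - 1) on a miss.
    public static <T> int floorIndex(T[] sorted, T key, Comparator<? super T> cmp)
    {
        int i = Arrays.binarySearch(sorted, key, cmp);
        return i >= 0 ? i : -i - 2; // element just before the insertion point
    }

    // ceil: index of the least element >= key, or sorted.length if none exists.
    public static <T> int ceilIndex(T[] sorted, T key, Comparator<? super T> cmp)
    {
        int i = Arrays.binarySearch(sorted, key, cmp);
        return i >= 0 ? i : -i - 1; // the insertion point itself
    }
}
{code}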
[jira] [Comment Edited] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613048#comment-14613048 ] Benedict edited comment on CASSANDRA-9591 at 7/3/15 9:31 AM:
--------------------------------------------------------------
Perhaps we should just {{obsoleteOriginals}} up-front, if offline? We could even do it in the {{StandaloneScrubber}}, before calling {{scrubber.scrub()}}, to avoid polluting the general-purpose {{Scrubber}}. No strong feelings though - the patch looks like it works to me.

[~stefania]: could you rebase your branches, and once CI passes I'll commit.

was (Author: benedict):
Perhaps we should just {{obsoleteOriginals}} up-front, if offline? We could even do it in the {{StandaloneScrubber}}, before calling {{scrubber.scrub()}}, to avoid polluting the general-purpose {{Scrubber}}. No strong feelings though - the patch looks like it works to me.

[~stef1927]: could you rebase your branches, and once CI passes I'll commit.
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613081#comment-14613081 ] Benedict commented on CASSANDRA-9471:
--------------------------------------
Well, performance-wise the difference is negligible. There is an extra lg(N/32) cost for the current implementation, which amortizes to imperceptible (and literally zero for small sets). The fact that we don't use higher/lower/ceil/floor very commonly means I'm confident this extra cost is better to incur for the simplicity of implementation.

The reason I say "better" is exclusively that there is a more direct implementation for the inequality lookups. If we don't have _another_ reason for indexing, it seems better practice to implement those directly and leave out the indexing feature. The indexability is actually surprisingly simple, and doesn't introduce significant complexity IMO. I'm just a little wary of introducing features we don't use _directly_ (even if I have an attachment to it), for the normal worries of code atrophy. I certainly won't argue against its inclusion, though, as I agree it seems like it _should_ be more generally useful. I'm just not yet aware of another place for it.
[jira] [Assigned] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reassigned CASSANDRA-9724:
----------------------------------------
Assignee: Robert Stupp

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
[jira] [Commented] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613027#comment-14613027 ] Robert Stupp commented on CASSANDRA-9724:
------------------------------------------
[~chbatey] do you have some sample data as CSV or CQL?

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
[jira] [Updated] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
[ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-8894:
---------------------------------
Labels: benedict-to-commit  (was: )

Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
------------------------------------------------------------------------------------------------------------------
Key: CASSANDRA-8894
URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Stefania
Labels: benedict-to-commit
Fix For: 3.x

A large contributor to buffered reads being slower than mmapped is likely that we read a full 64Kb at once, when average record sizes may be as low as 140 bytes on our stress tests. The TLB has only 128 entries on a modern core, and each read will touch 32 of these, meaning we are unlikely to almost ever be hitting the TLB, and will be incurring at least 30 unnecessary misses each time (as well as the other costs of larger-than-necessary accesses). When working with an SSD, there is little to no benefit to reading more than 4Kb at once, and in either case reading more data than we need is wasteful.

So, I propose selecting a buffer size that is the next larger power of 2 than our average record size (with a minimum of 4Kb), so that we expect to read each record in one operation. I also propose that we create a pool of these buffers up-front, and that we ensure they are all exactly aligned to a virtual page, so that the source and target operations each touch exactly one virtual page per 4Kb of expected record size.
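[Editor's note] As arithmetic, the proposed rule is "the next power of two at or above the average record size, floored at 4Kb". A sketch of the sizing function (capping at the current 64Kb default is my assumption; the ticket only states the floor):

{code:java}
public final class BufferSizing
{
    // Hypothetical: pick a read-buffer size from the average record size.
    // Round up to the next power of two, with a 4Kb floor and a 64Kb cap
    // (the cap matches today's default and is not stated in the ticket).
    public static int bufferSize(int avgRecordSize)
    {
        int size = 4096;
        while (size < avgRecordSize && size < 65536)
            size <<= 1;
        return size; // e.g. 140 -> 4096, 5000 -> 8192
    }
}
{code}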
[jira] [Commented] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613048#comment-14613048 ] Benedict commented on CASSANDRA-9591: - Perhaps we should just {{obsoleteOriginals}} up-front, if offline? We could even do it in the {{StandaloneScrubber}}, before calling {{scrubber.scrub()}}, to avoid polluting the general purpose {{Scrubber}}. No strong feelings though - the patch looks like it works to me. [~stef1927]: could you rebase your branches? Once CI passes I'll commit. Scrub (recover) sstables even when -Index.db is missing --- Key: CASSANDRA-9591 URL: https://issues.apache.org/jira/browse/CASSANDRA-9591 Project: Cassandra Issue Type: Improvement Reporter: mck Assignee: mck Labels: benedict-to-commit, sstablescrub Fix For: 2.0.x Attachments: 9591-2.0.txt, 9591-2.1.txt Today SSTableReader needs at minimum 3 files to load an sstable:
- -Data.db
- -CompressionInfo.db
- -Index.db
But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption in the -Data.db and we want to be able to skip over corrupted rows. Given that there is still a fair chance that there's nothing wrong with the -Data.db file and we're just missing the -Index.db file, this patch addresses that situation. So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover sstables despite missing -Index.db files. This can happen after a catastrophic incident where data directories have been lost, corrupted, or wiped and the backup is not healthy. I'm aware that normally one depends on replicas or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild. I have not tested this patch against normal c* operations and all the other (more critical) ways SSTableReader is used. I'll happily do that and add the needed unit tests if people see merit in accepting the patch. Otherwise the patch can live with the issue, in case anyone else needs it. There's also a cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz] to make life a little easier for anyone finding themselves in such a bad situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
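To make the relaxed file requirement concrete, here is a deliberately self-contained sketch that does not use Cassandra's real component or Scrubber APIs; the class and method names are invented for illustration. It only expresses the precondition argued for in the ticket: an offline scrub needs -Data.db and -CompressionInfo.db, while a missing -Index.db is tolerated at the cost of not being able to skip over corrupted rows.
{code:java}
import java.util.Set;

// Illustrative only; hypothetical names, not Cassandra's component model.
public final class ScrubPreconditionSketch
{
    public static boolean canAttemptScrub(Set<String> presentComponents, boolean offlineScrub)
    {
        boolean hasData = presentComponents.contains("Data.db");
        boolean hasCompressionInfo = presentComponents.contains("CompressionInfo.db");
        boolean hasIndex = presentComponents.contains("Index.db");
        // Online reads still require the index; an offline scrub can proceed without it,
        // losing only the ability to skip over corrupted rows in Data.db.
        return hasData && hasCompressionInfo && (hasIndex || offlineScrub);
    }

    public static void main(String[] args)
    {
        Set<String> withoutIndex = Set.of("Data.db", "CompressionInfo.db");
        System.out.println(canAttemptScrub(withoutIndex, true));  // true: offline scrub may recover the data
        System.out.println(canAttemptScrub(withoutIndex, false)); // false: normal operation still needs the index
    }
}
{code}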
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613046#comment-14613046 ] Sylvain Lebresne commented on CASSANDRA-9471: -
bq. at the very least, the improved iterator, improved tests, and wider deployment of the btree are all worth incorporating.
What about moving those changes to a separate ticket (i.e. one that is not concerned by {{Columns}})? It's useful to trunk anyway as you say, and the less stuff we delay, the better. Splitting the changes related to {{Columns}} from the others is also more incremental in a way :) Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
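As a purely illustrative aside (none of this is Cassandra's actual Columns or BTree code, and every name below is hypothetical), the Java snippet contrasts the linear scan an array-backed lookup implies with a binary search over the same sorted data, which is the flavour of logarithmic lookup-by-key, and index-for-key, that a BTree tracking per-node sizes can offer.
{code:java}
import java.util.Arrays;

// Illustrative only: the complexity difference behind the ticket, not Cassandra code.
public final class ColumnLookupSketch
{
    // O(n): scan until the name is found, as a plain array lookup would.
    static int linearIndexOf(String[] sortedNames, String name)
    {
        for (int i = 0; i < sortedNames.length; i++)
            if (sortedNames[i].equals(name))
                return i;
        return -1;
    }

    // O(log n): a lookup that also yields the key's position, the kind of
    // "index for a given key" a BTree with subtree counts can answer.
    static int searchedIndexOf(String[] sortedNames, String name)
    {
        int idx = Arrays.binarySearch(sortedNames, name);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args)
    {
        String[] columns = { "a", "b", "c", "d", "e" }; // sorted, like column names
        System.out.println(linearIndexOf(columns, "d"));   // 3
        System.out.println(searchedIndexOf(columns, "d")); // 3, but in logarithmic time
    }
}
{code}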
[jira] [Commented] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613087#comment-14613087 ] Andreas Schnitzerling commented on CASSANDRA-9694: -- I disabled auth and everything else is working correctly. I tested it for 20 hours continuously running. system_auth not upgraded Key: CASSANDRA-9694 URL: https://issues.apache.org/jira/browse/CASSANDRA-9694 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55 Reporter: Andreas Schnitzerling Assignee: Sam Tunnicliffe Fix For: 2.2.0 rc2 Attachments: 9694.txt, system_exception.log After upgrading Authorization-Exceptions occur. I checked the system_auth keyspace and have seen, that tables users, credentials and permissions were not upgraded automatically. I upgraded them (I needed 2 times per table because of CASSANDRA-9566). After upgrading the system_auth tables I could login via cql using different users. {code:title=system.log} WARN [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #User updateprog for keyspace logdata ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na] at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) 
~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_55] at java.lang.Thread.run(Unknown Source) [na:1.7.0_55] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613106#comment-14613106 ] Sam Tunnicliffe commented on CASSANDRA-9694: We haven't seen this problem in any of our testing, and I'm afraid I'm unable to reproduce it now. Do you have the logs from the node when you first upgraded it, before wiping system_auth? Failing that, the only things I can suggest are to upgrade another node to 2.2.0-rc1 and capture its logs (at INFO level at least) or to rebuild the upgraded node on 2.1.7 then run the upgrade again, again capturing the logs. system_auth not upgraded Key: CASSANDRA-9694 URL: https://issues.apache.org/jira/browse/CASSANDRA-9694 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55 Reporter: Andreas Schnitzerling Assignee: Sam Tunnicliffe Fix For: 2.2.0 rc2 Attachments: 9694.txt, system_exception.log After upgrading Authorization-Exceptions occur. I checked the system_auth keyspace and have seen, that tables users, credentials and permissions were not upgraded automatically. I upgraded them (I needed 2 times per table because of CASSANDRA-9566). After upgrading the system_auth tables I could login via cql using different users. {code:title=system.log} WARN [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #User updateprog for keyspace logdata ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. 
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na] at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[jira] [Updated] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-9724: - Priority: Major (was: Critical)
UDA appears to be causing query to be executed multiple times - Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: data.zip
Not sure if this is intended behaviour. Example table:
{quote}
CREATE TABLE raw_weather_data (
wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number
year int, // Year collected
month int, // Month collected
day int, // Day collected
hour int, // Hour collected
temperature double, // Air temperature (degrees Celsius)
dewpoint double, // Dew point temperature (degrees Celsius)
pressure double, // Sea level pressure (hectopascals)
wind_direction int, // Wind direction in degrees. 0-359
wind_speed double, // Wind speed (meters per second)
sky_condition int, // Total cloud cover (coded, see format documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters)
six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
{quote}
1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008;
{quote}
activity | timestamp | source | source_elapsed
Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0
Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 109
Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 193
Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 519
Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 544
Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 558
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 600
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 612
Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 | 848
Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680
{quote}
However once I include the min function I get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008;
{quote}
activity | timestamp | source | source_elapsed
Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 0
Parsing select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] |
[jira] [Updated] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Schnitzerling updated CASSANDRA-9694: - Attachment: system.log.2..zip system.log.1.zip Here are the steps:
1. copied 2.2.0 instance w/o commitlog, data, saved_caches
2. created data dir
3. copied (backed up) 2.1.7 data (user-ks + system + system_auth + system_traces) into data (except user-CF onlinedata, which is the biggest, containing 13 GB of data)
4. changed log-level to DEBUG
5. enabled auth
6. started cassandra
7. after 8 minutes nodetool stopdaemon
system_auth not upgraded Key: CASSANDRA-9694 URL: https://issues.apache.org/jira/browse/CASSANDRA-9694 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55 Reporter: Andreas Schnitzerling Assignee: Sam Tunnicliffe Fix For: 2.2.0 rc2 Attachments: 9694.txt, system.log.1.zip, system.log.2..zip, system_exception.log After upgrading Authorization-Exceptions occur. I checked the system_auth keyspace and have seen, that tables users, credentials and permissions were not upgraded automatically. I upgraded them (I needed 2 times per table because of CASSANDRA-9566). After upgrading the system_auth tables I could login via cql using different users. {code:title=system.log} WARN [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #User updateprog for keyspace logdata ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na] at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) 
~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613103#comment-14613103 ] Benedict commented on CASSANDRA-9471: -
bq. but ending up doing something less efficient just because it's not there
You're right, this can happen frustratingly often. OK. I'm convinced :) I'll split out the btree-only stuff into a separate ticket. Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9717) TestCommitLog segment size dtests fail on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613119#comment-14613119 ] Branimir Lambov commented on CASSANDRA-9717: I suppose the easiest thing to do here is to increase the tolerance, but the test can still be flaky after we do that. A better solution is to fix the random seed for the writes so that we can use a small tolerance and avoid all flakiness, but I don't know if that's something we can do with the dtest infrastructure (or how, if we can). TestCommitLog segment size dtests fail on trunk --- Key: CASSANDRA-9717 URL: https://issues.apache.org/jira/browse/CASSANDRA-9717 Project: Cassandra Issue Type: Sub-task Reporter: Jim Witschey Assignee: Branimir Lambov Priority: Blocker Fix For: 3.0 beta 1 The test for the commit log segment size, with the specified size set to 32MB, fails for me locally and on cassci. ([cassci link|http://cassci.datastax.com/view/trunk/job/trunk_dtest/305/testReport/commitlog_test/TestCommitLog/default_segment_size_test/]) The command to run the test by itself is {{CASSANDRA_VERSION=git:trunk nosetests commitlog_test.py:TestCommitLog.default_segment_size_test}}. EDIT: a similar test, {{commitlog_test.py:TestCommitLog.small_segment_size_test}}, also fails with a similar error. The solution here may just be to change the expected size or the acceptable error -- the result isn't far off. I'm happy to make the dtest change if that's the solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
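A tiny Java sketch of the fixed-seed idea suggested above; the class, seed value, and sizes are hypothetical stand-ins for whatever the dtest actually writes, not part of the test harness. With a seeded generator the total volume written is identical on every run, so a tight tolerance on the resulting segment size does not make the test flaky.
{code:java}
import java.util.Random;

// Hypothetical illustration of deterministic write volume via a fixed seed; not the dtest itself.
public final class DeterministicWriteLoad
{
    private static final long FIXED_SEED = 42L; // any constant; reproducibility is the point

    // Total bytes a run of randomly sized payloads would add to the commit log.
    public static long totalBytesWritten(int mutations, int maxPayloadBytes)
    {
        Random random = new Random(FIXED_SEED);
        long total = 0;
        for (int i = 0; i < mutations; i++)
            total += random.nextInt(maxPayloadBytes) + 1;
        return total;
    }

    public static void main(String[] args)
    {
        // Prints the same number on every run, so an assertion with a small tolerance is safe.
        System.out.println(totalBytesWritten(10_000, 1024));
    }
}
{code}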
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613122#comment-14613122 ] Sylvain Lebresne commented on CASSANDRA-9471: - Side note: if the changes to {{Columns}} are not hard to rebase, I'd personally be fine with just rebasing that ticket as is (without bothering to split it into 2 tickets) for the sake of saving you some time. At least for CASSANDRA-9705, I don't plan on having much of {{Columns}} going obsolete (the indexability will most likely be much less used but will still be handy, and we'll still rely heavily-ish on {{contains}}, which is currently not terribly efficient). And of course, that still doesn't preclude considering other implementations of {{Columns}} later. Anyway, I'm fine with whichever way you prefer, but just to say that if splitting into 2 tickets takes you the same time as just rebasing the whole patch, I'd personally just go with the second option. Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613137#comment-14613137 ] Sylvain Lebresne commented on CASSANDRA-9471: - I'm sorry for the back and forth, but if you haven't started working on that rebase, actually disregard my previous comment (it might be safer to leave Columns alone until CASSANDRA-9705). Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)