[jira] [Commented] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613210#comment-14613210 ] Sam Tunnicliffe commented on CASSANDRA-9694:
---------------------------------------------
Thanks, but that log shows no errors. The auth upgrade process happens as expected: the new tables are created and a conversion is attempted (at 12:59:33,194), which fails in the anticipated way. However, the log also shows no client requests being made to the node once it started up, so even if there were a problem with authentication or permissions, it wouldn't be triggered. Can you restart that node and direct some traffic to it, please? A single connection with cqlsh should be enough to see if there's any problem.

system_auth not upgraded
------------------------
Key: CASSANDRA-9694
URL: https://issues.apache.org/jira/browse/CASSANDRA-9694
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Windows 7 32-bit, 3.2GB RAM, Java 1.7.0_55
Reporter: Andreas Schnitzerling
Assignee: Sam Tunnicliffe
Fix For: 2.2.0 rc2
Attachments: 9694.txt, system.log.1.zip, system.log.2.zip, system_exception.log

After upgrading, authorization exceptions occur. I checked the system_auth keyspace and saw that the tables users, credentials and permissions were not upgraded automatically. I upgraded them manually (I needed two passes per table because of CASSANDRA-9566). After upgrading the system_auth tables I could log in via cql using different users.

{code:title=system.log}
WARN  [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #<User updateprog> for <keyspace logdata>
ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message.
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na]
	at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na]
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na]
	at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na]
	at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2]
	at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at
{code}
[jira] [Created] (CASSANDRA-9725) CQL docs do not build due to duplicate name
Christopher Batey created CASSANDRA-9725:
--------------------------------------------
Summary: CQL docs do not build due to duplicate name
Key: CASSANDRA-9725
URL: https://issues.apache.org/jira/browse/CASSANDRA-9725
Project: Cassandra
Issue Type: Bug
Components: Documentation & website
Reporter: Christopher Batey

Fix on branch broken-cql-docs in g...@github.com:chbatey/cassandra-1.git
[jira] [Commented] (CASSANDRA-9647) Tables created by cassandra-stress are omitted in DESCRIBE KEYSPACE
[ https://issues.apache.org/jira/browse/CASSANDRA-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613248#comment-14613248 ] Tyler Hobbs commented on CASSANDRA-9647:
-----------------------------------------
Pending test runs:
* [2.1 testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.1-testall/]
* [2.1 dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.1-dtest/]
* [2.2 testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.2-testall/]
* [2.2 dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9647-2.2-dtest/]

Tables created by cassandra-stress are omitted in DESCRIBE KEYSPACE
-------------------------------------------------------------------
Key: CASSANDRA-9647
URL: https://issues.apache.org/jira/browse/CASSANDRA-9647
Project: Cassandra
Issue Type: Bug
Reporter: Ryan McGuire
Assignee: Tyler Hobbs
Priority: Minor
Labels: cqlsh, stress
Fix For: 2.2.0 rc2

CASSANDRA-9374 modified cassandra-stress to only use CQL for creating its schema. This seems to work, as I'm testing on a cluster with start_rpc: false. However, when I try to run a DESCRIBE on the schema, it omits the tables, complaining that they were created with a legacy API:

{code}
cqlsh> DESCRIBE KEYSPACE keyspace1 ;

CREATE KEYSPACE keyspace1 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

/* Warning: Table keyspace1.counter1 omitted because it has constructs not compatible with CQL (was created via legacy API).
Approximate structure, for reference:
(this should not be used to reproduce this schema)

CREATE TABLE keyspace1.counter1 (
    key blob PRIMARY KEY,
    "C0" counter,
    "C1" counter,
    "C2" counter,
    "C3" counter,
    "C4" counter
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
*/

/* Warning: Table keyspace1.standard1 omitted because it has constructs not compatible with CQL (was created via legacy API).
Approximate structure, for reference:
(this should not be used to reproduce this schema)

CREATE TABLE keyspace1.standard1 (
    key blob PRIMARY KEY,
    "C0" blob,
    "C1" blob,
    "C2" blob,
    "C3" blob,
    "C4" blob
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
*/

cqlsh>
{code}

Note that it attempts to describe them anyway, but they are commented out and shouldn't be used to restore from. [This is the ccm workflow I used to test this|https://gist.githubusercontent.com/EnigmaCurry/e779055c8debf6de8ef9/raw/a894e99725b6df599f3ce1db5012dd6d069b1339/gistfile1.txt]
[jira] [Created] (CASSANDRA-9726) Built in aggregate docs do not display examples due to whitespace error
Christopher Batey created CASSANDRA-9726:
--------------------------------------------
Summary: Built-in aggregate docs do not display examples due to whitespace error
Key: CASSANDRA-9726
URL: https://issues.apache.org/jira/browse/CASSANDRA-9726
Project: Cassandra
Issue Type: Bug
Components: Documentation & website
Reporter: Christopher Batey

Fix on branch aggregate-docs at https://github.com/chbatey/cassandra-1.git
[jira] [Updated] (CASSANDRA-9556) Add newer data types to cassandra stress (e.g. decimal, dates, UDTs)
[ https://issues.apache.org/jira/browse/CASSANDRA-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9556:
---------------------------------------
Assignee: ZhaoYang
Reviewer: Benjamin Lerer  (was: Jeremy Hanna)

[~blerer] to review

Add newer data types to cassandra stress (e.g. decimal, dates, UDTs)
---------------------------------------------------------------------
Key: CASSANDRA-9556
URL: https://issues.apache.org/jira/browse/CASSANDRA-9556
Project: Cassandra
Issue Type: Bug
Components: Tools
Reporter: Jeremy Hanna
Assignee: ZhaoYang
Labels: stress
Attachments: cassandra-2.1-9556.txt, trunk-9556.txt

Currently you can't define a data model with decimal types and use cassandra-stress with it. I imagine that also holds true for other newer data types, such as the new date and time types. Besides that, now that data models are including user-defined types, we should allow users to create those structures with stress as well. Perhaps we could split the UDTs out into a different ticket if it holds the other types up.
[jira] [Updated] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-9591:
---------------------------------
Labels: benedict-to-commit sstablescrub  (was: sstablescrub)

Scrub (recover) sstables even when -Index.db is missing
-------------------------------------------------------
Key: CASSANDRA-9591
URL: https://issues.apache.org/jira/browse/CASSANDRA-9591
Project: Cassandra
Issue Type: Improvement
Reporter: mck
Assignee: mck
Labels: benedict-to-commit, sstablescrub
Fix For: 2.0.x
Attachments: 9591-2.0.txt, 9591-2.1.txt

Today SSTableReader needs at minimum 3 files to load an sstable:
- -Data.db
- -CompressionInfo.db
- -Index.db

But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption in the -Data.db and we want to be able to skip over corrupted rows. Given that there is still a fair chance that there's nothing wrong with the -Data.db file and we're just missing the -Index.db file, this patch addresses that situation.

So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover sstables despite missing -Index.db files. This can happen after a catastrophic incident where data directories have been lost and/or corrupted, or wiped and the backup not healthy. I'm aware that normally one depends on replicas or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild.

I have not tested this patch against normal C* operations and all the other (more critical) ways SSTableReader is used. I'll happily do that and add the needed unit tests if people see merit in accepting the patch. Otherwise the patch can live with the issue, in case anyone else needs it.

There's also a cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz] to make life a little easier for anyone finding themselves in such a bad situation.
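[Editor's note] As an illustration of what the patch relaxes - a hedged sketch only, with hypothetical names, not the actual 9591 code - the component check for an offline scrub might look like:

{code:java}
import java.util.Set;

public final class ScrubComponents
{
    // Hypothetical helper: which file components must be present to attempt
    // a scrub. -Data.db (and -CompressionInfo.db for compressed tables) stay
    // mandatory; the primary index is only needed to skip past corrupted
    // rows, so its absence is tolerated here.
    public static boolean canAttemptScrub(Set<String> presentSuffixes, boolean compressed)
    {
        if (!presentSuffixes.contains("Data.db"))
            return false;
        return !compressed || presentSuffixes.contains("CompressionInfo.db");
    }
}
{code}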
[jira] [Commented] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613019#comment-14613019 ] Robert Stupp commented on CASSANDRA-9723:
------------------------------------------
Thanks for pointing this out! I've always had that in mind, but unfortunately I completely missed opening a JIRA for it.

UDF / UDA execution time in trace
---------------------------------
Key: CASSANDRA-9723
URL: https://issues.apache.org/jira/browse/CASSANDRA-9723
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Minor

I'd like to see how long my UDF/UDAs take in the trace. I checked in 2.2 rc1 and it doesn't appear to be mentioned.
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612937#comment-14612937 ] Stefania commented on CASSANDRA-7392:
--------------------------------------
[~slebresne], I would like to know how best to abort a read operation. I think we have several options:

- Add an abort-requested control field (OpState in the code [here|https://github.com/stef1927/cassandra/commits/7392]) to ReadCommand, but this means changing the constructor chain, which is very complex (the index read command needs to inherit the control field of the main command - unless we make it not final)
- Pass it as an argument to the various methods (executeLocally, queryStorage, search, etc.)
- Add it to ReadOrderGroup, which is passed almost everywhere but isn't really related
- Simply stop the iterator in executeLocally with a wrapper iterator (this is required anyway, I believe, but it would not abort the index reads); a rough sketch of such a wrapper appears after this message

Could you comment on:

- your preferred option - I don't want to spoil your design :)
- whether the code is stable or whether there is some refactoring still missing that I should wait for or help out with

Abort in-progress queries that time out
---------------------------------------
Key: CASSANDRA-7392
URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: Jonathan Ellis
Assignee: Stefania
Fix For: 3.x

Currently we drop queries that time out before we get to them (because the node is overloaded) but not queries that time out while being processed. (This is particularly common for index queries on data that shouldn't be indexed.) Adding the latter, and logging when we have to interrupt one, gets us a poor man's slow query log for free.
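[Editor's note] To make the wrapper-iterator option concrete, here is a rough sketch (my illustration, not code from the linked branch): it assumes a plain {{java.util.Iterator}} and a relative timeout, where the real thing would wrap Cassandra's partition iterators.

{code:java}
import java.util.Iterator;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: stops iteration once a deadline passes, so a read
// that has already timed out for the client stops consuming resources.
final class AbortingIterator<T> implements Iterator<T>
{
    private final Iterator<T> delegate;
    private final long deadlineNanos;

    AbortingIterator(Iterator<T> delegate, long timeoutNanos)
    {
        this.delegate = delegate;
        this.deadlineNanos = System.nanoTime() + timeoutNanos;
    }

    public boolean hasNext()
    {
        // Checked on every step, so a long scan is interrupted promptly.
        if (System.nanoTime() > deadlineNanos)
            throw new RuntimeException(new TimeoutException("read aborted: timeout exceeded"));
        return delegate.hasNext();
    }

    public T next()
    {
        return delegate.next();
    }
}
{code}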
[jira] [Updated] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-9723:
-------------------------------------
Fix Version/s: 2.2.x
[jira] [Created] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
Christopher Batey created CASSANDRA-9724:
--------------------------------------------
Summary: UDA appears to be causing query to be executed multiple times
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Priority: Critical

Not sure if this is intended behaviour. Example table:

{quote}
CREATE TABLE raw_weather_data (
    wsid text,               // Composite of Air Force Datsav3 station number and NCDC WBAN number
    year int,                // Year collected
    month int,               // Month collected
    day int,                 // Day collected
    hour int,                // Hour collected
    temperature double,      // Air temperature (degrees Celsius)
    dewpoint double,         // Dew point temperature (degrees Celsius)
    pressure double,         // Sea level pressure (hectopascals)
    wind_direction int,      // Wind direction in degrees. 0-359
    wind_speed double,       // Wind speed (meters per second)
    sky_condition int,       // Total cloud cover (coded, see format documentation)
    sky_condition_text text, // Non-coded sky conditions
    one_hour_precip double,  // One-hour accumulated liquid precipitation (millimeters)
    six_hour_precip double,  // Six-hour accumulated liquid precipitation (millimeters)
    PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
{quote}

1-node cluster, 2.2 rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008;

{quote}
activity | timestamp | source | source_elapsed
---------+-----------+--------+---------------
Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0
Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 109
Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 193
Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 519
Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 544
Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 558
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 600
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 612
Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 | 848
Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680
{quote}

However, once I include the min function, for select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; I get:

{quote}
activity | timestamp | source | source_elapsed
---------+-----------+--------+---------------
Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 0
Parsing select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 108
Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 201
Executing single-partition
{quote}
[jira] [Updated] (CASSANDRA-9715) Secondary index out of sync
[ https://issues.apache.org/jira/browse/CASSANDRA-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hazel Bobrins updated CASSANDRA-9715:
--------------------------------------
Environment: RHEL 6.2 2.6.32-220.13.1.el6.x86_64 / Java 1.7.0_76  (was: RHEL 6.2 2.6.32-220.13.1.el6.x86_64)

Secondary index out of sync
---------------------------
Key: CASSANDRA-9715
URL: https://issues.apache.org/jira/browse/CASSANDRA-9715
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: RHEL 6.2 2.6.32-220.13.1.el6.x86_64 / Java 1.7.0_76
Reporter: Hazel Bobrins

On 2.0.15 (we moved from 2.0.8 hoping this problem would go away) we are seeing intermittent issues where a secondary index is getting out of sync.

The setup is a 6-node cluster with 3 data centers, two nodes in each, and an RF of 2 in each data centre. So far I have been unable to reproduce this synthetically, but we have seen multiple instances across all nodes within the cluster.

The data set is very small: ~40K keys and 100MB of data. We add maybe 1000 records a day, delete ~500 and update ~200 - not a very write-heavy system. Reads we can push out to ~2000/sec. Writes are done at CL ALL and reads at ONE.

All examples so far have been triggered when a record has been deleted and another then added with the same index cardinality; I think it has also always been the last record in the set which was deleted before the addition.

On a flushed keyspace, an sstable2json export of the primary index shows all records correctly; however, an export of the secondary index is missing the records.

- nodetool rebuild_index does not resolve the problem
- Neither does a compact or repair
- A select on the primary key at CL ALL also has no impact
- However, a select at CL ALL on the secondary index does resolve the problem

There is currently a non-critical record which is out of the index on one of our nodes. If another key is added with the same index cardinality, it is added to the index correctly. If this is then removed, it once again returns empty.

We have checked all the obvious OS bits and confirmed our time sync (ntp based). At DEBUG level we see nothing obvious wrong when adding/removing keys to the above broken entry. Due to the very intermittent nature of this problem it has been impossible so far to gather any DEBUG logs of it failing; we have also been unsuccessful so far in reproducing this in our QA.

I know this is not much to go on. If there is anything we can provide to help narrow down what might be the issue, please let me know and we'll provide it asap.
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613012#comment-14613012 ] Sylvain Lebresne commented on CASSANDRA-7392:
----------------------------------------------
At least for range and single-partition slice queries, aborting in {{queryMemtableAndDiskInternal}} won't buy us much: building the iterator won't take much time at all; it's the reading of the iterator that may take time. So we do at least need the ability to abort the iterator reading, and for that, wrapping the result iterator in {{executeLocally}} as you said sounds to me like the best/simplest option.

That leaves single-partition names queries, for which the work is indeed done in {{queryMemtableAndDiskInternal}}. For that, I would avoid adding it as a field of {{ReadCommand}}, as aborting is more a property of the execution than of the command itself. Maybe we could add it to {{ReadOrderGroup}} but rename that class to something more generic (maybe {{ExecutionController}}?), so it doesn't feel out of place, and that could be a convenient place to add more stuff in the future. I'll remark however that for names queries, the proper way to protect against long queries is also to wrap the iterators read inside {{queryMemtableAndDiskInternal}}. Only checking for aborting at the beginning of handling each memtable/sstable (like in the patch you've linked) is probably not fine-grained enough (in the sense that a names query is likely to only take a long time if lots of names are queried, and if that's the case, reading a single sstable could take quite some time).

bq. but it would not abort the index reads

It would actually, in the sense that we don't query the index fully upfront; we do it on-demand when the main iterator requires more data.

bq. whether the code is stable or whether there is some refactoring still missing that I should wait for

As far as I'm concerned, the only missing refactoring is CASSANDRA-9705, and that will almost surely not affect any of the code you will touch in this ticket, so you're clear :)
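[Editor's note] To make the {{ExecutionController}} suggestion above concrete, a minimal sketch follows; the class shape, names, and abort semantics here are assumptions for illustration, not the eventual API:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical ExecutionController (the proposed rename of ReadOrderGroup):
// code executing the read consults it, and the flag is flipped when the
// query exceeds its timeout.
final class ExecutionController
{
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    void abort()
    {
        aborted.set(true);
    }

    // Called periodically from iterator wrappers / per-sstable loops, giving
    // the fine-grained abort points discussed above.
    void checkForAbort()
    {
        if (aborted.get())
            throw new IllegalStateException("query aborted: timed out during execution");
    }
}
{code}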
[jira] [Created] (CASSANDRA-9723) UDF / UDA execution time in trace
Christopher Batey created CASSANDRA-9723:
--------------------------------------------
Summary: UDF / UDA execution time in trace
Key: CASSANDRA-9723
URL: https://issues.apache.org/jira/browse/CASSANDRA-9723
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Christopher Batey
Priority: Minor

I'd like to see how long my UDF/UDAs take in the trace. I checked in 2.2 rc1 and it doesn't appear to be mentioned.
[jira] [Assigned] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reassigned CASSANDRA-9723:
----------------------------------------
Assignee: Robert Stupp
[jira] [Comment Edited] (CASSANDRA-9723) UDF / UDA execution time in trace
[ https://issues.apache.org/jira/browse/CASSANDRA-9723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613019#comment-14613019 ] Robert Stupp edited comment on CASSANDRA-9723 at 7/3/15 8:58 AM:
------------------------------------------------------------------
Thanks for pointing this out! I've always had that in mind, but unfortunately I completely missed opening a JIRA for it.

EDIT: should be a trivial patch - it may make it into 2.2.0 rc2.

was (Author: snazy):
Thanks for pointing this out! I've always had that in mind, but unfortunately I completely missed opening a JIRA for it.
[jira] [Commented] (CASSANDRA-9683) Get much higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
[ https://issues.apache.org/jira/browse/CASSANDRA-9683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612874#comment-14612874 ] Loic Lambiel commented on CASSANDRA-9683:
------------------------------------------
Yes, it is correct.

Get much higher load and latencies after upgrading from 2.1.6 to cassandra 2.1.7
---------------------------------------------------------------------------------
Key: CASSANDRA-9683
URL: https://issues.apache.org/jira/browse/CASSANDRA-9683
Project: Cassandra
Issue Type: Bug
Environment: Ubuntu 12.04 (3.13 kernel) * 3, JDK: Oracle JDK 7, RAM: 32GB, Cores: 4 (+4 HT)
Reporter: Loic Lambiel
Assignee: Ariel Weisberg
Fix For: 2.1.x
Attachments: cassandra.yaml, cfstats.txt, os_load.png, pending_compactions.png, read_latency.png, schema.txt, system.log, write_latency.png

After upgrading our cassandra staging cluster from version 2.1.6 to 2.1.7, the average load grew from 0.1-0.3 to 1.8. Latencies increased as well. We see an increase of pending compactions, probably due to CASSANDRA-9592. This cluster has almost no workload (staging environment).
[jira] [Updated] (CASSANDRA-7066) Simplify (and unify) cleanup of compaction leftovers
[ https://issues.apache.org/jira/browse/CASSANDRA-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-7066:
---------------------------------
Labels: benedict-to-commit compaction  (was: compaction)

Simplify (and unify) cleanup of compaction leftovers
----------------------------------------------------
Key: CASSANDRA-7066
URL: https://issues.apache.org/jira/browse/CASSANDRA-7066
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Stefania
Priority: Minor
Labels: benedict-to-commit, compaction
Fix For: 3.x
Attachments: 7066.txt

Currently we manage a list of in-progress compactions in a system table, which we use to clean up incomplete compactions when we're done. The problem with this is that 1) it's a bit clunky (and leaves us in positions where we can unnecessarily clean up completed files, or conversely not clean up files that have been superseded); and 2) it's only used for regular compaction - no other compaction types are guarded in the same way, so they can result in duplication if we fail before deleting the replacements.

I'd like to see each sstable store its direct ancestors in its metadata, and on startup we simply delete any sstables that occur in the union of all ancestor sets. This way, as soon as we finish writing we're capable of cleaning up any leftovers, so we never get duplication. It's also much easier to reason about.
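[Editor's note] A sketch of the proposed startup cleanup, assuming sstable metadata exposes the generations of each sstable's direct ancestors (the names here are illustrative, not the actual patch):

{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class LeftoverCleanup
{
    // Hypothetical: given each on-disk sstable's generation mapped to the
    // generations of its direct ancestors, any sstable that appears in the
    // union of ancestor sets was superseded by a completed compaction and
    // can be deleted on startup.
    public static Set<Integer> generationsToDelete(Map<Integer, Set<Integer>> ancestorsByGeneration)
    {
        Set<Integer> obsolete = new HashSet<>();
        for (Set<Integer> ancestors : ancestorsByGeneration.values())
            obsolete.addAll(ancestors);
        // Only delete files that are actually still present on disk.
        obsolete.retainAll(ancestorsByGeneration.keySet());
        return obsolete;
    }
}
{code}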
[jira] [Updated] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Batey updated CASSANDRA-9724:
------------------------------------------
Attachment: data.zip

Dump of rows from the raw_weather_data table

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
Attachments: data.zip
[jira] [Commented] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613035#comment-14613035 ] Christopher Batey commented on CASSANDRA-9724:
-----------------------------------------------
Uploaded - a CSV from the table the queries are run on.

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
Attachments: data.zip
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613065#comment-14613065 ] Sylvain Lebresne commented on CASSANDRA-9471:
----------------------------------------------
bq. If we choose not to include this feature, it would be better to implement these directly

How much better? Thinking out loud here, but we're a database; we're dealing with sorted stuff all the time. So even outside of its use (or not) by {{Columns}}, having a more capable {{BTreeSet}} implementation, one that can act more like an efficient sorted list, feels to me like something that would be useful to have in our tool belt. By that I mean that it sounds from your comments that the indexability doesn't add much complexity to the implementation (disclaimer: I haven't looked at the patch), so if its cost is really small, maybe it's worth getting the flexibility?

Columns should be backed by a BTree, not an array
-------------------------------------------------
Key: CASSANDRA-9471
URL: https://issues.apache.org/jira/browse/CASSANDRA-9471
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Benedict
Fix For: 3.0 beta 1

Follow up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear); in at least one location, this results in quadratic performance. We don't, however, want this structure to be either any more expensive to build or to store. Some small modifications to BTree will permit it to serve here, by permitting efficient lookup by index, and calculation _of_ index for a given key.
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613093#comment-14613093 ] Sylvain Lebresne commented on CASSANDRA-9471:
----------------------------------------------
bq. for the normal worries of code atrophy

Yes, that is something to take into account. However, it's also a utility class, one that is meant to be used a lot in the codebase. And the indexability code is already written. So if it doesn't introduce significant complexity, it does feel like a relatively good deal. Basically, I would hate to spend time pulling the already-written functionality out, only to someday have a good use for it and end up doing something less efficient just because it's not there.

Besides, it's totally possible it will be used by {{Columns}} in the end :) Anyway, I don't want to sound insistent; it's not that I absolutely want it. I'm just offering that simply rebasing that ticket now would avoid pushing the work to a time when we might be even shorter on resources than we are, doesn't preclude considering a better alternative for {{Columns}} later, and won't waste all that much work if we do end up changing {{Columns}} but keep the indexability as generally useful.
[jira] [Commented] (CASSANDRA-9686) FSReadError and LEAK DETECTED after upgrading
[ https://issues.apache.org/jira/browse/CASSANDRA-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613039#comment-14613039 ] Stefania commented on CASSANDRA-9686:
--------------------------------------
No one volunteered on IRC, so let's wait for [~krummas] to be back next week regarding handling of corrupt sstables.

FSReadError and LEAK DETECTED after upgrading
---------------------------------------------
Key: CASSANDRA-9686
URL: https://issues.apache.org/jira/browse/CASSANDRA-9686
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Windows 7 32-bit, 3.2GB RAM, Java 1.7.0_55
Reporter: Andreas Schnitzerling
Assignee: Stefania
Fix For: 2.2.x
Attachments: cassandra.bat, cassandra.yaml, compactions_in_progress.zip, sstable_activity.zip, system.log

After upgrading one of 15 nodes from 2.1.7 to 2.2.0-rc1, I get FSReadError and LEAK DETECTED on start. After deleting the listed files, the failure goes away.

{code:title=system.log}
ERROR [SSTableBatchOpen:1] 2015-06-29 14:38:34,554 DebuggableThreadPoolExecutor.java:242 - Error in ThreadPoolExecutor
org.apache.cassandra.io.FSReadError: java.io.IOException: Compressed file with 0 chunks encountered: java.io.DataInputStream@1c42271
	at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:178) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:117) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:86) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:142) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:101) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:178) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:681) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.load(SSTableReader.java:644) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:443) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader.open(SSTableReader.java:350) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at org.apache.cassandra.io.sstable.format.SSTableReader$4.run(SSTableReader.java:480) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[na:1.7.0_55]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_55]
	at java.lang.Thread.run(Unknown Source) [na:1.7.0_55]
Caused by: java.io.IOException: Compressed file with 0 chunks encountered: java.io.DataInputStream@1c42271
	at org.apache.cassandra.io.compress.CompressionMetadata.readChunkOffsets(CompressionMetadata.java:174) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1]
	... 15 common frames omitted
ERROR [Reference-Reaper:1] 2015-06-29 14:38:34,734 Ref.java:189 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@3e547f) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1926439:D:\Programme\Cassandra\data\data\system\compactions_in_progress\system-compactions_in_progress-ka-6866 was not released before the reference was garbage collected
{code}
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613058#comment-14613058 ] Benedict commented on CASSANDRA-9471:
--------------------------------------
Well, the decision does ultimately affect how certain features within the btree are implemented - or at least the cost/benefit analysis (for the reviewer as much as myself). Right now I've used the indexability feature to make a trivial implementation of lower/higher/floor/ceil, because it permits you to treat the whole btree as though it were an array for indexing, using binarySearch semantics and positional access (see the sketch after this message). If we choose not to include this feature, it would be better to implement these directly - not onerous, of course, but I want to avoid burdening Branimir with unnecessary review. There's also some intertwining on testing (using higher features to help test lower ones).

However, you make a good point, and I will see what minimal set of changes I can extract to get the ball rolling. It's probably still pretty significant and helpful.
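[Editor's note] To illustrate the point about binarySearch semantics: once a structure supports positional lookup, floor/ceil (and similarly higher/lower) reduce to arithmetic on the insertion point. A hedged sketch against a sorted array stand-in for the indexed btree:

{code:java}
import java.util.Arrays;
import java.util.Comparator;

public final class SortedLookup
{
    // floor: index of the greatest element <= key, or -1 if none exists.
    // Relies on binarySearch returning (-(insertionPoint) - 1) on a miss.
    public static <T> int floorIndex(T[] sorted, T key, Comparator<? super T> cmp)
    {
        int i = Arrays.binarySearch(sorted, key, cmp);
        return i >= 0 ? i : -i - 2; // element just before the insertion point
    }

    // ceil: index of the least element >= key, or sorted.length if none exists.
    public static <T> int ceilIndex(T[] sorted, T key, Comparator<? super T> cmp)
    {
        int i = Arrays.binarySearch(sorted, key, cmp);
        return i >= 0 ? i : -i - 1; // the insertion point itself
    }
}
{code}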
[jira] [Comment Edited] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613048#comment-14613048 ] Benedict edited comment on CASSANDRA-9591 at 7/3/15 9:31 AM:
--------------------------------------------------------------
Perhaps we should just {{obsoleteOriginals}} up-front, if offline? We could even do it in the {{StandaloneScrubber}}, before calling {{scrubber.scrub()}}, to avoid polluting the general-purpose {{Scrubber}}. No strong feelings though - the patch looks like it works to me.

[~stefania]: could you rebase your branches, and once CI passes I'll commit.

was (Author: benedict):
Perhaps we should just {{obsoleteOriginals}} up-front, if offline? We could even do it in the {{StandaloneScrubber}}, before calling {{scrubber.scrub()}}, to avoid polluting the general-purpose {{Scrubber}}. No strong feelings though - the patch looks like it works to me.

[~stef1927]: could you rebase your branches, and once CI passes I'll commit.
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613081#comment-14613081 ] Benedict commented on CASSANDRA-9471:
--------------------------------------
Well, performance-wise the difference is negligible. There is an extra lg(N/32) cost for the current implementation, which amortizes to imperceptible (and literally zero for small sets). The fact that we don't use higher/lower/ceil/floor very commonly means I'm confident this extra cost is better to incur for the simplicity of implementation.

The reason I say "better" is exclusively that there is a more direct implementation for the inequality lookups. If we don't have _another_ reason for indexing, it seems better practice to implement those directly and leave out the indexing feature. The indexability is actually surprisingly simple, and doesn't introduce significant complexity IMO. I'm just a little wary of introducing features we don't use _directly_ (even if I have an attachment to it), for the normal worries of code atrophy. I certainly won't argue against its inclusion, though, as I agree it seems like it _should_ be more generally useful. I'm just not yet aware of another place for it.
[jira] [Assigned] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp reassigned CASSANDRA-9724:
----------------------------------------
Assignee: Robert Stupp

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
[jira] [Commented] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613027#comment-14613027 ] Robert Stupp commented on CASSANDRA-9724:
------------------------------------------
[~chbatey] do you have some sample data as CSV or CQL?

UDA appears to be causing query to be executed multiple times
-------------------------------------------------------------
Key: CASSANDRA-9724
URL: https://issues.apache.org/jira/browse/CASSANDRA-9724
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Christopher Batey
Assignee: Robert Stupp
Priority: Critical
[jira] [Updated] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
[ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefania updated CASSANDRA-8894:
---------------------------------
Labels: benedict-to-commit  (was: )

Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
------------------------------------------------------------------------------------------------------------------
Key: CASSANDRA-8894
URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Benedict
Assignee: Stefania
Labels: benedict-to-commit
Fix For: 3.x

A large contributor to buffered reads being slower than mmapped is likely that we read a full 64Kb at once, when average record sizes may be as low as 140 bytes on our stress tests. The TLB has only 128 entries on a modern core, and each read will touch 32 of these, meaning we are unlikely to almost ever be hitting the TLB, and will be incurring at least 30 unnecessary misses each time (as well as the other costs of larger-than-necessary accesses). When working with an SSD, there is little to no benefit to reading more than 4Kb at once, and in either case reading more data than we need is wasteful.

So, I propose selecting a buffer size that is the next larger power of 2 than our average record size (with a minimum of 4Kb), so that we expect to read each record in one operation. I also propose that we create a pool of these buffers up-front, and that we ensure they are all exactly aligned to a virtual page, so that the source and target operations each touch exactly one virtual page per 4Kb of expected record size.
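[Editor's note] As arithmetic, the proposed rule is "the next power of two at or above the average record size, floored at 4Kb". A sketch of the sizing function (capping at the current 64Kb default is my assumption; the ticket only states the floor):

{code:java}
public final class BufferSizing
{
    // Hypothetical: pick a read-buffer size from the average record size.
    // Round up to the next power of two, with a 4Kb floor and a 64Kb cap
    // (the cap matches today's default and is not stated in the ticket).
    public static int bufferSize(int avgRecordSize)
    {
        int size = 4096;
        while (size < avgRecordSize && size < 65536)
            size <<= 1;
        return size; // e.g. 140 -> 4096, 5000 -> 8192
    }
}
{code}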
[jira] [Commented] (CASSANDRA-9591) Scrub (recover) sstables even when -Index.db is missing
[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613048#comment-14613048 ] Benedict commented on CASSANDRA-9591: - Perhaps we should just {{obsoleteOriginals}} up-front, if offline? We could even do it in the {{StandaloneScrubber}}, before calling {{scrubber.scrub()}}, to avoid polluting the general purpose {{Scrubber}}. No strong feelings though - the patch looks like it works to me. [~stef1927]: could you rebase your branches? Once CI passes I'll commit. Scrub (recover) sstables even when -Index.db is missing --- Key: CASSANDRA-9591 URL: https://issues.apache.org/jira/browse/CASSANDRA-9591 Project: Cassandra Issue Type: Improvement Reporter: mck Assignee: mck Labels: benedict-to-commit, sstablescrub Fix For: 2.0.x Attachments: 9591-2.0.txt, 9591-2.1.txt Today SSTableReader needs at minimum 3 files to load an sstable:
- -Data.db
- -CompressionInfo.db
- -Index.db
But during the scrub process the -Index.db file isn't actually necessary, unless there's corruption in the -Data.db and we want to be able to skip over corrupted rows. Given that there is still a fair chance that there's nothing wrong with the -Data.db file and we're just missing the -Index.db file, this patch addresses that situation. So the following patch makes it possible for the StandaloneScrubber (sstablescrub) to recover sstables despite missing -Index.db files. This can happen after a catastrophic incident where data directories have been lost, corrupted, or wiped and the backup is not healthy. I'm aware that normally one depends on replicas or snapshots to avoid such situations, but such catastrophic incidents do occur in the wild. I have not tested this patch against normal c* operations and all the other (more critical) ways SSTableReader is used. I'll happily do that and add the needed unit tests if people see merit in accepting the patch. Otherwise the patch can live with the issue, in case anyone else needs it. There's also a cassandra distribution bundled with the patch [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz] to make life a little easier for anyone finding themselves in such a bad situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
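To make the relaxed file requirement concrete, here is a deliberately self-contained sketch that does not use Cassandra's real component or Scrubber APIs; the class and method names are invented for illustration. It only expresses the precondition argued for in the ticket: an offline scrub needs -Data.db and -CompressionInfo.db, while a missing -Index.db is tolerated at the cost of not being able to skip over corrupted rows.
{code:java}
import java.util.Set;

// Illustrative only; hypothetical names, not Cassandra's component model.
public final class ScrubPreconditionSketch
{
    public static boolean canAttemptScrub(Set<String> presentComponents, boolean offlineScrub)
    {
        boolean hasData = presentComponents.contains("Data.db");
        boolean hasCompressionInfo = presentComponents.contains("CompressionInfo.db");
        boolean hasIndex = presentComponents.contains("Index.db");
        // Online reads still require the index; an offline scrub can proceed without it,
        // losing only the ability to skip over corrupted rows in Data.db.
        return hasData && hasCompressionInfo && (hasIndex || offlineScrub);
    }

    public static void main(String[] args)
    {
        Set<String> withoutIndex = Set.of("Data.db", "CompressionInfo.db");
        System.out.println(canAttemptScrub(withoutIndex, true));  // true: offline scrub may recover the data
        System.out.println(canAttemptScrub(withoutIndex, false)); // false: normal operation still needs the index
    }
}
{code}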
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613046#comment-14613046 ] Sylvain Lebresne commented on CASSANDRA-9471: -
bq. at the very least, the improved iterator, improved tests, and wider deployment of the btree are all worth incorporating.
What about moving those changes to a separate ticket (i.e. one that is not concerned by {{Columns}})? It's useful to trunk anyway as you say, and the less stuff we delay, the better. Splitting the changes related to {{Columns}} from the others is also more incremental in a way :) Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
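As a purely illustrative aside (none of this is Cassandra's actual Columns or BTree code, and every name below is hypothetical), the Java snippet contrasts the linear scan an array-backed lookup implies with a binary search over the same sorted data, which is the flavour of logarithmic lookup-by-key, and index-for-key, that a BTree tracking per-node sizes can offer.
{code:java}
import java.util.Arrays;

// Illustrative only: the complexity difference behind the ticket, not Cassandra code.
public final class ColumnLookupSketch
{
    // O(n): scan until the name is found, as a plain array lookup would.
    static int linearIndexOf(String[] sortedNames, String name)
    {
        for (int i = 0; i < sortedNames.length; i++)
            if (sortedNames[i].equals(name))
                return i;
        return -1;
    }

    // O(log n): a lookup that also yields the key's position, the kind of
    // "index for a given key" a BTree with subtree counts can answer.
    static int searchedIndexOf(String[] sortedNames, String name)
    {
        int idx = Arrays.binarySearch(sortedNames, name);
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args)
    {
        String[] columns = { "a", "b", "c", "d", "e" }; // sorted, like column names
        System.out.println(linearIndexOf(columns, "d"));   // 3
        System.out.println(searchedIndexOf(columns, "d")); // 3, but in logarithmic time
    }
}
{code}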
[jira] [Commented] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613087#comment-14613087 ] Andreas Schnitzerling commented on CASSANDRA-9694: -- I disabled auth and everything else is working correctly. I tested it for 20 hours continuously running. system_auth not upgraded Key: CASSANDRA-9694 URL: https://issues.apache.org/jira/browse/CASSANDRA-9694 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55 Reporter: Andreas Schnitzerling Assignee: Sam Tunnicliffe Fix For: 2.2.0 rc2 Attachments: 9694.txt, system_exception.log After upgrading Authorization-Exceptions occur. I checked the system_auth keyspace and have seen, that tables users, credentials and permissions were not upgraded automatically. I upgraded them (I needed 2 times per table because of CASSANDRA-9566). After upgrading the system_auth tables I could login via cql using different users. {code:title=system.log} WARN [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #User updateprog for keyspace logdata ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na] at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) 
~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [na:1.7.0_55] at java.lang.Thread.run(Unknown Source) [na:1.7.0_55] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613106#comment-14613106 ] Sam Tunnicliffe commented on CASSANDRA-9694: We haven't seen this problem in any of our testing, and I'm afraid I'm unable to reproduce it now. Do you have the logs from the node when you first upgraded it, before wiping system_auth? Failing that, the only things I can suggest are to upgrade another node to 2.2.0-rc1 and capture its logs (at INFO level at least) or to rebuild the upgraded node on 2.1.7 then run the upgrade again, again capturing the logs. system_auth not upgraded Key: CASSANDRA-9694 URL: https://issues.apache.org/jira/browse/CASSANDRA-9694 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55 Reporter: Andreas Schnitzerling Assignee: Sam Tunnicliffe Fix For: 2.2.0 rc2 Attachments: 9694.txt, system_exception.log After upgrading Authorization-Exceptions occur. I checked the system_auth keyspace and have seen, that tables users, credentials and permissions were not upgraded automatically. I upgraded them (I needed 2 times per table because of CASSANDRA-9566). After upgrading the system_auth tables I could login via cql using different users. {code:title=system.log} WARN [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #User updateprog for keyspace logdata ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. 
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na] at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[jira] [Updated] (CASSANDRA-9724) UDA appears to be causing query to be executed multiple times
[ https://issues.apache.org/jira/browse/CASSANDRA-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-9724: - Priority: Major (was: Critical)
UDA appears to be causing query to be executed multiple times - Key: CASSANDRA-9724 URL: https://issues.apache.org/jira/browse/CASSANDRA-9724 Project: Cassandra Issue Type: Bug Components: Core Reporter: Christopher Batey Assignee: Robert Stupp Attachments: data.zip
Not sure if this is intended behaviour. Example table:
{quote}
CREATE TABLE raw_weather_data (
wsid text, // Composite of Air Force Datsav3 station number and NCDC WBAN number
year int, // Year collected
month int, // Month collected
day int, // Day collected
hour int, // Hour collected
temperature double, // Air temperature (degrees Celsius)
dewpoint double, // Dew point temperature (degrees Celsius)
pressure double, // Sea level pressure (hectopascals)
wind_direction int, // Wind direction in degrees. 0-359
wind_speed double, // Wind speed (meters per second)
sky_condition int, // Total cloud cover (coded, see format documentation)
sky_condition_text text, // Non-coded sky conditions
one_hour_precip double, // One-hour accumulated liquid precipitation (millimeters)
six_hour_precip double, // Six-hour accumulated liquid precipitation (millimeters)
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
{quote}
1 node cluster 2.2rc1. Trace for: select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008;
{quote}
activity | timestamp | source | source_elapsed
Execute CQL3 query | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 0
Parsing select temperature from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 109
Preparing statement [SharedPool-Worker-1] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 193
Executing single-partition query on raw_weather_data [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 519
Acquiring sstable references [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 544
Merging memtable tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002000 | 127.0.0.1 | 558
Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 600
Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2015-07-03 09:53:25.002001 | 127.0.0.1 | 612
Read 92 live and 0 tombstone cells [SharedPool-Worker-2] | 2015-07-03 09:53:25.003000 | 127.0.0.1 | 848
Request complete | 2015-07-03 09:53:25.003680 | 127.0.0.1 | 1680
{quote}
However once I include the min function I get: select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008;
{quote}
activity | timestamp | source | source_elapsed
Execute CQL3 query | 2015-07-03 09:56:15.904000 | 127.0.0.1 | 0
Parsing select min(temperature) from raw_weather_data where wsid = '725030:14732' and year = 2008; [SharedPool-Worker-1] |
[jira] [Updated] (CASSANDRA-9694) system_auth not upgraded
[ https://issues.apache.org/jira/browse/CASSANDRA-9694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andreas Schnitzerling updated CASSANDRA-9694: - Attachment: system.log.2..zip system.log.1.zip Here are the steps:
1. copied 2.2.0 instance w/o commitlog, data, saved_caches
2. created data dir
3. copied (backed up) 2.1.7 data (user-ks + system + system_auth + system_traces) into data (except user-CF onlinedata, which is the biggest, containing 13 GB of data)
4. changed log-level to DEBUG
5. enabled auth
6. started cassandra
7. after 8 minutes nodetool stopdaemon
system_auth not upgraded Key: CASSANDRA-9694 URL: https://issues.apache.org/jira/browse/CASSANDRA-9694 Project: Cassandra Issue Type: Bug Components: Core Environment: Windows-7-32 bit, 3.2GB RAM, Java 1.7.0_55 Reporter: Andreas Schnitzerling Assignee: Sam Tunnicliffe Fix For: 2.2.0 rc2 Attachments: 9694.txt, system.log.1.zip, system.log.2..zip, system_exception.log After upgrading Authorization-Exceptions occur. I checked the system_auth keyspace and have seen, that tables users, credentials and permissions were not upgraded automatically. I upgraded them (I needed 2 times per table because of CASSANDRA-9566). After upgrading the system_auth tables I could login via cql using different users. {code:title=system.log} WARN [Thrift:14] 2015-07-01 11:38:57,748 CassandraAuthorizer.java:91 - CassandraAuthorizer failed to authorize #User updateprog for keyspace logdata ERROR [Thrift:14] 2015-07-01 11:41:26,210 CustomTThreadPoolServer.java:223 - Error occurred during processing of message. com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses. at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.get(LocalCache.java:3934) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938) ~[guava-16.0.jar:na] at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4821) ~[guava-16.0.jar:na] at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:72) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.authorize(ClientState.java:362) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:295) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:272) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:259) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:243) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.statements.SelectStatement.checkAccess(SelectStatement.java:143) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:222) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:256) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:241) 
~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1891) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4588) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4572) ~[apache-cassandra-thrift-2.2.0-rc1.jar:2.2.0-rc1] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[libthrift-0.9.2.jar:0.9.2] at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:204) ~[apache-cassandra-2.2.0-rc1.jar:2.2.0-rc1] at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [na:1.7.0_55] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613103#comment-14613103 ] Benedict commented on CASSANDRA-9471: -
bq. but ending up doing something less efficient just because it's not there
You're right, this can happen frustratingly often. OK. I'm convinced :) I'll split out the btree-only stuff into a separate ticket. Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9717) TestCommitLog segment size dtests fail on trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613119#comment-14613119 ] Branimir Lambov commented on CASSANDRA-9717: I suppose the easiest thing to do here is to increase the tolerance, but the test can still be flaky after we do that. A better solution is to fix the random seed for the writes so that we can use a small tolerance and avoid all flakiness, but I don't know if that's something we can do with the dtest infrastructure (or how, if we can). TestCommitLog segment size dtests fail on trunk --- Key: CASSANDRA-9717 URL: https://issues.apache.org/jira/browse/CASSANDRA-9717 Project: Cassandra Issue Type: Sub-task Reporter: Jim Witschey Assignee: Branimir Lambov Priority: Blocker Fix For: 3.0 beta 1 The test for the commit log segment size, with the specified size set to 32MB, fails for me locally and on cassci. ([cassci link|http://cassci.datastax.com/view/trunk/job/trunk_dtest/305/testReport/commitlog_test/TestCommitLog/default_segment_size_test/]) The command to run the test by itself is {{CASSANDRA_VERSION=git:trunk nosetests commitlog_test.py:TestCommitLog.default_segment_size_test}}. EDIT: a similar test, {{commitlog_test.py:TestCommitLog.small_segment_size_test}}, also fails with a similar error. The solution here may just be to change the expected size or the acceptable error -- the result isn't far off. I'm happy to make the dtest change if that's the solution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
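A tiny Java sketch of the fixed-seed idea suggested above; the class, seed value, and sizes are hypothetical stand-ins for whatever the dtest actually writes, not part of the test harness. With a seeded generator the total volume written is identical on every run, so a tight tolerance on the resulting segment size does not make the test flaky.
{code:java}
import java.util.Random;

// Hypothetical illustration of deterministic write volume via a fixed seed; not the dtest itself.
public final class DeterministicWriteLoad
{
    private static final long FIXED_SEED = 42L; // any constant; reproducibility is the point

    // Total bytes a run of randomly sized payloads would add to the commit log.
    public static long totalBytesWritten(int mutations, int maxPayloadBytes)
    {
        Random random = new Random(FIXED_SEED);
        long total = 0;
        for (int i = 0; i < mutations; i++)
            total += random.nextInt(maxPayloadBytes) + 1;
        return total;
    }

    public static void main(String[] args)
    {
        // Prints the same number on every run, so an assertion with a small tolerance is safe.
        System.out.println(totalBytesWritten(10_000, 1024));
    }
}
{code}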
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613122#comment-14613122 ] Sylvain Lebresne commented on CASSANDRA-9471: - Side note: if the changes to {{Columns}} are not hard to rebase, I'd personally be fine with just rebasing that ticket as is (without bothering to split it into 2 tickets) for the sake of saving you some time. At least for CASSANDRA-9705, I don't plan on having much of {{Columns}} going obsolete (the indexability will most likely be much less used but will still be handy, and we'll still rely heavily-ish on {{contains}}, which is currently not terribly efficient). And of course, that still doesn't preclude considering other implementations of {{Columns}} later. Anyway, I'm fine with whichever way you prefer, but just to say that if splitting into 2 tickets takes you the same time as just rebasing the whole patch, I'd personally just go with the second option. Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9471) Columns should be backed by a BTree, not an array
[ https://issues.apache.org/jira/browse/CASSANDRA-9471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14613137#comment-14613137 ] Sylvain Lebresne commented on CASSANDRA-9471: - I'm sorry for the back and forth, but if you haven't started working on that rebase, actually disregard my previous comment (it might be safer to leave Columns alone until CASSANDRA-9705). Columns should be backed by a BTree, not an array - Key: CASSANDRA-9471 URL: https://issues.apache.org/jira/browse/CASSANDRA-9471 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benedict Assignee: Benedict Fix For: 3.0 beta 1 Follow-up to 8099. We have pretty terrible lookup performance as the number of columns grows (linear). In at least one location, this results in quadratic performance. We don't, however, want this structure to be any more expensive to build or to store. Some small modifications to BTree will let it serve here, by permitting efficient lookup by index and calculation _of_ the index for a given key. -- This message was sent by Atlassian JIRA (v6.3.4#6332)