[jira] [Commented] (CASSANDRA-8192) AssertionError in Memory.java

2014-11-19 Thread Andreas Schnitzerling (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217575#comment-14217575
 ] 

Andreas Schnitzerling commented on CASSANDRA-8192:
--

On one node I can run with 1GB and no memory problems, and I think I can find 
more such nodes. So it runs well on 32-bit. I think 32-bit should be supported, 
and I hope my other issues that are not related to memory will not be rejected 
because of 32-bit. Thanks.

 AssertionError in Memory.java
 -

 Key: CASSANDRA-8192
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8192
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Windows-7-32 bit, 3GB RAM, Java 1.7.0_67
Reporter: Andreas Schnitzerling
Assignee: Joshua McKenzie
 Attachments: cassandra.bat, cassandra.yaml, system.log


 Since updating 1 of 12 nodes from 2.1.0-rel to 2.1.1-rel, an exception occurs 
 during start up.
 {panel:title=system.log}
 ERROR [SSTableBatchOpen:1] 2014-10-27 09:44:00,079 CassandraDaemon.java:153 - 
 Exception in thread Thread[SSTableBatchOpen:1,5,main]
 java.lang.AssertionError: null
   at org.apache.cassandra.io.util.Memory.size(Memory.java:307) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.compress.CompressionMetadata.init(CompressionMetadata.java:135)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:83)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.metadata(CompressedSegmentedFile.java:50)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:48)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:766) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:725) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:402) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:302) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
   at 
 org.apache.cassandra.io.sstable.SSTableReader$4.run(SSTableReader.java:438) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
 ~[na:1.7.0_55]
   at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55]
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
 [na:1.7.0_55]
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
 [na:1.7.0_55]
   at java.lang.Thread.run(Unknown Source) [na:1.7.0_55]
 {panel}
 In the attached log you can still see as well CASSANDRA-8069 and 
 CASSANDRA-6283.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8337) mmap underflow during validation compaction

2014-11-19 Thread Alexander Sterligov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217584#comment-14217584
 ] 

Alexander Sterligov commented on CASSANDRA-8337:


After one crash I got this exception:
{quote}
ERROR [main] 2014-11-19 11:37:45,741 CassandraDaemon.java:465 - Exception 
encountered during startup
java.lang.NullPointerException: null
at 
org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:563)
 ~[apache-cassandra-2.1.2.jar:2.1.2]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:232) 
[apache-cassandra-2.1.2.jar:2.1.2]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:448) 
[apache-cassandra-2.1.2.jar:2.1.2]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:537) 
[apache-cassandra-2.1.2.jar:2.1.2]
{quote}

I think the problem happens with lots of replicas (I have 6 DCs with 3 
replicas in each, replication factor 3) and a lot of streaming. I cannot 
reproduce it on 3 replicas in a test environment, but it is a stable problem on 
18 replicas with about 1,000,000 small records.

Maybe sstables are compacted and validated before they are fully synced to 
disk?

 mmap underflow during validation compaction
 ---

 Key: CASSANDRA-8337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8337
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
 Attachments: thread_dump


 During full parallel repair I often get errors like the following
 {quote}
 [2014-11-19 01:02:39,355] Repair session 116beaf0-6f66-11e4-afbb-c1c082008cbe 
 for range (3074457345618263602,-9223372036854775808] failed with error 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #116beaf0-6f66-11e4-afbb-c1c082008cbe on iss/target_state_history, 
 (3074457345618263602,-9223372036854775808]] Validation failed in 
 /95.108.242.19
 {quote}
 In the log of that node there are always the same exceptions:
 {quote}
 ERROR [ValidationExecutor:2] 2014-11-19 01:02:10,847 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to:
 org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
 mmap segment underflow; remaining is 15 but 47 requested
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1518)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1385)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1315)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1706)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1694)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:276)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:917)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 Caused by: java.io.IOException: mmap segment underflow; remaining is 15 but 
 47 requested
 at 
 org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:327)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1460)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 ... 13 common frames omitted
 

[jira] [Updated] (CASSANDRA-8231) Wrong size of cached prepared statements

2014-11-19 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-8231:
--
Attachment: CASSANDRA-8231.txt

This patch replaces jamm version 0.2.8 with version 0.3.0, which supports the 
{{Unmetered}} annotation on types.

The {{ignoreKnownSingletons}} option of {{MemoryMeter}} already excludes 
{{Class}} and {{Enum}} instances, so only {{CFMetaData}}, {{AbstractType}} 
and {{Function}} had to be marked with the {{Unmetered}} annotation.

I used the {{enableDebug}} option from {{MemoryMeter}} to verify that the 
measured instances were the expected ones. 
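
A minimal sketch (not the actual patch) of how the jamm 0.3.0 pieces mentioned above fit together; the SharedMetadata/CachedStatement classes below are hypothetical stand-ins for CFMetaData and a cached prepared statement:
{code}
import org.github.jamm.MemoryMeter;
import org.github.jamm.Unmetered;

public class UnmeteredSketch
{
    // Marking a shared, effectively-singleton object (the role CFMetaData plays) with
    // @Unmetered tells MemoryMeter to skip it, and everything it references, when
    // walking an object graph.
    @Unmetered
    static class SharedMetadata
    {
        final String comment = "shared; should not be charged to every cached statement";
    }

    // Stand-in for a cached prepared statement that references the shared object.
    static class CachedStatement
    {
        final SharedMetadata metadata = new SharedMetadata();
        final String query = "SELECT * FROM ks.tbl WHERE k = ?";
    }

    public static void main(String[] args)
    {
        // Run with -javaagent:jamm.jar so measureDeep can use instrumentation.
        // ignoreKnownSingletons() already excludes Class and Enum instances;
        // enableDebug() prints the visited object tree, which is how the measured
        // set can be verified.
        MemoryMeter meter = new MemoryMeter().ignoreKnownSingletons().enableDebug();
        System.out.println("deep size: " + meter.measureDeep(new CachedStatement()));
    }
}
{code}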

 Wrong size of cached prepared statements
 

 Key: CASSANDRA-8231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8231
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jaroslav Kamenik
Assignee: Benjamin Lerer
 Attachments: 8231-notes.txt, CASSANDRA-8231.txt, Unsafes.java


 Cassandra counts the memory footprint of prepared statements for caching 
 purposes. It seems that there is a problem with some statements, e.g. 
 SelectStatement. Even a simple select is counted as a 100KB object, while 
 updates, deletes etc. take a few hundred or thousand bytes. The result is that 
 the cache - QueryProcessor.preparedStatements - holds just a fraction of the 
 statements.
 I dug a little into the code, and it seems that the problem is in jamm, in the 
 MemoryMeter class. If an instance contains a reference to a class, it counts 
 the size of the whole class too. SelectStatement references EnumSet through 
 ResultSet.Metadata, and EnumSet holds a reference to the Enum class...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8231) Wrong size of cached prepared statements

2014-11-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217593#comment-14217593
 ] 

Benjamin Lerer commented on CASSANDRA-8231:
---

[~dbros...@apache.org], you are quite familiar with Jamm; can you review my 
patch?

 Wrong size of cached prepared statements
 

 Key: CASSANDRA-8231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8231
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jaroslav Kamenik
Assignee: Benjamin Lerer
 Fix For: 2.1.3

 Attachments: 8231-notes.txt, CASSANDRA-8231.txt, Unsafes.java


 Cassandra counts the memory footprint of prepared statements for caching 
 purposes. It seems that there is a problem with some statements, e.g. 
 SelectStatement. Even a simple select is counted as a 100KB object, while 
 updates, deletes etc. take a few hundred or thousand bytes. The result is that 
 the cache - QueryProcessor.preparedStatements - holds just a fraction of the 
 statements.
 I dug a little into the code, and it seems that the problem is in jamm, in the 
 MemoryMeter class. If an instance contains a reference to a class, it counts 
 the size of the whole class too. SelectStatement references EnumSet through 
 ResultSet.Metadata, and EnumSet holds a reference to the Enum class...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8302) Filtering for CONTAINS (KEY) on frozen collection clustering columns within a partition does not work

2014-11-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217626#comment-14217626
 ] 

Benjamin Lerer commented on CASSANDRA-8302:
---

+1

 Filtering for CONTAINS (KEY) on frozen collection clustering columns within a 
 partition does not work
 -

 Key: CASSANDRA-8302
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8302
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Tyler Hobbs
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 2.1.3

 Attachments: 8302.txt


 Create a table like this:
 {noformat}
 CREATE TABLE foo (
 a int,
 b int,
 c frozen<set<int>>,
 d int,
 PRIMARY KEY (a, b, c, d)
 )
 {noformat}
 and add an index on it:
 {noformat}
 CREATE INDEX ON foo(b)
 {noformat}
 A query across all partitions will work correctly:
 {noformat}
 cqlsh:ks1> insert into foo (a, b, c, d) VALUES (0, 0, {1, 2}, 0);
 cqlsh:ks1> SELECT * FROM foo WHERE b=0 AND c CONTAINS 2 and d=0 ALLOW 
 FILTERING;
  a | b | c      | d
 ---+---+--------+---
  0 | 0 | {1, 2} | 0
 (1 rows)
 {noformat}
 But if the query is restricted to a single partition, it is considered 
 invalid (and the error message isn't great):
 {noformat}
 cqlsh:ks1> SELECT * FROM foo WHERE a=0 AND b=0 AND c CONTAINS 2 and d=0 ALLOW 
 FILTERING;
 code=2200 [Invalid query] message="No secondary indexes on the restricted 
 columns support the provided operators: "
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-6717) Modernize schema tables

2014-11-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217639#comment-14217639
 ] 

Sylvain Lebresne commented on CASSANDRA-6717:
-

Btw, it wasn't mentioned above, but I do think we should move away from the old 
AbstractType class names for types in the schema tables. Let's serialize types 
using their CQL name (knowing that even for non-CQL types we'll still use the 
double-quoted classname). That probably means that for clustering column 
definitions we'd have to keep a boolean indicating whether the clustering order 
is reversed or not.
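
As a hedged illustration of what that change would mean for a single clustering column (example values chosen for illustration, not taken from the ticket):
{code}
public class TypeNameIllustration
{
    public static void main(String[] args)
    {
        // Old style: the full AbstractType class name stored as an opaque string,
        // with the clustering order baked into a ReversedType wrapper.
        String abstractTypeName =
            "org.apache.cassandra.db.marshal.ReversedType(org.apache.cassandra.db.marshal.Int32Type)";

        // Proposed style: the CQL type name plus an explicit flag for the clustering order.
        String cqlTypeName = "int";
        boolean clusteringOrderReversed = true;

        System.out.println(abstractTypeName + " -> " + cqlTypeName
                           + " (reversed=" + clusteringOrderReversed + ")");
    }
}
{code}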

 Modernize schema tables
 ---

 Key: CASSANDRA-6717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6717
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 3.0


 There are a few problems/improvements that can be addressed in the way we store 
 the schema:
 # CASSANDRA-4988: as explained on the ticket, storing the comparator is now 
 redundant (or almost: we'd also need to store whether the table is COMPACT or 
 not, which we don't currently, but that is easy and probably a good idea 
 anyway); it can be entirely reconstructed from the info in schema_columns (the 
 same is true of key_validator and subcomparator, and replacing 
 default_validator by a COMPACT_VALUE column in all cases is relatively simple). 
 And storing the comparator as an opaque string broke concurrent updates of 
 sub-parts of said comparator (typically a concurrent collection addition, or 
 altering 2 separate clustering columns), so it's really worth removing it.
 # CASSANDRA-4603: it's time to get rid of those ugly json maps. I'll note 
 that schema_keyspaces is a problem due to its use of COMPACT STORAGE, but I 
 think we should fix it once and for all nonetheless (see below).
 # For CASSANDRA-6382, and to allow indexing both map keys and values at the 
 same time, we'd need to be able to have more than one index definition for a 
 given column.
 # There are a few mismatches in table options between the ones stored in the 
 schema and the ones used when declaring/altering a table which would be nice 
 to fix. The compaction, compression and replication maps are ones already 
 mentioned in CASSANDRA-4603, but also, for some reason, 
 'dclocal_read_repair_chance' in CQL is called just 'local_read_repair_chance' 
 in the schema table, and 'min/max_compaction_threshold' are column family 
 options in the schema but just compaction options in CQL (which makes more 
 sense).
 None of those issues are major, and we could probably deal with them 
 independently, but it might be simpler to just fix them all in one shot, so I 
 wanted to sum them all up here. In particular, the fact that 
 'schema_keyspaces' uses COMPACT STORAGE is annoying (for the replication map, 
 but it may limit future stuff too), which suggests we should migrate it to a 
 new, non-COMPACT table. And while that's arguably a detail, it wouldn't hurt 
 to rename schema_columnfamilies to schema_tables for the years to come, since 
 that's the preferred vernacular for CQL.
 Overall, what I would suggest is to move all schema tables to a new keyspace, 
 named 'schema' for instance (or 'system_schema', but I prefer the shorter 
 version), and fix all the issues above at once. Since we currently don't 
 exchange schema between nodes of different versions, all we'd need for that is 
 a one-shot startup migration, and overall, I think it could be simpler for 
 clients to deal with one clear migration than to have to handle minor 
 individual changes all over the place. I also think it's somewhat cleaner 
 conceptually to have schema tables in their own keyspace, since they are 
 replicated through a different mechanism than other system tables.
 If we do that, we could, for instance, migrate to the following schema tables 
 (details up for discussion of course):
 {noformat}
 CREATE TYPE user_type (
   name text,
   column_names list<text>,
   column_types list<text>
 )
 CREATE TABLE keyspaces (
   name text PRIMARY KEY,
   durable_writes boolean,
   replication map<string, string>,
   user_types map<string, user_type>
 )
 CREATE TYPE trigger_definition (
   name text,
   options map<text, text>
 )
 CREATE TABLE tables (
   keyspace text,
   name text,
   id uuid,
   table_type text, // COMPACT, CQL or SUPER
   dropped_columns map<text, bigint>,
   triggers map<text, trigger_definition>,
   // options
   comment text,
   compaction map<text, text>,
   compression map<text, text>,
   read_repair_chance double,
   dclocal_read_repair_chance double,
   gc_grace_seconds int,
   caching text,
   rows_per_partition_to_cache text,
   default_time_to_live int,
   min_index_interval int,
   max_index_interval int,
   speculative_retry text,
   populate_io_cache_on_flush 

[jira] [Commented] (CASSANDRA-6717) Modernize schema tables

2014-11-19 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217645#comment-14217645
 ] 

Aleksey Yeschenko commented on CASSANDRA-6717:
--

Sure, I recall you mentioning that.

The end goal is to get as close to the CQL CREATE TABLE syntax as technically 
possible, for both the schema tables and the metadata classes.

 Modernize schema tables
 ---

 Key: CASSANDRA-6717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6717
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Aleksey Yeschenko
Priority: Minor
 Fix For: 3.0


 There are a few problems/improvements that can be addressed in the way we store 
 the schema:
 # CASSANDRA-4988: as explained on the ticket, storing the comparator is now 
 redundant (or almost: we'd also need to store whether the table is COMPACT or 
 not, which we don't currently, but that is easy and probably a good idea 
 anyway); it can be entirely reconstructed from the info in schema_columns (the 
 same is true of key_validator and subcomparator, and replacing 
 default_validator by a COMPACT_VALUE column in all cases is relatively simple). 
 And storing the comparator as an opaque string broke concurrent updates of 
 sub-parts of said comparator (typically a concurrent collection addition, or 
 altering 2 separate clustering columns), so it's really worth removing it.
 # CASSANDRA-4603: it's time to get rid of those ugly json maps. I'll note 
 that schema_keyspaces is a problem due to its use of COMPACT STORAGE, but I 
 think we should fix it once and for all nonetheless (see below).
 # For CASSANDRA-6382, and to allow indexing both map keys and values at the 
 same time, we'd need to be able to have more than one index definition for a 
 given column.
 # There are a few mismatches in table options between the ones stored in the 
 schema and the ones used when declaring/altering a table which would be nice 
 to fix. The compaction, compression and replication maps are ones already 
 mentioned in CASSANDRA-4603, but also, for some reason, 
 'dclocal_read_repair_chance' in CQL is called just 'local_read_repair_chance' 
 in the schema table, and 'min/max_compaction_threshold' are column family 
 options in the schema but just compaction options in CQL (which makes more 
 sense).
 None of those issues are major, and we could probably deal with them 
 independently, but it might be simpler to just fix them all in one shot, so I 
 wanted to sum them all up here. In particular, the fact that 
 'schema_keyspaces' uses COMPACT STORAGE is annoying (for the replication map, 
 but it may limit future stuff too), which suggests we should migrate it to a 
 new, non-COMPACT table. And while that's arguably a detail, it wouldn't hurt 
 to rename schema_columnfamilies to schema_tables for the years to come, since 
 that's the preferred vernacular for CQL.
 Overall, what I would suggest is to move all schema tables to a new keyspace, 
 named 'schema' for instance (or 'system_schema', but I prefer the shorter 
 version), and fix all the issues above at once. Since we currently don't 
 exchange schema between nodes of different versions, all we'd need for that is 
 a one-shot startup migration, and overall, I think it could be simpler for 
 clients to deal with one clear migration than to have to handle minor 
 individual changes all over the place. I also think it's somewhat cleaner 
 conceptually to have schema tables in their own keyspace, since they are 
 replicated through a different mechanism than other system tables.
 If we do that, we could, for instance, migrate to the following schema tables 
 (details up for discussion of course):
 {noformat}
 CREATE TYPE user_type (
   name text,
   column_names list<text>,
   column_types list<text>
 )
 CREATE TABLE keyspaces (
   name text PRIMARY KEY,
   durable_writes boolean,
   replication map<string, string>,
   user_types map<string, user_type>
 )
 CREATE TYPE trigger_definition (
   name text,
   options map<text, text>
 )
 CREATE TABLE tables (
   keyspace text,
   name text,
   id uuid,
   table_type text, // COMPACT, CQL or SUPER
   dropped_columns map<text, bigint>,
   triggers map<text, trigger_definition>,
   // options
   comment text,
   compaction map<text, text>,
   compression map<text, text>,
   read_repair_chance double,
   dclocal_read_repair_chance double,
   gc_grace_seconds int,
   caching text,
   rows_per_partition_to_cache text,
   default_time_to_live int,
   min_index_interval int,
   max_index_interval int,
   speculative_retry text,
   populate_io_cache_on_flush boolean,
   bloom_filter_fp_chance double
   memtable_flush_period_in_ms int,
   PRIMARY KEY (keyspace, name)
 )
 CREATE TYPE index_definition (
   name text,
   index_type text,
   options map<text, text>
 )
 CREATE TABLE columns 

[jira] [Created] (CASSANDRA-8339) Reading columns marked as type different than default validation class from CQL causes errors

2014-11-19 Thread Erik Forsberg (JIRA)
Erik Forsberg created CASSANDRA-8339:


 Summary: Reading columns marked as type different than default 
validation class from CQL causes errors
 Key: CASSANDRA-8339
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8339
 Project: Cassandra
  Issue Type: Bug
Reporter: Erik Forsberg
Assignee: Tyler Hobbs


As [discussed on users mailing 
list|http://www.mail-archive.com/user%40cassandra.apache.org/msg39251.html] I'm 
having trouble reading data from a table created via thrift, where some columns 
are marked as having a validator different than the default one.

Minimal working example:

{noformat}
#!/usr/bin/env python

# Run this in a virtualenv with pycassa and cassandra-driver installed via pip
import pycassa
import cassandra
import calendar
import traceback
import time
from uuid import uuid4

keyspace = "badcql"

sysmanager = pycassa.system_manager.SystemManager("localhost")
sysmanager.create_keyspace(keyspace, strategy_options={'replication_factor': '1'})
sysmanager.create_column_family(keyspace, "Users",
                                key_validation_class=pycassa.system_manager.LEXICAL_UUID_TYPE,
                                comparator_type=pycassa.system_manager.ASCII_TYPE,
                                default_validation_class=pycassa.system_manager.UTF8_TYPE)
sysmanager.create_index(keyspace, "Users", "username", pycassa.system_manager.UTF8_TYPE)
sysmanager.create_index(keyspace, "Users", "email", pycassa.system_manager.UTF8_TYPE)
sysmanager.alter_column(keyspace, "Users", "default_account_id", pycassa.system_manager.LEXICAL_UUID_TYPE)
sysmanager.create_index(keyspace, "Users", "active", pycassa.system_manager.INT_TYPE)
sysmanager.alter_column(keyspace, "Users", "date_created", pycassa.system_manager.LONG_TYPE)

pool = pycassa.pool.ConnectionPool(keyspace, ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, "Users")

user_uuid = uuid4()

cf.insert(user_uuid, {'username': 'test_username', 'auth_method': 'ldap',
                      'email': 't...@example.com', 'active': 1,
                      'date_created': long(calendar.timegm(time.gmtime())),
                      'default_account_id': uuid4()})

from cassandra.cluster import Cluster
cassandra_cluster = Cluster(["localhost"])
cassandra_session = cassandra_cluster.connect(keyspace)
print "username", cassandra_session.execute('SELECT value from Users where key = %s and column1 = %s', (user_uuid, 'username',))
print "email", cassandra_session.execute('SELECT value from Users where key = %s and column1 = %s', (user_uuid, 'email',))
try:
    print "default_account_id", cassandra_session.execute('SELECT value from Users where key = %s and column1 = %s', (user_uuid, 'default_account_id',))
except Exception as e:
    print "Exception trying to get default_account_id", traceback.format_exc()
    cassandra_session = cassandra_cluster.connect(keyspace)

try:
    print "active", cassandra_session.execute('SELECT value from Users where key = %s and column1 = %s', (user_uuid, 'active',))
except Exception as e:
    print "Exception trying to get active", traceback.format_exc()
    cassandra_session = cassandra_cluster.connect(keyspace)

try:
    print "date_created", cassandra_session.execute('SELECT value from Users where key = %s and column1 = %s', (user_uuid, 'date_created',))
except Exception as e:
    print "Exception trying to get date_created", traceback.format_exc()

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8010) cassandra-stress needs better docs for rate options

2014-11-19 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217676#comment-14217676
 ] 

Liang Xie commented on CASSANDRA-8010:
--

see:
{code}
$ ./cassandra-stress help -rate

Usage: -rate threads=? [limit=?]
 OR 
Usage: -rate [auto] [threads>=?] [threads<=?]

  threads=?                  run this many clients concurrently
  limit=? (default=0/s)      limit operations per second across all clients
  auto                       test with increasing number of threadCount until performance plateaus
  threads>=? (default=4)     run at least this many clients concurrently
  threads<=? (default=1000)  run at most this many clients concurrently
{code}
maybe we can close this JIRA now?

 cassandra-stress needs better docs for rate options
 ---

 Key: CASSANDRA-8010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8010
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website, Examples, Tools
Reporter: Matt Stump
Priority: Minor
  Labels: lhf

 It's not obvious how to use the rate option. I wasn't able to figure it out 
 via the source, or from the docs. I kept trying to do -rate= or -threads=. I 
 had to search confluence for usage examples.
 Need something like this in the docs:
 -rate threads=900
 -rate threads=900



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8193) Multi-DC parallel snapshot repair

2014-11-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217678#comment-14217678
 ] 

Jimmy Mårdell commented on CASSANDRA-8193:
--

The only change I made to StorageServiceMBean is the addition of two new 
methods, so I think it should be fine?



 Multi-DC parallel snapshot repair
 -

 Key: CASSANDRA-8193
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8193
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jimmy Mårdell
Assignee: Jimmy Mårdell
Priority: Minor
 Fix For: 2.0.12

 Attachments: cassandra-2.0-8193-1.txt, cassandra-2.0-8193-2.txt


 The current behaviour of snapshot repair is to let one node at a time 
 calculate a merkle tree. This is to ensure only one node at a time is doing 
 the expensive calculation. The drawback is that the merkle tree calculation 
 takes even longer.
 In a multi-DC setup, I think it would make more sense to have one node in 
 each DC calculate the merkle tree at the same time. This would yield a 
 significant improvement when you have many data centers.
 I'm not sure how relevant this is in 2.1, but I don't see us upgrading to 2.1 
 any time soon. Unless there is an obvious drawback that I'm missing, I'd like 
 to implement this in the 2.0 branch.
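
A hedged, toy sketch of the scheduling idea described above (simplified types and invented addresses; this is not the attached patch):
{code}
import java.util.*;

public class PerDcSnapshotRepairSketch
{
    // Group the endpoints of a repair session by data center.
    static Map<String, Queue<String>> groupByDc(Map<String, String> endpointToDc)
    {
        Map<String, Queue<String>> byDc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : endpointToDc.entrySet())
            byDc.computeIfAbsent(e.getValue(), dc -> new ArrayDeque<>()).add(e.getKey());
        return byDc;
    }

    // One "round" of validation: the head of each DC queue computes its merkle tree
    // concurrently, so there is still at most one expensive validation per DC.
    static List<String> nextParallelBatch(Map<String, Queue<String>> byDc)
    {
        List<String> batch = new ArrayList<>();
        for (Queue<String> q : byDc.values())
            if (!q.isEmpty())
                batch.add(q.poll());
        return batch;
    }

    public static void main(String[] args)
    {
        Map<String, String> endpoints = new LinkedHashMap<>();
        endpoints.put("10.0.1.1", "dc1");
        endpoints.put("10.0.1.2", "dc1");
        endpoints.put("10.0.2.1", "dc2");
        endpoints.put("10.0.2.2", "dc2");
        Map<String, Queue<String>> byDc = groupByDc(endpoints);
        System.out.println("first batch:  " + nextParallelBatch(byDc)); // one node per DC
        System.out.println("second batch: " + nextParallelBatch(byDc));
    }
}
{code}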



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8323) Adapt UDF code after JAVA-502

2014-11-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217679#comment-14217679
 ] 

Robert Stupp commented on CASSANDRA-8323:
-

Linked CASSANDRA-6717 since, if we get rid of {{AbstractType}} entirely in 3.0, 
we will not need this functionality any more.

 Adapt UDF code after JAVA-502
 -

 Key: CASSANDRA-8323
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8323
 Project: Cassandra
  Issue Type: Improvement
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0


 In CASSANDRA-7563 support for user-types, tuple-types and collections is 
 added to C* using the Java Driver.
 The code in C* requires access to some functionality which is currently 
 performed using reflection/invoke-dynamic.
 This ticket is about providing better/direct access to that functionality.
 I'll provide patches for Java Driver + C*.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7188) Wrong class type: class org.apache.cassandra.db.Column in CounterColumn.reconcile

2014-11-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217738#comment-14217738
 ] 

Nicolas Lalevée commented on CASSANDRA-7188:


Our prod cluster has been upgraded to 2.0.11, without incident!

Thank you for the bug fix.

 Wrong class type: class org.apache.cassandra.db.Column in 
 CounterColumn.reconcile
 -

 Key: CASSANDRA-7188
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7188
 Project: Cassandra
  Issue Type: Bug
Reporter: Nicolas Lalevée
Assignee: Aleksey Yeschenko
  Labels: qa-resolved
 Fix For: 2.0.11

 Attachments: 7188.txt


 When migrating a cluster of 6 nodes from 1.2.11 to 2.0.7, we started to see 
 this error on the first migrated node:
 {noformat}
 ERROR [ReplicateOnWriteStage:1] 2014-05-07 11:26:59,779 CassandraDaemon.java 
 (line 198) Exception in thread Thread[ReplicateOnWriteStage:1,5,main]
 java.lang.AssertionError: Wrong class type: class 
 org.apache.cassandra.db.Column
 at 
 org.apache.cassandra.db.CounterColumn.reconcile(CounterColumn.java:159)
 at 
 org.apache.cassandra.db.filter.QueryFilter$1.reduce(QueryFilter.java:109)
 at 
 org.apache.cassandra.db.filter.QueryFilter$1.reduce(QueryFilter.java:103)
 at 
 org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:112)
 at 
 org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
 at 
 org.apache.cassandra.db.filter.NamesQueryFilter.collectReducedColumns(NamesQueryFilter.java:98)
 at 
 org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
 at 
 org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
 at 
 org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
 at 
 org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
 at 
 org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1540)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1369)
 at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:327)
 at 
 org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:55)
 at 
 org.apache.cassandra.db.CounterMutation.makeReplicationMutation(CounterMutation.java:100)
 at 
 org.apache.cassandra.service.StorageProxy$8$1.runMayThrow(StorageProxy.java:1085)
 at 
 org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1916)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}
 We then saw this error on the other 5 nodes, still on 1.2.x:
 {noformat}
 ERROR [MutationStage:2793] 2014-05-07 11:46:12,301 CassandraDaemon.java (line 
 191) Exception in thread Thread[MutationStage:2793,5,main]
 java.lang.AssertionError: Wrong class type: class 
 org.apache.cassandra.db.Column
 at 
 org.apache.cassandra.db.CounterColumn.reconcile(CounterColumn.java:165)
 at 
 org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:378)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addColumn(AtomicSortedColumns.java:166)
 at 
 org.apache.cassandra.db.AbstractColumnContainer.addColumn(AbstractColumnContainer.java:119)
 at org.apache.cassandra.db.SuperColumn.addColumn(SuperColumn.java:218)
 at org.apache.cassandra.db.SuperColumn.putColumn(SuperColumn.java:229)
 at 
 org.apache.cassandra.db.ThreadSafeSortedColumns.addColumnInternal(ThreadSafeSortedColumns.java:108)
 at 
 org.apache.cassandra.db.ThreadSafeSortedColumns.addAllWithSizeDelta(ThreadSafeSortedColumns.java:138)
 at 
 org.apache.cassandra.db.AbstractColumnContainer.addAllWithSizeDelta(AbstractColumnContainer.java:99)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:205)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:168)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:742)
 at org.apache.cassandra.db.Table.apply(Table.java:388)
 at org.apache.cassandra.db.Table.apply(Table.java:353)
 at 

[jira] [Commented] (CASSANDRA-8067) NullPointerException in KeyCacheSerializer

2014-11-19 Thread Andreas Schnitzerling (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217774#comment-14217774
 ] 

Andreas Schnitzerling commented on CASSANDRA-8067:
--

Today I got this failure during nodetool upgradesstables after upgrading from 
2.0.10 to 2.1.2 with the finalizer patch from CASSANDRA-6283:
{noformat}
ERROR [CompactionExecutor:65] 2014-11-19 09:56:02,453 CassandraDaemon.java:153 
- Exception in thread Thread[CompactionExecutor:65,1,main]
java.lang.NullPointerException: null
at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475)
 ~[apache-cassandra-2.1.2-SNAPSHOT.jar:2.1.2-SNAPSHOT]
at 
org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463)
 ~[apache-cassandra-2.1.2-SNAPSHOT.jar:2.1.2-SNAPSHOT]
at 
org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:274)
 ~[apache-cassandra-2.1.2-SNAPSHOT.jar:2.1.2-SNAPSHOT]
at 
org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1088)
 ~[apache-cassandra-2.1.2-SNAPSHOT.jar:2.1.2-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 
~[na:1.7.0_55]
at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0_55]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[na:1.7.0_55]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[na:1.7.0_55]
at java.lang.Thread.run(Unknown Source) [na:1.7.0_55]
{noformat}

 NullPointerException in KeyCacheSerializer
 --

 Key: CASSANDRA-8067
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8067
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Eric Leleu
 Fix For: 2.1.1


 Hi,
 I have this stack trace in the logs of Cassandra server (v2.1)
 {code}
 ERROR [CompactionExecutor:14] 2014-10-06 23:32:02,098 
 CassandraDaemon.java:166 - Exception in thread 
 Thread[CompactionExecutor:14,1,main]
 java.lang.NullPointerException: null
 at 
 org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown 
 Source) ~[na:1.7.0]
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 
 ~[na:1.7.0]
 at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
 [na:1.7.0]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
 [na:1.7.0]
 at java.lang.Thread.run(Unknown Source) [na:1.7.0]
 {code}
 It may not be critical because this error occurred in the AutoSavingCache. 
 However, line 475 is about the CFMetaData, so it may hide a bigger issue...
 {code}
  474 CFMetaData cfm = 
 Schema.instance.getCFMetaData(key.desc.ksname, key.desc.cfname);
  475 cfm.comparator.rowIndexEntrySerializer().serialize(entry, 
 out);
 {code}
 Regards,
 Eric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8055) Centralize shared executors

2014-11-19 Thread Sam Tunnicliffe (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-8055:
---
Attachment: 8055.txt

 Centralize shared executors
 ---

 Key: CASSANDRA-8055
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8055
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: Sam Tunnicliffe
Priority: Minor
  Labels: lhf
 Fix For: 2.1.3

 Attachments: 8055.txt


 As mentioned in CASSANDRA-7930 we should put shared executors in a common 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (CASSANDRA-8067) NullPointerException in KeyCacheSerializer

2014-11-19 Thread Yuki Morishita (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita reopened CASSANDRA-8067:
---

reopening for investigation

 NullPointerException in KeyCacheSerializer
 --

 Key: CASSANDRA-8067
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8067
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Eric Leleu
 Fix For: 2.1.1


 Hi,
 I have this stack trace in the logs of Cassandra server (v2.1)
 {code}
 ERROR [CompactionExecutor:14] 2014-10-06 23:32:02,098 
 CassandraDaemon.java:166 - Exception in thread 
 Thread[CompactionExecutor:14,1,main]
 java.lang.NullPointerException: null
 at 
 org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:475)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.service.CacheService$KeyCacheSerializer.serialize(CacheService.java:463)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.cache.AutoSavingCache$Writer.saveCache(AutoSavingCache.java:225)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$11.run(CompactionManager.java:1061)
  ~[apache-cassandra-2.1.0.jar:2.1.0]
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown 
 Source) ~[na:1.7.0]
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 
 ~[na:1.7.0]
 at java.util.concurrent.FutureTask.run(Unknown Source) ~[na:1.7.0]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
 [na:1.7.0]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
 [na:1.7.0]
 at java.lang.Thread.run(Unknown Source) [na:1.7.0]
 {code}
 It may not be critical because this error occurred in the AutoSavingCache. 
 However, line 475 is about the CFMetaData, so it may hide a bigger issue...
 {code}
  474 CFMetaData cfm = 
 Schema.instance.getCFMetaData(key.desc.ksname, key.desc.cfname);
  475 cfm.comparator.rowIndexEntrySerializer().serialize(entry, 
 out);
 {code}
 Regards,
 Eric



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions

2014-11-19 Thread Marcus Eriksson (JIRA)
Marcus Eriksson created CASSANDRA-8340:
--

 Summary: Use sstable min timestamp when deciding if an sstable 
should be included in DTCS compactions
 Key: CASSANDRA-8340
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor


Currently we check how old the newest data (max timestamp) in an sstable is 
when we check if it should be compacted.

If we instead switch to using the min timestamp for this, we have a pretty 
clean migration path from STCS/LCS to DTCS.

My thinking is that, before migrating, the user does a major compaction, which 
creates a huge sstable containing all data with a min timestamp very far back 
in time. Then, after switching to DTCS, we will have a big sstable that we 
never compact (i.e., the min timestamp of this big sstable is older than 
max_sstable_age_days), while all newer data will come after that and will be 
properly compacted.

WDYT [~Bj0rn] ?
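
A toy sketch of the inclusion check described above (names, units and thresholds are illustrative only, not the actual DateTieredCompactionStrategy code):
{code}
import java.util.concurrent.TimeUnit;

public class DtcsAgeCheckSketch
{
    // Decide whether an sstable is still eligible for DTCS compaction based on its
    // *minimum* timestamp rather than its maximum.
    static boolean eligibleByMinTimestamp(long minTimestampMicros, long nowMicros, long maxSSTableAgeDays)
    {
        long maxAgeMicros = TimeUnit.DAYS.toMicros(maxSSTableAgeDays);
        return nowMicros - minTimestampMicros <= maxAgeMicros;
    }

    public static void main(String[] args)
    {
        long now = System.currentTimeMillis() * 1000; // microseconds, as Cassandra timestamps use

        // A huge sstable produced by a pre-migration major compaction: its min timestamp
        // is far in the past, so it falls outside max_sstable_age_days and is left alone.
        long majorCompactionMin = now - TimeUnit.DAYS.toMicros(400);

        // A freshly written sstable keeps getting compacted as usual.
        long freshMin = now - TimeUnit.DAYS.toMicros(1);

        System.out.println(eligibleByMinTimestamp(majorCompactionMin, now, 365)); // false
        System.out.println(eligibleByMinTimestamp(freshMin, now, 365));           // true
    }
}
{code}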



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8264) Problems with multicolumn relations and COMPACT STORAGE

2014-11-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217860#comment-14217860
 ] 

Benjamin Lerer commented on CASSANDRA-8264:
---

+1

 Problems with multicolumn relations and COMPACT STORAGE
 ---

 Key: CASSANDRA-8264
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8264
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Tyler Hobbs
Assignee: Tyler Hobbs
 Fix For: 2.0.12, 2.1.3

 Attachments: 8264-2.0.txt, 8264-2.1.txt


 As discovered in CASSANDRA-7859, there are a few issues with multi-column 
 relations and {{COMPACT STORAGE}}.
 The first issue is that IN clauses on multiple columns aren't handled 
 correctly.  There appear to be other issues as well, but I haven't been able 
 to dig into them yet.  To reproduce the issues, run each of the tests in 
 {{MultiColumnRelationTest}} with a {{COMPACT STORAGE}} version of the table.  
 (Changing the tests to do that automatically will be part of the ticket.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8231) Wrong size of cached prepared statements

2014-11-19 Thread Dave Brosius (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217884#comment-14217884
 ] 

Dave Brosius commented on CASSANDRA-8231:
-

LGTM. I'll push 0.3.0 to maven.

 Wrong size of cached prepared statements
 

 Key: CASSANDRA-8231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8231
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jaroslav Kamenik
Assignee: Benjamin Lerer
 Fix For: 2.1.3

 Attachments: 8231-notes.txt, CASSANDRA-8231.txt, Unsafes.java


 Cassandra counts the memory footprint of prepared statements for caching 
 purposes. It seems that there is a problem with some statements, e.g. 
 SelectStatement. Even a simple select is counted as a 100KB object, while 
 updates, deletes etc. take a few hundred or thousand bytes. The result is that 
 the cache - QueryProcessor.preparedStatements - holds just a fraction of the 
 statements.
 I dug a little into the code, and it seems that the problem is in jamm, in the 
 MemoryMeter class. If an instance contains a reference to a class, it counts 
 the size of the whole class too. SelectStatement references EnumSet through 
 ResultSet.Metadata, and EnumSet holds a reference to the Enum class...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8231) Wrong size of cached prepared statements

2014-11-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217896#comment-14217896
 ] 

Benjamin Lerer commented on CASSANDRA-8231:
---

Thanks

 Wrong size of cached prepared statements
 

 Key: CASSANDRA-8231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8231
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jaroslav Kamenik
Assignee: Benjamin Lerer
 Fix For: 2.1.3

 Attachments: 8231-notes.txt, CASSANDRA-8231.txt, Unsafes.java


 Cassandra counts the memory footprint of prepared statements for caching 
 purposes. It seems that there is a problem with some statements, e.g. 
 SelectStatement. Even a simple select is counted as a 100KB object, while 
 updates, deletes etc. take a few hundred or thousand bytes. The result is that 
 the cache - QueryProcessor.preparedStatements - holds just a fraction of the 
 statements.
 I dug a little into the code, and it seems that the problem is in jamm, in the 
 MemoryMeter class. If an instance contains a reference to a class, it counts 
 the size of the whole class too. SelectStatement references EnumSet through 
 ResultSet.Metadata, and EnumSet holds a reference to the Enum class...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8231) Wrong size of cached prepared statements

2014-11-19 Thread Dave Brosius (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217900#comment-14217900
 ] 

Dave Brosius commented on CASSANDRA-8231:
-

Ah, one thing: don't forget to change the license file name for jamm.

 Wrong size of cached prepared statements
 

 Key: CASSANDRA-8231
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8231
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jaroslav Kamenik
Assignee: Benjamin Lerer
 Fix For: 2.1.3

 Attachments: 8231-notes.txt, CASSANDRA-8231.txt, Unsafes.java


 Cassandra counts the memory footprint of prepared statements for caching 
 purposes. It seems that there is a problem with some statements, e.g. 
 SelectStatement. Even a simple select is counted as a 100KB object, while 
 updates, deletes etc. take a few hundred or thousand bytes. The result is that 
 the cache - QueryProcessor.preparedStatements - holds just a fraction of the 
 statements.
 I dug a little into the code, and it seems that the problem is in jamm, in the 
 MemoryMeter class. If an instance contains a reference to a class, it counts 
 the size of the whole class too. SelectStatement references EnumSet through 
 ResultSet.Metadata, and EnumSet holds a reference to the Enum class...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8329) LeveledCompactionStrategy should split large files across data directories when compacting

2014-11-19 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217926#comment-14217926
 ] 

Yuki Morishita commented on CASSANDRA-8329:
---

[~aboudreault] Thanks for your help. We need to check if LCS will write to 
multiple disks when compacting large SSTables in L0 after the patch.

 LeveledCompactionStrategy should split large files across data directories 
 when compacting
 --

 Key: CASSANDRA-8329
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8329
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: J.B. Langston
Assignee: Marcus Eriksson
 Fix For: 2.0.12

 Attachments: 
 0001-get-new-sstable-directory-for-every-new-file-during-.patch


 Because we fall back to STCS for L0 when LCS gets behind, the sstables in L0 
 can get quite large during sustained periods of heavy writes.  This can 
 result in large imbalances between data volumes when using JBOD support.  
 Eventually these large files get broken up as L0 sstables are moved up into 
 higher levels; however, because LCS only chooses a single volume on which to 
 write all of the sstables created during a single compaction, the imbalance 
 is persisted.
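
A hedged, toy sketch of the direction the attached patch title suggests (a simple round-robin chooser with invented paths; the real directory-selection logic also weighs free space and is not shown here):
{code}
import java.util.Arrays;
import java.util.List;

public class PerFileDirectorySketch
{
    private final List<String> dataDirectories;
    private int next = 0;

    PerFileDirectorySketch(List<String> dataDirectories)
    {
        this.dataDirectories = dataDirectories;
    }

    // Pick a data directory for every new output sstable of a compaction, instead of
    // choosing one directory once for the whole compaction, so large L0 compactions
    // spread their output across JBOD volumes.
    String directoryForNextSSTable()
    {
        String dir = dataDirectories.get(next);
        next = (next + 1) % dataDirectories.size();
        return dir;
    }

    public static void main(String[] args)
    {
        PerFileDirectorySketch chooser =
            new PerFileDirectorySketch(Arrays.asList("/data1", "/data2", "/data3"));
        for (int i = 0; i < 4; i++)
            System.out.println("sstable " + i + " -> " + chooser.directoryForNextSSTable());
    }
}
{code}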



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8286) Regression in ORDER BY

2014-11-19 Thread Benjamin Lerer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14217927#comment-14217927
 ] 

Benjamin Lerer commented on CASSANDRA-8286:
---

I like the new unit tests.
+1

 Regression in ORDER BY
 --

 Key: CASSANDRA-8286
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8286
 Project: Cassandra
  Issue Type: Bug
Reporter: Philip Thompson
Assignee: Tyler Hobbs
  Labels: cql
 Fix For: 2.0.12, 2.1.3

 Attachments: 8286-2.0-v2.txt, 8286-2.0.txt, 8286-2.1-v2.txt, 
 8286-2.1.txt, 8286-trunk-v2.txt, 8286-trunk.txt


 The dtest {{cql_tests.py:TestCQL.order_by_multikey_test}} is now failing in 
 2.0:
 http://cassci.datastax.com/job/cassandra-2.0_dtest/lastCompletedBuild/testReport/cql_tests/TestCQL/order_by_multikey_test/history/
 This failure began at the commit for CASSANDRA-8178.
 The error message reads 
 {code}==
 ERROR: order_by_multikey_test (cql_tests.TestCQL)
 --
 Traceback (most recent call last):
   File "/Users/philipthompson/cstar/cassandra-dtest/dtest.py", line 524, in wrapped
     f(obj)
   File "/Users/philipthompson/cstar/cassandra-dtest/cql_tests.py", line 1807, in order_by_multikey_test
     res = cursor.execute("SELECT col1 FROM test WHERE my_id in('key1', 'key2', 'key3') ORDER BY col1;")
   File "/Library/Python/2.7/site-packages/cassandra/cluster.py", line 1281, in execute
     result = future.result(timeout)
   File "/Library/Python/2.7/site-packages/cassandra/cluster.py", line 2771, in result
     raise self._final_exception
 InvalidRequest: code=2200 [Invalid query] message="ORDER BY could not be used 
 on columns missing in select clause."{code}
 and occurs at the query {{SELECT col1 FROM test WHERE my_id in('key1', 
 'key2', 'key3') ORDER BY col1;}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8055) Centralize shared executors

2014-11-19 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-8055:
-
Reviewer: Aleksey Yeschenko

 Centralize shared executors
 ---

 Key: CASSANDRA-8055
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8055
 Project: Cassandra
  Issue Type: Improvement
Reporter: T Jake Luciani
Assignee: Sam Tunnicliffe
Priority: Minor
  Labels: lhf
 Fix For: 2.1.3

 Attachments: 8055.txt


 As mentioned in CASSANDRA-7930 we should put shared executors in a common 
 class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-8018) Cassandra seems to insert twice in custom PerColumnSecondaryIndex

2014-11-19 Thread Benjamin Lerer (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer reassigned CASSANDRA-8018:
-

Assignee: Benjamin Lerer

 Cassandra seems to insert twice in custom PerColumnSecondaryIndex
 -

 Key: CASSANDRA-8018
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8018
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Pavel Chlupacek
Assignee: Benjamin Lerer
 Fix For: 2.1.3


 When inserting data into a Cassandra 2.1.0 table with a custom secondary 
 index, the Cell is inserted twice if a new entry is inserted into a row with 
 the same rowId but different clustering columns.
 
 CREATE KEYSPACE fulltext WITH replication = {'class': 'SimpleStrategy',  
 'replication_factor' : 1};
 CREATE TABLE fulltext.test ( id uuid, name text, name2 text, json varchar, 
 lucene text, primary key ( id , name));
 CREATE CUSTOM INDEX lucene_idx on fulltext.test(lucene) using 
 'com.spinoco.fulltext.cassandra.TestIndex'; 
 // this causes only one insert
  insertInto("fulltext", "test")
   .value("id", id1.uuid)
   .value("name", "goosh1")
   .value("json", TestContent.message1.asJson)
 // this causes 2 inserts to be done
  insertInto("fulltext", "test")
   .value("id", id1.uuid)
   .value("name", "goosh2")
   .value("json", TestContent.message2.asJson)
 /// stacktraces for inserts (always same, for 1st and 2nd insert)
 custom indexer stacktraces and then
   at 
 org.apache.cassandra.db.index.SecondaryIndexManager$StandardUpdater.insert(SecondaryIndexManager.java:707)
   at 
 org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:344)
   at 
 org.apache.cassandra.db.AtomicBTreeColumns$ColumnUpdater.apply(AtomicBTreeColumns.java:319)
   at 
 org.apache.cassandra.utils.btree.NodeBuilder.addNewKey(NodeBuilder.java:323)
   at 
 org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:191)
   at org.apache.cassandra.utils.btree.Builder.update(Builder.java:74)
   at org.apache.cassandra.utils.btree.BTree.update(BTree.java:186)
   at 
 org.apache.cassandra.db.AtomicBTreeColumns.addAllWithSizeDelta(AtomicBTreeColumns.java:189)
   at org.apache.cassandra.db.Memtable.put(Memtable.java:194)
   at 
 org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1142)
   at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:394)
   at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:351)
   at org.apache.cassandra.db.Mutation.apply(Mutation.java:214)
   at 
 org.apache.cassandra.service.StorageProxy$7.runMayThrow(StorageProxy.java:970)
   at 
 org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2080)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at 
 org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:163)
   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:103)
   at java.lang.Thread.run(Thread.java:744)
  Note that the cell, rowKey and Group in public abstract void 
 insert(ByteBuffer rowKey, Cell col, OpOrder.Group opGroup); have the same 
 identity for both successive calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8228) Log malfunctioning host on prepareForRepair

2014-11-19 Thread Rajanarayanan Thottuvaikkatumana (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218030#comment-14218030
 ] 

Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-8228:
-

Anything else from my side? Thanks

 Log malfunctioning host on prepareForRepair
 ---

 Key: CASSANDRA-8228
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8228
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Juho Mäkinen
Assignee: Rajanarayanan Thottuvaikkatumana
Priority: Trivial
  Labels: lhf
 Attachments: cassandra-trunk-8228.txt


 Repair startup goes through ActiveRepairService.prepareForRepair(), which might 
 result in a Repair failed with error Did not get positive replies from all 
 endpoints. error, but there is no other logging regarding this error.
 It seems that it would be trivial to modify prepareForRepair() to log the 
 host address which caused the error, and thus ease the debugging effort.
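
A hedged, self-contained sketch of the suggestion (toy code with invented addresses, not ActiveRepairService itself): keep the set of endpoints that have not yet acknowledged the prepare message, so the failure log can name them instead of only reporting that not all replies were positive.
{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class PrepareForRepairLoggingSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        // Endpoints we are still waiting on; each positive reply removes one entry.
        Set<String> pending = ConcurrentHashMap.newKeySet();
        pending.add("10.0.0.1");
        pending.add("10.0.0.2");
        CountDownLatch latch = new CountDownLatch(pending.size());

        // Simulate one endpoint acknowledging and one staying silent.
        pending.remove("10.0.0.1");
        latch.countDown();

        if (!latch.await(1, TimeUnit.SECONDS))
            System.err.println("Repair prepare failed; no positive reply from: " + pending);
    }
}
{code}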



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8212) Archive Commitlog Test Failing

2014-11-19 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-8212:
---
Assignee: Marcus Eriksson

 Archive Commitlog Test Failing
 --

 Key: CASSANDRA-8212
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8212
 Project: Cassandra
  Issue Type: Bug
Reporter: Philip Thompson
Assignee: Marcus Eriksson
 Fix For: 2.0.12


 The test snapshot_test.TestArchiveCommitlog.test_archive_commitlog is failing 
 on 2.0.11, but not 2.1.1. We attempt to replay 65000 rows, but in 2.0.11 only 
 63000 rows succeed. URL for test output:
 http://cassci.datastax.com/job/cassandra-2.0_dtest/lastCompletedBuild/testReport/snapshot_test/TestArchiveCommitlog/test_archive_commitlog/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes

2014-11-19 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218069#comment-14218069
 ] 

Branimir Lambov commented on CASSANDRA-7075:


First draft of the multi-volume commit log can be found 
[here|https://github.com/blambov/cassandra/compare/7075-commitlog-volumes-2]. 
This is still a work in progress, but while I'm looking at ways to properly 
test everything, I'd be interested in some opinions on where to take this next.

To be able to spread the load between drives, the new implementation switches 
'volumes' on every sync request. Each volume has its own writing thread (which 
in the compressed case will also be doing the compression); the segment 
management thread, which handles creating and recycling segments, remains 
shared for now. Each volume writes in its own CommitLogSegment, so in effect we 
may write some mutations in one segment, switch to the segment in the other 
drive, then switch back to writing in the first-- which means that the order of 
mutations is no longer defined first by the segment ID. To deal with this I 
exposed the concept of a 'section', which existed before as the set of 
mutations between two sync markers, and gave the section an ID which now 
replaces the segment ID in ReplayPositions. Every time we start writing to a 
volume, a new section with a fresh ID is created. Every time we switch volumes, 
a write for the old section is scheduled and either the volume is put back at 
the end of a queue of ready-to-use volumes (if the segment is not exhausted or 
there is an available reserve segment) or the management thread is woken to 
prepare a new segment and put the volume back in the queue when one is ready.

Because of the new ordering, commit log replay now has to be able to sort and 
operate on the level of sections (for new logs) as well as on the level of 
segments (for legacy logs). The machinery is refactored a little to permit 
this, and the new code is also used to select a non-conflicting section ID at 
start.

For full flexibility commit log volumes are configured separately from data 
volumes. If necessary, multiple volumes can be assigned to the same drive. With 
archiving it's not clear where archived logs should be restored, thus I created 
an option to specify that as well (with a default of sending them to the first 
CL volume).

The current code has more locking than I'd like, most importantly in 
CLSM.advanceVolume(), which is called every time a disk synchronization is 
requested (also when a segment is full, but that has much lower frequency). 
There is a noticeable impact on performance; I need more performance testing in 
various configurations to quantify it. I can see three ways to continue from 
here:

# Leave the locking as it is, which permits flexibility in the ordering of 
volumes in the queue. This can be exploited by making queuedVolumes a 
priority queue ordered, e.g., by expected sync finish time. The latter will be 
able to handle heterogeneous situations (e.g. SSDs + HDDs; more importantly 
uneven distribution of requests from other parts of the code on the drives) 
very well. I think this option will result in the least complex code and the 
highest flexibility of the solution.
# Not permit reordering of volumes in the queue, which lets section IDs be 
assigned on queue entry rather than exit; with a little more work switching to 
a new section from the queue can be made a single compare-and-swap. In this 
option the load necessarily has to be spread evenly between the specified CL 
volumes (not necessarily between the drives as a user still may give multiple 
directories on the same drive). With a single CL volume and possibly in 
homogeneous scenarios this option should result in the best performance.
# As above, but put sections in the queue only when the previous sync for the 
volume has completed. This option can use the drives' performance most 
efficiently, but it needs another queuing layer to be able to properly deal 
with situations where all drives are busy and mutations are still incoming.

I'm leaning towards (1) for the flexibility, but that may be a performance 
regression in the single-volume case. Is it worth investing the time to try out 
two or all three options?
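
To make the switching scheme above concrete, here is a deliberately simplified 
sketch of the queue-based rotation in option 1 (the class and field names, 
including queuedVolumes, are illustrative, not the code in the linked branch): 
each switch returns the current volume to the queue, takes the next one, and 
opens a section with a fresh ID on it.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of volume switching, not the implementation in the branch.
final class CommitLogVolumes
{
    static final class Volume
    {
        final String directory;
        Volume(String directory) { this.directory = directory; }
    }

    static final class Section
    {
        final long id;       // replaces the segment id in replay positions
        final Volume volume; // where this section's mutations are written
        Section(long id, Volume volume) { this.id = id; this.volume = volume; }
    }

    private final Queue<Volume> queuedVolumes = new ArrayDeque<>();
    private final AtomicLong nextSectionId = new AtomicLong();
    private Section current;

    CommitLogVolumes(String... directories)
    {
        for (String dir : directories)
            queuedVolumes.add(new Volume(dir));
        current = new Section(nextSectionId.getAndIncrement(), queuedVolumes.poll());
    }

    // Called on every sync request: schedule the write of the old section
    // (omitted here), put its volume back in the queue, and start a new
    // section with a fresh ID on the next queued volume.
    synchronized Section advanceVolume()
    {
        queuedVolumes.add(current.volume);
        current = new Section(nextSectionId.getAndIncrement(), queuedVolumes.poll());
        return current;
    }
}
{code}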

 Add the ability to automatically distribute your commitlogs across all data 
 volumes
 ---

 Key: CASSANDRA-7075
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7075
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Tupshin Harper
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 given the prevalence of SSDs (no need to separate commitlog and data), and 
 improved 

[jira] [Created] (CASSANDRA-8341) Expose time spent in each thread pool

2014-11-19 Thread Chris Lohfink (JIRA)
Chris Lohfink created CASSANDRA-8341:


 Summary: Expose time spent in each thread pool
 Key: CASSANDRA-8341
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Chris Lohfink
Priority: Minor


We can increment a counter with the time spent in each queue. This provides 
context on how much time, percentage-wise, is spent in each stage. 
Additionally, it can be used with Little's law in the future if we ever want 
to tune the size of the pools.
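
As a rough worked example (the numbers here are made up): Little's law says 
L = λ · W, so a stage that receives λ = 5,000 work units per second and spends 
an average of W = 2 ms per unit has on average L = 5,000 × 0.002 = 10 units in 
flight, suggesting that about 10 threads would keep that stage saturated.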



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8341) Expose time spent in each thread pool

2014-11-19 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-8341:
-
Attachment: 8341.patch

 Expose time spent in each thread pool
 -

 Key: CASSANDRA-8341
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Chris Lohfink
Priority: Minor
  Labels: metrics
 Attachments: 8341.patch


 We can increment a counter with the time spent in each queue. This provides 
 context on how much time, percentage-wise, is spent in each stage. 
 Additionally, it can be used with Little's law in the future if we ever want 
 to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

2014-11-19 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218111#comment-14218111
 ] 

Chris Lohfink commented on CASSANDRA-8341:
--

Attached a first-pass patch for the idea. I just recorded the wall time, but we 
may want to record CPU time as well ([getCurrentThreadCpuTime & 
getCurrentThreadUserTime|https://docs.oracle.com/javase/7/docs/api/java/lang/management/ThreadMXBean.html#getCurrentThreadCpuTime()]).
It may also be worth recording it with a histogram instead of just a counter, 
but for the purpose of exposing the percentage of time I think the total is 
sufficient. I also added an insertion meter to make estimating different pool 
sizes easier (it is simpler than adding up pending/completed/active, and it 
gives a sense of the rate). A rough sketch of this kind of instrumentation is 
shown below.
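
For readers without the patch handy, a sketch of the general idea (illustrative 
class and field names only, not necessarily what 8341.patch does): wrap each 
task so the stage accumulates the wall-clock time its tasks spend executing, 
and count insertions.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Illustration only, not 8341.patch.
final class TimedStage
{
    final AtomicLong totalExecutionNanos = new AtomicLong();
    final AtomicLong insertions = new AtomicLong();

    Runnable wrap(final Runnable task)
    {
        insertions.incrementAndGet();
        return new Runnable()
        {
            public void run()
            {
                long start = System.nanoTime();
                try
                {
                    task.run();
                }
                finally
                {
                    totalExecutionNanos.addAndGet(System.nanoTime() - start);
                }
            }
        };
    }
}
{code}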

 Expose time spent in each thread pool
 -

 Key: CASSANDRA-8341
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Chris Lohfink
Priority: Minor
  Labels: metrics
 Attachments: 8341.patch


 We can increment a counter with the time spent in each queue. This provides 
 context on how much time, percentage-wise, is spent in each stage. 
 Additionally, it can be used with Little's law in the future if we ever want 
 to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

2014-11-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218149#comment-14218149
 ] 

Robert Stupp commented on CASSANDRA-8341:
-

FWIW it's not necessary to add another {{ThreadLocal}} to record the work-unit 
start time. Wrapping the {{Runnable}} in a static class that carries the start 
time feels cheaper; a sketch of what I mean follows below.
Also note that metrics code added before or after work-unit execution 
transparently extends the work unit's latency, and that overhead itself isn't 
measured.
(Note that System.nanoTime() [may introduce 
latency|http://shipilev.net/blog/2014/nanotrusting-nanotime/].)
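
Roughly what I have in mind (a sketch only, not code from the patch): the 
enqueue timestamp travels with the task itself, so no {{ThreadLocal}} is needed 
and the queue-wait time can be charged when the task starts running.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the "wrap the Runnable" suggestion: the start time is a plain
// field of the wrapper, captured when the task is queued.
final class TimedRunnable implements Runnable
{
    static final AtomicLong totalQueueNanos = new AtomicLong();

    private final Runnable delegate;
    private final long enqueuedAt = System.nanoTime();

    TimedRunnable(Runnable delegate) { this.delegate = delegate; }

    public void run()
    {
        // time spent waiting in the queue, recorded before the body runs
        totalQueueNanos.addAndGet(System.nanoTime() - enqueuedAt);
        delegate.run();
    }
}
{code}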

 Expose time spent in each thread pool
 -

 Key: CASSANDRA-8341
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Chris Lohfink
Priority: Minor
  Labels: metrics
 Attachments: 8341.patch


 We can increment a counter with the time spent in each queue. This provides 
 context on how much time, percentage-wise, is spent in each stage. 
 Additionally, it can be used with Little's law in the future if we ever want 
 to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: Fix IRE with ORDER BY, treating all selections as fns

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 37d33b208 -> 1945384fd


Fix IRE with ORDER BY, treating all selections as fns

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8286


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1945384f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1945384f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1945384f

Branch: refs/heads/cassandra-2.0
Commit: 1945384fdf1d0bac18d6f75e5f864f1aca5b49db
Parents: 37d33b2
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:06:03 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:06:03 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/cql3/ColumnIdentifier.java |   5 +
 .../cql3/statements/ModificationStatement.java  |   2 +-
 .../cassandra/cql3/statements/RawSelector.java  |   5 +
 .../cql3/statements/SelectStatement.java|   9 +-
 .../cassandra/cql3/statements/Selectable.java   |  15 +
 .../cassandra/cql3/statements/Selection.java|  72 +--
 .../cassandra/cql3/SelectionOrderingTest.java   | 452 +++
 8 files changed, 520 insertions(+), 41 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 809a102..fff6d3a 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
  * Disable SSLv3 for POODLE (CASSANDRA-8265)
  * Fix millisecond timestamps in Tracing (CASSANDRA-8297)
  * Include keyspace name in error message when there are insufficient

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
--
diff --git a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java 
b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
index f284436..2f3e481 100644
--- a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
@@ -109,6 +109,11 @@ public class ColumnIdentifier implements Selectable
 return new ColumnIdentifier(cfm.comparator.fromString(rawText), 
text);
 }
 
+public boolean processesSelection()
+{
+return false;
+}
+
 @Override
 public final int hashCode()
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
index c098c92..61f65c1 100644
--- a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
@@ -684,7 +684,7 @@ public abstract class ModificationStatement implements 
CQLStatement, MeasurableF
 }
 for (ColumnIdentifier id : columnsWithConditions)
 names.add(cfDef.get(id));
-selection = Selection.forColumns(names);
+selection = Selection.forColumns(new ArrayList(names));
 }
 
 long now = System.currentTimeMillis();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/RawSelector.java 
b/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
index 0194239..c2d4e20 100644
--- a/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
+++ b/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
@@ -30,4 +30,9 @@ public class RawSelector
 this.selectable = selectable;
 this.alias = alias;
 }
+
+public boolean processesSelection()
+{
+return selectable.processesSelection();
+}
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 77d94e3..f1d1aab 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -1957,10 

[1/2] cassandra git commit: Fix IRE with ORDER BY, treating all selections as fns

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 1b21aef81 -> 289314a60


Fix IRE with ORDER BY, treating all selections as fns

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8286


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1945384f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1945384f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1945384f

Branch: refs/heads/cassandra-2.1
Commit: 1945384fdf1d0bac18d6f75e5f864f1aca5b49db
Parents: 37d33b2
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:06:03 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:06:03 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/cql3/ColumnIdentifier.java |   5 +
 .../cql3/statements/ModificationStatement.java  |   2 +-
 .../cassandra/cql3/statements/RawSelector.java  |   5 +
 .../cql3/statements/SelectStatement.java|   9 +-
 .../cassandra/cql3/statements/Selectable.java   |  15 +
 .../cassandra/cql3/statements/Selection.java|  72 +--
 .../cassandra/cql3/SelectionOrderingTest.java   | 452 +++
 8 files changed, 520 insertions(+), 41 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 809a102..fff6d3a 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
  * Disable SSLv3 for POODLE (CASSANDRA-8265)
  * Fix millisecond timestamps in Tracing (CASSANDRA-8297)
  * Include keyspace name in error message when there are insufficient

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
--
diff --git a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java 
b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
index f284436..2f3e481 100644
--- a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
@@ -109,6 +109,11 @@ public class ColumnIdentifier implements Selectable
 return new ColumnIdentifier(cfm.comparator.fromString(rawText), 
text);
 }
 
+public boolean processesSelection()
+{
+return false;
+}
+
 @Override
 public final int hashCode()
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
index c098c92..61f65c1 100644
--- a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
@@ -684,7 +684,7 @@ public abstract class ModificationStatement implements 
CQLStatement, MeasurableF
 }
 for (ColumnIdentifier id : columnsWithConditions)
 names.add(cfDef.get(id));
-selection = Selection.forColumns(names);
+selection = Selection.forColumns(new ArrayList(names));
 }
 
 long now = System.currentTimeMillis();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/RawSelector.java 
b/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
index 0194239..c2d4e20 100644
--- a/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
+++ b/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
@@ -30,4 +30,9 @@ public class RawSelector
 this.selectable = selectable;
 this.alias = alias;
 }
+
+public boolean processesSelection()
+{
+return selectable.processesSelection();
+}
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 77d94e3..f1d1aab 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -1957,10 

[2/2] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/289314a6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/289314a6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/289314a6

Branch: refs/heads/cassandra-2.1
Commit: 289314a60f8a35e047a2073ef1e5ae003ada4250
Parents: 1b21aef 1945384
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:09:23 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:09:23 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/cql3/ColumnIdentifier.java |   5 +
 .../cql3/statements/ModificationStatement.java  |   2 +-
 .../cassandra/cql3/statements/RawSelector.java  |   5 +
 .../cassandra/cql3/statements/Selectable.java   |  20 ++
 .../cassandra/cql3/statements/Selection.java|  78 ---
 .../org/apache/cassandra/cql3/CQLTester.java|   7 +
 .../cassandra/cql3/SelectionOrderingTest.java   | 233 +++
 8 files changed, 313 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/CHANGES.txt
--
diff --cc CHANGES.txt
index 2476d25,fff6d3a..80c6872
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,12 -1,5 +1,13 @@@
 -2.0.12:
 +2.1.3
 + * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
 + * Add more log info if readMeter is null (CASSANDRA-8238)
 + * add check of the system wall clock time at startup (CASSANDRA-8305)
 + * Support for frozen collections (CASSANDRA-7859)
 + * Fix overflow on histogram computation (CASSANDRA-8028)
 + * Have paxos reuse the timestamp generation of normal queries 
(CASSANDRA-7801)
 + * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
 +Merged from 2.0:
+  * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
   * Disable SSLv3 for POODLE (CASSANDRA-8265)
   * Fix millisecond timestamps in Tracing (CASSANDRA-8297)
   * Include keyspace name in error message when there are insufficient

http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
--
diff --cc src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
index c1dcd87,2f3e481..1501479
--- a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
@@@ -135,12 -103,17 +135,17 @@@ public class ColumnIdentifier implement
  ByteBuffer bufferName = ByteBufferUtil.bytes(text);
  for (ColumnDefinition def : cfm.partitionKeyColumns())
  {
 -if (def.name.equals(bufferName))
 +if (def.name.bytes.equals(bufferName))
  return new ColumnIdentifier(text, true);
  }
 -return new ColumnIdentifier(cfm.comparator.fromString(rawText), 
text);
 +return new ColumnIdentifier(comparator.fromString(rawText), text);
  }
  
+ public boolean processesSelection()
+ {
+ return false;
+ }
+ 
  @Override
  public final int hashCode()
  {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
--
diff --cc 
src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
index 846ad3e,61f65c1..c32430a
--- a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
@@@ -610,12 -679,12 +610,12 @@@ public abstract class ModificationState
  // of batches for compatibility sakes).
  if (isBatch)
  {
 -names.addAll(cfDef.partitionKeys());
 -names.addAll(cfDef.clusteringColumns());
 +defs.addAll(cfm.partitionKeyColumns());
 +defs.addAll(cfm.clusteringColumns());
  }
 -for (ColumnIdentifier id : columnsWithConditions)
 -names.add(cfDef.get(id));
 -selection = Selection.forColumns(new ArrayList(names));
 +for (ColumnDefinition def : columnsWithConditions)
 +defs.add(def);
- selection = Selection.forColumns(defs);
++selection = Selection.forColumns(new ArrayList(defs));
  }
  
  long now = System.currentTimeMillis();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/src/java/org/apache/cassandra/cql3/statements/Selectable.java

[2/3] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/289314a6
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/289314a6
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/289314a6

Branch: refs/heads/trunk
Commit: 289314a60f8a35e047a2073ef1e5ae003ada4250
Parents: 1b21aef 1945384
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:09:23 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:09:23 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/cql3/ColumnIdentifier.java |   5 +
 .../cql3/statements/ModificationStatement.java  |   2 +-
 .../cassandra/cql3/statements/RawSelector.java  |   5 +
 .../cassandra/cql3/statements/Selectable.java   |  20 ++
 .../cassandra/cql3/statements/Selection.java|  78 ---
 .../org/apache/cassandra/cql3/CQLTester.java|   7 +
 .../cassandra/cql3/SelectionOrderingTest.java   | 233 +++
 8 files changed, 313 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/CHANGES.txt
--
diff --cc CHANGES.txt
index 2476d25,fff6d3a..80c6872
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,12 -1,5 +1,13 @@@
 -2.0.12:
 +2.1.3
 + * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
 + * Add more log info if readMeter is null (CASSANDRA-8238)
 + * add check of the system wall clock time at startup (CASSANDRA-8305)
 + * Support for frozen collections (CASSANDRA-7859)
 + * Fix overflow on histogram computation (CASSANDRA-8028)
 + * Have paxos reuse the timestamp generation of normal queries 
(CASSANDRA-7801)
 + * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
 +Merged from 2.0:
+  * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
   * Disable SSLv3 for POODLE (CASSANDRA-8265)
   * Fix millisecond timestamps in Tracing (CASSANDRA-8297)
   * Include keyspace name in error message when there are insufficient

http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
--
diff --cc src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
index c1dcd87,2f3e481..1501479
--- a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
@@@ -135,12 -103,17 +135,17 @@@ public class ColumnIdentifier implement
  ByteBuffer bufferName = ByteBufferUtil.bytes(text);
  for (ColumnDefinition def : cfm.partitionKeyColumns())
  {
 -if (def.name.equals(bufferName))
 +if (def.name.bytes.equals(bufferName))
  return new ColumnIdentifier(text, true);
  }
 -return new ColumnIdentifier(cfm.comparator.fromString(rawText), 
text);
 +return new ColumnIdentifier(comparator.fromString(rawText), text);
  }
  
+ public boolean processesSelection()
+ {
+ return false;
+ }
+ 
  @Override
  public final int hashCode()
  {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
--
diff --cc 
src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
index 846ad3e,61f65c1..c32430a
--- a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
@@@ -610,12 -679,12 +610,12 @@@ public abstract class ModificationState
  // of batches for compatibility sakes).
  if (isBatch)
  {
 -names.addAll(cfDef.partitionKeys());
 -names.addAll(cfDef.clusteringColumns());
 +defs.addAll(cfm.partitionKeyColumns());
 +defs.addAll(cfm.clusteringColumns());
  }
 -for (ColumnIdentifier id : columnsWithConditions)
 -names.add(cfDef.get(id));
 -selection = Selection.forColumns(new ArrayList(names));
 +for (ColumnDefinition def : columnsWithConditions)
 +defs.add(def);
- selection = Selection.forColumns(defs);
++selection = Selection.forColumns(new ArrayList(defs));
  }
  
  long now = System.currentTimeMillis();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/289314a6/src/java/org/apache/cassandra/cql3/statements/Selectable.java

[3/3] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b8cf1728
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b8cf1728
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b8cf1728

Branch: refs/heads/trunk
Commit: b8cf17284e55eb98fea61be672a0bcadb3613754
Parents: 3cc9a0c 289314a
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:11:37 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:11:37 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/cql3/ColumnIdentifier.java |   5 +
 .../cassandra/cql3/selection/RawSelector.java   |   5 +
 .../cassandra/cql3/selection/Selectable.java|  20 ++
 .../cassandra/cql3/selection/Selection.java |  24 +-
 .../cql3/selection/SelectorFactories.java   |  10 +
 .../cql3/statements/ModificationStatement.java  |   2 +-
 .../org/apache/cassandra/cql3/CQLTester.java|   7 +
 .../cassandra/cql3/SelectionOrderingTest.java   | 233 +++
 9 files changed, 298 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/b8cf1728/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b8cf1728/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b8cf1728/src/java/org/apache/cassandra/cql3/selection/RawSelector.java
--
diff --cc src/java/org/apache/cassandra/cql3/selection/RawSelector.java
index c7e2658,000..7d5543f
mode 100644,00..100644
--- a/src/java/org/apache/cassandra/cql3/selection/RawSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/RawSelector.java
@@@ -1,56 -1,0 +1,61 @@@
 +/*
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements.  See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership.  The ASF licenses this file
 + * to you under the Apache License, Version 2.0 (the
 + * License); you may not use this file except in compliance
 + * with the License.  You may obtain a copy of the License at
 + *
 + *   http://www.apache.org/licenses/LICENSE-2.0
 + *
 + * Unless required by applicable law or agreed to in writing,
 + * software distributed under the License is distributed on an
 + * AS IS BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 + * KIND, either express or implied.  See the License for the
 + * specific language governing permissions and limitations
 + * under the License.
 + */
 +package org.apache.cassandra.cql3.selection;
 +
 +import java.util.List;
 +
 +import org.apache.cassandra.config.CFMetaData;
 +import org.apache.cassandra.cql3.ColumnIdentifier;
 +
 +import com.google.common.base.Function;
 +import com.google.common.collect.Lists;
 +
 +public class RawSelector
 +{
 +public final Selectable.Raw selectable;
 +public final ColumnIdentifier alias;
 +
 +public RawSelector(Selectable.Raw selectable, ColumnIdentifier alias)
 +{
 +this.selectable = selectable;
 +this.alias = alias;
 +}
 +
 +/**
 + * Converts the specified list of <code>RawSelector</code>s into a list of <code>Selectable</code>s.
 + *
 + * @param raws the <code>RawSelector</code>s to converts.
 + * @return a list of <code>Selectable</code>s
 + */
 +public static List<Selectable> toSelectables(List<RawSelector> raws, final CFMetaData cfm)
 +{
 +return Lists.transform(raws, new Function<RawSelector, Selectable>()
 +{
 +public Selectable apply(RawSelector raw)
 +{
 +return raw.selectable.prepare(cfm);
 +}
 +});
 +}
++
++public boolean processesSelection()
++{
++return selectable.processesSelection();
++}
 +}

http://git-wip-us.apache.org/repos/asf/cassandra/blob/b8cf1728/src/java/org/apache/cassandra/cql3/selection/Selectable.java
--
diff --cc src/java/org/apache/cassandra/cql3/selection/Selectable.java
index 48ce11a,000..c5ef857
mode 100644,00..100644
--- a/src/java/org/apache/cassandra/cql3/selection/Selectable.java
+++ b/src/java/org/apache/cassandra/cql3/selection/Selectable.java
@@@ -1,226 -1,0 +1,246 @@@
 +/*
 + * Licensed to the Apache Software Foundation (ASF) under one
 + * or more contributor license agreements.  See the NOTICE file
 + * distributed with this work for additional information
 + * regarding copyright ownership.  The ASF licenses this file
 + * to you 

[1/3] cassandra git commit: Fix IRE with ORDER BY, treating all selections as fns

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/trunk 3cc9a0c87 -> b8cf17284


Fix IRE with ORDER BY, treating all selections as fns

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8286


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1945384f
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1945384f
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1945384f

Branch: refs/heads/trunk
Commit: 1945384fdf1d0bac18d6f75e5f864f1aca5b49db
Parents: 37d33b2
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:06:03 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:06:03 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/cql3/ColumnIdentifier.java |   5 +
 .../cql3/statements/ModificationStatement.java  |   2 +-
 .../cassandra/cql3/statements/RawSelector.java  |   5 +
 .../cql3/statements/SelectStatement.java|   9 +-
 .../cassandra/cql3/statements/Selectable.java   |  15 +
 .../cassandra/cql3/statements/Selection.java|  72 +--
 .../cassandra/cql3/SelectionOrderingTest.java   | 452 +++
 8 files changed, 520 insertions(+), 41 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 809a102..fff6d3a 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.0.12:
+ * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
  * Disable SSLv3 for POODLE (CASSANDRA-8265)
  * Fix millisecond timestamps in Tracing (CASSANDRA-8297)
  * Include keyspace name in error message when there are insufficient

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
--
diff --git a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java 
b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
index f284436..2f3e481 100644
--- a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
@@ -109,6 +109,11 @@ public class ColumnIdentifier implements Selectable
 return new ColumnIdentifier(cfm.comparator.fromString(rawText), 
text);
 }
 
+public boolean processesSelection()
+{
+return false;
+}
+
 @Override
 public final int hashCode()
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
--
diff --git 
a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
index c098c92..61f65c1 100644
--- a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
@@ -684,7 +684,7 @@ public abstract class ModificationStatement implements 
CQLStatement, MeasurableF
 }
 for (ColumnIdentifier id : columnsWithConditions)
 names.add(cfDef.get(id));
-selection = Selection.forColumns(names);
+selection = Selection.forColumns(new ArrayList(names));
 }
 
 long now = System.currentTimeMillis();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/RawSelector.java 
b/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
index 0194239..c2d4e20 100644
--- a/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
+++ b/src/java/org/apache/cassandra/cql3/statements/RawSelector.java
@@ -30,4 +30,9 @@ public class RawSelector
 this.selectable = selectable;
 this.alias = alias;
 }
+
+public boolean processesSelection()
+{
+return selectable.processesSelection();
+}
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1945384f/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 77d94e3..f1d1aab 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -1957,10 +1957,11 @@ 

[2/2] cassandra git commit: Fix multicolumn relation + COMPACT STORAGE issues

2014-11-19 Thread tylerhobbs
Fix multicolumn relation + COMPACT STORAGE issues

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8264


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/084d93da
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/084d93da
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/084d93da

Branch: refs/heads/cassandra-2.0
Commit: 084d93daf6b6031909fc318e57a2a205ad32c237
Parents: 1945384
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:25:09 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:25:09 2014 -0600

--
 CHANGES.txt |2 +
 .../cql3/statements/SelectStatement.java|   95 +-
 .../cassandra/cql3/MultiColumnRelationTest.java | 1464 +-
 3 files changed, 840 insertions(+), 721 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index fff6d3a..01ea887 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 2.0.12:
+ * Fix some failing queries that use multi-column relations
+   on COMPACT STORAGE tables (CASSANDRA-8264)
  * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
  * Disable SSLv3 for POODLE (CASSANDRA-8265)
  * Fix millisecond timestamps in Tracing (CASSANDRA-8297)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index f1d1aab..db25716 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -713,9 +713,9 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 CFDefinition.Name name = idIter.next();
 assert r != null  !r.isSlice();
 
-ListByteBuffer values = r.values(variables);
-if (values.size() == 1)
+if (r.isEQ())
 {
+ListByteBuffer values = r.values(variables);
 ByteBuffer val = values.get(0);
 if (val == null)
 throw new InvalidRequestException(String.format(Invalid 
null value for clustering key part %s, name.name));
@@ -723,26 +723,56 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 }
 else
 {
-// We have a IN, which we only support for the last column.
-// If compact, just add all values and we're done. Otherwise,
-// for each value of the IN, creates all the columns 
corresponding to the selection.
-if (values.isEmpty())
-return null;
-SortedSetByteBuffer columns = new 
TreeSetByteBuffer(cfDef.cfm.comparator);
-IteratorByteBuffer iter = values.iterator();
-while (iter.hasNext())
+if (!r.isMultiColumn())
 {
-ByteBuffer val = iter.next();
-ColumnNameBuilder b = iter.hasNext() ? builder.copy() : 
builder;
-if (val == null)
-throw new 
InvalidRequestException(String.format(Invalid null value for clustering key 
part %s, name.name));
-b.add(val);
-if (cfDef.isCompact)
-columns.add(b.build());
-else
-columns.addAll(addSelectedColumns(b));
+// We have a IN, which we only support for the last column.
+// If compact, just add all values and we're done. 
Otherwise,
+// for each value of the IN, creates all the columns 
corresponding to the selection.
+ListByteBuffer values = r.values(variables);
+if (values.isEmpty())
+return null;
+SortedSetByteBuffer columns = new 
TreeSetByteBuffer(cfDef.cfm.comparator);
+IteratorByteBuffer iter = values.iterator();
+while (iter.hasNext())
+{
+ByteBuffer val = iter.next();
+ColumnNameBuilder b = iter.hasNext() ? builder.copy() 
: builder;
+if (val == null)
+throw new 
InvalidRequestException(String.format(Invalid null value for clustering key 
part %s, name.name));
+  

[2/4] cassandra git commit: Fix multicolumn relation + COMPACT STORAGE issues

2014-11-19 Thread tylerhobbs
Fix multicolumn relation + COMPACT STORAGE issues

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8264


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/084d93da
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/084d93da
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/084d93da

Branch: refs/heads/cassandra-2.1
Commit: 084d93daf6b6031909fc318e57a2a205ad32c237
Parents: 1945384
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:25:09 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:25:09 2014 -0600

--
 CHANGES.txt |2 +
 .../cql3/statements/SelectStatement.java|   95 +-
 .../cassandra/cql3/MultiColumnRelationTest.java | 1464 +-
 3 files changed, 840 insertions(+), 721 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index fff6d3a..01ea887 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 2.0.12:
+ * Fix some failing queries that use multi-column relations
+   on COMPACT STORAGE tables (CASSANDRA-8264)
  * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
  * Disable SSLv3 for POODLE (CASSANDRA-8265)
  * Fix millisecond timestamps in Tracing (CASSANDRA-8297)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index f1d1aab..db25716 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -713,9 +713,9 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 CFDefinition.Name name = idIter.next();
 assert r != null  !r.isSlice();
 
-ListByteBuffer values = r.values(variables);
-if (values.size() == 1)
+if (r.isEQ())
 {
+ListByteBuffer values = r.values(variables);
 ByteBuffer val = values.get(0);
 if (val == null)
 throw new InvalidRequestException(String.format(Invalid 
null value for clustering key part %s, name.name));
@@ -723,26 +723,56 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 }
 else
 {
-// We have a IN, which we only support for the last column.
-// If compact, just add all values and we're done. Otherwise,
-// for each value of the IN, creates all the columns 
corresponding to the selection.
-if (values.isEmpty())
-return null;
-SortedSetByteBuffer columns = new 
TreeSetByteBuffer(cfDef.cfm.comparator);
-IteratorByteBuffer iter = values.iterator();
-while (iter.hasNext())
+if (!r.isMultiColumn())
 {
-ByteBuffer val = iter.next();
-ColumnNameBuilder b = iter.hasNext() ? builder.copy() : 
builder;
-if (val == null)
-throw new 
InvalidRequestException(String.format(Invalid null value for clustering key 
part %s, name.name));
-b.add(val);
-if (cfDef.isCompact)
-columns.add(b.build());
-else
-columns.addAll(addSelectedColumns(b));
+// We have a IN, which we only support for the last column.
+// If compact, just add all values and we're done. 
Otherwise,
+// for each value of the IN, creates all the columns 
corresponding to the selection.
+ListByteBuffer values = r.values(variables);
+if (values.isEmpty())
+return null;
+SortedSetByteBuffer columns = new 
TreeSetByteBuffer(cfDef.cfm.comparator);
+IteratorByteBuffer iter = values.iterator();
+while (iter.hasNext())
+{
+ByteBuffer val = iter.next();
+ColumnNameBuilder b = iter.hasNext() ? builder.copy() 
: builder;
+if (val == null)
+throw new 
InvalidRequestException(String.format(Invalid null value for clustering key 
part %s, name.name));
+  

[4/4] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/25a4c9e1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/25a4c9e1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/25a4c9e1

Branch: refs/heads/cassandra-2.1
Commit: 25a4c9e1f7388b35d7c51bdb36766618026cb0c7
Parents: 289314a 084d93d
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:28:51 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:28:51 2014 -0600

--
 CHANGES.txt |   2 +
 .../cql3/statements/SelectStatement.java|  15 +-
 .../cassandra/cql3/MultiColumnRelationTest.java | 938 ++-
 3 files changed, 494 insertions(+), 461 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/25a4c9e1/CHANGES.txt
--
diff --cc CHANGES.txt
index 80c6872,01ea887..8dbcbc8
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,12 -1,6 +1,14 @@@
 -2.0.12:
 +2.1.3
 + * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
 + * Add more log info if readMeter is null (CASSANDRA-8238)
 + * add check of the system wall clock time at startup (CASSANDRA-8305)
 + * Support for frozen collections (CASSANDRA-7859)
 + * Fix overflow on histogram computation (CASSANDRA-8028)
 + * Have paxos reuse the timestamp generation of normal queries 
(CASSANDRA-7801)
 + * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
 +Merged from 2.0:
+  * Fix some failing queries that use multi-column relations
+on COMPACT STORAGE tables (CASSANDRA-8264)
   * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
   * Disable SSLv3 for POODLE (CASSANDRA-8265)
   * Fix millisecond timestamps in Tracing (CASSANDRA-8297)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/25a4c9e1/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --cc src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 09d1e52,db25716..688d1d5
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@@ -1120,45 -1077,15 +1120,56 @@@ public class SelectStatement implement
  return expressions;
  }
  
 -private void validateIndexExpressionValue(ByteBuffer value, 
CFDefinition.Name name) throws InvalidRequestException
 +private static ByteBuffer validateIndexedValue(ColumnDefinition def, 
ByteBuffer value) throws InvalidRequestException
  {
  if (value == null)
 -throw new InvalidRequestException(String.format(Unsupported null 
value for indexed column %s, name));
 +throw new InvalidRequestException(String.format(Unsupported null 
value for indexed column %s, def.name));
  if (value.remaining()  0x)
  throw new InvalidRequestException(Index expression values may 
not be larger than 64K);
 +return value;
  }
  
 -private static IndexOperator reverse(IndexOperator op)
++private CellName makeExclusiveSliceBound(Bound bound, CellNameType type, 
QueryOptions options) throws InvalidRequestException
++{
++if (sliceRestriction.isInclusive(bound))
++return null;
++
++if (sliceRestriction.isMultiColumn())
++return type.makeCellName(((MultiColumnRestriction.Slice) 
sliceRestriction).componentBounds(bound, options).toArray());
++else
++return type.makeCellName(sliceRestriction.bound(bound, options));
++}
++
 +private IteratorCell applySliceRestriction(final IteratorCell cells, 
final QueryOptions options) throws InvalidRequestException
 +{
 +assert sliceRestriction != null;
 +
 +final CellNameType type = cfm.comparator;
- final CellName excludedStart = 
sliceRestriction.isInclusive(Bound.START) ? null : 
type.makeCellName(sliceRestriction.bound(Bound.START, options));
- final CellName excludedEnd = sliceRestriction.isInclusive(Bound.END) 
? null : type.makeCellName(sliceRestriction.bound(Bound.END, options));
++final CellName excludedStart = makeExclusiveSliceBound(Bound.START, 
type, options);
++final CellName excludedEnd = makeExclusiveSliceBound(Bound.END, type, 
options);
 +
 +return new AbstractIteratorCell()
 +{
 +protected Cell computeNext()
 +{
 +while (cells.hasNext())
 +{
 +Cell c = cells.next();
 +
 +// For dynamic CF, the column could be out of the 
requested bounds (because we don't support strict bounds internally 

[1/2] cassandra git commit: Fix multicolumn relation + COMPACT STORAGE issues

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 1945384fd -> 084d93daf


http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java 
b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
index 498d332..ea4f1a6 100644
--- a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
@@ -57,11 +57,25 @@ public class MultiColumnRelationTest
 {
 SchemaLoader.loadSchema();
 executeSchemaChange(CREATE KEYSPACE IF NOT EXISTS %s WITH replication 
= {'class': 'SimpleStrategy', 'replication_factor': '1'});
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.single_partition (a 
int PRIMARY KEY, b int));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.compound_partition 
(a int, b int, c int, PRIMARY KEY ((a, b;
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.single_clustering 
(a int, b int, c int, PRIMARY KEY (a, b)));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.multiple_clustering 
(a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed (a int, b int, c int, d int, PRIMARY KEY (a, b, 
c, d)) WITH CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
+for (boolean isCompact : new boolean[]{false, true})
+{
+String tableSuffix = isCompact ? _compact : ;
+String compactOption = isCompact ?  WITH COMPACT STORAGE : ;
+
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.single_partition + 
tableSuffix + (a int PRIMARY KEY, b int) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.compound_partition 
+tableSuffix + (a int, b int, c int, PRIMARY KEY ((a, b))) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.single_clustering + 
tableSuffix + (a int, b int, c int, PRIMARY KEY (a, b)) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.multiple_clustering + 
tableSuffix + (a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)) + 
compactOption);
+
+compactOption = isCompact ?  COMPACT STORAGE AND  : ;
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed + tableSuffix +
+(a int, b int, c int, d int, PRIMARY KEY (a, b, c, 
d)) WITH  + compactOption +  CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
+}
 clientState = ClientState.forInternalCalls();
 }
 
@@ -207,40 +221,46 @@ public class MultiColumnRelationTest
 @Test
 public void testSingleClusteringColumnEquality() throws Throwable
 {
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 0, 0));
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 1, 0));
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 2, 0));
-UntypedResultSet results = execute(SELECT * FROM %s.single_clustering 
WHERE a=0 AND (b) = (1));
-assertEquals(1, results.size());
-checkRow(0, results, 0, 1, 0);
-
-results = execute(SELECT * FROM %s.single_clustering WHERE a=0 AND 
(b) = (3));
-assertEquals(0, results.size());
+for (String tableSuffix : new String[]{, _compact})
+{
+execute(INSERT INTO %s.single_clustering + tableSuffix + (a, b, 
c) VALUES (0, 0, 0));
+execute(INSERT INTO %s.single_clustering + tableSuffix +  (a, 
b, c) VALUES (0, 1, 0));
+execute(INSERT INTO %s.single_clustering + tableSuffix +  (a, 
b, c) VALUES (0, 2, 0));
+UntypedResultSet results = execute(SELECT * FROM 
%s.single_clustering + tableSuffix +  WHERE a=0 AND (b) = (1));
+assertEquals(1, results.size());
+checkRow(0, results, 0, 1, 0);
+
+results = execute(SELECT * FROM %s.single_clustering + 
tableSuffix +  WHERE a=0 AND (b) = (3));
+assertEquals(0, results.size());
+}
 }
 
 @Test
 public void testMultipleClusteringColumnEquality() throws Throwable
 {
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 0, 
0, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
0, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
1, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
1, 1));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 2, 
0, 0));
-UntypedResultSet results = execute(SELECT * FROM 
%s.multiple_clustering WHERE a=0 AND (b) = 

[3/4] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-11-19 Thread tylerhobbs
http://git-wip-us.apache.org/repos/asf/cassandra/blob/25a4c9e1/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
--
diff --cc test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
index bcf4f27,ea4f1a6..4c3ba2a
--- a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
@@@ -17,522 -17,1234 +17,542 @@@
   */
  package org.apache.cassandra.cql3;
  
 -import org.apache.cassandra.SchemaLoader;
 -import org.apache.cassandra.db.ConsistencyLevel;
 -import org.apache.cassandra.db.marshal.*;
 -import org.apache.cassandra.exceptions.InvalidRequestException;
 -import org.apache.cassandra.exceptions.RequestExecutionException;
 -import org.apache.cassandra.exceptions.RequestValidationException;
 -import org.apache.cassandra.exceptions.SyntaxException;
 -import org.apache.cassandra.gms.Gossiper;
 -import org.apache.cassandra.service.ClientState;
 -import org.apache.cassandra.service.QueryState;
 -import org.apache.cassandra.transport.messages.ResultMessage;
 -import org.apache.cassandra.utils.ByteBufferUtil;
 -import org.apache.cassandra.utils.MD5Digest;
 -import org.junit.AfterClass;
 -import org.junit.BeforeClass;
  import org.junit.Test;
 -import org.slf4j.Logger;
 -import org.slf4j.LoggerFactory;
  
 -import java.nio.ByteBuffer;
 -import java.util.*;
 -
 -import static org.apache.cassandra.cql3.QueryProcessor.process;
 -import static org.apache.cassandra.cql3.QueryProcessor.processInternal;
 -import static org.junit.Assert.assertTrue;
 -import static org.junit.Assert.assertEquals;
 -import static com.google.common.collect.Lists.newArrayList;
 -import static org.junit.Assert.fail;
 -
 -public class MultiColumnRelationTest
 +public class MultiColumnRelationTest extends CQLTester
  {
 -private static final Logger logger = 
LoggerFactory.getLogger(MultiColumnRelationTest.class);
 -static ClientState clientState;
 -static String keyspace = multi_column_relation_test;
 -
 -@BeforeClass
 -public static void setUpClass() throws Throwable
 -{
 -SchemaLoader.loadSchema();
 -executeSchemaChange(CREATE KEYSPACE IF NOT EXISTS %s WITH 
replication = {'class': 'SimpleStrategy', 'replication_factor': '1'});
 -for (boolean isCompact : new boolean[]{false, true})
 -{
 -String tableSuffix = isCompact ? _compact : ;
 -String compactOption = isCompact ?  WITH COMPACT STORAGE : ;
 -
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.single_partition + 
tableSuffix + (a int PRIMARY KEY, b int) + compactOption);
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.compound_partition 
+tableSuffix + (a int, b int, c int, PRIMARY KEY ((a, b))) + compactOption);
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.single_clustering + 
tableSuffix + (a int, b int, c int, PRIMARY KEY (a, b)) + compactOption);
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.multiple_clustering + 
tableSuffix + (a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)) + 
compactOption);
 -
 -compactOption = isCompact ?  COMPACT STORAGE AND  : ;
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed + tableSuffix +
 -(a int, b int, c int, d int, PRIMARY KEY (a, b, c, 
d)) WITH  + compactOption +  CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
 -}
 -clientState = ClientState.forInternalCalls();
 -}
 -
 -@AfterClass
 -public static void stopGossiper()
 -{
 -Gossiper.instance.stop();
 -}
 -
 -private static void executeSchemaChange(String query) throws Throwable
 -{
 -try
 -{
 -process(String.format(query, keyspace), ConsistencyLevel.ONE);
 -} catch (RuntimeException exc)
 -{
 -throw exc.getCause();
 -}
 -}
 -
 -private static UntypedResultSet execute(String query) throws Throwable
 -{
 -try
 -{
 -return processInternal(String.format(query, keyspace));
 -} catch (RuntimeException exc)
 -{
 -if (exc.getCause() != null)
 -throw exc.getCause();
 -throw exc;
 -}
 -}
 -
 -private MD5Digest prepare(String query) throws RequestValidationException
 -{
 -ResultMessage.Prepared prepared = 
QueryProcessor.prepare(String.format(query, keyspace), clientState, false);
 -return prepared.statementId;
 -}
 -
 -private UntypedResultSet executePrepared(MD5Digest statementId, 
QueryOptions options) throws RequestValidationException, 
RequestExecutionException
 -{
 -CQLStatement statement = 

[1/4] cassandra git commit: Fix multicolumn relation + COMPACT STORAGE issues

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 289314a60 -> 25a4c9e1f


http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java 
b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
index 498d332..ea4f1a6 100644
--- a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
@@ -57,11 +57,25 @@ public class MultiColumnRelationTest
 {
 SchemaLoader.loadSchema();
 executeSchemaChange(CREATE KEYSPACE IF NOT EXISTS %s WITH replication 
= {'class': 'SimpleStrategy', 'replication_factor': '1'});
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.single_partition (a 
int PRIMARY KEY, b int));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.compound_partition 
(a int, b int, c int, PRIMARY KEY ((a, b;
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.single_clustering 
(a int, b int, c int, PRIMARY KEY (a, b)));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.multiple_clustering 
(a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed (a int, b int, c int, d int, PRIMARY KEY (a, b, 
c, d)) WITH CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
+for (boolean isCompact : new boolean[]{false, true})
+{
+String tableSuffix = isCompact ? _compact : ;
+String compactOption = isCompact ?  WITH COMPACT STORAGE : ;
+
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.single_partition + 
tableSuffix + (a int PRIMARY KEY, b int) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.compound_partition 
+tableSuffix + (a int, b int, c int, PRIMARY KEY ((a, b))) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.single_clustering + 
tableSuffix + (a int, b int, c int, PRIMARY KEY (a, b)) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.multiple_clustering + 
tableSuffix + (a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)) + 
compactOption);
+
+compactOption = isCompact ?  COMPACT STORAGE AND  : ;
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed + tableSuffix +
+(a int, b int, c int, d int, PRIMARY KEY (a, b, c, 
d)) WITH  + compactOption +  CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
+}
 clientState = ClientState.forInternalCalls();
 }
 
@@ -207,40 +221,46 @@ public class MultiColumnRelationTest
 @Test
 public void testSingleClusteringColumnEquality() throws Throwable
 {
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 0, 0));
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 1, 0));
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 2, 0));
-UntypedResultSet results = execute(SELECT * FROM %s.single_clustering 
WHERE a=0 AND (b) = (1));
-assertEquals(1, results.size());
-checkRow(0, results, 0, 1, 0);
-
-results = execute(SELECT * FROM %s.single_clustering WHERE a=0 AND 
(b) = (3));
-assertEquals(0, results.size());
+for (String tableSuffix : new String[]{"", "_compact"})
+{
+execute(INSERT INTO %s.single_clustering + tableSuffix + (a, b, 
c) VALUES (0, 0, 0));
+execute(INSERT INTO %s.single_clustering + tableSuffix +  (a, 
b, c) VALUES (0, 1, 0));
+execute(INSERT INTO %s.single_clustering + tableSuffix +  (a, 
b, c) VALUES (0, 2, 0));
+UntypedResultSet results = execute(SELECT * FROM 
%s.single_clustering + tableSuffix +  WHERE a=0 AND (b) = (1));
+assertEquals(1, results.size());
+checkRow(0, results, 0, 1, 0);
+
+results = execute(SELECT * FROM %s.single_clustering + 
tableSuffix +  WHERE a=0 AND (b) = (3));
+assertEquals(0, results.size());
+}
 }
 
 @Test
 public void testMultipleClusteringColumnEquality() throws Throwable
 {
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 0, 
0, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
0, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
1, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
1, 1));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 2, 
0, 0));
-UntypedResultSet results = execute(SELECT * FROM 
%s.multiple_clustering WHERE a=0 AND (b) = 

[1/5] cassandra git commit: Fix multicolumn relation + COMPACT STORAGE issues

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/trunk b8cf17284 -> 907591851


http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java 
b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
index 498d332..ea4f1a6 100644
--- a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
@@ -57,11 +57,25 @@ public class MultiColumnRelationTest
 {
 SchemaLoader.loadSchema();
 executeSchemaChange(CREATE KEYSPACE IF NOT EXISTS %s WITH replication 
= {'class': 'SimpleStrategy', 'replication_factor': '1'});
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.single_partition (a 
int PRIMARY KEY, b int));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.compound_partition 
(a int, b int, c int, PRIMARY KEY ((a, b;
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.single_clustering 
(a int, b int, c int, PRIMARY KEY (a, b)));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS %s.multiple_clustering 
(a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)));
-executeSchemaChange(CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed (a int, b int, c int, d int, PRIMARY KEY (a, b, 
c, d)) WITH CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
+for (boolean isCompact : new boolean[]{false, true})
+{
+String tableSuffix = isCompact ? _compact : ;
+String compactOption = isCompact ?  WITH COMPACT STORAGE : ;
+
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.single_partition + 
tableSuffix + (a int PRIMARY KEY, b int) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.compound_partition 
+tableSuffix + (a int, b int, c int, PRIMARY KEY ((a, b))) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.single_clustering + 
tableSuffix + (a int, b int, c int, PRIMARY KEY (a, b)) + compactOption);
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS %s.multiple_clustering + 
tableSuffix + (a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)) + 
compactOption);
+
+compactOption = isCompact ?  COMPACT STORAGE AND  : ;
+executeSchemaChange(
+CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed + tableSuffix +
+(a int, b int, c int, d int, PRIMARY KEY (a, b, c, 
d)) WITH  + compactOption +  CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
+}
 clientState = ClientState.forInternalCalls();
 }
 
@@ -207,40 +221,46 @@ public class MultiColumnRelationTest
 @Test
 public void testSingleClusteringColumnEquality() throws Throwable
 {
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 0, 0));
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 1, 0));
-execute(INSERT INTO %s.single_clustering (a, b, c) VALUES (0, 2, 0));
-UntypedResultSet results = execute(SELECT * FROM %s.single_clustering 
WHERE a=0 AND (b) = (1));
-assertEquals(1, results.size());
-checkRow(0, results, 0, 1, 0);
-
-results = execute(SELECT * FROM %s.single_clustering WHERE a=0 AND 
(b) = (3));
-assertEquals(0, results.size());
+for (String tableSuffix : new String[]{"", "_compact"})
+{
+execute(INSERT INTO %s.single_clustering + tableSuffix + (a, b, 
c) VALUES (0, 0, 0));
+execute(INSERT INTO %s.single_clustering + tableSuffix +  (a, 
b, c) VALUES (0, 1, 0));
+execute(INSERT INTO %s.single_clustering + tableSuffix +  (a, 
b, c) VALUES (0, 2, 0));
+UntypedResultSet results = execute(SELECT * FROM 
%s.single_clustering + tableSuffix +  WHERE a=0 AND (b) = (1));
+assertEquals(1, results.size());
+checkRow(0, results, 0, 1, 0);
+
+results = execute(SELECT * FROM %s.single_clustering + 
tableSuffix +  WHERE a=0 AND (b) = (3));
+assertEquals(0, results.size());
+}
 }
 
 @Test
 public void testMultipleClusteringColumnEquality() throws Throwable
 {
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 0, 
0, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
0, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
1, 0));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 1, 
1, 1));
-execute(INSERT INTO %s.multiple_clustering (a, b, c, d) VALUES (0, 2, 
0, 0));
-UntypedResultSet results = execute(SELECT * FROM 
%s.multiple_clustering WHERE a=0 AND (b) = (1));
-  

[2/5] cassandra git commit: Fix multicolumn relation + COMPACT STORAGE issues

2014-11-19 Thread tylerhobbs
Fix multicolumn relation + COMPACT STORAGE issues

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8264


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/084d93da
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/084d93da
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/084d93da

Branch: refs/heads/trunk
Commit: 084d93daf6b6031909fc318e57a2a205ad32c237
Parents: 1945384
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:25:09 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:25:09 2014 -0600

--
 CHANGES.txt |2 +
 .../cql3/statements/SelectStatement.java|   95 +-
 .../cassandra/cql3/MultiColumnRelationTest.java | 1464 +-
 3 files changed, 840 insertions(+), 721 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index fff6d3a..01ea887 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,6 @@
 2.0.12:
+ * Fix some failing queries that use multi-column relations
+   on COMPACT STORAGE tables (CASSANDRA-8264)
  * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
  * Disable SSLv3 for POODLE (CASSANDRA-8265)
  * Fix millisecond timestamps in Tracing (CASSANDRA-8297)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/084d93da/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index f1d1aab..db25716 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -713,9 +713,9 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 CFDefinition.Name name = idIter.next();
 assert r != null  !r.isSlice();
 
-List<ByteBuffer> values = r.values(variables);
-if (values.size() == 1)
+if (r.isEQ())
 {
+List<ByteBuffer> values = r.values(variables);
 ByteBuffer val = values.get(0);
 if (val == null)
 throw new InvalidRequestException(String.format(Invalid 
null value for clustering key part %s, name.name));
@@ -723,26 +723,56 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 }
 else
 {
-// We have a IN, which we only support for the last column.
-// If compact, just add all values and we're done. Otherwise,
-// for each value of the IN, creates all the columns 
corresponding to the selection.
-if (values.isEmpty())
-return null;
-SortedSet<ByteBuffer> columns = new TreeSet<ByteBuffer>(cfDef.cfm.comparator);
-Iterator<ByteBuffer> iter = values.iterator();
-while (iter.hasNext())
+if (!r.isMultiColumn())
 {
-ByteBuffer val = iter.next();
-ColumnNameBuilder b = iter.hasNext() ? builder.copy() : 
builder;
-if (val == null)
-throw new 
InvalidRequestException(String.format(Invalid null value for clustering key 
part %s, name.name));
-b.add(val);
-if (cfDef.isCompact)
-columns.add(b.build());
-else
-columns.addAll(addSelectedColumns(b));
+// We have a IN, which we only support for the last column.
+// If compact, just add all values and we're done. 
Otherwise,
+// for each value of the IN, creates all the columns 
corresponding to the selection.
+List<ByteBuffer> values = r.values(variables);
+if (values.isEmpty())
+return null;
+SortedSet<ByteBuffer> columns = new TreeSet<ByteBuffer>(cfDef.cfm.comparator);
+Iterator<ByteBuffer> iter = values.iterator();
+while (iter.hasNext())
+{
+ByteBuffer val = iter.next();
+ColumnNameBuilder b = iter.hasNext() ? builder.copy() 
: builder;
+if (val == null)
+throw new 
InvalidRequestException(String.format(Invalid null value for clustering key 
part %s, name.name));
+  

[5/5] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/90759185
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/90759185
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/90759185

Branch: refs/heads/trunk
Commit: 90759185114d72d1b71409672ee406cb967427a8
Parents: b8cf172 25a4c9e
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:28:58 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:28:58 2014 -0600

--
 CHANGES.txt |   2 +
 .../cql3/statements/SelectStatement.java|  15 +-
 .../cassandra/cql3/MultiColumnRelationTest.java | 938 ++-
 3 files changed, 494 insertions(+), 461 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/90759185/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/90759185/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--



[4/5] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.0' into cassandra-2.1


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/25a4c9e1
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/25a4c9e1
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/25a4c9e1

Branch: refs/heads/trunk
Commit: 25a4c9e1f7388b35d7c51bdb36766618026cb0c7
Parents: 289314a 084d93d
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:28:51 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:28:51 2014 -0600

--
 CHANGES.txt |   2 +
 .../cql3/statements/SelectStatement.java|  15 +-
 .../cassandra/cql3/MultiColumnRelationTest.java | 938 ++-
 3 files changed, 494 insertions(+), 461 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/25a4c9e1/CHANGES.txt
--
diff --cc CHANGES.txt
index 80c6872,01ea887..8dbcbc8
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,12 -1,6 +1,14 @@@
 -2.0.12:
 +2.1.3
 + * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
 + * Add more log info if readMeter is null (CASSANDRA-8238)
 + * add check of the system wall clock time at startup (CASSANDRA-8305)
 + * Support for frozen collections (CASSANDRA-7859)
 + * Fix overflow on histogram computation (CASSANDRA-8028)
 + * Have paxos reuse the timestamp generation of normal queries 
(CASSANDRA-7801)
 + * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
 +Merged from 2.0:
+  * Fix some failing queries that use multi-column relations
+on COMPACT STORAGE tables (CASSANDRA-8264)
   * Fix InvalidRequestException with ORDER BY (CASSANDRA-8286)
   * Disable SSLv3 for POODLE (CASSANDRA-8265)
   * Fix millisecond timestamps in Tracing (CASSANDRA-8297)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/25a4c9e1/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --cc src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 09d1e52,db25716..688d1d5
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@@ -1120,45 -1077,15 +1120,56 @@@ public class SelectStatement implement
  return expressions;
  }
  
 -private void validateIndexExpressionValue(ByteBuffer value, 
CFDefinition.Name name) throws InvalidRequestException
 +private static ByteBuffer validateIndexedValue(ColumnDefinition def, 
ByteBuffer value) throws InvalidRequestException
  {
  if (value == null)
 -throw new InvalidRequestException(String.format(Unsupported null 
value for indexed column %s, name));
 +throw new InvalidRequestException(String.format(Unsupported null 
value for indexed column %s, def.name));
  if (value.remaining() > 0xFFFF)
  throw new InvalidRequestException(Index expression values may 
not be larger than 64K);
 +return value;
  }
  
 -private static IndexOperator reverse(IndexOperator op)
++private CellName makeExclusiveSliceBound(Bound bound, CellNameType type, 
QueryOptions options) throws InvalidRequestException
++{
++if (sliceRestriction.isInclusive(bound))
++return null;
++
++if (sliceRestriction.isMultiColumn())
++return type.makeCellName(((MultiColumnRestriction.Slice) 
sliceRestriction).componentBounds(bound, options).toArray());
++else
++return type.makeCellName(sliceRestriction.bound(bound, options));
++}
++
 +private Iterator<Cell> applySliceRestriction(final Iterator<Cell> cells, final QueryOptions options) throws InvalidRequestException
 +{
 +assert sliceRestriction != null;
 +
 +final CellNameType type = cfm.comparator;
- final CellName excludedStart = 
sliceRestriction.isInclusive(Bound.START) ? null : 
type.makeCellName(sliceRestriction.bound(Bound.START, options));
- final CellName excludedEnd = sliceRestriction.isInclusive(Bound.END) 
? null : type.makeCellName(sliceRestriction.bound(Bound.END, options));
++final CellName excludedStart = makeExclusiveSliceBound(Bound.START, 
type, options);
++final CellName excludedEnd = makeExclusiveSliceBound(Bound.END, type, 
options);
 +
 +return new AbstractIterator<Cell>()
 +{
 +protected Cell computeNext()
 +{
 +while (cells.hasNext())
 +{
 +Cell c = cells.next();
 +
 +// For dynamic CF, the column could be out of the 
requested bounds (because we don't support strict bounds internally (unless
 

[3/5] cassandra git commit: Merge branch 'cassandra-2.0' into cassandra-2.1

2014-11-19 Thread tylerhobbs
http://git-wip-us.apache.org/repos/asf/cassandra/blob/25a4c9e1/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
--
diff --cc test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
index bcf4f27,ea4f1a6..4c3ba2a
--- a/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/MultiColumnRelationTest.java
@@@ -17,522 -17,1234 +17,542 @@@
   */
  package org.apache.cassandra.cql3;
  
 -import org.apache.cassandra.SchemaLoader;
 -import org.apache.cassandra.db.ConsistencyLevel;
 -import org.apache.cassandra.db.marshal.*;
 -import org.apache.cassandra.exceptions.InvalidRequestException;
 -import org.apache.cassandra.exceptions.RequestExecutionException;
 -import org.apache.cassandra.exceptions.RequestValidationException;
 -import org.apache.cassandra.exceptions.SyntaxException;
 -import org.apache.cassandra.gms.Gossiper;
 -import org.apache.cassandra.service.ClientState;
 -import org.apache.cassandra.service.QueryState;
 -import org.apache.cassandra.transport.messages.ResultMessage;
 -import org.apache.cassandra.utils.ByteBufferUtil;
 -import org.apache.cassandra.utils.MD5Digest;
 -import org.junit.AfterClass;
 -import org.junit.BeforeClass;
  import org.junit.Test;
 -import org.slf4j.Logger;
 -import org.slf4j.LoggerFactory;
  
 -import java.nio.ByteBuffer;
 -import java.util.*;
 -
 -import static org.apache.cassandra.cql3.QueryProcessor.process;
 -import static org.apache.cassandra.cql3.QueryProcessor.processInternal;
 -import static org.junit.Assert.assertTrue;
 -import static org.junit.Assert.assertEquals;
 -import static com.google.common.collect.Lists.newArrayList;
 -import static org.junit.Assert.fail;
 -
 -public class MultiColumnRelationTest
 +public class MultiColumnRelationTest extends CQLTester
  {
 -private static final Logger logger = 
LoggerFactory.getLogger(MultiColumnRelationTest.class);
 -static ClientState clientState;
 -static String keyspace = multi_column_relation_test;
 -
 -@BeforeClass
 -public static void setUpClass() throws Throwable
 -{
 -SchemaLoader.loadSchema();
 -executeSchemaChange(CREATE KEYSPACE IF NOT EXISTS %s WITH 
replication = {'class': 'SimpleStrategy', 'replication_factor': '1'});
 -for (boolean isCompact : new boolean[]{false, true})
 -{
 -String tableSuffix = isCompact ? _compact : ;
 -String compactOption = isCompact ?  WITH COMPACT STORAGE : ;
 -
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.single_partition + 
tableSuffix + (a int PRIMARY KEY, b int) + compactOption);
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.compound_partition 
+tableSuffix + (a int, b int, c int, PRIMARY KEY ((a, b))) + compactOption);
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.single_clustering + 
tableSuffix + (a int, b int, c int, PRIMARY KEY (a, b)) + compactOption);
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS %s.multiple_clustering + 
tableSuffix + (a int, b int, c int, d int, PRIMARY KEY (a, b, c, d)) + 
compactOption);
 -
 -compactOption = isCompact ?  COMPACT STORAGE AND  : ;
 -executeSchemaChange(
 -CREATE TABLE IF NOT EXISTS 
%s.multiple_clustering_reversed + tableSuffix +
 -(a int, b int, c int, d int, PRIMARY KEY (a, b, c, 
d)) WITH  + compactOption +  CLUSTERING ORDER BY (b DESC, c ASC, d DESC));
 -}
 -clientState = ClientState.forInternalCalls();
 -}
 -
 -@AfterClass
 -public static void stopGossiper()
 -{
 -Gossiper.instance.stop();
 -}
 -
 -private static void executeSchemaChange(String query) throws Throwable
 -{
 -try
 -{
 -process(String.format(query, keyspace), ConsistencyLevel.ONE);
 -} catch (RuntimeException exc)
 -{
 -throw exc.getCause();
 -}
 -}
 -
 -private static UntypedResultSet execute(String query) throws Throwable
 -{
 -try
 -{
 -return processInternal(String.format(query, keyspace));
 -} catch (RuntimeException exc)
 -{
 -if (exc.getCause() != null)
 -throw exc.getCause();
 -throw exc;
 -}
 -}
 -
 -private MD5Digest prepare(String query) throws RequestValidationException
 -{
 -ResultMessage.Prepared prepared = 
QueryProcessor.prepare(String.format(query, keyspace), clientState, false);
 -return prepared.statementId;
 -}
 -
 -private UntypedResultSet executePrepared(MD5Digest statementId, 
QueryOptions options) throws RequestValidationException, 
RequestExecutionException
 -{
 -CQLStatement statement = 

cassandra git commit: Fix CONTAINS (KEY) filtering on frozen collection clustering cols

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 25a4c9e1f -> e3862bc3e


Fix CONTAINS (KEY) filtering on frozen collection clustering cols

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8302


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e3862bc3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e3862bc3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e3862bc3

Branch: refs/heads/cassandra-2.1
Commit: e3862bc3e08115806055fe7239f93433407a3dc6
Parents: 25a4c9e
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:37:26 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:37:26 2014 -0600

--
 CHANGES.txt |  3 ++
 .../cql3/statements/SelectStatement.java|  8 +--
 .../org/apache/cassandra/cql3/CQLTester.java|  7 +++
 .../cassandra/cql3/FrozenCollectionsTest.java   | 56 ++--
 4 files changed, 67 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 8dbcbc8..c00e671 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,7 @@
 2.1.3
+ * Fix filtering for CONTAINS (KEY) relations on frozen collection
+   clustering columns when the query is restricted to a single
+   partition (CASSANDRA-8203)
  * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
  * Add more log info if readMeter is null (CASSANDRA-8238)
  * add check of the system wall clock time at startup (CASSANDRA-8305)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 688d1d5..de3d67c 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -1981,12 +1981,12 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 else if (stmt.selectACollection())
 throw new 
InvalidRequestException(String.format(Cannot restrict column \%s\ by IN 
relation as a collection is selected by the query, cdef.name));
 }
-/*
-else if (restriction.isContains()  !hasQueriableIndex)
+else if (restriction.isContains())
 {
-throw new InvalidRequestException(String.format(Cannot 
restrict column \%s\ by a CONTAINS relation without a secondary index, 
cdef.name));
+if (!hasQueriableIndex)
+throw new 
InvalidRequestException(String.format(Cannot restrict column \%s\ by a 
CONTAINS relation without a secondary index, cdef.name));
+stmt.usesSecondaryIndexing = true;
 }
-*/
 
 previous = cdef;
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/test/unit/org/apache/cassandra/cql3/CQLTester.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/CQLTester.java 
b/test/unit/org/apache/cassandra/cql3/CQLTester.java
index a110af6..470b701 100644
--- a/test/unit/org/apache/cassandra/cql3/CQLTester.java
+++ b/test/unit/org/apache/cassandra/cql3/CQLTester.java
@@ -248,6 +248,13 @@ public abstract class CQLTester
 }
 }
 
+protected void dropIndex(String query) throws Throwable
+{
+String fullQuery = String.format(query, KEYSPACE);
+logger.info(fullQuery);
+schemaChange(fullQuery);
+}
+
 private static void schemaChange(String query)
 {
 try

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java 
b/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
index 203adae..bf7ccfd 100644
--- a/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
+++ b/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
@@ -615,10 +615,10 @@ public class FrozenCollectionsTest extends CQLTester
  SELECT * FROM %s WHERE c CONTAINS KEY ?, 1);
 
 // normal indexes on frozen collections don't support CONTAINS or 
CONTAINS KEY
-assertInvalidMessage(No 

[1/2] cassandra git commit: Fix CONTAINS (KEY) filtering on frozen collection clustering cols

2014-11-19 Thread tylerhobbs
Repository: cassandra
Updated Branches:
  refs/heads/trunk 907591851 -> 1bd5c64ba


Fix CONTAINS (KEY) filtering on frozen collection clustering cols

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8302


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e3862bc3
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e3862bc3
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e3862bc3

Branch: refs/heads/trunk
Commit: e3862bc3e08115806055fe7239f93433407a3dc6
Parents: 25a4c9e
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:37:26 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:37:26 2014 -0600

--
 CHANGES.txt |  3 ++
 .../cql3/statements/SelectStatement.java|  8 +--
 .../org/apache/cassandra/cql3/CQLTester.java|  7 +++
 .../cassandra/cql3/FrozenCollectionsTest.java   | 56 ++--
 4 files changed, 67 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 8dbcbc8..c00e671 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,7 @@
 2.1.3
+ * Fix filtering for CONTAINS (KEY) relations on frozen collection
+   clustering columns when the query is restricted to a single
+   partition (CASSANDRA-8203)
  * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
  * Add more log info if readMeter is null (CASSANDRA-8238)
  * add check of the system wall clock time at startup (CASSANDRA-8305)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java 
b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index 688d1d5..de3d67c 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -1981,12 +1981,12 @@ public class SelectStatement implements CQLStatement, 
MeasurableForPreparedCache
 else if (stmt.selectACollection())
 throw new 
InvalidRequestException(String.format(Cannot restrict column \%s\ by IN 
relation as a collection is selected by the query, cdef.name));
 }
-/*
-else if (restriction.isContains()  !hasQueriableIndex)
+else if (restriction.isContains())
 {
-throw new InvalidRequestException(String.format(Cannot 
restrict column \%s\ by a CONTAINS relation without a secondary index, 
cdef.name));
+if (!hasQueriableIndex)
+throw new 
InvalidRequestException(String.format(Cannot restrict column \%s\ by a 
CONTAINS relation without a secondary index, cdef.name));
+stmt.usesSecondaryIndexing = true;
 }
-*/
 
 previous = cdef;
 }

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/test/unit/org/apache/cassandra/cql3/CQLTester.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/CQLTester.java 
b/test/unit/org/apache/cassandra/cql3/CQLTester.java
index a110af6..470b701 100644
--- a/test/unit/org/apache/cassandra/cql3/CQLTester.java
+++ b/test/unit/org/apache/cassandra/cql3/CQLTester.java
@@ -248,6 +248,13 @@ public abstract class CQLTester
 }
 }
 
+protected void dropIndex(String query) throws Throwable
+{
+String fullQuery = String.format(query, KEYSPACE);
+logger.info(fullQuery);
+schemaChange(fullQuery);
+}
+
 private static void schemaChange(String query)
 {
 try

http://git-wip-us.apache.org/repos/asf/cassandra/blob/e3862bc3/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
--
diff --git a/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java 
b/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
index 203adae..bf7ccfd 100644
--- a/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
+++ b/test/unit/org/apache/cassandra/cql3/FrozenCollectionsTest.java
@@ -615,10 +615,10 @@ public class FrozenCollectionsTest extends CQLTester
  SELECT * FROM %s WHERE c CONTAINS KEY ?, 1);
 
 // normal indexes on frozen collections don't support CONTAINS or 
CONTAINS KEY
-assertInvalidMessage(No secondary indexes 

[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-11-19 Thread tylerhobbs
Merge branch 'cassandra-2.1' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/1bd5c64b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/1bd5c64b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/1bd5c64b

Branch: refs/heads/trunk
Commit: 1bd5c64ba108ded9c0e7fc3d5f7b8315ece48798
Parents: 9075918 e3862bc
Author: Tyler Hobbs ty...@datastax.com
Authored: Wed Nov 19 11:38:17 2014 -0600
Committer: Tyler Hobbs ty...@datastax.com
Committed: Wed Nov 19 11:38:17 2014 -0600

--
 CHANGES.txt |  3 ++
 .../cql3/statements/SelectStatement.java|  8 +--
 .../org/apache/cassandra/cql3/CQLTester.java|  7 +++
 .../cassandra/cql3/FrozenCollectionsTest.java   | 56 ++--
 4 files changed, 67 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/1bd5c64b/CHANGES.txt
--
diff --cc CHANGES.txt
index 749fdd2,c00e671..f41e7c3
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,38 -1,7 +1,41 @@@
 +3.0
 + * Fix aggregate fn results on empty selection, result column name,
 +   and cqlsh parsing (CASSANDRA-8229)
 + * Mark sstables as repaired after full repair (CASSANDRA-7586)
 + * Extend Descriptor to include a format value and refactor reader/writer 
apis (CASSANDRA-7443)
 + * Integrate JMH for microbenchmarks (CASSANDRA-8151)
 + * Keep sstable levels when bootstrapping (CASSANDRA-7460)
 + * Add Sigar library and perform basic OS settings check on startup 
(CASSANDRA-7838)
 + * Support for aggregation functions (CASSANDRA-4914)
 + * Remove cassandra-cli (CASSANDRA-7920)
 + * Accept dollar quoted strings in CQL (CASSANDRA-7769)
 + * Make assassinate a first class command (CASSANDRA-7935)
 + * Support IN clause on any clustering column (CASSANDRA-4762)
 + * Improve compaction logging (CASSANDRA-7818)
 + * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917)
 + * Do anticompaction in groups (CASSANDRA-6851)
 + * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 
7781, 7929,
 +   7924, 7812, 8063, 7813)
 + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416)
 + * Move sstable RandomAccessReader to nio2, which allows using the
 +   FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050)
 + * Remove CQL2 (CASSANDRA-5918)
 + * Add Thrift get_multi_slice call (CASSANDRA-6757)
 + * Optimize fetching multiple cells by name (CASSANDRA-6933)
 + * Allow compilation in java 8 (CASSANDRA-7028)
 + * Make incremental repair default (CASSANDRA-7250)
 + * Enable code coverage thru JaCoCo (CASSANDRA-7226)
 + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) 
 + * Shorten SSTable path (CASSANDRA-6962)
 + * Use unsafe mutations for most unit tests (CASSANDRA-6969)
 + * Fix race condition during calculation of pending ranges (CASSANDRA-7390)
 + * Fail on very large batch sizes (CASSANDRA-8011)
 + * improve concurrency of repair (CASSANDRA-6455, 8208)
 +
  2.1.3
+  * Fix filtering for CONTAINS (KEY) relations on frozen collection
+clustering columns when the query is restricted to a single
+partition (CASSANDRA-8203)
   * Do more aggressive entire-sstable TTL expiry checks (CASSANDRA-8243)
   * Add more log info if readMeter is null (CASSANDRA-8238)
   * add check of the system wall clock time at startup (CASSANDRA-8305)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/1bd5c64b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
--
diff --cc src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index e042578,de3d67c..99c2297
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@@ -1944,15 -1976,17 +1944,15 @@@ public class SelectStatement implement
  }
  else if (restriction.isIN())
  {
 -if (!restriction.isMultiColumn()  i != 
stmt.columnRestrictions.length - 1)
 -throw new 
InvalidRequestException(String.format(Clustering column \%s\ cannot be 
restricted by an IN relation, cdef.name));
 -else if (stmt.selectACollection())
 +if (stmt.selectACollection())
  throw new 
InvalidRequestException(String.format(Cannot restrict column \%s\ by IN 
relation as a collection is selected by the query, cdef.name));
  }
- /*
- else if (restriction.isContains()  !hasQueriableIndex)
+ else if (restriction.isContains())
  {
- throw new 

[jira] [Created] (CASSANDRA-8342) Remove historical guidance for concurrent reader and tunings.

2014-11-19 Thread Matt Stump (JIRA)
Matt Stump created CASSANDRA-8342:
-

 Summary: Remove historical guidance for concurrent reader and 
tunings.
 Key: CASSANDRA-8342
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8342
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matt Stump


The cassandra.yaml and documentation provide guidance on tuning concurrent 
readers or concurrent writers to system resources (cores, spindles). Testing 
performed by both myself and customers demonstrates no benefit for thread pool 
sizes above 64, and a decrease in throughput for thread pools larger than 128. 
This is due to thread scheduling and synchronization bottlenecks within 
Cassandra.

Additionally, for lower-end systems reducing these thread pools provides very 
little benefit, because the bottleneck typically just moves to IO or CPU.

I propose that we set the default value to 64 (current default is 32), and 
remove all guidance/recommendations regarding tuning.

This recommendation may change in 3.0, but that would require further 
experimentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8342) Remove historical guidance for concurrent reader and writer tunings.

2014-11-19 Thread Matt Stump (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Stump updated CASSANDRA-8342:
--
Summary: Remove historical guidance for concurrent reader and writer 
tunings.  (was: Remove historical guidance for concurrent reader and tunings.)

 Remove historical guidance for concurrent reader and writer tunings.
 

 Key: CASSANDRA-8342
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8342
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matt Stump

 The cassandra.yaml and documentation provide guidance on tuning concurrent 
 readers or concurrent writers to system resources (cores, spindles). Testing 
 performed by both myself and customers demonstrates no benefit for thread 
 pool sizes above 64, and a decrease in throughput for thread pools larger 
 than 128. This is due to thread scheduling and synchronization bottlenecks 
 within Cassandra. 
 Additionally, for lower-end systems reducing these thread pools provides very 
 little benefit, because the bottleneck typically just moves to IO or CPU.
 I propose that we set the default value to 64 (current default is 32), and 
 remove all guidance/recommendations regarding tuning.
 This recommendation may change in 3.0, but that would require further 
 experimentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8343) Secondary index creation causes moves/bootstraps to fail

2014-11-19 Thread Michael Frisch (JIRA)
Michael Frisch created CASSANDRA-8343:
-

 Summary: Secondary index creation causes moves/bootstraps to fail
 Key: CASSANDRA-8343
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8343
 Project: Cassandra
  Issue Type: Bug
Reporter: Michael Frisch


Node moves/bootstraps fail if the stream timeout is set to a value within which 
secondary index creation cannot complete.  This happens because at the end of 
the very last stream the StreamInSession.closeIfFinished() function calls 
maybeBuildSecondaryIndexes on every column family.  If the stream time plus the 
index creation time for all CFs exceeds your stream timeout, the socket is 
closed from the sender's side.  The receiver of the stream then tries to write 
to that socket because it is not null, an IOException is thrown but not caught 
in closeIfFinished(), the exception is caught somewhere without being logged, 
AbstractStreamSession.close() is never called, and the CountDownLatch is never 
decremented.  This causes the move/bootstrap to hang forever until the node is 
restarted.

This problem of stream time plus secondary index creation time exists on 
decommission/unbootstrap as well, but since it's on the sending side, the 
timeout triggers the onFailure() callback, which does decrement the 
CountDownLatch and allows the operation to complete.

A cursory glance at the 2.0 code leads me to believe this problem would exist 
there as well.

Temporary workaround: set a really high/infinite stream timeout.
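
To make the failure mode concrete, here is a minimal, self-contained illustration (not actual Cassandra code; the class and method names below are invented for the example) of how an exception that is swallowed before a CountDownLatch is decremented leaves the waiting side blocked forever:

{code}
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

public class LatchLeakDemo
{
    // Stands in for the latch a move/bootstrap waits on.
    static final CountDownLatch sessionDone = new CountDownLatch(1);

    // Stands in for the write that fails because the sender already hit its
    // stream timeout (stream time + index build time) and closed the socket.
    static void writeToClosedSocket() throws IOException
    {
        throw new IOException("Broken pipe: peer closed the socket after its stream timeout");
    }

    // Stands in for the receiver-side cleanup described in the report.
    static void closeIfFinished() throws IOException
    {
        writeToClosedSocket();
        sessionDone.countDown(); // never reached, so the latch is never decremented
    }

    public static void main(String[] args) throws InterruptedException
    {
        try
        {
            closeIfFinished();
        }
        catch (IOException swallowed)
        {
            // caught somewhere and not logged, per the description above
        }
        sessionDone.await(); // blocks forever, like the move/bootstrap, until the process is killed
    }
}
{code}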



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8150) Simplify and enlarge new heap calculation

2014-11-19 Thread Matt Stump (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185270#comment-14185270
 ] 

Matt Stump edited comment on CASSANDRA-8150 at 11/19/14 6:56 PM:
-

That's not quite true:
https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L75-L77

We'll use the min((100 * cores), (1/4 * max_heap))



was (Author: mstump):
That's not quite true:
https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L75-L77

We'll use the min((100 * cores), (1/4 * max_heap))

Please reopen.
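
For reference, a tiny sketch (illustrative only, not cassandra-env.sh itself) of the min((100 * cores), (1/4 * max_heap)) young-generation rule referenced above:

{code}
public class NewGenSizing
{
    // min((100 MB * cores), (1/4 * max_heap)), per the formula quoted above.
    static long heapNewSizeMb(int cpuCores, long maxHeapMb)
    {
        return Math.min(100L * cpuCores, maxHeapMb / 4);
    }

    public static void main(String[] args)
    {
        // e.g. 8 cores and an 8 GB heap: min(800, 2048) = 800 MB new gen
        System.out.println(heapNewSizeMb(8, 8192) + "M");
    }
}
{code}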

 Simplify and enlarge new heap calculation
 -

 Key: CASSANDRA-8150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Matt Stump
Assignee: Brandon Williams

 It's been found that the old twitter recommendations of 100m per core up to 
 800m is harmful and should no longer be used.
 Instead the formula used should be 1/3 or 1/4 max heap with a max of 2G. 1/3 
 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess 
 1/3 is probably better for releases greater than 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization

2014-11-19 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218311#comment-14218311
 ] 

Yuki Morishita commented on CASSANDRA-7386:
---

LGTM, except in the test code:

{code}
// at least (rule of thumb) 100 iterations
if (i >= 100)
    break;
// random weighted writeable directory algorithm fails to return all possible
// directories after many tries
if (i >= 1000)
    fail();
{code}

I think you mean to check the second `if` first?
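
In other words, something along these lines (a sketch only; in the real test the early break is presumably also guarded by the loop's success condition):

{code}
// check the hard failure bound first, otherwise the break below makes fail() unreachable
if (i >= 1000)
    fail();
// at least (rule of thumb) 100 iterations
if (i >= 100)
    break;
{code}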

 JBOD threshold to prevent unbalanced disk utilization
 -

 Key: CASSANDRA-7386
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Lohfink
Assignee: Robert Stupp
Priority: Minor
 Fix For: 2.1.3

 Attachments: 7386-2.0-v3.txt, 7386-2.0-v4.txt, 7386-2.1-v3.txt, 
 7386-2.1-v4.txt, 7386-v1.patch, 7386v2.diff, Mappe1.ods, 
 mean-writevalue-7disks.png, patch_2_1_branch_proto.diff, 
 sstable-count-second-run.png, test1_no_patch.jpg, test1_with_patch.jpg, 
 test2_no_patch.jpg, test2_with_patch.jpg, test3_no_patch.jpg, 
 test3_with_patch.jpg


 Currently the disks are picked first by number of current tasks, then by free 
 space.  This helps with performance but can lead to large differences in 
 utilization in some (unlikely but possible) scenarios.  I've seen 55% to 10% 
 and heard reports of 90% to 10% on IRC.  This happens with both LCS and STCS 
 (although my suspicion is that STCS makes it worse since it is harder to keep 
 balanced).
 I propose the algorithm change a little to have some maximum range of 
 utilization within which it will pick by free space over load (acknowledging 
 it can be slower).  So if disk A is 30% full and disk B is 5% full it will 
 never pick A over B until it balances out.
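
A rough sketch of the proposed behaviour (illustrative only; the class, fields, and the 5% threshold below are invented for the example and are not the attached patch): prefer the least-utilized disk whenever utilization drifts apart by more than some maximum range, and fall back to the current task-count ordering otherwise.

{code}
import java.util.List;

class DiskPickSketch
{
    // Hypothetical view of a data directory; not Cassandra's real Directories API.
    static class Disk
    {
        double utilization;   // used/total, 0.0 - 1.0
        int pendingTasks;
    }

    static final double MAX_UTILIZATION_SPREAD = 0.05; // example threshold only

    static Disk pick(List<Disk> disks)
    {
        Disk leastUsed = disks.get(0), mostUsed = disks.get(0), leastBusy = disks.get(0);
        for (Disk d : disks)
        {
            if (d.utilization < leastUsed.utilization) leastUsed = d;
            if (d.utilization > mostUsed.utilization)  mostUsed = d;
            if (d.pendingTasks < leastBusy.pendingTasks
                || (d.pendingTasks == leastBusy.pendingTasks && d.utilization < leastBusy.utilization))
                leastBusy = d;
        }
        // Once utilization drifts apart by more than the allowed range, rebalance by free
        // space even though it may be slower; otherwise keep the current ordering
        // (fewest pending tasks, then most free space).
        return mostUsed.utilization - leastUsed.utilization > MAX_UTILIZATION_SPREAD ? leastUsed : leastBusy;
    }
}
{code}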



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8150) Simplify and enlarge new heap calculation

2014-11-19 Thread Matt Stump (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218353#comment-14218353
 ] 

Matt Stump commented on CASSANDRA-8150:
---

I'm going to advocate strongly for 40-50% by default for eden. 

Additionally I'm going to advocate that we change the following:
- Increase ceiling on MAX_HEAP to 20G.
- Increase the MaxTenuringThreshold to 6 or 8.
- Include Instagram CMS enhancements.
- Increase thread/core affinity for GC
- Set XX:ParallelGCThreads and XX:ConcGCThreads to min(20, number of cores).
- Possibly, set XX:MaxGCPauseMillis to 20ms, but I haven't really tested this 
one.

Instagram CMS settings:
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=6"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=3"

Thread/core affinity settings:
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"

I've seen decreased GC pause frequency and latency, and an increase in 
throughput, with customer installations using the above recommendations. This 
was observed with both read-heavy and balanced workloads.

+[~rbranson] +[~tjake]

 Simplify and enlarge new heap calculation
 -

 Key: CASSANDRA-8150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8150
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Matt Stump
Assignee: Brandon Williams

 It's been found that the old twitter recommendations of 100m per core up to 
 800m is harmful and should no longer be used.
 Instead the formula used should be 1/3 or 1/4 max heap with a max of 2G. 1/3 
 or 1/4 is debatable and I'm open to suggestions. If I were to hazard a guess 
 1/3 is probably better for releases greater than 2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8344) QueryProcessor evictionCheckTimer not named and non-daemon

2014-11-19 Thread Chris Lohfink (JIRA)
Chris Lohfink created CASSANDRA-8344:


 Summary: QueryProcessor evictionCheckTimer not named and non-daemon
 Key: CASSANDRA-8344
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8344
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Lohfink
Assignee: Chris Lohfink
Priority: Trivial


QueryProcessor's evictionCheckTimer isn't a daemon thread and is not named.  That 
makes it difficult to track down when it blocks an application (e.g. forked 
benchmarks from JMH will freeze waiting for completion).

For what it's worth, a little hack to work around it and make tests complete:
{code}
Field f = QueryProcessor.class.getDeclaredField("evictionCheckTimer");
f.setAccessible(true);
((ScheduledExecutorService)f.get(null)).shutdown();
{code}
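
For comparison, a minimal sketch of what a named, daemon-threaded eviction check executor could look like (the thread name and surrounding class are made up for the example; this is not the attached patch):

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

public class NamedDaemonTimer
{
    // A named, daemon thread factory: shows up clearly in thread dumps and never
    // keeps the JVM (or a forked benchmark) alive on its own.
    static final ScheduledExecutorService evictionCheckTimer =
        Executors.newScheduledThreadPool(1, new ThreadFactory()
        {
            public Thread newThread(Runnable r)
            {
                Thread t = new Thread(r, "QueryProcessor-evictionCheckTimer");
                t.setDaemon(true);
                return t;
            }
        });
}
{code}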



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8344) QueryProcessor evictionCheckTimer not named and non-daemon

2014-11-19 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-8344:
-
Attachment: 8344.diff

 QueryProcessor evictionCheckTimer not named and non-daemon
 --

 Key: CASSANDRA-8344
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8344
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Lohfink
Assignee: Chris Lohfink
Priority: Trivial
 Attachments: 8344.diff


 QueryProcessor's evictionCheckTimer isn't a daemon thread and is not named.  That 
 makes it difficult to track down when it blocks an application (e.g. forked 
 benchmarks from JMH will freeze waiting for completion).
 For what it's worth, a little hack to work around it and make tests complete:
 {code}
 Field f = QueryProcessor.class.getDeclaredField("evictionCheckTimer");
 f.setAccessible(true);
 ((ScheduledExecutorService)f.get(null)).shutdown();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization

2014-11-19 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-7386:

Attachment: 7386-2.1-v5.txt
7386-2.0-v5.txt

Patch v5 fixes the junit (only change).

Yeah - that was weird. The intention was to check at least 100 iterations (to see 
whether it goes wrong) and at most some more iterations to give it a very high 
chance to succeed.

 JBOD threshold to prevent unbalanced disk utilization
 -

 Key: CASSANDRA-7386
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Lohfink
Assignee: Robert Stupp
Priority: Minor
 Fix For: 2.1.3

 Attachments: 7386-2.0-v3.txt, 7386-2.0-v4.txt, 7386-2.0-v5.txt, 
 7386-2.1-v3.txt, 7386-2.1-v4.txt, 7386-2.1-v5.txt, 7386-v1.patch, 
 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png, 
 patch_2_1_branch_proto.diff, sstable-count-second-run.png, 
 test1_no_patch.jpg, test1_with_patch.jpg, test2_no_patch.jpg, 
 test2_with_patch.jpg, test3_no_patch.jpg, test3_with_patch.jpg


 Currently the disks are picked first by number of current tasks, then by free 
 space.  This helps with performance but can lead to large differences in 
 utilization in some (unlikely but possible) scenarios.  I've seen 55% to 10% 
 and heard reports of 90% to 10% on IRC.  This happens with both LCS and STCS 
 (although my suspicion is that STCS makes it worse since it is harder to keep 
 balanced).
 I propose the algorithm change a little to have some maximum range of 
 utilization within which it will pick by free space over load (acknowledging 
 it can be slower).  So if disk A is 30% full and disk B is 5% full it will 
 never pick A over B until it balances out.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-8339) Reading columns marked as type different than default validation class from CQL causes errors

2014-11-19 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko reassigned CASSANDRA-8339:


Assignee: Aleksey Yeschenko  (was: Tyler Hobbs)

 Reading columns marked as type different than default validation class from 
 CQL causes errors
 -

 Key: CASSANDRA-8339
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8339
 Project: Cassandra
  Issue Type: Bug
Reporter: Erik Forsberg
Assignee: Aleksey Yeschenko

 As [discussed on users mailing 
 list|http://www.mail-archive.com/user%40cassandra.apache.org/msg39251.html] 
 I'm having trouble reading data from a table created via thrift, where some 
 columns are marked as having a validator different than the default one.
 Minimal working example:
 {noformat}
 #!/usr/bin/env python
 # Run this in virtualenv with pycassa and cassandra-driver installed via pip
 import pycassa
 import cassandra
 import calendar
 import traceback
 import time
 from uuid import uuid4
 keyspace = "badcql"
 sysmanager = pycassa.system_manager.SystemManager("localhost")
 sysmanager.create_keyspace(keyspace, strategy_options={'replication_factor':'1'})
 sysmanager.create_column_family(keyspace, "Users",
                                 key_validation_class=pycassa.system_manager.LEXICAL_UUID_TYPE,
                                 comparator_type=pycassa.system_manager.ASCII_TYPE,
                                 default_validation_class=pycassa.system_manager.UTF8_TYPE)
 sysmanager.create_index(keyspace, "Users", "username", pycassa.system_manager.UTF8_TYPE)
 sysmanager.create_index(keyspace, "Users", "email", pycassa.system_manager.UTF8_TYPE)
 sysmanager.alter_column(keyspace, "Users", "default_account_id", pycassa.system_manager.LEXICAL_UUID_TYPE)
 sysmanager.create_index(keyspace, "Users", "active", pycassa.system_manager.INT_TYPE)
 sysmanager.alter_column(keyspace, "Users", "date_created", pycassa.system_manager.LONG_TYPE)
 pool = pycassa.pool.ConnectionPool(keyspace, ['localhost:9160'])
 cf = pycassa.ColumnFamily(pool, "Users")
 user_uuid = uuid4()
 cf.insert(user_uuid, {'username': 'test_username', 'auth_method': 'ldap', 'email': 't...@example.com', 'active': 1,
                       'date_created': long(calendar.timegm(time.gmtime())), 'default_account_id': uuid4()})
 from cassandra.cluster import Cluster
 cassandra_cluster = Cluster(["localhost"])
 cassandra_session = cassandra_cluster.connect(keyspace)
 print "username", cassandra_session.execute('SELECT value from "Users" where key = %s and column1 = %s', (user_uuid, 'username',))
 print "email", cassandra_session.execute('SELECT value from "Users" where key = %s and column1 = %s', (user_uuid, 'email',))
 try:
     print "default_account_id", cassandra_session.execute('SELECT value from "Users" where key = %s and column1 = %s', (user_uuid, 'default_account_id',))
 except Exception as e:
     print "Exception trying to get default_account_id", traceback.format_exc()
 cassandra_session = cassandra_cluster.connect(keyspace)
 try:
     print "active", cassandra_session.execute('SELECT value from "Users" where key = %s and column1 = %s', (user_uuid, 'active',))
 except Exception as e:
     print "Exception trying to get active", traceback.format_exc()
 cassandra_session = cassandra_cluster.connect(keyspace)
 try:
     print "date_created", cassandra_session.execute('SELECT value from "Users" where key = %s and column1 = %s', (user_uuid, 'date_created',))
 except Exception as e:
     print "Exception trying to get date_created", traceback.format_exc()
 
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8345) Client notifications should carry the entire delta of the information that changed

2014-11-19 Thread JIRA
Michaël Figuière created CASSANDRA-8345:
---

 Summary: Client notifications should carry the entire delta of the 
information that changed
 Key: CASSANDRA-8345
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8345
 Project: Cassandra
  Issue Type: Improvement
Reporter: Michaël Figuière


Currently when the schema changes, a {{SCHEMA_CHANGE}} notification is sent to 
the client to let it know that a modification happened in a specific table or 
keyspace. If the client registers for these notifications, it is likely that 
it actually cares about having an up-to-date version of this information, so 
the next logical step is for the client to query the {{system}} keyspace to 
retrieve the latest version of the schema for the particular element that was 
mentioned in the notification.

The same thing happens with the {{TOPOLOGY_CHANGE}} notification, as the client 
will follow up with a query to retrieve the details that changed in the 
{{system.peers}} table.

It would be interesting to send the entire delta of the information that 
changed within the notification. I see several advantages with this (see the 
sketch below):
* This would ensure that the data sent to the client is as small as possible, 
as such a delta will always be smaller than the result set that would 
eventually be received for a formal query on the {{system}} keyspace.
* This avoids the Cassandra node receiving plenty of queries after it issues a 
notification; instead it prepares the delta once and sends it to everybody.
* This should improve the overall behaviour when dealing with very large 
schemas with frequent changes (typically due to an attempt at implementing 
multitenancy through separate keyspaces), as it has been observed that the 
notification and subsequent query traffic can become non-negligible in this 
case.
* This would eventually simplify the driver design by removing the need for an 
extra asynchronous operation to follow up with, although the benefit of this 
point will only be realized once the previous versions of the protocol are far 
behind.
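
To illustrate the difference, a hypothetical client-side sketch (the listener interface and delta type below are invented for the example; they are not part of the native protocol or of any driver API). Today the event only names what changed and the client follows up with queries against the {{system}} schema tables; the proposal would ship the changed rows inside the event itself:

{code}
// Hypothetical types, for illustration only.
interface SchemaChangeListener
{
    // Current protocol: the event is just a pointer to what changed; the client
    // must follow up with SELECT queries on the system schema tables.
    void onSchemaChange(String keyspace, String table);

    // Proposed: the event carries the delta itself, so no follow-up query storm
    // hits the node that emitted the notification.
    void onSchemaChange(String keyspace, String table, SchemaDelta delta);
}

// Stand-in for "the entire delta of the information that changed".
class SchemaDelta
{
    java.util.List<String> addedOrAlteredColumns;
    java.util.List<String> droppedColumns;
}
{code}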



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7563) UserType, TupleType and collections in UDFs

2014-11-19 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218503#comment-14218503
 ] 

Tyler Hobbs commented on CASSANDRA-7563:


bq. The protocol version stuff got a bit bigger than I expected. But the new 
tests in UFTest pass. These test execution using executeInternal and using 
protocol version 2 + 3 via the Java Driver.

Looks good!

bq. I tried to make the new tests a bit more readable - hope it looks better 
now.

Thanks, this is definitely easier to read.

bq. Calling a UDF with a null value in a collection does not work - the Java 
Driver does not support that. Added a test for that (but with an @Ignore 
annotation).

Sorry, I meant testing where the entire collection is null/empty.  For example, 
something like {{SELECT myfunc(mycollection) FROM ...}} where {{mycollection}} 
may be null/empty.

Some more review comments:

JavaSourceUDFFactory:
* Can you add quick docs explaining argDataTypes, returnDataType vs 
javaParamTypes, javaReturnType?
* The example generated function for generateExecuteMethod() needs to be 
updated for protocolVersion in a couple of places

ScriptBasedUDF:
* Directly return {{decompose()}} result at the end of {{execute()}}

UDFunction:
* Can you add basic javadocs on the type-related methods plus {{compose()}} and 
{{decompose()}}?

UFTest:
* You mentioned that you had test coverage for various changes to UDTs used by 
functions, but I only see one that covers adding a field.  Can you add coverage 
for the other cases?  With the existing test, what happens if the UDT is 
altered and the function isn't replaced?  Make sure to cover failure scenarios 
in the tests.

It's getting close! Thanks for your hard work :)

 UserType, TupleType and collections in UDFs
 ---

 Key: CASSANDRA-7563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7563
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0

 Attachments: 7563-7740.txt, 7563.txt, 7563v2.txt, 7563v3.txt, 
 7563v4.txt


 * is Java Driver as a dependency required ?
 * is it possible to extract parts of the Java Driver for UDT/TT/coll support ?
 * CQL {{DROP TYPE}} must check UDFs
 * must check keyspace access permissions (if those exist)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions

2014-11-19 Thread Björn Hegerfors (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218515#comment-14218515
 ] 

Björn Hegerfors commented on CASSANDRA-8340:


Actually, it's already implemented like that. The SSTables are paired with 
their age in the createSSTableAndMinTimestampPairs method. Max timestamps are 
only used in getNow and filterOldSSTables.

I used min timestamps based on the same reasoning as yours. I agree that major 
compaction is the way to go when switching.
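
For context, a toy sketch of the idea under discussion: deciding an sstable's 
eligibility from its min timestamp rather than its max timestamp. This is a 
simplified illustration only, not the actual DateTieredCompactionStrategy code, 
and the class and method names are made up.

{code}
import java.util.ArrayList;
import java.util.List;

// Toy model only, not the real DTCS implementation.
class SSTableInfo
{
    final long minTimestamp; // oldest cell timestamp in the sstable
    final long maxTimestamp; // newest cell timestamp in the sstable

    SSTableInfo(long minTimestamp, long maxTimestamp)
    {
        this.minTimestamp = minTimestamp;
        this.maxTimestamp = maxTimestamp;
    }
}

class DtcsEligibilitySketch
{
    // The proposed rule: keep an sstable only if its *oldest* data is newer
    // than maxSstableAge (same unit as the timestamps). A huge major-compacted
    // sstable has a very old min timestamp, so it drops out of consideration
    // right after the switch, while newer sstables keep compacting normally.
    static List<SSTableInfo> filterOldSSTables(List<SSTableInfo> sstables, long maxSstableAge, long now)
    {
        List<SSTableInfo> eligible = new ArrayList<>();
        for (SSTableInfo sstable : sstables)
            if (now - sstable.minTimestamp <= maxSstableAge)
                eligible.add(sstable);
        return eligible;
    }
}
{code}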

 Use sstable min timestamp when deciding if an sstable should be included in 
 DTCS compactions
 

 Key: CASSANDRA-8340
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor

 Currently we check how old the newest data (max timestamp) in an sstable is 
 when we check if it should be compacted.
 If we instead switch to using min timestamp for this we have a pretty clean 
 migration path from STCS/LCS to DTCS. 
 My thinking is that before migrating, the user does a major compaction, which 
 creates a huge sstable containing all data, with min timestamp very far back 
 in time, then switching to DTCS, we will have a big sstable that we never 
 compact (ie, min timestamp of this big sstable is before 
 max_sstable_age_days), and all newer data will be after that, and that new 
 data will be properly compacted
 WDYT [~Bj0rn] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions

2014-11-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218522#comment-14218522
 ] 

Marcus Eriksson commented on CASSANDRA-8340:


I meant using max timestamp in filterOldSSTables to avoid compacting the big 
sstable at all

 Use sstable min timestamp when deciding if an sstable should be included in 
 DTCS compactions
 

 Key: CASSANDRA-8340
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor

 Currently we check how old the newest data (max timestamp) in an sstable is 
 when we check if it should be compacted.
 If we instead switch to using min timestamp for this we have a pretty clean 
 migration path from STCS/LCS to DTCS. 
 My thinking is that before migrating, the user does a major compaction, which 
 creates a huge sstable containing all data, with min timestamp very far back 
 in time, then switching to DTCS, we will have a big sstable that we never 
 compact (ie, min timestamp of this big sstable is before 
 max_sstable_age_days), and all newer data will be after that, and that new 
 data will be properly compacted
 WDYT [~Bj0rn] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions

2014-11-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218522#comment-14218522
 ] 

Marcus Eriksson edited comment on CASSANDRA-8340 at 11/19/14 9:23 PM:
--

I meant using min timestamp in filterOldSSTables to avoid compacting the big 
sstable at all


was (Author: krummas):
I meant using max timestamp in filterOldSSTables to avoid compacting the big 
sstable at all

 Use sstable min timestamp when deciding if an sstable should be included in 
 DTCS compactions
 

 Key: CASSANDRA-8340
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor

 Currently we check how old the newest data (max timestamp) in an sstable is 
 when we check if it should be compacted.
 If we instead switch to using min timestamp for this we have a pretty clean 
 migration path from STCS/LCS to DTCS. 
 My thinking is that before migrating, the user does a major compaction, which 
 creates a huge sstable containing all data, with min timestamp very far back 
 in time, then switching to DTCS, we will have a big sstable that we never 
 compact (ie, min timestamp of this big sstable is before 
 max_sstable_age_days), and all newer data will be after that, and that new 
 data will be properly compacted
 WDYT [~Bj0rn] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7404) Use direct i/o for sequential operations (compaction/streaming)

2014-11-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218578#comment-14218578
 ] 

Jonathan Ellis commented on CASSANDRA-7404:
---

I bet we could make a single builder work.

 Use direct i/o for sequential operations (compaction/streaming)
 ---

 Key: CASSANDRA-7404
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7404
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jason Brown
Assignee: Ariel Weisberg
  Labels: performance
 Fix For: 3.0


 Investigate using linux's direct i/o for operations where we read 
 sequentially through a file (repair and bootstrap streaming, compaction 
 reads, and so on). Direct i/o does not go through the kernel page cache, so it 
 should leave the hot cache pages used for live reads unaffected.
 Note: by using direct i/o, we will probably take a performance hit on reading 
 the file we're sequentially scanning through (that is, compactions may get 
 slower), but the goal of this ticket is to limit the impact of these 
 background tasks on the main read/write functionality. Of course, I'll 
 measure any perf hit that is incurred, and see if there's any mechanisms to 
 mitigate it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8338) Simplify Token Selection

2014-11-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218583#comment-14218583
 ] 

Jonathan Ellis commented on CASSANDRA-8338:
---

Is it really worth spending time on non-vnode tokens at this point?

 Simplify Token Selection
 

 Key: CASSANDRA-8338
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8338
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Joaquin Casares
Priority: Trivial
  Labels: lhf

 When creating provisioning scripts, especially when running tools like Chef, 
 each node is launched individually. When not using vnodes your initial setup 
 will always be unbalanced unless you handle token assignment within your 
 scripts. 
 I spoke to someone recently who was using this in production and his 
 operations team wasn't too pleased that they had to use OpsCenter as an extra 
 step for rebalancing. Instead, we should provide this functionality out of 
 the box for new clusters.
 Instead, could we have the following options below the initial_token section?
 {CODE}
 # datacenter_index: 0
 # node_index: 0
 # datacenter_size: 1
 {CODE}
 The above configuration options, when uncommented, would do the math of:
 {CODE}
 token = node_index * (range / datacenter_size) + (datacenter_index * 100) 
 + start_of_range
 {CODE}
 This means that users don't have to repeatedly implement the initial_token 
 selection code nor know the range and offsets of their partitioner.
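
For illustration, a small sketch of the proposed calculation, assuming the 
Murmur3Partitioner token range of [-2^63, 2^63 - 1]. The option names follow 
the description above and are not existing Cassandra settings; this is not 
code from any patch.

{code}
import java.math.BigInteger;

// Illustrative only: compute an initial_token from the proposed options.
class TokenSketch
{
    static BigInteger initialToken(int datacenterIndex, int nodeIndex, int datacenterSize)
    {
        BigInteger startOfRange = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger range = BigInteger.valueOf(2).pow(64); // total size of the token range

        // token = node_index * (range / datacenter_size) + (datacenter_index * 100) + start_of_range
        return BigInteger.valueOf(nodeIndex)
                         .multiply(range.divide(BigInteger.valueOf(datacenterSize)))
                         .add(BigInteger.valueOf(datacenterIndex).multiply(BigInteger.valueOf(100)))
                         .add(startOfRange);
    }

    public static void main(String[] args)
    {
        // e.g. node 2 of a 6-node data center with datacenter_index 0
        System.out.println(initialToken(0, 2, 6));
    }
}
{code}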



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8337) mmap underflow during validation compaction

2014-11-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218588#comment-14218588
 ] 

Jonathan Ellis commented on CASSANDRA-8337:
---

Were you running 2.0.x previously?  Is this new in 2.1?

 mmap underflow during validation compaction
 ---

 Key: CASSANDRA-8337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8337
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
Assignee: Joshua McKenzie
 Attachments: thread_dump


 During full parallel repair I often get errors like the following
 {quote}
 [2014-11-19 01:02:39,355] Repair session 116beaf0-6f66-11e4-afbb-c1c082008cbe 
 for range (3074457345618263602,-9223372036854775808] failed with error 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #116beaf0-6f66-11e4-afbb-c1c082008cbe on iss/target_state_history, 
 (3074457345618263602,-9223372036854775808]] Validation failed in 
 /95.108.242.19
 {quote}
 At the log of the node there are always same exceptions:
 {quote}
 ERROR [ValidationExecutor:2] 2014-11-19 01:02:10,847 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to:
 org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
 mmap segment underflow; remaining is 15 but 47 requested
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1518)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1385)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1315)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1706)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1694)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:276)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:917)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 Caused by: java.io.IOException: mmap segment underflow; remaining is 15 but 
 47 requested
 at 
 org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:327)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1460)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 ... 13 common frames omitted
 {quote}
 Now i'm using die disk_failure_policy to determine such conditions faster, 
 but I get them even with stop policy.
 Streams related to host with such exception are hanged. Thread dump is 
 attached. Only restart helps.
 After retry I get errors from other nodes.
 scrub doesn't help and report that sstables are ok.
 Sequential repairs doesn't cause such exceptions.
 Load is about 1000 write rps and 50 read rps per node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8337) mmap underflow during validation compaction

2014-11-19 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-8337:
--
Assignee: Joshua McKenzie

 mmap underflow during validation compaction
 ---

 Key: CASSANDRA-8337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8337
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
Assignee: Joshua McKenzie
 Attachments: thread_dump


 During full parallel repair I often get errors like the following
 {quote}
 [2014-11-19 01:02:39,355] Repair session 116beaf0-6f66-11e4-afbb-c1c082008cbe 
 for range (3074457345618263602,-9223372036854775808] failed with error 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #116beaf0-6f66-11e4-afbb-c1c082008cbe on iss/target_state_history, 
 (3074457345618263602,-9223372036854775808]] Validation failed in 
 /95.108.242.19
 {quote}
 At the log of the node there are always same exceptions:
 {quote}
 ERROR [ValidationExecutor:2] 2014-11-19 01:02:10,847 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to:
 org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
 mmap segment underflow; remaining is 15 but 47 requested
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1518)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1385)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1315)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1706)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1694)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:276)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:917)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 Caused by: java.io.IOException: mmap segment underflow; remaining is 15 but 
 47 requested
 at 
 org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:327)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1460)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 ... 13 common frames omitted
 {quote}
 Now i'm using die disk_failure_policy to determine such conditions faster, 
 but I get them even with stop policy.
 Streams related to host with such exception are hanged. Thread dump is 
 attached. Only restart helps.
 After retry I get errors from other nodes.
 scrub doesn't help and report that sstables are ok.
 Sequential repairs doesn't cause such exceptions.
 Load is about 1000 write rps and 50 read rps per node.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8338) Simplify Token Selection

2014-11-19 Thread Joaquin Casares (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218592#comment-14218592
 ] 

Joaquin Casares commented on CASSANDRA-8338:


Until DSE fully supports vnodes for all components, I would hope so, since we 
still have future installs using non-vnode clusters/data centers.

 Simplify Token Selection
 

 Key: CASSANDRA-8338
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8338
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Joaquin Casares
Priority: Trivial
  Labels: lhf

 When creating provisioning scripts, especially when running tools like Chef, 
 each node is launched individually. When not using vnodes your initial setup 
 will always be unbalanced unless you handle token assignment within your 
 scripts. 
 I spoke to someone recently who was using this in production and his 
 operations team wasn't too pleased that they had to use OpsCenter as an extra 
 step for rebalancing. Instead, we should provide this functionality out of 
 the box for new clusters.
 Instead, could we have the following options below the initial_token section?
 {CODE}
 # datacenter_index: 0
 # node_index: 0
 # datacenter_size: 1
 {CODE}
 The above configuration options, when uncommented, would do the math of:
 {CODE}
 token = node_index * (range / datacenter_size) + (datacenter_index * 100) 
 + start_of_range
 {CODE}
 This means that users don't have to repeatedly implement the initial_token 
 selection code nor know the range and offsets of their partitioner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8340) Use sstable min timestamp when deciding if an sstable should be included in DTCS compactions

2014-11-19 Thread Björn Hegerfors (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218593#comment-14218593
 ] 

Björn Hegerfors commented on CASSANDRA-8340:


OK, let's see. This is a big SSTable with a timestamp span of [t0, t1]. Since 
it came out of a major compaction, t1 is close to the current time. DTCS would 
never generate an SSTable that large with t1 that close to current time. But as 
time passes, [t0, t1] eventually becomes a timestamp span that even DTCS could 
have generated. Only beyond that point in time would DTCS actually consider 
compacting it, because it's t0 that governs when it compacts next, not t1. This 
is because t0 is so old and so far away from the min timestamp of any other 
SSTable. I'm certain of this. I haven't got a formula for this (I wish to make 
one), but I think that the major-compacted SSTable may even have to double its 
age before the next compaction happens, so if the min timestamp was older than 
max_sstable_age_days when switching strategies, the max timestamp will be older 
than that before any compaction is ever considered.

In other words, your scenario is not in any way a particular reason to change 
the max_sstable_age_days behavior. There may still be other reasons.

Did you get that? I had a hard time figuring out a sensible way to formulate my 
reasoning here. Rewrote this 3 times :P

 Use sstable min timestamp when deciding if an sstable should be included in 
 DTCS compactions
 

 Key: CASSANDRA-8340
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8340
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Priority: Minor

 Currently we check how old the newest data (max timestamp) in an sstable is 
 when we check if it should be compacted.
 If we instead switch to using min timestamp for this we have a pretty clean 
 migration path from STCS/LCS to DTCS. 
 My thinking is that before migrating, the user does a major compaction, which 
 creates a huge sstable containing all data, with min timestamp very far back 
 in time, then switching to DTCS, we will have a big sstable that we never 
 compact (ie, min timestamp of this big sstable is before 
 max_sstable_age_days), and all newer data will be after that, and that new 
 data will be properly compacted
 WDYT [~Bj0rn] ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7563) UserType, TupleType and collections in UDFs

2014-11-19 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-7563:

Attachment: 7563v5.txt

 UserType, TupleType and collections in UDFs
 ---

 Key: CASSANDRA-7563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7563
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0

 Attachments: 7563-7740.txt, 7563.txt, 7563v2.txt, 7563v3.txt, 
 7563v4.txt, 7563v5.txt


 * is Java Driver as a dependency required ?
 * is it possible to extract parts of the Java Driver for UDT/TT/coll support ?
 * CQL {{DROP TYPE}} must check UDFs
 * must check keyspace access permissions (if those exist)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes

2014-11-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218596#comment-14218596
 ] 

Jonathan Ellis commented on CASSANDRA-7075:
---

Do you have time to review, [~jasobrown]?

 Add the ability to automatically distribute your commitlogs across all data 
 volumes
 ---

 Key: CASSANDRA-7075
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7075
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Tupshin Harper
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 given the prevalance of ssds (no need to separate commitlog and data), and 
 improved jbod support, along with CASSANDRA-3578, it seems like we should 
 have an option to have one commitlog per data volume, to even the load. i've 
 been seeing more and more cases where there isn't an obvious extra volume 
 to put the commitlog on, and sticking it on only one of the jbodded ssd 
 volumes leads to IO imbalance.
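
As a rough sketch of what "one commitlog per data volume" could mean 
mechanically, the snippet below round-robins new commitlog segments across a 
set of directories. It is purely illustrative, assumes nothing about the actual 
patch on this ticket, and all names are made up.

{code}
import java.io.File;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: allocate each new commitlog segment on the next data
// volume in turn, so commitlog I/O is spread evenly across all volumes.
class CommitLogVolumeRotator
{
    private final File[] volumes;
    private final AtomicInteger next = new AtomicInteger();

    CommitLogVolumeRotator(File[] volumes)
    {
        this.volumes = volumes;
    }

    File directoryForNextSegment()
    {
        // floorMod keeps the index valid even after the counter wraps around
        int idx = Math.floorMod(next.getAndIncrement(), volumes.length);
        return new File(volumes[idx], "commitlog");
    }
}
{code}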



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7563) UserType, TupleType and collections in UDFs

2014-11-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218614#comment-14218614
 ] 

Robert Stupp commented on CASSANDRA-7563:
-

Heh - it all started with a simple pure approach :)

Attached v5 with changes addressing the comments.
Just one enhancement: you can no longer drop a type that is used by a UDF's 
return or argument type (updated DropTypeStatement for that).

 UserType, TupleType and collections in UDFs
 ---

 Key: CASSANDRA-7563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7563
 Project: Cassandra
  Issue Type: Bug
Reporter: Robert Stupp
Assignee: Robert Stupp
 Fix For: 3.0

 Attachments: 7563-7740.txt, 7563.txt, 7563v2.txt, 7563v3.txt, 
 7563v4.txt, 7563v5.txt


 * is Java Driver as a dependency required ?
 * is it possible to extract parts of the Java Driver for UDT/TT/coll support ?
 * CQL {{DROP TYPE}} must check UDFs
 * must check keyspace access permissions (if those exist)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8053) Support for user defined aggregate functions

2014-11-19 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218616#comment-14218616
 ] 

Robert Stupp commented on CASSANDRA-8053:
-

When CASSANDRA-7563 is committed, I'll rebase the patch for this one.

 Support for user defined aggregate functions
 

 Key: CASSANDRA-8053
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8053
 Project: Cassandra
  Issue Type: New Feature
Reporter: Robert Stupp
Assignee: Robert Stupp
  Labels: cql, udf
 Fix For: 3.0

 Attachments: 8053v1.txt


 CASSANDRA-4914 introduces aggregate functions.
 This ticket is about to decide how we can support user defined aggregate 
 functions. UD aggregate functions should be supported for all UDF flavors 
 (class, java, jsr223).
 Things to consider:
 * Special implementations for each scripting language should be omitted
 * No exposure of internal APIs (e.g. {{AggregateFunction}} interface)
 * No need for users to deal with serializers / codecs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7075) Add the ability to automatically distribute your commitlogs across all data volumes

2014-11-19 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218639#comment-14218639
 ] 

Jason Brown commented on CASSANDRA-7075:


Sure, will do.

 Add the ability to automatically distribute your commitlogs across all data 
 volumes
 ---

 Key: CASSANDRA-7075
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7075
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Tupshin Harper
Assignee: Branimir Lambov
Priority: Minor
  Labels: performance
 Fix For: 3.0


 given the prevalance of ssds (no need to separate commitlog and data), and 
 improved jbod support, along with CASSANDRA-3578, it seems like we should 
 have an option to have one commitlog per data volume, to even the load. i've 
 been seeing more and more cases where there isn't an obvious extra volume 
 to put the commitlog on, and sticking it on only one of the jbodded ssd 
 volumes leads to IO imbalance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


cassandra git commit: Centralize shared executors

2014-11-19 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 e3862bc3e -> 4397c3447


Centralize shared executors

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for
CASSANDRA-8055


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4397c344
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4397c344
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4397c344

Branch: refs/heads/cassandra-2.1
Commit: 4397c34476070ea15ee0d2b9c625887a8b08b622
Parents: e3862bc
Author: Sam Tunnicliffe s...@beobal.com
Authored: Thu Nov 20 01:42:03 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Nov 20 01:57:01 2014 +0300

--
 CHANGES.txt |  1 +
 src/java/org/apache/cassandra/auth/Auth.java| 17 
 .../cassandra/auth/PasswordAuthenticator.java   | 18 
 .../apache/cassandra/cache/AutoSavingCache.java | 10 ++---
 .../concurrent/ScheduledExecutors.java  | 43 
 .../apache/cassandra/cql3/QueryProcessor.java   |  5 +--
 .../apache/cassandra/db/BatchlogManager.java|  8 +++-
 .../apache/cassandra/db/ColumnFamilyStore.java  | 42 +++
 .../cassandra/db/HintedHandOffManager.java  |  7 ++--
 .../db/commitlog/CommitLogArchiver.java |  2 +-
 .../io/sstable/SSTableDeletingTask.java |  6 +--
 .../cassandra/io/sstable/SSTableReader.java |  3 +-
 .../org/apache/cassandra/io/util/FileUtils.java |  4 +-
 .../locator/DynamicEndpointSnitch.java  |  5 ++-
 .../apache/cassandra/net/MessagingService.java  |  4 +-
 .../cassandra/service/CassandraDaemon.java  |  3 +-
 .../cassandra/service/LoadBroadcaster.java  |  3 +-
 .../cassandra/service/MigrationManager.java |  3 +-
 .../cassandra/service/StorageService.java   | 38 -
 .../apache/cassandra/utils/ResourceWatcher.java |  4 +-
 .../org/apache/cassandra/cql3/CQLTester.java|  6 +--
 .../org/apache/cassandra/db/KeyCacheTest.java   |  5 +--
 22 files changed, 139 insertions(+), 98 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4397c344/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index c00e671..41a5aaf 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.3
+ * Centralize shared executors (CASSANDRA-8055)
  * Fix filtering for CONTAINS (KEY) relations on frozen collection
clustering columns when the query is restricted to a single
partition (CASSANDRA-8203)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4397c344/src/java/org/apache/cassandra/auth/Auth.java
--
diff --git a/src/java/org/apache/cassandra/auth/Auth.java 
b/src/java/org/apache/cassandra/auth/Auth.java
index 4f18111..ed7aa87 100644
--- a/src/java/org/apache/cassandra/auth/Auth.java
+++ b/src/java/org/apache/cassandra/auth/Auth.java
@@ -29,6 +29,7 @@ import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import org.apache.cassandra.concurrent.ScheduledExecutors;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.KSMetaData;
@@ -189,15 +190,13 @@ public class Auth implements AuthMBean
 // the delay is here to give the node some time to see its peers - to 
reduce
  // "Skipped default superuser setup: some nodes were not ready" log spam.
 // It's the only reason for the delay.
-StorageService.tasks.schedule(new Runnable()
-  {
-  public void run()
-  {
-  setupDefaultSuperuser();
-  }
-  },
-  SUPERUSER_SETUP_DELAY,
-  TimeUnit.MILLISECONDS);
+ScheduledExecutors.nonPeriodicTasks.schedule(new Runnable()
+{
+public void run()
+{
+setupDefaultSuperuser();
+}
+}, SUPERUSER_SETUP_DELAY, TimeUnit.MILLISECONDS);
 
 try
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4397c344/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
--
diff --git a/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java 
b/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
index 1218ee2..9570770 100644
--- a/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
+++ 

[1/2] cassandra git commit: Centralize shared executors

2014-11-19 Thread aleksey
Repository: cassandra
Updated Branches:
  refs/heads/trunk 1bd5c64ba -> 8badc2856


Centralize shared executors

patch by Sam Tunnicliffe; reviewed by Aleksey Yeschenko for
CASSANDRA-8055


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/4397c344
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/4397c344
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/4397c344

Branch: refs/heads/trunk
Commit: 4397c34476070ea15ee0d2b9c625887a8b08b622
Parents: e3862bc
Author: Sam Tunnicliffe s...@beobal.com
Authored: Thu Nov 20 01:42:03 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Nov 20 01:57:01 2014 +0300

--
 CHANGES.txt |  1 +
 src/java/org/apache/cassandra/auth/Auth.java| 17 
 .../cassandra/auth/PasswordAuthenticator.java   | 18 
 .../apache/cassandra/cache/AutoSavingCache.java | 10 ++---
 .../concurrent/ScheduledExecutors.java  | 43 
 .../apache/cassandra/cql3/QueryProcessor.java   |  5 +--
 .../apache/cassandra/db/BatchlogManager.java|  8 +++-
 .../apache/cassandra/db/ColumnFamilyStore.java  | 42 +++
 .../cassandra/db/HintedHandOffManager.java  |  7 ++--
 .../db/commitlog/CommitLogArchiver.java |  2 +-
 .../io/sstable/SSTableDeletingTask.java |  6 +--
 .../cassandra/io/sstable/SSTableReader.java |  3 +-
 .../org/apache/cassandra/io/util/FileUtils.java |  4 +-
 .../locator/DynamicEndpointSnitch.java  |  5 ++-
 .../apache/cassandra/net/MessagingService.java  |  4 +-
 .../cassandra/service/CassandraDaemon.java  |  3 +-
 .../cassandra/service/LoadBroadcaster.java  |  3 +-
 .../cassandra/service/MigrationManager.java |  3 +-
 .../cassandra/service/StorageService.java   | 38 -
 .../apache/cassandra/utils/ResourceWatcher.java |  4 +-
 .../org/apache/cassandra/cql3/CQLTester.java|  6 +--
 .../org/apache/cassandra/db/KeyCacheTest.java   |  5 +--
 22 files changed, 139 insertions(+), 98 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/4397c344/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index c00e671..41a5aaf 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.1.3
+ * Centralize shared executors (CASSANDRA-8055)
  * Fix filtering for CONTAINS (KEY) relations on frozen collection
clustering columns when the query is restricted to a single
partition (CASSANDRA-8203)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4397c344/src/java/org/apache/cassandra/auth/Auth.java
--
diff --git a/src/java/org/apache/cassandra/auth/Auth.java 
b/src/java/org/apache/cassandra/auth/Auth.java
index 4f18111..ed7aa87 100644
--- a/src/java/org/apache/cassandra/auth/Auth.java
+++ b/src/java/org/apache/cassandra/auth/Auth.java
@@ -29,6 +29,7 @@ import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import org.apache.cassandra.concurrent.ScheduledExecutors;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.KSMetaData;
@@ -189,15 +190,13 @@ public class Auth implements AuthMBean
 // the delay is here to give the node some time to see its peers - to 
reduce
  // "Skipped default superuser setup: some nodes were not ready" log spam.
 // It's the only reason for the delay.
-StorageService.tasks.schedule(new Runnable()
-  {
-  public void run()
-  {
-  setupDefaultSuperuser();
-  }
-  },
-  SUPERUSER_SETUP_DELAY,
-  TimeUnit.MILLISECONDS);
+ScheduledExecutors.nonPeriodicTasks.schedule(new Runnable()
+{
+public void run()
+{
+setupDefaultSuperuser();
+}
+}, SUPERUSER_SETUP_DELAY, TimeUnit.MILLISECONDS);
 
 try
 {

http://git-wip-us.apache.org/repos/asf/cassandra/blob/4397c344/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
--
diff --git a/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java 
b/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
index 1218ee2..9570770 100644
--- a/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
+++ 

[2/2] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-11-19 Thread aleksey
Merge branch 'cassandra-2.1' into trunk

Conflicts:
src/java/org/apache/cassandra/cql3/QueryProcessor.java
src/java/org/apache/cassandra/db/ColumnFamilyStore.java
src/java/org/apache/cassandra/db/HintedHandOffManager.java
src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
src/java/org/apache/cassandra/service/StorageService.java
test/unit/org/apache/cassandra/db/KeyCacheTest.java


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/8badc285
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/8badc285
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/8badc285

Branch: refs/heads/trunk
Commit: 8badc285666360ce7444c0b954e9aaa8e93200ca
Parents: 1bd5c64 4397c34
Author: Aleksey Yeschenko alek...@apache.org
Authored: Thu Nov 20 02:10:40 2014 +0300
Committer: Aleksey Yeschenko alek...@apache.org
Committed: Thu Nov 20 02:10:40 2014 +0300

--
 CHANGES.txt |  1 +
 src/java/org/apache/cassandra/auth/Auth.java| 17 ---
 .../cassandra/auth/PasswordAuthenticator.java   | 18 ---
 .../apache/cassandra/cache/AutoSavingCache.java | 10 ++--
 .../concurrent/ScheduledExecutors.java  | 43 +
 .../apache/cassandra/cql3/QueryProcessor.java   |  5 +-
 .../apache/cassandra/db/BatchlogManager.java|  8 +++-
 .../apache/cassandra/db/ColumnFamilyStore.java  | 49 +++-
 .../cassandra/db/HintedHandOffManager.java  |  7 +--
 .../db/commitlog/CommitLogArchiver.java |  2 +-
 .../io/sstable/SSTableDeletingTask.java |  6 +--
 .../io/sstable/format/SSTableReader.java| 28 +--
 .../org/apache/cassandra/io/util/FileUtils.java |  4 +-
 .../locator/DynamicEndpointSnitch.java  |  5 +-
 .../apache/cassandra/net/MessagingService.java  |  4 +-
 .../cassandra/service/CassandraDaemon.java  |  3 +-
 .../cassandra/service/LoadBroadcaster.java  |  3 +-
 .../cassandra/service/MigrationManager.java |  3 +-
 .../cassandra/service/StorageService.java   | 43 -
 .../apache/cassandra/utils/ResourceWatcher.java |  4 +-
 .../org/apache/cassandra/cql3/CQLTester.java|  6 +--
 .../org/apache/cassandra/db/KeyCacheTest.java   |  6 +--
 22 files changed, 156 insertions(+), 119 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/8badc285/CHANGES.txt
--
diff --cc CHANGES.txt
index f41e7c3,41a5aaf..08c31a9
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@@ -1,38 -1,5 +1,39 @@@
 +3.0
 + * Fix aggregate fn results on empty selection, result column name,
 +   and cqlsh parsing (CASSANDRA-8229)
 + * Mark sstables as repaired after full repair (CASSANDRA-7586)
 + * Extend Descriptor to include a format value and refactor reader/writer 
apis (CASSANDRA-7443)
 + * Integrate JMH for microbenchmarks (CASSANDRA-8151)
 + * Keep sstable levels when bootstrapping (CASSANDRA-7460)
 + * Add Sigar library and perform basic OS settings check on startup 
(CASSANDRA-7838)
 + * Support for aggregation functions (CASSANDRA-4914)
 + * Remove cassandra-cli (CASSANDRA-7920)
 + * Accept dollar quoted strings in CQL (CASSANDRA-7769)
 + * Make assassinate a first class command (CASSANDRA-7935)
 + * Support IN clause on any clustering column (CASSANDRA-4762)
 + * Improve compaction logging (CASSANDRA-7818)
 + * Remove YamlFileNetworkTopologySnitch (CASSANDRA-7917)
 + * Do anticompaction in groups (CASSANDRA-6851)
 + * Support pure user-defined functions (CASSANDRA-7395, 7526, 7562, 7740, 
7781, 7929,
 +   7924, 7812, 8063, 7813)
 + * Permit configurable timestamps with cassandra-stress (CASSANDRA-7416)
 + * Move sstable RandomAccessReader to nio2, which allows using the
 +   FILE_SHARE_DELETE flag on Windows (CASSANDRA-4050)
 + * Remove CQL2 (CASSANDRA-5918)
 + * Add Thrift get_multi_slice call (CASSANDRA-6757)
 + * Optimize fetching multiple cells by name (CASSANDRA-6933)
 + * Allow compilation in java 8 (CASSANDRA-7028)
 + * Make incremental repair default (CASSANDRA-7250)
 + * Enable code coverage thru JaCoCo (CASSANDRA-7226)
 + * Switch external naming of 'column families' to 'tables' (CASSANDRA-4369) 
 + * Shorten SSTable path (CASSANDRA-6962)
 + * Use unsafe mutations for most unit tests (CASSANDRA-6969)
 + * Fix race condition during calculation of pending ranges (CASSANDRA-7390)
 + * Fail on very large batch sizes (CASSANDRA-8011)
 + * improve concurrency of repair (CASSANDRA-6455, 8208)
 +
  2.1.3
+  * Centralize shared executors (CASSANDRA-8055)
   * Fix filtering for CONTAINS (KEY) relations on frozen collection
 clustering columns when the query is restricted to a single
 partition (CASSANDRA-8203)


[jira] [Resolved] (CASSANDRA-8344) QueryProcessor evictionCheckTimer not named and non-daemon

2014-11-19 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko resolved CASSANDRA-8344.
--
Resolution: Duplicate

Just fixed today by CASSANDRA-8055.

 QueryProcessor evictionCheckTimer not named and non-daemon
 --

 Key: CASSANDRA-8344
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8344
 Project: Cassandra
  Issue Type: Improvement
Reporter: Chris Lohfink
Assignee: Chris Lohfink
Priority: Trivial
 Attachments: 8344.diff


 QueryProcessor's evictionCheckTimer isn't a daemon thread and is not named. 
 That makes it difficult to track down when it blocks the application (i.e. 
 forked benchmarks from JMH will freeze waiting for completion).
 For what it's worth, a little hack to work around it and make tests complete:
 {code}
 Field f = QueryProcessor.class.getDeclaredField("evictionCheckTimer");
 f.setAccessible(true);
 ((ScheduledExecutorService)f.get(null)).shutdown();
 {code}
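
For reference, the usual way to avoid this class of problem is to give the 
scheduler a named daemon thread. A minimal sketch with plain JDK APIs follows; 
the class name is made up and the thread name is only an example.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;

// Sketch: a named, daemon scheduled executor so it shows up clearly in thread
// dumps and never keeps the JVM alive on its own.
class NamedDaemonScheduler
{
    static ScheduledExecutorService create(final String name)
    {
        ThreadFactory factory = new ThreadFactory()
        {
            public Thread newThread(Runnable r)
            {
                Thread t = new Thread(r, name);
                t.setDaemon(true);
                return t;
            }
        };
        return Executors.newSingleThreadScheduledExecutor(factory);
    }

    public static void main(String[] args)
    {
        // Usage example: a named daemon timer for periodic eviction checks.
        ScheduledExecutorService timer = NamedDaemonScheduler.create("eviction-check-timer");
        timer.shutdown();
    }
}
{code}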



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8337) mmap underflow during validation compaction

2014-11-19 Thread Alexander Sterligov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218701#comment-14218701
 ] 

Alexander Sterligov commented on CASSANDRA-8337:


This cluster was always running 2.1 - 2.1.0 and then 2.1.2.

Unfortunately I had to downgrade those 18 replicas today to 2.0.11 and the 
problem is gone. I have not succeeded in reproducing the problem on a smaller 
setup.

Some time ago I tried incremental repairs, but had the very same problems, even 
faster than with full repair. I've unset the repairedAt flag as described in 
the documentation.

I also truncated some tables with consistency ALL, but after some write load 
the exceptions started to appear again for these tables.

I tried sstablescrub for all sstables on a totally offline cluster and it found 
no problems. Then parallel repair failed again due to the same exceptions, so 
these are not corrupted sstables.

If I retry parallel repair over and over again, it finally finishes with no 
errors. The same happens if I do a sequential repair and a parallel repair 
right after it finishes. Repair also finishes successfully if the number of 
streams is not high.

There's no tendency toward degradation - just spontaneous repair failures and 
annoying hung streams.

Heavy write or read load doesn't directly cause such problems, but after heavy 
write load with low consistency, repair fails.

Overall, it looks like the reproduction scenario is a large number of short 
streams (~30 per node for several seconds), and it seems 2.1 streaming has a 
race condition.


 mmap underflow during validation compaction
 ---

 Key: CASSANDRA-8337
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8337
 Project: Cassandra
  Issue Type: Bug
Reporter: Alexander Sterligov
Assignee: Joshua McKenzie
 Attachments: thread_dump


 During full parallel repair I often get errors like the following
 {quote}
 [2014-11-19 01:02:39,355] Repair session 116beaf0-6f66-11e4-afbb-c1c082008cbe 
 for range (3074457345618263602,-9223372036854775808] failed with error 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #116beaf0-6f66-11e4-afbb-c1c082008cbe on iss/target_state_history, 
 (3074457345618263602,-9223372036854775808]] Validation failed in 
 /95.108.242.19
 {quote}
 At the log of the node there are always same exceptions:
 {quote}
 ERROR [ValidationExecutor:2] 2014-11-19 01:02:10,847 
 JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
 forcefully due to:
 org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
 mmap segment underflow; remaining is 15 but 47 requested
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1518)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1385)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1315)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1706)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1694)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:276)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:917)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 Caused by: java.io.IOException: mmap segment underflow; remaining is 15 but 
 47 requested
 at 
 org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 

[3/3] cassandra git commit: Merge branch 'cassandra-2.1' into trunk

2014-11-19 Thread yukim
Merge branch 'cassandra-2.1' into trunk

Conflicts:
src/java/org/apache/cassandra/db/compaction/CompactionManager.java
src/java/org/apache/cassandra/db/compaction/Scrubber.java
src/java/org/apache/cassandra/service/StorageService.java
src/java/org/apache/cassandra/streaming/StreamReader.java


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/201a0551
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/201a0551
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/201a0551

Branch: refs/heads/trunk
Commit: 201a05511791c6ea9adad40c0bab4e1e4714d8ee
Parents: 8badc28 2291a60
Author: Yuki Morishita yu...@apache.org
Authored: Wed Nov 19 17:31:34 2014 -0600
Committer: Yuki Morishita yu...@apache.org
Committed: Wed Nov 19 17:31:34 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  |   5 -
 .../org/apache/cassandra/db/Directories.java| 161 ---
 .../db/compaction/CompactionManager.java|  12 +-
 .../cassandra/db/compaction/Scrubber.java   |   6 +-
 .../cassandra/io/util/DiskAwareRunnable.java|  14 +-
 .../cassandra/service/StorageService.java   |  20 ---
 .../cassandra/streaming/StreamReader.java   |   2 +-
 .../cassandra/streaming/StreamReceiveTask.java  |   8 +-
 .../apache/cassandra/db/DirectoriesTest.java| 128 +++
 10 files changed, 252 insertions(+), 105 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/201a0551/CHANGES.txt
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/201a0551/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/201a0551/src/java/org/apache/cassandra/db/Directories.java
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/201a0551/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
--
diff --cc src/java/org/apache/cassandra/db/compaction/CompactionManager.java
index 05b916f,61628ff..55311a0
--- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
@@@ -1046,78 -989,62 +1048,78 @@@ public class CompactionManager implemen
  if (!new File(sstable.getFilename()).exists())
  {
                 logger.info("Skipping anticompaction for {}, required sstable was compacted and is no longer available.", sstable);
 +i.remove();
  continue;
  }
  +if (groupMaxDataAge < sstable.maxDataAge)
 +groupMaxDataAge = sstable.maxDataAge;
 +}
 +
 + 
 +if (anticompactionGroup.size() == 0)
 +{
  +logger.info("No valid anticompactions for this group, All sstables were compacted and are no longer available");
 +return 0;
 +}
  
  -logger.info("Anticompacting {}", sstable);
  -Set<SSTableReader> sstableAsSet = new HashSet<>();
  -sstableAsSet.add(sstable);
  +logger.info("Anticompacting {}", anticompactionGroup);
  +Set<SSTableReader> sstableAsSet = new HashSet<>(anticompactionGroup);
  
- File destination = cfs.directories.getDirectoryForNewSSTables();
 -File destination = 
cfs.directories.getWriteableLocationAsFile(cfs.getExpectedCompactedFileSize(sstableAsSet,
 OperationType.ANTICOMPACTION));
 -SSTableRewriter repairedSSTableWriter = new SSTableRewriter(cfs, 
sstableAsSet, sstable.maxDataAge, false);
 -SSTableRewriter unRepairedSSTableWriter = new 
SSTableRewriter(cfs, sstableAsSet, sstable.maxDataAge, false);
++File destination = 
cfs.directories.getWriteableLocationAsFile(cfs.getExpectedCompactedFileSize(sstableAsSet,
 OperationType.ANTICOMPACTION));
 +SSTableRewriter repairedSSTableWriter = new SSTableRewriter(cfs, 
sstableAsSet, groupMaxDataAge, false);
 +SSTableRewriter unRepairedSSTableWriter = new SSTableRewriter(cfs, 
sstableAsSet, groupMaxDataAge, false);
  
 -try (AbstractCompactionStrategy.ScannerList scanners = 
cfs.getCompactionStrategy().getScanners(new 
HashSet(Collections.singleton(sstable)));
 - CompactionController controller = new 
CompactionController(cfs, sstableAsSet, CFMetaData.DEFAULT_GC_GRACE_SECONDS))
 -{
 -
repairedSSTableWriter.switchWriter(CompactionManager.createWriter(cfs, 
destination, expectedBloomFilterSize, repairedAt, sstable));
 - 

[1/3] cassandra git commit: improve JBOD disk utilization

2014-11-19 Thread yukim
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 4397c3447 -> 2291a60e9
  refs/heads/trunk 8badc2856 -> 201a05511


improve JBOD disk utilization


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2291a60e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2291a60e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2291a60e

Branch: refs/heads/cassandra-2.1
Commit: 2291a60e9eded4486528acc0a8d12a062b21fc26
Parents: 4397c34
Author: Robert Stupp sn...@snazy.de
Authored: Wed Nov 19 16:17:05 2014 -0600
Committer: Yuki Morishita yu...@apache.org
Committed: Wed Nov 19 17:17:40 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  |   5 -
 .../org/apache/cassandra/db/Directories.java| 161 ---
 .../db/compaction/CompactionManager.java|  12 +-
 .../cassandra/db/compaction/Scrubber.java   |   5 +-
 .../cassandra/io/util/DiskAwareRunnable.java|  14 +-
 .../cassandra/service/StorageService.java   |  20 ---
 .../cassandra/streaming/StreamReader.java   |   3 +-
 .../cassandra/streaming/StreamReceiveTask.java  |   8 +-
 .../apache/cassandra/db/DirectoriesTest.java| 128 +++
 10 files changed, 252 insertions(+), 105 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/2291a60e/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 41a5aaf..e008ab9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -10,6 +10,7 @@
  * Fix overflow on histogram computation (CASSANDRA-8028)
  * Have paxos reuse the timestamp generation of normal queries (CASSANDRA-7801)
  * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
+ * Improve JBOD disk utilization (CASSANDRA-7386)
 Merged from 2.0:
  * Fix some failing queries that use multi-column relations
on COMPACT STORAGE tables (CASSANDRA-8264)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2291a60e/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 7e1dd18..dec5370 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -2240,11 +2240,6 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 return directories.getSnapshotDetails();
 }
 
-public boolean hasUnreclaimedSpace()
-{
 -return getLiveDiskSpaceUsed() < getTotalDiskSpaceUsed();
-}
-
 public long getTotalDiskSpaceUsed()
 {
 return metric.totalDiskSpaceUsed.count();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2291a60e/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index 4319481..eb33bd8 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -29,8 +29,7 @@ import java.nio.file.Path;
 import java.nio.file.SimpleFileVisitor;
 import java.nio.file.attribute.BasicFileAttributes;
 import java.util.*;
-import java.util.concurrent.TimeUnit;
-import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.ThreadLocalRandom;
 import java.util.concurrent.atomic.AtomicLong;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -39,8 +38,6 @@ import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.ImmutableSet.Builder;
 import com.google.common.collect.Iterables;
-import com.google.common.primitives.Longs;
-import com.google.common.util.concurrent.Uninterruptibles;
 
 import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
@@ -96,7 +93,6 @@ public class Directories
 dataDirectories[i] = new DataDirectory(new File(locations[i]));
 }
 
-
 /**
  * Checks whether Cassandra has RWX permissions to the specified 
directory.  Logs an error with
  * the details if it does not.
@@ -198,7 +194,7 @@ public class Directories
  for (int i = 0; i < dataDirectories.length; ++i)
 {
 // check if old SSTable directory exists
-dataPaths[i] = new File(dataDirectories[i].location, 
join(metadata.ksName, this.metadata.cfName));
+dataPaths[i] = new File(dataDirectories[i].location, 
join(metadata.ksName, metadata.cfName));
 }
 boolean olderDirectoryExists = 

[2/3] cassandra git commit: improve JBOD disk utilization

2014-11-19 Thread yukim
improve JBOD disk utilization


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2291a60e
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2291a60e
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2291a60e

Branch: refs/heads/trunk
Commit: 2291a60e9eded4486528acc0a8d12a062b21fc26
Parents: 4397c34
Author: Robert Stupp sn...@snazy.de
Authored: Wed Nov 19 16:17:05 2014 -0600
Committer: Yuki Morishita yu...@apache.org
Committed: Wed Nov 19 17:17:40 2014 -0600

--
 CHANGES.txt |   1 +
 .../apache/cassandra/db/ColumnFamilyStore.java  |   5 -
 .../org/apache/cassandra/db/Directories.java| 161 ---
 .../db/compaction/CompactionManager.java|  12 +-
 .../cassandra/db/compaction/Scrubber.java   |   5 +-
 .../cassandra/io/util/DiskAwareRunnable.java|  14 +-
 .../cassandra/service/StorageService.java   |  20 ---
 .../cassandra/streaming/StreamReader.java   |   3 +-
 .../cassandra/streaming/StreamReceiveTask.java  |   8 +-
 .../apache/cassandra/db/DirectoriesTest.java| 128 +++
 10 files changed, 252 insertions(+), 105 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/2291a60e/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 41a5aaf..e008ab9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -10,6 +10,7 @@
  * Fix overflow on histogram computation (CASSANDRA-8028)
  * Have paxos reuse the timestamp generation of normal queries (CASSANDRA-7801)
  * Fix incremental repair not remove parent session on remote (CASSANDRA-8291)
+ * Improve JBOD disk utilization (CASSANDRA-7386)
 Merged from 2.0:
  * Fix some failing queries that use multi-column relations
on COMPACT STORAGE tables (CASSANDRA-8264)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2291a60e/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
--
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java 
b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 7e1dd18..dec5370 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -2240,11 +2240,6 @@ public class ColumnFamilyStore implements 
ColumnFamilyStoreMBean
 return directories.getSnapshotDetails();
 }
 
-public boolean hasUnreclaimedSpace()
-{
-return getLiveDiskSpaceUsed() < getTotalDiskSpaceUsed();
-}
-
 public long getTotalDiskSpaceUsed()
 {
 return metric.totalDiskSpaceUsed.count();

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2291a60e/src/java/org/apache/cassandra/db/Directories.java
--
diff --git a/src/java/org/apache/cassandra/db/Directories.java 
b/src/java/org/apache/cassandra/db/Directories.java
index 4319481..eb33bd8 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -29,8 +29,7 @@ import java.nio.file.Path;
 import java.nio.file.SimpleFileVisitor;
 import java.nio.file.attribute.BasicFileAttributes;
 import java.util.*;
-import java.util.concurrent.TimeUnit;
-import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.ThreadLocalRandom;
 import java.util.concurrent.atomic.AtomicLong;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -39,8 +38,6 @@ import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.ImmutableSet.Builder;
 import com.google.common.collect.Iterables;
-import com.google.common.primitives.Longs;
-import com.google.common.util.concurrent.Uninterruptibles;
 
 import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
@@ -96,7 +93,6 @@ public class Directories
 dataDirectories[i] = new DataDirectory(new File(locations[i]));
 }
 
-
 /**
  * Checks whether Cassandra has RWX permissions to the specified 
directory.  Logs an error with
  * the details if it does not.
@@ -198,7 +194,7 @@ public class Directories
 for (int i = 0; i < dataDirectories.length; ++i)
 {
 // check if old SSTable directory exists
-dataPaths[i] = new File(dataDirectories[i].location, 
join(metadata.ksName, this.metadata.cfName));
+dataPaths[i] = new File(dataDirectories[i].location, 
join(metadata.ksName, metadata.cfName));
 }
 boolean olderDirectoryExists = Iterables.any(Arrays.asList(dataPaths), 
 new Predicate<File>()
 {
@@ -237,11 +233,10 @@ public class Directories
  */
 public File 

[jira] [Commented] (CASSANDRA-8061) tmplink files are not removed

2014-11-19 Thread Alexander Sterligov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218721#comment-14218721
 ] 

Alexander Sterligov commented on CASSANDRA-8061:


Answering the questions from CASSANDRA-8248:

Our kernel specialist said that the overcommitment in RES memory is just a 
rounding error, caused by lots of small mmapped files.

I got this problem during full repair. Repair streamed a lot of small sstables 
to the node (~1000), which were then compacted (32 compactors). All of the 
sstables were finally compacted into one (I raised max_threshold), but it didn't 
help at all - still a lot of small mmapped files.


 tmplink files are not removed
 -

 Key: CASSANDRA-8061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8061
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Linux
Reporter: Gianluca Borello
Assignee: Marcus Eriksson
Priority: Critical
 Fix For: 2.1.3

 Attachments: 8248-thread_dump.txt


 After installing 2.1.0, I'm experiencing a bunch of tmplink files that are 
 filling my disk. I found https://issues.apache.org/jira/browse/CASSANDRA-7803 
 and that is very similar, and I confirm it happens both on 2.1.0 as well as 
 from the latest commit on the cassandra-2.1 branch 
 (https://github.com/apache/cassandra/commit/aca80da38c3d86a40cc63d9a122f7d45258e4685
  from the cassandra-2.1)
 Even starting with a clean keyspace, after a few hours I get:
 {noformat}
 $ sudo find /raid0 | grep tmplink | xargs du -hs
 2.7G  
 /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Data.db
 13M   
 /raid0/cassandra/data/draios/protobuf1-ccc6dce04beb11e4abf997b38fbf920b/draios-protobuf1-tmplink-ka-4515-Index.db
 1.8G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Data.db
 12M   
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-1788-Index.db
 5.2M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Index.db
 822M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-2678-Data.db
 7.3M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Index.db
 1.2G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3283-Data.db
 6.7M  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Index.db
 1.1G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-3951-Data.db
 11M   
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Index.db
 1.7G  
 /raid0/cassandra/data/draios/protobuf_by_agent1-cd071a304beb11e4abf997b38fbf920b/draios-protobuf_by_agent1-tmplink-ka-4799-Data.db
 812K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Index.db
 122M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-208-Data.db
 744K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-739-Index.db
 660K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-193-Index.db
 796K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Index.db
 137M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-230-Data.db
 161M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Data.db
 139M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-234-Data.db
 940K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Index.db
 936K  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-269-Index.db
 161M  
 /raid0/cassandra/data/draios/mounted_fs_by_agent1-d7bf3e304beb11e4abf997b38fbf920b/draios-mounted_fs_by_agent1-tmplink-ka-786-Data.db
 672K  
 

[jira] [Commented] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)

2014-11-19 Thread graham sanderson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218918#comment-14218918
 ] 

graham sanderson commented on CASSANDRA-8325:
-

I don't have FreeBSD, but posting my comment from user thread

{bq}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{bq}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator.
In either case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}
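
A minimal standalone check along those lines might look like the sketch below 
(illustrative only; it assumes an OpenJDK-style {{sun.misc.Unsafe}} with the 
usual {{theUnsafe}} field, fetched via reflection because {{Unsafe.getUnsafe()}} 
rejects application-class callers):
{code}
import java.lang.reflect.Field;

import sun.misc.Unsafe;

public class AllocateMemoryCheck
{
    public static void main(String[] args) throws Exception
    {
        // Unsafe.getUnsafe() throws SecurityException outside the boot classpath,
        // so pull the singleton out of the private "theUnsafe" field instead.
        Field field = Unsafe.class.getDeclaredField("theUnsafe");
        field.setAccessible(true);
        Unsafe unsafe = (Unsafe) field.get(null);

        long peer = unsafe.allocateMemory(1024);
        System.out.println("allocateMemory(1024) returned " + peer);
        if (peer != 0)
            unsafe.freeMemory(peer);
    }
}
{code}
If this prints 0 (or crashes) on the FreeBSD JVM, that would point at the 
allocateMemory behaviour rather than anything Cassandra-specific.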

 Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
 -

 Key: CASSANDRA-8325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8325
 Project: Cassandra
  Issue Type: Bug
 Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit 
 Server VM
Reporter: Leonid Shalupov
 Attachments: hs_err_pid1856.log, system.log


 See attached error file after JVM crash
 {quote}
 FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu 
 Jan 16 22:34:59 UTC 2014 
 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
 {quote}
 {quote}
  % java -version
 openjdk version 1.7.0_71
 OpenJDK Runtime Environment (build 1.7.0_71-b14)
 OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)

2014-11-19 Thread graham sanderson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218918#comment-14218918
 ] 

graham sanderson edited comment on CASSANDRA-8325 at 11/20/14 2:55 AM:
---

I don't have FreeBSD, but posting my comment from user thread

{quote}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{quote}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator.
In either case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}


was (Author: graham sanderson):
I don't have FreeBSD, but posting my comment from user thread

{bq}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{bq}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator.
In either case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}

 Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
 -

 Key: CASSANDRA-8325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8325
 Project: Cassandra
  Issue Type: Bug
 Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit 
 Server VM
Reporter: Leonid Shalupov
 Attachments: hs_err_pid1856.log, system.log


 See attached error file after JVM crash
 {quote}
 FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu 
 Jan 16 22:34:59 UTC 2014 
 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
 {quote}
 {quote}
  % java -version
 openjdk version 1.7.0_71
 OpenJDK Runtime Environment (build 1.7.0_71-b14)
 OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)

2014-11-19 Thread graham sanderson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218918#comment-14218918
 ] 

graham sanderson edited comment on CASSANDRA-8325 at 11/20/14 2:57 AM:
---

I don't have FreeBSD, but posting my comment from user thread

{quote}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{quote}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator. 
Also IMHO it'd seem a bit bizarre to support sun.misc.Unsafe, but silently not 
implement the methods.

In either case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}


was (Author: graham sanderson):
I don't have FreeBSD, but posting my comment from user thread

{quote}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{quote}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator.
In either case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}

 Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
 -

 Key: CASSANDRA-8325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8325
 Project: Cassandra
  Issue Type: Bug
 Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit 
 Server VM
Reporter: Leonid Shalupov
 Attachments: hs_err_pid1856.log, system.log


 See attached error file after JVM crash
 {quote}
 FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu 
 Jan 16 22:34:59 UTC 2014 
 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
 {quote}
 {quote}
  % java -version
 openjdk version 1.7.0_71
 OpenJDK Runtime Environment (build 1.7.0_71-b14)
 OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8325) Cassandra 2.1.x fails to start on FreeBSD (JVM crash)

2014-11-19 Thread graham sanderson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218918#comment-14218918
 ] 

graham sanderson edited comment on CASSANDRA-8325 at 11/20/14 2:57 AM:
---

I don't have FreeBSD, but posting my comment from user thread

{quote}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{quote}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator. 
Also IMHO it'd seem a bit bizarre to support sun.misc.Unsafe, but silently not 
implement the methods.

In any case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}


was (Author: graham sanderson):
I don't have FreeBSD, but posting my comment from user thread

{quote}
Only thing I can see from looking at the exception, is that it looks like maybe 
the “peer” value in the RefCountedMemory object is probably 0

Given that Unsafe.allocateMemory should not return 0 even on allocation failure 
(it should throw OOM, though you should add a log statement to the Memory class 
to check that) - I’d suggest logging to see if anyone is calling 
SSTableReader.releaseSummary, which could set the peer to 0
{quote}
Note this particular use of sun.misc.Unsafe is new in 2.1, however I thought 
there were others in 2.0. It is possible your JVM doesn't support {{public 
native long Unsafe.allocateMemory(long l);}} and returns 0, though in that case 
you pretty much just need to switch back to the on heap memtable allocator. 
Also IMHO it'd seem a bit bizarre to support sun.misc.Unsafe, but silently not 
implement the methods.

In either case, if you cannot recompile Cassandra for new logging, you can 
certainly write a simple Java program which calls {{public native long 
allocateMemory(long l);}}

 Cassandra 2.1.x fails to start on FreeBSD (JVM crash)
 -

 Key: CASSANDRA-8325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8325
 Project: Cassandra
  Issue Type: Bug
 Environment: FreeBSD 10.0 with openjdk version 1.7.0_71, 64-Bit 
 Server VM
Reporter: Leonid Shalupov
 Attachments: hs_err_pid1856.log, system.log


 See attached error file after JVM crash
 {quote}
 FreeBSD xxx.intellij.net 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu 
 Jan 16 22:34:59 UTC 2014 
 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
 {quote}
 {quote}
  % java -version
 openjdk version 1.7.0_71
 OpenJDK Runtime Environment (build 1.7.0_71-b14)
 OpenJDK 64-Bit Server VM (build 24.71-b01, mixed mode)
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization

2014-11-19 Thread Alan Boudreault (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14218955#comment-14218955
 ] 

Alan Boudreault commented on CASSANDRA-7386:


[~snazy] [~yukim], while doing tests for CASSANDRA-8329, I've noticed an 
important regression related to this patch. I'll give you more info and graphs 
asap tomorrow morning.

 JBOD threshold to prevent unbalanced disk utilization
 -

 Key: CASSANDRA-7386
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Lohfink
Assignee: Robert Stupp
Priority: Minor
 Fix For: 2.1.3

 Attachments: 7386-2.0-v3.txt, 7386-2.0-v4.txt, 7386-2.0-v5.txt, 
 7386-2.1-v3.txt, 7386-2.1-v4.txt, 7386-2.1-v5.txt, 7386-v1.patch, 
 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png, 
 patch_2_1_branch_proto.diff, sstable-count-second-run.png, 
 test1_no_patch.jpg, test1_with_patch.jpg, test2_no_patch.jpg, 
 test2_with_patch.jpg, test3_no_patch.jpg, test3_with_patch.jpg


 Currently the disks are picked first by number of current tasks, then by free 
 space.  This helps with performance but can lead to large differences in 
 utilization in some (unlikely but possible) scenarios.  I've seen 55% vs 10% 
 and heard reports of 90% vs 10% on IRC, with both LCS and STCS (although my 
 suspicion is that STCS makes it worse, since it is harder to keep balanced).
 I propose the algorithm be changed a little to have some maximum utilization 
 range beyond which it will pick by free space over load (acknowledging it can 
 be slower).  So if disk A is 30% full and disk B is 5% full, it will never 
 pick A over B until they balance out.
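
A rough sketch of the proposed rule, for illustration only - the names, the 10% 
band and the pairwise scan below are all made up, and the committed patch may 
implement the balancing differently (e.g. by weighting directories by free space):
{code}
import java.util.List;

final class DiskPicker
{
    // Tolerated utilization gap before free space overrides task count (made-up value).
    static final double MAX_SPREAD = 0.10;

    interface DataDir
    {
        double utilization();  // used bytes / total bytes, in [0.0, 1.0]
        int pendingTasks();    // flushes/compactions currently queued for this disk
    }

    static DataDir pick(List<DataDir> dirs)
    {
        DataDir best = null;
        for (DataDir candidate : dirs)
        {
            if (best == null)
            {
                best = candidate;
            }
            else if (candidate.utilization() < best.utilization() - MAX_SPREAD)
            {
                best = candidate;  // a much emptier disk always wins
            }
            else if (Math.abs(candidate.utilization() - best.utilization()) <= MAX_SPREAD
                     && candidate.pendingTasks() < best.pendingTasks())
            {
                best = candidate;  // within the band, prefer the less busy disk
            }
        }
        return best;
    }
}
{code}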



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8341) Expose time spent in each thread pool

2014-11-19 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-8341:
-
Attachment: 8341v2.txt

 Expose time spent in each thread pool
 -

 Key: CASSANDRA-8341
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Chris Lohfink
Priority: Minor
  Labels: metrics
 Attachments: 8341.patch, 8341v2.txt


 Can increment a counter with the time spent in each queue.  This can provide 
 context on how much time, percentage-wise, is spent in each stage.  
 Additionally, it can be used with Little's law in the future if we ever want 
 to tune the size of the pools.
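
As a rough illustration of the idea (hypothetical names, not the attached 
patch): wrap each task at submission time and add the wait to a per-stage 
counter once it starts running. Combined with Little's law (items queued ≈ 
arrival rate × average wait), a stage accepting 10,000 tasks/s with an average 
0.5 ms wait would hold about 5 queued tasks on average.
{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the attached patch: remember when a task was queued
// and add the wait to a per-stage counter when it finally starts running.
final class QueueTimeTrackingTask implements Runnable
{
    static final AtomicLong totalQueueNanos = new AtomicLong();  // one counter per stage

    private final Runnable delegate;
    private final long enqueuedAtNanos = System.nanoTime();

    QueueTimeTrackingTask(Runnable delegate)
    {
        this.delegate = delegate;
    }

    public void run()
    {
        totalQueueNanos.addAndGet(System.nanoTime() - enqueuedAtNanos);
        delegate.run();
    }
}
{code}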



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8341) Expose time spent in each thread pool

2014-11-19 Thread Chris Lohfink (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219017#comment-14219017
 ] 

Chris Lohfink commented on CASSANDRA-8341:
--

As a reference point, I did a [simplistic 
benchmark|https://gist.github.com/clohfink/9ee94e3767b5d8170220] (on a 2014 
MBP) with JMH of a few scenarios from patch v1:

{code}
Benchmark    Mode  Samples      Score       Error   Units
baseline avgt   15   8300.492 ±92.987   ns/op
staticWrap   avgt   15   8438.268 ±   132.177   ns/op
threadlocal  avgt   15   8464.201 ±   161.554   ns/op
wrapped  avgt   15   8424.011 ±   134.407   ns/op
baseline   sample   253749   8197.106 ±16.157   ns/op
staticWrap sample   251910   8233.368 ±14.841   ns/op
threadlocalsample   244075   8540.737 ±   106.365   ns/op
wrappedsample   247083   8443.597 ±99.892   ns/op
baseline   ss   15  49466.667 ±  8039.477  ns
staticWrap ss   15  44466.667 ±  7697.095  ns
threadlocalss   15  52266.667 ± 11679.329  ns
wrappedss   15  6.000 ± 24000.352  ns
{code}

As a v2 I removed the meter and switched to using currentTimeMillis instead of 
nanoTime (~1/3 the cost).  I switched the normal ThreadPool one to just do 
everything in an overridden execute.  It wasn't as simple for the SEPExecutor: 
since I didn't want to break the separation between the JMX SEP and the parent 
SEP by collecting metrics in the SEPWorker, I think keeping the threadlocal 
approach with a before/after execute callback is a little cleaner (matches the 
ThreadExecutor API).  With [those 
changes|https://gist.github.com/clohfink/b51eb027c55008377d93]:

{code}
Benchmark    Mode    Samples       Score        Error  Units
baseline     avgt         15    8052.932 ±     92.517  ns/op
staticWrap   avgt         15    8313.957 ±    295.964  ns/op
threadlocal  avgt         15    8553.304 ±    189.656  ns/op
wrapped      avgt         15    8354.060 ±    169.359  ns/op
baseline     sample   252812    8209.038 ±     14.917  ns/op
staticWrap   sample   248648    8356.810 ±     15.974  ns/op
threadlocal  sample   247362    8388.620 ±     17.310  ns/op
wrapped      sample   249784    8338.419 ±     68.509  ns/op
baseline     ss           15   44933.333 ±   8305.190  ns
staticWrap   ss           15   50533.333 ±  13136.009  ns
threadlocal  ss           15  303600.000 ± 964458.694  ns
wrapped      ss           15   50066.667 ±   7300.839  ns
{code}

The problem with using ms precision is that a task often takes less than a ms, 
so the reported time ends up well below the actual value.  I was hesitant to 
use getCurrentThreadCpuTime as a measure, but it's collected in ns and much 
more meaningful than wall time.  Kinda ideal, so I [gave it a 
shot|https://gist.github.com/clohfink/50e22ab895a7a700661b] - it costs a little 
more than 1us per task execution (a rough sketch of the callback shape follows 
the numbers below):

{code}
Benchmark    Mode  Samples      Score       Error   Units
baseline avgt   15   8187.818 ±   162.270   ns/op
staticWrap   avgt   15   9585.818 ±   185.490   ns/op
threadlocal  avgt   15   9508.422 ±   166.103   ns/op
wrapped  avgt   15   9380.099 ±   142.239   ns/op
baseline   sample   251564   8259.852 ±30.677   ns/op
staticWrap sample   221622   9365.621 ±19.831   ns/op
threadlocalsample   218599   9521.815 ±33.865   ns/op
wrappedsample   213108   9743.400 ±24.726   ns/op
baseline   ss   15  49333.333 ±  8663.130  ns
staticWrap ss   15  45533.333 ±  9543.739  ns
threadlocalss   15  56933.333 ± 12742.318  ns
wrappedss   15  52800.000 ± 12001.537  ns
{code}
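
For what it's worth, the shape of that before/after-execute callback approach 
(illustrative names only, not the gist above, and it assumes the JVM supports 
and has enabled per-thread CPU time) could look like:
{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Both callbacks run on the worker thread, so a ThreadLocal start value pairs up
// correctly with the matching afterExecute.
class CpuTimeTrackingExecutor extends ThreadPoolExecutor
{
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    final AtomicLong totalTaskCpuNanos = new AtomicLong();
    private final ThreadLocal<Long> startCpuNanos = new ThreadLocal<Long>();

    CpuTimeTrackingExecutor(int threads)
    {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r)
    {
        startCpuNanos.set(THREADS.getCurrentThreadCpuTime());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t)
    {
        totalTaskCpuNanos.addAndGet(THREADS.getCurrentThreadCpuTime() - startCpuNanos.get());
    }
}
{code}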

 Expose time spent in each thread pool
 -

 Key: CASSANDRA-8341
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8341
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Chris Lohfink
Priority: Minor
  Labels: metrics
 Attachments: 8341.patch, 8341v2.txt


 Can increment a counter with the time spent in each queue.  This can provide 
 context on how much time, percentage-wise, is spent in each stage.  
 Additionally, it can be used with Little's law in the future if we ever want 
 to tune the size of the pools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-8253) cassandra-stress 2.1 doesn't support LOCAL_ONE

2014-11-19 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie reassigned CASSANDRA-8253:


Assignee: Liang Xie

 cassandra-stress 2.1 doesn't support LOCAL_ONE
 --

 Key: CASSANDRA-8253
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8253
 Project: Cassandra
  Issue Type: Bug
Reporter: J.B. Langston
Assignee: Liang Xie

 Looks like a simple oversight in argument parsing:
 ➜  bin  ./cassandra-stress write cl=LOCAL_ONE
 Invalid value LOCAL_ONE; must match pattern 
 ONE|QUORUM|LOCAL_QUORUM|EACH_QUORUM|ALL|ANY
 Also, CASSANDRA-7077 argues that it should be using LOCAL_ONE by default.
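
One possible direction for a fix, sketched here only as an illustration (this 
is not the actual cassandra-stress code, and the consistency-level enum is left 
abstract): build the accepted pattern from the enum values instead of 
hard-coding the list, so newly added levels such as LOCAL_ONE are picked up 
automatically.
{code}
import java.util.EnumSet;

// Illustrative helper, not the stress option parser itself.
final class EnumPatterns
{
    static <E extends Enum<E>> String patternFor(Class<E> enumClass)
    {
        StringBuilder pattern = new StringBuilder();
        for (E value : EnumSet.allOf(enumClass))
        {
            if (pattern.length() > 0)
                pattern.append('|');
            pattern.append(value.name());
        }
        return pattern.toString();
    }
}
{code}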



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)