[jira] [Commented] (CASSANDRA-4170) cql3 ALTER TABLE ALTER TYPE has no effect
[ https://issues.apache.org/jira/browse/CASSANDRA-4170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258264#comment-13258264 ] Jonathan Ellis commented on CASSANDRA-4170: --- +1 cql3 ALTER TABLE ALTER TYPE has no effect - Key: CASSANDRA-4170 URL: https://issues.apache.org/jira/browse/CASSANDRA-4170 Project: Cassandra Issue Type: Bug Components: API, Core Affects Versions: 1.1.0 Reporter: paul cannon Assignee: Sylvain Lebresne Labels: cql3 Fix For: 1.1.0 Attachments: 4170.txt running the following with cql3: {noformat} CREATE TABLE test (foo text PRIMARY KEY, bar int); ALTER TABLE test ALTER bar TYPE float; {noformat} does not actually change the column type of bar. It does under cql2. Note that on the current cassandra-1.1.0 HEAD, this causes an NPE, fixed by CASSANDRA-4163. But even with that applied, the ALTER shown here has no effect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4177) Little improvement on the messages of the exceptions thrown by ExternalClient
[ https://issues.apache.org/jira/browse/CASSANDRA-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258267#comment-13258267 ] Jonathan Ellis commented on CASSANDRA-4177: --- Don't you get a "Caused by" later on in the stack trace with the original approach? Little improvement on the messages of the exceptions thrown by ExternalClient - Key: CASSANDRA-4177 URL: https://issues.apache.org/jira/browse/CASSANDRA-4177 Project: Cassandra Issue Type: Improvement Reporter: Michał Michalski Assignee: Michał Michalski Priority: Trivial Attachments: trunk-4177.txt After giving BulkRecordWriter (or actually ExternalClient) the ability to make use of authentication, I've noticed that the exceptions thrown on login failure are very misleading - there's always a "Could not retrieve endpoint ranges" RuntimeException being thrown, no matter what really happens. This hides the real reason for all authentication problems. I've changed this line a bit so all the messages are passed without any change; now I get - for example - AuthenticationException(why:Given password in password mode MD5 could not be validated for user operator) or - in the worst case - "Unexpected authentication problem", which is waaay more helpful, so I submit this trivial, but useful improvement.
[jira] [Commented] (CASSANDRA-4171) cql3 ALTER TABLE foo WITH default_validation=int has no effect
[ https://issues.apache.org/jira/browse/CASSANDRA-4171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258312#comment-13258312 ] Jonathan Ellis commented on CASSANDRA-4171: --- +1 cql3 ALTER TABLE foo WITH default_validation=int has no effect -- Key: CASSANDRA-4171 URL: https://issues.apache.org/jira/browse/CASSANDRA-4171 Project: Cassandra Issue Type: Bug Components: API, Core Affects Versions: 1.1.0 Reporter: paul cannon Assignee: Sylvain Lebresne Priority: Trivial Labels: cql3 Fix For: 1.1.0 Attachments: 4171.txt running the following with cql3: {noformat} CREATE TABLE test (foo text PRIMARY KEY) WITH default_validation=timestamp; ALTER TABLE test WITH default_validation=int; {noformat} does not actually change the default validation type of the CF. It does under cql2. No error is thrown. Some properties *can* be successfully changed using ALTER WITH, such as comment and gc_grace_seconds, but I haven't tested all of them. It seems probable that default_validation is the only problematic one, since it's the only (changeable) property which accepts CQL typenames.
[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType
[ https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257539#comment-13257539 ] Jonathan Ellis commented on CASSANDRA-4004: --- bq. I'm either misunderstanding what you call 'arbitrary orderings' or I have not been part of that discussion I think you are misunderstanding. This is what I'm referring to: {code} if (stmt.parameters.orderBy != null) { CFDefinition.Name name = cfDef.get(stmt.parameters.orderBy); if (name == null) throw new InvalidRequestException(String.format("Order by on unknown column %s", stmt.parameters.orderBy)); if (name.kind != CFDefinition.Name.Kind.COLUMN_ALIAS || name.position != 0) throw new InvalidRequestException(String.format("Order by is currently only supported on the second column of the PRIMARY KEY (if any), got %s", stmt.parameters.orderBy)); } {code} bq. How is that sophistry, seriously? ORDER BY X DESC does not mean "give me them in the reverse order that Xes are in on disk"; it means "give me larger values before smaller ones". This isn't open for debate; it's a very clear requirement. Remember that clustering is not new ground for databases; SQL has been there, done that. As I mentioned when we were designing the CQL3 schema syntax, RDBMSes have had a concept of clustered indexes for a long, long time. But clustering on an index ASC or DESC does not affect the results other than as an optimization; when you ORDER BY X, that's what you get. SQL and CQL are declarative languages: "Here is what I want; you figure out how to give me the results." This has proved a good design. Modifying the semantics of a query based on index or clustering or other declarations elsewhere has ZERO precedent and is bad design to boot; you don't want users to have to consult their DDL when debugging to know what results a query will give.
Thus, the only design that makes sense in the larger context of a declarative language is to treat the clustering as an optimization as I've described (or as an index, if you prefer), and continue to reject ORDER BY requests that are neither forward- nor reverse-clustered. Add support for ReversedType Key: CASSANDRA-4004 URL: https://issues.apache.org/jira/browse/CASSANDRA-4004 Project: Cassandra Issue Type: Sub-task Components: API Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Trivial Fix For: 1.1.1 Attachments: 4004.txt It would be nice to add a native syntax for the use of ReversedType. I'm not sure there is anything in SQL that we can take inspiration from, so I would propose something like: {noformat} CREATE TABLE timeseries ( key text, time uuid, value text, PRIMARY KEY (key, time DESC) ) {noformat} Alternatively, the DESC could also be put after the column name definition, but one argument for putting it in the PK instead is that this only applies to keys.
[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType
[ https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257580#comment-13257580 ] Jonathan Ellis commented on CASSANDRA-4004: --- bq. what happened to "Third (and this is the big one) I strongly suspect that we're going to start supporting at least limited run-time ordering in the near future from CASSANDRA-3925" Nothing, except that it's a separate ticket's worth of work. bq. I never suggested that [ORDER BY depends on disk order], not even a little. Not more than you did. I really don't see the distinction between saying "disk order" and "clustering order", as in "the clustered part of the PK induces an ordering of records ... SELECT without ORDER BY should return records in that clustering order ... SELECT ORDER BY ASC returns 'z' before 'a'". But disk order or clustering order, I don't care which you call it; I reject both as modifiers of the semantics of DESC. bq. the fact that value X is larger than Y depends on the ordering induced by your custom types Agreed. But that's not the same as reverse-clustering on a type: "y int ... PRIMARY KEY (x, y DESC)" (to use your syntax) is NOT the same as "y ReversedInt ... PRIMARY KEY (x, y)". In the former, ORDER BY Y DESC should give larger Y before smaller; in the latter, the reverse.
[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType
[ https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257663#comment-13257663 ] Jonathan Ellis commented on CASSANDRA-4004: --- bq. the model defines an ordering of the rows (where rows is in the sense of SQL) in tables, order that is defined as the ordering implied by the types of the clustered keys (and to be clear, I don't care what clustering mean in SQL, I'm reusing the name because you're using it, but I only mean by that term the fields in the PK after the first one). That doesn't imply the disk order has to respect it I think the mental model of rows as predicates, queries returning sets of rows with no inherent order, and ORDER BY as specifying the desired order, is much simpler and easier to reason about (see the prior point about having to consult DDL + QUERY to figure out what order results are supposed to appear in). bq. In my defence, you're attributing your semantics to my made-up syntax I was trying to say that I view ReversedType(Int32Type) as a modification of Int32Type (which should not affect int ordering) and not a completely new type, the way the (hypothetical) ReversedInt (or BackwardsInt, or AlmostNotQuiteInt) type would be, since the latter isn't really related to an int at all, even though they look a lot like ints in many respects. bq. I do think that in most case it's more natural to define a reversed type rather than just adding an optim for reversed queries. I don't follow. bq. I do think that having a form of syntactic double negation that is not equivalent to removing both is kind of weird... I do think that it's not necessarily clear per se (i.e. to anyone that may not be familiar with SQL clustering, for instance) that WITH CLUSTERING ORDER (x DESC) does not change the ordering But saying {{ORDER BY X DESC}} always gives you higher X first is the only way to avoid the double negation! Otherwise, in your original syntax of PK (X, Y DESC), the only way to get 1 to sort before 100 is to ask for ORDER BY Y DESC, so the DESCs cancel out! I just can't agree that ORDER BY Y DESC giving {1, 100} is going to be less confusing than {100, 1}, no matter how much we tell users, "No, you see, it's really just reversing the clustering order, which you already reversed..." Users may not be familiar with clustering, but they're *very* familiar with ORDER BY, which, as I said above, is very clear on what it does. Clustering is the closest example of how performance hints should *not* change the semantics of the query, but indexes fall into the same category. It may also be worth pointing out that it's worth preserving CQL compatibility with Hive; queries that execute on both (and to the best of my knowledge CQL3 is a strict subset of Hive SQL) should not give different results.
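The semantics being argued for in this thread can be sketched outside Cassandra. This is a hypothetical illustration (plain Python, not Cassandra internals): rows are treated as an unordered set, and ORDER BY alone dictates output order, regardless of how the rows happen to be clustered on disk.

```python
# Hypothetical illustration: ORDER BY semantics should not depend on the
# clustering (storage) order of the rows.

# Rows for one partition, physically stored in *reversed* clustering order.
stored_rows = [{"y": 100}, {"y": 42}, {"y": 1}]

def select(rows, order_by=None, desc=False):
    """Declarative SELECT: treat the input as an unordered set of rows."""
    if order_by is None:
        # No ORDER BY: the implementation is free to use storage order.
        return list(rows)
    # ORDER BY y DESC always means larger values first, no matter how
    # the rows are clustered underneath.
    return sorted(rows, key=lambda r: r[order_by], reverse=desc)

asc = select(stored_rows, order_by="y")              # y: 1, 42, 100
desc = select(stored_rows, order_by="y", desc=True)  # y: 100, 42, 1
```

Under this model the double negation disappears: ORDER BY Y DESC yields {100, 1} whether Y is clustered forward or reversed; the clustering declaration only changes which direction is the cheap one to serve.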
[jira] [Commented] (CASSANDRA-4173) cqlsh: in cql3 mode, use cql3 quoting when outputting cql
[ https://issues.apache.org/jira/browse/CASSANDRA-4173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257757#comment-13257757 ] Jonathan Ellis commented on CASSANDRA-4173: --- Does CQL2 support double quotes? If so, switching to double quotes everywhere may be simpler. cqlsh: in cql3 mode, use cql3 quoting when outputting cql - Key: CASSANDRA-4173 URL: https://issues.apache.org/jira/browse/CASSANDRA-4173 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.1.0 Reporter: paul cannon Assignee: paul cannon Priority: Minor Labels: cql3, cqlsh When cqlsh needs to output a column name or other term which needs quoting (say, if you run DESCRIBE KEYSPACE and some column name has a space in it), it currently only knows how to quote in the cql2 way. That is, {noformat} cqlsh:foo> describe columnfamily bar CREATE COLUMNFAMILY bar ( a int PRIMARY KEY, 'b c' text ) WITH ... {noformat} cql3 does not recognize single quotes around column names, or columnfamily or keyspace names either. cqlsh ought to learn how to use double quotes instead when in cql3 mode.
[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257794#comment-13257794 ] Jonathan Ellis commented on CASSANDRA-4174: --- Are you proposing we issue a single compaction submission when streaming is done, instead? Unnecessary compaction happens when streaming - Key: CASSANDRA-4174 URL: https://issues.apache.org/jira/browse/CASSANDRA-4174 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0.0 Reporter: Yuki Morishita Assignee: Yuki Morishita Priority: Minor Fix For: 1.0.10 Attachments: 4174-1.0.txt When a streaming session finishes, streamed sstables are added to the CFS one by one using ColumnFamilyStore#addSSTable (https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/streaming/StreamInSession.java#L141). This method submits compaction in the background (https://github.com/apache/cassandra/blob/cassandra-1.0.9/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L946), and we end up with unnecessary compaction tasks behind.
[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257821#comment-13257821 ] Jonathan Ellis commented on CASSANDRA-4174: --- Devil's advocate for the status quo: starting compaction as soon as I have one sstable to work on might smooth out the workload more. (If we finish the first compaction before the next is available, then great; if we don't, then they'll stack up and we'll do something closer to the all-at-once approach.) Thoughts?
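As a back-of-the-envelope illustration of the trade-off being discussed (all function and task names below are invented, not Cassandra's API): submitting a background compaction check per streamed sstable queues one task per sstable, while the proposed change queues a single check per streaming session.

```python
# Hypothetical sketch contrasting the two submission strategies:
# per-sstable background checks (status quo) vs. one check per session.

submitted = []

def submit_background(reason):
    # Stand-in for a compaction manager's background-task submission.
    submitted.append(reason)

def finish_streaming_eagerly(sstables):
    # Status quo: every addSSTable()-style call submits another background
    # check, so n streamed sstables queue up to n (mostly redundant) tasks.
    for sst in sstables:
        submit_background(f"check after {sst}")

def finish_streaming_batched(sstables):
    # Proposed: add all sstables first, then submit a single check.
    submit_background("check after streaming session")

finish_streaming_eagerly(["sst-1", "sst-2", "sst-3"])
eager_tasks = len(submitted)    # one queued check per sstable

submitted.clear()
finish_streaming_batched(["sst-1", "sst-2", "sst-3"])
batched_tasks = len(submitted)  # a single queued check
```

The devil's-advocate point above is that the eager variant can start compacting while later sstables are still arriving, smoothing the workload; the batched variant trades that for fewer redundant tasks.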
[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map
[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257836#comment-13257836 ] Jonathan Ellis commented on CASSANDRA-4175: --- The wrinkle here is concurrent schema changes -- how can we make sure each node uses the same column ids for each name? I see two possible approaches: # embed something like Zookeeper to standardize the id map # punt: let each node use a node-local map, and translate back and forth to the full column name across node boundaries Reduce memory (and disk) space requirements with a column name/id map - Key: CASSANDRA-4175 URL: https://issues.apache.org/jira/browse/CASSANDRA-4175 Project: Cassandra Issue Type: Improvement Reporter: Jonathan Ellis Fix For: 1.2 We spend a lot of memory on column names, both transiently (during reads) and more permanently (in the row cache). Compression mitigates this on disk but not on the heap. The overhead is significant for typical small column values, e.g., ints. Even though we intern once we get to the memtable, this affects writes too, via very high allocation rates in the young generation, hence more GC activity. Now that CQL3 provides us some guarantees that column names must be defined before they are inserted, we could create a map of (say) 32-bit int column ids to names, and use that internally right up until we return a resultset to the client.
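The "punt" option above can be sketched as follows. This is a minimal hypothetical illustration (class and method names invented, not from Cassandra): each node assigns dense ids to column names locally and expands them back to full names whenever data crosses a node boundary, so no cluster-wide agreement on ids is needed.

```python
# Hypothetical sketch of a node-local column name/id map: small ints
# internally, full names at the node boundary.

class ColumnIdMap:
    def __init__(self):
        self._name_to_id = {}
        self._id_to_name = []

    def intern(self, name: str) -> int:
        """Return this node's id for a column name, assigning a new dense
        id on first sight. Ids are only meaningful on this node."""
        if name not in self._name_to_id:
            self._name_to_id[name] = len(self._id_to_name)
            self._id_to_name.append(name)
        return self._name_to_id[name]

    def resolve(self, col_id: int) -> str:
        """Translate an id back to the full column name, e.g. before
        sending a resultset or forwarding a write to another node."""
        return self._id_to_name[col_id]

node_a = ColumnIdMap()
# Store a row internally keyed by small ids instead of name strings.
row = {node_a.intern(c): v for c, v in {"user_id": 7, "score": 3.5}.items()}
# Expand ids back to names at the node boundary.
wire_row = {node_a.resolve(i): v for i, v in row.items()}
```

The memory win comes from every internal copy of a row (memtable, row cache, read path) holding a 4-byte id where it previously held a full name string; the translation cost is paid once per boundary crossing.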
[jira] [Commented] (CASSANDRA-4174) Unnecessary compaction happens when streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13257890#comment-13257890 ] Jonathan Ellis commented on CASSANDRA-4174: --- I see. So is this basically a cosmetic change, then, to not have redundant tasks created? If so, I think I'd rather commit to 1.1.1.
[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map
[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258017#comment-13258017 ] Jonathan Ellis commented on CASSANDRA-4175: --- And extremely collision-prone. :)
[jira] [Commented] (CASSANDRA-4175) Reduce memory (and disk) space requirements with a column name/id map
[ https://issues.apache.org/jira/browse/CASSANDRA-4175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258030#comment-13258030 ] Jonathan Ellis commented on CASSANDRA-4175: --- Hashcode just isn't designed to be collision-resistant; it prioritizes speed. Even with a better (from the standpoint of collisions) general-purpose hash like Murmur, 32 bits is just too small. The smallest cryptographic hash I know of is md5, and ballooning to 128 bits puts a serious crimp in the potential savings here.
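The "32 bits is just too small" claim follows directly from the birthday bound. A quick check of that arithmetic (standard approximation, nothing Cassandra-specific):

```python
# Birthday-bound estimate of hash collisions: the probability of at least
# one collision among n uniformly random values in 2**bits buckets is
# approximately 1 - exp(-n*(n-1) / (2 * 2**bits)).

import math

def collision_probability(n: int, bits: int) -> float:
    buckets = 2 ** bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * buckets))

# Even 100k distinct column names across a cluster make a 32-bit
# collision more likely than not (roughly a 69% chance)...
p32 = collision_probability(100_000, 32)
# ...while a 128-bit hash like md5 keeps it negligible, at 4x the space.
p128 = collision_probability(100_000, 128)
```

This is why a hash-derived id cannot substitute for an assigned id map at 32 bits: collisions would silently merge distinct column names.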
[jira] [Commented] (CASSANDRA-4165) Generate Digest file for compressed SSTables
[ https://issues.apache.org/jira/browse/CASSANDRA-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256560#comment-13256560 ] Jonathan Ellis commented on CASSANDRA-4165: --- The thinking was, compressed sstables have a per-block checksum, so there's no need to have the less-granular sha. Generate Digest file for compressed SSTables Key: CASSANDRA-4165 URL: https://issues.apache.org/jira/browse/CASSANDRA-4165 Project: Cassandra Issue Type: Improvement Reporter: Marcus Eriksson Priority: Minor Attachments: 0001-Generate-digest-for-compressed-files-as-well.patch We use the generated *Digest.sha1 files to verify backups; it would be nice if they were generated for compressed sstables as well.
[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families
[ https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256631#comment-13256631 ] Jonathan Ellis commented on CASSANDRA-556: -- Trivium: this was our oldest open issue. nodeprobe snapshot to support specific column families -- Key: CASSANDRA-556 URL: https://issues.apache.org/jira/browse/CASSANDRA-556 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Were Assignee: Dave Brosius Priority: Minor Labels: jmx, lhf Fix For: 1.1.1 Attachments: cf_snapshots_556.diff, cf_snapshots_556_2.diff, cf_snapshots_556_2A.diff It would be good to support dumping specific column families via nodeprobe for backup purposes. In my particular case, the majority of cassandra data doesn't need to be backed up, except for a couple of column families containing user settings / profiles etc.
[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff
[ https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256774#comment-13256774 ] Jonathan Ellis commented on CASSANDRA-4162: --- If you're hung up on the nodetool help description, let's fix that. Fundamentally, disablegossip disables gossip. That's all. It's not intended to, nor should it, stop all network traffic dead in the water. I've already explained why that is, and brandon and eldon have given workarounds for when you really do want to do that. nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff -- Key: CASSANDRA-4162 URL: https://issues.apache.org/jira/browse/CASSANDRA-4162 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.9 Environment: reported on IRC, believe it was a linux environment, nick rhone, cassandra 1.0.8 Reporter: Robert Coli Priority: Minor Labels: gossip This ticket derives from #cassandra, where aaron_morton and I assisted a user who had run disablethrift and disablegossip and was confused as to why he was seeing writes to his node. Aaron and I went through a series of debugging questions; the user verified that there was traffic on the gossip port. His node was showing as down from the perspective of other nodes, and nodetool also showed that gossip was not active. Aaron read the code and had the user turn debug logging on. The user saw Hinted Handoff messages being delivered, and Aaron confirmed in the code that a hinted handoff delivery session only checks gossip state when it first starts. As a result, it will continue to deliver hints and disregard gossip state on the target node. Per the nodetool docs: "disablegossip - Disable gossip (effectively marking the node dead)". I believe most people will be using disablegossip and disablethrift for operational reasons, and propose that they do not expect HH delivery to continue, via gossip, when they have run disablegossip.
[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff
[ https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256782#comment-13256782 ] Jonathan Ellis commented on CASSANDRA-4162: --- Incidentally, "startup is slow" is definitely on our radar. We're looking at that in CASSANDRA-2392 and others.
[jira] [Commented] (CASSANDRA-3762) AutoSaving KeyCache and System load time improvements.
[ https://issues.apache.org/jira/browse/CASSANDRA-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256793#comment-13256793 ] Jonathan Ellis commented on CASSANDRA-3762: --- bq. If we want to see the optimal solution for all the use cases, I think we have to go for the alternative where we save the KeyCache positions to disk, read them back, and let whatever is missing fault-fill. I like this idea. If you have a lot of rows (i.e., a large index), then this is the only thing that's going to save you from doing random I/O. The only downside I see is the question of how much churn your sstables will experience between save and load. If you have a small data set that is constantly being overwritten, for instance, you could basically invalidate the whole cache. But it's quite possible that just reducing the cache save period is adequate to address this. AutoSaving KeyCache and System load time improvements. -- Key: CASSANDRA-3762 URL: https://issues.apache.org/jira/browse/CASSANDRA-3762 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.2 Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.2 Attachments: 0001-SavedKeyCache-load-time-improvements.patch CASSANDRA-2392 saves the index summary to disk, but when we have a saved cache we still scan through the index to get the data out. We might be able to separate this from SSTR.load and let it load the index summary; once all the SSTs are loaded, we might be able to check the bloom filter and do random I/O on fewer indexes to populate the KeyCache.
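The save-positions-and-fault-fill idea above can be sketched roughly as follows. This is a hypothetical illustration, not the attached patch: the class, the generation-based staleness check, and all names are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: persist key-cache entries as (key, sstable generation,
// index position). On startup, reload only entries whose sstable still exists;
// anything dropped simply faults back in on the next read, instead of the node
// rebuilding the cache with a full index scan.
class SavedKeyCache {
    record Entry(int sstableGeneration, long indexPosition) {}

    private final Map<String, Entry> cache = new HashMap<>();

    void put(String key, Entry e) { cache.put(key, e); }

    // Reload a saved snapshot, dropping entries for sstables that were
    // compacted away between save and load (the "churn" concern above).
    int load(Map<String, Entry> saved, Set<Integer> liveGenerations) {
        int kept = 0;
        for (Map.Entry<String, Entry> e : saved.entrySet()) {
            if (liveGenerations.contains(e.getValue().sstableGeneration())) {
                cache.put(e.getKey(), e.getValue());
                kept++;
            }
        }
        return kept;
    }

    Entry get(String key) { return cache.get(key); } // null => fault-fill from index
}
```

The more the sstable set churns between save and load, the fewer entries survive, which is exactly the trade-off discussed in the comment.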
[jira] [Commented] (CASSANDRA-2392) Saving IndexSummaries to disk
[ https://issues.apache.org/jira/browse/CASSANDRA-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256804#comment-13256804 ] Jonathan Ellis commented on CASSANDRA-2392: --- For the record, I'm still fine with saying "loading caches will slow down startup; deal with it", but I think we have a good plan of attack on CASSANDRA-3762 now, and it may be simpler to just do that first, before rebasing this. Which is also fine. Saving IndexSummaries to disk - Key: CASSANDRA-2392 URL: https://issues.apache.org/jira/browse/CASSANDRA-2392 Project: Cassandra Issue Type: Improvement Reporter: Chris Goffinet Assignee: Vijay Priority: Minor Fix For: 1.2 Attachments: 0001-CASSANDRA-2392-v6.patch, 0001-re-factor-first-and-last.patch, 0001-save-summaries-to-disk-v4.patch, 0001-save-summaries-to-disk.patch, 0002-save-summaries-to-disk-v2.patch, 0002-save-summaries-to-disk-v3.patch, 0002-save-summaries-to-disk.patch, CASSANDRA-2392-v5.patch For nodes with millions of keys, doing rolling restarts that take over 10 minutes per node can be painful if you have a 100-node cluster. All of our time is spent doing index summary computations on startup. It would be great if we could save those to disk as well. Our indexes are quite large.
[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation
[ https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256817#comment-13256817 ] Jonathan Ellis commented on CASSANDRA-2864: --- Is the original description here still an accurate guide to the approach taken? Alternative Row Cache Implementation Key: CASSANDRA-2864 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Daniel Doubleday Assignee: Daniel Doubleday Priority: Minor We have been working on an alternative implementation to the existing row cache(s). We have two main goals:
- Decrease memory: get more rows in the cache without suffering a huge performance penalty
- Reduce GC pressure
This sounds a lot like we should be using the new serializing cache in 0.8. Unfortunately our workload consists of loads of updates, which would invalidate the cache all the time. The second unfortunate thing is that the idea we came up with doesn't fit the new cache provider API. It looks like this: Like the serializing cache, we basically only cache the serialized byte buffer. We don't serialize the bloom filter, and we try to do some other minor compression tricks (var ints etc., not done yet). The main difference is that we don't deserialize but use the normal sstable iterators and filters as in the regular uncached case. So the read path looks like this: return filter.collectCollatedColumns(memtable iter, cached row iter) The write path is not affected; it does not update the cache. During flush we merge all memtable updates with the cached rows. The attached patch is based on the 0.8 branch, r1143352. It does not replace the existing row cache but sits beside it. There's an environment switch to choose the implementation, which makes it easy to benchmark performance differences: -DuseSSTableCache=true enables the alternative cache. It shares its configuration with the standard row cache, so the cache capacity is shared. We have duplicated a fair amount of code. First we actually refactored the existing sstable filter/reader, but then decided to minimize dependencies; this way it is also easy to customize serialization for in-memory sstable rows. We have also experimented a little with compression, but since this task at this stage is mainly to kick off discussion, we wanted to keep things simple. There is certainly room for optimizations.
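The "collate the memtable iterator with the cached row iterator" read path described above can be sketched in miniature. This is a hypothetical illustration of the collation idea, not the patch's code; the class, the `Column` record, and timestamp-based reconciliation are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: treat the cached serialized row as one more sorted
// column source and collate it with the memtable, the newer write winning
// on a name collision, mirroring filter.collectCollatedColumns(...).
class CollatingReader {
    record Column(String name, String value, long timestamp) {}

    static List<Column> collate(List<Column> memtable, List<Column> cachedRow) {
        TreeMap<String, Column> merged = new TreeMap<>(); // sorted by column name
        for (Column c : cachedRow) merged.put(c.name(), c);
        for (Column c : memtable)
            // Reconcile by timestamp: the more recent write survives.
            merged.merge(c.name(), c,
                (old, neu) -> neu.timestamp() >= old.timestamp() ? neu : old);
        return new ArrayList<>(merged.values());
    }
}
```

This also shows why the write path can skip the cache entirely: a stale cached column is shadowed at read time by the newer memtable column, and the flush-time merge then folds the update into the cached row.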
[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation
[ https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256832#comment-13256832 ] Jonathan Ellis commented on CASSANDRA-2864: --- If so, how do you avoid scanning the sstables? Does this only work on named-column queries? That is, if I ask for a slice from X to Y, if you have data in your cache for X1 and X2, how do you know there is not also an X3 on disk somewhere?
[jira] [Commented] (CASSANDRA-2864) Alternative Row Cache Implementation
[ https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256859#comment-13256859 ] Jonathan Ellis commented on CASSANDRA-2864: --- I think you might need to write that book, because the commit history is tough to follow. :)
[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache
[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256954#comment-13256954 ] Jonathan Ellis commented on CASSANDRA-1956: --- How is that different from the query cache I waved my hands about? Convert row cache to row+filter cache - Key: CASSANDRA-1956 URL: https://issues.apache.org/jira/browse/CASSANDRA-1956 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stu Hood Assignee: Vijay Priority: Minor Fix For: 1.2 Attachments: 0001-1956-cache-updates-v0.patch, 0001-commiting-block-cache.patch, 0001-re-factor-row-cache.patch, 0001-row-cache-filter.patch, 0002-1956-updates-to-thrift-and-avro-v0.patch, 0002-add-query-cache.patch Changing the row cache to a row+filter cache would make it much more useful. We currently have to warn against using the row cache with wide rows, where the read pattern is typically a peek at the head, but this use case would be perfectly supported by a cache that stored only the columns matching the filter. Possible implementations:
* (cop-out) Cache a single filter per row, and leave the cache key as is
* Cache a list of filters per row, leaving the cache key as is: this is likely to have some gotchas for weird usage patterns, and it requires the list overhead
* Change the cache key to rowkey+filterid: basically ideal, but you need a secondary index to look up cache entries by rowkey so that you can keep them in sync with the memtable
* others?
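The third option above, widening the cache key to rowkey+filterid, can be sketched as a small value class. This is a hypothetical illustration, not code from the attached patches; the class name and the string-encoded filter id are assumptions made for the example.

```java
import java.util.Objects;

// Hypothetical sketch: a composite cache key so that each filtered view of a
// row is cached independently. As the description notes, a real implementation
// would also need a secondary rowkey -> keys index, so every cached view of a
// row can be found and kept in sync when that row is updated.
class RowFilterCacheKey {
    final String rowKey;
    final String filterId; // e.g. a canonical encoding of the slice/name filter

    RowFilterCacheKey(String rowKey, String filterId) {
        this.rowKey = rowKey;
        this.filterId = filterId;
    }

    @Override public boolean equals(Object o) {
        return o instanceof RowFilterCacheKey k
            && rowKey.equals(k.rowKey) && filterId.equals(k.filterId);
    }

    @Override public int hashCode() { return Objects.hash(rowKey, filterId); }
}
```

Correct equals/hashCode is the whole point here: two requests for the same row with the same filter must collide in the cache, while different filters over one row must not.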
[jira] [Commented] (CASSANDRA-4163) CQL3 ALTER TABLE command causes NPE
[ https://issues.apache.org/jira/browse/CASSANDRA-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256957#comment-13256957 ] Jonathan Ellis commented on CASSANDRA-4163: --- LGTM. Nit: could we initialize properties to an empty map to avoid having to null-check it? CQL3 ALTER TABLE command causes NPE --- Key: CASSANDRA-4163 URL: https://issues.apache.org/jira/browse/CASSANDRA-4163 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.0 Environment: INFO 16:07:11,757 Cassandra version: 1.1.0-rc1-SNAPSHOT INFO 16:07:11,757 Thrift API version: 19.30.0 INFO 16:07:11,758 CQL supported versions: 2.0.0,3.0.0-beta1 (default: 2.0.0) Reporter: Kristine Hahn Assignee: paul cannon Labels: cql3 Fix For: 1.1.0 Attachments: 4163.patch.txt To reproduce the problem:
./cqlsh --cql3
Connected to Test Cluster at localhost:9160. [cqlsh 2.2.0 | Cassandra 1.1.0-rc1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.30.0] Use HELP for help.
cqlsh> CREATE KEYSPACE test34 WITH strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' AND strategy_options:replication_factor='1';
cqlsh> USE test34;
cqlsh:test34> CREATE TABLE users ( ... password varchar, ... gender varchar, ... session_token varchar, ... state varchar, ... birth_year bigint, ... pk varchar, ... PRIMARY KEY (pk) ... );
cqlsh:test34> ALTER TABLE users ADD coupon_code varchar;
TSocket read 0 bytes
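The nit above, initializing to an empty map instead of null, can be illustrated with a minimal sketch. The class and method names here are hypothetical, not the actual 4163 patch; the point is only the normalize-null-at-the-boundary pattern.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: normalize a possibly-null properties map to an
// immutable empty map once, so no downstream code ever needs a null check
// (the class of bug behind the NPE reported here).
class AlterTableStatement {
    private final Map<String, String> properties;

    AlterTableStatement(Map<String, String> properties) {
        this.properties = properties == null ? Collections.emptyMap() : properties;
    }

    String property(String name, String defaultValue) {
        // Safe without a null check: properties is never null.
        return properties.getOrDefault(name, defaultValue);
    }
}
```

Pushing the null handling into the constructor trades one check at the boundary for zero checks everywhere else, which is the "cleanliness over performance" preference stated in the follow-up comment.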
[jira] [Commented] (CASSANDRA-4163) CQL3 ALTER TABLE command causes NPE
[ https://issues.apache.org/jira/browse/CASSANDRA-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256970#comment-13256970 ] Jonathan Ellis commented on CASSANDRA-4163: --- Yeah, I don't see this as performance critical so I'd rather go for cleanliness.
[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache
[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256975#comment-13256975 ] Jonathan Ellis commented on CASSANDRA-1956: --- Does this support caching head/tail queries? Or do X and Y have to be existing column values?
[jira] [Commented] (CASSANDRA-1956) Convert row cache to row+filter cache
[ https://issues.apache.org/jira/browse/CASSANDRA-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256983#comment-13256983 ] Jonathan Ellis commented on CASSANDRA-1956: --- Also, it sounds like this always invalidates on update. Would it be possible to preserve the current row cache behavior? I.e., update in place with a non-copying cache implementation.
[jira] [Commented] (CASSANDRA-3946) BulkRecordWriter shouldn't stream any empty data/index files that might be created at end of flush
[ https://issues.apache.org/jira/browse/CASSANDRA-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255883#comment-13255883 ] Jonathan Ellis commented on CASSANDRA-3946: --- LGTM. Could you also post a version against 1.0, Yuki? BulkRecordWriter shouldn't stream any empty data/index files that might be created at end of flush -- Key: CASSANDRA-3946 URL: https://issues.apache.org/jira/browse/CASSANDRA-3946 Project: Cassandra Issue Type: Bug Reporter: Chris Goffinet Assignee: Yuki Morishita Priority: Minor Fix For: 1.1.1 Attachments: 0001-Abort-SSTableWriter-when-exception-occured.patch, 0001-CASSANDRA-3946-BulkRecordWriter-shouldn-t-stream-any.patch If, by chance, we flush sstables during BulkRecordWriter (we have seen it happen), I want to make sure we don't try to stream them.
[jira] [Commented] (CASSANDRA-4162) nodetool disablegossip does not prevent gossip delivery of writes via already-initiated hinted handoff
[ https://issues.apache.org/jira/browse/CASSANDRA-4162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255976#comment-13255976 ] Jonathan Ellis commented on CASSANDRA-4162: --- I can easily think of a scenario where you want to let the HH complete (e.g., you only want up-to-date nodes serving reads), but I'm having trouble thinking of a scenario for the other way around. So no, I don't think that's a good general rule... (If you want it completely cut off, ISTM you should kill it and bring it back up without joining the ring.)
[jira] [Commented] (CASSANDRA-4079) Check SSTable range before running cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254948#comment-13254948 ] Jonathan Ellis commented on CASSANDRA-4079: --- Changed to an abstract method in AB at https://github.com/jbellis/cassandra/branches/4079-3 Check SSTable range before running cleanup -- Key: CASSANDRA-4079 URL: https://issues.apache.org/jira/browse/CASSANDRA-4079 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Benjamin Coverston Assignee: Jonathan Ellis Priority: Minor Labels: compaction Fix For: 1.1.1 Attachments: 4079.txt Before running a cleanup compaction on an SSTable, we should check its range to see whether the SSTable falls into the range we want to remove. If it doesn't, we can just mark the SSTable as compacted and be done with it; if it does, we can no-op. This will not help with STCS, but for LCS, and perhaps some others, we may see a benefit here after topology changes.
[jira] [Commented] (CASSANDRA-4079) Check SSTable range before running cleanup
[ https://issues.apache.org/jira/browse/CASSANDRA-4079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254974#comment-13254974 ] Jonathan Ellis commented on CASSANDRA-4079: --- Couldn't leave well enough alone... https://github.com/jbellis/cassandra/branches/4079-4 makes AB.intersects non-abstract and pushes the type check into Range.intersects(AB). I like this a little better since it lets me comment why I'm leaving the EB/IEB unimplemented in an obvious place.
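The pre-cleanup check discussed in this ticket can be sketched in simplified form. This is a hypothetical illustration, not Cassandra's `AbstractBounds`/`Range` code: token ranges are reduced to plain half-open long intervals with no wraparound, and all names are assumptions made for the example.

```java
import java.util.List;

// Hypothetical sketch: before cleaning an SSTable, compare its token span
// against the ranges the node still owns. If it misses every owned range,
// the whole file can be dropped (marked compacted) without scanning it.
class CleanupPlanner {
    // A simplified half-open token range (left, right], no wraparound.
    record Range(long left, long right) {
        boolean intersects(Range other) {
            return left < other.right && other.left < right;
        }
    }

    enum Action { DROP_WHOLE_FILE, SCAN_AND_CLEAN }

    static Action planCleanup(Range sstableSpan, List<Range> ownedRanges) {
        for (Range owned : ownedRanges)
            if (sstableSpan.intersects(owned))
                return Action.SCAN_AND_CLEAN;
        return Action.DROP_WHOLE_FILE;
    }
}
```

This is where the whole-file win after topology changes comes from: an sstable whose span now lies entirely outside the owned ranges never has to be read at all.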
[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families
[ https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255029#comment-13255029 ] Jonathan Ellis commented on CASSANDRA-556: -- Thanks, Dave! I think it would be good to split up the method calls at the JMX level as well, since it doesn't really make sense to apply a specific CF name AND multiple keyspaces at the same time. What do you think? Nit: the help in nodecommand adds a second line for snapshot instead of snapshot_columnfamily. nodeprobe snapshot to support specific column families -- Key: CASSANDRA-556 URL: https://issues.apache.org/jira/browse/CASSANDRA-556 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Were Assignee: Dave Brosius Priority: Minor Labels: lhf Fix For: 1.1.1 Attachments: cf_snapshots_556.diff It would be good to support dumping specific column families via nodeprobe for backup purposes. In my particular case, the majority of Cassandra data doesn't need to be backed up, except for a couple of column families containing user settings, profiles, etc.
[jira] [Commented] (CASSANDRA-4151) Apache project branding requirements: DOAP file [PATCH]
[ https://issues.apache.org/jira/browse/CASSANDRA-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255040#comment-13255040 ] Jonathan Ellis commented on CASSANDRA-4151: --- Comments:
- unclear what changed in Description, apart from s/Cassandra/it/, which isn't an improvement when the antecedent is unclear
- not a fan of updating this thing for each release; would prefer to leave it out
- should probably leave out the svn repo entirely as well, rather than pointing people to the unused (except for the site) old one
Apache project branding requirements: DOAP file [PATCH] --- Key: CASSANDRA-4151 URL: https://issues.apache.org/jira/browse/CASSANDRA-4151 Project: Cassandra Issue Type: Improvement Reporter: Shane Curcuru Labels: branding Attachments: doap_Cassandra.rdf Attached. Re: http://www.apache.org/foundation/marks/pmcs See Also: http://projects.apache.org/create.html
[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families
[ https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255045#comment-13255045 ] Jonathan Ellis commented on CASSANDRA-556: -- bq. Just wanted to make sure folks were ok with splitting the command as it is
I guess the main alternative would be to add more -flags... I'm okay breaking backwards compatibility there.
[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families
[ https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255072#comment-13255072 ] Jonathan Ellis commented on CASSANDRA-556: -- bq. then you have the potential situation of n keyspaces with a cf name
Not sure I follow, could you elaborate?
[jira] [Commented] (CASSANDRA-556) nodeprobe snapshot to support specific column families
[ https://issues.apache.org/jira/browse/CASSANDRA-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255082#comment-13255082 ] Jonathan Ellis commented on CASSANDRA-556: -- Ah, I see. Quite right, CF names are not unique. (So what you could do is check the schema nodetool-side and spit back a which KS did you want to snapshot CF in? error...) nodeprobe snapshot to support specific column families -- Key: CASSANDRA-556 URL: https://issues.apache.org/jira/browse/CASSANDRA-556 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Chris Were Assignee: Dave Brosius Priority: Minor Labels: lhf Fix For: 1.1.1 Attachments: cf_snapshots_556.diff It would be good to support dumping specific column families via nodeprobe for backup purposes. In my particular case the majority of cassandra data doesn't need to be backed up except for a couple of column families containing user settings / profiles etc. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
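The nodetool-side check suggested in the comment above can be sketched as follows. This is an illustrative sketch only, not Cassandra's actual code; the function name and schema shape are hypothetical. A bare CF name is resolvable only when exactly one keyspace defines it; otherwise the tool should spit back a "which keyspace did you want?" error.

```python
# Hypothetical sketch of resolving a bare CF name against the cluster schema.
# schema: mapping of keyspace name -> list of column family names.

def resolve_cf(schema, cf_name):
    """Return (keyspace, cf_name), or raise if the CF is unknown or ambiguous."""
    matches = [ks for ks, cfs in schema.items() if cf_name in cfs]
    if not matches:
        raise ValueError("unknown column family: %s" % cf_name)
    if len(matches) > 1:
        raise ValueError(
            "which keyspace did you want to snapshot CF %s in? candidates: %s"
            % (cf_name, ", ".join(sorted(matches))))
    return matches[0], cf_name

# Example schema with a CF name shared by two keyspaces:
schema = {"Keyspace1": ["Users", "Settings"], "Keyspace2": ["Users"]}
```

With this, `resolve_cf(schema, "Settings")` succeeds, while `resolve_cf(schema, "Users")` fails with a disambiguation error, matching the behavior proposed in the comment.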
[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255126#comment-13255126 ] Jonathan Ellis commented on CASSANDRA-4140: --- Ship it! Build stress classes in a location that allows tools/stress/bin/stress to find them --- Key: CASSANDRA-4140 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.2 Reporter: Nick Bailey Assignee: Vijay Priority: Trivial Fix For: 1.2 Attachments: 0001-CASSANDRA-4140-v2.patch, 0001-CASSANDRA-4140.patch Right now its hard to run stress from a checkout of trunk. You need to do 'ant artifacts' and then run the stress tool in the generated artifacts. A discussion on irc came up with the proposal to just move stress to the main jar, but the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4004) Add support for ReversedType
[ https://issues.apache.org/jira/browse/CASSANDRA-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254139#comment-13254139 ] Jonathan Ellis commented on CASSANDRA-4004: --- bq. The idea of my patch is that adding DESC to a field in a table declaration changes the order of records logically That makes sense for the old world of ReversedType and {{reversed}} slice flag, but I don't think it makes sense for CQL. When I ask for ORDER BY X DESC I expect the largest X first, period. So a table declaration like this makes sense as an optimization if DESC is your most frequent query type, but it shouldn't change the semantics of the query itself. Add support for ReversedType Key: CASSANDRA-4004 URL: https://issues.apache.org/jira/browse/CASSANDRA-4004 Project: Cassandra Issue Type: Sub-task Components: API Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Priority: Trivial Fix For: 1.1.1 Attachments: 4004.txt It would be nice to add a native syntax for the use of ReversedType. I'm sure there is anything in SQL that we inspired ourselves from, so I would propose something like: {noformat} CREATE TABLE timeseries ( key text, time uuid, value text, PRIMARY KEY (key, time DESC) ) {noformat} Alternatively, the DESC could also be put after the column name definition but one argument for putting it in the PK instead is that this only apply to keys. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4151) Apache project branding requirements: DOAP file [PATCH]
[ https://issues.apache.org/jira/browse/CASSANDRA-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254175#comment-13254175 ] Jonathan Ellis commented on CASSANDRA-4151: --- I put the doap file in svn/site which is where the rest of the site still lives. Apache project branding requirements: DOAP file [PATCH] --- Key: CASSANDRA-4151 URL: https://issues.apache.org/jira/browse/CASSANDRA-4151 Project: Cassandra Issue Type: Improvement Reporter: Shane Curcuru Labels: branding Attachments: doap_Cassandra.rdf Attached. Re: http://www.apache.org/foundation/marks/pmcs See Also: http://projects.apache.org/create.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4153) Optimize truncate when snapshots are disabled or keyspace not durable
[ https://issues.apache.org/jira/browse/CASSANDRA-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254250#comment-13254250 ] Jonathan Ellis commented on CASSANDRA-4153: --- bq. truncate does not need to flush memtables to disk when snapshots are disabled It still needs to clear out the memtables somehow though, or truncate won't actually discard all the data it's expected to. Optimize truncate when snapshots are disabled or keyspace not durable - Key: CASSANDRA-4153 URL: https://issues.apache.org/jira/browse/CASSANDRA-4153 Project: Cassandra Issue Type: Improvement Reporter: Christian Spriegel Priority: Minor Attachments: OptimizeTruncate_v1.diff My goal is to make truncate to be less IO intensive so that my junit tests run faster (as already explained in CASSANDRA-3710). I think I have now a solution which does not change too much: I created a patch that optimizes three things within truncate: - Skip the whole Commitlog.forceNewSegment/discardCompletedSegments, if durable_writes are disabled for the keyspace. - With CASSANDRA-3710 implemented, truncate does not need to flush memtables to disk when snapshots are disabled. - Reduce the sleep interval The patch works nicely for me. Applying it and disabling durable_writes/autoSnapshot increased the speed of my testsuite vastly. I hope I did not overlook something. Let me know if my patch needs cleanup. I'd be glad to change it, if it means the patch will get accepted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
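The branching the patch describes, including the point Ellis raises (truncate must still clear memtables even when the flush is skipped), can be sketched like this. This is an illustrative model only, not Cassandra's actual implementation; all names are hypothetical.

```python
# Sketch of the truncate fast paths discussed above: commitlog work is only
# needed with durable_writes, and flushing is only needed when a snapshot
# will be taken -- but the memtable contents must be discarded either way.

def truncate(cf, durable_writes, auto_snapshot, log):
    if durable_writes:
        log.append("roll and discard commitlog segments")
    if auto_snapshot:
        log.append("flush memtables")      # snapshot needs the data as sstables
        cf["memtable"].clear()
        log.append("snapshot sstables")
    else:
        cf["memtable"].clear()             # skip the flush, but still drop the data
    cf["sstables"].clear()
    return cf
```

With durable_writes and auto_snapshot both disabled, the expensive steps are skipped entirely, yet the table still ends up empty, which is the invariant the comment insists on.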
[jira] [Commented] (CASSANDRA-4147) cqlsh doesn't accept NULL as valid input
[ https://issues.apache.org/jira/browse/CASSANDRA-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253445#comment-13253445 ] Jonathan Ellis commented on CASSANDRA-4147: --- Explicitly inserting a null doesn't make much sense either; to Cassandra/CQL, null is the complete absence of a value. So we'll *return* a null if you ask for column {{foo}} and no {{foo}} exists for a given row, but I can't think of why you'd want to explicitly insert one. (And a row consisting entirely of nulls wouldn't exist at all...) cqlsh doesn't accept NULL as valid input Key: CASSANDRA-4147 URL: https://issues.apache.org/jira/browse/CASSANDRA-4147 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.8 Reporter: T Jake Luciani Assignee: paul cannon Priority: Minor Fix For: 1.0.10 cqlsh:cfs insert into foo (key,val1,val2)values('row2',NULL,NULL); Bad Request: unable to make long from 'NULL' -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
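The null-is-absence semantics Ellis describes can be modeled with a small sketch (illustrative only; this is not Cassandra code). A null column is simply never stored, a read of a missing column comes back as null, and a row whose columns are all null never materializes at all.

```python
# Model of "null is the complete absence of a value": inserting None is a
# no-op, missing columns read back as None, and an all-null row doesn't exist.

rows = {}

def insert(key, **cols):
    live = {k: v for k, v in cols.items() if v is not None}
    if live:                               # a row of only nulls is never created
        rows.setdefault(key, {}).update(live)

def get(key, col):
    return rows.get(key, {}).get(col)      # absent column reads back as None

insert("row1", val1=42)
insert("row2", val1=None, val2=None)       # equivalent to inserting nothing
```

This mirrors the comment: you can *read* a null for a column that was never written, but writing one explicitly has no meaning.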
[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253583#comment-13253583 ] Jonathan Ellis commented on CASSANDRA-4140: --- How about tools/bin/stress? Build stress classes in a location that allows tools/stress/bin/stress to find them --- Key: CASSANDRA-4140 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.2 Reporter: Nick Bailey Assignee: Vijay Priority: Trivial Fix For: 1.2 Attachments: 0001-CASSANDRA-4140.patch Right now its hard to run stress from a checkout of trunk. You need to do 'ant artifacts' and then run the stress tool in the generated artifacts. A discussion on irc came up with the proposal to just move stress to the main jar, but the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4143) HH delivery should not be attempted when target node is down
[ https://issues.apache.org/jira/browse/CASSANDRA-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253585#comment-13253585 ] Jonathan Ellis commented on CASSANDRA-4143: --- (done for 1.0.10 + 1.1.0) HH delivery should not be attempted when target node is down Key: CASSANDRA-4143 URL: https://issues.apache.org/jira/browse/CASSANDRA-4143 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.9 Reporter: Radim Kolar Priority: Minor Look at this log fragment. Cassandra tries to do HH delivery every 10 minutes even if host is marked down by gossip. INFO [GossipTasks:1] 2012-04-12 03:01:55,040 Gossiper.java (line 818) InetAddress /64.6.104.18 is now dead. INFO [MemoryMeter:1] 2012-04-12 03:04:12,503 Memtable.java (line 186) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.7635719581514129 (just-counted was 1.7635719581514129). calculation took 224ms for 226 columns WARN [MemoryMeter:1] 2012-04-12 03:08:48,995 Memtable.java (line 176) setting live ratio to minimum of 1.0 instead of 0.8717337990312605 INFO [MemoryMeter:1] 2012-04-12 03:08:48,995 Memtable.java (line 186) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.7635719581514129 (just-counted was 1.0). calculation took 8ms for 738 columns INFO [HintedHandoff:1] 2012-04-12 03:09:31,269 HintedHandOffManager.java (line 292) Endpoint /64.6.104.18 died before hint delivery, aborting INFO [MemoryMeter:1] 2012-04-12 03:16:58,007 Memtable.java (line 186) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.7635719581514129 (just-counted was 1.0055416029080733). calculation took 19ms for 1762 columns INFO [HintedHandoff:1] 2012-04-12 03:19:54,924 HintedHandOffManager.java (line 292) Endpoint /64.6.104.18 died before hint delivery, aborting -- This message is automatically generated by JIRA. 
[jira] [Commented] (CASSANDRA-4140) Build stress classes in a location that allows tools/stress/bin/stress to find them
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253731#comment-13253731 ] Jonathan Ellis commented on CASSANDRA-4140: --- bq. Stress bin to a common tools/bin That's what I was referring to; I don't care where the .class or .jar files go. Build stress classes in a location that allows tools/stress/bin/stress to find them --- Key: CASSANDRA-4140 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.2 Reporter: Nick Bailey Assignee: Vijay Priority: Trivial Fix For: 1.2 Attachments: 0001-CASSANDRA-4140.patch Right now its hard to run stress from a checkout of trunk. You need to do 'ant artifacts' and then run the stress tool in the generated artifacts. A discussion on irc came up with the proposal to just move stress to the main jar, but the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4138) Add varint encoding to Serializing Cache
[ https://issues.apache.org/jira/browse/CASSANDRA-4138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253895#comment-13253895 ] Jonathan Ellis commented on CASSANDRA-4138: --- If we're relying on overriding writeInt etc, does that mean we're giving up being able to use varints over the network? (Not a concern for the cache obviously but I'm thinking ahead to CASSANDRA-3024.) Add varint encoding to Serializing Cache Key: CASSANDRA-4138 URL: https://issues.apache.org/jira/browse/CASSANDRA-4138 Project: Cassandra Issue Type: Sub-task Components: Core Affects Versions: 1.2 Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.2 Attachments: 0001-CASSANDRA-4138-Take1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
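For readers unfamiliar with the technique under discussion, here is a generic varint sketch (7 data bits per byte, high bit as continuation flag). This is the textbook encoding, shown only to illustrate the space savings; Cassandra's actual vint format differs in its details.

```python
# Generic unsigned varint: small values take one byte instead of four/eight.
# High bit set means "more bytes follow"; low 7 bits carry the payload.

def encode_varint(n):
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)   # continuation bit: more bytes coming
        else:
            out.append(b)
            return bytes(out)

def decode_varint(data):
    n = shift = 0
    for b in data:
        n |= (b & 0x7F) << shift
        if not (b & 0x80):         # last byte of this value
            return n
        shift += 7
```

Values up to 127 fit in one byte and values up to 16383 in two, which is where the cache (and, per the comment, eventually the wire format) wins over fixed-width ints.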
[jira] [Commented] (CASSANDRA-4143) HH delivery should not be attempted when target node is down
[ https://issues.apache.org/jira/browse/CASSANDRA-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252510#comment-13252510 ] Jonathan Ellis commented on CASSANDRA-4143: --- I'd be fine pushing that down to debug, though. HH delivery should not be attempted when target node is down Key: CASSANDRA-4143 URL: https://issues.apache.org/jira/browse/CASSANDRA-4143 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.0.9 Reporter: Radim Kolar Priority: Minor Look at this log fragment. Cassandra tries to do HH delivery every 10 minutes even if host is marked down by gossip. INFO [GossipTasks:1] 2012-04-12 03:01:55,040 Gossiper.java (line 818) InetAddress /64.6.104.18 is now dead. INFO [MemoryMeter:1] 2012-04-12 03:04:12,503 Memtable.java (line 186) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.7635719581514129 (just-counted was 1.7635719581514129). calculation took 224ms for 226 columns WARN [MemoryMeter:1] 2012-04-12 03:08:48,995 Memtable.java (line 176) setting live ratio to minimum of 1.0 instead of 0.8717337990312605 INFO [MemoryMeter:1] 2012-04-12 03:08:48,995 Memtable.java (line 186) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.7635719581514129 (just-counted was 1.0). calculation took 8ms for 738 columns INFO [HintedHandoff:1] 2012-04-12 03:09:31,269 HintedHandOffManager.java (line 292) Endpoint /64.6.104.18 died before hint delivery, aborting INFO [MemoryMeter:1] 2012-04-12 03:16:58,007 Memtable.java (line 186) CFS(Keyspace='system', ColumnFamily='HintsColumnFamily') liveRatio is 1.7635719581514129 (just-counted was 1.0055416029080733). calculation took 19ms for 1762 columns INFO [HintedHandoff:1] 2012-04-12 03:19:54,924 HintedHandOffManager.java (line 292) Endpoint /64.6.104.18 died before hint delivery, aborting -- This message is automatically generated by JIRA. 
[jira] [Commented] (CASSANDRA-4142) OOM Exception during repair session with LeveledCompactionStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252517#comment-13252517 ] Jonathan Ellis commented on CASSANDRA-4142: --- bq. Eclipse Memory Analyzer's denominator tree shows that 99% of a SSTableBoundedScanner object's memory is consumed by a CompressedRandomAccessReader which contains two big byte arrays. One would be the 64KB buffer in RandomAccessReader; the other is the CRAR compression buffer. The comments in CRAR say that it can't use super.read, so is the RAR buffer wasted? bq. an ArrayList of SSTableBoundedScanner which appears to contain as many objects as there are SSTables on disk Not sure how badly validation needs all of these at once. It definitely seems like we could limit it to at most the overlapping sstables for a single L1 target at a time, though, which would cut it by a factor of 10. OOM Exception during repair session with LeveledCompactionStrategy -- Key: CASSANDRA-4142 URL: https://issues.apache.org/jira/browse/CASSANDRA-4142 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0.6 Environment: OS: Linux CentOs 6 JDK: Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode) Node configuration: Quad-core 10 GB RAM Xmx set to 2,5 GB (as computed by default). Reporter: Romain Hardouin We encountered an OOM Exception on 2 nodes during repair session. Our CF are set up to use LeveledCompactionStrategy and SnappyCompressor. These two options used together maybe the key to the problem. Despite of setting XX:+HeapDumpOnOutOfMemoryError, no dump have been generated. Nonetheless a memory analysis on a live node doing a repair reveals an hotspot: an ArrayList of SSTableBoundedScanner which appears to contain as many objects as there are SSTables on disk. This ArrayList consumes 786 MB of the heap space for 5757 objects. Therefore each object is about 140 KB. 
Eclipse Memory Analyzer's dominator tree shows that 99% of an SSTableBoundedScanner object's memory is consumed by a CompressedRandomAccessReader which contains two big byte arrays. Cluster information: 9 nodes, each node handles 35 GB (RandomPartitioner). This JIRA was created following this discussion: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Why-so-many-SSTables-td7453033.html
[jira] [Commented] (CASSANDRA-3974) Per-CF TTL
[ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252522#comment-13252522 ] Jonathan Ellis commented on CASSANDRA-3974: --- bq. In the initial patch, I had made changes to both UpdateStatement.addToMutation and ColumnFamily.addColumn to use the larger of the column's TTL or the column family default TTL Oops, I totally missed the addColumn changes. That's exactly what I had in mind. It sounds like you have an updated patch, could you post that? Per-CF TTL -- Key: CASSANDRA-3974 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Assignee: Kirk True Priority: Minor Fix For: 1.2 Attachments: trunk-3974.txt Per-CF TTL would allow compaction optimizations (drop an entire sstable's worth of expired data) that we can't do with per-column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2635) make cache skipping optional
[ https://issues.apache.org/jira/browse/CASSANDRA-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252793#comment-13252793 ] Jonathan Ellis commented on CASSANDRA-2635: --- Skimming this it looks like this turns off the cache hints for the read side as well as for writes. I don't see any benefit to disabling it for reads -- if there's nothing to evict the data, it won't get evicted even with the dontneed hint. It's on the write side that we need to tell it, go ahead and cache this sstable that i'm writing now. make cache skipping optional Key: CASSANDRA-2635 URL: https://issues.apache.org/jira/browse/CASSANDRA-2635 Project: Cassandra Issue Type: Improvement Reporter: Peter Schuller Assignee: Harish Doddi Priority: Minor Attachments: CASSANDRA-2635-075.txt, CASSANDRA-2635-trunk-1.txt, CASSANDRA-2635-trunk.txt We've applied this patch locally in order to turn of page skipping; not completely but only for compaction/repair situations where it can be directly detrimental in the sense of causing data to become cold even though your entire data set fits in memory. It's better than completely disabling DONTNEED because the cache skipping does make sense and has no relevant (that I can see) detrimental effects in some cases, like when dumping caches. The patch is against 0.7.5 right now but if the change is desired I can make a patch for trunk. Also, the name of the configuration option is dubious since saying 'false' does not actually turn it off completely. I wasn't able to figure out a good name that conveyed the functionality in a short brief name however. A related concern as discussed in CASSANDRA-1902 is that the cache skipping isn't fsync:ing and so won't work reliably on writes. If the feature is to be retained that's something to fix in a different ticket. A question is also whether to retain the default to true or change it to false. I'm kinda leaning to false since it's detrimental in the easy cases of little data. 
In big cases with lots of data people will have to think and tweak anyway, so better to put the burden on that end.
[jira] [Commented] (CASSANDRA-3974) Per-CF TTL
[ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251561#comment-13251561 ] Jonathan Ellis commented on CASSANDRA-3974: --- bq. Part of the code I changed was in CFMetaData's toThrift and fromThrift methods Let me back up. I can see two main approaches towards respecting the per-CF ttl: # Set the column TTL to the max(column, CF) ttl on insert; then the rest of the code doesn't have to know anything changed # Take max(column, CF) ttl during operations like compaction, and leave column ttl to specify *only* the column TTL The code in UpdateStatement led me to believe you're going with option 1. So what I meant by my comment was, you need to make a similar change for inserts done over Thrift RPC, as well. (to/from Thrift methods are used for telling Thrift clients about the schema, but are not used for insert/update operations.) Does that help? bq. Sorry, I'm not sure to which part of the code you're referring CFMetadata.getTimeToLive. Sounds like you addressed this anyway. Per-CF TTL -- Key: CASSANDRA-3974 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Assignee: Kirk True Priority: Minor Fix For: 1.2 Attachments: trunk-3974.txt Per-CF TTL would allow compaction optimizations (drop an entire sstable's worth of expired data) that we can't do with per-column. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
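Option 1 from the comment above, resolving the effective TTL once at insert time so the rest of the write path is untouched, can be sketched as follows. This is an illustrative sketch; the function names are hypothetical, not Cassandra's.

```python
# Option 1 sketch: take max(column TTL, CF default TTL) when the column is
# added, so compaction and reads need no knowledge of the per-CF default.
# None/0 stands for "no TTL set".

def effective_ttl(column_ttl, cf_default_ttl):
    return max(column_ttl or 0, cf_default_ttl or 0) or None

def add_column(row, name, value, column_ttl=None, cf_default_ttl=None):
    row[name] = (value, effective_ttl(column_ttl, cf_default_ttl))
```

The same resolution would have to happen on every insert path (CQL's UpdateStatement *and* the Thrift RPC path), which is exactly the point of the comment.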
[jira] [Commented] (CASSANDRA-4140) Move stress to main jar
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252027#comment-13252027 ] Jonathan Ellis commented on CASSANDRA-4140: --- bq. Right now its hard to run stress from a checkout of trunk Wow, that's a definition of hard I'm unfamiliar with. :) I would rather move *more* non-core things to tools/ (sstable export/import, sstablekeys, sstableloader), than the other way around. Move stress to main jar --- Key: CASSANDRA-4140 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.2 Reporter: Nick Bailey Priority: Trivial Fix For: 1.2 Right now its hard to run stress from a checkout of trunk. You need to do 'ant artifacts' and then run the stress tool in the generated artifacts. A discussion on irc came up with the proposal to just move stress to the main jar, but the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4140) Move stress to main jar
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252028#comment-13252028 ] Jonathan Ellis commented on CASSANDRA-4140: --- To clarify, stress build is broken post-CASSANDRA-4103, but let's stuff everything in one big jar is not my preferred solution. Move stress to main jar --- Key: CASSANDRA-4140 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.2 Reporter: Nick Bailey Priority: Trivial Fix For: 1.2 Right now its hard to run stress from a checkout of trunk. You need to do 'ant artifacts' and then run the stress tool in the generated artifacts. A discussion on irc came up with the proposal to just move stress to the main jar, but the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4140) Move stress to main jar
[ https://issues.apache.org/jira/browse/CASSANDRA-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252032#comment-13252032 ] Jonathan Ellis commented on CASSANDRA-4140: --- bq. I do think moving things like you mentioned into tools makes them less 'discoverable' I'm totally fine with that. In fact I think it's a feature: 99% of people using sstable2json are Doing It Wrong; it's meant to be a debugging tool, end of story. Move stress to main jar --- Key: CASSANDRA-4140 URL: https://issues.apache.org/jira/browse/CASSANDRA-4140 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.2 Reporter: Nick Bailey Priority: Trivial Fix For: 1.2 Right now its hard to run stress from a checkout of trunk. You need to do 'ant artifacts' and then run the stress tool in the generated artifacts. A discussion on irc came up with the proposal to just move stress to the main jar, but the stress/stressd bash scripts in bin/, and drop the tools directory altogether. It will be easier for users to find that way and will make running stress from a checkout much easier. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2319) Promote row index
[ https://issues.apache.org/jira/browse/CASSANDRA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252058#comment-13252058 ] Jonathan Ellis commented on CASSANDRA-2319: --- bq. I don't see an easy way to merge those 2 settings into 1 if that was what you were hinting to. Yes, that's where I was going. What if we dropped the main index and just kept the sample index of every 1/128 columns? Seems like we'd trade a little more seq i/o to do less random i/o, and being able to get rid of the index sampling phase on startup... Promote row index - Key: CASSANDRA-2319 URL: https://issues.apache.org/jira/browse/CASSANDRA-2319 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Stu Hood Assignee: Sylvain Lebresne Labels: index, timeseries Fix For: 1.2 Attachments: 2319-v1.tgz, 2319-v2.tgz, promotion.pdf, version-f.txt, version-g-lzf.txt, version-g.txt The row index contains entries for configurably sized blocks of a wide row. For a row of appreciable size, the row index ends up directing the third seek (1. index, 2. row index, 3. content) to nearby the first column of a scan. Since the row index is always used for wide rows, and since it contains information that tells us whether or not the 3rd seek is necessary (the column range or name we are trying to slice may not exist in a given sstable), promoting the row index into the sstable index would allow us to drop the maximum number of seeks for wide rows back to 2, and, more importantly, would allow sstables to be eliminated using only the index. An example usecase that benefits greatly from this change is time series data in wide rows, where data is appended to the beginning or end of the row. 
Our existing compaction strategy gets lucky and clusters the oldest data in the oldest sstables: for queries to recently appended data, we would be able to eliminate wide rows using only the sstable index, rather than needing to seek into the data file to determine that it isn't interesting. For narrow rows, this change would have no effect, as they will not reach the threshold for indexing anyway. A first cut design for this change would look very similar to the file format design proposed on #674: http://wiki.apache.org/cassandra/FileFormatDesignDoc: row keys clustered, column names clustered, and offsets clustered and delta encoded.
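The "keep only the 1/128 sample" idea from the comment above trades a bounded sequential scan for fewer random seeks. A minimal sketch of the lookup (illustrative only; names and the sample interval are assumptions for the example, not Cassandra's implementation):

```python
import bisect

# Keep every 128th index entry; to find a name, binary-search the sample,
# then scan at most 128 entries sequentially from the preceding sampled
# position. No full index means no index-sampling phase at startup.

SAMPLE_EVERY = 128

def build_sample(sorted_names):
    return sorted_names[::SAMPLE_EVERY]

def lookup(sorted_names, sample, target):
    i = bisect.bisect_right(sample, target) - 1
    start = max(i, 0) * SAMPLE_EVERY
    for pos in range(start, min(start + SAMPLE_EVERY, len(sorted_names))):
        if sorted_names[pos] == target:
            return pos
    return -1
```

The worst case is one extra sequential read of 128 entries, which is the "a little more seq I/O to do less random I/O" trade-off the comment describes.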
[jira] [Commented] (CASSANDRA-4052) Add way to force the cassandra-cli to refresh its schema
[ https://issues.apache.org/jira/browse/CASSANDRA-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252203#comment-13252203 ] Jonathan Ellis commented on CASSANDRA-4052: --- Ah, I bet you're right. Add way to force the cassandra-cli to refresh its schema - Key: CASSANDRA-4052 URL: https://issues.apache.org/jira/browse/CASSANDRA-4052 Project: Cassandra Issue Type: Improvement Components: Tools Affects Versions: 1.0.8 Reporter: Tupshin Harper Priority: Minor By design, the cassandra-cli caches the schema and doesn't refresh it when various commands like describe keyspaces are run. This is reasonable, and it is easy enough to restart the cli if necessary. However, this does lead to confusion since a new user can reasonably assume that describe keyspaces will always show an accurate current representation of the ring. We should find a way to reduce the surprise (and lack of easy discoverability) of this behaviour. I propose any one of the following (#1 is probably the easiest and most likely): 1) Add a command (that would be documented in the cli's help) to explicitly refresh the schema (schema refresh, refresh schema, or anything similar). 2) Always force a refresh of the schema when performing at least the describe keyspaces command. 3) Add a flag to cassandra-cli to explicitly enable schema caching. If that flag is not passed, then schema caching will be disabled for that session. This suggestion assumes that for simple deployments (few CFs, etc), schema caching isn't very important to the performance of the cli. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
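Option 1 above amounts to a cache with an explicit invalidation command. A hypothetical sketch of that behaviour; SchemaCache and its methods are invented stand-ins, not the CLI's real classes:

```java
/** Hypothetical sketch of option 1: cache the schema, refetch only on demand. */
class SchemaCache {
    private Object cached;  // would be the list of KsDef in the real CLI
    private final java.util.function.Supplier<Object> fetch; // fetches from the cluster

    SchemaCache(java.util.function.Supplier<Object> fetch) {
        this.fetch = fetch;
    }

    /** `describe keyspaces` serves the cached copy, fetching once if empty. */
    Object describeKeyspaces() {
        if (cached == null)
            cached = fetch.get();
        return cached;
    }

    /** A `refresh schema` command would simply drop the cache. */
    void refresh() {
        cached = null;
    }
}
```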
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250651#comment-13250651 ] Jonathan Ellis commented on CASSANDRA-3647: --- Well, the ResultSet isn't written in stone itself; it's always been kind of a placeholder pending CASSANDRA-2478. Our custom transport could represent the entire resultset in json (or Smile) if we want, which is the approach unql appears to take: http://www.unqlspec.org/display/UnQL/Example+Queries+and+Usage For the existing Thrift transport though I'm not super concerned about it, as long as we come up with something halfway reasonable (which json-encoding qualifies as), I'm okay with it. Alternatively, we could use a more compact, custom format leveraging the fact that we know the types involved (and thus don't need to implicitly encode those in an inefficient representation), e.g. for Map number of entries followed by key/value pairs in native binary format. Support arbitrarily nested documents in CQL - Key: CASSANDRA-3647 URL: https://issues.apache.org/jira/browse/CASSANDRA-3647 Project: Cassandra Issue Type: New Feature Components: API, Core Reporter: Jonathan Ellis Labels: cql Composite columns introduce the ability to have arbitrarily nested data in a Cassandra row. We should expose this through CQL. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
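The compact format floated above (number of entries followed by key/value pairs in native binary, relying on the schema to fix the types so no per-entry type tags are needed) can be sketched as follows; MapCodec is illustrative only, not proposed Cassandra code, and assumes a map<int, text> field:

```java
import java.io.*;
import java.util.*;

/** Hypothetical sketch of the compact encoding: because the schema tells us
 *  the types (here map<int, text>), we write a count followed by raw
 *  key/value pairs, with no per-entry type information. */
class MapCodec {
    static byte[] encode(SortedMap<Integer, String> m) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(m.size());                 // number of entries
            for (Map.Entry<Integer, String> e : m.entrySet()) {
                out.writeInt(e.getKey());           // fixed-width int key
                out.writeUTF(e.getValue());         // length-prefixed text value
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);      // in-memory streams won't throw
        }
    }

    static SortedMap<Integer, String> decode(byte[] b) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(b));
            SortedMap<Integer, String> m = new TreeMap<>();
            int n = in.readInt();
            for (int i = 0; i < n; i++)
                m.put(in.readInt(), in.readUTF());
            return m;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compare this with a JSON encoding of the same map, which would spend bytes on braces, quotes, and decimal digits for every entry.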
[jira] [Commented] (CASSANDRA-3710) Add a configuration option to disable snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251096#comment-13251096 ] Jonathan Ellis commented on CASSANDRA-3710: --- This is a new one for me: {noformat} $ patch -p1 < auto_snapshot_2.diff missing header for unified diff at line 2 of patch {noformat} Add a configuration option to disable snapshots --- Key: CASSANDRA-3710 URL: https://issues.apache.org/jira/browse/CASSANDRA-3710 Project: Cassandra Issue Type: New Feature Reporter: Brandon Williams Assignee: Dave Brosius Priority: Minor Attachments: Cassandra107Patch_TestModeV1.txt, auto_snapshot.diff, auto_snapshot_2.diff Let me first say, I hate this idea. It gives cassandra the ability to permanently delete data at a large scale without any means of recovery. However, I've seen this requested multiple times, and it is in fact useful in some scenarios, such as when your application is using an embedded cassandra instance for testing and needs to truncate, which without JNA will timeout more often than not. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3785) Support slice with exclusive start and stop
[ https://issues.apache.org/jira/browse/CASSANDRA-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251106#comment-13251106 ] Jonathan Ellis commented on CASSANDRA-3785: --- +1 Support slice with exclusive start and stop --- Key: CASSANDRA-3785 URL: https://issues.apache.org/jira/browse/CASSANDRA-3785 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Labels: cql3 Fix For: 1.1.1 Attachments: 3785.patch Currently, slices are always start and end inclusive. However, for CQL 3.0, we already differentiate between inclusivity/exclusivity for the row key and for the component of composite columns. It would be nice to always support that distinction. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
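The requested distinction (inclusive vs. exclusive on each end of a slice) maps directly onto Java's NavigableSet.subSet, which is a convenient way to picture the semantics over a sorted set of column names. Slices is an invented helper for illustration, not Cassandra code:

```java
import java.util.*;

/** Sketch of inclusive/exclusive slice bounds over a sorted column-name set.
 *  TreeSet.subSet already models independent inclusivity on both ends. */
class Slices {
    static SortedSet<String> slice(NavigableSet<String> columns,
                                   String start, boolean startInclusive,
                                   String stop, boolean stopInclusive) {
        // subSet handles all four inclusive/exclusive combinations
        return columns.subSet(start, startInclusive, stop, stopInclusive);
    }
}
```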
[jira] [Commented] (CASSANDRA-4134) Do not send hints before a node is fully up
[ https://issues.apache.org/jira/browse/CASSANDRA-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251108#comment-13251108 ] Jonathan Ellis commented on CASSANDRA-4134: --- But it *is* up, it just doesn't have the most recent schema (which it has no way of knowing). Do not send hints before a node is fully up --- Key: CASSANDRA-4134 URL: https://issues.apache.org/jira/browse/CASSANDRA-4134 Project: Cassandra Issue Type: Bug Reporter: Joaquin Casares Priority: Minor After seeing this on a cluster and working with Pavel, we have seen the following errors disappear after all migrations have been applied: {noformat} ERROR [MutationStage:1] 2012-04-09 18:16:00,240 RowMutationVerbHandler.java (line 61) Error in row mutation org.apache.cassandra.db.UnserializableColumnFamilyException: Couldn't find cfId=1028 at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:129) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:401) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:409) at org.apache.cassandra.db.RowMutation.fromBytes(RowMutation.java:357) at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:42) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) and ERROR [ReadStage:69] 2012-04-09 18:16:01,715 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadStage:69,5,main] java.lang.IllegalArgumentException: Unknown ColumnFamily content_indexes in keyspace linkcurrent at org.apache.cassandra.config.Schema.getComparator(Schema.java:223) at org.apache.cassandra.db.ColumnFamily.getComparatorFor(ColumnFamily.java:300) at 
org.apache.cassandra.db.ReadCommand.getComparator(ReadCommand.java:92) at org.apache.cassandra.db.SliceByNamesReadCommand.init(SliceByNamesReadCommand.java:44) at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:106) at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:74) at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132) at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} It seems as though as soon as the correct Migration is applied, the Hints are accepted. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4134) Do not send hints before a node is fully up
[ https://issues.apache.org/jira/browse/CASSANDRA-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251141#comment-13251141 ] Jonathan Ellis commented on CASSANDRA-4134: --- Hmm, yes (on the sending side). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249933#comment-13249933 ] Jonathan Ellis commented on CASSANDRA-3647: --- bq. So if you want to make a standard way of setting up composite columns for maps/lists/sets, I think this issue can be hi-jacked for that. If you want to add a new type of column that supports redis like map/set/list operations, I would make a new issue. Well, it's both. Because we do want the latter, but implemented as the former. I suppose it's reasonable to split out CQL operations (list append/pop, map get/put/remove, set add/remove) to another ticket. So first, we're going to need to support heterogeneous comparators somehow. Consider this table declaration: {code} CREATE TABLE foo ( id int PRIMARY KEY, field1 text, field2 map<int, text>, field3 list<text> ); {code} The Cassandra CF containing these rows will contain single-level columns ({{field1}}), CT(ascii, int) ({{field2}}), and CT(ascii, uuid) ({{field3}}), assuming that we represent lists with v1 uuid column names, which seems like the best option to me. CASSANDRA-3657 gets us part of the way there (all CF column names will have the same prefix, which is the CQL column name) but not all the way. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
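As a toy model of the layout described above, each entry of a collection field becomes one storage-engine column whose name is the pair (CQL field name, collection key). The sketch below fakes the composite with a zero-padded string key so that plain string comparison preserves integer order; the real CompositeType instead compares each component with its own type, and CompositeColumns is an invented name:

```java
import java.util.*;

/** Hypothetical sketch: a CQL map field stored as composite columns.
 *  Each entry of field2 map<int, text> becomes one column whose name is the
 *  pair (cql field name, map key); the storage engine just sees ordinary
 *  sorted columns with a common prefix. */
class CompositeColumns {
    static SortedMap<String, String> toColumns(String field, Map<Integer, String> map) {
        SortedMap<String, String> row = new TreeMap<>();
        for (Map.Entry<Integer, String> e : map.entrySet())
            // zero-pad so string order matches int order; a real composite
            // comparator would compare the int component natively
            row.put(field + ":" + String.format("%010d", e.getKey()), e.getValue());
        return row;
    }
}
```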
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249952#comment-13249952 ] Jonathan Ellis commented on CASSANDRA-3647: --- bq. a library to represent utf8 as byte[] instead of String e.g., https://github.com/jruby/bytelist/blob/master/src/org/jruby/util/ByteList.java -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4118) ConcurrentModificationException in ColumnFamily.updateDigest(ColumnFamily.java:294) (cassandra 1.0.8)
[ https://issues.apache.org/jira/browse/CASSANDRA-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249955#comment-13249955 ] Jonathan Ellis commented on CASSANDRA-4118: --- Also note that Solandra *does* use StorageProxy directly instead of going through Thrift. ConcurrentModificationException in ColumnFamily.updateDigest(ColumnFamily.java:294) (cassandra 1.0.8) -- Key: CASSANDRA-4118 URL: https://issues.apache.org/jira/browse/CASSANDRA-4118 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.8 Environment: two nodes, replication factor=2 Reporter: Zklanu Ryś Assignee: Vijay Fix For: 1.0.10, 1.1.0 Sometimes when reading data I receive them without any exception but I can see in Cassandra logs, that there is an error: ERROR [ReadRepairStage:58] 2012-04-05 12:04:35,732 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[ReadRepairStage:58,5,main] java.util.ConcurrentModificationException at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) at java.util.AbstractList$Itr.next(AbstractList.java:343) at org.apache.cassandra.db.ColumnFamily.updateDigest(ColumnFamily.java:294) at org.apache.cassandra.db.ColumnFamily.digest(ColumnFamily.java:288) at org.apache.cassandra.service.RowDigestResolver.resolve(RowDigestResolver.java:102) at org.apache.cassandra.service.RowDigestResolver.resolve(RowDigestResolver.java:30) at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.runMayThrow(ReadCallback.java:227) at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250234#comment-13250234 ] Jonathan Ellis commented on CASSANDRA-3647: --- bq. we could very well have a UnionType comparator that takes argument and allow a component to be one of different comparator Maybe. Note that we never actually need to *compare* the different types, since sub-components of types X and Y will always have a different parent component. We just need to *allow* them. Also, whether the field is a map/list/set is irrelevant for the purposes of the storage engine. (All the operations I propose can be done as a single CT insert operation, without read-before-write. Except for pop, which I didn't think through and I withdraw. :) So not sure whether representing that as part of the Comparator is the right thing to do. That is, QueryProcessor will need to know that some columns should be bundled together as a Map, but ColumnFamilyStore and beneath won't care. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250261#comment-13250261 ] Jonathan Ellis commented on CASSANDRA-3647: --- Those are good ideas, but I'm going to be the bad guy in the name of keeping this in-scope for 1.2. :) The crucial idea here is to provide *nested* collections for convenience; if it's so big you need to page it, it probably belongs in a separate row. So paging is not on my short list to start with. I also chose the word list over deque because I cannot think of a way to provide push-front efficiently (i.e., without read-before-write, and without update-all-existing-list-items). As I mentioned in passing, we can provide append efficiently by using v1 uuids as column names (and translating to list indexes in QueryProcessor), but that doesn't give us anything else for free. Open to suggestions if you have a better design in mind, of course. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
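The efficient-append case described above can be modeled with time-ordered column names: append is a blind write (no read-before-write), and list order falls out of column sort order. A sketch with a monotonic counter standing in for v1 uuids; TimeOrderedList is an invented name:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of the append-only list idea: elements live under time-ordered
 *  column names (v1 uuids in the real proposal; a monotonic counter here),
 *  so append is a blind write and list order is column sort order. */
class TimeOrderedList {
    private final SortedMap<Long, String> columns = new TreeMap<>();
    private final AtomicLong clock = new AtomicLong(); // stand-in for a timeuuid

    void append(String value) {
        columns.put(clock.incrementAndGet(), value); // no read needed
    }

    /** QueryProcessor-side view: translate column order to list indexes. */
    List<String> asList() {
        return new ArrayList<>(columns.values());
    }
}
```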
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250301#comment-13250301 ] Jonathan Ellis commented on CASSANDRA-3647: --- Well, uuids are better than timestamps since that means you won't lose data if two clients update at the same time. But we can pretend we're using timestamps for simplicity here. Suppose that we're implementing field3.pushbefore(), then. What should the negative timestamp be? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3647) Support arbitrarily nested documents in CQL
[ https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250330#comment-13250330 ] Jonathan Ellis commented on CASSANDRA-3647: --- Ah, good idea. That should actually work fine. For UUIDs we'd need to pick some point in time as zero and then subtract from that, since there is no such thing as a negative time in that context, but that's a minor wrinkle and doesn't matter for the purposes of the list, since the uuid contents are an implementation detail as far as the user is concerned. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
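The trick agreed on here (pick a fixed zero point and write prepends at ever-smaller names below it, appends above it) keeps both ends blind writes. A sketch under those assumptions, with a counter standing in for uuids and PrependableList an invented name:

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of the prepend idea: a fixed zero point splits the key space.
 *  Appends take ever-larger keys above it, prepends ever-smaller keys
 *  below it; both are blind writes, and sort order yields the list. */
class PrependableList {
    private static final long ZERO = 0L;           // the chosen zero point
    private final SortedMap<Long, String> cols = new TreeMap<>();
    private final AtomicLong fwd = new AtomicLong(ZERO);
    private final AtomicLong back = new AtomicLong(ZERO);

    void append(String v)  { cols.put(fwd.incrementAndGet(), v); }
    void prepend(String v) { cols.put(back.decrementAndGet(), v); }

    List<String> asList() { return new ArrayList<>(cols.values()); }
}
```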
[jira] [Commented] (CASSANDRA-3690) Streaming CommitLog backup
[ https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249329#comment-13249329 ] Jonathan Ellis commented on CASSANDRA-3690: --- I do think we should make recycling always-on; it's a non-negligible performance win and so far we don't have a use case that requires disabling it. Streaming CommitLog backup -- Key: CASSANDRA-3690 URL: https://issues.apache.org/jira/browse/CASSANDRA-3690 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1.1 Attachments: 0001-CASSANDRA-3690-v2.patch, 0001-CASSANDRA-3690-v4.patch, 0001-Make-commitlog-recycle-configurable.patch, 0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch, 0005-cmmiting-comments-to-yaml.patch Problems with the current SST backups: 1) The current backup doesn't allow us to restore to a point in time (within a SST). 2) The current SST implementation needs the backup to read from the filesystem, and hence causes additional IO on the normal operational disks. 3) In 1.0 we removed the per-CF flush interval and the size at which a flush is triggered; for some use cases with few writes it becomes increasingly difficult to time it right. 4) Use cases that need external (non-Cassandra) BI need the data at regular intervals rather than waiting for longer or unpredictable intervals. Disadvantages of the new solution: 1) Overhead in processing the mutations during the recovery phase. 2) More complicated solution than just copying the file to the archive. Additional advantages: Online and offline restore. Close-to-live incremental backup. Note: If the listener agent gets restarted, it is the agent's responsibility to stream the files missed or incomplete. There are 3 options in the initial implementation: 1) Backup - Once a socket is connected we will switch the commit log and send new updates via the socket.
2) Stream - will take the absolute path of the file, read the file, and send the updates via the socket. 3) Restore - this will get the serialized bytes and applies the mutation. Side note (not related to this patch as such): The agent which will take incremental backups is planned to be open sourced soon (name: Priam). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.
[ https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248399#comment-13248399 ] Jonathan Ellis commented on CASSANDRA-4093: --- bq. we could allow defining something like {{create index on t(k1)}} Ah, right. Yes, I suppose there's no reason not to expose that to Thrift as well. schema_* CFs do not respect column comparator which leads to CLI commands failure. -- Key: CASSANDRA-4093 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 1.1.0 Reporter: Dave Brosius Assignee: Sylvain Lebresne Fix For: 1.1.0 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch ColumnDefinition.{ascii, utf8, bool, ...} static methods used to initialize schema_* CFs column_metadata do not respect CF comparator and use ByteBufferUtil.bytes(...) for column names which creates problems in CLI and probably in other places. The CompositeType validator throws exception on first column String columnName = columnNameValidator.getString(columnDef.name); Because it appears the composite type length header is wrong (25455) AbstractCompositeType.getWithShortLength java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:247) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:59) at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:139) at org.apache.cassandra.cli.CliClient.describeColumnFamily(CliClient.java:2046) at org.apache.cassandra.cli.CliClient.describeKeySpace(CliClient.java:1969) at org.apache.cassandra.cli.CliClient.executeShowKeySpaces(CliClient.java:1574) (seen in trunk) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
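The 25455 in the report is consistent with the composite parser misreading a raw (non-composite) column name: a CompositeType component is framed as a 2-byte length, the bytes, and an end-of-component byte, so handing the parser a plain name makes its first two bytes the "length" (for instance the ASCII bytes of "co" give 99*256 + 111 = 25455). A sketch of that framing; CompositeComponent is illustrative, not the real AbstractCompositeType code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of composite-component framing: 2-byte length, value bytes, and an
 *  end-of-component byte. A plain column name fed to this parser has its
 *  first two characters misread as the length, producing garbage sizes. */
class CompositeComponent {
    static ByteBuffer encode(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer out = ByteBuffer.allocate(2 + b.length + 1);
        out.putShort((short) b.length).put(b).put((byte) 0); // length, value, EOC
        out.flip();
        return out;
    }

    /** The length the composite parser would read from an arbitrary buffer. */
    static int peekLength(ByteBuffer bb) {
        return bb.getShort(bb.position()) & 0xFFFF; // unsigned short
    }
}
```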
[jira] [Commented] (CASSANDRA-4129) Cannot create keyspace with specific keywords through cli
[ https://issues.apache.org/jira/browse/CASSANDRA-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248478#comment-13248478 ] Jonathan Ellis commented on CASSANDRA-4129: --- +1 Cannot create keyspace with specific keywords through cli - Key: CASSANDRA-4129 URL: https://issues.apache.org/jira/browse/CASSANDRA-4129 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.8 Reporter: Manoj Kanta Mainali Assignee: Pavel Yaskevich Priority: Minor Fix For: 1.0.10 Attachments: CASSANDRA-4129.patch Keyspaces cannot be created through the CLI when the keyspace name is one of the CLI's keywords, such as 'keyspace', 'family', etc. Even surrounding the keyspace name with quotation marks does not solve the problem. However, such keyspaces can be created through other clients such as Hector. This is similar to the issue CASSANDRA-3195, in which the column families could not be created. Similar to the solution of CASSANDRA-3195, using String keyspaceName = CliUtil.unescapeSQLString(statement.getChild(0).getText()) in executeAddKeySpace would solve the problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.
[ https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248615#comment-13248615 ] Jonathan Ellis commented on CASSANDRA-4093: --- bq. I guess we can go with v2 and expose componentIndex later when it makes sense on the thrift. That's what I had in mind. We should keep dclocal_r_r as parameter 37 in the thrift idl to make it compatible w/ beta clients. Otherwise, +1 on v2 and backport of 4037. schema_* CFs do not respect column comparator which leads to CLI commands failure. -- Key: CASSANDRA-4093 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 1.1.0 Reporter: Dave Brosius Assignee: Sylvain Lebresne Fix For: 1.1.0 Attachments: 4093.txt, 4093_v2.txt, CASSANDRA-4093-CD-changes.patch ColumnDefinition.{ascii, utf8, bool, ...} static methods used to initialize schema_* CFs column_metadata do not respect CF comparator and use ByteBufferUtil.bytes(...) for column names which creates problems in CLI and probably in other places.
The CompositeType validator throws exception on first column String columnName = columnNameValidator.getString(columnDef.name); Because it appears the composite type length header is wrong (25455) AbstractCompositeType.getWithShortLength java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:247) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:59) at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:139) at org.apache.cassandra.cli.CliClient.describeColumnFamily(CliClient.java:2046) at org.apache.cassandra.cli.CliClient.describeKeySpace(CliClient.java:1969) at org.apache.cassandra.cli.CliClient.executeShowKeySpaces(CliClient.java:1574) (seen in trunk) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4115) UNREACHABLE schema after decommissioning a non-seed node
[ https://issues.apache.org/jira/browse/CASSANDRA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248882#comment-13248882 ] Jonathan Ellis commented on CASSANDRA-4115: --- bq. The only way I can see this happening on 1.0 is if ring delay is set to something lower than the gossip interval *2 Pretty sure Tyler didn't mess with ring delay. UNREACHABLE schema after decommissioning a non-seed node Key: CASSANDRA-4115 URL: https://issues.apache.org/jira/browse/CASSANDRA-4115 Project: Cassandra Issue Type: Bug Environment: ccm using the following unavailable_schema_test.py dtest. Reporter: Tyler Patterson Assignee: Brandon Williams Priority: Minor Attachments: 4115.txt decommission a non-seed node, sleep 30 seconds, then use thrift to check the schema. UNREACHABLE is listed: {'75dc4c07-3c1a-3013-ad7d-11fb34208465': ['127.0.0.1'], 'UNREACHABLE': ['127.0.0.2']} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3883) CFIF WideRowIterator only returns batch size columns
[ https://issues.apache.org/jira/browse/CASSANDRA-3883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249013#comment-13249013 ] Jonathan Ellis commented on CASSANDRA-3883: --- bq. Optimally, we'd have a way to express I'm at this column offset in this row, give me the next X number of columns, even if it requires going to the next row. But I'm not sure how to do that sanely, either What if we allowed mixing (start key, end token) in KeyRange? Wouldn't that fix it? - 1st get_paged_slice call: ((start token, end token), empty start column) from slice - subsequent get_paged_slice calls: ((last row key, end token), last column name) CFIF WideRowIterator only returns batch size columns Key: CASSANDRA-3883 URL: https://issues.apache.org/jira/browse/CASSANDRA-3883 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 1.1.0 Reporter: Brandon Williams Fix For: 1.1.0 Attachments: 3883-v1.txt Most evident with the word count, where there are 1250 'word1' items in two rows (1000 in one, 250 in another) and it counts 198 with the batch size set to 99. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
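The paging scheme proposed in the comment above (resume from (last row key, last column name) so a batch can cross row boundaries) can be sketched with a toy in-memory model. Names and data structures here are illustrative, not the actual Thrift KeyRange or CFIF WideRowIterator API:

```java
import java.util.*;

/** Sketch of the proposed paging: resume a wide-row scan from
 *  (last row key, last column name) so batches span row boundaries. */
public class WideRowPager {
    // toy data: row key -> sorted columns (stand-in for token-ordered rows)
    final NavigableMap<String, NavigableSet<String>> rows = new TreeMap<>();

    List<String[]> nextPage(String lastKey, String lastColumn, int batchSize) {
        List<String[]> page = new ArrayList<>();
        for (Map.Entry<String, NavigableSet<String>> e
                 : rows.tailMap(lastKey == null ? "" : lastKey, true).entrySet()) {
            NavigableSet<String> cols = e.getValue();
            // within the row we resumed on, skip past the last column seen
            if (e.getKey().equals(lastKey) && lastColumn != null)
                cols = cols.tailSet(lastColumn, false);
            for (String c : cols) {
                page.add(new String[] { e.getKey(), c });
                if (page.size() == batchSize) return page; // caller resumes from tail
            }
        }
        return page;
    }

    public static void main(String[] args) {
        WideRowPager p = new WideRowPager();
        p.rows.put("row1", new TreeSet<>(Arrays.asList("a", "b", "c", "d")));
        p.rows.put("row2", new TreeSet<>(Arrays.asList("x", "y", "z")));
        String lastKey = null, lastCol = null;
        int total = 0;
        while (true) {
            List<String[]> page = p.nextPage(lastKey, lastCol, 3);
            if (page.isEmpty()) break;
            total += page.size();
            String[] tail = page.get(page.size() - 1);
            lastKey = tail[0];
            lastCol = tail[1];
        }
        System.out.println(total); // 7 -- every column visited, across both rows
    }
}
```

The point of the sketch is the bug's fix: with the resume position carried across calls, all columns are eventually returned instead of only the first batch per split.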
[jira] [Commented] (CASSANDRA-3690) Streaming CommitLog backup
[ https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249119#comment-13249119 ] Jonathan Ellis commented on CASSANDRA-3690: --- bq. it will be good to have unique names in the backup sometimes so we don't overwrite If you think about it, target filename is just a suggestion... you'd be free to ignore it and generate a different filename (incorporating a timestamp for instance, or even a uuid) in the archive script. Streaming CommitLog backup -- Key: CASSANDRA-3690 URL: https://issues.apache.org/jira/browse/CASSANDRA-3690 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1.1 Attachments: 0001-CASSANDRA-3690-v2.patch, 0001-CASSANDRA-3690-v4.patch, 0001-Make-commitlog-recycle-configurable.patch, 0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch, 0005-cmmiting-comments-to-yaml.patch Problems with the current SST backups: 1) The current backup doesn't allow us to restore to a point in time (within an SST). 2) The current SST implementation needs the backup to read from the filesystem, causing additional I/O on the operational disks. 3) In 1.0 we removed the per-CF flush interval and size settings that control when a flush is triggered; for some use cases with few writes it becomes increasingly difficult to time it right. 4) Use cases which need external (non-Cassandra) BI need the data at regular intervals rather than waiting for longer or unpredictable intervals. Disadvantages of the new solution: 1) Overhead in processing the mutations during the recovery phase. 2) A more complicated solution than just copying the file to the archive. Additional advantages: online and offline restore; close-to-live incremental backup. Note: If the listener agent gets restarted, it is the agent's responsibility to stream the files that were missed or incomplete.
There are three options in the initial implementation: 1) Backup - once a socket is connected we will switch the commit log and send new updates via the socket. 2) Stream - will take the absolute path of the file, read the file, and send the updates via the socket. 3) Restore - this will get the serialized bytes and apply the mutation. Side NOTE: (Not related to this patch as such) The agent which will take incremental backups is planned to be open-sourced soon (Name: Priam). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3919) Dropping a column should do more than just remove the definition
[ https://issues.apache.org/jira/browse/CASSANDRA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247105#comment-13247105 ] Jonathan Ellis commented on CASSANDRA-3919: --- 3 feels reasonable to me. Dropping a column should do more than just remove the definition Key: CASSANDRA-3919 URL: https://issues.apache.org/jira/browse/CASSANDRA-3919 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Jonathan Ellis Assignee: Sylvain Lebresne Labels: compaction, cql Fix For: 1.1.1 Dropping a column should: - immediately make it unavailable for {{SELECT}}, including {{SELECT *}} - eventually (i.e., post-compaction) reclaim the space formerly used by that column -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3629) Bootstrapping nodes don't ensure schema is ready before continuing
[ https://issues.apache.org/jira/browse/CASSANDRA-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247118#comment-13247118 ] Jonathan Ellis commented on CASSANDRA-3629: --- The easiest workaround for this is CASSANDRA-3600, which is also 1.0-only. That much should be safe to backport... Bootstrapping nodes don't ensure schema is ready before continuing -- Key: CASSANDRA-3629 URL: https://issues.apache.org/jira/browse/CASSANDRA-3629 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Fix For: 1.0.7 Attachments: 0001-Wait-until-the-highest-known-schema-has-been-reached.txt A bootstrapping node will assume that after it has slept for RING_DELAY it has all of the schema migrations and can continue the bootstrap process. However, with a large enough amount of migrations this is not sufficient and causes problems. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4120) Gossip identifies hosts by IP
[ https://issues.apache.org/jira/browse/CASSANDRA-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247273#comment-13247273 ] Jonathan Ellis commented on CASSANDRA-4120: --- We should probably pick a different unique identifier, since IP is not really a unique identifier either. (E.g., host dies, new one joins the cluster with same IP.) This motivated the switch from using IP to using token in the past. Gossip identifies hosts by IP - Key: CASSANDRA-4120 URL: https://issues.apache.org/jira/browse/CASSANDRA-4120 Project: Cassandra Issue Type: Sub-task Components: Core Reporter: Sam Overton Assignee: Sam Overton Since there is no longer a one-to-one mapping of host to token, the IP should be used to identify a host. This impacts: * Gossip * Hinted Hand-off * some JMX operations (eg. assassinateEndpointUnsafe) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.
[ https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247578#comment-13247578 ] Jonathan Ellis commented on CASSANDRA-4093: --- bq. I'm not sure this will be so very confusing to users. What I'm worried about is that it forces us to distinguish between logical and physical columns again. I *love* that with CQL3 all I have to talk about is CQL columns and not have to dig out my diagrams of mapping CT to logical columns until someone starts to actually dig into the engine code. bq. the introduction of composite_index (this patch) is just the first part of CASSANDRA-3680. We will have to add it soon enough. It is hardly something added just for backward compatibility. I don't follow at all. 3680 just means we want to be able to create an index on a logical column that is part of a CT under the hood: i.e., exactly the same thing we can already represent with the current-as-of-today CFMetadata. I have virtually zero interest in supporting partial indexes as discussed in 3782; RDBMSes have shown pretty conclusively that this is a very niche feature. Very much in the category of let's take our time and add it if it makes sense, not just because we know how to do it. bq. It's untrue that column_aliases and value_alias were 'added before we had cqlsh'. You left out key_alias, which is what I was referring to having added in 0.8 well before we had cqlsh. I can only guess that we added column_aliases and value_aliases to thrift as well for the sake of consistency with that precedent. As you point out, though, it's not too late to rip those out and we probably should. bq. As soon as CASSANDRA-3680 is done, it will be useful to allow creating secondary indexes on a CT component on the thrift side. Why wouldn't we allow CASSANDRA-3680 on the thrift if it only cost us the addition of a simple int field in ColumnDef? Sounds like you're getting a little ahead of yourself.
Why not add it as part of 3680, should that be something we want to do? (But as I described above, I don't think it is.) schema_* CFs do not respect column comparator which leads to CLI commands failure. -- Key: CASSANDRA-4093 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 1.1.0 Reporter: Dave Brosius Assignee: Sylvain Lebresne Fix For: 1.1.0 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch
[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.
[ https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247591#comment-13247591 ] Jonathan Ellis commented on CASSANDRA-4093: --- bq. What I'm worried about is that it forces us to distinguish between logical and physical columns again. That said, you're probably right that for CQL users this is an implementation detail that they won't care about. And Thrift users can wallow in the mud to their hearts' content, I suppose. But, that doesn't mean we should gratuitously blur the lines between the two. So here is what I propose: - Change ColumnDefinition back to its old behavior. I guess we'll need to add the int to allow us to support CQL3 internally, but we don't need to expose it to thrift yet, if at all. (Alternatively we could add a cql_column_metadata that supports the new semantics.) - Remove column_aliases and value_aliases from Thrift. They serve no purpose there other than to give users a gun with which to shoot themselves in the foot. schema_* CFs do not respect column comparator which leads to CLI commands failure. -- Key: CASSANDRA-4093 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 1.1.0 Reporter: Dave Brosius Assignee: Sylvain Lebresne Fix For: 1.1.0 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch
[jira] [Commented] (CASSANDRA-3690) Streaming CommitLog backup
[ https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247729#comment-13247729 ] Jonathan Ellis commented on CASSANDRA-3690: --- I've made a bunch of minor changes and pushed to https://github.com/jbellis/cassandra/branches/3690-v3. I noticed that we need to wait for the archive to finish whether we end up recycling or not. Seems to me it would be simpler to continue to always recycle, but (as we have here) wait for the archive first. So archive can copy off to s3 or whatever directly, instead of ln somewhere else as an intermediate step. Total i/o will be lower and commitlog will create extra segments if needed in the meantime. Maybe we should also have a restore_list_segments command as well, so we can query s3 (again for instance) directly and have restore_command pull from there, rather than requiring a local directory? Streaming CommitLog backup -- Key: CASSANDRA-3690 URL: https://issues.apache.org/jira/browse/CASSANDRA-3690 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1.1 Attachments: 0001-CASSANDRA-3690-v2.patch, 0001-Make-commitlog-recycle-configurable.patch, 0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch, 0005-cmmiting-comments-to-yaml.patch
[jira] [Commented] (CASSANDRA-3690) Streaming CommitLog backup
[ https://issues.apache.org/jira/browse/CASSANDRA-3690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247738#comment-13247738 ] Jonathan Ellis commented on CASSANDRA-3690: --- Also: would be nice to get rid of the new Thread / busywait archive dance. If we used an ExecutorService instead, we could add the Future to the segment and just say segment.waitForArchive(), no looping. Streaming CommitLog backup -- Key: CASSANDRA-3690 URL: https://issues.apache.org/jira/browse/CASSANDRA-3690 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1.1 Attachments: 0001-CASSANDRA-3690-v2.patch, 0001-Make-commitlog-recycle-configurable.patch, 0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch, 0005-cmmiting-comments-to-yaml.patch
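The ExecutorService/Future idea in the comment above can be sketched as follows. Class and method names here are illustrative (Cassandra's actual segment class looks different); the point is that Future.get() replaces the new-Thread-plus-busywait dance:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: submit the archive task to an executor, keep the Future on the
 *  segment, and block on it instead of polling. Names are illustrative. */
public class ArchivingSegment {
    // daemon thread so the archiver never blocks JVM shutdown in this sketch
    private static final ExecutorService archiver =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "commitlog-archiver");
                t.setDaemon(true);
                return t;
            });

    private final Future<?> archiveTask;

    ArchivingSegment(Runnable archiveCommand) {
        // e.g. run the archive script that copies the segment off to s3
        archiveTask = archiver.submit(archiveCommand);
    }

    /** Wait for the archive to complete before recycling the segment. */
    void waitForArchive() throws InterruptedException, ExecutionException {
        archiveTask.get(); // no looping
    }

    public static void main(String[] args) throws Exception {
        ArchivingSegment seg = new ArchivingSegment(
                () -> System.out.println("archiving segment..."));
        seg.waitForArchive();
        System.out.println("safe to recycle");
    }
}
```

Future.get() also propagates any exception the archive command threw (wrapped in ExecutionException), so archive failures surface at the recycle point instead of being silently swallowed by a polling loop.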
[jira] [Commented] (CASSANDRA-3966) KeyCacheKey and RowCacheKey to use raw byte[]
[ https://issues.apache.org/jira/browse/CASSANDRA-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246428#comment-13246428 ] Jonathan Ellis commented on CASSANDRA-3966: --- can we also change serializeForStorage signature to {{int write(DataOutputStream out)}} to avoid unnecessary conversions back to ByteBuffer? KeyCacheKey and RowCacheKey to use raw byte[] - Key: CASSANDRA-3966 URL: https://issues.apache.org/jira/browse/CASSANDRA-3966 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0.0 Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1.1 Attachments: 0001-CASSANDRA-3966.patch We can just store the raw byte[] instead of a ByteBuffer. After reading the mail http://www.mail-archive.com/dev@cassandra.apache.org/msg03725.html : each ByteBuffer takes 48 bytes for housekeeping, which can be removed by just implementing hashCode and equals in KeyCacheKey and RowCacheKey http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/ByteBuffer.java#ByteBuffer.hashCode%28%29
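The improvement described above amounts to holding the raw byte[] and supplying hashCode/equals directly, so the cache key no longer pays ByteBuffer's per-instance bookkeeping. A minimal sketch (simplified relative to the real KeyCacheKey/RowCacheKey, which also carry descriptor/CF identity):

```java
import java.util.Arrays;

/** Sketch of a cache key over a raw byte[] with value-based equality. */
public final class RawCacheKey {
    private final byte[] key;

    public RawCacheKey(byte[] key) {
        this.key = key;
    }

    @Override
    public boolean equals(Object o) {
        // content comparison, matching what ByteBuffer.equals provided
        return o instanceof RawCacheKey && Arrays.equals(key, ((RawCacheKey) o).key);
    }

    @Override
    public int hashCode() {
        // content hash, matching the contract ByteBuffer.hashCode provided
        return Arrays.hashCode(key);
    }

    public static void main(String[] args) {
        RawCacheKey a = new RawCacheKey("row1".getBytes());
        RawCacheKey b = new RawCacheKey("row1".getBytes());
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```

Because equality and hashing are over the array contents, two keys built from equal byte arrays collide in a HashMap-based cache exactly as the ByteBuffer-backed keys did.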
[jira] [Commented] (CASSANDRA-1311) Triggers
[ https://issues.apache.org/jira/browse/CASSANDRA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246482#comment-13246482 ] Jonathan Ellis commented on CASSANDRA-1311: --- bq. all of our triggers we implemented simply make REST posts out to services that actually do the work I don't think REST calls should be a first-class citizen in the final api, since a main goal of triggers is to push the code closer to the data; calling out over the network cuts that off at the knees. But, obviously a js or jar based trigger could call out to a REST service if that's how you want to roll. Triggers Key: CASSANDRA-1311 URL: https://issues.apache.org/jira/browse/CASSANDRA-1311 Project: Cassandra Issue Type: New Feature Reporter: Maxim Grinev Fix For: 1.2 Attachments: HOWTO-PatchAndRunTriggerExample-update1.txt, HOWTO-PatchAndRunTriggerExample.txt, ImplementationDetails-update1.pdf, ImplementationDetails.pdf, trunk-967053.txt, trunk-984391-update1.txt, trunk-984391-update2.txt Asynchronous triggers are a basic mechanism to implement various use cases of asynchronous execution of application code at the database side. For example, to support indexes and materialized views, online analytics, push-based data propagation. Please find the motivation, triggers description and list of applications: http://maxgrinev.com/2010/07/23/extending-cassandra-with-asynchronous-triggers/ An example of using triggers for indexing: http://maxgrinev.com/2010/07/23/managing-indexes-in-cassandra-using-async-triggers/ Implementation details are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4115) UNREACHABLE schema after decommissioning a non-seed node
[ https://issues.apache.org/jira/browse/CASSANDRA-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246509#comment-13246509 ] Jonathan Ellis commented on CASSANDRA-4115: --- Also, did you mean to attach unavailable_schema_test.py somewhere? UNREACHABLE schema after decommissioning a non-seed node Key: CASSANDRA-4115 URL: https://issues.apache.org/jira/browse/CASSANDRA-4115 Project: Cassandra Issue Type: Bug Environment: ccm using the following unavailable_schema_test.py dtest. Reporter: Tyler Patterson Assignee: Brandon Williams Priority: Minor decommission a non-seed node, sleep 30 seconds, then use thrift to check the schema. UNREACHABLE is listed: {'75dc4c07-3c1a-3013-ad7d-11fb34208465': ['127.0.0.1'], 'UNREACHABLE': ['127.0.0.2']} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3974) Per-CF TTL
[ https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246621#comment-13246621 ] Jonathan Ellis commented on CASSANDRA-3974: --- Thanks Kirk! My comments: - Looks like this only updates the CQL path? We'd want to make the Thrift path cf-ttl-aware as well. I *think* this just means updating RowMutation + CF addColumn methods. - Nit: we could simplify getTTL a bit by adding assert ttl > 0. - I got it backwards: we want max(cf ttl, column ttl) to be able to reason about the live-ness of CF data w/o looking at individual rows - We can break the compaction optimizations into another ticket. It really needs a separate compaction Strategy; the idea is if we have an sstable A older than CF ttl, then all the data in the file is dead and we can just delete the file without looking at it row-by-row. However, there's a lot of tension there with the goal of normal compaction, which wants to merge different versions of the same row, so we're going to churn a lot with a low chance of ever having an sstable last the full TTL without being merged, effectively restarting our timer. So, I think we're best served by an ArchivingCompactionStrategy that doesn't merge sstables at all, just drops obsolete ones, and let people use that for append-only insert workloads. Which is a common enough case that it's worth the trouble... probably. :) Per-CF TTL -- Key: CASSANDRA-3974 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974 Project: Cassandra Issue Type: New Feature Reporter: Jonathan Ellis Assignee: Kirk True Priority: Minor Fix For: 1.2 Attachments: trunk-3974.txt Per-CF TTL would allow compaction optimizations (drop an entire sstable's worth of expired data) that we can't do with per-column. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
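The TTL resolution rule stated in the comment ("we want max(cf ttl, column ttl)") can be sketched as below. The method name, and the convention that 0 means "no TTL", are assumptions for illustration, not the patch's actual API:

```java
/** Sketch of the per-CF TTL resolution rule from the comment above:
 *  take max(cf ttl, column ttl) so the CF-level TTL can be reasoned
 *  about without inspecting individual rows. 0 means "no TTL". */
public class TtlRules {
    static int effectiveTtl(int cfTtl, int columnTtl) {
        if (cfTtl == 0) return columnTtl;   // no CF-level default set
        if (columnTtl == 0) return cfTtl;   // column inherits the CF TTL
        return Math.max(cfTtl, columnTtl);  // per the comment: max, not min
    }

    public static void main(String[] args) {
        System.out.println(effectiveTtl(3600, 0));    // 3600
        System.out.println(effectiveTtl(3600, 7200)); // 7200
    }
}
```

With max, no column in the CF can expire later than whichever of the two TTLs is larger, which is what lets compaction reason about a whole sstable's live-ness from timestamps alone.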
[jira] [Commented] (CASSANDRA-3945) Support incremental/batch sizes for BulkRecordWriter, due to GC overhead issues
[ https://issues.apache.org/jira/browse/CASSANDRA-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246763#comment-13246763 ] Jonathan Ellis commented on CASSANDRA-3945: --- 3859 has been committed. Support incremental/batch sizes for BulkRecordWriter, due to GC overhead issues --- Key: CASSANDRA-3945 URL: https://issues.apache.org/jira/browse/CASSANDRA-3945 Project: Cassandra Issue Type: Bug Reporter: Chris Goffinet Assignee: Chris Goffinet Priority: Minor Fix For: 1.1.1 When loading large amounts of data, currently the BulkRecordWriter will write out all the sstables, then stream them. This actually caused us GC overhead issues, due to our heap sizes for reducers. We ran into a problem where the number of SSTables on disk that had to be open would cause the jvm process to die. We also wanted a way to incrementally stream them as we created them. I created support for setting this; the default behavior is to wait for them all to be created. But if you increase it to >= 1, you can determine the batch size. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2897) Secondary indexes without read-before-write
[ https://issues.apache.org/jira/browse/CASSANDRA-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246789#comment-13246789 ] Jonathan Ellis commented on CASSANDRA-2897: --- To summarize, I think an implementation of this would look something like this: # Get rid of the synchronized oldIndexColumns code in Table.apply; on write, all we need to do is add an index entry for the newly-written value # Index read code (ColumnFamilyStore.getIndexedIterator) will need to double-check the rows returned by the index to make sure the column value still matches the indexed one; if it does not, delete the index entry so we don't keep making the same mistake # (The hard part, as described in the immediately previous comment) Change AbstractCompactionRow write implementations to delete old index entries as well; that is, we create index tombstones for each column value that is NOT the one retained after the compaction merge. Specifically, PrecompactedRow::merge and LazilyCompactedRow::Reducer. # Existing index tests (in ColumnFamilyStoreTest::testIndexDeletions and ::testIndexUpdate) are fine for parts 1-2, but we should add a new test for 3 to make sure that index-update-on-compaction works as advertised Secondary indexes without read-before-write --- Key: CASSANDRA-2897 URL: https://issues.apache.org/jira/browse/CASSANDRA-2897 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Sylvain Lebresne Priority: Minor Labels: secondary_index Currently, secondary index updates require a read-before-write to maintain the index consistency. Keeping the index consistent at all time is not necessary however. We could let the (secondary) index get inconsistent on writes and repair those on reads. This would be easy because on reads, we make sure to request the indexed columns anyway, so we can just skip the row that are not needed and repair the index at the same time. 
This does trade work on writes for work on reads. However, read-before-write is sufficiently costly that it will likely be a win overall. There are (at least) two small technical difficulties here, though:
# If we repair on read, this will be racy with writes, so we'll probably have to synchronize there.
# We probably shouldn't rely only on reads to repair; we should also have a task to repair the index for things that are rarely read. It's unclear how to make that low-impact, though.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
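The repair-on-read scheme in steps 1-2 can be sketched in miniature. All class and method names below are illustrative stand-ins, not Cassandra's actual index code; the point is only the shape of the check-and-delete on the read path:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: writes blindly add index entries (no read-before-write);
// reads double-check the live value and delete stale index entries on the way.
class IndexReadRepair {
    private final Map<String, String> data = new HashMap<>();  // rowKey -> column value
    private final Map<String, String> index = new HashMap<>(); // indexed value -> rowKey

    public void write(String rowKey, String value) {
        data.put(rowKey, value);
        index.put(value, rowKey);  // old index entries for rowKey are left stale
    }

    /** Look up a row by indexed value, repairing stale entries as we go. */
    public String queryByValue(String value) {
        String rowKey = index.get(value);
        if (rowKey == null)
            return null;
        String live = data.get(rowKey);
        if (!value.equals(live)) {
            index.remove(value);   // repair: value no longer matches, drop the entry
            return null;
        }
        return rowKey;
    }
}
```

A second overwrite of the same row leaves the old index entry dangling; the first query through that entry removes it, so the mistake is not repeated.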
[jira] [Commented] (CASSANDRA-4112) nodetool cleanup giving exception
[ https://issues.apache.org/jira/browse/CASSANDRA-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245347#comment-13245347 ] Jonathan Ellis commented on CASSANDRA-4112: --- +1 on v2 nodetool cleanup giving exception - Key: CASSANDRA-4112 URL: https://issues.apache.org/jira/browse/CASSANDRA-4112 Project: Cassandra Issue Type: Bug Affects Versions: 1.0.9 Environment: Ubuntu LTS 10.04, OpenJDK 1.6.0_20 Reporter: Shoaib Assignee: Jonathan Ellis Labels: compaction Fix For: 1.0.9 Attachments: 4112.txt, 4112_v2.txt We just recently started using version 1.0.9, previously we were using tiered compaction because of a bug in 1.0.8 (not letting us use leveled compaction) and now since moving to 1.0.9 we have started using leveled compaction. Trying to do a cleanup we are getting the following exception: root@test:~# nodetool -h localhost cleanup Error occured during cleanup java.util.concurrent.ExecutionException: java.util.NoSuchElementException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:204) at org.apache.cassandra.db.compaction.CompactionManager.performCleanup(CompactionManager.java:240) at org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:988) at org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1639) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45) at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807) at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322) at sun.rmi.transport.Transport$1.run(Transport.java:177) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:173) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) Caused by: java.util.NoSuchElementException at java.util.ArrayList$Itr.next(ArrayList.java:757) at org.apache.cassandra.db.compaction.LeveledManifest.replace(LeveledManifest.java:196) at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:147) at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:495) at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:235) at
[jira] [Commented] (CASSANDRA-1311) Triggers
[ https://issues.apache.org/jira/browse/CASSANDRA-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245418#comment-13245418 ] Jonathan Ellis commented on CASSANDRA-1311: --- Here's some brainstorming about things to think through to get this into core:
- What guarantees can we make about durability? Once a mutation is in any replica of the CSCL it can be read for replay, so it should be considered a success in that respect. But we can't call it a success for the purposes of the client's request for CL.X yet. In the extreme case we could have a successful CL write but all replicas down. One simple approach that does the right thing most of the time would be to perform the same availability checks on the CSCL replicas as for the data replicas. But this doesn't address corner cases (nodes going down after the check but before the write), overload situations (nodes being technically up, but timing out), and it also makes the write path more fragile (now we rely on CSCL replicas being up, not just the data nodes).
- How do we handle replay? We can't simply replay immediately on startup since the CSCL is (probably) on other machines. Do we wait for one CSCL replica? All of them? Do we need to be worried about the performance impact of every node in the cluster hammering each other with CSCL requests after a full cluster restart?
- Do we expose the CSCL to non-trigger uses, e.g. atomic batches?
- What API do we provide to trigger authors? What points in the write path do we allow hooks into, and what do we allow them to do? (E.g.: cancel the update, modify the RowMutation, create additional RowMutations? Do we provide the pre-write CF row to the trigger? If so, do we provide a lightweight alternative that doesn't force read-before-write?)
- What about implementation? "Here's an interface, implement it in whatever JVM language you like and give us a class name?" 
Appealing, but "now restart your server to get your trigger jar on the classpath" is not. Neither am I thrilled with the thought of implementing some kind of jar manager that stores triggers in Cassandra itself. Triggers are always implemented in javascript? Maybe a good lowest common denominator, but many developers are not fond of JS, and Rhino is a bit of a dog. (Nashorn is due in Java 8, however.) Triggers Key: CASSANDRA-1311 URL: https://issues.apache.org/jira/browse/CASSANDRA-1311 Project: Cassandra Issue Type: New Feature Reporter: Maxim Grinev Fix For: 1.2 Attachments: HOWTO-PatchAndRunTriggerExample-update1.txt, HOWTO-PatchAndRunTriggerExample.txt, ImplementationDetails-update1.pdf, ImplementationDetails.pdf, trunk-967053.txt, trunk-984391-update1.txt, trunk-984391-update2.txt Asynchronous triggers are a basic mechanism to implement various use cases of asynchronous execution of application code at the database side, for example to support indexes and materialized views, online analytics, and push-based data propagation. Please find the motivation, triggers description and list of applications: http://maxgrinev.com/2010/07/23/extending-cassandra-with-asynchronous-triggers/ An example of using triggers for indexing: http://maxgrinev.com/2010/07/23/managing-indexes-in-cassandra-using-async-triggers/ Implementation details are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
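One possible shape for the "interface plus class name" option discussed above might look like the following sketch. RowMutation and Trigger here are simplified stand-ins introduced for illustration, not Cassandra's actual classes or the eventual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trigger hook: a trigger observes an incoming mutation and may
// emit additional mutations (e.g. to maintain an index), which are applied
// together with the original update.
class TriggerSketch {
    static class RowMutation {
        final String key, column, value;
        RowMutation(String key, String column, String value) {
            this.key = key;
            this.column = column;
            this.value = value;
        }
    }

    /** A trigger sees the incoming mutation and may return extra mutations. */
    interface Trigger {
        List<RowMutation> augment(RowMutation update);
    }

    /** Apply an update plus whatever mutations the trigger produces. */
    static List<RowMutation> apply(RowMutation update, Trigger trigger) {
        List<RowMutation> all = new ArrayList<>();
        all.add(update);
        List<RowMutation> extra = trigger.augment(update);
        if (extra != null)
            all.addAll(extra);
        return all;
    }
}
```

The interface is a single method, so a trigger can be supplied as a lambda or implemented in any JVM language, which is exactly the "give us a class name" deployment model being weighed above.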
[jira] [Commented] (CASSANDRA-4111) Serializing cache can cause Segfault in 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13245525#comment-13245525 ] Jonathan Ellis commented on CASSANDRA-4111: --- +1 Serializing cache can cause Segfault in 1.1 --- Key: CASSANDRA-4111 URL: https://issues.apache.org/jira/browse/CASSANDRA-4111 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.0 Reporter: Vijay Assignee: Vijay Fix For: 1.1.0 Attachments: 0001-CASSANDRA-4111-v2.patch, 0001-CASSANDRA-4111.patch Rare, but this can happen for sure. It looks like this issue dates from CASSANDRA-3862 and hence affects only 1.1:
{noformat}
FreeableMemory old = map.get(key);
if (old == null)
    return false;

// see if the old value matches the one we want to replace
FreeableMemory mem = serialize(value);
if (mem == null)
    return false; // out of memory. never mind.

V oldValue = deserialize(old);
boolean success = oldValue.equals(oldToReplace) && map.replace(key, old, mem);
if (success)
    old.unreference();
else
    mem.unreference();
return success;
{noformat}
In the above code block we deserialize(old) without taking a reference to the old memory; this can cause seg faults when the old memory is reclaimed (free is called). The fix is to take the reference just for deserialization:
{noformat}
V oldValue;
// reference old guy before de-serializing
old.reference();
try
{
    oldValue = deserialize(old);
}
finally
{
    old.unreference();
}
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
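The reference-count discipline behind the fix (including handling the case where reference() fails because the memory was already freed, as noted in a later comment on this issue) can be sketched like this. FreeableMemory here is a simplified in-heap stand-in for the real off-heap buffer, written only to show the pattern:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: take a reference before reading, release it in finally, and treat a
// failed reference() (count already zero, i.e. memory freed) as a cache miss.
class RefCountedRead {
    static class FreeableMemory {
        final AtomicInteger refs = new AtomicInteger(1); // the cache's own reference
        String contents;                                 // stands in for the raw bytes
        FreeableMemory(String contents) { this.contents = contents; }

        /** Try to take a reference; fails if the memory was already freed. */
        boolean reference() {
            while (true) {
                int n = refs.get();
                if (n <= 0)
                    return false;                        // already freed
                if (refs.compareAndSet(n, n + 1))
                    return true;
            }
        }

        void unreference() {
            if (refs.decrementAndGet() == 0)
                contents = null;                         // simulate free()
        }
    }

    /** Read under a reference; null means the memory was reclaimed meanwhile. */
    static String safeDeserialize(FreeableMemory mem) {
        if (!mem.reference())
            return null;
        try {
            return mem.contents;
        } finally {
            mem.unreference();
        }
    }
}
```

The CAS loop in reference() is what makes "was this freed under me?" detectable: once the count hits zero it can never be re-acquired, so a racing reader gets a clean miss instead of reading reclaimed memory.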
[jira] [Commented] (CASSANDRA-4102) Upgrade to Jackson 2
[ https://issues.apache.org/jira/browse/CASSANDRA-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244175#comment-13244175 ] Jonathan Ellis commented on CASSANDRA-4102: --- Libraries we ship are in lib/. Upgrade to Jackson 2 Key: CASSANDRA-4102 URL: https://issues.apache.org/jira/browse/CASSANDRA-4102 Project: Cassandra Issue Type: Bug Reporter: Ben McCann Priority: Minor Cassandra is currently using Jackson 1.4.0. It would be nice to upgrade to Jackson 2, which is a smaller, lighter, and more modular library. I'm using Play Framework and SBT, which complain vociferously about Jackson 1 not having its javadoc jars in the Maven repository. Upgrading to Jackson 2 would fix this annoyance. Files using Jackson are: src/java/org/apache/cassandra/utils/FBUtilities.java src/java/org/apache/cassandra/tools/SSTableExport.java src/java/org/apache/cassandra/db/compaction/LeveledManifest.java Info on Jackson 2 is available on Github and the wiki: https://github.com/FasterXML/jackson-core http://wiki.fasterxml.com/JacksonRelease20 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4102) Upgrade to Jackson 2
[ https://issues.apache.org/jira/browse/CASSANDRA-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244246#comment-13244246 ] Jonathan Ellis commented on CASSANDRA-4102: --- not sure what that dependency line is for, but it's probably pulling jackson into build/lib as well... maybe try realclean, or manually clear out build/lib first. Upgrade to Jackson 2 Key: CASSANDRA-4102 URL: https://issues.apache.org/jira/browse/CASSANDRA-4102 Project: Cassandra Issue Type: Bug Reporter: Ben McCann Priority: Minor Cassandra is currently using Jackson 1.4.0. It would be nice to upgrade to Jackson 2, which is a smaller, lighter, and more modular library. I'm using Play Framework and SBT, which complain vociferously about Jackson 1 not having its javadoc jars in the Maven repository. Upgrading to Jackson 2 would fix this annoyance. Files using Jackson are: src/java/org/apache/cassandra/utils/FBUtilities.java src/java/org/apache/cassandra/tools/SSTableExport.java src/java/org/apache/cassandra/db/compaction/LeveledManifest.java Info on Jackson 2 is available on Github and the wiki: https://github.com/FasterXML/jackson-core http://wiki.fasterxml.com/JacksonRelease20 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3951) make thrift interface backwards compat guarantee more specific
[ https://issues.apache.org/jira/browse/CASSANDRA-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244325#comment-13244325 ] Jonathan Ellis commented on CASSANDRA-3951: --- To elaborate: the context here is that taking obsolete fields out of the IDL makes it difficult for some clients (especially Java) to support old C* versions even if they want to, since you'd have to do some crazy classloader stuff to get different jar versions supported. So as a "we'll try not to make life harder than necessary for clients" position, I'm fine with saying "thou shalt not remove obsolete fields from the thrift IDL." But more than that we can't promise from the server side. make thrift interface backwards compat guarantee more specific Key: CASSANDRA-3951 URL: https://issues.apache.org/jira/browse/CASSANDRA-3951 Project: Cassandra Issue Type: Improvement Components: API Affects Versions: 0.5 Reporter: paul cannon Assignee: paul cannon Priority: Minor Labels: thrift_protocol Fix For: 1.0.10 The comments in cassandra.thrift read:
{noformat}
# The API version (NOT the product version), composed as a dot delimited
# string with major, minor, and patch level components.
#
#  - Major: Incremented for backward incompatible changes. An example would
#    be changes to the number or disposition of method arguments.
#  - Minor: Incremented for backward compatible changes. An example would
#    be the addition of a new (optional) method.
#  - Patch: Incremented for bug fixes. The patch level should be increased
#    for every edit that doesn't result in a change to major/minor.
#
# See the Semantic Versioning Specification (SemVer) http://semver.org.
{noformat}
It is great to have documented guarantees, but it is unclear whether the backward compatibility discussed refers to the Cassandra server being able to talk to clients built against older thrift specs, or whether it refers to clients being able to talk to Cassandra servers built against older thrift specs. 
In a conversation on irc this morning, I found out that it actually meant the former (older clients should be able to talk to a new Cassandra, but newer clients are not guaranteed to be able to talk to an old Cassandra). On the other hand, people seemed willing to extend the compatibility guarantees in *both* directions going forward, since we would like to switch to a dedicated CQL transport anyway. Either way, the comments in cassandra.thrift should be specific about what is guaranteed, so that client and library authors, and Cassandra developers, all agree on what to expect. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3966) KeyCacheKey and RowCacheKey to use raw byte[]
[ https://issues.apache.org/jira/browse/CASSANDRA-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1325#comment-1325 ] Jonathan Ellis commented on CASSANDRA-3966: --- +1 on the idea of switching to byte[] keys, though. KeyCacheKey and RowCacheKey to use raw byte[] - Key: CASSANDRA-3966 URL: https://issues.apache.org/jira/browse/CASSANDRA-3966 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0.0 Reporter: Vijay Assignee: Vijay Priority: Minor Fix For: 1.1.1 We can just store the raw byte[] instead of a ByteBuffer. After reading the mail http://www.mail-archive.com/dev@cassandra.apache.org/msg03725.html : each ByteBuffer takes 48 bytes of housekeeping overhead, which can be removed by just implementing hashCode and equals in KeyCacheKey and RowCacheKey. http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/nio/ByteBuffer.java#ByteBuffer.hashCode%28%29 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
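The byte[]-key idea can be sketched as follows. RawKey is an illustrative name, not the actual KeyCacheKey/RowCacheKey implementation; the point is that arrays need explicit Arrays-based hashCode/equals to work as map keys, since byte[] itself only has identity semantics:

```java
import java.util.Arrays;

// Sketch: hold the raw byte[] directly and supply content-based hashCode/equals,
// avoiding the per-instance bookkeeping of wrapping every key in a ByteBuffer.
final class RawKey {
    private final byte[] key;

    RawKey(byte[] key) {
        this.key = key;
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(key);  // content-based, like ByteBuffer.hashCode()
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RawKey && Arrays.equals(key, ((RawKey) o).key);
    }
}
```

Two RawKey instances wrapping distinct arrays with the same contents then compare equal and collide to the same bucket, which is the property the cache lookup needs.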
[jira] [Commented] (CASSANDRA-4049) Add generic way of adding SSTable components required custom compaction strategy
[ https://issues.apache.org/jira/browse/CASSANDRA-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244584#comment-13244584 ] Jonathan Ellis commented on CASSANDRA-4049: --- I'm a little nervous about this. Having descriptors know about only some components feels fragile and likely to cause bugs at some point. What if we added Custom1..Custom5 Type entries so that we're not just ignoring them? Add generic way of adding SSTable components required custom compaction strategy Key: CASSANDRA-4049 URL: https://issues.apache.org/jira/browse/CASSANDRA-4049 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Piotr Kołaczkowski Priority: Minor Fix For: 1.0.10 Attachments: compaction_strategy_cleanup.patch, component_patch.diff CFS compaction strategy coming up in the next DSE release needs to store some important information in Tombstones.db and RemovedKeys.db files, one per sstable. However, currently Cassandra issues warnings when these files are found in the data directory. Additionally, when switched to SizeTieredCompactionStrategy, the files are left in the data directory after compaction. The attached patch adds new components to the Component class so Cassandra knows about those files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
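The Custom1..Custom5 suggestion from the comment might look roughly like the sketch below. The enum, its entries, and the file names are all illustrative stand-ins, not Cassandra's actual Component/Descriptor classes:

```java
// Sketch: reserve generic "custom" component slots so sstable descriptors
// recognize strategy-specific files instead of warning about them or leaving
// them behind after compaction.
enum ComponentType {
    DATA("Data.db"),
    INDEX("Index.db"),
    CUSTOM1("Custom1.db"),   // e.g. a compaction strategy's auxiliary file
    CUSTOM2("Custom2.db");

    final String repr;

    ComponentType(String repr) {
        this.repr = repr;
    }

    /** Map a file suffix back to a known component, or null if unknown. */
    static ComponentType fromRepresentation(String repr) {
        for (ComponentType t : values())
            if (t.repr.equals(repr))
                return t;
        return null;
    }
}
```

With fixed custom slots, the descriptor knows about every component it may encounter, addressing the "descriptors know about only some components" fragility raised above; the trade-off is that custom strategies must map their files onto the predeclared names.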
[jira] [Commented] (CASSANDRA-4021) CFS.scrubDataDirectories tries to delete nonexistent orphans
[ https://issues.apache.org/jira/browse/CASSANDRA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244695#comment-13244695 ] Jonathan Ellis commented on CASSANDRA-4021: --- I don't suppose you have a log file for that case? CFS.scrubDataDirectories tries to delete nonexistent orphans Key: CASSANDRA-4021 URL: https://issues.apache.org/jira/browse/CASSANDRA-4021 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.7 beta 2 Reporter: Brandon Williams Assignee: Brandon Williams Priority: Minor Labels: datastax_qa Fix For: 1.0.10 Attachments: 4021.txt The check only looks for a missing data file, then deletes all other components, however it's possible for the data file and another component to be missing, causing an error: {noformat} WARN 17:19:28,765 Removing orphans for /var/lib/cassandra/data/system/HintsColumnFamily/system-HintsColumnFamily-hd-24492: [Index.db, Filter.db, Digest.sha1, Statistics.db, Data.db] ERROR 17:19:28,766 Exception encountered during startup java.lang.AssertionError: attempted to delete non-existing file system-HintsColumnFamily-hd-24492-Index.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44) at org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105) java.lang.AssertionError: attempted to delete non-existing file system-HintsColumnFamily-hd-24492-Index.db at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:49) at org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44) at 
org.apache.cassandra.db.ColumnFamilyStore.scrubDataDirectories(ColumnFamilyStore.java:357) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:167) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:352) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:105) Exception encountered during startup: attempted to delete non-existing file system-HintsColumnFamily-hd-24492-Index.db {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
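A fix along the lines the report implies, tolerating already-missing components instead of asserting on them, might look like this sketch (the class and method names are illustrative, not the actual scrubDataDirectories code):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch: when removing orphaned sstable components, check each candidate's
// existence first, so a missing Index.db (etc.) is skipped rather than
// tripping an "attempted to delete non-existing file" assertion.
class OrphanCleanup {
    /** Delete whichever of the candidate files exist; return those removed. */
    static List<File> deleteExisting(List<File> candidates) {
        List<File> removed = new ArrayList<>();
        for (File f : candidates) {
            if (f.exists()) {            // tolerate already-missing components
                if (f.delete())
                    removed.add(f);
            }
        }
        return removed;
    }
}
```

The key difference from the failing code is that "orphaned" is decided per component file, not assumed for all components once the data file is found missing.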
[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories
[ https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244746#comment-13244746 ] Jonathan Ellis commented on CASSANDRA-2749: --- Did some more research on path limitations: NTFS is technically okay with paths up to 32K long[1], but the windows api is limited to 256[2]. Common Linux filesystems have a limit of 255 bytes per path *component* (i.e. directory or filename) but no total path limit. However, Linux defines PATH_MAX and FILENAME_MAX, both 4096. [3] [1] http://en.wikipedia.org/wiki/Comparison_of_file_systems [2] http://msdn.microsoft.com/en-us/library/aa365247.aspx [3] http://serverfault.com/questions/9546/filename-length-limits-on-linux In short: restricting KS and CF names to 32 characters is a good idea for the benefit of Windows portability. However, we may want to exempt Linux systems from the startup length check to allow easier upgrades. fine-grained control over data directories -- Key: CASSANDRA-2749 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Jonathan Ellis Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.1.0 Attachments: 0001-2749.patch, 0001-Make-it-possible-to-put-column-families-in-subdirect.patch, 0001-non-backwards-compatible-patch-for-2749-putting-cfs-.patch.gz, 0002-fix-unit-tests.patch, 0003-Fixes.patch, 2749.tar.gz, 2749_backwards_compatible_v1.patch, 2749_backwards_compatible_v2.patch, 2749_backwards_compatible_v3.patch, 2749_backwards_compatible_v4.patch, 2749_backwards_compatible_v4_rebase1.patch, 2749_not_backwards.tar.gz, 2749_proper.tar.gz Currently Cassandra supports multiple data directories but no way to control what sstables are placed where. Particularly for systems with mixed SSDs and rotational disks, it would be nice to pin frequently accessed columnfamilies to the SSDs. 
Postgresql does this with tablespaces (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we should probably avoid using that name because of confusing similarity to keyspaces. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
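The 32-character startup check proposed above might be sketched as follows; the class name, method, and the platform-exemption flag are assumptions for illustration, not the actual validation code:

```java
// Sketch: validate keyspace/columnfamily name length at startup. The 32-char
// cap keeps data-file paths safely under the 256-character Windows API limit;
// the flag models the suggested exemption for Linux systems, whose per-component
// limit (255 bytes) is far more forgiving, to ease upgrades of existing schemas.
class NameLengthCheck {
    static final int MAX_NAME_LENGTH = 32;

    static boolean isValid(String name, boolean enforceWindowsLimit) {
        if (name.isEmpty())
            return false;
        return !enforceWindowsLimit || name.length() <= MAX_NAME_LENGTH;
    }
}
```

On Windows the limit is enforced; on Linux the same name passes, matching the "exempt Linux systems from the startup length check" idea above.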
[jira] [Commented] (CASSANDRA-4111) Serializing cache can cause Segfault in 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244920#comment-13244920 ] Jonathan Ellis commented on CASSANDRA-4111: --- Good catch. We also need to handle old.reference() returning false, though. Serializing cache can cause Segfault in 1.1 --- Key: CASSANDRA-4111 URL: https://issues.apache.org/jira/browse/CASSANDRA-4111 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.0 Reporter: Vijay Assignee: Vijay Fix For: 1.1.0 Attachments: 0001-CASSANDRA-4111.patch Rare, but this can happen for sure. It looks like this issue dates from CASSANDRA-3862 and hence affects only 1.1:
{noformat}
FreeableMemory old = map.get(key);
if (old == null)
    return false;

// see if the old value matches the one we want to replace
FreeableMemory mem = serialize(value);
if (mem == null)
    return false; // out of memory. never mind.

V oldValue = deserialize(old);
boolean success = oldValue.equals(oldToReplace) && map.replace(key, old, mem);
if (success)
    old.unreference();
else
    mem.unreference();
return success;
{noformat}
In the above code block we deserialize(old) without taking a reference to the old memory; this can cause seg faults when the old memory is reclaimed (free is called). The fix is to take the reference just for deserialization:
{noformat}
V oldValue;
// reference old guy before de-serializing
old.reference();
try
{
    oldValue = deserialize(old);
}
finally
{
    old.unreference();
}
{noformat}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4097) Classes in org.apache.cassandra.deps:avro:1.4.0-cassandra-1 clash with core Avro classes
[ https://issues.apache.org/jira/browse/CASSANDRA-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243140#comment-13243140 ] Jonathan Ellis commented on CASSANDRA-4097: --- look for avro in the lib dir Classes in org.apache.cassandra.deps:avro:1.4.0-cassandra-1 clash with core Avro classes Key: CASSANDRA-4097 URL: https://issues.apache.org/jira/browse/CASSANDRA-4097 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 0.7.0 Reporter: Andrew Swan Priority: Minor Cassandra has this dependency:
{code:title=build.xml}
...
<dependency groupId="org.apache.cassandra.deps"
            artifactId="avro"
            version="1.4.0-cassandra-1"/>
...
{code}
Unfortunately this JAR file contains classes in the {{org.apache.avro}} package that are incompatible with classes of the same fully-qualified name in the current release of Avro. For example, the inner class {{org.apache.avro.Schema$Parser}} found in Avro 1.6.1 is missing from the Cassandra version of that class. This makes it impossible to have both Cassandra and the latest Avro version on the classpath (my use case is an application that embeds Cassandra but also uses Avro 1.6.1 for unrelated serialization purposes). A simple and risk-free solution would be to change the package declaration of Cassandra's Avro classes from {{org.apache.avro}} to (say) {{org.apache.cassandra.avro}}, assuming that the above dependency is only used by Cassandra and no other projects (which seems a reasonable assumption given its name). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-4093) schema_* CFs do not respect column comparator which leads to CLI commands failure.
[ https://issues.apache.org/jira/browse/CASSANDRA-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13243167#comment-13243167 ] Jonathan Ellis commented on CASSANDRA-4093: --- bq. Sounds like it wasn't a good time to make schema_* CFs to use CQL3 style metadata I'm still convinced that part is worth it to be able to query schema information without thrift describe_ methods. schema_* CFs do not respect column comparator which leads to CLI commands failure. -- Key: CASSANDRA-4093 URL: https://issues.apache.org/jira/browse/CASSANDRA-4093 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 1.1.0 Reporter: Dave Brosius Assignee: Sylvain Lebresne Fix For: 1.1.0 Attachments: 4093.txt, CASSANDRA-4093-CD-changes.patch ColumnDefinition.{ascii, utf8, bool, ...} static methods used to initialize schema_* CFs column_metadata do not respect CF comparator and use ByteBufferUtil.bytes(...) for column names which creates problems in CLI and probably in other places. The CompositeType validator throws exception on first column String columnName = columnNameValidator.getString(columnDef.name); Because it appears the composite type length header is wrong (25455) AbstractCompositeType.getWithShortLength java.lang.IllegalArgumentException at java.nio.Buffer.limit(Buffer.java:247) at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50) at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:59) at org.apache.cassandra.db.marshal.AbstractCompositeType.getString(AbstractCompositeType.java:139) at org.apache.cassandra.cli.CliClient.describeColumnFamily(CliClient.java:2046) at org.apache.cassandra.cli.CliClient.describeKeySpace(CliClient.java:1969) at org.apache.cassandra.cli.CliClient.executeShowKeySpaces(CliClient.java:1574) (seen in trunk) -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira