[jira] [Updated] (CASSANDRA-2589) row deletes do not remove columns

2011-05-15 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2589:


Attachment: 0001-remove-deleted-columns-before-flushing-memtable-v07.patch
0001-remove-deleted-columns-before-flushing-memtable-v08.patch

Patch to remove deleted columns inside Memtable.writeSortedContents(). 

- Did not just skip them in ColumnFamilySerializer.serializeForSSTable(), as that 
is also used for inter-node messages. 
- Not done in CFS.serializeWithIndexes() because it would have to remove them from 
the CF before building the column index, and that felt bad. 
- Still writes an empty CF if it has been marked for delete, because the system tables 
have GCGraceSeconds 0 and skipping the row could lose the delete.  
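
For illustration, a rough sketch of the flush-time filtering idea (not the attached 
patch; the ColumnFamily/IColumn accessors used here are assumptions):

{code}
import java.util.ArrayList;
import org.apache.cassandra.db.ColumnFamily;
import org.apache.cassandra.db.IColumn;

// Rough illustration only, not the attached patch: drop columns that are
// shadowed by the row-level tombstone before the CF is written at flush time.
// The exact ColumnFamily/IColumn method names are assumptions.
final class TombstoneShadowingSketch
{
    static void removeShadowedColumns(ColumnFamily cf)
    {
        long markedForDeleteAt = cf.getMarkedForDeleteAt();
        for (IColumn column : new ArrayList<IColumn>(cf.getSortedColumns()))
        {
            // a column at or below the row tombstone's timestamp is superseded
            if (column.timestamp() <= markedForDeleteAt)
                cf.remove(column.name());
        }
        // the (possibly now empty) CF is still written so the row delete survives
    }
}
{code}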

 row deletes do not remove columns
 -

 Key: CASSANDRA-2589
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2589
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.5, 0.8 beta 1
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 
 0001-remove-deleted-columns-before-flushing-memtable-v07.patch, 
 0001-remove-deleted-columns-before-flushing-memtable-v08.patch


 When a row delete is issued, CF.delete() sets the localDeletionTime and 
 markedForDeleteAt values but does not remove columns which have a lower 
 timestamp. As a result:
 # Memory which could be freed is held on to (probably not too bad, as it's 
 already counted).
 # The deleted columns are serialised to disk, along with the CF info that says 
 they are no longer valid. 
 # NamesQueryFilter and SliceQueryFilter have to do more work, as they filter 
 out the irrelevant columns using QueryFilter.isRelevant().
 # Columns written with a lower timestamp after the deletion are also added 
 to the CF without checking markedForDeleteAt.
 This can cause read repair (RR) to fail; I will create another ticket for that 
 and link it. This ticket is for a fix to removing the columns. 
 Two options I could think of:
 # Check for deletion when serialising to SSTable and ignore columns if they 
 have a lower timestamp. Otherwise leave as is, so dead columns stay in memory. 
 # Ensure at all times that if the CF is deleted, all columns it contains have 
 a higher timestamp. 
 ## I *think* this would include all column types (DeletedColumn as well), as 
 the CF deletion has the same effect. But not sure.
 ## Deleting (potentially) all columns in delete() will take time. Could track 
 the highest timestamp in the CF so the normal case of deleting all columns 
 does not need to iterate. 
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2655) update wiki with CLI help

2011-05-15 Thread Aaron Morton (JIRA)
update wiki with CLI help
-

 Key: CASSANDRA-2655
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2655
 Project: Cassandra
  Issue Type: Task
  Components: Documentation & website
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor


Need a way to update the wiki with the help written for the CLI. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2655) update wiki with CLI help

2011-05-15 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2655:


Attachment: 0001-add-command-text-to-help-sections.patch

Attached patch adds a command: text field to each help section in the CLI help. 
The text is the statement the help is for, e.g. create keyspace.

The text is used when creating the wiki help, as the sort order and heading. 

 update wiki with CLI help
 -

 Key: CASSANDRA-2655
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2655
 Project: Cassandra
  Issue Type: Task
  Components: Documentation & website
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-add-command-text-to-help-sections.patch


 Need a way to update the wiki with the help written for the CLI. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2655) update wiki with CLI help

2011-05-15 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2655:


Attachment: yaml-to-mm.py

Attached yaml-to-mm.py, the script I used to generate the MoinMoin wiki text 
from src/resources/org/apache/cassandra/cli/CliHelp.yaml.

Putting it here until I know what to do with it.  

 update wiki with CLI help
 -

 Key: CASSANDRA-2655
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2655
 Project: Cassandra
  Issue Type: Task
  Components: Documentation & website
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-add-command-text-to-help-sections.patch, 
 yaml-to-mm.py


 Need a way to update the wiki with the help written for the CLI. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2656) nicer error message when endpoint not in topology

2011-05-16 Thread Aaron Morton (JIRA)
nicer error message when endpoint not in topology 
--

 Key: CASSANDRA-2656
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2656
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2, 0.7.5
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial


from http://www.mail-archive.com/user@cassandra.apache.org/msg13372.html

Currently we get a NullPointerException if a node is added to the cluster that is 
not in the topology file. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2655) update wiki with CLI help

2011-05-16 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033956#comment-13033956
 ] 

Aaron Morton commented on CASSANDRA-2655:
-

Created a new page in the wiki with the output: 
http://wiki.apache.org/cassandra/CassandraCli08

Will ask on IRC what process would work best for getting this into the release 
process. 

 update wiki with CLI help
 -

 Key: CASSANDRA-2655
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2655
 Project: Cassandra
  Issue Type: Task
  Components: Documentation & website
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-add-command-text-to-help-sections.patch, 
 yaml-to-mm.py


 Need a way to update the wiki with the help written for the CLI. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2655) update wiki with CLI help

2011-05-16 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034407#comment-13034407
 ] 

Aaron Morton commented on CASSANDRA-2655:
-

From IRC discussion: see CASSANDRA-2526 for an example of how the CQL help is 
distributed. 

 update wiki with CLI help
 -

 Key: CASSANDRA-2655
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2655
 Project: Cassandra
  Issue Type: Task
  Components: Documentation & website
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-add-command-text-to-help-sections.patch, 
 yaml-to-mm.py


 Need a way to update the wiki with the help written for the CLI. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (CASSANDRA-2268) CQL-enabled stress.java

2011-05-16 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton reassigned CASSANDRA-2268:
---

Assignee: Aaron Morton  (was: Pavel Yaskevich)

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.1


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-05-16 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034523#comment-13034523
 ] 

Aaron Morton commented on CASSANDRA-2268:
-

Spoke to Jonathan on IRC; he suggested I could help out with this one. 

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.1


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-05-20 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037220#comment-13037220
 ] 

Aaron Morton commented on CASSANDRA-2268:
-

Started working on this and discovered that o.a.c.cql.jdbc.ColumnDecoder 
uses CFMetaData (line 61), which results in the DatabaseDescriptor being used, 
which in turn requires access to cassandra.yaml. 

I've not been following the JDBC/CQL tickets closely; is that expected? 

I was creating the connection by:

{code}
Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
return DriverManager.getConnection(connString);
{code}
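
For completeness, a self-contained sketch of that connection setup; the URL format 
(user/pass@host:port/keyspace) is taken from the repro quoted later in this digest, 
not from this ticket:

{code}
import java.sql.Connection;
import java.sql.DriverManager;

// Minimal sketch of opening a connection with the CQL JDBC driver. The host,
// port and keyspace below are placeholder values in the URL format used
// elsewhere in this digest.
public final class CqlJdbcConnectSketch
{
    public static Connection open() throws Exception
    {
        Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
        String connString = "jdbc:cassandra:root/root@127.0.0.1:9160/ks1";
        return DriverManager.getConnection(connString);
    }
}
{code}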

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.1


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-05-21 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13037293#comment-13037293
 ] 

Aaron Morton commented on CASSANDRA-2268:
-

Thanks, I'll give it a try. Since then I also discovered that the driver was 
causing some of the thread pools in the main jar to spin up, stopping the 
stress test app from exiting. I'll test it with your fix soon. 

This is what I've had to shut down to allow the app to exit so far...
 
{code}
MessagingService.instance().shutdown();
StorageService.scheduledTasks.shutdown();
StorageService.tasks.shutdown();
{code}
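
For context, a minimal sketch of how those shutdown calls might be arranged so the 
stress app's JVM can exit; the runStressSession() driver method is hypothetical, 
only the three shutdown calls come from above:

{code}
import org.apache.cassandra.net.MessagingService;
import org.apache.cassandra.service.StorageService;

public final class StressShutdownSketch
{
    public static void main(String[] args) throws Exception
    {
        try
        {
            runStressSession(args); // hypothetical: whatever drives the actual test
        }
        finally
        {
            // the client-side pools the driver spins up; shut down so the JVM exits
            MessagingService.instance().shutdown();
            StorageService.scheduledTasks.shutdown();
            StorageService.tasks.shutdown();
        }
    }

    private static void runStressSession(String[] args) { /* hypothetical test driver */ }
}
{code}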

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.1


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2694) remove references to DatabaseDescriptor in CFMetaData

2011-05-23 Thread Aaron Morton (JIRA)
remove references to DatabaseDescriptor in CFMetaData
-

 Key: CASSANDRA-2694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2694
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.0


The JDBC driver uses CFMetaData.fromThrift(), which was calling 
validateMemtableSettings(), which in turn used static methods on 
DatabaseDescriptor. This causes cassandra.yaml to be loaded and means the 
client side needs access to the file. 

I think this needs to be fixed for 0.8; I have the patch. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2694) remove references to DatabaseDescriptor in CFMetaData

2011-05-23 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2694:


Attachment: 0001-2694.patch

remove references to DatabaseDescriptor in CFMetaData

 remove references to DatabaseDescriptor in CFMetaData
 -

 Key: CASSANDRA-2694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2694
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.0

 Attachments: 0001-2694.patch


 The JDBC driver uses CFMetaData.fromThrift(), it was calling 
 validateMemtableSettings() which used static methods on  DatabaseDescriptor. 
 This causes cassandra.yaml to be loaded and means the client side needs 
 access to the file. 
 I think this needs to be fixed for 0.8, I have the patch. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2694) remove references to DatabaseDescriptor in CFMetaData

2011-05-23 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038366#comment-13038366
 ] 

Aaron Morton commented on CASSANDRA-2694:
-

sounds good, will take a look soon.

 remove references to DatabaseDescriptor in CFMetaData
 -

 Key: CASSANDRA-2694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2694
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.0

 Attachments: 0001-2694.patch


 The JDBC driver uses CFMetaData.fromThrift(), it was calling 
 validateMemtableSettings() which used static methods on  DatabaseDescriptor. 
 This causes cassandra.yaml to be loaded and means the client side needs 
 access to the file. 
 I think this needs to be fixed for 0.8, I have the patch. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2694) stop JDBC driver from needing access to cassandra.yaml

2011-05-24 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2694:


Description: 
The JDBC driver uses CFMetaData.fromThrift(), it was calling 
validateMemtableSettings() which used static methods on  DatabaseDescriptor. 
This causes cassandra.yaml to be loaded and means the client side needs access 
to the file. 

I think this needs to be fixed for 0.8, I have the patch. 

**Updated** changed title from remove references to DatabaseDescriptor in 
CFMetaData

  was:
The JDBC driver uses CFMetaData.fromThrift(), it was calling 
validateMemtableSettings() which used static methods on  DatabaseDescriptor. 
This causes cassandra.yaml to be loaded and means the client side needs access 
to the file. 

I think this needs to be fixed for 0.8, I have the patch. 

Summary: stop JDBC driver from needing access to cassandra.yaml  (was: 
remove references to DatabaseDescriptor in CFMetaData)

 stop JDBC driver from needing access to cassandra.yaml
 --

 Key: CASSANDRA-2694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2694
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.0

 Attachments: 0001-2694.patch


 The JDBC driver uses CFMetaData.fromThrift(), it was calling 
 validateMemtableSettings() which used static methods on  DatabaseDescriptor. 
 This causes cassandra.yaml to be loaded and means the client side needs 
 access to the file. 
 I think this needs to be fixed for 0.8, I have the patch. 
 **Updated** changed title from remove references to DatabaseDescriptor in 
 CFMetaData

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2694) stop JDBC driver from needing access to cassandra.yaml

2011-05-24 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2694:


Attachment: 0001-2694-v2.patch

v2 patch moves thrift validation to ThriftValidation and calls it from 
ThriftValidation.validateCfDef()

I left the avro validation used by CFMetaData.apply() in CFMetaData. 

 stop JDBC driver from needing access to cassandra.yaml
 --

 Key: CASSANDRA-2694
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2694
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.0

 Attachments: 0001-2694-v2.patch, 0001-2694.patch


 The JDBC driver uses CFMetaData.fromThrift(), it was calling 
 validateMemtableSettings() which used static methods on  DatabaseDescriptor. 
 This causes cassandra.yaml to be loaded and means the client side needs 
 access to the file. 
 I think this needs to be fixed for 0.8, I have the patch. 
 **Updated** changed title from remove references to DatabaseDescriptor in 
 CFMetaData

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2704) provide a typed key value for returned rows via JDBC

2011-05-25 Thread Aaron Morton (JIRA)
provide a typed key value for returned rows via JDBC


 Key: CASSANDRA-2704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2704
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor


o.a.c.c.jdbc.CassandraResultSet provides Cassandra-type-aware access to column 
values via TypedColumn, but it only provides a byte[] for the key. It would be 
handy to have a TypedKey class that does the same for the key.

This would be useful when doing a multi-key SELECT, as the server only returns 
rows we have columns for.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2704) provide a typed key value for returned rows via JDBC

2011-05-25 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039328#comment-13039328
 ] 

Aaron Morton commented on CASSANDRA-2704:
-

What about if you are doing a slice range in a multi-key select?

e.g. select a..b from CF where KEY = 'foo' and KEY = 'bar'

It's not possible to include the key in the returned columns.

 provide a typed key value for returned rows via JDBC
 

 Key: CASSANDRA-2704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2704
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Priority: Minor

 o.a.c.c.jdbc.CassandraResultSet provides access to column values as Cassandra 
 type aware via the TypedColumn. But it only provides a byte[] for the key. It 
 would be handy to have a TypedKey class that does the same but for the key.
 This would be handy when doing a multi SELECT as the server only returns rows 
 we have columns for.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2704) provide a typed key value for returned rows via JDBC

2011-05-25 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039428#comment-13039428
 ] 

Aaron Morton commented on CASSANDRA-2704:
-

Consider tracking some metric, such as Facebook like counts or Twitter friends, 
in a social media app. Column names may be the timestamp of when the value was 
recorded, and row keys are the entity. You may want to get all the metrics since 
the start of the month for 3 entities to compare them. 

select '20110501T00:00:00'...'' from metric where key = 'coke' and key = 
'pepsi' and key = 'dr pepper';

This would support cases where metrics are possibly gathered multiple times a 
day and the crawler does no-look writes (writes without reading first). At the 
end of the month individual values may be collapsed into a single column. 

Is there a better way to model this? 

 provide a typed key value for returned rows via JDBC
 

 Key: CASSANDRA-2704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2704
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Priority: Minor

 o.a.c.c.jdbc.CassandraResultSet provides access to column values as Cassandra 
 type aware via the TypedColumn. But it only provides a byte[] for the key. It 
 would be handy to have a TypedKey class that does the same but for the key.
 This would be handy when doing a multi SELECT as the server only returns rows 
 we have columns for.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2704) provide a typed key value for returned rows via JDBC

2011-05-26 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2704:


Attachment: 0001-2704.patch

I'd already written the code last night, before creating the ticket and getting 
distracted by an unhappy toddler. So I may as well add it. 

The CassandraResultSet already provides the key as a byte[]; this patch makes 
it available as a typed value, like the columns. Handy for times when the key 
cannot be projected into the result set. 
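
For illustration, roughly how such a typed key might be consumed from the result 
set; the getKey() accessor shown is a hypothetical name, not necessarily what the 
attached patch exposes:

{code}
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical usage sketch only: the getKey() accessor below is an
// illustrative name, not necessarily the method added by the attached patch.
final class TypedKeyUsageSketch
{
    static void printKeys(ResultSet res) throws SQLException
    {
        while (res.next())
        {
            // instead of a raw byte[], the key comes back decoded with the
            // key validation class, the same way TypedColumn decodes values
            Object key = ((org.apache.cassandra.cql.jdbc.CResultSet) res).getKey(); // hypothetical
            System.out.println("row key: " + key);
        }
    }
}
{code}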

 provide a typed key value for returned rows via JDBC
 

 Key: CASSANDRA-2704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2704
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Priority: Minor
 Attachments: 0001-2704.patch


 o.a.c.c.jdbc.CassandraResultSet provides access to column values as Cassandra 
 type aware via the TypedColumn. But it only provides a byte[] for the key. It 
 would be handy to have a TypedKey class that does the same but for the key.
 This would be handy when doing a multi SELECT as the server only returns rows 
 we have columns for.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2717) duplicate rows returned from SELECT where KEY term is duplicated

2011-05-26 Thread Aaron Morton (JIRA)
duplicate rows returned from SELECT where KEY term is duplicated


 Key: CASSANDRA-2717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2717
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor


Noticed while working on CASSANDRA-2268, when random keys generated during a 
multi_get test contain duplicate keys. 

The thrift multiget_slice() returns only the unique rows because of the map 
generated for the result. 

CQL will return a row for each KEY term in the SELECT. 

I could make QueryProcessor.getSlice() only create commands for the unique keys 
if we wanted to. 

Not sure it's a bug, and it's definitely not something that should come up too 
often; reporting it because it's different to the thrift multi_get operation. 

Happy to close if it's by design. 
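
If we did want to de-duplicate, a minimal sketch of the idea (only the 
de-duplication step is shown; how QueryProcessor.getSlice() builds its commands 
is assumed, not reproduced):

{code}
import java.nio.ByteBuffer;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: collapse duplicate KEY terms before building read commands,
// preserving the order in which keys first appeared.
final class KeyDedupSketch
{
    static Set<ByteBuffer> uniqueKeys(List<ByteBuffer> keyTerms)
    {
        // LinkedHashSet drops duplicates while keeping first-seen order
        return new LinkedHashSet<ByteBuffer>(keyTerms);
    }
}
{code}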



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2717) duplicate rows returned from SELECT where KEY term is duplicated

2011-05-26 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040045#comment-13040045
 ] 

Aaron Morton commented on CASSANDRA-2717:
-

The query was: SELECT foo FROM my-cf WHERE KEY = 'bar' and KEY = 'bar';

Nothing a sane person would do; I was just using the randomly generated keys 
for the stress test the same way the thrift-based one does. 


 duplicate rows returned from SELECT where KEY term is duplicated
 

 Key: CASSANDRA-2717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2717
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 Noticed while working on CASSANDRA-2268 when random keys generated during a 
 mutli_get test contain duplicate keys. 
 The thrift multiget_slice() returns only the unique rows because of the map 
 generated for the result. 
 CQL will return a row for each KEY term in the SELECT. 
 I could make QueryProcessor.getSlice() only create commands for the unique 
 keys if we wanted to. 
 Not sure it's a bug and it's definitely not something that should come up to 
 often, reporting it because it's different to the thrift mutli_get operation. 
 Happy to close if it's by design. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2717) duplicate rows returned from SELECT where KEY term is duplicated

2011-05-27 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040481#comment-13040481
 ] 

Aaron Morton commented on CASSANDRA-2717:
-

Ah, I sent a stupid query and it worked so I continued along. 

I'll try to dig into it this weekend. 

 duplicate rows returned from SELECT where KEY term is duplicated
 

 Key: CASSANDRA-2717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2717
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 Noticed while working on CASSANDRA-2268 when random keys generated during a 
 mutli_get test contain duplicate keys. 
 The thrift multiget_slice() returns only the unique rows because of the map 
 generated for the result. 
 CQL will return a row for each KEY term in the SELECT. 
 I could make QueryProcessor.getSlice() only create commands for the unique 
 keys if we wanted to. 
 Not sure it's a bug and it's definitely not something that should come up to 
 often, reporting it because it's different to the thrift mutli_get operation. 
 Happy to close if it's by design. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (CASSANDRA-2704) provide a typed key value for returned rows via JDBC

2011-05-29 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton reassigned CASSANDRA-2704:
---

Assignee: Aaron Morton

 provide a typed key value for returned rows via JDBC
 

 Key: CASSANDRA-2704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2704
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-2704.patch


 o.a.c.c.jdbc.CassandraResultSet provides access to column values as Cassandra 
 type aware via the TypedColumn. But it only provides a byte[] for the key. It 
 would be handy to have a TypedKey class that does the same but for the key.
 This would be handy when doing a multi SELECT as the server only returns rows 
 we have columns for.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2704) provide a typed key value for returned rows via JDBC

2011-05-29 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2704:


Attachment: 0001-2704-v2.patch

Updated to use the TypedColumn. Creates a ThriftColumn with a -1 timestamp like 
the server does when the Key is requested. 

 provide a typed key value for returned rows via JDBC
 

 Key: CASSANDRA-2704
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2704
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-2704-v2.patch, 0001-2704.patch


 o.a.c.c.jdbc.CassandraResultSet provides access to column values as Cassandra 
 type aware via the TypedColumn. But it only provides a byte[] for the key. It 
 would be handy to have a TypedKey class that does the same but for the key.
 This would be handy when doing a multi SELECT as the server only returns rows 
 we have columns for.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2268) CQL-enabled stress.java

2011-05-29 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2268:


Attachment: 0001-2268-wip.patch

Pavel, attached is a work in progress that implements the insert and multi_get 
tests. 

I created a StressAPI interface, with ThriftAPI and CqlAPI implementations for 
the operations to work through, and added some checking to see that all the 
requested columns and rows are read. Mostly as a sanity check when reading and 
writing through the different APIs; a rough sketch of the interface follows. 
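
A rough sketch of what such an interface might look like (the method names and 
signatures are assumptions, not the ones in the attached work-in-progress patch):

{code}
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;

// Rough sketch only: an abstraction the stress operations could work through,
// with one implementation per transport. Method names and signatures are
// assumptions, not the attached patch.
interface StressAPI
{
    void insert(ByteBuffer key, Map<ByteBuffer, ByteBuffer> columns) throws Exception;

    // returns key -> columns so the caller can verify that every requested row
    // and column came back, as the sanity checking described above does
    Map<ByteBuffer, List<ByteBuffer>> multiGet(List<ByteBuffer> keys) throws Exception;
}

// class ThriftAPI implements StressAPI { ... }   // thrift transport
// class CqlAPI implements StressAPI { ... }      // CQL/JDBC transport
{code}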

If you get a minute, can you let me know if the general approach is what you had 
in mind? If so I'll change the other Operations and tidy it up. 

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.1

 Attachments: 0001-2268-wip.patch


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2734) NPE running res.next() for a select statement

2011-06-02 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043200#comment-13043200
 ] 

Aaron Morton commented on CASSANDRA-2734:
-

on it now

 NPE running res.next() for a select statement
 -

 Key: CASSANDRA-2734
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2734
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0 beta 2
Reporter: Cathy Daw
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 *The following statement fails when used with a Statement or 
 PreparedStatement*
 {code}
 res = stmt.executeQuery("SELECT bar FROM users");  
 res.next();
 {code}
 *Error Message*
 {code}
 [junit] Testcase: simpleSelect(com.datastax.cql.reproBugTest):Caused 
 an ERROR
 [junit] null
 [junit] java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cql.jdbc.ColumnDecoder.makeKeyColumn(ColumnDecoder.java:136)
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CResultSet.next(CResultSet.java:388)
 [junit]   at 
 com.datastax.cql.reproBugTest.simpleSelect(reproBugTest.java:57)
 [junit] 
 [junit] 
 [junit] Test com.datastax.cql.reproBugTest FAILED
 {code}
 *Here is a quick repro.  Showing that res.next() works with other statements 
 but not select.*
 _Also notice that ResultSet.getMetaData().getColumnCount() always returns 
 zero._  
 _I noticed in the existing driver tests similar test cases, so not sure the 
 issue._
 *Steps to run script*
 * you will need to drop this in your test directory
 * change the package declaration
 * ant test -Dtest.name=reproBugTest
 {code}
 package com.datastax.cql;

 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
 import org.junit.Test;

 public class reproBugTest {

     @Test
     public void simpleSelect() throws Exception {
         Connection connection = null;
         ResultSet res;
         Statement stmt;
         int colCount = 0;

         try {
             Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");

             // Check create keyspace
             connection = DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/default");
             stmt = connection.createStatement();
             try {
                 System.out.println("Running DROP KS Statement");
                 res = stmt.executeQuery("DROP KEYSPACE ks1");
                 res.next();

                 System.out.println("Running CREATE KS Statement");
                 res = stmt.executeQuery("CREATE KEYSPACE ks1 with strategy_class = "
                         + "'org.apache.cassandra.locator.SimpleStrategy' and "
                         + "strategy_options:replication_factor=1");
                 res.next();
             } catch (SQLException e) {
                 if (e.getMessage().startsWith("Keyspace does not exist")) {
                     res = stmt.executeQuery("CREATE KEYSPACE ks1 with strategy_class = "
                             + "'org.apache.cassandra.locator.SimpleStrategy' and "
                             + "strategy_options:replication_factor=1");
                 }
             }
             connection.close();

             // Run Test
             connection = DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/ks1");
             stmt = connection.createStatement();

             System.out.print("Running CREATE CF Statement");
             res = stmt.executeQuery("CREATE COLUMNFAMILY users (KEY varchar PRIMARY KEY, "
                     + "password varchar, gender varchar, session_token varchar, "
                     + "state varchar, birth_year bigint)");
             colCount = res.getMetaData().getColumnCount();
             System.out.println(" -- Column Count: " + colCount);
             res.next();

             System.out.print("Running INSERT Statement");
             res = stmt.executeQuery("INSERT INTO users (KEY, password) VALUES ('user1', 'ch@nge')");
             colCount = res.getMetaData().getColumnCount();
             System.out.println(" -- Column Count: " + colCount);
             res.next();

             System.out.print("Running SELECT Statement");
             res = stmt.executeQuery("SELECT bar FROM users");
             colCount = res.getMetaData().getColumnCount();
             System.out.println(" -- Column Count: " + colCount);
             res.getRow();
             res.next();

             connection.close();
         } catch (SQLException e) {
             e.printStackTrace();
         }
     }
 }
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2734) NPE running res.next() for a select statement

2011-06-02 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043212#comment-13043212
 ] 

Aaron Morton commented on CASSANDRA-2734:
-

All the keyspace metadata is read from the server and cached in the 
ColumnDecoder the first time o.a.c.cql.jdbc.Connection.execute() is run, and 
the cache is reused for future calls. In the test this happens when the CREATE 
COLUMNFAMILY statement is run, so there are no CFs defined for ks1 in the 
cached metadata.

The code for makeKeyColumn() expects that all the metadata is present, like 
the other functions on ColumnDecoder, and the NPE is because the CF does not 
exist in the keyspace metadata held by the Connection. 

The test passes if a new connection is created after CREATE COLUMNFAMILY (see 
the sketch below). 

We could either:

1. Raise a StaleClientSchemaException or some such from ColumnDecoder if it 
does not have the metadata for a CF. The client's responsibility would be to 
create a new connection.  
2. Re-get the metadata if an unknown CF is returned. Unsure about the 
implications, if any, of lots of connections discovering they have bad metadata 
all at once. 

INSERT and CREATE COLUMNFAMILY return void resultsets, so the 0 column count is 
to be expected. 

btw, thanks for the nice bug report :)
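
A minimal sketch of that workaround, assuming the same JDBC URL format used in 
the repro; nothing here is part of a fix:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the workaround described above: open a fresh connection after the
// DDL so the new connection's ColumnDecoder pulls keyspace metadata that
// includes the new CF. The URL and CF definition are taken from the repro.
final class ReconnectAfterDdlSketch
{
    static Connection recreateAfterDdl(Connection old, String url) throws Exception
    {
        Statement stmt = old.createStatement();
        stmt.executeQuery("CREATE COLUMNFAMILY users (KEY varchar PRIMARY KEY, password varchar)");
        old.close();
        // a new connection caches fresh metadata, so SELECTs against the new CF work
        return DriverManager.getConnection(url);
    }
}
{code}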
 

 NPE running res.next() for a select statement
 -

 Key: CASSANDRA-2734
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2734
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0 beta 2
Reporter: Cathy Daw
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 *The following statement fails when used with a Statement or 
 PreparedStatement*
 {code}
 res = stmt.executeQuery(SELECT bar FROM users);  
 res.next();
 {code}
 *Error Message*
 {code}
 [junit] Testcase: simpleSelect(com.datastax.cql.reproBugTest):Caused 
 an ERROR
 [junit] null
 [junit] java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cql.jdbc.ColumnDecoder.makeKeyColumn(ColumnDecoder.java:136)
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CResultSet.next(CResultSet.java:388)
 [junit]   at 
 com.datastax.cql.reproBugTest.simpleSelect(reproBugTest.java:57)
 [junit] 
 [junit] 
 [junit] Test com.datastax.cql.reproBugTest FAILED
 {code}
 *Here is a quick repro.  Showing that res.next() works with other statements 
 but not select.*
 _Also notice that ResultSet.getMetaData().getColumnCount() always returns 
 zero._  
 _I noticed in the existing driver tests similar test cases, so not sure the 
 issue._
 *Steps to run script*
 * you will need to drop this in your test directory
 * change the package declaration
 * ant test -Dtest.name=reproBugTest
 {code}
 package com.datastax.cql;
 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
 import org.junit.Test;
 public class reproBugTest {
 
 @Test
 public void simpleSelect() throws Exception {   
 Connection connection = null;
 ResultSet res;
 Statement stmt;
 int colCount = 0;
 
 try {
 Class.forName(org.apache.cassandra.cql.jdbc.CassandraDriver);
 
 // Check create keyspace
 connection = 
 DriverManager.getConnection(jdbc:cassandra:root/root@127.0.0.1:9160/default);
  
 stmt = connection.createStatement();
 try {
   System.out.println(Running DROP KS Statement);  
   res = stmt.executeQuery(DROP KEYSPACE ks1);  
   res.next();
   
   System.out.println(Running CREATE KS Statement);
   res = stmt.executeQuery(CREATE KEYSPACE ks1 with 
 strategy_class =  'org.apache.cassandra.locator.SimpleStrategy' and 
 strategy_options:replication_factor=1);  
   res.next();
 } catch (SQLException e) {
 if (e.getMessage().startsWith(Keyspace does not exist)) 
 {
 res = stmt.executeQuery(CREATE KEYSPACE ks1 with 
 strategy_class =  'org.apache.cassandra.locator.SimpleStrategy' and 
 strategy_options:replication_factor=1);  
 } 
 }   
 connection.close();
 
 // Run Test
 connection = 
 DriverManager.getConnection(jdbc:cassandra:root/root@127.0.0.1:9160/ks1);   
   
 stmt = connection.createStatement();
 System.out.print(Running CREATE CF Statement);
 res = stmt.executeQuery(CREATE COLUMNFAMILY users (KEY varchar 
 PRIMARY KEY, password varchar, gender varchar, session_token varchar, state 
 varchar, birth_year bigint));
 colCount = res.getMetaData().getColumnCount();
 

[jira] [Commented] (CASSANDRA-2734) NPE running res.next() for a select statement

2011-06-03 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043270#comment-13043270
 ] 

Aaron Morton commented on CASSANDRA-2734:
-

I think there is a reasonable case for handling it better; it just may be out 
of scope for the current CQL dev cycle. 

Imagine long-running code, a schema change, and a situation where the code does 
not have to be restarted to start executing queries using the new or modified 
CFs. Or imagine creating a query tool for Cassandra like other DBs have, 
where DDL and DML statements are run. 

Jonathan, in the future could we include the schema id in the query response 
and in the response when the client reads the schema (currently 
describe_keyspaces)? Then we could invalidate the client-side metadata.  
   
I'll take another look at the column count example. 



 NPE running res.next() for a select statement
 -

 Key: CASSANDRA-2734
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2734
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0 beta 2
Reporter: Cathy Daw
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 *The following statement fails when used with a Statement or 
 PreparedStatement*
 {code}
 res = stmt.executeQuery(SELECT bar FROM users);  
 res.next();
 {code}
 *Error Message*
 {code}
 [junit] Testcase: simpleSelect(com.datastax.cql.reproBugTest):Caused 
 an ERROR
 [junit] null
 [junit] java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cql.jdbc.ColumnDecoder.makeKeyColumn(ColumnDecoder.java:136)
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CResultSet.next(CResultSet.java:388)
 [junit]   at 
 com.datastax.cql.reproBugTest.simpleSelect(reproBugTest.java:57)
 [junit] 
 [junit] 
 [junit] Test com.datastax.cql.reproBugTest FAILED
 {code}
 *Here is a quick repro.  Showing that res.next() works with other statements 
 but not select.*
 _Also notice that ResultSet.getMetaData().getColumnCount() always returns 
 zero._  
 _I noticed in the existing driver tests similar test cases, so not sure the 
 issue._
 *Steps to run script*
 * you will need to drop this in your test directory
 * change the package declaration
 * ant test -Dtest.name=reproBugTest
 {code}
 package com.datastax.cql;
 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
 import org.junit.Test;
 public class reproBugTest {
 
 @Test
 public void simpleSelect() throws Exception {   
 Connection connection = null;
 ResultSet res;
 Statement stmt;
 int colCount = 0;
 
 try {
 Class.forName(org.apache.cassandra.cql.jdbc.CassandraDriver);
 
 // Check create keyspace
 connection = 
 DriverManager.getConnection(jdbc:cassandra:root/root@127.0.0.1:9160/default);
  
 stmt = connection.createStatement();
 try {
   System.out.println(Running DROP KS Statement);  
   res = stmt.executeQuery(DROP KEYSPACE ks1);  
   res.next();
   
   System.out.println(Running CREATE KS Statement);
   res = stmt.executeQuery(CREATE KEYSPACE ks1 with 
 strategy_class =  'org.apache.cassandra.locator.SimpleStrategy' and 
 strategy_options:replication_factor=1);  
   res.next();
 } catch (SQLException e) {
 if (e.getMessage().startsWith(Keyspace does not exist)) 
 {
 res = stmt.executeQuery(CREATE KEYSPACE ks1 with 
 strategy_class =  'org.apache.cassandra.locator.SimpleStrategy' and 
 strategy_options:replication_factor=1);  
 } 
 }   
 connection.close();
 
 // Run Test
 connection = 
 DriverManager.getConnection(jdbc:cassandra:root/root@127.0.0.1:9160/ks1);   
   
 stmt = connection.createStatement();
 System.out.print(Running CREATE CF Statement);
 res = stmt.executeQuery(CREATE COLUMNFAMILY users (KEY varchar 
 PRIMARY KEY, password varchar, gender varchar, session_token varchar, state 
 varchar, birth_year bigint));
 colCount = res.getMetaData().getColumnCount();
 System.out.println( -- Column Count:  + colCount); 
 res.next();
 
 System.out.print(Running INSERT Statement);
 res = stmt.executeQuery(INSERT INTO users (KEY, password) VALUES 
 ('user1', 'ch@nge'));  
 colCount = res.getMetaData().getColumnCount();
 System.out.println( -- Column Count:  + colCount); 
 res.next();
   

[jira] [Commented] (CASSANDRA-2621) sub columns under deleted CF returned

2011-06-05 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044669#comment-13044669
 ] 

Aaron Morton commented on CASSANDRA-2621:
-

Yes. 
It could be a bit clearer but it's working as designed. 

 sub columns under deleted CF returned 
 --

 Key: CASSANDRA-2621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2621
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.5, 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor

 Found when working on CASSANDRA-2590.
 In some cases sub columns are not filtered to check if they have a higher 
 timestamp than the containing super column or CF. For example, a super column 
 with two columns, one with timestamp 0 and the other 5, will be returned with 
 all columns even if there is a row delete at timestamp 2. 
 If the QueryFilter is created with a null superColumnName in the QueryPath it 
 will not filter the sub columns. 
 IdentityQueryFilter.filterSuperColumn() lets all sub columns through. 
 NamesQueryFilter.filterSubColumn() and SliceQueryFilter() check that each sub 
 column is relevant.   
 I have a fix and am working on some test cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-2621) sub columns under deleted CF returned

2011-06-05 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton resolved CASSANDRA-2621.
-

Resolution: Not A Problem

 sub columns under deleted CF returned 
 --

 Key: CASSANDRA-2621
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2621
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.5, 0.8.0 beta 2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor

 found when working on CASSANDRA-2590
 In some cases sub columns are not filtered to check if they have a higher 
 timestamp than and container super column or CF. For example a super col with 
 two two columns, one with timestamp 0 and the other 5, will be returned with 
 all columns even if there is a row delete at timestamp 2. 
 If the QueryFilter is created with a null superColumnName in the QueryPath it 
 will not filter the sub columns. 
 IdentityQueryFilter.filterSuperColumn() lets all sub columns through. 
 NamesQueryFilter.filterSubColumn() and SliceQueryFilter() check that each sub 
 column is relavent.   
 I have a fix and am working on some test cases. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2468) Clean up after failed compaction

2011-06-05 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2468:


Attachment: 0001-clean-up-temp-files-after-failed-compaction-v08-2.patch

Rebased the v08 version today, attached as 
0001-clean-up-temp-files-after-failed-compaction-v08-2.

Have not updated v07; let me know if you need it.  

 Clean up after failed compaction
 

 Key: CASSANDRA-2468
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2468
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jonathan Ellis
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.7

 Attachments: 
 0001-clean-up-temp-files-after-failed-compaction-v08-2.patch, 
 0001-clean-up-temp-files-after-failed-compaction-v08.patch, 
 0001-cleanup-temp-files-after-failed-compaction-v07.patch


 (Started in CASSANDRA-2088.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2734) NPE running res.next() for a select statement

2011-06-05 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044683#comment-13044683
 ] 

Aaron Morton commented on CASSANDRA-2734:
-

Cathy, 
I've taken a look at the code and cannot reproduce the bug. I think the problem 
is that you are checking the column count before advancing to the first record; 
call next() first. There should only be a column count for the SELECT statement, 
and it potentially will be different for each row. 
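
In other words, a small sketch of the suggested ordering, reusing the statement 
from the repro above (stmt is the Statement created there):

{code}
// Sketch: advance to a row first, then ask for the column count; the count
// can differ per row since each row may hold different columns.
java.sql.ResultSet res = stmt.executeQuery("SELECT bar FROM users");
while (res.next())
{
    int colCount = res.getMetaData().getColumnCount();
    System.out.println(" -- Column Count: " + colCount);
}
{code}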

 NPE running res.next() for a select statement
 -

 Key: CASSANDRA-2734
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2734
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0 beta 2
Reporter: Cathy Daw
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 *The following statement fails when used with a Statement or 
 PreparedStatement*
 {code}
 res = stmt.executeQuery(SELECT bar FROM users);  
 res.next();
 {code}
 *Error Message*
 {code}
 [junit] Testcase: simpleSelect(com.datastax.cql.reproBugTest):Caused 
 an ERROR
 [junit] null
 [junit] java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cql.jdbc.ColumnDecoder.makeKeyColumn(ColumnDecoder.java:136)
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CResultSet.next(CResultSet.java:388)
 [junit]   at 
 com.datastax.cql.reproBugTest.simpleSelect(reproBugTest.java:57)
 [junit] 
 [junit] 
 [junit] Test com.datastax.cql.reproBugTest FAILED
 {code}
 *Here is a quick repro.  Showing that res.next() works with other statements 
 but not select.*
 _Also notice that ResultSet.getMetaData().getColumnCount() always returns 
 zero._  
 _I noticed in the existing driver tests similar test cases, so not sure the 
 issue._
 *Steps to run script*
 * you will need to drop this in your test directory
 * change the package declaration
 * ant test -Dtest.name=reproBugTest
 {code}
 package com.datastax.cql;
 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
 import org.junit.Test;
 public class reproBugTest {
 
 @Test
 public void simpleSelect() throws Exception {   
 Connection connection = null;
 ResultSet res;
 Statement stmt;
 int colCount = 0;
 
 try {
 Class.forName(org.apache.cassandra.cql.jdbc.CassandraDriver);
 
 // Check create keyspace
 connection = 
 DriverManager.getConnection(jdbc:cassandra:root/root@127.0.0.1:9160/default);
  
 stmt = connection.createStatement();
 try {
   System.out.println(Running DROP KS Statement);  
   res = stmt.executeQuery(DROP KEYSPACE ks1);  
   res.next();
   
   System.out.println(Running CREATE KS Statement);
   res = stmt.executeQuery(CREATE KEYSPACE ks1 with 
 strategy_class =  'org.apache.cassandra.locator.SimpleStrategy' and 
 strategy_options:replication_factor=1);  
   res.next();
 } catch (SQLException e) {
 if (e.getMessage().startsWith(Keyspace does not exist)) 
 {
 res = stmt.executeQuery(CREATE KEYSPACE ks1 with 
 strategy_class =  'org.apache.cassandra.locator.SimpleStrategy' and 
 strategy_options:replication_factor=1);  
 } 
 }   
 connection.close();
 
 // Run Test
 connection = 
 DriverManager.getConnection(jdbc:cassandra:root/root@127.0.0.1:9160/ks1);   
   
 stmt = connection.createStatement();
 System.out.print(Running CREATE CF Statement);
 res = stmt.executeQuery(CREATE COLUMNFAMILY users (KEY varchar 
 PRIMARY KEY, password varchar, gender varchar, session_token varchar, state 
 varchar, birth_year bigint));
 colCount = res.getMetaData().getColumnCount();
 System.out.println( -- Column Count:  + colCount); 
 res.next();
 
 System.out.print(Running INSERT Statement);
 res = stmt.executeQuery(INSERT INTO users (KEY, password) VALUES 
 ('user1', 'ch@nge'));  
 colCount = res.getMetaData().getColumnCount();
 System.out.println( -- Column Count:  + colCount); 
 res.next();
 
 System.out.print(Running SELECT Statement);
 res = stmt.executeQuery(SELECT bar FROM users);  
 colCount = res.getMetaData().getColumnCount();
 System.out.println( -- Column Count:  + colCount); 
 res.getRow();
 res.next();
 
 connection.close();   
 

[jira] [Commented] (CASSANDRA-2734) NPE running res.next() for a select statement

2011-06-05 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044692#comment-13044692
 ] 

Aaron Morton commented on CASSANDRA-2734:
-

Jonathan, 
This is from the Postgres Protocol docs 
http://www.postgresql.org/docs/9.0/static/protocol-flow.html#AEN84318

bq. The response to a SELECT query (or other queries that return row sets, such 
as EXPLAIN or SHOW) normally consists of RowDescription, zero or more DataRow 
messages, and then CommandComplete. COPY to or from the frontend invokes 
special protocol as described in Section 46.2.5. All other query types normally 
produce only a CommandComplete message.

Obviously the ability of a SQL SELECT to cast and create any data it wants 
makes their life a bit harder.

Longer term we could allow the client to include its schema ID in the request 
(like the compression param) and then return the schema for the CF involved in 
the select if it does not match. Could get a bit messy with the client holding 
CF definitions from different schema versions. Returning the full KS def may be 
a bit heavyweight.   

Short term, and until CASSANDRA-2477 is in place, a possible solution is:

Add an optional string to the CqlResult thrift type that is set to 
DatabaseDescriptor.getDefsVersion() for ROWS result types. 

The JDBC client could also call describe_schema_versions() and get the 
schema ID for the node it's connected to when it is building the metadata. 
describe_schema_versions() goes to all live nodes.
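
A rough sketch of that client-side check (assuming an open Thrift Cassandra.Client; 
how the JDBC driver would store and compare the version is not shown and is an 
assumption, not part of this ticket):

{code}
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Cassandra;

// Rough sketch only: fetch the schema versions seen across the cluster so the
// client can notice when its cached CF metadata is stale.
final class SchemaVersionCheckSketch
{
    static String currentSchemaVersion(Cassandra.Client client) throws Exception
    {
        // maps schema version -> endpoints reporting it; goes to all live nodes
        Map<String, List<String>> versions = client.describe_schema_versions();
        versions.remove("UNREACHABLE"); // nodes that could not be contacted
        // a healthy cluster reports a single version; remember it and rebuild
        // cached metadata when it changes
        return versions.keySet().iterator().next();
    }
}
{code}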
  

 NPE running res.next() for a select statement
 -

 Key: CASSANDRA-2734
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2734
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0 beta 2
Reporter: Cathy Daw
Assignee: Aaron Morton
Priority: Minor
  Labels: cql

 *The following statement fails when used with a Statement or 
 PreparedStatement*
 {code}
 res = stmt.executeQuery(SELECT bar FROM users);  
 res.next();
 {code}
 *Error Message*
 {code}
 [junit] Testcase: simpleSelect(com.datastax.cql.reproBugTest):Caused 
 an ERROR
 [junit] null
 [junit] java.lang.NullPointerException
 [junit]   at 
 org.apache.cassandra.cql.jdbc.ColumnDecoder.makeKeyColumn(ColumnDecoder.java:136)
 [junit]   at 
 org.apache.cassandra.cql.jdbc.CResultSet.next(CResultSet.java:388)
 [junit]   at 
 com.datastax.cql.reproBugTest.simpleSelect(reproBugTest.java:57)
 [junit] 
 [junit] 
 [junit] Test com.datastax.cql.reproBugTest FAILED
 {code}
 *Here is a quick repro.  Showing that res.next() works with other statements 
 but not select.*
 _Also notice that ResultSet.getMetaData().getColumnCount() always returns 
 zero._  
 _I noticed in the existing driver tests similar test cases, so not sure the 
 issue._
 *Steps to run script*
 * you will need to drop this in your test directory
 * change the package declaration
 * ant test -Dtest.name=reproBugTest
 {code}
 package com.datastax.cql;
 import java.sql.DriverManager;
 import java.sql.Connection;
 import java.sql.ResultSet;
 import java.sql.SQLException;
 import java.sql.Statement;
 import org.junit.Test;
 public class reproBugTest {
 
 @Test
 public void simpleSelect() throws Exception {   
 Connection connection = null;
 ResultSet res;
 Statement stmt;
 int colCount = 0;
 
 try {
 Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
 
 // Check create keyspace
 connection = 
 DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/default");
  
 stmt = connection.createStatement();
 try {
   System.out.println("Running DROP KS Statement");  
   res = stmt.executeQuery("DROP KEYSPACE ks1");  
   res.next();
   
   System.out.println("Running CREATE KS Statement");
   res = stmt.executeQuery("CREATE KEYSPACE ks1 with strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options:replication_factor=1");  
   res.next();
 } catch (SQLException e) {
 if (e.getMessage().startsWith("Keyspace does not exist")) 
 {
 res = stmt.executeQuery("CREATE KEYSPACE ks1 with strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options:replication_factor=1");  
 } 
 }   
 connection.close();
 
 // Run Test
 connection = 
 DriverManager.getConnection("jdbc:cassandra:root/root@127.0.0.1:9160/ks1");   
   
 stmt = connection.createStatement();
 System.out.print("Running CREATE CF Statement");
  

[jira] [Created] (CASSANDRA-2747) memtable flush during index build causes AssertionError

2011-06-07 Thread Aaron Morton (JIRA)
memtable flush during index build causes AssertionError
---

 Key: CASSANDRA-2747
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2747
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton


Noticed when loading a lot of rows and then creating secondary indexes using 
update CF via the CLI. 

{code:java}
ERROR 18:56:25,008 Fatal exception in thread Thread[FlushWriter:3,5,main]
java.lang.AssertionError
at org.apache.cassandra.io.sstable.SSTable.<init>(SSTable.java:91)
at 
org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:71)
at 
org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2124)
at 
org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}

Table.IndexBuilder.build() calls cfs.maybeSwitchMemtable() with writeCommitLog 
false. So a null ReplayPosition is eventually passed to 
Memtable.writeSortedContents(). 

SSTableReader.open() checks Descriptor.hasReplayPosition() and it looks like any 
0.8 stats file should have a ReplayPosition. 

Looks like cfs.maybeSwitchMemtable() should use ReplayPosition.NONE rather than 
null. Patch looks easy, will also try to write a test.
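
As a hedged sketch of the intent (names are illustrative, not the committed 
change):

{code:java}
// Sketch only: when the flush bypasses the commit log, pass the NONE sentinel
// rather than null so the replay position assertions downstream still hold.
ReplayPosition context = writeCommitLog
                       ? currentCommitLogContext() // hypothetical helper for the real call
                       : ReplayPosition.NONE;
memtable.writeSortedContents(context);
{code}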

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-2747) memtable flush during index build causes AssertionError

2011-06-07 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton resolved CASSANDRA-2747.
-

Resolution: Duplicate

Was looking at the 0.8.0 tag, fixed in the 0.8 branch. 

CHANGES.txt says CASSANDRA-2781 but this issue does not exist yet.

avoid NPE when bypassing commitlog during memtable flush (CASSANDRA-2781)
 

 memtable flush during index build causes AssertionError
 ---

 Key: CASSANDRA-2747
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2747
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.0
Reporter: Aaron Morton
Assignee: Aaron Morton

 Noticed when loading a lot of rows and then creating secondary indexes using 
 update CF via the CLI. 
 {code:java}
 ERROR 18:56:25,008 Fatal exception in thread Thread[FlushWriter:3,5,main]
 java.lang.AssertionError
 at org.apache.cassandra.io.sstable.SSTable.<init>(SSTable.java:91)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.<init>(SSTableWriter.java:71)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.createFlushWriter(ColumnFamilyStore.java:2124)
 at 
 org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:246)
 at org.apache.cassandra.db.Memtable.access$400(Memtable.java:49)
 at org.apache.cassandra.db.Memtable$3.runMayThrow(Memtable.java:270)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}
 Table.IndexBuilder.build() calls cfs.maybeSwitchMemtable() with 
 writeCommitLog false. So a null ReplayPosition is eventually passed to 
 Memtable.writeSortedContents(). 
 SSTableReader.open() checks Descriptor.hasReplayPosition() and it looks like 
 any 0.8 stats file should have a ReplayPosition. 
 Looks like cfs.maybeSwitchMemtable() should use ReplayPosition.NONE rather 
 than null. Patch looks easy, will also try to write a test.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2468) Clean up after failed compaction

2011-06-08 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2468:


Attachment: 0001-clean-up-temp-files-after-failed-compaction-v08-3.patch

version 3 for v0.8 modified SSTable.delete() to raise an IOException so 
cleanupIfNecessary() can catch it. Also changes componentsFor to accept an 
enum. 

Do we want this in 0.7?
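
For context, a rough sketch of the shape of the cleanup (the file-name 
convention and helper here are assumptions; the attached patch is the real 
change):

{code:java}
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;

// Hedged sketch, not the patch: deletion failures surface as IOException so the
// cleanup loop can log and keep going instead of silently ignoring them.
public final class FailedCompactionCleanupSketch
{
    public static void cleanupIfNecessary(File dataDir)
    {
        File[] temporaries = dataDir.listFiles(new FilenameFilter()
        {
            public boolean accept(File dir, String name)
            {
                return name.contains("-tmp-"); // temp file naming is an assumption here
            }
        });
        if (temporaries == null)
            return;

        for (File tmp : temporaries)
        {
            try
            {
                delete(tmp);
            }
            catch (IOException e)
            {
                // one undeletable temp file should not abort the rest of the cleanup
                System.err.println("Could not clean up " + tmp + ": " + e);
            }
        }
    }

    private static void delete(File f) throws IOException
    {
        if (f.exists() && !f.delete())
            throw new IOException("Failed to delete " + f);
    }
}
{code}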

 Clean up after failed compaction
 

 Key: CASSANDRA-2468
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2468
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Jonathan Ellis
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.7

 Attachments: 
 0001-clean-up-temp-files-after-failed-compaction-v08-2.patch, 
 0001-clean-up-temp-files-after-failed-compaction-v08-3.patch, 
 0001-clean-up-temp-files-after-failed-compaction-v08.patch, 
 0001-cleanup-temp-files-after-failed-compaction-v07.patch


 (Started in CASSANDRA-2088.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2589) row deletes do not remove columns

2011-06-16 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050872#comment-13050872
 ] 

Aaron Morton commented on CASSANDRA-2589:
-

I'm passing Integer.MIN_VALUE for the gcBefore so I thought it would only 
remove columns if they were under a CF tombstone. 

One of the issues I ran into is that while it seems technically correct to 
purge a tombstone after GCGraceSeconds, if it is not written into an SSTable 
it's lost.  
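
To make that concrete, the flush path is intended to do roughly this (a sketch, 
assuming the usual removeDeleted(cf, gcBefore) helper):

{code:java}
// Sketch only: with Integer.MIN_VALUE as gcBefore no tombstone is old enough to
// be purged, so the only thing dropped is columns shadowed by the CF-level
// tombstone, and the row delete itself still makes it into the SSTable.
ColumnFamily toFlush = ColumnFamilyStore.removeDeleted(cf, Integer.MIN_VALUE);
{code}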

 row deletes do not remove columns
 -

 Key: CASSANDRA-2589
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2589
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.2

 Attachments: 
 0001-remove-deleted-columns-before-flushing-memtable-v07.patch, 
 0001-remove-deleted-columns-before-flushing-memtable-v08.patch


 When a row delete is issued CF.delete() sets the localDeletetionTime and 
 markedForDeleteAt values but does not remove columns which have a lower time 
 stamp. As a result:
 # Memory which could be freed is held on to (prob not too bad as it's already 
 counted)
 # The deleted columns are serialised to disk, along with the CF info to say 
 they are no longer valid. 
 # NamesQueryFilter and SliceQueryFilter have to do more work as they filter 
 out the irrelevant columns using QueryFilter.isRelevant()
 # Also columns written with a lower time stamp after the deletion are added 
 to the CF without checking markedForDeletionAt.
 This can cause RR to fail, will create another ticket for that and link. This 
 ticket is for a fix to removing the columns. 
 Two options I could think of:
 # Check for deletion when serialising to SSTable and ignore columns if the 
 have a lower timestamp. Otherwise leave as is so dead columns stay in memory. 
 # Ensure at all times if the CF is deleted all columns it contains have a 
 higher timestamp. 
 ## I *think* this would include all column types (DeletedColumn as well) as 
 the CF deletion has the same effect. But not sure.
 ## Deleting (potentially) all columns in delete() will take time. Could track 
 the highest timestamp in the CF so the normal case of deleting all cols does 
 not need to iterate. 
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2221) 'show create' commands on the CLI to export schema

2011-06-16 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2221:


Attachment: 0001-add-show-schema-statement-v08-2.patch

rebased for v0.8 as 0001-add-show-schema-statement-v08-2.patch

Let me know if you want it for 0.7

 'show create' commands on the CLI to export schema
 --

 Key: CASSANDRA-2221
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2221
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Aaron Morton
Priority: Minor
  Labels: cli
 Fix For: 0.8.2

 Attachments: 0001-add-show-schema-statement-8.patch, 
 0001-add-show-schema-statement-v08-2.patch, 
 0001-add-show-schema-statement.patch


 It would be nice to have 'show create' type of commands on the command-line 
 so that it would generate the DDL for the schema.
 A scenario that would make this useful is where a team works out a data model 
 over time with a dev cluster.  They want to use parts of that schema for new 
 clusters that they create, like a staging/prod cluster.  It would be very 
 handy in this scenario to have some sort of export mechanism.
 Another use case is for testing purposes - you want to replicate a problem.
 We currently have schematool for import/export but that is deprecated and it 
 exports into yaml.
 This new feature would just be able to 'show' - or export if they want the 
 entire keyspace - into a script or commands that could be used in a cli 
 script.  It would need to be able to regenerate everything about the keyspace 
 including indexes and metadata.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2828) CommitLog tool

2011-06-26 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2828:


Attachment: 0001-2828-07.patch

LogTool is a command-line tool to read 0.7 log headers and output which CF's are 
stopping the segment from flushing. 

 CommitLog tool
 --

 Key: CASSANDRA-2828
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2828
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Affects Versions: 0.7.6
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-2828-07.patch


 I wrote this for 0.7.6-2 because I had a need to see what log segment headers 
 were preventing logs from flushing. 
 I've not had a chance to look at it in 0.8 yet. We do not have header files 
 anymore, so I could turn this into a function on the StorageServiceMBean.
 For my use case I pulled the log headers off a server that had gone into a 
 spin after it filled the commit log volume. nodetool was not running so this 
 was the best solution for me. 
 Posting here to see if there is any interest or need. I think the best 
 approach may be to add a function to the StorageService MBean to find out 
 which CF's are dirty in the active log segments.   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2829) always flush memtables

2011-06-26 Thread Aaron Morton (JIRA)
always flush memtables
--

 Key: CASSANDRA-2829
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor


Only dirty Memtables are flushed, and so only dirty memtables are used to 
discard obsolete commit log segments. This can result in log segments not being 
deleted even though the data has been flushed.  

Was using a 3 node 0.7.6-2 AWS cluster (DataStax AMI's) with pre 0.7 data 
loaded and a running application working against the cluster. Did a rolling 
restart and then kicked off a repair; one node filled up the commit log volume 
with 7GB+ of log data, and there was about 20 hours of log files. 

{noformat}
$ sudo ls -lah commitlog/
total 6.9G
drwx-- 2 cassandra cassandra  12K 2011-06-24 20:38 .
drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
-rw--- 1 cassandra cassandra 129M 2011-06-24 01:08 
CommitLog-1308876643288.log
-rw--- 1 cassandra cassandra   28 2011-06-24 20:47 
CommitLog-1308876643288.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 
CommitLog-1308877711517.log
-rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 
CommitLog-1308877711517.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 
CommitLog-1308879395824.log
-rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 
CommitLog-1308879395824.log.header
...
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 
CommitLog-1308946745380.log
-rw-r--r-- 1 cassandra cassandra   36 2011-06-24 20:47 
CommitLog-1308946745380.log.header
-rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 
CommitLog-1308947888397.log
-rw-r--r-- 1 cassandra cassandra   44 2011-06-24 20:47 
CommitLog-1308947888397.log.header
{noformat}

The user KS has 2 CF's with 60 minute flush times. System KS had the default 
settings, which is 24 hours. Will create another ticket to see if these can be 
reduced or if it's something users should do; in this case it would not have 
mattered. 

I grabbed the log headers and used the tool in CASSANDRA-2828 and most of the 
segments had the system CF's marked as dirty.

{noformat}
$ bin/logtool dirty /tmp/logs/commitlog/

Not connected to a server, Keyspace and Column Family names are not available.

/tmp/logs/commitlog/CommitLog-1308876643288.log.header
Keyspace Unknown:
Cf id 0: 444
/tmp/logs/commitlog/CommitLog-1308877711517.log.header
Keyspace Unknown:
Cf id 1: 68848763
...
/tmp/logs/commitlog/CommitLog-1308944451460.log.header
Keyspace Unknown:
Cf id 1: 61074
/tmp/logs/commitlog/CommitLog-1308945597471.log.header
Keyspace Unknown:
Cf id 1000: 43175492
Cf id 1: 108483
/tmp/logs/commitlog/CommitLog-1308946745380.log.header
Keyspace Unknown:
Cf id 1000: 239223
Cf id 1: 172211

/tmp/logs/commitlog/CommitLog-1308947888397.log.header
Keyspace Unknown:
Cf id 1001: 57595560
Cf id 1: 816960
Cf id 1000: 0
{noformat}

CF 0 is the Status / LocationInfo CF and 1 is the HintedHandoff CF. I don't have 
it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. 

I was able to repro a case where flushing the CF's did not mark the log segments 
as obsolete (attached unit-test patch). Steps are:

1. Write to cf1 and flush.
2. Current log segment is marked as dirty at the CL position when the flush 
started, CommitLog.discardCompletedSegmentsInternal()
3. Do not write to cf1 again.
4. Roll the log, my test does this manually. 
5. Write to CF2 and flush.
6. Only CF2 is flushed because it is the only dirty CF. 
cfs.maybeSwitchMemtable() is not called for cf1 and so log segment 1 is still 
marked as dirty from cf1.

Step 5 is not essential, just matched what I thought was happening. I thought 
SystemTable.updateToken() was called which does not flush, and this was the 
last thing that happened.  

The expired memtable thread created by Table uses the same cfs.forceFlush(), 
which is a no-op if the cf or its secondary indexes are clean. 

I think the same problem would exist in 0.8. 
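
For reference, a hedged, self-contained sketch of those repro steps (this is 
not the attached unit test; writeTo/flush/rollLog/segmentIsClean are 
placeholders for the real test harness calls, not Cassandra APIs):

{code:java}
import org.junit.Test;
import static org.junit.Assert.assertTrue;

public class DirtySegmentReproSketch
{
    @Test
    public void staleDirtyFlagKeepsSegmentAlive() throws Exception
    {
        writeTo("cf1");   // 1. write to cf1
        flush("cf1");     // 2. flush; segment 1 stays dirty at the flush-start position
                          // 3. no further writes to cf1
        rollLog();        // 4. roll to a new log segment
        writeTo("cf2");   // 5. write to cf2
        flush("cf2");     // 6. only cf2 is dirty, so only cf2 is flushed

        // Expected: segment 1 is obsolete and can be deleted; with the bug, cf1's
        // stale dirty flag keeps it around.
        assertTrue(segmentIsClean(1));
    }

    // placeholders standing in for the real harness
    private void writeTo(String cf) {}
    private void flush(String cf) {}
    private void rollLog() {}
    private boolean segmentIsClean(int segment) { return false; }
}
{code}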

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2829) always flush memtables

2011-06-26 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2829:


Attachment: 0001-2829-unit-test.patch
0002-2829.patch

2829-unit-test contains the unit test for the problem. 2829 is the fix. 

 always flush memtables
 --

 Key: CASSANDRA-2829
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.6
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-2829-unit-test.patch, 0002-2829.patch


 Only dirty Memtables are flushed, and so only dirty memtables are used to 
 discard obsolete commit log segments. This can result in log segments not 
 being deleted even though the data has been flushed.  
 Was using a 3 node 0.7.6-2 AWS cluster (DataStax AMI's) with pre 0.7 data 
 loaded and a running application working against the cluster. Did a rolling 
 restart and then kicked off a repair, one node filled up the commit log 
 volume with 7GB+ of log data, there was about 20 hours of log files. 
 {noformat}
 $ sudo ls -lah commitlog/
 total 6.9G
 drwx-- 2 cassandra cassandra  12K 2011-06-24 20:38 .
 drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
 -rw--- 1 cassandra cassandra 129M 2011-06-24 01:08 
 CommitLog-1308876643288.log
 -rw--- 1 cassandra cassandra   28 2011-06-24 20:47 
 CommitLog-1308876643288.log.header
 -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 
 CommitLog-1308877711517.log
 -rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 
 CommitLog-1308877711517.log.header
 -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 
 CommitLog-1308879395824.log
 -rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 
 CommitLog-1308879395824.log.header
 ...
 -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 
 CommitLog-1308946745380.log
 -rw-r--r-- 1 cassandra cassandra   36 2011-06-24 20:47 
 CommitLog-1308946745380.log.header
 -rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 
 CommitLog-1308947888397.log
 -rw-r--r-- 1 cassandra cassandra   44 2011-06-24 20:47 
 CommitLog-1308947888397.log.header
 {noformat}
 The user KS has 2 CF's with 60 minute flush times. System KS had the default 
 settings which is 24 hours. Will create another ticket see if these can be 
 reduced or if it's something users should do, in this case it would not have 
 mattered. 
 I grabbed the log headers and used the tool in CASSANDRA-2828 and most of the 
 segments had the system CF's marked as dirty.
 {noformat}
 $ bin/logtool dirty /tmp/logs/commitlog/
 Not connected to a server, Keyspace and Column Family names are not available.
 /tmp/logs/commitlog/CommitLog-1308876643288.log.header
 Keyspace Unknown:
   Cf id 0: 444
 /tmp/logs/commitlog/CommitLog-1308877711517.log.header
 Keyspace Unknown:
   Cf id 1: 68848763
 ...
 /tmp/logs/commitlog/CommitLog-1308944451460.log.header
 Keyspace Unknown:
   Cf id 1: 61074
 /tmp/logs/commitlog/CommitLog-1308945597471.log.header
 Keyspace Unknown:
   Cf id 1000: 43175492
   Cf id 1: 108483
 /tmp/logs/commitlog/CommitLog-1308946745380.log.header
 Keyspace Unknown:
   Cf id 1000: 239223
   Cf id 1: 172211
 /tmp/logs/commitlog/CommitLog-1308947888397.log.header
 Keyspace Unknown:
   Cf id 1001: 57595560
   Cf id 1: 816960
   Cf id 1000: 0
 {noformat}
 CF 0 is the Status / LocationInfo CF and 1 is the HintedHandof CF. I dont 
 have it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. 
 I was able to repo a case where flushing the CF's did not mark the log 
 segments as obsolete (attached unit-test patch). Steps are:
 1. Write to cf1 and flush.
 2. Current log segment is marked as dirty at the CL position when the flush 
 started, CommitLog.discardCompletedSegmentsInternal()
 3. Do not write to cf1 again.
 4. Roll the log, my test does this manually. 
 5. Write to CF2 and flush.
 6. Only CF2 is flushed because it is the only dirty CF. 
 cfs.maybeSwitchMemtable() is not called for cf1 and so log segment 1 is still 
 marked as dirty from cf1.
 Step 5 is not essential, just matched what I thought was happening. I thought 
 SystemTable.updateToken() was called which does not flush, and this was the 
 last thing that happened.  
 The expired memtable thread created by Table uses the same cfs.forceFlush() 
 which is a no-op if the cf or it's secondary indexes are clean. 
 
 I think the same problem would exist in 0.8. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2589) row deletes do not remove columns

2011-06-28 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2589:


Attachment: 0001-remove-deleted-columns-before-flushing-memtable-v08-2.patch
0001-remove-deleted-columns-before-flushing-memtable-v07-2.patch

*-2.patch adds the following comment: 

// Pedantically you could purge column level tombstones that are past GCGRace 
when writing to the SSTable.
// But it can result in unexpected behaviour where deletes never make it to 
disk,
// as they are lost and so cannot override existing column values. So we only 
remove deleted columns if there
// is a CF level tombstone to ensure the delete makes it into an SSTable.

 row deletes do not remove columns
 -

 Key: CASSANDRA-2589
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2589
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.2

 Attachments: 
 0001-remove-deleted-columns-before-flushing-memtable-v07-2.patch, 
 0001-remove-deleted-columns-before-flushing-memtable-v07.patch, 
 0001-remove-deleted-columns-before-flushing-memtable-v08-2.patch, 
 0001-remove-deleted-columns-before-flushing-memtable-v08.patch


 When a row delete is issued CF.delete() sets the localDeletetionTime and 
 markedForDeleteAt values but does not remove columns which have a lower time 
 stamp. As a result:
 # Memory which could be freed is held on to (prob not too bad as it's already 
 counted)
 # The deleted columns are serialised to disk, along with the CF info to say 
 they are no longer valid. 
 # NamesQueryFilter and SliceQueryFilter have to do more work as they filter 
 out the irrelevant columns using QueryFilter.isRelevant()
 # Also columns written with a lower time stamp after the deletion are added 
 to the CF without checking markedForDeletionAt.
 This can cause RR to fail, will create another ticket for that and link. This 
 ticket is for a fix to removing the columns. 
 Two options I could think of:
 # Check for deletion when serialising to SSTable and ignore columns if the 
 have a lower timestamp. Otherwise leave as is so dead columns stay in memory. 
 # Ensure at all times if the CF is deleted all columns it contains have a 
 higher timestamp. 
 ## I *think* this would include all column types (DeletedColumn as well) as 
 the CF deletion has the same effect. But not sure.
 ## Deleting (potentially) all columns in delete() will take time. Could track 
 the highest timestamp in the CF so the normal case of deleting all cols does 
 not need to iterate. 
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2829) memtable with no post-flush activity can leave commitlog permanently dirty

2011-07-21 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2829:


Attachment: 0002-2829-v08.patch
0001-2829-unit-test-v08.patch

I got to take another look at this tonight on the 0.8 trunk and ported the unit 
test to 0.8. 

The 0002-2829-v08 patch was my second attempt. It changes CFS.forceFlush() to 
always flush and trusts that maybeSwitchMemtable() will only flush non-clean CF's. 

There are no changes to CommitLog.discardCompletedSegmentsInternal(). The CF 
will be turned off in any segment that is not the context segment. It will 
always be turned on in the current / context segment. I think this gives the 
correct behaviour, i.e. the cf can never have dirty changes in a segment that 
is not current AND the cf may have changes in a segment that is current. It is 
a bit sloppy though, as clean CF's will mark segments as dirty, which may delay 
them being cleaned. 


I also think there is a theoretical risk of a race condition with access to the 
segments Deque.  The iterator runs in the postFlushExecutor and the segments 
are added on the appropriate commit log executor service.
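
The forceFlush() change described above boils down to roughly this (a sketch; 
see the attached 0002-2829-v08 patch for the actual code):

{code:java}
// Sketch only: do not short-circuit on "clean" here; submit the flush and let
// maybeSwitchMemtable() make clean memtables a no-op, so the post-flush hook
// still gets a chance to mark old commit log segments obsolete for this CF.
public Future<?> forceFlush()
{
    // previously: return null if the memtable and its secondary indexes were clean
    return maybeSwitchMemtable(getMemtableThreadSafe(), true);
}
{code}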



 memtable with no post-flush activity can leave commitlog permanently dirty 
 ---

 Key: CASSANDRA-2829
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Aaron Morton
Assignee: Jonathan Ellis
 Fix For: 0.8.2

 Attachments: 0001-2829-unit-test-v08.patch, 
 0001-2829-unit-test.patch, 0002-2829-v08.patch, 0002-2829.patch


 Only dirty Memtables are flushed, and so only dirty memtables are used to 
 discard obsolete commit log segments. This can result in log segments not 
 being deleted even though the data has been flushed.  
 Was using a 3 node 0.7.6-2 AWS cluster (DataStax AMI's) with pre 0.7 data 
 loaded and a running application working against the cluster. Did a rolling 
 restart and then kicked off a repair, one node filled up the commit log 
 volume with 7GB+ of log data, there was about 20 hours of log files. 
 {noformat}
 $ sudo ls -lah commitlog/
 total 6.9G
 drwx-- 2 cassandra cassandra  12K 2011-06-24 20:38 .
 drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
 -rw--- 1 cassandra cassandra 129M 2011-06-24 01:08 
 CommitLog-1308876643288.log
 -rw--- 1 cassandra cassandra   28 2011-06-24 20:47 
 CommitLog-1308876643288.log.header
 -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 
 CommitLog-1308877711517.log
 -rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 
 CommitLog-1308877711517.log.header
 -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 
 CommitLog-1308879395824.log
 -rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 
 CommitLog-1308879395824.log.header
 ...
 -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 
 CommitLog-1308946745380.log
 -rw-r--r-- 1 cassandra cassandra   36 2011-06-24 20:47 
 CommitLog-1308946745380.log.header
 -rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 
 CommitLog-1308947888397.log
 -rw-r--r-- 1 cassandra cassandra   44 2011-06-24 20:47 
 CommitLog-1308947888397.log.header
 {noformat}
 The user KS has 2 CF's with 60 minute flush times. System KS had the default 
 settings which is 24 hours. Will create another ticket see if these can be 
 reduced or if it's something users should do, in this case it would not have 
 mattered. 
 I grabbed the log headers and used the tool in CASSANDRA-2828 and most of the 
 segments had the system CF's marked as dirty.
 {noformat}
 $ bin/logtool dirty /tmp/logs/commitlog/
 Not connected to a server, Keyspace and Column Family names are not available.
 /tmp/logs/commitlog/CommitLog-1308876643288.log.header
 Keyspace Unknown:
   Cf id 0: 444
 /tmp/logs/commitlog/CommitLog-1308877711517.log.header
 Keyspace Unknown:
   Cf id 1: 68848763
 ...
 /tmp/logs/commitlog/CommitLog-1308944451460.log.header
 Keyspace Unknown:
   Cf id 1: 61074
 /tmp/logs/commitlog/CommitLog-1308945597471.log.header
 Keyspace Unknown:
   Cf id 1000: 43175492
   Cf id 1: 108483
 /tmp/logs/commitlog/CommitLog-1308946745380.log.header
 Keyspace Unknown:
   Cf id 1000: 239223
   Cf id 1: 172211
 /tmp/logs/commitlog/CommitLog-1308947888397.log.header
 Keyspace Unknown:
   Cf id 1001: 57595560
   Cf id 1: 816960
   Cf id 1000: 0
 {noformat}
 CF 0 is the Status / LocationInfo CF and 1 is the HintedHandof CF. I dont 
 have it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. 
 I was able to repo a case where flushing the CF's did not mark the log 
 segments as obsolete (attached unit-test patch). Steps are:
 1. Write to cf1 and flush.
 2. Current log segment is marked as dirty at the CL 

[jira] [Created] (CASSANDRA-2981) Provide Hadoop read access to Counter Columns.

2011-07-31 Thread Aaron Morton (JIRA)
Provide Hadoop read access to Counter Columns.
--

 Key: CASSANDRA-2981
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2981
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor


o.a.c.hadoop.ColumnFamilyRecordReader does not test for counter columns, which 
are different objects in the ColumnOrSuperColumn struct. Currently it raises an 
error as it thinks it's a super column: 

{code:java}
2011-07-26 17:23:34,376 ERROR CliDriver (SessionState.java:printError(343)) - 
Failed with exception java.io.IOException:java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:341)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:133)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1114)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NullPointerException
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.unthriftifySuper(ColumnFamilyRecordReader.java:303)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.unthriftify(ColumnFamilyRecordReader.java:297)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:288)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:177)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:136)
at 
org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:153)
at 
org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:111)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:326)
... 10 more
{code}

My plan is to return o.a.c.db.CounterColumn objects just like the 
o.a.c.db.Column and SuperColumn that are returned.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2981) Provide Hadoop read access to Counter Columns.

2011-08-01 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2981:


Attachment: 0001-2981-hadoop-counters-input.patch

0001-2981-hadoop-counters-input.patch modifies the CFRR to turn CounterColumns 
returned through the thrift API into o.a.c.db.Column instances. 

Could not use the CounterColumn as the CounterContext needs to read the node 
ID, and this requires the StorageService to be running and access to 
cassandra.yaml.

It's not great, but the full CounterColumn should not be needed as Hadoop 
access is read-only. Let me know if it's too hacky.

Also added another test to the hadoop_word_count example that sums the counter 
columns in a row.  
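
Roughly, the conversion is of this shape (a sketch of the approach, not the 
patch itself):

{code:java}
// Sketch only: build a plain o.a.c.db.Column from the thrift CounterColumn so
// read-only Hadoop jobs see the counter total (a long) without needing
// CounterContext, StorageService or cassandra.yaml.
private IColumn unthriftifyCounter(CounterColumn column)
{
    // the timestamp is not meaningful for counters here, so 0 is used
    return new org.apache.cassandra.db.Column(column.name,
                                              ByteBufferUtil.bytes(column.value),
                                              0);
}
{code}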

 Provide Hadoop read access to Counter Columns.
 --

 Key: CASSANDRA-2981
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2981
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-2981-hadoop-counters-input.patch


 o.a.c.Hadoop.ColumnFamilyRecordReader does not test for counter columns, 
 which are different objects in the ColumnOrSuperColumn struct. Currently it 
 raises an error as it thinks it's a super column 
 {code:java}
 2011-07-26 17:23:34,376 ERROR CliDriver (SessionState.java:printError(343)) - 
 Failed with exception java.io.IOException:java.lang.NullPointerException
 java.io.IOException: java.lang.NullPointerException
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:341)
   at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:133)
   at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1114)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.NullPointerException
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.unthriftifySuper(ColumnFamilyRecordReader.java:303)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.unthriftify(ColumnFamilyRecordReader.java:297)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:288)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:177)
   at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
   at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
   at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:136)
   at 
 org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:153)
   at 
 org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:111)
   at 
 org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:326)
   ... 10 more
 {code}
 My plan is to return o.a.c.db.CounterColumn objects just like the 
 o.a.c.db.Column and SuperColumn that are returned.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2984) column_metadata used with LongType comparator causes migration to fail

2011-08-02 Thread Aaron Morton (JIRA)
column_metadata used with LongType comparator causes migration to fail 
---

 Key: CASSANDRA-2984
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2984
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.2
Reporter: Aaron Morton
Priority: Minor


see http://www.mail-archive.com/user@cassandra.apache.org/msg15863.html

Running the create column family in the email works, but if the migration needs 
to be inflated (on restart or schema propagation) it fails with:

{code:java}
ERROR 21:41:26,876 Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 5
at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:72)
at 
org.apache.cassandra.config.CFMetaData.getDefaultIndexName(CFMetaData.java:973)
at org.apache.cassandra.config.CFMetaData.inflate(CFMetaData.java:381)
at org.apache.cassandra.config.KSMetaData.inflate(KSMetaData.java:172)
at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:99)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:486)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:166)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:342)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:91)
{code}

CFMetaData.inflate() does not check if an index was defined, like 
addDefaultIndexNames() does. 

In this case the underlying problem is that the column metadata includes column 
names that are invalid.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2984) column_metadata used with LongType comparator causes migration to fail

2011-08-02 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2984:


Description: 
see http://www.mail-archive.com/user@cassandra.apache.org/msg15863.html

Running the create column family in the email works, but if the migration needs 
to be inflated (on restart or schema propagation) it fails with:

{code:java}
ERROR 21:41:26,876 Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 5
at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:72)
at 
org.apache.cassandra.config.CFMetaData.getDefaultIndexName(CFMetaData.java:973)
at org.apache.cassandra.config.CFMetaData.inflate(CFMetaData.java:381)
at org.apache.cassandra.config.KSMetaData.inflate(KSMetaData.java:172)
at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:99)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:486)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:166)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:342)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:91)
{code}

CFMetaData.inflate() is using the wrong comparator when creating the default 
index name; it should check for super CFs. Also it does not check if an index 
was defined, like addDefaultIndexNames() does. 
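
For illustration only (not the ticket's patch), the distinction comes down to 
which comparator stringifies the column_metadata name when building the default 
index name:

{code:java}
// Sketch only: for a Super CF the names in column_metadata are subcolumn names,
// so the default index name should be derived with the sub_comparator
// (UTF8Type in this report), not the row-level comparator (LongType, which
// rejects the 5-byte name and throws the MarshalException above).
AbstractType nameType = (cfType == ColumnFamilyType.Super) ? subcolumnComparator : comparator;
String indexName = cfName + "_" + nameType.getString(columnName) + "_idx";
{code}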
  

  was:
see http://www.mail-archive.com/user@cassandra.apache.org/msg15863.html

Running the create column family in the email works, but if the migration needs 
to be inflated (on restart or schema propagation) it fails with:

{code:java}
ERROR 21:41:26,876 Exception encountered during startup.
org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 5
at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:72)
at 
org.apache.cassandra.config.CFMetaData.getDefaultIndexName(CFMetaData.java:973)
at org.apache.cassandra.config.CFMetaData.inflate(CFMetaData.java:381)
at org.apache.cassandra.config.KSMetaData.inflate(KSMetaData.java:172)
at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:99)
at 
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:486)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:166)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:342)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:91)
{code}

CFMetaData.inflate() does not check if a index was defined like 
addDefaultIndexNames() does. 

I this case the underlying problem is that the column meta data includes col 
names that are invalid.  


 column_metadata used with LongType comparator causes migration to fail 
 ---

 Key: CASSANDRA-2984
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2984
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.2
Reporter: Aaron Morton
Priority: Minor

 see http://www.mail-archive.com/user@cassandra.apache.org/msg15863.html
 Running the create column family in the email works, but if the migration 
 needs to be inflated (on restart or schema propagation) it fails with:
 {code:java}
 ERROR 21:41:26,876 Exception encountered during startup.
 org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 5
   at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:72)
   at 
 org.apache.cassandra.config.CFMetaData.getDefaultIndexName(CFMetaData.java:973)
   at org.apache.cassandra.config.CFMetaData.inflate(CFMetaData.java:381)
   at org.apache.cassandra.config.KSMetaData.inflate(KSMetaData.java:172)
   at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:99)
   at 
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:486)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:166)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:342)
   at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:91)
 {code}
 CFMetaData.inflate() is using the wrong comparator when creating the default 
 index name, it should check for super CF. Also it does not check if a index 
 was defined like addDefaultIndexNames() does. 
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2984) column_metadata used with LongType comparator and UTF8 sub_comparator causes migration to fail

2011-08-02 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2984:


Summary: column_metadata used with LongType comparator and UTF8 
sub_comparator causes migration to fail   (was: column_metadata used with 
LongType comparator causes migration to fail )

 column_metadata used with LongType comparator and UTF8 sub_comparator causes 
 migration to fail 
 ---

 Key: CASSANDRA-2984
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2984
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.2
Reporter: Aaron Morton
Priority: Minor

 see http://www.mail-archive.com/user@cassandra.apache.org/msg15863.html
 Running the create column family in the email works, but if the migration 
 needs to be inflated (on restart or schema propagation) it fails with:
 {code:java}
 ERROR 21:41:26,876 Exception encountered during startup.
 org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 5
   at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:72)
   at 
 org.apache.cassandra.config.CFMetaData.getDefaultIndexName(CFMetaData.java:973)
   at org.apache.cassandra.config.CFMetaData.inflate(CFMetaData.java:381)
   at org.apache.cassandra.config.KSMetaData.inflate(KSMetaData.java:172)
   at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:99)
   at 
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:486)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:166)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:342)
   at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:91)
 {code}
 CFMetaData.inflate() is using the wrong comparator when creating the default 
 index name, it should check for super CF. Also it does not check if a index 
 was defined like addDefaultIndexNames() does. 
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-2984) column_metadata used with LongType comparator and UTF8 sub_comparator causes migration to fail

2011-08-02 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton reassigned CASSANDRA-2984:
---

Assignee: Aaron Morton

 column_metadata used with LongType comparator and UTF8 sub_comparator causes 
 migration to fail 
 ---

 Key: CASSANDRA-2984
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2984
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.2
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor

 see http://www.mail-archive.com/user@cassandra.apache.org/msg15863.html
 Running the create column family in the email works, but if the migration 
 needs to be inflated (on restart or schema propagation) it fails with:
 {code:java}
 ERROR 21:41:26,876 Exception encountered during startup.
 org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 5
   at org.apache.cassandra.db.marshal.LongType.getString(LongType.java:72)
   at 
 org.apache.cassandra.config.CFMetaData.getDefaultIndexName(CFMetaData.java:973)
   at org.apache.cassandra.config.CFMetaData.inflate(CFMetaData.java:381)
   at org.apache.cassandra.config.KSMetaData.inflate(KSMetaData.java:172)
   at org.apache.cassandra.db.DefsTable.loadFromStorage(DefsTable.java:99)
   at 
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:486)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:166)
   at 
 org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:342)
   at 
 org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:91)
 {code}
 CFMetaData.inflate() is using the wrong comparator when creating the default 
 index name, it should check for super CF. Also it does not check if a index 
 was defined like addDefaultIndexNames() does. 
   

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-08-17 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086203#comment-13086203
 ] 

Aaron Morton commented on CASSANDRA-2268:
-

Yeah sorry, will try to find some time next week. 

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.5

 Attachments: 0001-2268-wip.patch


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3047) implementations of IPartitioner.describeOwnership() are not DC aware

2011-08-17 Thread Aaron Morton (JIRA)
implementations of IPartitioner.describeOwnership() are not DC aware


 Key: CASSANDRA-3047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4
Reporter: Aaron Morton
Priority: Trivial


see http://www.mail-archive.com/user@cassandra.apache.org/msg16375.html

When a cluster uses the multiple rings approach to tokens, the output from 
nodetool ring is incorrect.

When it uses the interleaved token approach (e.g. dc1, dc2, dc1, dc2) it will 
be correct. 

It's a bit hacky, but could we special-case (RP) tokens that are off by 1 and 
calculate the ownership per DC? I guess another approach would be to add some 
parameters so the partitioner can be told about the token assignment strategy.  
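
For illustration, per-DC ownership for RandomPartitioner-style tokens could be 
computed along these lines (a hedged, standalone sketch, not a patch against 
IPartitioner; the grouping of tokens by DC is assumed to come from the snitch):

{code:java}
import java.math.BigInteger;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hedged sketch: compute ownership within each DC's own "ring" instead of across
// the global ring, where offset-by-one tokens make one DC look like it owns
// almost everything.
public final class PerDcOwnershipSketch
{
    private static final BigInteger RANGE = BigInteger.valueOf(2).pow(127);

    /** tokensByDc: datacenter name -> tokens of the nodes in that DC. */
    public static Map<String, Map<BigInteger, Double>> describeOwnershipPerDc(Map<String, List<BigInteger>> tokensByDc)
    {
        Map<String, Map<BigInteger, Double>> result = new TreeMap<String, Map<BigInteger, Double>>();
        for (Map.Entry<String, List<BigInteger>> dc : tokensByDc.entrySet())
        {
            List<BigInteger> tokens = dc.getValue();
            Collections.sort(tokens);
            SortedMap<BigInteger, Double> ownership = new TreeMap<BigInteger, Double>();
            for (int i = 0; i < tokens.size(); i++)
            {
                BigInteger current = tokens.get(i);
                BigInteger previous = tokens.get((i + tokens.size() - 1) % tokens.size());
                // width of the arc (previous, current], wrapping around the ring
                BigInteger width = current.subtract(previous).mod(RANGE);
                if (width.signum() == 0)
                    width = RANGE; // a single node owns its whole DC ring
                ownership.put(current, width.doubleValue() / RANGE.doubleValue());
            }
            result.put(dc.getKey(), ownership);
        }
        return result;
    }
}
{code}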

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3061) Optionally skip log4j configuration

2011-08-18 Thread Aaron Morton (JIRA)
Optionally skip log4j configuration
---

 Key: CASSANDRA-3061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3061
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor


from this thread 
http://groups.google.com/group/brisk-users/browse_thread/thread/3a18f4679673bea8

When brisk accesses cassandra classes inside of a Hadoop Task JVM the 
AbstractCassandraDaemon uses a log4j PropertyConfigurator to setup cassandra 
logging. This closes all the existing appenders, including the TaskLogAppender 
for the hadoop task. They are not opened again because they are not in the 
config. 

log4j has Logger Repositories to handle multiple configs in the same process, 
but there is a bit of suck involved in making a RepositorySelector. 

Two examples...
http://www.mail-archive.com/log4j-dev@jakarta.apache.org/msg02972.html
http://docs.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/4.2/html/Getting_Started_Guide/logging.log4j.reposelect.html

Basically all the selector has access to is thread-local storage, and it looks 
like normally people get the class loader from the current thread. A thread 
will inherit its class loader from the thread that created it, unless 
otherwise specified. 

We have code in the same thread that uses hadoop and cassandra classes, so this 
could be a dead end.  

As a workaround I've added a cassandra.log4j.configure JVM param and made 
AbstractCassandraDaemon skip the log4j config if it's false. My job completes 
and I can see the cassandra code logging an extra message I put into the 
Hadoop task log file...

2011-08-19 15:56:06,442 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: 
Metrics system not started: Cannot locate configuration: tried 
hadoop-metrics2-maptask.properties, hadoop-metrics2.properties
2011-08-19 15:56:06,776 INFO 
org.apache.cassandra.service.AbstractCassandraDaemon: Logging initialized 
externally
2011-08-19 15:56:07,332 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
 
The param has to be passed to the task JVM, so we need to modify the Hadoop 
mapred-site.xml as follows: 

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx256m -Dcassandra.log4j.configure=false</value>
  <description>
    Tune your mapred jvm arguments for best performance. 
    Also see documentation from jvm vendor.
  </description>
</property>

It's not pretty but it works. In my extra log4j logging I can see the second 
reset() call is gone.  
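
The guard itself is tiny; a sketch of the idea (the property name matches the 
workaround above, everything else here is illustrative):

{code:java}
import org.apache.log4j.PropertyConfigurator;

// Hedged sketch of the workaround: skip Cassandra's log4j setup entirely when
// -Dcassandra.log4j.configure=false is passed to the task JVM, leaving Hadoop's
// TaskLogAppender configuration untouched.
public final class MaybeConfigureLog4j
{
    public static void initLog4j(String configUrl)
    {
        if (!Boolean.parseBoolean(System.getProperty("cassandra.log4j.configure", "true")))
        {
            System.out.println("Logging initialized externally, skipping log4j configuration");
            return;
        }
        // normal path: (re)configure log4j from the supplied properties file
        PropertyConfigurator.configureAndWatch(configUrl, 10000);
    }
}
{code}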


A change to the Hadoop TaskLogAppender also stops the NPE, but there may also be 
some lost log messages: 
https://issues.apache.org/jira/browse/HADOOP-7556

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3061) Optionally skip log4j configuration

2011-08-18 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-3061:


Attachment: 0001-3061.patch

 Optionally skip log4j configuration
 ---

 Key: CASSANDRA-3061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3061
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Attachments: 0001-3061.patch


 from this thread 
 http://groups.google.com/group/brisk-users/browse_thread/thread/3a18f4679673bea8
 When brisk accesses cassandra classes inside of a Hadoop Task JVM the 
 AbstractCassandraDaemon uses a log4j PropertyConfigurator to setup cassandra 
 logging. This closes all the existing appenders, including the 
 TaskLogAppender for the hadoop task. They are not opened again because they 
 are not in the config. 
 log4j has Logger Repositories to handle multiple configs in the same process, 
 but there is a bit of suck involved in making a RepositorySelector. 
 Two examples...
 http://www.mail-archive.com/log4j-dev@jakarta.apache.org/msg02972.html
 http://docs.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/4.2/html/Getting_Started_Guide/logging.log4j.reposelect.html
 Basically all the selector has access to thread local storage, and it looks 
 like normally people get the class loader from the current thread. A thread 
 will inherit it's class loader from the thread that created it, unless 
 otherwise specified. 
 We have code in the same thread the uses hadoop and cassandra classes, so 
 this could be a dead end.  
 As a work around i've added cassandra.log4j.configure JVM param and made the 
 AbstractCassandraServer skip the log4j config if it's false. My job completes 
 and I can see the cassandra code logging an extra message I put in into the 
 Hadoop task log file...
 2011-08-19 15:56:06,442 WARN 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Metrics system not 
 started: Cannot locate configuration: tried 
 hadoop-metrics2-maptask.properties, hadoop-metrics2.properties
 2011-08-19 15:56:06,776 INFO 
 org.apache.cassandra.service.AbstractCassandraDaemon: Logging initialized 
 externally
 2011-08-19 15:56:07,332 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 0
  
 The param has to be passed to the task JVM, so need to modify Haddop 
 mapred-site.xml as follows 
 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx256m -Dcassandra.log4j.configure=false</value>
   <description>
 Tune your mapred jvm arguments for best performance. 
 Also see documentation from jvm vendor.
   </description>
 </property>
 It's not pretty but it works. In my extra log4j logging I can see the second 
 reset() call is gone.  
 Change the to Hadoop TaskLogAppender also stops the NPE but there may also be 
 some lost log messages 
 https://issues.apache.org/jira/browse/HADOOP-7556

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3061) Optionally skip log4j configuration

2011-08-22 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089090#comment-13089090
 ] 

Aaron Morton commented on CASSANDRA-3061:
-

Thanks, will try it out in the next few days. 

 Optionally skip log4j configuration
 ---

 Key: CASSANDRA-3061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3061
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.5

 Attachments: 0001-3061.patch, 3061_v2.txt


 from this thread 
 http://groups.google.com/group/brisk-users/browse_thread/thread/3a18f4679673bea8
 When brisk accesses cassandra classes inside of a Hadoop Task JVM the 
 AbstractCassandraDaemon uses a log4j PropertyConfigurator to setup cassandra 
 logging. This closes all the existing appenders, including the 
 TaskLogAppender for the hadoop task. They are not opened again because they 
 are not in the config. 
 log4j has Logger Repositories to handle multiple configs in the same process, 
 but there is a bit of suck involved in making a RepositorySelector. 
 Two examples...
 http://www.mail-archive.com/log4j-dev@jakarta.apache.org/msg02972.html
 http://docs.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/4.2/html/Getting_Started_Guide/logging.log4j.reposelect.html
 Basically all the selector has access to is thread local storage, and it looks 
 like people normally get the class loader from the current thread. A thread 
 will inherit its class loader from the thread that created it, unless 
 otherwise specified. 
 We have code in the same thread that uses hadoop and cassandra classes, so 
 this could be a dead end.  
 As a work around I've added a cassandra.log4j.configure JVM param and made 
 AbstractCassandraDaemon skip the log4j config if it's false. My job completes 
 and I can see the cassandra code logging an extra message I put into the 
 Hadoop task log file...
 2011-08-19 15:56:06,442 WARN 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Metrics system not 
 started: Cannot locate configuration: tried 
 hadoop-metrics2-maptask.properties, hadoop-metrics2.properties
 2011-08-19 15:56:06,776 INFO 
 org.apache.cassandra.service.AbstractCassandraDaemon: Logging initialized 
 externally
 2011-08-19 15:56:07,332 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 0
  
 The param has to be passed to the task JVM, so we need to modify the Hadoop 
 mapred-site.xml as follows 
 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx256m -Dcassandra.log4j.configure=false</value>
   <description>
     Tune your mapred jvm arguments for best performance. 
     Also see documentation from jvm vendor.
   </description>
 </property>
 It's not pretty but it works. In my extra log4j logging I can see the second 
 reset() call is gone. 
 A change to the Hadoop TaskLogAppender also stops the NPE, but there may also be 
 some lost log messages 
 https://issues.apache.org/jira/browse/HADOOP-7556

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2268) CQL-enabled stress.java

2011-08-30 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13093676#comment-13093676
 ] 

Aaron Morton commented on CASSANDRA-2268:
-

Sorry, I'm completely tied up with a project that looks like it may go on 
until next week. I may get some time but I cannot say for sure. 

If anyone else would like to take it please do.

 CQL-enabled stress.java
 ---

 Key: CASSANDRA-2268
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2268
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Eric Evans
Assignee: Aaron Morton
Priority: Minor
  Labels: cql
 Fix For: 0.8.5

 Attachments: 0001-2268-wip.patch


 It would be great if stress.java had a CQL mode.  For making the inevitable 
 RPC-CQL comparisons, but also as a basis for measuring optimizations, and 
 spotting performance regressions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3061) Optionally skip log4j configuration

2011-08-30 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094237#comment-13094237
 ] 

Aaron Morton commented on CASSANDRA-3061:
-

I tried to run it with the current cassandra-0.8 branch and brisk beta 2 but 
ran into this problem 
http://groups.google.com/group/brisk-users/browse_thread/thread/75c9f39d4c1859a9

What I can see looks good to me though. 

 Optionally skip log4j configuration
 ---

 Key: CASSANDRA-3061
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3061
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8.5

 Attachments: 0001-3061.patch, 3061_v2.txt


 from this thread 
 http://groups.google.com/group/brisk-users/browse_thread/thread/3a18f4679673bea8
 When brisk accesses cassandra classes inside of a Hadoop Task JVM the 
 AbstractCassandraDaemon uses a log4j PropertyConfigurator to setup cassandra 
 logging. This closes all the existing appenders, including the 
 TaskLogAppender for the hadoop task. They are not opened again because they 
 are not in the config. 
 log4j has Logger Repositories to handle multiple configs in the same process, 
 but there is some pain involved in writing a RepositorySelector. 
 Two examples...
 http://www.mail-archive.com/log4j-dev@jakarta.apache.org/msg02972.html
 http://docs.redhat.com/docs/en-US/JBoss_Enterprise_Application_Platform/4.2/html/Getting_Started_Guide/logging.log4j.reposelect.html
 Basically all the selector has access to is thread local storage, and it looks 
 like people normally get the class loader from the current thread. A thread 
 will inherit its class loader from the thread that created it, unless 
 otherwise specified. 
 We have code in the same thread that uses hadoop and cassandra classes, so 
 this could be a dead end.  
 As a work around I've added a cassandra.log4j.configure JVM param and made 
 AbstractCassandraDaemon skip the log4j config if it's false. My job completes 
 and I can see the cassandra code logging an extra message I put into the 
 Hadoop task log file...
 2011-08-19 15:56:06,442 WARN 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Metrics system not 
 started: Cannot locate configuration: tried 
 hadoop-metrics2-maptask.properties, hadoop-metrics2.properties
 2011-08-19 15:56:06,776 INFO 
 org.apache.cassandra.service.AbstractCassandraDaemon: Logging initialized 
 externally
 2011-08-19 15:56:07,332 INFO org.apache.hadoop.mapred.MapTask: 
 numReduceTasks: 0
  
 The param has to be passed to the task JVM, so we need to modify the Hadoop 
 mapred-site.xml as follows 
 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx256m -Dcassandra.log4j.configure=false</value>
   <description>
     Tune your mapred jvm arguments for best performance. 
     Also see documentation from jvm vendor.
   </description>
 </property>
 It's not pretty but it works. In my extra log4j logging I can see the second 
 reset() call is gone. 
 A change to the Hadoop TaskLogAppender also stops the NPE, but there may also be 
 some lost log messages 
 https://issues.apache.org/jira/browse/HADOOP-7556

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-4601) Ensure unique commit log file names

2012-09-02 Thread Aaron Morton (JIRA)
Aaron Morton created CASSANDRA-4601:
---

 Summary: Ensure unique commit log file names
 Key: CASSANDRA-4601
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Critical


The commit log segment name uses System.nanoTime() as part of the file name. 
There is no guarantee that successive calls to nanoTime() will return different 
values. And on less than optimal hypervisors this happens a lot. 

I observed the following in the wild:

{code:java}
ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.lang.AssertionError: attempted to delete non-existing file 
CommitLog-13926764209796414.log
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Unknown Source)
{code}

My _assumption_ is that it was caused by duplicate file names, as this is on a 
hypervisor that is less than optimal. 
 
After a while (about 30 minutes) mutations stopped being processed and the 
pending count skyrocketed. I _think_ this was because log writing was blocked 
trying to get a new segment and writers could not submit to the commit log 
queue. The only way to stop the affected nodes was kill -9. 

Over about 24 hours this happened 5 times. I have deployed a patch that has 
been running for 12 hours without incident, will attach. 

The affected nodes could still read, and I'm checking logs to see how the other 
nodes handled the situation.
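
For illustration only, a minimal sketch (not the attached patch) of one way to keep the names unique even when nanoTime() repeats, by mixing a per-process counter into the file name:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: combine nanoTime() with a monotonically increasing counter so two
// segments created in the same nanoTime() tick still get distinct file names.
public final class CommitLogSegmentNames
{
    private static final AtomicInteger idGenerator = new AtomicInteger(0);

    public static String nextFileName()
    {
        return "CommitLog-" + System.nanoTime() + "-" + idGenerator.incrementAndGet() + ".log";
    }
}
{code}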

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4601) Ensure unique commit log file names

2012-09-02 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-4601:


Affects Version/s: 0.8.10
   1.0.11
   1.1.4

 Ensure unique commit log file names
 ---

 Key: CASSANDRA-4601
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.10, 1.0.11, 1.1.4
 Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Critical

 The commit log segment name uses System.nanoTime() as part of the file name. 
 There is no guarantee that successive calls to nanoTime() will return 
 different values. And on less than optimal hypervisors this happens a lot. 
 I observed the following in the wild:
 {code:java}
 ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.lang.AssertionError: attempted to delete non-existing file 
 CommitLog-13926764209796414.log
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Unknown Source)
 {code}
 My _assumption_ is that it was caused by duplicate file names, as this is on 
 a hypervisor that is less than optimal. 
  
 After a while (about 30 minutes) mutations stopped being processed and the 
 pending count skyrocketed. I _think_ this was because log writing was 
 blocked trying to get a new segment and writers could not submit to the 
 commit log queue. The only way to stop the affected nodes was kill -9. 
 Over about 24 hours this happened 5 times. I have deployed a patch that has 
 been running for 12 hours without incident, will attach. 
 The affected nodes could still read, and I'm checking logs to see how the 
 other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4601) Ensure unique commit log file names

2012-09-02 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-4601:


Attachment: cassandra-1.1-4601.patch

 Ensure unique commit log file names
 ---

 Key: CASSANDRA-4601
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.10, 1.0.11, 1.1.4
 Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Critical
 Attachments: cassandra-1.1-4601.patch


 The commit log segment name uses System.nanoTime() as part of the file name. 
 There is no guarantee that successive calls to nanoTime() will return 
 different values. And on less than optimal hypervisors this happens a lot. 
 I observed the following in the wild:
 {code:java}
 ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.lang.AssertionError: attempted to delete non-existing file 
 CommitLog-13926764209796414.log
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Unknown Source)
 {code}
 My _assumption_ is that it was caused by duplicate file names, as this is on 
 a hypervisor that is less than optimal. 
  
 After a while (about 30 minutes) mutations stopped being processed and the 
 pending count skyrocketed. I _think_ this was because log writing was 
 blocked trying to get a new segment and writers could not submit to the 
 commit log queue. The only way to stop the affected nodes was kill -9. 
 Over about 24 hours this happened 5 times. I have deployed a patch that has 
 been running for 12 hours without incident, will attach. 
 The affected nodes could still read, and I'm checking logs to see how the 
 other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4602) Stack Size on Sun JVM 1.6.0_33 must be at least 160k

2012-09-02 Thread Aaron Morton (JIRA)
Aaron Morton created CASSANDRA-4602:
---

 Summary: Stack Size on Sun JVM 1.6.0_33 must be at least 160k
 Key: CASSANDRA-4602
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4602
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.4
 Environment: Ubuntu 10.04 
java version 1.6.0_35
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
Reporter: Aaron Morton


I started a fresh Cassandra 1.1.4 install with Sun JVM 1.6.35.

On startup I got this in output.log

{noformat}
The stack size specified is too small, Specify at least 160k
Cannot create Java VM
Service exit with a return value of 1
{noformat}

Remembering CASSANDRA-4275 I monkeyed around and started the JVM with -Xss160k 
the same as Java 7. I then got

{code:java}
ERROR [WRITE-/192.168.1.12] 2012-08-31 01:43:29,865 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[WRITE-/192.168.1.12,5,main]
java.lang.StackOverflowError
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(Unknown Source)
at java.net.SocketOutputStream.write(Unknown Source)
at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
at java.io.BufferedOutputStream.flush(Unknown Source)
at java.io.DataOutputStream.flush(Unknown Source)
at 
org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:156)
at 
org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:126)
{code}

Same as CASSANDRA-4442

At which point I dropped back to Java 6.33. 

CASSANDRA-4457 bumped the stack size to 180k for Java 7; should we also do this 
for Java 6.33+?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4601) Ensure unique commit log file names

2012-09-04 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13448022#comment-13448022
 ] 

Aaron Morton commented on CASSANDRA-4601:
-

Thanks

 Ensure unique commit log file names
 ---

 Key: CASSANDRA-4601
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4601
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.0
 Environment: Sun JVM 1.6.33 / Ubuntu 10.04.4 LTS 
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Critical
 Fix For: 1.1.5

 Attachments: cassandra-1.1-4601.patch


 The commit log segment name uses System.nanoTime() as part of the file name. 
 There is no guarantee that successive calls to nanoTime() will return 
 different values. And on less than optimal hypervisors this happens a lot. 
 I observed the following in the wild:
 {code:java}
 ERROR [COMMIT-LOG-ALLOCATOR] 2012-08-31 15:56:49,815 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.lang.AssertionError: attempted to delete non-existing file 
 CommitLog-13926764209796414.log
 at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:68)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Unknown Source)
 {code}
 My _assumption_ is that it was caused by duplicate file names, as this is on 
 a hypervisor that is less than optimal. 
  
 After a while (about 30 minutes) mutations stopped being processed and the 
 pending count skyrocketed. I _think_ this was because log writing was 
 blocked trying to get a new segment and writers could not submit to the 
 commit log queue. The only way to stop the affected nodes was kill -9. 
 Over about 24 hours this happened 5 times. I have deployed a patch that has 
 been running for 12 hours without incident, will attach. 
 The affected nodes could still read, and I'm checking logs to see how the 
 other nodes handled the situation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4626) Multiple values for CurrentLocal Node ID

2012-09-06 Thread Aaron Morton (JIRA)
Aaron Morton created CASSANDRA-4626:
---

 Summary: Multiple values for CurrentLocal Node ID
 Key: CASSANDRA-4626
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4626
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.11
Reporter: Aaron Morton


From this email thread 
http://www.mail-archive.com/user@cassandra.apache.org/msg24677.html

There are multiple columns for the CurrentLocal row in NodeIdInfo:

{noformat}

[default@system] list NodeIdInfo ;
Using default limit of 100
...
---
RowKey: 43757272656e744c6f63616c
= (column=01efa5d0-e133-11e1--51be601cd0ff, value=0a1020d2, 
timestamp=1344414498989)
= (column=92109b80-ea0a-11e1--51be601cd0af, value=0a1020d2, 
timestamp=1345386691897)
{noformat}

SystemTable.getCurrentLocalNodeId() throws an assertion error, which occurs when the 
static constructor for o.a.c.utils.NodeId is on the stack.

The impact is a java.lang.NoClassDefFoundError when accessing a particular CF 
(I assume one with counters) on a particular node.

Cannot see an obvious cause in the code. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2293) Rewrite nodetool help

2012-09-10 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13451846#comment-13451846
 ] 

Aaron Morton commented on CASSANDRA-2293:
-

LGTM. Committed to trunk for 1.2 only.

Thanks :)

 Rewrite nodetool help
 -

 Key: CASSANDRA-2293
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2293
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Documentation & website
Affects Versions: 0.8 beta 1
Reporter: Aaron Morton
Assignee: Jason Brown
Priority: Minor
 Fix For: 1.2.0

 Attachments: 0001-Jira-CASSANDRA-2293-Rewrite-nodetool-help.patch, 
 0002-Jira-CASSANDRA-2293-Rewrite-nodetool-help.patch


 Once CASSANDRA-2008 is through and we are happy with the approach I would 
 like to write similar help for nodetool. 
 Both command line help of the form nodetool help and nodetool help 
 command.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2008) CLI help incorrect in places

2011-02-24 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12998749#comment-12998749
 ] 

Aaron Morton commented on CASSANDRA-2008:
-

Should be done in the next day or two. In line with the recent discussions on 
reducing what goes into the point releases, I've been writing it against the 0.8 
trunk.

Earthquake has sapped my energy this week :( 


 CLI help incorrect in places
 

 Key: CASSANDRA-2008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2008
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.7.3


 Found some errors in the CLI help, such as these for create column family.
 - memtable_operations: Flush memtables after this many operations
 - memtable_throughput: ... or after this many bytes have been written
 - memtable_flush_after: ... or after this many seconds
 Should be millions of ops, MBs written and minutes, not seconds. Have 
 confirmed that's how the values are used. Will check all the help. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2008) CLI help incorrect in places

2011-02-27 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2008:


Attachment: 2007.txt

 CLI help incorrect in places
 

 Key: CASSANDRA-2008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2008
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.8

 Attachments: 2007.txt


 Found some errors in the CLI help, such as these for create column family.
 - memtable_operations: Flush memtables after this many operations
 - memtable_throughput: ... or after this many bytes have been written
 - memtable_flush_after: ... or after this many seconds
 Should be millions of ops, MBs written and minutes, not seconds. Have 
 confirmed that's how the values are used. Will check all the help. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2007) Move demo Keyspace1 definition from casandra.yaml to an input file for cassandra-cli

2011-03-06 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2007:


Affects Version/s: (was: 0.7.0)
   0.8
Fix Version/s: (was: 0.7.4)
   0.8

 Move demo Keyspace1 definition from casandra.yaml to an input file for 
 cassandra-cli
 

 Key: CASSANDRA-2007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2007
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.8

 Attachments: 2007-1.patch, 2007-2.patch


 The suggested way to make schema changes is through cassandra-cli but we do 
 not have an example of how to do it. Additionally, to get the demo keyspace 
 created, users have to use a different process. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2008) CLI help incorrect in places

2011-03-06 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2008:


Attachment: 2008-2.patch

The second patch includes everything from the first patch and adds CounterColumnType 
to the help.

Against the 0.8 trunk.

 CLI help incorrect in places
 

 Key: CASSANDRA-2008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2008
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.8

 Attachments: 2007.txt, 2008-2.patch


 Found some errors in the CLI help, such as these for create column family.
 - memtable_operations: Flush memtables after this many operations
 - memtable_throughput: ... or after this many bytes have been written
 - memtable_flush_after: ... or after this many seconds
 Should be millions of ops, MBs written and minutes, not seconds. Have 
 confirmed that's how the values are used. Will check all the help. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Assigned: (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

2011-03-07 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton reassigned CASSANDRA-2088:
---

Assignee: Aaron Morton

 Temp files for failed compactions/streaming not cleaned up
 --

 Key: CASSANDRA-2088
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Stu Hood
Assignee: Aaron Morton
 Fix For: 0.7.4


 From separate reports, compaction and repair are currently missing 
 opportunities to clean up tmp files after failures.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2221) 'show create' commands on the CLI to export schema

2011-03-08 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004196#comment-13004196
 ] 

Aaron Morton commented on CASSANDRA-2221:
-

Sounds reasonable, will take a look. 


 'show create' commands on the CLI to export schema
 --

 Key: CASSANDRA-2221
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2221
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Aaron Morton
Priority: Minor
  Labels: cli
 Fix For: 0.7.4


 It would be nice to have 'show create' type of commands on the command-line 
 so that it would generate the DDL for the schema.
 A scenario that would make this useful is where a team works out a data model 
 over time with a dev cluster.  They want to use parts of that schema for new 
 clusters that they create, like a staging/prod cluster.  It would be very 
 handy in this scenario to have some sort of export mechanism.
 Another use case is for testing purposes - you want to replicate a problem.
 We currently have schematool for import/export but that is deprecated and it 
 exports into yaml.
 This new feature would just be able to 'show' - or export if they want the 
 entire keyspace - into a script or commands that could be used in a cli 
 script.  It would need to be able to regenerate everything about the keyspace 
 including indexes and metadata.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (CASSANDRA-2293) Rewrite nodetool help

2011-03-08 Thread Aaron Morton (JIRA)
Rewrite nodetool help
-

 Key: CASSANDRA-2293
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2293
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Documentation & website
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8


Once CASSANDRA-2008 is through and we are happy with the approach I would like 
to write similar help for nodetool. 

Both command line help of the form nodetool help and nodetool help <command> 
and make a better wiki page. 



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2008) CLI help incorrect in places

2011-03-08 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2008:


Attachment: 2008-3.patch

2008-2.patch used the wrong target in build.xml when copying resources into the 
output path. I missed that these are now copied to ${build.classes.main}.

Also fixed a typo; fixed an error in the help for drop keyspace / column family; 
and fixed some error messages to say 'help;' rather than 'help'.

2008-3.patch is rebased against the current trunk and includes all the work from 
the previous patches. 

 CLI help incorrect in places
 

 Key: CASSANDRA-2008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2008
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.8

 Attachments: 2007.txt, 2008-2.patch, 2008-3.patch


 Found some errors in the CLI help, such as these for create column family.
 - memtable_operations: Flush memtables after this many operations
 - memtable_throughput: ... or after this many bytes have been written
 - memtable_flush_after: ... or after this many seconds
 Should be millions of ops, MBs written and minutes, not seconds. Have 
 confirmed that's how the values are used. Will check all the help. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2290) Repair hangs if one of the neighbor is dead

2011-03-09 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004707#comment-13004707
 ] 

Aaron Morton commented on CASSANDRA-2290:
-

Not sure if this helps. I found a place where AES was hanging while testing 
failure during streaming transfer for CASSANDRA-2088. I broke the 
FileStreamTask to only send one range and close the sending channel. 

The IncomingStreamReader.readFile() got stuck in an infinite loop because it 
does not check the return from FileChannel.transferFrom(), which was returning 0 
bytes read. Also, the FileStreamTask does not check the bytes sent by 
transferTo(). 

While stuck in the loop, the socket it was reading from was (127.0.0.1 was in 
the loop, .0.2 was sending): 
java  25371 aaron   73u  IPv4 0xff8010742ff8  0t0  TCP 
127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

When I was debugging, the socketChannel was still reporting it was open. 
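
For illustration, a hedged sketch (not the actual IncomingStreamReader code) of the kind of progress check that would avoid spinning when transferFrom() keeps returning 0 against a half-closed socket:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;

// Sketch only: stop looping when transferFrom() makes no progress, instead of
// retrying forever against a socket that is sitting in CLOSE_WAIT.
public final class StreamCopy
{
    public static void copy(ReadableByteChannel socket, FileChannel file, long length) throws IOException
    {
        long offset = 0;
        while (offset < length)
        {
            long transferred = file.transferFrom(socket, offset, length - offset);
            if (transferred <= 0)
                throw new IOException("Remote peer closed the stream after " + offset + "/" + length + " bytes");
            offset += transferred;
        }
    }
}
{code}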

 Repair hangs if one of the neighbor is dead
 ---

 Key: CASSANDRA-2290
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.4

 Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Repair doesn't cope well with dead/dying neighbors. There are 2 problems:
   # Repair doesn't check if a node is dead before sending a TreeRequest; this 
 is easily fixable.
   # If a neighbor dies mid-repair, the repair will also hang forever.
 The second point is not easy to deal with. The best approach is probably 
 CASSANDRA-1740 however. That is, if we add a way to query the state of a 
 repair that correctly checks all neighbors, and also add a way 
 to cancel a repair, this would probably be enough.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Issue Comment Edited: (CASSANDRA-2290) Repair hangs if one of the neighbor is dead

2011-03-09 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004707#comment-13004707
 ] 

Aaron Morton edited comment on CASSANDRA-2290 at 3/9/11 6:32 PM:
-

Not sure if this helps. I found a place where AES was hanging while testing 
failure during streaming transfer for CASSANDRA-2088 (against 0.7). I broke the 
FileStreamTask to only send one range and close the sending channel. 

The IncomingStreamReader.readFile() got stuck in an infinite loop because it 
does not check the return from FileChannel.transferFrom(), which was returning 0 
bytes read. Also, the FileStreamTask does not check the bytes sent by 
transferTo(). 

While stuck in the loop, the socket it was reading from was (127.0.0.1 was in 
the loop, .0.2 was sending): 
java  25371 aaron   73u  IPv4 0xff8010742ff8  0t0  TCP 
127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

When I was debugging, the socketChannel was still reporting it was open. 

  was (Author: amorton):
Not sure if this helps. I found a place where AES was hanging while testing 
failure during streaming transfer for CASSANDRA-2088. I broke the 
FileStreamTask to only send one range and close the sending channel. 

The IncomingStreamReader.readFile() got stuck in an infinite loop because it 
does not check the return from FileChannel.transferFrom(), which was returning 0 
bytes read. Also, the FileStreamTask does not check the bytes sent by 
transferTo(). 

While stuck in the loop, the socket it was reading from was (127.0.0.1 was in 
the loop, .0.2 was sending): 
java  25371 aaron   73u  IPv4 0xff8010742ff8  0t0  TCP 
127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

When I was debugging, the socketChannel was still reporting it was open. 
  
 Repair hangs if one of the neighbor is dead
 ---

 Key: CASSANDRA-2290
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.4

 Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Repair doesn't cope well with dead/dying neighbors. There are 2 problems:
   # Repair doesn't check if a node is dead before sending a TreeRequest; this 
 is easily fixable.
   # If a neighbor dies mid-repair, the repair will also hang forever.
 The second point is not easy to deal with. The best approach is probably 
 CASSANDRA-1740 however. That is, if we add a way to query the state of a 
 repair that correctly checks all neighbors, and also add a way 
 to cancel a repair, this would probably be enough.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Issue Comment Edited: (CASSANDRA-2290) Repair hangs if one of the neighbor is dead

2011-03-09 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13004707#comment-13004707
 ] 

Aaron Morton edited comment on CASSANDRA-2290 at 3/9/11 6:45 PM:
-

Not sure if this helps. I found a place where AES was hanging while testing 
failure during streaming transfer for CASSANDRA-2088 (against 0.7). I broke the 
FileStreamTask to only send one range and close the sending channel. 

The IncomingStreamReader.readFile() got stuck in an infinite loop because it 
does not check the return from FileChannel.transferFrom(), which was returning 0 
bytes read. Also, the FileStreamTask does not check the bytes sent by 
transferTo(). 

While stuck in the loop, the socket it was reading from was (127.0.0.1 was in 
the loop, .0.2 was sending): 
java  25371 aaron   73u  IPv4 0xff8010742ff8  0t0  TCP 
127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

When I was debugging, the socketChannel was still reporting it was open. 

Update: Modified FileStreamTask to call System.exit() after sending the first 
section and got the same result.

  was (Author: amorton):
Not sure if this helps. I found a place where AES was hanging while testing 
failure during streaming transfer for CASSANDRA-2088 (against 0.7). I broke the 
FileStreamTask to only send one range and close the sending channel. 

The IncomingStreamReader.readFile() got stuck in an infinite loop because it 
does not check the return from FileChannel.transferFrom(), which was returning 0 
bytes read. Also, the FileStreamTask does not check the bytes sent by 
transferTo(). 

While stuck in the loop, the socket it was reading from was (127.0.0.1 was in 
the loop, .0.2 was sending): 
java  25371 aaron   73u  IPv4 0xff8010742ff8  0t0  TCP 
127.0.0.1:7000->127.0.0.2:52759 (CLOSE_WAIT)

When I was debugging, the socketChannel was still reporting it was open. 
  
 Repair hangs if one of the neighbor is dead
 ---

 Key: CASSANDRA-2290
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2290
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 0.7.4

 Attachments: 0001-Don-t-start-repair-if-a-neighbor-is-dead.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Repair doesn't cope well with dead/dying neighbors. There are 2 problems:
   # Repair doesn't check if a node is dead before sending a TreeRequest; this 
 is easily fixable.
   # If a neighbor dies mid-repair, the repair will also hang forever.
 The second point is not easy to deal with. The best approach is probably 
 CASSANDRA-1740 however. That is, if we add a way to query the state of a 
 repair that correctly checks all neighbors, and also add a way 
 to cancel a repair, this would probably be enough.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2009) Move relevant methods to ByteBufferUtil (and normalize on names)

2011-03-13 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006162#comment-13006162
 ] 

Aaron Morton commented on CASSANDRA-2009:
-

I just noticed that in 0.8, using FBUtilities.hexToBytes(source) in 
BytesType.fromString() means that in cassandra-cli the statement

set data['foo1']['bar']='baz';

will fail with 
org.apache.cassandra.db.marshal.MarshalException: cannot parse 'foo1' as hex 
bytes

It must now be
set data[ascii('foo1')]['bar']='baz';

Is this an unintended consequence or do we want that behavior in the cli?
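
Roughly what that code path looks like, as an illustrative sketch built from the classes named above (not a verbatim copy of the 0.8 source):

{code:java}
import java.nio.ByteBuffer;

import org.apache.cassandra.db.marshal.MarshalException;
import org.apache.cassandra.utils.FBUtilities;

// Illustrative sketch only: fromString() now treats the cli literal as hex,
// so 'foo1' fails because 'o' is not a hex digit.
public class HexOnlyBytesType
{
    public ByteBuffer fromString(String source)
    {
        try
        {
            return ByteBuffer.wrap(FBUtilities.hexToBytes(source));
        }
        catch (RuntimeException e)
        {
            throw new MarshalException("cannot parse '" + source + "' as hex bytes");
        }
    }
}
{code}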



 

 Move relevant methods to ByteBufferUtil (and normalize on names)
 

 Key: CASSANDRA-2009
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2009
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
 Fix For: 0.7.1

 Attachments: 
 0001-Move-and-rename-byteBufferToInt-and-toByteBuffer-int.patch, 
 0002-Move-method-to-read-a-BB-in-BBUtil-an-rename-to-matc.patch, 
 0003-Move-inputStream-to-BBUtil-and-change-clone-to-dupli.patch, 
 0004-Move-bytesToHex-BB-to-BBUtil-and-create-BBUtil.hexTo.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 A number of methods are in FBUtilities while they more naturally belong to
 ByteBufferUtil. Moreover, their naming convention conflict with some of the
 method already moved to ByteBufferUtil.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (CASSANDRA-2280) Request specific column families using StreamIn

2011-03-13 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13006164#comment-13006164
 ] 

Aaron Morton commented on CASSANDRA-2280:
-

StreamOut.transferRangesForRequest() flushes all SSTables for the keyspace even 
if we know the CFs. Can it flush just the CFs it is sending? 

Although CompactionManager.doValidation() also forces the CF to flush, so it 
may not be necessary when streaming for repair. It may still be necessary for 
StreamOut.transferRanges() as it is used during move and decommission. 

Otherwise no problems.
 
Jonathan has moved CASSANDRA-2088 to 0.8 because the counters make it difficult 
to share compaction code with 0.7. I'll now do that ticket on top of this one.  


 Request specific column families using StreamIn
 ---

 Key: CASSANDRA-2280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2280
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
 Fix For: 0.8

 Attachments: 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt


 StreamIn.requestRanges only specifies a keyspace, meaning that requesting a 
 range will request it for all column families: if you have a large number of 
 CFs, this can cause quite a headache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Created: (CASSANDRA-2328) Index predicate values used in get_indexed_slice() are not validated

2011-03-15 Thread Aaron Morton (JIRA)
Index predicate values used in get_indexed_slice() are not validated


 Key: CASSANDRA-2328
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2328
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.3
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.5, 0.8


If a client makes a get_indexed_slice() request with malformed predicate values, 
we get an assertion failure rather than an InvalidRequestException.

{noformat}
ERROR 14:47:56,842 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVer
bHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.
java:72)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
utor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.IndexOutOfBoundsException: 6
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(Ti
meUUIDType.java:56)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.jav
a:45)
at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.jav
a:29)
at org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore
.java:1608)
at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java
:1552)
at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVer
bHandler.java:42)
... 4 more
{noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Updated: (CASSANDRA-2328) Index predicate values used in get_indexed_slice() are not validated

2011-03-15 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2328:


Attachment: 0001-validate-index-predicate-name-and-value.patch

Attached patch validates the expression column name and value for 
get_indexed_slice(). 

Also adds a regression test in the (thrift) system tests.
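
As a rough illustration of the approach (the helper and the validate() calls are assumptions based on the description, not the attached patch): each expression's column name and value are checked against the comparator/validator before the scan, and a marshalling failure is reported back as an InvalidRequestException.

{code:java}
import java.util.List;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.MarshalException;
import org.apache.cassandra.thrift.IndexExpression;
import org.apache.cassandra.thrift.InvalidRequestException;

// Sketch only: reject malformed predicate bytes before the index scan starts,
// so the client sees InvalidRequestException instead of a server-side assertion.
public final class IndexClauseValidation
{
    public static void validate(AbstractType nameComparator,
                                AbstractType valueValidator,
                                List<IndexExpression> expressions)
    throws InvalidRequestException
    {
        for (IndexExpression expression : expressions)
        {
            try
            {
                nameComparator.validate(expression.column_name);   // assumed validate(ByteBuffer) check
                valueValidator.validate(expression.value);
            }
            catch (MarshalException e)
            {
                throw new InvalidRequestException(e.getMessage());
            }
        }
    }
}
{code}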

 Index predicate values used in get_indexed_slice() are not validated
 

 Key: CASSANDRA-2328
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2328
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.3
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.5, 0.8

 Attachments: 0001-validate-index-predicate-name-and-value.patch


 If a client makes a get_indexed_slice() request with malformed predicate 
 values, we get an assertion failure rather than an InvalidRequestException.
 {noformat}
 ERROR 14:47:56,842 Error in ThreadPoolExecutor
 java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: 6
 at 
 org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVer
 bHandler.java:51)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.
 java:72)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
 utor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
 .java:908)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: java.lang.IndexOutOfBoundsException: 6
 at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
 at 
 org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(Ti
 meUUIDType.java:56)
 at 
 org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.jav
 a:45)
 at 
 org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.jav
 a:29)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.satisfies(ColumnFamilyStore
 .java:1608)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java
 :1552)
 at 
 org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVer
 bHandler.java:42)
 ... 4 more
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2280) Request specific column families using StreamIn

2011-03-20 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009005#comment-13009005
 ] 

Aaron Morton commented on CASSANDRA-2280:
-

Cannot see any problems, good to go.

 Request specific column families using StreamIn
 ---

 Key: CASSANDRA-2280
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2280
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
 Fix For: 0.8

 Attachments: 
 0001-Allow-specific-column-families-to-be-requested-for-str.txt, 
 0002-Only-flush-matching-CFS.txt


 StreamIn.requestRanges only specifies a keyspace, meaning that requesting a 
 range will request it for all column families: if you have a large number of 
 CFs, this can cause quite a headache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2360) help in schema-sample uses wrong file name

2011-03-20 Thread Aaron Morton (JIRA)
help in schema-sample uses wrong file name
--

 Key: CASSANDRA-2360
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2360
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Priority: Trivial


As described in CASSANDRA-2007 

Wasn't sure about re-opening a resolved issue and wanted to make sure it was 
not lost. 


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2360) help in schema-sample uses wrong file name

2011-03-20 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2360:


Attachment: 0001-change-help-to-use-correct-file-name-conf-sample-sch.patch

 help in schema-sample uses wrong file name
 --

 Key: CASSANDRA-2360
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2360
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.7.5

 Attachments: 
 0001-change-help-to-use-correct-file-name-conf-sample-sch.patch


 As described in CASSANDRA-2007 
 Wasn't sure about re-opening a resolved issue and wanted to make sure it was 
 not lost. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2007) Move demo Keyspace1 definition from casandra.yaml to an input file for cassandra-cli

2011-03-20 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009007#comment-13009007
 ] 

Aaron Morton commented on CASSANDRA-2007:
-

Was not sure about re-opening a resolved issue, so created CASSANDRA-2360 

 Move demo Keyspace1 definition from casandra.yaml to an input file for 
 cassandra-cli
 

 Key: CASSANDRA-2007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2007
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.0
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.7.4

 Attachments: 2007-1.patch, 2007-2.patch, 2007-ambitious.txt


 The suggested way to make schema changes is through cassandra-cli but we do 
 not have an example of how to do it. Additionally, to get the demo keyspace 
 created, users have to use a different process. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2191) Multithread across compaction buckets

2011-03-20 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13009010#comment-13009010
 ] 

Aaron Morton commented on CASSANDRA-2191:
-

Stu, as discussed on IRC I tried to have a look at this but it failed to apply 
against the current trunk.

{noformat}
aarons-MBP-2011:cassandra aaron$ git am 
patch/2191/0001-Add-a-compacting-set-to-sstabletracker.txt 
Applying: Add a `compacting` set to sstabletracker
error: patch failed: 
src/java/org/apache/cassandra/io/sstable/SSTableTracker.java:53
error: src/java/org/apache/cassandra/io/sstable/SSTableTracker.java: patch does 
not apply
Patch failed at 0001 Add a `compacting` set to sstabletracker
{noformat}

 Multithread across compaction buckets
 -

 Key: CASSANDRA-2191
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Critical
  Labels: compaction
 Fix For: 0.8

 Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 
 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt


 This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
 reasoning are different enough to open a separate issue.
 The problem with compactions currently is that they compact the set of 
 sstables that existed the moment the compaction started. This means that for 
 longer running compactions (even when running as fast as possible on the 
 hardware), a very large number of new sstables might be created in the 
 meantime. We have observed this proliferation of sstables killing performance 
 during major/high-bucketed compactions.
 One approach would be to pause compactions in upper buckets (containing 
 larger files) when compactions in lower buckets become possible. While this 
 would likely solve the problem with read performance, it does not actually 
 help us perform compaction any faster, which is a reasonable requirement for 
 other situations.
 Instead, we need to be able to perform any compactions that are currently 
 required in parallel, independent of what bucket they might be in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2363) cli sets RF to 1 when replica strategy is not specified

2011-03-22 Thread Aaron Morton (JIRA)
cli sets RF to 1 when replica strategy is not specified
---

 Key: CASSANDRA-2363
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2363
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Priority: Minor
 Fix For: 0.7.5, 0.8


If a keyspace is created via the cli with

{noformat}
create keyspace dev with replication_factor = 2;
{noformat}

It will be created using the NetworkTopologyStrategy and default options 

{noformat}
[default@dev] describe keyspace;
Keyspace: dev:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
Options: [datacenter1:1]
{noformat}

And the effective RF will be 1 not 2.  
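
For illustration, a sketch of the behaviour the attached patch title suggests (hypothetical helper, not the patch itself): when only replication_factor is given, carry it through as the default datacenter's RF instead of falling back to 1.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Sketch only: build NetworkTopologyStrategy options from the cli statement,
// defaulting datacenter1 to the requested replication_factor rather than 1.
public final class CliStrategyOptions
{
    public static Map<String, String> forDefaultDc(String replicationFactor)
    {
        Map<String, String> options = new HashMap<String, String>();
        options.put("datacenter1", replicationFactor == null ? "1" : replicationFactor);
        return options;
    }
}
{code}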

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2363) cli sets RF to 1 when replica strategy is not specified

2011-03-22 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2363:


Affects Version/s: (was: 0.7.4)
   0.8
Fix Version/s: (was: 0.7.5)
 Assignee: Aaron Morton

 cli sets RF to 1 when replica strategy is not specified
 ---

 Key: CASSANDRA-2363
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2363
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8


 If a keyspace is created via the cli with
 {noformat}
 create keyspace dev with replication_factor = 2;
 {noformat}
 It will be created using the NetworkTopologyStrategy and default options 
 {noformat}
 [default@dev] describe keyspace;
 Keyspace: dev:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Options: [datacenter1:1]
 {noformat}
 And the effective RF will be 1 not 2.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2363) cli sets RF to 1 when replica strategy is not specified

2011-03-22 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2363:


Attachment: 0001-use-the-cluster-RF-for-the-default-DC-RF.patch

 cli sets RF to 1 when replica strategy is not specified
 ---

 Key: CASSANDRA-2363
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2363
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.8

 Attachments: 0001-use-the-cluster-RF-for-the-default-DC-RF.patch


 If a keyspace is created via the cli with
 {noformat}
 create keyspace dev with replication_factor = 2;
 {noformat}
 It will be created using the NetworkTopologyStrategy and default options 
 {noformat}
 [default@dev] describe keyspace;
 Keyspace: dev:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
 Options: [datacenter1:1]
 {noformat}
 And the effective RF will be 1 not 2.  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2191) Multithread across compaction buckets

2011-03-23 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13010077#comment-13010077
 ] 

Aaron Morton commented on CASSANDRA-2191:
-

I have a bunch of questions mostly because I'm trying to understand the reasons 
for doing things.
 
# If max is 0 SSTableTracker.markCompacting() will return an empty list rather 
than null. 
# CompactionManager.submitMinorIfNeeded() sorts the SSTables in the bucket to 
compact the older ones first. When the list is passed to 
SSTableTracker.markCompacting() the order is lost. 
# In CompactionManager.submitIndexBuild() and submitSSTableBuild(), should the 
calls to the executor be in an inner try block to ensure the lock is always 
released? (See the sketch after this list.)
# If the size of the thread pool for CompactionManager.CompactionExecutor() is 
not configurable is there a risk of using too many threads and saturating the 
IO with compaction? Could some people want less than 1 thread per core?
# For my understanding: What about the CompactionExecutor using the 
JMXEnabledThreadPoolExecutor so its stats come back in TP Stats?
# There is a comment in CompactionManager.doCompaction() about relying on a 
single compaction thread when determining if it's a major compaction.
# The order in which the buckets are processed appears to be undefined. Would 
it make sense to order them by number of files or avg size so there is a more 
predictable outcome with multiple threads possibly working through a similar 
set of files? 
# For my understanding: Have you considered adding a flag so that a minor 
compaction will stop processing buckets if additional threads have started? I 
think this may make the compaction less aggressive as it would more quickly 
fall back to a single thread until more were needed again.
# The order of the list returned from CompactionExecutor.getCompactions() is 
undefined. Could they be returned in the order they were added to the executor 
to make the data returned from 
CompactionExecutor.getColumnFamilyInProgress() more reliable?
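
A minimal sketch of the cleanup pattern question 3 is asking about, using a 
plain java.util.concurrent Semaphore as a stand-in for the build lock (the 
class and method names here are illustrative, not the real CompactionManager 
API):

{noformat}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Illustrative only: the permit acquired before handing work to the executor
// must be released even if submit() throws, so the submit call sits inside an
// inner try with a release on the failure path.
public class SubmitWithCleanup
{
    private static final Semaphore buildPermit = new Semaphore(1);
    private static final ExecutorService executor = Executors.newSingleThreadExecutor();

    public static void submitBuild(final Runnable buildTask) throws InterruptedException
    {
        buildPermit.acquire();
        try
        {
            executor.submit(new Runnable()
            {
                public void run()
                {
                    try
                    {
                        buildTask.run();
                    }
                    finally
                    {
                        buildPermit.release(); // normal path: released when the build finishes
                    }
                }
            });
        }
        catch (RuntimeException e)
        {
            buildPermit.release(); // failure path: released if the executor rejects the task
            throw e;
        }
    }
}
{noformat}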


 Multithread across compaction buckets
 -

 Key: CASSANDRA-2191
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Priority: Critical
  Labels: compaction
 Fix For: 0.8

 Attachments: 0001-Add-a-compacting-set-to-sstabletracker.txt, 
 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt


 This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
 reasoning are different enough to open a separate issue.
 The problem with compactions currently is that they compact the set of 
 sstables that existed the moment the compaction started. This means that for 
 longer running compactions (even when running as fast as possible on the 
 hardware), a very large number of new sstables might be created in the 
 meantime. We have observed this proliferation of sstables killing performance 
 during major/high-bucketed compactions.
 One approach would be to pause compactions in upper buckets (containing 
 larger files) when compactions in lower buckets become possible. While this 
 would likely solve the problem with read performance, it does not actually 
 help us perform compaction any faster, which is a reasonable requirement for 
 other situations.
 Instead, we need to be able to perform any compactions that are currently 
 required in parallel, independent of what bucket they might be in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

2011-03-24 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011068#comment-13011068
 ] 

Aaron Morton commented on CASSANDRA-2156:
-

A bunch of questions again, as I'm trying to understand more of what's going 
on.

# If compaction_throughput_kb_per_sec is always going to be in megabytes, 
should it change to MB?
# Not in your changes but CompactionIterator.close() will stop closing files 
after the first one fails. 
# I'm guessing most of the time the actual and target throughput will not 
match. How about moving the INFO message in throttle() to the DEBUG level? Or 
only logging at INFO if the thread will sleep? 
# Should there be a config setting to turn throttling on and off? Could setting 
compaction_throughput_kb_per_sec to 0 disable it? (See the sketch after this 
list.)
# For my understanding: Is there a case for making the sampling interval in 
CompactionIterator.getReduce() configurable? Would we want different settings 
for fewer big rows vs many small rows, e.g. two CFs where one is a secondary 
index for rows in the other, with millions of cols in one and a few in 
another.
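
A minimal sketch of the throttling behaviour questions 3 and 4 are getting at 
(illustrative only, not the code in the attached patch): sleep just long enough 
to keep the observed rate at or below the target, treat a target of 0 as 
disabled, and note that a log line is really only interesting when a sleep 
actually happens.

{noformat}
// Illustrative sketch: throttle by sleeping so the bytes processed since the
// last call do not exceed the target rate; a target of 0 disables throttling.
public class ThroughputThrottle
{
    private final long targetBytesPerSec;
    private long bytesSinceLastThrottle = 0;
    private long lastThrottleAt = System.currentTimeMillis();

    public ThroughputThrottle(long targetKbPerSec)
    {
        this.targetBytesPerSec = targetKbPerSec * 1024;
    }

    public void throttle(long bytesJustProcessed) throws InterruptedException
    {
        if (targetBytesPerSec <= 0)
            return; // setting the target to 0 turns throttling off

        bytesSinceLastThrottle += bytesJustProcessed;
        long elapsedMs = System.currentTimeMillis() - lastThrottleAt;
        long expectedMs = (bytesSinceLastThrottle * 1000) / targetBytesPerSec;
        long sleepMs = expectedMs - elapsedMs;
        if (sleepMs > 0)
            Thread.sleep(sleepMs); // only worth an INFO line when we actually sleep

        bytesSinceLastThrottle = 0;
        lastThrottleAt = System.currentTimeMillis();
    }
}
{noformat}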


I don't understand the approach to deciding what value 
compaction_throughput_kb_per_sec should have. Can you add some more info and 
clarify whether you are talking about the per-CF buckets created during 
compaction?

Final question: would it be better to have fewer parallel compactions where 
each compaction completes quickly, than more parallel compactions that take 
longer to complete? Assuming that once compaction has finished, read 
performance and disk usage may improve. If so, would limiting compaction by 
sizing the compaction thread pool be effective? (I guess the downside may be 
starvation for some CFs.)

 Compaction Throttling
 -

 Key: CASSANDRA-2156
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
 Project: Cassandra
  Issue Type: New Feature
Reporter: Stu Hood
 Fix For: 0.8

 Attachments: 
 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, 
 for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, 
 for-0.6-0002-Make-compaction-throttling-configurable.txt


 Compaction is currently relatively bursty: we compact as fast as we can, and 
 then we wait for the next compaction to be possible (hurry up and wait).
 Instead, to properly amortize compaction, you'd like to compact exactly as 
 fast as you need to to keep the sstable count under control.
 For every new level of compaction, you need to increase the rate that you 
 compact at: a rule of thumb that we're testing on our clusters is to 
 determine the maximum number of buckets a node can support (aka, if the 15th 
 bucket holds 750 GB, we're not going to have more than 15 buckets), and then 
 multiply the flush throughput by the number of buckets to get a minimum 
 compaction throughput to maintain your sstable count.
 Full explanation: for a min compaction threshold of {{T}}, the bucket at 
 level {{N}} can contain {{SsubN = T^N}} 'units' (unit == memtable's worth of 
 data on disk). Every time a new unit is added, it has a {{1/SsubN}} chance of 
 causing the bucket at level N to fill. If the bucket at level N fills, it 
 causes {{SsubN}} units to be compacted. So, for each active level in your 
 system you have {{SubN * 1 / SsubN}}, or {{1}} amortized unit to compact any 
 time a new unit is added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2221) 'show create' commands on the CLI to export schema

2011-03-24 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011074#comment-13011074
 ] 

Aaron Morton commented on CASSANDRA-2221:
-

Do we need this in 0.7.5 or only 0.8? There have been some small changes to 
the CLI script in 0.8 wrt specifying the RF. 

 'show create' commands on the CLI to export schema
 --

 Key: CASSANDRA-2221
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2221
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Aaron Morton
Priority: Minor
  Labels: cli
 Fix For: 0.7.5


 It would be nice to have 'show create' type of commands on the command-line 
 so that it would generate the DDL for the schema.
 A scenario that would make this useful is where a team works out a data model 
 over time with a dev cluster.  They want to use parts of that schema for new 
 clusters that they create, like a staging/prod cluster.  It would be very 
 handy in this scenario to have some sort of export mechanism.
 Another use case is for testing purposes - you want to replicate a problem.
 We currently have schematool for import/export but that is deprecated and it 
 exports into yaml.
 This new feature would just be able to 'show' - or export if they want the 
 entire keyspace - into a script or commands that could be used in a cli 
 script.  It would need to be able to regenerate everything about the keyspace 
 including indexes and metadata.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2404) if out of disk space reclaim compacted SSTables during memtable flush

2011-03-30 Thread Aaron Morton (JIRA)
if out of disk space reclaim compacted SSTables during memtable flush
-

 Key: CASSANDRA-2404
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2404
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.5


During compaction if there is not enough disk space we invoke GC to reclaim 
unused space.

During memtable and binary memtable flush we just error out if there is not 
enough disk space to flush the table. 

Can we make cfs.createFlushWriter() use the same logic as 
Table.getDataFileLocation() to reclaim space if needed?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2191) Multithread across compaction buckets

2011-04-03 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13015150#comment-13015150
 ] 

Aaron Morton commented on CASSANDRA-2191:
-

4) Sounds reasonable if throttling is on.
 
6) I'm not familiar with the bloom filter optimization you mentioned. However 
it seems that, more than anything else, the major flag in doCompaction() 
indicates whether the compaction is running on all sstables, regardless of how 
the process was triggered, i.e. the first ever minor compaction would also be 
marked as major by this logic. PrecompactedRow and LazilyCompactedRow will 
purge rows if the major flag is set or the key is only present in the sstables 
under compaction. I'm not sure why the extra check is there for minor 
compactions, but it looks like losing the fact that a major/manual compaction 
was started could change the purge behaviour. 

I'm also trying to understand whether isKeyInRemainingSSTables() in the 
AbstractCompactedRow subclasses could be affected by multithreading. e.g. a CF 
with two buckets, a high min compaction threshold so compaction takes longer, 
two concurrent minor compactions (one in each bucket), and row A in both 
buckets: if either thread processes row A before the other finishes, that 
thread cannot purge the row, so is there a race condition that stops both 
threads from purging it?

9) We do not use the value in the compactions map; could we set it to the 
current system time when beginCompaction() is called and use that to sort the 
list (see the sketch below)? Not a biggie. 
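
A minimal sketch of what point 9 is suggesting (class and method names are made 
up for illustration, not the real CompactionExecutor API): record a start time 
when a compaction begins and sort on it when reporting what is in progress.

{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: the start time recorded in beginCompaction() gives the
// in-progress list a stable, oldest-first order when it is reported.
public class CompactionRegistry
{
    private final Map<String, Long> inProgress = new ConcurrentHashMap<String, Long>();

    public void beginCompaction(String columnFamily)
    {
        inProgress.put(columnFamily, System.currentTimeMillis());
    }

    public void finishCompaction(String columnFamily)
    {
        inProgress.remove(columnFamily);
    }

    public List<String> getCompactionsOldestFirst()
    {
        List<Map.Entry<String, Long>> entries =
            new ArrayList<Map.Entry<String, Long>>(inProgress.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Long>>()
        {
            public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b)
            {
                return a.getValue().compareTo(b.getValue());
            }
        });
        List<String> names = new ArrayList<String>();
        for (Map.Entry<String, Long> entry : entries)
            names.add(entry.getKey());
        return names;
    }
}
{noformat}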



 Multithread across compaction buckets
 -

 Key: CASSANDRA-2191
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2191
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Stu Hood
Assignee: Stu Hood
Priority: Critical
  Labels: compaction
 Fix For: 0.8

 Attachments: 0001-Add-a-compacting-set-to-DataTracker.txt, 
 0002-Use-the-compacting-set-of-sstables-to-schedule-multith.txt, 
 0003-Expose-multiple-compactions-via-JMX-and-deprecate-sing.txt, 
 0004-Try-harder-to-close-scanners-in-compaction-close.txt


 This ticket overlaps with CASSANDRA-1876 to a degree, but the approaches and 
 reasoning are different enough to open a separate issue.
 The problem with compactions currently is that they compact the set of 
 sstables that existed the moment the compaction started. This means that for 
 longer running compactions (even when running as fast as possible on the 
 hardware), a very large number of new sstables might be created in the 
 meantime. We have observed this proliferation of sstables killing performance 
 during major/high-bucketed compactions.
 One approach would be to pause compactions in upper buckets (containing 
 larger files) when compactions in lower buckets become possible. While this 
 would likely solve the problem with read performance, it does not actually 
 help us perform compaction any faster, which is a reasonable requirement for 
 other situations.
 Instead, we need to be able to perform any compactions that are currently 
 required in parallel, independent of what bucket they might be in.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2156) Compaction Throttling

2011-04-08 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017794#comment-13017794
 ] 

Aaron Morton commented on CASSANDRA-2156:
-

From a discussion on the user list 
http://www.mail-archive.com/user@cassandra.apache.org/msg12027.html

CompactionManager.submitSSTableBuild() and submitIndexBuild() are used when 
receiving streams from other nodes. But they do not use the 
CompactionIterator() so are not covered by this ticket.

Want to create another ticket just for those tasks or reopen CASSANDRA-1882 and 
punt it to a future version?

 Compaction Throttling
 -

 Key: CASSANDRA-2156
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2156
 Project: Cassandra
  Issue Type: New Feature
Reporter: Stu Hood
Assignee: Stu Hood
 Fix For: 0.8

 Attachments: 
 0001-Throttle-total-compaction-to-a-configurable-throughput.txt, 
 for-0.6-0001-Throttle-compaction-to-a-fixed-throughput.txt, 
 for-0.6-0002-Make-compaction-throttling-configurable.txt


 Compaction is currently relatively bursty: we compact as fast as we can, and 
 then we wait for the next compaction to be possible (hurry up and wait).
 Instead, to properly amortize compaction, you'd like to compact exactly as 
 fast as you need to to keep the sstable count under control.
 For every new level of compaction, you need to increase the rate that you 
 compact at: a rule of thumb that we're testing on our clusters is to 
 determine the maximum number of buckets a node can support (aka, if the 15th 
 bucket holds 750 GB, we're not going to have more than 15 buckets), and then 
 multiply the flush throughput by the number of buckets to get a minimum 
 compaction throughput to maintain your sstable count.
 Full explanation: for a min compaction threshold of {{T}}, the bucket at 
 level {{N}} can contain {{SsubN = T^N}} 'units' (unit == memtable's worth of 
 data on disk). Every time a new unit is added, it has a {{1/SsubN}} chance of 
 causing the bucket at level N to fill. If the bucket at level N fills, it 
 causes {{SsubN}} units to be compacted. So, for each active level in your 
 system you have {{SubN * 1 / SsubN}}, or {{1}} amortized unit to compact any 
 time a new unit is added.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

2011-04-10 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2088:


Attachment: 0002-delete-partial-sstable-if-compaction-error.patch
0001-detect-streaming-failures-and-cleanup-temp-files.patch

Patch 0001 tracks failures during AES streaming; files for failed stream 
sessions are cleaned up and repair is allowed to continue. Failed files are 
logged at the StreamSession, TreeRequest, and RepairSession level. 

Patch 0002 handles exceptions when doing a (normal) compaction and deletes the 
temp SSTable. The SSTableWriter components are closed before deletion so that 
Windows will delete them correctly. 
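
A minimal sketch of the cleanup idea in patch 0002, using plain java.io rather 
than the real SSTableWriter (illustrative only): close the open handles first 
and only then delete, because Windows will not delete a file that still has an 
open handle.

{noformat}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative only: on a write failure, release the file handle before
// deleting the partially written temp file, otherwise the delete fails on
// Windows while the handle is still open.
public class TempFileCleanupExample
{
    public static void writeOrCleanUp(File tmpFile, byte[] data) throws IOException
    {
        FileOutputStream out = new FileOutputStream(tmpFile);
        try
        {
            out.write(data);
            out.close();
        }
        catch (IOException e)
        {
            try { out.close(); } catch (IOException ignored) { } // release the handle first
            tmpFile.delete();                                    // then remove the partial file
            throw e;
        }
    }
}
{noformat}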

 Temp files for failed compactions/streaming not cleaned up
 --

 Key: CASSANDRA-2088
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Stu Hood
Assignee: Aaron Morton
 Fix For: 0.8

 Attachments: 
 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 
 0002-delete-partial-sstable-if-compaction-error.patch


 From separate reports, compaction and repair are currently missing 
 opportunities to clean up tmp files after failures.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2008) CLI help incorrect in places

2011-04-11 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2008:


Attachment: 0001-moved-cli-help-to-yaml-resource-and-expanded-content.patch

Rebased to the current trunk and updated the content.

0001-moved-cli-help-to-yaml-resource-and-expanded-content.patch is the only 
patch needed; I could not see how to delete the others.

 CLI help incorrect in places
 

 Key: CASSANDRA-2008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2008
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Fix For: 0.8

 Attachments: 
 0001-moved-cli-help-to-yaml-resource-and-expanded-content.patch, 2007.txt, 
 2008-2.patch, 2008-3.patch


 Found some errors in the CLI help, such as these for create column family.
 - memtable_operations: Flush memtables after this many operations
 - memtable_throughput: ... or after this many bytes have been written
 - memtable_flush_after: ... or after this many seconds
 Should be millions of ops, MBs written, and minutes, not seconds. Have 
 confirmed that's how the values are used. Will check all the help. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2458) cli divides read repair chance by 100

2011-04-12 Thread Aaron Morton (JIRA)
cli divides read repair chance by 100
-

 Key: CASSANDRA-2458
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2458
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
 Fix For: 0.7.5, 0.8


The cli incorrectly divides read_repair_chance by 100 when creating / updating 
CFs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2458) cli divides read repair chance by 100

2011-04-12 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2458:


Attachment: 0001-do-not-divide-read_repair_chance-by-100.patch

The cli now expects read_repair_chance to be between 0 and 1 for create and 
update column family.
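
With the patch the value is given directly as a probability, for example 
(illustrative cli syntax):

{noformat}
[default@dev] update column family Standard1 with read_repair_chance = 0.1;
{noformat}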

 cli divides read repair chance by 100
 -

 Key: CASSANDRA-2458
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2458
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Minor
  Labels: cli
 Fix For: 0.6.13, 0.7.5, 0.8

 Attachments: 0001-do-not-divide-read_repair_chance-by-100.patch


 cli incorrectly divides the read_repair chance by 100 when creating / 
 updating CF's

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2088) Temp files for failed compactions/streaming not cleaned up

2011-04-12 Thread Aaron Morton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13019171#comment-13019171
 ] 

Aaron Morton commented on CASSANDRA-2088:
-

Thanks, will take another look at the cleanup for compaction. 


 Temp files for failed compactions/streaming not cleaned up
 --

 Key: CASSANDRA-2088
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2088
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Stu Hood
Assignee: Aaron Morton
 Fix For: 0.8

 Attachments: 
 0001-Better-detect-failures-from-the-other-side-in-Incomi.patch, 
 0001-detect-streaming-failures-and-cleanup-temp-files.patch, 
 0002-delete-partial-sstable-if-compaction-error.patch


 From separate reports, compaction and repair are currently missing 
 opportunities to clean up tmp files after failures.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-2492) add an escapeSQLString function and fix unescapeSQLString

2011-04-16 Thread Aaron Morton (JIRA)
add an escapeSQLString function and fix unescapeSQLString
-

 Key: CASSANDRA-2492
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2492
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial


CliUtils.unescapeSqlString repeats the escape character, e.g. 
{noformat}my \\t tab becomes my \tt{noformat}
because {{i}} is not bumped when an escape is processed.

Also, for CASSANDRA-2221 I need a function to escape strings back so they will 
work if processed by the cli again. 

There are a number of non [standard 
escapes|http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#101089]
 which I assume are a hangover from its original source 
https://github.com/apache/cassandra/blob/1aeca2b6257b0ad6680080b1756edf7ee9acf8c8/src/java/org/apache/cassandra/cli/CliUtils.java

Will change to use the 
[StringEscapeUtils|http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringEscapeUtils.html]
 class.
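
A minimal sketch of the intended round trip (assuming commons-lang 2.5 on the 
classpath, per the link above; the class below is just an example harness):

{noformat}
import org.apache.commons.lang.StringEscapeUtils;

// Illustrative only: escape and unescape should be inverses of each other,
// which the current hand-rolled loop that mishandles "\\t" is not.
public class EscapeRoundTrip
{
    public static void main(String[] args)
    {
        String raw = "my \\t tab";                                // a literal backslash-t, as typed in the cli
        String unescaped = StringEscapeUtils.unescapeJava(raw);   // becomes a real tab character
        String escaped = StringEscapeUtils.escapeJava(unescaped); // back to "my \\t tab"
        System.out.println(escaped.equals(raw));                  // true: the round trip is lossless
    }
}
{noformat}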

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2492) add an escapeSQLString function and fix unescapeSQLString

2011-04-19 Thread Aaron Morton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Morton updated CASSANDRA-2492:


Attachment: 0001-use-StringEscapeUtils-to-escape-and-unescape.patch

Attached patch to use StringEscapeUtils to escape and unescape cli strings, 
includes unit test.

 add an escapeSQLString function and fix unescapeSQLString
 -

 Key: CASSANDRA-2492
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2492
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.4
Reporter: Aaron Morton
Assignee: Aaron Morton
Priority: Trivial
 Attachments: 0001-use-StringEscapeUtils-to-escape-and-unescape.patch


 CliUtils.unescapeSqlString repeats the escape character e.g. 
 {noformat}my \\t tab becomes my \tt{noformat}
 because {{i}} is not bumped when an escape is processed.
  
 Also for Cassandra-2221 I need a function to escape strings back so they will 
 work if processed by the cli again. 
 There are a number of non [standard 
 escapes|http://java.sun.com/docs/books/jls/second_edition/html/lexical.doc.html#101089]
  which I assume is a hang over from is original source 
 https://github.com/apache/cassandra/blob/1aeca2b6257b0ad6680080b1756edf7ee9acf8c8/src/java/org/apache/cassandra/cli/CliUtils.java
 Will change to use the 
 [StringEscapeUtils|http://commons.apache.org/lang/api-2.5/org/apache/commons/lang/StringEscapeUtils.html]
  class  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

