[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068830#comment-13068830 ] Yang Yang commented on CASSANDRA-2843:
--

Brandon: I used commit 4629648899e637e8e03938935f126689cce5ad48 and applied the 2843_c.patch, and also tried the head, but got the following error with the benchmark pycassa script. How did you succeed with it?

[default@query] create column family NoCache
...     with comparator = AsciiType
...     and default_validation_class = AsciiType
...     and key_validation_class = AsciiType
...     and keys_cached = 0
...     and rows_cached = 0;
Unable to set Compaction Strategy Class of AsciiType

thanks
Yang

better performance on long row read
---

Key: CASSANDRA-2843
URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
Project: Cassandra
Issue Type: New Feature
Reporter: Yang Yang
Attachments: 2843.patch, 2843_c.patch, fast_cf_081_trunk.diff, incremental.diff, microBenchmark.patch

Currently if a row contains over 1000 columns, reads become considerably slow: my test of a row with 3000 columns (standard, regular), each with 8 bytes in name and 40 bytes in value, takes about 16ms. This is all running in memory; no disk read is involved.
Through debugging we can find most of this time is spent on:

[Wall Time] org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
[Wall Time] org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
[Wall Time] org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)

ColumnFamily.addColumn() is slow because it inserts into an internal ConcurrentSkipListMap that maps column names to values. This structure is slow for two reasons: it needs to do synchronization, and it needs to maintain the more complex structure of a map. But if we look at the whole read path, Thrift already defines the read output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and finally convert it to a list. On the synchronization side, since the returned CF is never going to be shared/modified by other threads, we know the access is always single-threaded, so no synchronization is needed. These two features are indeed needed for ColumnFamily in other cases, particularly writes. So we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(): getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper. The provided patch is for demonstration now; I will work on it further once we agree on the general direction. CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is provided. The main work is to let the FastColumnFamily use an array for internal storage.
At first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in sorted order (I still have an issue to resolve with descending vs. ascending order; ascending works now). So the current logic simply compares the new column against the last column in the array: if the names are not equal, append; if they are equal, reconcile. Slight temporary hacks are made on getTopLevelColumnFamily so we have two flavors of the method, one accepting a returnCF; we could definitely think about a better way to provide this returnCF. This patch compiles fine; no tests are provided yet. But I tested it in my application, and the performance improvement is dramatic: it offers about a 50% reduction in read time in the 3000-column case.

thanks
Yang

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
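The append-or-reconcile idea above can be sketched in a few lines. This is a minimal illustration, not the actual FastColumnFamily from the patch: it assumes string column names, a plain array list for storage, and last-write-wins reconciliation.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: because callers add columns in ascending name order, we never need
// a ConcurrentSkipListMap; we only compare against the last element appended.
class FastColumnList {
    // (name, value) pairs; names arrive in ascending order
    private final List<String[]> columns = new ArrayList<>();

    public void addColumn(String name, String value) {
        if (!columns.isEmpty()) {
            String[] last = columns.get(columns.size() - 1);
            int cmp = name.compareTo(last[0]);
            if (cmp == 0) {        // same name: reconcile (keep the latest value here)
                last[1] = value;
                return;
            }
            if (cmp < 0)           // invariant violated: columns must come sorted
                throw new IllegalArgumentException("out-of-order column: " + name);
        }
        columns.add(new String[] { name, value });  // common case: append
    }

    public int size() { return columns.size(); }

    public String valueOf(int i) { return columns.get(i)[1]; }
}
```

The point of the sketch is that both the map overhead and the synchronization disappear: an append to an array list is a constant-time, single-threaded operation.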
[jira] [Updated] (CASSANDRA-2521) Move away from Phantom References for Compaction/Memtable
[ https://issues.apache.org/jira/browse/CASSANDRA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-2521:

Attachment: 2521-v5.patch

Attaching v5. It is rebased and now all unit tests are passing. It also adds a few requested comments and only triggers GC for disk space when that could possibly be useful (i.e., when mmap is used on a non-Sun JVM; more precisely, when the mmap cleaner is not available). Will commit this based on the earlier +1 and on Terje's successful testing (thanks a lot for that, btw).

Move away from Phantom References for Compaction/Memtable
-

Key: CASSANDRA-2521
URL: https://issues.apache.org/jira/browse/CASSANDRA-2521
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Chris Goffinet
Assignee: Sylvain Lebresne
Fix For: 1.0
Attachments: 0001-Use-reference-counting-to-decide-when-a-sstable-can-.patch, 0001-Use-reference-counting-to-decide-when-a-sstable-can-v2.patch, 0002-Force-unmapping-files-before-deletion-v2.patch, 2521-v3.txt, 2521-v4.txt, 2521-v5.patch

http://wiki.apache.org/cassandra/MemtableSSTable

Let's move to using reference counting instead of relying on GC to be called in StorageService.
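The reference-counting scheme the ticket moves to can be sketched as follows. This is an illustrative model, not the real SSTableReader/SSTableDeletingTask code: the class name and the "deleted" flag (standing in for actually deleting the sstable files) are assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: delete the resource when the last reference is released, instead of
// waiting for a GC-driven phantom reference to fire.
class RefCountedResource {
    private final AtomicInteger references = new AtomicInteger(1); // 1 = held by the tracker
    private volatile boolean deleted = false;

    // A reader takes a reference before using the resource; fails if the
    // resource has already been released for deletion.
    public boolean acquire() {
        while (true) {
            int n = references.get();
            if (n <= 0) return false;
            if (references.compareAndSet(n, n + 1)) return true;
        }
    }

    public void release() {
        if (references.decrementAndGet() == 0)
            deleted = true;   // stand-in for deleting the underlying files
    }

    public boolean isDeleted() { return deleted; }
}
```

Compared with phantom references, deletion happens deterministically at the moment the last reader finishes, with no dependence on GC timing.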
[jira] [Created] (CASSANDRA-2929) Don't include tmp files as sstable when create column families
Don't include tmp files as sstable when create column families
--

Key: CASSANDRA-2929
URL: https://issues.apache.org/jira/browse/CASSANDRA-2929
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.7.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
Fix For: 0.7.8, 0.8.2
Attachments: 0001-Don-t-include-tmp-files-as-sstables-when-creating-CF.patch

When we open a column family and populate the SSTableReader, we happen to include -tmp files. This has no chance of actually happening in a real-life situation, but it is what was triggering a race in the unit tests, causing spurious assertion failures in estimateRowsFromIndex.
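The shape of the fix is a filename filter applied while scanning the data directory. The sketch below is illustrative only: the filename patterns and the helper name are assumptions, not Cassandra's real Descriptor parsing.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: when listing live sstables, skip in-progress "-tmp" components so
// a half-written file is never handed to an SSTableReader.
class SSTableScanner {
    public static List<String> liveSSTables(List<String> fileNames) {
        List<String> live = new ArrayList<>();
        for (String name : fileNames)
            if (name.endsWith("-Data.db") && !name.contains("-tmp"))
                live.add(name);
        return live;
    }
}
```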
[jira] [Updated] (CASSANDRA-2929) Don't include tmp files as sstable when create column families
[ https://issues.apache.org/jira/browse/CASSANDRA-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-2929:

Attachment: 0001-Don-t-include-tmp-files-as-sstables-when-creating-CF.patch

Patch is against 0.7.

Don't include tmp files as sstable when create column families
--

Key: CASSANDRA-2929
URL: https://issues.apache.org/jira/browse/CASSANDRA-2929
[jira] [Reopened] (CASSANDRA-2825) Auto bootstrapping the 4th node in a 4 node cluster doesn't work, when no token explicitly assigned in config.
[ https://issues.apache.org/jira/browse/CASSANDRA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne reopened CASSANDRA-2825:
-

Reopening because the patch broke BootStrapperTest (it somehow hangs forever until the junit timeout).

Auto bootstrapping the 4th node in a 4 node cluster doesn't work, when no token explicitly assigned in config.
--

Key: CASSANDRA-2825
URL: https://issues.apache.org/jira/browse/CASSANDRA-2825
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.8.0, 0.8.1
Reporter: Michael Allen
Assignee: Brandon Williams
Fix For: 0.8.2
Attachments: 2825-v2.txt, 2825.txt

This was done in sequence: A, B, C, and D. Node A with token 0 explicitly set in config, the rest with auto_bootstrap: true and no token explicitly assigned. B and C work as expected. D ends up stealing C's token. From system.log on C:

INFO [GossipStage:1] 2011-06-24 16:40:41,947 Gossiper.java (line 638) Node /10.171.47.226 is now part of the cluster
INFO [GossipStage:1] 2011-06-24 16:40:41,947 Gossiper.java (line 606) InetAddress /10.171.47.226 is now UP
INFO [GossipStage:1] 2011-06-24 16:42:09,432 StorageService.java (line 769) Nodes /10.171.47.226 and /10.171.55.77 have the same token 61078635599166706937511052402724559481. /10.171.47.226 is the new owner
WARN [GossipStage:1] 2011-06-24 16:42:09,432 TokenMetadata.java (line 120) Token 61078635599166706937511052402724559481 changing ownership from /10.171.55.77 to /10.171.47.226
[jira] [Commented] (CASSANDRA-2521) Move away from Phantom References for Compaction/Memtable
[ https://issues.apache.org/jira/browse/CASSANDRA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068887#comment-13068887 ] Hudson commented on CASSANDRA-2521:
---

Integrated in Cassandra #967 (See [https://builds.apache.org/job/Cassandra/967/])

Use reference counting to delete sstables instead of relying on the GC
patch by slebresne; reviewed by jbellis for CASSANDRA-2521

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149085
Files :
* /cassandra/trunk/test/unit/org/apache/cassandra/io/sstable/SSTableUtils.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/AntiEntropyService.java
* /cassandra/trunk/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Table.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamOutSession.java
* /cassandra/trunk/test/unit/org/apache/cassandra/streaming/SerializationsTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/util/BufferedSegmentedFile.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/util/SegmentedFile.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/util/MmappedSegmentedFile.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableDeletingTask.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/StorageServiceMBean.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/PendingFile.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamInSession.java
* /cassandra/trunk/src/java/org/apache/cassandra/streaming/StreamOut.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/GCInspector.java
* /cassandra/trunk/test/unit/org/apache/cassandra/streaming/StreamingTransferTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableReader.java
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTableDeletingReference.java

Move away from Phantom References for Compaction/Memtable
-

Key: CASSANDRA-2521
URL: https://issues.apache.org/jira/browse/CASSANDRA-2521
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068896#comment-13068896 ] Sylvain Lebresne commented on CASSANDRA-2843:
-

bq. You mean just as a javadoc comment?

Yes.

better performance on long row read
---

Key: CASSANDRA-2843
URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
svn commit: r1149121 - in /cassandra/branches/cassandra-0.8: ./ conf/ src/java/org/apache/cassandra/concurrent/ src/java/org/apache/cassandra/db/compaction/ src/java/org/apache/cassandra/service/ test
Author: slebresne
Date: Thu Jul 21 11:11:50 2011
New Revision: 1149121

URL: http://svn.apache.org/viewvc?rev=1149121&view=rev
Log:
Properly synchronize merkle tree computation

patch by slebresne; reviewed by jbellis for CASSANDRA-2816

Modified:
    cassandra/branches/cassandra-0.8/CHANGES.txt
    cassandra/branches/cassandra-0.8/conf/cassandra.yaml
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/AntiEntropyService.java
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java
    cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/service/AntiEntropyServiceTestAbstract.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1149121&r1=1149120&r2=1149121&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Thu Jul 21 11:11:50 2011
@@ -38,6 +38,7 @@
  * fix re-using index CF sstable names after drop/recreate (CASSANDRA-2872)
  * prepend CF to default index names (CASSANDRA-2903)
  * fix hint replay (CASSANDRA-2928)
+ * Properly synchronize merkle tree computation (CASSANDRA-2816)


 0.8.1

Modified: cassandra/branches/cassandra-0.8/conf/cassandra.yaml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/conf/cassandra.yaml?rev=1149121&r1=1149120&r2=1149121&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/conf/cassandra.yaml (original)
+++ cassandra/branches/cassandra-0.8/conf/cassandra.yaml Thu Jul 21 11:11:50 2011
@@ -253,13 +253,15 @@ column_index_size_in_kb: 64
 # will be logged specifying the row key.
 in_memory_compaction_limit_in_mb: 64
 
-# Number of compaction threads. This default to the number of processors,
+# Number of compaction threads (NOT including validation compactions
+# for anti-entropy repair). This default to the number of processors,
 # enabling multiple compactions to execute at once. Using more than one
 # thread is highly recommended to preserve read performance in a mixed
 # read/write workload as this avoids sstables from accumulating during long
 # running compactions. The default is usually fine and if you experience
 # problems with compaction running too slowly or too fast, you should look at
 # compaction_throughput_mb_per_sec first.
+#
 # Uncomment to make compaction mono-threaded.
 #concurrent_compactors: 1
 
@@ -267,7 +269,8 @@ in_memory_compaction_limit_in_mb: 64
 # system. The faster you insert data, the faster you need to compact in
 # order to keep the sstable count down, but in general, setting this to
 # 16 to 32 times the rate you are inserting data is more than sufficient.
-# Setting this to 0 disables throttling.
+# Setting this to 0 disables throttling. Note that this account for all types
+# of compaction, including validation compaction.
 compaction_throughput_mb_per_sec: 16
 
 # Track cached row keys during compaction, and re-cache their new

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java?rev=1149121&r1=1149120&r2=1149121&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java Thu Jul 21 11:11:50 2011
@@ -52,9 +52,14 @@ public class DebuggableThreadPoolExecuto
         this(1, Integer.MAX_VALUE, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>(), new NamedThreadFactory(threadPoolName, priority));
     }
 
-    public DebuggableThreadPoolExecutor(int corePoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory)
+    public DebuggableThreadPoolExecutor(int corePoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> queue, ThreadFactory factory)
     {
-        super(corePoolSize, corePoolSize, keepAliveTime, unit, workQueue, threadFactory);
+        this(corePoolSize, corePoolSize, keepAliveTime, unit, queue, factory);
+    }
+
+    protected DebuggableThreadPoolExecutor(int corePoolSize, int maxPoolSize, long keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue, ThreadFactory threadFactory)
+    {
+        super(corePoolSize, maxPoolSize, keepAliveTime, unit, workQueue, threadFactory);
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068907#comment-13068907 ] Sylvain Lebresne commented on CASSANDRA-2816:
-

Alright, v5 looks good to me. Committed, thanks.

Repair doesn't synchronize merkle tree creation properly
--

Key: CASSANDRA-2816
URL: https://issues.apache.org/jira/browse/CASSANDRA-2816
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Labels: repair
Fix For: 0.8.2
Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch, 2816-v2.txt, 2816-v4.txt, 2816-v5.txt, 2816_0.8_v3.patch

Being a little slow, I just realized after having opened CASSANDRA-2811 and CASSANDRA-2815 that there is a more general problem with repair. When a repair is started, it will send a number of merkle tree requests to its neighbors as well as to itself, and assume for correctness that the building of those trees will be started on every node at roughly the same time (if not, we end up comparing data snapshots taken at different times and will thus mistakenly repair a lot of useless data). This is bogus for many reasons:
* Because validation compactions run on the same executor as other compactions, the start of the validation on the different nodes is subject to other compactions. 0.8 mitigates this in a way by being multi-threaded (and thus there is less chance of being blocked a long time by a long-running compaction), but the compaction executor being bounded, it's still a problem.
* If you run a nodetool repair without arguments, it will repair every CF. As a consequence it will generate lots of merkle tree requests, and all of those requests will be issued at the same time. Because even in 0.8 the compaction executor is bounded, some of those validations will end up being queued behind the first ones.
Even assuming that the different validations are submitted in the same order on each node (which isn't guaranteed either), there is no guarantee that on all nodes the first validation will take the same time, hence desynchronizing the queued ones. Overall, it is important for the precision of repair that for a given CF and range (which is the unit at which trees are computed), we make sure that all nodes will start the validation at the same time (or, since we can't do magic, as close as possible). One (reasonably simple) proposition to fix this would be to have repair schedule validation compactions across nodes one by one (i.e., one CF/range at a time), waiting for all nodes to return their tree before submitting the next request. Then on each node, we should make sure that the node will start the validation compaction as soon as requested. For that, we probably want to have a specific executor for validation compactions and either:
* fail the whole repair whenever one node is not able to execute the validation compaction right away (because no threads are available right away), or
* simply tell the user that if he starts too many repairs in parallel, he may start seeing some of them repairing more data than they should.
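The one-range-at-a-time scheduling proposed above can be sketched with standard executors. This is an illustrative model, not the AntiEntropyService code: the class and method names are assumptions, and each Callable stands in for one node's merkle tree validation.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch: for each CF/range, request a validation from every node and block
// until all trees are back before moving on, so snapshots stay synchronized.
class SequentialRepairScheduler {
    public static <T> int repair(List<List<Callable<T>>> requestsPerRange, ExecutorService pool) {
        int completedRanges = 0;
        try {
            for (List<Callable<T>> oneRange : requestsPerRange) {
                // invokeAll returns only once every task in the batch has finished
                for (Future<T> tree : pool.invokeAll(oneRange))
                    tree.get(); // propagate any validation failure
                completedRanges++;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException("validation failed", e);
        }
        return completedRanges;
    }
}
```

The key property is that no request for range N+1 is issued before every node has answered for range N, which bounds how far apart the per-node snapshots can drift.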
[jira] [Commented] (CASSANDRA-2924) Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068927#comment-13068927 ] Rick Shaw commented on CASSANDRA-2924:
--

+1 (lesson learned)

Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1
---

Key: CASSANDRA-2924
URL: https://issues.apache.org/jira/browse/CASSANDRA-2924
Project: Cassandra
Issue Type: Improvement
Components: Drivers
Affects Versions: 0.8.1
Reporter: Rick Shaw
Assignee: Rick Shaw
Priority: Minor
Labels: JDBC
Fix For: 0.8.2
Attachments: 2924-v2.txt, consolidate-connection-v1.txt

For the JDBC Driver suite, additional cleanup and consolidation of classes {{Connection}} and {{CassandraConnection}} were in order. Those changes drove a few casual additional changes in related classes {{CResultSet}}, {{CassandraStatement}} and {{CassandraPreparedStatement}} in order to continue to communicate properly. The class {{Utils}} was also enhanced to move more static utility methods into this holder class.
[jira] [Commented] (CASSANDRA-2829) memtable with no post-flush activity can leave commitlog permanently dirty
[ https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068931#comment-13068931 ] Sylvain Lebresne commented on CASSANDRA-2829:
-

bq. It feels like we need to add a "most recent write at" information as well as the "oldest write/replay position" one. This would not need to be persisted to disk.

Agreed, I think this is the right fix too.

memtable with no post-flush activity can leave commitlog permanently dirty
---

Key: CASSANDRA-2829
URL: https://issues.apache.org/jira/browse/CASSANDRA-2829
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Aaron Morton
Assignee: Jonathan Ellis
Fix For: 0.8.2
Attachments: 0001-2829-unit-test.patch, 0002-2829.patch

Only dirty memtables are flushed, and so only dirty memtables are used to discard obsolete commit log segments. This can result in log segments not being deleted even though the data has been flushed. Was using a 3-node 0.7.6-2 AWS cluster (DataStax AMIs) with pre-0.7 data loaded and a running application working against the cluster. Did a rolling restart and then kicked off a repair; one node filled up the commit log volume with 7GB+ of log data, about 20 hours of log files.

{noformat}
$ sudo ls -lah commitlog/
total 6.9G
drwx------ 2 cassandra cassandra 12K 2011-06-24 20:38 .
drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
-rw------- 1 cassandra cassandra 129M 2011-06-24 01:08 CommitLog-1308876643288.log
-rw------- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308876643288.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 CommitLog-1308877711517.log
-rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308877711517.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 CommitLog-1308879395824.log
-rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308879395824.log.header
...
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 CommitLog-1308946745380.log
-rw-r--r-- 1 cassandra cassandra 36 2011-06-24 20:47 CommitLog-1308946745380.log.header
-rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 CommitLog-1308947888397.log
-rw-r--r-- 1 cassandra cassandra 44 2011-06-24 20:47 CommitLog-1308947888397.log.header
{noformat}

The user KS has 2 CFs with 60-minute flush times. The system KS had the default settings, which is 24 hours. Will create another ticket to see if these can be reduced or if it's something users should do; in this case it would not have mattered. I grabbed the log headers and used the tool in CASSANDRA-2828, and most of the segments had the system CFs marked as dirty.

{noformat}
$ bin/logtool dirty /tmp/logs/commitlog/
Not connected to a server, Keyspace and Column Family names are not available.
/tmp/logs/commitlog/CommitLog-1308876643288.log.header
Keyspace Unknown: Cf id 0: 444
/tmp/logs/commitlog/CommitLog-1308877711517.log.header
Keyspace Unknown: Cf id 1: 68848763
...
/tmp/logs/commitlog/CommitLog-1308944451460.log.header
Keyspace Unknown: Cf id 1: 61074
/tmp/logs/commitlog/CommitLog-1308945597471.log.header
Keyspace Unknown: Cf id 1000: 43175492 Cf id 1: 108483
/tmp/logs/commitlog/CommitLog-1308946745380.log.header
Keyspace Unknown: Cf id 1000: 239223 Cf id 1: 172211
/tmp/logs/commitlog/CommitLog-1308947888397.log.header
Keyspace Unknown: Cf id 1001: 57595560 Cf id 1: 816960 Cf id 1000: 0
{noformat}

CF 0 is the Status / LocationInfo CF and 1 is the HintedHandoff CF. I don't have it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. I was able to repro a case where flushing the CFs did not mark the log segments as obsolete (attached unit-test patch). Steps are:
1. Write to cf1 and flush.
2. The current log segment is marked as dirty at the CL position when the flush started, CommitLog.discardCompletedSegmentsInternal().
3. Do not write to cf1 again.
4. Roll the log (my test does this manually).
5. Write to cf2 and flush.
6. Only cf2 is flushed because it is the only dirty CF. cfs.maybeSwitchMemtable() is not called for cf1, and so log segment 1 is still marked as dirty from cf1.

Step 5 is not essential; it just matched what I thought was happening. I thought SystemTable.updateToken() was called, which does not flush, and this was the last thing that happened. The expired-memtable thread created by Table uses the same cfs.forceFlush(), which is a no-op if the CF or its secondary indexes are clean. I think the same problem would exist in 0.8.
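The dirty-segment bookkeeping behind the steps above can be modeled in a few lines. This is a toy model under assumed names, not the real CommitLog code: segments are just longs, and flushing a CF marks it clean everywhere, which is the behavior the discussed fix would restore for memtables with no post-flush writes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: a segment can only be deleted once no CF is still marked dirty
// in it. The bug is that a CF that never writes again is never re-flushed,
// so it pins its old segment forever.
class CommitLogModel {
    private final Map<Long, Set<String>> dirty = new HashMap<>();

    public void write(long segment, String cf) {
        dirty.computeIfAbsent(segment, s -> new HashSet<>()).add(cf);
    }

    // Flushing a CF makes it clean in every segment; segments left with no
    // dirty CFs become deletable.
    public void flush(String cf) {
        dirty.values().forEach(set -> set.remove(cf));
        dirty.values().removeIf(Set::isEmpty);
    }

    public boolean segmentDeletable(long segment) {
        return !dirty.containsKey(segment);
    }
}
```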
[jira] [Commented] (CASSANDRA-2863) NPE when writing SSTable generated via repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068951#comment-13068951 ] Sylvain Lebresne commented on CASSANDRA-2863:
-

I'm a little bit baffled by that one. Trusting the stack trace, apparently when SSTW.RowIndexer.close() is called, the iwriter field is null. But iwriter is set in prepareIndexing(), which is called the line before index() in SSTW.Builder. Thus if an exception happens in prepareIndexing(), we shouldn't reach the index() method (which is the one triggering the close()). And looking at the uses of iwriter, no other line sets it (so it can't be set back to null after prepareIndexing()). So we can add an {{if (iwriter != null)}} before calling close(), but the truth is I have no clue how it could ever be null at that point. Héctor: are you positive that you are using stock 0.8.1?

NPE when writing SSTable generated via repair
-

Key: CASSANDRA-2863
URL: https://issues.apache.org/jira/browse/CASSANDRA-2863
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.8.1
Reporter: Héctor Izquierdo
Assignee: Sylvain Lebresne
Fix For: 0.8.2

A NPE is generated during repair when closing an sstable generated via SSTable build. It doesn't happen always. The node had been scrubbed and compacted before calling repair.
INFO [CompactionExecutor:2] 2011-07-06 11:11:32,640 SSTableReader.java (line 158) Opening /d2/cassandra/data/sbs/walf-g-730 ERROR [CompactionExecutor:2] 2011-07-06 11:11:34,327 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[CompactionExecutor:2,1,main] java.lang.NullPointerException at org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.close(SSTableWriter.java:382) at org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:370) at org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315) at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:1103) at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:1094) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
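The defensive check discussed above would look roughly like this. This is a minimal sketch: the class and field names mirror the stack trace (RowIndexer, iwriter), but everything else here is assumed, not the real SSTableWriter code.

```java
import java.io.Closeable;
import java.io.IOException;

// Minimal stand-in for SSTableWriter.RowIndexer: iwriter is normally assigned
// in prepareIndexing(), so to model the reported failure it may still be null
// when close() runs.
class RowIndexer {
    Closeable iwriter;

    void close() throws IOException {
        if (iwriter != null)   // the proposed guard: don't close what was never opened
            iwriter.close();
    }
}
```

With the guard, calling close() on an indexer whose prepareIndexing() never ran becomes a no-op instead of an NPE, though as noted it remains unclear how that state is reached in stock 0.8.1.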
[jira] [Commented] (CASSANDRA-2816) Repair doesn't synchronize merkle tree creation properly
[ https://issues.apache.org/jira/browse/CASSANDRA-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068953#comment-13068953 ] Hudson commented on CASSANDRA-2816: --- Integrated in Cassandra-0.8 #231 (See [https://builds.apache.org/job/Cassandra-0.8/231/]) Properly synchronize merkle tree computation patch by slebresne; reviewed by jbellis for CASSANDRA-2816 slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149121 Files : * /cassandra/branches/cassandra-0.8/test/unit/org/apache/cassandra/service/AntiEntropyServiceTestAbstract.java * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/AntiEntropyService.java * /cassandra/branches/cassandra-0.8/CHANGES.txt * /cassandra/branches/cassandra-0.8/conf/cassandra.yaml * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/concurrent/DebuggableThreadPoolExecutor.java * /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/db/compaction/CompactionManager.java Repair doesn't synchronize merkle tree creation properly Key: CASSANDRA-2816 URL: https://issues.apache.org/jira/browse/CASSANDRA-2816 Project: Cassandra Issue Type: Bug Components: Core Reporter: Sylvain Lebresne Assignee: Sylvain Lebresne Labels: repair Fix For: 0.8.2 Attachments: 0001-Schedule-merkle-tree-request-one-by-one.patch, 2816-v2.txt, 2816-v4.txt, 2816-v5.txt, 2816_0.8_v3.patch Being a little slow, I just realized after having opened CASSANDRA-2811 and CASSANDRA-2815 that there is a more general problem with repair. When a repair is started, it sends a number of merkle tree requests to its neighbors as well as to itself, and assumes for correctness that the building of those trees will start on every node at roughly the same time (if not, we end up comparing data snapshots taken at different times and will thus mistakenly repair a lot of useless data).
This is bogus for several reasons: * Because validation compaction runs on the same executor as other compactions, the start of the validation on the different nodes is subject to other compactions. 0.8 mitigates this somewhat by being multi-threaded (so there is less chance of being blocked for a long time by a long-running compaction), but the compaction executor being bounded, it's still a problem. * If you run nodetool repair without arguments, it repairs every CF. As a consequence it generates lots of merkle tree requests, all issued at the same time. Because even in 0.8 the compaction executor is bounded, some of those validations end up queued behind the first ones. Even assuming that the different validations are submitted in the same order on each node (which isn't guaranteed either), there is no guarantee that the first validation will take the same time on all nodes, hence desynchronizing the queued ones. Overall, it is important for the precision of repair that for a given CF and range (the unit at which trees are computed), all nodes start the validation at the same time (or, since we can't do magic, as close as possible). One (reasonably simple) proposal to fix this would be to have repair schedule validation compactions across nodes one by one (i.e., one CF/range at a time), waiting for all nodes to return their tree before submitting the next request. Then on each node, we should make sure the node starts the validation compaction as soon as it is requested. For that, we probably want a specific executor for validation compaction, and then: * either fail the whole repair whenever one node is not able to execute the validation compaction right away (because no thread is available), * or simply tell the user that if he starts too many repairs in parallel, he may see some of them repairing more data than they should.
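The "one CF/range at a time" scheduling proposed above can be sketched roughly as follows. All type and method names here are illustrative, not the real AntiEntropyService API: the point is only the control flow — request trees from every replica for one unit, block until all come back, then move on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy coordinator: for each (cf, range) unit, request a merkle tree from all
// replicas and block until every tree is back before submitting the next unit.
class RepairCoordinator {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Stand-in for sending a TreeRequest to one replica and awaiting its tree.
    Callable<String> requestTree(String replica, String cfRange) {
        return () -> replica + " tree for " + cfRange;
    }

    List<String> repair(List<String> cfRanges, List<String> replicas) throws Exception {
        List<String> trees = new ArrayList<>();
        for (String cfRange : cfRanges) {              // one CF/range at a time
            List<Future<String>> pending = new ArrayList<>();
            for (String replica : replicas)
                pending.add(pool.submit(requestTree(replica, cfRange)));
            for (Future<String> f : pending)           // wait for all nodes' trees
                trees.add(f.get());
        }
        pool.shutdown();
        return trees;
    }
}
```

This keeps the validations for a given unit as close together in time as possible, at the cost of serializing units, which is the trade-off the ticket argues for.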
[jira] [Commented] (CASSANDRA-2863) NPE when writing SSTable generated via repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068956#comment-13068956 ] Héctor Izquierdo commented on CASSANDRA-2863: - I have a patch from 2818 (2818-v4) applied, if that's of any help. The patch only touches messaging classes though.
[jira] [Created] (CASSANDRA-2930) corrupt commitlog
corrupt commitlog - Key: CASSANDRA-2930 URL: https://issues.apache.org/jira/browse/CASSANDRA-2930 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Environment: Linux, amd64. Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Reporter: ivan We get Exception encountered during startup error while Cassandra starts. Error messages: INFO 13:56:28,736 Finished reading /var/lib/cassandra/commitlog/CommitLog-1310637513214.log ERROR 13:56:28,736 Exception encountered during startup. java.io.IOError: java.io.EOFException at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493) at java.util.concurrent.ConcurrentSkipListMap.init(ConcurrentSkipListMap.java:1443) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:419) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:139) at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:127) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:382) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:278) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:158) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:175) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:368) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80) Caused by: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at java.io.DataInputStream.readFully(DataInputStream.java:152) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394) 
at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:87) at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261) ... 13 more (The same "Exception encountered during startup" stack trace is then printed a second time.) After some debugging I found that in some serialized supercolumns the column count is less than the number of serialized columns. The difference was always 1 in the corrupt commitlogs. This error always appears with supercolumns with more than one column, but there are also properly serialized supercolumns in the commitlog. I have no clue yet why this happens; I suspect it may be a race condition.
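To illustrate why a count field that is one less than the actual number of serialized columns surfaces later as an EOFException, here is a toy count-prefixed format. It is deliberately much simpler than the real SuperColumn serialization (just an int count followed by length-prefixed values), so it only demonstrates the failure mode, not the actual layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy count-prefixed record: an int column count, then length-prefixed values.
// If the count field undercounts by one, the reader stops early and the extra
// column's bytes are misparsed as the start of the next record.
class CountPrefixed {
    static byte[] write(int claimedCount, byte[][] columns) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(claimedCount);          // the (possibly wrong) count field
        for (byte[] c : columns) {
            out.writeShort(c.length);
            out.write(c);
        }
        return bos.toByteArray();
    }

    static void readColumns(DataInput in) throws IOException {
        int n = in.readInt();                // the reader trusts this count
        for (int i = 0; i < n; i++) {
            byte[] v = new byte[in.readUnsignedShort()];
            in.readFully(v);                 // EOF here once the stream is desynced
        }
    }
}
```

A record whose count says 1 but which actually carries 2 columns leaves the trailing column in the stream; the next read interprets value bytes as a count and a length and runs off the end, matching the readFully → EOFException at the top of the reported trace.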
[jira] [Issue Comment Edited] (CASSANDRA-2863) NPE when writing SSTable generated via repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068956#comment-13068956 ] Héctor Izquierdo edited comment on CASSANDRA-2863 at 7/21/11 12:36 PM: --- I have a patch from #2818 (2818-v4) applied, if that's of any help. The patch only touches messaging classes though. was (Author: hector.izquierdo): I have a patch from 2818 (2818-v4) applied, if that's of any help. The patch only touches messaging classes though.
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068957#comment-13068957 ] Brandon Williams commented on CASSANDRA-2843: - {quote} I used commit 4629648899e637e8e03938935f126689cce5ad48 and applied the 2843_c.patch, and also tried the head, but got the following error with the benchmark pycassa script. how did you succeed with it? [default@query] create column family NoCache ... with comparator = AsciiType ... and default_validation_class = AsciiType ... and key_validation_class = AsciiType ... and keys_cached = 0 ... and rows_cached = 0; Unable to set Compaction Strategy Class of AsciiType {quote} Add {{and compaction_strategy = 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}} to the create statement. better performance on long row read --- Key: CASSANDRA-2843 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843 Project: Cassandra Issue Type: New Feature Reporter: Yang Yang Attachments: 2843.patch, 2843_c.patch, fast_cf_081_trunk.diff, incremental.diff, microBenchmark.patch currently if a row contains 1000 columns, the read time becomes considerably slow (my test of a row with 3000 columns (standard, regular), each with 8 bytes in name and 40 bytes in value, is about 16ms). this is all running in memory, no disk read is involved.
through debugging we can find most of this time is spent on [Wall Time] org.apache.cassandra.db.Table.getRow(QueryFilter) [Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily) [Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily) [Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily) [Wall Time] org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int) [Wall Time] org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int) [Wall Time] org.apache.cassandra.db.ColumnFamily.addColumn(IColumn) ColumnFamily.addColumn() is slow because it inserts into an internal ConcurrentSkipListMap that maps column names to values. this structure is slow for two reasons: it needs to do synchronization, and it needs to maintain the more complex structure of a map. but if we look at the whole read path, thrift already defines the read output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and finally convert it to a list. on the synchronization side, since the returned CF is never going to be shared/modified by other threads, we know the access is always single-threaded, so no synchronization is needed. but these 2 features are indeed needed for ColumnFamily in other cases, particularly write. so we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(), so that getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper. the provided patch is for demonstration now; I will work on it further once we agree on the general direction. CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is provided. the main work is to let the FastColumnFamily use an array for internal storage.
at first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in sorted order (I still have an issue to resolve between descending and ascending, but ascending works for now). so the current logic is simply to compare the new column against the last column in the array: if the names are not equal, append; if equal, reconcile. slight temporary hacks are made on getTopLevelColumnFamily so we have 2 flavors of the method, one accepting a returnCF, but we could definitely think about a better way to provide this returnCF. this patch compiles fine; no tests are provided yet, but I tested it in my application, and the performance improvement is dramatic: it offers about a 50% reduction in read time in the 3000-column case. thanks Yang
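The append-or-reconcile logic described above can be sketched like this. It is a simplified stand-in for the FastColumnFamily idea, not the actual patch: since callers insert in sorted order, addColumn() only ever compares against the last element of the backing array.

```java
import java.util.ArrayList;

// Simplified array-backed CF: append on a new name, reconcile (keep the
// newest timestamp) on a duplicate of the last name. The Column type here
// stands in for IColumn.
class FastColumnFamily {
    static final class Column {
        final String name; final String value; final long timestamp;
        Column(String name, String value, long timestamp) {
            this.name = name; this.value = value; this.timestamp = timestamp;
        }
    }

    private final ArrayList<Column> columns = new ArrayList<>();

    void addColumn(Column c) {
        if (!columns.isEmpty()) {
            Column last = columns.get(columns.size() - 1);
            int cmp = last.name.compareTo(c.name);
            if (cmp == 0) {                         // same name: reconcile
                if (c.timestamp >= last.timestamp)
                    columns.set(columns.size() - 1, c);
                return;
            }
            // the sorted-insert invariant guarantees cmp < 0 here
        }
        columns.add(c);                             // new name: O(1) append
    }

    int size() { return columns.size(); }
    Column get(int i) { return columns.get(i); }
}
```

Compared with a ConcurrentSkipListMap, each insertion is a single comparison and an array append, with no locking, which is where the claimed ~50% read-time reduction comes from.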
[jira] [Updated] (CASSANDRA-2930) corrupt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ivan updated CASSANDRA-2930: Attachment: CommitLog-1310637513214.log A corrupt serialized row from a corrupt commitlog.
[jira] [Updated] (CASSANDRA-2921) Split BufferedRandomAccessFile (BRAF) into Input and Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-2921: --- Attachment: CASSANDRA-2921-make-Writer-a-stream.patch The patch changes BRAF.Writer: instead of extending AbstractRandomAccessFile it extends AbstractDataOutput (a new class) and introduces mark() and resetAndTruncate(...) methods to satisfy the scrub and CommitLog requirements. Split BufferedRandomAccessFile (BRAF) into Input and Output classes Key: CASSANDRA-2921 URL: https://issues.apache.org/jira/browse/CASSANDRA-2921 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Fix For: 1.0 Attachments: CASSANDRA-2921-make-Writer-a-stream.patch, CASSANDRA-2921-v2.patch, CASSANDRA-2921.patch Split BRAF into Input and Output classes to avoid complexity related to random I/O in write mode that we don't need any more (see CASSANDRA-2879), and to make the implementation cleaner and more reusable.
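A rough model of the mark()/resetAndTruncate() contract the patch introduces: remember a position before a risky write, and on failure discard everything written after it. This sketch is backed by a growable byte array rather than a file, and all names other than mark() and resetAndTruncate() are illustrative.

```java
import java.util.Arrays;

// Growable byte buffer with the two operations the patch describes: mark()
// records the current position, resetAndTruncate() drops everything written
// after it (e.g. a partially written, failed commit log entry).
class TruncatableOutput {
    private byte[] buf = new byte[16];
    private int length = 0;
    private int mark = 0;

    void write(byte[] data) {
        if (length + data.length > buf.length)
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, length + data.length));
        System.arraycopy(data, 0, buf, length, data.length);
        length += data.length;
    }

    void mark()             { mark = length; }   // remember a safe position
    void resetAndTruncate() { length = mark; }   // rewind, dropping bytes after it
    byte[] toBytes()        { return Arrays.copyOf(buf, length); }
}
```

The real writer wraps a file, so truncation also has to move the file pointer, but the contract is the same: nothing after the mark survives.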
[jira] [Updated] (CASSANDRA-2829) memtable with no post-flush activity can leave commitlog permanently dirty
[ https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Morton updated CASSANDRA-2829: Attachment: 0002-2829-v08.patch 0001-2829-unit-test-v08.patch I got to take another look at this tonight on the 0.8 trunk and ported the unit test to 0.8. The 0002-2829-v08 patch was my second attempt. It changes CFS.forceFlush() to always flush and trusts maybeSwitchMemtable() to only flush non-clean CFs. There are no changes to CommitLog.discardCompletedSegmentsInternal(). The CF will be turned off in any segment that is not the context segment, and will always be turned on in the current/context segment. I think this gives the correct behaviour, i.e. the cf can never have dirty changes in a segment that is not current AND the cf may have changes in a segment that is current. It is a bit sloppy though, as clean CFs will mark segments as dirty, which may delay them being cleaned. I also think there is a theoretical risk of a race condition with access to the segments Deque: the iterator runs in the postFlushExecutor while segments are added on the appropriate commit log executor service. memtable with no post-flush activity can leave commitlog permanently dirty --- Key: CASSANDRA-2829 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829 Project: Cassandra Issue Type: Bug Components: Core Reporter: Aaron Morton Assignee: Jonathan Ellis Fix For: 0.8.2 Attachments: 0001-2829-unit-test-v08.patch, 0001-2829-unit-test.patch, 0002-2829-v08.patch, 0002-2829.patch Only dirty Memtables are flushed, and so only dirty memtables are used to discard obsolete commit log segments. This can result in log segments not being deleted even though the data has been flushed. Was using a 3 node 0.7.6-2 AWS cluster (DataStax AMIs) with pre-0.7 data loaded and a running application working against the cluster.
Did a rolling restart and then kicked off a repair, one node filled up the commit log volume with 7GB+ of log data, there was about 20 hours of log files. {noformat} $ sudo ls -lah commitlog/ total 6.9G drwx-- 2 cassandra cassandra 12K 2011-06-24 20:38 . drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 .. -rw--- 1 cassandra cassandra 129M 2011-06-24 01:08 CommitLog-1308876643288.log -rw--- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308876643288.log.header -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 CommitLog-1308877711517.log -rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308877711517.log.header -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 CommitLog-1308879395824.log -rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308879395824.log.header ... -rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 CommitLog-1308946745380.log -rw-r--r-- 1 cassandra cassandra 36 2011-06-24 20:47 CommitLog-1308946745380.log.header -rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 CommitLog-1308947888397.log -rw-r--r-- 1 cassandra cassandra 44 2011-06-24 20:47 CommitLog-1308947888397.log.header {noformat} The user KS has 2 CF's with 60 minute flush times. System KS had the default settings which is 24 hours. Will create another ticket see if these can be reduced or if it's something users should do, in this case it would not have mattered. I grabbed the log headers and used the tool in CASSANDRA-2828 and most of the segments had the system CF's marked as dirty. {noformat} $ bin/logtool dirty /tmp/logs/commitlog/ Not connected to a server, Keyspace and Column Family names are not available. /tmp/logs/commitlog/CommitLog-1308876643288.log.header Keyspace Unknown: Cf id 0: 444 /tmp/logs/commitlog/CommitLog-1308877711517.log.header Keyspace Unknown: Cf id 1: 68848763 ... 
/tmp/logs/commitlog/CommitLog-1308944451460.log.header Keyspace Unknown: Cf id 1: 61074 /tmp/logs/commitlog/CommitLog-1308945597471.log.header Keyspace Unknown: Cf id 1000: 43175492 Cf id 1: 108483 /tmp/logs/commitlog/CommitLog-1308946745380.log.header Keyspace Unknown: Cf id 1000: 239223 Cf id 1: 172211 /tmp/logs/commitlog/CommitLog-1308947888397.log.header Keyspace Unknown: Cf id 1001: 57595560 Cf id 1: 816960 Cf id 1000: 0 {noformat} CF 0 is the Status / LocationInfo CF and 1 is the HintedHandoff CF. I don't have it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. I was able to repro a case where flushing the CFs did not mark the log segments as obsolete (attached unit-test patch). Steps are: 1. Write to cf1 and flush. 2. Current log segment is marked as dirty at the CL
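The dirty-segment bookkeeping under discussion can be modeled very simply: a segment records which CFs wrote into it, and it can only be reclaimed once each of those CFs is marked clean. If a CF's flush is a no-op (because its memtable is already clean), the clean mark is never applied and the segment stays dirty indefinitely, which is the bug described here. All names in this toy model are illustrative, not the real CommitLogSegment API.

```java
import java.util.HashSet;
import java.util.Set;

// Toy commit log segment: tracks CFs with unflushed writes; deletable only
// once every one of them has been marked clean post-flush.
class Segment {
    final Set<Integer> dirtyCfs = new HashSet<>();

    void write(int cfId)     { dirtyCfs.add(cfId); }     // mutation lands in segment
    void markClean(int cfId) { dirtyCfs.remove(cfId); }  // runs after a real flush
    boolean isSafeToDelete() { return dirtyCfs.isEmpty(); }
}
```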
[jira] [Created] (CASSANDRA-2931) Nodetool ring prints the same token regardless of node queried
Nodetool ring prints the same token regardless of node queried -- Key: CASSANDRA-2931 URL: https://issues.apache.org/jira/browse/CASSANDRA-2931 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.7.6 Reporter: David Allsopp Priority: Trivial I have a 3-node test cluster. Using {{nodetool ring}} for any of the nodes returns the _same_ token at the top of the list (113427455640312821154458202477256070484) - but presumably this should reflect the token _of the node I am querying_ (as specified using -h)? Or if not, what does it mean? {noformat} [dna@dev6 ~]$ nodetool -h dev6 -p 8082 ring Address Status State Load Owns Token 113427455640312821154458202477256070484 10.0.11.8 Up Normal 2.41 GB 33.33% 0 10.0.11.6 Up Normal 3.13 GB 33.33% 56713727820156410577229101238628035242 10.0.11.9 Up Normal 1.65 GB 33.33% 113427455640312821154458202477256070484 [dna@dev6 ~]$ nodetool -h dev8 -p 8082 ring Address Status State Load Owns Token 113427455640312821154458202477256070484 10.0.11.8 Up Normal 2.41 GB 33.33% 0 10.0.11.6 Up Normal 3.13 GB 33.33% 56713727820156410577229101238628035242 10.0.11.9 Up Normal 1.65 GB 33.33% 113427455640312821154458202477256070484 [dna@dev6 ~]$ nodetool -h dev9 -p 8082 ring Address Status State Load Owns Token 113427455640312821154458202477256070484 10.0.11.8 Up Normal 2.41 GB 33.33% 0 10.0.11.6 Up Normal 3.13 GB 33.33% 56713727820156410577229101238628035242 10.0.11.9 Up Normal 1.65 GB 33.33% 113427455640312821154458202477256070484 {noformat}
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068972#comment-13068972 ] Sylvain Lebresne commented on CASSANDRA-2843: - bq. Doesn't it make sense then to change the AL fallback-to-bsearch into an assertion failure? Actually, I just realized that there is one place where we do add a column after a read that is not at the end of the CF. That's for counters, after the read for replication, in the case where we can shrink the context because the node has renewed its NodeId multiple times and we can merge the old ones together. In that case, we end up updating some of the columns of the column family we've just read. Note that this code won't even be executed 99.999% of the time, and even then only a handful of columns are likely to be updated, so using the AL implementation really is the best choice. We could, if we really wanted to, add special code for that specific case (typically, copying the CF we read into a CLSM-backed one before updating it). That would be less efficient, but that probably doesn't matter in this specific case. But more importantly, this exemplifies why I think using an assertion is more dangerous than it needs to be. Imho, the bug we would have had is the kind that could likely have made it into a release.
through debugging we can find most of this time is spent on
[Wall Time] org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
[Wall Time] org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
[Wall Time] org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
ColumnFamily.addColumn() is slow because it inserts into an internal ConcurrentSkipListMap that maps column names to values. this structure is slow for two reasons: it needs to do synchronization, and it needs to maintain a more complex map structure. but if we look at the whole read path, thrift already defines the read output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and finally convert it to a list. on the synchronization side, since the returned CF is never going to be shared/modified by other threads, we know the access is always single-threaded, so no synchronization is needed. but these 2 features are indeed needed for ColumnFamily in other cases, particularly write. so we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper. the provided patch is for demonstration now; will work further once we agree on the general direction. CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is provided. the main work is to let the FastColumnFamily use an array for internal storage.
at first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in sorted order (I still have an issue to resolve regarding descending vs. ascending order, but ascending works now). so the current logic is simply to compare the new column against the last column in the array: if the names are not equal, append; if equal, reconcile. slight temporary hacks are made on getTopLevelColumnFamily so we have 2 flavors of the method, one accepting a returnCF. but we could definitely think about what is the better way to provide this returnCF. this patch compiles fine; no tests are provided yet. but I tested it in my application, and the performance improvement is dramatic: it offers about a 50% reduction in read time in the 3000-column case. thanks Yang
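The append-or-reconcile invariant described above can be sketched in isolation. Everything below (the class, a String-keyed Column, last-write-wins reconcile) is a simplified stand-in for the patch's FastColumnFamily idea, not Cassandra's actual code: because callers deliver columns in sorted order, an array plus a single comparison against the last element replaces the ConcurrentSkipListMap.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of append-with-reconcile over an array-backed store.
class FastColumnList {
    static class Column {
        final String name;
        final String value;
        Column(String name, String value) { this.name = name; this.value = value; }
    }

    private final List<Column> columns = new ArrayList<>();

    // Append if the new name sorts after the last one; reconcile on equal
    // names; anything else violates the sorted-input invariant.
    void addColumn(Column c) {
        if (columns.isEmpty()) { columns.add(c); return; }
        Column last = columns.get(columns.size() - 1);
        int cmp = last.name.compareTo(c.name);
        if (cmp < 0)
            columns.add(c);                     // common case: O(1) append
        else if (cmp == 0)
            columns.set(columns.size() - 1, c); // reconcile (last write wins here)
        else
            throw new IllegalArgumentException("columns must arrive in sorted order");
    }

    int size() { return columns.size(); }
    String valueOf(int i) { return columns.get(i).value; }
}
```

A real implementation would reconcile via the column's own timestamp logic rather than blindly replacing; the point is only that sorted input makes the map unnecessary.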
[jira] [Resolved] (CASSANDRA-2931) Nodetool ring prints the same token regardless of node queried
[ https://issues.apache.org/jira/browse/CASSANDRA-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne resolved CASSANDRA-2931. - Resolution: Not A Problem This is not what the first token means. The first is always the biggest assigned token. The fact that it is displayed at the top is an artistic rendering supposed to explain that we have a ring. I.e., the first and last printed token are the same, suggesting some kind of continuity.
[jira] [Commented] (CASSANDRA-2825) Auto bootstrapping the 4th node in a 4 node cluster doesn't work, when no token explicitly assigned in config.
[ https://issues.apache.org/jira/browse/CASSANDRA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068974#comment-13068974 ] Brandon Williams commented on CASSANDRA-2825: - Impressive, I have no idea how this is breaking testTokenRoundtrip():
{noformat}
public void testTokenRoundtrip() throws Exception
{
    StorageService.instance.initServer();
    // fetch a bootstrap token from the local node
    assert BootStrapper.getBootstrapTokenFrom(FBUtilities.getLocalAddress()) != null;
}
{noformat}
The log just shows a bunch of attempts to connect to the seed (127.0.0.2) which hasn't started yet. Auto bootstrapping the 4th node in a 4 node cluster doesn't work, when no token explicitly assigned in config. -- Key: CASSANDRA-2825 URL: https://issues.apache.org/jira/browse/CASSANDRA-2825 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.0, 0.8.1 Reporter: Michael Allen Assignee: Brandon Williams Fix For: 0.8.2 Attachments: 2825-v2.txt, 2825.txt This was done in sequence: A, B, C, and D. Node A with token 0 explicitly set in config, the rest with auto_bootstrap: true and no token explicitly assigned. B and C work as expected. D ends up stealing C's token. from system.log on C:
INFO [GossipStage:1] 2011-06-24 16:40:41,947 Gossiper.java (line 638) Node /10.171.47.226 is now part of the cluster
INFO [GossipStage:1] 2011-06-24 16:40:41,947 Gossiper.java (line 606) InetAddress /10.171.47.226 is now UP
INFO [GossipStage:1] 2011-06-24 16:42:09,432 StorageService.java (line 769) Nodes /10.171.47.226 and /10.171.55.77 have the same token 61078635599166706937511052402724559481. /10.171.47.226 is the new owner
WARN [GossipStage:1] 2011-06-24 16:42:09,432 TokenMetadata.java (line 120) Token 61078635599166706937511052402724559481 changing ownership from /10.171.55.77 to /10.171.47.226
[Cassandra Wiki] Update of NodeTool by DavidAllsopp
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The NodeTool page has been changed by DavidAllsopp: http://wiki.apache.org/cassandra/NodeTool?action=diff&rev1=15&rev2=16
10.176.1.162 Up 511.34 MB 63538518574533451921556363897953848387 |--| }}}
+ The format is a little different for later versions - this is from v0.7.6:
+
+ {{{
+ Address         Status State   Load       Owns    Token
+                                                    113427455640312821154458202477256070484
+ 10.176.0.146    Up     Normal  459.27 MB  33.33%  0
+ 10.176.1.161    Up     Normal  382.53 MB  33.33%  56713727820156410577229101238628035242
+ 10.176.1.162    Up     Normal  511.34 MB  33.33%  113427455640312821154458202477256070484
+ }}}
+
+ The `Owns` column indicates the percentage of the ring (keyspace) handled by that node.
+
+ The largest token is repeated at the top of the list to indicate that we have a ring, i.e. the first and last printed token are the same, suggesting some kind of continuity.
+
== Info == Outputs node information including the token, load info (on disk storage), generation number (times started), uptime in seconds, and heap memory usage.
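The "ring closes on itself" rendering described in the wiki text can be illustrated with a toy printer (RingPrinter and its BigInteger token list are hypothetical, not nodetool's real code): the sorted token list is the same whichever node you query, and the largest token is emitted once as a header before the list wraps around from the smallest.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch of why the same token heads every node's ring output.
class RingPrinter {
    static List<String> render(List<BigInteger> tokens) {
        List<BigInteger> sorted = new ArrayList<>(tokens);
        Collections.sort(sorted);
        List<String> lines = new ArrayList<>();
        // header row ends with the largest token, visually "closing" the ring
        lines.add(sorted.get(sorted.size() - 1).toString());
        for (BigInteger t : sorted)
            lines.add(t.toString());
        return lines;
    }
}
```

Since the output depends only on the cluster-wide sorted ring, it is identical no matter which node `-h` points at, which is exactly the behaviour reported in CASSANDRA-2931.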
[jira] [Commented] (CASSANDRA-2931) Nodetool ring prints the same token regardless of node queried
[ https://issues.apache.org/jira/browse/CASSANDRA-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068980#comment-13068980 ] David Allsopp commented on CASSANDRA-2931: -- Thanks - I have edited http://wiki.apache.org/cassandra/NodeTool to spell this out, as most of the documentation I've seen uses the older format (with the ASCII ring arrows on the right).
svn commit: r1149176 - in /cassandra/branches/cassandra-0.7: CHANGES.txt NEWS.txt build.xml debian/changelog
Author: slebresne Date: Thu Jul 21 13:53:13 2011 New Revision: 1149176 URL: http://svn.apache.org/viewvc?rev=1149176&view=rev Log: Updates for 0.7.8 release (changelog, news, version number) Modified: cassandra/branches/cassandra-0.7/CHANGES.txt cassandra/branches/cassandra-0.7/NEWS.txt cassandra/branches/cassandra-0.7/build.xml cassandra/branches/cassandra-0.7/debian/changelog

Modified: cassandra/branches/cassandra-0.7/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/CHANGES.txt?rev=1149176&r1=1149175&r2=1149176&view=diff
==
--- cassandra/branches/cassandra-0.7/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.7/CHANGES.txt Thu Jul 21 13:53:13 2011
@@ -5,6 +5,11 @@
  * avoid including inferred types in CF update (CASSANDRA-2809)
  * fix re-using index CF sstable names after drop/recreate (CASSANDRA-2872)
  * fix hint replay (CASSANDRA-2928)
+ * don't accept extra args for 0-arg nodetool commands (CASSANDRA-2740)
+ * allows using cli functions in cli del statement (CASSANDRA-2821)
+ * allows quoted classes in CLI (CASSANDRA-2899)
+ * log unavailableexception details at debug level (CASSANDRA-2856)
+ * expose data_dir though jmx (CASSANDRA-2770)
 0.7.7

Modified: cassandra/branches/cassandra-0.7/NEWS.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/NEWS.txt?rev=1149176&r1=1149175&r2=1149176&view=diff
==
--- cassandra/branches/cassandra-0.7/NEWS.txt (original)
+++ cassandra/branches/cassandra-0.7/NEWS.txt Thu Jul 21 13:53:13 2011
@@ -1,3 +1,12 @@
+0.7.8
+=
+
+Upgrading
+-
+- Nothing specific to 0.7.8, but see 0.7.3 Upgrading if upgrading
+  from earlier than 0.7.1.
+
 0.7.7
 =

Modified: cassandra/branches/cassandra-0.7/build.xml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/build.xml?rev=1149176&r1=1149175&r2=1149176&view=diff
==
--- cassandra/branches/cassandra-0.7/build.xml (original)
+++ cassandra/branches/cassandra-0.7/build.xml Thu Jul 21 13:53:13 2011
@@ -24,7 +24,7 @@
 <property name="debuglevel" value="source,lines,vars"/>
 <!-- default version and SCM information (we need the default SCM info as people may checkout with git-svn) -->
-<property name="base.version" value="0.7.7"/>
+<property name="base.version" value="0.7.8"/>
 <property name="scm.default.path" value="cassandra/branches/cassandra-0.7/"/>
 <property name="scm.default.connection" value="scm:svn:http://svn.apache.org/repos/asf/${scm.default.path}"/>
 <property name="scm.default.developerConnection" value="scm:svn:https://svn.apache.org/repos/asf/${scm.default.path}"/>

Modified: cassandra/branches/cassandra-0.7/debian/changelog
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/debian/changelog?rev=1149176&r1=1149175&r2=1149176&view=diff
==
--- cassandra/branches/cassandra-0.7/debian/changelog (original)
+++ cassandra/branches/cassandra-0.7/debian/changelog Thu Jul 21 13:53:13 2011
@@ -1,3 +1,9 @@
+cassandra (0.7.8) unstable; urgency=low
+
+  * New stable point release
+
+ -- Sylvain Lebresne <slebre...@apache.org>  Thu, 21 Jul 2011 15:51:51 +0200
+
 cassandra (0.7.7) unstable; urgency=low
   * New stable point release
[jira] [Commented] (CASSANDRA-2829) memtable with no post-flush activity can leave commitlog permanently dirty
[ https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069011#comment-13069011 ] Jonathan Ellis commented on CASSANDRA-2829: --- bq. I also think there is a theoretical risk of a race condition with access to the segments Deque. The iterator runs in the postFlushExecutor; discardCompletedSegments actually does the real work in a task on the CL executor. Unless that's not what you're thinking of, I think we're ok here. bq. It changes CFS.forceFlush() to always flush and trusts maybeSwitchMemtable() will only flush non-clean CF's Hmm. Interesting. Part of me thinks it can't be that simple but I don't see a problem with it. :) Sylvain? memtable with no post-flush activity can leave commitlog permanently dirty --- Key: CASSANDRA-2829 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829 Project: Cassandra Issue Type: Bug Components: Core Reporter: Aaron Morton Assignee: Jonathan Ellis Fix For: 0.8.2 Attachments: 0001-2829-unit-test-v08.patch, 0001-2829-unit-test.patch, 0002-2829-v08.patch, 0002-2829.patch Only dirty Memtables are flushed, and so only dirty memtables are used to discard obsolete commit log segments. This can result in log segments not being deleted even though the data has been flushed. Was using a 3-node 0.7.6-2 AWS cluster (DataStax AMIs) with pre-0.7 data loaded and a running application working against the cluster. Did a rolling restart and then kicked off a repair; one node filled up the commit log volume with 7GB+ of log data - there was about 20 hours of log files.
{noformat}
$ sudo ls -lah commitlog/
total 6.9G
drwx------ 2 cassandra cassandra 12K 2011-06-24 20:38 .
drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
-rw------- 1 cassandra cassandra 129M 2011-06-24 01:08 CommitLog-1308876643288.log
-rw------- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308876643288.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 CommitLog-1308877711517.log
-rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308877711517.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 CommitLog-1308879395824.log
-rw-r--r-- 1 cassandra cassandra 28 2011-06-24 20:47 CommitLog-1308879395824.log.header
...
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 CommitLog-1308946745380.log
-rw-r--r-- 1 cassandra cassandra 36 2011-06-24 20:47 CommitLog-1308946745380.log.header
-rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 CommitLog-1308947888397.log
-rw-r--r-- 1 cassandra cassandra 44 2011-06-24 20:47 CommitLog-1308947888397.log.header
{noformat}
The user KS has 2 CF's with 60 minute flush times. The system KS had the default settings, which is 24 hours. Will create another ticket to see if these can be reduced or if it's something users should do; in this case it would not have mattered. I grabbed the log headers and used the tool in CASSANDRA-2828, and most of the segments had the system CF's marked as dirty.
{noformat}
$ bin/logtool dirty /tmp/logs/commitlog/
Not connected to a server, Keyspace and Column Family names are not available.
/tmp/logs/commitlog/CommitLog-1308876643288.log.header
Keyspace Unknown: Cf id 0: 444
/tmp/logs/commitlog/CommitLog-1308877711517.log.header
Keyspace Unknown: Cf id 1: 68848763
...
/tmp/logs/commitlog/CommitLog-1308944451460.log.header
Keyspace Unknown: Cf id 1: 61074
/tmp/logs/commitlog/CommitLog-1308945597471.log.header
Keyspace Unknown: Cf id 1000: 43175492 Cf id 1: 108483
/tmp/logs/commitlog/CommitLog-1308946745380.log.header
Keyspace Unknown: Cf id 1000: 239223 Cf id 1: 172211
/tmp/logs/commitlog/CommitLog-1308947888397.log.header
Keyspace Unknown: Cf id 1001: 57595560 Cf id 1: 816960 Cf id 1000: 0
{noformat}
CF 0 is the Status / LocationInfo CF and 1 is the HintedHandoff CF. I don't have it now, but IIRC CFStats showed the LocationInfo CF with dirty ops. I was able to repro a case where flushing the CF's did not mark the log segments as obsolete (attached unit-test patch). Steps are:
1. Write to cf1 and flush.
2. The current log segment is marked as dirty at the CL position when the flush started, in CommitLog.discardCompletedSegmentsInternal().
3. Do not write to cf1 again.
4. Roll the log (my test does this manually).
5. Write to CF2 and flush.
6. Only CF2 is flushed because it is the only dirty CF. cfs.maybeSwitchMemtable() is not called for cf1, and so log segment 1 is still marked as dirty from cf1.
Step 5 is not essential, just matched what I thought was happening. I thought SystemTable.updateToken() was called
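The six repro steps above can be condensed into a toy model (all names here are illustrative, not Cassandra's): a segment remembers which CFs dirtied it, and only a flush of a still-dirty memtable clears the mark, so a CF that never writes again leaves its old segment dirty forever.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the bug: flushing a CF whose memtable is clean is a no-op,
// so the segment's dirty mark for that CF is never cleared.
class LogSegmentModel {
    private final Set<String> dirtyCfs = new HashSet<>();

    void write(String cf) { dirtyCfs.add(cf); }

    // flush only runs for dirty memtables; hasMemtableData models
    // "this CF has pending writes in the current memtable"
    void flush(String cf, boolean hasMemtableData) {
        if (hasMemtableData)
            dirtyCfs.remove(cf);
    }

    // a segment can only be deleted once no CF marks it dirty
    boolean deletable() { return dirtyCfs.isEmpty(); }
}
```

In the real system the marking happens per segment at the CL position captured when the flush starts, but the failure mode is the same: step 3 (no further writes to cf1) means no later flush of cf1 ever runs, so `deletable()` stays false.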
[Cassandra Wiki] Update of NodeTool by DavidAllsopp
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The NodeTool page has been changed by DavidAllsopp: http://wiki.apache.org/cassandra/NodeTool?action=diff&rev1=16&rev2=17 == Flush == Flushes memtables (in memory) to SSTables (on disk), which also enables CommitLog segments to be deleted. + == Removetoken== + Removes a dead node from the ring - this command is issued to any other live node (since clearly the dead node cannot respond!). + == Scrub == Cassandra v0.7.1 and v0.7.2 shipped with a bug that caused incorrect row-level bloom filters to be generated when compacting sstables generated with earlier versions. This would manifest in IOExceptions during column name-based queries. v0.7.3 provides nodetool scrub to rebuild sstables with correct bloom filters, with no data lost. (If your cluster was never on 0.7.0 or earlier, you don't have to worry about this.) Note that nodetool scrub will snapshot your data files before rebuilding, just in case.
[Cassandra Wiki] Trivial Update of NodeTool by DavidAllsopp
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The NodeTool page has been changed by DavidAllsopp: http://wiki.apache.org/cassandra/NodeTool?action=diff&rev1=17&rev2=18 Comment: Fixed broken heading == Flush == Flushes memtables (in memory) to SSTables (on disk), which also enables CommitLog segments to be deleted. - == Removetoken== + == Removetoken == Removes a dead node from the ring - this command is issued to any other live node (since clearly the dead node cannot respond!). == Scrub ==
[Cassandra Wiki] Trivial Update of MultinodeCluster by DavidAllsopp
Dear Wiki user, You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification. The MultinodeCluster page has been changed by DavidAllsopp: http://wiki.apache.org/cassandra/MultinodeCluster?action=diff&rev1=8&rev2=9 Comment: Added extra detail about using netstat to verify listen address }}} - Once these changes are made, simply restart cassandra on this node. Use netstat to verify cassandra is listening on the right address. Look for a line like this: + Once these changes are made, simply restart cassandra on this node. Use netstat (e.g. `netstat -ant | grep 7000`) to verify cassandra is listening on the right address. Look for a line like this: {{{tcp4 0 0 192.168.1.1.7000 *.* LISTEN}}}
svn commit: r1149217 - /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java
Author: slebresne Date: Thu Jul 21 15:14:36 2011 New Revision: 1149217 URL: http://svn.apache.org/viewvc?rev=1149217&view=rev Log: Reverting #2825 until BootStrapper unit test is fixed Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java?rev=1149217&r1=1149216&r2=1149217&view=diff
==
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageService.java Thu Jul 21 15:14:36 2011
@@ -1726,8 +1726,6 @@ public class StorageService implements I
         List<DecoratedKey> keys = new ArrayList<DecoratedKey>();
         for (ColumnFamilyStore cfs : ColumnFamilyStore.all())
         {
-            if (cfs.table.name.equals(Table.SYSTEM_TABLE))
-                continue;
             for (DecoratedKey key : cfs.allKeySamples())
             {
                 if (range.contains(key.token))
@@ -1736,19 +1734,9 @@ public class StorageService implements I
         }
         FBUtilities.sortSampledKeys(keys, range);
-        Token token;
-        if (keys.size() < 3)
-        {
-            token = partitioner.midpoint(range.left, range.right);
-            logger_.debug("Used midpoint to assign token " + token);
-        }
-        else
-        {
-            token = keys.get(keys.size() / 2).token;
-            logger_.debug("Used key sample of size " + keys.size() + " to assign token " + token);
-        }
-        if (tokenMetadata_.isMember(tokenMetadata_.getEndpoint(token)))
-            throw new RuntimeException("Chose token " + token + " which is already in use by " + tokenMetadata_.getEndpoint(token) + " -- specify one manually with initial_token");
+        Token token = keys.size() < 3
+                      ? partitioner.midpoint(range.left, range.right)
+                      : keys.get(keys.size() / 2).token;
         // Hack to prevent giving nodes tokens with DELIMITER_STR in them (which is fine in a row key/token)
         if (token instanceof StringToken)
         {
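The token heuristic being reverted above, stripped to its core: with too few sampled keys, split the range at its midpoint; otherwise take the median sampled token. The class below is an illustrative stand-in working on BigInteger instead of Cassandra's Token/Range types.

```java
import java.math.BigInteger;
import java.util.Collections;
import java.util.List;

// Sketch of the bootstrap-token choice (simplified; real code also checks
// for collisions with tokens already owned by cluster members).
class BootstrapTokenChooser {
    static BigInteger choose(BigInteger left, BigInteger right, List<BigInteger> sampled) {
        if (sampled.size() < 3)
            return left.add(right).divide(BigInteger.valueOf(2)); // range midpoint
        Collections.sort(sampled);
        return sampled.get(sampled.size() / 2);                   // median sample
    }
}
```

The median-of-samples branch is what aims to split the data (rather than the token space) evenly; the midpoint branch is the fallback when there are not enough samples to trust.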
[jira] [Commented] (CASSANDRA-2825) Auto bootstrapping the 4th node in a 4 node cluster doesn't work, when no token explicitly assigned in config.
[ https://issues.apache.org/jira/browse/CASSANDRA-2825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069017#comment-13069017 ] Sylvain Lebresne commented on CASSANDRA-2825: - I've reverted the patch too so we can do a release of 0.8.2 without having to wait on the unit test fix.
[jira] [Created] (CASSANDRA-2932) Implement assume in cqlsh
Implement assume in cqlsh --- Key: CASSANDRA-2932 URL: https://issues.apache.org/jira/browse/CASSANDRA-2932 Project: Cassandra Issue Type: Improvement Reporter: Jeremy Hanna Priority: Minor In the CLI there is a handy way to assume validators. It would be very nice to have the assume command in cqlsh as well.
[jira] [Updated] (CASSANDRA-2932) Implement assume in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavel Yaskevich updated CASSANDRA-2932: --- Description: In the CLI there is a handy way to assume CF comparators/validators (CASSANDRA-1693). It would be very nice to have the assume command in cqlsh as well. (was: In the CLI there is a handy way to assume validators. It would be very nice to have the assume command in cqlsh as well.) Implement assume in cqlsh --- Key: CASSANDRA-2932 URL: https://issues.apache.org/jira/browse/CASSANDRA-2932 Project: Cassandra Issue Type: Improvement Reporter: Jeremy Hanna Priority: Minor Labels: lhf In the CLI there is a handy way to assume CF comparators/validators (CASSANDRA-1693). It would be very nice to have the assume command in cqlsh as well.
[jira] [Commented] (CASSANDRA-957) convenience workflow for replacing dead node
[ https://issues.apache.org/jira/browse/CASSANDRA-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069029#comment-13069029 ] Vijay commented on CASSANDRA-957: - Seems like CASSANDRA-2928 fixes the hints issue... so we can ignore 0003 in this ticket. convenience workflow for replacing dead node Key: CASSANDRA-957 URL: https://issues.apache.org/jira/browse/CASSANDRA-957 Project: Cassandra Issue Type: Wish Components: Core, Tools Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Vijay Fix For: 1.0 Attachments: 0001-Support-Token-Replace.patch, 0001-Support-bringing-back-a-node-to-the-cluster-that-exi.patch, 0001-Support-token-replace.patch, 0002-Do-not-include-local-node-when-computing-workMap.patch, 0002-Rework-Hints-to-be-on-token.patch, 0002-Rework-Hints-to-be-on-token.patch, 0003-Make-HintedHandoff-More-reliable.patch, 0003-Make-hints-More-reliable.patch Original Estimate: 24h Remaining Estimate: 24h Replacing a dead node with a new one is a common operation, but nodetool removetoken followed by bootstrap is inefficient (re-replicating data first to the remaining nodes, then to the new one), and manually bootstrapping to a token just less than the old one's, followed by nodetool removetoken, is slightly painful and prone to manual errors. First question: how would you expose this in our tool ecosystem? It needs to be a startup-time option to the new node, so it can't be nodetool, and messing with the config xml definitely takes the convenience out. A one-off -DreplaceToken=XXY argument?
svn commit: r1149235 - in /cassandra/branches/cassandra-0.8: CHANGES.txt NEWS.txt build.xml debian/changelog
Author: slebresne Date: Thu Jul 21 15:47:58 2011 New Revision: 1149235 URL: http://svn.apache.org/viewvc?rev=1149235&view=rev Log: Updates for 0.8.2 release (changelog, news, version number) Modified: cassandra/branches/cassandra-0.8/CHANGES.txt cassandra/branches/cassandra-0.8/NEWS.txt cassandra/branches/cassandra-0.8/build.xml cassandra/branches/cassandra-0.8/debian/changelog

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1149235&r1=1149234&r2=1149235&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Thu Jul 21 15:47:58 2011
@@ -39,6 +39,13 @@
  * prepend CF to default index names (CASSANDRA-2903)
  * fix hint replay (CASSANDRA-2928)
  * Properly synchronize merkle tree computation (CASSANDRA-2816)
+ * escape quotes in sstable2json (CASSANDRA-2780)
+ * allows using cli functions in cli del statement (CASSANDRA-2821)
+ * allows quoted classes in CLI (CASSANDRA-2899)
+ * expose data_dir though jmx (CASSANDRA-2770)
+ * proper support for validation and functions for cli count statement
+   (CASSANDRA-1902)
+ * debian package now depend on libjna-java (CASSANDRA-2803)
 0.8.1

Modified: cassandra/branches/cassandra-0.8/NEWS.txt
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/NEWS.txt?rev=1149235&r1=1149234&r2=1149235&view=diff
==
--- cassandra/branches/cassandra-0.8/NEWS.txt (original)
+++ cassandra/branches/cassandra-0.8/NEWS.txt Thu Jul 21 15:47:58 2011
@@ -11,6 +11,16 @@ Upgrading
    if replicate_on_write was uncorrectly set to false (before or after
    upgrade).
+Tools
+-
+- Add new simplified classes to write sstables (to complement the bulk
+  loading utility).
+
+Other
+-
+- This release fix a regression of 0.8.1 that made hinted handoff being
+  never delivered. Upgrade from 0.8.1 is thus highly encourage.
+
 0.8.1
 =

Modified: cassandra/branches/cassandra-0.8/build.xml
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/build.xml?rev=1149235&r1=1149234&r2=1149235&view=diff
==
--- cassandra/branches/cassandra-0.8/build.xml (original)
+++ cassandra/branches/cassandra-0.8/build.xml Thu Jul 21 15:47:58 2011
@@ -25,7 +25,7 @@
 <property name="debuglevel" value="source,lines,vars"/>
 <!-- default version and SCM information (we need the default SCM info as people may checkout with git-svn) -->
-<property name="base.version" value="0.8.2-dev"/>
+<property name="base.version" value="0.8.2"/>
 <property name="scm.default.path" value="cassandra/branches/cassandra-0.8/"/>
 <property name="scm.default.connection" value="scm:svn:http://svn.apache.org/repos/asf/${scm.default.path}"/>
 <property name="scm.default.developerConnection" value="scm:svn:https://svn.apache.org/repos/asf/${scm.default.path}"/>

Modified: cassandra/branches/cassandra-0.8/debian/changelog
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/debian/changelog?rev=1149235&r1=1149234&r2=1149235&view=diff
==
--- cassandra/branches/cassandra-0.8/debian/changelog (original)
+++ cassandra/branches/cassandra-0.8/debian/changelog Thu Jul 21 15:47:58 2011
@@ -1,3 +1,9 @@
+cassandra (0.8.2) unstable; urgency=low
+
+  * New release
+
+ -- Sylvain Lebresne <slebre...@apache.org>  Thu, 21 Jul 2011 17:45:19 +0200
+
 cassandra (0.8.1) unstable; urgency=low
   * New release
[jira] [Created] (CASSANDRA-2933) nodetool hangs (doesn't return prompt) if you specify a table that doesn't exist or a KS that has no CF's
nodetool hangs (doesn't return prompt) if you specify a table that doesn't exist or a KS that has no CF's - Key: CASSANDRA-2933 URL: https://issues.apache.org/jira/browse/CASSANDRA-2933 Project: Cassandra Issue Type: Bug Reporter: Cathy Daw Priority: Minor Invalid CF
{code}
ERROR 02:18:18,904 Fatal exception in thread Thread[AntiEntropyStage:3,5,main]
java.lang.IllegalArgumentException: Unknown table/cf pair (StressKeyspace.StressStandard)
    at org.apache.cassandra.db.Table.getColumnFamilyStore(Table.java:147)
    at org.apache.cassandra.service.AntiEntropyService$TreeRequestVerbHandler.doVerb(AntiEntropyService.java:601)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
{code}
Empty KS
{code}
INFO 02:19:21,483 Waiting for repair requests: []
INFO 02:19:21,484 Waiting for repair requests: []
INFO 02:19:21,484 Waiting for repair requests: []
{code}
[jira] [Updated] (CASSANDRA-2933) nodetool hangs (doesn't return prompt) if you specify a table that doesn't exist or a KS that has no CF's
[ https://issues.apache.org/jira/browse/CASSANDRA-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2933: -- Component/s: Tools Labels: lhf (was: ) nodetool hangs (doesn't return prompt) if you specify a table that doesn't exist or a KS that has no CF's - Key: CASSANDRA-2933 URL: https://issues.apache.org/jira/browse/CASSANDRA-2933 Project: Cassandra Issue Type: Bug Components: Tools Reporter: Cathy Daw Priority: Minor Labels: lhf Invalid CF {code} ERROR 02:18:18,904 Fatal exception in thread Thread[AntiEntropyStage:3,5,main] java.lang.IllegalArgumentException: Unknown table/cf pair (StressKeyspace.StressStandard) at org.apache.cassandra.db.Table.getColumnFamilyStore(Table.java:147) at org.apache.cassandra.service.AntiEntropyService$TreeRequestVerbHandler.doVerb(AntiEntropyService.java:601) at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {code} Empty KS {code} INFO 02:19:21,483 Waiting for repair requests: [] INFO 02:19:21,484 Waiting for repair requests: [] INFO 02:19:21,484 Waiting for repair requests: [] {code} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2829) memtable with no post-flush activity can leave commitlog permanently dirty
[ https://issues.apache.org/jira/browse/CASSANDRA-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069070#comment-13069070 ] Sylvain Lebresne commented on CASSANDRA-2829: - I think this kind of works, in that we won't keep commit logs forever, but it still keeps commit logs for much longer than necessary because: # it relies on forceFlush being called, which unless client triggered will only happen after the memtable expires, and quite a bunch of commit logs could pile up during that time. Quite potentially enough to be a problem (if the commit log fills up your hard drive, it doesn't matter much that it would have been deleted in 5 hours). I think we can do much better with not too much effort. # when we do flush the expired memtable, we'll call maybeSwitchMemtable() on potentially clean memtables. This doesn't sound like a good use of resources: we'll grab the write lock, create a latch, create a new memtable, increment the memtable switch number, and push an almost no-op job on the flush executor. I think we should fix the real problem. The problem is that when we discard segments, we always keep the current segment dirty because we don't know if there was some write since we grabbed the context. Let's add that information and fix that. This would make commit logs get deleted much more quickly, even if we don't consider the corner case of a column family that suddenly has no writes anymore, because CFs like the system ones, which have very low update volume, can retain the logs longer than is really needed. As for the fix, because the CL executor is single-threaded, this is fairly easy: let's have an in-memory map of cfId -> lastPositionWritten, and compare that to the context position in discardCompletedSegmentInternal (we could probably even just use a set of cfIds, which would mean: dirty since last getContext). 
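The single-threaded bookkeeping suggested in this comment can be sketched as follows. DirtyTracker and its method names are hypothetical illustrations, not actual Cassandra code; the point is only that, with one executor thread, a plain set of CF ids written since the last getContext() is enough to decide whether the current segment is really dirty.

```java
import java.util.HashSet;
import java.util.Set;

// Toy sketch of "a set of cfIds meaning: dirty since last getContext".
// Safe without synchronization only because the commit log executor that
// calls these methods is single-threaded.
class DirtyTracker {
    private final Set<Integer> dirtySinceLastContext = new HashSet<>();

    // called whenever a mutation for the given CF is appended to the segment
    void recordWrite(int cfId) {
        dirtySinceLastContext.add(cfId);
    }

    // called from getContext(): snapshot the dirty set and start a fresh one
    Set<Integer> snapshotAndClear() {
        Set<Integer> snapshot = new HashSet<>(dirtySinceLastContext);
        dirtySinceLastContext.clear();
        return snapshot;
    }

    // discardCompletedSegmentInternal would consult this to decide whether
    // the current segment must stay marked dirty for this CF
    boolean isDirty(int cfId) {
        return dirtySinceLastContext.contains(cfId);
    }
}
```

With this in place, a CF with no writes since the last context grab no longer forces the current segment to be retained on its behalf.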
memtable with no post-flush activity can leave commitlog permanently dirty --- Key: CASSANDRA-2829 URL: https://issues.apache.org/jira/browse/CASSANDRA-2829 Project: Cassandra Issue Type: Bug Components: Core Reporter: Aaron Morton Assignee: Jonathan Ellis Fix For: 0.8.3 Attachments: 0001-2829-unit-test-v08.patch, 0001-2829-unit-test.patch, 0002-2829-v08.patch, 0002-2829.patch Only dirty Memtables are flushed, and so only dirty memtables are used to discard obsolete commit log segments. This can result in log segments not being deleted even though the data has been flushed. Was using a 3 node 0.7.6-2 AWS cluster (DataStax AMIs) with pre-0.7 data loaded and a running application working against the cluster. Did a rolling restart and then kicked off a repair; one node filled up the commit log volume with 7GB+ of log data, about 20 hours of log files.
{noformat}
$ sudo ls -lah commitlog/
total 6.9G
drwx------ 2 cassandra cassandra  12K 2011-06-24 20:38 .
drwxr-xr-x 3 cassandra cassandra 4.0K 2011-06-25 01:47 ..
-rw------- 1 cassandra cassandra 129M 2011-06-24 01:08 CommitLog-1308876643288.log
-rw------- 1 cassandra cassandra   28 2011-06-24 20:47 CommitLog-1308876643288.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 01:36 CommitLog-1308877711517.log
-rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 CommitLog-1308877711517.log.header
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 02:20 CommitLog-1308879395824.log
-rw-r--r-- 1 cassandra cassandra   28 2011-06-24 20:47 CommitLog-1308879395824.log.header
...
-rw-r--r-- 1 cassandra cassandra 129M 2011-06-24 20:38 CommitLog-1308946745380.log
-rw-r--r-- 1 cassandra cassandra   36 2011-06-24 20:47 CommitLog-1308946745380.log.header
-rw-r--r-- 1 cassandra cassandra 112M 2011-06-24 20:54 CommitLog-1308947888397.log
-rw-r--r-- 1 cassandra cassandra   44 2011-06-24 20:47 CommitLog-1308947888397.log.header
{noformat}
The user KS has 2 CFs with 60-minute flush times. The system KS had the default settings, which is 24 hours. 
Will create another ticket to see if these can be reduced, or if it's something users should do; in this case it would not have mattered. I grabbed the log headers and used the tool in CASSANDRA-2828, and most of the segments had the system CFs marked as dirty.
{noformat}
$ bin/logtool dirty /tmp/logs/commitlog/
Not connected to a server, Keyspace and Column Family names are not available.
/tmp/logs/commitlog/CommitLog-1308876643288.log.header
Keyspace Unknown: Cf id 0: 444
/tmp/logs/commitlog/CommitLog-1308877711517.log.header
Keyspace Unknown: Cf id 1: 68848763
...
/tmp/logs/commitlog/CommitLog-1308944451460.log.header
Keyspace Unknown: Cf id 1: 61074
/tmp/logs/commitlog/CommitLog-1308945597471.log.header
Keyspace Unknown: Cf id 1000:
[jira] [Commented] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069125#comment-13069125 ] Yang Yang commented on CASSANDRA-2843: -- bq. I did some performance testing using Aaron's script here: http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and overall in the 95th percentile there was an approximate 10% gain across the board. I looked at Aaron's script; it actually returns 100 columns on each get. Since the column count filtering happens in the memtable iterator *before* the collating iterator and ColumnMap.add(), the advantage of this patch is not fully shown (only 10%). I added a simple test case to the script to return all columns; in the 10,000-column case, the time reduction is about 50%. I'm still running the full test, will upload the data later. better performance on long row read --- Key: CASSANDRA-2843 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843 Project: Cassandra Issue Type: New Feature Reporter: Yang Yang Attachments: 2843.patch, 2843_c.patch, fast_cf_081_trunk.diff, incremental.diff, microBenchmark.patch currently if a row contains 1000+ columns, the read becomes considerably slow (my test of a row with 3000 columns (standard, regular), each with 8 bytes in name and 40 bytes in value, takes about 16ms; this is all running in memory, no disk read is involved). 
through debugging we can see that most of this time is spent in:
[Wall Time] org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
[Wall Time] org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
[Wall Time] org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
ColumnFamily.addColumn() is slow because it inserts into an internal ConcurrentSkipListMap that maps column names to values. this structure is slow for two reasons: it needs to do synchronization, and it needs to maintain the more complex structure of a map. but if we look at the whole read path, thrift already defines the read output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and finally convert it to a list. on the synchronization side, since the returned CF is never going to be shared/modified by other threads, we know the access is always single-threaded, so no synchronization is needed. but these 2 features are indeed needed for ColumnFamily in other cases, particularly writes. so we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, which is much cheaper. the provided patch is for demonstration now; will work further once we agree on the general direction. CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is provided. the main work is to let the FastColumnFamily use an array for internal storage. 
at first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have the invariant that the inserted columns come in sorted order (I still have an issue to resolve between descending and ascending now, but ascending works). so the current logic is simply to compare the new column against the last column in the array: if the names are not equal, append; if they are equal, reconcile. slight temporary hacks are made on getTopLevelColumnFamily so we have 2 flavors of the method, one accepting a returnCF. but we could definitely think about what is the better way to provide this returnCF. this patch compiles fine; no tests are provided yet. but I tested it in my application, and the performance improvement is dramatic: it offers about 50% reduction in read time in the 3000-column case. thanks Yang -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
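The append-or-reconcile logic described in this message can be sketched roughly as follows. FastColumnFamilySketch and Column here are illustrative stand-ins, not the patch's actual classes, and reconcile is simplified to last-writer-wins on timestamp; the essential point is that sorted arrival turns each insert into an O(1) comparison against the last array element instead of a skip-list insertion.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of a column: a sortable name, a value, and a timestamp.
class Column {
    final String name;
    final String value;
    final long timestamp;
    Column(String name, String value, long timestamp) {
        this.name = name; this.value = value; this.timestamp = timestamp;
    }
}

// Array-backed container relying on the invariant that columns arrive
// in ascending name order.
class FastColumnFamilySketch {
    private final List<Column> columns = new ArrayList<>();

    void addColumn(Column c) {
        if (!columns.isEmpty()) {
            Column last = columns.get(columns.size() - 1);
            int cmp = last.name.compareTo(c.name);
            if (cmp == 0) {
                // same name: reconcile, here by keeping the newer timestamp
                if (c.timestamp > last.timestamp)
                    columns.set(columns.size() - 1, c);
                return;
            }
            assert cmp < 0 : "columns must arrive in ascending order";
        }
        columns.add(c); // strictly greater name: plain append
    }

    int size() { return columns.size(); }
    Column get(int i) { return columns.get(i); }
}
```

No locking and no tree rebalancing happen on this path, which is why a single-threaded read that only ever appends can be so much cheaper than ConcurrentSkipListMap insertion.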
[jira] [Commented] (CASSANDRA-1405) Switch to THsHaServer, redux
[ https://issues.apache.org/jira/browse/CASSANDRA-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069137#comment-13069137 ] Brandon Williams commented on CASSANDRA-1405: - I see, that's kind of annoying. I'll change the log4j level to ERROR on commit. One last thing is that the rpc_type should be validated; instead, an invalid type produces:
{noformat}
 INFO 14:02:46,359 Listening for thrift clients...
ERROR 14:02:46,360 Fatal exception in thread Thread[Thread-3,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:192)
{noformat}
Switch to THsHaServer, redux Key: CASSANDRA-1405 URL: https://issues.apache.org/jira/browse/CASSANDRA-1405 Project: Cassandra Issue Type: Improvement Components: API Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Fix For: 0.8.3 Attachments: 0001-log4j-config-change.patch, 1405-Thrift-Patch-SVN.patch, libthrift-r1026391.jar, trunk-1405.patch Brian's patch to CASSANDRA-876 suggested using a custom TProcessorFactory subclass, overriding getProcessor to reset to a default state when a new client connects. It looks like this would allow dropping CustomTThreadPoolServer as well as allowing non-thread based servers. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1405) Switch to THsHaServer, redux
[ https://issues.apache.org/jira/browse/CASSANDRA-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069145#comment-13069145 ] Jonathan Ellis commented on CASSANDRA-1405: --- bq. It is logged by thrift internally so we don't have much control over that Let's submit a Thrift patch to fix that then. It's easy to get a Java-only patch reviewed. In the meantime I'm ok w/ turning org.apache.thrift log4j levels down to debug. Switch to THsHaServer, redux Key: CASSANDRA-1405 URL: https://issues.apache.org/jira/browse/CASSANDRA-1405 Project: Cassandra Issue Type: Improvement Components: API Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Fix For: 0.8.3 Attachments: 0001-log4j-config-change.patch, 1405-Thrift-Patch-SVN.patch, libthrift-r1026391.jar, trunk-1405.patch Brian's patch to CASSANDRA-876 suggested using a custom TProcessorFactory subclass, overriding getProcessor to reset to a default state when a new client connects. It looks like this would allow dropping CustomTThreadPoolServer as well as allowing non-thread based servers. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1149332 - in /cassandra/branches/cassandra-0.8: src/java/org/apache/cassandra/thrift/CassandraServer.java test/system/test_thrift_server.py
Author: jbellis
Date: Thu Jul 21 19:33:24 2011
New Revision: 1149332

URL: http://svn.apache.org/viewvc?rev=1149332&view=rev
Log: fix test failures w/ index names

Modified:
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java
    cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java?rev=1149332&r1=1149331&r2=1149332&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/CassandraServer.java Thu Jul 21 19:33:24 2011
@@ -960,6 +960,7 @@ public class CassandraServer implements
         CFMetaData oldCfm = DatabaseDescriptor.getCFMetaData(CFMetaData.getId(cf_def.keyspace, cf_def.name));
         if (oldCfm == null)
             throw new InvalidRequestException("Could not find column family definition to modify.");
+        CFMetaData.addDefaultIndexNames(cf_def);
         ThriftValidation.validateCfDef(cf_def, oldCfm);
         validateSchemaAgreement();

Modified: cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py?rev=1149332&r1=1149331&r2=1149332&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py (original)
+++ cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py Thu Jul 21 19:33:24 2011
@@ -1415,13 +1415,13 @@ class TestMutations(ThriftTester):
         ks1 = client.describe_keyspace('Keyspace1')
         cfid = [x.id for x in ks1.cf_defs if x.name=='BlankCF'][0]
-        modified_cd = ColumnDef('birthdate', 'BytesType', IndexType.KEYS, 'birthdate_index')
+        modified_cd = ColumnDef('birthdate', 'BytesType', IndexType.KEYS, None)
         modified_cf = CfDef('Keyspace1', 'BlankCF', column_metadata=[modified_cd])
         modified_cf.id = cfid
         client.system_update_column_family(modified_cf)

         # Add a second indexed CF ...
-        birthdate_coldef = ColumnDef('birthdate', 'BytesType', IndexType.KEYS, 'birthdate2_index')
+        birthdate_coldef = ColumnDef('birthdate', 'BytesType', IndexType.KEYS, None)
         age_coldef = ColumnDef('age', 'BytesType', IndexType.KEYS, 'age_index')
         cfdef = CfDef('Keyspace1', 'BlankCF2', column_metadata=[birthdate_coldef, age_coldef])
         client.system_add_column_family(cfdef)
@@ -1472,7 +1472,7 @@ class TestMutations(ThriftTester):
         # add an index on 'birthdate'
         ks1 = client.describe_keyspace('Keyspace1')
         cfid = [x.id for x in ks1.cf_defs if x.name=='ToBeIndexed'][0]
-        modified_cd = ColumnDef('birthdate', 'BytesType', IndexType.KEYS, None)
+        modified_cd = ColumnDef('birthdate', 'BytesType', IndexType.KEYS, 'bd_index')
         modified_cf = CfDef('Keyspace1', 'ToBeIndexed', column_metadata=[modified_cd])
         modified_cf.id = cfid
         client.system_update_column_family(modified_cf)
[jira] [Created] (CASSANDRA-2934) log broken incoming connections at DEBUG
log broken incoming connections at DEBUG Key: CASSANDRA-2934 URL: https://issues.apache.org/jira/browse/CASSANDRA-2934 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Trivial Fix For: 0.8.2 Attachments: 2934.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2934) log broken incoming connections at DEBUG
[ https://issues.apache.org/jira/browse/CASSANDRA-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2934: -- Attachment: 2934.txt log broken incoming connections at DEBUG Key: CASSANDRA-2934 URL: https://issues.apache.org/jira/browse/CASSANDRA-2934 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Trivial Fix For: 0.8.2 Attachments: 2934.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1405) Switch to THsHaServer, redux
[ https://issues.apache.org/jira/browse/CASSANDRA-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069168#comment-13069168 ] Brandon Williams commented on CASSANDRA-1405: - More serious benchmarks reveal that sync and hsha are even, and async is 50% slower. Switch to THsHaServer, redux Key: CASSANDRA-1405 URL: https://issues.apache.org/jira/browse/CASSANDRA-1405 Project: Cassandra Issue Type: Improvement Components: API Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Fix For: 0.8.3 Attachments: 0001-log4j-config-change.patch, 1405-Thrift-Patch-SVN.patch, libthrift-r1026391.jar, trunk-1405.patch Brian's patch to CASSANDRA-876 suggested using a custom TProcessorFactory subclass, overriding getProcessor to reset to a default state when a new client connects. It looks like this would allow dropping CustomTThreadPoolServer as well as allowing non-thread based servers. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
svn commit: r1149341 - /cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
Author: brandonwilliams
Date: Thu Jul 21 20:10:54 2011
New Revision: 1149341

URL: http://svn.apache.org/viewvc?rev=1149341&view=rev
Log: Use a UDF-specific context signature.
Patch by Jeremy Hanna, reviewed by brandonwilliams for CASSANDRA-2869

Modified:
    cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java

Modified: cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java?rev=1149341&r1=1149340&r2=1149341&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java (original)
+++ cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java Thu Jul 21 20:10:54 2011
@@ -68,8 +68,6 @@ public class CassandraStorage extends Lo
     public final static String PIG_INITIAL_ADDRESS = "PIG_INITIAL_ADDRESS";
     public final static String PIG_PARTITIONER = "PIG_PARTITIONER";
 
-    private static String UDFCONTEXT_SCHEMA_KEY_PREFIX = "cassandra.schema";
-
     private final static ByteBuffer BOUND = ByteBufferUtil.EMPTY_BYTE_BUFFER;
     private static final Log logger = LogFactory.getLog(CassandraStorage.class);
@@ -78,6 +76,8 @@ public class CassandraStorage extends Lo
     private boolean slice_reverse = false;
     private String keyspace;
     private String column_family;
+    private String loadSignature;
+    private String storeSignature;
 
     private Configuration conf;
     private RecordReader reader;
@@ -112,7 +112,7 @@ public class CassandraStorage extends Lo
             if (!reader.nextKeyValue())
                 return null;
 
-            CfDef cfDef = getCfDef();
+            CfDef cfDef = getCfDef(loadSignature);
             ByteBuffer key = (ByteBuffer)reader.getCurrentKey();
             SortedMap<ByteBuffer,IColumn> cf = (SortedMap<ByteBuffer,IColumn>)reader.getCurrentValue();
             assert key != null && cf != null;
@@ -165,11 +165,11 @@ public class CassandraStorage extends Lo
         return pair;
     }
 
-    private CfDef getCfDef()
+    private CfDef getCfDef(String signature)
     {
         UDFContext context = UDFContext.getUDFContext();
         Properties property = context.getUDFProperties(CassandraStorage.class);
-        return cfdefFromString(property.getProperty(getSchemaContextKey()));
+        return cfdefFromString(property.getProperty(signature));
     }
 
     private List<AbstractType> getDefaultMarshallers(CfDef cfDef) throws IOException
@@ -289,7 +289,7 @@ public class CassandraStorage extends Lo
         }
         ConfigHelper.setInputColumnFamily(conf, keyspace, column_family);
         setConnectionInformation();
-        initSchema();
+        initSchema(loadSignature);
     }
 
     @Override
@@ -298,9 +298,16 @@ public class CassandraStorage extends Lo
         return location;
     }
 
+    @Override
+    public void setUDFContextSignature(String signature)
+    {
+        this.loadSignature = signature;
+    }
+
     /* StoreFunc methods */
     public void setStoreFuncUDFContextSignature(String signature)
     {
+        this.storeSignature = signature;
     }
 
     public String relToAbsPathForStoreLocation(String location, Path curDir) throws IOException
@@ -314,7 +321,7 @@ public class CassandraStorage extends Lo
         setLocationFromUri(location);
         ConfigHelper.setOutputColumnFamily(conf, keyspace, column_family);
         setConnectionInformation();
-        initSchema();
+        initSchema(storeSignature);
     }
 
     public OutputFormat getOutputFormat()
@@ -346,7 +353,7 @@ public class CassandraStorage extends Lo
         ByteBuffer key = objToBB(t.get(0));
         DefaultDataBag pairs = (DefaultDataBag) t.get(1);
         ArrayList<Mutation> mutationList = new ArrayList<Mutation>();
-        CfDef cfDef = getCfDef();
+        CfDef cfDef = getCfDef(storeSignature);
         List<AbstractType> marshallers = getDefaultMarshallers(cfDef);
         Map<ByteBuffer,AbstractType> validators = getValidatorMap(cfDef);
         try
@@ -404,7 +411,6 @@ public class CassandraStorage extends Lo
                     column.timestamp = System.currentTimeMillis() * 1000;
                     mutation.column_or_supercolumn = new ColumnOrSuperColumn();
                     mutation.column_or_supercolumn.column = column;
-                    mutationList.add(mutation);
                 }
             }
             mutationList.add(mutation);
@@ -412,7 +418,7 @@ public class CassandraStorage extends Lo
     }
     catch (ClassCastException e)
     {
-        throw new IOException(e + " Output must be (key, {(column,value)...}) for ColumnFamily or (key,
[jira] [Updated] (CASSANDRA-2496) Gossip should handle 'dead' states
[ https://issues.apache.org/jira/browse/CASSANDRA-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul cannon updated CASSANDRA-2496: --- Attachment: 0006-acknowledge-unexpected-repl-fins.patch.txt 0006-acknowledge-unexpected-repl-fins.patch.txt (updated): also log at info when acknowledging the unexpected messages Gossip should handle 'dead' states -- Key: CASSANDRA-2496 URL: https://issues.apache.org/jira/browse/CASSANDRA-2496 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Attachments: 0001-Rework-token-removal-process.txt, 0002-add-2115-back.txt, 0003-update-gossip-related-comments.patch.txt, 0004-do-REMOVING_TOKEN-REMOVED_TOKEN.patch.txt, 0005-drain-self-if-removetoken-d-elsewhere.patch.txt, 0006-acknowledge-unexpected-repl-fins.patch.txt, 0006-acknowledge-unexpected-repl-fins.patch.txt For background, see CASSANDRA-2371 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (CASSANDRA-2863) NPE when writing SSTable generated via repair
[ https://issues.apache.org/jira/browse/CASSANDRA-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis resolved CASSANDRA-2863. --- Resolution: Cannot Reproduce Fix Version/s: (was: 0.8.3) Assignee: (was: Sylvain Lebresne) Doesn't make any sense to me, either. The only place close() is called is from index() [as seen in the stacktrace here] and the only place index() is called is after prepareIndexing, which sets iwriter to non-null: {code} long estimatedRows = indexer.prepareIndexing(); // build the index and filter long rows = indexer.index(); {code} NPE when writing SSTable generated via repair - Key: CASSANDRA-2863 URL: https://issues.apache.org/jira/browse/CASSANDRA-2863 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Reporter: Héctor Izquierdo An NPE is generated during repair when closing an sstable generated via SSTable build. It doesn't happen always. The node had been scrubbed and compacted before calling repair. INFO [CompactionExecutor:2] 2011-07-06 11:11:32,640 SSTableReader.java (line 158) Opening /d2/cassandra/data/sbs/walf-g-730 ERROR [CompactionExecutor:2] 2011-07-06 11:11:34,327 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[CompactionExecutor:2,1,main] java.lang.NullPointerException at org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.close(SSTableWriter.java:382) at org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:370) at org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315) at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:1103) at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:1094) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Cassandra Wiki] Trivial Update of ArticlesAndPresentations by MatthewDennis
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for change notification.

The ArticlesAndPresentations page has been changed by MatthewDennis:
http://wiki.apache.org/cassandra/ArticlesAndPresentations?action=diff&rev1=123&rev2=124

  * [[http://www.slideshare.net/hjort/persistncia-nas-nuvens-com-no-sql-hjort|Persistência nas Nuvens com NoSQL]], Brazilian Portuguese, June 2011

= Presentations =
+ * [[http://www.slideshare.net/mattdennis/cassandra-data-modeling|Cassandra Data Modeling Workshop]] - Cassandra SF, Matthew F. Dennis, July 2011
  * [[http://www.slideshare.net/jeromatron/cassandrahadoop-integration|Cassandra/Hadoop Integration]] - Jeremy Hanna, January 2011
  * [[http://www.slideshare.net/supertom/using-cassandra-with-your-web-application|Using Cassandra with your Web Application]] - Tom Melendez, Oct 2010
  * [[http://www.slideshare.net/yutuki/cassandrah-baseno-sql|CassandraとHBaseの比較をして入門するNoSQL]] by Shusuke Shiina (Sep 2010, Japanese)
[jira] [Updated] (CASSANDRA-2496) Gossip should handle 'dead' states
[ https://issues.apache.org/jira/browse/CASSANDRA-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul cannon updated CASSANDRA-2496: --- Attachment: (was: 0006-acknowledge-unexpected-repl-fins.patch.txt) Gossip should handle 'dead' states -- Key: CASSANDRA-2496 URL: https://issues.apache.org/jira/browse/CASSANDRA-2496 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Attachments: 0001-Rework-token-removal-process.txt, 0002-add-2115-back.txt, 0003-update-gossip-related-comments.patch.txt, 0004-do-REMOVING_TOKEN-REMOVED_TOKEN.patch.txt, 0005-drain-self-if-removetoken-d-elsewhere.patch.txt, 0006-acknowledge-unexpected-repl-fins.patch.txt For background, see CASSANDRA-2371 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2869) CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues
[ https://issues.apache.org/jira/browse/CASSANDRA-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069189#comment-13069189 ] Hudson commented on CASSANDRA-2869: --- Integrated in Cassandra-0.7 #534 (See [https://builds.apache.org/job/Cassandra-0.7/534/]) Use a UDF-specific context signature. Patch by Jeremy Hanna, reviewed by brandonwilliams for CASSANDRA-2869 brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149341 Files : * /cassandra/branches/cassandra-0.7/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues Key: CASSANDRA-2869 URL: https://issues.apache.org/jira/browse/CASSANDRA-2869 Project: Cassandra Issue Type: Bug Components: Contrib Affects Versions: 0.7.2 Reporter: Grant Ingersoll Assignee: Jeremy Hanna Fix For: 0.7.9, 0.8.2 Attachments: 2869-2.txt, 2869.txt CassandraStorage appears to have threading issues along the lines of those described at http://pig.markmail.org/message/oz7oz2x2dwp66eoz due to the sharing of the UDFContext. I believe the fix lies in implementing {code} public void setStoreFuncUDFContextSignature(String signature) { } {code} and then using that signature when getting the UDFContext. From the Pig manual: {quote} setStoreFuncUDFContextSignature(): This method will be called by Pig both in the front end and back end to pass a unique signature to the Storer. The signature can be used to store into the UDFContext any information which the Storer needs to store between various method invocations in the front end and back end. The default implementation in StoreFunc has an empty body. This method will be called before other methods. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
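The sharing bug described in this ticket can be illustrated with a toy model. UdfContextModel and SHARED_KEY below are hypothetical stand-ins for Pig's UDFContext properties and the old single cassandra.schema key; they are not the real API, only a sketch of why a per-instance signature is needed.

```java
import java.util.Properties;

// Toy model of UDFContext property storage. With one shared key, the second
// CassandraStorage instance in a script overwrites the first one's schema;
// keying by the per-instance signature Pig hands each instance keeps them apart.
class UdfContextModel {
    static final String SHARED_KEY = "cassandra.schema"; // pre-patch behavior

    private final Properties props = new Properties();

    void put(String key, String cfDef) { props.setProperty(key, cfDef); }
    String get(String key) { return props.getProperty(key); }
}
```

Two instances writing through SHARED_KEY lose the first schema, while per-signature keys like a load signature and a store signature preserve both, which is exactly the behavior the committed patch restores.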
[jira] [Updated] (CASSANDRA-1405) Switch to THsHaServer, redux
[ https://issues.apache.org/jira/browse/CASSANDRA-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay updated CASSANDRA-1405: - Attachment: 0001-including-validation.patch Awesome... Actually if we have a test with more client connections and unlimited threads in the sync we should actually have a better performance :) BTW: attached is a validation, this goes on top of the earlier patch. Jonathan, will submit a ticket and work on thrift patch to make it trace instead of error. Switch to THsHaServer, redux Key: CASSANDRA-1405 URL: https://issues.apache.org/jira/browse/CASSANDRA-1405 Project: Cassandra Issue Type: Improvement Components: API Reporter: Jonathan Ellis Assignee: Vijay Priority: Minor Fix For: 0.8.3 Attachments: 0001-including-validation.patch, 0001-log4j-config-change.patch, 1405-Thrift-Patch-SVN.patch, libthrift-r1026391.jar, trunk-1405.patch Brian's patch to CASSANDRA-876 suggested using a custom TProcessorFactory subclass, overriding getProcessor to reset to a default state when a new client connects. It looks like this would allow dropping CustomTThreadPoolServer as well as allowing non-thread based servers. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2930) corrupt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069197#comment-13069197 ] Jonathan Ellis commented on CASSANDRA-2930: --- Sounds like https://issues.apache.org/jira/browse/CASSANDRA-2675. Are you sure you're actually running 0.8.1? We've had a lot of 0.8.0 installs that people thought were 0.8.1 due to incorrect packages being published. grep Cassandra version /var/log/cassandra/system.log should verify. corrupt commitlog - Key: CASSANDRA-2930 URL: https://issues.apache.org/jira/browse/CASSANDRA-2930 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Environment: Linux, amd64. Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Reporter: ivan Attachments: CommitLog-1310637513214.log We get Exception encountered during startup error while Cassandra starts. Error messages: INFO 13:56:28,736 Finished reading /var/lib/cassandra/commitlog/CommitLog-1310637513214.log ERROR 13:56:28,736 Exception encountered during startup. 
java.io.IOError: java.io.EOFException
	at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265)
	at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281)
	at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236)
	at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
	at java.util.concurrent.ConcurrentSkipListMap.init(ConcurrentSkipListMap.java:1443)
	at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:419)
	at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:139)
	at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:127)
	at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:382)
	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:278)
	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:158)
	at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:175)
	at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:368)
	at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394)
	at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368)
	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:87)
	at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261)
	... 13 more
Exception encountered during startup.
[jira] [Commented] (CASSANDRA-2924) Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069199#comment-13069199 ] Jonathan Ellis commented on CASSANDRA-2924: --- Why is ThriftConnection introduced? Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1 --- Key: CASSANDRA-2924 URL: https://issues.apache.org/jira/browse/CASSANDRA-2924 Project: Cassandra Issue Type: Improvement Components: Drivers Affects Versions: 0.8.1 Reporter: Rick Shaw Assignee: Rick Shaw Priority: Minor Labels: JDBC Fix For: 0.8.3 Attachments: 2924-v2.txt, consolidate-connection-v1.txt For the JDBC Driver suite, additional cleanup and consolidation of classes {{Connection}} and {{CassandraConnection}} were in order. Those changes drove a few casual additional changes in related classes {{CResultSet}}, {{CassandraStatement}} and {{CassandraPreparedStatement}} in order to continue to communicate properly. The class {{Utils}} was also enhanced to move more static utility methods into this holder class. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2496) Gossip should handle 'dead' states
[ https://issues.apache.org/jira/browse/CASSANDRA-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069202#comment-13069202 ] paul cannon commented on CASSANDRA-2496: ok, +1 with these patches. Gossip should handle 'dead' states -- Key: CASSANDRA-2496 URL: https://issues.apache.org/jira/browse/CASSANDRA-2496 Project: Cassandra Issue Type: Bug Components: Core Reporter: Brandon Williams Assignee: Brandon Williams Attachments: 0001-Rework-token-removal-process.txt, 0002-add-2115-back.txt, 0003-update-gossip-related-comments.patch.txt, 0004-do-REMOVING_TOKEN-REMOVED_TOKEN.patch.txt, 0005-drain-self-if-removetoken-d-elsewhere.patch.txt, 0006-acknowledge-unexpected-repl-fins.patch.txt For background, see CASSANDRA-2371 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back
[ https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069201#comment-13069201 ] Patricio Echague commented on CASSANDRA-2045: - Tested with the CASSANDRA-2928 patch and it works perfectly.
Test environment:
- 2 nodes on localhost (127.0.0.2 and 127.0.0.3)
Test case:
- start both nodes
- create the schema for testing
- stop node 1
- insert 5 keys into node 2
- verify that HintsColumnFamily has 5 entries on node 2
- start node 1
- verify that node 1 has the new data
- verify that node 2 deleted the delivered hints
Simplify HH to decrease read load when nodes come back -- Key: CASSANDRA-2045 URL: https://issues.apache.org/jira/browse/CASSANDRA-2045 Project: Cassandra Issue Type: Improvement Reporter: Chris Goffinet Assignee: Nicholas Telford Fix For: 1.0 Attachments: 0001-Changed-storage-of-Hints-to-store-a-serialized-RowMu.patch, 0002-Refactored-HintedHandoffManager.sendRow-to-reduce-co.patch, 0003-Fixed-some-coding-style-issues.patch, 0004-Fixed-direct-usage-of-Gossiper.getEndpointStateForEn.patch, 0005-Removed-duplicate-failure-detection-conditionals.-It.patch, 0006-Removed-handling-of-old-style-hints.patch, 2045-v3.txt, 2045-v5.txt, 2045-v6.txt, CASSANDRA-2045-simplify-hinted-handoff-001.diff, CASSANDRA-2045-simplify-hinted-handoff-002.diff, CASSANDRA-2045-v4.diff Currently when HH is enabled, hints are stored, and when a node comes back, we begin sending that node data. We do a lookup on the local node for the row to send. To help reduce read load (if a node is offline for a long period of time) we should store the data we want to forward to the node locally instead. We wouldn't have to do any lookups, just take the byte[] and send it to the destination. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1405) Switch to THsHaServer, redux
[ https://issues.apache.org/jira/browse/CASSANDRA-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069208#comment-13069208 ] Brandon Williams commented on CASSANDRA-1405: - bq. Actually if we have a test with more client connections and unlimited threads in the sync we should actually have a better performance With 2k conns, hsha starts to show a small edge over sync.
[jira] [Created] (CASSANDRA-2935) CLI ignores quoted default_validation_class in create column family command
CLI ignores quoted default_validation_class in create column family command - Key: CASSANDRA-2935 URL: https://issues.apache.org/jira/browse/CASSANDRA-2935 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.8.1 Environment: Ubuntu 10.10 Reporter: Andy Bauch Priority: Trivial The default_validation_class parameter of CREATE COLUMN FAMILY is ignored when quoted. The key_validation_class and comparator parameters do not exhibit this behavior. Sample output:
[default@vp2] create column family UserPlaybackHistory with comparator='AsciiType' and key_validation_class='AsciiType' and default_validation_class='AsciiType';
18a9f020-b3ce-11e0--9904252df9ff
Waiting for schema agreement...
... schemas agree across the cluster
[default@vp2] describe keyspace;
Keyspace: vp2:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:2]
  Column Families:
    ColumnFamily: UserPlaybackHistory
      Key Validation Class: org.apache.cassandra.db.marshal.AsciiType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 1.0875/232/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
[default@vp2] drop column family UserPlaybackHistory;
2513e2d0-b3ce-11e0--9904252df9ff
Waiting for schema agreement...
... schemas agree across the cluster
[default@vp2] create column family UserPlaybackHistory with comparator=AsciiType and key_validation_class=AsciiType and default_validation_class=AsciiType;
5d1b4ce0-b3ce-11e0--9904252df9ff
Waiting for schema agreement...
... schemas agree across the cluster
[default@vp2] describe keyspace;
Keyspace: vp2:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
  Options: [replication_factor:2]
  Column Families:
    ColumnFamily: UserPlaybackHistory
      Key Validation Class: org.apache.cassandra.db.marshal.AsciiType
      Default column value validator: org.apache.cassandra.db.marshal.AsciiType
      Columns sorted by: org.apache.cassandra.db.marshal.AsciiType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 20.0/14400
      Memtable thresholds: 1.0875/232/1440 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: false
      Built indexes: []
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2930) corrupt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069243#comment-13069243 ] ivan commented on CASSANDRA-2930: - Hi Jonathan, we built our Cassandra package from the git://git.apache.org/cassandra.git cassandra-0.8.1 branch. As far as I can see it's the same as the official 0.8.1 Cassandra code. grep Cassandra version /var/log/cassandra/system.log: INFO [main] 2011-07-21 14:42:48,553 StorageService.java (line 378) Cassandra version: 0.8.1 I checked the CASSANDRA-2675 report. Patch 0002-Avoid-modifying-super-column-in-memtable-being-flush-v2.patch is in our code. Patch 0001-Don-t-remove-columns-from-super-columns-in-memtable.patch is not in our code, but as far as I can see it's not in the official package or trunk either. This error happens rarely; I found it 2 to 3 times in a 128MB commitlog. I suspect some race condition, but a RowMutation shouldn't change. (SuperColumn.java:371) So I have no clue yet. Any further test suggestions are welcome. Regards, ivan
[jira] [Commented] (CASSANDRA-2761) JDBC driver does not build
[ https://issues.apache.org/jira/browse/CASSANDRA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069248#comment-13069248 ] Eric Evans commented on CASSANDRA-2761: --- To summarize, it is now possible to build and test w/ ant. This is currently done by pointing to a local (built) working copy of Cassandra (a site config). What's left, and seems reasonable to scope with this issue:
* Create an alternate mechanism for specifying the version of Cassandra to build/test against (in order to run the tests against prior releases). I'm thinking Ivy could be used here to automatically download artifacts when a property is passed (-Dcassandra.release=0.8.0 for example).
* (Re)build Cassandra as needed from the drivers' Ant build, or at the very least, handle the case when a build is needed.
* Fix the {{generate-eclipse-files}} target if possible, or remove it otherwise.
Work should also continue to reduce the cross-section of Cassandra that this driver depends on, but I'll open another issue for that. JDBC driver does not build -- Key: CASSANDRA-2761 URL: https://issues.apache.org/jira/browse/CASSANDRA-2761 Project: Cassandra Issue Type: Bug Components: API Affects Versions: 1.0 Reporter: Jonathan Ellis Assignee: Rick Shaw Fix For: 1.0 Attachments: jdbc-driver-build-v1.txt, v1-0001-CASSANDRA-2761-cleanup-nits.txt Need a way to build (and run tests for) the Java driver. Also: still some vestigial references to drivers/ in trunk build.xml. Should we remove drivers/ from the 0.8 branch as well? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2921) Split BufferedRandomAccessFile (BRAF) into Input and Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069250#comment-13069250 ] Jonathan Ellis commented on CASSANDRA-2921: --- There's no reason to have Reader and Writer live in the same outer class anymore. Let's split them up. Similarly, no reason to split ARAF out from Reader. Writer should probably just extend OutputStream, and let the caller wrap in DOS if they want, instead of pulling in that Harmony code (if we can get rid of it, great). Split BufferedRandomAccessFile (BRAF) into Input and Output classes Key: CASSANDRA-2921 URL: https://issues.apache.org/jira/browse/CASSANDRA-2921 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Pavel Yaskevich Assignee: Pavel Yaskevich Fix For: 1.0 Attachments: CASSANDRA-2921-make-Writer-a-stream.patch, CASSANDRA-2921-v2.patch, CASSANDRA-2921.patch Split BRAF into Input and Output classes to avoid complexity related to random I/O in write mode that we don't need any more, see CASSANDRA-2879. And make the implementation cleaner and more reusable. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2921) Split BufferedRandomAccessFile (BRAF) into Input and Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069258#comment-13069258 ] Pavel Yaskevich commented on CASSANDRA-2921: Can we go with only the v2 patch for now? I'm a bit concerned about the design of the Writer: it needs to be able to seek back if we want to support resetAndTruncate(), so I don't see a way to avoid using a RAF inside the Writer. I tried to use FileOutputStream and its channel, but it didn't go well.
[jira] [Commented] (CASSANDRA-2921) Split BufferedRandomAccessFile (BRAF) into Input and Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069262#comment-13069262 ] Jonathan Ellis commented on CASSANDRA-2921: --- By "just extend OutputStream" I just meant in the class definition; I agree that you probably need RAF internally.
[jira] [Commented] (CASSANDRA-2921) Split BufferedRandomAccessFile (BRAF) into Input and Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069263#comment-13069263 ] Pavel Yaskevich commented on CASSANDRA-2921: AbstractDataOutput extends OutputStream
[jira] [Commented] (CASSANDRA-2921) Split BufferedRandomAccessFile (BRAF) into Input and Output classes
[ https://issues.apache.org/jira/browse/CASSANDRA-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069271#comment-13069271 ] Pavel Yaskevich commented on CASSANDRA-2921: I think we should stay with the Reader/Writer introduced by v2 here, to support the full (expected) BRAF functionality. In a separate issue we can create an in-house implementation of FileOutputStream with support for mark(), resetAndTruncate(...) and truncate(...), and replace BRAF.Writer with it where needed; that should be a better design.
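The Writer shape being discussed in this thread, a subclass of OutputStream in the class definition that keeps a RandomAccessFile internally so it can still seek back for resetAndTruncate(), can be sketched as below. This is an illustrative toy under assumed names (`SequentialWriterSketch`), not the v2 patch's actual code:

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Extends OutputStream so callers can wrap it in a DataOutputStream,
// but holds a RandomAccessFile internally to support seeking back.
class SequentialWriterSketch extends OutputStream {
    private final RandomAccessFile file;
    private long mark = 0;

    SequentialWriterSketch(File path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    @Override
    public void write(int b) throws IOException { file.write(b); }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        file.write(b, off, len);
    }

    // Remember the current position so we can rewind to it later.
    long mark() throws IOException {
        mark = file.getFilePointer();
        return mark;
    }

    // Seek back to the mark and drop everything written after it.
    void resetAndTruncate() throws IOException {
        file.seek(mark);
        file.setLength(mark);
    }

    @Override
    public void close() throws IOException { file.close(); }
}

class WriterSketchDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("writer", ".bin");
        tmp.deleteOnExit();
        try (SequentialWriterSketch out = new SequentialWriterSketch(tmp)) {
            out.write("keep".getBytes());
            out.mark();
            out.write("discard".getBytes());
            out.resetAndTruncate();       // file now contains only "keep"
        }
        System.out.println(tmp.length()); // 4
    }
}
```

A plain FileOutputStream cannot do this because its contract is append/overwrite-forward only, which is why the discussion settles on keeping RAF inside while exposing only the OutputStream surface.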
[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled
[ https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069275#comment-13069275 ] Patricio Echague commented on CASSANDRA-2034: - bq. after RpcTimeout we check the responseHandler write acks and write local hints for any missing targets. CASSANDRA-2914 handles the local storage of hints. Make Read Repair unnecessary when Hinted Handoff is enabled --- Key: CASSANDRA-2034 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034 Project: Cassandra Issue Type: Improvement Components: Core Reporter: Jonathan Ellis Assignee: Patricio Echague Fix For: 1.0 Original Estimate: 8h Remaining Estimate: 8h Currently, HH is purely an optimization -- if a machine goes down, enabling HH means RR/AES will have less work to do, but you can't disable RR entirely in most situations since HH doesn't kick in until the FailureDetector does. Let's add a scheduled task to the mutate path, such that we return to the client normally after ConsistencyLevel is achieved, but after RpcTimeout we check the responseHandler write acks and write local hints for any missing targets. This would making disabling RR when HH is enabled a much more reasonable option, which has a huge impact on read throughput. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2045) Simplify HH to decrease read load when nodes come back
[ https://issues.apache.org/jira/browse/CASSANDRA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069280#comment-13069280 ] Hudson commented on CASSANDRA-2045: --- Integrated in Cassandra #968 (See [https://builds.apache.org/job/Cassandra/968/]) store hints as serialized mutations instead of pointers to data rows patch by Nick Telford, jbellis, and Patricio Echague for CASSANDRA-2045 jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1149396 Files : * /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java * /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutationVerbHandler.java * /cassandra/trunk/CHANGES.txt * /cassandra/trunk/src/java/org/apache/cassandra/db/HintedHandOffManager.java
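The effect of the committed change ("store hints as serialized mutations instead of pointers to data rows") is that the hint store holds the mutation bytes themselves, so delivery needs no local row lookup. A toy sketch of that idea under hypothetical names (`HintQueueSketch`, not Cassandra's HintedHandOffManager API):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative only: the old scheme stored a pointer and re-read the row at
// delivery time; the new scheme stores the serialized mutation itself, so
// delivery is just "take byte[] and send", with no local read.
class HintQueueSketch {
    private final Queue<byte[]> hints = new ArrayDeque<>();

    // Write path while the target is down: serialize once, store the bytes.
    void storeHint(String mutation) {
        hints.add(mutation.getBytes(StandardCharsets.UTF_8));
    }

    // Delivery when the target comes back: forward bytes, no row lookup.
    byte[] nextHint() {
        return hints.poll();
    }

    public static void main(String[] args) {
        HintQueueSketch q = new HintQueueSketch();
        q.storeHint("row1:col=v1");
        System.out.println(new String(q.nextHint(), StandardCharsets.UTF_8));
    }
}
```

This matches the test flow reported earlier in the thread: hints accumulate while a node is down and are drained, in order, once it returns.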
[jira] [Created] (CASSANDRA-2937) certain generic type causes compile error in eclipse
certain generic type causes compile error in eclipse Key: CASSANDRA-2937 URL: https://issues.apache.org/jira/browse/CASSANDRA-2937 Project: Cassandra Issue Type: Bug Reporter: Yang Yang Priority: Trivial The code in ColumnFamily and AbstractColumnContainer uses code similar to the following (substitute Blah with AbstractColumnContainer.DeletionInfo):
{code}
import java.util.concurrent.atomic.AtomicReference;

public class TestPrivateAtomicRef {
    protected final AtomicReference<Blah> b = new AtomicReference<Blah>(new Blah());
    // the following raw form would work for eclipse
    //protected final AtomicReference b = new AtomicReference(new Blah());

    private static class Blah {
    }
}

class Child extends TestPrivateAtomicRef {
    public void aaa() {
        Child c = new Child();
        c.b.set(
            b.get() // eclipse shows error here
        );
    }
}
{code}
In Eclipse, the above code generates a compile error, but it works fine with the command-line java tools. Since many people use Eclipse, it's better to make a temporary compromise and make DeletionInfo protected. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2924) Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069284#comment-13069284 ] Rick Shaw commented on CASSANDRA-2924: -- It defines the methods that are to be implemented over and above those required by the {{java.sql.Connection}} interface; these moved from {{o.a.c.cql.jdbc.Connection}}. That class can now be removed. (I did not know how to do that in the patch.) I thought that seemed like the right thing to do (defining an interface), but it is not necessary.
[jira] [Updated] (CASSANDRA-2937) certain generic type causes compile error in eclipse
[ https://issues.apache.org/jira/browse/CASSANDRA-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2937: - Attachment: 0002-avoid-eclipse-compile-error-for-generic-type-on-Atom.patch minor fix to avoid Eclipse compile error
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: 2843_d.patch

the DeletionInfo private -> protected change is moved to https://issues.apache.org/jira/browse/CASSANDRA-2937 ; the new patch is uploaded here

better performance on long row read
---
Key: CASSANDRA-2843 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843 Project: Cassandra Issue Type: New Feature Reporter: Yang Yang Attachments: 2843.patch, 2843_c.patch, 2843_d.patch, fast_cf_081_trunk.diff, incremental.diff, microBenchmark.patch, patch_timing, std_timing

Currently, if a row contains 1000+ columns, reads become considerably slow: my test of a row with 3000 columns (standard, regular), each with 8 bytes in the name and 40 bytes in the value, takes about 16 ms. This all runs in memory; no disk read is involved.

Through debugging we can find that most of this time is spent in:

[Wall Time] org.apache.cassandra.db.Table.getRow(QueryFilter)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, int, ColumnFamily)
[Wall Time] org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily, Iterator, int)
[Wall Time] org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer, Iterator, int)
[Wall Time] org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)

ColumnFamily.addColumn() is slow because it inserts into an internal ConcurrentSkipListMap that maps column names to values. This structure is slow for two reasons: it needs to do synchronization, and it needs to maintain the more complex structure of a map.

But if we look at the whole read path, Thrift already defines the read output to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map data structure in the interim and then convert it to a list at the end. On the synchronization side, since the returned CF is never shared or modified by other threads, we know access is always single-threaded, so no synchronization is needed. These two features are indeed needed for ColumnFamily in other cases, particularly writes. So we can provide a different ColumnFamily to CFS.getTopLevelColumnFamily(): getTopLevelColumnFamily no longer always creates the standard ColumnFamily, but takes a provided returnCF, whose cost is much cheaper.

The provided patch is for demonstration for now; I will work on it further once we agree on the general direction. CFS, ColumnFamily, and Table are changed, and a new FastColumnFamily is provided. The main work is to let FastColumnFamily use an array for internal storage. At first I used binary search to insert new columns in addColumn(), but later I found that even this is not necessary, since all calling scenarios of ColumnFamily.addColumn() have an invariant that the inserted columns come in sorted order (I still have an issue to resolve between descending and ascending order, but ascending works). So the current logic simply compares the new column against the last column in the array: if the names are not equal, append; if they are equal, reconcile.

Slight temporary hacks are made on getTopLevelColumnFamily, so we now have two flavors of the method, one accepting a returnCF; we could definitely think about a better way to provide this returnCF. This patch compiles fine; no tests are provided yet. But I tested it in my application, and the performance improvement is dramatic: it offers about a 50% reduction in read time in the 3000-column case.

thanks Yang
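The append-or-reconcile logic described above can be sketched as follows (a hypothetical simplification; the class and field names are illustrative, not the actual FastColumnFamily code). Because columns arrive in ascending name order, addColumn() only ever compares the new column against the last element of the backing array, so there is no map lookup and no synchronization.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of an array-backed, single-threaded column container:
// columns arrive in ascending name order, so addColumn() compares only
// against the last element -- append if the name differs, reconcile if equal.
public class FastColumnSketch {
    public static class Column {
        public final String name;
        public final String value;
        public final long timestamp;
        public Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final List<Column> columns = new ArrayList<Column>();

    public void addColumn(Column c) {
        if (columns.isEmpty()) {
            columns.add(c);
            return;
        }
        Column last = columns.get(columns.size() - 1);
        int cmp = last.name.compareTo(c.name);
        if (cmp < 0)
            columns.add(c);                     // common case: strictly ascending, append
        else if (cmp == 0 && c.timestamp > last.timestamp)
            columns.set(columns.size() - 1, c); // reconcile: newer timestamp wins
        else if (cmp > 0)
            throw new IllegalStateException("columns must arrive in ascending name order");
        // cmp == 0 with an older-or-equal timestamp: keep the existing column
    }

    public int size() { return columns.size(); }
    public Column get(int i) { return columns.get(i); }
}
```

A binary-search insert would also work, but as the description notes, the sorted-arrival invariant makes the last-element comparison sufficient.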
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: 2843_d.patch the DeletionInfo private=protected change is moved to https://issues.apache.org/jira/browse/CASSANDRA-2937 new patch uploaded here
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: (was: 2843_d.patch)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: (was: 2843_c.patch)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: 2843_d.patch the DeletionInfo private=protected change is moved to https://issues.apache.org/jira/browse/CASSANDRA-2937 new patch uploaded here
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: (was: 2843_d.patch)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: (was: incremental.diff)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Comment: was deleted (was: the DeletionInfo private=protected change is moved to https://issues.apache.org/jira/browse/CASSANDRA-2937 new patch uploaded here)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: (was: 2843_d.patch)
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Comment: was deleted (was: the DeletionInfo private=protected change is moved to https://issues.apache.org/jira/browse/CASSANDRA-2937 new patch uploaded here)
[jira] [Resolved] (CASSANDRA-2935) CLI ignores quoted default_validation_class in create column family command
[ https://issues.apache.org/jira/browse/CASSANDRA-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams resolved CASSANDRA-2935. - Resolution: Duplicate Dupe of CASSANDRA-2899 CLI ignores quoted default_validation_class in create column family command - Key: CASSANDRA-2935 URL: https://issues.apache.org/jira/browse/CASSANDRA-2935 Project: Cassandra Issue Type: Bug Components: Tools Affects Versions: 0.8.1 Environment: Ubuntu 10.10 Reporter: Andy Bauch Priority: Trivial The default_validation_class parameter of CREATE COLUMN FAMILY is ignored when quoted. The key_validation_class and comparator parameters do not exhibit this behavior. Sample output: [default@vp2] create column family UserPlaybackHistory with comparator='AsciiType' and key_validation_class='AsciiType' and default_validation_class='AsciiType'; 18a9f020-b3ce-11e0--9904252df9ff Waiting for schema agreement... ... schemas agree across the cluster [default@vp2] describe keyspace; Keyspace: vp2: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:2] Column Families: ColumnFamily: UserPlaybackHistory Key Validation Class: org.apache.cassandra.db.marshal.AsciiType Default column value validator: org.apache.cassandra.db.marshal.BytesType Columns sorted by: org.apache.cassandra.db.marshal.AsciiType Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 1.0875/232/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: false Built indexes: [] [default@vp2] drop column family UserPlaybackHistory; 2513e2d0-b3ce-11e0--9904252df9ff Waiting for schema agreement... ...
schemas agree across the cluster [default@vp2] create column family UserPlaybackHistory with comparator=AsciiType and key_validation_class=AsciiType and default_validation_class=AsciiType; 5d1b4ce0-b3ce-11e0--9904252df9ff Waiting for schema agreement... ... schemas agree across the cluster [default@vp2] describe keyspace; Keyspace: vp2: Replication Strategy: org.apache.cassandra.locator.SimpleStrategy Durable Writes: true Options: [replication_factor:2] Column Families: ColumnFamily: UserPlaybackHistory Key Validation Class: org.apache.cassandra.db.marshal.AsciiType Default column value validator: org.apache.cassandra.db.marshal.AsciiType Columns sorted by: org.apache.cassandra.db.marshal.AsciiType Row cache size / save period in seconds: 0.0/0 Key cache size / save period in seconds: 20.0/14400 Memtable thresholds: 1.0875/232/1440 (millions of ops/MB/minutes) GC grace seconds: 864000 Compaction min/max thresholds: 4/32 Read repair chance: 1.0 Replicate on write: false Built indexes: []
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: 2843_d.patch the DeletionInfo private=protected change is moved to https://issues.apache.org/jira/browse/CASSANDRA-2937 new patch uploaded here
[jira] [Updated] (CASSANDRA-2843) better performance on long row read
[ https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yang updated CASSANDRA-2843: - Attachment: (was: fast_cf_081_trunk.diff)
[jira] [Commented] (CASSANDRA-2761) JDBC driver does not build
[ https://issues.apache.org/jira/browse/CASSANDRA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069317#comment-13069317 ] Jonathan Ellis commented on CASSANDRA-2761: --- +1 cleanup patch JDBC driver does not build -- Key: CASSANDRA-2761 URL: https://issues.apache.org/jira/browse/CASSANDRA-2761 Project: Cassandra Issue Type: Bug Components: API Affects Versions: 1.0 Reporter: Jonathan Ellis Assignee: Rick Shaw Fix For: 1.0 Attachments: jdbc-driver-build-v1.txt, v1-0001-CASSANDRA-2761-cleanup-nits.txt Need a way to build (and run tests for) the Java driver. Also: still some vestigial references to drivers/ in trunk build.xml. Should we remove drivers/ from the 0.8 branch as well?
[jira] [Assigned] (CASSANDRA-2930) corrupt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis reassigned CASSANDRA-2930: - Assignee: Sylvain Lebresne corrupt commitlog - Key: CASSANDRA-2930 URL: https://issues.apache.org/jira/browse/CASSANDRA-2930 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 0.8.1 Environment: Linux, amd64. Java(TM) SE Runtime Environment (build 1.6.0_26-b03) Reporter: ivan Assignee: Sylvain Lebresne Fix For: 0.8.3 Attachments: CommitLog-1310637513214.log We get Exception encountered during startup error while Cassandra starts. Error messages: INFO 13:56:28,736 Finished reading /var/lib/cassandra/commitlog/CommitLog-1310637513214.log ERROR 13:56:28,736 Exception encountered during startup. java.io.IOError: java.io.EOFException at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493) at java.util.concurrent.ConcurrentSkipListMap.init(ConcurrentSkipListMap.java:1443) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:419) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:139) at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:127) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:382) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:278) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:158) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:175) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:368) at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80) Caused by: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at java.io.DataInputStream.readFully(DataInputStream.java:152) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394) at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:87) at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261) ... 13 more Exception encountered during startup. java.io.IOError: java.io.EOFException at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:265) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:281) at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:236) at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493) at java.util.concurrent.ConcurrentSkipListMap.init(ConcurrentSkipListMap.java:1443) at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:419) at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:139) at org.apache.cassandra.db.ColumnFamilySerializer.deserialize(ColumnFamilySerializer.java:127) at org.apache.cassandra.db.RowMutation$RowMutationSerializer.deserialize(RowMutation.java:382) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:278) at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:158) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:175) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:368) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80) Caused by: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at 
java.io.DataInputStream.readFully(DataInputStream.java:152) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:394) at org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:368) at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:87) at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:261)
[jira] [Updated] (CASSANDRA-2930) corrupt commitlog
[ https://issues.apache.org/jira/browse/CASSANDRA-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-2930: -- Fix Version/s: 0.8.3
[jira] [Updated] (CASSANDRA-2914) Simplify HH to always store hints on the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patricio Echague updated CASSANDRA-2914: Attachment: CASSANDRA-2914-trunk-v2.diff v2 replaces v1: rebased the diff, added an entry in CHANGES.txt, updated javadoc and comments. Simplify HH to always store hints on the coordinator Key: CASSANDRA-2914 URL: https://issues.apache.org/jira/browse/CASSANDRA-2914 Project: Cassandra Issue Type: Improvement Components: Core Affects Versions: 1.0 Reporter: Jonathan Ellis Assignee: Patricio Echague Fix For: 1.0 Attachments: CASSANDRA-2914-trunk-v1.diff, CASSANDRA-2914-trunk-v2.diff Moved from CASSANDRA-2045: Since we're storing the full mutation post-2045, there's no benefit to be gained from storing the hint on the replica node, only an increase in complexity. Let's switch it to always store hints on the coordinator instead.
[jira] [Commented] (CASSANDRA-2934) log broken incoming connections at DEBUG
[ https://issues.apache.org/jira/browse/CASSANDRA-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069329#comment-13069329 ] Brandon Williams commented on CASSANDRA-2934: - +1 log broken incoming connections at DEBUG Key: CASSANDRA-2934 URL: https://issues.apache.org/jira/browse/CASSANDRA-2934 Project: Cassandra Issue Type: Task Components: Core Reporter: Jonathan Ellis Assignee: Jonathan Ellis Priority: Trivial Fix For: 0.8.2 Attachments: 2934.txt
svn commit: r1149426 - in /cassandra/trunk: CHANGES.txt src/java/org/apache/cassandra/db/RowMutation.java src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java
Author: jbellis Date: Fri Jul 22 01:13:06 2011 New Revision: 1149426 URL: http://svn.apache.org/viewvc?rev=1149426&view=rev Log: store hints in the coordinator node instead of in the closest replica patch by Patricio Echague; reviewed by jbellis for CASSANDRA-2914 Modified: cassandra/trunk/CHANGES.txt cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java cassandra/trunk/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java Modified: cassandra/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1149426&r1=1149425&r2=1149426&view=diff == --- cassandra/trunk/CHANGES.txt (original) +++ cassandra/trunk/CHANGES.txt Fri Jul 22 01:13:06 2011 @@ -16,6 +16,8 @@ * use reference counting for deleting sstables instead of relying on the GC (CASSANDRA-2521) * store hints as serialized mutations instead of pointers to data rows + * store hints in the coordinator node instead of in the closest + replica (CASSANDRA-2914). 0.8.2 Modified: cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java?rev=1149426&r1=1149425&r2=1149426&view=diff == --- cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java (original) +++ cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java Fri Jul 22 01:13:06 2011 @@ -97,6 +97,23 @@ public class RowMutation implements IMut return modifications_.values(); } +/** + * Returns mutation representing a Hints to be sent to <code>address</code> + * as soon as it becomes available.
+ * The format is the following: + * + * HintsColumnFamily: {// cf + * dest ip: { // key + * uuid: { // super-column + * table: table// columns + * key: key + * mutation: mutation + * version: version + * } + * } + * } + * + */ public static RowMutation hintFor(RowMutation mutation, ByteBuffer address) throws IOException { RowMutation rm = new RowMutation(Table.SYSTEM_TABLE, address); Modified: cassandra/trunk/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java?rev=1149426r1=1149425r2=1149426view=diff == --- cassandra/trunk/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java (original) +++ cassandra/trunk/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java Fri Jul 22 01:13:06 2011 @@ -157,10 +157,7 @@ public abstract class AbstractReplicatio if (map.size() == targets.size() || !StorageProxy.isHintedHandoffEnabled()) return map; -// assign dead endpoints to be hinted to the closest live one, or to the local node -// (since it is trivially the closest) if none are alive. This way, the cost of doing -// a hint is only adding the hint header, rather than doing a full extra write, if any -// destination nodes are alive. +// Assign dead endpoints to be hinted to the local node. // // we do a 2nd pass on targets instead of using temporary storage, // to optimize for the common case (everything was alive). @@ -176,10 +173,8 @@ public abstract class AbstractReplicatio continue; } -InetAddress destination = map.isEmpty() -? localAddress -: snitch.getSortedListByProximity(localAddress, map.keySet()).get(0); -map.put(destination, ep); +// We always store the hint on the coordinator node. +map.put(localAddress, ep); } return map;
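The AbstractReplicationStrategy change above reduces hint routing to a simple rule: every live target is written to directly, and every dead target is hinted on the coordinator itself. A minimal, self-contained sketch of that two-pass shape, under stated assumptions: names here are illustrative, and the real method operates on `InetAddress` values with what appears to be a multimap (so several dead endpoints can share the coordinator key), not on `String` lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HintRouting
{
    /**
     * Maps "address to write to" -> "endpoints that write covers".
     * Live targets map to themselves; every dead target is hinted on
     * the coordinator (post-CASSANDRA-2914, no snitch proximity sort).
     */
    public static Map<String, List<String>> hintedEndpoints(String coordinator,
                                                            List<String> targets,
                                                            List<String> live)
    {
        Map<String, List<String>> map = new LinkedHashMap<>();
        for (String ep : targets)
            if (live.contains(ep))
                map.computeIfAbsent(ep, k -> new ArrayList<>()).add(ep);

        // 2nd pass over targets instead of temporary storage, mirroring the
        // real code's aim of keeping the common case (everything alive) cheap
        for (String ep : targets)
            if (!live.contains(ep))
                map.computeIfAbsent(coordinator, k -> new ArrayList<>()).add(ep);
        return map;
    }
}
```

With targets `{B, C, D}` of which only `B` is alive and coordinator `A`, the sketch yields `B -> [B]` plus `A -> [C, D]`: one direct write and two locally stored hints, rather than extra writes to the closest live replica.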
svn commit: r1149430 - /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
Author: jbellis
Date: Fri Jul 22 01:48:17 2011
New Revision: 1149430

URL: http://svn.apache.org/viewvc?rev=1149430&view=rev
Log:
log broken incoming connections at DEBUG

patch by jbellis; reviewed by brandonwilliams for CASSANDRA-2934

Modified:
    cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java

Modified: cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
URL: http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java?rev=1149430&r1=1149429&r2=1149430&view=diff
==============================================================================
--- cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java (original)
+++ cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java Fri Jul 22 01:48:17 2011
@@ -75,8 +75,9 @@ public class IncomingTcpConnection exten
         }
         catch (IOException e)
         {
+            logger.debug("Incoming IOException", e);
             close();
-            throw new IOError(e);
+            return;
         }
         if (version > MessagingService.version_)
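The pattern the patch adopts can be sketched in isolation: a peer breaking the connection mid-handshake is a routine, recoverable event, so it is logged at DEBUG and the handler returns instead of propagating an `IOError`. This is a hedged sketch only; `ConnectionHandler` and `HeaderSource` are invented stand-ins, not Cassandra's actual classes, and `java.util.logging` stands in for the project's logger.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ConnectionHandler
{
    private static final Logger logger = Logger.getLogger("IncomingTcpConnection");

    /** Tiny stand-in for the socket stream, so the sketch is self-contained. */
    public interface HeaderSource
    {
        int readInt() throws IOException;
    }

    /** Returns true if the header was read, false if the peer broke the connection. */
    public static boolean readHeader(HeaderSource in)
    {
        try
        {
            in.readInt(); // e.g. the protocol version header
            return true;
        }
        catch (IOException e)
        {
            // broken incoming connections are expected noise:
            // log at debug level, clean up, and bail out quietly
            logger.log(Level.FINE, "Incoming IOException", e);
            return false; // instead of: throw new IOError(e);
        }
    }
}
```

The design point is that throwing `IOError` for every flaky client filled logs with stack traces for a condition the server cannot act on, whereas a DEBUG line keeps the information available without the alarm.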
svn commit: r1149431 - /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
Author: jbellis
Date: Fri Jul 22 01:50:07 2011
New Revision: 1149431

URL: http://svn.apache.org/viewvc?rev=1149431&view=rev
Log:
humor Eclipse

Modified:
    cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
URL: http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java?rev=1149431&r1=1149430&r2=1149431&view=diff
==============================================================================
--- cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java Fri Jul 22 01:50:07 2011
@@ -193,7 +193,7 @@ public abstract class AbstractColumnCont
         return columns.values().iterator();
     }
 
-    private static class DeletionInfo
+    protected static class DeletionInfo
     {
         public final long markedForDeleteAt;
         public final int localDeletionTime;
[jira] [Resolved] (CASSANDRA-2937) certain generic type causes compile error in eclipse
[ https://issues.apache.org/jira/browse/CASSANDRA-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2937.
---------------------------------------
    Resolution: Fixed

committed, although in general I'm against humoring broken tools

> certain generic type causes compile error in eclipse
> -----------------------------------------------------
>
>                 Key: CASSANDRA-2937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2937
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Yang Yang
>            Priority: Trivial
>         Attachments: 0002-avoid-eclipse-compile-error-for-generic-type-on-Atom.patch
>
> ColumnFamily and AbstractColumnContainer use code similar to the following (substitute Blah with AbstractColumnContainer.DeletionInfo):
>
> import java.util.concurrent.atomic.AtomicReference;
>
> public class TestPrivateAtomicRef {
>     protected final AtomicReference<Blah> b = new AtomicReference<Blah>(new Blah());
>     // the following raw-typed form would work for eclipse
>     //protected final AtomicReference b = new AtomicReference(new Blah());
>
>     private static class Blah {
>     }
> }
>
> class Child extends TestPrivateAtomicRef {
>     public void aaa() {
>         Child c = new Child();
>         c.b.set(b.get()); // eclipse shows error here
>     }
> }
>
> in eclipse, the above code generates a compile error, but compiles fine with command-line javac. since many people use eclipse, it's better to make a temporary compromise and make DeletionInfo protected
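The committed workaround (r1149431, "humor Eclipse") can be boiled down to a compilable sketch: widening the nested class from private to protected lets Eclipse name the type when checking the `AtomicReference` access from the subclass, and javac accepts either form. The class names below mirror the ticket's shape as a hedged illustration, not Cassandra's actual source.

```java
import java.util.concurrent.atomic.AtomicReference;

public class Container
{
    // protected (not private) nested class: javac accepted either
    // visibility here, Eclipse's compiler only this one
    protected static class DeletionInfo
    {
        public final long markedForDeleteAt;

        public DeletionInfo(long markedForDeleteAt)
        {
            this.markedForDeleteAt = markedForDeleteAt;
        }
    }

    protected final AtomicReference<DeletionInfo> info =
            new AtomicReference<DeletionInfo>(new DeletionInfo(0L));
}

class Child extends Container
{
    public long copyFrom(Container other)
    {
        // the access pattern Eclipse rejected while DeletionInfo was private
        info.set(other.info.get());
        return info.get().markedForDeleteAt;
    }
}
```

The compromise costs nothing at runtime: visibility of a static nested class has no bytecode-level effect on the `AtomicReference` field, so only the source-level access check changes.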
[jira] [Commented] (CASSANDRA-2924) Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1
[ https://issues.apache.org/jira/browse/CASSANDRA-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069350#comment-13069350 ]

Jonathan Ellis commented on CASSANDRA-2924:
-------------------------------------------

[committed w/ above changes]

> Consolidate JDBC driver classes: Connection and CassandraConnection in advance of feature additions for 1.1
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2924
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2924
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Drivers
>    Affects Versions: 0.8.1
>            Reporter: Rick Shaw
>            Assignee: Rick Shaw
>            Priority: Minor
>              Labels: JDBC
>             Fix For: 0.8.3
>         Attachments: 2924-v2.txt, consolidate-connection-v1.txt
>
> For the JDBC driver suite, additional cleanup and consolidation of the classes {{Connection}} and {{CassandraConnection}} were in order. Those changes drove a few incidental changes in the related classes {{CResultSet}}, {{CassandraStatement}} and {{CassandraPreparedStatement}} so they continue to communicate properly. The class {{Utils}} was also enhanced to collect more static utility methods into this holder class.
[jira] [Commented] (CASSANDRA-2761) JDBC driver does not build
[ https://issues.apache.org/jira/browse/CASSANDRA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069353#comment-13069353 ]

Rick Shaw commented on CASSANDRA-2761:
--------------------------------------

+1 for the cleanup patch.
The {{generate-eclipse-files}} target seems to be working for me? How does it fail?

> JDBC driver does not build
> --------------------------
>
>                 Key: CASSANDRA-2761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2761
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>    Affects Versions: 1.0
>            Reporter: Jonathan Ellis
>            Assignee: Rick Shaw
>             Fix For: 1.0
>         Attachments: jdbc-driver-build-v1.txt, v1-0001-CASSANDRA-2761-cleanup-nits.txt
>
> Need a way to build (and run tests for) the Java driver.
> Also: there are still some vestigial references to drivers/ in the trunk build.xml. Should we remove drivers/ from the 0.8 branch as well?
[jira] [Issue Comment Edited] (CASSANDRA-2761) JDBC driver does not build
[ https://issues.apache.org/jira/browse/CASSANDRA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069353#comment-13069353 ]

Rick Shaw edited comment on CASSANDRA-2761 at 7/22/11 2:08 AM:
---------------------------------------------------------------

+1 for the cleanup patch
The {{generate-eclipse-files}} seems to be working for me? How does it fail?

was (Author: ardot):
+1 for the cleanup path
The {{generate-eclipse-files}} seems to be working for me? How does it fail?

> JDBC driver does not build
> --------------------------
>
>                 Key: CASSANDRA-2761
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2761
>             Project: Cassandra
>          Issue Type: Bug
>          Components: API
>    Affects Versions: 1.0
>            Reporter: Jonathan Ellis
>            Assignee: Rick Shaw
>             Fix For: 1.0
>         Attachments: jdbc-driver-build-v1.txt, v1-0001-CASSANDRA-2761-cleanup-nits.txt
>
> Need a way to build (and run tests for) the Java driver.
> Also: there are still some vestigial references to drivers/ in the trunk build.xml. Should we remove drivers/ from the 0.8 branch as well?
[jira] [Commented] (CASSANDRA-2934) log broken incoming connections at DEBUG
[ https://issues.apache.org/jira/browse/CASSANDRA-2934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069356#comment-13069356 ]

Hudson commented on CASSANDRA-2934:
-----------------------------------

Integrated in Cassandra-0.8 #234 (See [https://builds.apache.org/job/Cassandra-0.8/234/])
    log broken incoming connections at DEBUG
    patch by jbellis; reviewed by brandonwilliams for CASSANDRA-2934

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149430
Files :
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/IncomingTcpConnection.java

> log broken incoming connections at DEBUG
> ----------------------------------------
>
>                 Key: CASSANDRA-2934
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2934
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Trivial
>             Fix For: 0.8.2
>         Attachments: 2934.txt
[jira] [Commented] (CASSANDRA-2914) Simplify HH to always store hints on the coordinator
[ https://issues.apache.org/jira/browse/CASSANDRA-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069358#comment-13069358 ]

Hudson commented on CASSANDRA-2914:
-----------------------------------

Integrated in Cassandra #969 (See [https://builds.apache.org/job/Cassandra/969/])
    store hints in the coordinator node instead of in the closest replica
    patch by Patricio Echague; reviewed by jbellis for CASSANDRA-2914

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1149426
Files :
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowMutation.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java

> Simplify HH to always store hints on the coordinator
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2914
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.0
>            Reporter: Jonathan Ellis
>            Assignee: Patricio Echague
>             Fix For: 1.0
>         Attachments: CASSANDRA-2914-trunk-v1.diff, CASSANDRA-2914-trunk-v2.diff
>
> Moved from CASSANDRA-2045:
> Since we're storing the full mutation post-2045, there's no benefit to be gained from storing the hint on the replica node, only an increase in complexity. Let's switch it to always store hints on the coordinator instead.