[jira] [Created] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)
Enormous counter 
-

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen


I have a two-node cluster with the following keyspace and column family settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [datacenter1:2]
  Column Families:
ColumnFamily: testCounter (Super)
APP status information.
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: 
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

Then, I use a test program based on hector to add to a counter column 
(testCounter[sc][column]) 1000 times. In the middle of the adding process, I 
intentionally shut down the node 172.17.19.152. In addition, the test program 
is smart enough to switch the consistency level from Quorum to One, so that the 
subsequent add operations do not fail. 

After all the add operations are done, I start cassandra on 172.17.19.152 and 
use cassandra-cli to check whether the counter is correct on both nodes. I get 
a result of 1001, which seems reasonable because hector will retry once. 
However, I then shut down 172.17.19.151, and after 172.17.19.152 becomes aware 
that 172.17.19.151 is down, I start cassandra on 172.17.19.151 again. When I 
check the counter this time, I get a result of 481387, which is wildly wrong.

I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and 
earlier. 
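
For illustration, here is a minimal sketch of the add-and-retry pattern 
described above; CounterClient and its add() method are hypothetical stand-ins, 
not the actual hector API:

{code}
// Hypothetical client interface used only to illustrate why a retried,
// non-idempotent counter add can over-count.
enum ConsistencyLevel { QUORUM, ONE }

class TimedOutException extends Exception {}

interface CounterClient
{
    void add(String key, String superColumn, String column, long delta,
             ConsistencyLevel cl) throws TimedOutException;
}

class CounterAddLoop
{
    static void run(CounterClient client)
    {
        ConsistencyLevel cl = ConsistencyLevel.QUORUM;
        for (int i = 0; i < 1000; i++)
        {
            try
            {
                client.add("key", "sc", "column", 1, cl);
            }
            catch (TimedOutException e)
            {
                // The timed-out add may already have been applied on the live
                // replica, so retrying it once can count the same +1 twice.
                cl = ConsistencyLevel.ONE;   // downgrade and retry once
                try
                {
                    client.add("key", "sc", "column", 1, cl);
                }
                catch (TimedOutException ignored) {}
            }
        }
    }
}
{code}

A single retried, non-idempotent add like this explains an off-by-one result 
such as 1001, but nothing in the client-side pattern accounts for a value like 
481387.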

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081545#comment-13081545
 ] 

Boris Yen commented on CASSANDRA-3006:
--

I forgot to mention that the counter is out of sync between the two nodes: one 
shows 481387 and the other shows 20706.

 Enormous counter 
 -

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen

 I have two-node cluster with the following keyspace and column family 
 settings.
 Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
 Keyspace: test:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
 Options: [datacenter1:2]
   Column Families:
 ColumnFamily: testCounter (Super)
 APP status information.
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: 
 org.apache.cassandra.db.marshal.CounterColumnType
   Columns sorted by: 
 org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds: 0.0/0
   Key cache size / save period in seconds: 20.0/14400
   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Built indexes: []
 Then, I use a test program based on hector to add a counter column 
 (testCounter[sc][column]) 1000 times. In the middle the adding process, I 
 intentional shut down the node 172.17.19.152. In addition to that, the test 
 program is smart enough to switch the consistency level from Quorum to One, 
 so that the following adding actions would not fail. 
 After all the adding actions are done, I start the cassandra on 
 172.17.19.152, and I use cassandra-cli to check if the counter is correct on 
 both nodes, and I got a result 1001 which should be reasonable because hector 
 will retry once. However, when I shut down 172.17.19.151 and after 
 172.17.19.152 is aware of 172.17.19.151 is down, I try to start the cassandra 
 on 172.17.19.151 again. Then, I check the counter again, this time I got a 
 result 481387 which is so wrong.
 I use 0.8.3 the reproduce this bug, but I think this also happens on 0.8.2 or 
 before also. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Yen updated CASSANDRA-3006:
-

Description: 
I have a two-node cluster with the following keyspace and column family settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [datacenter1:2]
  Column Families:
ColumnFamily: testCounter (Super)
APP status information.
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: 
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

Then, I use a test program based on hector to add to a counter column 
(testCounter[sc][column]) 1000 times. In the middle of the adding process, I 
intentionally shut down the node 172.17.19.152. In addition, the test program 
is smart enough to switch the consistency level from Quorum to One, so that the 
subsequent add operations do not fail. 

After all the add operations are done, I start cassandra on 172.17.19.152 and 
use cassandra-cli to check whether the counter is correct on both nodes. I get 
a result of 1001, which seems reasonable because hector will retry once. 
However, I then shut down 172.17.19.151, and after 172.17.19.152 becomes aware 
that 172.17.19.151 is down, I start cassandra on 172.17.19.151 again. When I 
check the counter this time, I get a result of 481387, which is wildly wrong.

I used 0.8.3 to reproduce this bug, but I think it also happens on 0.8.2 and 
earlier. 

  was:
I have two-node cluster with the following keyspace and column family settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions: 
63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
Options: [datacenter1:2]
  Column Families:
ColumnFamily: testCounter (Super)
APP status information.
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: 
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: 
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds: 0.0/0
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []

Then, I use a test program based on hector to add a counter column 
(testCounter[sc][column]) 1000 times. In the middle the adding process, I 
intentional shut down the node 172.17.19.152. In addition to that, the test 
program is smart enough to switch the consistency level from Quorum to One, so 
that the following adding actions would not fail. 

After all the adding actions are done, I start the cassandra on 172.17.19.152, 
and I use cassandra-cli to check if the counter is correct on both nodes, and I 
got a result 1001 which should be reasonable because hector will retry once. 
However, when I shut down 172.17.19.151 and after 172.17.19.152 is aware of 
172.17.19.151 is down, I try to start the cassandra on 172.17.19.151 again. 
Then, I check the counter again, this time I got a result 481387 which is so 
wrong.

I use 0.8.3 the reproduce this bug, but I think this also happens on 0.8.2 or 
before also. 


 Enormous counter 
 -

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen

 I have two-node cluster with the following keyspace and column family 
 settings.
 Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
   63fda700-c243-11e0--2d03dcafebdf: 

[jira] [Updated] (CASSANDRA-2843) better performance on long row read

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2843:


Attachment: 2843_h.patch

bq. the IColumnMap name when it does not implement Map interface, and some 
things it has in common with Map (iteration) it changes semantics of (iterating 
values instead of keys). not sure what to use instead though, since we already 
have an IColumnContainer. Maybe ISortedColumns?

Yeah, I'm not sure I have a better name either, maybe ISortedColumnHolder, but 
I'm not sure it's better than ISortedColumns, so the attached rebased patch 
simply renames ColumnMap -> SortedColumns.

bq. TSCM and ALCM extending instead of wrapping CSLM/AL, respectively

The idea was to save one object creation. I admit this is probably not a huge 
deal, but it felt like extending instead of wrapping was no big deal in this 
case either, so it seemed worth optimizing. I still stand by that choice, but I 
have no good argument against the criticism that it is possibly premature.

bq. unrelated reformatting

If we're talking about the ones in SuperColumn.java, sorry, I mistakenly forced 
re-indentation on the file, which rewrote the tabs to spaces. The new patch 
keeps the old formatting.  I'd also mention that there are a few places where 
I've rewritten cf.getSortedColumns().iterator() to cf.iterator(), which is 
arguably a bit gratuitous for this patch, but I figured it avoids creating a new 
Collection in the case of CSLM and there aren't many occurrences.


 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Fix For: 1.0

 Attachments: 2843.patch, 2843_d.patch, 2843_g.patch, 2843_h.patch, 
 fix.diff, microBenchmark.patch, patch_timing, std_timing


 currently if a row contains > 1000 columns, the run time becomes considerably 
 slow (my test of a row with 3000 columns (standard, regular), each with 8 bytes 
 in name and 40 bytes in value, is about 16ms).
 this is all running in memory, no disk read is involved.
 through debugging we can find
 most of this time is spent on 
 [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
 [Wall Time]  
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
 ColumnFamily)
 [Wall Time]  
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
 ColumnFamily)
 [Wall Time]  
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
 int, ColumnFamily)
 [Wall Time]  
 org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
  Iterator, int)
 [Wall Time]  
 org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
  Iterator, int)
 [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
 ColumnFamily.addColumn() is slow because it inserts into an internal 
 concurrentSkipListMap() that maps column names to values.
 this structure is slow for two reasons: it needs to do synchronization; it 
 needs to maintain a more complex structure of map.
 but if we look at the whole read path, thrift already defines the read output 
 to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map 
 data structure in the interim and finally convert it to a list. on the 
 synchronization side, since the returned CF is never going to be 
 shared/modified by other threads, we know the access is always single-threaded, 
 so no synchronization is needed.
 but these 2 features are indeed needed for ColumnFamily in other cases, 
 particularly write. so we can provide a different ColumnFamily to 
 CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always 
 creates the standard ColumnFamily but takes a provided returnCF, whose cost 
 is much cheaper.
 the provided patch is for demonstration for now; I will work on it further 
 once we agree on the general direction. 
 CFS, ColumnFamily, and Table are changed; a new FastColumnFamily is 
 provided. the main work is to let the FastColumnFamily use an array for 
 internal storage. at first I used binary search to insert new columns in 
 addColumn(), but later I found that even this is not necessary, since all 
 calling scenarios of ColumnFamily.addColumn() have an invariant that the 
 inserted columns come in sorted order (I still have an issue to resolve, 
 descending or ascending, but ascending works for now). so the current logic is 
 simply to compare the new column against the last column in the array: if the 
 names are not equal, append; if equal, reconcile.
 slight temporary hacks are made on getTopLevelColumnFamily so we have 2 
 flavors of the method, one accepting a returnCF. but we could definitely 
 think about what is the better way to provide 
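
A minimal, self-contained sketch of the append-or-reconcile logic described 
above, assuming columns always arrive in ascending name order; Column and 
ArrayBackedColumns are illustrative stand-ins, not the attached FastColumnFamily:

{code}
import java.util.ArrayList;
import java.util.List;

final class Column
{
    final String name;
    final byte[] value;
    final long timestamp;

    Column(String name, byte[] value, long timestamp)
    {
        this.name = name; this.value = value; this.timestamp = timestamp;
    }

    // Keep the column with the higher timestamp, as in read reconciliation.
    Column reconcile(Column other)
    {
        return other.timestamp > this.timestamp ? other : this;
    }
}

final class ArrayBackedColumns
{
    private final List<Column> columns = new ArrayList<Column>();

    // Caller guarantees ascending name order, so no skip list or binary search
    // is needed: either append at the end or reconcile with the last column.
    void addColumn(Column c)
    {
        int last = columns.size() - 1;
        if (last >= 0 && columns.get(last).name.equals(c.name))
            columns.set(last, columns.get(last).reconcile(c));
        else
            columns.add(c);
    }
}
{code}

Compared to a ConcurrentSkipListMap, the append path does no synchronization 
and no comparisons beyond the last element, which is the saving the description 
above is after.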

[jira] [Created] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Viliam Holub (JIRA)
NullPointerException in MessagingService.java:420
-

 Key: CASSANDRA-3007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.3
 Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
05:15:26 UTC 2010 x86_64 GNU/Linux
java version 1.6.0_18
OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
Reporter: Viliam Holub
Priority: Minor


I'm getting a large quantity of exceptions during streaming. It is always at 
MessagingService.java:420. The streaming appears to be blocked.

 INFO 10:11:14,734 Streaming to /10.235.77.27
ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
java.lang.NullPointerException
at 
org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
at 
org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
at 
org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
at 
org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1717:
---

Attachment: CASSANDRA-1717-v2.patch

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit 
tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Let's leave that to the ticket for CRC optimization, which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  The checksum is now computed over the original (uncompressed) data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of 
java CRC32. In particular, it seems they have been able to close to double the 
speed of the CRC32, with a solution that seems fairly simple to me. It would be 
ok to use java native CRC32 and leave the improvement to another ticket, but 
quite frankly if it is that simple and since the hadoop guys have done all the 
hard work for us, I say we start with the efficient version directly.

  As decided previously this will be a matter of the separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)
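
A rough sketch of the chunk layout being discussed, assuming the CRC32 of the 
uncompressed data is stored alongside each compressed chunk and verified after 
decompression; compress()/decompress() are placeholders, and this is not the 
code from the attached patch:

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

final class ChecksummedChunk
{
    // Write one chunk: [compressed length][compressed data][CRC32 of the
    // UNCOMPRESSED data]. CRC32 is 32 bits; whether to persist it as a long
    // or an int is the question debated later in this thread.
    static byte[] writeChunk(byte[] uncompressed) throws IOException
    {
        CRC32 checksum = new CRC32();
        checksum.update(uncompressed, 0, uncompressed.length);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        byte[] compressed = compress(uncompressed);
        out.writeInt(compressed.length);
        out.write(compressed);
        out.writeLong(checksum.getValue());
        out.flush();
        return bytes.toByteArray();
    }

    // Read one chunk back, decompress it, and verify the stored checksum
    // against the decompressed data.
    static byte[] readChunk(DataInputStream in) throws IOException
    {
        byte[] compressed = new byte[in.readInt()];
        in.readFully(compressed);
        long stored = in.readLong();

        byte[] uncompressed = decompress(compressed);
        CRC32 checksum = new CRC32();
        checksum.update(uncompressed, 0, uncompressed.length);
        if (checksum.getValue() != stored)
            throw new IOException("chunk failed checksum verification");
        return uncompressed;
    }

    // Placeholders for whatever block compressor is in use (e.g. Snappy).
    static byte[] compress(byte[] data)   { return data; }
    static byte[] decompress(byte[] data) { return data; }
}
{code}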

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569
 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:25 AM:
-

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit 
tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems fairly simple to 
me. It would be ok to use java native CRC32 and leave the improvement to 
another ticket, but quite frankly if it is that simple and since the hadoop 
guys have done all the hard work for us, I say we start with the efficient 
version directly.

  As decided previously this will be a matter of the separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)

  was (Author: xedin):
bq. CSW.flushData() forgot to reset the checksum (this is caught by the 
unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

As Todd noted, HADOOP-6148 contains a bunch of discussions on the efficiency of 
java CRC32. In particular, it seems they have been able to close to double the 
speed of the CRC32, with a solution that seems fairly simple to me. It would be 
ok to use java native CRC32 and leave the improvement to another ticket, but 
quite 

[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081569#comment-13081569
 ] 

Pavel Yaskevich edited comment on CASSANDRA-1717 at 8/9/11 11:29 AM:
-

bq. CSW.flushData() forgot to reset the checksum (this is caught by the unit 
tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  It checksums the original (non-compressed) data and stores the checksum at the 
end of the compressed chunk; the reader verifies the checksum after decompression.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems fairly simple to 
me. It would be ok to use java native CRC32 and leave the improvement to 
another ticket, but quite frankly if it is that simple and since the hadoop 
guys have done all the hard work for us, I say we start with the efficient 
version directly.

  As decided previously this will be a matter of the separate ticket.

Rebased with latest trunk (last commit 1e36fb1e44bff96005dd75a25648ff25eea6a95f)

  was (Author: xedin):
bq. CSW.flushData() forgot to reset the checksum (this is caught by the 
unit tests btw).

  Not a problem since it was due to Sylvain's bad merge.

bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

  Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide.

bq. Here we checksum the compressed data. The other approach would be to 
checksum the uncompressed data. The advantage of checksumming compressed data 
is the speed (less data to checksum), but checksumming the uncompressed data 
would be a little bit safer. In particular, it would prevent us from messing up 
in the decompression (and we don't have to trust the compression algorithm, not 
that I don't trust Snappy, but...). This is a clearly a trade-off that we have 
to make, but I admit that my personal preference would lean towards safety (in 
particular, I know that checksumming the uncompressed data give a bit more 
safety, I don't know what is our exact gain quantitatively with checksumming 
compressed data). On the other side, checksumming the uncompressed data would 
likely mean that a good part of the bitrot would result in a decompression 
error rather than a checksum error, which is maybe less convenient from the 
implementation point of view. So I don't know, I guess I'm thinking aloud to 
have other's opinions more than anything else.

  Checksum is moved to the original data.
 
bq. Let's add some unit tests. At least it's relatively easy to write a few 
blocks, switch one bit in the resulting file, and checking this is caught at 
read time (or better, do that multiple time changing a different bit each time).

  Test was added to CompressedRandomAccessReaderTest.

bq. As Todd noted, HADOOP-6148 contains a bunch of discussions on the 
efficiency of java CRC32. In particular, it seems they have been able to close 
to double the speed of the CRC32, with a solution that seems 

[jira] [Commented] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081601#comment-13081601
 ] 

Jonathan Ellis commented on CASSANDRA-3007:
---

What kind of streaming are you attempting?  

 NullPointerException in MessagingService.java:420
 -

 Key: CASSANDRA-3007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.3
 Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
 05:15:26 UTC 2010 x86_64 GNU/Linux
 java version 1.6.0_18
 OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
 OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
Reporter: Viliam Holub
Priority: Minor
  Labels: nullpointerexception, streaming

 I'm getting large quantity of exceptions during streaming. It is always in 
 MessagingService.java:420. The streaming appears to be blocked.
  INFO 10:11:14,734 Streaming to /10.235.77.27
 ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
 java.lang.NullPointerException
 at 
 org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
 at 
 org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
 at 
 org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
 at 
 org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3007:
--

Attachment: 3007.txt

Never mind, not relevant.  Looks like you upgraded from 0.7 without updating 
your configuration file?

Fix for missing encryption_options attached.
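
The missing section is the encryption_options block of cassandra.yaml, which a 
configuration carried over from 0.7 will not have. A stock 0.8 cassandra.yaml 
contains roughly the following defaults (illustrative; check the file shipped 
with your version):

{code}
# encryption_options was added in 0.8; a cassandra.yaml carried over from 0.7
# will not have it. Values below are the usual shipped defaults (illustrative).
encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
{code}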

 NullPointerException in MessagingService.java:420
 -

 Key: CASSANDRA-3007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.3
 Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
 05:15:26 UTC 2010 x86_64 GNU/Linux
 java version 1.6.0_18
 OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
 OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
Reporter: Viliam Holub
Priority: Minor
  Labels: nullpointerexception, streaming
 Fix For: 0.8.4

 Attachments: 3007.txt


 I'm getting large quantity of exceptions during streaming. It is always in 
 MessagingService.java:420. The streaming appears to be blocked.
  INFO 10:11:14,734 Streaming to /10.235.77.27
 ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
 java.lang.NullPointerException
 at 
 org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
 at 
 org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
 at 
 org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
 at 
 org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-3006:
-

Assignee: Sylvain Lebresne

 Enormous counter 
 -

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen
Assignee: Sylvain Lebresne

 I have two-node cluster with the following keyspace and column family 
 settings.
 Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
 Keyspace: test:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
 Options: [datacenter1:2]
   Column Families:
 ColumnFamily: testCounter (Super)
 APP status information.
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: 
 org.apache.cassandra.db.marshal.CounterColumnType
   Columns sorted by: 
 org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds: 0.0/0
   Key cache size / save period in seconds: 20.0/14400
   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Built indexes: []
 Then, I use a test program based on hector to add a counter column 
 (testCounter[sc][column]) 1000 times. In the middle the adding process, I 
 intentional shut down the node 172.17.19.152. In addition to that, the test 
 program is smart enough to switch the consistency level from Quorum to One, 
 so that the following adding actions would not fail. 
 After all the adding actions are done, I start the cassandra on 
 172.17.19.152, and I use cassandra-cli to check if the counter is correct on 
 both nodes, and I got a result 1001 which should be reasonable because hector 
 will retry once. However, when I shut down 172.17.19.151 and after 
 172.17.19.152 is aware of 172.17.19.151 is down, I try to start the cassandra 
 on 172.17.19.151 again. Then, I check the counter again, this time I got a 
 result 481387 which is so wrong.
 I use 0.8.3 to reproduce this bug, but I think this also happens on 0.8.2 or 
 before also. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081603#comment-13081603
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

{quote}
bq. We should convert the CRC32 to an int (and only write that) as it is an int 
internally (getValue() returns a long only because CRC32 implements the 
interface Checksum that require that).

Lets leave that to the ticket for CRC optimization which will allow us to 
modify that system-wide
{quote}
Let's not:
* this is completely orthogonal to switching to a drop-in, faster, CRC 
implementation.
* it is unclear we want to make that system-wide. Imho, it is not worth 
breaking commit log compatibility for that, but it is stupid to commit new code 
that perpetuates the mistake only to change it later.
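
To make the "4 bytes of 0's" point concrete: CRC32 is a 32-bit checksum, so the 
long returned by java.util.zip.CRC32.getValue() always has its upper 32 bits 
zero and can be narrowed to an int without losing information. A tiny 
illustration, not code from the patch:

{code}
import java.util.zip.CRC32;

public class CrcWidth
{
    public static void main(String[] args)
    {
        CRC32 crc = new CRC32();
        crc.update("some chunk of data".getBytes());

        long asLong = crc.getValue();  // upper 32 bits are always zero
        int asInt = (int) asLong;      // same 32 significant bits

        assert (asLong >>> 32) == 0;
        System.out.printf("as long: %016x, as int: %08x%n", asLong, asInt);
    }
}
{code}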

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081605#comment-13081605
 ] 

Jonathan Ellis commented on CASSANDRA-1717:
---

Saving 4 bytes out of 64K doesn't seem like enough benefit to make life harder 
for ourselves if we want to use a long checksum later.

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081609#comment-13081609
 ] 

Pavel Yaskevich commented on CASSANDRA-1717:


+1 with Jonathan; it is also better if we satisfy the interface instead of 
relying on internal implementation details, which could also be helpful if we 
decide to change the checksum algorithm.

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081629#comment-13081629
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

What are the chances we'll switch from CRC32 any time soon? And even if we do, 
why would that help us to save 4 bytes of 0's right now? We will still have to 
bump the file format version and keep the code compatible with the old CRC32 
format if we do so. It's not like the only difference between checksum 
algorithms is the size of the checksum.

So yes, 4 bytes out of 64K is not a lot of data, but knowingly writing 4 bytes 
of 0's every 64K, every time, for the vague, remote chance that it may save us 
1 or 2 lines of code someday (again, that even remains to be proven) feels 
ridiculous to me. But if I'm the only one who feels that way, fine, it's not a 
big deal.

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081637#comment-13081637
 ] 

Pavel Yaskevich commented on CASSANDRA-1717:


I still think that such a change is a matter for a separate ticket, as we will 
want to change the CRC handling globally: we can make our own Checksum class 
that returns an int value, apply the performance improvements mentioned in 
HADOOP-6148 to it, and use it system-wide.

Is there anything else that keeps this from being committed?

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1155374 - /cassandra/branches/cassandra-0.8/debian/control

2011-08-09 Thread eevans
Author: eevans
Date: Tue Aug  9 14:05:55 2011
New Revision: 1155374

URL: http://svn.apache.org/viewvc?rev=1155374&view=rev
Log:
build requires subversion (line 235 of build.xml)

Patch by Sven Wilhelm; reviewed by eevans

Modified:
cassandra/branches/cassandra-0.8/debian/control

Modified: cassandra/branches/cassandra-0.8/debian/control
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/debian/control?rev=1155374&r1=1155373&r2=1155374&view=diff
==
--- cassandra/branches/cassandra-0.8/debian/control (original)
+++ cassandra/branches/cassandra-0.8/debian/control Tue Aug  9 14:05:55 2011
@@ -2,7 +2,7 @@ Source: cassandra
 Section: misc
 Priority: extra
 Maintainer: Eric Evans eev...@apache.org
-Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 1.7), ant-optional (>= 1.7)
+Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 1.7), ant-optional (>= 1.7), subversion
 Homepage: http://cassandra.apache.org
 Vcs-Svn: https://svn.apache.org/repos/asf/cassandra/trunk
 Vcs-Browser: http://svn.apache.org/viewvc/cassandra/trunk




[jira] [Created] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Luis Eduardo Villares Matta (JIRA)
Error getting range slices
--

 Key: CASSANDRA-3008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
 Environment: Ubuntu, using the 08x repository
Reporter: Luis Eduardo Villares Matta
Priority: Critical


I can't get a range slice on one of my column families.

ERROR 14:16:26,672 Internal error processing get_range_slices
java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:66)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:86)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
at 
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
at 
org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
at 
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
at 
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at 
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at 
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
at 
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
at 
org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
at 
org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
at 
org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
at 
org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
at 
org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:57)
... 24 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081660#comment-13081660
 ] 

Sylvain Lebresne commented on CASSANDRA-3006:
-

I haven't had luck reproducing this so far. I've tried to stick to the 
description above but did not use hector (not saying it is hector's fault, 
though; maybe it is the way it does retries that I don't emulate well). If you 
are able to share a minimal hector script with which you can reproduce this 
easily, that would be very helpful.

 Enormous counter 
 -

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen
Assignee: Sylvain Lebresne

 I have two-node cluster with the following keyspace and column family 
 settings.
 Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
 Keyspace: test:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
 Options: [datacenter1:2]
   Column Families:
 ColumnFamily: testCounter (Super)
 APP status information.
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: 
 org.apache.cassandra.db.marshal.CounterColumnType
   Columns sorted by: 
 org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds: 0.0/0
   Key cache size / save period in seconds: 20.0/14400
   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Built indexes: []
 Then, I use a test program based on hector to add a counter column 
 (testCounter[sc][column]) 1000 times. In the middle the adding process, I 
 intentional shut down the node 172.17.19.152. In addition to that, the test 
 program is smart enough to switch the consistency level from Quorum to One, 
 so that the following adding actions would not fail. 
 After all the adding actions are done, I start the cassandra on 
 172.17.19.152, and I use cassandra-cli to check if the counter is correct on 
 both nodes, and I got a result 1001 which should be reasonable because hector 
 will retry once. However, when I shut down 172.17.19.151 and after 
 172.17.19.152 is aware of 172.17.19.151 is down, I try to start the cassandra 
 on 172.17.19.151 again. Then, I check the counter again, this time I got a 
 result 481387 which is so wrong.
 I use 0.8.3 to reproduce this bug, but I think this also happens on 0.8.2 or 
 before also. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns

2011-08-09 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081665#comment-13081665
 ] 

T Jake Luciani commented on CASSANDRA-2474:
---

I don't (yet) know how to add hint types to hive, but once a transposed hint 
operator is added we should be able to hook it into the hive driver.  

 CQL support for compound columns
 

 Key: CASSANDRA-2474
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
 Project: Cassandra
  Issue Type: Sub-task
  Components: API, Core
Reporter: Eric Evans
  Labels: cql
 Fix For: 1.0


 For the most part, this boils down to supporting the specification of 
 compound column names (the CQL syntax is colon-delimited terms), and then 
 teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2474) CQL support for compound columns

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081669#comment-13081669
 ] 

Jonathan Ellis commented on CASSANDRA-2474:
---

Isn't changing query semantics kind of the opposite of what hints are supposed 
to be for?

 CQL support for compound columns
 

 Key: CASSANDRA-2474
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
 Project: Cassandra
  Issue Type: Sub-task
  Components: API, Core
Reporter: Eric Evans
  Labels: cql
 Fix For: 1.0


 For the most part, this boils down to supporting the specification of 
 compound column names (the CQL syntax is colon-delimited terms), and then 
 teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081673#comment-13081673
 ] 

Jonathan Ellis commented on CASSANDRA-3008:
---

did you try nodetool scrub?

 Error getting range slices
 --

 Key: CASSANDRA-3008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
 Environment: Ubuntu, using the 08x repository
Reporter: Luis Eduardo Villares Matta
Priority: Critical

 I can't get a range slice on one of my column families.
 ERROR 14:16:26,672 Internal error processing get_range_slices
 java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:66)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:86)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
 at 
 org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
 at 
 org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
 at 
 org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:57)
 ... 24 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2749) fine-grained control over data directories

2011-08-09 Thread Chris Burroughs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081679#comment-13081679
 ] 

Chris Burroughs commented on CASSANDRA-2749:


It would also be cool (but this is obviously speculative) to have the ability 
to keep Index files on an SSD, and the larger data files on rotating disks.

 fine-grained control over data directories
 --

 Key: CASSANDRA-2749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2749
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
Priority: Minor
 Fix For: 1.0


 Currently Cassandra supports multiple data directories but no way to control 
 what sstables are placed where. Particularly for systems with mixed SSDs and 
 rotational disks, it would be nice to pin frequently accessed columnfamilies 
 to the SSDs.
 Postgresql does this with tablespaces 
 (http://www.postgresql.org/docs/9.0/static/manage-ag-tablespaces.html) but we 
 should probably avoid using that name because of confusing similarity to 
 keyspaces.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3007) NullPointerException in MessagingService.java:420

2011-08-09 Thread Viliam Holub (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081680#comment-13081680
 ] 

Viliam Holub commented on CASSANDRA-3007:
-

It's the removetoken command.

Yes, I updated the node and forgot to specify encryption_options - thanks!

 NullPointerException in MessagingService.java:420
 -

 Key: CASSANDRA-3007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3007
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.3
 Environment: Linux w0 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2 
 05:15:26 UTC 2010 x86_64 GNU/Linux
 java version 1.6.0_18
 OpenJDK Runtime Environment (IcedTea6 1.8.7) (6b18-1.8.7-2~squeeze1)
 OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
Reporter: Viliam Holub
Assignee: Jonathan Ellis
Priority: Minor
  Labels: nullpointerexception, streaming
 Fix For: 0.8.4

 Attachments: 3007.txt


 I'm getting a large quantity of exceptions during streaming. It is always in 
 MessagingService.java:420. The streaming appears to be blocked.
  INFO 10:11:14,734 Streaming to /10.235.77.27
 ERROR 10:11:14,734 Fatal exception in thread Thread[StreamStage:2,5,main]
 java.lang.NullPointerException
 at 
 org.apache.cassandra.net.MessagingService.stream(MessagingService.java:420)
 at 
 org.apache.cassandra.streaming.StreamOutSession.begin(StreamOutSession.java:176)
 at 
 org.apache.cassandra.streaming.StreamOut.transferRangesForRequest(StreamOut.java:148)
 at 
 org.apache.cassandra.streaming.StreamRequestVerbHandler.doVerb(StreamRequestVerbHandler.java:54)
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:636)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081685#comment-13081685
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

As previously said, I disagree both with using 8 bytes when we need 4 and with 
the idea that using 4 is a matter for another ticket, but since this is probably me being too 
anal as usual, +1 on the rest of the patch, modulo a small optional nitpick: 
the toLong() function is a bit hard to read imho. It's hard to see where the 
parentheses are, and whether it does the right thing. It seems ok though, I just 
think a simple for loop on the bytes would be more readable. We also 
historically keep ByteBufferUtil for ByteBuffer manipulations and use 
FBUtilities for byte[] manipulation.


 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081689#comment-13081689
 ] 

Pavel Yaskevich commented on CASSANDRA-1717:


Ok, I will move toLong(byte[] bytes) to FBUtilities and commit, thanks!

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081690#comment-13081690
 ] 

Jonathan Ellis commented on CASSANDRA-1717:
---

You're right, if we change the checksum implementation we need to bump the sstable 
revision anyway.  +1 on casting to int here.  (But as you said above, -1 on changing 
this in CommitLog.)

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717.patch, 
 checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Luis Eduardo Villares Matta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081701#comment-13081701
 ] 

Luis Eduardo Villares Matta commented on CASSANDRA-3008:


No I did not; it seems to have fixed my issues.
Thank you very much. 
(I am inclined to close this issue, but I do not know if I should. Also I am 
testing everything in the next few hours.)

 Error getting range slices
 --

 Key: CASSANDRA-3008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
 Environment: Ubuntu, using the 08x repository
Reporter: Luis Eduardo Villares Matta
Priority: Critical

 I can't get a range slice on one of my column families.
 ERROR 14:16:26,672 Internal error processing get_range_slices
 java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:66)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:86)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
 at 
 org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
 at 
 org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
 at 
 org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:57)
 ... 24 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Pavel Yaskevich (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavel Yaskevich updated CASSANDRA-1717:
---

Attachment: CASSANDRA-1717-v3.patch

v3 which removes BBU.toLong and adds FBU.byteArrayToInt + uses int instead of 
long for checksum

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, 
 CASSANDRA-1717.patch, checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3008) Error getting range slices

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081712#comment-13081712
 ] 

Jonathan Ellis commented on CASSANDRA-3008:
---

Check (scrub) your other nodes -- data corruption can happen (usually from bad 
memory) but if there's a pattern of all the nodes being affected at the same 
time there could be a Cassandra bug.

 Error getting range slices
 --

 Key: CASSANDRA-3008
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3008
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
 Environment: Ubuntu, using the 08x repository
Reporter: Luis Eduardo Villares Matta
Priority: Critical

 I can't get a range slice on one of my column families.
 ERROR 14:16:26,672 Internal error processing get_range_slices
 java.io.IOError: java.io.EOFException: EOF after 26948 bytes out of 1681403191
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:66)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:91)
 at 
 org.apache.cassandra.db.columniterator.SSTableSliceIterator.init(SSTableSliceIterator.java:86)
 at 
 org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:71)
 at 
 org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:87)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:184)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:144)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:136)
 at 
 org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:39)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
 at 
 org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:69)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 at org.apache.cassandra.db.RowIterator.hasNext(RowIterator.java:49)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1392)
 at 
 org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:684)
 at 
 org.apache.cassandra.thrift.CassandraServer.get_range_slices(CassandraServer.java:617)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor$get_range_slices.process(Cassandra.java:3202)
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.EOFException: EOF after 26948 bytes out of 1681403191
 at 
 org.apache.cassandra.io.util.FileUtils.skipBytesFully(FileUtils.java:229)
 at 
 org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:50)
 at 
 org.apache.cassandra.db.columniterator.SimpleSliceReader.init(SimpleSliceReader.java:57)
 ... 24 more

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081718#comment-13081718
 ] 

Sylvain Lebresne commented on CASSANDRA-1717:
-

lgtm, +1

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, 
 CASSANDRA-1717.patch, checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)
404 on apt-get install from http://www.apache.org/dist/cassandra/debian
---

 Key: CASSANDRA-3009
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
Affects Versions: 0.8.3
 Environment: ubuntu maverick 64-bit
Reporter: Chris Lohfink
Priority: Minor


First bug report on here so sorry if I am doing something incorrectly.  I 
followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
receiving a 404 error during the install.  Looks like the 
{code}
clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
[sudo] password for clohfink: 
Reading package lists... Done
Building dependency tree   
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcommons-pool-java authbind libmcrypt4 libtomcat6-java libcommons-dbcp-java 
tomcat6-common
Use 'apt-get autoremove' to remove them.
The following NEW packages will be installed:
  cassandra
0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
Need to get 8,415kB of archives.
After this operation, 9,540kB of additional disk space will be used.
Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
0.8.0
  404  Not Found
Failed to fetch 
http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
  404  Not Found
E: Unable to fetch some archives, maybe run apt-get update or try with 
--fix-missing?
{code}
for debugging info:
{code}
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
N: Can't select versions from package 'cassandra' as it purely virtual
N: No packages found
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository deb 
http://www.apache.org/dist/cassandra/debian unstable main
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
...
Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en   
   
Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
Translation-en_US
...
Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
Fetched 6,989B in 1s (5,974B/s)
Reading package lists... Done
clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
Package: cassandra
Version: 0.8.0
Architecture: all
Maintainer: Eric Evans eev...@apache.org
Installed-Size: 9316
Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
libcommons-daemon-java (>= 1.0), adduser
Recommends: libjna-java
Homepage: http://cassandra.apache.org
Priority: extra
Section: misc
Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
Size: 8415180
SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
Description: distributed storage system for structured data
 Cassandra is a distributed (peer-to-peer) system for the management
 and storage of structured data.
{code}

included fabric script, if have fabric installed can run
{code}
fab -H localhost install_cassandra
{code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-1974) PFEPS-like snitch that uses gossip instead of a property file

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-1974:
-

Assignee: (was: Brandon Williams)

I think the biggest win is when you can automatically determine rack/dc from 
the environment somehow (e.g.: ec2snitch).  Otherwise the advantage of editing 
a file, vs edit + rsync, is small.  Small enough that it's probably not worth 
the education headache.

 PFEPS-like snitch that uses gossip instead of a property file
 -

 Key: CASSANDRA-1974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1974
 Project: Cassandra
  Issue Type: New Feature
Reporter: Brandon Williams
Priority: Minor

 Now that we have an ec2 snitch that propagates its rack/dc info via gossip 
 from CASSANDRA-1654, it doesn't make a lot of sense to use PFEPS where you 
 have to rsync the property file across all the machines when you add a node.  
 Instead, we could have a snitch where you specify its rack/dc in a property 
 file, and propagate this via gossip like the ec2 snitch.  In order to not 
 break PFEPS, this should probably be a new snitch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-3009:
-

Attachment: fabfile.py

 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
 ---

 Key: CASSANDRA-3009
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
Affects Versions: 0.8.3
 Environment: ubuntu maverick 64-bit
Reporter: Chris Lohfink
Priority: Minor
 Attachments: fabfile.py


 First bug report on here so sorry if I am doing something incorrectly.  I 
 followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
 receiving a 404 error during the install.  Looks like the 
 {code}
 clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
 [sudo] password for clohfink: 
 Reading package lists... Done
 Building dependency tree   
 Reading state information... Done
 The following packages were automatically installed and are no longer 
 required:
   libcommons-pool-java authbind libmcrypt4 libtomcat6-java 
 libcommons-dbcp-java tomcat6-common
 Use 'apt-get autoremove' to remove them.
 The following NEW packages will be installed:
   cassandra
 0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
 Need to get 8,415kB of archives.
 After this operation, 9,540kB of additional disk space will be used.
 Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
 0.8.0
   404  Not Found
 Failed to fetch 
 http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
   404  Not Found
 E: Unable to fetch some archives, maybe run apt-get update or try with 
 --fix-missing?
 {code}
 for debugging info:
 {code}
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
 N: Can't select versions from package 'cassandra' as it purely virtual
 N: No packages found
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository deb 
 http://www.apache.org/dist/cassandra/debian unstable main
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
 ...
 Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en 
  
 Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
 Translation-en_US
 ...
 Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
 Fetched 6,989B in 1s (5,974B/s)
 Reading package lists... Done
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
 Package: cassandra
 Version: 0.8.0
 Architecture: all
 Maintainer: Eric Evans eev...@apache.org
 Installed-Size: 9316
 Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
 libcommons-daemon-java (>= 1.0), adduser
 Recommends: libjna-java
 Homepage: http://cassandra.apache.org
 Priority: extra
 Section: misc
 Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
 Size: 8415180
 SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
 SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
 MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
 Description: distributed storage system for structured data
  Cassandra is a distributed (peer-to-peer) system for the management
  and storage of structured data.
 {code}
 included fabric script, if have fabric installed can run
 {code}
 fab -H localhost install_cassandra
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2892) Don't replicate_on_write with RF=1

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2892:


Attachment: 2892.patch

That's a super easy one and it removes some nasty boolean flag from 
SP.sendToHintedEndpoints so let's do it.

 Don't replicate_on_write with RF=1
 

 Key: CASSANDRA-2892
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2892.patch


 For counters with RF=1, we still do a read to replicate, even though there is 
 nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2892) Don't replicate_on_write with RF=1

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081728#comment-13081728
 ] 

Jonathan Ellis commented on CASSANDRA-2892:
---

can you spell out what's going on with this part?

{code}
-if (cm.shouldReplicateOnWrite())
+hintedEndpoints.removeAll(FBUtilities.getLocalAddress());
+
+if (cm.shouldReplicateOnWrite() && !hintedEndpoints.isEmpty())
{code}

 Don't replicate_on_write with RF=1
 

 Key: CASSANDRA-2892
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2892.patch


 For counters with RF=1, we still do a read to replicate, even though there is 
 nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[Cassandra Wiki] Trivial Update of DebianPackaging by SylvainLebresne

2011-08-09 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The DebianPackaging page has been changed by SylvainLebresne:
http://wiki.apache.org/cassandra/DebianPackaging?action=diffrev1=22rev2=23

  To install on Debian or Debian derivatives, use the following sources:
  
  {{{
- deb http://www.apache.org/dist/cassandra/debian unstable main
+ deb http://www.apache.org/dist/cassandra/debian 08x main
- deb-src http://www.apache.org/dist/cassandra/debian unstable main
+ deb-src http://www.apache.org/dist/cassandra/debian 08x main
  }}}
  
- ''Note: the unstable suite points to the most current branch of development 
(for historical reasons).  Production systems should use a version-specific 
suite/codename, (for example, `06x` for the 0.6.x series, `07x` for the 0.7.x 
series, etc).''
+ You will want to replace `08x` by the series you want to use: `06x` for the 
0.6.x series, 07x for the 0.7.x series, etc... It does mean that you will not 
get major version update unless you change the series, but that is ''a 
feature''.
+ 
  
  If you run ''apt-get update'' now, you will see an error similar to this:
  {{{


[jira] [Commented] (CASSANDRA-2843) better performance on long row read

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081729#comment-13081729
 ] 

Jonathan Ellis commented on CASSANDRA-2843:
---

+1

 better performance on long row read
 ---

 Key: CASSANDRA-2843
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2843
 Project: Cassandra
  Issue Type: New Feature
Reporter: Yang Yang
 Fix For: 1.0

 Attachments: 2843.patch, 2843_d.patch, 2843_g.patch, 2843_h.patch, 
 fix.diff, microBenchmark.patch, patch_timing, std_timing


 currently if a row contains > 1000 columns, the run time becomes considerably 
 slow (my test of 
 a row with 3000 columns (standard, regular), each with 8 bytes in name and 
 40 bytes in value, is about 16ms).
 this is all running in memory, no disk read is involved.
 through debugging we can find
 most of this time is spent on 
 [Wall Time]  org.apache.cassandra.db.Table.getRow(QueryFilter)
 [Wall Time]  
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, 
 ColumnFamily)
 [Wall Time]  
 org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(QueryFilter, int, 
 ColumnFamily)
 [Wall Time]  
 org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(QueryFilter, 
 int, ColumnFamily)
 [Wall Time]  
 org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(ColumnFamily,
  Iterator, int)
 [Wall Time]  
 org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(IColumnContainer,
  Iterator, int)
 [Wall Time]  org.apache.cassandra.db.ColumnFamily.addColumn(IColumn)
 ColumnFamily.addColumn() is slow because it inserts into an internal 
 concurrentSkipListMap() that maps column names to values.
 this structure is slow for two reasons: it needs to do synchronization; it 
 needs to maintain a more complex structure of map.
 but if we look at the whole read path, thrift already defines the read output 
 to be List<ColumnOrSuperColumn>, so it does not make sense to use a luxury map 
 data structure in the interim and finally convert it to a list. on the 
 synchronization side, since the return CF is never going to be 
 shared/modified by other threads, we know the access is always single thread, 
 so no synchronization is needed.
 but these 2 features are indeed needed for ColumnFamily in other cases, 
 particularly write. so we can provide a different ColumnFamily to 
 CFS.getTopLevelColumnFamily(), so getTopLevelColumnFamily no longer always 
 creates the standard ColumnFamily, but takes a provided returnCF, whose cost 
 is much cheaper.
 the provided patch is for demonstration now, will work further once we agree 
 on the general direction. 
 CFS, ColumnFamily, and Table  are changed; a new FastColumnFamily is 
 provided. the main work is to let the FastColumnFamily use an array  for 
 internal storage. at first I used binary search to insert new columns in 
 addColumn(), but later I found that even this is not necessary, since all 
 calling scenarios of ColumnFamily.addColumn() have an invariant that the 
 inserted columns come in sorted order (I still need to resolve whether 
 descending or ascending, but ascending works for now). so the current logic is 
 simply to compare the new column against the last column in the array: if the 
 names are not equal, append; if equal, reconcile.
 slight temporary hacks are made on getTopLevelColumnFamily so we have 2 
 flavors of the method, one accepting a returnCF. but we could definitely 
 think about what is the better way to provide this returnCF.
 this patch compiles fine, no tests are provided yet. but I tested it in my 
 application, and the performance improvement is dramatic: it offers about 50% 
 reduction in read time in the 3000-column case.
 thanks
 Yang

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-3009.
-

Resolution: Not A Problem

Sorry, this is because I don't update the 'unstable' series anymore. You should 
use 08x instead (or 07x if you feel inclined to).

It felt too easy to cause harm with an 'unstable' series that would silently 
do major version upgrades, so we've switched to numbered series instead. I've 
updated the wiki accordingly. Sorry for the inconvenience.

 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
 ---

 Key: CASSANDRA-3009
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
Affects Versions: 0.8.3
 Environment: ubuntu maverick 64-bit
Reporter: Chris Lohfink
Priority: Minor
 Attachments: fabfile.py


 First bug report on here so sorry if I am doing something incorrectly.  I 
 followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
 receiving a 404 error during the install.  Looks like the 
 {code}
 clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
 [sudo] password for clohfink: 
 Reading package lists... Done
 Building dependency tree   
 Reading state information... Done
 The following packages were automatically installed and are no longer 
 required:
   libcommons-pool-java authbind libmcrypt4 libtomcat6-java 
 libcommons-dbcp-java tomcat6-common
 Use 'apt-get autoremove' to remove them.
 The following NEW packages will be installed:
   cassandra
 0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
 Need to get 8,415kB of archives.
 After this operation, 9,540kB of additional disk space will be used.
 Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
 0.8.0
   404  Not Found
 Failed to fetch 
 http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
   404  Not Found
 E: Unable to fetch some archives, maybe run apt-get update or try with 
 --fix-missing?
 {code}
 for debugging info:
 {code}
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
 N: Can't select versions from package 'cassandra' as it purely virtual
 N: No packages found
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository deb 
 http://www.apache.org/dist/cassandra/debian unstable main
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
 ...
 Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en 
  
 Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
 Translation-en_US
 ...
 Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
 Fetched 6,989B in 1s (5,974B/s)
 Reading package lists... Done
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
 Package: cassandra
 Version: 0.8.0
 Architecture: all
 Maintainer: Eric Evans eev...@apache.org
 Installed-Size: 9316
 Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
 libcommons-daemon-java (>= 1.0), adduser
 Recommends: libjna-java
 Homepage: http://cassandra.apache.org
 Priority: extra
 Section: misc
 Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
 Size: 8415180
 SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
 SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
 MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
 Description: distributed storage system for structured data
  Cassandra is a distributed (peer-to-peer) system for the management
  and storage of structured data.
 {code}
 included fabric script, if have fabric installed can run
 {code}
 fab -H localhost install_cassandra
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Closed] (CASSANDRA-3009) 404 on apt-get install from http://www.apache.org/dist/cassandra/debian

2011-08-09 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink closed CASSANDRA-3009.



wiki was updated with distribution changes

 404 on apt-get install from http://www.apache.org/dist/cassandra/debian
 ---

 Key: CASSANDRA-3009
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3009
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation & website
Affects Versions: 0.8.3
 Environment: ubuntu maverick 64-bit
Reporter: Chris Lohfink
Priority: Minor
 Attachments: fabfile.py


 First bug report on here so sorry if I am doing something incorrectly.  I 
 followed the wiki (http://wiki.apache.org/cassandra/DebianPackaging) but I am 
 receiving a 404 error during the install.  Looks like the 
 {code}
 clohfink@roc-lvm-dev:~dev$ sudo apt-get install cassandra
 [sudo] password for clohfink: 
 Reading package lists... Done
 Building dependency tree   
 Reading state information... Done
 The following packages were automatically installed and are no longer 
 required:
   libcommons-pool-java authbind libmcrypt4 libtomcat6-java 
 libcommons-dbcp-java tomcat6-common
 Use 'apt-get autoremove' to remove them.
 The following NEW packages will be installed:
   cassandra
 0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
 Need to get 8,415kB of archives.
 After this operation, 9,540kB of additional disk space will be used.
 Err http://www.apache.org/dist/cassandra/debian/ unstable/main cassandra all 
 0.8.0
   404  Not Found
 Failed to fetch 
 http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_0.8.0_all.deb
   404  Not Found
 E: Unable to fetch some archives, maybe run apt-get update or try with 
 --fix-missing?
 {code}
 for debugging info:
 {code}
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
 N: Can't select versions from package 'cassandra' as it purely virtual
 N: No packages found
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo add-apt-repository deb 
 http://www.apache.org/dist/cassandra/debian unstable main
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-get update
 ...
 Ign http://www.apache.org/dist/cassandra/debian/ unstable/main Translation-en 
  
 Ign http://www.apache.org/dist/cassandra/debian/ unstable/main 
 Translation-en_US
 ...
 Hit http://us.archive.ubuntu.com maverick-proposed/universe amd64 Packages
 Fetched 6,989B in 1s (5,974B/s)
 Reading package lists... Done
 clohfink@roc-lvm-dev:~dev/fabrictests$ sudo apt-cache show cassandra
 Package: cassandra
 Version: 0.8.0
 Architecture: all
 Maintainer: Eric Evans eev...@apache.org
 Installed-Size: 9316
 Depends: openjdk-6-jre-headless (>= 6b11) | java6-runtime, jsvc (>= 1.0), 
 libcommons-daemon-java (>= 1.0), adduser
 Recommends: libjna-java
 Homepage: http://cassandra.apache.org
 Priority: extra
 Section: misc
 Filename: pool/main/c/cassandra/cassandra_0.8.0_all.deb
 Size: 8415180
 SHA256: 7eaaeb9d3ef5af6abff834fe93f1a84349dff98776eaee83f8dabb267ffe4833
 SHA1: 9cca3ffbcbab9e6ba2385f734691c97afeaa8be6
 MD5sum: 01e0435495f7ff40e1b4e4be5857a1ea
 Description: distributed storage system for structured data
  Cassandra is a distributed (peer-to-peer) system for the management
  and storage of structured data.
 {code}
 included fabric script, if have fabric installed can run
 {code}
 fab -H localhost install_cassandra
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2919) CQL system test for counters is failing

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne resolved CASSANDRA-2919.
-

Resolution: Cannot Reproduce

Ok, I cannot reproduce either anymore. Probably got fixed, or I screwed up the 
first time. Sorry for that.

 CQL system test for counters is failing
 ---

 Key: CASSANDRA-2919
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2919
 Project: Cassandra
  Issue Type: Bug
  Components: Tests
 Environment: ubuntu 11.04 64 bit
Reporter: Sylvain Lebresne
Assignee: Tyler Hobbs
Priority: Minor
  Labels: cql, test

 On my machine (and on current 0.8 branch) the CQL system test for counters is 
 failing. While reading the counter value, junk bytes are apparently returned 
 instead of the value (on the following excerpt it looks like an empty value, 
 but on the terminal it does show a random character):
 {noformat}
 ==
 FAIL: update statement should be able to work with counter columns
 --
 Traceback (most recent call last):
   File /usr/lib/pymodules/python2.7/nose/case.py, line 186, in runTest
 self.test(*self.arg)
   File /home/pcmanus/Git/cassandra/test/system/test_cql.py, line 1130, in 
 test_counter_column_support
 unrecognized value '%s' % r[1]
 AssertionError: unrecognized value ''
 --
 {noformat}
 I've checked, the server correctly fetches the right column and returns what it 
 should. So this seems to be on the python driver side.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081773#comment-13081773
 ] 

Hudson commented on CASSANDRA-1717:
---

Integrated in Cassandra #1010 (See 
[https://builds.apache.org/job/Cassandra/1010/])
Add block level checksum for compressed data
patch by Pavel Yaskevich; reviewed by Sylvain Lebresne for CASSANDRA-1717

xedin : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1155420
Files : 
* /cassandra/trunk/test/unit/org/apache/cassandra/Util.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CorruptedBlockException.java
* /cassandra/trunk/CHANGES.txt
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java
* 
/cassandra/trunk/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressionMetadata.java
* /cassandra/trunk/src/java/org/apache/cassandra/utils/FBUtilities.java
* 
/cassandra/trunk/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java


 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: CASSANDRA-1717-v2.patch, CASSANDRA-1717-v3.patch, 
 CASSANDRA-1717.patch, checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-2990:


Attachment: 2990.patch

 We should refuse query for counters at CL.ANY
 -

 Key: CASSANDRA-2990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2990.patch


 We currently do not reject writes for counters at CL.ANY, even though this is 
 not supported (and rightly so).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2892) Don't replicate_on_write with RF=1

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2892:
--

Attachment: 2892-v1.5.txt

v1.5 attached.  I thought I could improve it more, but couldn't. :)

Ended up just extracting counterWriteTask() to remove the 
executeOnMutationStage flag.

 Don't replicate_on_write with RF=1
 

 Key: CASSANDRA-2892
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2892-v1.5.txt, 2892.patch


 For counters with RF=1, we still do a read to replicate, even though there is 
 nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2892) Don't replicate_on_write with RF=1

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081802#comment-13081802
 ] 

Sylvain Lebresne commented on CASSANDRA-2892:
-

v1.5 lgtm

 Don't replicate_on_write with RF=1
 

 Key: CASSANDRA-2892
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2892-v1.5.txt, 2892.patch


 For counters with RF=1, we still do a read to replicate, even though there is 
 nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1155460 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/service/StorageProxy.java

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 18:37:20 2011
New Revision: 1155460

URL: http://svn.apache.org/viewvc?rev=1155460view=rev
Log:
avoid doing read for no-op replicate-on-write at CL=1
patch by slebresne and jbellis for CASSANDRA-2892

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1155460r1=1155459r2=1155460view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug  9 18:37:20 2011
@@ -1,6 +1,7 @@
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
+ * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
 
 
 0.8.3

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java?rev=1155460r1=1155459r2=1155460view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java
 Tue Aug  9 18:37:20 2011
@@ -96,7 +96,7 @@ public class StorageProxy implements Sto
  public void apply(IMutation mutation, Multimap<InetAddress, 
InetAddress> hintedEndpoints, IWriteResponseHandler responseHandler, String 
localDataCenter, ConsistencyLevel consistency_level) throws IOException
 {
 assert mutation instanceof RowMutation;
-sendToHintedEndpoints((RowMutation) mutation, hintedEndpoints, 
responseHandler, localDataCenter, true, consistency_level);
+sendToHintedEndpoints((RowMutation) mutation, hintedEndpoints, 
responseHandler, localDataCenter, consistency_level);
 }
 };
 
@@ -110,7 +110,11 @@ public class StorageProxy implements Sto
 {
  public void apply(IMutation mutation, Multimap<InetAddress, 
InetAddress> hintedEndpoints, IWriteResponseHandler responseHandler, String 
localDataCenter, ConsistencyLevel consistency_level) throws IOException
 {
-applyCounterMutation(mutation, hintedEndpoints, 
responseHandler, localDataCenter, consistency_level, false);
+if (logger.isDebugEnabled())
+logger.debug("insert writing local & replicate " + 
mutation.toString(true));
+
+Runnable runnable = counterWriteTask(mutation, 
hintedEndpoints, responseHandler, localDataCenter, consistency_level);
+runnable.run();
 }
 };
 
@@ -118,7 +122,11 @@ public class StorageProxy implements Sto
 {
  public void apply(IMutation mutation, Multimap<InetAddress, 
InetAddress> hintedEndpoints, IWriteResponseHandler responseHandler, String 
localDataCenter, ConsistencyLevel consistency_level) throws IOException
 {
-applyCounterMutation(mutation, hintedEndpoints, 
responseHandler, localDataCenter, consistency_level, true);
+if (logger.isDebugEnabled())
+logger.debug("insert writing local & replicate " + 
mutation.toString(true));
+
+Runnable runnable = counterWriteTask(mutation, 
hintedEndpoints, responseHandler, localDataCenter, consistency_level);
+StageManager.getStage(Stage.MUTATION).execute(runnable);
 }
 };
 }
@@ -218,7 +226,7 @@ public class StorageProxy implements Sto
 return 
ss.getTokenMetadata().getWriteEndpoints(StorageService.getPartitioner().getToken(key),
 table, naturalEndpoints);
 }
 
-private static void sendToHintedEndpoints(final RowMutation rm, 
Multimap<InetAddress, InetAddress> hintedEndpoints, IWriteResponseHandler 
responseHandler, String localDataCenter, boolean insertLocalMessages, 
ConsistencyLevel consistency_level)
+private static void sendToHintedEndpoints(final RowMutation rm, 
Multimap<InetAddress, InetAddress> hintedEndpoints, IWriteResponseHandler 
responseHandler, String localDataCenter, ConsistencyLevel consistency_level)
 throws IOException
 {
 // Multimap that holds onto all the messages and addresses meant for a 
specific datacenter
@@ -237,8 +245,7 @@ public class StorageProxy implements Sto
 // unhinted writes
 if (destination.equals(FBUtilities.getLocalAddress()))
 {
-if (insertLocalMessages)
-insertLocal(rm, responseHandler);
+insertLocal(rm, 

[jira] [Resolved] (CASSANDRA-2892) Don't replicate_on_write with RF=1

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2892.
---

Resolution: Fixed
  Reviewer: jbellis

committed

 Don't replicate_on_write with RF=1
 

 Key: CASSANDRA-2892
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2892-v1.5.txt, 2892.patch


 For counters with RF=1, we still do a read to replicate, even though there is 
 nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1155466 - in /cassandra/trunk: ./ contrib/ debian/ interface/thrift/gen-java/org/apache/cassandra/thrift/ redhat/ src/java/org/apache/cassandra/cli/ src/java/org/apache/cassandra/service/

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 18:40:54 2011
New Revision: 1155466

URL: http://svn.apache.org/viewvc?rev=1155466view=rev
Log:
merge from 0.8

Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/CHANGES.txt
cassandra/trunk/contrib/   (props changed)
cassandra/trunk/debian/control

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/redhat/cassandra
cassandra/trunk/src/java/org/apache/cassandra/cli/Cli.g
cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java
cassandra/trunk/src/java/org/apache/cassandra/cli/CliCompleter.java
cassandra/trunk/src/java/org/apache/cassandra/service/StorageProxy.java
cassandra/trunk/src/java/org/apache/cassandra/service/StorageService.java
cassandra/trunk/src/resources/org/apache/cassandra/cli/CliHelp.yaml
cassandra/trunk/test/unit/org/apache/cassandra/cli/CliTest.java

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 18:40:54 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1151306
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1154424
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1155460
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1155466&r1=1155465&r2=1155466&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Tue Aug  9 18:40:54 2011
@@ -33,6 +33,8 @@
 
 0.8.4
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
+ * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
+ * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
 
 
 0.8.3

Propchange: cassandra/trunk/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 18:40:54 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
 /cassandra/branches/cassandra-0.7/contrib:1026516-1151306
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
-/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1154424
+/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1155460
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369
 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689

Modified: cassandra/trunk/debian/control
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/debian/control?rev=1155466&r1=1155465&r2=1155466&view=diff
==
--- cassandra/trunk/debian/control (original)
+++ cassandra/trunk/debian/control Tue Aug  9 18:40:54 2011
@@ -2,7 +2,7 @@ Source: cassandra
 Section: misc
 Priority: extra
 Maintainer: Eric Evans eev...@apache.org
-Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 
1.7), ant-optional (>= 1.7)
+Build-Depends: debhelper (>= 5), openjdk-6-jdk (>= 6b11) | java6-sdk, ant (>= 
1.7), ant-optional (>= 1.7), subversion
 Homepage: http://cassandra.apache.org
 Vcs-Svn: https://svn.apache.org/repos/asf/cassandra/trunk
 Vcs-Browser: http://svn.apache.org/viewvc/cassandra/trunk

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 18:40:54 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1151306
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654

[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081831#comment-13081831
 ] 

Jonathan Ellis commented on CASSANDRA-2990:
---

A few days ago, you said, "A counter mutation only lives long enough so that it is 
applied to the first replica. Once this is done, a *row* mutation is generated 
for the other replica. That second mutation can be hinted. But that is a row 
mutation, so there should be no special casing at all for that."

Why can't we hint the first replica?

 We should refuse query for counters at CL.ANY
 -

 Key: CASSANDRA-2990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2990.patch


 We currently do not reject writes for counters at CL.ANY, even though this is 
 not supported (and rightly so).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




buildbot failure in ASF Buildbot on cassandra-trunk

2011-08-09 Thread buildbot
The Buildbot has detected a new failure on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1503

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1155466
Blamelist: jbellis

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



[jira] [Commented] (CASSANDRA-2868) Native Memory Leak

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
 ] 

Brandon Williams commented on CASSANDRA-2868:
-

bq. Wouldn't it be worth indicating how many collections have been done 
since the last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is 1 GC inflates the gctime enough that we errantly log when it's 
not needed, but I imagine to trigger that you would have to be in a gc pressure 
situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound (Did we just do a CMS? Is the heap 
still > 80% full?) and it seems to work as well as it always has.
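
For reference, here is a minimal sketch (not the actual 2868 patch; the class and method names are made up) of the count-tracking approach described above: poll the GC MXBeans on a timer and only log when the collection count has actually advanced since the previous check.

{code}
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Sketch of delta-based GC logging: only report when at least one collection
// has actually happened since the previous check.
public class GcWatcher
{
    private final Map<String, Long> lastCounts = new HashMap<String, Long>();
    private final Map<String, Long> lastTimes = new HashMap<String, Long>();

    public void check()
    {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            long count = gc.getCollectionCount();   // may be -1 if unavailable ("the api is flakey")
            long time = gc.getCollectionTime();
            Long prevCount = lastCounts.put(gc.getName(), count);
            Long prevTime = lastTimes.put(gc.getName(), time);
            long recentCollections = prevCount == null ? count : count - prevCount;
            long recentMillis = prevTime == null ? time : time - prevTime;

            if (recentCollections > 0)
                System.out.printf("%s: %d collection(s), %d ms since last check%n",
                                  gc.getName(), recentCollections, recentMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException
    {
        GcWatcher watcher = new GcWatcher();
        while (true)
        {
            watcher.check();
            Thread.sleep(5000);   // poll interval, analogous to the periodic logger discussed above
        }
    }
}
{code}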



 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.4

 Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long running servers. These have been confirmed by 
 several users in the user list. That's why I report.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the os because of oom (with no swap)
 Our server is started with -Xmx3000M and running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2868) Native Memory Leak

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081834#comment-13081834
 ] 

Brandon Williams edited comment on CASSANDRA-2868 at 8/9/11 6:43 PM:
-

bq. Wouldn't it be worth indicating how many collections have been done 
since the last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is 1 GC inflates the gctime enough that we errantly log when it's 
not needed, but I imagine to trigger that you would have to be in a gc pressure 
situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could be a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound (Did we just do a CMS? Is the heap 
still > 80% full?) and it seems to work as well as it always has.



  was (Author: brandon.williams):
bq. Wouldn't it be worth indicating how many collections have been done 
since the last log message if it's > 1, since it can (be > 1).

The only reason I added count tracking was to prevent it from firing when there 
were no GCs (the api is flakey.)  I've never actually been able to get > 1 to 
happen, but we can add it to the logging.

bq. IMO the duration-based thresholds are hard to reason about here, where 
we're dealing w/ summaries and not individual GC results.

We are dealing with individual GCs at least 99% of the time in practice.  The 
worst case is 1 GC inflates the gctime enough that we errantly log when it's 
not needed, but I imagine to trigger that you would have to be in a gc pressure 
situation already.

bq. I think I'd rather have something like the dropped messages logger, where 
every N seconds we log the summary we get from the mbean.

That seems like it could a lot of noise since GC is constantly happening.

bq. The flushLargestMemtables/reduceCacheSizes stuff should probably be 
removed. 

I think the logic there is still sound (Did we just do a CMS? Is the heap 
still > 80% full?) and it seems to work as well as it always has.


  
 Native Memory Leak
 --

 Key: CASSANDRA-2868
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2868
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Daniel Doubleday
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.8.4

 Attachments: 2868-v1.txt, 2868-v2.txt, 48hour_RES.png, 
 low-load-36-hours-initial-results.png


 We have memory issues with long running servers. These have been confirmed by 
 several users in the user list. That's why I report.
 The memory consumption of the cassandra java process increases steadily until 
 it's killed by the os because of oom (with no swap)
 Our server is started with -Xmx3000M and running for around 23 days.
 pmap -x shows
 Total SST: 1961616 (mem mapped data and index files)
 Anon  RSS: 6499640
 Total RSS: 8478376
 This shows that > 3G are 'overallocated'.
 We will use BRAF on one of our less important nodes to check whether it is 
 related to mmap and report back.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081854#comment-13081854
 ] 

Sylvain Lebresne commented on CASSANDRA-2990:
-

bq. Why can't we hint the first replica?

Well, actually I think we could. Or at least if we cannot, I forgot why. We 
would need to be sure we never replay a hint twice though, which I'm not sure 
is guaranteed right now. Also, we can only do this if what we store as a 
hint is the serialized mutation (in this case, the serialized CounterMutation): 
we can't apply the CounterMutation on a non-replica (partly because that would 
potentially increase the counter context too much, partly because counter 
removes suck, which would probably be a problem at some point).

So it should be doable, but it's a bit of work.
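
To illustrate the constraint, a hypothetical sketch only (FakeCounterMutation and its wire format are invented here, not Cassandra's classes): the hint has to carry the mutation's serialized bytes so it can later be forwarded to a real replica, and is never applied locally on the coordinator.

{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch: the hint stores the mutation's serialized bytes so it can
// later be forwarded to a real replica; it is never applied on the coordinator.
public class CounterHintSketch
{
    // stand-in for the real CounterMutation; the wire format here is invented
    static final class FakeCounterMutation
    {
        final String key;
        final long delta;
        FakeCounterMutation(String key, long delta) { this.key = key; this.delta = delta; }

        byte[] serialize() throws IOException
        {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(key);
            out.writeLong(delta);
            out.flush();
            return bos.toByteArray();
        }

        static FakeCounterMutation deserialize(byte[] bytes) throws IOException
        {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            return new FakeCounterMutation(in.readUTF(), in.readLong());
        }
    }

    public static void main(String[] args) throws IOException
    {
        FakeCounterMutation cm = new FakeCounterMutation("counter1", 2);

        // store the bytes as the hint; do NOT apply cm locally on a non-replica
        byte[] hintPayload = cm.serialize();

        // later, once the replica is reachable again, deserialize and forward exactly once
        FakeCounterMutation replay = FakeCounterMutation.deserialize(hintPayload);
        System.out.println("forward +" + replay.delta + " for " + replay.key + " to the replica");
    }
}
{code}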

 We should refuse query for counters at CL.ANY
 -

 Key: CASSANDRA-2990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2990.patch


 We currently do not reject writes for counters at CL.ANY, even though this is 
 not supported (and rightly so).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2892) Don't replicate_on_write with RF=1

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081858#comment-13081858
 ] 

Hudson commented on CASSANDRA-2892:
---

Integrated in Cassandra-0.8 #264 (See 
[https://builds.apache.org/job/Cassandra-0.8/264/])
avoid doing read for no-op replicate-on-write at CL=1
patch by slebresne and jbellis for CASSANDRA-2892

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155460
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/StorageProxy.java


 Don't replicate_on_write with RF=1
 

 Key: CASSANDRA-2892
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2892
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2892-v1.5.txt, 2892.patch


 For counters with RF=1, we still do a read to replicate, even though there is 
 nothing to replicate it to.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081862#comment-13081862
 ] 

Jonathan Ellis commented on CASSANDRA-2990:
---

Okay, +1 on making the validation match what is actually currently supported 
(no ANY for counters), although I'd change "not supported" to "not yet 
supported".

We can deal w/ adding ANY support if and when someone actually needs it.

 We should refuse query for counters at CL.ANY
 -

 Key: CASSANDRA-2990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2990.patch


 We currently do not reject writes for counters at CL.ANY, even though this is 
 not supported (and rightly so).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2518) invalid column name length 0

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis resolved CASSANDRA-2518.
---

Resolution: Duplicate

probably CASSANDRA-2675, fixed in 0.7.7

 invalid column name length 0
 

 Key: CASSANDRA-2518
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2518
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.3
 Environment: three nodes, 
 JVM:
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms6G -Xmx6G -Xmn2400M 
 -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC 
 -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
Reporter: lichenglin

 one of the three nodes running cassandra 0.7.3 reported this error after start up:
 ERROR [CompactionExecutor:1] 2011-04-16 22:18:39,281 PrecompactedRow.java 
 (line 82) Skipping row DecoratedKey(3813860378406449638560060231106122758, 
 79616e79776275636b65743030303030303030312f6f626a303030303030323534) in 
 /opt/cassandra/data/Keyspace/cf-f-4715-Data.db
 org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid 
 column name length 0
 at 
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:68)
 at 
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:176)
 at 
 org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
 at 
 org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
 at 
 org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
 at 
 org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
 at 
 org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
 at 
 org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
 at 
 org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:449)
 at 
 org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
 at 
 org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 and few minutes later,
 ERROR [CompactionExecutor:1] 2011-04-16 22:20:20,073 
 AbstractCassandraDaemon.java (line 114) Fatal exception in thread 
 Thread[CompactionExecutor:1,1,main]
 java.lang.OutOfMemoryError: Java heap space
 at 
 org.apache.cassandra.io.util.BufferedRandomAccessFile.readBytes(BufferedRandomAccessFile.java:267)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:310)
 at 
 org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:267)
 at 
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
 at 
 org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
 at 
 org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
 at 
 org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:176)
 at 
 org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
 at 
 org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
 at 
 org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
 at 
 org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
 at 
 org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
 at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
 

[Cassandra Wiki] Trivial Update of Committers by JonathanEllis

2011-08-09 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Cassandra Wiki for 
change notification.

The Committers page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/Committers?action=diff&rev1=15&rev2=16

Comment:
update release manager

  ||Avinash Lakshman||Jan 2009||Facebook||Co-author of Facebook Cassandra||
  ||Prashant Malik||Jan 2009||Facebook||Co-author of Facebook Cassandra||
  ||Jonathan Ellis||Mar 2009||Datastax||Project chair||
- ||Eric Evans||Jun 2009||Rackspace||PMC member, Release manager, Debian 
packager||
+ ||Eric Evans||Jun 2009||Rackspace||PMC member, Debian packager||
  ||Jun Rao||Jun 2009||!LinkedIn||PMC member||
  ||Chris Goffinet||Sept 2009||Twitter||PMC member||
  ||Johan Oskarsson||Nov 2009||Twitter||Also a 
[[http://hadoop.apache.org/|Hadoop]] committer||
@@ -12, +12 @@

  ||Jaakko Laine||Dec 2009||?|| ||
  ||Brandon Williams||Jun 2010||Datastax||PMC member||
  ||Jake Luciani||Jan 2011||Datastax||Also a 
[[http://thrift.apache.org/|Thrift]] committer||
- ||Sylvain Lebresne||Mar 2011||Datastax||PMC member||
+ ||Sylvain Lebresne||Mar 2011||Datastax||PMC member, Release manager||
  ||Pavel Yaskevich||Aug 2011||Datastax|| ||
  


[jira] [Commented] (CASSANDRA-2993) Issues with parameters being escaped correctly in Python CQL

2011-08-09 Thread Blake Visin (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081879#comment-13081879
 ] 

Blake Visin commented on CASSANDRA-2993:


Works for me too.  Thanks Tyler!

 Issues with parameters being escaped correctly in Python CQL
 

 Key: CASSANDRA-2993
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2993
 Project: Cassandra
  Issue Type: Bug
 Environment: Python CQL
Reporter: Blake Visin
Assignee: Tyler Hobbs
  Labels: CQL, parameter, python
 Attachments: 2993-cql-grammar.txt, 2993-pycql.txt, 
 2993-system-test.txt


 When using parameterised queries in Python CQL, strings are not being escaped 
 correctly.
 Query and Parameters:
 {code}
 'UPDATE sites SET :col = :val WHERE KEY = :site_id'
 {'col': 'feed_stats:1312493736688033024',
  'site_id': '899d15e8-bd4a-11e0-bc8c-001fe14cba06',
  'val': 
 (dp0\nS'1'\np1\n(lp2\nI1\naI2\naI3\naI4\nasS'0'\np3\n(lp4\nI1\naI2\naI3\naI4\nasS'3'\np5\n(lp6\nI1\naI2\naI3\naI4\nasS'2'\np7\n(lp8\nI1\naI2\naI3\naI4\nas.}
 {code}
 Query trying to be executed after processing parameters
 {code} 
 UPDATE sites SET 'feed_stats:1312493736688033024' = 
 '(dp0\nS''1''\np1\n(lp2\nI1\naI2\naI3\naI4\nasS''0''\np3\n(lp4\nI1\naI2\naI3\naI4\nasS''3''\np5\n(lp6\nI1\naI2\naI3\naI4\nasS''2''\np7\n(lp8\nI1\naI2\naI3\naI4\nas.'
  WHERE KEY = '899d15e8-bd4a-11e0-bc8c-001fe14cba06'
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2993) Issues with parameters being escaped correctly in Python CQL

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2993:
--

Reviewer: xedin

 Issues with parameters being escaped correctly in Python CQL
 

 Key: CASSANDRA-2993
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2993
 Project: Cassandra
  Issue Type: Bug
 Environment: Python CQL
Reporter: Blake Visin
Assignee: Tyler Hobbs
  Labels: CQL, parameter, python
 Attachments: 2993-cql-grammar.txt, 2993-pycql.txt, 
 2993-system-test.txt


 When using parameterised queries in Python CQL, strings are not being escaped 
 correctly.
 Query and Parameters:
 {code}
 'UPDATE sites SET :col = :val WHERE KEY = :site_id'
 {'col': 'feed_stats:1312493736688033024',
  'site_id': '899d15e8-bd4a-11e0-bc8c-001fe14cba06',
  'val': 
 (dp0\nS'1'\np1\n(lp2\nI1\naI2\naI3\naI4\nasS'0'\np3\n(lp4\nI1\naI2\naI3\naI4\nasS'3'\np5\n(lp6\nI1\naI2\naI3\naI4\nasS'2'\np7\n(lp8\nI1\naI2\naI3\naI4\nas.}
 {code}
 Query trying to be executed after processing parameters
 {code} 
 UPDATE sites SET 'feed_stats:1312493736688033024' = 
 '(dp0\nS''1''\np1\n(lp2\nI1\naI2\naI3\naI4\nasS''0''\np3\n(lp4\nI1\naI2\naI3\naI4\nasS''3''\np5\n(lp6\nI1\naI2\naI3\naI4\nasS''2''\np7\n(lp8\nI1\naI2\naI3\naI4\nas.'
  WHERE KEY = '899d15e8-bd4a-11e0-bc8c-001fe14cba06'
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2325) invalidateKeyCache / invalidateRowCache should remove saved cache files from disk

2011-08-09 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated CASSANDRA-2325:
---

Attachment: cassandra-2325.patch.2.txt

 invalidateKeyCache / invalidateRowCache should remove saved cache files from 
 disk
 -

 Key: CASSANDRA-2325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2325
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.7.8, 0.8.2
Reporter: Matthew F. Dennis
Assignee: Edward Capriolo
Priority: Minor
 Attachments: cassandra-2325-1.patch.txt, cassandra-2325.patch.2.txt


 the invalidate[Key|Row]Cache calls don't remove the saved caches from disk.
 It seems logical that if you are clearing the caches you don't expect them to 
 be reinstantiated with the old values the next time C* starts.
 This is not a huge issue since next time the caches are saved the old values 
 will be removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1155544 - /cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 20:18:47 2011
New Revision: 1155544

URL: http://svn.apache.org/viewvc?rev=1155544&view=rev
Log:
r/m merged reference to obsolete memtable_flush_after_mins

Modified:
cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java

Modified: cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java?rev=1155544&r1=1155543&r2=1155544&view=diff
==
--- cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java (original)
+++ cassandra/trunk/src/java/org/apache/cassandra/cli/CliClient.java Tue Aug  9 
20:18:47 2011
@@ -1671,7 +1671,6 @@ public class CliClient
 normaliseType(cfDef.key_validation_class, 
"org.apache.cassandra.db.marshal"));
 writeAttr(sb, false, "memtable_operations", 
cfDef.memtable_operations_in_millions);
 writeAttr(sb, false, "memtable_throughput", 
cfDef.memtable_throughput_in_mb);
-writeAttr(sb, false, "memtable_flush_after", 
cfDef.memtable_flush_after_mins);
 writeAttr(sb, false, "rows_cached", cfDef.row_cache_size);
 writeAttr(sb, false, "row_cache_save_period", 
cfDef.row_cache_save_period_in_seconds);
 writeAttr(sb, false, "keys_cached", cfDef.key_cache_size);




buildbot success in ASF Buildbot on cassandra-trunk

2011-08-09 Thread buildbot
The Buildbot has detected a restored build on builder cassandra-trunk while 
building ASF Buildbot.
Full details are available at:
 http://ci.apache.org/builders/cassandra-trunk/builds/1504

Buildbot URL: http://ci.apache.org/

Buildslave for this Build: isis_ubuntu

Build Reason: scheduler
Build Source Stamp: [branch cassandra/trunk] 1155544
Blamelist: jbellis

Build succeeded!

sincerely,
 -The Buildbot



svn commit: r1155548 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/cql/UpdateStatement.java src/java/org/apache/cassandra/thrift/ThriftValidation.java test/system/t

2011-08-09 Thread slebresne
Author: slebresne
Date: Tue Aug  9 20:24:17 2011
New Revision: 1155548

URL: http://svn.apache.org/viewvc?rev=1155548&view=rev
Log:
Refuse counter write at CL.ANY
patch by slebresne; reviewed by jbellis for CASSANDRA-2990

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
cassandra/branches/cassandra-0.8/test/system/test_cql.py
cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug  9 20:24:17 2011
@@ -2,6 +2,7 @@
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
  * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
+ * refuse counter write for CL.ANY (CASSANDRA-2990)
 
 
 0.8.3

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java
 Tue Aug  9 20:24:17 2011
@@ -39,6 +39,7 @@ import static org.apache.cassandra.cql.Q
 
 import static org.apache.cassandra.cql.Operation.OperationType;
 import static 
org.apache.cassandra.thrift.ThriftValidation.validateColumnFamily;
+import static 
org.apache.cassandra.thrift.ThriftValidation.validateCommutativeForWrite;
 
 /**
  * An <code>UPDATE</code> statement parsed from a CQL query statement.
@@ -142,6 +143,8 @@ public class UpdateStatement extends Abs
 }
 
 CFMetaData metadata = validateColumnFamily(keyspace, columnFamily, 
hasCommutativeOperation);
+if (hasCommutativeOperation)
+validateCommutativeForWrite(metadata, cLevel);
 
 QueryProcessor.validateKeyAlias(metadata, keyName);
 

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
 Tue Aug  9 20:24:17 2011
@@ -627,7 +627,11 @@ public class ThriftValidation
 
 public static void validateCommutativeForWrite(CFMetaData metadata, 
ConsistencyLevel consistency) throws InvalidRequestException
 {
-if (!metadata.getReplicateOnWrite() && consistency != 
ConsistencyLevel.ONE)
+if (consistency == ConsistencyLevel.ANY)
+{
+throw new InvalidRequestException("Consistency level ANY is not 
yet supported for counter columnfamily " + metadata.cfName);
+}
+else if (!metadata.getReplicateOnWrite() && consistency != 
ConsistencyLevel.ONE)
 {
 throw new InvalidRequestException("cannot achieve CL > CL.ONE 
without replicate_on_write on columnfamily " + metadata.cfName);
 }

Modified: cassandra/branches/cassandra-0.8/test/system/test_cql.py
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/test/system/test_cql.py?rev=1155548&r1=1155547&r2=1155548&view=diff
==
--- cassandra/branches/cassandra-0.8/test/system/test_cql.py (original)
+++ cassandra/branches/cassandra-0.8/test/system/test_cql.py Tue Aug  9 
20:24:17 2011
@@ -1260,6 +1260,11 @@ class TestCql(ThriftTester):
   cursor.execute,
   "UPDATE CounterCF SET count_me = count_not_me + 2 WHERE 
key = 'counter1'")
 
+# counters can't do ANY
+assert_raises(cql.ProgrammingError,
+  cursor.execute,
+  "UPDATE CounterCF USING CONSISTENCY ANY SET count_me = 
count_me + 2 WHERE key = 'counter1'")
+
 def test_key_alias_support(self):
 "should be possible to use alias instead of KEY keyword"
 cursor = init()

Modified: cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py
URL: 

svn commit: r1155549 - in /cassandra/trunk: ./ contrib/ interface/thrift/gen-java/org/apache/cassandra/thrift/ src/java/org/apache/cassandra/cql/ src/java/org/apache/cassandra/thrift/ test/system/

2011-08-09 Thread slebresne
Author: slebresne
Date: Tue Aug  9 20:26:07 2011
New Revision: 1155549

URL: http://svn.apache.org/viewvc?rev=1155549&view=rev
Log:
commit from 0.8

Modified:
cassandra/trunk/   (props changed)
cassandra/trunk/CHANGES.txt
cassandra/trunk/contrib/   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/InvalidRequestException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/NotFoundException.java
   (props changed)

cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/SuperColumn.java
   (props changed)
cassandra/trunk/src/java/org/apache/cassandra/cql/UpdateStatement.java
cassandra/trunk/src/java/org/apache/cassandra/thrift/ThriftValidation.java
cassandra/trunk/test/system/test_cql.py
cassandra/trunk/test/system/test_thrift_server.py

Propchange: cassandra/trunk/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6:922689-1052356,1052358-1053452,1053454,1053456-1131291
 /cassandra/branches/cassandra-0.7:1026516-1151306
 /cassandra/branches/cassandra-0.7.0:1053690-1055654
-/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1155460
+/cassandra/branches/cassandra-0.8:1090934-1125013,1125019-1155460,1155548
 /cassandra/branches/cassandra-0.8.0:1125021-1130369
 /cassandra/branches/cassandra-0.8.1:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3:1051699-1053689

Modified: cassandra/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/trunk/CHANGES.txt?rev=1155549&r1=1155548&r2=1155549&view=diff
==
--- cassandra/trunk/CHANGES.txt (original)
+++ cassandra/trunk/CHANGES.txt Tue Aug  9 20:26:07 2011
@@ -35,6 +35,7 @@
  * include files-to-be-streamed in StreamInSession.getSources (CASSANDRA-2972)
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
  * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
+ * refuse counter write for CL.ANY (CASSANDRA-2990)
 
 
 0.8.3

Propchange: cassandra/trunk/contrib/
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/contrib:922689-1052356,1052358-1053452,1053454,1053456-1068009
 /cassandra/branches/cassandra-0.7/contrib:1026516-1151306
 /cassandra/branches/cassandra-0.7.0/contrib:1053690-1055654
-/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1155460
+/cassandra/branches/cassandra-0.8/contrib:1090934-1125013,1125019-1155460,1155548
 /cassandra/branches/cassandra-0.8.0/contrib:1125021-1130369
 /cassandra/branches/cassandra-0.8.1/contrib:1101014-1125018
 /cassandra/tags/cassandra-0.7.0-rc3/contrib:1051699-1053689

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 
/cassandra/branches/cassandra-0.7/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1026516-1151306
 
/cassandra/branches/cassandra-0.7.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1053690-1055654
-/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1155460
+/cassandra/branches/cassandra-0.8/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1090934-1125013,1125019-1155460,1155548
 
/cassandra/branches/cassandra-0.8.0/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1125021-1130369
 
/cassandra/branches/cassandra-0.8.1/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1101014-1125018
 
/cassandra/tags/cassandra-0.7.0-rc3/interface/thrift/gen-java/org/apache/cassandra/thrift/Cassandra.java:1051699-1053689

Propchange: 
cassandra/trunk/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java
--
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Aug  9 20:26:07 2011
@@ -1,7 +1,7 @@
 
/cassandra/branches/cassandra-0.6/interface/thrift/gen-java/org/apache/cassandra/thrift/Column.java:922689-1052356,1052358-1053452,1053454,1053456-1131291
 

[jira] [Commented] (CASSANDRA-3004) Once a message has been dropped, cassandra logs total messages dropped and tpstats every 5s forever

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081901#comment-13081901
 ] 

Brandon Williams commented on CASSANDRA-3004:
-

+1

 Once a message has been dropped, cassandra logs total messages dropped and 
 tpstats every 5s forever
 ---

 Key: CASSANDRA-3004
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3004
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.3
Reporter: Brandon Williams
Assignee: Jonathan Ellis
Priority: Minor
  Labels: lhf
 Fix For: 0.8.4

 Attachments: 3004.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




svn commit: r1155558 - in /cassandra/branches/cassandra-0.8: CHANGES.txt src/java/org/apache/cassandra/net/MessagingService.java

2011-08-09 Thread jbellis
Author: jbellis
Date: Tue Aug  9 20:47:35 2011
New Revision: 1155558

URL: http://svn.apache.org/viewvc?rev=1155558&view=rev
Log:
switch back to only logging recent dropped messages
patch by jbellis; reviewed by brandonwilliams for CASSANDRA-3004

Modified:
cassandra/branches/cassandra-0.8/CHANGES.txt

cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java

Modified: cassandra/branches/cassandra-0.8/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/CHANGES.txt?rev=1155558&r1=1155557&r2=1155558&view=diff
==
--- cassandra/branches/cassandra-0.8/CHANGES.txt (original)
+++ cassandra/branches/cassandra-0.8/CHANGES.txt Tue Aug  9 20:47:35 2011
@@ -3,6 +3,7 @@
  * use JAVA env var in cassandra-env.sh (CASSANDRA-2785, 2992)
  * avoid doing read for no-op replicate-on-write at CL=1 (CASSANDRA-2892)
  * refuse counter write for CL.ANY (CASSANDRA-2990)
+ * switch back to only logging recent dropped messages (CASSANDRA-3004)
 
 
 0.8.3

Modified: 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java
URL: 
http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java?rev=1155558&r1=1155557&r2=1155558&view=diff
==
--- 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java
 (original)
+++ 
cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java
 Tue Aug  9 20:47:35 2011
@@ -100,18 +100,11 @@ public final class MessagingService impl
 private final Map<StorageService.Verb, AtomicInteger> droppedMessages = 
new EnumMap<StorageService.Verb, AtomicInteger>(StorageService.Verb.class);
 // dropped count when last requested for the Recent api.  high concurrency 
isn't necessary here.
 private final Map<StorageService.Verb, Integer> lastDropped = 
Collections.synchronizedMap(new EnumMap<StorageService.Verb, 
Integer>(StorageService.Verb.class));
+private final Map<StorageService.Verb, Integer> lastDroppedInternal = new 
EnumMap<StorageService.Verb, Integer>(StorageService.Verb.class);
 
 private final List<ILatencySubscriber> subscribers = new 
ArrayList<ILatencySubscriber>();
 private static final long DEFAULT_CALLBACK_TIMEOUT = (long) (1.1 * 
DatabaseDescriptor.getRpcTimeout());
 
-{
-for (StorageService.Verb verb : DROPPABLE_VERBS)
-{
-droppedMessages.put(verb, new AtomicInteger());
-lastDropped.put(verb, 0);
-}
-}
-
 private static class MSHandle
 {
 public static final MessagingService instance = new MessagingService();
@@ -123,6 +116,13 @@ public final class MessagingService impl
 
 private MessagingService()
 {
+for (StorageService.Verb verb : DROPPABLE_VERBS)
+{
+droppedMessages.put(verb, new AtomicInteger());
+lastDropped.put(verb, 0);
+lastDroppedInternal.put(verb, 0);
+}
+
 listenGate = new SimpleCondition();
 verbHandlers_ = new EnumMap<StorageService.Verb, 
IVerbHandler>(StorageService.Verb.class);
 streamExecutor_ = new DebuggableThreadPoolExecutor("Streaming", 
DatabaseDescriptor.getCompactionThreadPriority());
@@ -584,11 +584,13 @@ public final class MessagingService impl
 for (Map.Entry<StorageService.Verb, AtomicInteger> entry : 
droppedMessages.entrySet())
 {
 AtomicInteger dropped = entry.getValue();
-if (dropped.get() > 0)
+StorageService.Verb verb = entry.getKey();
+int recent = dropped.get() - lastDroppedInternal.get(verb);
+if (recent > 0)
 {
 logTpstats = true;
-logger_.info("{} {} messages dropped in server lifetime",
- dropped, entry.getKey());
+logger_.info("{} {} messages dropped in server lifetime", 
recent, verb);
+lastDroppedInternal.put(verb, dropped.get());
 }
 }
 




[jira] [Updated] (CASSANDRA-3004) Once a message has been dropped, cassandra logs total messages dropped and tpstats every 5s forever

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-3004:
--

Affects Version/s: (was: 0.8.3)
   0.8.2
   Issue Type: Improvement  (was: Bug)

 Once a message has been dropped, cassandra logs total messages dropped and 
 tpstats every 5s forever
 ---

 Key: CASSANDRA-3004
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3004
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.2
Reporter: Brandon Williams
Assignee: Jonathan Ellis
Priority: Minor
  Labels: lhf
 Fix For: 0.8.4

 Attachments: 3004.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2325) invalidateKeyCache / invalidateRowCache should remove saved cache files from disk

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-2325:
--

Affects Version/s: (was: 0.7.8)
   (was: 0.8.2)
   0.6
Fix Version/s: 0.8.4

 invalidateKeyCache / invalidateRowCache should remove saved cache files from 
 disk
 -

 Key: CASSANDRA-2325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2325
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.6
Reporter: Matthew F. Dennis
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.8.4

 Attachments: cassandra-2325-1.patch.txt, cassandra-2325.patch.2.txt


 the invalidate[Key|Row]Cache calls don't remove the saved caches from disk.
 It seems logical that if you are clearing the caches you don't expect them to 
 be reinstantiated with the old values the next time C* starts.
 This is not a huge issue since next time the caches are saved the old values 
 will be removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2325) invalidateKeyCache / invalidateRowCache should remove saved cache files from disk

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081920#comment-13081920
 ] 

Jonathan Ellis commented on CASSANDRA-2325:
---

Shouldn't we check that the file exists first?  Otherwise we log spurious 
errors.

 invalidateKeyCache / invalidateRowCache should remove saved cache files from 
 disk
 -

 Key: CASSANDRA-2325
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2325
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.6
Reporter: Matthew F. Dennis
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.8.4

 Attachments: cassandra-2325-1.patch.txt, cassandra-2325.patch.2.txt


 the invalidate[Key|Row]Cache calls don't remove the saved caches from disk.
 It seems logical that if you are clearing the caches you don't expect them to 
 be reinstantiated with the old values the next time C* starts.
 This is not a huge issue since next time the caches are saved the old values 
 will be removed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-3005) OutboundTcpConnection's sending queue goes unboundedly without any backpressure logic

2011-08-09 Thread Melvin Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Melvin Wang reassigned CASSANDRA-3005:
--

Assignee: Melvin Wang

 OutboundTcpConnection's sending queue goes unboundedly without any 
 backpressure logic
 -

 Key: CASSANDRA-3005
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3005
 Project: Cassandra
  Issue Type: Improvement
Reporter: Melvin Wang
Assignee: Melvin Wang

 OutboundTcpConnection's sending queue unconditionally queues up requests 
 and processes them in sequence. Thinking about tagging each incoming message 
 with a timestamp and dropping it before actually sending if the message has stayed 
 in the queue for too long, where "too long" is defined by the message's own timeout 
 value.
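
 A rough sketch of that idea, with invented names (this is not the real OutboundTcpConnection code): stamp each message at enqueue time and drop it at dequeue time if it has already outlived its timeout.

{code}
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Rough sketch, invented names: stamp messages on enqueue, drop stale ones on dequeue.
public class TimedOutboundQueue
{
    static final class Entry
    {
        final String message;                         // stand-in for the real Message object
        final long enqueuedAt = System.currentTimeMillis();
        Entry(String message) { this.message = message; }
    }

    private final LinkedBlockingQueue<Entry> queue = new LinkedBlockingQueue<Entry>();
    private final long timeoutMillis;

    TimedOutboundQueue(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    void enqueue(String message) { queue.offer(new Entry(message)); }

    // pull one sendable message; returns null if nothing fresh was found in time
    String pollSendable(long waitMillis) throws InterruptedException
    {
        Entry e;
        while ((e = queue.poll(waitMillis, TimeUnit.MILLISECONDS)) != null)
        {
            long age = System.currentTimeMillis() - e.enqueuedAt;
            if (age <= timeoutMillis)
                return e.message;                     // still fresh: send it
            System.out.println("dropping stale message: " + e.message + " (age " + age + " ms)");
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException
    {
        TimedOutboundQueue q = new TimedOutboundQueue(100);    // 100 ms stands in for the rpc timeout
        q.enqueue("mutation-1");
        Thread.sleep(150);                                     // simulate a backed-up connection
        q.enqueue("mutation-2");
        System.out.println("sending: " + q.pollSendable(10));  // mutation-1 is dropped, mutation-2 is sent
        System.out.println("sending: " + q.pollSendable(10));  // queue empty -> null
    }
}
{code}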

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2988) Improve SSTableReader.load() when loading index files

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081930#comment-13081930
 ] 

Pavel Yaskevich commented on CASSANDRA-2988:


First of all I would like to point you to 
http://wiki.apache.org/cassandra/CodeStyle; please modify your code according 
to the conventions listed there.

According to c2988-modified-buffer.patch:

 - please encapsulate your modifications, because if you compare how it was and 
how it is in your patch it's hard to understand and just looks like a mess; I 
would like to suggest moving those modifications to a separate inner class 
(IndexReader maybe?) and replacing only the RandomAccessReader initialization in the 
SSTableReader.load(...) method...
 - let's add a test comparing getEstimatedRowSize().count(); and 
SSTable.estimateRowsFromIndex(input); just to be sure it works correctly.

Also I don't quite understand the logic behind while (buffer.remaining() > 10) { 
in SSTableReader.loadByteBuffer; let's avoid any hardcoding or at least comment 
why you did that.

I'm going to take a closer look at the patch for parallel index file loading after 
we are done with the index reader patch (c2988-modified-buffer.patch).
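
For what it's worth, here is a self-contained sketch of the chunked-buffer approach under discussion (the entry layout and all names are simplified assumptions, not the real index format or the attached patch): fill a fixed-size buffer, consume whole entries from it, and carry any partial entry over into the next fill.

{code}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch of chunked index scanning: fill a fixed-size buffer, consume whole
// (key, offset) entries, and carry a partial entry over to the next fill instead
// of checking EOF on every small read.  Assumes any single entry fits in a chunk.
public class ChunkedIndexScan
{
    public static void main(String[] args) throws IOException
    {
        byte[] index = buildFakeIndex();   // stands in for an -Index.db file
        int chunkSize = 32;                // deliberately tiny to force boundary handling

        ByteBuffer chunk = ByteBuffer.allocate(chunkSize);
        byte[] carry = new byte[0];
        int filePos = 0;
        int rows = 0;

        while (filePos < index.length || carry.length > 0)
        {
            chunk.clear();
            chunk.put(carry);                                          // partial entry from the last chunk
            int toRead = Math.min(chunk.remaining(), index.length - filePos);
            chunk.put(index, filePos, toRead);
            filePos += toRead;
            chunk.flip();

            while (true)
            {
                chunk.mark();
                if (!tryConsumeEntry(chunk))                           // entry crosses the chunk boundary
                {
                    chunk.reset();
                    break;
                }
                rows++;
            }

            carry = new byte[chunk.remaining()];
            chunk.get(carry);
            if (toRead == 0 && carry.length > 0)
                throw new IOException("truncated index entry at end of file");
        }
        System.out.println("rows estimated from index: " + rows);
    }

    // consume one (key, offset) entry; returns false if it is not completely buffered
    static boolean tryConsumeEntry(ByteBuffer buf)
    {
        if (buf.remaining() < 2)
            return false;
        int keyLength = buf.getShort() & 0xFFFF;       // writeUTF-style length prefix
        if (buf.remaining() < keyLength + 8)
            return false;
        buf.position(buf.position() + keyLength);      // skip the key bytes
        buf.getLong();                                 // data file offset
        return true;
    }

    static byte[] buildFakeIndex() throws IOException
    {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (int i = 0; i < 10; i++)
        {
            out.writeUTF("row-" + i);
            out.writeLong(i * 1000L);
        }
        out.flush();
        return bos.toByteArray();
    }
}
{code}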

 Improve SSTableReader.load() when loading index files
 -

 Key: CASSANDRA-2988
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Melvin Wang
Assignee: Melvin Wang
Priority: Minor
 Fix For: 1.0

 Attachments: c2988-modified-buffer.patch, 
 c2988-parallel-load-sstables.patch


 * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
 hurts the read performance because we always process the index files 
 sequentially. Simple fix would be set it to false.
 * multiple index files of a single column family can be loaded in parallel. 
 This buys a lot when you have multiple super large index files.
 * we may also change how we buffer. By using BufferredRandomAccessFile, for 
 every read, we need a bunch of checks like
   - do we need to rebuffer?
   - isEOF()?
   - assertions
   These can be simplified to some extent.  We can blindly buffer the index 
 file in chunks and process the buffer until a key lies across the boundary of a 
 chunk. Then we rebuffer and start from the beginning of the partially read 
 key. Conceptually, this is the same as what BRAF does but w/o the overhead in the 
 read**() methods in BRAF.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081941#comment-13081941
 ] 

Brandon Williams commented on CASSANDRA-2950:
-

Currently, truncate does:
* force a flush
* record the time
* delete any sstables older than the time

This isn't quite enough if the machine crashes shortly afterward, however, 
since there can be mutations present in the commitlog that were previously 
truncated and are now resurrected by CL replay.

One thing we could do is record the truncate time for the CF in the system ks 
and then ignore mutations older than that, however this would require time 
synchronization between the client and the server to be accurate.
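
A minimal sketch of that proposal (names assumed, this is not Cassandra's replay code): persist a per-CF truncation timestamp and have commitlog replay skip older mutations; as noted, it hinges on clock agreement for the mutation timestamps.

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch only: persist a per-columnfamily "truncated at" timestamp, then have
// commitlog replay skip any mutation whose write timestamp is older.
public class TruncateReplayFilter
{
    private final Map<String, Long> truncatedAt = new HashMap<String, Long>();

    // called by truncate, after the flush completes
    void recordTruncate(String columnFamily, long truncateTimeMillis)
    {
        truncatedAt.put(columnFamily, truncateTimeMillis);
    }

    // called for each mutation encountered during commitlog replay
    boolean shouldReplay(String columnFamily, long mutationTimestampMillis)
    {
        Long cutoff = truncatedAt.get(columnFamily);
        return cutoff == null || mutationTimestampMillis > cutoff;
    }

    public static void main(String[] args)
    {
        TruncateReplayFilter filter = new TruncateReplayFilter();
        filter.recordTruncate("Standard1", 1312924800000L);                       // truncate at time T

        System.out.println(filter.shouldReplay("Standard1", 1312924700000L));     // false: written before truncate
        System.out.println(filter.shouldReplay("Standard1", 1312924813259L));     // true: written after truncate
        System.out.println(filter.shouldReplay("LocationInfo", 1312924388140L));  // true: CF never truncated
    }
}
{code}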


 Data from truncated CF reappears after server restart
 -

 Key: CASSANDRA-2950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
 Project: Cassandra
  Issue Type: Bug
Reporter: Cathy Daw
Assignee: Brandon Williams

 * Configure 3 node cluster
 * Ensure the java stress tool creates Keyspace1 with RF=3
 {code}
 // Run Stress Tool to generate 10 keys, 1 column
 stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 50 keys in CLI
 use Keyspace1; 
 list Standard1; 
 // TRUNCATE CF in CLI
 use Keyspace1;
 truncate counter1;
 list counter1;
 // Run stress tool and verify creation of 1 key with 10 columns
 stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 1 key in CLI
 use Keyspace1; 
 list Standard1; 
 // Restart all three nodes
 // You will see 51 keys in CLI
 use Keyspace1; 
 list Standard1; 
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3004) Once a message has been dropped, cassandra logs total messages dropped and tpstats every 5s forever

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081948#comment-13081948
 ] 

Hudson commented on CASSANDRA-3004:
---

Integrated in Cassandra-0.8 #265 (See 
[https://builds.apache.org/job/Cassandra-0.8/265/])
switch back to only logging recent dropped messages
patch by jbellis; reviewed by brandonwilliams for CASSANDRA-3004

jbellis : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155558
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/net/MessagingService.java


 Once a message has been dropped, cassandra logs total messages dropped and 
 tpstats every 5s forever
 ---

 Key: CASSANDRA-3004
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3004
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.8.2
Reporter: Brandon Williams
Assignee: Jonathan Ellis
Priority: Minor
  Labels: lhf
 Fix For: 0.8.4

 Attachments: 3004.txt




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2990) We should refuse query for counters at CL.ANY

2011-08-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081947#comment-13081947
 ] 

Hudson commented on CASSANDRA-2990:
---

Integrated in Cassandra-0.8 #265 (See 
[https://builds.apache.org/job/Cassandra-0.8/265/])
Refuse counter write at CL.ANY
patch by slebresne; reviewed by jbellis for CASSANDRA-2990

slebresne : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1155548
Files : 
* /cassandra/branches/cassandra-0.8/test/system/test_cql.py
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/test/system/test_thrift_server.py
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java
* 
/cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/cql/UpdateStatement.java


 We should refuse query for counters at CL.ANY
 -

 Key: CASSANDRA-2990
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2990
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Priority: Trivial
  Labels: counters
 Fix For: 0.8.4

 Attachments: 2990.patch


 We currently do not reject writes for counters at CL.ANY, even though this is 
 not supported (and rightly so).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani updated CASSANDRA-2982:
--

Attachment: 2982-v1.txt

refactored api; should cover new index types. Should we consider removing the 
IndexType enum and just using the classname?
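
For readers following along, an illustrative sketch of the manager/base-class shape described in the issue (class and method names here are assumptions, not the API in 2982-v1.txt): one manager per column family dispatching to pluggable index implementations, registered by class rather than by an IndexType enum value.

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and signatures are assumptions, not the committed API.
abstract class SecondaryIndexSketch
{
    // index one column of one row; how/where it is stored is up to the implementation
    abstract void index(String rowKey, String columnName, String value);
}

// A CF-backed "KEYS" style index: the only kind the pre-refactor code assumed.
class KeysIndexSketch extends SecondaryIndexSketch
{
    @Override
    void index(String rowKey, String columnName, String value)
    {
        System.out.println("index CF write: " + value + " -> " + rowKey);
    }
}

// One manager per column family, dispatching to a pluggable implementation per column.
class SecondaryIndexManagerSketch
{
    private final Map<String, SecondaryIndexSketch> indexesByColumn = new HashMap<String, SecondaryIndexSketch>();

    // registration is by implementation instance/class, not an IndexType enum value
    void addIndex(String columnName, SecondaryIndexSketch index) { indexesByColumn.put(columnName, index); }

    void applyIndexUpdates(String rowKey, String columnName, String value)
    {
        SecondaryIndexSketch index = indexesByColumn.get(columnName);
        if (index != null)
            index.index(rowKey, columnName, value);
    }

    public static void main(String[] args)
    {
        SecondaryIndexManagerSketch manager = new SecondaryIndexManagerSketch();
        manager.addIndex("birthdate", new KeysIndexSketch());
        manager.applyIndexUpdates("user1", "birthdate", "1976-03-04");
        manager.applyIndexUpdates("user1", "name", "boris");   // no index registered: no-op
    }
}
{code}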

 Refactor secondary index api
 

 Key: CASSANDRA-2982
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: T Jake Luciani
Assignee: T Jake Luciani
 Fix For: 1.0

 Attachments: 2982-v1.txt


 Secondary indexes currently make some bad assumptions about the underlying 
 indexes.
 1. That they are always stored in other column families.
 2. That there is a unique index per column
 In the case of CASSANDRA-2915 neither of these are true.  The new api should 
 abstract the search concepts and allow any search api to plug in.
 Once the code is refactored and basically pluggable we can remove the 
 IndexType enum and use class names similar to how we handle partitioners and 
 comparators.
 Basic api is to add a SecondaryIndexManager that handles different index 
 types per CF and a SecondaryIndex base class that handles a particular type 
 implementation.
 This requires major changes to ColumnFamilyStore and Table.IndexBuilder

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081966#comment-13081966
 ] 

Jonathan Ellis commented on CASSANDRA-2950:
---

but we record the CL context at time of flush in the sstable it makes, and on 
replay we ignore any mutations from before that position.

Checked, and we do wait for flush to complete in truncate.

 Data from truncated CF reappears after server restart
 -

 Key: CASSANDRA-2950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
 Project: Cassandra
  Issue Type: Bug
Reporter: Cathy Daw
Assignee: Brandon Williams

 * Configure 3 node cluster
 * Ensure the java stress tool creates Keyspace1 with RF=3
 {code}
 // Run Stress Tool to generate 10 keys, 1 column
 stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 50 keys in CLI
 use Keyspace1; 
 list Standard1; 
 // TRUNCATE CF in CLI
 use Keyspace1;
 truncate counter1;
 list counter1;
 // Run stress tool and verify creation of 1 key with 10 columns
 stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 1 key in CLI
 use Keyspace1; 
 list Standard1; 
 // Restart all three nodes
 // You will see 51 keys in CLI
 use Keyspace1; 
 list Standard1; 
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081968#comment-13081968
 ] 

Jonathan Ellis commented on CASSANDRA-2982:
---

I don't think full index pluggability is a goal here.  So I don't see the point 
of that.

 Refactor secondary index api
 

 Key: CASSANDRA-2982
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: T Jake Luciani
Assignee: T Jake Luciani
 Fix For: 1.0

 Attachments: 2982-v1.txt


 Secondary indexes currently make some bad assumptions about the underlying 
 indexes.
 1. That they are always stored in other column families.
 2. That there is a unique index per column
 In the case of CASSANDRA-2915 neither of these are true.  The new api should 
 abstract the search concepts and allow any search api to plug in.
 Once the code is refactored and basically pluggable we can remove the 
 IndexType enum and use class names similar to how we handle partitioners and 
 comparators.
 Basic api is to add a SecondaryIndexManager that handles different index 
 types per CF and a SecondaryIndex base class that handles a particular type 
 implementation.
 This requires major changes to ColumnFamilyStore and Table.IndexBuilder

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081969#comment-13081969
 ] 

Brandon Williams commented on CASSANDRA-2950:
-

bq. but we record the commit log context at flush time in the sstable it creates, 
and on replay we ignore any mutations from before that position.

I think there's something wrong with that, then:

{noformat}
 INFO 21:25:15,274 Replaying 
/var/lib/cassandra/commitlog/CommitLog-1312924388053.log
DEBUG 21:25:15,290 Replaying 
/var/lib/cassandra/commitlog/CommitLog-1312924388053.log starting at 0
DEBUG 21:25:15,291 Reading mutation at 0
DEBUG 21:25:15,295 replaying mutation for system.4c: {ColumnFamily(LocationInfo 
[47656e65726174696f6e:false:4@131292438814,])}
DEBUG 21:25:15,321 Reading mutation at 89
DEBUG 21:25:15,322 replaying mutation for system.426f6f747374726170: 
{ColumnFamily(LocationInfo [42:false:1@1312924388203,])}
DEBUG 21:25:15,322 Reading mutation at 174
DEBUG 21:25:15,322 replaying mutation for system.4c: {ColumnFamily(LocationInfo 
[546f6b656e:false:16@1312924388204,])}
DEBUG 21:25:15,322 Reading mutation at 270
DEBUG 21:25:15,324 replaying mutation for Keyspace1.3030: 
{ColumnFamily(Standard1 
[C0:false:34@1312924813259,C1:false:34@1312924813260,C2:false:34@1312924813260,C3:false:34@1312924813260,C4:false:34@1312924813260,])}
{noformat}

The last entry there is the first of many errant mutations.

 Data from truncated CF reappears after server restart
 -

 Key: CASSANDRA-2950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
 Project: Cassandra
  Issue Type: Bug
Reporter: Cathy Daw
Assignee: Brandon Williams

 * Configure 3 node cluster
 * Ensure the java stress tool creates Keyspace1 with RF=3
 {code}
 // Run Stress Tool to generate 10 keys, 1 column
 stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 50 keys in CLI
 use Keyspace1; 
 list Standard1; 
 // TRUNCATE CF in CLI
 use Keyspace1;
 truncate counter1;
 list counter1;
 // Run stress tool and verify creation of 1 key with 10 columns
 stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 1 key in CLI
 use Keyspace1; 
 list Standard1; 
 // Restart all three nodes
 // You will see 51 keys in CLI
 use Keyspace1; 
 list Standard1; 
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-2950) Data from truncated CF reappears after server restart

2011-08-09 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis reassigned CASSANDRA-2950:
-

Assignee: Jonathan Ellis  (was: Brandon Williams)

 Data from truncated CF reappears after server restart
 -

 Key: CASSANDRA-2950
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2950
 Project: Cassandra
  Issue Type: Bug
Reporter: Cathy Daw
Assignee: Jonathan Ellis

 * Configure 3 node cluster
 * Ensure the java stress tool creates Keyspace1 with RF=3
 {code}
 // Run Stress Tool to generate 10 keys, 1 column
 stress --operation=INSERT -t 2 --num-keys=50 --columns=20 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 50 keys in CLI
 use Keyspace1; 
 list Standard1; 
 // TRUNCATE CF in CLI
 use Keyspace1;
 truncate counter1;
 list counter1;
 // Run stress tool and verify creation of 1 key with 10 columns
 stress --operation=INSERT -t 2 --num-keys=1 --columns=10 
 --consistency-level=QUORUM --average-size-values --replication-factor=3 
 --create-index=KEYS --nodes=cathy1,cathy2
 // Verify 1 key in CLI
 use Keyspace1; 
 list Standard1; 
 // Restart all three nodes
 // You will see 51 keys in CLI
 use Keyspace1; 
 list Standard1; 
 {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081974#comment-13081974
 ] 

Jonathan Ellis commented on CASSANDRA-3010:
---

I.e., do we do \d CF (postgresql) or describe CF (mysql) or desc CF 
(oracle)?

 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081973#comment-13081973
 ] 

Jonathan Ellis commented on CASSANDRA-3010:
---

We should also pick a SQL command line to imitate for the introspection stuff. 
Might as well get that degree of familiarity too, since there is no reason not to.

 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081977#comment-13081977
 ] 

Jeremy Hanna commented on CASSANDRA-3010:
-

If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start shouldn't be too onerous.

 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081977#comment-13081977
 ] 

Jeremy Hanna edited comment on CASSANDRA-3010 at 8/9/11 10:36 PM:
--

If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start would hopefully not be too onerous.

  was (Author: jeromatron):
If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start shouldn't be too onerous.
  
 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081982#comment-13081982
 ] 

Pavel Yaskevich commented on CASSANDRA-3010:


I don't think we should choose any single one, because we can support all of 
those notations using synonyms in the ANTLR grammar. It would be hard to include 
all of the possible synonyms from the beginning, but the grammar will be designed 
in a way that lets us easily add new synonyms as we go.
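
As an illustration of the synonym idea only (plain Java, independent of how the 
ANTLR grammar will actually express it; names are hypothetical):

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: map every accepted spelling to one canonical command keyword.
final class CommandSynonyms
{
    private static final Map<String, String> SYNONYMS = new HashMap<String, String>();
    static
    {
        SYNONYMS.put("describe", "DESCRIBE");
        SYNONYMS.put("desc",     "DESCRIBE");
        SYNONYMS.put("\\d",      "DESCRIBE");
    }

    static String canonical(String keyword)
    {
        String c = SYNONYMS.get(keyword.toLowerCase());
        return c != null ? c : keyword.toUpperCase();
    }
}
{code}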

 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2988) Improve SSTableReader.load() when loading index files

2011-08-09 Thread Melvin Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082008#comment-13082008
 ] 

Melvin Wang commented on CASSANDRA-2988:


bq. First of all I would like to point you to 
http://wiki.apache.org/cassandra/CodeStyle; please modify your code according to 
the conventions listed there.
Sure. This boils down to where to put the curly braces.

bq. please encapsulate your modifications, because if you compare how it was and 
how it is in your patch it's hard to understand and just looks like a mess. I 
would like to suggest moving those modifications to a separate inner class 
(IndexReader maybe?) and replacing only the RandomAccessReader initialization in 
the SSTableReader.load(...) method...
This patch changes most of the load() method; I am not clear how we could change 
only the initialization of RandomAccessReader.

bq. Also I don't quite understand the logic behind while (buffer.remaining() > 10) 
{ in SSTableReader.loadByteBuffer; let's avoid any hardcoding or at least comment 
why you did that.
Sorry for the lack of comments; I will add them. However, this is not really 
hardcoding: a Short is 2 bytes and a Long is 8 bytes, so the sum is 10 bytes. It 
is just a quick check for whether we have reached the end.
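
To make that concrete, a sketch of what the commented version could look like 
(illustrative names only, not the patch itself):

{code}
final class IndexEntrySizes
{
    // An index entry is at least a short key-length prefix (2 bytes) plus a long
    // data-file position (8 bytes), so 10 is the minimum size of one more entry.
    static final int KEY_LENGTH_SIZE = 2;
    static final int POSITION_SIZE   = 8;
    static final int MIN_INDEX_ENTRY = KEY_LENGTH_SIZE + POSITION_SIZE;
}

// e.g. while (buffer.remaining() > IndexEntrySizes.MIN_INDEX_ENTRY) { /* read next entry */ }
{code}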

bq. I'm going to take a closer look at the patch for parallel index file loading 
after we are done with the index reader patch (c2988-modified-buffer.patch).
FYI, these two patches are completely independent of each other.

 Improve SSTableReader.load() when loading index files
 -

 Key: CASSANDRA-2988
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Melvin Wang
Assignee: Melvin Wang
Priority: Minor
 Fix For: 1.0

 Attachments: c2988-modified-buffer.patch, 
 c2988-parallel-load-sstables.patch


 * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
 hurts the read performance because we always process the index files 
 sequentially. Simple fix would be set it to false.
 * multiple index files of a single column family can be loaded in parallel. 
 This buys a lot when you have multiple super large index files.
 * we may also change how we buffer. By using BufferredRandomAccessFile, for 
 every read, we need bunch of checking like
   - do we need to rebuffer?
   - isEOF()?
   - assertions
   These can be simplified to some extent.  We can blindly buffer the index 
 file by chunks and process the buffer until a key lies across boundary of a 
 chunk. Then we rebuffer and start from the beginning of the partially read 
 key. Conceptually, this is same as what BRAF does but w/o the overhead in the 
 read**() methods in BRAF.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2777) Pig storage handler should implement LoadMetadata

2011-08-09 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-2777:


Attachment: 2777-v2.txt

v2 rebased.

 Pig storage handler should implement LoadMetadata
 -

 Key: CASSANDRA-2777
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2777
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.7.9

 Attachments: 2777-v2.txt, 2777.txt


 The reason for this is many builtin functions like SUM won't work on longs 
 (you can workaround using LongSum, but that's lame) because the query planner 
 doesn't know about the types beforehand, even though we are casting to native 
 longs.
 There is some impact to this, though.  With LoadMetadata implemented, 
 existing scripts that specify schema will need to remove it (since LM is 
 doing it for them) and they will need to conform to LM's terminology (key, 
 columns, name, value) within the script.  This is trivial to change, however, 
 and the increased functionality is worth the switch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2988) Improve SSTableReader.load() when loading index files

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082016#comment-13082016
 ] 

Jonathan Ellis commented on CASSANDRA-2988:
---

bq. Short consists of 2 bytes and Long consists of 8 bytes, the sum is 10 bytes

IMO that's more obvious if you leave it as 2 + 8, or use the DBConstants 
class.

 Improve SSTableReader.load() when loading index files
 -

 Key: CASSANDRA-2988
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Melvin Wang
Assignee: Melvin Wang
Priority: Minor
 Fix For: 1.0

 Attachments: c2988-modified-buffer.patch, 
 c2988-parallel-load-sstables.patch


 * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
 hurts the read performance because we always process the index files 
 sequentially. Simple fix would be set it to false.
 * multiple index files of a single column family can be loaded in parallel. 
 This buys a lot when you have multiple super large index files.
 * we may also change how we buffer. By using BufferredRandomAccessFile, for 
 every read, we need bunch of checking like
   - do we need to rebuffer?
   - isEOF()?
   - assertions
   These can be simplified to some extent.  We can blindly buffer the index 
 file by chunks and process the buffer until a key lies across boundary of a 
 chunk. Then we rebuffer and start from the beginning of the partially read 
 key. Conceptually, this is same as what BRAF does but w/o the overhead in the 
 read**() methods in BRAF.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2810) RuntimeException in Pig when using dump command on column name

2011-08-09 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-2810:


Attachment: 2810-v2.txt

It looks like the final problem here is that IntegerType always returns a 
BigInteger, which pig does not like.  This is unfortunate since IntegerType 
can't be easily subclassed and overridden to return ints.

v2 instead adds a setTupleValue method that is always used for adding values to 
tuples; it houses all the special-casing currently needed and provides a spot for 
more in the future, rather than proliferating custom type converters, since I'm 
sure IntegerType won't be the only offender here.
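
Roughly the shape of the idea (a hedged sketch under the assumption described 
above, not the attached 2810-v2.txt):

{code}
import java.math.BigInteger;

import org.apache.pig.backend.executionengine.ExecException;
import org.apache.pig.data.Tuple;

final class TupleValues
{
    // Funnel every value through one helper so type special-casing lives in a single place.
    static void setTupleValue(Tuple tuple, int index, Object value) throws ExecException
    {
        if (value instanceof BigInteger)
            tuple.set(index, ((BigInteger) value).intValue()); // pig has no tuple type for BigInteger
        else
            tuple.set(index, value);
    }
}
{code}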

 RuntimeException in Pig when using dump command on column name
 

 Key: CASSANDRA-2810
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2810
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.1
 Environment: Ubuntu 10.10, 32 bits
 java version 1.6.0_24
 Brisk beta-2 installed from Debian packages
Reporter: Silvère Lestang
Assignee: Brandon Williams
 Attachments: 2810-v2.txt, 2810.txt


 This bug was previously report on [Brisk bug 
 tracker|https://datastax.jira.com/browse/BRISK-232].
 In cassandra-cli:
 {code}
 [default@unknown] create keyspace Test
 with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
 and strategy_options = [{replication_factor:1}];
 [default@unknown] use Test;
 Authenticated to keyspace: Test
 [default@Test] create column family test;
 [default@Test] set test[ascii('row1')][long(1)]=integer(35);
 set test[ascii('row1')][long(2)]=integer(36);
 set test[ascii('row1')][long(3)]=integer(38);
 set test[ascii('row2')][long(1)]=integer(45);
 set test[ascii('row2')][long(2)]=integer(42);
 set test[ascii('row2')][long(3)]=integer(33);
 [default@Test] list test;
 Using default limit of 100
 ---
 RowKey: 726f7731
 = (column=0001, value=35, timestamp=1308744931122000)
 = (column=0002, value=36, timestamp=1308744931124000)
 = (column=0003, value=38, timestamp=1308744931125000)
 ---
 RowKey: 726f7732
 = (column=0001, value=45, timestamp=1308744931127000)
 = (column=0002, value=42, timestamp=1308744931128000)
 = (column=0003, value=33, timestamp=1308744932722000)
 2 Rows Returned.
 [default@Test] describe keyspace;
 Keyspace: Test:
   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
   Durable Writes: true
 Options: [replication_factor:1]
   Column Families:
 ColumnFamily: test
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: 
 org.apache.cassandra.db.marshal.BytesType
   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds: 0.0/0
   Key cache size / save period in seconds: 20.0/14400
   Memtable thresholds: 0.571875/122/1440 (millions of ops/MB/minutes)
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: false
   Built indexes: []
 {code}
 In Pig command line:
 {code}
 grunt test = LOAD 'cassandra://Test/test' USING CassandraStorage() AS 
 (rowkey:chararray, columns: bag {T: (name:long, value:int)});
 grunt value_test = foreach test generate rowkey, columns.name, columns.value;
 grunt dump value_test;
 {code}
 In /var/log/cassandra/system.log, I have severals time this exception:
 {code}
 INFO [IPC Server handler 3 on 8012] 2011-06-22 15:03:28,533 
 TaskInProgress.java (line 551) Error from 
 attempt_201106210955_0051_m_00_3: java.lang.RuntimeException: Unexpected 
 data type -1 found in stream.
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
   at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
   at 
 org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
   at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
   at 
 org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:638)
   at 
 

[jira] [Commented] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082039#comment-13082039
 ] 

Jonathan Ellis commented on CASSANDRA-2982:
---

Want to give a high-level overview of the changes here?

 Refactor secondary index api
 

 Key: CASSANDRA-2982
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: T Jake Luciani
Assignee: T Jake Luciani
 Fix For: 1.0

 Attachments: 2982-v1.txt


 Secondary indexes currently make some bad assumptions about the underlying 
 indexes.
 1. That they are always stored in other column families.
 2. That there is a unique index per column
 In the case of CASSANDRA-2915 neither of these are true.  The new api should 
 abstract the search concepts and allow any search api to plug in.
 Once the code is refactored and basically pluggable we can remove the 
 IndexType enum and use class names similar to how we handle partitioners and 
 comparators.
 Basic api is to add a SecondaryIndexManager that handles different index 
 types per CF and a SecondaryIndex base class that handles a particular type 
 implementation.
 This requires major changes to ColumnFamilyStore and Table.IndexBuilder

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082091#comment-13082091
 ] 

Boris Yen commented on CASSANDRA-3006:
--

Here is the test program I am using now; the hector version is 0.8.0-2.
Hope this will be helpful.


import java.util.Arrays;

import me.prettyprint.cassandra.model.AllOneConsistencyLevelPolicy;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ThriftCluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;


public class CounterTest {
    private Logger logger = LoggerFactory.getLogger(CounterTest.class);
    private static final Integer COUNTER_NUM = 1000;
    private static final StringSerializer ss = StringSerializer.get();
    private static final String HOST = "172.17.19.151:9160";
    private ThriftCluster cluster;

    /**
     * @param args
     */
    public static void main(String[] args) {
        CounterTest tc = new CounterTest();

        try {
            tc.testAlarmCounter();
        } catch (InterruptedException e) {

        }
    }

    public CounterTest() {
        CassandraHostConfigurator chc = new CassandraHostConfigurator(HOST);
        chc.setMaxActive(100);
        chc.setMaxIdle(10);
        chc.setCassandraThriftSocketTimeout(6);

        cluster = new ThriftCluster("Test Cluster", chc);
    }

    public void testAlarmCounter() throws InterruptedException {
        int successCounter = 0;
        int cl = 0;

        for (int i = 0; i < COUNTER_NUM; i++) {
            try {
                logger.info("count: " + i);

                Mutator<String> mutator =
                        HFactory.createMutator(getKeyspace(cl), StringSerializer.get());

                HCounterColumn<String> column =
                        HFactory.createCounterColumn("testSC", 1L);
                mutator.addCounter("sc", "testCounter",
                        HFactory.createCounterSuperColumn("testC", Arrays.asList(column), ss, ss));
                mutator.execute();

                successCounter++;
            } catch (Exception e) {
                logger.info("Error! Change consistency level to 1.", e);
                cl = 1;
            }

            Thread.sleep(50);
        }

        logger.info("\nsuccess counter: " + successCounter);
    }

    private Keyspace getKeyspace(int cl) {
        if (cl == 1)
            return HFactory.createKeyspace("test", cluster,
                    new AllOneConsistencyLevelPolicy());
        else
            return HFactory.createKeyspace("test", cluster); // default consistency level is Quorum
    }
}

 Enormous counter 
 -

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen
Assignee: Sylvain Lebresne

 I have two-node cluster with the following keyspace and column family 
 settings.
 Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
 Keyspace: test:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
 Options: [datacenter1:2]
   Column Families:
 ColumnFamily: testCounter (Super)
 APP status information.
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: 
 org.apache.cassandra.db.marshal.CounterColumnType
   Columns sorted by: 
 org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds: 0.0/0
   Key cache size / save period in seconds: 20.0/14400
   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
 

[jira] [Commented] (CASSANDRA-2991) Add a 'load new sstables' JMX/nodetool command

2011-08-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082097#comment-13082097
 ] 

Jonathan Ellis commented on CASSANDRA-2991:
---

What about the restore snapshot scenario?

 Add a 'load new sstables' JMX/nodetool command
 --

 Key: CASSANDRA-2991
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2991
 Project: Cassandra
  Issue Type: New Feature
Reporter: Brandon Williams
Priority: Minor
 Fix For: 0.8.4


 Sometimes people have to create a new cluster to get around a problem and 
 need to copy sstables around.  It would be convenient to be able to trigger 
 this from nodetool or JMX instead of doing a restart of the node.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1608) Redesigned Compaction

2011-08-09 Thread Benjamin Coverston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Coverston updated CASSANDRA-1608:
--

Attachment: 1608-v13.txt

1608 without some of the cruft

 Redesigned Compaction
 -

 Key: CASSANDRA-1608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Assignee: Benjamin Coverston
 Attachments: 1608-v11.txt, 1608-v13.txt, 1608-v2.txt


 After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
 thinking on this subject that I wanted to lay out.
 I propose we redo the concept of how compaction works in Cassandra. At the 
 moment, compaction is kicked off based on a write access pattern, not read 
 access pattern. In most cases, you want the opposite. You want to be able to 
 track how well each SSTable is performing in the system. If we were to keep 
 statistics in-memory of each SSTable, prioritize them based on most accessed, 
 and bloom filter hit/miss ratios, we could intelligently group sstables that 
 are being read most often and schedule them for compaction. We could also 
 schedule lower priority maintenance on SSTable's not often accessed.
 I also propose we limit the size of each SSTable to a fix sized, that gives 
 us the ability to  better utilize our bloom filters in a predictable manner. 
 At the moment after a certain size, the bloom filters become less reliable. 
 This would also allow us to group data most accessed. Currently the size of 
 an SSTable can grow to a point where large portions of the data might not 
 actually be accessed as often.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3006) Enormous counter

2011-08-09 Thread Boris Yen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082122#comment-13082122
 ] 

Boris Yen commented on CASSANDRA-3006:
--

In order to make it easier to reproduce this issue, here is how I recreate it 
step by step.

1. Clean out anything inside /var/lib/cassandra on node 172.17.19.151.

2. Start cassandra on node 172.17.19.151.

3. Clean out anything inside /var/lib/cassandra on node 172.17.19.152.

4. Modify the cassandra.yaml of 172.17.19.152 and add 172.17.19.151 as a seed.

5. Start cassandra on node 172.17.19.152. I could see the two nodes had formed a 
cluster; I also double-checked that using nodetool.

6. On node 172.17.19.151, I use cassandra-cli to connect to 172.17.19.151/9160 
and execute these commands -

create keyspace test
with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options = [{datacenter1:2}];

create column family testCounter
with column_type = Super
and default_validation_class = CounterColumnType
and replicate_on_write = true
and comparator = BytesType
and subcomparator = BytesType
and comment = 'APP status information.';

7. Use the test program to add to the counter 1000 times. Between each add, the 
program pauses 50 milliseconds.

8. In the middle of the adding process, shut down the cassandra on node 
172.17.19.152 (let's say I shut down node 172.17.19.152 when the count is 200). 
Because the test program changes the consistency level to One when it encounters 
an exception (a timeout exception, to be exact), the following adds still succeed.

9. Wait for the overall adding process to complete. I saw "success counter: 999" 
due to the one exception.

10. Use cassandra-cli to connect to 172.17.19.151 and 172.17.19.152 and check the 
counter value; it is 1001 on both nodes. It shows 1001 because hector retries 
when it encounters the timeout exception.

11. Shut down the cassandra on 172.17.19.151 and wait a few seconds; I saw 
"InetAddress /172.17.19.151 is now dead" on node 172.17.19.152.

12. After seeing "InetAddress /172.17.19.151 is now dead", restart the cassandra 
on node 172.17.19.151.

13. Check the counter again with cassandra-cli on both nodes. This time the 
counter is no longer 1001; it is some other weird number.

Hope someone else can recreate it with these steps.

 Enormous counter 
 -

 Key: CASSANDRA-3006
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3006
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.3
 Environment: ubuntu 10.04
Reporter: Boris Yen
Assignee: Sylvain Lebresne

 I have two-node cluster with the following keyspace and column family 
 settings.
 Cluster Information:
Snitch: org.apache.cassandra.locator.SimpleSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions: 
   63fda700-c243-11e0--2d03dcafebdf: [172.17.19.151, 172.17.19.152]
 Keyspace: test:
   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
   Durable Writes: true
 Options: [datacenter1:2]
   Column Families:
 ColumnFamily: testCounter (Super)
 APP status information.
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   Default column value validator: 
 org.apache.cassandra.db.marshal.CounterColumnType
   Columns sorted by: 
 org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
   Row cache size / save period in seconds: 0.0/0
   Key cache size / save period in seconds: 20.0/14400
   Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
   GC grace seconds: 864000
   Compaction min/max thresholds: 4/32
   Read repair chance: 1.0
   Replicate on write: true
   Built indexes: []
 Then, I use a test program based on hector to add a counter column 
 (testCounter[sc][column]) 1000 times. In the middle the adding process, I 
 intentional shut down the node 172.17.19.152. In addition to that, the test 
 program is smart enough to switch the consistency level from Quorum to One, 
 so that the following adding actions would not fail. 
 After all the adding actions are done, I start the cassandra on 
 172.17.19.152, and I use cassandra-cli to check if the counter is correct on 
 both nodes, and I got a result 1001 which should be reasonable because hector 
 will retry once. However, when I shut down 172.17.19.151 and after 
 172.17.19.152 is aware of 172.17.19.151 is down, I try to start the cassandra 
 on 172.17.19.151 again. Then, I check the counter again, this time I got a 
 result 481387 which is so wrong.
 I use 0.8.3 to reproduce this bug, but I think this also happens on 0.8.2 or 
 before also. 

--
This message is automatically 

[jira] [Commented] (CASSANDRA-2982) Refactor secondary index api

2011-08-09 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082123#comment-13082123
 ] 

T Jake Luciani commented on CASSANDRA-2982:
---

Sure. I've abstracted the index management for a column family into 
SecondaryIndexManager. For a particular column, an index type can be specified 
that is implemented by a SecondaryIndex subclass. 

Index building and updating works the same as before but is now encapsulated by 
this API. The search API is abstracted by a custom SecondaryIndexSearcher 
subclass, which handles searching an IndexClause for columns of a specific index 
type. 

This does not support searching across index types, so all queries must use index 
expressions of the same index type; otherwise you get an exception. 

The one thing I might change is not exposing the cfs indexmanager variable, and 
instead exposing all the index manager calls as part of the CFS API, which would 
delegate to the index manager. 
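
A rough sketch of the shapes described above, for readers who have not opened the 
patch (illustrative names and signatures only; the attached 2982-v1.txt is 
authoritative):

{code}
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// One index implementation per indexed column / index type.
abstract class SecondaryIndexSketch
{
    abstract void index(ByteBuffer rowKey);  // build or update entries for a row
    abstract void delete(ByteBuffer rowKey); // remove entries for a row
}

// Per-CF manager: owns the indexes and hands queries to the right searcher.
final class SecondaryIndexManagerSketch
{
    private final Map<ByteBuffer, SecondaryIndexSketch> indexesByColumn =
            new HashMap<ByteBuffer, SecondaryIndexSketch>();

    void addIndex(ByteBuffer column, SecondaryIndexSketch index)
    {
        indexesByColumn.put(column, index);
    }

    SecondaryIndexSketch indexFor(ByteBuffer column)
    {
        return indexesByColumn.get(column);
    }
}
{code}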

 Refactor secondary index api
 

 Key: CASSANDRA-2982
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2982
 Project: Cassandra
  Issue Type: Sub-task
  Components: Core
Reporter: T Jake Luciani
Assignee: T Jake Luciani
 Fix For: 1.0

 Attachments: 2982-v1.txt


 Secondary indexes currently make some bad assumptions about the underlying 
 indexes.
 1. That they are always stored in other column families.
 2. That there is a unique index per column
 In the case of CASSANDRA-2915 neither of these are true.  The new api should 
 abstract the search concepts and allow any search api to plug in.
 Once the code is refactored and basically pluggable we can remove the 
 IndexType enum and use class names similar to how we handle partitioners and 
 comparators.
 Basic api is to add a SecondaryIndexManager that handles different index 
 types per CF and a SecondaryIndex base class that handles a particular type 
 implementation.
 This requires major changes to ColumnFamilyStore and Table.IndexBuilder

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2170) Load spikes

2011-08-09 Thread Jason Harvey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082138#comment-13082138
 ] 

Jason Harvey commented on CASSANDRA-2170:
-

Re-opening per request of driftx.

We are still seeing this problem, ever since our upgrade from 0.6.7.

It is 100% consistent on 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.8.0, and 0.8.1. I've tried 
the Sun JRE and OpenJDK, with JNA and without, on Ubuntu 8.04/10.04/10.10/11.04 
as well as RHEL 5.1. It *only* happens on coordinator nodes.

For the 0.8 ring, I created a brand new ring and added data from our app one CF 
at a time. As soon as I added a busy CF, the problem popped up again. The load on 
the boxes in the new ring is under 1 all the time, except when the load spike 
occurs.

 Load spikes
 ---

 Key: CASSANDRA-2170
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2170
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.11
Reporter: Jonathan Ellis

 as reported on CASSANDRA-2058, some users are still seeing load spikes on 
 0.6.11, even with fairly low-volume read workloads.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (CASSANDRA-2170) Load spikes

2011-08-09 Thread Brandon Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reopened CASSANDRA-2170:
-

  Assignee: Brandon Williams

 Load spikes
 ---

 Key: CASSANDRA-2170
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2170
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.6.11
Reporter: Jonathan Ellis
Assignee: Brandon Williams

 as reported on CASSANDRA-2058, some users are still seeing load spikes on 
 0.6.11, even with fairly low-volume read workloads.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira