[jira] [Comment Edited] (CASSANDRA-13004) Corruption while adding a column to a table

2017-05-21 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019165#comment-16019165
 ] 

Jeff Jirsa edited comment on CASSANDRA-13004 at 5/22/17 5:46 AM:
-

Thanks for the script! Repro'd for me on the first try on 3.0.13.

{code}
MacBook-Pro:cassandra-13004 jjirsa$ ./stress
2017/05/21 22:34:18 ERROR ON READ 295331734109814785: java.io.IOError: 
java.io.IOException: Corrupt flags value for unfiltered partition (isStatic 
flag set): 252
2017/05/21 22:34:18 ERROR ON READ 295331720679522316: java.io.IOError: 
java.io.EOFException: EOF after 23415 bytes out of 262146
{code}
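The first error above comes from a sanity check on the per-row flags byte during deserialization. As a rough illustration only (the bit value and names below are invented, not Cassandra's actual wire format), the shape of such a check is:

```python
# Illustrative sketch only: the flag bit below is invented, not the real
# SSTable/messaging wire format. It shows the shape of the check that raises
# "Corrupt flags value for unfiltered partition (isStatic flag set)".

IS_STATIC = 0x10  # hypothetical bit: a static row is only legal in the partition header

def check_row_flags(flags, in_partition_header):
    """Reject a flags byte that claims 'static row' outside the partition header."""
    if not in_partition_header and flags & IS_STATIC:
        raise IOError(
            "Corrupt flags value for unfiltered partition "
            "(isStatic flag set): %d" % flags)

# 252 == 0b11111100 has the (hypothetical) static bit set, so a row read
# mid-partition with these flags is rejected rather than mis-parsed:
try:
    check_row_flags(252, in_partition_header=False)
except IOError as e:
    print(e)
```

The real check lives in Cassandra's unfiltered-row serializer; the point is that a garbage flags byte (here 252) is caught early instead of being interpreted as row data.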

127.0.0.1:

{code}
ERROR [MessagingService-Incoming-/127.0.0.3] 2017-05-21 22:34:18,447 
CassandraDaemon.java:207 - Exception in thread 
Thread[MessagingService-Incoming-/127.0.0.3,5,main]
java.lang.RuntimeException: Unknown column aaa during deserialization
at 
org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:432) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.SerializationHeader$Serializer.deserializeForMessaging(SerializationHeader.java:428)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.deserializeHeader(UnfilteredRowIteratorSerializer.java:190)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize30(PartitionUpdate.java:661)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.deserialize(PartitionUpdate.java:635)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:327)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:346)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:283)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at org.apache.cassandra.net.MessageIn.read(MessageIn.java:98) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:201)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:178)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:92)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
{code}
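The "Unknown column aaa during deserialization" failure is the receiving node decoding the message's serialization header against a column set that no longer matches the sender's (the ALTER racing with live traffic). A toy model of why that class of mismatch corrupts the stream, an assumption for illustration rather than Cassandra's actual encoding:

```python
# Toy model (an assumption for illustration -- NOT Cassandra's real wire
# format): the sender encodes which columns a row carries as positions into
# its own sorted column list. If the receiver's schema disagrees -- e.g. an
# ALTER TABLE ... ADD raced with this message -- those positions resolve to
# the wrong names, or past the end of the receiver's list.

def encode_columns(superset, present):
    """Sender side: subset of columns -> indices into the sender's superset."""
    return [superset.index(c) for c in present]

def decode_columns(superset, indices):
    """Receiver side: indices -> names, using the *receiver's* superset."""
    for i in indices:
        if i >= len(superset):
            raise RuntimeError("Unknown column index %d during deserialization" % i)
    return [superset[i] for i in indices]

sender_schema = ["aaa", "bbb", "ccc"]   # sender has applied ALTER ... ADD aaa
receiver_schema = ["bbb", "ccc"]        # receiver has not seen the new column yet

indices = encode_columns(sender_schema, ["aaa", "ccc"])  # [0, 2]
try:
    decode_columns(receiver_schema, indices)
except RuntimeError as e:
    print(e)  # index 2 is past the end of the receiver's column list
```

Note the nastier failure mode: an index that is still in range simply decodes to the wrong column name, which produces garbage values (and the EOF/flags errors above) instead of a clean exception.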

127.0.0.3:

{code}
ERROR [SharedPool-Worker-5] 2017-05-21 22:34:18,460 Message.java:621 - 
Unexpected exception during request; channel = [id: 0x7b0819ca, 
L:/127.0.0.3:9042 - R:/127.0.0.3:59709]
java.io.IOError: java.io.EOFException: EOF after 23415 bytes out of 262146
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:222)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer$1.computeNext(UnfilteredRowIteratorSerializer.java:210)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:369)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:189)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:158)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:509)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:369)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129) 
~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.cql3.statements.SelectStatement.processPartition(SelectStatement.java:774)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
at 
org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:711)
 ~[apache-cassandra-3.0.13.jar:3.0.13]
{code}

[jira] [Commented] (CASSANDRA-13209) test failure in cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_blogposts_with_max_connections

2017-05-21 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019158#comment-16019158
 ] 

Stefania commented on CASSANDRA-13209:
--

bq. however it will never go past 1.

If it succeeds in importing the rows on a subsequent attempt, it would not log 
attempt no. 2.

bq. Going to fix the error handling in the COPY FROM command to actually do 
retries (like COPY TO does),

COPY FROM does retry on timeouts; otherwise 
{{test_bulk_round_trip_with_timeouts}} would always fail. The code that retries 
is in the [error 
callback|https://github.com/apache/cassandra/blob/trunk/pylib/cqlshlib/copyutil.py#L2583].
 The [two 
lines|https://github.com/apache/cassandra/blob/trunk/pylib/cqlshlib/copyutil.py#L2579]
 just above it are there because of 
[PYTHON-652|https://datastax-oss.atlassian.net/browse/PYTHON-652], which 
doesn't seem to be fixed yet. You may want to check whether these lines cause 
some timeouts not to get retried, but I doubt it. The reason COPY TO and 
COPY FROM have different retry mechanisms is performance (CASSANDRA-11053).

bq. Also once the COPY FROM code in cqlsh gets over 1000 failed rows it exits,

This is configurable, see 
[here|https://github.com/apache/cassandra/blob/trunk/pylib/cqlshlib/copyutil.py#L362].
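The retry-on-timeout and failed-row cap just described can be sketched roughly as follows. All names and structure here are hypothetical and simplified; the real logic is copyutil.py's error callback, driven by COPY options such as MAXATTEMPTS and MAXINSERTERRORS:

```python
# Hedged sketch of the retry pattern described above; names and structure
# are hypothetical -- the real logic is in copyutil.py's error callback,
# controlled by COPY options such as MAXATTEMPTS and MAXINSERTERRORS.

class OperationTimedOut(Exception):
    """Stand-in for the driver's client-side timeout."""

class ImportTask:
    def __init__(self, max_attempts=5, max_insert_errors=1000):
        self.max_attempts = max_attempts            # per-batch retries on timeout
        self.max_insert_errors = max_insert_errors  # abort once this many rows fail
        self.failed_rows = 0

    def send_batch(self, batch, attempts=1):
        try:
            self.execute(batch)
        except OperationTimedOut:
            if attempts < self.max_attempts:
                # Timeouts are retried (as COPY FROM does); other errors are not.
                self.send_batch(batch, attempts + 1)
            else:
                self.record_failure(batch)

    def record_failure(self, batch):
        self.failed_rows += len(batch)
        if self.failed_rows > self.max_insert_errors:
            raise SystemExit("too many failed rows; aborting import")

    def execute(self, batch):
        raise OperationTimedOut()  # stand-in: every request times out here

task = ImportTask(max_attempts=3, max_insert_errors=10)
task.send_batch(["row1", "row2"])
print(task.failed_rows)  # the batch is retried 3 times, then counted as failed
```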

> test failure in 
> cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest.test_bulk_round_trip_blogposts_with_max_connections
> --
>
> Key: CASSANDRA-13209
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13209
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Assignee: Kurt Greaves
>  Labels: dtest, test-failure
> Attachments: node1.log, node2.log, node3.log, node4.log, node5.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.1_dtest/528/testReport/cqlsh_tests.cqlsh_copy_tests/CqlshCopyTest/test_bulk_round_trip_blogposts_with_max_connections
> {noformat}
> Error Message
> errors={'127.0.0.4': 'Client request timeout. See 
> Session.execute[_async](timeout)'}, last_host=127.0.0.4
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-792s6j
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: removing ccm cluster test at: /tmp/dtest-792s6j
> dtest: DEBUG: clearing ssl stores from [/tmp/dtest-792s6j] directory
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-uNMsuW
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.policies: INFO: Using datacenter 'datacenter1' for 
> DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify 
> a local_dc to the constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  
> discovered
> cassandra.cluster: INFO: New Cassandra host  
> discovered
> cassandra.cluster: INFO: New Cassandra host  
> discovered
> cassandra.cluster: INFO: New Cassandra host  
> discovered
> dtest: DEBUG: Running stress with user profile 
> /home/automaton/cassandra-dtest/cqlsh_tests/blogposts.yaml
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/dtest.py", line 1090, in wrapped
> f(obj)
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 2571, in test_bulk_round_trip_blogposts_with_max_connections
> copy_from_options={'NUMPROCESSES': 2})
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 2500, in _test_bulk_round_trip
> num_records = create_records()
>   File "/home/automaton/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 2473, in create_records
> ret = rows_to_list(self.session.execute(count_statement))[0][0]
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 1998, in execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "/home/automaton/src/cassandra-driver/cassandra/cluster.py", line 
> 3784, in result
> 

[jira] [Updated] (CASSANDRA-13004) Corruption while adding a column to a table

2017-05-21 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13004:
---
Fix Version/s: 4.x
   3.11.x
   3.0.x

> Corruption while adding a column to a table
> ---
>
> Key: CASSANDRA-13004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13004
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stanislav Vishnevskiy
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We had the following schema in production. 
> {code:none}
> CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient (
> nick text
> );
> CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite (
> id bigint,
> type int,
> allow_ int,
> deny int
> );
> CREATE TABLE IF NOT EXISTS discord_channels.channels (
> id bigint,
> guild_id bigint,
> type tinyint,
> name text,
> topic text,
> position int,
> owner_id bigint,
> icon_hash text,
> recipients map<bigint, frozen<channel_recipient>>,
> permission_overwrites map<bigint, frozen<channel_permission_overwrite>>,
> bitrate int,
> user_limit int,
> last_pin_timestamp timestamp,
> last_message_id bigint,
> PRIMARY KEY (id)
> );
> {code}
> And then we executed the following alter.
> {code:none}
> ALTER TABLE discord_channels.channels ADD application_id bigint;
> {code}
> And one row (that we can tell) got corrupted at the same time and could no 
> longer be read from the Python driver. 
> {code:none}
> [E 161206 01:56:58 geventreactor:141] Error decoding response from Cassandra. 
> ver(4); flags(); stream(27); op(8); offset(9); len(887); buffer: 
> '\x84\x00\x00\x1b\x08\x00\x00\x03w\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x0f\x00\x10discord_channels\x00\x08channels\x00\x02id\x00\x02\x00\x0eapplication_id\x00\x02\x00\x07bitrate\x00\t\x00\x08guild_id\x00\x02\x00\ticon_hash\x00\r\x00\x0flast_message_id\x00\x02\x00\x12last_pin_timestamp\x00\x0b\x00\x04name\x00\r\x00\x08owner_id\x00\x02\x00\x15permission_overwrites\x00!\x00\x02\x000\x00\x10discord_channels\x00\x1cchannel_permission_overwrite\x00\x04\x00\x02id\x00\x02\x00\x04type\x00\t\x00\x06allow_\x00\t\x00\x04deny\x00\t\x00\x08position\x00\t\x00\nrecipients\x00!\x00\x02\x000\x00\x10discord_channels\x00\x11channel_recipient\x00\x01\x00\x04nick\x00\r\x00\x05topic\x00\r\x00\x04type\x00\x14\x00\nuser_limit\x00\t\x00\x00\x00\x01\x00\x00\x00\x08\x03\x8a\x19\x8e\xf8\x82\x00\x01\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\xfa\x00\x00\x00\x00\x08\x00\x00\xfa\x00\x00\xf8G\xc5\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8b\xc0\xb5nB\x00\x02\x00\x00\x00\x08G\xc5\xffI\x98\xc4\xb4(\x00\x00\x00\x03\x8b\xc0\xa8\xff\xff\xff\xff\x00\x00\x01<\x00\x00\x00\x06\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x040\x07\xf8Q\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x07\xf8Q\x00\x00\x00\x04\x10\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x05\xe8A\x00\x00\x00\x04\x10\x02\x00\x00\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x
04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00
>  
> \x08\x01\x00\x00\x00\x04\xc4\xb4(\x00\xff\xff\xff\xff\x00\x00\x00O[f\x80Q\x07general\x05\xf8G\xc5\xffI\x98\xc4\xb4(\x00\xf8O[f\x80Q\x00\x00\x00\x02\x04\xf8O[f\x80Q\x00\xf8G\xc5\xffI\x98\x01\x00\x00\xf8O[f\x80Q\x00\x00\x00\x00\xf8G\xc5\xffI\x97\xc4\xb4(\x06\x00\xf8O\x7fe\x1fm\x08\x03\x00\x00\x00\x01\x00\x00\x00\x00\x04\x00\x00\x00\x00'
> {code}
> And then in cqlsh when trying to read the row we got this. 
> {code:none}
> /usr/bin/cqlsh.py:632: DateOverFlowWarning: Some timestamps are larger than 
> Python datetime can represent. Timestamps are displayed in milliseconds from 
> epoch.
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 1301, in perform_simple_statement
> result = future.result()
>   File 
> "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.5.0.post0-d8d0456.zip/cassandra-driver-3.5.0.post0-d8d0456/cassandra/cluster.py",
>  line 3650, in result
> raise self._final_exception
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 2: 
> invalid start byte
> {code}
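That error is the Python driver failing to decode corrupted column bytes as UTF-8 text; any byte sequence that is not valid UTF-8 fails the same way:

```python
# The driver decodes text columns as UTF-8; bytes shifted by the corruption
# are no longer valid UTF-8 and fail exactly like the cqlsh traceback above.
data = b"\x80general"  # 0x80 can never start a UTF-8 sequence
try:
    data.decode("utf8")
except UnicodeDecodeError as e:
    print(e)  # ... can't decode byte 0x80 ... invalid start byte
```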

[jira] [Updated] (CASSANDRA-13004) Corruption while adding a column to a table

2017-05-21 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13004:
---
Priority: Critical  (was: Major)


[jira] [Updated] (CASSANDRA-13004) Corruption while adding a column to a table

2017-05-21 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-13004:
---
Reproduced In: 3.10, 3.0.9  (was: 3.0.9)

> Corruption while adding a column to a table
> ---
>
> Key: CASSANDRA-13004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13004
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stanislav Vishnevskiy
>
> We had the following schema in production. 
> {code:none}
> CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient (
> nick text
> );
> CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite (
> id bigint,
> type int,
> allow_ int,
> deny int
> );
> CREATE TABLE IF NOT EXISTS discord_channels.channels (
> id bigint,
> guild_id bigint,
> type tinyint,
> name text,
> topic text,
> position int,
> owner_id bigint,
> icon_hash text,
> recipients map<bigint, frozen<channel_recipient>>,
> permission_overwrites map<bigint, frozen<channel_permission_overwrite>>,
> bitrate int,
> user_limit int,
> last_pin_timestamp timestamp,
> last_message_id bigint,
> PRIMARY KEY (id)
> );
> {code}
> And then we executed the following alter.
> {code:none}
> ALTER TABLE discord_channels.channels ADD application_id bigint;
> {code}
> And one row (that we can tell) got corrupted at the same time and could no 
> longer be read from the Python driver. 
> {code:none}
> [E 161206 01:56:58 geventreactor:141] Error decoding response from Cassandra. 
> ver(4); flags(); stream(27); op(8); offset(9); len(887); buffer: 
> '\x84\x00\x00\x1b\x08\x00\x00\x03w\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x0f\x00\x10discord_channels\x00\x08channels\x00\x02id\x00\x02\x00\x0eapplication_id\x00\x02\x00\x07bitrate\x00\t\x00\x08guild_id\x00\x02\x00\ticon_hash\x00\r\x00\x0flast_message_id\x00\x02\x00\x12last_pin_timestamp\x00\x0b\x00\x04name\x00\r\x00\x08owner_id\x00\x02\x00\x15permission_overwrites\x00!\x00\x02\x000\x00\x10discord_channels\x00\x1cchannel_permission_overwrite\x00\x04\x00\x02id\x00\x02\x00\x04type\x00\t\x00\x06allow_\x00\t\x00\x04deny\x00\t\x00\x08position\x00\t\x00\nrecipients\x00!\x00\x02\x000\x00\x10discord_channels\x00\x11channel_recipient\x00\x01\x00\x04nick\x00\r\x00\x05topic\x00\r\x00\x04type\x00\x14\x00\nuser_limit\x00\t\x00\x00\x00\x01\x00\x00\x00\x08\x03\x8a\x19\x8e\xf8\x82\x00\x01\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\xfa\x00\x00\x00\x00\x08\x00\x00\xfa\x00\x00\xf8G\xc5\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8b\xc0\xb5nB\x00\x02\x00\x00\x00\x08G\xc5\xffI\x98\xc4\xb4(\x00\x00\x00\x03\x8b\xc0\xa8\xff\xff\xff\xff\x00\x00\x01<\x00\x00\x00\x06\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x81L\xea\xfc\x82\x00\n\x00\x00\x00\x04\x00\x00\x00\x01\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1e\xe6\x8b\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x040\x07\xf8Q\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1f\x1b{\x82\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x07\xf8Q\x00\x00\x00\x04\x10\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x1fH6\x82\x00\x01\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x05\xe8A\x00\x00\x00\x04\x10\x02\x00\x00\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a+=\xca\xc0\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x08\x00\x00\x00\x00\x
04\x00\x00\x00\x00\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00$\x00\x00\x00\x08\x03\x8a\x8f\x979\x80\x00\n\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x04\x00
>  
> \x08\x01\x00\x00\x00\x04\xc4\xb4(\x00\xff\xff\xff\xff\x00\x00\x00O[f\x80Q\x07general\x05\xf8G\xc5\xffI\x98\xc4\xb4(\x00\xf8O[f\x80Q\x00\x00\x00\x02\x04\xf8O[f\x80Q\x00\xf8G\xc5\xffI\x98\x01\x00\x00\xf8O[f\x80Q\x00\x00\x00\x00\xf8G\xc5\xffI\x97\xc4\xb4(\x06\x00\xf8O\x7fe\x1fm\x08\x03\x00\x00\x00\x01\x00\x00\x00\x00\x04\x00\x00\x00\x00'
> {code}
> And then in cqlsh when trying to read the row we got this. 
> {code:none}
> /usr/bin/cqlsh.py:632: DateOverFlowWarning: Some timestamps are larger than 
> Python datetime can represent. Timestamps are displayed in milliseconds from 
> epoch.
> Traceback (most recent call last):
>   File "/usr/bin/cqlsh.py", line 1301, in perform_simple_statement
> result = future.result()
>   File 
> "/usr/share/cassandra/lib/cassandra-driver-internal-only-3.5.0.post0-d8d0456.zip/cassandra-driver-3.5.0.post0-d8d0456/cassandra/cluster.py",
>  line 3650, in result
> raise self._final_exception
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 2: 
> invalid start byte
> {code}
> We tried to read 

[jira] [Commented] (CASSANDRA-13510) CI for validating cassandra on power platform

2017-05-21 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019150#comment-16019150
 ] 

Jeff Jirsa commented on CASSANDRA-13510:


Your run had 
[2194|https://builds.apache.org/view/A-D/view/Cassandra/job/cassandra-devbranch-ppc64le-testall/lastBuild/testReport/]
 tests run, with 45 failures, 2 of which were the CAPI tests, which we expect to 
fail without the hardware in place. For most patches, we want to see a green 
run (no failures) in unit tests, and single-digit failures (currently about 
5-7) in dtests. From time to time regressions get introduced, and hopefully we 
catch those quickly. If a test is also failing on the trunk and 3.11 branches, 
we don't always block new merges when we're sure the regression was introduced 
elsewhere (though we've discussed, multiple times, that we should always fix 
tests before merging more code, we typically leave it to committer discretion).






> CI for validating cassandra on power platform
> -
>
> Key: CASSANDRA-13510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13510
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Amitkumar Ghatwal
>
> Hi All,
> As I understand it, the CI currently available for Cassandra (to validate 
> any code updates) is http://cassci.datastax.com/view/Dev/, and as can be 
> seen, most of the deployment there is on Intel x86 architecture.
> I just wanted to know your views/comments/suggestions on having a CI for 
> Cassandra on Power:
> 1) Would the community be willing to add ppc64le-based VMs/slaves to the 
> current CI above? Some externally hosted ppc64le VMs could be attached as 
> slaves to the above Jenkins server.
> 2) Alternatively, use an externally hosted Jenkins CI to run the Cassandra 
> build on Power, and link the results of the build to the above CI.
> This ticket is a follow-up on the CI query for Cassandra on Power: 
> https://issues.apache.org/jira/browse/CASSANDRA-13486.
> Please let me know your thoughts.
> Regards,
> Amit



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13510) CI for validating cassandra on power platform

2017-05-21 Thread Amitkumar Ghatwal (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019143#comment-16019143
 ] 

Amitkumar Ghatwal commented on CASSANDRA-13510:
---

[~jjirsa] - Thanks for scheduling build 9; it's returning "Unstable" as of 
now, and I am investigating. I also agree with your strategy for dtests. I 
will keep you posted once the unit tests are passing; after that we can move 
on to the dtests.

Just for information's sake, can you tell me what sort of builds are set up 
for ASF's Cassandra? I understand that there are the following:
1. testall (unit tests)
2. dtest

So which builds are required to pass for any code changes introduced in the 
Cassandra dev branches?


> CI for validating cassandra on power platform
> -
>
> Key: CASSANDRA-13510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13510
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Amitkumar Ghatwal
>
> Hi All,
> As I understand it, the CI currently available for Cassandra (to validate any
> code updates) is http://cassci.datastax.com/view/Dev/, and as can be seen,
> most of the deployment there is on Intel x86 architecture.
> I just wanted to know your views/comments/suggestions on having a CI for
> Cassandra on Power.
> 1) The community may be willing to add VMs/slaves (ppc64le based) to the
> current CI above. Perhaps some externally hosted ppc64le VMs could be
> attached as slaves to the above Jenkins server.
> 2) Use an externally hosted Jenkins CI for running the Cassandra build on
> Power and link the results of the build to the above CI.
> This ticket is just a follow-up to the CI query for Cassandra on Power:
> https://issues.apache.org/jira/browse/CASSANDRA-13486.
> Please let me know your thoughts.
> Regards,
> Amit






[jira] [Comment Edited] (CASSANDRA-13004) Corruption while adding a column to a table

2017-05-21 Thread Andrei Zbikowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019084#comment-16019084
 ] 

Andrei Zbikowski edited comment on CASSANDRA-13004 at 5/22/17 2:55 AM:
---

Hey all, I work with Stan and Jake at Discord and have a few updates for this 
thread that may help drive us towards a solution.

First up, we've been able to produce a minimal reproduction with two sanitized 
rows of our production data set. So far this appears to be a 100% consistent 
reproduction, each time producing corrupted/unreadable data. Through the 
process of building this reproduction we've noticed the following attributes to 
this bug:

- From what we can tell, this is not reproducible with a single node. We've 
tested 1-node, 3-node, and 6-node clusters, with only the 3- and 6-node 
configurations producing corruption.
- The bug does not seem to be related to the client driver; we've tested both 
Python and gocql.
- Appears to exhibit itself on all 3.x versions (we've mostly been testing on 
Cassandra 3.10)

I've uploaded the reproduction case to github over at 
(https://github.com/b1naryth1ef/cassandra-13004). A few notes about this:

- Some stuff might have to be tweaked (e.g. the cluster ip, keyspace 
replication datacenter) depending on your test env.
- Although we reproduced this in Python too, I've only included the Go version 
(mostly because the Python one was more heavily intertwined with our internal 
code). If it helps, I'm more than happy to clean it up and provide a Python 
version too.

To run the reproduction, you must spin up a Cassandra 3.10 cluster and then 
either run the script or the commands within 'test.sh'. This creates the 
keyspace/types/table and loads two rows in (there is a bug with recent versions 
of cqlsh that breaks loading of multiple large rows, so the script loads one 
row at a time). Finally, you can run the Go binary, which will connect to the 
cluster and start reading/writing data. At this point you can run alters (I've 
been using `ALTER TABLE test13004.guilds ADD aaa bigint;`, which seems to work 
every time) and observe the output using cqlsh: `SELECT * FROM test13004.guilds`.

Happy to provide any more details or information as it's needed.




was (Author: b1nzy):
Hey all, I work with Stan and Jake at Discord and have a few updates for this 
thread that may help drive us towards a solution.

First up, we've been able to produce a minimal reproduction with two sanitized 
rows of our production data set. So far this appears to be a 100% consistent 
reproduction, each time producing corrupted/unreadable data. Through the 
process of building this reproduction we've noticed the following attributes to 
this bug:

- From what we can tell, this is not reproducible with a single node. We've 
tested 1-node, 3-node, and 6-node clusters, with only the 3- and 6-node 
configurations producing corruption.
- The bug does not seem to be related to the client driver; we've tested both 
Python and gocql.
- Appears to exhibit itself on all versions (we've mostly been testing on 
Cassandra 3.10)

I've uploaded the reproduction case to github over at 
(https://github.com/b1naryth1ef/cassandra-13004). A few notes about this:

- Some stuff might have to be tweaked (e.g. the cluster ip, keyspace 
replication datacenter) depending on your test env.
- Although we reproduced this in Python too, I've only included the Go version 
(mostly because the Python one was more heavily intertwined with our internal 
code). If it helps, I'm more than happy to clean it up and provide a Python 
version too.

To run the reproduction, you must spin up a Cassandra 3.10 cluster and then 
either run the script or the commands within 'test.sh'. This creates the 
keyspace/types/table and loads two rows in (there is a bug with recent versions 
of cqlsh that breaks loading of multiple large rows, so the script loads one 
row at a time). Finally, you can run the Go binary, which will connect to the 
cluster and start reading/writing data. At this point you can run alters (I've 
been using `ALTER TABLE test13004.guilds ADD aaa bigint;`, which seems to work 
every time) and observe the output using cqlsh: `SELECT * FROM test13004.guilds`.

Happy to provide any more details or information as it's needed.



> Corruption while adding a column to a table
> ---
>
> Key: CASSANDRA-13004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13004
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stanislav Vishnevskiy
>
> We had the following schema in production. 
> {code:none}
> CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient (
> nick text
> );
> CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite (
> id bigint,
> type int,
> allow_ int,
> deny int
> );
> CREATE TABLE IF NOT EXISTS 

[jira] [Commented] (CASSANDRA-13004) Corruption while adding a column to a table

2017-05-21 Thread Andrei Zbikowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019084#comment-16019084
 ] 

Andrei Zbikowski commented on CASSANDRA-13004:
--

Hey all, I work with Stan and Jake at Discord and have a few updates for this 
thread that may help drive us towards a solution.

First up, we've been able to produce a minimal reproduction with two sanitized 
rows of our production data set. So far this appears to be a 100% consistent 
reproduction, each time producing corrupted/unreadable data. Through the 
process of building this reproduction we've noticed the following attributes to 
this bug:

- From what we can tell, this is not reproducible with a single node. We've 
tested 1-node, 3-node, and 6-node clusters, with only the 3- and 6-node 
configurations producing corruption.
- The bug does not seem to be related to the client driver; we've tested both 
Python and gocql.
- Appears to exhibit itself on all versions (we've mostly been testing on 
Cassandra 3.10)

I've uploaded the reproduction case to github over at 
(https://github.com/b1naryth1ef/cassandra-13004). A few notes about this:

- Some stuff might have to be tweaked (e.g. the cluster ip, keyspace 
replication datacenter) depending on your test env.
- Although we reproduced this in Python too, I've only included the Go version 
(mostly because the Python one was more heavily intertwined with our internal 
code). If it helps, I'm more than happy to clean it up and provide a Python 
version too.

To run the reproduction, you must spin up a Cassandra 3.10 cluster and then 
either run the script or the commands within 'test.sh'. This creates the 
keyspace/types/table and loads two rows in (there is a bug with recent versions 
of cqlsh that breaks loading of multiple large rows, so the script loads one 
row at a time). Finally, you can run the Go binary, which will connect to the 
cluster and start reading/writing data. At this point you can run alters (I've 
been using `ALTER TABLE test13004.guilds ADD aaa bigint;`, which seems to work 
every time) and observe the output using cqlsh: `SELECT * FROM test13004.guilds`.

Happy to provide any more details or information as it's needed.



> Corruption while adding a column to a table
> ---
>
> Key: CASSANDRA-13004
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13004
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stanislav Vishnevskiy
>
> We had the following schema in production. 
> {code:none}
> CREATE TYPE IF NOT EXISTS discord_channels.channel_recipient (
> nick text
> );
> CREATE TYPE IF NOT EXISTS discord_channels.channel_permission_overwrite (
> id bigint,
> type int,
> allow_ int,
> deny int
> );
> CREATE TABLE IF NOT EXISTS discord_channels.channels (
> id bigint,
> guild_id bigint,
> type tinyint,
> name text,
> topic text,
> position int,
> owner_id bigint,
> icon_hash text,
> recipients map<bigint, frozen<channel_recipient>>,
> permission_overwrites map<bigint, frozen<channel_permission_overwrite>>,
> bitrate int,
> user_limit int,
> last_pin_timestamp timestamp,
> last_message_id bigint,
> PRIMARY KEY (id)
> );
> {code}
> And then we executed the following alter.
> {code:none}
> ALTER TABLE discord_channels.channels ADD application_id bigint;
> {code}
> And one row (that we can tell) got corrupted at the same time and could no 
> longer be read from the Python driver. 
> {code:none}
> [E 161206 01:56:58 geventreactor:141] Error decoding response from Cassandra. 
> ver(4); flags(); stream(27); op(8); offset(9); len(887); buffer: 
> 

[jira] [Commented] (CASSANDRA-13541) Mark couple of API methods for Compaction as Deprecated

2017-05-21 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019080#comment-16019080
 ] 

Lerh Chuan Low commented on CASSANDRA-13541:


Feel free to let me know if there's anything you would like changed :)

https://github.com/apache/cassandra/pull/114

> Mark couple of API methods for Compaction as Deprecated
> ---
>
> Key: CASSANDRA-13541
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13541
> Project: Cassandra
>  Issue Type: Task
>Reporter: Lerh Chuan Low
>Assignee: Lerh Chuan Low
>Priority: Trivial
>
> A follow up from 
> https://issues.apache.org/jira/browse/CASSANDRA-13182?focusedCommentId=16013347=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16013347.
>  
> Enabling and disabling Compaction is done via {{CompactionStrategyManager}} 
> so these methods are no longer used. We shouldn't totally remove them as 
> people may have written plugins for CompactionStrategy. 






[jira] [Commented] (CASSANDRA-13541) Mark couple of API methods for Compaction as Deprecated

2017-05-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019079#comment-16019079
 ] 

ASF GitHub Bot commented on CASSANDRA-13541:


GitHub user juiceblender opened a pull request:

https://github.com/apache/cassandra/pull/114

Cassandra-13541 Mark couple of API methods for compactions as deprecated

As they are handled by CompactionStrategyManager. 

https://issues.apache.org/jira/browse/CASSANDRA-13541

@krummas 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/juiceblender/cassandra 
deprecate-compaction-API-methods

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/114.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #114


commit cc1a1e8bf2138fa9fa3659ff63479111fc87da49
Author: Lerh Chuan Low 
Date:   2017-05-22T02:20:25Z

Cassandra-13541 Mark couple of API methods for compactions as deprecated as 
they are handled by CompactionStrategyManager




> Mark couple of API methods for Compaction as Deprecated
> ---
>
> Key: CASSANDRA-13541
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13541
> Project: Cassandra
>  Issue Type: Task
>Reporter: Lerh Chuan Low
>Assignee: Lerh Chuan Low
>Priority: Trivial
>
> A follow up from 
> https://issues.apache.org/jira/browse/CASSANDRA-13182?focusedCommentId=16013347=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16013347.
>  
> Enabling and disabling Compaction is done via {{CompactionStrategyManager}} 
> so these methods are no longer used. We shouldn't totally remove them as 
> people may have written plugins for CompactionStrategy. 






[jira] [Created] (CASSANDRA-13541) Mark couple of API methods for Compaction as Deprecated

2017-05-21 Thread Lerh Chuan Low (JIRA)
Lerh Chuan Low created CASSANDRA-13541:
--

 Summary: Mark couple of API methods for Compaction as Deprecated
 Key: CASSANDRA-13541
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13541
 Project: Cassandra
  Issue Type: Task
Reporter: Lerh Chuan Low
Assignee: Lerh Chuan Low
Priority: Trivial


A follow up from 
https://issues.apache.org/jira/browse/CASSANDRA-13182?focusedCommentId=16013347=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16013347.
 

Enabling and disabling Compaction is done via {{CompactionStrategyManager}} so 
these methods are no longer used. We shouldn't totally remove them as people 
may have written plugins for CompactionStrategy. 
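The deprecate-then-delegate pattern described above can be sketched in a few lines (an illustrative model only, with made-up names such as {{CompactionStrategyManager.disable}} and {{LegacyCompactionApi}}; this is not actual Cassandra code): the old entry point stays callable for plugin authors but emits a deprecation warning and forwards to the class that now owns the behavior.

```python
import warnings

class CompactionStrategyManager:
    """Stand-in for the class that now owns enabling/disabling compaction."""
    def __init__(self):
        self.enabled = True

    def disable(self):
        self.enabled = False

class LegacyCompactionApi:
    """Old API surface kept so third-party plugins don't break immediately."""
    def __init__(self, manager):
        self._manager = manager

    def disable_auto_compaction(self):
        # Warn callers, then delegate instead of duplicating the logic.
        warnings.warn(
            "disable_auto_compaction is deprecated; use CompactionStrategyManager",
            DeprecationWarning,
            stacklevel=2,
        )
        self._manager.disable()

manager = CompactionStrategyManager()
api = LegacyCompactionApi(manager)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    api.disable_auto_compaction()

print(manager.enabled)                  # False: the delegated call took effect
print(caught[0].category.__name__)      # DeprecationWarning
```

Keeping the old methods as thin deprecated delegates gives plugin authors a release cycle to migrate before removal.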






[jira] [Commented] (CASSANDRA-13120) Trace and Histogram output misleading

2017-05-21 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019058#comment-16019058
 ] 

Stefania commented on CASSANDRA-13120:
--

bq. Right now, what CFHistograms expose is the number of SSTables on which we 
do a partition lookup.

Except it has never been documented: as far as I can see, neither the [ASF 
docs|http://cassandra.apache.org/doc/latest/tools/nodetool/tablehistograms.html]
 nor the [DS 
docs|http://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsTablehisto.html]
 say what the number of sstables actually is, and 
[people|https://www.smartcat.io/blog/2017/where-is-my-data-debugging-sstables-in-cassandra/]
 tend to think it is the number of sstables touched on each read, so it is 
very misleading. Our own comment says {{/** Histogram of the number of sstable 
data files accessed per read */}}, which to me would indicate that if the BF 
excludes a table then it should not be counted. We should, at a minimum, 
improve our own comments.

bq. keep the metric as it is and to add a new one mergedSSTable to track how 
many SSTables have been actually merged.

Are we thinking of a new metrics histogram? I'm not opposed, as long as we 
document {{nodetool \[cf|table\]histograms}} accordingly. My only concern is 
that adding a new histogram on each read may have a performance impact - but I 
do understand if we don't want to change the existing behavior, especially in 
3.0.
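The distinction between the two candidate metrics can be modeled in a few lines (a minimal sketch with hypothetical names, not Cassandra internals): one interpretation counts every sstable the partition lookup consults, while the proposed {{mergedSSTable}} metric would count only those the Bloom filter did not rule out.

```python
# Illustrative sketch only: contrast the two readings of the per-read
# sstable histogram. Names and the trace structure are made up.

def sstables_consulted(read_trace):
    """Every sstable the partition lookup touched, including ones the
    Bloom filter excluded."""
    return len(read_trace)

def sstables_merged(read_trace):
    """Only the sstables actually read and merged into the result, i.e.
    those the Bloom filter did not exclude."""
    return sum(1 for event in read_trace if not event["bf_skipped"])

# Modeled on the trace in this ticket: Bloom-filter skips plus one
# key-cache hit that was actually read.
trace = [
    {"sstable": 648146, "bf_skipped": True},
    {"sstable": 648145, "bf_skipped": True},
    {"sstable": 648140, "bf_skipped": False},  # key cache hit, read for real
    {"sstable": 648135, "bf_skipped": True},
]

print(sstables_consulted(trace))  # 4
print(sstables_merged(trace))     # 1
```

The gap between the two numbers is exactly why the current histogram is misleading when readers assume it means "sstables read".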



> Trace and Histogram output misleading
> -
>
> Key: CASSANDRA-13120
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13120
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Adam Hattrell
>Assignee: Benjamin Lerer
>Priority: Minor
>
> If we look at the following output:
> {noformat}
> [centos@cassandra-c-3]$ nodetool getsstables -- keyspace table 
> 60ea4399-6b9f-4419-9ccb-ff2e6742de10
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647146-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647147-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647145-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647152-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-647157-big-Data.db
> /mnt/cassandra/data/data/keyspace/table-62f30431acf411e69a4ed7dd11246f8a/mc-648137-big-Data.db
> {noformat}
> We can see that this key value appears in just 6 sstables.  However, when we 
> run a select against the table and key we get:
> {noformat}
> Tracing session: a6c81330-d670-11e6-b00b-c1d403fd6e84
>
>  activity                                                          | timestamp                  | source         | source_elapsed
> -------------------------------------------------------------------+----------------------------+----------------+----------------
>  Execute CQL3 query                                                | 2017-01-09 13:36:40.419000 | 10.200.254.141 |              0
>  Parsing SELECT * FROM keyspace.table WHERE id =
>    60ea4399-6b9f-4419-9ccb-ff2e6742de10; [SharedPool-Worker-2]     | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            104
>  Preparing statement [SharedPool-Worker-2]                         | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            220
>  Executing single-partition query on table [SharedPool-Worker-1]   | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            450
>  Acquiring sstable references [SharedPool-Worker-1]                | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            477
>  Bloom filter allows skipping sstable 648146 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419000 | 10.200.254.141 |            496
>  Bloom filter allows skipping sstable 648145 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            503
>  Key cache hit for sstable 648140 [SharedPool-Worker-1]            | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            513
>  Bloom filter allows skipping sstable 648135 [SharedPool-Worker-1] | 2017-01-09 13:36:40.419001 | 10.200.254.141 |            520
>  Bloom filter allows skipping sstable