[jira] [Commented] (CASSANDRA-14871) Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser

2018-12-03 Thread Joel Knighton (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707454#comment-16707454
 ] 

Joel Knighton commented on CASSANDRA-14871:
---

No apology necessary, [~bdeggleston]. Thanks for taking a look! I'm busy enough 
that this hasn't made its way up my queue yet. If you have time/interest, 
please feel free to take over the review.

> Severe concurrency issues in STCS,DTCS,TWCS,TMD.Topology,TypeParser
> ---
>
> Key: CASSANDRA-14871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14871
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Critical
> Fix For: 4.0, 3.0.x, 3.11.x
>
>
> There are a couple of places in the code base that do not respect the fact 
> that j.u.HashMap and related classes are not thread safe; some parts also rely 
> on internals of the HashMap implementation, which can change.
> We have observed failures like {{NullPointerException}} and 
> {{ConcurrentModificationException}}, as well as outright wrong behavior.
> Affected areas in the code base:
>  * {{SizeTieredCompactionStrategy}}
>  * {{DateTieredCompactionStrategy}}
>  * {{TimeWindowCompactionStrategy}}
>  * {{TokenMetadata.Topology}}
>  * {{TypeParser}}
>  * streaming / concurrent access to {{LifecycleTransaction}} (handled in 
> CASSANDRA-14554)
> While the patches for the compaction strategies + {{TypeParser}} are pretty 
> straightforward, the patch for {{TokenMetadata.Topology}} requires it to be 
> made immutable.
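
For illustration only - a minimal, self-contained sketch (not code from the patch) of the failure mode: iterating a plain j.u.HashMap while another thread mutates it typically throws {{ConcurrentModificationException}}, and depending on timing can also produce {{NullPointerException}} or silently wrong results.

{code}
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class HashMapRace
{
    public static void main(String[] args) throws InterruptedException
    {
        Map<Integer, Integer> map = new HashMap<>();
        // Writer thread keeps mutating (and resizing) the map.
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1_000_000; i++)
                map.put(i, i);
        });
        writer.start();
        try
        {
            // Unsynchronized iteration races against the writer above.
            long sum = 0;
            for (Map.Entry<Integer, Integer> e : map.entrySet())
                sum += e.getValue();
            System.out.println("sum = " + sum);
        }
        catch (ConcurrentModificationException | NullPointerException ex)
        {
            System.out.println("observed: " + ex);
        }
        writer.join();
    }
}
{code}

The immutability approach mentioned for {{TokenMetadata.Topology}} sidesteps this class of bug entirely: readers only ever see fully built, never-mutated maps.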






[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions

2018-06-15 Thread Joel Knighton (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513964#comment-16513964
 ] 

Joel Knighton commented on CASSANDRA-13692:
---

Thanks again for the patch. I apologize for my genuinely staggering lateness 
here; you deserve better. Some thoughts:
 * On all versions, I don't think we'll actually hit the error handling here. 
In CASSANDRA-9669, we pushed error handling down into {{getWriteableLocation}} 
so that callers don't need to check for null and throw an exception themselves.
 * On 2.2, in the error handling, we still throw a {{RuntimeException}} from 
within {{getWriteableLocation}}. I think this probably remains the right thing 
to do. In CASSANDRA-11828, for 3.0+, we switched to throwing an 
{{FSWriteError}} from {{getWriteableLocation}}. This caused the behavior seen in 
CASSANDRA-12385, which we fixed by adding an {{FSDiskFullWriteError}} that we 
throw instead and handle in ACT. I think backporting all of that to 2.2 is 
probably unnecessary given the age of 2.2.

Am I missing a situation in which we'd hit this error handling? It really seems 
to me that we no longer return null from {{getWriteableLocation}}. If I'm not 
missing anything, I think the best thing to do is to update the javadocs of the 
affected methods and remove the unused error handling from 
{{CompactionAwareWriter}}.
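
If that reading is correct, the cleanup might look roughly like this - a sketch 
assuming {{getWriteableLocation}} now throws {{FSDiskFullWriteError}} itself on 
insufficient space and never returns null (illustrative, not a proposed patch):

{code}
protected Directories.DataDirectory getWriteDirectory(long writeSize)
{
    // getWriteableLocation throws on insufficient space (CASSANDRA-12385),
    // so the caller-side null check and RuntimeException can be removed.
    return getDirectories().getWriteableLocation(writeSize);
}
{code}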

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
> Key: CASSANDRA-13692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Hao Zhong
>Assignee: Dimitar Dimitrov
>Priority: Major
>  Labels: lhf
> Attachments: c13692-2.2-dtest-results.PNG, 
> c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results-updated.PNG, 
> c13692-3.0-dtest-results.PNG, c13692-3.0-testall-results.PNG, 
> c13692-3.11-dtest-results-updated.PNG, c13692-3.11-dtest-results.PNG, 
> c13692-3.11-testall-results.PNG, c13692-dtest-results-updated.PNG, 
> c13692-dtest-results.PNG, c13692-testall-results-updated.PNG, 
> c13692-testall-results.PNG
>
>
> {{CompactionAwareWriter#getWriteDirectory}} throws a {{RuntimeException}}:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
> {
>     File directory = null;
>     for (SSTableReader sstable : sstables)
>     {
>         if (directory == null)
>             directory = sstable.descriptor.directory;
>         if (!directory.equals(sstable.descriptor.directory))
>         {
>             logger.trace("All sstables not from the same disk - putting results in {}", directory);
>             break;
>         }
>     }
>     Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
>     if (d != null)
>     {
>         long availableSpace = d.getAvailableSpace();
>         if (availableSpace < estimatedWriteSize)
>             throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
>                                                      FBUtilities.prettyPrintMemory(estimatedWriteSize),
>                                                      d.location,
>                                                      FBUtilities.prettyPrintMemory(availableSpace)));
>         logger.trace("putting compaction results in {}", directory);
>         return d;
>     }
>     d = getDirectories().getWriteableLocation(estimatedWriteSize);
>     if (d == null)
>         throw new RuntimeException(String.format("Not enough disk space to store %s",
>                                                  FBUtilities.prettyPrintMemory(estimatedWriteSize)));
>     return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. 
> CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws {{FSWriteError}} and triggers the failure policy.

[jira] [Updated] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions

2018-06-15 Thread Joel Knighton (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13692:
--
Status: Open  (was: Patch Available)

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
> Key: CASSANDRA-13692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Hao Zhong
>Assignee: Dimitar Dimitrov
>Priority: Major
>  Labels: lhf
> Attachments: c13692-2.2-dtest-results.PNG, 
> c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results-updated.PNG, 
> c13692-3.0-dtest-results.PNG, c13692-3.0-testall-results.PNG, 
> c13692-3.11-dtest-results-updated.PNG, c13692-3.11-dtest-results.PNG, 
> c13692-3.11-testall-results.PNG, c13692-dtest-results-updated.PNG, 
> c13692-dtest-results.PNG, c13692-testall-results-updated.PNG, 
> c13692-testall-results.PNG
>
>
> {{CompactionAwareWriter#getWriteDirectory}} throws a {{RuntimeException}}:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
> {
>     File directory = null;
>     for (SSTableReader sstable : sstables)
>     {
>         if (directory == null)
>             directory = sstable.descriptor.directory;
>         if (!directory.equals(sstable.descriptor.directory))
>         {
>             logger.trace("All sstables not from the same disk - putting results in {}", directory);
>             break;
>         }
>     }
>     Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
>     if (d != null)
>     {
>         long availableSpace = d.getAvailableSpace();
>         if (availableSpace < estimatedWriteSize)
>             throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
>                                                      FBUtilities.prettyPrintMemory(estimatedWriteSize),
>                                                      d.location,
>                                                      FBUtilities.prettyPrintMemory(availableSpace)));
>         logger.trace("putting compaction results in {}", directory);
>         return d;
>     }
>     d = getDirectories().getWriteableLocation(estimatedWriteSize);
>     if (d == null)
>         throw new RuntimeException(String.format("Not enough disk space to store %s",
>                                                  FBUtilities.prettyPrintMemory(estimatedWriteSize)));
>     return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. 
> CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws {{FSWriteError}} and triggers the failure policy.






[jira] [Updated] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions

2018-06-15 Thread Joel Knighton (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13692:
--
Status: Awaiting Feedback  (was: Open)

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
> Key: CASSANDRA-13692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Hao Zhong
>Assignee: Dimitar Dimitrov
>Priority: Major
>  Labels: lhf
> Attachments: c13692-2.2-dtest-results.PNG, 
> c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results-updated.PNG, 
> c13692-3.0-dtest-results.PNG, c13692-3.0-testall-results.PNG, 
> c13692-3.11-dtest-results-updated.PNG, c13692-3.11-dtest-results.PNG, 
> c13692-3.11-testall-results.PNG, c13692-dtest-results-updated.PNG, 
> c13692-dtest-results.PNG, c13692-testall-results-updated.PNG, 
> c13692-testall-results.PNG
>
>
> {{CompactionAwareWriter#getWriteDirectory}} throws a {{RuntimeException}}:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
> {
>     File directory = null;
>     for (SSTableReader sstable : sstables)
>     {
>         if (directory == null)
>             directory = sstable.descriptor.directory;
>         if (!directory.equals(sstable.descriptor.directory))
>         {
>             logger.trace("All sstables not from the same disk - putting results in {}", directory);
>             break;
>         }
>     }
>     Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
>     if (d != null)
>     {
>         long availableSpace = d.getAvailableSpace();
>         if (availableSpace < estimatedWriteSize)
>             throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
>                                                      FBUtilities.prettyPrintMemory(estimatedWriteSize),
>                                                      d.location,
>                                                      FBUtilities.prettyPrintMemory(availableSpace)));
>         logger.trace("putting compaction results in {}", directory);
>         return d;
>     }
>     d = getDirectories().getWriteableLocation(estimatedWriteSize);
>     if (d == null)
>         throw new RuntimeException(String.format("Not enough disk space to store %s",
>                                                  FBUtilities.prettyPrintMemory(estimatedWriteSize)));
>     return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. 
> CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws {{FSWriteError}} and triggers the failure policy.






[jira] [Commented] (CASSANDRA-14174) Remove GossipDigestSynVerbHandler#doSort()

2018-01-26 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341709#comment-16341709
 ] 

Joel Knighton commented on CASSANDRA-14174:
---

I took another look at this. I still haven't found any positive effect, 
intentional or otherwise, of this code remaining. CI looks good. The dtests are 
admittedly noisy, but I don't see any problems that look like a result of this 
change.

 

+1.

> Remove GossipDigestSynVerbHandler#doSort()
> --
>
> Key: CASSANDRA-14174
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14174
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 4.x
>
>
> I have personally tripped up on this function a couple of times over the 
> years, believing that it contributes to bugs in some way or another. While I 
> have not found that (necessarily!) to be the case, I feel this function is 
> completely useless in the grand scope of things.
> Going back through the mists of time (that is, {{git log}}), it appears this 
> function was part of the original code drop from Facebook when they 
> open-sourced Cassandra. Looking at the {{#doSort()}} method, all it does is sort 
> the incoming list of {{GossipDigest}} s by the difference between the remote 
> node's maxValue for a given peer and the local node's maxValue.
> The only universe where this is actually an optimization is if you go back 
> and read the [Scuttlebutt 
> paper|https://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf] (upon which 
> Cassandra's gossip anti-entropy reconciliation is based). The end of section 
> 3.2 describes ordering the incoming digests such that, in the case where 
> you do not return all of the differences (because you are optimizing for the 
> return message size), you can gather the differences for the peers that are 
> most out of sync. The ordering implemented in Cassandra is the second 
> ordering described in the paper, called "scuttle depth".
> As we always send all differences between two nodes (message size be damned), 
> this optimization, borrowed from the paper, is largely irrelevant for 
> Cassandra's purposes.
> Thus, I propose we remove this method for the following gains:
>  - less garbage created
>  - less CPU (sure, it's mostly trivial; see next point)
>  - less time spent on unnecessary functionality on the *single threaded* 
> gossip stage.
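
For reference, here is a minimal sketch of the "scuttle depth" style ordering 
described above, with hypothetical names (this is not the actual Cassandra 
code): digests are sorted by how far apart the remote and local max versions 
are, so the most out-of-sync peers would be handled first if the reply were 
ever truncated.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for org.apache.cassandra.gms.GossipDigest.
record Digest(String endpoint, long remoteMaxVersion, long localMaxVersion) {}

public class ScuttleDepthSort
{
    // Largest version gap (most out-of-sync peer) first.
    static void doSort(List<Digest> digests)
    {
        digests.sort(Comparator
            .comparingLong((Digest d) -> d.remoteMaxVersion() - d.localMaxVersion())
            .reversed());
    }

    public static void main(String[] args)
    {
        List<Digest> digests = new ArrayList<>(List.of(
            new Digest("10.0.0.1", 120, 100),   // gap 20
            new Digest("10.0.0.2", 500, 100))); // gap 400
        doSort(digests);
        System.out.println(digests); // 10.0.0.2 sorts first
    }
}
{code}

Since Cassandra always sends all differences, this ordering buys nothing, which 
is the argument for removing it.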






[jira] [Commented] (CASSANDRA-13873) Ref bug in Scrub

2017-12-08 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284384#comment-16284384
 ] 

Joel Knighton commented on CASSANDRA-13873:
---

Great. Thanks!

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Resolved] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-04 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-14059.
---
Resolution: Fixed

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Commented] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-04 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277944#comment-16277944
 ] 

Joel Knighton commented on CASSANDRA-14059:
---

Committed the fix to cassandra-dtest as 
{{413b18a87d9416446cf4adec5f8483ad408b3e83}}. Thanks for taking a look and 
noticing the error.

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Commented] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-04 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277829#comment-16277829
 ] 

Joel Knighton commented on CASSANDRA-14059:
---

Follow-up commit here: 
https://github.com/jkni/cassandra-dtest/commit/179bf9f16a03394aee9715b745bfd4773c5b05ba

It uses string formatting instead of concatenation, so we'll automatically 
stringify lists. I smoke tested this on the old test set as well as manually on 
the tests listed above. I'm also running a full set of dtests, but since this is 
very likely no worse than the current situation, I'm happy to commit sooner if 
there's support for that. Otherwise, I'll commit once the tests are clean.

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Reopened] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-04 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reopened CASSANDRA-14059:
---

I'm really sorry about that. I ran this with a subset of dtests that was 
clearly inadequate to cover all the cases here.

Reopening and I'll supply a patch on this issue shortly.

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Updated] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-01 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14059:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review! Committed to cassandra-dtest as 
{{c1bcc18664cd9e9035f05a98ed23e763173fafd9}}.

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Commented] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-01 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274774#comment-16274774
 ] 

Joel Knighton commented on CASSANDRA-14059:
---

Works for me - okay with you if I go ahead and commit this with that change?

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Commented] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-12-01 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274675#comment-16274675
 ] 

Joel Knighton commented on CASSANDRA-14059:
---

Thanks for taking a look. Any suggestions for the extra character to serve as 
the delimiter? More whitespace? I have very few opinions here.

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Commented] (CASSANDRA-13873) Ref bug in Scrub

2017-11-30 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273882#comment-16273882
 ] 

Joel Knighton commented on CASSANDRA-13873:
---

Thanks for the patches and CI. Both your remarks look correct to me; frankly, I 
have no idea how I missed that in anticompaction.

Test results look good for the most part. There are a few flaky unit tests on 
3.0/3.11 that appear to have failed the same way before the patch, pass for me 
locally, and appear to be at the limits of CircleCI's timeouts/resources. The 
2.2 dtests timed out, so it seems worthwhile to trigger those again just in 
case. The only unusual failures on the 3.0 dtests are a bunch of tests where 
Jolokia failed to attach for JMX. I'm not sure if this is a known environmental 
problem on ASF dtests, but I was unable to reproduce it elsewhere.

Overall, +1 to the patch for me, and this looks good to merge if none of the 
test issues I raised above worry you.

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Assigned] (CASSANDRA-13873) Ref bug in Scrub

2017-11-30 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-13873:
-

Assignee: Marcus Eriksson  (was: Joel Knighton)

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Updated] (CASSANDRA-13873) Ref bug in Scrub

2017-11-30 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13873:
--
Reviewer: Joel Knighton  (was: Marcus Eriksson)

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: Marcus Eriksson
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Commented] (CASSANDRA-14079) Prevent compaction strategies from looping indefinitely

2017-11-30 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273603#comment-16273603
 ] 

Joel Knighton commented on CASSANDRA-14079:
---

It looks like this broke the build on 3.11/trunk. On trunk only, there's a 
place in {{AbstractCompactionStrategyTest}} where we pass a 
{{TableMetadataRef}} instead of a {{TableMetadata}}. On 3.11/trunk, there's a 
missing {{removeUnsafe}} test method on {{Tracker}} that 
{{AbstractCompactionStrategyTest}} uses. That method appears to be missing on 
all branches, so maybe it just got left out of the commit.

[~pauloricardomg] ^

> Prevent compaction strategies from looping indefinitely
> ---
>
> Key: CASSANDRA-14079
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14079
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Minor
> Fix For: 3.11.2, 4.0
>
>
> As a result of CASSANDRA-13948, LCS was looping indefinitely trying to 
> generate the same candidates for SSTables which were not on the tracker.
> We should add a protection on compaction strategies against looping 
> indefinitely to avoid similar bugs in the future.
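
Purely as an illustration of the kind of guard meant here (hypothetical code, 
not the committed patch), the idea is to bound how many times a strategy may 
retry candidate generation instead of letting it spin forever:

{code}
import java.util.Optional;
import java.util.function.Supplier;

public class BoundedCandidateSearch
{
    // Ask the strategy for a runnable candidate at most maxAttempts times;
    // give up cleanly instead of looping indefinitely on stale candidates.
    static <T> Optional<T> findCandidate(Supplier<Optional<T>> strategy, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            Optional<T> candidate = strategy.get();
            if (candidate.isPresent())
                return candidate;
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        // A strategy that never yields a usable candidate now terminates.
        System.out.println(findCandidate(Optional::empty, 1000)); // Optional.empty
    }
}
{code}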






[jira] [Commented] (CASSANDRA-13873) Ref bug in Scrub

2017-11-29 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16271931#comment-16271931
 ] 

Joel Knighton commented on CASSANDRA-13873:
---

Sorry for the latency here - my fault. The patch looks good to me. I considered 
a few other cases where a similar problem might exist. It seems to me the same 
issue could exist in the Splitter/Upgrader, but since they're offline, I don't 
know of a future change that would require another operation to reference 
canonical sstables in parallel. I also don't see anything in anticompaction 
grabbing a ref; am I missing something there?

The patches look good for existing cases. Unfortunately, I let the dtests age 
out before taking a closer look, but I can rerun them after you look at the 
question above. I'm +1 to merging the relatively trivial patches through to 
trunk and opening a ticket to improve it later. As you've seen, I don't have a 
huge amount of bandwidth for this right now, so I'd rather not delay a definite 
improvement with only the promise of a better one. Thanks for the patience.

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: T Jake Luciani
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node. This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Updated] (CASSANDRA-14037) sstableloader_with_failing_2i_test - sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]

2017-11-29 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14037:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review. Patch committed to cassandra-dtest as 
{{fc68a0de8d05082a0a78196695572ff2346179c4}}.

> sstableloader_with_failing_2i_test - 
> sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: 
> Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]]
> --
>
> Key: CASSANDRA-14037
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14037
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> sstableloader_with_failing_2i_test - 
> sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: 
> Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]]
> Expected [['k', 'idx']] from SELECT * FROM system."IndexInfo" WHERE 
> table_name='k', but got [[u'k', u'idx', None]]
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-2My0fh
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/home/cassandra/env/src/cassandra-driver/cassandra/io/asyncorereactor.py", 
> line 344, in __init__
> self._connect_socket()
>   File "cassandra/connection.py", line 371, in 
> cassandra.connection.Connection._connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: 
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> error: [Errno 111] Tried connecting to [('127.0.0.1', 9042)]. Last error: 
> Connection refused
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/home/cassandra/env/src/cassandra-driver/cassandra/io/asyncorereactor.py", 
> line 344, in __init__
> self._connect_socket()
>   File "cassandra/connection.py", line 371, in 
> cassandra.connection.Connection._connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: 
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> error: [Errno 111] Tried connecting to [('127.0.0.1', 9042)]. Last error: 
> Connection refused
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File 

[jira] [Updated] (CASSANDRA-14037) sstableloader_with_failing_2i_test - sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]

2017-11-28 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14037:
--
Status: Patch Available  (was: Open)

One-line patch available 
[here|https://github.com/jkni/cassandra-dtest/commit/cb77eecfd7fa7a4a584dee46784c543ec9e6e43c].
 It looks like this was just an oversight from [CASSANDRA-10857]. Since 
IndexInfo was originally declared as compact, we need to expect the value 
column when selecting all columns. This change was already made in one other 
place in the same test. [~ifesdjeen] to review if interested.

> sstableloader_with_failing_2i_test - 
> sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: 
> Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]]
> --
>
> Key: CASSANDRA-14037
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14037
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> sstableloader_with_failing_2i_test - 
> sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: 
> Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]]
> Expected [['k', 'idx']] from SELECT * FROM system."IndexInfo" WHERE 
> table_name='k', but got [[u'k', u'idx', None]]
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-2My0fh
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/home/cassandra/env/src/cassandra-driver/cassandra/io/asyncorereactor.py", 
> line 344, in __init__
> self._connect_socket()
>   File "cassandra/connection.py", line 371, in 
> cassandra.connection.Connection._connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: 
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> error: [Errno 111] Tried connecting to [('127.0.0.1', 9042)]. Last error: 
> Connection refused
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/home/cassandra/env/src/cassandra-driver/cassandra/io/asyncorereactor.py", 
> line 344, in __init__
> self._connect_socket()
>   File "cassandra/connection.py", line 371, in 
> cassandra.connection.Connection._connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: 
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> error: [Errno 111] Tried connecting to [('127.0.0.1', 9042)]. Last error: 
> Connection refused
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = 

[jira] [Assigned] (CASSANDRA-14037) sstableloader_with_failing_2i_test - sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: Expected [['k', 'idx']] ... but got [[u'k', u'idx', None

2017-11-28 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-14037:
-

Assignee: Joel Knighton

> sstableloader_with_failing_2i_test - 
> sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: 
> Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]]
> --
>
> Key: CASSANDRA-14037
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14037
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> sstableloader_with_failing_2i_test - 
> sstable_generation_loading_test.TestSSTableGenerationAndLoading fails: 
> Expected [['k', 'idx']] ... but got [[u'k', u'idx', None]]
> Expected [['k', 'idx']] from SELECT * FROM system."IndexInfo" WHERE 
> table_name='k', but got [[u'k', u'idx', None]]
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-2My0fh
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'num_tokens': '32',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/home/cassandra/env/src/cassandra-driver/cassandra/io/asyncorereactor.py", 
> line 344, in __init__
> self._connect_socket()
>   File "cassandra/connection.py", line 371, in 
> cassandra.connection.Connection._connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: 
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> error: [Errno 111] Tried connecting to [('127.0.0.1', 9042)]. Last error: 
> Connection refused
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> "/home/cassandra/env/src/cassandra-driver/cassandra/io/asyncorereactor.py", 
> line 344, in __init__
> self._connect_socket()
>   File "cassandra/connection.py", line 371, in 
> cassandra.connection.Connection._connect_socket
> raise socket.error(sockerr.errno, "Tried connecting to %s. Last error: 
> %s" % ([a[4] for a in addresses], sockerr.strerror or sockerr))
> error: [Errno 111] Tried connecting to [('127.0.0.1', 9042)]. Last error: 
> Connection refused
> cassandra.cluster: WARNING: [control connection] Error connecting to 
> 127.0.0.1:
> Traceback (most recent call last):
>   File "cassandra/cluster.py", line 2781, in 
> cassandra.cluster.ControlConnection._reconnect_internal
> return self._try_connect(host)
>   File "cassandra/cluster.py", line 2803, in 
> cassandra.cluster.ControlConnection._try_connect
> connection = self._cluster.connection_factory(host.address, 
> is_control_connection=True)
>   File "cassandra/cluster.py", line 1195, in 
> cassandra.cluster.Cluster.connection_factory
> return self.connection_class.factory(address, self.connect_timeout, 
> *args, **kwargs)
>   File "cassandra/connection.py", line 332, in 
> cassandra.connection.Connection.factory
> conn = cls(host, *args, **kwargs)
>   File 
> 

[jira] [Updated] (CASSANDRA-14028) counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters fails

2017-11-27 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14028:
--
Status: Patch Available  (was: Open)

Patch available 
[here|https://github.com/jkni/cassandra-dtest/commit/a209a6311eb4b050a80d261e8b57176f74b070a4].

There appear to be two different issues here. First, the test manually forced 
the use of vnodes in cluster creation, which breaks novnode runs. As far as I 
can tell, there's no inherent incompatibility with novnode configurations, so I 
removed that behavior. It now passes on both vnode and novnode runs.

Second, this test was introduced for [CASSANDRA-13043], which only fixes the 
issue in 3.0+. The test fails on 2.2 and fails to run entirely on 2.1 due to 
missing byteman dependencies. I scoped the test to run on 3.0+.
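
For readers following along, a minimal sketch of the shape of that scoping 
change, assuming the {{since}} decorator from the dtest tools/decorators 
module; the class and method bodies here are illustrative, not the literal 
patch:

{code}
from dtest import Tester
from tools.decorators import since


@since('3.0')  # CASSANDRA-13043 only fixes the issue in 3.0+, so skip 2.1/2.2
class TestCounters(Tester):

    def counter_leader_with_partial_view_test(self):
        # Let the harness choose the vnode/novnode configuration instead of
        # forcing vnodes here, so both run modes are exercised.
        cluster = self.cluster
        cluster.populate(3).start(wait_for_binary_proto=True)
{code}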

> counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters 
> fails
> -
>
> Key: CASSANDRA-14028
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14028
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> Unexpected error in log, see stdout
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-S1sTss
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
> self.tearDown()
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 599, in tearDown
> raise AssertionError('Unexpected error in log, see stdout')
> "Unexpected error in log, see stdout\n >> begin captured 
> logging << \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-S1sTss\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None,\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"






[jira] [Assigned] (CASSANDRA-14028) counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters fails

2017-11-27 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-14028:
-

Assignee: Joel Knighton

> counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters 
> fails
> -
>
> Key: CASSANDRA-14028
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14028
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> Unexpected error in log, see stdout
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-S1sTss
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 358, in run
> self.tearDown()
>   File "/home/cassandra/cassandra-dtest/dtest.py", line 599, in tearDown
> raise AssertionError('Unexpected error in log, see stdout')
> "Unexpected error in log, see stdout\n >> begin captured 
> logging << \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-S1sTss\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None,\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"






[jira] [Resolved] (CASSANDRA-14029) counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters fails: Error starting node2

2017-11-27 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-14029.
---
Resolution: Duplicate

This is a duplicate of [CASSANDRA-14028].

> counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters 
> fails: Error starting node2
> ---
>
> Key: CASSANDRA-14029
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14029
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>
> counter_leader_with_partial_view_test-novnodes - counter_tests.TestCounters 
> fails: Error starting node2
> Error starting node2.
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-S1sTss
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/counter_tests.py", line 95, in 
> counter_leader_with_partial_view_test
> cluster.start(wait_for_binary_proto=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 423, in start
> raise NodeError("Error starting {0}.".format(node.name), p)
> "Error starting node2.\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-S1sTss\ndtest: DEBUG: Done setting configuration options:\n{   
> 'num_tokens': None,\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"






[jira] [Updated] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-11-17 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14059:
--
Description: 
Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
{{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
--nologcapture, errors of the following form are printed:
{code}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
s = self._fmt % record.__dict__
KeyError: 'current_test'
Logged from file dtest.py, line 485
{code}

This is because CCM no longer installs a basic root logger configuration, which 
is probably a more correct behavior than what it did prior to this change. Now, 
dtest installs its own basic root logger configuration which writes to 
'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
%(current_test)s %(levelname)s %(message)s'}}. This means that anything logging 
a message must provide the current_test key in its extras map. The dtest 
{{debug}} and {{warning}} functions do this, but logging from dependencies 
doesn't, producing these {{KeyError}} s. 

  was:
Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
{{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
--nologcapture, errors of the following form are printed:
{code}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
s = self._fmt % record.__dict__
KeyError: 'current_test'
Logged from file dtest.py, line 485
{code}

This is because CCM no longer installs a basic root logger configuration, which 
is probably a more correct behavior than what it did prior to this change. Now, 
dtest installs its own basic root logger configuration which writes to 
'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
%(current_test)s %(levelname)s %(message)s'}}. This means that anything logging 
a message must provide the current_test key in its extras map. The dtest 
{{debug}} and {{warning}} functions do this, but logging from dependencies 
doesn't, producing these {{KeyError}}s. 


> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}} s. 






[jira] [Updated] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-11-17 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14059:
--
Status: Patch Available  (was: Open)

Patch here: 
[https://github.com/jkni/cassandra-dtest/commit/91e860da6b5959df02d14cc56b0d5c09a2926a83].
 This removes the field from the formatter and instead concatenates it to the 
message inside dtest logging functions. It also changes the way we construct 
the CURRENT_TEST global. In my testing, the test id already contained the 
method name, so method names were duplicated in log output.

There may be a convincing argument for removing the dtest.log logger 
configuration entirely. It appears to have been missing for some time without 
anyone noticing. For now, I introduced a behavior that's close to the original 
intention of dtest.log, as far as I can tell.
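
As a self-contained illustration of both the failure mode and the shape of 
the fix (the {{CURRENT_TEST}} name follows the comment above; the helper 
body is a sketch, not the committed code):

{code}
import logging

# Broken setup: the root formatter references %(current_test)s, which only
# exists when a caller passes extra={'current_test': ...}. Records emitted
# by dependencies lack the key, so formatting raises KeyError:
# 'current_test', which the handler prints as the traceback shown above.
logging.basicConfig(filename='dtest.log', level=logging.DEBUG,
                    format='%(asctime)s,%(msecs)d %(name)s '
                           '%(current_test)s %(levelname)s %(message)s')

logging.getLogger('dtest').warning('ok', extra={'current_test': 'test_x'})
logging.getLogger('cassandra.cluster').warning('KeyError hits stderr here')

# Fixed shape: drop %(current_test)s from the format string above and
# concatenate the test name to the message inside the dtest helpers instead.
CURRENT_TEST = 'test_x'

def debug(msg):
    logging.getLogger('dtest').debug('%s: %s', CURRENT_TEST, msg)
{code}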

> Root logging formatter broken in dtests
> ---
>
> Key: CASSANDRA-14059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Minor
>
> Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
> {{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
> --nologcapture, errors of the following form are printed:
> {code}
> Traceback (most recent call last):
>   File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
> msg = self.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
> return fmt.format(record)
>   File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
> s = self._fmt % record.__dict__
> KeyError: 'current_test'
> Logged from file dtest.py, line 485
> {code}
> This is because CCM no longer installs a basic root logger configuration, 
> which is probably a more correct behavior than what it did prior to this 
> change. Now, dtest installs its own basic root logger configuration which 
> writes to 'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
> %(current_test)s %(levelname)s %(message)s'}}. This means that anything 
> logging a message must provide the current_test key in its extras map. The 
> dtest {{debug}} and {{warning}} functions do this, but logging from 
> dependencies doesn't, producing these {{KeyError}}s. 






[jira] [Created] (CASSANDRA-14059) Root logging formatter broken in dtests

2017-11-17 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-14059:
-

 Summary: Root logging formatter broken in dtests
 Key: CASSANDRA-14059
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14059
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Joel Knighton
Assignee: Joel Knighton
Priority: Minor


Since the ccm dependency in dtest was bumped to {{3.1.0}} in 
{{7cc06a086f89ed76499837558ff263d84337acba}}, when dtests are run with 
--nologcapture, errors of the following form are printed:
{code}
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 861, in emit
msg = self.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 734, in format
return fmt.format(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 469, in format
s = self._fmt % record.__dict__
KeyError: 'current_test'
Logged from file dtest.py, line 485
{code}

This is because CCM no longer installs a basic root logger configuration, which 
is probably a more correct behavior than what it did prior to this change. Now, 
dtest installs its own basic root logger configuration which writes to 
'dtest.log' using the formatter {{'%(asctime)s,%(msecs)d %(name)s 
%(current_test)s %(levelname)s %(message)s'}}. This means that anything logging 
a message must provide the current_test key in its extras map. The dtest 
{{debug}} and {{warning}} functions do this, but logging from dependencies 
doesn't, producing these {{KeyError}}s. 






[jira] [Updated] (CASSANDRA-14025) test_simple_strategy_counters - consistency_test.TestAccuracy always fails code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 'AND' expecting

2017-11-16 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14025:
--
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

Thanks for the review! Committed as 
{{d67fd66253ebb8aabc7d90d7ff048a4c9a2ff2cf}}.

> test_simple_strategy_counters - consistency_test.TestAccuracy always fails 
> code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 
> 'AND' expecting EOF (... text, age int ) [AND]...)
> ---
>
> Key: CASSANDRA-14025
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14025
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> test_simple_strategy_counters - consistency_test.TestAccuracy always fails 
> code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 
> 'AND' expecting EOF (... text,age int) [AND]...)
> 
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-dYXpHm
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'memtable_allocation_type': 'offheap_objects',
> 'num_tokens': '256',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing single dc, counters
> dtest: DEBUG: Changing snitch for single dc case
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 704, in 
> test_simple_strategy_counters
> 
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_counters,
>  [self.nodes], [self.rf], combinations)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
> self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 147, in 
> _start_cluster
> self.create_tables(session, requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 156, in 
> create_tables
> self.create_users_table(session, requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 178, in 
> create_users_table
> session.execute(create_cmd)
>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "cassandra/cluster.py", line 3982, in 
> cassandra.cluster.ResponseFuture.result
> raise self._final_exception
> '\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-dYXpHm\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'memtable_allocation_type\': 
> \'offheap_objects\',\n\'num_tokens\': \'256\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ndtest: DEBUG: Testing single dc, 
> counters\ndtest: DEBUG: Changing snitch for single dc 
> case\ncassandra.cluster: INFO: New Cassandra host  
> discovered\ncassandra.cluster: INFO: New Cassandra host  
> discovered\n- >> end captured logging << 
> -'






[jira] [Updated] (CASSANDRA-14025) test_simple_strategy_counters - consistency_test.TestAccuracy always fails code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 'AND' expecting

2017-11-16 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-14025:
--
Status: Patch Available  (was: Open)

Patch 
[here|https://github.com/jkni/cassandra-dtest/commit/eff8d51c9cdd16e39afa871cd69b4df24a5f858a].
 This is a trivial fix to table creation syntax. Prior to [CASSANDRA-10857], 
this table had a {{WITH COMPACT STORAGE}} suffix. With that removed, we need to 
introduce options with {{WITH}} ourselves.
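
To make the syntax issue concrete, a before/after sketch; the option shown 
is illustrative, since the real test assembles its options programmatically:

{code}
# Before CASSANDRA-10857, WITH was already present, so further table
# options could simply be chained on with AND:
create_cmd = ("CREATE TABLE users (userid text PRIMARY KEY, age int) "
              "WITH COMPACT STORAGE AND speculative_retry = 'NONE'")

# With COMPACT STORAGE gone, a leading AND is what produced the
# "mismatched input 'AND' expecting EOF" error; the first option now has
# to introduce WITH itself:
create_cmd = ("CREATE TABLE users (userid text PRIMARY KEY, age int) "
              "WITH speculative_retry = 'NONE'")
{code}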

> test_simple_strategy_counters - consistency_test.TestAccuracy always fails 
> code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 
> 'AND' expecting EOF (... text, age int ) [AND]...)
> ---
>
> Key: CASSANDRA-14025
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14025
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> test_simple_strategy_counters - consistency_test.TestAccuracy always fails 
> code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 
> 'AND' expecting EOF (... text,age int) [AND]...)
> 
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-dYXpHm
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'memtable_allocation_type': 'offheap_objects',
> 'num_tokens': '256',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing single dc, counters
> dtest: DEBUG: Changing snitch for single dc case
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 704, in 
> test_simple_strategy_counters
> 
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_counters,
>  [self.nodes], [self.rf], combinations)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
> self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 147, in 
> _start_cluster
> self.create_tables(session, requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 156, in 
> create_tables
> self.create_users_table(session, requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 178, in 
> create_users_table
> session.execute(create_cmd)
>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "cassandra/cluster.py", line 3982, in 
> cassandra.cluster.ResponseFuture.result
> raise self._final_exception
> '\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-dYXpHm\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'memtable_allocation_type\': 
> \'offheap_objects\',\n\'num_tokens\': \'256\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ndtest: DEBUG: Testing single dc, 
> counters\ndtest: DEBUG: Changing snitch for single dc 
> case\ncassandra.cluster: INFO: New Cassandra host  
> discovered\ncassandra.cluster: INFO: New Cassandra host  
> discovered\n- >> end captured logging << 
> -'






[jira] [Assigned] (CASSANDRA-14025) test_simple_strategy_counters - consistency_test.TestAccuracy always fails code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 'AND' expecting

2017-11-15 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-14025:
-

Assignee: Joel Knighton

> test_simple_strategy_counters - consistency_test.TestAccuracy always fails 
> code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 
> 'AND' expecting EOF (... text, age int ) [AND]...)
> ---
>
> Key: CASSANDRA-14025
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14025
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Michael Kjellman
>Assignee: Joel Knighton
>
> test_simple_strategy_counters - consistency_test.TestAccuracy always fails 
> code=2000 [Syntax error in CQL query] message="line 7:14 mismatched input 
> 'AND' expecting EOF (... text,age int) [AND]...)
> 
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-dYXpHm
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'memtable_allocation_type': 'offheap_objects',
> 'num_tokens': '256',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Testing single dc, counters
> dtest: DEBUG: Changing snitch for single dc case
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 704, in 
> test_simple_strategy_counters
> 
> self._run_test_function_in_parallel(TestAccuracy.Validation.validate_counters,
>  [self.nodes], [self.rf], combinations)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 535, in 
> _run_test_function_in_parallel
> self._start_cluster(save_sessions=True, 
> requires_local_reads=requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 147, in 
> _start_cluster
> self.create_tables(session, requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 156, in 
> create_tables
> self.create_users_table(session, requires_local_reads)
>   File "/home/cassandra/cassandra-dtest/consistency_test.py", line 178, in 
> create_users_table
> session.execute(create_cmd)
>   File "cassandra/cluster.py", line 2122, in cassandra.cluster.Session.execute
> return self.execute_async(query, parameters, trace, custom_payload, 
> timeout, execution_profile, paging_state).result()
>   File "cassandra/cluster.py", line 3982, in 
> cassandra.cluster.ResponseFuture.result
> raise self._final_exception
> '\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-dYXpHm\ndtest: DEBUG: Done setting configuration options:\n{   
> \'initial_token\': None,\n\'memtable_allocation_type\': 
> \'offheap_objects\',\n\'num_tokens\': \'256\',\n
> \'phi_convict_threshold\': 5,\n\'range_request_timeout_in_ms\': 1,\n  
>   \'read_request_timeout_in_ms\': 1,\n\'request_timeout_in_ms\': 
> 1,\n\'truncate_request_timeout_in_ms\': 1,\n
> \'write_request_timeout_in_ms\': 1}\ndtest: DEBUG: Testing single dc, 
> counters\ndtest: DEBUG: Changing snitch for single dc 
> case\ncassandra.cluster: INFO: New Cassandra host  
> discovered\ncassandra.cluster: INFO: New Cassandra host  
> discovered\n- >> end captured logging << 
> -'






[jira] [Commented] (CASSANDRA-14007) cqlshlib tests fail due to compact table

2017-11-13 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16249778#comment-16249778
 ] 

Joel Knighton commented on CASSANDRA-14007:
---

Yeah, the cqlshlib tests have their own script to run and don't run as part of 
dtests. See 
[https://github.com/apache/cassandra-builds/blob/f0e63d66269f9086c3a0393a24a55577d21b4454/build-scripts/cassandra-cqlsh-tests.sh]
 for an example of how to run them.

> cqlshlib tests fail due to compact table
> 
>
> Key: CASSANDRA-14007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14007
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Alex Petrov
>
> The pylib/cqlshlib tests fail on initialization with the error 
> {{SyntaxException: code=2000 \[Syntax error in CQL query\] message="Compact 
> tables are not allowed in Cassandra starting with 4.0 version."}}. 
> The table {{dynamic_columns}} is created {{WITH COMPACT STORAGE}}. Since 
> [CASSANDRA-10857], this is no longer supported. It looks like dropping the 
> COMPACT STORAGE modifier is enough for the tests to run, but I haven't looked 
> if we should instead remove the table and all related tests entirely, or if 
> there's an interesting code path covered by this that we should test in a 
> different way now. [~ifesdjeen] might know at a glance.






[jira] [Created] (CASSANDRA-14007) cqlshlib tests fail due to compact table

2017-11-09 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-14007:
-

 Summary: cqlshlib tests fail due to compact table
 Key: CASSANDRA-14007
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14007
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Joel Knighton


The pylib/cqlshlib tests fail on initialization with the error 
{{SyntaxException: code=2000 \[Syntax error in CQL query\] message="Compact 
tables are not allowed in Cassandra starting with 4.0 version."}}. 

The table {{dynamic_columns}} is created {{WITH COMPACT STORAGE}}. Since 
[CASSANDRA-10857], this is no longer supported. It looks like dropping the 
COMPACT STORAGE modifier is enough for the tests to run, but I haven't looked 
if we should instead remove the table and all related tests entirely, or if 
there's an interesting code path covered by this that we should test in a 
different way now. [~ifesdjeen] might know at a glance.
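
A sketch of what dropping the modifier looks like; the column names are 
hypothetical, since the real definition lives in the cqlshlib test fixtures:

{code}
# Hypothetical fixture shape -- only the trailing clause matters:
create_dynamic_columns = ("CREATE TABLE dynamic_columns ("
                          "somekey int PRIMARY KEY, somevalue text)")
# Previously the statement ended with ") WITH COMPACT STORAGE", which 4.0
# rejects with the SyntaxException quoted above.
{code}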






[jira] [Created] (CASSANDRA-13978) dtest failure in cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_pep8_compliance due to deprecation warning

2017-10-27 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-13978:
-

 Summary: dtest failure in 
cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_pep8_compliance due to deprecation 
warning
 Key: CASSANDRA-13978
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13978
 Project: Cassandra
  Issue Type: Test
  Components: Testing
Reporter: Joel Knighton
Priority: Trivial


The dtest {{cqlsh_tests/cqlsh_tests.py:TestCqlsh.test_pep8_compliance}} fails 
on all branches with pep8 package version 1.7.1. The pep8 package has been 
deprecated and renamed pycodestyle.

{code}
==
FAIL: test_pep8_compliance (cqlsh_tests.cqlsh_tests.TestCqlsh)
--
Traceback (most recent call last):
  File "/home/jkni/projects/cassandra-dtest/tools/decorators.py", line 48, in 
wrapped
f(obj)
  File "/home/jkni/projects/cassandra-dtest/cqlsh_tests/cqlsh_tests.py", line 
68, in test_pep8_compliance
self.assertEqual(len(stderr), 0, stderr)
AssertionError: 
/home/jkni/projects/cassandra-dtest/venv/lib/python2.7/site-packages/pep8.py:2124:
 UserWarning: 

pep8 has been renamed to pycodestyle (GitHub issue #466)
Use of the pep8 tool will be removed in a future release.
Please install and use `pycodestyle` instead.

$ pip install pycodestyle
$ pycodestyle ...

  '\n\n'
{code}

We should update this dependency from pep8 to pycodestyle. With this change, 
several new errors are thrown. I don't know if these are new checks that we 
should choose to ignore, false positives due to new behaviors, or false 
negatives that are now successfully caught. If they were previously false 
negatives, we'll need to fix these in cqlsh on some branches.
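
A minimal sketch of the swap using pycodestyle's documented library API; the 
dtest itself shells out to the tool, and the path below is illustrative:

{code}
import pycodestyle  # replaces the deprecated `import pep8`

style = pycodestyle.StyleGuide()
result = style.check_files(['pylib/cqlshlib'])
assert result.total_errors == 0, \
    '%d pycodestyle violations' % result.total_errors
{code}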






[jira] [Updated] (CASSANDRA-13873) Ref bug in Scrub

2017-10-20 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13873:
--
Reproduced In: 3.11.0, 3.10, 4.0  (was: 3.10, 3.11.0, 4.0)
 Priority: Major  (was: Critical)

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Joel Knighton
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node.  This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Commented] (CASSANDRA-13873) Ref bug in Scrub

2017-10-20 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212774#comment-16212774
 ] 

Joel Knighton commented on CASSANDRA-13873:
---

+1 - I'd like to introduce as few changes as possible to older versions here. 
That combination sounds good to me. Do you want to prepare the patch for older 
versions or would you like me to?

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Joel Knighton
>Priority: Critical
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> I'm hitting a Ref bug when many scrubs run against a node.  This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Commented] (CASSANDRA-13873) Ref bug in Scrub

2017-10-19 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211081#comment-16211081
 ] 

Joel Knighton commented on CASSANDRA-13873:
---

You're correct that cancelling will also finish the txn and allow operations to 
select and reference canonical sstables. In the specific repro that Jake 
provided, which is the case of multiple scrubs over the same cfs (an admittedly 
somewhat artificial case), we'll try to select and reference canonical sstables 
in the snapshot before cancelling the original scrub compaction, so the new 
scrubs will hang until the original scrub finishes.

That'd be great if you could review. I'm admittedly very unfamiliar with this 
part of the code, so I expect my initial patch is a rough sketch of the 
eventual solution.

As far as criticality goes, I could go either way. I know of no situations that 
this causes data loss or permanent deadlocks at this time, but it can 
potentially cause operations referencing canonical sstables to hang for long 
periods of time.

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Joel Knighton
>Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node.  This doesn't 
> happen on 3.0.X. I'm not sure whether this happens with compactions too, 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Updated] (CASSANDRA-13873) Ref bug in Scrub

2017-10-18 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13873:
--
Reproduced In: 3.11.0, 3.10, 4.0  (was: 3.10, 3.11.0, 4.0)
   Status: Patch Available  (was: In Progress)

It looks like this situation can occur when referencing canonical sstables. As 
far as I can tell, the issue reproduces only when we have an sstable in a 
lifecycle transaction with no referencers other than its selfref. If the 
lifecycle transaction updates this sstable, we'll put a new instance of the 
sstable reader in the tracker. This causes no problems when getting live 
sstables, but the canonical sstables can also include sstable readers from the 
compacting set. In this case, the sstable reader that got updated will still be 
in the compacting set, but we won't be able to reference it when we try to 
select and reference canonical sstables, since its instance tidier has run when 
its last ref was released in the lifecycle transaction. Note that the global 
tidier doesn't run, since the updated sstable reader is still referenced. With 
the reproduction provided above in the multiple scrub, the scrubs will 
eventually proceed once the lifecycle transaction finishes, since it will put 
an updated sstable reader in the tracker. If there were a situation where a 
lifecycle transaction needed to select canonical sstables to proceed, this 
could cause a deadlock.

I pushed a branch at 
[c13873-2.2|https://github.com/jkni/cassandra/commit/ba70f70d97f648037e742a16bfdf1c8002d2be9c]
 that implements the simplest fix I can think of. The patch references the 
original sstables involved in a lifecycle transaction when we create the 
transaction, releasing these references whenever we do postCleanup or cancel an 
sstable reader from a transaction. I merged this forward and tests came back 
clean on all active branches. I'm not sure if there is some existing mechanism 
that should cover this case - maybe [~krummas] knows from reviewing 
[CASSANDRA-9699]?
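
To make the mechanism easier to follow, here is a toy Python model of the 
spin described above; the names echo the Ref/tracker concepts, but none of 
this is Cassandra's actual API:

{code}
class Ref(object):
    """Toy refcount: tryRef fails permanently once the count hits zero."""
    def __init__(self, referent):
        self.referent, self.count = referent, 1  # starts with its selfRef

    def tryRef(self):
        if self.count == 0:  # instance tidier has already run
            return None
        self.count += 1
        return self.referent

    def release(self):
        self.count -= 1

old_instance = Ref('mc-5-big-Data.db (original reader)')
new_instance = Ref('mc-5-big-Data.db (updated reader)')

live = {new_instance}        # the txn put the updated reader in the tracker
compacting = {old_instance}  # the compacting set still holds the original

old_instance.release()       # last (self) ref released inside the txn

# Selecting-and-referencing the canonical set now loops on the original
# instance, matching the "Spinning trying to capture readers" log line:
for reader in live | compacting:
    if reader.tryRef() is None:
        print('cannot reference %s; retrying' % reader.referent)
{code}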

> Ref bug in Scrub
> 
>
> Key: CASSANDRA-13873
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13873
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Joel Knighton
>Priority: Critical
>
> I'm hitting a Ref bug when many scrubs run against a node.  This doesn't 
> happen on 3.0.X.  I'm not sure if/if not this happens with compactions too 
> but I suspect it does.
> I'm not seeing any Ref leaks or double frees.
> To Reproduce:
> {quote}
> ./tools/bin/cassandra-stress write n=10m -rate threads=100
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> #Ctrl-C
> ./bin/nodetool scrub
> {quote}
> Eventually in the logs you get:
> WARN  [RMI TCP Connection(4)-127.0.0.1] 2017-09-14 15:51:26,722 
> NoSpamLogger.java:97 - Spinning trying to capture readers 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-32-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-31-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-29-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-27-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-26-big-Data.db'),
>  
> BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-20-big-Data.db')],
> *released: 
> [BigTableReader(path='/home/jake/workspace/cassandra2/data/data/keyspace1/standard1-2eb5c780998311e79e09311efffdcd17/mc-5-big-Data.db')],*
>  
> This released table has a selfRef of 0 but is in the Tracker






[jira] [Updated] (CASSANDRA-13866) Clock-dependent integer overflow in tests CellTest and RowsTest

2017-09-15 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13866:
--
   Resolution: Fixed
Fix Version/s: (was: 3.11.x)
   (was: 4.x)
   (was: 3.0.x)
   4.0
   3.11.1
   3.0.15
   Status: Resolved  (was: Ready to Commit)

Thanks! Committed to 3.0 branch as {{d79fc9a2258d10e8a54fd4136d5544e10ad3ddda}} 
and merged forward.

> Clock-dependent integer overflow in tests CellTest and RowsTest
> ---
>
> Key: CASSANDRA-13866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13866
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Trivial
> Fix For: 3.0.15, 3.11.1, 4.0
>
>
> These tests create timestamps from Unix time, but this is done as int math 
> with the result stored in a long. This means that if the test is run at 
> certain times, like 1505177731, corresponding to Tuesday, September 12, 2017, 
> 12:55:31, the test can have two timestamps separated by a single second that 
> reverse their ordering when multiplied by 1000000, such as 1505177731 -> 
> 2147149504 and 1505177732 -> -2146817792. This causes a variety of test 
> failures, since it changes the reconciliation order of these cells.
> Note that I've tagged this as trivial because the problem is in the manual 
> construction of timestamps in the test; I know of nowhere that we make this 
> mistake with real data.
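
The overflow is easy to reproduce; a short Python sketch that emulates 
Java's 32-bit int wraparound (Python ints don't overflow, so the masking is 
explicit):

{code}
def as_java_int(n):
    """Interpret n modulo 2**32 as a signed 32-bit value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n

for unix_seconds in (1505177731, 1505177732):
    print('%d -> %d' % (unix_seconds, as_java_int(unix_seconds * 1000000)))

# 1505177731 -> 2147149504
# 1505177732 -> -2146817792
# One second apart, yet the microsecond timestamps reverse their ordering.
{code}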






[jira] [Resolved] (CASSANDRA-13864) Failure to execute cql script using cqlsh with nested SOURCE on cassandra 3.11.0

2017-09-14 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-13864.
---
Resolution: Invalid

This JIRA is for the Apache Cassandra project. I don't believe this issue is 
reproducible in Apache Cassandra. You should report issues in DataStax products 
to DataStax.

> Failure to execute cql script  using cqlsh with nested SOURCE on cassandra 
> 3.11.0
> -
>
> Key: CASSANDRA-13864
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13864
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: SUSE Linux Enterprise Server 12
> Python 2.7.7
>Reporter: Haim Raman
> Attachments: test.zip
>
>
> I have a script I used to execute queries and DDL on cassandra 2.1.15 (based 
> on DSE 4.8.10).
> The script include nested use of the SOURCE command.
> {code:title=1.sql|borderStyle=solid}
> USE test;
> SOURCE '2.sql'
> exit;
> {code}
> {code:title=2.sql|borderStyle=solid}
> SELECT count(1) FROM user;
> SOURCE '3.sql';
> {code}
> {code:title=3.sql|borderStyle=solid}
> SELECT count(1) FROM user;
> {code}
> When executing this script with DSE 4.8.10 it runs correctly and output
> {code:title=cassandra  2.1.15|borderStyle=none}
> cqlsh -f 1.sql
> count
> 
>  0
> (1 rows)
> count
> 
>  0
> (1 rows)
> {code}
> Running the same script with cassandra 3.11.0 (based on DSE 5.1.2).
> {code:title=cassandra  3.11.0|borderStyle=none}
> cqlsh -f 1.sql
>  count
> ---
>  0
> (1 rows)
> Warnings :
> Aggregation query used without partition key
> 2.sql:3:DSEShell instance has no attribute 'execution_profiles'
> {code}
> *The actual issue is that the script in 3.sql is not executed.*
> I did some additional investigations
> # With DSE-5.1.2 I switched off auth (authenticator: AllowAllAuthenticator, 
> authorizer: AllowAllAuthorizer), but I am still experiencing the issue
> # With DSE-5.0.9 (Cassandra 3.0.13.1735) it works
> h4.Steps to reproduce: 
> Use the attached test.zip
> execute
> cqlsh -f  create_keyspace.sql
> cqlsh -f 1.sql






[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions

2017-09-14 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166361#comment-16166361
 ] 

Joel Knighton commented on CASSANDRA-13692:
---

Thanks for the patch (and rebase/test rerun)! I'll review this later today.

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
> Key: CASSANDRA-13692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Hao Zhong
>Assignee: Dimitar Dimitrov
>  Labels: lhf
> Attachments: c13692-2.2-dtest-results.PNG, 
> c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, 
> c13692-3.0-dtest-results-updated.PNG, c13692-3.0-testall-results.PNG, 
> c13692-3.11-dtest-results.PNG, c13692-3.11-dtest-results-updated.PNG, 
> c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, 
> c13692-dtest-results-updated.PNG, c13692-testall-results.PNG, 
> c13692-testall-results-updated.PNG
>
>
> The CompactionAwareWriter_getWriteDirectory throws RuntimeException:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> 
> sstables, long estimatedWriteSize)
> {
> File directory = null;
> for (SSTableReader sstable : sstables)
> {
> if (directory == null)
> directory = sstable.descriptor.directory;
> if (!directory.equals(sstable.descriptor.directory))
> {
> logger.trace("All sstables not from the same disk - putting 
> results in {}", directory);
> break;
> }
> }
> Directories.DataDirectory d = 
> getDirectories().getDataDirectoryForFile(directory);
> if (d != null)
> {
> long availableSpace = d.getAvailableSpace();
> if (availableSpace < estimatedWriteSize)
> throw new RuntimeException(String.format("Not enough space to 
> write %s to %s (%s available)",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize),
>  d.location,
>  
> FBUtilities.prettyPrintMemory(availableSpace)));
> logger.trace("putting compaction results in {}", directory);
> return d;
> }
> d = getDirectories().getWriteableLocation(estimatedWriteSize);
> if (d == null)
> throw new RuntimeException(String.format("Not enough disk space 
> to store %s",
>  
> FBUtilities.prettyPrintMemory(estimatedWriteSize)));
> return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. 
> CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws an {{FSWriteError}} and triggers the failure policy.
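
To make the exception-type difference concrete, here is a hedged sketch (the 
try/catch is illustrative, not the actual call site; it assumes 
{{FSWriteError}} extends {{FSError}} and that {{FileUtils.handleFSError}} is 
the hook that applies the configured {{disk_failure_policy}}):
{code}
try
{
    // the fixed code's throw:
    throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
}
catch (FSError e)
{
    // Recognized as a filesystem error, so the stop/die/best_effort/ignore
    // policies can fire. A bare RuntimeException bypasses this handler.
    FileUtils.handleFSError(e);
}
{code}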



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13866) Clock-dependent integer overflow in tests CellTest and RowsTest

2017-09-13 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13866:
--
Status: Patch Available  (was: Open)

Trivial patches for 
[3.0|https://github.com/jkni/cassandra/tree/CASSANDRA-13866-3.0], 
[3.11|https://github.com/jkni/cassandra/tree/CASSANDRA-13866-3.11], and 
[trunk|https://github.com/jkni/cassandra/tree/CASSANDRA-13866-trunk]. The 3.0 
patch merges forward cleanly.

> Clock-dependent integer overflow in tests CellTest and RowsTest
> ---
>
> Key: CASSANDRA-13866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13866
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Trivial
>
> These tests create timestamps from Unix time, but this is done as int math 
> with the result stored in a long. This means that if the test is run at 
> certain times, like 1505177731, corresponding to Tuesday, September 12, 2017, 
> 12:55:31, the test can have two timestamps separated by a single second that 
> reverse their ordering when multiplied by 1000000, such as 1505177731 -> 
> 2147149504 and 1505177732 -> -2146817792. This causes a variety of test 
> failures, since it changes the reconciliation order of these cells.
> Note that I've tagged this as trivial because the problem is in the manual 
> construction of timestamps in the test; I know of nowhere that we make this 
> mistake with real data.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13866) Clock-dependent integer overflow in tests CellTest and RowsTest

2017-09-13 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-13866:
-

 Summary: Clock-dependent integer overflow in tests CellTest and 
RowsTest
 Key: CASSANDRA-13866
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13866
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Joel Knighton
Assignee: Joel Knighton
Priority: Trivial


These tests create timestamps from Unix time, but this is done as int math with 
the result stored in a long. This means that if the test is run at certain 
times, like 1505177731, corresponding to Tuesday, September 12, 2017, 12:55:31, 
the test can have two timestamps separated by a single second that reverse 
their ordering when multiplied by 1000000, such as 1505177731 -> 2147149504 and 
1505177732 -> -2146817792. This causes a variety of test failures, since it 
changes the reconciliation order of these cells.

Note that I've tagged this as trivial because the problem is in the manual 
construction of timestamps in the test; I know of nowhere that we make this 
mistake with real data.
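
A minimal standalone sketch of the overflow (class and variable names are 
mine; the real tests build timestamps along these lines):
{code}
public class TimestampOverflow
{
    public static void main(String[] args)
    {
        int unixSeconds = 1505177731;                  // 2017-09-12 12:55:31 UTC
        // int * int overflows before the widening to long happens:
        long broken = unixSeconds * 1000000;           // 2147149504
        long brokenNext = (unixSeconds + 1) * 1000000; // -2146817792, sorts before 'broken'
        // widening one operand keeps the multiplication in long arithmetic:
        long fixed = unixSeconds * 1000000L;           // 1505177731000000
        System.out.println(broken + " " + brokenNext + " " + fixed);
    }
}
{code}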



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13847) test failure in cqlsh_tests.cqlsh_tests.CqlLoginTest.test_list_roles_after_login

2017-09-06 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-13847:
-

 Summary: test failure in 
cqlsh_tests.cqlsh_tests.CqlLoginTest.test_list_roles_after_login
 Key: CASSANDRA-13847
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13847
 Project: Cassandra
  Issue Type: Bug
  Components: Testing, Tools
Reporter: Joel Knighton


example failure:

http://cassci.datastax.com/job/cassandra-2.1_dtest/546/testReport/cqlsh_tests.cqlsh_tests/CqlLoginTest/test_list_roles_after_login

This test was added for [CASSANDRA-13640]. The comments seem to indicate this 
is only a problem on 3.0+, but the added test certainly seems to reproduce the 
problem on 2.1 and 2.2. Even if the issue does affect 2.1/2.2, it seems 
insufficiently critical for 2.1, so we need to limit the test to run on 2.2+ at 
the very least, possibly 3.0+ if we don't fix the cause on 2.2.

Thoughts [~adelapena]?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions

2017-09-06 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13692:
--
Reviewer: Joel Knighton

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
> Key: CASSANDRA-13692
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Hao Zhong
>Assignee: Dimitar Dimitrov
>  Labels: lhf
> Attachments: c13692-2.2-dtest-results.PNG, 
> c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, 
> c13692-3.0-dtest-results-updated.PNG, c13692-3.0-testall-results.PNG, 
> c13692-3.11-dtest-results.PNG, c13692-3.11-dtest-results-updated.PNG, 
> c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, 
> c13692-testall-results.PNG
>
>
> {{CompactionAwareWriter.getWriteDirectory}} throws a {{RuntimeException}}:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
> {
>     File directory = null;
>     for (SSTableReader sstable : sstables)
>     {
>         if (directory == null)
>             directory = sstable.descriptor.directory;
>         if (!directory.equals(sstable.descriptor.directory))
>         {
>             logger.trace("All sstables not from the same disk - putting results in {}", directory);
>             break;
>         }
>     }
>     Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
>     if (d != null)
>     {
>         long availableSpace = d.getAvailableSpace();
>         if (availableSpace < estimatedWriteSize)
>             throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
>                                                      FBUtilities.prettyPrintMemory(estimatedWriteSize),
>                                                      d.location,
>                                                      FBUtilities.prettyPrintMemory(availableSpace)));
>         logger.trace("putting compaction results in {}", directory);
>         return d;
>     }
>     d = getDirectories().getWriteableLocation(estimatedWriteSize);
>     if (d == null)
>         throw new RuntimeException(String.format("Not enough disk space to store %s",
>                                                  FBUtilities.prettyPrintMemory(estimatedWriteSize)));
>     return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. 
> CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws an {{FSWriteError}} and triggers the failure policy.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13837) Hanging threads in BulkLoader

2017-09-05 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153917#comment-16153917
 ] 

Joel Knighton commented on CASSANDRA-13837:
---

I pointed [~mshuler] here prematurely - my bad. It looks like the System.exit 
uncommented in [CASSANDRA-13836] resolves the hang we were seeing on that test. 
The remaining hangs seem to be repair-related and unrelated to this ticket.

> Hanging threads in BulkLoader
> -
>
> Key: CASSANDRA-13837
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13837
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
>
> [~krummas] discovered some threads that were not closing correctly when he 
> fixed CASSANDRA-13836. We suspect this is due to 
> CASSANDRA-8457/CASSANDRA-12229.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13764) SelectTest.testMixedTTLOnColumnsWide is flaky

2017-08-29 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16146196#comment-16146196
 ] 

Joel Knighton commented on CASSANDRA-13764:
---

The idea looks sound to me - I think you need to remove the TTL column from the 
select 
[here|https://github.com/jeffjirsa/cassandra/commit/a1e49db69622de11a996d09105e5ebf3b54c58c3#diff-7f5981228f9d9428fb164aa91316aa85R2976],
 as you did in {{testMixedTTLOnColumnsWide}}. If you agree, I'm comfortable 
with you doing that on commit and don't need to rereview if CI looks good.

> SelectTest.testMixedTTLOnColumnsWide is flaky
> -
>
> Key: CASSANDRA-13764
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13764
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Jeff Jirsa
>Priority: Trivial
>
> {{org.apache.cassandra.cql3.validation.operations.SelectTest.testMixedTTLOnColumnsWide}}
>  is flaky. This is because it inserts rows and then asserts their contents 
> using {{ttl()}} in the select, but if the test is sufficiently slow, the 
> remaining ttl may change by the time the select is run. Anecdotally, 
> {{testSelectWithAlias}} in the same class uses a fudge factor of 1 second 
> that would fix all the failures I've seen, but it might make more sense to 
> measure the elapsed time in the test and calculate the acceptable variation 
> from that time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12870) Calculate pending ranges for identical KS settings just once

2017-08-28 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143823#comment-16143823
 ] 

Joel Knighton commented on CASSANDRA-12870:
---

Yes, sorry - this dropped off my radar. I can get to this in the next week or 
so. Otherwise, if someone else is interested in taking it sooner and does so, I 
definitely won't be offended.

> Calculate pending ranges for identical KS settings just once
> 
>
> Key: CASSANDRA-12870
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12870
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Stefan Podkowinski
>Assignee: Stefan Podkowinski
> Fix For: 4.x
>
> Attachments: 12870-trunk.patch
>
>
> The {{TokenMetadata.calculatePendingRanges()}} operation can be very 
> expensive and already has been subject to micro optimization in 
> CASSANDRA-9258. Instead of further optimizing the calculation itself, I'd 
> like to prevent it from being executed as often by only calling it just once 
> for all keyspaces sharing identical replication settings. 
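
The shape of the proposed optimization, as a hedged sketch (it assumes 3.0+ 
schema types like {{KeyspaceMetadata.params.replication}}; the cache and the 
{{calculatePendingRanges}} helper shown here are hypothetical):
{code}
// Group keyspaces by replication settings and calculate pending ranges once
// per distinct configuration instead of once per keyspace.
Map<ReplicationParams, PendingRangeMaps> perConfig = new HashMap<>();
for (KeyspaceMetadata ksm : keyspaces)
{
    PendingRangeMaps ranges = perConfig.computeIfAbsent(ksm.params.replication,
                                                        params -> calculatePendingRanges(ksm.name));
    pendingRanges.put(ksm.name, ranges);
}
{code}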



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13594) Use an ExecutorService for repair commands instead of new Thread(..).start()

2017-08-15 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127540#comment-16127540
 ] 

Joel Knighton commented on CASSANDRA-13594:
---

No problem - fix committed as {{256a74faa31fcf25bdae753c563fa2c69f7f355c}}. 
Thanks!

> Use an ExecutorService for repair commands instead of new Thread(..).start()
> 
>
> Key: CASSANDRA-13594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13594
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.0
>
> Attachments: 13594.png
>
>
> Currently, when starting a new repair, we create a new Thread and start it 
> immediately.
> It would be nice to be able to 1) limit the number of threads and 2) reject 
> starting new repair commands if we are already running too many.
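
A hedged sketch of what such an executor could look like (the pool size and 
thread name below are assumptions, not the committed configuration):
{code}
// Bounded pool with no queue: once maxParallelRepairs commands are running,
// submit() throws RejectedExecutionException instead of spawning yet another
// thread.
int maxParallelRepairs = 4; // hypothetical limit
ExecutorService repairCommandExecutor =
    new ThreadPoolExecutor(1, maxParallelRepairs,
                           1, TimeUnit.HOURS,
                           new SynchronousQueue<Runnable>(),
                           new NamedThreadFactory("Repair-Command"),
                           new ThreadPoolExecutor.AbortPolicy());
{code}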



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13594) Use an ExecutorService for repair commands instead of new Thread(..).start()

2017-08-15 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127479#comment-16127479
 ] 

Joel Knighton commented on CASSANDRA-13594:
---

This causes a test failure in {{DatabaseDescriptorRefTest}} because of the new 
Config class - I've pushed a fix 
[here|https://github.com/jkni/cassandra/commit/ec3e7a84e5bae4b6968ee39a39f331fe0f5dd036].

> Use an ExecutorService for repair commands instead of new Thread(..).start()
> 
>
> Key: CASSANDRA-13594
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13594
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 4.0
>
> Attachments: 13594.png
>
>
> Currently, when starting a new repair, we create a new Thread and start it 
> immediately.
> It would be nice to be able to 1) limit the number of threads and 2) reject 
> starting new repair commands if we are already running too many.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13764) SelectTest.testMixedTTLOnColumnsWide is flaky

2017-08-15 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127438#comment-16127438
 ] 

Joel Knighton commented on CASSANDRA-13764:
---

This also affects {{SelectTest.testMixedTTLOnColumns}}.

> SelectTest.testMixedTTLOnColumnsWide is flaky
> -
>
> Key: CASSANDRA-13764
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13764
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Priority: Trivial
>
> {{org.apache.cassandra.cql3.validation.operations.SelectTest.testMixedTTLOnColumnsWide}}
>  is flaky. This is because it inserts rows and then asserts their contents 
> using {{ttl()}} in the select, but if the test is sufficiently slow, the 
> remaining ttl may change by the time the select is run. Anecdotally, 
> {{testSelectWithAlias}} in the same class uses a fudge factor of 1 second 
> that would fix all the failures I've seen, but it might make more sense to 
> measure the elapsed time in the test and calculate the acceptable variation 
> from that time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13764) SelectTest.testMixedTTLOnColumnsWide is flaky

2017-08-14 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-13764:
-

 Summary: SelectTest.testMixedTTLOnColumnsWide is flaky
 Key: CASSANDRA-13764
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13764
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Joel Knighton
Priority: Trivial


{{org.apache.cassandra.cql3.validation.operations.SelectTest.testMixedTTLOnColumnsWide}}
 is flaky. This is because it inserts rows and then asserts their contents 
using {{ttl()}} in the select, but if the test is sufficiently slow, the 
remaining ttl may change by the time the select is run. Anecdotally, 
{{testSelectWithAlias}} in the same class uses a fudge factor of 1 second that 
would fix all the failures I've seen, but it might make more sense to measure 
the elapsed time in the test and calculate the acceptable variation from that 
time.
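
A hedged sketch of the elapsed-time approach ({{execute}} and {{getTtl}} stand 
in for the test's actual helpers):
{code}
long start = System.nanoTime();
execute("INSERT INTO %s (k, v) VALUES (0, 0) USING TTL 100");
int remaining = getTtl(execute("SELECT ttl(v) FROM %s WHERE k = 0"));
// However slow the test ran, the ttl can only have decayed by the elapsed
// time (rounded up), so derive the acceptable window from that instead of
// hardcoding a one-second fudge factor:
long elapsed = TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start) + 1;
assertTrue(remaining <= 100 && remaining >= 100 - elapsed);
{code}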



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11483) Enhance sstablemetadata

2017-08-11 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124103#comment-16124103
 ] 

Joel Knighton commented on CASSANDRA-11483:
---

The dtest fix was committed in [CASSANDRA-13755] - thanks everyone.

> Enhance sstablemetadata
> ---
>
> Key: CASSANDRA-11483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Fix For: 4.0
>
> Attachments: CASSANDRA-11483.txt, CASSANDRA-11483v2.txt, 
> CASSANDRA-11483v3.txt, CASSANDRA-11483v4.txt, CASSANDRA-11483v5.txt, Screen 
> Shot 2016-04-03 at 11.40.32 PM.png
>
>
> sstablemetadata provides quite a bit of useful information but there are a 
> few hiccups I would like to see addressed:
> * Does not use client mode
> * Units are not provided (or anything for that matter). There is data in 
> micros, millis, seconds as durations and timestamps from epoch. But there is 
> no way to tell which is which without a non-trivial code dive
> * In general pretty frustrating to parse



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13755) dtest failure: repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test

2017-08-10 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122535#comment-16122535
 ] 

Joel Knighton commented on CASSANDRA-13755:
---

Thanks!

> dtest failure: 
> repair_tests.incremental_repair_test:TestIncRepair.consistent_repair_test
> 
>
> Key: CASSANDRA-13755
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13755
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Joel Knighton
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11483) Enhance sstablemetadata

2017-08-09 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120382#comment-16120382
 ] 

Joel Knighton commented on CASSANDRA-11483:
---

Is there a branch of dtest changes for this ticket anywhere? While working on 
an unrelated ticket, I noticed this commit broke 
{{repair_tests.incremental_repair_test.TestIncRepair.consistent_repair_test}}. 
I've pushed a trivial fix 
[here|https://github.com/jkni/cassandra-dtest/commit/f55f78b093fc668dc5cc9d1fc72f66dc5a9bf3a6],
 but I don't want to commit it and create conflicts if there's already an 
existing dtest branch. I didn't notice any other tests that needed fixing.

> Enhance sstablemetadata
> ---
>
> Key: CASSANDRA-11483
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11483
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability
>Reporter: Chris Lohfink
>Assignee: Chris Lohfink
>Priority: Minor
> Fix For: 4.0
>
> Attachments: CASSANDRA-11483.txt, CASSANDRA-11483v2.txt, 
> CASSANDRA-11483v3.txt, CASSANDRA-11483v4.txt, CASSANDRA-11483v5.txt, Screen 
> Shot 2016-04-03 at 11.40.32 PM.png
>
>
> sstablemetadata provides quite a bit of useful information but there are a 
> few hiccups I would like to see addressed:
> * Does not use client mode
> * Units are not provided (or anything for that matter). There is data in 
> micros, millis, seconds as durations and timestamps from epoch. But there is 
> no way to tell which is which without a non-trivial code dive
> * In general pretty frustrating to parse



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13713) Move processing of EchoMessage response to gossip stage

2017-08-02 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111375#comment-16111375
 ] 

Joel Knighton edited comment on CASSANDRA-13713 at 8/2/17 5:31 PM:
---

Ah - guess I missed that one. LGTM to the change here. I don't have any strong 
opinions as to where it goes in 2.2+. While we don't have any known problems 
related to this, we also know bugs can lurk in this subsystem for a long time, 
and this seems like a safe preventative measure.


was (Author: jkni):
Ah - guess I missed that one. +1 to the change here. I don't have any strong 
opinions as to where it goes in 2.2+. While we don't have any known problems 
related to this, we also know bugs can lurk in this subsystem for a long time, 
and this seems like a safe preventative measure.

> Move processing of EchoMessage response to gossip stage
> ---
>
> Key: CASSANDRA-13713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13713
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 4.x
>
>
> Currently, when a node receives an {{EchoMessage}}, it sends a simple ACK 
> reply back (see {{EchoVerbHandler}}). The ACK is sent on the small message 
> connection, and because it is 'generically' typed as 
> {{Verb.REQUEST_RESPONSE}}, is consumed on a {{Stage.REQUEST_RESPONSE}} 
> thread. The proper thread for this response to be consumed is 
> {{Stage.GOSSIP}}, that way we can move more of the updating of the gossip 
> state to a single, centralized thread, and less abuse of gossip's shared 
> mutable state can occur.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13713) Move processing of EchoMessage response to gossip stage

2017-08-02 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111375#comment-16111375
 ] 

Joel Knighton commented on CASSANDRA-13713:
---

Ah - guess I missed that one. +1 to the change here. I don't have any strong 
opinions as to where it goes in 2.2+. While we don't have any known problems 
related to this, we also know bugs can lurk in this subsystem for a long time, 
and this seems like a safe preventative measure.

> Move processing of EchoMessage response to gossip stage
> ---
>
> Key: CASSANDRA-13713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13713
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 4.x
>
>
> Currently, when a node receives an {{EchoMessage}}, it sends a simple ACK 
> reply back (see {{EchoVerbHandler}}). The ACK is sent on the small message 
> connection, and because it is 'generically' typed as 
> {{Verb.REQUEST_RESPONSE}}, is consumed on a {{Stage.REQUEST_RESPONSE}} 
> thread. The proper thread for this response to be consumed is 
> {{Stage.GOSSIP}}, that way we can move more of the updating of the gossip 
> state to a single, centralized thread, and less abuse of gossip's shared 
> mutable state can occur.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-08-01 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108999#comment-16108999
 ] 

Joel Knighton commented on CASSANDRA-13700:
---

I'm not sure of any places in the current codebase where the distinction 
matters in practice, but the change is cleaner and makes the code more tolerant 
of changes elsewhere, so +1.

> Heartbeats can cause gossip information to go permanently missing on certain 
> nodes
> --
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Critical
> Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0, 2.1.19
>
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send. 
> When we're getting state for ourselves, this means that we add a reference to 
> the local {{HeartBeatState}}. Then, once we've built a message (in either the 
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case 
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may 
> run before serialization of the Syn or Ack. This means that when the 
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the 
> {{HeartBeatState}} version of the local node as stored in the endpoint state 
> map. Then, when we finally serialize the Syn or Ack, we'll follow the 
> reference to the {{HeartBeatState}} and serialize it with a higher version 
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when 
> constructing an Ack and send it through the {{MessagingService}}. Then, we 
> add some piece of state with version 5 to our local {{EndpointState}}. If 
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
> the {{MessageOut}} containing the Ack is serialized, the node receiving the 
> Ack will believe it is current to version 6, despite the fact that it has 
> never received a message containing the {{ApplicationState}} tagged with 
> version 5.
> I've reproduced this in several versions; so far, I believe this is 
> possible in all versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13713) Move processing of EchoMessage response to gossip stage

2017-07-26 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102175#comment-16102175
 ] 

Joel Knighton commented on CASSANDRA-13713:
---

The simple change here seems good - do you think it's worth doing something to 
map the echo response directly to the gossip stage and gossip connection? This 
seems like a good fix for the present state, but I think perhaps a different 
verb or a way to map individual REQUEST_RESPONSE verbs to a stage/connection 
makes sense in the longer term.
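
For context, a sketch of the verb-to-stage map being discussed (entries 
abbreviated and based on my reading of the pre-4.0 
{{MessagingService.verbStages}} layout, so treat it as an approximation):
{code}
public static final EnumMap<Verb, Stage> verbStages = new EnumMap<Verb, Stage>(Verb.class)
{{
    put(Verb.GOSSIP_DIGEST_SYN, Stage.GOSSIP);
    put(Verb.GOSSIP_DIGEST_ACK, Stage.GOSSIP);
    put(Verb.ECHO,              Stage.GOSSIP);
    // The echo ACK comes back as a generic REQUEST_RESPONSE, so it is
    // consumed on the request-response stage rather than the gossip stage.
    put(Verb.REQUEST_RESPONSE,  Stage.REQUEST_RESPONSE);
    // ...
}};
{code}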

> Move processing of EchoMessage response to gossip stage
> ---
>
> Key: CASSANDRA-13713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13713
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 4.x
>
>
> Currently, when a node receives an {{EchoMessage}}, it sends a simple ACK 
> reply back (see {{EchoVerbHandler}}). The ACK is sent on the small message 
> connection, and because it is 'generically' typed as 
> {{Verb.REQUEST_RESPONSE}}, is consumed on a {{Stage.REQUEST_RESPONSE}} 
> thread. The proper thread for this response to be consumed is 
> {{Stage.GOSSIP}}, that way we can move more of the updating of the gossip 
> state to a single, centralized thread, and less abuse of gossip's shared 
> mutable state can occur.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-11825) NPE in gossip

2017-07-26 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102145#comment-16102145
 ] 

Joel Knighton commented on CASSANDRA-11825:
---

Executing {{#start(int)}} and {{#stop()}} on the gossip stage seems safe to me 
at first glance. As an alternative, I think it's safe to simply pass the 
generation we use at comparison time down the chain of methods and make sure to 
use that in the endpoint state we send, since the problem is a mismatch between 
generation at comparison time and generation at message sending time. Then, 
we'll send everything on the next message when we compare and see the 
generation difference.

Either seems fine to me, but I could be missing something on either. I'd spend 
more time on a proper review.
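
A hypothetical sketch of that second alternative (the method and its signature 
are mine, not the real code): capture the generation once, then thread it 
through so comparison time and serialization time agree:
{code}
// at comparison time:
int gen = endpointStateMap.get(localAddress).getHeartBeatState().getGeneration();
// ...and pass it down to where the reply's EndpointState is assembled:
EndpointState toSend = buildStateForVersionBiggerThan(localAddress, maxVersion, gen);
{code}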

> NPE in gossip
> -
>
> Key: CASSANDRA-11825
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11825
> Project: Cassandra
>  Issue Type: Bug
>Reporter: T Jake Luciani
>Assignee: Joel Knighton
>  Labels: fallout
> Fix For: 3.0.x
>
>
> We have a test that causes an NPE in gossip code:
> It's basically calling nodetool enable/disable gossip
> From the debug log
> {quote}
> WARN  [RMI TCP Connection(17)-54.153.70.214] 2016-05-17 18:58:44,423 
> StorageService.java:395 - Starting gossip by operator request
> DEBUG [RMI TCP Connection(17)-54.153.70.214] 2016-05-17 18:58:44,424 
> StorageService.java:1996 - Node /172.31.24.76 state NORMAL, token 
> [-9223372036854775808]
> INFO  [RMI TCP Connection(17)-54.153.70.214] 2016-05-17 18:58:44,424 
> StorageService.java:1999 - Node /172.31.24.76 state jump to NORMAL
> DEBUG [RMI TCP Connection(17)-54.153.70.214] 2016-05-17 18:58:44,424 
> YamlConfigurationLoader.java:102 - Loading settings from 
> file:/mnt/ephemeral/automaton/cassandra-src/conf/cassandra.yaml
> DEBUG [PendingRangeCalculator:1] 2016-05-17 18:58:44,425 
> PendingRangeCalculatorService.java:66 - finished calculation for 5 keyspaces 
> in 0ms
> DEBUG [GossipStage:1] 2016-05-17 18:58:45,346 FailureDetector.java:456 - 
> Ignoring interval time of 75869093776 for /172.31.31.1
> DEBUG [GossipStage:1] 2016-05-17 18:58:45,347 FailureDetector.java:456 - 
> Ignoring interval time of 75869214424 for /172.31.17.32
> INFO  [GossipStage:1] 2016-05-17 18:58:45,347 Gossiper.java:1028 - Node 
> /172.31.31.1 has restarted, now UP
> DEBUG [GossipStage:1] 2016-05-17 18:58:45,347 StorageService.java:1996 - Node 
> /172.31.31.1 state NORMAL, token [-3074457345618258603]
> INFO  [GossipStage:1] 2016-05-17 18:58:45,347 StorageService.java:1999 - Node 
> /172.31.31.1 state jump to NORMAL
> INFO  [HANDSHAKE-/172.31.31.1] 2016-05-17 18:58:45,348 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.31.1
> ERROR [GossipStage:1] 2016-05-17 18:58:45,354 CassandraDaemon.java:195 - 
> Exception in thread Thread[GossipStage:1,5,main]
> java.lang.NullPointerException: null
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:846) 
> ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2008)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1729)
>  ~[main/:na]
>   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2446) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1050) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1133) 
> ~[main/:na]
>   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na]
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
> ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_40]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_40]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_40]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_40]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO  [GossipStage:1] 2016-05-17 18:58:45,355 Gossiper.java:1028 - Node 
> /172.31.17.32 has restarted, now UP
> DEBUG [GossipStage:1] 2016-05-17 18:58:45,355 StorageService.java:1996 - Node 
> /172.31.17.32 state NORMAL, token [3074457345618258602]
> INFO  [GossipStage:1] 2016-05-17 18:58:45,356 StorageService.java:1999 - Node 
> /172.31.17.32 state jump to NORMAL
> INFO  [HANDSHAKE-/172.31.17.32] 2016-05-17 18:58:45,356 
> OutboundTcpConnection.java:514 - Handshaking version with /172.31.17.32
> DEBUG [PendingRangeCalculator:1] 2016-05-17 18:58:45,357 
> 

[jira] [Updated] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-07-24 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13700:
--
   Resolution: Fixed
Fix Version/s: 4.0
   3.11.1
   3.0.15
   2.2.11
   2.1.19
   Status: Resolved  (was: Ready to Commit)

Thanks! Tests looked good on all branches.

Committed to 2.1 as {{2290c0d4b0c20ce3407ae2c542e580c75a5ab337}} and merged 
forward through 2.2, 3.0, 3.11, and trunk.

> Heartbeats can cause gossip information to go permanently missing on certain 
> nodes
> --
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Critical
> Fix For: 2.1.19, 2.2.11, 3.0.15, 3.11.1, 4.0
>
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send. 
> When we're getting state for ourselves, this means that we add a reference to 
> the local {{HeartBeatState}}. Then, once we've built a message (in either the 
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case 
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may 
> run before serialization of the Syn or Ack. This means that when the 
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the 
> {{HeartBeatState}} version of the local node as stored in the endpoint state 
> map. Then, when we finally serialize the Syn or Ack, we'll follow the 
> reference to the {{HeartBeatState}} and serialize it with a higher version 
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when 
> constructing an Ack and send it through the {{MessagingService}}. Then, we 
> add some piece of state with version 5 to our local {{EndpointState}}. If 
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
> the {{MessageOut}} containing the Ack is serialized, the node receiving the 
> Ack will believe it is current to version 6, despite the fact that it has 
> never received a message containing the {{ApplicationState}} tagged with 
> version 5.
> I've reproduced this in several versions; so far, I believe this is 
> possible in all versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-07-20 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094911#comment-16094911
 ] 

Joel Knighton commented on CASSANDRA-13700:
---

Thanks, Jason! In this case, I agree the first option is safer for this issue. 
Something like the second likely makes sense eventually, at least as part of a 
larger audit of correctness issues in gossip. I believe your volatile 
suggestion is correct.

I don't have a lot of helpful information to reproduce this; it reproduces in 
larger clusters, particularly with higher latency levels. We can see the 
effects locally with a few well-timed sleeps in MessagingService, but that 
isn't terribly representative.

Branches pushed here:
||branch||
|[13700-2.1|https://github.com/jkni/cassandra/tree/13700-2.1]|
|[13700-2.2|https://github.com/jkni/cassandra/tree/13700-2.2]|
|[13700-3.0|https://github.com/jkni/cassandra/tree/13700-3.0]|
|[13700-3.11|https://github.com/jkni/cassandra/tree/13700-3.11]|
|[13700-trunk|https://github.com/jkni/cassandra/tree/13700-trunk]|

There's a somewhat conceptually similar issue when we bump the gossip 
generation in the middle of constructing a reply - I believe that's the cause 
in [CASSANDRA-11825], which presents similar problems. I'm choosing to address 
them separately because they're indeed distinct problems and 11825 requires an 
additional trigger (enabling and disabling gossip during runtime).

> Heartbeats can cause gossip information to go permanently missing on certain 
> nodes
> --
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Critical
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send. 
> When we're getting state for ourselves, this means that we add a reference to 
> the local {{HeartBeatState}}. Then, once we've built a message (in either the 
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case 
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may 
> run before serialization of the Syn or Ack. This means that when the 
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the 
> {{HeartBeatState}} version of the local node as stored in the endpoint state 
> map. Then, when we finally serialize the Syn or Ack, we'll follow the 
> reference to the {{HeartBeatState}} and serialize it with a higher version 
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when 
> constructing an Ack and send it through the {{MessagingService}}. Then, we 
> add some piece of state with version 5 to our local {{EndpointState}}. If 
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
> the {{MessageOut}} containing the Ack is serialized, the node receiving the 
> Ack will believe it is current to version 6, despite the fact that it has 
> never received a message containing the {{ApplicationState}} tagged with 
> version 5.
> I've reproduced this in several versions; so far, I believe this is 
> possible in all versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-07-20 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13700:
--
Status: Patch Available  (was: In Progress)

> Heartbeats can cause gossip information to go permanently missing on certain 
> nodes
> --
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Critical
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send. 
> When we're getting state for ourselves, this means that we add a reference to 
> the local {{HeartBeatState}}. Then, once we've built a message (in either the 
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case 
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may 
> run before serialization of the Syn or Ack. This means that when the 
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the 
> {{HeartBeatState}} version of the local node as stored in the endpoint state 
> map. Then, when we finally serialize the Syn or Ack, we'll follow the 
> reference to the {{HeartBeatState}} and serialize it with a higher version 
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when 
> constructing an Ack and send it through the {{MessagingService}}. Then, we 
> add some piece of state with version 5 to our local {{EndpointState}}. If 
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
> the {{MessageOut}} containing the Ack is serialized, the node receiving the 
> Ack will believe it is current to version 6, despite the fact that it has 
> never received a message containing the {{ApplicationState}} tagged with 
> version 5.
> I've reproduced this in several versions; so far, I believe this is 
> possible in all versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-07-20 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13700:
--
Reviewer: Jason Brown

> Heartbeats can cause gossip information to go permanently missing on certain 
> nodes
> --
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Critical
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send. 
> When we're getting state for ourselves, this means that we add a reference to 
> the local {{HeartBeatState}}. Then, once we've built a message (in either the 
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case 
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may 
> run before serialization of the Syn or Ack. This means that when the 
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the 
> {{HeartBeatState}} version of the local node as stored in the endpoint state 
> map. Then, when we finally serialize the Syn or Ack, we'll follow the 
> reference to the {{HeartBeatState}} and serialize it with a higher version 
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when 
> constructing an Ack and send it through the {{MessagingService}}. Then, we 
> add some piece of state with version 5 to our local {{EndpointState}}. If 
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
> the {{MessageOut}} containing the Ack is serialized, the node receiving the 
> Ack will believe it is current to version 6, despite the fact that it has 
> never received a message containing the {{ApplicationState}} tagged with 
> version 5.
> I've reproduced this in several versions; so far, I believe this is 
> possible in all versions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-07-19 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13700:
--
Description: 
In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
from the corresponding {{EndpointState}} to the {{EndpointState}} to send. When 
we're getting state for ourselves, this means that we add a reference to the 
local {{HeartBeatState}}. Then, once we've built a message (in either the Syn 
or Ack handler), we send it through the {{MessagingService}}. In the case that 
the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may run 
before serialization of the Syn or Ack. This means that when the {{GossipTask}} 
acquires the gossip {{taskLock}}, it may increment the {{HeartBeatState}} 
version of the local node as stored in the endpoint state map. Then, when we 
finally serialize the Syn or Ack, we'll follow the reference to the 
{{HeartBeatState}} and serialize it with a higher version than we saw when 
constructing the Ack or Ack2.

Consider the case where we see {{HeartBeatState}} with version 4 when 
constructing an Ack and send it through the {{MessagingService}}. Then, we add 
some piece of state with version 5 to our local {{EndpointState}}. If 
{{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
the {{MessageOut}} containing the Ack is serialized, the node receiving the Ack 
will believe it is current to version 6, despite the fact that it has never 
received a message containing the {{ApplicationState}} tagged with version 5.

I've reproduced this in several versions; so far, I believe this is possible 
in all versions.

  was:
In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
from the corresponding {{EndpointState}} to the {{EndpointState}} to send. When 
we're getting state for ourselves, this means that we add a reference to the 
local {{HeartBeatState}}. Then, once we've built a message (in either the Syn 
or Ack handler), we send it through the {{MessagingService}}. In the case that 
the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may run 
before serialization of the Syn or Ack. This means that when the {{GossipTask}} 
acquires the gossip {{taskLock}}, it may increment the {{HeartBeatState}} 
version of the local node as stored in the endpoint state map. Then, when we 
finally serialize the Syn or Ack, we'll follow the reference to the 
{{HeartBeatState}} and serialize it with a higher version than we saw when 
constructing the Ack or Ack2.

Consider the case where we see {{HeartBeatState}} with version 4 when 
constructing an Ack and send it through the {{Messaging Service}}. Then, we add 
some piece of state with version 5 to our local {{EndpointState}}. If 
{{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
the {{MessageOut}} containing the Ack is serialized, the node receiving the Ack 
will believe it is current to version 6, despite the fact that it has never 
received a message containing the {{ApplicationState}} tagged with version 5.

I've reproduced this in several versions; so far, I believe this is possible 
in all versions.


> Heartbeats can cause gossip information to go permanently missing on certain 
> nodes
> --
>
> Key: CASSANDRA-13700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Distributed Metadata
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>Priority: Critical
>
> In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
> from the corresponding {{EndpointState}} to the {{EndpointState}} to send. 
> When we're getting state for ourselves, this means that we add a reference to 
> the local {{HeartBeatState}}. Then, once we've built a message (in either the 
> Syn or Ack handler), we send it through the {{MessagingService}}. In the case 
> that the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may 
> run before serialization of the Syn or Ack. This means that when the 
> {{GossipTask}} acquires the gossip {{taskLock}}, it may increment the 
> {{HeartBeatState}} version of the local node as stored in the endpoint state 
> map. Then, when we finally serialize the Syn or Ack, we'll follow the 
> reference to the {{HeartBeatState}} and serialize it with a higher version 
> than we saw when constructing the Ack or Ack2.
> Consider the case where we see {{HeartBeatState}} with version 4 when 
> constructing an Ack and send it through the {{MessagingService}}. Then, we 
> add some piece of state with version 5 to our local {{EndpointState}}. If 
> {{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
> the {{MessageOut}} containing the Ack is serialized, the node receiving the 
> Ack will believe it is current to version 6, despite the fact that it has 
> never received a message containing the {{ApplicationState}} tagged with 
> version 5.
> I've reproduced this in several versions; so far, I believe this is 
> possible in all versions.

[jira] [Created] (CASSANDRA-13700) Heartbeats can cause gossip information to go permanently missing on certain nodes

2017-07-19 Thread Joel Knighton (JIRA)
Joel Knighton created CASSANDRA-13700:
-

 Summary: Heartbeats can cause gossip information to go permanently 
missing on certain nodes
 Key: CASSANDRA-13700
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13700
 Project: Cassandra
  Issue Type: Bug
  Components: Distributed Metadata
Reporter: Joel Knighton
Assignee: Joel Knighton
Priority: Critical


In {{Gossiper.getStateForVersionBiggerThan}}, we add the {{HeartBeatState}} 
from the corresponding {{EndpointState}} to the {{EndpointState}} to send. When 
we're getting state for ourselves, this means that we add a reference to the 
local {{HeartBeatState}}. Then, once we've built a message (in either the Syn 
or Ack handler), we send it through the {{MessagingService}}. In the case that 
the {{MessagingService}} is sufficiently slow, the {{GossipTask}} may run 
before serialization of the Syn or Ack. This means that when the {{GossipTask}} 
acquires the gossip {{taskLock}}, it may increment the {{HeartBeatState}} 
version of the local node as stored in the endpoint state map. Then, when we 
finally serialize the Syn or Ack, we'll follow the reference to the 
{{HeartBeatState}} and serialize it with a higher version than we saw when 
constructing the Ack or Ack2.

Consider the case where we see {{HeartBeatState}} with version 4 when 
constructing an Ack and send it through the {{Messaging Service}}. Then, we add 
some piece of state with version 5 to our local {{EndpointState}}. If 
{{GossipTask}} runs and increases the {{HeartBeatState}} version to 6 before 
the {{MessageOut}} containing the Ack is serialized, the node receiving the Ack 
will believe it is current to version 6, despite the fact that it has never 
received a message containing the {{ApplicationState}} tagged with version 5.

I've reproduced this in several versions; so far, I believe this is possible 
in all versions.
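
A hedged sketch of the shape of a fix (the constructor and getter names assume 
the gossip code of this era): snapshot the heartbeat when the reply is built 
instead of aliasing the live object:
{code}
// In getStateForVersionBiggerThan, conceptually:
HeartBeatState live = epState.getHeartBeatState();
// Copy generation and version now; a GossipTask increment after this point
// mutates only 'live', not the state captured in the outgoing Syn/Ack.
HeartBeatState snapshot = new HeartBeatState(live.getGeneration(), live.getHeartBeatVersion());
EndpointState reqdEndpointState = new EndpointState(snapshot);
{code}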



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-13139) test failure in hintedhandoff_test.TestHintedHandoff.hintedhandoff_decom_test

2017-06-26 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-13139.
---
Resolution: Fixed

The linked PR resolved this issue.

> test failure in hintedhandoff_test.TestHintedHandoff.hintedhandoff_decom_test
> -
>
> Key: CASSANDRA-13139
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13139
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Sean McCarthy
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log, node4_debug.log, node4_gc.log, node4.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/503/testReport/hintedhandoff_test/TestHintedHandoff/hintedhandoff_decom_test
> {code}
> Error Message
> Subprocess ['nodetool', '-h', 'localhost', '-p', '7200', ['decommission']] 
> exited with non-zero status; exit status: 1; 
> stdout: nodetool: Unsupported operation: Not enough live nodes to maintain 
> replication factor in keyspace system_distributed (RF = 3, N = 3). Perform a 
> forceful decommission to ignore.
> See 'nodetool help' or 'nodetool help '.
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/hintedhandoff_test.py", line 169, in 
> hintedhandoff_decom_test
> node2.decommission()
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1314, in 
> decommission
> self.nodetool("decommission")
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 783, in 
> nodetool
> return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', 
> '-p', str(self.jmx_port), cmd.split()])
>   File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1993, in 
> handle_external_tool_process
> raise ToolError(cmd_args, rc, out, err)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Resolved] (CASSANDRA-12260) dtest failure in topology_test.TestTopology.decommissioned_node_cant_rejoin_test

2017-06-22 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-12260.
---
Resolution: Cannot Reproduce

This failure no longer reproduces in any form - closing as 'Cannot reproduce' 
for now.

> dtest failure in 
> topology_test.TestTopology.decommissioned_node_cant_rejoin_test
> 
>
> Key: CASSANDRA-12260
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12260
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>Assignee: Joel Knighton
>  Labels: dtest
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_novnode_dtest/14/testReport/topology_test/TestTopology/decommissioned_node_cant_rejoin_test






[jira] [Resolved] (CASSANDRA-13567) test failure in topology_test.TestTopology.size_estimates_multidc_test

2017-06-22 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-13567.
---
Resolution: Fixed

This test was added after [CASSANDRA-9639], which only went into 3.0.11+. It 
was version-gated in dtest commit 
[3cf276e966f253a49df91293a1a0b46620192c59|https://github.com/riptano/cassandra-dtest/commit/3cf276e966f253a49df91293a1a0b46620192c59].

> test failure in topology_test.TestTopology.size_estimates_multidc_test
> --
>
> Key: CASSANDRA-13567
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13567
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Assignee: Joel Knighton
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_novnode_dtest/367/testReport/topology_test/TestTopology/size_estimates_multidc_test
> {noformat}
> Error Message
> Expected [['-3736333188524231709', '-2688160409776496397'], 
> ['-6639341390736545756', '-3736333188524231709'], ['-9223372036854775808', 
> '-6639341390736545756'], ['8473270337963525440', '8673615181726552074'], 
> ['8673615181726552074', '-9223372036854775808']] from SELECT range_start, 
> range_end FROM system.size_estimates WHERE keyspace_name = 'ks2', but got 
> [[u'-3736333188524231709', u'-2688160409776496397'], 
> [u'-9223372036854775808', u'-6639341390736545756'], [u'8673615181726552074', 
> u'-9223372036854775808']]
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-yNH4mu
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Creating cluster
> dtest: DEBUG: Setting tokens
> dtest: DEBUG: Starting cluster
> dtest: DEBUG: Nodetool ring output 
> Datacenter: dc1
> ==========
> Address    Rack  Status  State   Load      Owns    Token
>                                                    8473270337963525440
> 127.0.0.1  r1    Up      Normal  92.11 KB  39.49%  -6639341390736545756
> 127.0.0.1  r1    Up      Normal  92.11 KB  39.49%  -2688160409776496397
> 127.0.0.2  r1    Up      Normal  92.1 KB   66.19%  -2506475074448728501
> 127.0.0.2  r1    Up      Normal  92.1 KB   66.19%  8473270337963525440
> Datacenter: dc2
> ==========
> Address    Rack  Status  State   Load      Owns    Token
>                                                    8673615181726552074
> 127.0.0.3  r1    Up      Normal  92.1 KB   94.32%  -3736333188524231709
> 127.0.0.3  r1    Up      Normal  92.1 KB   94.32%  8673615181726552074
>   Warning: "nodetool ring" is used to output all the tokens of a node.
>   To view status related info of a node use "nodetool status" instead.
>   
> dtest: DEBUG: Creating keyspaces
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> dtest: DEBUG: Refreshing size estimates
> dtest: DEBUG: Checking node1_1 size_estimates primary ranges
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/topology_test.py", line 107, in 
> size_estimates_multidc_test
> ['8673615181726552074', '-9223372036854775808']])
>   File "/home/automaton/cassandra-dtest/tools/assertions.py", line 170, in 
> assert_all
> assert list_res == expected, "Expected {} from {}, but got 
> {}".format(expected, query, list_res)
> 'Expected 

[jira] [Assigned] (CASSANDRA-13567) test failure in topology_test.TestTopology.size_estimates_multidc_test

2017-06-22 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-13567:
-

Assignee: Joel Knighton

> test failure in topology_test.TestTopology.size_estimates_multidc_test
> --
>
> Key: CASSANDRA-13567
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13567
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Shuler
>Assignee: Joel Knighton
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_novnode_dtest/367/testReport/topology_test/TestTopology/size_estimates_multidc_test
> {noformat}
> Error Message
> Expected [['-3736333188524231709', '-2688160409776496397'], 
> ['-6639341390736545756', '-3736333188524231709'], ['-9223372036854775808', 
> '-6639341390736545756'], ['8473270337963525440', '8673615181726552074'], 
> ['8673615181726552074', '-9223372036854775808']] from SELECT range_start, 
> range_end FROM system.size_estimates WHERE keyspace_name = 'ks2', but got 
> [[u'-3736333188524231709', u'-2688160409776496397'], 
> [u'-9223372036854775808', u'-6639341390736545756'], [u'8673615181726552074', 
> u'-9223372036854775808']]
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-yNH4mu
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> dtest: DEBUG: Creating cluster
> dtest: DEBUG: Setting tokens
> dtest: DEBUG: Starting cluster
> dtest: DEBUG: Nodetool ring output 
> Datacenter: dc1
> ==========
> Address    Rack  Status  State   Load      Owns    Token
>                                                    8473270337963525440
> 127.0.0.1  r1    Up      Normal  92.11 KB  39.49%  -6639341390736545756
> 127.0.0.1  r1    Up      Normal  92.11 KB  39.49%  -2688160409776496397
> 127.0.0.2  r1    Up      Normal  92.1 KB   66.19%  -2506475074448728501
> 127.0.0.2  r1    Up      Normal  92.1 KB   66.19%  8473270337963525440
> Datacenter: dc2
> ==========
> Address    Rack  Status  State   Load      Owns    Token
>                                                    8673615181726552074
> 127.0.0.3  r1    Up      Normal  92.1 KB   94.32%  -3736333188524231709
> 127.0.0.3  r1    Up      Normal  92.1 KB   94.32%  8673615181726552074
>   Warning: "nodetool ring" is used to output all the tokens of a node.
>   To view status related info of a node use "nodetool status" instead.
>   
> dtest: DEBUG: Creating keyspaces
> cassandra.policies: INFO: Using datacenter 'dc1' for DCAwareRoundRobinPolicy 
> (via host '127.0.0.1'); if incorrect, please specify a local_dc to the 
> constructor, or limit contact points to local cluster nodes
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> dtest: DEBUG: Refreshing size estimates
> dtest: DEBUG: Checking node1_1 size_estimates primary ranges
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/topology_test.py", line 107, in 
> size_estimates_multidc_test
> ['8673615181726552074', '-9223372036854775808']])
>   File "/home/automaton/cassandra-dtest/tools/assertions.py", line 170, in 
> assert_all
> assert list_res == expected, "Expected {} from {}, but got 
> {}".format(expected, query, list_res)
> 'Expected [[\'-3736333188524231709\', \'-2688160409776496397\'], 
> [\'-6639341390736545756\', \'-3736333188524231709\'], 
> [\'-9223372036854775808\', \'-6639341390736545756\'], 
> [\'8473270337963525440\', \'8673615181726552074\'], [\'8673615181726552074\', 
> 

[jira] [Commented] (CASSANDRA-13583) test failure in rebuild_test.TestRebuild.disallow_rebuild_from_nonreplica_test

2017-06-19 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054560#comment-16054560
 ] 

Joel Knighton commented on CASSANDRA-13583:
---

This is consistently failing when run without vnodes after the commit of 
[CASSANDRA-4650].

> test failure in rebuild_test.TestRebuild.disallow_rebuild_from_nonreplica_test
> --
>
> Key: CASSANDRA-13583
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13583
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Hamm
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_novnode_dtest/524/testReport/rebuild_test/TestRebuild/disallow_rebuild_from_nonreplica_test
> {noformat}
> Error Message
> ToolError not raised
>  >> begin captured logging << 
> dtest: DEBUG: Python driver version in use: 3.10
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-0tUjhX
> dtest: DEBUG: Done setting configuration options:
> {   'num_tokens': None,
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> cassandra.cluster: INFO: New Cassandra host  discovered
> cassandra.cluster: INFO: New Cassandra host  discovered
> - >> end captured logging << -
> {noformat}
> {noformat}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/tools/decorators.py", line 48, in 
> wrappedtestrebuild
> f(obj)
>   File "/home/automaton/cassandra-dtest/rebuild_test.py", line 357, in 
> disallow_rebuild_from_nonreplica_test
> node1.nodetool('rebuild -ks ks1 -ts (%s,%s] -s %s' % (node3_token, 
> node1_token, node3_address))
>   File "/usr/lib/python2.7/unittest/case.py", line 116, in __exit__
> "{0} not raised".format(exc_name))
> {noformat}






[jira] [Commented] (CASSANDRA-13576) test failure in bootstrap_test.TestBootstrap.consistent_range_movement_false_with_rf1_should_succeed_test

2017-06-19 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054404#comment-16054404
 ] 

Joel Knighton commented on CASSANDRA-13576:
---

It looks like this has consistently been failing, with or without vnodes, since 
[CASSANDRA-4650] was committed.

> test failure in 
> bootstrap_test.TestBootstrap.consistent_range_movement_false_with_rf1_should_succeed_test
> -
>
> Key: CASSANDRA-13576
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13576
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Michael Hamm
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_offheap_dtest/445/testReport/bootstrap_test/TestBootstrap/consistent_range_movement_false_with_rf1_should_succeed_test
> {noformat}
> Error Message
> 31 May 2017 04:28:09 [node3] Missing: ['Starting listening for CQL clients']:
> INFO  [main] 2017-05-31 04:18:01,615 YamlConfigura.
> See system.log for remainder
> {noformat}
> {noformat}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/bootstrap_test.py", line 236, in 
> consistent_range_movement_false_with_rf1_should_succeed_test
> self._bootstrap_test_with_replica_down(False, rf=1)
>   File "/home/automaton/cassandra-dtest/bootstrap_test.py", line 278, in 
> _bootstrap_test_with_replica_down
> 
> jvm_args=["-Dcassandra.consistent.rangemovement={}".format(consistent_range_movement)])
>   File 
> "/home/automaton/venv/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 696, in start
> self.wait_for_binary_interface(from_mark=self.mark)
>   File 
> "/home/automaton/venv/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 514, in wait_for_binary_interface
> self.watch_log_for("Starting listening for CQL clients", **kwargs)
>   File 
> "/home/automaton/venv/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 471, in watch_log_for
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " 
> [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + 
> reads[:50] + ".\nSee {} for remainder".format(filename))
> "31 May 2017 04:28:09 [node3] Missing: ['Starting listening for CQL 
> clients']:\nINFO  [main] 2017-05-31 04:18:01,615 YamlConfigura.\n
> {noformat}
> {noformat}
>  >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-PKphwD\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'memtable_allocation_type': 'offheap_objects',\n  
>   'num_tokens': '32',\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\ncassandra.policies: INFO: Using datacenter 'datacenter1' for 
> DCAwareRoundRobinPolicy (via host '127.0.0.1'); if incorrect, please specify 
> a local_dc to the constructor, or limit contact points to local cluster 
> nodes\ncassandra.cluster: INFO: New Cassandra host  datacenter1> discovered\ncassandra.protocol: WARNING: Server warning: When 
> increasing replication factor you need to run a full (-full) repair to 
> distribute the data.\ncassandra.connection: WARNING: Heartbeat failed for 
> connection (139927174110160) to 127.0.0.2\ncassandra.cluster: WARNING: Host 
> 127.0.0.2 has been marked down\ncassandra.pool: WARNING: Error attempting to 
> reconnect to 127.0.0.2, scheduling retry in 2.0 seconds: [Errno 111] Tried 
> connecting to [('127.0.0.2', 9042)]. Last error: Connection 
> refused\ncassandra.pool: WARNING: Error attempting to reconnect to 127.0.0.2, 
> scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to 
> [('127.0.0.2', 9042)]. Last error: Connection refused\ncassandra.pool: 
> WARNING: Error attempting to reconnect to 127.0.0.2, scheduling retry in 8.0 
> seconds: [Errno 111] Tried connecting to [('127.0.0.2', 9042)]. Last error: 
> Connection refused\ncassandra.pool: WARNING: Error attempting to reconnect to 
> 127.0.0.2, scheduling retry in 16.0 seconds: [Errno 111] Tried connecting to 
> [('127.0.0.2', 9042)]. Last error: Connection refused\ncassandra.pool: 
> WARNING: Error attempting to reconnect to 127.0.0.2, scheduling retry in 32.0 
> seconds: [Errno 111] Tried connecting to [('127.0.0.2', 9042)]. Last error: 
> Connection refused\ncassandra.pool: WARNING: Error attempting to reconnect to 
> 127.0.0.2, 

[jira] [Commented] (CASSANDRA-12606) CQLSSTableWriter unable to use blob conversion functions

2017-06-13 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16048662#comment-16048662
 ] 

Joel Knighton commented on CASSANDRA-12606:
---

Tests look good on the rebased branches, which I pushed for 
[trunk|https://github.com/jkni/cassandra/tree/ifesdjeen-12606-trunk], 
[3.11|https://github.com/jkni/cassandra/tree/ifesdjeen-12606-3.X], and 
[3.0|https://github.com/jkni/cassandra/tree/ifesdjeen-12606-3.0]. The 3.0/3.X 
rebases were trivial; trunk took a little bit of work because of the schema 
improvements made there.

I'm comfortable with you making the above changes on commit if you agree. I'm 
also happy to take a look at the changes before commit, if you want.

+1.

> CQLSSTableWriter unable to use blob conversion functions
> 
>
> Key: CASSANDRA-12606
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12606
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL, Tools
>Reporter: Mark Reddy
>Assignee: Alex Petrov
>Priority: Minor
>
> Attempting to use blob conversion functions e.g. textAsBlob, from 3.0 - 3.7 
> results in:
> {noformat}
> Exception in thread "main" 
> org.apache.cassandra.exceptions.InvalidRequestException: Unknown function 
> textasblob called
>   at 
> org.apache.cassandra.cql3.functions.FunctionCall$Raw.prepare(FunctionCall.java:136)
>   at 
> org.apache.cassandra.cql3.Operation$SetValue.prepare(Operation.java:163)
>   at 
> org.apache.cassandra.cql3.statements.UpdateStatement$ParsedInsert.prepareInternal(UpdateStatement.java:173)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:785)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:771)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.prepareInsert(CQLSSTableWriter.java:567)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.build(CQLSSTableWriter.java:510)
> {noformat}
> The following snippet will reproduce the issue
> {code}
> String table = String.format("%s.%s", "test_ks", "test_table");
> String schema = String.format("CREATE TABLE %s (test_text text, test_blob 
> blob, PRIMARY KEY(test_text));", table);
> String insertStatement = String.format("INSERT INTO %s (test_text, test_blob) 
> VALUES (?, textAsBlob(?))", table);
> File tempDir = Files.createTempDirectory("tempDir").toFile();
> CQLSSTableWriter sstableWriter = CQLSSTableWriter.builder()
> .forTable(schema)
> .using(insertStatement)
> .inDirectory(tempDir)
> .build();
> {code}
> This is caused in FunctionResolver.get(...) when 
> candidates.addAll(Schema.instance.getFunctions(name.asNativeFunction())); is 
> called, as there is no system keyspace initialised.






[jira] [Comment Edited] (CASSANDRA-12606) CQLSSTableWriter unable to use blob conversion functions

2017-06-13 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047952#comment-16047952
 ] 

Joel Knighton edited comment on CASSANDRA-12606 at 6/13/17 3:43 PM:


Sorry - as you well know, it's 100% (200%?) my fault this sat for so long. This 
looks good to me, but I'd like to rebase and rerun tests since the base 
branches have changed in the meantime. I'll do so, and if the tests come back 
good, I'll +1.

Some minor nits:
* on all versions, {{testUpdateSatement}} -> {{testUpdateStatement}} in 
{{CQLSSTableWriterTest}}. It looks like this typo already existed in later 
branches, so might as well fix there too.
* on 3.11/trunk, there seem to be some unused imports in {{CQLSSTableWriter}} 
(Collection, Function, FunctionName).
* on 3.11/trunk, there's a bit of duplicated code creating the types/tables in 
both branches of the KeyspaceMetadata existence check in 
{{CQLSSTableWriter.Builder.build()}}. You could move this out of the 
conditional and then do a direct null check on {{ksm.tables.getNullable(...)}} 
(rough sketch below). That said, it only removes a few lines of duplicate code 
in a piece of code that isn't touched often, so I'm fine either way on this.
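
A rough sketch of that restructuring, with hypothetical helpers 
({{createTypes}}, {{createTable}}) standing in for the duplicated creation 
logic (a sketch under those assumptions, not the actual builder code):

{code}
// Hypothetical sketch only -- not the actual CQLSSTableWriter.Builder.build().
// Build the types once, outside the KeyspaceMetadata conditional, then
// null-check the table lookup directly instead of duplicating the creation
// logic in both branches.
Types types = createTypes(keyspaceName);                  // hypothetical helper
CFMetaData table = ksm == null ? null : ksm.tables.getNullable(tableName);
if (table == null)
    table = createTable(types);                           // hypothetical helper
{code}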


was (Author: jkni):
Sorry - as you well know, it's 100% (200%?) my fault this sat for so long. This 
looks good to me, but I'd like to rebase and rerun tests since the base 
branches have changed in the meantime. I'll do so, and if the tests come back 
good, I'll +1.

Some minor nits:
* on all versions, {{testUpdateSatement}} -> {{testUpdateStatement}} in 
{{CQLSSTableWriterTest}}. It looks like this typo already existed in later 
branches, so might as well fix there too.
* on 3.11/trunk, there seem to be some unused imports in {{CQLSSTableWriter}} 
(Collection, Function, FunctionName).
* on 3.11/trunk, in the createTable docstring, the word types is duplicated.
* on 3.11/trunk, there's a bit of duplicate code in creating the types/tables 
down either branch of KeyspaceMetadata existence in 
{{CQLSSTableWriter.Builder.build()}}. You could move this out of the 
conditional and then do a direct null check on {{ksm.tables.getNullable(...)}}. 
That said, it only removes a few lines of duplicate code in a piece of code 
that isn't touched often, so I'm fine either way on this.

> CQLSSTableWriter unable to use blob conversion functions
> 
>
> Key: CASSANDRA-12606
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12606
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL, Tools
>Reporter: Mark Reddy
>Assignee: Alex Petrov
>Priority: Minor
>
> Attempting to use blob conversion functions e.g. textAsBlob, from 3.0 - 3.7 
> results in:
> {noformat}
> Exception in thread "main" 
> org.apache.cassandra.exceptions.InvalidRequestException: Unknown function 
> textasblob called
>   at 
> org.apache.cassandra.cql3.functions.FunctionCall$Raw.prepare(FunctionCall.java:136)
>   at 
> org.apache.cassandra.cql3.Operation$SetValue.prepare(Operation.java:163)
>   at 
> org.apache.cassandra.cql3.statements.UpdateStatement$ParsedInsert.prepareInternal(UpdateStatement.java:173)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:785)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:771)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.prepareInsert(CQLSSTableWriter.java:567)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.build(CQLSSTableWriter.java:510)
> {noformat}
> The following snippet will reproduce the issue
> {code}
> String table = String.format("%s.%s", "test_ks", "test_table");
> String schema = String.format("CREATE TABLE %s (test_text text, test_blob 
> blob, PRIMARY KEY(test_text));", table);
> String insertStatement = String.format("INSERT INTO %s (test_text, test_blob) 
> VALUES (?, textAsBlob(?))", table);
> File tempDir = Files.createTempDirectory("tempDir").toFile();
> CQLSSTableWriter sstableWriter = CQLSSTableWriter.builder()
> .forTable(schema)
> .using(insertStatement)
> .inDirectory(tempDir)
> .build();
> {code}
> This is caused in FunctionResolver.get(...) when 
> candidates.addAll(Schema.instance.getFunctions(name.asNativeFunction())); is 
> called, as there is no system keyspace initialised.






[jira] [Commented] (CASSANDRA-12606) CQLSSTableWriter unable to use blob conversion functions

2017-06-13 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16047952#comment-16047952
 ] 

Joel Knighton commented on CASSANDRA-12606:
---

Sorry - as you well know, it's 100% (200%?) my fault this sat for so long. This 
looks good to me, but I'd like to rebase and rerun tests since the base 
branches have changed in the meantime. I'll do so, and if the tests come back 
good, I'll +1.

Some minor nits:
* on all versions, {{testUpdateSatement}} -> {{testUpdateStatement}} in 
{{CQLSSTableWriterTest}}. It looks like this typo already existed in later 
branches, so might as well fix there too.
* on 3.11/trunk, there seem to be some unused imports in {{CQLSSTableWriter}} 
(Collection, Function, FunctionName).
* on 3.11/trunk, in the createTable docstring, the word types is duplicated.
* on 3.11/trunk, there's a bit of duplicate code in creating the types/tables 
down either branch of KeyspaceMetadata existence in 
{{CQLSSTableWriter.Builder.build()}}. You could move this out of the 
conditional and then do a direct null check on {{ksm.tables.getNullable(...)}}. 
That said, it only removes a few lines of duplicate code in a piece of code 
that isn't touched often, so I'm fine either way on this.

> CQLSSTableWriter unable to use blob conversion functions
> 
>
> Key: CASSANDRA-12606
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12606
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL, Tools
>Reporter: Mark Reddy
>Assignee: Alex Petrov
>Priority: Minor
>
> Attempting to use blob conversion functions e.g. textAsBlob, from 3.0 - 3.7 
> results in:
> {noformat}
> Exception in thread "main" 
> org.apache.cassandra.exceptions.InvalidRequestException: Unknown function 
> textasblob called
>   at 
> org.apache.cassandra.cql3.functions.FunctionCall$Raw.prepare(FunctionCall.java:136)
>   at 
> org.apache.cassandra.cql3.Operation$SetValue.prepare(Operation.java:163)
>   at 
> org.apache.cassandra.cql3.statements.UpdateStatement$ParsedInsert.prepareInternal(UpdateStatement.java:173)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:785)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement$Parsed.prepare(ModificationStatement.java:771)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.prepareInsert(CQLSSTableWriter.java:567)
>   at 
> org.apache.cassandra.io.sstable.CQLSSTableWriter$Builder.build(CQLSSTableWriter.java:510)
> {noformat}
> The following snippet will reproduce the issue
> {code}
> String table = String.format("%s.%s", "test_ks", "test_table");
> String schema = String.format("CREATE TABLE %s (test_text text, test_blob 
> blob, PRIMARY KEY(test_text));", table);
> String insertStatement = String.format("INSERT INTO %s (test_text, test_blob) 
> VALUES (?, textAsBlob(?))", table);
> File tempDir = Files.createTempDirectory("tempDir").toFile();
> CQLSSTableWriter sstableWriter = CQLSSTableWriter.builder()
> .forTable(schema)
> .using(insertStatement)
> .inDirectory(tempDir)
> .build();
> {code}
> This is caused in FunctionResolver.get(...) when 
> candidates.addAll(Schema.instance.getFunctions(name.asNativeFunction())); is 
> called, as there is no system keyspace initialised.






[jira] [Updated] (CASSANDRA-11381) Node running with join_ring=false and authentication can not serve requests

2017-06-12 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11381:
--
Status: Ready to Commit  (was: Patch Available)

> Node running with join_ring=false and authentication can not serve requests
> ---
>
> Key: CASSANDRA-11381
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11381
> Project: Cassandra
>  Issue Type: Bug
>Reporter: mck
>Assignee: mck
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Starting up a node with {{-Dcassandra.join_ring=false}} in a cluster that has 
> authentication configured, eg PasswordAuthenticator, won't be able to serve 
> requests. This is because {{Auth.setup()}} never gets called during the 
> startup.
> Without {{Auth.setup()}} having been called in {{StorageService}} clients 
> connecting to the node fail with the node throwing
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:119)
> at 
> org.apache.cassandra.thrift.CassandraServer.login(CassandraServer.java:1471)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3505)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3489)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at com.thinkaurelius.thrift.Message.invoke(Message.java:314)
> at 
> com.thinkaurelius.thrift.Message$Invocation.execute(Message.java:90)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:695)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:689)
> at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:112)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The exception thrown from the 
> [code|https://github.com/apache/cassandra/blob/cassandra-2.0.16/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java#L119]
> {code}
> ResultMessage.Rows rows = authenticateStatement.execute(QueryState.forInternalCalls(),
>     new QueryOptions(consistencyForUser(username),
>                      Lists.newArrayList(ByteBufferUtil.bytes(username))));
> {code}






[jira] [Commented] (CASSANDRA-11381) Node running with join_ring=false and authentication can not serve requests

2017-06-12 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16046602#comment-16046602
 ] 

Joel Knighton commented on CASSANDRA-11381:
---

Great! I also ran dtests on some other machines this weekend and they looked 
good on all branches (relative to the current state of the branches).

+1 - one minor nit that can be fixed on commit: new CHANGES.txt entries should 
go at the top of the list.

> Node running with join_ring=false and authentication can not serve requests
> ---
>
> Key: CASSANDRA-11381
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11381
> Project: Cassandra
>  Issue Type: Bug
>Reporter: mck
>Assignee: mck
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Starting up a node with {{-Dcassandra.join_ring=false}} in a cluster that has 
> authentication configured, eg PasswordAuthenticator, won't be able to serve 
> requests. This is because {{Auth.setup()}} never gets called during the 
> startup.
> Without {{Auth.setup()}} having been called in {{StorageService}} clients 
> connecting to the node fail with the node throwing
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:119)
> at 
> org.apache.cassandra.thrift.CassandraServer.login(CassandraServer.java:1471)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3505)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3489)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at com.thinkaurelius.thrift.Message.invoke(Message.java:314)
> at 
> com.thinkaurelius.thrift.Message$Invocation.execute(Message.java:90)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:695)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:689)
> at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:112)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The exception thrown from the 
> [code|https://github.com/apache/cassandra/blob/cassandra-2.0.16/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java#L119]
> {code}
> ResultMessage.Rows rows = authenticateStatement.execute(QueryState.forInternalCalls(),
>     new QueryOptions(consistencyForUser(username),
>                      Lists.newArrayList(ByteBufferUtil.bytes(username))));
> {code}






[jira] [Commented] (CASSANDRA-11381) Node running with join_ring=false and authentication can not serve requests

2017-05-26 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026830#comment-16026830
 ] 

Joel Knighton commented on CASSANDRA-11381:
---

Thanks - this looks just about good. A few comments:
* The 3.0 branch looks like it is an older version of the patch than the 2.2, 
3.11, and trunk patches - it's missing the atomic guard ensuring we only run 
the setup once. Is this just an oversight?
* The new exception looks good, but the condition is too restrictive. We should 
only hit an error when there are no tokens in the ring at all; the attached 
patch fails whenever the local node has no tokens, even if there are other 
tokens in the ring. Changing the condition to something like 
{{StorageService.instance.getTokenMetadata().sortedTokens().isEmpty()}} should 
suffice (see the sketch below).
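
A minimal sketch of that guard (hypothetical surrounding code; the real patch 
wires this into the auth setup path):

{code}
// Hypothetical sketch only -- not the actual patch. Fail only when the ring
// as a whole has no tokens, not merely when the local node (e.g. one started
// with -Dcassandra.join_ring=false) has none.
if (StorageService.instance.getTokenMetadata().sortedTokens().isEmpty())
    throw new IllegalStateException("Cannot set up authentication: no tokens in the ring");
{code}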

It would be great if we could cut down the time the new tests take to run. I 
have a few suggestions that I'll post on the dtest PR once this is ready to go.

> Node running with join_ring=false and authentication can not serve requests
> ---
>
> Key: CASSANDRA-11381
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11381
> Project: Cassandra
>  Issue Type: Bug
>Reporter: mck
>Assignee: mck
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Starting up a node with {{-Dcassandra.join_ring=false}} in a cluster that has 
> authentication configured, eg PasswordAuthenticator, won't be able to serve 
> requests. This is because {{Auth.setup()}} never gets called during the 
> startup.
> Without {{Auth.setup()}} having been called in {{StorageService}} clients 
> connecting to the node fail with the node throwing
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:119)
> at 
> org.apache.cassandra.thrift.CassandraServer.login(CassandraServer.java:1471)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3505)
> at 
> org.apache.cassandra.thrift.Cassandra$Processor$login.getResult(Cassandra.java:3489)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at com.thinkaurelius.thrift.Message.invoke(Message.java:314)
> at 
> com.thinkaurelius.thrift.Message$Invocation.execute(Message.java:90)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:695)
> at 
> com.thinkaurelius.thrift.TDisruptorServer$InvocationHandler.onEvent(TDisruptorServer.java:689)
> at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:112)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The exception thrown from the 
> [code|https://github.com/apache/cassandra/blob/cassandra-2.0.16/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java#L119]
> {code}
> ResultMessage.Rows rows = authenticateStatement.execute(QueryState.forInternalCalls(),
>     new QueryOptions(consistencyForUser(username),
>                      Lists.newArrayList(ByteBufferUtil.bytes(username))));
> {code}






[jira] [Updated] (CASSANDRA-13533) ColumnIdentifier object size wrong when tables are not flushed

2017-05-26 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13533:
--
Status: Ready to Commit  (was: Patch Available)

> ColumnIdentifier object size wrong when tables are not flushed
> --
>
> Key: CASSANDRA-13533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13533
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: columnidentifier.png
>
>
> It turns out that the object size of {{ColumnIdentifier}} is wrong when 
> *cassandra.test.flush_local_schema_changes: false*. This looks like stuff is 
> being wrongly reused when no flush is happening.
> We only noticed this because we were using the prepared stmt cache and 
> noticed that prepared statements would account for *1-6mb* when 
> *cassandra.test.flush_local_schema_changes: false*. With 
> *cassandra.test.flush_local_schema_changes: true* (which is the default) 
> those would be around *5000 bytes*.
> Attached is a test that reproduces the problem and also a fix.
> Also, after talking to [~jkni] / [~blerer], we probably shouldn't take 
> {{ColumnDefinition}} into account when measuring object sizes with 
> {{MemoryMeter}}.






[jira] [Updated] (CASSANDRA-13533) ColumnIdentifier object size wrong when tables are not flushed

2017-05-26 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13533:
--
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

> ColumnIdentifier object size wrong when tables are not flushed
> --
>
> Key: CASSANDRA-13533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13533
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: columnidentifier.png
>
>
> It turns out that the object size of {{ColumnIdentifier}} is wrong when 
> *cassandra.test.flush_local_schema_changes: false*. This looks like stuff is 
> being wrongly reused when no flush is happening.
> We only noticed this because we were using the prepared stmt cache and 
> noticed that prepared statements would account for *1-6mb* when 
> *cassandra.test.flush_local_schema_changes: false*. With 
> *cassandra.test.flush_local_schema_changes: true* (which is the default) 
> those would be around *5000 bytes*.
> Attached is a test that reproduces the problem and also a fix.
> Also, after talking to [~jkni] / [~blerer], we probably shouldn't take 
> {{ColumnDefinition}} into account when measuring object sizes with 
> {{MemoryMeter}}.






[jira] [Commented] (CASSANDRA-13533) ColumnIdentifier object size wrong when tables are not flushed

2017-05-26 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026719#comment-16026719
 ] 

Joel Knighton commented on CASSANDRA-13533:
---

+1 - thanks for the patch. I ran tests for all relevant branches; no unit tests 
failed on 3.0/3.11/trunk, and dtests looked the same as the present state of 
those branches.

Committed to 3.0 as {{8ffdd26cbee33c5dc1205c0f7292628e1a2c69e3}} and merged 
forward.

> ColumnIdentifier object size wrong when tables are not flushed
> --
>
> Key: CASSANDRA-13533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13533
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: columnidentifier.png
>
>
> It turns out that the object size of {{ColumnIdentifier}} is wrong when 
> *cassandra.test.flush_local_schema_changes: false*. This looks like stuff is 
> being wrongly reused when no flush is happening.
> We only noticed this because we were using the prepared stmt cache and 
> noticed that prepared statements would account for *1-6mb* when 
> *cassandra.test.flush_local_schema_changes: false*. With 
> *cassandra.test.flush_local_schema_changes: true* (which is the default) 
> those would be around *5000 bytes*.
> Attached is a test that reproduces the problem and also a fix.
> Also, after talking to [~jkni] / [~blerer], we probably shouldn't take 
> {{ColumnDefinition}} into account when measuring object sizes with 
> {{MemoryMeter}}.






[jira] [Commented] (CASSANDRA-13533) ColumnIdentifier object size wrong when tables are not flushed

2017-05-24 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023753#comment-16023753
 ] 

Joel Knighton commented on CASSANDRA-13533:
---

As a concrete example, something like the following would suffice to show the 
interning issue on all active branches.

{code}
@Test
public void testInterningUsesMinimalByteBuffer()
{
    // Two-byte backing array, but the buffer's limit exposes only one byte.
    byte[] bytes = new byte[2];
    bytes[0] = 0x63;
    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
    byteBuffer.limit(1);

    ColumnIdentifier c1 = ColumnIdentifier.getInterned(byteBuffer, UTF8Type.instance);

    // Interning should copy only the single live byte, not retain the
    // oversized backing buffer.
    Assert.assertEquals(2, byteBuffer.capacity());
    Assert.assertEquals(1, c1.bytes.capacity());
}
{code}

What do you think, [~eduard.tudenhoefner]?

> ColumnIdentifier object size wrong when tables are not flushed
> --
>
> Key: CASSANDRA-13533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13533
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: columnidentifier.png
>
>
> It turns out that the object size of {{ColumnIdentifier}} is wrong when 
> *cassandra.test.flush_local_schema_changes: false*. This looks like stuff is 
> being wrongly reused when no flush is happening.
> We only noticed this because we were using the prepared stmt cache and 
> noticed that prepared statements would account for *1-6mb* when 
> *cassandra.test.flush_local_schema_changes: false*. With 
> *cassandra.test.flush_local_schema_changes: true* (which is the default) 
> those would be around *5000 bytes*.
> Attached is a test that reproduces the problem and also a fix.
> Also, after talking to [~jkni] / [~blerer], we probably shouldn't take 
> {{ColumnDefinition}} into account when measuring object sizes with 
> {{MemoryMeter}}.






[jira] [Commented] (CASSANDRA-13533) ColumnIdentifier object size wrong when tables are not flushed

2017-05-24 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16023663#comment-16023663
 ] 

Joel Knighton commented on CASSANDRA-13533:
---

The patch looks good. I merged this forward through 3.11 and trunk - testall 
and dtests look good for all branches relative to upstream.

I'm not sure about including the test as written; it passes on 3.11 and trunk 
even before the fix because of the default of offheap memtables introduced in 
[CASSANDRA-9472]. It seems to me that we might as well reduce this test to just 
checking that interning a ColumnIdentifier uses a minimal bytebuffer and add it 
to {{ColumnIdentifierTest}}.

> ColumnIdentifier object size wrong when tables are not flushed
> --
>
> Key: CASSANDRA-13533
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13533
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
> Fix For: 3.0.14, 3.11.0, 4.0
>
> Attachments: columnidentifier.png
>
>
> It turns out that the object size of {{ColumnIdentifier}} is wrong when 
> *cassandra.test.flush_local_schema_changes: false*. This looks like stuff is 
> being wrongly reused when no flush is happening.
> We only noticed this because we were using the prepared stmt cache and 
> noticed that prepared statements would account for *1-6mb* when 
> *cassandra.test.flush_local_schema_changes: false*. With 
> *cassandra.test.flush_local_schema_changes: true* (which is the default) 
> those would be around *5000 bytes*.
> Attached is a test that reproduces the problem and also a fix.
> Also, after talking to [~jkni] / [~blerer], we probably shouldn't take 
> {{ColumnDefinition}} into account when measuring object sizes with 
> {{MemoryMeter}}.






[jira] [Commented] (CASSANDRA-13407) test failure at RemoveTest.testBadHostId

2017-04-10 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15963226#comment-15963226
 ] 

Joel Knighton commented on CASSANDRA-13407:
---

Patches look good - good catch on 2.2/3.0 difference.

+1.

> test failure at RemoveTest.testBadHostId
> 
>
> Key: CASSANDRA-13407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13407
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> Example trace:
> {code}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:881)
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:876)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2201)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1855)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:216)
>   at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:89)
> {code} 
> [failure 
> example|https://cassci.datastax.com/job/trunk_testall/1491/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/]
> [history|https://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/history/]





[jira] [Commented] (CASSANDRA-13347) dtest failure in upgrade_tests.upgrade_through_versions_test.TestUpgrade_current_2_2_x_To_indev_3_0_x.rolling_upgrade_test

2017-04-10 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962957#comment-15962957
 ] 

Joel Knighton commented on CASSANDRA-13347:
---

There's a subtly different problem at play on 3.11. The issue is that these 
upgrade-through-versions tests use the released branch for all earlier 
versions, so we upgrade from 2.1-current, 2.2-current, and 3.0-current to 
3.11-dev. The patch from [CASSANDRA-13320] went into 3.0 with 3.0.13, which 
hasn't been released yet. That means running this test against 3.11-dev won't 
bring in the fix.

I'm not sure how we want to address this, but it's a dtest fix, not a C* fix, 
so I'm going to close this for now.

> dtest failure in 
> upgrade_tests.upgrade_through_versions_test.TestUpgrade_current_2_2_x_To_indev_3_0_x.rolling_upgrade_test
> --
>
> Key: CASSANDRA-13347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13347
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Sean McCarthy
>  Labels: dtest, test-failure
> Fix For: 3.0.13, 3.11.0
>
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.0_large_dtest/58/testReport/upgrade_tests.upgrade_through_versions_test/TestUpgrade_current_2_2_x_To_indev_3_0_x/rolling_upgrade_test
> {code}
> Error Message
> Subprocess ['nodetool', '-h', 'localhost', '-p', '7100', ['upgradesstables', 
> '-a']] exited with non-zero status; exit status: 2; 
> stderr: error: null
> -- StackTrace --
> java.lang.AssertionError
>   at org.apache.cassandra.db.rows.Rows.collectStats(Rows.java:70)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter$StatsCollector.applyToRow(BigTableWriter.java:197)
>   at 
> org.apache.cassandra.db.transform.BaseRows.applyOne(BaseRows.java:116)
>   at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:107)
>   at 
> org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:41)
>   at 
> org.apache.cassandra.db.transform.Transformation.add(Transformation.java:156)
>   at 
> org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:122)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:147)
>   at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125)
>   at 
> org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57)
>   at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:415)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:307)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File 
> "/home/automaton/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py",
>  line 279, in rolling_upgrade_test
> self.upgrade_scenario(rolling=True)
>   File 
> "/home/automaton/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py",
>  line 345, in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,))
>   File 
> "/home/automaton/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py",
>  line 446, in upgrade_to_version
> node.nodetool('upgradesstables -a')
>   File 
> "/home/automaton/venv/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 789, in nodetool
> return 

[jira] [Resolved] (CASSANDRA-13347) dtest failure in upgrade_tests.upgrade_through_versions_test.TestUpgrade_current_2_2_x_To_indev_3_0_x.rolling_upgrade_test

2017-04-10 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-13347.
---
Resolution: Not A Problem

> dtest failure in 
> upgrade_tests.upgrade_through_versions_test.TestUpgrade_current_2_2_x_To_indev_3_0_x.rolling_upgrade_test
> --
>
> Key: CASSANDRA-13347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13347
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Sean McCarthy
>  Labels: dtest, test-failure
> Fix For: 3.0.13, 3.11.0
>
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.0_large_dtest/58/testReport/upgrade_tests.upgrade_through_versions_test/TestUpgrade_current_2_2_x_To_indev_3_0_x/rolling_upgrade_test
> {code}
> Error Message
> Subprocess ['nodetool', '-h', 'localhost', '-p', '7100', ['upgradesstables', 
> '-a']] exited with non-zero status; exit status: 2; 
> stderr: error: null
> -- StackTrace --
> java.lang.AssertionError
>   at org.apache.cassandra.db.rows.Rows.collectStats(Rows.java:70)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter$StatsCollector.applyToRow(BigTableWriter.java:197)
>   at 
> org.apache.cassandra.db.transform.BaseRows.applyOne(BaseRows.java:116)
>   at org.apache.cassandra.db.transform.BaseRows.add(BaseRows.java:107)
>   at 
> org.apache.cassandra.db.transform.UnfilteredRows.add(UnfilteredRows.java:41)
>   at 
> org.apache.cassandra.db.transform.Transformation.add(Transformation.java:156)
>   at 
> org.apache.cassandra.db.transform.Transformation.apply(Transformation.java:122)
>   at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:147)
>   at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125)
>   at 
> org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:57)
>   at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
>   at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:415)
>   at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:307)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File 
> "/home/automaton/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py",
>  line 279, in rolling_upgrade_test
> self.upgrade_scenario(rolling=True)
>   File 
> "/home/automaton/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py",
>  line 345, in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,))
>   File 
> "/home/automaton/cassandra-dtest/upgrade_tests/upgrade_through_versions_test.py",
>  line 446, in upgrade_to_version
> node.nodetool('upgradesstables -a')
>   File 
> "/home/automaton/venv/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 789, in nodetool
> return handle_external_tool_process(p, ['nodetool', '-h', 'localhost', 
> '-p', str(self.jmx_port), cmd.split()])
>   File 
> "/home/automaton/venv/local/lib/python2.7/site-packages/ccmlib/node.py", line 
> 2002, in handle_external_tool_process
> raise ToolError(cmd_args, rc, out, err)
> {code}
> Related failures:
> http://cassci.datastax.com/job/cassandra-3.0_large_dtest/58/testReport/upgrade_tests.upgrade_through_versions_test/TestUpgrade_current_2_1_x_To_indev_3_0_x/rolling_upgrade_with_internode_ssl_test/
> 

[jira] [Commented] (CASSANDRA-13322) testall failure in org.apache.cassandra.io.compress.CompressedRandomAccessReaderTest.testDataCorruptionDetection-compression

2017-04-07 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961106#comment-15961106
 ] 

Joel Knighton commented on CASSANDRA-13322:
---

This looks like the same issue fixed in 3.0+ by [CASSANDRA-12552]. It might 
make sense to just backport that fix.

> testall failure in 
> org.apache.cassandra.io.compress.CompressedRandomAccessReaderTest.testDataCorruptionDetection-compression
> 
>
> Key: CASSANDRA-13322
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13322
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Sean McCarthy
>  Labels: test-failure, testall
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_testall/658/testReport/org.apache.cassandra.io.compress/CompressedRandomAccessReaderTest/testDataCorruptionDetection_compression
> {code}
> Stacktrace
> junit.framework.AssertionFailedError: 
>   at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReaderTest.testDataCorruptionDetection(CompressedRandomAccessReaderTest.java:218)
> {code}{code}
> Standard Output
> WARN  10:58:45 open(null, O_RDONLY) failed, errno (14).
> WARN  10:58:45 open(null, O_RDONLY) failed, errno (14).
> WARN  10:58:45 open(null, O_RDONLY) failed, errno (14).
> WARN  10:58:45 open(null, O_RDONLY) failed, errno (14).
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-11892) Can not replace a dead host

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11892:
--
Reviewer:   (was: Joel Knighton)

> Can not replace a dead host
> ---
>
> Key: CASSANDRA-11892
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11892
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
> Attachments: 0001-handle-hibernate-case.patch
>
>
> I got some errors when trying to replace a dead host.
> {code}
> 2016-05-25_20:59:37.61838 ERROR 20:59:37 [main]: Exception encountered during 
> startup
> 2016-05-25_20:59:37.61839 java.lang.UnsupportedOperationException: Cannot 
> replace token 100284002935427428580945058996711341062 which does not exist!
> 2016-05-25_20:59:37.61839   at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:925)
>  ~[apache-cassandra-2.1.14+git20160523.7442267.jar:2.1.14+git20160523.7442267]
> 2016-05-25_20:59:37.61839   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:740)
>  ~[apache-cassandra-2.1.14+git20160523.7442267.jar:2.1.14+git20160523.7442267]
> 2016-05-25_20:59:37.61839   at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:617)
>  ~[apache-cassandra-2.1.14+git20160523.7442267.jar:2.1.14+git20160523.7442267]
> 2016-05-25_20:59:37.61840   at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:389) 
> [apache-cassandra-2.1.14+git20160523.7442267.jar:2.1.14+git20160523.7442267]
> 2016-05-25_20:59:37.61840   at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564)
>  [apache-cassandra-2.1.14+git20160523.7442267.jar:2.1.14+git20160523.7442267]
> 2016-05-25_20:59:37.61841   at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) 
> [apache-cassandra-2.1.14+git20160523.7442267.jar:2.1.14+git20160523.7442267]
> 2016-05-25_20:59:37.61910 Exception encountered during startup: Cannot 
> replace token 100284002935427428580945058996711341062 which does not exist!
> {code}
> the status of the node is DN:
> {code}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load   Tokens  OwnsHost ID   
> Rack
> DN  2401:db00:2050:4196:face:0:13:0  809.83 GB  256 ?   null  
> ash5-04-pp
> {code}
> I add some logging and find something like this:
> {code}
> 2016-05-25_20:58:33.44305 INFO  20:58:33 [main]: Gathering node replacement 
> information for /2401:db00:2050:4196:face:0:13:0
> 2016-05-25_20:58:34.36966 INFO  20:58:34 [GossipStage:1]: InetAddress 
> /2401:db00:2050:4196:face:0:13:0 is now DOWN
> 2016-05-25_20:58:41.12167 INFO  20:58:41 [GossipStage:1]: InetAddress 
> /2401:db00:2050:4196:face:0:13:0 is now DOWN
> 2016-05-25_20:58:41.12248 INFO  20:58:41 [GossipStage:1]: Node 
> /2401:db00:2050:4196:face:0:13:0 state STATUS
> 2016-05-25_20:58:41.12250 INFO  20:58:41 [GossipStage:1]: Node 
> /2401:db00:2050:4196:face:0:13:0 movename hibernate
> 2016-05-25_20:58:41.12252 INFO  20:58:41 [GossipStage:1]: Node 
> /2401:db00:2050:4196:face:0:13:0 state LOAD
> {code}
> I find that in StorageService.onChange we do not handle the "hibernate" 
> VersionValue; could that be causing the problem?
> Is it safe to apply the patch to fix it?
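> For illustration, this is the dispatch shape I mean: a simplified, 
> self-contained sketch with invented handler names and an assumed value 
> format, not the actual Cassandra source.
> {code}
> public class HibernateStatusDemo
> {
>     public static void main(String[] args)
>     {
>         // Gossip STATUS values carry a move name plus optional parameters,
>         // e.g. "hibernate,true" (assumed format); onChange dispatches on
>         // the first token. If no branch matches "hibernate", the state
>         // change is silently ignored.
>         String status = "hibernate,true";
>         String moveName = status.split(",", -1)[0];
>         switch (moveName)
>         {
>             case "bootstrap": System.out.println("handleStateBootstrap"); break;
>             case "normal":    System.out.println("handleStateNormal");    break;
>             default:          System.out.println("unhandled move: " + moveName);
>         }
>     }
> }
> {code}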



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-11402) Alignment wrong in tpstats output for PerDiskMemtableFlushWriter

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-11402:
--
Reviewer:   (was: Joel Knighton)

> Alignment wrong in tpstats output for PerDiskMemtableFlushWriter
> 
>
> Key: CASSANDRA-11402
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11402
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Nishant Kelkar
>Priority: Trivial
>  Labels: lhf
> Fix For: 3.11.x
>
> Attachments: 11402-3_5_patch1.patch, 11402-trunk.txt
>
>
> With the accompanying designation of which memtableflushwriter it is, this 
> threadpool name is too long for the hardcoded padding in tpstats output.
> We should dynamically calculate padding so that we don't need to check this 
> every time we add a threadpool.
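> For illustration, a minimal sketch of computing the padding from the longest 
> pool name instead of hardcoding it (hypothetical names, not the actual 
> tpstats code):
> {code}
> import java.util.Arrays;
> import java.util.List;
> 
> public class DynamicPadding
> {
>     public static void main(String[] args)
>     {
>         List<String> pools = Arrays.asList("MutationStage", "ReadStage",
>                                            "PerDiskMemtableFlushWriter_0");
>         // Width of the widest pool name, plus two spaces of breathing room.
>         int width = pools.stream().mapToInt(String::length).max().orElse(0) + 2;
>         System.out.printf("%-" + width + "s%10s%n", "Pool Name", "Active");
>         for (String pool : pools)
>             System.out.printf("%-" + width + "s%10d%n", pool, 0);
>     }
> }
> {code}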



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-8094) Heavy writes in RangeSlice read requests

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-8094:
-
Reviewer:   (was: Joel Knighton)

> Heavy writes in RangeSlice read  requests 
> --
>
> Key: CASSANDRA-8094
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8094
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Minh Do
>  Labels: lhf
> Fix For: 3.11.x
>
>
> RangeSlice requests always do a scheduled read repair when coordinators try 
> to resolve replicas' responses, no matter whether read_repair_chance is set 
> or not.
> Because of this, clusters with low writes and high reads see very heavy 
> write traffic between nodes.
> We should have an option to turn this off, and it can be separate from 
> read_repair_chance.
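> For illustration (invented names, not the coordinator code): a chance-gated 
> decision versus the unconditional repair that range slices do today:
> {code}
> import java.util.concurrent.ThreadLocalRandom;
> 
> public class ReadRepairGate
> {
>     // Sketch only: single-partition reads gate read repair on
>     // read_repair_chance, while range slice resolution repairs always.
>     static boolean shouldRepair(double readRepairChance)
>     {
>         return ThreadLocalRandom.current().nextDouble() < readRepairChance;
>     }
> 
>     public static void main(String[] args)
>     {
>         System.out.println("single-partition, chance 0.1: " + shouldRepair(0.1));
>         System.out.println("range slice today: true (always repairs)");
>     }
> }
> {code}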



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CASSANDRA-12475) dtest failure in consistency_test.TestConsistency.short_read_test

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-12475.
---
Resolution: Cannot Reproduce

> dtest failure in consistency_test.TestConsistency.short_read_test
> -
>
> Key: CASSANDRA-12475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12475
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/42/testReport/junit/consistency_test/TestConsistency/short_read_test/
> Error:
> {code}
> Error from server: code=2200 [Invalid query] message="No keyspace has been 
> specified. USE a keyspace, or explicitly specify keyspace.tablename"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-12475) dtest failure in consistency_test.TestConsistency.short_read_test

2017-04-07 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961047#comment-15961047
 ] 

Joel Knighton commented on CASSANDRA-12475:
---

I think this may have been a driver race that has been fixed. Closing for now, 
since this doesn't appear to be a problem any more.

> dtest failure in consistency_test.TestConsistency.short_read_test
> -
>
> Key: CASSANDRA-12475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12475
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>  Labels: dtest
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.9_dtest/42/testReport/junit/consistency_test/TestConsistency/short_read_test/
> Error:
> {code}
> Error from server: code=2200 [Invalid query] message="No keyspace has been 
> specified. USE a keyspace, or explicitly specify keyspace.tablename"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CASSANDRA-12821) testall failure in org.apache.cassandra.service.RemoveTest.testNonmemberId

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-12821.
---
Resolution: Duplicate

> testall failure in org.apache.cassandra.service.RemoveTest.testNonmemberId
> --
>
> Key: CASSANDRA-12821
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12821
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sean McCarthy
>Assignee: Joel Knighton
>  Labels: test-failure
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.X_testall/41/testReport/org.apache.cassandra.service/RemoveTest/testNonmemberId/
> {code}
> Stacktrace
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:871)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2226)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1892)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:216)
>   at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:88)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-11740) Nodes have wrong membership view of the cluster

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-11740:
-

Assignee: (was: Joel Knighton)

> Nodes have wrong membership view of the cluster
> ---
>
> Key: CASSANDRA-11740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11740
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
> Fix For: 2.2.x, 3.11.x
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few 
> million writes per second into the cluster.
> The problem we found is that some nodes (>10) have a very wrong view 
> of the cluster.
> For example, we have 3 data centers A, B, and C. On the problem nodes, the 
> output of 'nodetool status' shows that ~100 nodes are not in data 
> center A, B, or C; instead, it shows them in DC1, rack r1, which is 
> very wrong. As a result, those nodes return wrong results to client 
> requests.
> {code}
> Datacenter: DC1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address Load Tokens Owns Host ID Rack
> UN 2401:db00:11:6134:face:0:1:0 509.52 GB 256 ? 
> e24656ac-c3b2-4117-b933-a5b06852c993 r1
> UN 2401:db00:11:b218:face:0:5:0 510.01 GB 256 ? 
> 53da2104-b1b5-4fa5-a3dd-52c7557149f9 r1
> UN 2401:db00:2130:5133:face:0:4d:0 459.75 GB 256 ? 
> ef8311f0-f6b8-491c-904d-baa925cdd7c2 r1
> {code}
> We are using GossipingPropertyFileSnitch.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CASSANDRA-12465) Failure in CompressedRandomAccessReaderTest.testDataCorruptionDetection

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-12465.
---
Resolution: Duplicate

> Failure in CompressedRandomAccessReaderTest.testDataCorruptionDetection
> ---
>
> Key: CASSANDRA-12465
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12465
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Joel Knighton
>Assignee: Joel Knighton
>
> This test is flaky. Example failure: 
> [https://cassci.datastax.com/job/cassandra-3.9_testall/76/testReport/junit/org.apache.cassandra.io.compress/CompressedRandomAccessReaderTest/testDataCorruptionDetection/].
> Stacktrace:
> {code}
> junit.framework.AssertionFailedError: 
>   at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReaderTest.testDataCorruptionDetection(CompressedRandomAccessReaderTest.java:246)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CASSANDRA-11835) dtest failure in replace_address_test.TestReplaceAddress.replace_with_reset_resume_state_test

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton reassigned CASSANDRA-11835:
-

Assignee: (was: Joel Knighton)

> dtest failure in 
> replace_address_test.TestReplaceAddress.replace_with_reset_resume_state_test
> -
>
> Key: CASSANDRA-11835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11835
> Project: Cassandra
>  Issue Type: Test
>Reporter: Philip Thompson
>  Labels: dtest
> Attachments: node1_debug.log, node1.log, node2_debug.log, node2.log, 
> node3_debug.log, node3.log, node4_debug.log, node4.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-2.2_offheap_dtest/375/testReport/replace_address_test/TestReplaceAddress/replace_with_reset_resume_state_test
> Failed on CassCI build cassandra-2.2_offheap_dtest #375
> Node4 is started up to replace the killed node3, but fails with this error:
> {code}
> ERROR [main] 2016-05-18 03:08:02,244 CassandraDaemon.java:638 - Exception 
> encountered during startup
> java.lang.RuntimeException: Cannot replace_address /127.0.0.3 because it 
> doesn't exist in gossip
> at 
> org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:529)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:775)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:709)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:595)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) 
> [main/:na]
> {code}
> Logs are attached.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CASSANDRA-11689) dtest failures in internode_ssl_test tests

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-11689.
---
   Resolution: Duplicate
Reproduced In:   (was: )

Fixed by [CASSANDRA-12653].

> dtest failures in internode_ssl_test tests
> --
>
> Key: CASSANDRA-11689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11689
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Russ Hatch
>Assignee: Joel Knighton
>  Labels: dtest
> Fix For: 3.11.x
>
>
> has happened a few times on trunk, two different tests:
> http://cassci.datastax.com/job/trunk_dtest/1179/testReport/internode_ssl_test/TestInternodeSSL/putget_with_internode_ssl_without_compression_test
> http://cassci.datastax.com/job/trunk_dtest/1169/testReport/internode_ssl_test/TestInternodeSSL/putget_with_internode_ssl_test/
> Failed on CassCI build trunk_dtest #1179



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (CASSANDRA-13112) test failure in snitch_test.TestDynamicEndpointSnitch.test_multidatacenter_local_quorum

2017-04-07 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton resolved CASSANDRA-13112.
---
Resolution: Fixed

The dtest PR was merged and this failure has not recurred.

> test failure in 
> snitch_test.TestDynamicEndpointSnitch.test_multidatacenter_local_quorum
> ---
>
> Key: CASSANDRA-13112
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13112
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sean McCarthy
>Assignee: Joel Knighton
>  Labels: dtest, test-failure
> Attachments: node1_debug.log, node1_gc.log, node1.log, 
> node2_debug.log, node2_gc.log, node2.log, node3_debug.log, node3_gc.log, 
> node3.log, node4_debug.log, node4_gc.log, node4.log, node5_debug.log, 
> node5_gc.log, node5.log, node6_debug.log, node6_gc.log, node6.log
>
>
> example failure:
> http://cassci.datastax.com/job/trunk_large_dtest/48/testReport/snitch_test/TestDynamicEndpointSnitch/test_multidatacenter_local_quorum
> {code}
> Error Message
> 75 != 76
> {code}{code}
> Stacktrace
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/automaton/cassandra-dtest/tools/decorators.py", line 48, in 
> wrapped
> f(obj)
>   File "/home/automaton/cassandra-dtest/snitch_test.py", line 168, in 
> test_multidatacenter_local_quorum
> bad_jmx.read_attribute(read_stage, 'Value'))
>   File "/usr/lib/python2.7/unittest/case.py", line 513, in assertEqual
> assertion_func(first, second, msg=msg)
>   File "/usr/lib/python2.7/unittest/case.py", line 506, in _baseAssertEqual
> raise self.failureException(msg)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13407) test failure at RemoveTest.testBadHostId

2017-04-06 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959301#comment-15959301
 ] 

Joel Knighton edited comment on CASSANDRA-13407 at 4/6/17 5:02 PM:
---

For posterity, this is the race possible when the Gossiper is started, as far 
as I can tell.

In setup, we initialize a fake ring using Util.createInitialRing. This will 
initialize the nodes in an unsafe manner and then inject the token states. If a 
status check runs before the token state is set, the previously decommissioned 
node will look like a fat client, since it won't have tokens and will not have 
a DEAD_STATE. Since we aren't gossiping, we won't have heard from it in more 
than fatClientTimeout, so we'll remove it. If this races with the ss.onChange 
in createInitialRing, we can remove the endpoint state while it is being 
processed, which will cause an NPE as above.

We also need to remove SchemaLoader.loadSchema() as you did in the patch - this 
is because it starts the Gossiper as well. This is fine; we don't appear to 
need it.

The patch looks good - the race exists in theory on 2.1/2.2, but it appears to 
only manifest on 3.0+. I don't think it is worth committing to 2.1 for that 
reason - let's do 2.2+ forward and run the test at least once on each branch 
before committing.




was (Author: jkni):
For posterity, this is the race possible when the Gossiper is started, as far 
as I can tell.

In setup, we initialize a fake ring using Util.createInitialRing. This will 
initialize the nodes in an unsafe manner and then inject the token states. If a 
status check runs before the token state is set, the previously decommissioned 
node will look like a fat client, since it won't have tokens and will not have 
a DEAD_STATE. Since we aren't gossiping, we won't have heard from it in more 
than fatClientTimeout, so we'll remove it. If this races with the ss.onChange 
in createInitialRing, we can remove the endpoint state while it is being 
processed, which will cause an NPE as above. This race can be seen at 
16:15:51,205 in the log linked from the test failure.

We also need to remove SchemaLoader.loadSchema() as you did in the patch - this 
is because it starts the Gossiper as well. This is fine; we don't appear to 
need it.

The patch looks good - the race exists in theory on 2.1/2.2, but it appears to 
only manifest on 3.0+. I don't think it is worth committing to 2.1 for that 
reason - let's do 2.2+ forward and run the test at least once on each branch 
before committing.



> test failure at RemoveTest.testBadHostId
> 
>
> Key: CASSANDRA-13407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13407
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> Example trace:
> {code}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:881)
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:876)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2201)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1855)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:216)
>   at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:89)
> {code} 
> [failure 
> example|https://cassci.datastax.com/job/trunk_testall/1491/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/]
> [history|https://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/history/]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13407) test failure at RemoveTest.testBadHostId

2017-04-06 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15959301#comment-15959301
 ] 

Joel Knighton commented on CASSANDRA-13407:
---

For posterity, this is the race possible when the Gossiper is started, as far 
as I can tell.

In setup, we initialize a fake ring using Util.createInitialRing. This will 
initialize the nodes in an unsafe manner and then inject the token states. If a 
status check runs before the token state is set, the previously decommissioned 
node will look like a fat client, since it won't have tokens and will not have 
a DEAD_STATE. Since we aren't gossiping, we won't have heard from it in more 
than fatClientTimeout, so we'll remove it. If this races with the ss.onChange 
in createInitialRing, we can remove the endpoint state while it is being 
processed, which will cause an NPE as above. This race can be seen at 
16:15:51,205 in the log linked from the test failure.

We also need to remove SchemaLoader.loadSchema() as you did in the patch - this 
is because it starts the Gossiper as well. This is fine; we don't appear to 
need it.

The patch looks good - the race exists in theory on 2.1/2.2, but it appears to 
only manifest on 3.0+. I don't think it is worth committing to 2.1 for that 
reason - let's do 2.2+ forward and run the test at least once on each branch 
before committing.
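
As a minimal standalone illustration of the check-then-act shape of this race 
(invented names, not Cassandra code):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FatClientRaceDemo
{
    static final Map<String, String> states = new ConcurrentHashMap<>();

    public static void main(String[] args) throws InterruptedException
    {
        states.put("127.0.0.3", "hibernate");

        // Thread A: the status check evicting the apparent fat client.
        Thread statusCheck = new Thread(() -> states.remove("127.0.0.3"));

        // Thread B: onChange-style processing that re-reads state it already
        // saw once; if the removal lands between the two reads, the second
        // read returns null and dereferencing it throws the NPE above.
        Thread onChange = new Thread(() -> {
            if (states.containsKey("127.0.0.3"))
            {
                String state = states.get("127.0.0.3"); // may be null by now
                System.out.println(state.toUpperCase()); // potential NPE
            }
        });

        statusCheck.start();
        onChange.start();
        statusCheck.join();
        onChange.join();
    }
}
{code}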



> test failure at RemoveTest.testBadHostId
> 
>
> Key: CASSANDRA-13407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13407
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> Example trace:
> {code}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:881)
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:876)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2201)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1855)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:216)
>   at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:89)
> {code} 
> [failure 
> example|https://cassci.datastax.com/job/trunk_testall/1491/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/]
> [history|https://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/history/]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13407) test failure at RemoveTest.testBadHostId

2017-04-06 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-13407:
--
Status: Ready to Commit  (was: Patch Available)

> test failure at RemoveTest.testBadHostId
> 
>
> Key: CASSANDRA-13407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13407
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>
> Example trace:
> {code}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:881)
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:876)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2201)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1855)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:216)
>   at org.apache.cassandra.service.RemoveTest.setup(RemoveTest.java:89)
> {code} 
> [failure 
> example|https://cassci.datastax.com/job/trunk_testall/1491/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/]
> [history|https://cassci.datastax.com/job/trunk_testall/lastCompletedBuild/testReport/org.apache.cassandra.service/RemoveTest/testBadHostId/history/]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

