[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2022-04-27 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528707#comment-17528707
 ] 

Jan Karlsson commented on CASSANDRA-16718:
--

[~brandon.williams] The ticket has been quiet for a while now. What is the 
status of this?

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2022-04-27 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17528707#comment-17528707
 ] 

Jan Karlsson edited comment on CASSANDRA-16718 at 4/27/22 9:57 AM:
---

[~brandon.williams] The ticket has been quiet for a while now. What is the 
status of this? Are we waiting for a reviewer?


was (Author: jan karlsson):
[~brandon.williams] The ticket has been quiet for a while now. What is the 
status of this?

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-03-07 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502267#comment-17502267
 ] 

Jan Karlsson edited comment on CASSANDRA-17407 at 3/8/22, 7:18 AM:
---

Valid point, although I am not sure how much use the average person will get 
out of it. Data center names usually follow some pattern that makes errors easy 
to spot. Even with 10 DCs, it would be rather easy for the user to run nodetool 
status and compare the lists. That said, a more precise error message couldn't 
hurt, and the improved validation in the dtest would be good.

||Branch||Circle||dtest||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCI|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/10/workflows/f5008a69-2798-4b03-802d-fb6d0128d68a]|[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|

edit: Fixed 3.11 link



was (Author: jan karlsson):
Valid point, although I am not sure how much use the average person will get 
out of it. Data center names usually follow some pattern that makes errors easy 
to spot. Even with 10 DCs, it would be rather easy for the user to run nodetool 
status and compare the lists. That said, a more precise error message couldn't 
hurt, and the improved validation in the dtest would be good.

||Branch||Circle||dtest||
|[3.11|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|[CircleCI|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/10/workflows/f5008a69-2798-4b03-802d-fb6d0128d68a]|[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|


> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-03-07 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502267#comment-17502267
 ] 

Jan Karlsson commented on CASSANDRA-17407:
--

Valid point, although I am not sure how much use the average person will get 
out of it. Data center names usually follow some pattern that makes errors easy 
to spot. Even with 10 DCs, it would be rather easy for the user to run nodetool 
status and compare the lists. That said, a more precise error message couldn't 
hurt, and the improved validation in the dtest would be good.

||Branch||Circle||dtest||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCI|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/10/workflows/f5008a69-2798-4b03-802d-fb6d0128d68a]|[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]|
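
For comparison purposes, the set of data centers a node knows about can also be 
pulled programmatically. A minimal sketch with the Python cassandra-driver, 
assuming a node reachable on 127.0.0.1 (this is the same information nodetool 
status prints, so it is only an illustration, not part of the patch):

{noformat}
# Sketch: list the data centers a node knows about, for comparison with --in-dc.
# Assumes the Python cassandra-driver and a node listening on 127.0.0.1:9042.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

dcs = {row.data_center for row in session.execute("SELECT data_center FROM system.local")}
dcs |= {row.data_center for row in session.execute("SELECT data_center FROM system.peers")}
print(sorted(dcs))

cluster.shutdown()
{noformat}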


> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-03-02 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500056#comment-17500056
 ] 

Jan Karlsson edited comment on CASSANDRA-17407 at 3/2/22, 11:09 AM:


Sure thing. I created a 
[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]
 that tests this behavior.

I also added a check for the validation of the local data center, since I 
couldn't find any other place where we test this.


was (Author: jan karlsson):
Sure thing. I created a 
[test|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]
 that tests this behavior.

I also added a check for the validation of the local data center, since I 
couldn't find any other place where we test this.

> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-03-02 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500056#comment-17500056
 ] 

Jan Karlsson commented on CASSANDRA-17407:
--

Sure thing. I created a 
[test|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:c17407-trunk?expand=1]
 that tests this behavior.

I also added a check for the validation of the local data center, since I 
couldn't find any other place where we test this.
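
For readers without the dtest handy, the scenario it exercises is roughly the 
following (a sketch using plain nodetool against a local ccm node rather than 
the dtest framework; the JMX port 7100, keyspace k, default data center name 
datacenter1, and repeating --in-dc once per DC are all assumptions):

{noformat}
# Sketch of the scenario the dtest exercises, not the dtest itself.
import subprocess

def repair_in_dcs(*dcs):
    cmd = ['nodetool', '-h', '127.0.0.1', '-p', '7100', 'repair']
    for dc in dcs:
        cmd += ['--in-dc', dc]          # repeat the flag for each DC (assumed syntax)
    return subprocess.run(cmd + ['k'], capture_output=True, text=True)

# A list containing only real DCs (including the local one) still works.
assert repair_in_dcs('datacenter1').returncode == 0

# A list that includes the local DC plus a nonexistent one used to pass
# silently; with the validation it should now be rejected.
result = repair_in_dcs('datacenter1', 'dc_typo')
assert result.returncode != 0, result.stdout + result.stderr
{noformat}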

> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-03-01 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499506#comment-17499506
 ] 

Jan Karlsson commented on CASSANDRA-17407:
--

You might have a point about this feeling more like a bug than an improvement. 
It certainly is confusing behavior for users.

I was thinking something like this for the patch:
||Patch||Test||
|[3.11\|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCi\|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/9/workflows/ca527be8-4145-4c55-bfb4-7cec670e7b4c]|

The patch applies cleanly to 4.0/trunk. I can provide patches for the other 
versions too if needed.

> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-03-01 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499506#comment-17499506
 ] 

Jan Karlsson edited comment on CASSANDRA-17407 at 3/1/22, 12:38 PM:


You might have a point about this feeling more like a bug than an improvement. 
It certainly is confusing behavior for users.

I was thinking something like this for the patch:
||Patch||Test||
|[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCi|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/9/workflows/ca527be8-4145-4c55-bfb4-7cec670e7b4c]|

The patch applies cleanly to 4.0/trunk. I can provide patches for the other 
versions too if needed.


was (Author: jan karlsson):
You might have a point about this feeling more like a bug than an improvement. 
It certainly is confusing behavior for users.

I was thinking something like this for the patch:
||Patch||Test||
|[3.11\|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:c17407-3.11?expand=1]|[CircleCi\|https://app.circleci.com/pipelines/github/itskarlsson/cassandra/9/workflows/ca527be8-4145-4c55-bfb4-7cec670e7b4c]|

The patch applies cleanly to 4.0/trunk. I can provide patches for the other 
versions too if needed.

> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Repair
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-17407) Validate existence of DCs when repairing

2022-02-28 Thread Jan Karlsson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-17407:
-
Summary: Validate existence of DCs when repairing  (was: Validate existance 
of DCs when repairing)

> Validate existence of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-17407) Validate existance of DCs when repairing

2022-02-28 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498940#comment-17498940
 ] 

Jan Karlsson commented on CASSANDRA-17407:
--

I'd be happy to provide a patch if we decide this to be a good addition.

> Validate existance of DCs when repairing
> 
>
> Key: CASSANDRA-17407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
>
> With the new validation of data centers in the replication factor, it might 
> be good to give similar treatment to repair.
> Currently, the --in-dc flag only validates that the given list contains the 
> local data center.
> If the list contains nonexistent data centers, the repair will pass without 
> errors or warnings as long as it also contains the local data center.
> My suggestion would be to validate all the data centers and give an error 
> when a nonexistent data center is given.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-17407) Validate existance of DCs when repairing

2022-02-28 Thread Jan Karlsson (Jira)
Jan Karlsson created CASSANDRA-17407:


 Summary: Validate existance of DCs when repairing
 Key: CASSANDRA-17407
 URL: https://issues.apache.org/jira/browse/CASSANDRA-17407
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jan Karlsson
Assignee: Jan Karlsson


With the new validation of data centers in the replication factor, it might be 
good to give similar treatment to repair.

Currently, the --in-dc flag only validates that the given list contains the 
local data center.

If the list contains nonexistent data centers, the repair will pass without 
errors or warnings as long as it also contains the local data center.

My suggestion would be to validate all the data centers and give an error when 
a nonexistent data center is given.
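
To make the proposal concrete, the intended check amounts to something like the 
following (an illustrative Python sketch, not the actual Java implementation; 
known_dcs would come from the cluster's ring/snitch metadata):

{noformat}
# Illustrative sketch of the proposed validation, not the real (Java) code.
def validate_repair_dcs(requested_dcs, known_dcs, local_dc):
    """Fail fast if --in-dc names a data center the cluster does not know about."""
    unknown = set(requested_dcs) - set(known_dcs)
    if unknown:
        raise ValueError("Unknown data center(s): %s (known: %s)"
                         % (sorted(unknown), sorted(known_dcs)))
    if local_dc not in requested_dcs:
        # existing behavior: the local DC must always be part of the list
        raise ValueError("The local data center %s must be part of --in-dc" % local_dc)

# Example: this passes today, but would be rejected because 'dc_typo' does not exist.
try:
    validate_repair_dcs({'dc1', 'dc_typo'}, known_dcs={'dc1', 'dc2'}, local_dc='dc1')
except ValueError as e:
    print(e)
{noformat}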

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-09-17 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416492#comment-17416492
 ] 

Jan Karlsson commented on CASSANDRA-16718:
--

I observed this by fetching the peers table before stopping node2 in my dtest.

Without the patch, preferred_ip was populated; with the patch, it is null 
throughout the test.
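
For reference, the check itself is just a query against system.peers. A minimal 
sketch with the Python cassandra-driver, assuming a contact point on 127.0.0.1 
(the dtest does the equivalent through its own connection helpers):

{noformat}
# Sketch: inspect preferred_ip in system.peers on the node that stays up,
# before and after restarting the peer.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

for row in session.execute("SELECT peer, preferred_ip FROM system.peers"):
    print(row.peer, row.preferred_ip)   # null with the patch, populated without it

cluster.shutdown()
{noformat}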

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-09-16 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17416038#comment-17416038
 ] 

Jan Karlsson commented on CASSANDRA-16718:
--

LGTM. It seems to fix the issue completely. However, with your patch, 
preferred_ip is null throughout the test. Is this an intended side effect?

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-09-08 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17411763#comment-17411763
 ] 

Jan Karlsson commented on CASSANDRA-16718:
--

Great findings so far. Thank you for taking the time to dig into this.

I agree that the old local address is persisted somewhere and therefore used by 
the existing node. However, in an attempt to verify your findings, I modified my 
test case to manually change the preferred_ip before starting the last node so 
that it points to the correct address. The test still fails even with an 
updated preferred_ip.

My original thought was that the Gossiper was persisting this IP in 
endpointStateMap. During checkForEndpointCollision, the UP node will attempt to 
connect through the local address before this address is updated by the 
shadow round.

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-08-03 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17392196#comment-17392196
 ] 

Jan Karlsson commented on CASSANDRA-16718:
--

I took a look at the code base. It seems quite difficult to change, as it is 
intertwined with the per-node message pools. Maybe someone with more experience 
with the networking code can shed some light on the issue.

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-08-03 Thread Jan Karlsson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson reassigned CASSANDRA-16718:


Assignee: (was: Jan Karlsson)

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-08-03 Thread Jan Karlsson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson reassigned CASSANDRA-16718:


Assignee: Jan Karlsson

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-06-07 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17358575#comment-17358575
 ] 

Jan Karlsson commented on CASSANDRA-16718:
--

A dtest to reproduce this can be found 
[here|https://github.com/itskarlsson/cassandra-dtest/tree/CASSANDRA-16718]. The 
scenario works if prefer_local is set to false.
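
For anyone recreating this locally, prefer_local is the 
cassandra-rackdc.properties option used together with 
GossipingPropertyFileSnitch. A small helper to switch it on for every node of a 
ccm cluster could look like this (a sketch that assumes the usual 
~/.ccm/<cluster>/<node>/conf layout; adjust paths as needed):

{noformat}
# Sketch: enable prefer_local on every node of a local ccm cluster.
# Assumes GossipingPropertyFileSnitch and the usual ~/.ccm layout.
import glob, os

def enable_prefer_local(cluster_name):
    pattern = os.path.expanduser(
        '~/.ccm/%s/node*/conf/cassandra-rackdc.properties' % cluster_name)
    for path in glob.glob(pattern):
        with open(path, 'a') as f:
            f.write('\nprefer_local=true\n')

enable_prefer_local('test')  # then restart the nodes for the change to take effect
{noformat}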

 

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Consistency/Bootstrap and Decommission
>Reporter: Jan Karlsson
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0, 4.0-rc1
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-06-07 Thread Jan Karlsson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-16718:
-
Fix Version/s: 4.0-rc1
   4.0
   3.11.11
   3.0.25

> Changing listen_address with prefer_local may lead to issues
> 
>
> Key: CASSANDRA-16718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Consistency/Bootstrap and Decommission
>Reporter: Jan Karlsson
>Priority: Normal
> Fix For: 3.0.25, 3.11.11, 4.0, 4.0-rc1
>
>
> Many container-based solutions work by assigning new listen_addresses when 
> nodes are stopped. Changing the listen_address is usually as simple as 
> turning off the node and editing the yaml file. 
> However, if prefer_local is enabled, I observed that nodes were unable to 
> join the cluster and failed with 'Unable to gossip with any seeds'. 
> Tracing shows that the node with the changed address tries to communicate 
> with the existing node, but the response is never received. I assume this is 
> because the existing node attempts to communicate with the local address 
> during the shadow round.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16718) Changing listen_address with prefer_local may lead to issues

2021-06-07 Thread Jan Karlsson (Jira)
Jan Karlsson created CASSANDRA-16718:


 Summary: Changing listen_address with prefer_local may lead to 
issues
 Key: CASSANDRA-16718
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16718
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip, Consistency/Bootstrap and Decommission
Reporter: Jan Karlsson


Many container-based solutions work by assigning new listen_addresses when 
nodes are stopped. Changing the listen_address is usually as simple as turning 
off the node and editing the yaml file. 

However, if prefer_local is enabled, I observed that nodes were unable to join 
the cluster and failed with 'Unable to gossip with any seeds'. 

Tracing shows that the node with the changed address tries to communicate with 
the existing node, but the response is never received. I assume this is because 
the existing node attempts to communicate with the local address during the 
shadow round.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16577) Node waits for schema agreement on removed nodes

2021-04-09 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17317944#comment-17317944
 ] 

Jan Karlsson commented on CASSANDRA-16577:
--

I tried to reproduce with your patch. It worked on both 4.0-rc1-SNAPSHOT and 
3.11.11-SNAPSHOT.

As for the code, LGTM. One nit would be to include the ignore log message in 
the exception instead of the warning.

> Node waits for schema agreement on removed nodes
> 
>
> Key: CASSANDRA-16577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16577
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Consistency/Bootstrap and Decommission
>Reporter: Jan Karlsson
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0, 3.0.x, 3.11.x, 4.0-beta
>
>
> CASSANDRA-15158 might have introduced a bug where bootstrapping nodes wait 
> for schema agreement from nodes that have been removed if token allocation 
> for keyspace is enabled.
>  
> It is fairly easy to reproduce with the following steps:
> {noformat}
> // Create 3 node cluster
> ccm create test --vnodes -n 3 -s -v 3.11.10
> // Remove two nodes
> ccm node2 decommission
> ccm node3 decommission
> ccm node2 remove
> ccm node3 remove
> // Create keyspace to change the schema. It works if the schema never changes.
> ccm node1 cqlsh -x "CREATE KEYSPACE k WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 1};"
> // Add allocate parameter
> ccm updateconf 'allocate_tokens_for_keyspace: k'
> // Add node2 again to cluster
> ccm add node2 -i 127.0.0.2 -j 7200 -r 2200
> ccm node2 start{noformat}
>  
> This will cause node2 to throw exception on startup:
> {noformat}
> WARN  [main] 2021-04-08 14:10:53,272 StorageService.java:941 - There are 
> nodes in the cluster with a different schema version than us we did not 
> merged schemas from, our version : (a5da47ec-ffe3-3111-b2f3-325f771f1539), 
> outstanding versions -> endpoints : 
> {8e9ec79e-5ed2-3949-8ac8-794abfee3837=[/127.0.0.3]}
> ERROR [main] 2021-04-08 14:10:53,274 CassandraDaemon.java:803 - Exception 
> encountered during startup
> java.lang.RuntimeException: Didn't receive schemas for all known versions 
> within the timeout
> at 
> org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:947)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:206) 
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:177)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1073)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:753)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:687)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395) 
> [apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633)
>  [apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786) 
> [apache-cassandra-3.11.10.jar:3.11.10]
> INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,279 
> HintsService.java:209 - Paused hints dispatch
> WARN  [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 Gossiper.java:1670 
> - No local state, state is in silent shutdown, or node hasn't joined, not 
> announcing shutdown
> INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 
> MessagingService.java:985 - Waiting for messaging service to quiesce
> INFO  [ACCEPT-/127.0.0.2] 2021-04-08 14:10:53,281 MessagingService.java:1346 
> - MessagingService has terminated the accept() thread
> INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,416 
> HintsService.java:209 - Paused hints dispatch{noformat}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16577) Node waits for schema agreement on removed nodes

2021-04-08 Thread Jan Karlsson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-16577:
-
Fix Version/s: 3.0.24
   3.11.10
   4.0-beta
   4.0

> Node waits for schema agreement on removed nodes
> 
>
> Key: CASSANDRA-16577
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16577
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Consistency/Bootstrap and Decommission
>Reporter: Jan Karlsson
>Priority: Normal
> Fix For: 3.0.24, 3.11.10, 4.0, 4.0-beta
>
>
> CASSANDRA-15158 might have introduced a bug where bootstrapping nodes wait 
> for schema agreement from nodes that have been removed if token allocation 
> for keyspace is enabled.
>  
> It is fairly easy to reproduce with the following steps:
> {noformat}
> // Create 3 node cluster
> ccm create test --vnodes -n 3 -s -v 3.11.10
> // Remove two nodes
> ccm node2 decommission
> ccm node3 decommission
> ccm node2 remove
> ccm node3 remove
> // Create keyspace to change the schema. It works if the schema never changes.
> ccm node1 cqlsh -x "CREATE KEYSPACE k WITH replication = {'class': 
> 'SimpleStrategy', 'replication_factor': 1};"
> // Add allocate parameter
> ccm updateconf 'allocate_tokens_for_keyspace: k'
> // Add node2 again to cluster
> ccm add node2 -i 127.0.0.2 -j 7200 -r 2200
> ccm node2 start{noformat}
>  
> This will cause node2 to throw exception on startup:
> {noformat}
> WARN  [main] 2021-04-08 14:10:53,272 StorageService.java:941 - There are 
> nodes in the cluster with a different schema version than us we did not 
> merged schemas from, our version : (a5da47ec-ffe3-3111-b2f3-325f771f1539), 
> outstanding versions -> endpoints : 
> {8e9ec79e-5ed2-3949-8ac8-794abfee3837=[/127.0.0.3]}
> ERROR [main] 2021-04-08 14:10:53,274 CassandraDaemon.java:803 - Exception 
> encountered during startup
> java.lang.RuntimeException: Didn't receive schemas for all known versions 
> within the timeout
> at 
> org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:947)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:206) 
> ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:177)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1073)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:753)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:687)
>  ~[apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395) 
> [apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633)
>  [apache-cassandra-3.11.10.jar:3.11.10]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786) 
> [apache-cassandra-3.11.10.jar:3.11.10]
> INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,279 
> HintsService.java:209 - Paused hints dispatch
> WARN  [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 Gossiper.java:1670 
> - No local state, state is in silent shutdown, or node hasn't joined, not 
> announcing shutdown
> INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 
> MessagingService.java:985 - Waiting for messaging service to quiesce
> INFO  [ACCEPT-/127.0.0.2] 2021-04-08 14:10:53,281 MessagingService.java:1346 
> - MessagingService has terminated the accept() thread
> INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,416 
> HintsService.java:209 - Paused hints dispatch{noformat}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16577) Node waits for schema agreement on removed nodes

2021-04-08 Thread Jan Karlsson (Jira)
Jan Karlsson created CASSANDRA-16577:


 Summary: Node waits for schema agreement on removed nodes
 Key: CASSANDRA-16577
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16577
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip, Consistency/Bootstrap and Decommission
Reporter: Jan Karlsson


CASSANDRA-15158 might have introduced a bug where bootstrapping nodes wait for 
schema agreement from nodes that have been removed if token allocation for 
keyspace is enabled.

 

It is fairly easy to reproduce with the following steps:
{noformat}
// Create 3 node cluster
ccm create test --vnodes -n 3 -s -v 3.11.10

// Remove two nodes
ccm node2 decommission
ccm node3 decommission
ccm node2 remove
ccm node3 remove

// Create keyspace to change the schema. It works if the schema never changes.
ccm node1 cqlsh -x "CREATE KEYSPACE k WITH replication = {'class': 
'SimpleStrategy', 'replication_factor': 1};"

// Add allocate parameter
ccm updateconf 'allocate_tokens_for_keyspace: k'

// Add node2 again to cluster
ccm add node2 -i 127.0.0.2 -j 7200 -r 2200
ccm node2 start{noformat}
 

This will cause node2 to throw exception on startup:
{noformat}
WARN  [main] 2021-04-08 14:10:53,272 StorageService.java:941 - There are nodes 
in the cluster with a different schema version than us we did not merged 
schemas from, our version : (a5da47ec-ffe3-3111-b2f3-325f771f1539), outstanding 
versions -> endpoints : {8e9ec79e-5ed2-3949-8ac8-794abfee3837=[/127.0.0.3]}
ERROR [main] 2021-04-08 14:10:53,274 CassandraDaemon.java:803 - Exception 
encountered during startup
java.lang.RuntimeException: Didn't receive schemas for all known versions 
within the timeout
at 
org.apache.cassandra.service.StorageService.waitForSchema(StorageService.java:947)
 ~[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.dht.BootStrapper.allocateTokens(BootStrapper.java:206) 
~[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.dht.BootStrapper.getBootstrapTokens(BootStrapper.java:177) 
~[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1073)
 ~[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:753) 
~[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) 
~[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:395) 
[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:633) 
[apache-cassandra-3.11.10.jar:3.11.10]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:786) 
[apache-cassandra-3.11.10.jar:3.11.10]
INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,279 
HintsService.java:209 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 Gossiper.java:1670 - 
No local state, state is in silent shutdown, or node hasn't joined, not 
announcing shutdown
INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,280 
MessagingService.java:985 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/127.0.0.2] 2021-04-08 14:10:53,281 MessagingService.java:1346 - 
MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2021-04-08 14:10:53,416 
HintsService.java:209 - Paused hints dispatch{noformat}
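
When scripting the steps above, the failure is easiest to detect from node2's 
log. A minimal sketch, assuming the default ccm directory layout 
(~/.ccm/<cluster>/<node>/logs/system.log):

{noformat}
# Sketch: confirm the reproduction by looking for the bootstrap failure in node2's log.
import os

log_path = os.path.expanduser('~/.ccm/test/node2/logs/system.log')
with open(log_path) as f:
    hit = any("Didn't receive schemas for all known versions" in line for line in f)
print('reproduced' if hit else 'node2 bootstrapped normally')
{noformat}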
 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16316) Tracing continues after session completed

2020-12-18 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251749#comment-17251749
 ] 

Jan Karlsson commented on CASSANDRA-16316:
--

Sure thing. Here is a 
[dtest|https://github.com/apache/cassandra-dtest/compare/trunk...itskarlsson:16316]
 that tests for this issue. I tested it both with and without the patch on 
3.11.9 to verify.
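
The core assertion of that dtest can be summarised as: the row count of 
system_traces.events must not keep growing when no tracing has been requested. 
A rough standalone sketch with the Python cassandra-driver (the auth setup from 
the reproduction steps is still needed to actually trigger the bug; a single 
node on 127.0.0.1 is assumed):

{noformat}
# Sketch of the check: system_traces.events must not keep growing when no
# tracing has been requested.
import time
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

def event_count():
    return session.execute("SELECT count(*) FROM system_traces.events").one()[0]

before = event_count()
for _ in range(100):
    session.execute("SELECT * FROM system.local")   # plain, untraced queries
time.sleep(15)                                      # let any stray trace events flush

assert event_count() == before, "tracing kept running after the session completed"
cluster.shutdown()
{noformat}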

> Tracing continues after session completed
> -
>
> Key: CASSANDRA-16316
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16316
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> We saw the system_traces.events table increasing in size continuously without 
> any trace requests being issued.
> I traced the issue back to a specific version and patch. I believe we removed 
> the call to reset the trace flag in CASSANDRA-15041, which causes tracing to 
> continue in the thread even after it has finished with the request.
> Reproduce as follows:
> 1. ccm create test -n 1 -v 3.11.9
> 2. Enable authentication/authorization
> 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is the 
> default value. I am guessing this is because the update is done in the 
> calling thread)
> 4. select * from some table a bunch of times until PermissionRoleCache is 
> refreshed
> 5. Watch system_traces.events grow
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-16316) Tracing continues after session completed

2020-12-09 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246496#comment-17246496
 ] 

Jan Karlsson edited comment on CASSANDRA-16316 at 12/9/20, 12:36 PM:
-

You are probably right about trunk. I also tried reproducing it on there 
without success.

As for a fix, I think something simple like calling 
maybeResetTraceSessionWrapper should suffice.

Something like 
[this|https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:16316-3.11].

I can provide patches for the other versions if needed.


was (Author: jan karlsson):
You are probably right about trunk. I also tried reproducing it on there 
without success.

As for a fix, I think something simple like calling 
maybeResetTraceSessionWrapper should suffice.

Something like 
[this|[https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:16316-3.11]].

I can provide patches for the other versions if needed.

> Tracing continues after session completed
> -
>
> Key: CASSANDRA-16316
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16316
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> We saw the system_traces.events table increasing in size continuously without 
> any trace requests being issued.
> I traced the issue back to a specific version and patch. I believe we removed 
> the call to reset the trace flag in CASSANDRA-15041, which causes tracing to 
> continue in the thread even after it has finished with the request.
> Reproduce as follows:
> 1. ccm create test -n 1 -v 3.11.9
> 2. Enable authentication/authorization
> 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is the 
> default value. I am guessing this is because the update is done in the 
> calling thread)
> 4. select * from some table a bunch of times until PermissionRoleCache is 
> refreshed
> 5. Watch system_traces.events grow
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16316) Tracing continues after session completed

2020-12-09 Thread Jan Karlsson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246496#comment-17246496
 ] 

Jan Karlsson commented on CASSANDRA-16316:
--

You are probably right about trunk. I also tried reproducing it on there 
without success.

As for a fix, I think something simple like calling 
maybeResetTraceSessionWrapper should suffice.

Something like 
[this|[https://github.com/apache/cassandra/compare/cassandra-3.11...itskarlsson:16316-3.11]].

I can provide patches for the other versions if needed.

> Tracing continues after session completed
> -
>
> Key: CASSANDRA-16316
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16316
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Tracing
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x
>
>
> We saw the system_traces.events table increasing in size continuously without 
> any trace requests being issued.
> I traced the issue back to a specific version and patch. I believe we removed 
> the call to reset the trace flag in CASSANDRA-15041, which causes tracing to 
> continue in the thread even after it has finished with the request.
> Reproduce as follows:
> 1. ccm create test -n 1 -v 3.11.9
> 2. Enable authentication/authorization
> 3. Set permissions_update_interval_in_ms: 1000 (It works if this value is the 
> default value. I am guessing this is because the update is done in the 
> calling thread)
> 4. select * from some table a bunch of times until PermissionRoleCache is 
> refreshed
> 5. Watch system_traces.events grow
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-16316) Tracing continues after session completed

2020-12-08 Thread Jan Karlsson (Jira)
Jan Karlsson created CASSANDRA-16316:


 Summary: Tracing continues after session completed
 Key: CASSANDRA-16316
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16316
 Project: Cassandra
  Issue Type: Bug
  Components: Observability/Tracing
Reporter: Jan Karlsson
Assignee: Jan Karlsson


We saw the system_traces.events table increasing in size continuously without 
any trace requests being issued.

I traced the issue back to a specific version and patch. I believe we removed 
the call to reset the trace flag in CASSANDRA-15041, which causes tracing to 
continue in the thread even after it has finished with the request.

Reproduce as follows:
1. ccm create test -n 1 -v 3.11.9

2. Enable authentication/authorization

3. Set permissions_update_interval_in_ms: 1000 (It works if this value is the 
default value. I am guessing this is because the update is done in the calling 
thread)

4. select * from some table a bunch of times until PermissionRoleCache is 
refreshed

5. Watch system_traces.events grow

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14710) Use quilt to patch cassandra.in.sh in Debian packaging

2019-05-06 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833757#comment-16833757
 ] 

Jan Karlsson commented on CASSANDRA-14710:
--

I took a look at the patch and it LGTM. It all applies cleanly and installs 
just fine in a Debian Docker container.

 

> Use quilt to patch cassandra.in.sh in Debian packaging
> --
>
> Key: CASSANDRA-14710
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14710
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Michael Shuler
>Assignee: Michael Shuler
>Priority: Normal
> Fix For: 4.0
>
> Attachments: CASSANDRA-14710_c.in.sh.patch.txt
>
>
> While working on CASSANDRA-14707, I found the debian/cassandra.in.sh file is 
> outdated and is missing some elements from bin/cassandra.in.sh. This should 
> not be a separately maintained file, so let's use quilt to patch the few bits 
> that need to be updated on Debian package installations.
>  * rm debian/cassandra.in.sh
>  * create quilt patch for path updates needed
>  * update debian/cassandra.install to install our patched bin/cassandra.in.sh



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14904) SSTableloader doesn't understand listening for CQL connections on multiple ports

2019-04-10 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814423#comment-16814423
 ] 

Jan Karlsson commented on CASSANDRA-14904:
--

I scraped together some time to have a look. LGTM for the most part, but I have 
some thoughts.

I have been thinking about the use case where both native_transport_port and 
native_transport_port_ssl are set.

1. With this patch, the behavior is that we always use native_transport_port_ssl 
if both are set, unless overridden on the command line (see the sketch below). I 
don't necessarily see a problem with that, but it might not be very transparent 
behavior.
2. No matter what we choose to do about this behavior, a test case that covers 
both being set would be good to add.
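
For clarity, the precedence described in point 1 amounts to something like the 
following (an illustrative Python sketch of the selection logic, not the actual 
sstableloader code, which is Java):

{noformat}
# Illustration of the port-selection precedence discussed above (not the real code).
def pick_native_port(conf, cli_port=None, ssl_enabled=False):
    if cli_port is not None:                      # explicit command-line override wins
        return cli_port
    if ssl_enabled and conf.get('native_transport_port_ssl') is not None:
        return conf['native_transport_port_ssl']  # SSL port preferred when both are set
    return conf['native_transport_port']

conf = {'native_transport_port': 9042, 'native_transport_port_ssl': 9142}
assert pick_native_port(conf, ssl_enabled=True) == 9142
assert pick_native_port(conf, cli_port=9999, ssl_enabled=True) == 9999
{noformat}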

> SSTableloader doesn't understand listening for CQL connections on multiple 
> ports
> 
>
> Key: CASSANDRA-14904
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14904
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Kurt Greaves
>Assignee: Ian Cleasby
>Priority: Low
> Fix For: 4.0, 3.11.x
>
>
> sstableloader only searches the yaml for native_transport_port, so if 
> native_transport_port_ssl is set and encryption is enabled sstableloader will 
> fail to connect as it will use the non-SSL port for the connection.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-10091) Integrated JMX authn & authz

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-10091:
-
Component/s: (was: Legacy/Observability)

> Integrated JMX authn & authz
> 
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Local/Config, Local/Startup and Shutdown
>Reporter: Jan Karlsson
>Assignee: Sam Tunnicliffe
>Priority: Minor
>  Labels: doc-impacting, security
> Fix For: 3.6
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently build as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Component/s: (was: Legacy/Tools)

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13404:
-
Component/s: (was: Legacy/Streaming and Messaging)

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Messaging/Client
>Reporter: Jan Karlsson
>Assignee: Per Otterström
>Priority: Major
>  Labels: security
> Fix For: 4.x
>
> Attachments: 13404-trunk-v2.patch, 13404-trunk.txt
>
>
> Similarily to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Component/s: (was: Legacy/Streaming and Messaging)

> Repair grows data on nodes, causes load to become unbalanced
> 
>
> Key: CASSANDRA-8366
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8366
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
> Environment: 4 node cluster
> 2.1.2 Cassandra
> Inserts and reads are done with CQL driver
>Reporter: Jan Karlsson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 2.1.5
>
> Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, 
> results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, 
> results-500_2_inc_repairs.txt, 
> results-500_full_repair_then_inc_repairs.txt, 
> results-500_inc_repairs_not_parallel.txt, 
> run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, 
> run3_no_compact_before_repair.log, test.sh, testv2.sh
>
>
> There seems to be something weird going on when repairing data.
> I have a program that runs 2 hours which inserts 250 random numbers and reads 
> 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. 
> I use size-tiered compaction for my cluster. 
> After those 2 hours I run a repair and the load of all nodes goes up. If I 
> run incremental repair the load goes up alot more. I saw the load shoot up 8 
> times the original size multiple times with incremental repair. (from 2G to 
> 16G)
> with node 9 8 7 and 6 the repro procedure looked like this:
> (Note that running full repair first is not a requirement to reproduce.)
> {noformat}
> After 2 hours of 250 reads + 250 writes per second:
> UN  9  583.39 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  584.01 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  583.72 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  583.84 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> Repair -pr -par on all nodes sequentially
> UN  9  746.29 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  751.02 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  748.89 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.34 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  2.41 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.53 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.6 GB 256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  2.17 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> after rolling restart
> UN  9  1.47 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  1.5 GB 256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.46 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.19 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> compact all nodes sequentially
> UN  9  989.99 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  994.75 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  1.46 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.82 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  1.98 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.3 GB 256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  3.71 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> restart once more
> UN  9  2 GB   256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.05 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  4.1 GB 256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> {noformat}
> Is there something im missing or is this strange behavior?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13404:
-
Component/s: Legacy/Streaming and Messaging

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Legacy/Streaming and Messaging, Messaging/Client
>Reporter: Jan Karlsson
>Assignee: Per Otterström
>Priority: Major
>  Labels: security
> Fix For: 4.x
>
> Attachments: 13404-trunk-v2.patch, 13404-trunk.txt
>
>
> Similarily to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Component/s: Consistency/Repair

> Repair grows data on nodes, causes load to become unbalanced
> 
>
> Key: CASSANDRA-8366
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8366
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Legacy/Streaming and Messaging
> Environment: 4 node cluster
> 2.1.2 Cassandra
> Inserts and reads are done with CQL driver
>Reporter: Jan Karlsson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 2.1.5
>
> Attachments: 0001-8366.patch, results-1000-inc-repairs.txt, 
> results-1750_inc_repair.txt, results-500_1_inc_repairs.txt, 
> results-500_2_inc_repairs.txt, 
> results-500_full_repair_then_inc_repairs.txt, 
> results-500_inc_repairs_not_parallel.txt, 
> run1_with_compact_before_repair.log, run2_no_compact_before_repair.log, 
> run3_no_compact_before_repair.log, test.sh, testv2.sh
>
>
> There seems to be something weird going on when repairing data.
> I have a program that runs 2 hours which inserts 250 random numbers and reads 
> 250 times per second. It creates 2 keyspaces with SimpleStrategy and RF of 3. 
> I use size-tiered compaction for my cluster. 
> After those 2 hours I run a repair and the load of all nodes goes up. If I 
> run incremental repair the load goes up alot more. I saw the load shoot up 8 
> times the original size multiple times with incremental repair. (from 2G to 
> 16G)
> with node 9 8 7 and 6 the repro procedure looked like this:
> (Note that running full repair first is not a requirement to reproduce.)
> {noformat}
> After 2 hours of 250 reads + 250 writes per second:
> UN  9  583.39 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  584.01 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  583.72 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  583.84 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> Repair -pr -par on all nodes sequentially
> UN  9  746.29 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  751.02 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  748.89 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.34 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  2.41 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.53 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.6 GB 256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  2.17 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> after rolling restart
> UN  9  1.47 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  1.5 GB 256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  2.46 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.19 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> compact all nodes sequentially
> UN  9  989.99 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  994.75 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  1.46 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  758.82 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> repair -inc -par on all nodes sequentially
> UN  9  1.98 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.3 GB 256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  3.71 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> restart once more
> UN  9  2 GB   256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
> UN  8  2.05 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
> UN  7  4.1 GB 256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
> UN  6  1.68 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
> {noformat}
> Is there something im missing or is this strange behavior?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Component/s: Legacy/Tools

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools, Tool/bulk load
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13404:
-
Component/s: (was: Legacy/Streaming and Messaging)
 Messaging/Client

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Messaging/Client
>Reporter: Jan Karlsson
>Assignee: Per Otterström
>Priority: Major
>  Labels: security
> Fix For: 4.x
>
> Attachments: 13404-trunk-v2.patch, 13404-trunk.txt
>
>
> Similarily to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2019-01-29 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Component/s: (was: Legacy/Tools)
 Tool/bulk load

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/bulk load
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14789) Configuring nodetool from a file

2018-10-17 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653632#comment-16653632
 ] 

Jan Karlsson commented on CASSANDRA-14789:
--

I had a look at how we can do this, but was not impressed with the options we 
have. Airline does not seem to jive (no pun intended) well with going into the 
code to fetch defaults from a file. Overriding the different parameters in the 
abstract class does not seem too smooth. Doing it in the script calling 
Nodetool might be a little cleaner. Sourcing in a file could allow us to 
manipulate the ARGS variable by adding lines like this:

{{JMX_PORT=7199}}
{{ARGS="$ARGS -h 127.0.0.2"}}

There are a few concerns I have with this approach. Firstly, it might have some 
security risks associated with it, but file permissions can help with that. 
Secondly, we would practically be requiring the user to provide lines of bash 
script. I would like to avoid that, but I am not sure how to do so without 
having a map of all the available options and grepping the ARGS parameter for 
each option.

All in all, it might not be as bad as it sounds, considering that this is an 
optional feature for more advanced use cases.

This solution is definitely quite non-intrusive, but it does mean that 
parameters could be placed twice in the command that is run. It is a little 
iffy, but it should still work.
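
For comparison, a rough sketch of the "fetch defaults from a file in the code" 
alternative mentioned above, merging file defaults into the arguments before they 
reach the parser. The file location, property keys and merge order are 
assumptions, not a concrete proposal:

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class NodetoolDefaultsSketch
{
    static String[] withFileDefaults(String[] args) throws IOException
    {
        Properties defaults = new Properties();
        String path = System.getProperty("user.home") + "/.cassandra/nodetool.properties";
        try (FileInputStream in = new FileInputStream(path))
        {
            defaults.load(in);
        }

        List<String> merged = new ArrayList<>();
        if (defaults.containsKey("host"))
        {
            merged.add("-h");
            merged.add(defaults.getProperty("host"));
        }
        if (defaults.containsKey("port"))
        {
            merged.add("-p");
            merged.add(defaults.getProperty("port"));
        }
        // Command-line arguments go last, assuming the parser lets a later
        // occurrence of an option override an earlier one.
        merged.addAll(Arrays.asList(args));
        return merged.toArray(new String[0]);
    }
}
{code}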

> Configuring nodetool from a file
> 
>
> Key: CASSANDRA-14789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14789
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> Fix For: 4.x
>
>
> Nodetool has a lot of options that can be set. SSL can be configured through 
> a file[1], but most other parameters must be provided when running the 
> command. It would be helpful to be able to configure its parameters through a 
> file much like how cqlsh can be configured[2].
>  
> [1] https://issues.apache.org/jira/browse/CASSANDRA-9090
> [2] 
> [https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14789) Configuring nodetool from a file

2018-09-25 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-14789:


 Summary: Configuring nodetool from a file
 Key: CASSANDRA-14789
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14789
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jan Karlsson
Assignee: Jan Karlsson
 Fix For: 4.x


Nodetool has a lot of options that can be set. SSL can be configured through a 
file[1], but most other parameters must be provided when running the command. 
It would be helpful to be able to configure its parameters through a file much 
like how cqlsh can be configured[2].

 

[1] https://issues.apache.org/jira/browse/CASSANDRA-9090

[2] 
[https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshUsingCqlshrc.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2018-08-15 Thread Jan Karlsson (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Reproduced In: 3.0.15, 2.2.9  (was: 2.2.9, 4.0)
Fix Version/s: (was: 4.x)

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2018-08-15 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16581085#comment-16581085
 ] 

Jan Karlsson commented on CASSANDRA-13639:
--

I tried running it with both 3.0.15 and trunk. I was not able to reproduce this 
on latest trunk but I could get this behavior on 3.0.15. Seems the changes have 
fixed this issue.

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2018-08-01 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565346#comment-16565346
 ] 

Jan Karlsson edited comment on CASSANDRA-13639 at 8/1/18 1:56 PM:
--

I apologize for my long absence, but I have time to look into this now.
{quote}If outboundBindAny would be set to true, then the SSL Socket would be 
bound to any local address, which is most likely not what we want, so not sure 
why we would ever want to set outboundBindAny to true anyway.
{quote}
I actually believe the contrary. Having the SSL Socket bound to the address 
which is specified by your operating system's routing is precisely what we 
want. It seems fishy that we always pick the local address and ignore the 
routing of the operating system.
{quote}I agree with [~spo...@gmail.com] here because I think having a cmd line 
parameter seems to be better. Something like {{--localOutboundAddressSSL-}} or 
{{-sslLocalOutboundAddress}}, which defaults to 
{{FBUtilities.getLocalAddress()}}.
{quote}
I can see the point of adding a flag for the simple fact that we would not 
break backward compatibility, but we should also consider that picking the 
first interface no matter what routing is set up seems like faulty behavior.

If we choose to go this route to keep backwards compatibility, we should 
describe this behavior in the documentation. The error I received was rather 
strange when I hit this issue locally on my machine and required me to dig 
quite deep to find the root cause.


was (Author: jan karlsson):
I apologize for my long absence but I have time look into this now.
{quote}If outboundBindAny would be set to true, then the SSL Socket would be 
bound to any local address, which is most likely not what we want, so not sure 
why we would ever want to set outboundBindAny to true anyway.
{quote}
I actually believe the contrary. Having the SSL Socket bound to the local 
address which is specified by your operating system's routing is precisely what 
we want. It seens fishy that we always pick the local address and ignore the 
routing of the operating system.
{quote}I agree with [~spo...@gmail.com] here because I think having a cmd line 
parameter seems to be better. Something like {{--localOutboundAddressSSL-}} or 
{{-sslLocalOutboundAddress}}, which defaults to 
{{FBUtilities.getLocalAddress()}}.
{quote}
I can see the point of adding a flag for the simple fact that we would not 
break backward compatibility, but we should also consider that picking the 
first interface no matter what routing is set up seems like faulty behavior.

If we choose to go this route to keep backwards compatibility, we should 
describe this behavior in the documentation. The error I received was rather 
strange when I hit this issue locally on my machine and required me to dig 
quite deep to find the root cause.

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2018-08-01 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565346#comment-16565346
 ] 

Jan Karlsson edited comment on CASSANDRA-13639 at 8/1/18 1:55 PM:
--

I apologize for my long absence, but I have time to look into this now.
{quote}If outboundBindAny would be set to true, then the SSL Socket would be 
bound to any local address, which is most likely not what we want, so not sure 
why we would ever want to set outboundBindAny to true anyway.
{quote}
I actually believe the contrary. Having the SSL Socket bound to the local 
address which is specified by your operating system's routing is precisely what 
we want. It seems fishy that we always pick the local address and ignore the 
routing of the operating system.
{quote}I agree with [~spo...@gmail.com] here because I think having a cmd line 
parameter seems to be better. Something like {{--localOutboundAddressSSL-}} or 
{{-sslLocalOutboundAddress}}, which defaults to 
{{FBUtilities.getLocalAddress()}}.
{quote}
I can see the point of adding a flag for the simple fact that we would not 
break backward compatibility, but we should also consider that picking the 
first interface no matter what routing is set up seems like faulty behavior.

If we choose to go this route to keep backwards compatibility, we should 
describe this behavior in the documentation. The error I received was rather 
strange when I hit this issue locally on my machine and required me to dig 
quite deep to find the root cause.


was (Author: jan karlsson):
I apologize for my long absence but I have time look into this now.
{quote}If outboundBindAny would be set to true, then the SSL Socket would be 
bound to any local address, which is most likely not what we want, so not sure 
why we would ever want to set outboundBindAny to true anyway.
{quote}
I actually believe the contrary. Having the SSL Socket bound to the local 
address which is specified by your operating system's routing instead of always 
picking the local address(aka the first interface) is precisely what we want.
{quote}I agree with [~spo...@gmail.com] here because I think having a cmd line 
parameter seems to be better. Something like {{--localOutboundAddressSSL}} or 
{{--sslLocalOutboundAddress}}, which defaults to 
{{FBUtilities.getLocalAddress()}}.
{quote}

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2018-08-01 Thread Jan Karlsson (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565346#comment-16565346
 ] 

Jan Karlsson commented on CASSANDRA-13639:
--

I apologize for my long absence, but I have time to look into this now.
{quote}If outboundBindAny would be set to true, then the SSL Socket would be 
bound to any local address, which is most likely not what we want, so not sure 
why we would ever want to set outboundBindAny to true anyway.
{quote}
I actually believe the contrary. Having the SSL Socket bound to the local 
address which is specified by your operating system's routing instead of always 
picking the local address(aka the first interface) is precisely what we want.
{quote}I agree with [~spo...@gmail.com] here because I think having a cmd line 
parameter seems to be better. Something like {{--localOutboundAddressSSL}} or 
{{--sslLocalOutboundAddress}}, which defaults to 
{{FBUtilities.getLocalAddress()}}.
{quote}

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Major
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-07-19 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092807#comment-16092807
 ] 

Jan Karlsson commented on CASSANDRA-13639:
--

If SSL is enabled, {SSTableLoader} always uses the hostname no matter how your 
routing is set up. If you have a second interface that you route all 
{SSTableLoader} traffic from, it will still pick your first network interface 
because it corresponds with your hostname. Thereby overriding any routing you 
might have set up. This screams bug to me.

The correct behavior would be for {SSTableLoader} to use the normal routing of 
the server. I am unclear why we set the from address specifically ourself 
instead of leaving it blank. I can see that it might be useful to have it as a 
command variable as well. However it is quite strange to set up a 'connect 
from' address.
{code}
if (encryptionOptions != null && encryptionOptions.internode_encryption 
!= EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none)
{
if (outboundBindAny)
return SSLFactory.getSocket(encryptionOptions, peer, 
secureStoragePort);
else
return SSLFactory.getSocket(encryptionOptions, peer, 
secureStoragePort, FBUtilities.getLocalAddress(), 0);
}{code}

I am a little unclear of why the code is the way it is. The method is only 
called with {outboundBindAny} set to false. It seems to me that calling it 
without the {FBUtilities} call would be the correct way of calling it.
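
To illustrate the difference, here is a small stand-alone Java sketch (not 
Cassandra code; the peer address and port would come from the caller): connecting 
without an explicit local address lets the OS routing table choose the source 
interface, while passing one, as the else branch above does, pins it.

{code}
import java.net.InetAddress;
import java.net.Socket;

public class SourceAddressExample
{
    static void printSourceAddresses(InetAddress peer, int port) throws Exception
    {
        // Equivalent of outboundBindAny == true: no local address given, so the
        // kernel chooses the source interface according to its routing table.
        try (Socket routed = new Socket(peer, port))
        {
            System.out.println("routing table picked: " + routed.getLocalAddress());
        }

        // Equivalent of the else branch above: the source is pinned to whatever the
        // hostname resolves to, regardless of how traffic to the peer is routed.
        try (Socket pinned = new Socket(peer, port, InetAddress.getLocalHost(), 0))
        {
            System.out.println("pinned to hostname address: " + pinned.getLocalAddress());
        }
    }
}
{code}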

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-07-19 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16092807#comment-16092807
 ] 

Jan Karlsson edited comment on CASSANDRA-13639 at 7/19/17 8:54 AM:
---

If SSL is enabled, {{SSTableLoader}} always uses the hostname no matter how 
your routing is set up. If you have a second interface that you route all 
{{SSTableLoader}} traffic from, it will still pick your first network interface 
because it corresponds with your hostname. Thereby overriding any routing you 
might have set up. This screams bug to me.

The correct behavior would be for {{SSTableLoader}} to use the normal routing 
of the server. I am unclear why we set the from address specifically ourself 
instead of leaving it blank. I can see that it might be useful to have it as a 
command variable as well. However it is quite strange to set up a 'connect 
from' address.
{code}
if (encryptionOptions != null && encryptionOptions.internode_encryption 
!= EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none)
{
if (outboundBindAny)
return SSLFactory.getSocket(encryptionOptions, peer, 
secureStoragePort);
else
return SSLFactory.getSocket(encryptionOptions, peer, 
secureStoragePort, FBUtilities.getLocalAddress(), 0);
}{code}

I am a little unclear of why the code is the way it is. The method is only 
called with {{outboundBindAny}} set to false. It seems to me that calling it 
without the {{FBUtilities}} call would be the correct way of calling it.


was (Author: jan karlsson):
If SSL is enabled, {SSTableLoader} always uses the hostname no matter how your 
routing is set up. If you have a second interface that you route all 
{SSTableLoader} traffic from, it will still pick your first network interface 
because it corresponds with your hostname. Thereby overriding any routing you 
might have set up. This screams bug to me.

The correct behavior would be for {SSTableLoader} to use the normal routing of 
the server. I am unclear why we set the from address specifically ourself 
instead of leaving it blank. I can see that it might be useful to have it as a 
command variable as well. However it is quite strange to set up a 'connect 
from' address.
{code}
if (encryptionOptions != null && encryptionOptions.internode_encryption 
!= EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none)
{
if (outboundBindAny)
return SSLFactory.getSocket(encryptionOptions, peer, 
secureStoragePort);
else
return SSLFactory.getSocket(encryptionOptions, peer, 
secureStoragePort, FBUtilities.getLocalAddress(), 0);
}{code}

I am a little unclear of why the code is the way it is. The method is only 
called with {outboundBindAny} set to false. It seems to me that calling it 
without the {FBUtilities} call would be the correct way of calling it.

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-07-17 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089522#comment-16089522
 ] 

Jan Karlsson commented on CASSANDRA-13639:
--

The problem stems from the fact that SSTableLoader has its own way of reading 
the yaml file but still uses a default-created DatabaseDescriptor to connect, 
by using {{FBUtilities.getLocalAddress()}}. Perhaps another solution may be to 
add this as a parameter to SSTableLoader.

In BulkLoadConnectionFactory, after a rather strange if clause that is always 
false, {{SSLFactory.getSocket(encryptionOptions, peer, secureStoragePort, 
FBUtilities.getLocalAddress(), 0);}} fetches the IP address from the 
DatabaseDescriptor, which will return null because listenAddress is not set by 
default on the DatabaseDescriptor object. My patch applies the listen address 
from the yaml file to the DatabaseDescriptor, which in turn fixes the issue.

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-07-02 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Reproduced In: 2.2.9, 4.0  (was: 2.2.9)
   Status: Patch Available  (was: Open)

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-07-02 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Attachment: 13639-trunk

Patch on trunk which resolves this issue. Verified it manually with lsof. 

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13639-trunk
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-06-27 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Fix Version/s: 4.x

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files from

2017-06-27 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13639:
-
Summary: SSTableLoader always uses hostname to stream files from  (was: 
SSTableLoader always uses hostname to stream files)

> SSTableLoader always uses hostname to stream files from
> ---
>
> Key: CASSANDRA-13639
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>
> I stumbled upon an issue where SSTableLoader was ignoring our routing by 
> using the wrong interface to send the SSTables to the other nodes. Looking at 
> the code, it seems that we are using FBUtilities.getLocalAddress() to fetch 
> out the hostname, even if the yaml file specifies a different host. I am not 
> sure why we call this function instead of using the routing by leaving it 
> blank, perhaps someone could enlighten me.
> This behaviour comes from the fact that we use a default created 
> DatabaseDescriptor which does not set the values for listenAddress and 
> listenInterface. This causes the aforementioned function to retrieve the 
> hostname at all times, even if it is not the interface used in the yaml file.
> I propose we break out the function that handles listenAddress and 
> listenInterface and call it so that listenAddress or listenInterface is 
> getting populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13639) SSTableLoader always uses hostname to stream files

2017-06-27 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-13639:


 Summary: SSTableLoader always uses hostname to stream files
 Key: CASSANDRA-13639
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13639
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Jan Karlsson
Assignee: Jan Karlsson


I stumbled upon an issue where SSTableLoader was ignoring our routing by using 
the wrong interface to send the SSTables to the other nodes. Looking at the 
code, it seems that we are using FBUtilities.getLocalAddress() to fetch out the 
hostname, even if the yaml file specifies a different host. I am not sure why 
we call this function instead of using the routing by leaving it blank, perhaps 
someone could enlighten me.

This behaviour comes from the fact that we use a default created 
DatabaseDescriptor which does not set the values for listenAddress and 
listenInterface. This causes the aforementioned function to retrieve the 
hostname at all times, even if it is not the interface used in the yaml file.

I propose we break out the function that handles listenAddress and 
listenInterface and call it so that listenAddress or listenInterface is getting 
populated in the DatabaseDescriptor.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-04-07 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960403#comment-15960403
 ] 

Jan Karlsson commented on CASSANDRA-13354:
--

Yes, small change, LGTM.

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2017-04-06 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958876#comment-15958876
 ] 

Jan Karlsson commented on CASSANDRA-13404:
--

It is good that you made the distinction that MitM is not something that this 
ticket aims to solve. Instead, this ticket allows you to bind certificates to 
certain hosts to make the setup less vulnerable. 

Applications which have to worry about rogue clients can use this on top of 
application-side authentication as an extra layer of security and gain broader 
control over the clients that connect to their server. 
{quote}
I think it was mentioned somewhere that reusing SSLContext instances would be 
preferable in the future due to performance reasons. We'd have to change the 
code to either return a shared or a newly created instance if we would add this 
feature. 
{quote}
Could you elaborate on this? Are we not using the same SSLContext and 
retrieving the engine from it?

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13404-trunk.txt
>
>
> Similarly to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2017-04-05 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956604#comment-15956604
 ] 

Jan Karlsson commented on CASSANDRA-13404:
--

{quote}
To back up and add a bit more context (for myself, if anything), where do you 
want to add the additional hostname verification? Can you explain the specific 
behaviour you're looking to add? 
{quote}
The behaviour I am trying to add is that the server validates that the client 
certificate is issued to the IP address/host that the client connects from. You 
are correct that this would require require_client_auth to be set, as this 
ensures that the server validates the client to begin with. Disabling 
require_client_auth while enabling hostname verification will not do anything; 
we won't validate anything at all. Do you think we should add a warning during 
startup that you cannot have hostname validation without also requiring client 
certificate validation?
{quote}
Further, this would require the database server to know all of the possible 
peers that would want to connect to it, before the process starts.
{quote}
Not necessarily. I take the incoming connection and extract the IP; the 
identification algorithm then checks whether the SAN in the certificate holds 
this IP address.
{quote}
Also, I've spoken with the netty developers, and they said netty currently does 
not support (in either netty 4.0 or 4.1) the ability to perform hostname 
verification on the server side (either openssl or jdk ssl). Thus, I'm not sure 
how you verified your patch behaves correctly.
{quote}
I used the Java driver and added [Netty Options | 
http://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/NettyOptions.html#afterBootstrapInitialized-io.netty.bootstrap.Bootstrap-]
 to change the local address in afterBootstrapInitialized. This allows me to 
change which interface I use to connect to C*. Then I used a certificate I had 
forged for a different interface and tried to connect to a node. It worked like 
a charm. I then applied my patch and got an exception on both the server and 
the client side. Lastly, I switched the IP address I connected from to the 
interface specified in the certificate, and the exceptions disappeared.
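
For context, here is a rough sketch of the kind of SAN check described above, 
written against plain JSSE/JCA APIs. The class name and exact wiring are 
illustrative; the attached 13404-trunk.txt may implement this differently (for 
example through the endpoint identification algorithm on the SSLEngine).

{code}
// Illustrative sketch: after the TLS handshake, compare the client's socket
// address with the IP SAN entries (GeneralName type 7) in its certificate.
import java.net.InetAddress;
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.List;
import javax.net.ssl.SSLPeerUnverifiedException;
import javax.net.ssl.SSLSession;

public final class ClientIpSanCheck
{
    private static final int SAN_TYPE_IP = 7; // RFC 5280 GeneralName: iPAddress

    public static boolean clientMatchesCertificate(SSLSession session, InetAddress client)
            throws SSLPeerUnverifiedException, CertificateParsingException
    {
        X509Certificate cert = (X509Certificate) session.getPeerCertificates()[0];
        Collection<List<?>> sans = cert.getSubjectAlternativeNames();
        if (sans == null)
            return false;

        for (List<?> san : sans)
        {
            // Each entry is [type, value]; for iPAddress the value is the textual IP.
            // (Plain string comparison is a simplification; IPv6 forms may need normalisation.)
            if ((Integer) san.get(0) == SAN_TYPE_IP
                && client.getHostAddress().equals(san.get(1)))
                return true;
        }
        return false;
    }
}
{code}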

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13404-trunk.txt
>
>
> Similarly to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2017-04-03 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13404:
-
Fix Version/s: 4.x
   Status: Patch Available  (was: Open)

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13404-trunk.txt
>
>
> Similarly to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2017-04-03 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13404:
-
Attachment: 13404-trunk.txt

Should apply cleanly to trunk

> Hostname verification for client-to-node encryption
> ---
>
> Key: CASSANDRA-13404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Fix For: 4.x
>
> Attachments: 13404-trunk.txt
>
>
> Similarly to CASSANDRA-9220, Cassandra should support hostname verification 
> for client-node connections.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13404) Hostname verification for client-to-node encryption

2017-04-03 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-13404:


 Summary: Hostname verification for client-to-node encryption
 Key: CASSANDRA-13404
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13404
 Project: Cassandra
  Issue Type: New Feature
Reporter: Jan Karlsson
Assignee: Jan Karlsson


Similarly to CASSANDRA-9220, Cassandra should support hostname verification 
for client-node connections.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-23 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939107#comment-15939107
 ] 

Jan Karlsson commented on CASSANDRA-13354:
--

I did some tests simulating traffic on a 4-node cluster. Two of the nodes were 
running with my patch while the other two ran without it.
Steps to reproduce:
Turn traffic on
Turn one of the nodes off
Wait 7 minutes
Truncate hints on all other nodes
Turn the node on
Run repair on the node

As you can see, the SSTable count on the unpatched version kept increasing as 
unrepaired data from ongoing traffic was prioritized. If I had more 
discrepancies in my data set, this would just increase to the configured FD 
limit, or until the node dies from heap pressure.

Repair completed at 8:11pm, but those small repaired files are not compacted as 
it picks new unrepaired sstables over the small repaired ones. However, it did 
show a downward trend, as compaction was slightly faster than insertion and 
would probably eventually end with the repaired files compacted.

During the unpatched test, it only showed 2 pending compactions with ~22k file 
descriptors open and ~10k sstables. At 8:33pm I disabled the traffic completely 
to hurry this along.
SSTables in each level: [10347/4, 5, 0, 0, 0, 0, 0, 0, 0]

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-23 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-
Attachment: patchedTest.png

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-23 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-
Attachment: unpatchedTest.png

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt, patchedTest.png, unpatchedTest.png
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-20 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932799#comment-15932799
 ] 

Jan Karlsson edited comment on CASSANDRA-13354 at 3/20/17 3:45 PM:
---

Added patch on 4.0 to fix this. Applies cleanly to other versions as well 
(tested 2.2.9).
I have tested this in a cluster and will upload some graphs as well.
Comments and suggestions welcome!


was (Author: jan karlsson):
Added patch on 4.0 to fix this. Should be pretty minimal work to get this to 
apply to other versions as well.

I have tested this in a cluster and will upload some graphs as well.
Comments and suggestions welcome!

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-20 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-
Attachment: (was: CASSANDRA-13354)

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-20 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-
Attachment: 13354-trunk.txt

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: 13354-trunk.txt
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-20 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-
Attachment: CASSANDRA-13354

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
> Attachments: CASSANDRA-13354
>
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-20 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-13354:
-
Status: Patch Available  (was: Open)

Added patch on 4.0 to fix this. Should be pretty minimal work to get this to 
apply to other versions as well.

I have tested this in a cluster and will upload some graphs as well.
Comments and suggestions welcome!

> LCS estimated compaction tasks does not take number of files into account
> -
>
> Key: CASSANDRA-13354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
> Environment: Cassandra 2.2.9
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>
> In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
> by taking the size of an SSTable and multiplying it by four. This would give 
> 4*160mb with default settings. This calculation is used to determine whether 
> repaired or unrepaired data is being compacted.
> Now this works well until you take repair into account. Repair streams over 
> many many sstables which could be smaller than the configured SSTable size 
> depending on your use case. In our case we are talking about many thousands 
> of tiny SSTables. As number of files increases one can run into any number of 
> problems, including GC issues, too many open files or plain increase in read 
> latency.
> With the current algorithm we will choose repaired or unrepaired depending on 
> whichever side has more data in it. Even if the repaired files outnumber the 
> unrepaired files by a large margin.
> Similarly, our algorithm that selects compaction candidates takes up to 32 
> SSTables at a time in L0; however, our estimated task calculation does not 
> take this number into account. These two mechanisms should be aligned with 
> each other.
> I propose that we take the number of files in L0 into account when estimating 
> remaining tasks. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13354) LCS estimated compaction tasks does not take number of files into account

2017-03-20 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-13354:


 Summary: LCS estimated compaction tasks does not take number of 
files into account
 Key: CASSANDRA-13354
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13354
 Project: Cassandra
  Issue Type: Bug
  Components: Compaction
 Environment: Cassandra 2.2.9
Reporter: Jan Karlsson
Assignee: Jan Karlsson


In LCS, the way we estimate the number of compaction tasks remaining for L0 is 
by taking the size of an SSTable and multiplying it by four. This would give 
4*160mb with default settings. This calculation is used to determine whether 
repaired or unrepaired data is being compacted.

Now this works well until you take repair into account. Repair streams over 
many many sstables which could be smaller than the configured SSTable size 
depending on your use case. In our case we are talking about many thousands of 
tiny SSTables. As number of files increases one can run into any number of 
problems, including GC issues, too many open files or plain increase in read 
latency.

With the current algorithm we will choose repaired or unrepaired depending on 
whichever side has more data in it. Even if the repaired files outnumber the 
unrepaired files by a large margin.

Similarly, our algorithm that selects compaction candidates takes up to 32 
SSTables at a time in L0; however, our estimated task calculation does not take 
this number into account. These two mechanisms should be aligned with each 
other.

I propose that we take the number of files in L0 into account when estimating 
remaining tasks. 
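
To make the proposal concrete, an illustrative sketch (not the attached patch) 
that takes the larger of the size-based estimate used today and a count-based 
estimate aligned with the 32-SSTable limit used when picking L0 candidates:

{code}
// Illustrative only; constant and method names are made up for this sketch.
public final class L0TaskEstimate
{
    private static final int MAX_COMPACTING_L0 = 32; // matches the L0 candidate-selection limit

    public static int estimatedTasks(long bytesInL0, long maxSSTableSizeBytes, int sstablesInL0)
    {
        long sizeBased  = bytesInL0 / (4L * maxSSTableSizeBytes);                     // current behaviour
        long countBased = (sstablesInL0 + MAX_COMPACTING_L0 - 1) / MAX_COMPACTING_L0; // ceil(n / 32)
        return (int) Math.max(sizeBased, countBased);
    }
}
{code}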




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-10091) Integrated JMX authn & authz

2016-03-21 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204083#comment-15204083
 ] 

Jan Karlsson commented on CASSANDRA-10091:
--

Great that you like the patch! I am really excited to get this in!

We have already created some dtests for this which can be found 
[here|https://github.com/beobal/cassandra-dtest/commits/10091].

I could take a look at the comments next week unless you want to take this 
[~beobal]?

> Integrated JMX authn & authz
> 
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain; 
> however, it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11210) Unresolved hostname in replace address

2016-03-04 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-11210:
-
Attachment: 0001-Unresolved-hostname-leads-to-replace-being-ignored.patch

> Unresolved hostname in replace address
> --
>
> Key: CASSANDRA-11210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11210
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: Jan Karlsson
>Priority: Minor
>  Labels: lhf
> Fix For: 2.2.6
>
> Attachments: 
> 0001-Unresolved-hostname-leads-to-replace-being-ignored.patch
>
>
> If you provide a hostname which cannot be resolved by DNS, it leads to the 
> replace args being ignored. If you provide an IP which is not in the cluster, 
> it does the right thing and complains.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-11210) Unresolved hostname in replace address

2016-03-04 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-11210:
-
Fix Version/s: 2.2.6
   Status: Patch Available  (was: Open)

This should apply cleanly to 3.0/trunk except for the Changelog.
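
For reference, the intended behaviour can be sketched roughly as follows 
(illustrative only, not the attached patch): resolve the 
cassandra.replace_address property eagerly and fail loudly when it cannot be 
resolved, instead of silently ignoring the replace arguments.

{code}
// Illustrative sketch; Cassandra itself would raise a ConfigurationException here.
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class ReplaceAddressCheck
{
    public static InetAddress replaceAddressOrFail()
    {
        String replace = System.getProperty("cassandra.replace_address");
        if (replace == null)
            return null; // not replacing anything

        try
        {
            return InetAddress.getByName(replace);
        }
        catch (UnknownHostException e)
        {
            throw new RuntimeException("Unable to resolve replace_address: " + replace, e);
        }
    }
}
{code}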

> Unresolved hostname in replace address
> --
>
> Key: CASSANDRA-11210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11210
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: Jan Karlsson
>Priority: Minor
>  Labels: lhf
> Fix For: 2.2.6
>
>
> If you provide a hostname which cannot be resolved by DNS, it leads to the 
> replace args being ignored. If you provide an IP which is not in the cluster, 
> it does the right thing and complains.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-11210) Unresolved hostname in replace address

2016-03-04 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson reassigned CASSANDRA-11210:


Assignee: Jan Karlsson

> Unresolved hostname in replace address
> --
>
> Key: CASSANDRA-11210
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11210
> Project: Cassandra
>  Issue Type: Bug
>Reporter: sankalp kohli
>Assignee: Jan Karlsson
>Priority: Minor
>  Labels: lhf
>
> If you provide a hostname which cannot be resolved by DNS, it leads to the 
> replace args being ignored. If you provide an IP which is not in the cluster, 
> it does the right thing and complains.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10091) Align JMX authentication with internal authentication

2016-03-01 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169402#comment-15169402
 ] 

Jan Karlsson edited comment on CASSANDRA-10091 at 3/1/16 2:14 PM:
--

[~beobal] We need to change the StartupChecks because we are still throwing an 
error in [checkJMXPorts| 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L142]
 when we do not set cassandra.jmx.local.port.

We can also use {code}#JVM_OPTS="$JVM_OPTS 
-Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config"{code}
 instead of requiring the user to add their own path.

Otherwise LGTM.

Dtest can be found [here|https://github.com/ejankan/cassandra-dtest/tree/10091]
This Dtest needs the aforementioned changes to StartupChecks and 
$CASSANDRA_HOME to work.


was (Author: jan karlsson):
[~beobal] We need to change the StartupChecks because we are still throwing an 
error in [checkJMXPorts| 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L142]
 when we do not set cassandra.jmx.local.port.

Otherwise LGTM.

I am currently writing a Dtest for the authn part of it.

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain; 
> however, it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10776) Prepare of statements after table creation fail with unconfigured column family

2016-03-01 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173470#comment-15173470
 ] 

Jan Karlsson edited comment on CASSANDRA-10776 at 3/1/16 9:18 AM:
--

This can actually be solved client-side by maintaining a lock table. Before 
creating a table you check, using LWT, whether the lock exists and, if it does 
not, gain the lock and create the table.


was (Author: jan karlsson):
This can actually be solved by having a lock table, which you check with before 
creating a table. Use LWT to check whether the lock exists and if it does not, 
gain the lock and create the table.

> Prepare of statements after table creation fail with unconfigured column 
> family
> ---
>
> Key: CASSANDRA-10776
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10776
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Dougal
>
> Cassandra 2.1.8
> We have multiple app instances trying to create the same table using IF NOT 
> EXISTS.
> We check for schema agreement via the Java Driver before and after every 
> statement.
> After creating the table we then prepare statements and we sometimes get:
> {code}
> com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured 
> columnfamily locks
>   at 
> com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
>  ~[cassandra-driver-core-2.1.8.jar:na]
>   at 
> com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
>  ~[cassandra-driver-core-2.1.8.jar:na]
>   at 
> com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79) 
> ~[cassandra-driver-core-2.1.8.jar:na]
>   at 
> uk.sky.cirrus.locking.CassandraLockingMechanism.init(CassandraLockingMechanism.java:69)
>  ~[main/:na]
>   at uk.sky.cirrus.locking.Lock.acquire(Lock.java:35) [main/:na]
>   at uk.sky.cirrus.CqlMigratorImpl.migrate(CqlMigratorImpl.java:83) 
> [main/:na]
>   at 
> uk.sky.cirrus.locking.LockVerificationTest.lambda$shouldManageContentionsForSchemaMigrate$0(LockVerificationTest.java:90)
>  [test/:na]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
> {code}
> Looking at the server logs we get:
> {code}
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
> mismatch (found 90bbb372-9446-11e5-b1ca-8119a6964819; expected 
> 90b87f20-9446-11e5-b1ca-8119a6964819)
>   at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1145) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefsTables.updateColumnFamily(DefsTables.java:422) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:295) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) 
> ~[main/:na]
>   at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49)
>  ~[main/:na]
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_60]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {code}
> We found this issue which is marked as resolved:
> https://issues.apache.org/jira/browse/CASSANDRA-8387
> Does the IF NOT EXISTS just check the local node?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10776) Prepare of statements after table creation fail with unconfigured column family

2016-03-01 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173470#comment-15173470
 ] 

Jan Karlsson commented on CASSANDRA-10776:
--

This can actually be solved by having a lock table, which you check before 
creating a table. Use LWT to check whether the lock exists and, if it does not, 
gain the lock and create the table.
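
A rough sketch of that client-side pattern with the Java driver is below. The 
keyspace, table and client names are made up for illustration, and the lock 
table itself is assumed to have been created once up front.

{code}
// Illustrative client-side locking via LWT; not part of Cassandra itself.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public final class SchemaLockExample
{
    public static void main(String[] args)
    {
        // Assumed to exist already: app.schema_locks (name text PRIMARY KEY, owner text)
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect())
        {
            // LWT: only one client sees [applied] = true for a given lock name.
            ResultSet rs = session.execute(
                "INSERT INTO app.schema_locks (name, owner) " +
                "VALUES ('create_users_table', 'client-1') IF NOT EXISTS");

            if (rs.one().getBool("[applied]"))
            {
                // We hold the lock: safe to create the table.
                session.execute("CREATE TABLE IF NOT EXISTS app.users " +
                                "(id uuid PRIMARY KEY, name text)");
                // Release the lock afterwards, or insert it with a TTL.
            }
            else
            {
                // Another client holds the lock: poll until the table exists and
                // schema agreement is reached before preparing statements.
            }
        }
    }
}
{code}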

> Prepare of statements after table creation fail with unconfigured column 
> family
> ---
>
> Key: CASSANDRA-10776
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10776
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Adam Dougal
>
> Cassandra 2.1.8
> We have multiple app instances trying to create the same table using IF NOT 
> EXISTS.
> We check for schema agreement via the Java Driver before and after every 
> statement.
> After creating the table we then prepare statements and we sometimes get:
> {code}
> com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured 
> columnfamily locks
>   at 
> com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
>  ~[cassandra-driver-core-2.1.8.jar:na]
>   at 
> com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
>  ~[cassandra-driver-core-2.1.8.jar:na]
>   at 
> com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:79) 
> ~[cassandra-driver-core-2.1.8.jar:na]
>   at 
> uk.sky.cirrus.locking.CassandraLockingMechanism.init(CassandraLockingMechanism.java:69)
>  ~[main/:na]
>   at uk.sky.cirrus.locking.Lock.acquire(Lock.java:35) [main/:na]
>   at uk.sky.cirrus.CqlMigratorImpl.migrate(CqlMigratorImpl.java:83) 
> [main/:na]
>   at 
> uk.sky.cirrus.locking.LockVerificationTest.lambda$shouldManageContentionsForSchemaMigrate$0(LockVerificationTest.java:90)
>  [test/:na]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
> {code}
> Looking at the server logs we get:
> {code}
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
> mismatch (found 90bbb372-9446-11e5-b1ca-8119a6964819; expected 
> 90b87f20-9446-11e5-b1ca-8119a6964819)
>   at org.apache.cassandra.config.CFMetaData.reload(CFMetaData.java:1145) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefsTables.updateColumnFamily(DefsTables.java:422) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefsTables.mergeColumnFamilies(DefsTables.java:295) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefsTables.mergeSchemaInternal(DefsTables.java:194) 
> ~[main/:na]
>   at org.apache.cassandra.db.DefsTables.mergeSchema(DefsTables.java:166) 
> ~[main/:na]
>   at 
> org.apache.cassandra.db.DefinitionsUpdateVerbHandler$1.runMayThrow(DefinitionsUpdateVerbHandler.java:49)
>  ~[main/:na]
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[main/:na]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_60]
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {code}
> We found this issue which is marked as resolved:
> https://issues.apache.org/jira/browse/CASSANDRA-8387
> Does the IF NOT EXISTS just check the local node?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication

2016-02-26 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169402#comment-15169402
 ] 

Jan Karlsson commented on CASSANDRA-10091:
--

[~beobal] We need to change the StartupChecks because we are still throwing an 
error in [checkJMXPorts| 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StartupChecks.java#L142]
 when we do not set cassandra.jmx.local.port.

Otherwise LGTM.

I am currently writing a Dtest for the authn part of it.

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Sam Tunnicliffe
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain; 
> however, it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException

2016-02-09 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138538#comment-15138538
 ] 

Jan Karlsson commented on CASSANDRA-8643:
-

This problem was on 2.1.12 and we were running full repair with -pr.

> merkle tree creation fails with NoSuchElementException
> --
>
> Key: CASSANDRA-8643
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8643
> Project: Cassandra
>  Issue Type: Bug
> Environment: We are running on a three node cluster with three in 
> replication (C* 2.1.1). It uses a default C* installation and STCS.
>Reporter: Jan Karlsson
> Fix For: 2.1.3
>
>
> We have a problem that we encountered during testing over the weekend. 
> During the tests we noticed that repairs started to fail. This error has 
> occured on multiple non-coordinator nodes during repair. It also ran at least 
> once without producing this error.
> We run repair -pr on all nodes on different days. CPU values were around 40% 
> and disk was 50% full.
> From what I understand, the coordinator asked for merkle trees from the other 
> two nodes. However one of the nodes fails to create his merkle tree.
> Unfortunately we do not have a way to reproduce this problem.
> The coordinator receives:
> {noformat}
> 2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 
> [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for 
> censored (to [/xx.90, /xx.98, /xx.82])
> 2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] 
> RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] 
> Received merkle tree for censored from /xx.90
> 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] 
> RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session 
> completed with the following error
> org.apache.cassandra.exceptions.RepairException: [repair 
> #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
> (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
> at 
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
> ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] 
> CassandraDaemon.java:153 Exception in thread 
> Thread[AntiEntropySessions:76,5,RMI Runtime]
> java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: 
> [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
> (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
> at com.google.common.base.Throwables.propagate(Throwables.java:160) 
> ~[guava-16.0.jar:na]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
> ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_51]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
> (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
> at 
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
> ~[apache-cassandra-2.1.1.jar:2.1.1]
> ... 3 common frames omitted
> {noformat}
> 

[jira] [Commented] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException

2016-02-08 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136995#comment-15136995
 ] 

Jan Karlsson commented on CASSANDRA-8643:
-

We hit it again. This time we had more time to debug the situation and we might 
have found the problem. It started occurring when we switched to 
LeveledCompactionStrategy. However, it does not occur consistently. We usually 
get it once every 2-3 runs.

We enabled assertions and got "received out of order wrt". The problem we found 
is that the ranges of the SSTables are intersecting, but the getScanners method 
in LCS expects them to be non-intersecting (as all sstables in the same level 
should not be intersecting). 

It could be that during the snapshot, a compaction occurs which writes more 
sstables into the level. Then when it is supplied to the repair job, it fails 
due to the ranges intersecting in the new and old sstables. 

When we tried repairing with -par, we did not hit it. It also worked with 2.2.4 
(which runs -par by default).

> merkle tree creation fails with NoSuchElementException
> --
>
> Key: CASSANDRA-8643
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8643
> Project: Cassandra
>  Issue Type: Bug
> Environment: We are running on a three node cluster with three in 
> replication (C* 2.1.1). It uses a default C* installation and STCS.
>Reporter: Jan Karlsson
> Fix For: 2.1.3
>
>
> We have a problem that we encountered during testing over the weekend. 
> During the tests we noticed that repairs started to fail. This error has 
> occurred on multiple non-coordinator nodes during repair. It also ran at least 
> once without producing this error.
> We run repair -pr on all nodes on different days. CPU values were around 40% 
> and disk was 50% full.
> From what I understand, the coordinator asked for merkle trees from the other 
> two nodes. However, one of the nodes fails to create its merkle tree.
> Unfortunately we do not have a way to reproduce this problem.
> The coordinator receives:
> {noformat}
> 2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 
> [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for 
> censored (to [/xx.90, /xx.98, /xx.82])
> 2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] 
> RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] 
> Received merkle tree for censored from /xx.90
> 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] 
> RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session 
> completed with the following error
> org.apache.cassandra.exceptions.RepairException: [repair 
> #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
> (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
> at 
> org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
>  ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
> ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
> at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] 
> CassandraDaemon.java:153 Exception in thread 
> Thread[AntiEntropySessions:76,5,RMI Runtime]
> java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: 
> [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
> (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
> at com.google.common.base.Throwables.propagate(Throwables.java:160) 
> ~[guava-16.0.jar:na]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
> ~[apache-cassandra-2.1.1.jar:2.1.1]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
> ~[na:1.7.0_51]
> at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_51]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: 
> 

[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication

2016-02-05 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134299#comment-15134299
 ] 

Jan Karlsson commented on CASSANDRA-10091:
--

[~beobal] I apologize for my long absence. How is the refactoring going?
Next week I will try to find some time to write up some tests. 

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain; 
> however, it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-12-03 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037490#comment-15037490
 ] 

Jan Karlsson commented on CASSANDRA-10091:
--

I took a look at your proposal and it looks good. I like this approach to 
authz. You are definitely on the right track. 
{quote}
What does CassandraLoginModule give us? I appreciate that it's the standard-ish 
java way to do things, but it seems to me that we could just perform the call 
to legacyAuthenticate directly from JMXPasswordAuthenticator::authenticate. The 
authenticator impl is already pretty specific, so using the more generic APIs 
just seems to add bloat (but I could be missing something useful here).
{quote}
The advantage of doing it this way is that you could use the 
CassandraLoginModule without the JMXPasswordAuthenticator by setting the 
LoginModule as a JVM parameter. It might not be that useful for our use case, 
but it would give us authentication without having to start up our JMX 
server programmatically. One could use the module with Cassandra as is.
{quote}
The same thing goes for CassandraPrincipal, could we just create a 
javax.management.remote.JMXPrincipal in the name of the AuthenticatedUser 
obtained from the IAuthenticator?
{quote}
+1. I had originally included it in case we wanted to pass some 
Cassandra-related information down to authz, but it does not seem necessary at the moment.
{quote}
Will MX4J work with JMXPasswordAuthenticator?
{quote}
I have not tried this myself, but according to 
[this|http://mx4j.sourceforge.net/docs/ch03s10.html] it seems to work in the same 
fashion.
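
For illustration only, a minimal sketch of what such a JAAS LoginModule could look like, wired in through the standard {{com.sun.management.jmxremote.login.config}} property. The class name and the {{validateCredentials}} hook are placeholders for the delegation into Cassandra's internal authentication, not the actual patch.

{code}
import java.security.Principal;
import java.util.Map;

import javax.management.remote.JMXPrincipal;
import javax.security.auth.Subject;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

public class SketchCassandraLoginModule implements LoginModule
{
    private Subject subject;
    private CallbackHandler callbackHandler;
    private Principal principal;

    public void initialize(Subject subject, CallbackHandler callbackHandler,
                           Map<String, ?> sharedState, Map<String, ?> options)
    {
        this.subject = subject;
        this.callbackHandler = callbackHandler;
    }

    public boolean login() throws LoginException
    {
        NameCallback name = new NameCallback("username: ");
        PasswordCallback password = new PasswordCallback("password: ", false);
        try
        {
            callbackHandler.handle(new Callback[]{ name, password });
        }
        catch (Exception e)
        {
            throw new LoginException("Could not obtain JMX credentials: " + e.getMessage());
        }

        String username = name.getName();
        char[] pw = password.getPassword();
        password.clearPassword();
        if (username == null || pw == null)
            throw new LoginException("No credentials supplied");

        // Placeholder: this is where the configured IAuthenticator would be consulted.
        if (!validateCredentials(username, new String(pw)))
            throw new LoginException("Authentication failed for " + username);

        principal = new JMXPrincipal(username);
        return true;
    }

    public boolean commit() { subject.getPrincipals().add(principal); return true; }
    public boolean abort()  { return true; }
    public boolean logout() { subject.getPrincipals().remove(principal); return true; }

    // Stand-in for the delegation into Cassandra's internal authentication.
    private boolean validateCredentials(String user, String password)
    {
        return user != null && !user.isEmpty(); // placeholder only
    }
}
{code}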

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-10551) Investigate JMX auth using JMXMP & SASL

2015-11-05 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson reassigned CASSANDRA-10551:


Assignee: Jan Karlsson

> Investigate JMX auth using JMXMP & SASL
> ---
>
> Key: CASSANDRA-10551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10551
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sam Tunnicliffe
>Assignee: Jan Karlsson
> Fix For: 3.x
>
>
> (broken out from CASSANDRA-10091)
> We should look into whether using 
> [JMXMP|https://meteatamel.wordpress.com/2012/02/13/jmx-rmi-vs-jmxmp/] would 
> enable JMX authentication using SASL. If so, could we then define a custom 
> SaslServer which wraps a SaslNegotiator instance provided by the configured 
> IAuthenticator. 
> An initial look at the 
> [JMXMP|http://docs.oracle.com/cd/E19698-01/816-7609/6mdjrf873/] docs, 
> particularly section *11.4.2 SASL Provider*, suggests this might be feasible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10551) Investigate JMX auth using JMXMP & SASL

2015-11-05 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991610#comment-14991610
 ] 

Jan Karlsson commented on CASSANDRA-10551:
--

Changing to JMXMP seems to work from an implementation standpoint. However, this 
will mean that current tools which are hardcoded to connect through RMI will 
have to be changed to function with JMXMP. I'm referring mostly to nodetool, 
i.e. earlier versions of nodetool will not be able to connect to the server.

What is more concerning is that some 3rd-party tools like jconsole seem to lack 
the functionality to connect with SASL profiles through JMXMP. I tried 
connecting with a [plain 
profile/mechanism|https://tools.ietf.org/html/rfc4616], but have not found a 
way to set a profile for jconsole.
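
For reference, a rough sketch of how a JMXMP client could request the SASL/PLAIN profile. The environment keys follow the JMXMP documentation examples and are an assumption on my part, not Cassandra API; the optional JSR 160 connector ({{jmxremote_optional.jar}}) has to be on the classpath for the {{jmxmp}} protocol to resolve, and the port is only an example.

{code}
import java.util.HashMap;
import java.util.Map;

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxmpSaslClientSketch
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:jmxmp://127.0.0.1:7199");

        Map<String, Object> env = new HashMap<>();
        // Ask the connector to negotiate the SASL PLAIN profile and supply
        // the username/password pair as credentials (assumed key names).
        env.put("jmx.remote.profiles", "SASL/PLAIN");
        env.put(JMXConnector.CREDENTIALS, new String[]{ "cassandra", "cassandra" });

        try (JMXConnector connector = JMXConnectorFactory.connect(url, env))
        {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            System.out.println("Connected, " + connection.getMBeanCount() + " MBeans visible");
        }
    }
}
{code}

This is essentially what jconsole cannot do out of the box, since it offers no way to set {{jmx.remote.profiles}} for a connection.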

> Investigate JMX auth using JMXMP & SASL
> ---
>
> Key: CASSANDRA-10551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10551
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sam Tunnicliffe
>Assignee: Jan Karlsson
> Fix For: 3.x
>
>
> (broken out from CASSANDRA-10091)
> We should look into whether using 
> [JMXMP|https://meteatamel.wordpress.com/2012/02/13/jmx-rmi-vs-jmxmp/] would 
> enable JMX authentication using SASL. If so, could we then define a custom 
> SaslServer which wraps a SaslNegotiator instance provided by the configured 
> IAuthenticator. 
> An initial look at the 
> [JMXMP|http://docs.oracle.com/cd/E19698-01/816-7609/6mdjrf873/] docs, 
> particularly section *11.4.2 SASL Provider*, suggests this might be feasible.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-10-22 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968712#comment-14968712
 ] 

Jan Karlsson commented on CASSANDRA-10091:
--

{quote}
For instance, do we actually need to enact fine grained control over nodetool 
at the keyspace or table level, such that a user with permissions on keyspace 
ks_a would be able to run nodetool status ks_a, but not to run nodetool status 
ks_b? I think that's overkill and not really needed by most admins. 
{quote}
The current patch does not restrict access from nodetool in terms of which 
keyspace/table the command is run on. This is because nodetool calls methods 
on the {{StorageProxy}}. However, if someone were to call these methods on a 
specific column family, it would prevent that. I believe preventing users from 
initiating operations like major compaction on some tables but not on others 
is a fairly common use case, especially when we provide so many potentially 
detrimental operations like {{compact}}. Unfortunately this patch does not make 
this distinction at the nodetool level, because you either have the permission 
for StorageProxy or you do not. However, it does give you the choice to make 
that distinction for non-nodetool users.

{quote}
So for example, this would enable us to grant read access to all the 
ColumnFamily mbeans with GRANT SELECT ON ALL MBEANS IN 
'org.apache.cassandra.db:type=ColumnFamily', e.g. for running nodetool cfstats. 
What it doesn't permit is restricting access to a particular subset of 
ColumnFamily beans.
{quote}
Another disadvantage shows up when the client application (for example, I observed 
jconsole doing this) sends a JMX request with a wildcard MBean name. For instance 
it might send something like {{java.lang:*}}, or a wildcard might be sent when a 
program is trying to retrieve the names of all MBeans. The latter case might not 
be so difficult to handle with your proposal, since {{queryNames}} and 
{{isInstanceOf}} are granted to everyone, but there might be other cases where 
wildcard MBean names are passed in. We would have to handle this somehow; 
otherwise applications that pass wildcard MBean names will have to have root 
permission.

{quote}
Also, I noticed one other thing regarding the MBeanServerForwarder 
implementation. We should create a new ClientState and log the 
AuthenticatedUser derived from the subject into it, which would have a couple 
of benefits. Firstly, the check that the user has the LOGIN privilege would be 
performed which isn't the case in the current patch. Second, the permissions 
check could include the full resource hierarchy using ensureHasPermission, 
rather than directly by calling the IAuthorizer::authorize.
{quote}
+1

Another aspect we need to remember is that there is currently no way to 
ascertain which MBeans are needed for a particular nodetool command or for the 
different tools that exist (like jconsole). We probably need to document this 
somewhere.
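
To make the wildcard and permission-check points above concrete, here is a rough sketch of an {{MBeanServerForwarder}} built with a reflection proxy, which is one way the forwarder could intercept every JMX call. The whitelist and the {{isAuthorized}} hook are placeholders, not the behaviour of the actual patch.

{code}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.security.AccessController;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.remote.MBeanServerForwarder;
import javax.security.auth.Subject;

public final class AuthForwarderSketch implements InvocationHandler
{
    // Operations granted to every authenticated user, e.g. so clients can enumerate MBeans.
    private static final Set<String> ALWAYS_ALLOWED =
        new HashSet<>(Arrays.asList("queryNames", "isInstanceOf", "getDefaultDomain", "getDomains"));

    private MBeanServer delegate;

    public static MBeanServerForwarder newProxy()
    {
        return (MBeanServerForwarder) Proxy.newProxyInstance(MBeanServerForwarder.class.getClassLoader(),
                                                             new Class<?>[]{ MBeanServerForwarder.class },
                                                             new AuthForwarderSketch());
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable
    {
        if (method.getDeclaringClass() == Object.class)
            return method.invoke(this, args);

        String name = method.getName();
        if (name.equals("getMBeanServer"))
            return delegate;
        if (name.equals("setMBeanServer"))
        {
            delegate = (MBeanServer) args[0];
            return null;
        }

        // The connector server runs JMX calls with the authenticated Subject on the access control context.
        Subject subject = Subject.getSubject(AccessController.getContext());
        ObjectName target = (args != null && args.length > 0 && args[0] instanceof ObjectName)
                          ? (ObjectName) args[0]
                          : null;

        // Wildcard requests such as java.lang:* arrive here as ObjectName patterns
        // (target.isPattern()); a real implementation has to decide how to authorize those.
        if (!ALWAYS_ALLOWED.contains(name) && !isAuthorized(subject, name, target))
            throw new SecurityException(name + " on " + target + " denied for " + subject);

        return method.invoke(delegate, args);
    }

    // Placeholder for a lookup against Cassandra's IAuthorizer / the JMXResource model.
    private boolean isAuthorized(Subject subject, String operation, ObjectName target)
    {
        return subject != null; // placeholder only
    }
}
{code}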

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-10-14 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955032#comment-14955032
 ] 

Jan Karlsson edited comment on CASSANDRA-10091 at 10/14/15 8:37 AM:


Great points. Thank you for taking the time to review this.

First of all, I agree completely on the use of 
{{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 
and I only recently forward-ported it. I just wanted to get it out so we can 
commence with the discussion. I agree that we will have to make use of 
{{IAuthenticator::newSaslAuthenticator}} and we should investigate further.

Also, great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in 
your points on the subject, I cannot stress enough the importance of wildcards. 
It seemed like an unpleasant experience to go through countless permissions and 
apply them one at a time. I know this is somewhat lessened by the fact that you 
will only do this once per role, which can then be assigned to different users. 
However, calling a simple command like {{nodetool status}} will require ~4 
different MBeans under the hood, while starting jconsole can only be done by 
adding ~10 different MBeans.

Simplifying the {{JMXResource}} might be the way to go but we should consider 
how much freedom we will lose from doing this. I was actually debating this 
very thing when I implemented it. Should I have only meta permission, should I 
expose all permissions or both? I settled on doing both to cater to every use 
case.

The problem is that the mapping between nodetool commands and permissions is 
somewhat confusing. For instance, in your remapping proposal one would have to 
grant SELECT, DESCRIBE and EXECUTE to be able to get all information out of 
{{nodetool info}}, which is not something one would expect from such a command. 
This is why these meta-permissions were born.

It is simpler to give {{MBREAD}} to a user than to give 
{{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are 
possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if 
you happen to have such a use case. One could argue that this might be an 
uncommon use case, but I have a hard time ruling it out.

However, if the consensus is that we should simplify it, which does have its 
advantages, then I agree with your proposal.


was (Author: jan karlsson):
Great points. Thank you for taking the time to review this.

First of all, I agree completely on the use of 
{{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 
and I only recently forward-ported it. I just wanted to get it out so we can 
commence with the discussion. I agree that we will have to make use of 
{{IAuthenticator::newSaslAuthenticator}} and we should investigate further.

Also, great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in 
your points on the subject, I cannot stress enough the importance of wildcards. 
It seemed like an unpleasant experience to go through countless permissions and 
apply them one at a time. I know this is somewhat lessened by the fact that you 
will only do this once per role, which can then be assigned to different users. 
However, calling a simple command like {{nodetool status}} will require ~4 
different permissions under the hood, while starting jconsole can only be done 
by adding ~10 different permissions.

Simplifying the {{JMXResource}} might be the way to go but we should consider 
how much freedom we will lose from doing this. I was actually debating this 
very thing when I implemented it. Should I have only meta permission, should I 
expose all permissions or both? I settled on doing both to cater to every use 
case.

The problem is that the mapping between nodetool commands and permissions is 
somewhat confusing. For instance, in your remapping proposal one would have to 
grant SELECT, DESCRIBE and EXECUTE to be able to get all information out of 
{{nodetool info}}, which is not something one would expect from such a command. 
This is why these meta-permissions were born.

It is simpler to give {{MBREAD}} to a user than to give 
{{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are 
possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if 
you happen to have such a use case. One could argue that this might be an 
uncommon use case, but I have a hard time ruling it out.

However, if the consensus is that we should simplify it, which does have its 
advantages, then I agree with your proposal.

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jan Karlsson
>

[jira] [Commented] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-10-13 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955032#comment-14955032
 ] 

Jan Karlsson commented on CASSANDRA-10091:
--

Great points. Thank you for taking the time to review this.

First of all, I agree completely on the use of 
{{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 
and I only recently forward-ported it. I just wanted to get it out so we can 
commence with the discussion. I agree that we will have to make use of 
{{IAuthenticator::newSaslAuthenticator}} and we should investigate further.

Also, great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in 
your points on the subject, I cannot stress enough the importance of wildcards. 
It seemed like an unpleasant experience to go through countless permissions and 
apply them one at a time. I know this is somewhat lessened by the fact that you 
will only do this once per role, which can then be assigned to different users. 
However, calling a simple command like {{nodetool status}} will require ~4 
different JMXResources under the hood. 

Simplifying the {{JMXResource}} might be the way to go but we should consider 
how much freedom we will lose from doing this. I was actually debating this 
very thing when I implemented it. Should I have only meta permission, should I 
expose all permissions or both? I settled on doing both to cater to every use 
case.

The problem is that the mapping between nodetool commands and permissions is 
somewhat confusing. For instance, in your remapping proposal one would have to 
grant SELECT, DESCRIBE and EXECUTE to be able to get all information out of 
{{nodetool info}}, which is not something one would expect from such a command. 
This is why these meta-permissions were born.

It is simpler to give {{MBREAD}} to a user than to give 
{{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are 
possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if 
you happen to have such a use case. One could argue that this might be an 
uncommon use case, but I have a hard time ruling it out.

However, if the consensus is that we should simplify it, which does have its 
advantages, then I agree with your proposal.

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> Fix For: 3.x
>
>
> It would be useful to authenticate with JMX through Cassandra's internal 
> authentication. This would reduce the overhead of keeping passwords in files 
> on the machine and would consolidate passwords to one location. It would also 
> allow the possibility to handle JMX permissions in Cassandra.
> It could be done by creating our own JMX server and setting custom classes 
> for the authenticator and authorizer. We could then add some parameters where 
> the user could specify what authenticator and authorizer to use in case they 
> want to make their own.
> This could also be done by creating a premain method which creates a jmx 
> server. This would give us the feature without changing the Cassandra code 
> itself. However I believe this would be a good feature to have in Cassandra.
> I am currently working on a solution which creates a JMX server and uses a 
> custom authenticator and authorizer. It is currently built as a premain, 
> however it would be great if we could put this in Cassandra instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-10-13 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955032#comment-14955032
 ] 

Jan Karlsson edited comment on CASSANDRA-10091 at 10/13/15 2:35 PM:


Great points. Thank you for taking the time to review this.

First of all, I agree completely on the use of 
{{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 
and I only recently forward-ported it. I just wanted to get it out so we can 
commence with the discussion. I agree that we will have to make use of 
{{IAuthenticator::newSaslAuthenticator}} and we should investigate further.

Also, great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in 
your points on the subject, I cannot stress enough the importance of wildcards. 
It seemed like an unpleasant experience to go through countless permissions and 
apply them one at a time. I know this is somewhat lessened by the fact that you 
will only do this once per role, which can then be assigned to different users. 
However, calling a simple command like {{nodetool status}} will require ~4 
different permissions under the hood, while starting jconsole can only be done 
by adding ~10 different permissions.

Simplifying the {{JMXResource}} might be the way to go but we should consider 
how much freedom we will lose from doing this. I was actually debating this 
very thing when I implemented it. Should I have only meta permission, should I 
expose all permissions or both? I settled on doing both to cater to every use 
case.

The problem is that the mapping between nodetool commands and permissions is 
somewhat confusing. For instance, in your remapping proposal one would have to 
grant SELECT, DESCRIBE and EXECUTE to be able to get all information out of 
{{nodetool info}}, which is not something one would expect from such a command. 
This is why these meta-permissions were born.

It is simpler to give {{MBREAD}} to a user than to give 
{{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are 
possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if 
you happen to have such a use case. One could argue that this might be an 
uncommon use case, but I have a hard time ruling it out.

However, if the consensus is that we should simplify it, which does have its 
advantages, then I agree with your proposal.


was (Author: jan karlsson):
Great points. Thank you for taking the time to review this.

First of all, I agree completely on the use of 
{{IAuthenticator::legacyAuthenticate}}. Originally this patch was against 2.1 
and I only recently forward-ported it. I just wanted to get it out so we can 
commence with the discussion. I agree that we will have to make use of 
{{IAuthenticator::newSaslAuthenticator}} and we should investigate further.

Also, great points on {{IAuthorizer::authorizeJMX}}. While I see the merit in 
your points on the subject, I cannot stress enough the importance of wildcards. 
It seemed like an unpleasant experience to go through countless permissions and 
apply them one at a time. I know this is somewhat lessened by the fact that you 
will only do this once per role, which can then be assigned to different users. 
However, calling a simple command like {{nodetool status}} will require ~4 
different JMXResources under the hood. 

Simplifying the {{JMXResource}} might be the way to go but we should consider 
how much freedom we will lose from doing this. I was actually debating this 
very thing when I implemented it. Should I have only meta permission, should I 
expose all permissions or both? I settled on doing both to cater to every use 
case.

The problem is that the mapping between nodetool commands and permissions is 
somewhat confusing. For instance, in your remapping proposal one would have to 
grant SELECT, DESCRIBE and EXECUTE to be able to get all information out of 
{{nodetool info}}, which is not something one would expect from such a command. 
This is why these meta-permissions were born.

It is simpler to give {{MBREAD}} to a user than to give 
{{MBGET|MBINSTANCEOF|MBQUERYNAMES}}. With this solution, both variants are 
possible. Furthermore, giving only MBGET or MBINSTANCEOF is also an option, if 
you happen to have such a use case. One could argue that this might be an 
uncommon use case, but I have a hard time ruling it out.

However, if the consensus is that we should simplify it, which does have its 
advantages, then I agree with your proposal.

> Align JMX authentication with internal authentication
> -
>
> Key: CASSANDRA-10091
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jan Karlsson
>Assignee: Jan Karlsson
>Priority: Minor
> 

[jira] [Commented] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do

2015-09-02 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726930#comment-14726930
 ] 

Jan Karlsson commented on CASSANDRA-8741:
-

LGTM.

Except I'm not finding the test in the dtests you linked.

> Running a drain before a decommission apparently the wrong thing to do
> --
>
> Key: CASSANDRA-8741
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8741
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise 
> 4.5.3)
>Reporter: Casey Marshall
>Assignee: Jan Karlsson
>Priority: Trivial
>  Labels: lhf
> Fix For: 2.1.x, 2.0.x
>
> Attachments: 8741.txt
>
>
> This might simply be a documentation issue. It appears that running "nodetool 
> drain" is a very wrong thing to do before running a "nodetool decommission".
> The idea was that I was going to safely shut off writes and flush everything 
> to disk before beginning the decommission. What happens is the "decommission" 
> call appears to fail very early on after starting, and afterwards, the node 
> in question is stuck in state LEAVING, but all other nodes in the ring see 
> that node as NORMAL, but down. No streams are ever sent from the node being 
> decommissioned to other nodes.
> The drain command does indeed shut down the "BatchlogTasks" executor 
> (org/apache/cassandra/service/StorageService.java, line 3445 in git tag 
> "cassandra-2.0.11") but the decommission process tries using that executor 
> when calling the "startBatchlogReplay" function 
> (org/apache/cassandra/db/BatchlogManager.java, line 123) called through 
> org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace 
> pasted below).
> This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4).
> So, either something is wrong with the drain/decommission commands, or it's 
> very wrong to run a drain before a decommission. What's worse, there seems to 
> be no way to recover this node once it is in this state; you need to shut it 
> down and run "removenode".
> My terminal output:
> {code}
> ubuntu@x:~$ nodetool drain
> ubuntu@x:~$ tail /var/log/^C
> ubuntu@x:~$ nodetool decommission
> Exception in thread "main" java.util.concurrent.RejectedExecutionException: 
> Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 
> rejected from 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated,
>  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52]
> at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629)
> at 
> org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123)
> at 
> org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966)
> at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
> at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at 
> 

[jira] [Comment Edited] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do

2015-09-02 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726930#comment-14726930
 ] 

Jan Karlsson edited comment on CASSANDRA-8741 at 9/2/15 8:26 AM:
-

LGTM.

Except I'm not seeing the test being run in the dtests you linked.


was (Author: jan karlsson):
LGTM.

Except I'm not finding the test in the dtests you linked.

> Running a drain before a decommission apparently the wrong thing to do
> --
>
> Key: CASSANDRA-8741
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8741
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise 
> 4.5.3)
>Reporter: Casey Marshall
>Assignee: Jan Karlsson
>Priority: Trivial
>  Labels: lhf
> Fix For: 2.1.x, 2.0.x
>
> Attachments: 8741.txt
>
>
> This might simply be a documentation issue. It appears that running "nodetool 
> drain" is a very wrong thing to do before running a "nodetool decommission".
> The idea was that I was going to safely shut off writes and flush everything 
> to disk before beginning the decommission. What happens is the "decommission" 
> call appears to fail very early on after starting, and afterwards, the node 
> in question is stuck in state LEAVING, but all other nodes in the ring see 
> that node as NORMAL, but down. No streams are ever sent from the node being 
> decommissioned to other nodes.
> The drain command does indeed shut down the "BatchlogTasks" executor 
> (org/apache/cassandra/service/StorageService.java, line 3445 in git tag 
> "cassandra-2.0.11") but the decommission process tries using that executor 
> when calling the "startBatchlogReplay" function 
> (org/apache/cassandra/db/BatchlogManager.java, line 123) called through 
> org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace 
> pasted below).
> This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4).
> So, either something is wrong with the drain/decommission commands, or it's 
> very wrong to run a drain before a decommission. What's worse, there seems to 
> be no way to recover this node once it is in this state; you need to shut it 
> down and run "removenode".
> My terminal output:
> {code}
> ubuntu@x:~$ nodetool drain
> ubuntu@x:~$ tail /var/log/^C
> ubuntu@x:~$ nodetool decommission
> Exception in thread "main" java.util.concurrent.RejectedExecutionException: 
> Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 
> rejected from 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated,
>  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52]
> at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629)
> at 
> org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123)
> at 
> org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966)
> at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
> at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)

[jira] [Commented] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do

2015-09-02 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14727475#comment-14727475
 ] 

Jan Karlsson commented on CASSANDRA-8741:
-

Took it for a test spin.
+1

> Running a drain before a decommission apparently the wrong thing to do
> --
>
> Key: CASSANDRA-8741
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8741
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise 
> 4.5.3)
>Reporter: Casey Marshall
>Assignee: Jan Karlsson
>Priority: Trivial
>  Labels: lhf
> Fix For: 2.1.x, 2.0.x
>
> Attachments: 8741.txt
>
>
> This might simply be a documentation issue. It appears that running "nodetool 
> drain" is a very wrong thing to do before running a "nodetool decommission".
> The idea was that I was going to safely shut off writes and flush everything 
> to disk before beginning the decommission. What happens is the "decommission" 
> call appears to fail very early on after starting, and afterwards, the node 
> in question is stuck in state LEAVING, but all other nodes in the ring see 
> that node as NORMAL, but down. No streams are ever sent from the node being 
> decommissioned to other nodes.
> The drain command does indeed shut down the "BatchlogTasks" executor 
> (org/apache/cassandra/service/StorageService.java, line 3445 in git tag 
> "cassandra-2.0.11") but the decommission process tries using that executor 
> when calling the "startBatchlogReplay" function 
> (org/apache/cassandra/db/BatchlogManager.java, line 123) called through 
> org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace 
> pasted below).
> This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4).
> So, either something is wrong with the drain/decommission commands, or it's 
> very wrong to run a drain before a decommission. What's worse, there seems to 
> be no way to recover this node once it is in this state; you need to shut it 
> down and run "removenode".
> My terminal output:
> {code}
> ubuntu@x:~$ nodetool drain
> ubuntu@x:~$ tail /var/log/^C
> ubuntu@x:~$ nodetool decommission
> Exception in thread "main" java.util.concurrent.RejectedExecutionException: 
> Task 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 
> rejected from 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated,
>  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52]
> at 
> java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
> at 
> java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629)
> at 
> org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123)
> at 
> org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966)
> at 
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
> at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at 
> 

[jira] [Created] (CASSANDRA-10091) Align JMX authentication with internal authentication

2015-08-17 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-10091:


 Summary: Align JMX authentication with internal authentication
 Key: CASSANDRA-10091
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10091
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jan Karlsson
Assignee: Jan Karlsson
Priority: Minor


It would be useful to authenticate with JMX through Cassandra's internal 
authentication. This would reduce the overhead of keeping passwords in files on 
the machine and would consolidate passwords to one location. It would also 
allow the possibility to handle JMX permissions in Cassandra.

It could be done by creating our own JMX server and setting custom classes for 
the authenticator and authorizer. We could then add some parameters where the 
user could specify what authenticator and authorizer to use in case they want 
to make their own.

This could also be done by creating a premain method which creates a jmx 
server. This would give us the feature without changing the Cassandra code 
itself. However I believe this would be a good feature to have in Cassandra.

I am currently working on a solution which creates a JMX server and uses a 
custom authenticator and authorizer. It is currently built as a premain, 
however it would be great if we could put this in Cassandra instead.
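
A minimal sketch of the premain variant described above, assuming a javaagent that starts its own RMI connector server and plugs in a custom {{JMXAuthenticator}}; the port and the {{checkWithCassandra}} hook are placeholders for the delegation into Cassandra's internal authentication.

{code}
import java.lang.instrument.Instrumentation;
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import javax.management.MBeanServer;
import javax.management.remote.JMXAuthenticator;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXPrincipal;
import javax.management.remote.JMXServiceURL;
import javax.security.auth.Subject;

public class JmxAgentSketch
{
    public static void premain(String agentArgs, Instrumentation inst) throws Exception
    {
        int port = 7199; // example port only
        LocateRegistry.createRegistry(port);

        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");

        Map<String, Object> env = new HashMap<>();
        // Every connection attempt is handed to this authenticator as a String[]{user, password}.
        env.put(JMXConnectorServer.AUTHENTICATOR, (JMXAuthenticator) credentials -> {
            String[] pair = credentials instanceof String[] ? (String[]) credentials : null;
            if (pair == null || pair.length != 2 || !checkWithCassandra(pair[0], pair[1]))
                throw new SecurityException("Authentication failed");
            return new Subject(true,
                               Collections.singleton(new JMXPrincipal(pair[0])),
                               Collections.emptySet(),
                               Collections.emptySet());
        });

        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs);
        server.start();
    }

    // Placeholder for the call into Cassandra's configured IAuthenticator.
    private static boolean checkWithCassandra(String user, String password)
    {
        return user != null && !user.isEmpty(); // placeholder only
    }
}
{code}

The agent would be attached with {{-javaagent:...}}; the only point of the sketch is to show where a custom authenticator plugs in without touching the Cassandra code itself.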



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-9657) Hint table doing unnecessary compaction

2015-06-26 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-9657:
---

 Summary: Hint table doing unnecessary compaction
 Key: CASSANDRA-9657
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9657
 Project: Cassandra
  Issue Type: Bug
 Environment: 2.1.7
Reporter: Jan Karlsson
Priority: Minor


I found some really strange behaviour. During hint replay on a node I found this 
in the log:
{code}INFO [CompactionExecutor:7] CompactionTask.java:271 Compacted 1 sstables 
to 
[/var/lib/cassandra/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-120,].
  452,150,727 bytes to 452,150,727 (~100% of original) in 267,588ms = 
1.611449MB/s.  1 total partitions merged to 1.  Partition merge counts were 
{1:1, }{code}

This happened multiple times until the hint replay was completed and the 
sstables were removed.

I tried to replicate this by just starting up a cluster in ccm and killing a 
node for a few minutes. I got the same behaviour then.
{code}
INFO  [CompactionExecutor:2] CompactionTask.java:270 - Compacted 1 sstables to 
[/home/ejankan/.ccm/hint/node3/data/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-2,].
  65,570 bytes to 65,570 (~100% of original) in 600ms = 0.104221MB/s.  1 total 
partitions merged to 1.  Partition merge counts were {1:1, }
{code}

It seems weird to me that the file does not decrease in size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8741) Running a drain before a decommission apparently the wrong thing to do

2015-05-28 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8741:

Attachment: 8741.txt

Should work for both 2.1 and 2.0.

 Running a drain before a decommission apparently the wrong thing to do
 --

 Key: CASSANDRA-8741
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8741
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: Ubuntu 14.04; Cassandra 2.0.11.82 (Datastax Enterprise 
 4.5.3)
Reporter: Casey Marshall
Assignee: Jan Karlsson
Priority: Trivial
  Labels: lhf
 Fix For: 2.1.x, 2.0.x

 Attachments: 8741.txt


 This might simply be a documentation issue. It appears that running nodetool 
 drain is a very wrong thing to do before running a nodetool decommission.
 The idea was that I was going to safely shut off writes and flush everything 
 to disk before beginning the decommission. What happens is the decommission 
 call appears to fail very early on after starting, and afterwards, the node 
 in question is stuck in state LEAVING, but all other nodes in the ring see 
 that node as NORMAL, but down. No streams are ever sent from the node being 
 decommissioned to other nodes.
 The drain command does indeed shut down the BatchlogTasks executor 
 (org/apache/cassandra/service/StorageService.java, line 3445 in git tag 
 cassandra-2.0.11) but the decommission process tries using that executor 
 when calling the startBatchlogReplay function 
 (org/apache/cassandra/db/BatchlogManager.java, line 123) called through 
 org.apache.cassandra.service.StorageService.unbootstrap (see the stack trace 
 pasted below).
 This also failed in a similar way on Cassandra 1.2.13-ish (DSE 3.2.4).
 So, either something is wrong with the drain/decommission commands, or it's 
 very wrong to run a drain before a decommission. What's worse, there seems to 
 be no way to recover this node once it is in this state; you need to shut it 
 down and run removenode.
 My terminal output:
 {code}
 ubuntu@x:~$ nodetool drain
 ubuntu@x:~$ tail /var/log/^C
 ubuntu@x:~$ nodetool decommission
 Exception in thread main java.util.concurrent.RejectedExecutionException: 
 Task 
 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3008fa33 
 rejected from 
 org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor@1d6242e8[Terminated,
  pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 52]
 at 
 java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:325)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:530)
 at 
 java.util.concurrent.ScheduledThreadPoolExecutor.submit(ScheduledThreadPoolExecutor.java:629)
 at 
 org.apache.cassandra.db.BatchlogManager.startBatchlogReplay(BatchlogManager.java:123)
 at 
 org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2966)
 at 
 org.apache.cassandra.service.StorageService.decommission(StorageService.java:2934)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
 at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
 at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
 at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
 at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
 at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
 at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
 at 
 

[jira] [Commented] (CASSANDRA-8327) snapshots taken before repair are not cleared if snapshot fails

2015-04-07 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482767#comment-14482767
 ] 

Jan Karlsson commented on CASSANDRA-8327:
-

Would it be possible to send clearsnapshot messages after every RepairJob? I 
guess the problem is that we do not really know when the snapshot has 
completed, which would introduce a race condition where the clear can occur 
before the snapshot is taken. Any other ideas for solving this without 
requiring a restart?
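
Not a fix for the race, but for reference: clearing a leftover snapshot tag does not require a restart, it can be done over JMX. Roughly like this, assuming the {{clearSnapshot}} operation exposed on the StorageService MBean in these versions (the tag below is just the one from the report):

{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ClearRepairSnapshotSketch
{
    public static void main(String[] args) throws Exception
    {
        // Connect to the node's JMX port (7199 by default) and ask StorageService
        // to drop the snapshot tag, which is what nodetool clearsnapshot does.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url, null))
        {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName storageService = new ObjectName("org.apache.cassandra.db:type=StorageService");
            String tag = "073d16e0-64c0-11e4-8e9a-7b3d4674c508"; // leftover repair snapshot tag
            mbs.invoke(storageService,
                       "clearSnapshot",
                       new Object[]{ tag, new String[0] },                      // empty list = all keyspaces
                       new String[]{ "java.lang.String", "[Ljava.lang.String;" });
        }
    }
}
{code}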

 snapshots taken before repair are not cleared if snapshot fails
 ---

 Key: CASSANDRA-8327
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8327
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: cassandra 2.0.10.71
Reporter: MASSIMO CELLI
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 3.0


 running repair service the following directory was created for the snapshots:
 drwxr-xr-x 2 cassandra cassandra 36864 Nov 5 07:47 
 073d16e0-64c0-11e4-8e9a-7b3d4674c508 
 but the system.log reports the following error which suggests the snapshot 
 failed:
 ERROR [RMI TCP Connection(3251)-10.150.27.78] 2014-11-05 07:47:55,734 
 StorageService.java (line 2599) Repair session 
 073d16e0-64c0-11e4-8e9a-7b3d4674c508 for range 
 (7530018576963469312,7566047373982433280] failed with error 
 java.io.IOException: Failed during snapshot creation. 
 java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
 java.io.IOException: Failed during snapshot creation.  ERROR 
 [AntiEntropySessions:3312] 2014-11-05 07:47:55,731 RepairSession.java (line 
 288) [repair #073d16e0-64c0-11e4-8e9a-7b3d4674c508] session completed with 
 the following error java.io.IOException: Failed during snapshot creation.
 the problem is that the directory for the snapshots that fail are just left 
 on the disk and don't get cleaned up. They must be removed manually, which is 
 not ideal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8696) nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot

2015-02-02 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301157#comment-14301157
 ] 

Jan Karlsson commented on CASSANDRA-8696:
-

I was only able to reproduce this when the amount of data on disk was over 12G. 
From a quick glance at the code, this is caused by the snapshot process 
timing out.

 nodetool repair on cassandra 2.1.2 keyspaces return 
 java.lang.RuntimeException: Could not create snapshot
 -

 Key: CASSANDRA-8696
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8696
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeff Liu

 When trying to run nodetool repair -pr on cassandra node ( 2.1.2), cassandra 
 throw java exceptions: cannot create snapshot. 
 the error log from system.log:
 {noformat}
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 
 StreamResultFuture.java:166 - [Stream #692c1450-a692-11e4-9973-070e938df227 
 ID#0] Prepare completed. Receiving 2 files(221187 bytes), sending 5 
 files(632105 bytes)
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 
 StreamResultFuture.java:180 - [Stream #692c1450-a692-11e4-9973-070e938df227] 
 Session with /10.97.9.110 is complete
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 
 StreamResultFuture.java:212 - [Stream #692c1450-a692-11e4-9973-070e938df227] 
 All sessions completed
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 
 StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] 
 streaming task succeed, returning response to /10.98.194.68
 INFO  [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - 
 [Stream #692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for 
 Repair
 INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 
 StreamSession.java:213 - [Stream #692c6270-a692-11e4-9973-070e938df227] 
 Starting streaming to /10.66.187.201
 INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 
 StreamCoordinator.java:209 - [Stream #692c6270-a692-11e4-9973-070e938df227, 
 ID#0] Beginning stream session with /10.66.187.201
 INFO  [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 
 StreamResultFuture.java:166 - [Stream #692c6270-a692-11e4-9973-070e938df227 
 ID#0] Prepare completed. Receiving 5 files(627994 bytes), sending 5 
 files(632105 bytes)
 INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,971 
 StreamResultFuture.java:180 - [Stream #692c6270-a692-11e4-9973-070e938df227] 
 Session with /10.66.187.201 is complete
 INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 
 StreamResultFuture.java:212 - [Stream #692c6270-a692-11e4-9973-070e938df227] 
 All sessions completed
 INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 
 StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] 
 streaming task succeed, returning response to /10.98.194.68
 ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error 
 occurred during snapshot phase
 java.lang.RuntimeException: Could not create snapshot at /10.97.9.110
 at 
 org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_45]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_45]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_45]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_45]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
 INFO  [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 
 - [repair #6f85e740-a692-11e4-9973-070e938df227] new session: will sync 
 /10.98.194.68, /10.66.187.201, /10.226.218.135 on range 
 (12817179804668051873746972069086
 2638799,12863540308359254031520865977436165] for events.[bigint0text, 
 bigint0boolean, bigint0int, dataset_catalog, column_categories, 
 bigint0double, bigint0bigint]
 ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 
 - [repair #685e3d00-a692-11e4-9973-070e938df227] session completed with the 
 following error
 java.io.IOException: Failed during snapshot creation.
 at 
 org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) 
 

[jira] [Comment Edited] (CASSANDRA-8696) nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot

2015-02-02 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14301157#comment-14301157
 ] 

Jan Karlsson edited comment on CASSANDRA-8696 at 2/2/15 11:08 AM:
--

We stumbled upon this issue as well. I was only able to reproduce this when the 
amount of data on disk was over 12G. 


was (Author: jan karlsson):
I was only able to reproduce this when the amount of data on disk was over 12G. 
From a quick glance at the code, this is caused by the snapshot process 
timing out.

 nodetool repair on cassandra 2.1.2 keyspaces return 
 java.lang.RuntimeException: Could not create snapshot
 -

 Key: CASSANDRA-8696
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8696
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeff Liu

 When trying to run nodetool repair -pr on cassandra node ( 2.1.2), cassandra 
 throw java exceptions: cannot create snapshot. 
 the error log from system.log:
 {noformat}
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 
 StreamResultFuture.java:166 - [Stream #692c1450-a692-11e4-9973-070e938df227 
 ID#0] Prepare completed. Receiving 2 files(221187 bytes), sending 5 
 files(632105 bytes)
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 
 StreamResultFuture.java:180 - [Stream #692c1450-a692-11e4-9973-070e938df227] 
 Session with /10.97.9.110 is complete
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 
 StreamResultFuture.java:212 - [Stream #692c1450-a692-11e4-9973-070e938df227] 
 All sessions completed
 INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 
 StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] 
 streaming task succeed, returning response to /10.98.194.68
 INFO  [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - 
 [Stream #692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for 
 Repair
 INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 
 StreamSession.java:213 - [Stream #692c6270-a692-11e4-9973-070e938df227] 
 Starting streaming to /10.66.187.201
 INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 
 StreamCoordinator.java:209 - [Stream #692c6270-a692-11e4-9973-070e938df227, 
 ID#0] Beginning stream session with /10.66.187.201
 INFO  [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 
 StreamResultFuture.java:166 - [Stream #692c6270-a692-11e4-9973-070e938df227 
 ID#0] Prepare completed. Receiving 5 files(627994 bytes), sending 5 
 files(632105 bytes)
 INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,971 
 StreamResultFuture.java:180 - [Stream #692c6270-a692-11e4-9973-070e938df227] 
 Session with /10.66.187.201 is complete
 INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 
 StreamResultFuture.java:212 - [Stream #692c6270-a692-11e4-9973-070e938df227] 
 All sessions completed
 INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 
 StreamingRepairTask.java:96 - [repair #685e3d00-a692-11e4-9973-070e938df227] 
 streaming task succeed, returning response to /10.98.194.68
 ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error 
 occurred during snapshot phase
 java.lang.RuntimeException: Could not create snapshot at /10.97.9.110
 at 
 org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347) 
 ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_45]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_45]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_45]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_45]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
 INFO  [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 
 - [repair #6f85e740-a692-11e4-9973-070e938df227] new session: will sync 
 /10.98.194.68, /10.66.187.201, /10.226.218.135 on range 
 (12817179804668051873746972069086
 2638799,12863540308359254031520865977436165] for events.[bigint0text, 
 bigint0boolean, bigint0int, dataset_catalog, column_categories, 
 bigint0double, bigint0bigint]
 ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 
 - [repair #685e3d00-a692-11e4-9973-070e938df227] session completed with the 
 following error
 java.io.IOException: Failed during snapshot creation.
 at 
 org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
  ~[apache-cassandra-2.1.2.jar:2.1.2]
 at 

[jira] [Updated] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException

2015-01-20 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8643:

Environment: We are running on a three-node cluster with a replication factor 
of 3 (C* 2.1.1). It uses a default C* installation and STCS.  (was: We are 
running on a three-node cluster with a replication factor of 3 (C* 2.1.2). It 
uses a default C* installation and STCS.)

 merkle tree creation fails with NoSuchElementException
 --

 Key: CASSANDRA-8643
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8643
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: We are running on a three-node cluster with a replication factor 
 of 3 (C* 2.1.1). It uses a default C* installation and STCS.
Reporter: Jan Karlsson
 Fix For: 2.1.3


 We have a problem that we encountered during testing over the weekend. 
 During the tests we noticed that repairs started to fail. This error has 
 occurred on multiple non-coordinator nodes during repair. It also ran at least 
 once without producing this error.
 We run repair -pr on all nodes on different days. CPU values were around 40% 
 and disk was 50% full.
 From what I understand, the coordinator asked for merkle trees from the other 
 two nodes. However, one of the nodes fails to create its merkle tree.
 Unfortunately we do not have a way to reproduce this problem.
 The coordinator receives:
 {noformat}
 2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 
 [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for 
 censored (to [/xx.90, /xx.98, /xx.82])
 2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] 
 RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] 
 Received merkle tree for censored from /xx.90
 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] 
 RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session 
 completed with the following error
 org.apache.cassandra.exceptions.RepairException: [repair 
 #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at 
 org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] 
 CassandraDaemon.java:153 Exception in thread 
 Thread[AntiEntropySessions:76,5,RMI Runtime]
 java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: 
 [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at com.google.common.base.Throwables.propagate(Throwables.java:160) 
 ~[guava-16.0.jar:na]
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_51]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at 
 org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 

[jira] [Updated] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException

2015-01-20 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8643:

Reproduced In: 2.1.1  (was: 2.1.2)

 merkle tree creation fails with NoSuchElementException
 --

 Key: CASSANDRA-8643
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8643
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: We are running on a three-node cluster with a replication factor 
 of 3 (C* 2.1.2). It uses a default C* installation and STCS.
Reporter: Jan Karlsson
 Fix For: 2.1.3


 We have a problem that we encountered during testing over the weekend. 
 During the tests we noticed that repairs started to fail. This error has 
 occurred on multiple non-coordinator nodes during repair. It also ran at least 
 once without producing this error.
 We run repair -pr on all nodes on different days. CPU values were around 40% 
 and disk was 50% full.
 From what I understand, the coordinator asked for merkle trees from the other 
 two nodes. However, one of the nodes fails to create its merkle tree.
 Unfortunately we do not have a way to reproduce this problem.
 The coordinator receives:
 {noformat}
 2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 
 [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for 
 censored (to [/xx.90, /xx.98, /xx.82])
 2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] 
 RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] 
 Received merkle tree for censored from /xx.90
 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] 
 RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session 
 completed with the following error
 org.apache.cassandra.exceptions.RepairException: [repair 
 #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at 
 org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] 
 CassandraDaemon.java:153 Exception in thread 
 Thread[AntiEntropySessions:76,5,RMI Runtime]
 java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: 
 [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at com.google.common.base.Throwables.propagate(Throwables.java:160) 
 ~[guava-16.0.jar:na]
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_51]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at 
 org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 ... 3 common frames omitted
 {noformat}
 While one of the other nodes produces this error:
 {noformat}
 2015-01-09T17:55:59.574+0100 ERROR [ValidationExecutor:16] 

[jira] [Commented] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException

2015-01-20 Thread Jan Karlsson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14285307#comment-14285307
 ] 

Jan Karlsson commented on CASSANDRA-8643:
-

Unfortunately we have not encountered this bug since. It seemed like we got 
into some sort of bad state with repairs, as most repairs on this cluster failed 
with this exception until we wiped it. I will keep you posted if I see this 
happen again.

 merkle tree creation fails with NoSuchElementException
 --

 Key: CASSANDRA-8643
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8643
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: We are running on a three-node cluster with a replication factor 
 of 3 (C* 2.1.1). It uses a default C* installation and STCS.
Reporter: Jan Karlsson
 Fix For: 2.1.3


 We have a problem that we encountered during testing over the weekend. 
 During the tests we noticed that repairs started to fail. This error has 
 occurred on multiple non-coordinator nodes during repair. It also ran at least 
 once without producing this error.
 We run repair -pr on all nodes on different days. CPU values were around 40% 
 and disk was 50% full.
 From what I understand, the coordinator asked for merkle trees from the other 
 two nodes. However, one of the nodes fails to create its merkle tree.
 Unfortunately we do not have a way to reproduce this problem.
 The coordinator receives:
 {noformat}
 2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 
 [repair #59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for 
 censored (to [/xx.90, /xx.98, /xx.82])
 2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] 
 RepairSession.java:171 [repair #59455950-9820-11e4-b5c1-7797064e1316] 
 Received merkle tree for censored from /xx.90
 2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] 
 RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session 
 completed with the following error
 org.apache.cassandra.exceptions.RepairException: [repair 
 #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at 
 org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  [na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
 at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
 2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] 
 CassandraDaemon.java:153 Exception in thread 
 Thread[AntiEntropySessions:76,5,RMI Runtime]
 java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: 
 [repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at com.google.common.base.Throwables.propagate(Throwables.java:160) 
 ~[guava-16.0.jar:na]
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
 ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
 ~[na:1.7.0_51]
 at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
 ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  ~[na:1.7.0_51]
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: 
 org.apache.cassandra.exceptions.RepairException: [repair 
 #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
 (-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
 at 
 org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
  ~[apache-cassandra-2.1.1.jar:2.1.1]
 at 
 

[jira] [Created] (CASSANDRA-8643) merkle tree creation fails with NoSuchElementException

2015-01-19 Thread Jan Karlsson (JIRA)
Jan Karlsson created CASSANDRA-8643:
---

 Summary: merkle tree creation fails with NoSuchElementException
 Key: CASSANDRA-8643
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8643
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: We are running on a three-node cluster with a replication factor 
of 3 (C* 2.1.2). It uses a default C* installation and STCS.
Reporter: Jan Karlsson


We have a problem that we encountered during testing over the weekend. 
During the tests we noticed that repairs started to fail. This error has 
occurred on multiple non-coordinator nodes during repair. It also ran at least 
once without producing this error.

We run repair -pr on all nodes on different days. CPU values were around 40% 
and disk was 50% full.

From what I understand, the coordinator asked for merkle trees from the other 
two nodes. However, one of the nodes fails to create its merkle tree.
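
As a side note for anyone following along, the sketch below is a simplified 
stand-in for that validation step. It is not Cassandra's Validator/MerkleTree 
code; all names, the hard-coded ranges and the SHA-256 choice are assumptions 
for illustration. The idea: each replica hashes the rows it owns per token 
range, and the coordinator compares the digests to decide which ranges need 
streaming.
{noformat}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of repair validation: each replica builds a digest
// per token range; the coordinator compares digests and flags mismatching
// ranges for streaming. Not Cassandra's real implementation.
public class ValidationSketch {

    // Hash all rows a replica holds for one token range.
    static byte[] digestForRange(List<String> rowsInRange) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (String row : rowsInRange) {
            md.update(row.getBytes(StandardCharsets.UTF_8));
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Two replicas' views of the same two token ranges (range -> rows).
        Map<String, List<String>> replicaA = new LinkedHashMap<>();
        replicaA.put("(-6476..., -6471...]", Arrays.asList("k1:v1", "k2:v2"));
        replicaA.put("(-6471..., -6460...]", Arrays.asList("k3:v3"));

        Map<String, List<String>> replicaB = new LinkedHashMap<>();
        replicaB.put("(-6476..., -6471...]", Arrays.asList("k1:v1", "k2:stale"));
        replicaB.put("(-6471..., -6460...]", Arrays.asList("k3:v3"));

        // Coordinator-side comparison: any digest mismatch means the range is
        // out of sync between the replicas and has to be streamed.
        for (String range : replicaA.keySet()) {
            boolean inSync = MessageDigest.isEqual(
                    digestForRange(replicaA.get(range)),
                    digestForRange(replicaB.get(range)));
            System.out.println(range + " -> " + (inSync ? "in sync" : "needs streaming"));
        }
    }
}
{noformat}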

Unfortunately we do not have a way to reproduce this problem.

The coordinator receives:
{noformat}
2015-01-09T17:55:57.091+0100  INFO [RepairJobTask:4] RepairJob.java:145 [repair 
#59455950-9820-11e4-b5c1-7797064e1316] requesting merkle trees for censored (to 
[/xx.90, /xx.98, /xx.82])
2015-01-09T17:55:58.516+0100  INFO [AntiEntropyStage:1] RepairSession.java:171 
[repair #59455950-9820-11e4-b5c1-7797064e1316] Received merkle tree for 
censored from /xx.90
2015-01-09T17:55:59.581+0100 ERROR [AntiEntropySessions:76] 
RepairSession.java:303 [repair #59455950-9820-11e4-b5c1-7797064e1316] session 
completed with the following error
org.apache.cassandra.exceptions.RepairException: [repair 
#59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
(-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
~[apache-cassandra-2.1.1.jar:2.1.1]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
2015-01-09T17:55:59.582+0100 ERROR [AntiEntropySessions:76] 
CassandraDaemon.java:153 Exception in thread 
Thread[AntiEntropySessions:76,5,RMI Runtime]
java.lang.RuntimeException: org.apache.cassandra.exceptions.RepairException: 
[repair #59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
(-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
at com.google.common.base.Throwables.propagate(Throwables.java:160) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) 
~[apache-cassandra-2.1.1.jar:2.1.1]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 
~[na:1.7.0_51]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
~[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
   at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51] Caused by: 
org.apache.cassandra.exceptions.RepairException: [repair 
#59455950-9820-11e4-b5c1-7797064e1316 on censored/censored, 
(-6476420463551243930,-6471459119674373580]] Validation failed in /xx.98
at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:166)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:384)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:126)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
~[apache-cassandra-2.1.1.jar:2.1.1]
... 3 common frames omitted
{noformat}
While one of the other nodes produces this error:
{noformat}
2015-01-09T17:55:59.574+0100 ERROR [ValidationExecutor:16] Validator.java:232 
Failed creating a merkle tree for [repair #59455950-9820-11e4-b5c1-7797064e1316 
on censored/censored, (-6476420463551243930,-6471459119674373580]], /xx.82 (see 
log for details)
2015-01-09T17:55:59.578+0100 ERROR [ValidationExecutor:16] 
CassandraDaemon.java:153 Exception in thread 

[jira] [Updated] (CASSANDRA-8366) Repair grows data on nodes, causes load to become unbalanced

2015-01-14 Thread Jan Karlsson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Karlsson updated CASSANDRA-8366:

Description: 
There seems to be something weird going on when repairing data.

I have a program that runs for 2 hours, inserting 250 random numbers and 
performing 250 reads per second. It creates 2 keyspaces with SimpleStrategy and an RF of 3. 

I use size-tiered compaction for my cluster. 

After those 2 hours I run a repair and the load of all nodes goes up. If I run 
incremental repair, the load goes up a lot more. I have seen the load shoot up to 
8 times the original size multiple times with incremental repair (from 2 GB to 16 GB).
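
For reference, here is a rough sketch of the kind of schema and write/read loop 
described above, using the DataStax Java driver. The keyspace and table names, 
the contact point and the pacing are assumptions, not the actual test program; 
the repair and compaction steps are just the nodetool commands shown in the 
listing below.
{noformat}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import java.util.Random;

// Rough stand-in for the test workload: SimpleStrategy RF=3, an STCS table,
// then batches of 250 writes and 250 reads per second for two hours.
// Names and pacing are illustrative only.
public class RepairLoadWorkload {
    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        session.execute("CREATE KEYSPACE IF NOT EXISTS load_test WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 3}");
        session.execute("CREATE TABLE IF NOT EXISTS load_test.numbers "
                + "(id int PRIMARY KEY, value int) WITH compaction = "
                + "{'class': 'SizeTieredCompactionStrategy'}");

        PreparedStatement write = session.prepare(
                "INSERT INTO load_test.numbers (id, value) VALUES (?, ?)");
        PreparedStatement read = session.prepare(
                "SELECT value FROM load_test.numbers WHERE id = ?");

        Random random = new Random();
        long end = System.currentTimeMillis() + 2L * 60 * 60 * 1000; // 2 hours
        while (System.currentTimeMillis() < end) {
            long secondStart = System.currentTimeMillis();
            // 250 inserts of random numbers and 250 reads...
            for (int i = 0; i < 250; i++) {
                session.execute(write.bind(random.nextInt(1_000_000), random.nextInt()));
                session.execute(read.bind(random.nextInt(1_000_000)));
            }
            // ...then sleep out whatever is left of the second.
            long spent = System.currentTimeMillis() - secondStart;
            if (spent < 1000) {
                Thread.sleep(1000 - spent);
            }
        }
        cluster.close();
    }
}
{noformat}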


With nodes 9, 8, 7 and 6 the repro procedure looked like this:
(Note that running a full repair first is not a requirement to reproduce.)
{noformat}
After 2 hours of 250 reads + 250 writes per second:
UN  9  583.39 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  584.01 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  583.72 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  583.84 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

Repair -pr -par on all nodes sequentially
UN  9  746.29 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  751.02 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  748.89 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.34 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

repair -inc -par on all nodes sequentially
UN  9  2.41 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.53 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.6 GB 256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  2.17 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

after rolling restart
UN  9  1.47 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  1.5 GB 256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.46 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.19 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

compact all nodes sequentially
UN  9  989.99 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  994.75 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  1.46 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.82 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

repair -inc -par on all nodes sequentially
UN  9  1.98 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.3 GB 256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  3.71 GB256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.68 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

restart once more
UN  9  2 GB   256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.05 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  4.1 GB 256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  1.68 GB256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1
{noformat}

Is there something I'm missing, or is this strange behavior?

  was:
There seems to be something weird going on when repairing data.

I have a program that runs for 2 hours, inserting 250 random numbers and 
performing 250 reads per second. It creates 2 keyspaces with SimpleStrategy and an RF of 3. 

I use size-tiered compaction for my cluster. 

After those 2 hours I run a repair and the load of all nodes goes up. If I run 
incremental repair, the load goes up a lot more. I have seen the load shoot up to 
8 times the original size multiple times with incremental repair (from 2 GB to 16 GB).


With nodes 9, 8, 7 and 6 the repro procedure looked like this:
(Note that running a full repair first is not a requirement to reproduce.)

After 2 hours of 250 reads + 250 writes per second:
UN  9  583.39 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  584.01 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  583.72 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  583.84 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

Repair -pr -par on all nodes sequentially
UN  9  746.29 MB  256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  751.02 MB  256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  748.89 MB  256 ?   2b6b5d66-13c8-43d8-855c-290c0f3c3a0b  rack1
UN  6  758.34 MB  256 ?   b8bd67f1-a816-46ff-b4a4-136ad5af6d4b  rack1

repair -inc -par on all nodes sequentially
UN  9  2.41 GB256 ?   28220962-26ae-4eeb-8027-99f96e377406  rack1
UN  8  2.53 GB256 ?   f2de6ea1-de88-4056-8fde-42f9c476a090  rack1
UN  7  2.6 GB 256 ?   
