[jira] [Assigned] (KUDU-2032) Kerberos authentication fails with rdns disabled in krb5.conf

2017-08-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned KUDU-2032:
-

Assignee: Todd Lipcon

> Kerberos authentication fails with rdns disabled in krb5.conf
> -
>
> Key: KUDU-2032
> URL: https://issues.apache.org/jira/browse/KUDU-2032
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Critical
>
> Currently if 'rdns = false' is configured in krb5.conf, Kudu ends up using 
> the IP addresses of remote hosts instead of their hostnames. This means that it 
> will look up krb5 principals by IP, even if actual hostnames were passed in.
> This prevents krb5 from working properly in most environments where 
> rdns=false is set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KUDU-2091) Certificates with intermediate CA's do not work with Kudu

2017-08-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved KUDU-2091.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Resolved for 1.5.0. Worth noting that this applies only to the krpc library 
itself; Kudu (i.e. tservers and masters) still doesn't support CA chains for 
RPC (though AFAIK they have always worked for the HTTPS code paths).

> Certificates with intermediate CA's do not work with Kudu
> -
>
> Key: KUDU-2091
> URL: https://issues.apache.org/jira/browse/KUDU-2091
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.4.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Critical
> Fix For: 1.5.0
>
>
> Certificates with intermediate CAs and chain certificates are not recognized 
> by the Kudu security library. We need to track down the root of the problem 
> and enable support for these certificates.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over

2017-08-15 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128092#comment-16128092
 ] 

Mike Percy commented on KUDU-2033:
--

Linking to KUDU-1188 for tracking RYW / leader leases

> Add a 'torture' scenario to verify Java client's behavior during fail-over 
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
>  Issue Type: Test
>  Components: client, java
>Reporter: Alexey Serbin
>Assignee: Edward Fancher
>  Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies 
> how the client handles the tablet server fail-over scenario.  However, the 
> test covers only one fail-over event and mainly performs write operations 
> while the backend handles the 'unexpected crash' of the tablet server.
> It would be nice to add more tests which cover the client's fail-over 
> behavior:
>   * Add the mixed workload scenario, i.e. combine inserts/scans during the 
> fail-over.  Running the scans would not only verify that the data eventually 
> reaches the destination, but verify that the client automatically retries the 
> scan operations and eventually succeeds reading the data from the cluster.
>   * Induce more fail-over events while running the scenario, i.e. pause and 
> then resume the tserver processes many more times and run the test longer.  
> This is to spot possible bugs during the transitions and the occurrence 
> of multiple fail-over events.
>   * In the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
> mode with different selectors: LEADER_ONLY and CLOSEST_REPLICA.  That's to 
> cover the retry code paths for both cases (as of now, I could see only the 
> LEADER_ONLY path covered, but I might be mistaken).
> The general idea is to make sure that, during fail-over events, the Java client:
> * Retries write and read operations automatically on errors caused by a 
> fail-over event.
> * Does not silently lose any data: if the client cannot send the data due to 
> a timeout or running out of retry attempts, it should report that.
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KUDU-2100) Verify Java client's behavior for tserver and master fail-over scenario

2017-08-15 Thread Edward Fancher (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Fancher reassigned KUDU-2100:


Assignee: Edward Fancher

> Verify Java client's behavior for tserver and master fail-over scenario
> ---
>
> Key: KUDU-2100
> URL: https://issues.apache.org/jira/browse/KUDU-2100
> Project: Kudu
>  Issue Type: Test
>Reporter: Alexey Serbin
>Assignee: Edward Fancher
>
> This is to introduce a scenario where both the leader tserver and leader 
> master 'unexpectedly crash' during the run. The idea is to verify that the 
> client automatically updates its metacache even if the leader master changes 
> and manages to send the data to the destination server eventually.
> Mike suggested the following test scenario:
> # Have a configuration with 3 master servers, 6 tablet servers, and a table 
> consisting of 1 tablet with a replication factor of 3.  Let's assume the tablet 
> replicas are hosted by tablet servers TS1, TS2, and TS3.
> # Start the Kudu cluster.
> # Run the client to insert at least one row into the table.
> # Stop the client's activity, but keep the client object alive to keep it 
> ready for the next steps.
> # 3 times: permanently kill the leader of the tablet, so the tablet 
> eventually migrates to and is hosted by tablet servers TS4, TS5, TS6.
> # Kill the leader master (after the configuration change is committed).
> # Run the pre-warmed client to insert some data into the table again.  Doing 
> so, the client should refresh its metadata from the new leader master and be 
> able to send the data to the right destination.
> # Count the number of rows in the table to make sure it matches the 
> expectation.
> There was a discussion on when to kill the leader master: before or after 
> moving the table to the new set of tablet servers.  It seems the latter case 
> (the sequence suggested above) allows covering a situation when no master 
> server recognizes itself as a leader.  The client should retry in that case 
> as well and eventually receive the tablet location info from the established 
> leader master.  If possible, let's implement the sequence for the former case 
> as well, as an additional test.
> The general idea is to make sure that, during fail-over events, the Java client:
> * Retries write and read operations automatically on errors caused by a 
> fail-over event.
> * Does not silently lose any data: if the client cannot send the data due to 
> a timeout or running out of retry attempts, it should report that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KUDU-2100) Verify Java client's behavior for tserver and master fail-over scenario

2017-08-15 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2100:

Description: 
This is to introduce a scenario where both the leader tserver and leader master 
'unexpectedly crash' during the run. The idea is to verify that the client 
automatically updates its metacache even if the leader master changes and 
manages to send the data to the destination server eventually.

Mike suggested the following test scenario:
# Have a configuration with 3 master servers, 6 tablet servers, and a table 
consisting of 1 tablet with a replication factor of 3.  Let's assume the tablet 
replicas are hosted by tablet servers TS1, TS2, and TS3.
# Start the Kudu cluster.
# Run the client to insert at least one row into the table.
# Stop the client's activity, but keep the client object alive to keep it ready 
for the next steps.
# 3 times: permanently kill the leader of the tablet, so the tablet eventually 
migrates to and is hosted by tablet servers TS4, TS5, TS6.
# Kill the leader master (after the configuration change is committed).
# Run the pre-warmed client to insert some data into the table again.  Doing 
so, the client should refresh its metadata from the new leader master and be 
able to send the data to the right destination.
# Count the number of rows in the table to make sure it matches the expectation.

There was a discussion on when to kill the leader master: before or after moving 
the table to the new set of tablet servers.  It seems the latter case (the 
sequence suggested above) allows covering a situation when no master server 
recognizes itself as a leader.  The client should retry in that case as well 
and eventually receive the tablet location info from the established leader 
master.  If possible, let's implement the sequence for the former case as well, 
as an additional test.

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.

  was:
This is to introduce a scenario where both the leader tserver and leader master 
'unexpectedly crash' during the run. The idea is to verify that the client 
automatically updates its metacache even if the leader master changes and 
manages to send the data to the destination server eventually.

Mike suggested the following test scenario:
# Have a configuration with 3 master servers, 6 tablet servers, and a table 
consisting of 1 tablet with a replication factor of 3.  Let's assume the tablet 
replicas are hosted by tablet servers TS1, TS2, and TS3.
# Start the Kudu cluster.
# Run the client to insert at least one row into the table.
# Stop the client's activity, but keep the client object alive to keep it ready 
for the next steps.
# 3 times: permanently kill the leader of the tablet, so the tablet eventually 
migrates to and is hosted by tablet servers TS4, TS5, TS6.
# Kill the leader master (after the configuration change is committed).
# Run the pre-warmed client to insert some data into the table again.  Doing 
so, the client should refresh its metadata from the new leader master and be 
able to send the data to the right destination.
# Count the number of rows in the table to make sure it matches the expectation.

There was a discussion on when to kill the leader master: before or after moving 
the table to the new set of tablet servers.  It seems the latter case (the 
sequence suggested above) allows covering a situation when no master server 
recognizes itself as a leader.  The client should retry in that case as well 
and eventually receive the tablet location info from the established leader 
master.  If possible, the former case should be covered by the test as well.

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.


> Verify Java client's behavior for tserver and master fail-over scenario
> ---
>
> Key: KUDU-2100
> URL: https://issues.apache.org/jira/browse/KUDU-2100
> Project: Kudu
>  Issue Type: Test
>Reporter: Alexey Serbin
>
> This is to introduce a scenario where both the leader tserver and leader 
> master 'unexpectedly crash' during the run. The idea is to verify that the 
> client automatically updates its metacache even if the leader master changes 
> and manages to send the data to the destination server eventually.
> Mike suggested the following test scenario:
> # Have a configuration 

[jira] [Updated] (KUDU-2100) Verify Java client's behavior for tserver and master fail-over scenario

2017-08-15 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2100:

Description: 
This is to introduce a scenario where both the leader tserver and leader master 
'unexpectedly crash' during the run. The idea is to verify that the client 
automatically updates its metacache even if the leader master changes and 
manages to send the data to the destination server eventually.

Mike suggested the following test scenario:
# Have a configuration with 3 master servers, 6 tablet servers, and a table 
consisting of 1 tablet with a replication factor of 3.  Let's assume the tablet 
replicas are hosted by tablet servers TS1, TS2, and TS3.
# Start the Kudu cluster.
# Run the client to insert at least one row into the table.
# Stop the client's activity, but keep the client object alive to keep it ready 
for the next steps.
# 3 times: permanently kill the leader of the tablet, so the tablet eventually 
migrates to and is hosted by tablet servers TS4, TS5, TS6.
# Kill the leader master (after the configuration change is committed).
# Run the pre-warmed client to insert some data into the table again.  Doing 
so, the client should refresh its metadata from the new leader master and be 
able to send the data to the right destination.
# Count the number of rows in the table to make sure it matches the expectation.

There was a discussion on when to kill the leader master: before or after moving 
the table to the new set of tablet servers.  It seems the latter case (the 
sequence suggested above) allows covering a situation when no master server 
recognizes itself as a leader.  The client should retry in that case as well 
and eventually receive the tablet location info from the established leader 
master.  If possible, the former case should be covered by the test as well.

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.

  was:
This is to introduce a scenario where both the leader tserver and leader master 
'unexpectedly crash' during the run. The idea is to verify that the client 
automatically updates its metacache even if the leader master changes and 
manages to send the data to the destination server eventually.

Mike suggested the following test scenario:
# Have a configuration with 3 master servers, 6 tablet servers, and a table 
consisting of 1 tablet with a replication factor of 3.  Let's assume the tablet 
replicas are hosted by tablet servers TS1, TS2, and TS3.
# Start the Kudu cluster.
# Run the client to insert at least one row into the table.
# Stop the client's activity, but keep the client object alive to keep it ready 
for the next steps.
# 3 times: permanently kill the leader of the tablet, so the tablet eventually 
migrates to and is hosted by tablet servers TS4, TS5, TS6.
# Kill the leader master (after the configuration change is committed).
# Run the pre-warmed client to insert some data into the table again.  Doing 
so, the client should refresh its metadata from the new leader master and be 
able to send the data to the right destination.
# Count the number of rows in the table to make sure it matches the expectation.

There was a discussion on when to kill the leader master: before or after moving 
the table to the new set of tablet servers.  It seems the latter case (the 
sequence suggested above) allows covering a situation when no master server 
recognizes itself as a leader.  The client should retry in that case as well 
and eventually receive the tablet location info from the established leader 
master.  If possible, the former case should be covered by the test as well.

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.


> Verify Java client's behavior for tserver and master fail-over scenario
> ---
>
> Key: KUDU-2100
> URL: https://issues.apache.org/jira/browse/KUDU-2100
> Project: Kudu
>  Issue Type: Test
>Reporter: Alexey Serbin
>
> This is to introduce a scenario where both the leader tserver and leader 
> master 'unexpectedly crash' during the run. The idea is to verify that the 
> client automatically updates its metacache even if the leader master changes 
> and manages to send the data to the destination server eventually.
> Mike suggested the following test scenario:
> # Have a configuration with 3 master servers, 6 

[jira] [Created] (KUDU-2100) Verify Java client's behavior for tserver and master fail-over scenario

2017-08-15 Thread Alexey Serbin (JIRA)
Alexey Serbin created KUDU-2100:
---

 Summary: Verify Java client's behavior for tserver and master 
fail-over scenario
 Key: KUDU-2100
 URL: https://issues.apache.org/jira/browse/KUDU-2100
 Project: Kudu
  Issue Type: Test
Reporter: Alexey Serbin


This is to introduce a scenario where both the leader tserver and leader master 
'unexpectedly crash' during the run. The idea is to verify that the client 
automatically updates its metacache even if the leader master changes and 
manages to send the data to the destination server eventually.

Mike suggested the following test scenario:
# Have a configuration with 3 master servers, 6 tablet servers, and a table 
consisting of 1 tablet with a replication factor of 3.  Let's assume the tablet 
replicas are hosted by tablet servers TS1, TS2, and TS3.
# Start the Kudu cluster.
# Run the client to insert at least one row into the table.
# Stop the client's activity, but keep the client object alive to keep it ready 
for the next steps.
# 3 times: permanently kill the leader of the tablet, so the tablet eventually 
migrates to and is hosted by tablet servers TS4, TS5, TS6.
# Kill the leader master (after the configuration change is committed).
# Run the pre-warmed client to insert some data into the table again.  Doing 
so, the client should refresh its metadata from the new leader master and be 
able to send the data to the right destination.
# Count the number of rows in the table to make sure it matches the expectation.

There was a discussion on when to kill the leader master: before or after moving 
the table to the new set of tablet servers.  It seems the latter case (the 
sequence suggested above) allows covering a situation when no master server 
recognizes itself as a leader.  The client should retry in that case as well 
and eventually receive the tablet location info from the established leader 
master.  If possible, the former case should be covered by the test as well. 
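For reference, a minimal sketch of what steps 7 and 8 might look like with the 
Kudu Java client (the master addresses, table name, and column name below are 
made-up placeholders, not part of the scenario above):

{code}
import org.apache.kudu.client.*;

public class FailoverSmokeSketch {
  public static void main(String[] args) throws Exception {
    // Pre-warmed client: in the actual test this object would have been created
    // before the tservers and leader master were killed, so its metacache is stale.
    KuduClient client = new KuduClient.KuduClientBuilder(
        "master1:7051,master2:7051,master3:7051").build();
    KuduTable table = client.openTable("failover_test");

    // Step 7: insert again; the client is expected to refresh tablet locations
    // from the new leader master and send the write to TS4-TS6.
    KuduSession session = client.newSession();
    Insert insert = table.newInsert();
    insert.getRow().addInt("key", 2);
    session.apply(insert);
    session.flush();

    // Step 8: count the rows to verify nothing was silently lost.
    KuduScanner scanner = client.newScannerBuilder(table).build();
    long rows = 0;
    while (scanner.hasMoreRows()) {
      rows += scanner.nextRows().getNumRows();
    }
    System.out.println("row count: " + rows);
    client.shutdown();
  }
}
{code}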



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over

2017-08-15 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2033:

Description: 
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies how 
the client handles the tablet server fail-over scenario.  However, the test 
covers only one fail-over event and mainly performs write operations while the 
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
  * Add the mixed workload scenario, i.e. combine inserts/scans during the 
fail-over.  Running the scans would not only verify that the data eventually 
reaches the destination, but verify that the client automatically retries the 
scan operations and eventually succeeds reading the data from the cluster.
  * Induce more fail-over events while running the scenario, i.e. pause and 
then resume the tserver processes many more times and run the test longer.  
This is to spot possible bugs during the transitions and the occurrence of 
multiple fail-over events.
  * In the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
mode with different selectors: LEADER_ONLY and CLOSEST_REPLICA.  That's to 
cover the retry code paths for both cases (as of now, I could see only the 
LEADER_ONLY path covered, but I might be mistaken).

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.
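Regarding the READ_AT_SNAPSHOT bullet above, a minimal sketch of how the scan 
side of such a mixed workload could be configured with the Java client (the 
master address and table name are made-up placeholders):

{code}
import org.apache.kudu.client.*;

public class TortureScanSketch {
  // Count rows using READ_AT_SNAPSHOT with the given replica selection; running
  // this while tservers are paused/resumed exercises the scanner retry paths.
  static long countRows(KuduClient client, KuduTable table,
                        ReplicaSelection selection) throws KuduException {
    KuduScanner scanner = client.newScannerBuilder(table)
        .readMode(AsyncKuduScanner.ReadMode.READ_AT_SNAPSHOT)
        .replicaSelection(selection)
        .build();
    long rows = 0;
    while (scanner.hasMoreRows()) {
      rows += scanner.nextRows().getNumRows();
    }
    return rows;
  }

  public static void main(String[] args) throws KuduException {
    KuduClient client = new KuduClient.KuduClientBuilder("master1:7051").build();
    KuduTable table = client.openTable("torture_test");
    // Both selectors should eventually return the same count if the client
    // retries correctly and no data is silently lost.
    System.out.println(countRows(client, table, ReplicaSelection.LEADER_ONLY));
    System.out.println(countRows(client, table, ReplicaSelection.CLOSEST_REPLICA));
    client.shutdown();
  }
}
{code}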
   

  was:
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies how 
the client handles the tablet server fail-over scenario.  However, the test 
covers only one fail-over event and mainly performs write operations while the 
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
  * Add the mixed workload scenario, i.e. combine inserts/scans during the 
fail-over.  Running the scans would not only verify that the data eventually 
reaches the destination, but verify that the client automatically retries the 
scan operations and eventually succeeds reading the data from the cluster.
  * Induce more fail-over events while running the scenario, i.e. pause and 
then resume the tserver processes many more times and run the test longer.  
This is to spot possible bugs during the transitions and the occurrence of 
multiple fail-over events.
  * In the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
mode with different selectors: LEADER_ONLY and CLOSEST_REPLICA.  That's to 
cover the retry code paths for both cases (as of now, I could see only the 
LEADER_ONLY path covered, but I might be mistaken).
  * Extra: add the multi-master scenario, where both the leader tserver and 
leader master 'unexpectedly crash' during the run.  The idea is to verify that 
the client automatically updates its metacache even if the leader master 
changes and manages to send the data to the destination server eventually.

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.
   


> Add a 'torture' scenario to verify Java client's behavior during fail-over 
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
>  Issue Type: Test
>  Components: client, java
>Reporter: Alexey Serbin
>Assignee: Edward Fancher
>  Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies 
> how the client handles the tablet server fail-over scenario.  However, the 
> test covers only one fail-over event and mainly performs write operations 
> while the backend handles the 'unexpected crash' of the tablet server.
> It would be nice to add more tests which cover the client's fail-over 
> behavior:
>   * Add the mixed workload scenario, i.e. combine inserts/scans during the 
> fail-over.  Running the scans would not only verify that the data eventually 
> reaches the destination, but verify that the client automatically retries the 
> scan operations and eventually succeeds reading the data from the cluster.
>   * Induce more fail-over events while running the scenario, i.e. pause and 
> then resume the tserver processes many more times and run the test longer.  
> This is to spot 

[jira] [Commented] (KUDU-2096) Document necessary configuration for Kerberos with master CNAMEs

2017-08-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127962#comment-16127962
 ] 

Todd Lipcon commented on KUDU-2096:
---

If you check out the chart in KUDU-2032, you can actually see the behavior that 
MIT krb5 has for the different settings, using a CNAME as an example. I think the 
defaults (dns_canonicalize_hostname = true, rdns = true) would work as expected 
for the CNAME config -- it would resolve all the way to an IP, then reverse 
back to the true FQDN of the host, and use that as the principal. The trick is 
that it's possible to configure krb5 to do only the "cname -> actual name" 
step, whose result might not match what a reverse lookup of the IP would return.

> Document necessary configuration for Kerberos with master CNAMEs
> 
>
> Key: KUDU-2096
> URL: https://issues.apache.org/jira/browse/KUDU-2096
> Project: Kudu
>  Issue Type: Task
>  Components: documentation, security
>Reporter: Todd Lipcon
>
> Currently our docs recommend using CNAMEs for master addresses to simplify 
> moving them around. However, if clients connect to a master with its 
> non-canonical name, there are some complications with Kerberos principals, 
> etc. We should test and document the necessary steps for such a configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2096) Document necessary configuration for Kerberos with master CNAMEs

2017-08-15 Thread Attila Bukor (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127878#comment-16127878
 ] 

Attila Bukor commented on KUDU-2096:


[~tlipcon] this approach wouldn't work if the user has CNAMEs for the host 
as we suggest in the docs, e.g. the way you used in your previous example, 
right?

{code}
kudumaster.example.com CNAME server1.example.com
server1.example.com A 1.2.3.4
4.3.2.1.in-addr.arpa. PTR server1.example.com
{code}

> Document necessary configuration for Kerberos with master CNAMEs
> 
>
> Key: KUDU-2096
> URL: https://issues.apache.org/jira/browse/KUDU-2096
> Project: Kudu
>  Issue Type: Task
>  Components: documentation, security
>Reporter: Todd Lipcon
>
> Currently our docs recommend using CNAMEs for master addresses to simplify 
> moving them around. However, if clients connect to a master with its 
> non-canonical name, there are some complications with Kerberos principals, 
> etc. We should test and document the necessary steps for such a configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2096) Document necessary configuration for Kerberos with master CNAMEs

2017-08-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127875#comment-16127875
 ] 

Todd Lipcon commented on KUDU-2096:
---

It appears the JDK Kerberos implementation does some limited canonicalization: 
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/security/krb5/PrincipalName.java#L386

Specifically, it calls {{InetAddress.getByName}}, and if the resulting name is 
just a more-qualified version of the original name ("foo" -> "foo.example.com") 
then it will do that canonicalization.

It seems that if we are OK using some internal APIs, we can use 
sun.security.krb5.Config to read the configured krb5.conf and match the 
behavior of the C++ client without having to add a new Java-specific 
configuration in the client.
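A rough, simplified sketch of the check described above (this is not the actual 
JDK source; the helper name is made up):

{code}
import java.net.InetAddress;
import java.util.Locale;

public class JdkStyleCanon {
  // Mimic the JDK's limited canonicalization: accept the resolved name only if
  // it is a more-qualified version of what the caller passed in
  // (e.g. "foo" -> "foo.example.com").
  static String canonicalize(String host) throws Exception {
    String resolved = InetAddress.getByName(host).getCanonicalHostName();
    if (resolved.toLowerCase(Locale.ROOT)
                .startsWith(host.toLowerCase(Locale.ROOT) + ".")) {
      return resolved;
    }
    return host;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(canonicalize(args.length > 0 ? args[0] : "localhost"));
  }
}
{code}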

> Document necessary configuration for Kerberos with master CNAMEs
> 
>
> Key: KUDU-2096
> URL: https://issues.apache.org/jira/browse/KUDU-2096
> Project: Kudu
>  Issue Type: Task
>  Components: documentation, security
>Reporter: Todd Lipcon
>
> Currently our docs recommend using CNAMEs for master addresses to simplify 
> moving them around. However, if clients connect to a master with its 
> non-canonical name, there are some complications with Kerberos principals, 
> etc. We should test and document the necessary steps for such a configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2032) Kerberos authentication fails with rdns disabled in krb5.conf

2017-08-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127869#comment-16127869
 ] 

Todd Lipcon commented on KUDU-2032:
---

Wrote a quick program to test out the different behavior of the configs on a 
few domain names: https://gist.github.com/a2ca8c434c14520e10da65d47e50e350

{code}
www.cloudera.com

-canon  -rdns   www.cloudera.com
-canon  +rdns   www.cloudera.com
+canon  -rdns   aem-prod-external-elb-1751714427.us-west-1.elb.amazonaws.com
+canon  +rdns   ec2-52-52-88-106.us-west-1.compute.amazonaws.com

localhost

-canon  -rdns   localhost
-canon  +rdns   localhost
+canon  -rdns   localhost
+canon  +rdns   localhost

127.0.0.1

-canon  -rdns   127.0.0.1
-canon  +rdns   127.0.0.1
+canon  -rdns   127.0.0.1
+canon  +rdns   localhost
{code}

> Kerberos authentication fails with rdns disabled in krb5.conf
> -
>
> Key: KUDU-2032
> URL: https://issues.apache.org/jira/browse/KUDU-2032
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Todd Lipcon
>Priority: Critical
>
> Currently if 'rdns = false' is configured in krb5.conf, Kudu ends up using 
> the IP addresses of remote hosts instead of their hostnames. This means that it 
> will look up krb5 principals by IP, even if actual hostnames were passed in.
> This prevents krb5 from working properly in most environments where 
> rdns=false is set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KUDU-2032) Kerberos authentication fails with rdns disabled in krb5.conf

2017-08-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127859#comment-16127859
 ] 

Todd Lipcon commented on KUDU-2032:
---

Just dropping a few more notes from more reading I did today:

There are actually two relevant krb5 configs related to service name 
canonicalization: *dns_canonicalize_hostname* and *rdns*

- *dns_canonicalize_hostname* seems to be rarely changed from its default 
(true). If it is set, krb5 calls getaddrinfo(host) with AI_CANONNAME set, and 
then uses the returned 'canonhost' if one was provided.
- *rdns* - if set, and the previous DNS query returned an address, then krb5 does 
a reverse lookup using getnameinfo(), and if that succeeds, the result replaces 
the above 'canonhost'.

The code is in the {{canon_hostname}} function in the {{sn2princ.c}} file in the 
krb5 source.
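A quick way to see what the rdns step produces for a given host is a small Java 
probe like the one below ({{java.net}} has no direct equivalent of the 
AI_CANONNAME forward step, so this only approximates the reverse-lookup part):

{code}
import java.net.InetAddress;

public class RdnsProbe {
  public static void main(String[] args) throws Exception {
    String host = args.length > 0 ? args[0] : "www.cloudera.com";
    InetAddress addr = InetAddress.getByName(host);  // forward lookup
    System.out.println("address:        " + addr.getHostAddress());
    // Reverse lookup of the resolved address, roughly what rdns = true does
    // with getnameinfo() in the krb5 code path described above.
    System.out.println("reverse lookup: " + addr.getCanonicalHostName());
  }
}
{code}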

> Kerberos authentication fails with rdns disabled in krb5.conf
> -
>
> Key: KUDU-2032
> URL: https://issues.apache.org/jira/browse/KUDU-2032
> Project: Kudu
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Todd Lipcon
>Priority: Critical
>
> Currently if 'rdns = false' is configured in krb5.conf, Kudu ends up using 
> the IP addresses of remote hosts instead of their hostnames. This means that it 
> will look up krb5 principals by IP, even if actual hostnames were passed in.
> This prevents krb5 from working properly in most environments where 
> rdns=false is set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KUDU-2099) Drop Java 7 Support

2017-08-15 Thread Grant Henke (JIRA)
Grant Henke created KUDU-2099:
-

 Summary: Drop Java 7 Support
 Key: KUDU-2099
 URL: https://issues.apache.org/jira/browse/KUDU-2099
 Project: Kudu
  Issue Type: Task
  Components: java
Affects Versions: 1.4.0
Reporter: Grant Henke
Assignee: Grant Henke


Java 8 has been out for quite some time, and with Java 9 coming we should 
consider dropping Java 7 support. This would also allow us to update some 
libraries that have also decided to move to Java 8 only, including Spark 2.2.0 
and Guava 23.0.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KUDU-1726) Avoid fsync-per-block in tablet copy

2017-08-15 Thread Hao Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao reassigned KUDU-1726:
-

Assignee: Hao Hao

> Avoid fsync-per-block in tablet copy
> 
>
> Key: KUDU-1726
> URL: https://issues.apache.org/jira/browse/KUDU-1726
> Project: Kudu
>  Issue Type: Bug
>  Components: tablet
>Reporter: Mike Percy
>Assignee: Hao Hao
>
> We should be able to do a full tablet copy, keeping track of which blocks and 
> WAL segments have changed, and then do a bulk sync-to-disk once the full 
> operation is complete. This would allow the kernel to schedule the IO at its 
> leisure until durability is actually required.
> This will likely require changes to the BlockManager API and the LogBlockManager 
> implementation to support this.
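A conceptual illustration of the deferred-sync idea in Java (Kudu's block 
manager is C++, and the file paths and sizes here are made up; this only shows 
the write-everything-then-sync-once pattern):

{code}
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

public class DeferredSyncSketch {
  public static void main(String[] args) throws Exception {
    List<FileChannel> dirty = new ArrayList<>();
    // Write all copied "blocks" first, without syncing each one individually...
    for (int i = 0; i < 10; i++) {
      File f = new File("/tmp/tablet-copy-block-" + i);
      FileChannel ch = new RandomAccessFile(f, "rw").getChannel();
      ch.write(ByteBuffer.wrap(new byte[4096]));
      dirty.add(ch);  // remember which files changed
    }
    // ...then sync everything in one pass once the copy is complete, letting
    // the kernel schedule the IO at its leisure in the meantime.
    for (FileChannel ch : dirty) {
      ch.force(true);
      ch.close();
    }
  }
}
{code}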



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (KUDU-2033) Add a 'torture' scenario to verify Java client's behavior during fail-over

2017-08-15 Thread Alexey Serbin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KUDU-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Serbin updated KUDU-2033:

Description: 
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies how 
the client handles the tablet server fail-over scenario.  However, the test 
covers only one fail-over event and mainly performs write operations while the 
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
  * Add the mixed workload scenario, i.e. combine inserts/scans during the 
fail-over.  Running the scans would not only verify that the data eventually 
reaches the destination, but verify that the client automatically retries the 
scan operations and eventually succeeds reading the data from the cluster.
  * Induce more fail-over events while running the scenario, i.e. pause and 
then resume the tserver processes many more times and run the test longer.  
This is to spot possible bugs during the transitions and the occurrence of 
multiple fail-over events.
  * In the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
mode with different selectors: LEADER_ONLY and CLOSEST_REPLICA.  That's to 
cover the retry code paths for both cases (as of now, I could see only the 
LEADER_ONLY path covered, but I might be mistaken).
  * Extra: add the multi-master scenario, where both the leader tserver and 
leader master 'unexpectedly crash' during the run.  The idea is to verify that 
the client automatically updates its metacache even if the leader master 
changes and manages to send the data to the destination server eventually.

The general idea is to make sure that, during fail-over events, the Java client:
* Retries write and read operations automatically on errors caused by a 
fail-over event.
* Does not silently lose any data: if the client cannot send the data due to 
a timeout or running out of retry attempts, it should report that.
   

  was:
For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies how 
the client handles the tablet server fail-over scenario.  However, the test 
covers only one fail-over event and mainly performs write operations while the 
backend handles the 'unexpected crash' of the tablet server.

It would be nice to add more tests which cover the client's fail-over behavior:
  * add the mixed workload scenario, i.e. combine inserts/scans during the 
fail-over
  * induce more fail-over events while running the scenario, i.e. pause and 
then resume the tserver processes many more times and run the test longer
  * add the multi-master scenario, where both the leader tserver and leader 
master 'unexpectedly crash' during the run
  * in the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
mode to exercise RYW (Read-Your-Writes) behavior and add assertions to make 
sure the RYW behavior is observed as expected
   


> Add a 'torture' scenario to verify Java client's behavior during fail-over 
> ---
>
> Key: KUDU-2033
> URL: https://issues.apache.org/jira/browse/KUDU-2033
> Project: Kudu
>  Issue Type: Test
>  Components: client, java
>Reporter: Alexey Serbin
>Assignee: Edward Fancher
>  Labels: newbie, newbie++
>
> For the Kudu Java client we have the {{TestLeaderFailover}} test, which verifies 
> how the client handles the tablet server fail-over scenario.  However, the 
> test covers only one fail-over event and mainly performs write operations 
> while the backend handles the 'unexpected crash' of the tablet server.
> It would be nice to add more tests which cover the client's fail-over 
> behavior:
>   * Add the mixed workload scenario, i.e. combine inserts/scans during the 
> fail-over.  Running the scans would not only verify that the data eventually 
> reaches the destination, but verify that the client automatically retries the 
> scan operations and eventually succeeds reading the data from the cluster.
>   * Induce more fail-over events while running the scenario, i.e. pause and 
> then resume the tserver processes many more times and run the test longer.  
> This is to spot possible bugs during the transitions and the occurrence 
> of multiple fail-over events.
>   * In the mixed workload scenarios, run scan operations in READ_AT_SNAPSHOT 
> mode with different selectors: LEADER_ONLY and CLOSEST_REPLICA.  That's to 
> cover the retry code paths for both cases (as of now, I could see only the 
> LEADER_ONLY path covered, but I might be mistaken).
>   * Extra: add the multi-master scenario, where both the leader tserver and 
> leader master 'unexpectedly crash' during the run.  The idea is to verify 
> that the client automatically updates its metacache even if the leader