[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1: Code-Review+2

sgtm.

As a side note, what do you think of adding PARTIAL_WRITE mode for TlsContext?  
 So that Socket::Write() and TlsSocket::Write() would behave similar.

If we are eager to fix the flakes, I think it's possible to add the test in a 
separate changelist.


--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 21:35:49 +
Gerrit-HasComments: No


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
> How do you know it will call it with the same length of 10?
I think the code as it is today will always call with the same length. The edge 
case is if a partial write happens. But if Writev() does a partial write of an 
iovec/slice, it returns back to the caller to adjust the offset within that 
slice:
https://github.com/apache/kudu/blob/f9a6a1b2d15feb8e2c044f20248774105a3a62bf/src/kudu/security/tls_socket.cc#L87

So there wouldn't be a case where writev() calls SSL_write() with a different 
length after a failure.

But I agree that's not in the contract and it's best not to depend on 
implementation details that could change at any point, which may invalidate 
this fix.



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:49:33 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
> Ah, it seems I misread the docs regarding SSL_write() and SSL_MODE_ACCEPT_M
Correction: actually, there is a restriction on the length for the follow-up 
write: it should not be less than the length specified when SSL_write() 
returned SSL_ERROR_WANT_WRITE, as seen from ssl/s3_pkt.c:

if ((s->s3->wpend_tot > (int)len) 
|| ((s->s3->wpend_buf != buf) &&
!(s->mode & SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER))
|| (s->s3->wpend_type != type)) {
SSLerr(SSL_F_SSL3_WRITE_PENDING, SSL_R_BAD_WRITE_RETRY);
return (-1);
} 

And specifying SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER would not help here.



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:48:08 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
> In other words, I think it's brittle to assume the caller will call it with
Ah, it seems I misread the docs regarding SSL_write() and 
SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER.  Only the buffer needs to be the same, 
right.  No need to require the same length.

So, no need for SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER mode -- the fix should be 
enough.



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:40:03 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
> How do you know it will call it with the same length of 10?
In other words, I think it's brittle to assume the caller will call it with the 
same length.  The offset in the buffer will be the same, yes, but it's brittle 
to assume the caller will use the same length for the next call, since that's 
not in the contract of that OutboundTransfer::SendBuffer API, as I understand.

Maybe, we should also enable SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER for the 
TlsContext.

Also, I found that enabling SSL_MODE_ENABLE_PARTIAL_WRITE for TlsContext 
(without any other changes) fixes the issue with that error in LargeWrites 
test.  But this fix is also needed to have consistent behavior of 
TlsSocket::Writev().

Overall, I think enabling SSL_MODE_ENABLE_PARTIAL_WRITE might be beneficial to 
have as well: so far callers of TlsSocket::Write() handle partial writes 
correctly, and there will be less difference in behavior of TlsSocket::Write() 
and Socket::Write() in that case.



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:34:36 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
> It would get the same parameters, wouldn't it?
How do you know it will call it with the same length of 10?



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:22:13 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
> Does this really help?
It would get the same parameters, wouldn't it?

If there is a vector with 2 frames of 10 bytes each, Writev() would call 
Write() twice with a different buffer:

Write(buf, 10, ...);  // succeeds

Write(buf+10, 10, ...); // fails


If the second call fails with 'SSL_ERROR_WANT_WRITE', we go back up to the 
caller with Status::OK() and 10 bytes written. The caller would retry Writev() 
with 'buf+10' as the new offset, so SSL_write() would see the same offset as it 
last tried with.



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:17:37 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc
File src/kudu/security/tls_socket.cc:

http://gerrit.cloudera.org:8080/#/c/8570/1/src/kudu/security/tls_socket.cc@99
PS1, Line 99:   if (total_written > 0 && 
Socket::IsTemporarySocketError(write_status.posix_code())) {
: return Status::OK();
:   }
Does this really help?

I'm not sure that would help in some cases.

My concern is if it was a partial write indeed, the upper-level caller (i.e. 
OutboundTransfer::SendBuffer()) will adjust the cur_offset_in_slice_, so the 
next time it will call TlsSocket::Writev() with different buffer and offset, 
that would result calling SSL_write() with different parameters.  However, 
SSL_write() expects the follow-up call after SSL_ERROR_WANT_WRITE to be exactly 
with the same set parameters that was used when SSL_write() returned 
SSL_ERROR_WANT_WRITE.



--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 19:12:40 +
Gerrit-HasComments: Yes


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-15 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8570 )

Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..


Patch Set 1: Code-Review+1

The fix makes sense to me.


--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Comment-Date: Thu, 16 Nov 2017 07:27:48 +
Gerrit-HasComments: No


[kudu-CR] WIP: tls socket: properly handle temporary socket errors in Writev

2017-11-15 Thread Todd Lipcon (Code Review)
Hello Alexey Serbin, Sailesh Mukil,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/8570

to review the following change.


Change subject: WIP: tls_socket: properly handle temporary socket errors in 
Writev
..

WIP: tls_socket: properly handle temporary socket errors in Writev

This fixes a bug which caused RaftConsensusITest.TestLargeBatches to
fail when run under stress, as in the following command line:

taskset -c 0-4 \
 build/latest/bin/raft_consensus-itest \
   --gtest_filter=\*LargeBat\* \
   --stress-cpu-threads=8

This would produce an error like:
Network error: failed to write to TLS socket: error:1409F07F:SSL 
routines:SSL3_WRITE_PENDING:bad write retry:s3_pkt.c:878

This means that we were retrying a write after getting EAGAIN, but with
a different buffer than the first time.

I tracked this down to mishandling of temporary socket errors in
TlsSocket::Writev(). In the case that we successfully write part of the
io vector but hit such an error trying to write a later element in the
vector, we were still propagating the error back up to the caller. The
caller didn't realize that part of the write was successful, and thus it
would retry the write from the beginning.

WIP because this should have better TLS-socket-specific test coverage.

Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
---
M src/kudu/security/tls_socket.cc
1 file changed, 8 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/8570/1
--
To view, visit http://gerrit.cloudera.org:8080/8570
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If797f220f42bfb2e6f452b66f15e7a758e883472
Gerrit-Change-Number: 8570
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Sailesh Mukil