[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returns true iff
non-zero bytes have been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation
of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
  Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
  and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WirteHandler() is called again and as it notices that the OutboundTransfer
  hasn't really started yet and "car->call" is NULL due to cancellation, it
  removes it from the outbound transfer queue and moves on to the next entry
  in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
  eventually calls SSL_write() with a different buffer than expected by
  SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.

Confirmed the problem above is fixed by running stress test in a 130 node
cluster with Impala. The problem happened consistently without the fix.

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Reviewed-on: http://gerrit.cloudera.org:8080/9587
Reviewed-by: Sailesh Mukil 
Reviewed-by: Todd Lipcon 
Tested-by: Todd Lipcon 
---
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/transfer.cc
M src/kudu/rpc/transfer.h
4 files changed, 84 insertions(+), 1 deletion(-)

Approvals:
  Sailesh Mukil: Looks good to me, but someone else must approve
  Todd Lipcon: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-13 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 2: Verified+1 Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 15:10:59 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc
File src/kudu/rpc/rpc-test.cc:

http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc@1225
PS2, Line 1225: // This serves as a helper function for 
TestCancellationMultiThreads() to exercise
  : // cancellation when there are concurrent RPCs.
> nit: Better to not explain what a user of this function would do, since it'
Doesn't seem too bad to as this provides context on how this function fits in 
with the rest of the test.



--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 06:43:50 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 04:34:13 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc
File src/kudu/rpc/rpc-test.cc:

http://gerrit.cloudera.org:8080/#/c/9587/2/src/kudu/rpc/rpc-test.cc@1225
PS2, Line 1225: // This serves as a helper function for 
TestCancellationMultiThreads() to exercise
  : // cancellation when there are concurrent RPCs.
nit: Better to not explain what a user of this function would do, since it's 
already explained in TestCancellationMultiThreads().



--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 04:33:57 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 2:

Hit with flaky tests again in TSAN build:

[  FAILED  ] AllTypesItest/3.TestAllKeyTypes, where TypeParam = 
kudu::client::IntKeysTestSetup >
[  FAILED  ] AllTypesItest/7.TestAllKeyTypes, where TypeParam = 
kudu::client::IntKeysTestSetup 
>

Both failed with the following backtraces:

F0313 01:15:33.904728 14346 master_main.cc:74] Check failed: _s.ok() Bad 
status: Service unavailable: Cannot initialize clock: Error reading clock. 
Clock considered unsynchronized
*** Check failure stack trace: ***
@ 0x7f9641ff03b9  google::LogMessage::Flush() at ??:0
@ 0x7f9641ff446e  google::LogMessageFatal::~LogMessageFatal() at ??:0
@   0x4b5391  kudu::master::MasterMain() at 
/home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:6199
@   0x4b4ddf  main at 
/home/jenkins-slave/workspace/kudu-master/3/thirdparty/src/llvm-4.0.0.src/projects/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:6189
@ 0x7f963f379f45  __libc_start_main at ??:0
@   0x41e092  (unknown) at ??:0
/home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/all_types-itest.cc:518:
 Failure


--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 01:47:34 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 1:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h
File src/kudu/rpc/rpc-test-base.h:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h@441
PS1, Line 441:   FLAGS_rpc_encrypt_loopback_connections = true;
> ah, does this mean we were not successfully testing encrypted RPCs in the p
I believe so. I confirmed that the TlsSocket path is now being exercised by 
manually adding a CHECK() in that TlsSocket::Write().


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc
File src/kudu/rpc/rpc-test.cc:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@740
PS1, Line 740:// Linux, SSL enabled
> hmm, is this fix related to the patch in question? I'm surprised we haven't
Yes, as we weren't using TlsSocket before so we weren't triggering this error 
message.


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1223
PS1, Line 1223: // Helper function for TestCancellationMultiThreads.
> better to document briefly what it does, rather than what it is.
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1224
PS1, Line 1224: SendRpcAndCancel
> this should probably be named SendAndCancelRpcs or something, since it send
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1251
PS1, Line 1251: ASSERT_TRUE
> have to use CHECK here since this isn't the main thread and gtest isn't thr
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1256
PS1, Line 1256: #define BUF_SIZE (16 * 1024 * 1024)
> can you use a const int defined below within the function instead?  We pref
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1258
PS1, Line 1258: Regression
> did this test successfully repro the bug w/o the fix?
Unfortunately, no even with multiple tries in different build types. That said, 
I think this is still a good test to include. Any suggestion to make this test 
more effective in triggering the bug is welcomed.

I confirmed the fix with Impala stress test running on a 130 node cluster which 
hit this bug quite consistently without the fix. Updated the commit message 
with this piece of information.


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1274
PS1, Line 1274:   unique_ptr buf(new uint8_t[BUF_SIZE]);
  :   memset(buf.get(), 'a', BUF_SIZE);
  :   Slice slice(buf.get(), BUF_SIZE);
> you could probably get away with just:
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h
File src/kudu/rpc/transfer.h:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@186
PS1, Line 186: True if we have tried sending anything in 'payload_slices_'
> Actually isn't it more like "True if SendBuffer() is called at least once."
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@187
PS1, Line 187:   // no bytes were sent successfully. This is needed as 
SSL_write() is stateful.
> nit: Add a reference to this JIRA, else it's hard to understand why we adde
Done


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc
File src/kudu/rpc/transfer.cc:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc@190
PS1, Line 190:   started_ = true;
> Do we know of any cases where setting started_ = true this early can backfi
As far as I can tell, this should be fine. Both SendBuffer() and 
OutboundTransfer::TransferStarted() are called from Connection::WriteHandler() 
only. So, if we get to the point of SendBuffer(), we should have taken the path 
of if (!transfer->TransferStarted()) once.



--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 13 Mar 2018 00:54:24 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Michael Ho (Code Review)
Hello Kudu Jenkins, Sailesh Mukil, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9587

to look at the new patch set (#2).

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returns true iff
non-zero bytes have been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation
of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
  Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
  and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WirteHandler() is called again and as it notices that the OutboundTransfer
  hasn't really started yet and "car->call" is NULL due to cancellation, it
  removes it from the outbound transfer queue and moves on to the next entry
  in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
  eventually calls SSL_write() with a different buffer than expected by
  SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.

Confirmed the problem above is fixed by running stress test in a 130 node
cluster with Impala. The problem happened consistently without the fix.

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
---
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/transfer.cc
M src/kudu/rpc/transfer.h
4 files changed, 84 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/9587/2
--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h
File src/kudu/rpc/transfer.h:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@186
PS1, Line 186: True if we have tried sending anything in 'payload_slices_'
Actually isn't it more like "True if SendBuffer() is called at least once." ?

The Writev() function can return without attempting to send anything in some 
cases.


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.h@187
PS1, Line 187:   // no bytes were sent successfully. This is needed as 
SSL_write() is stateful.
nit: Add a reference to this JIRA, else it's hard to understand why we added it.


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc
File src/kudu/rpc/transfer.cc:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/transfer.cc@190
PS1, Line 190:   started_ = true;
Do we know of any cases where setting started_ = true this early can backfire? 
I looked at the code and it seems fine to me, but I just want to make sure.

In some cases, we won't send anything at all:
https://github.com/apache/kudu/blob/master/src/kudu/security/tls_socket.cc#L50-L55

Or we could hit errors before actually calling the SSL_write() or send() 
functions.

So we should make sure that having started_ = true for these cases doesn't 
cause a problem elsewhere in the code.



--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Mar 2018 23:36:34 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 1:

(8 comments)

> Patch Set 1:
>
> the test failure seems to be infrastructure issue ?
>
> tablet-decoder-eval-test.0dist-test-slave-dist-test-slave-p2pc-2  
> Nonefailed to download task files: WARNING 100 
> isolateserver(1484): Adding unknown file 9a0053f2a34

yea that's odd, agree it's unrelated.

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h
File src/kudu/rpc/rpc-test-base.h:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test-base.h@441
PS1, Line 441:   FLAGS_rpc_encrypt_loopback_connections = true;
ah, does this mean we were not successfully testing encrypted RPCs in the past 
with this test? :(


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc
File src/kudu/rpc/rpc-test.cc:

http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@740
PS1, Line 740:// Linux, SSL enabled
hmm, is this fix related to the patch in question? I'm surprised we haven't 
seen this failure on our flaky test dashboard


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1223
PS1, Line 1223: // Helper function for TestCancellationMultiThreads.
better to document briefly what it does, rather than what it is.


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1224
PS1, Line 1224: SendRpcAndCancel
this should probably be named SendAndCancelRpcs or something, since it sends 
and cancels more than one of them


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1251
PS1, Line 1251: ASSERT_TRUE
have to use CHECK here since this isn't the main thread and gtest isn't 
thread-safe


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1256
PS1, Line 1256: #define BUF_SIZE (16 * 1024 * 1024)
can you use a const int defined below within the function instead?  We prefer 
'const int kBufferSize' vs a #define


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1258
PS1, Line 1258: Regression
did this test successfully repro the bug w/o the fix?


http://gerrit.cloudera.org:8080/#/c/9587/1/src/kudu/rpc/rpc-test.cc@1274
PS1, Line 1274:   unique_ptr buf(new uint8_t[BUF_SIZE]);
  :   memset(buf.get(), 'a', BUF_SIZE);
  :   Slice slice(buf.get(), BUF_SIZE);
you could probably get away with just:

  string buf('a', kBufferSize);

and then down below pass Slice(buf)



--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Mar 2018 22:59:10 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9587 )

Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..


Patch Set 1:

the test failure seems to be infrastructure issue ?

tablet-decoder-eval-test.0  dist-test-slave-dist-test-slave-p2pc-2  
Nonefailed to download task files: WARNING 100 isolateserver(1484): 
Adding unknown file 9a0053f2a34


--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Mar 2018 21:23:21 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL write()

2018-03-12 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9587


Change subject: KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with 
SSL_write()
..

KUDU-2334: Fix OutboundTransfer::TransferStarted() to work with SSL_write()

Previously, OutboundTransfer::TransferStarted() returns true iff
non-zero bytes have been successfully sent via Writev(). As it turns
out, this doesn't work well with SSL_write(). When SSL_write() returns -1
with errno EAGAIN or ETRYAGAIN, we need to retry the call with exactly
the same buffer pointer next time even if 0 bytes have been written.

The following sequence becomes problematic with the previous implementation
of OutboundTransfer::TransferStarted():

- WriteHandler() calls SendBuffer() on an OutboundTransfer.
- SendBuffer() calls TlsSocket::Writev() which hits the EAGAIN error above.
  Since 0 bytes were written, cur_slice_idx_ and cur_offset_in_slice_ remain 0
  and OutboundTransfer::TransferStarted() still returns false.
- OutboundTransfer is cancelled or timed out. car->call is set to NULL.
- WirteHandler() is called again and as it notices that the OutboundTransfer
  hasn't really started yet and "car->call" is NULL due to cancellation, it
  removes it from the outbound transfer queue and moves on to the next entry
  in the queue.
- WriteHandler() calls SendBuffer() with the next entry in the queue and
  eventually calls SSL_write() with a different buffer than expected by
  SSL_write(), leading to "SSL3_WRITE_PENDING:bad write retry" error.

This change fixes the problem above by adding a boolean flag 'started_'
which is set to true if OutboundTransfer::SendBuffer() has been called
at least once. Also added some tests to exercise cancellation paths with
multiple concurrent RPCs.

Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
---
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/transfer.cc
M src/kudu/rpc/transfer.h
4 files changed, 83 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/9587/1
--
To view, visit http://gerrit.cloudera.org:8080/9587
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id7ebdcbc1ef2a3e0c5e7162f03214c232755b683
Gerrit-Change-Number: 9587
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho