[ 
https://issues.apache.org/jira/browse/THRIFT-862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918668#action_12918668
 ] 

Ning Liang commented on THRIFT-862:
-----------------------------------

Awesome, thanks again to Eric and Bryan for the review.

> Async client issues / improvements
> ----------------------------------
>
>                 Key: THRIFT-862
>                 URL: https://issues.apache.org/jira/browse/THRIFT-862
>             Project: Thrift
>          Issue Type: Task
>          Components: Java - Library
>    Affects Versions: 0.4
>            Reporter: Ning Liang
>            Assignee: Ning Liang
>             Fix For: 0.6
>
>         Attachments: thrift-862-v2.diff, thrift-862-v3.diff, thrift-862.diff
>
>
> From Thrift-845:
> Hey Eric,
> You bring up some good points. We have indirect communication between the 
> client manager and client going through the method call object, which we can 
> refactor into direct communication. I'd like to keep the state machine 
> separate from the async client manager, though - the client manager only 
> coordinates transitioning the method call on ready. Responses inline for the 
> other comments:
> 1. Timeouts per-transition aren't really what I want as a consumer of this 
> api. It looks like there are about five states, which as a consumer I need to 
> know since the effective timeout on the whole call is then 5x the one I set. 
> I'd rather be guaranteed I will be called back one way or another within the 
> timeout i set, counting time i spend in the selector's queue, etc. Seems like 
> that would be easy to implement.
>     * The naming is misleading - we can have idle socket timeout or an 
> overall method call timeout (more like a deadline). I coded for the former 
> case, so we can modify to the latter or rename. Modifying is trivial - we 
> just need to track the method call start time, as opposed to the last action 
> time. The effective deadline is arbitrarily long, since we're updating the 
> last action timestamp on each socket read/write, which doesn't necessarily 
> correspond to a transition between method states.
> 2. You should not be allowed to set a timeout lower than 
> TAsyncClientManager.SELECT_TIME, and the value there should be much lower, 
> maybe 5ms? Because its per-transition as above, the true minimum timeout for 
> an entire call is 1s if each of its states took 200ms. We've actually run 
> clients with lower than 200ms timeouts.
>     * Agreed, though we won't be able to get to 5ms resolution. The reason is 
> because the JVM makes no guarantees on actual sleep time. Maybe a better 
> solution would be to use Selector.selectNow(), which returns immediately with 
> all ready keys, making our timeout granularity as small as possible.
> 3. TAsyncMethodCall.onError does not do anything to remove itself from 
> TAsyncClientManager.timeoutWatchSet, so unless it shows up in selectedKeys() 
> it will stay there until its timeout. Not a big deal, but could make that set 
> prohibitively large if there are many errors within a timeout interval.
>     * TAsyncClientManager removes the method after a transition if the 
> method's client.hasError(). Again, somewhat confusing since it's relying on 
> the method calling the client's onError.
> 4. We call TAsyncMethodCall.transition before adding it to timeoutWatchSet, 
> so lastTransitionTime is initialized before it is inspected. But, this is a 
> somewhat dangerous pattern to allow an uninitialized primitive to ever be 
> accessible. You should probably initialize it to the current time or a 
> sentinel on instantiation.
>     * Current time is good - if we end up using the deadline definition of 
> timeout, we don't even have to update the value.
> 5. it might be safer for lastTransitionTime = System.currentTimeMillis() to 
> be at the top of transition(SelectionKey key) so you can guarantee the 
> timestamp is updated even for errant transitions...don't think it matter's 
> functionally in the current code, but it could in refactorings. you don't 
> want to inadvertently obscure an error with a timeout.
>     * Good point. I wanted the timestamp to reflect the last socket activity 
> - putting at the beginning of transition could make it inaccurate on long 
> socket reads/writes. Maybe we can use the finally block?
> 6. TNonblockingServerSocket is the only non-test, non-tutorial thrift source 
> file that uses System.err.println instead of LOGGER.warn
>     * We can easily fix.
> 7. You might want to catch Exception instead of Throwable? I don't think you 
> intend or want to catch things like OutOfMemoryError, etc. The docs say "An 
> Error is a subclass of Throwable that indicates serious problems that a 
> reasonable application should not try to catch"
>     * Also easily fixed, though I want to make sure we didn't do this for a 
> reason earlier. Bryan, why Throwable vs Exception?
> 8. TAsyncClientManager.call should really throw an exception if its 
> SelectThread is not running (because its run method did not catch an 
> unexpected Throwable) instead of inserting it into a queue that will grow 
> unbounded and cause an OOM. You probably want a catch and throw block in run 
> that cleans up by calling onError on all pending calls and sets its done 
> flag. You probably also want to do this on ClosedSelectorException instead of 
> just catching and logging it in an infinite loop. It's odd that you catch it 
> on selectedKeys() but not select().
>     * That catch got folded into the catch(Throwable)
> 9. timeoutIdleMethods would be a bit faster if it only called 
> currentTimeMillis() once instead of per-check
>     * Good point. I was going for accuracy but even for large timeout watch 
> sets the iteration time should be negligible.
> 10. Probably an unnecessary optimization, but instead of relying on 
> Object.hashCode for removing timed out methods, you could identify them by 
> their start timestamp from currentTimeMillis() + a few bits of a static 
> atomic sequence counter, put them in a TreeSet, and shortcut the iteration 
> for timed out methods when you hit the first one that's not too old.
>     * Definitely, this was the original plan. We'll get it in on next 
> iteration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to