On Fri, Mar 23, 2018 at 4:06 PM, mdladakos <[email protected]> wrote:
> Keith, thanks for your quick response!
>
> Maybe I wasn't clear enough or I am not understanding your explanation.
>
> What I was exploring was performing a scan with a large number of
> authorizations. While I did use tables with thousands of rows, I also ran
> scans against empty tables and they still took ~25 seconds. So shouldn't
> VisibilityEvaluator not be involved?

Gotcha.  So one possibility is that it's just taking a while to send
the auths from the client to the tserver.  The code linked below makes
the thrift RPC that starts a scan.  client.startScan() is passed
scanState.authorizations.getAuthorizationsBB(), which is the auths.
The getAuthorizationsBB() method does a copy.  So there is a copy,
then thrift has to serialize the auths, send them, and deserialize
them on the server side, and this is done for each startScan RPC.  The
startScan call happens once per tablet; subsequent batches of key/vals
from a tablet are fetched using the continueScan RPC, which does not
pass the auths again.

https://github.com/apache/accumulo/blob/1e4d4827096bd0047c7de3e0b672263defe66634/core/src/main/java/org/apache/accumulo/core/client/impl/ThriftScanner.java#L429
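
To get a rough feel for the client-side copy cost alone, here is a
minimal standalone sketch.  It does not use Accumulo at all:
AuthCopySketch, makeAuths() and copyAuths() are hypothetical names, and
copyAuths() only approximates the kind of per-auth copy into ByteBuffers
described above.  It says nothing about the thrift serialization or
network time, which could easily be the larger share.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Accumulo code: mimics copying each auth
// into its own ByteBuffer before it is handed to the RPC layer.
class AuthCopySketch {

    // Build n distinct auth tokens such as "AUTH0", "AUTH1", ...
    static List<byte[]> makeAuths(int n) {
        List<byte[]> auths = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            auths.add(("AUTH" + i).getBytes(StandardCharsets.UTF_8));
        }
        return auths;
    }

    // Copy every auth into a fresh array and wrap it in a ByteBuffer.
    static List<ByteBuffer> copyAuths(List<byte[]> auths) {
        List<ByteBuffer> copy = new ArrayList<>(auths.size());
        for (byte[] auth : auths) {
            byte[] bytes = new byte[auth.length];
            System.arraycopy(auth, 0, bytes, 0, auth.length);
            copy.add(ByteBuffer.wrap(bytes));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<byte[]> auths = makeAuths(100_000);
        long start = System.nanoTime();
        List<ByteBuffer> copy = copyAuths(auths);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("copied " + copy.size() + " auths in " + micros + " us");
    }
}
```

If the copy turns out to be cheap on your machine, that would point
toward serialization, the network, or server-side handling instead.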

It would be interesting to see how long the call to startScan takes
for your case.  Enabling trace logging for ThriftScanner will give
some insight into this.
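
As a sketch of how that might look, assuming your client is configured
through a log4j 1.2 properties file (which file it picks up depends on
your setup) and that the logger name matches the class's package as it
appears in the link above:

```
# Assumption: log4j 1.2 properties syntax; logger name taken from the
# ThriftScanner source path (core/.../client/impl/ThriftScanner.java).
log4j.logger.org.apache.accumulo.core.client.impl.ThriftScanner=TRACE
```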

>
> I don't think the actual filtering is the problem. Is there some work done
> by the tablet servers when receiving the scan request, specifically in
> regard to user authorizations?
>
> Again, if I used -s to pass a subset of authorizations for the user with
> 100000 authorizations, this increase in return time would be equivalent to a
> user with that number of authorizations (i.e.: If I scanned with 100
> authorizations out of the 100000, it would be the normal, fast speed)
>
>
>
> --
> Sent from: http://apache-accumulo.1065345.n5.nabble.com/Users-f2.html
