On Fri, Mar 23, 2018 at 4:06 PM, mdladakos <[email protected]> wrote:
> Keith, thanks for your quick response!
>
> Maybe I wasn't clear enough or I am not understanding your explanation.
>
> What I was exploring was performing a scan with a large number of
> authorizations. While I did use tables with thousands of rows, I also ran
> scans against empty tables and they still took ~25 seconds. So shouldn't
> VisibilityEvaluator not be involved?

Gotcha. So one possibility is that it's just taking a while to send the auths
from the client to the tserver. The following code is the thrift RPC to start
a scan. client.startScan() is passed
scanState.authorizations.getAuthorizationsBB(), which is the auths. The
getAuthorizationsBB() method does a copy. So there is a copy, then thrift has
to serialize the auths, send them, and then deserialize them on the server
side, and this is done for each startScan RPC. The startScan call happens
once per tablet; subsequent batches of key/vals from a tablet are fetched
using the continueScan RPC, which does not pass the auths again.

https://github.com/apache/accumulo/blob/1e4d4827096bd0047c7de3e0b672263defe66634/core/src/main/java/org/apache/accumulo/core/client/impl/ThriftScanner.java#L429

It would be interesting to see how long the call to startScan takes for your
case. Enabling trace logging for ThriftScanner will give some insight into
this.

>
> I don't think the actual filtering is the problem. Is there some work done
> by the tablet servers when receiving the scan request, specifically in
> regard to user authorizations?
>
> Again, if I used -s to pass a subset of authorizations for the user with
> 100000 authorizations, this increase in return time would be equivalent to a
> user with that number of authorizations (i.e.: If I scanned with 100
> authorizations out of the 100000, it would be the normal, fast speed)
>
>
>
> --
> Sent from: http://apache-accumulo.1065345.n5.nabble.com/Users-f2.html
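
To get a rough feel for the client-side part of the cost described above (a
fresh copy of all the auths for every startScan, i.e. once per tablet), here
is a minimal standalone sketch. It does not use Accumulo or thrift at all:
the class AuthCopyCost, its methods, the "authN" token names, and the tablet
count of 10 are all hypothetical stand-ins, and it leaves out serialization,
the network, and the server-side deserialization, so it only gives a lower
bound on what a real scan would pay.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class AuthCopyCost {

    // Builds n hypothetical authorization tokens: "auth0", "auth1", ...
    static List<byte[]> makeAuths(int n) {
        List<byte[]> auths = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            auths.add(("auth" + i).getBytes(StandardCharsets.UTF_8));
        }
        return auths;
    }

    // Stand-in for the per-RPC copy (an assumption, not Accumulo's code):
    // every token is copied into its own new ByteBuffer.
    static List<ByteBuffer> copyAuths(List<byte[]> auths) {
        List<ByteBuffer> copy = new ArrayList<>(auths.size());
        for (byte[] auth : auths) {
            copy.add(ByteBuffer.wrap(auth.clone()));
        }
        return copy;
    }

    // Total token bytes that one startScan-style call would have to ship.
    static long payloadBytes(List<ByteBuffer> auths) {
        long total = 0;
        for (ByteBuffer b : auths) {
            total += b.remaining();
        }
        return total;
    }

    public static void main(String[] args) {
        int tablets = 10; // hypothetical: one startScan per tablet
        List<byte[]> auths = makeAuths(100_000);

        long start = System.nanoTime();
        long bytes = 0;
        for (int t = 0; t < tablets; t++) {
            bytes += payloadBytes(copyAuths(auths));
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("copied " + bytes + " bytes of auths across "
                + tablets + " simulated startScan calls in " + elapsedMs + " ms");
    }
}
```

If the copy alone turns out to be cheap, that would point at serialization,
transfer, or server-side handling instead, which is what timing the real
startScan call would show.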
