UNCLASSIFIED

I have tried increasing the number of threads, and that seems to guarantee the scan returns before it hits the timeout, but it still takes approx. 7 minutes to complete. Looking at the Accumulo manager page, all the tablet servers appear to get hit equally (around 16 per node) and start to return, but a couple of tablet servers take longer than the others. The doco indicated this behaviour could happen, but I was hoping it wouldn't take this long.

________________________________
From: David Medinets [mailto:[email protected]]
Sent: Wednesday, 14 August 2013 12:45
To: accumulo-user
Subject: Re: Intersecting Iterators [SEC=UNCLASSIFIED]

I'm wondering about the 20 threads in the BatchScanner. Have you played with increasing it? I've seen that number go above 15 per Accumulo node. Are you seeing the scans in the Accumulo monitor? Are the scans progressing through the Accumulo nodes?

On Tue, Aug 13, 2013 at 9:58 PM, Williamson, Luke MR 1 <[email protected]> wrote:

UNCLASSIFIED

Hi,

I have field indexes that look something like

Row Id: <date>-<UUID>
CF: fi||<type>||<value>
CQ: <date>-<UUID>

For example:

20130814-550e8400-e29b-41d4-a716-446655440000 fi||verb||run 20130814-550e8400-e29b-41d4-a716-446655440000
20130814-550e8400-e29b-41d4-a716-446655440000 page||58 line||16 "the boy can run up the hill"

From what I could determine from the doco and API, I am executing the following code to perform an intersecting query on two values...

    Set<Range> shards = new HashSet<Range>();
    Text[] terms = {new Text("fi||<type>||<value>"), new Text("fi||<type>||<value>")};

    BatchScanner bs = conn.createBatchScanner(table, auths, 20);
    bs.setTimeout(360, TimeUnit.SECONDS);

    IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
    IntersectingIterator.setColumnFamilies(iter, terms);
    bs.addScanIterator(iter);

    // Scan the entire table
    bs.setRanges(Collections.singleton(new Range()));

    for (Entry<Key,Value> entry : bs) {
        shards.add(new Range(entry.getKey().getColumnQualifier()));
    }
    bs.close(); // release the scanner's threads once the results are collected

I then perform a second batch scan using the set of ranges returned by the above to get my actual results.

My issue is that the intersecting query takes several minutes to return, if at all (in some cases it times out).

Is this expected? Is there some way to improve performance? Is there a better way to do this sort of query?

Any guidance would be much appreciated.

Thanks

Luke

IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.
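
For readers following the thread, here is a rough, plain-Java sketch of what the intersecting query in the original message is asking for: within each row, find the column qualifiers that appear under every one of the given term column families. It is only a toy in-memory model and not Accumulo's IntersectingIterator implementation. The class name `IntersectSketch`, the shortened ids such as `20130814-doc1`, and the second term `fi||noun||hill` are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of an index table: row id -> term column family -> sorted column qualifiers. */
final class IntersectSketch {

    private final Map<String, Map<String, SortedSet<String>>> rows = new TreeMap<>();

    /** Adds one index entry (hypothetical helper, not an Accumulo call). */
    void put(String row, String columnFamily, String columnQualifier) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(columnFamily, cf -> new TreeSet<>())
            .add(columnQualifier);
    }

    /** Returns "row/qualifier" for each qualifier present under ALL terms within the same row. */
    List<String> intersect(String... terms) {
        List<String> hits = new ArrayList<>();
        if (terms.length == 0) {
            return hits;
        }
        for (Map.Entry<String, Map<String, SortedSet<String>>> row : rows.entrySet()) {
            SortedSet<String> common = null;
            for (String term : terms) {
                SortedSet<String> ids = row.getValue().get(term);
                if (ids == null) {
                    common = new TreeSet<>(); // term absent from this row: no match possible
                    break;
                }
                if (common == null) {
                    common = new TreeSet<>(ids); // first term seeds the candidate set
                } else {
                    common.retainAll(ids);       // later terms narrow it down
                }
                if (common.isEmpty()) {
                    break;
                }
            }
            for (String id : common) {
                hits.add(row.getKey() + "/" + id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        IntersectSketch index = new IntersectSketch();
        // Same layout as in the message: row id and column qualifier are both <date>-<UUID>.
        index.put("20130814-doc1", "fi||verb||run", "20130814-doc1");
        index.put("20130814-doc1", "fi||noun||hill", "20130814-doc1");
        index.put("20130814-doc2", "fi||verb||run", "20130814-doc2");
        System.out.println(index.intersect("fi||verb||run", "fi||noun||hill"));
    }
}
```

In this model only `20130814-doc1` carries both terms, so it is the only hit. Note that the intersection is evaluated row by row, so a query over `new Range()` has to visit every row in the table.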
