This is the code I use to get the row IDs stored in the column qualifier:

    Scanner scan = conn.createScanner("t1", new Authorizations());
    List<Range> list = new ArrayList<Range>();
    for (Map.Entry<Key, Value> entry : scan) {
        if (list.size() == resultNum * threadNum) {
            break;
        }
        list.add(Range.exact(entry.getKey().getColumnQualifier()));
    }
    scan.close();
And then I use the row IDs to scan the data:
    BatchScanner bs = null;
    try {
        bs = conn.createBatchScanner("test.new_index", new Authorizations(), 10);
    } catch (TableNotFoundException e) {
        e.printStackTrace();
    }
    bs.setRanges(list);
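For completeness, the snippet above never iterates or closes the BatchScanner, so no results would actually be fetched. A minimal sketch of the read side, reusing `conn` and `list` from the code above (the println is just a placeholder for real result handling):

```java
// Sketch only: assumes conn, list, and the index table name from the code above.
BatchScanner bs = conn.createBatchScanner("test.new_index", new Authorizations(), 10);
try {
    bs.setRanges(list);
    for (Map.Entry<Key, Value> entry : bs) {
        // Each entry is one matching Key/Value pair from the index table.
        System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
} finally {
    // A BatchScanner holds client-side threads; always close it.
    bs.close();
}
```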
Original Mail
From: Josh [email protected]
To: [email protected]
Sent: Wednesday, January 14, 2015 10:32
Subject: Re: Re: how can i optimize scan speed when use batch scan ?
You might need to set tserver.cache.data.size to a larger value. Depending on the amount of data, you might just churn through the cache without getting much benefit. I think you have to restart Accumulo after changing this property. Can you show us the code you used to try to scan for a row ID and the data in the table you expected to be returned that wasn't?

覃璐 wrote:

Yes, I received all the results I wanted by the time the program ended. But I do not know why a scan returned 0 results even when I am sure the row ID exists. I set table.cache.block.enable=true, but I did not notice any difference. Thanks
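Josh's cache suggestion above can be applied from the Accumulo shell; a sketch, where the 1G size and the table name t1 are assumptions you should adapt to your own heap and table:

```shell
# System-wide data block cache size; requires a tserver restart to take effect.
config -s tserver.cache.data.size=1G
# Enable the data block cache for the table (t1 is a placeholder table name).
config -t t1 -s table.cache.block.enable=true
```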
Original Mail
From: Eric [email protected]
To: [email protected], [email protected]
Sent: Wednesday, January 14, 2015 00:17
Subject: Re: Re: how can i optimize scan speed when use batch scan ?

You should have received at least 1390 Key/Value pairs (#results=1390). If your application has many exact RowID look-ups, you may want to investigate Bloom filters. Consider turning on data block caching to reduce latency on future look-ups.

-Eric

On Mon, Jan 12, 2015 at 8:15 PM, 覃璐 [email protected] wrote:

I am sorry, I did not know about the image.
The log is this:

[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21521] - tid=65 oid=675 Continuing multi scan, scanid=-152589127623326551
[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.stop(OpTimer.java:49)] [21544] - tid=65 oid=675 Got more multi scan results, #results=1390 scanID=-152589127623326551 in 0.023 secs
[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21546] - tid=65 oid=676 Continuing multi scan, scanid=-152589127623326551
[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.stop(OpTimer.java:49)] [21555] - tid=45 oid=644 Got more multi scan results, #results=0 scanID=-4477962012178388198 in 1.002 secs
[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21555] - tid=45 oid=677 Continuing multi scan, scanid=-4477962012178388198
[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.stop(OpTimer.java:49)] [21596] - tid=57 oid=645 Got more multi scan results, #results=0 scanID=-8718025066902358141 in 1.003 secs
[17:50:38] TRACE [org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator] [org.apache.accumulo.core.util.OpTimer.start(OpTimer.java:39)] [21596] - tid=57 oid=678 Continuing multi scan, scanid=-8718025066902358141

The scan spends a long time but returns no results. I use 1.6.1, and the config output is this:

default | table.balancer ............................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default | table.bloom.enabled ....................... | false
default | table.bloom.error.rate .................... | 0.5%
default | table.bloom.hash.type ..................... | murmur
default | table.bloom.key.functor ................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default | table.bloom.load.threshold ................ | 1
default | table.bloom.size .......................... | 1048576
default | table.cache.block.enable .................. | false
default | table.cache.index.enable .................. | true
default | table.classpath.context ................... |
default | table.compaction.major.everything.idle .... | 1h
default | table.compaction.major.ratio .............. | 3
default | table.compaction.minor.idle ............... | 5m
default | table.compaction.minor.logs.threshold ..... | 3
table   | table.constraint.1 ........................ | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default | table.failures.ignore ..................... | false
default | table.file.blocksize ...................... | 0B
default | table.file.compress.blocksize ............. | 100K
default | table.file.compress.blocksize.index ....... | 128K
default | table.file.compress.type .................. | gz
default | table.file.max ............................ | 15
default | table.file.replication .................... | 0
default | table.file.type ........................... | rf
default | table.formatter ........................... | org.apache.accumulo.core.util.format.DefaultFormatter
default | table.groups.enabled ...................... |
default | table.interepreter ........................ | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table   | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table   | table.iterator.majc.vers.opt.maxVersions .. | 1
table   | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table   | table.iterator.minc.vers.opt.maxVersions .. | 1
table   | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table   | table.iterator.scan.vers.opt.maxVersions .. | 1
default | table.majc.compaction.strategy ............ | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default | table.scan.max.memory ..................... | 512K
default | table.security.scan.visibility.default .... |
default | table.split.threshold ..................... | 1G
default | table.walog.enabled ....................... | true

And my tablet server has 4 cores and 32 GB. Thanks
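The config above shows table.bloom.enabled=false, so Eric's Bloom-filter suggestion would apply here. A sketch from the Accumulo shell (t1 is a placeholder table name; substitute your own):

```shell
# Bloom filters speed up exact-row look-ups. Only newly written files get a
# filter, so compact to rewrite existing files with one.
config -t t1 -s table.bloom.enabled=true
compact -t t1
```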
Original Mail
From: Josh [email protected]
To: [email protected]
Sent: Monday, January 12, 2015 23:52
Subject: Re: Re: how can i optimize scan speed when use batch scan ?

FYI, images don't (typically) come across on the mailing list. Use some external hosting and provide the link if it's important, please. How many tabletservers do you have? What version of Accumulo are you running? Can you share the output of `config -t your_table_name`? Thanks.

覃璐 wrote:

I looked at the trace log. Why does it receive 0 results and take so long?

Original Mail
From: 覃璐 [email protected]
To: [email protected]
Sent: Monday, January 12, 2015 17:05
Subject: how can i optimize scan speed when use batch scan ?

Hi all. Now I have code like this:

    List<Range> rangeList = .....;
    BatchScanner bs = conn.createBatchScanner();
    bs.setRanges(rangeList);

The rangeList has many ranges, about 1000, and every range is an exact random row ID from Range.exact(new Text(...)). But the speed is very slow; it may take 2-3 s. How can I optimize it? Thanks