OfflineScanner is package-private, so I'll need to hack around that. If it
proves to be at least 20% faster, then it's worth having in the public API,
perhaps even letting users ask for a specific file to be scanned rather than
steering the scan by carefully defining a range that touches only the intended
file.
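To decide whether the 20% bar is met, both scan paths have to be timed the same way. Below is a minimal sketch, assuming "20% faster" means the offline scan takes at least 20% less elapsed time than the regular RPC-based scan. `ScanBenchmark` and its methods are hypothetical helpers, not part of Accumulo, and the two `Runnable`s are placeholders for the real scans.

```java
/**
 * Hypothetical micro-benchmark helper (not part of Accumulo): times two scan
 * strategies and checks whether the candidate meets a minimum speedup bar.
 */
final class ScanBenchmark {

    /** Runs the task `runs` times and returns the fastest elapsed time in nanoseconds. */
    static long bestOf(int runs, Runnable task) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    /** Fraction of elapsed time saved relative to the baseline (0.25 means 25% less time). */
    static double timeSaved(long baselineNanos, long candidateNanos) {
        return (baselineNanos - candidateNanos) / (double) baselineNanos;
    }

    /** True when the candidate saves at least `threshold` of the baseline time. */
    static boolean worthIt(long baselineNanos, long candidateNanos, double threshold) {
        return timeSaved(baselineNanos, candidateNanos) >= threshold;
    }

    public static void main(String[] args) {
        // Placeholders: plug in the regular (RPC) scan and the offline RFile scan.
        Runnable rpcScan = () -> { };
        Runnable offlineScan = () -> { };
        long baseline = bestOf(5, rpcScan);
        long candidate = bestOf(5, offlineScan);
        System.out.println("offline scan clears the 20% bar: "
                + worthIt(baseline, candidate, 0.20));
    }
}
```

Taking the best of several runs damps noise from JIT warm-up and GC pauses; for anything more rigorous a harness such as JMH is the safer choice.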

Ara.

On Feb 19, 2015, at 8:15 AM, Keith Turner <[email protected]> wrote:



On Thu, Feb 19, 2015 at 12:57 AM, Ara Ebrahimi <[email protected]> wrote:
Hi,

I'm trying to optimize a connector we've written for Presto. In some cases we
need to perform full table scans. The scan runs across all the nodes, but each
node is assigned to process only a sharded subset of the data, and each shard
is hosted by only one RFile. I'm looking at AbstractInputFormat and
OfflineIterator, and the code doesn't seem that hard to reuse for this case. Is
there any drawback? It seems that if the table is offline, OfflineIterator is
used, which apparently reads the RFiles directly without any RPC, so I think it
should be significantly faster. Is that right? Is there any drawback to using
this while the table is online but no other application is modifying it?
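One way to give each node its own disjoint subset of shards is a deterministic split that every node can compute on its own. Below is a minimal sketch, assuming shards are identified by string IDs and nodes by a 0-based index; `ShardAssigner` is a hypothetical helper, not part of the Presto or Accumulo APIs, and round-robin is just one possible policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not from the Presto or Accumulo APIs): splits a list of
 * shard IDs across worker nodes so each node scans a disjoint subset.
 */
final class ShardAssigner {

    /** Returns the shards node `nodeIndex` (0-based) of `nodeCount` nodes should scan. */
    static List<String> shardsFor(List<String> shardIds, int nodeIndex, int nodeCount) {
        if (nodeCount <= 0 || nodeIndex < 0 || nodeIndex >= nodeCount) {
            throw new IllegalArgumentException(
                    "nodeIndex must be in [0, nodeCount) and nodeCount > 0");
        }
        // Sort a copy so every node derives the same ordering independently.
        List<String> sorted = new ArrayList<>(shardIds);
        Collections.sort(sorted);
        // Round-robin: node k takes positions k, k + nodeCount, k + 2*nodeCount, ...
        List<String> mine = new ArrayList<>();
        for (int i = nodeIndex; i < sorted.size(); i += nodeCount) {
            mine.add(sorted.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> shards = new ArrayList<>();
        for (int s = 0; s < 5; s++) {
            shards.add("shard-" + s);
        }
        for (int node = 0; node < 2; node++) {
            System.out.println("node " + node + " -> " + shardsFor(shards, node, 2));
        }
    }
}
```

Because the split depends only on the sorted shard list and the node count, no coordination between nodes is needed, but every node must see the same shard list.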

The code will throw an exception if the table is not offline (the intent is to
ensure the files are stable and not garbage collected). As others have stated,
you can clone the table and take the clone offline.
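The clone-then-offline workflow can be sketched as follows, assuming a scan callback that reads the offline clone's files. In Accumulo 1.x the real `TableOperations` interface does offer clone, offline, and delete operations, but the `TableOps` interface and its method names here are a hypothetical stand-in so the control flow can be shown without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for the few table operations this workflow needs. */
interface TableOps {
    void cloneTable(String source, String target);
    void takeOffline(String table);
    void deleteTable(String table);
}

final class OfflineSnapshotScan {

    /**
     * Clones the table, takes the clone offline, passes the clone's name to the
     * scan callback, and always deletes the clone afterwards. The source table
     * stays online throughout.
     */
    static void scanSnapshot(TableOps ops, String table, Consumer<String> scan) {
        String clone = table + "_scan_" + System.currentTimeMillis();
        ops.cloneTable(table, clone);
        try {
            ops.takeOffline(clone); // the offline reader rejects online tables
            scan.accept(clone);     // offline clone keeps its files stable and out of GC
        } finally {
            ops.deleteTable(clone); // drop the snapshot even if the scan fails
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        // Recording stub in place of a real cluster connection.
        TableOps ops = new TableOps() {
            public void cloneTable(String source, String target) { calls.add("clone " + source); }
            public void takeOffline(String table) { calls.add("offline"); }
            public void deleteTable(String table) { calls.add("delete"); }
        };
        scanSnapshot(ops, "events", name -> calls.add("scan"));
        System.out.println(calls);
    }
}
```

The `try`/`finally` keeps snapshot clones from piling up when a scan fails partway through.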

Currently, offline scanning is only supported in the public API through
MapReduce. Out of curiosity, would you be interested in seeing it in the client
public API?


Thanks,
Ara.



________________________________

This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise confidential information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of the e-mail by you is prohibited. Thank you in advance for your 
cooperation.

________________________________
