Hi,

I’m trying to optimize a connector we’ve written for Presto. In some cases we need to perform full table scans. The scan runs across all the nodes, but each node is assigned only a sharded subset of the data, and each shard is stored in exactly one RFile.

I’ve been looking at AbstractInputFormat and OfflineIterator, and the code doesn’t look hard to reuse for this case. As far as I can tell, when the table is offline, OfflineIterator is used; it appears to read the RFiles directly without any RPC, so I’d expect it to be significantly faster. Is that right?

Are there any drawbacks to this approach? In particular, are there drawbacks to using it while the table is not offline, provided no other application is touching the table?

Thanks,
Ara.
