Hey guys,
Is there any performance benchmark for the PXF interface? I would like to study the overhead of a large table scan that goes through the PXF REST interface. It seems there is no Parquet HDFS plugin, so there is no direct way to do a head-to-head comparison with and without the PXF framework. Are there any internal benchmark results you can share?

Also, since I haven't seen any detailed documentation on how exactly PXF works, can you correct me if I'm wrong? In my understanding, backend/access/external is the main component that handles PXF calls, so any external table access invokes this module to send a request to the local PXF-SERVICE (on the host where the master node is located). PXF-SERVICE is responsible for picking up the correct Java libraries and constructing filters. It first attempts to get the fragments and then assigns them to the Segment processes (trying to match hostnames for data locality). Each Segment process then talks to its local PXF-SERVER, which calls the Accessor class to fetch data from external storage and passes the result back to the Segment process through the REST API.

Is my understanding correct?

Cheers
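P.S. To make the fragment assignment step I described more concrete, here is a rough Python sketch of how I imagine it working. This is only my mental model, not code taken from the HAWQ or PXF source: the function name `assign_fragments`, the fragment tuples, and the host names are all made up for illustration, and the real allocation logic may well differ.

```python
from itertools import cycle


def assign_fragments(fragments, segment_hosts):
    """Assign each fragment to a segment, preferring a segment whose
    host holds a replica of that fragment (data locality).

    fragments     -- list of (fragment_id, [replica_hostnames])  (hypothetical shape)
    segment_hosts -- list of hostnames, one entry per segment    (hypothetical shape)
    Returns a dict mapping segment index -> list of fragment ids.
    """
    assignment = {i: [] for i in range(len(segment_hosts))}
    # Round-robin fallback for fragments with no local segment.
    fallback = cycle(range(len(segment_hosts)))

    for frag_id, replica_hosts in fragments:
        # Segments running on a host that stores this fragment.
        local = [i for i, host in enumerate(segment_hosts)
                 if host in replica_hosts]
        if local:
            # Among local candidates, pick the least-loaded segment.
            seg = min(local, key=lambda i: len(assignment[i]))
        else:
            seg = next(fallback)
        assignment[seg].append(frag_id)

    return assignment


if __name__ == "__main__":
    frags = [
        ("f0", ["host-a", "host-b"]),
        ("f1", ["host-b"]),
        ("f2", ["host-z"]),  # no segment on this host, falls back
    ]
    print(assign_fragments(frags, ["host-a", "host-b"]))
```

If the real assignment is not locality-aware in this way, or if it happens somewhere other than on the master side, that is exactly the kind of correction I am looking for.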
