Hi,

There is no public performance benchmark for PXF yet, but my team is planning to run a few performance tests and will publish the results on the Apache wiki when they are done.
Internally, my company did some benchmark testing along with our partners and customers; I'm asking around to see whether it's OK to share the results with the community.

Regarding a Parquet plugin: today users can only access Parquet through the provided Hive profile. I'm a little confused by your statement, "It seems there's no Parquet HDFS plugin, so there's no direct way to do head-to-head comparison with/without PXF framework." Whether you use a Parquet PXF plugin or go through the Hive profile, both require PXF, so what do you mean by "*without*"? Can you clarify what exactly you want to compare?

I'd also like to know your typical object model with the Parquet format (Avro? Swift? Hive? Something else?). Our team is considering developing a Parquet PXF plugin, but I want to get some user feedback first, including what the typical problem is without the plugin in play (I suppose it's mainly a performance concern?).

Last but not least, your understanding of the PXF workflow is roughly correct. There are some discrepancies; for example, PXF has a filter push-down feature designed for some predicates, so filtering can also happen on each data node. There are also details around resource management, etc.

Thanks,
Goden

On Thu, Oct 29, 2015 at 1:47 AM mailing-list-recv [email protected] wrote:

> Hey guys,
>
> Is there any performance benchmark for the PXF interface? I would like to
> study the overhead of performing a big table scan by communicating
> through the PXF REST interface.
>
> It seems there's no Parquet HDFS plugin, so there's no direct way to do
> head-to-head comparison with/without PXF framework.
>
> Is there any internal benchmark result to share?
>
> Also, since I haven't seen any detailed documents about how exactly PXF
> works, can you correct me if I'm wrong?
>
> In my understanding, backend/access/external is the main component that
> handles PXF calls, so any external table access will invoke this module to
> send a request to the local PXF-SERVICE (where the master node is located).
> PXF-SERVICE is responsible for picking up the correct Java libraries and
> constructing filters. It will first attempt to get fragments, and then
> assign fragments to each Segment process (trying to match the hostname for
> data locality). Each Segment process then talks to its local PXF-SERVER and
> calls the Accessor class to fetch data from external storage, then
> passes the result back to the Segment process through the REST API.
>
> Is my understanding correct?
>
> Cheers
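
P.S. To make the "match the hostname for data locality" step from the quoted workflow concrete, here is a rough Python sketch. It is only an illustration under assumptions, not PXF's actual allocation code: the function name `assign_fragments`, the tuple layouts, and the least-loaded tie-breaking are all hypothetical.

```python
from collections import defaultdict


def assign_fragments(fragments, segments):
    """Assign each fragment to a segment, preferring a segment on a replica host.

    fragments: list of (fragment_id, [replica_hosts])   # hypothetical layout
    segments:  list of (segment_id, host)               # hypothetical layout
    Returns a dict mapping segment_id -> list of fragment_ids.
    """
    # Index segments by the host they run on.
    segments_by_host = defaultdict(list)
    for segment_id, host in segments:
        segments_by_host[host].append(segment_id)

    # Track how many fragments each segment has received so far.
    load = {segment_id: 0 for segment_id, _ in segments}
    assignment = defaultdict(list)

    for fragment_id, replica_hosts in fragments:
        # Segments co-located with one of the fragment's replicas.
        local = [s for h in replica_hosts for s in segments_by_host.get(h, [])]
        # Fall back to every segment when no replica host runs a segment.
        candidates = local or list(load)
        # Assumption: pick the least-loaded candidate (first one wins ties).
        target = min(candidates, key=lambda s: load[s])
        assignment[target].append(fragment_id)
        load[target] += 1

    return dict(assignment)


if __name__ == "__main__":
    segments = [("seg0", "host-a"), ("seg1", "host-b")]
    fragments = [
        ("f0", ["host-a"]),
        ("f1", ["host-b", "host-a"]),
        ("f2", ["host-c"]),  # no local segment, so it goes to any segment
    ]
    print(assign_fragments(fragments, segments))
```

Fragments whose replicas sit on a host with a segment stay local; the rest are spread by load, so remote reads are the fallback rather than the default.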
