Hi, I checked with our product marketing guys. Unfortunately, we cannot share the results, as they were produced by our partners and are specific to a potential customer.
We'll continue evaluating PXF performance and will share the findings with the community if there are any. Meanwhile, let me know if there are further questions about your Parquet plugin.

-Goden

On Tue, Nov 3, 2015 at 1:59 PM Ting(Goden) Yao <[email protected]> wrote:

> Hi,
>
> There's no public performance benchmark done for PXF. But my team is
> planning to do a few performance tests and will publish the results on the
> Apache Wiki when it's done.
>
> Internally, my company did some benchmark testing along with our partners
> and customers. I'm asking around to see if it's OK to share with the
> community.
>
> In terms of a Parquet plugin: today users can only access Parquet through
> the Hive profile that is provided. I'm a little bit confused by your statement:
>
> > It seems there's no Parquet HDFS plugin, so there's no direct way to do
> > head-to-head comparison with/without PXF framework.
>
> Either you have a Parquet PXF plugin or you go through the Hive profile;
> both require PXF, so what do you mean by “*without*”? Can you clarify what
> exactly you want to compare?
>
> I'd also like to know what your typical object model is with the Parquet
> format (Avro? Thrift? Hive? Something else?). Our team is considering
> developing a Parquet PXF plugin, but I want to get some user feedback, as
> well as learn what the typical problems are without the plugin in play
> (I suppose it's mainly a performance concern?).
>
> Last but not least, your understanding of the PXF workflow is roughly
> correct. One discrepancy: PXF has a filter push-down feature designed for
> some predicates, so filtering can also happen on each data node.
>
> There are also details of resource management, etc.
>
> Thanks
>
> Goden
>
> On Thu, Oct 29, 2015 at 1:47 AM mailing-list-recv [email protected] wrote:
>
>> Hey guys,
>>
>> Is there any performance benchmark for the PXF interface? I would like to
>> study what the overhead is when performing a big table scan by
>> communicating through the PXF REST interface.
>>
>> It seems there's no Parquet HDFS plugin, so there's no direct way to do a
>> head-to-head comparison with/without the PXF framework.
>>
>> Is there any internal benchmark result to share?
>>
>> Also, since I haven't seen any detailed documents about how exactly PXF
>> works, can you correct me if I'm wrong?
>> In my understanding, backend/access/external is the main component that
>> handles PXF calls, so any external table access will invoke this module to
>> send a request to the local PXF-SERVICE (where the master node is located).
>> PXF-SERVICE is responsible for picking up the correct Java libraries and
>> constructing filters. It will first attempt to get fragments, and then
>> assign fragments to each Segment process (trying to match the hostname for
>> data locality). Each Segment process then talks to its local PXF-SERVER and
>> calls the Accessor class to fetch data from external storage, which passes
>> the result back to the Segment process through the REST API.
>>
>> Is my understanding correct?
>>
>> Cheers
