Hi,

There’s no public performance benchmark for PXF yet, but my team is
planning to run a few performance tests and will publish the results on
the Apache wiki when they’re done.

Internally, my company did some benchmark testing along with our partners
and customers. I’m asking around to see if it’s OK to share the results
with the community.

In terms of the Parquet plugin: today users can only access Parquet through
the provided Hive profile. I’m a little confused by your statement:

> It seems there’s no Parquet HDFS plugin, so there’s no direct way to do
> head-to-head comparison with/without PXF framework.

Whether you use a dedicated Parquet PXF plugin or go through the Hive
profile, both paths require PXF, so what do you mean by “*without*”? Can
you clarify what exactly you want to compare?

I'd also like to know your typical object model with the Parquet format
(Avro? Swift? Hive? Something else?). Our team is considering developing
a Parquet PXF plugin, but I want to get some user feedback first, including
what the typical problems are without the plugin in play. (I suppose it's
mainly a performance concern?)

Last but not least, your understanding of the PXF workflow is roughly
correct. One discrepancy: PXF has a filter push-down feature for some
predicates, so filtering can also happen on each data node.
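
To illustrate why push-down matters for your overhead question, here is a minimal sketch in Python. It is not PXF code: `read_fragment` and `scan` are hypothetical names, and the "data node" is just an in-memory list. It only shows that applying the predicate where the data lives reduces the number of rows that have to travel back to the segment.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def read_fragment(rows: Iterable[Row],
                  pushed_filter: Optional[Predicate] = None) -> List[Row]:
    """Hypothetical data-node read: filter early if a predicate was pushed down."""
    if pushed_filter is None:
        return list(rows)
    return [row for row in rows if pushed_filter(row)]


def scan(fragments: List[List[Row]], predicate: Predicate,
         push_down: bool) -> Tuple[List[Row], int]:
    """Return the matching rows and how many rows crossed the 'wire'."""
    transferred = 0
    result: List[Row] = []
    for rows in fragments:
        returned = read_fragment(rows, predicate if push_down else None)
        transferred += len(returned)
        # The segment still applies the predicate, so results match either way.
        result.extend(row for row in returned if predicate(row))
    return result, transferred


fragments = [[{"id": 1}, {"id": 5}], [{"id": 7}, {"id": 2}]]
matches, sent_with = scan(fragments, lambda r: r["id"] > 4, push_down=True)
_, sent_without = scan(fragments, lambda r: r["id"] > 4, push_down=False)
```

Both calls return the same matching rows; only the transferred row count differs, which is the part a table-scan benchmark would be sensitive to.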

There are also further details, such as resource management.
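
On the fragment assignment step you described (matching hostnames for data locality), a rough sketch of the idea in Python follows. This is only an illustration of the description in your mail, not the actual PXF or HAWQ algorithm: `assign_fragments` is a hypothetical name, it assumes one segment per host, and the round-robin fallback for non-local fragments is my own assumption.

```python
from itertools import cycle
from typing import Dict, List, Tuple


def assign_fragments(fragments: List[Tuple[str, str]],
                     segment_hosts: List[str]) -> Dict[str, List[str]]:
    """Assign (fragment_id, data_host) pairs to segment hosts.

    A fragment goes to the segment on the same host when there is one;
    otherwise it is handed out round-robin (an assumed fallback policy).
    """
    assignment: Dict[str, List[str]] = {host: [] for host in segment_hosts}
    fallback = cycle(segment_hosts)
    for fragment_id, data_host in fragments:
        target = data_host if data_host in assignment else next(fallback)
        assignment[target].append(fragment_id)
    return assignment


plan = assign_fragments(
    [("f1", "node1"), ("f2", "node2"), ("f3", "node9"), ("f4", "node1")],
    ["node1", "node2"],
)
```

Here `f3` lives on a host with no segment, so it falls back to the first segment host; the other fragments stay local.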

Thanks

Goden

On Thu, Oct 29, 2015 at 1:47 AM mailing-list-recv [email protected] wrote:

> Hey guys,
>
> Is there any performance benchmark for the PXF interface? I would like to
> study the overhead of performing a big table scan when communicating
> through the PXF REST interface.
>
> It seems there's no Parquet HDFS plugin, so there's no direct way to do
> head-to-head comparison with/without PXF framework.
>
> Is there any internal benchmark result to share?
>
> Also, since I haven't seen any detailed documents about how exactly PXF
> works, can you correct me if I'm wrong?
> In my understanding, backend/access/external is the main component that
> handles PXF calls, so any external table access invokes this module to
> send a request to the local PXF-SERVICE (where the master node is located).
> PXF-SERVICE is responsible for picking up the correct Java libraries and
> constructing filters. It first attempts to get the fragments and then
> assigns fragments to each Segment process (trying to match the hostname for
> data locality). Each Segment process then talks to the local PXF-SERVER,
> which calls the Accessor class to fetch data from external storage and then
> passes the result back to the Segment process through the REST API.
>
> Is my understanding correct?
>
> Cheers
>
