Hi, I checked with our product marketing guys. Unfortunately, we cannot share
the results, as they were produced by our partners and are specific to a
potential customer.

We'll continue evaluating PXF performance and will share any findings with
the community.
Meanwhile, let me know if there are further questions about your Parquet
plugin.

-Goden

On Tue, Nov 3, 2015 at 1:59 PM Ting(Goden) Yao <[email protected]> wrote:

> Hi,
>
> There’s no public performance benchmark for PXF yet. But my team is
> planning to run a few performance tests and will publish the results on
> the Apache wiki when they’re done.
>
> Internally, my company did some benchmark testing along with our partners
> and customers. I’m asking around to see if it’s OK to share the results
> with the community.
>
> In terms of the Parquet plugin: today users can only access Parquet through
> the provided Hive profile. I’m a little bit confused by your statement:
>
> It seems there’s no Parquet HDFS plugin, so there’s no direct way to do
> head-to-head comparison with/without PXF framework.
>
> Whether you have a Parquet PXF plugin or go through the Hive profile, both
> require PXF, so what do you mean by “*without*”? Can you clarify what
> exactly you want to compare?
>
> I'd also like to know what your typical object model with the Parquet
> format is (Avro? Swift? Hive? or something else?). Our team is considering
> developing a Parquet PXF plugin, but I want to get some user feedback
> first, including what the typical problem is without the plugin in play
> (I suppose it's mainly a performance concern?).
>
> Last but not least, your understanding of the PXF workflow is roughly
> correct. One discrepancy: PXF has a filter push-down feature designed for
> some predicates, so filtering can also happen on each data node.
>
> There are also details of resource management, etc.
>
> Thanks
>
> Goden
>
> On Thu, Oct 29, 2015 at 1:47 AM mailing-list-recv
> [email protected] wrote:
>
>> Hey guys,
>>
>> Is there any performance benchmark for the PXF interface? I would like to
>> study the overhead of performing a big table scan by communicating
>> through the PXF REST interface.
>>
>> It seems there's no Parquet HDFS plugin, so there's no direct way to do
>> head-to-head comparison with/without PXF framework.
>>
>> Is there any internal benchmark result to share?
>>
>> Also, since I haven't seen any detailed documents about exactly how PXF
>> works, can you correct me if I'm wrong?
>> In my understanding, backend/access/external is the main component that
>> handles PXF calls, so any external table access will invoke this module to
>> send a request to the local PXF-SERVICE (where the master node is located).
>> PXF-SERVICE is responsible for picking up the correct Java libraries and
>> constructing filters. It first attempts to get the fragments and then
>> assigns them to the Segment processes (trying to match hostnames for data
>> locality). Each Segment process then talks to its local PXF-SERVER and
>> calls the Accessor class to fetch data from external storage; the result
>> is then passed back to the Segment process through the REST API.
>>
>> Is my understanding correct?
>>
>> Cheers
>>

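For readers following the thread, the fragment-assignment step described in
the quoted question (fragments handed to Segment processes, preferring a
hostname match for data locality) can be sketched roughly as below. This is
only an illustration of the idea, not PXF's actual code: the function name,
the data shapes, and the round-robin fallback for fragments with no local
segment are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import cycle


def assign_fragments(fragments, segments):
    """Hypothetical locality-aware assignment (not PXF's real algorithm).

    fragments: list of (fragment_id, [hosts holding a replica])
    segments:  list of (segment_id, host the segment runs on)
    Returns {segment_id: [fragment_id, ...]}.
    """
    # Index segments by host so local candidates can be looked up quickly.
    segments_by_host = defaultdict(list)
    for seg_id, host in segments:
        segments_by_host[host].append(seg_id)

    assignment = {seg_id: [] for seg_id, _ in segments}
    # Assumed fallback: plain round-robin when no segment is local.
    fallback = cycle([seg_id for seg_id, _ in segments])

    for frag_id, hosts in fragments:
        local = [s for h in hosts for s in segments_by_host.get(h, [])]
        if local:
            # Prefer the least-loaded segment among the local ones.
            target = min(local, key=lambda s: len(assignment[s]))
        else:
            target = next(fallback)
        assignment[target].append(frag_id)
    return assignment


segments = [("seg0", "node1"), ("seg1", "node2")]
fragments = [
    ("f0", ["node1"]),
    ("f1", ["node2"]),
    ("f2", ["node3"]),           # no segment on node3: falls back
    ("f3", ["node1", "node2"]),  # local to both: goes to the lighter one
]
result = assign_fragments(fragments, segments)
```

Each Segment process would then fetch only the fragments in its own list,
ideally from the PXF instance on the same host.
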