Thanks Jimmy,

That’s a very helpful explanation.

It looks like backend/access/external and bin/gpfusion contain the main code 
paths where PXF requests are sent. I’m proposing to add an interface that 
calls our C API when the given URI indicates that the data source is located 
in our own system.

It seems we should override the pxfwritable_export and pxfwritable_import 
interfaces. Is that correct?
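A minimal sketch of the routing decision proposed above, written in C like the HAWQ backend. It assumes the LOCATION string has the shape used elsewhere in this thread ('pxf://cent61:50070/member?PROFILE=HBase') and that a hypothetical PROFILE value "SequoiaDB" marks a source in our own system. pxf_location, parse_pxf_location and use_native_api are illustrative names, not existing HAWQ or PXF functions. Where the check would be hooked in (the pxfwritable_import/pxfwritable_export formatters or the request code in backend/access/external) is the open question above, so the sketch covers only the decision itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>

/* Pieces of a LOCATION such as 'pxf://cent61:50070/member?PROFILE=HBase'. */
typedef struct {
    char authority[256]; /* host:port */
    char path[256];      /* data source path, e.g. "member" */
    char profile[64];    /* value of PROFILE=..., empty if absent */
} pxf_location;

/* Returns false if the string is not a pxf:// URI or a piece is too long. */
static bool parse_pxf_location(const char *uri, pxf_location *out)
{
    static const char scheme[] = "pxf://";
    assert(uri != NULL && out != NULL);
    memset(out, 0, sizeof *out);               /* keeps every field NUL-terminated */
    if (strncmp(uri, scheme, sizeof scheme - 1) != 0)
        return false;

    const char *p = uri + sizeof scheme - 1;
    size_t auth_len = strcspn(p, "/");         /* host:port runs up to the first '/' */
    if (auth_len == 0 || auth_len >= sizeof out->authority || p[auth_len] != '/')
        return false;
    memcpy(out->authority, p, auth_len);
    p += auth_len + 1;

    size_t path_len = strcspn(p, "?");         /* path runs up to the options */
    if (path_len >= sizeof out->path)
        return false;
    memcpy(out->path, p, path_len);
    p += path_len;

    /* Scan the '&'-separated options after '?' for PROFILE=<value>. */
    while (*p == '?' || *p == '&') {
        p++;
        size_t opt_len = strcspn(p, "&");
        if (opt_len > 8 && strncasecmp(p, "PROFILE=", 8) == 0) {
            size_t value_len = opt_len - 8;
            if (value_len >= sizeof out->profile)
                return false;
            memcpy(out->profile, p + 8, value_len);
        }
        p += opt_len;
    }
    return true;
}

/* Hypothetical rule: route to our native C API instead of the PXF daemon
 * when the profile names our own system ("SequoiaDB" is a placeholder). */
static bool use_native_api(const pxf_location *loc)
{
    return strcasecmp(loc->profile, "SequoiaDB") == 0;
}
```

Parsing the table definition from this thread yields authority "cent61:50070", path "member" and profile "HBase", so use_native_api() returns false and the request would still go to the PXF daemon.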

Thanks

> On Oct 30, 2015, at 12:01 PM, Jimmy Da <[email protected]> wrote:
> 
> Great job on linking the right classpaths!
> 
> In terms of resource consumption, the PXF daemon shouldn't use more than 
> about 1 GB, plus some off-heap stack/memory. Cf.
> https://github.com/apache/incubator-hawq/blob/master/pxf/gradlew#L10
> 
> In terms of performance, the slowdown compared with the HBase shell is 
> unavoidable, as the two go through different paths to retrieve the data.
> 
> In the HBase shell, the client talks to the HBase Master and RegionServer and 
> gets the data in an optimal way, where the data could even be warm in the 
> HFile cache (in-memory store).
> 
> With PXF, the Java daemon reads the HDFS location in your CREATE EXTERNAL 
> TABLE definition, talks to the NameNode to find the block locations containing 
> the HFile (on disk), uses the HBase Java file reader to read the data with 
> some serde, and then sends the results to the local HAWQ segments, where 
> query processing happens.
> 
> PXF is built to generalize data access across different systems (the 
> previous paragraph could equally apply to reading HDFS files, Hive files, or 
> name-your-own-system). The additional overhead mostly comes from retrieving 
> the initial metadata. It would be an interesting experiment to run the same 
> query on a larger data set and see whether the performance difference is 
> additive or multiplicative.
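The only data point in this thread is 434.412 ms through PXF versus 0.0540 s (54 ms) in the HBase shell, a ratio of about 8x and a gap of about 380 ms; the two tools also time slightly different things. A second run on a larger data set would separate the two hypotheses: a roughly constant gap suggests a fixed per-query cost such as metadata retrieval, while a roughly constant ratio suggests a per-row cost. The sketch below encodes that comparison; timing_pair, classify_overhead and the 20% tolerance used in the usage note are illustrative choices, not anything PXF provides.

```c
#include <assert.h>

/* One measurement of the same query through PXF and through the HBase shell. */
typedef struct {
    double pxf_ms;
    double shell_ms;
} timing_pair;

typedef enum {
    OVERHEAD_ADDITIVE,       /* gap stays constant: fixed per-query cost */
    OVERHEAD_MULTIPLICATIVE, /* ratio stays constant: per-row cost */
    OVERHEAD_UNCLEAR         /* both or neither hypothesis fits */
} overhead_kind;

static double abs_d(double x) { return x < 0 ? -x : x; }

/* Compares a run on the original data set with a run on a larger one.
 * tol is a relative tolerance (0.2 means "within 20%"). */
static overhead_kind classify_overhead(timing_pair base, timing_pair larger, double tol)
{
    assert(base.shell_ms > 0 && larger.shell_ms > 0 && tol >= 0);
    double gap_base = base.pxf_ms - base.shell_ms;
    double gap_larger = larger.pxf_ms - larger.shell_ms;
    double ratio_base = base.pxf_ms / base.shell_ms;
    double ratio_larger = larger.pxf_ms / larger.shell_ms;

    int additive = abs_d(gap_larger - gap_base) <= tol * abs_d(gap_base);
    int multiplicative = abs_d(ratio_larger - ratio_base) <= tol * ratio_base;

    if (additive && !multiplicative)
        return OVERHEAD_ADDITIVE;
    if (multiplicative && !additive)
        return OVERHEAD_MULTIPLICATIVE;
    return OVERHEAD_UNCLEAR;
}

/* The measurement from this thread: 434.412 ms via PXF, 0.0540 s in the shell. */
static const timing_pair thread_measurement = { 434.412, 54.0 };
```

Call it as classify_overhead(thread_measurement, larger_run, 0.2), where larger_run holds timings you have actually measured on a bigger table; none are available in this thread.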
> 
> Noa, correct me if I made a mistake :)
> 
> Jimmy Da
> That’s what people do: they leap, and hope to God they can fly.
> 
> On Thu, Oct 29, 2015 at 6:49 PM, sequoiadb <[email protected]> wrote:
> Creating a soft link from /usr/phd to /usr/hdp made pxf-service start 
> successfully.
> 
> Just curious: what’s the overhead of using PXF?
> 
> postgres=# select * from hbase_member;
>  recordkey  | address:city | address:contry | address:province | info:age | info:birthday | info:company
> ------------+--------------+----------------+------------------+----------+---------------+--------------
>  scutshuxue | hangzhou     | china          | zhejiang         |       99 | 1987-06-17    | alibaba
>  xiaofeng   | jieyang      | china          | guangdong        |          | 1987-4-17     | alibaba
> (2 rows)
> 
> Time: 434.412 ms
> 
> hbase(main):004:0* scan 'member'
> ROW                               COLUMN+CELL
>  scutshuxue                       column=address:city, timestamp=1446104911726, value=hangzhou
>  scutshuxue                       column=address:contry, timestamp=1446104910743, value=china
>  scutshuxue                       column=address:province, timestamp=1446104910775, value=zhejiang
>  scutshuxue                       column=info:age, timestamp=1446104987420, value=99
>  scutshuxue                       column=info:birthday, timestamp=1446104910674, value=1987-06-17
>  scutshuxue                       column=info:company, timestamp=1446104910715, value=alibaba
>  xiaofeng                         column=address:city, timestamp=1446104920523, value=jieyang
>  xiaofeng                         column=address:contry, timestamp=1446104920461, value=china
>  xiaofeng                         column=address:province, timestamp=1446104920486, value=guangdong
>  xiaofeng                         column=address:town, timestamp=1446104921802, value=xianqiao
>  xiaofeng                         column=info:birthday, timestamp=1446104920358, value=1987-4-17
>  xiaofeng                         column=info:company, timestamp=1446104920423, value=alibaba
>  xiaofeng                         column=info:favorite, timestamp=1446104920397, value=movie
> 2 row(s) in 0.0540 seconds
> 
> It’s very slow compared to running it in the HBase shell.
> 
> 
>> On Oct 29, 2015, at 8:33 PM, Noa Horn <[email protected]> wrote:
>> 
>> The problem is probably that the jars required by PXF cannot be found.
>> 
>> For example, this error in the attached log file shows that hadoop-auth.jar 
>> is not found:
>> 29-Oct-2015 16:37:33.405 WARNING [localhost-startStop-1] com.pivotal.pxf.service.utilities.CustomWebappLoader.addRepositories Failed to load entry /usr/phd/current/hadoop-client/hadoop-auth.jar: java.nio.file.NoSuchFileException: /usr/phd/current/hadoop-client
>> 
>> Have a look at the file pxf-private.classpath in /etc/conf/gphd/pxf (old 
>> version) or /etc/conf/pxf (open source version).
>> Every resource listed there is required by PXF.
>> The default paths for these resources are under /usr/phd/... (Pivotal 
>> distribution), but your system is HDP, so the paths are different. Luckily, 
>> we also provide the paths for the HDP distribution in 
>> pxf-privatehdp.classpath. If you copy the contents of that file into 
>> pxf-private.classpath and run init and start again, it should work.
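Rather than fixing one "Failed to load entry" warning at a time, every entry in the classpath file can be checked in one pass. A minimal sketch, assuming one path per line, '#' comments, blank lines ignored, and entries that may contain shell wildcards such as *.jar; that file format is an assumption, so compare it against your copy of pxf-private.classpath. count_missing_classpath_entries is a hypothetical helper, not part of PXF, and it does not reproduce how CustomWebappLoader itself resolves entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Prints every entry of a classpath file that cannot be found and returns
 * how many were missing, or -1 if the file itself cannot be opened. */
static int count_missing_classpath_entries(const char *classpath_file)
{
    assert(classpath_file != NULL);
    FILE *f = fopen(classpath_file, "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int missing = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';          /* drop the newline */
        char *entry = line + strspn(line, " \t");     /* skip leading blanks */
        size_t len = strlen(entry);
        while (len > 0 && (entry[len - 1] == ' ' || entry[len - 1] == '\t'))
            entry[--len] = '\0';                      /* and trailing blanks */
        if (len == 0 || entry[0] == '#')
            continue;                                 /* blank line or comment */

        int found;
        if (strpbrk(entry, "*?[") != NULL) {          /* wildcard: need >= 1 match */
            glob_t g;
            memset(&g, 0, sizeof g);
            found = glob(entry, 0, NULL, &g) == 0 && g.gl_pathc > 0;
            globfree(&g);
        } else {
            found = access(entry, R_OK) == 0;         /* plain path: must be readable */
        }
        if (!found) {
            printf("missing: %s\n", entry);
            missing++;
        }
    }
    fclose(f);
    return missing;
}
```

For example, count_missing_classpath_entries("/etc/conf/pxf/pxf-private.classpath") on the machine in this thread would have listed the /usr/phd/... entries, such as hadoop-auth.jar, all at once.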
>> 
>> As an aside, we highly recommend compiling and using the open source 
>> version, because we made a few changes to the RPMs.
>> From the pxf directory, run 'make tomcat' to generate a Tomcat RPM (required 
>> by PXF) and 'make rpm' to compile PXF and create the PXF RPMs.
>> 
>> Noa
>> 
>> 
>> On Wed, Oct 28, 2015 at 11:38 PM, mailing-list-recv <[email protected]> wrote:
>> Thanks guys,
>> 
>> I’m not sure whether the mailing list supports attachments, but let me try anyway.
>> 
>> The status command shows the following:
>> [root@cent61 ~]# service pxf-service status
>> Checking if tcServer is up and running...
>> Checking if PXF webapp is up and running...
>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more information
>> 
>> I was using the binary version downloaded from the site. I haven't tried to 
>> compile from open source yet.
>> 
>> Port 51200 is open:
>> [root@cent61 logs]# cat tcserver.pid
>> 8385
>> [root@cent61 logs]# ps -elf | grep 8385
>> 0 S pxf       8385     1  0  80   0 - 312017 futex_ Oct29 ?       00:00:40 /usr/jdk64/jdk1.7.0_67/bin/java -Djava.util.logging.config.file=/var/gphd/pxf/pxf-service/conf/logging.properties -Djava.util.logging.manager=com.springsource.tcserver.serviceability.logging.TcServerLogManager -Xmx512M -Xss256K -Djava.endorsed.dirs=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/endorsed -classpath /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar -Dcatalina.base=/var/gphd/pxf/pxf-service -Dcatalina.home=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE -Djava.io.tmpdir=/var/gphd/pxf/pxf-service/temp org.apache.catalina.startup.Bootstrap start
>> 4 S root     23247 22386  0  80   0 - 25813 pipe_w 14:35 pts/2    00:00:00 grep 8385
>> [root@cent61 logs]# netstat -anp | grep 8385
>> tcp        0      0 ::ffff:127.0.0.1:6969       :::*                        LISTEN      8385/java
>> tcp        0      0 :::51200                    :::*                        LISTEN      8385/java
>> unix  2      [ ]         STREAM     CONNECTED     2344585 8385/java
>> unix  2      [ ]         STREAM     CONNECTED     2344417 8385/java
>> 
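The ps and netstat output above already shows that the JVM named in tcserver.pid is alive and listening on 51200, which is why "tcServer is up" passes. The pid-file half of that kind of check can be sketched in C: read the pid and call kill(pid, 0), which performs the existence and permission checks without delivering a signal. This illustrates the idea only; it is not a description of how the pxf-service status script actually works, and pid_file_process_alive is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Reads a pid from a file such as tcserver.pid and reports whether that
 * process exists. Returns 1 if it does, 0 if it does not, and -1 if the
 * pid file is missing or does not start with a positive number. */
static int pid_file_process_alive(const char *pid_file)
{
    assert(pid_file != NULL);
    FILE *f = fopen(pid_file, "r");
    if (f == NULL)
        return -1;
    long pid = 0;
    int ok = (fscanf(f, "%ld", &pid) == 1 && pid > 0);  /* skips leading blank lines */
    fclose(f);
    if (!ok)
        return -1;
    /* Signal 0 is never delivered; kill() only runs the existence checks. */
    if (kill((pid_t)pid, 0) == 0)
        return 1;
    return errno == EPERM ? 1 : 0;  /* EPERM: exists, but owned by another user */
}
```

A live pid only proves that Tomcat is running; as the rest of this thread shows, the /pxf webapp inside it can still have failed to deploy.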
>> 
>> 
>> Cheers
>> 
>> 
>> 
>> 
>> On 2015-10-29 03:22:48, "Jimmy Da" <[email protected]> wrote:
>> So it seems that the Tomcat server is up, but the PXF servlet has not 
>> started. To confirm this, you can run "pxf-service status" to double-check 
>> whether the PXF service is running.
>> 
>> My guess is that the Java libraries were not loaded correctly. I am looking 
>> at this line:
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>> 
>> Can you double-check that all the jar files can be found at the locations 
>> listed in this file?
>> https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/resources/pxf-privatehdp.classpath
>> 
>> Jimmy Da
>> 
>> On Wed, Oct 28, 2015 at 12:03 PM, Ting(Goden) Yao <[email protected]> wrote:
>> Hi sequoiadb, 
>> 
>> Which HAWQ/PXF version are you using? Did you compile the open source 
>> version yourself, or are you using one of the former Pivotal-released HAWQ 
>> versions?
>> 
>> Can you also attach the PXF logs for investigation?
>> They are at var/log/gphd/
>> 
>> -Goden
>> 
>> On Wed, Oct 28, 2015 at 1:51 AM sequoiadb <[email protected]> wrote:
>> Hi guys,
>> 
>> I’m trying to set up PXF for HBase and got the following error:
>> tpch=# create external table hbase_member ( recordkey bytea, "address:city" varchar, "address:contry" varchar, "address:province" varchar, "info:age" int, "info:birthday" varchar, "info:company" varchar ) location ( 'pxf://cent61:50070/member?PROFILE=HBase') FORMAT 'CUSTOM'( FORMATTER='pxfwritable_import');
>> CREATE EXTERNAL TABLE
>> tpch=# select * from hbase_member;
>> ERROR:  remote component error (0) from '192.168.31.205:51200': couldn't connect to host (libchurl.c:852)
>> 
>> I can successfully create regular tables and run queries, but when I try to 
>> use PXF tables I keep getting an error connecting to port 51200.
>> 
>> So I tried to initialize and start pxf-service and got:
>> [root@cent61 profile.d]# service pxf-service init
>> Creating instance 'pxf-service' ...
>>   Using separate layout
>>   Creating bin/setenv.sh
>>   Applying template 'base'
>>     Copying template's contents
>>     Applying fragment 'context-fragment.xml' to 'conf/context.xml'
>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>     Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>>     Applying fragment 'tomcat-users-fragment.xml' to 'conf/tomcat-users.xml'
>>     Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
>>   Applying template 'base-tomcat-7'
>>     Copying template's contents
>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>     Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>>     Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
>>   Applying template 'bio'
>>     Copying template's contents
>>     Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>>   Configuring instance 'pxf-service' to use Tomcat version 7.0.55.A.RELEASE
>>   Setting permissions
>> Instance created
>> Connector summary
>>   Port: 51200   Type: Blocking IO   Secure: false
>> [root@cent61 profile.d]# service pxf-service start
>> /var/gphd/pxf /
>> Creating home directory for pxf.
>> Using CATALINA_BASE:   /var/gphd/pxf/pxf-service
>> Using CATALINA_HOME:   /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
>> Using CATALINA_TMPDIR: /var/gphd/pxf/pxf-service/temp
>> Using JRE_HOME:        /usr/jdk64/jdk1.7.0_67
>> Using CLASSPATH:       /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
>> Using CATALINA_PID:    /var/gphd/pxf/pxf-service/logs/tcserver.pid
>> Tomcat started.
>> Status:                RUNNING as PID=8385
>> /
>> Checking if tcServer is up and running...
>> tcServer not responding, re-trying after 1 second (attempt number 1)
>> tcServer not responding, re-trying after 1 second (attempt number 2)
>> Checking if PXF webapp is up and running...
>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more information
>> 
>> Now the select statement shows another error:
>> tpch=# select * from base_member;
>> ERROR:  GPHD component not found (libchurl.c:1058)
>> 
>> It looks like I hit this error:
>> bool handle_special_error(long response)
>> {
>>      switch (response)
>>      {
>>              case 404:
>>                      elog(ERROR, "GPHD component not found");
>>                      break;
>>              default:
>>                      return false;
>>      }
>>      return true;
>> }
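The two errors in this thread come from different layers. "couldn't connect to host" means the TCP connection to port 51200 failed, so nothing was listening yet; "GPHD component not found" is the 404 branch above, meaning the connection succeeded and Tomcat answered with HTTP 404, which fits Tomcat running while the /pxf webapp failed to deploy. A minimal sketch that tells the two cases apart using a plain HTTP/1.0 request over a POSIX socket. It is not how libchurl does it (libchurl goes through its own HTTP client code), probe_http_status and parse_status_line are hypothetical names, and the request path is a placeholder because the thread does not say which PXF URL is requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PROBE_CONNECT_FAILED (-1) /* like "couldn't connect to host" */
#define PROBE_BAD_REPLY      (-2) /* connected, but no parseable status line */

/* Extracts the status code from a line such as "HTTP/1.1 404 Not Found". */
static int parse_status_line(const char *reply)
{
    int status;
    if (sscanf(reply, "HTTP/%*d.%*d %d", &status) != 1)
        return PROBE_BAD_REPLY;
    return status;
}

/* Connects to host:port, sends "GET path HTTP/1.0" and returns the HTTP
 * status code, or one of the PROBE_* values above. */
static int probe_http_status(const char *host, const char *port, const char *path)
{
    assert(host != NULL && port != NULL && path != NULL);
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return PROBE_CONNECT_FAILED;

    int fd = -1;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return PROBE_CONNECT_FAILED;

    char req[512];
    int n = snprintf(req, sizeof req, "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    if (n < 0 || (size_t)n >= sizeof req || write(fd, req, (size_t)n) != (ssize_t)n) {
        close(fd);
        return PROBE_BAD_REPLY;
    }

    /* Only the status line matters; an HTTP/1.0 server closes after replying. */
    char buf[128];
    size_t got = 0;
    ssize_t r;
    while (got < sizeof buf - 1 && (r = read(fd, buf + got, sizeof buf - 1 - got)) > 0)
        got += (size_t)r;
    close(fd);
    buf[got] = '\0';
    return parse_status_line(buf);
}
```

Against the host in this thread, probe_http_status("192.168.31.205", "51200", path) would return PROBE_CONNECT_FAILED before pxf-service was started. Once it was running, a 404 on a URL that the deployed webapp is known to serve points at the deployment failure shown in the catalina log.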
>> 
>> Do I need some sort of web service running in order to make it work?
>> Is it caused by the PXF web app failing to run? Which log am I supposed to 
>> look at?
>> The catalina log shows this, and I’m not sure if it’s the right one to look at:
>> 29-Oct-2015 16:37:34.923 SEVERE [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal ContainerBase.addChild: start: 
>>  org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>>      at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
>>      at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
>>      at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>>      at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>>      at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>>      at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>>      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.NoClassDefFoundError: Lorg/apache/commons/logging/Log;
>>      at java.lang.Class.getDeclaredFields0(Native Method)
>>      at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
>>      at java.lang.Class.getDeclaredFields(Class.java:1806)
>>      at org.apache.catalina.util.Introspection.getDeclaredFields(Introspection.java:106)
>>      at org.apache.catalina.startup.WebAnnotationSet.loadFieldsAnnotation(WebAnnotationSet.java:270)
>>      at org.apache.catalina.startup.WebAnnotationSet.loadApplicationListenerAnnotations(WebAnnotationSet.java:89)
>>      at org.apache.catalina.startup.WebAnnotationSet.loadApplicationAnnotations(WebAnnotationSet.java:63)
>>      at org.apache.catalina.startup.ContextConfig.applicationAnnotationsConfig(ContextConfig.java:403)
>>      at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:879)
>>      at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:374)
>>      at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
>>      at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
>>      at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5378)
>>      at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>>      ... 10 more
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>>      at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>      at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>      ... 24 more
>> 
>> 29-Oct-2015 16:37:34.924 SEVERE [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Error deploying web application archive /var/gphd/pxf/pxf-service/webapps/pxf.war
>>  java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>>      at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:904)
>>      at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>>      at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>>      at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>>      at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>>      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> 
>> I’m running on a previously built HDP 2.2.8 cluster and performed a manual 
>> HAWQ installation. I got most parts done but am stuck on the PXF component; 
>> any help would be appreciated.
>> 
>> Thanks
>> 
>>  
>> 
>> 
> 
> 
