Thanks Jimmy, that’s a very helpful explanation.
It looks like backend/access/external and bin/gpfusion are the main code paths where PXF requests are sent. I’m proposing to add an interface that calls our C API when the given URI indicates that the data source is located in our own system. It seems we should override the pxfwritable_export and pxfwritable_import interfaces; is that correct?

Thanks

> On Oct 30, 2015, at 12:01 PM, Jimmy Da <[email protected]> wrote:
>
> Great job on linking the right classpaths!
>
> In terms of resource consumption, the pxf daemon shouldn't use more than 1GB and
> then some (off-heap stack/memory). C.f.
> https://github.com/apache/incubator-hawq/blob/master/pxf/gradlew#L10
>
> In terms of performance, the slowdown is unavoidable compared with the hbase
> shell, as the two go through different paths to retrieve the data.
>
> In the hbase shell, the client talks to HBaseMaster and RegionServer and gets
> data in an optimal way, where the data could even be warm in the HFile cache
> (in memory store).
>
> With PXF, the Java daemon reads the hdfs location in your CREATE EXTERNAL
> TABLE definition, talks to the NAMENODE to find out the block locations containing
> the HFile (on disk), then uses the HBase java file reader to read the data
> with some serde, and then sends the results to the local HAWQ segments, where
> query processing will happen.
>
> PXF is built in a way that it generalizes data access to different systems
> (the previous paragraph could also apply to reading HDFS files, Hive files,
> name-your-own-system). The additional overhead mostly comes from retrieving
> the initial metadata. I suppose it would be an interesting experiment to run
> with a larger data set and see if the performance difference is
> additive or multiplicative.
>
> Noa, correct me if I made a mistake :)
>
> Jimmy Da
> That’s what people do, they leap, and hoping to God they can fly.
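
To make the proposal at the top of this mail a bit more concrete, here is a rough sketch of the routing decision I have in mind. It is written in Python only for brevity (the real change would be C inside HAWQ). The `sequoiadb` scheme, `NATIVE_SCHEMES`, and `uses_native_api` are names I made up for illustration; none of them exist in the HAWQ tree.

```python
from urllib.parse import urlsplit

# Hypothetical URI scheme for data that lives in our own system.
# This is an assumption for illustration, not something HAWQ defines.
NATIVE_SCHEMES = {"sequoiadb"}


def uses_native_api(location: str) -> bool:
    """Return True if an external-table LOCATION should bypass PXF.

    Only the URI scheme is inspected; every other scheme (including
    pxf://) is left to the existing handlers.
    """
    scheme = urlsplit(location).scheme.lower()
    return scheme in NATIVE_SCHEMES
```

With this, `uses_native_api("pxf://cent61:50070/member?PROFILE=HBase")` stays on the current PXF path, and only locations with the hypothetical scheme would be routed to our C API.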
>
> On Thu, Oct 29, 2015 at 6:49 PM, sequoiadb <[email protected]> wrote:
> Creating a soft link from /usr/phd to /usr/hdp makes pxf-service start
> successfully.
>
> Just curious, what’s the overhead of using PXF?
>
> postgres=# select * from hbase_member;
>  recordkey  | address:city | address:contry | address:province | info:age | info:birthday | info:company
> ------------+--------------+----------------+------------------+----------+---------------+--------------
>  scutshuxue | hangzhou     | china          | zhejiang         |       99 | 1987-06-17    | alibaba
>  xiaofeng   | jieyang      | china          | guangdong        |          | 1987-4-17     | alibaba
> (2 rows)
>
> Time: 434.412 ms
>
> hbase(main):004:0* scan 'member'
> ROW          COLUMN+CELL
>  scutshuxue  column=address:city, timestamp=1446104911726, value=hangzhou
>  scutshuxue  column=address:contry, timestamp=1446104910743, value=china
>  scutshuxue  column=address:province, timestamp=1446104910775, value=zhejiang
>  scutshuxue  column=info:age, timestamp=1446104987420, value=99
>  scutshuxue  column=info:birthday, timestamp=1446104910674, value=1987-06-17
>  scutshuxue  column=info:company, timestamp=1446104910715, value=alibaba
>  xiaofeng    column=address:city, timestamp=1446104920523, value=jieyang
>  xiaofeng    column=address:contry, timestamp=1446104920461, value=china
>  xiaofeng    column=address:province, timestamp=1446104920486, value=guangdong
>  xiaofeng    column=address:town, timestamp=1446104921802, value=xianqiao
>  xiaofeng    column=info:birthday, timestamp=1446104920358, value=1987-4-17
>  xiaofeng    column=info:company, timestamp=1446104920423, value=alibaba
>  xiaofeng    column=info:favorite, timestamp=1446104920397, value=movie
> 2 row(s) in 0.0540 seconds
>
> It’s very slow compared to running the same scan in the hbase shell.
>
>
>> On Oct 29, 2015, at 8:33 PM, Noa Horn <[email protected]> wrote:
>>
>> The problem is probably because the jars that are required by PXF are not
>> found.
>>
>> In the attached log file, this error for example shows that hadoop-auth.jar
>> is not found:
>> 29-Oct-2015 16:37:33.405 WARNING [localhost-startStop-1]
>> com.pivotal.pxf.service.utilities.CustomWebappLoader.addRepositories Failed
>> to load entry /usr/phd/current/hadoop-client/hadoop-auth.jar:
>> java.nio.file.NoSuchFileException: /usr/phd/current/hadoop-client
>>
>> Have a look at /etc/conf/gphd/pxf (old version) or /etc/conf/pxf (open
>> source version), at the file pxf-private.classpath.
>> Every source specified there is required by PXF.
>> The default paths for these resources are under /usr/phd/... (Pivotal
>> distribution), while your system is hdp, so the paths are different. Luckily, we
>> also provide the paths for the hdp distribution, in pxf-privatehdp.classpath.
>> If you copy the content of that file into pxf-private.classpath and run init
>> and start again, it should work.
>>
>> As an aside, it's highly recommended to compile and use the open source
>> version, because we made a few changes in the rpms.
>> From the pxf directory, run 'make tomcat' to generate a tomcat rpm (required
>> by PXF) and 'make rpm' to compile and create PXF rpms.
>>
>> Noa
>>
>>
>> On Wed, Oct 28, 2015 at 11:38 PM, mailing-list-recv <[email protected]> wrote:
>> Thanks guys,
>>
>> Not sure if the mailing list supports attachments, let me try anyway.
>>
>> The status command shows the following:
>> [root@cent61 ~]# service pxf-service status
>> Checking if tcServer is up and running...
>> Checking if PXF webapp is up and running...
>> ERROR: PXF webapp is inaccessible but tcServer is up. Check logs for more
>> information
>>
>> I was using the binary version downloaded from the site. I haven't tried to
>> compile from open source yet.
>>
>> Port 51200 is open:
>> [root@cent61 logs]# cat tcserver.pid
>> 8385
>> [root@cent61 logs]# ps -elf | grep 8385
>> 0 S pxf 8385 1 0 80 0 - 312017 futex_ Oct29 ? 00:00:40
>>   /usr/jdk64/jdk1.7.0_67/bin/java
>>   -Djava.util.logging.config.file=/var/gphd/pxf/pxf-service/conf/logging.properties
>>   -Djava.util.logging.manager=com.springsource.tcserver.serviceability.logging.TcServerLogManager
>>   -Xmx512M -Xss256K
>>   -Djava.endorsed.dirs=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/endorsed
>>   -classpath
>>   /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
>>   -Dcatalina.base=/var/gphd/pxf/pxf-service
>>   -Dcatalina.home=/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
>>   -Djava.io.tmpdir=/var/gphd/pxf/pxf-service/temp
>>   org.apache.catalina.startup.Bootstrap start
>> 4 S root 23247 22386 0 80 0 - 25813 pipe_w 14:35 pts/2 00:00:00 grep 8385
>> [root@cent61 logs]# netstat -anp | grep 8385
>> tcp   0  0 ::ffff:127.0.0.1:6969  :::*  LISTEN  8385/java
>> tcp   0  0 :::51200               :::*  LISTEN  8385/java
>> unix  2  [ ]  STREAM  CONNECTED  2344585  8385/java
>> unix  2  [ ]  STREAM  CONNECTED  2344417  8385/java
>>
>> Cheers
>>
>>
>> At 2015-10-29 03:22:48, "Jimmy Da" <[email protected]> wrote:
>> So it seems that the Tomcat server is up, but the pxf servlet has not started.
>> To confirm this, you can run "pxf-service status" to double check that the pxf
>> service is running.
>>
>> One guess is that the Java libraries were not loaded
>> correctly. I am looking at this line:
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>>
>> Can you double check that you can find all the jar files at the locations in
>> this file?
>> https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-service/src/main/resources/pxf-privatehdp.classpath
>>
>> Jimmy Da
>> That’s what people do, they leap, and hoping to God they can fly.
>>
>> On Wed, Oct 28, 2015 at 12:03 PM, Ting(Goden) Yao <[email protected]> wrote:
>> Hi sequoiadb,
>>
>> Which hawq/pxf version are you using (did you just compile the open source
>> version, or is it one of the former Pivotal-released hawq versions)?
>>
>> Can you also attach the pxf logs for investigation?
>> They are at var/log/gphd/
>>
>> -Goden
>>
>> On Wed, Oct 28, 2015 at 1:51 AM sequoiadb <[email protected]> wrote:
>> Hi guys,
>>
>> I’m trying to set up PXF for HBase and got the following error:
>> tpch=# create external table hbase_member ( recordkey bytea, "address:city"
>> varchar, "address:contry" varchar, "address:province" varchar, "info:age"
>> int, "info:birthday" varchar, "info:company" varchar ) location (
>> 'pxf://cent61:50070/member?PROFILE=HBase') FORMAT 'CUSTOM'(
>> FORMATTER='pxfwritable_import');
>> CREATE EXTERNAL TABLE
>> tpch=# select * from hbase_member;
>> ERROR: remote component error (0) from '192.168.31.205:51200': couldn't
>> connect to host (libchurl.c:852)
>>
>> I could successfully create regular tables and perform queries, but when I
>> try to query pxf tables I keep getting an error connecting to port 51200.
>>
>> So I tried to start pxf-service and got
>> [root@cent61 profile.d]# service pxf-service init
>> Creating instance 'pxf-service' ...
>> Using separate layout
>> Creating bin/setenv.sh
>> Applying template 'base'
>> Copying template's contents
>> Applying fragment 'context-fragment.xml' to 'conf/context.xml'
>> Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>> Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>> Applying fragment 'tomcat-users-fragment.xml' to 'conf/tomcat-users.xml'
>> Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
>> Applying template 'base-tomcat-7'
>> Copying template's contents
>> Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>> Applying fragment 'web-fragment.xml' to 'conf/web.xml'
>> Applying fragment 'catalina-fragment.properties' to 'conf/catalina.properties'
>> Applying template 'bio'
>> Copying template's contents
>> Applying fragment 'server-fragment.xml' to 'conf/server.xml'
>> Configuring instance 'pxf-service' to use Tomcat version 7.0.55.A.RELEASE
>> Setting permissions
>> Instance created
>> Connector summary
>> Port: 51200 Type: Blocking IO Secure: false
>> [root@cent61 profile.d]# service pxf-service start
>> /var/gphd/pxf /
>> Creating home directory for pxf.
>> Using CATALINA_BASE: /var/gphd/pxf/pxf-service
>> Using CATALINA_HOME: /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE
>> Using CATALINA_TMPDIR: /var/gphd/pxf/pxf-service/temp
>> Using JRE_HOME: /usr/jdk64/jdk1.7.0_67
>> Using CLASSPATH: /opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/bootstrap.jar:/opt/vmware/vfabric-tc-server-standard/tomcat-7.0.55.A.RELEASE/bin/tomcat-juli.jar
>> Using CATALINA_PID: /var/gphd/pxf/pxf-service/logs/tcserver.pid
>> Tomcat started.
>> Status: RUNNING as PID=8385
>> /
>> Checking if tcServer is up and running...
>> tcServer not responding, re-trying after 1 second (attempt number 1)
>> tcServer not responding, re-trying after 1 second (attempt number 2)
>> Checking if PXF webapp is up and running...
>> ERROR: PXF webapp is inaccessible but tcServer is up.
>> Check logs for more information
>>
>> Now the select statement shows another error:
>> tpch=# select * from base_member;
>> ERROR: GPHD component not found (libchurl.c:1058)
>>
>> Looks like it hit this error:
>> bool handle_special_error(long response)
>> {
>>     switch (response)
>>     {
>>         case 404:
>>             elog(ERROR, "GPHD component not found");
>>             break;
>>         default:
>>             return false;
>>     }
>>     return true;
>> }
>>
>> Now, do I need some sort of web service running in order to make it work?
>> Is it caused by the PXF web app not being able to run? Which log am I supposed
>> to look at?
>> The catalina log shows this, and I’m not sure if it’s the right one to look at:
>> 29-Oct-2015 16:37:34.923 SEVERE [localhost-startStop-1] org.apache.catalina.core.ContainerBase.addChildInternal ContainerBase.addChild: start:
>> org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>>     at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:154)
>>     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
>>     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>>     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>>     at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>>     at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.NoClassDefFoundError: Lorg/apache/commons/logging/Log;
>>     at java.lang.Class.getDeclaredFields0(Native Method)
>>     at java.lang.Class.privateGetDeclaredFields(Class.java:2436)
>>     at java.lang.Class.getDeclaredFields(Class.java:1806)
>>     at org.apache.catalina.util.Introspection.getDeclaredFields(Introspection.java:106)
>>     at org.apache.catalina.startup.WebAnnotationSet.loadFieldsAnnotation(WebAnnotationSet.java:270)
>>     at org.apache.catalina.startup.WebAnnotationSet.loadApplicationListenerAnnotations(WebAnnotationSet.java:89)
>>     at org.apache.catalina.startup.WebAnnotationSet.loadApplicationAnnotations(WebAnnotationSet.java:63)
>>     at org.apache.catalina.startup.ContextConfig.applicationAnnotationsConfig(ContextConfig.java:403)
>>     at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:879)
>>     at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:374)
>>     at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:117)
>>     at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:90)
>>     at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5378)
>>     at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
>>     ... 10 more
>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.Log
>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>     ... 24 more
>>
>> 29-Oct-2015 16:37:34.924 SEVERE [localhost-startStop-1] org.apache.catalina.startup.HostConfig.deployWAR Error deploying web application archive /var/gphd/pxf/pxf-service/webapps/pxf.war
>> java.lang.IllegalStateException: ContainerBase.addChild: start: org.apache.catalina.LifecycleException: Failed to start component [StandardEngine[Catalina].StandardHost[localhost].StandardContext[/pxf]]
>>     at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:904)
>>     at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
>>     at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:649)
>>     at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1083)
>>     at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1880)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I’m running on a previously built HDP 2.2.8 and performed a manual HAWQ
>> installation. I got most parts done but am stuck at the PXF component; any help
>> would be appreciated.
>>
>> Thanks
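
P.S. For anyone who hits the same startup failure: below is a small sketch for listing which entries of pxf-private.classpath do not exist on disk, which is the check Noa and Jimmy suggested above. It assumes one path per line, '#' for comment lines, and shell-style wildcards inside entries; I have not verified these assumptions against the PXF loader itself, and `missing_classpath_entries` is just a helper name I made up.

```python
import glob
from pathlib import Path


def missing_classpath_entries(classpath_file):
    """Return the entries of a classpath file that match nothing on disk.

    Assumed file format (not verified against PXF): one path per line,
    blank lines and lines starting with '#' are ignored, and an entry may
    contain shell-style wildcards such as '*'.
    """
    missing = []
    for raw in Path(classpath_file).read_text().splitlines():
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        # glob.glob returns an empty list when nothing matches the entry.
        if not glob.glob(entry):
            missing.append(entry)
    return missing
```

Calling it on /etc/conf/pxf/pxf-private.classpath (or the copy under /etc/conf/gphd/pxf on the old version) should, I would expect, list the /usr/phd/... entries on an HDP machine.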
