I made some progress (or so I think) and have a different error now. The failing part is still the write into HDFS; the SELECT query continues to work fine.
Here's the error I am receiving now.

gagan=# INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');
ERROR:  remote component error (0): Failed to connect to my-hadoop-cluster port 51200: Connection timed out (libchurl.c:878) (seg0 my-hadoop-cluster:40000 pid=25073) (dispatcher.c:1753)

The SELECT query continues to work and fetches results from HDFS.

gagan=# SELECT * FROM ext_get_foo ;
  i  |      bar
-----+---------------
 100 | Gagan Brahmi
 101 | Another Emp
 102 | Some More
(3 rows)

Telnet to port 51200 works fine from both the segment and the master (this is a single-node machine), and so does the connection to HDFS (port 8020). I have attached some log excerpts and the stack traces found in the master and segment logs.

Is there a simpler way to test whether the pxf-service, with its classpath and jars, is able to write to HDFS? (One such check is sketched after the quoted thread below.)

Regards,
Gagan Brahmi

On Tue, Mar 15, 2016 at 4:09 PM, Ting(Goden) Yao <[email protected]> wrote:
> Gagan - did you see any error in HAWQ logs or Hadoop logs?
> This doesn't look like a PXF issue.
>
> On Fri, Mar 11, 2016 at 8:59 AM Gagan Brahmi <[email protected]> wrote:
>>
>> And this is what I mean when I say that fetch (select) seems to be
>> working. I place a dummy file to run select query and it provides the
>> results.
>>
>> hdfs@my-hadoop-cluster:~> cat /tmp/test_fetch
>> 100 | Random Value
>> 101 | Another Random
>> hdfs@my-hadoop-cluster:~> hadoop fs -put /tmp/test_fetch /tmp/foo_bar/
>> hdfs@suse11-workplace:~> logout
>> my-hadoop-cluster:~ # su - gpadmin
>> gpadmin@my-hadoop-cluster:~> source /usr/local/hawq/greenplum_path.sh
>> gpadmin@my-hadoop-cluster:~> psql -p 10432 gagan
>> psql (8.2.15)
>> Type "help" for help.
>>
>> gagan=# SELECT * FROM ext_get_foo ;
>>   i  |       bar
>> -----+-----------------
>>  100 | Random Value
>>  101 | Another Random
>> (2 rows)
>>
>> gagan=#
>>
>>
>> Table DDLs
>>
>> gagan=# CREATE WRITABLE EXTERNAL TABLE ext_put_foo (i int, bar text)
>> LOCATION
>> ('pxf://my-hadoop-cluster:51200/tmp/foo_bar?profile=HdfsTextSimple')
>> FORMAT 'text' (delimiter '|' null 'null');
>> CREATE EXTERNAL TABLE
>> gagan=# CREATE EXTERNAL TABLE ext_get_foo (i int, bar text) LOCATION
>> ('pxf://my-hadoop-cluster:51200/tmp/foo_bar?profile=HdfsTextSimple')
>> FORMAT 'text' (delimiter '|' null 'null');
>> CREATE EXTERNAL TABLE
>> gagan=# INSERT into ext_put_foo VALUES (1, 'Gagan');
>> ERROR: failed sending to remote component (libchurl.c:574) (seg0
>> my-hadoop-cluster:40000 pid=824) (dispatcher.c:1753)
>>
>>
>> Regards,
>> Gagan Brahmi
>>
>> On Fri, Mar 11, 2016 at 9:53 AM, Gagan Brahmi <[email protected]>
>> wrote:
>> > Nothing in the pxf-service.log or the catalina.out for pxf service.
>> >
>> > It has the normal startup messages while the webapp starts up. I did
>> > modified the overcommit memory to 1 and restarted all the service
>> > (just in case). But that still didn't seem to have made any
>> > difference.
>> >
>> > I still see the file is closed by DFSClient message in HDFS everytime
>> > I try to run an insert command. The select looks to be working fine.
>> >
>> >
>> > Regards,
>> > Gagan Brahmi
>> >
>> > On Fri, Mar 11, 2016 at 9:21 AM, Daniel Lynch <[email protected]> wrote:
>> >> check the pxf service logs for errors. I suspect there is an out of
>> >> memory event at some point during the connection considering this is
>> >> a single node deployment.
>> >>
>> >> Also make sure overcommit is disabled to prevent virtual mem OOM
>> >> errors. This of course would not be recommended in production but for
>> >> single node deployments you will need this setting.
>> >> echo 1 > /proc/sys/vm/overcommit_memory
>> >>
>> >>
>> >> Daniel Lynch
>> >> Mon-Fri 9-5 PST
>> >> Office: 408 780 4498
>> >>
>> >> On Fri, Mar 11, 2016 at 2:01 AM, Gagan Brahmi <[email protected]>
>> >> wrote:
>> >>
>> >>> This a standalone box with no ha for hdfs.
>> >>>
>> >>> I haven't enabled the ha properties in hawq site.
>> >>>
>> >>> Regards,
>> >>> Gagan
>> >>> On Mar 11, 2016 00:56, "Leon Zhang" <[email protected]> wrote:
>> >>>
>> >>> > Hi, Gagang
>> >>> >
>> >>> > It seems you use HA hdfs cluster? I am not sure if HAWQ can work
>> >>> > like this. Can any HAWQ developer clarify this condition?
>> >>> > If so, you can try a non-HA hdfs cluster with direct IP access. All
>> >>> > PXF services are working perfect here.
>> >>> >
>> >>> >
>> >>> > On Fri, Mar 11, 2016 at 10:25 AM, Gagan Brahmi
>> >>> > <[email protected]> wrote:
>> >>> >
>> >>> >> Thank you Ting!
>> >>> >>
>> >>> >> That was the problem. It seemed to have worked, but now I am stuck
>> >>> >> with a different error.
>> >>> >>
>> >>> >> gagan=# INSERT into ext_put_foo VALUES (1, 'Gagan');
>> >>> >> ERROR: failed sending to remote component (libchurl.c:574) (seg0
>> >>> >> my-hadoop-cluster:40000 pid=24563) (dispatcher.c:1753)
>> >>> >>
>> >>> >> This certainly mean that the back ground service has stopped
>> >>> >> serving connection for some reason.
>> >>> >>
>> >>> >> I check the namenode and find this.
>> >>> >>
>> >>> >> 2016-03-10 19:28:11,759 INFO hdfs.StateChange
>> >>> >> (FSNamesystem.java:completeFile(3503)) - DIR* completeFile:
>> >>> >> /tmp/foo_bar/1350_0 is closed by
>> >>> >> DFSClient_NONMAPREDUCE_-244490296_23
>> >>> >>
>> >>> >> I have a single node installation with a HDFS replication factor
>> >>> >> of 1 (both in hdfs-site and hdfs-client for hawq).
>> >>> >>
>> >>> >> I have also tried to update the connectTimeout value to 60 secs in
>> >>> >> the server.xml file for pxf webapp.
>> >>> >>
>> >>> >> A normal write to HDFS works fine. I see file being created in the
>> >>> >> directory foo_bar but are 0 bytes in size.
>> >>> >>
>> >>> >> -rw-r--r--   1 pxf hdfs   0 2016-03-10 19:08 /tmp/foo_bar/1336_0
>> >>> >> -rw-r--r--   1 pxf hdfs   0 2016-03-10 19:27 /tmp/foo_bar/1349_0
>> >>> >> -rw-r--r--   1 pxf hdfs   0 2016-03-10 19:28 /tmp/foo_bar/1350_0
>> >>> >>
>> >>> >> Not sure if someone has encountered this before. Would appreciate
>> >>> >> any inputs.
>> >>> >>
>> >>> >>
>> >>> >> Regards,
>> >>> >> Gagan Brahmi
>> >>> >>
>> >>> >> On Thu, Mar 10, 2016 at 11:45 AM, Ting(Goden) Yao <[email protected]>
>> >>> >> wrote:
>> >>> >> > Your table definition:
>> >>> >> >
>> >>> >> > ('pxf://my-hadoop-cluster:*50070*/foo_bar?profile=HdfsTextSimple')
>> >>> >> > if you installed pxf on 51200, you need to use the port 51200
>> >>> >> >
>> >>> >> >
>> >>> >> > On Thu, Mar 10, 2016 at 10:34 AM Gagan Brahmi
>> >>> >> > <[email protected]> wrote:
>> >>> >> >
>> >>> >> >> Hi Team,
>> >>> >> >>
>> >>> >> >> I was wondering if someone has encountered this problem before.
>> >>> >> >>
>> >>> >> >> While trying to work with PXF on hawq 2.0 I am encountering the
>> >>> >> >> following error:
>> >>> >> >>
>> >>> >> >> gagan=# CREATE EXTERNAL TABLE ext_get_foo (i int, bar text)
>> >>> >> >> LOCATION
>> >>> >> >> ('pxf://my-hadoop-cluster:50070/foo_bar?profile=HdfsTextSimple')
>> >>> >> >> FORMAT 'text' (delimiter '|' null 'null');
>> >>> >> >>
>> >>> >> >> gagan=# SELECT * FROM ext_get_foo ;
>> >>> >> >> ERROR: remote component error (404): PXF service could not be
>> >>> >> >> reached. PXF is not running in the tomcat container
>> >>> >> >> (libchurl.c:878)
>> >>> >> >>
>> >>> >> >> The same happens when I try to write to an external table using
>> >>> >> >> PXF.
>> >>> >> >>
>> >>> >> >> I believe the above error signifies that PXF service isn't
>> >>> >> >> running or unavailable. But PXF is running on port 51200.
>> >>> >> >>
>> >>> >> >> Curl response works fine as well:
>> >>> >> >>
>> >>> >> >> # curl -s http://localhost:51200/pxf/v0
>> >>> >> >> Wrong version v0, supported version is v14
>> >>> >> >>
>> >>> >> >> PXF is build using gradlew and installed as RPM files. I also
>> >>> >> >> have tomcat 7.0.62 installed with the PXF packages.
>> >>> >> >>
>> >>> >> >> The following is how PXF is running on the instance:
>> >>> >> >>
>> >>> >> >> pxf 21405 0.3 2.8 825224 115164 ? Sl 02:07 0:10
>> >>> >> >> /usr/java/latest/bin/java
>> >>> >> >> -Djava.util.logging.config.file=/var/pxf/pxf-service/conf/logging.properties
>> >>> >> >> -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>> >>> >> >> -Xmx512M -Xss256K
>> >>> >> >> -Djava.endorsed.dirs=/var/pxf/pxf-service/endorsed
>> >>> >> >> -classpath
>> >>> >> >> /var/pxf/pxf-service/bin/bootstrap.jar:/var/pxf/pxf-service/bin/tomcat-juli.jar
>> >>> >> >> -Dcatalina.base=/var/pxf/pxf-service
>> >>> >> >> -Dcatalina.home=/var/pxf/pxf-service
>> >>> >> >> -Djava.io.tmpdir=/var/pxf/pxf-service/temp
>> >>> >> >> org.apache.catalina.startup.Bootstrap start
>> >>> >> >>
>> >>> >> >> I do not have apache-tomcat running. Not sure how are the two
>> >>> >> >> interrelated. But the RPM file created by gradlew requires
>> >>> >> >> tomcat for pxf-service.
>> >>> >> >>
>> >>> >> >> I would appreciate any inputs into this problem.
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> Regards,
>> >>> >> >> Gagan Brahmi
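For reference, here is one way to check each half of the write path outside of HAWQ. This is only a rough sketch: the hostname, port, and paths are the ones used in the commands above, the pxf service account is inferred from the file ownership shown earlier, and the pxf_write_test filename is just a placeholder.

# 1. From the segment host, repeat the version probe against the PXF REST
#    port using the same hostname the external table uses (not localhost).
#    A healthy service answers right away with the "supported version is v14"
#    message seen above; a hang here reproduces the connection timeout.
curl -sv http://my-hadoop-cluster:51200/pxf/v0

# 2. Independently of PXF, confirm the service account can create and write
#    a file under the target HDFS directory:
sudo -u pxf hdfs dfs -put /tmp/test_fetch /tmp/foo_bar/pxf_write_test
sudo -u pxf hdfs dfs -cat /tmp/foo_bar/pxf_write_test
sudo -u pxf hdfs dfs -rm /tmp/foo_bar/pxf_write_test

If both checks pass from the segment host, the PXF classpath/jars and HDFS permissions are probably not the issue, and the remaining suspect is the connection from the segment process to port 51200.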
================= From Master =================

2016-03-15 18:41:35.963865 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","AsyncComm framework receives message 262 from FD 5",,,,,,,0,,"rmcomm_Message.c",100,
2016-03-15 18:41:35.963943 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Resource manager tracked connection.",,,,,,,0,,"requesthandler.c",186,
2016-03-15 18:41:35.964234 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd21,seg-1,,,x1028,sx1,"LOG","00000","AsyncComm framework receives message 2310 from FD 29",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_Message.c",100,
2016-03-15 18:41:35.964253 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd21,seg-1,,,x1028,sx1,"LOG","00000","ConnID 6. Registered in HAWQ resource manager (By OID)",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_QD2RM.c",601,
2016-03-15 18:41:35.964264 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd21,seg-1,,,x1028,sx1,"LOG","00000","ConnID: 6. Acquire resource request for index 0. Max vseg size 1 Min vseg size 1 Estimated slice size 2 estimated IO bytes size 134217728 Preferred node count 0.",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_QD2RM.c",693,
2016-03-15 18:41:35.964440 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","AsyncComm framework receives message 259 from FD 6",,,,,,,0,,"rmcomm_Message.c",100,
2016-03-15 18:41:35.964453 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Expect query resource for session 28",,,,,,,0,,"resqueuemanager.c",2225,
2016-03-15 18:41:35.964460 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Expect query resource (256 MB, 0.062500 CORE) x 256 (MIN 6) after checking queue capacity.",,,,,,,0,,"resqueuemanager.c",3760,
2016-03-15 18:41:35.964467 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Expect query resource (256 MB, 0.062500 CORE) x 1 (MIN 1) after checking query expectation 1 (MIN 1).",,,,,,,0,,"resqueuemanager.c",3790,
2016-03-15 18:41:35.964475 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Expect query resource (256 MB, 0.062500 CORE) x 1 ( MIN 1 ) resource after adjusting based on queue NVSEG limits.",,,,,,,0,,"resqueuemanager.c",2247,
2016-03-15 18:41:35.964482 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","Latency of getting resource allocated is 62us",,,,,,,0,,"resqueuemanager.c",4745,
2016-03-15 18:41:36.067148 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd21,seg-1,,,x1028,sx1,"LOG","00000","AsyncComm framework receives message 2307 from FD 29",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_Message.c",100,
2016-03-15 18:41:36.067180 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd21,seg-1,,,x1028,sx1,"LOG","00000","ConnID 6. Acquired resource from resource manager, (256 MB, 0.062500 CORE) x 1.",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_QD2RM.c",860,
2016-03-15 18:41:36.067204 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd21,seg-1,,,x1028,sx1,"LOG","00000","data locality ratio: 0.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (0.000/0/0); segment size with penalty(avg/min/max): (0.000/0/0); continuity(avg/min/max): (0.000/0.000/0.000).",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"cdbdatalocality.c",3383,
2016-03-15 18:41:36.070267 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","AsyncComm framework receives message 260 from FD 5",,,,,,,0,,"rmcomm_Message.c",100,
2016-03-15 18:41:36.070289 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Returned resource.",,,,,,,0,,"requesthandler.c",557,
2016-03-15 18:41:36.070385 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"LOG","00000","AsyncComm framework receives message 2308 from FD 29",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_Message.c",100,
2016-03-15 18:41:36.070401 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"LOG","00000","ConnID 6. Returned resource to resource manager.",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_QD2RM.c",943,
2016-03-15 18:41:36.070534 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","AsyncComm framework receives message 258 from FD 6",,,,,,,0,,"rmcomm_Message.c",100,
2016-03-15 18:41:36.070556 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","ConnID 6. Connection is unregistered.",,,,,,,0,,"requesthandler.c",283,
2016-03-15 18:41:36.070635 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"LOG","00000","AsyncComm framework receives message 2306 from FD 29",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_Message.c",100,
2016-03-15 18:41:36.070723 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"LOG","00000","ConnID 6. Unregistered from HAWQ resource manager.",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"rmcomm_QD2RM.c",653,
2016-03-15 18:42:40.096168 MST,,,p21702,th-1373808416,,,,0,con4,,seg-10000,,,,,"LOG","00000","Resource water mark changes from (2048 MB, 0.500000 CORE) to (256 MB, 0.062500 CORE)",,,,,,,0,,"resqueuemanager.c",2919,
2016-03-15 18:44:46.516640 MST,"gpadmin","gagan",p24124,th-1898744064,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"LOG","00000","function executormgr_consume meets error, connection is bad.",,,,,,,0,,,,
2016-03-15 18:44:46.516663 MST,"gpadmin","gagan",p24124,th-1898744064,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"LOG","00000","dispmgr_thread_func_run meets consume error for executor, entering error_cleanup",,,,,,,0,,,,
2016-03-15 18:44:46.528934 MST,"gpadmin","gagan",p24124,th-1373808416,"192.168.59.104","32595",2016-03-15 18:15:46 MST,1028,con28,cmd22,seg-1,,,x1028,sx1,"ERROR","XX000","remote component error (0): Failed to connect to my-hadoop-cluster port 51200: Connection timed out (libchurl.c:878) (seg2 my-hadoop-cluster:40000 pid=36963) (dispatcher.c:1753)",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"dispatcher.c",1753,"Stack trace:
1    0x8a99e2        postgres  errstart (??:?)
2    0x9eff83        postgres  <symbol not found> (dispatcher.c:?)
3    0x683a2c        postgres  mppExecutorFinishup (??:?)
4    0x67090f        postgres  ExecutorEnd (??:?)
5    0x7dd1f7        postgres  <symbol not found> (pquery.c:?)
6    0x7dd89c        postgres  <symbol not found> (pquery.c:?)
7    0x7df412        postgres  PortalRun (??:?)
8    0x7d65a1        postgres  <symbol not found> (postgres.c:?)
9    0x7d84ba        postgres  PostgresMain (??:?)
10   0x78b71a        postgres  <symbol not found> (postmaster.c:?)
11   0x78ec89        postgres  PostmasterMain (??:?)
12   0x4b5c6f        postgres  main (??:?)
13   0x7fd6ab0dcc36  libc.so.6 __libc_start_main (??:0)
14   0x4b5ced        postgres  <symbol not found> (start.S:116)
"

================= From Segment =================

2016-03-15 18:44:46.516394 MST,"gpadmin","gagan",p36963,th2057959648,"192.168.59.104","27774",2016-03-15 18:37:39 MST,990,con28,cmd22,seg0,,,x990,sx1,"ERROR","XX000","remote component error (0): Failed to connect to my-hadoop-cluster port 51200: Connection timed out (libchurl.c:878)",,,,,,"INSERT INTO ext_put_foo VALUES (1, 'Gagan Brahmi');",0,,"libchurl.c",878,"Stack trace:
1    0x8a99e2        postgres  errstart + 0x282
2    0x8aa0bb        postgres  elog_finish + 0xab
3    0x523b96        postgres  check_response_code + 0x186
4    0x523d46        postgres  churl_read_check_connectivity + 0x16
5    0x528719        postgres  <symbol not found> + 0x528719
6    0x5288e1        postgres  call_rest + 0x51
7    0x527600        postgres  <symbol not found> + 0x527600
8    0x5276db        postgres  get_datanode_rest_servers + 0x1b
9    0x7fcd5be67dc7  pxf.so    get_pxf_server + 0x127
10   0x7fcd5be68082  pxf.so    gpbridge_export_start + 0x42
11   0x7fcd5be681d0  pxf.so    gpbridge_export + 0x50
12   0x51e218        postgres  <symbol not found> + 0x51e218
13   0x521408        postgres  url_fwrite + 0x98
14   0x51d150        postgres  external_insert + 0x190
15   0x6b73f1        postgres  ExecInsert + 0x171
16   0x671c24        postgres  <symbol not found> + 0x671c24
17   0x6720bf        postgres  ExecutorRun + 0x33f
18   0x7dd1d4        postgres  <symbol not found> + 0x7dd1d4
19   0x7dd89c        postgres  <symbol not found> + 0x7dd89c
20   0x7df412        postgres  PortalRun + 0x3d2
21   0x7d52f7        postgres  <symbol not found> + 0x7d52f7
22   0x7da46a        postgres  PostgresMain + 0x2e1a
23   0x78b71a        postgres  <symbol not found> + 0x78b71a
24   0x78ec89        postgres  PostmasterMain + 0x7e9
25   0x4b5c6f        postgres  main + 0x50f
26   0x7fcd779a6c36  libc.so.6 __libc_start_main + 0xe6
27   0x4b5ced        postgres  <symbol not found> + 0x4b5ced
"
