Thanks Noa,
So is it safe to assume the path always starts with a slash, followed by the directory, another slash, and the unique file name?
Can you show me the code that constructs the path? I couldn't find it, and I'd
like to confirm the logic.
I used the following code to extract the data source, and it worked in my
environment. I'm just not sure whether getDataSource() always returns that
format.
// Assumes getDataSource() returns "/<directory>/<unique_file_name>";
// StringTokenizer skips the leading "/", so the first token is the directory.
StringTokenizer st = new StringTokenizer(wds, "/", false);
if (st.countTokens() == 0) {
    throw new RuntimeException("Invalid data source: " + wds);
}
return st.nextToken();
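In case it helps, here is the same extraction without StringTokenizer (just a
sketch; extractDirectory is a hypothetical helper, and it assumes the write-side
value always looks like "/<directory>/<unique_file_name>" as you described):

// Hypothetical helper (not part of PXF): returns the directory portion of the
// data source, whether it arrives as "foo.main" (read side) or
// "/foo.main/1365_0" (write side, per the "<directory>/<unique_file_name>" format).
private static String extractDirectory(String wds) {
    if (wds == null || wds.isEmpty()) {
        throw new RuntimeException("Invalid data source: " + wds);
    }
    // Drop a leading slash, if present, then keep everything up to the next slash.
    String trimmed = wds.startsWith("/") ? wds.substring(1) : wds;
    int slash = trimmed.indexOf('/');
    String dir = (slash < 0) ? trimmed : trimmed.substring(0, slash);
    if (dir.isEmpty()) {
        throw new RuntimeException("Invalid data source: " + wds);
    }
    return dir;
}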
At 2015-11-07 02:36:26, "Noa Horn" <[email protected]> wrote:
Hi,
1. Regarding the permissions issue - PXF runs as the pxf user, so any
operation on Hadoop has to be on files or directories that the pxf user can
read/write.
You mentioned changing the pxf user to be part of hdfs, but I'm not sure that
was necessary. The PXF RPM already adds the pxf user to the hadoop group.
2. Regarding writable tables: the way to use them is to define a directory
where the data will be written. When the SQL executes, each segment writes its
own data to the same directory, as defined in the external table, but into a
separate file. That's why setDataSource() is needed when writing: each segment
creates its own unique file name. The change you saw in the path is expected;
it should be "<directory>/<unique_file_name>".
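For illustration, this is roughly what an HDFS-backed write accessor does with
that path (a sketch only; InputData is the plugin API class passed to the
accessor, the openForWrite name follows the com.pivotal.pxf API, and exact
signatures may differ in your version):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Each segment opens its own file under the directory from the DDL.
// On the write side getDataSource() is "<directory>/<unique_file_name>",
// e.g. "/foo.main/1365_0" (which in this thread matches <xid>_<segment id>),
// so using it directly as the HDFS path yields one file per segment in the
// same directory.
FSDataOutputStream out;

void openForWrite(InputData inputData) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(inputData.getDataSource()); // e.g. /foo.main/1365_0
    out = fs.create(file, false); // fail if this segment's file already exists
}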
Regards,
Noa
On Fri, Nov 6, 2015 at 12:11 AM, hawqstudy <[email protected]> wrote:
I tried setting the pxf user to hdfs in /etc/init.d/pxf-service and fixing the
file owners for several directories.
Now I have a problem where getDataSource() returns something strange.
My DDL is:
pxf://localhost:51200/foo.main?PROFILE=XXXX
In the read accessor, getDataSource() correctly returns foo.main as the data
source name.
However, in the write accessor, the InputData.getDataSource() call returns /foo.main/1365_0
Tracing back through the code, I found that com.pivotal.pxf.service.rest.WritableResource.stream() has:
public Response stream(@Context final ServletContext servletContext,
                       @Context HttpHeaders headers,
                       @QueryParam("path") String path,
                       InputStream inputStream) throws Exception {

    /* Convert headers into a case-insensitive regular map */
    Map<String, String> params =
            convertToCaseInsensitiveMap(headers.getRequestHeaders());

    if (LOG.isDebugEnabled()) {
        LOG.debug("WritableResource started with parameters: " + params +
                " and write path: " + path);
    }

    ProtocolData protData = new ProtocolData(params);
    protData.setDataSource(path);

    SecuredHDFS.verifyToken(protData, servletContext);
    Bridge bridge = new WriteBridge(protData);

    // THREAD-SAFE parameter has precedence
    boolean isThreadSafe = protData.isThreadSafe() && bridge.isThreadSafe();
    LOG.debug("Request for " + path + " handled " +
            (isThreadSafe ? "without" : "with") + " synchronization");

    return isThreadSafe ?
            writeResponse(bridge, path, inputStream) :
            synchronizedWriteResponse(bridge, path, inputStream);
}
The protData.setDataSource(path) call is what replaces the expected data source
with the strange one.
So I kept looking for where the path comes from; jdb shows:
tomcat-http--18[1] print path
path = "/foo.main/1365_0"
tomcat-http--18[1] where
[1] com.pivotal.pxf.service.rest.WritableResource.stream
(WritableResource.java:102)
[2] sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
[3] sun.reflect.NativeMethodAccessorImpl.invoke
(NativeMethodAccessorImpl.java:57)
...
tomcat-http--18[1] print params
params = "{accept=*/*, content-type=application/octet-stream,
expect=100-continue, host=127.0.0.1:51200, transfer-encoding=chunked,
X-GP-ACCESSOR=com.xxxx.pxf.plugins.xxxx.XXXXAccessor, x-gp-alignment=8,
x-gp-attr-name0=id, x-gp-attr-name1=total, x-gp-attr-name2=comments,
x-gp-attr-typecode0=23, x-gp-attr-typecode1=23, x-gp-attr-typecode2=1043,
x-gp-attr-typename0=int4, x-gp-attr-typename1=int4,
x-gp-attr-typename2=varchar, x-gp-attrs=3, x-gp-data-dir=foo.main,
x-gp-format=GPDBWritable,
X-GP-FRAGMENTER=com.xxxx.pxf.plugins.xxxx.XXXXFragmenter, x-gp-has-filter=0,
x-gp-profile=XXXX, X-GP-RESOLVER=com.xxxx.pxf.plugins.xxxx.XXXXResolver,
x-gp-segment-count=1, x-gp-segment-id=0,
x-gp-uri=pxf://localhost:51200/foo.main?PROFILE=XXXX, x-gp-url-host=localhost,
x-gp-url-port=51200, x-gp-xid=1365}"
So stream() is called from NativeMethodAccessorImpl.invoke0, which I couldn't
follow any further. Does it make sense that "path" shows something strange?
Should I get rid of protData.setDataSource(path) here? What is this code used
for? Where does the "path" come from? Is it constructed from X-GP-DATA-DIR,
X-GP-XID, and X-GP-SEGMENT-ID?
I'd expect InputData.getDataSource() to return "foo.main" instead of
"/foo.main/1365_0", like it did in the read accessor.
At 2015-11-06 11:49:08, "hawqstudy" <[email protected]> wrote:
Hi Guys,
I've developed a PXF plugin and am able to read from our data source.
I also implemented WriteResolver and WriteAccessor, but when I tried to
insert into the table I got the following exception:
postgres=# CREATE EXTERNAL TABLE t3 (id int, total int, comments varchar)
LOCATION ('pxf://localhost:51200/foo.bar?PROFILE=XXXX')
FORMAT 'custom' (formatter='pxfwritable_import') ;
CREATE EXTERNAL TABLE
postgres=# select * from t3;
id | total | comments
-----+-------+----------
100 | 500 |
100 | 5000 | abcdfe
| 5000 | 100
(3 rows)
postgres=# drop external table t3;
DROP EXTERNAL TABLE
postgres=# CREATE WRITABLE EXTERNAL TABLE t3 (id int, total int, comments
varchar)
LOCATION ('pxf://localhost:51200/foo.bar?PROFILE=XXXX')
FORMAT 'custom' (formatter='pxfwritable_export') ;
CREATE EXTERNAL TABLE
postgres=# insert into t3 values ( 1, 2, 'hello');
ERROR: remote component error (500) from '127.0.0.1:51200': type Exception
report message
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Access denied for user pxf. Superuser privilege is required description
The server encountered an internal error that prevented it from fulfilling this
request. exception javax.servlet.ServletException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Access denied for user pxf. Superuser privilege is required (libchurl.c:852)
(seg6 localhost.localdomain:40000 pid=19701) (dispatcher.c:1681)
The log shows:
Nov 07, 2015 11:40:08 AM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be
mapped to a response, re-throwing to the HTTP container
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Access denied for user pxf. Superuser privilege is required
at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:122)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:5906)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.datanodeReport(FSNamesystem.java:4941)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDatanodeReport(NameNodeRpcServer.java:1033)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDatanodeReport(ClientNamenodeProtocolServerSideTranslatorPB.java:698)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1407)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy63.getDatanodeReport(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDatanodeReport(ClientNamenodeProtocolTranslatorPB.java:626)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy64.getDatanodeReport(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.datanodeReport(DFSClient.java:2562)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getDataNodeStats(DistributedFileSystem.java:1196)
at
com.pivotal.pxf.service.rest.ClusterNodesResource.read(ClusterNodesResource.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
at
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:731)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:957)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
Since our data source is completely independent of HDFS, I'm not sure why PXF
is still trying to access HDFS and requiring superuser privileges.
Please let me know if there's anything missing here.
Cheers