Does the OOME occur regardless of whether you use the Java client (sqlline-thin) or the Python client? I'd like to know more about this one. If you can share something that reproduces the problem for you, I'd like to look into it. The only suggestion I have at this point is to make sure you set a reasonable max heap size (e.g. -Xmx) via PHOENIX_QUERYSERVER_OPTS in hbase-env.sh, and that HBASE_CONF_DIR points to the right directory when you launch PQS.
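
For example, something like this in hbase-env.sh (the heap size and path are placeholders -- tune them for your deployment):

    # Give PQS a bounded max heap (placeholder size)
    export PHOENIX_QUERYSERVER_OPTS="$PHOENIX_QUERYSERVER_OPTS -Xmx4g"
    # Point PQS at the HBase configuration it should load
    export HBASE_CONF_DIR=/etc/hbase/conf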

Regarding performance, as you've described it, it sounds like the Python driver is just slower than the Java driver. You are right that the operations in PQS should be exactly the same regardless of the client you're using -- that is how this architecture works: Avatica is a wire protocol that all clients use to talk to PQS. Any more digging/information you can provide about the exact circumstances (and, again, steps/environment to reproduce what you see) would be extremely helpful.
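
To illustrate the point: a query issued from the Python driver, as in the minimal sketch below, and the same query issued from sqlline-thin should both reach PQS as the same Avatica requests (the endpoint URL here is just an example):

    import phoenixdb

    # Connect to PQS; the driver speaks Avatica (protobuf) over HTTP
    conn = phoenixdb.connect('http://localhost:8765/', autocommit=True)
    cur = conn.cursor()
    cur.execute("SELECT col1, col2 FROM table LIMIT 20")
    rows = cur.fetchall()  # same wire traffic sqlline-thin would generate
    conn.close()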

Thanks, Manoj.

- Josh

On 11/2/18 7:16 PM, Manoj Ganesan wrote:
Thanks Josh for the response!

I would definitely like to use protobuf serialization, but I'm observing performance issues trying to run queries with a large number of results. One problem is that PQS runs out of memory when it's trying to (what looks to me like) serialize the results in Avatica. The other is that the phoenixdb Python adapter itself spends a large amount of time in the logic <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248> where it's converting the protobuf rows to Python objects.
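
Roughly, that conversion is a nested pure-Python loop over every cell of every row. Here's an illustrative sketch (the names are mine, not the actual phoenixdb internals -- see the cursor.py link above for the real code):

    # Hypothetical sketch of the hot loop; the real logic lives in
    # phoenixdb/cursor.py (linked above). Avatica frames carry protobuf
    # Row messages, each holding one value per column.
    def frame_to_python(frame, column_converters):
        results = []
        for row in frame.rows:
            values = []
            for cell, convert in zip(row.value, column_converters):
                values.append(convert(cell))  # per-cell type conversion
            results.append(tuple(values))
        return results

For a large result set, that per-cell Python work adds up.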

Interestingly, when we use sqlline-thin.py instead of the Python phoenixdb adapter, the protobuf serialization works fine and responses are fast. It's not clear to me why PQS would have problems with the Python adapter but not with sqlline-thin. Do they follow different code paths (especially around serialization)?

Thanks again,
Manoj

On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <els...@apache.org <mailto:els...@apache.org>> wrote:

    I would strongly suggest you do not use the JSON serialization.

    The JSON support is implemented via Jackson, which has no means to make
    backwards compatibility "easy". By contrast, protobuf makes this
    extremely easy, and we have multiple examples over the past years where
    we've been able to fix bugs in a backwards-compatible manner.
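
    For instance (an illustrative message sketch, not the actual Avatica
    .proto): adding a field under a fresh tag number is backwards
    compatible, because protobuf readers skip tags they don't recognize:

        // Illustrative only -- not the real Avatica definition
        message ResultSetResponse {
          string connection_id = 1;
          uint32 statement_id = 2;
          // Added in a later release; older clients just ignore tag 3
          bool own_statement = 3;
        }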

    If you want the thin client to continue to work across versions, stick
    with protobuf.

    On 11/2/18 5:27 PM, Manoj Ganesan wrote:
     > Hey everyone,
     >
     > I'm trying to make the Python phoenixdb adapter work with JSON
     > serialization on PQS.
     >
     > I'm using Phoenix 4.14 and the adapter works fine with protobuf, but
     > when I try making it work with an older version of phoenixdb (before
     > the JSON-to-protobuf switch was introduced), it just returns 0 rows.
     > I don't see anything in particular wrong with the HTTP requests
     > themselves, and they seem to conform to the Avatica JSON spec
     > (http://calcite.apache.org/avatica/docs/json_reference.html).
     >
     > Here's the result (with some debug statements) that returns 0 rows.
     > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
     >
     > request body = {"maxRowCount": -2, "connectionId":
     > "68c05d12-5770-47d6-b3e4-dba556db4790", "request": "prepareAndExecute",
     > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
     > request headers = {'content-type': 'application/json'}
     > _post_request: got response {'fp': <socket._fileobject object at
     > 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
     > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK', 'version': 11,
     > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
     > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
     > response.read(): body =
     > {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
     >
     > The same query issued against a PQS started with PROTOBUF serialization
     > and using a newer phoenixdb adapter returns the correct number of rows.
     >
     > Has anyone had luck making this work?
     >
     > Thanks,
     > Manoj
     >
