Re: Python phoenixdb adapter and JSON serialization on PQS

Josh Elser Tue, 06 Nov 2018 09:52:32 -0800


On 11/5/18 10:10 PM, Manoj Ganesan wrote:

Thanks for the pointers Josh. I'm working on getting a representative concise test to demonstrate the issue.


Meanwhile, I had one question regarding the following:

    You are right that the operations in PQS should be exactly the same,
    regardless of the client you're using -- that is how this
    architecture works.


IIUC, this means the following 2 methods should yield the same result:

 1. sqlline-thin.py -s JSON <query_file>
 2. using a python avatica client script making JSON requests

That's correct. Any client which speaks to PQS should see the same results. There may be bugs in the client implementation, of course, which make this statement false.

I made the following change in hbase-site.xml on the PQS host:

<property>
     <name>phoenix.queryserver.serialization</name>
     <value>JSON</value>
</property>
I notice that executing "sqlline-thin.py -s JSON <query_file>" returns results just fine. However, when I use a simple script to try the same query, it returns 0 rows. I'm attaching the Python script here. The script essentially makes HTTP calls using the Avatica JSON reference <https://calcite.apache.org/avatica/docs/json_reference.html>. I assumed that the sqlline-thin wrapper (when passed the -s JSON flag) also makes HTTP calls based on the JSON reference. Is that not correct?

Apache mailing lists strip attachments. Please consider hosting it somewhere else, along with instructions/scripts to generate the required tables. Please provide some more analysis of the problem than just a summarization of what you see as an end-user -- I don't have the cycles or interest to debug the entire system for you :)

Avatica is a protocol that interprets JDBC using some serialization (JSON or Protobuf today) and a transport (only HTTP) to a remote server to run the JDBC operations. So, yes: an Avatica client is always using HTTP, given whatever serialization you instruct it to use.
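
To make the "serialization plus HTTP transport" split concrete, here is a minimal Python sketch of how a JSON-serialized Avatica request could be wrapped in an HTTP POST. The field names mirror the prepareAndExecute request body quoted further down this thread; the URL and the helper names (PQS_URL, prepare_and_execute_payload, build_avatica_request) are hypothetical and only for illustration. This is not how sqlline-thin or phoenixdb is implemented, and the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical PQS endpoint; adjust host and port for your deployment.
PQS_URL = "http://localhost:8765/"


def prepare_and_execute_payload(connection_id, statement_id, sql, max_row_count):
    """Build the JSON payload for a prepareAndExecute call.

    The keys are copied from the request body shown in this thread.
    """
    return {
        "request": "prepareAndExecute",
        "connectionId": connection_id,
        "statementId": statement_id,
        "sql": sql,
        "maxRowCount": max_row_count,
    }


def build_avatica_request(url, payload):
    """Serialize the payload as JSON and wrap it in an HTTP POST (not sent)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = prepare_and_execute_payload(
    "68c05d12-5770-47d6-b3e4-dba556db4790",
    3,
    "SELECT col1, col2 from table limit 20",
    -2,
)
request = build_avatica_request(PQS_URL, payload)

# Sending it would look like this (requires a running PQS and an open
# connection/statement on the server side):
# with urllib.request.urlopen(request) as resp:
#     result = json.loads(resp.read())
```

Swapping the serialization would change only how the body is encoded and the content type; the transport stays an HTTP POST either way.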

I'll work on getting some test cases here soon to illustrate this as well as the performance problem.


Thanks again!
Manoj

On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <els...@apache.org> wrote:


    Is the OOME issue regardless of using the Java client (sqlline-thin) and
    the Python client? I would like to know more about this one. If you can
    share something that reproduces the problem for you, I'd like to look
    into it. The only suggestion I have at this point in time is to make
    sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via
    PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right
    directory when you launch PQS.

    Regarding performance, as you've described it, it sounds like the Python
    driver is just slower than the Java driver. You are right that the
    operations in PQS should be exactly the same, regardless of the client
    you're using -- that is how this architecture works. Avatica is a wire
    protocol that all clients use to talk to PQS. More digging/information
    you can provide about the exact circumstances (and, again,
    steps/environment to reproduce what you see) would be extremely helpful.

    Thanks Manoj.

    - Josh

    On 11/2/18 7:16 PM, Manoj Ganesan wrote:
     > Thanks Josh for the response!
     >
     > I would definitely like to use protobuf serialization, but I'm observing
     > performance issues trying to run queries with a large number of results.
     > One problem is that I observe PQS runs out of memory, when it's trying to
     > (what looks like to me) serialize the results in Avatica. The other is
     > that the phoenixdb python adapter itself spends a large amount of time
     > in the logic
     > <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
     > where it's converting the protobuf rows to python objects.
     >
     > Interestingly, when we use sqlline-thin.py instead of python phoenixdb,
     > the protobuf serialization works fine and responses are fast. It's not
     > clear to me why PQS would have problems when using the python adapter
     > and not when using sqlline-thin. Do they follow different code paths
     > (especially around serialization)?
     >
     > Thanks again,
     > Manoj
     >
     > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <els...@apache.org> wrote:
     >
     >     I would strongly suggest you do not use the JSON serialization.
     >
     >     The JSON support is implemented via Jackson which has no means to make
     >     backwards compatibility "easy". In contrast, protobuf makes this
     >     extremely easy and we have multiple examples over the past years where
     >     we've been able to fix bugs in a backwards compatible manner.
     >
     >     If you want the thin client to continue to work across versions, stick
     >     with protobuf.
     >
     >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
     >      > Hey everyone,
     >      >
     >      > I'm trying to make the Python phoenixdb adapter work with JSON
     >      > serialization on PQS.
     >      >
     >      > I'm using Phoenix 4.14 and the adapter works fine with protobuf, but
     >      > when I try making it work with an older version of phoenixdb (before the
     >      > JSON to protobuf switch was introduced), it just returns 0 rows. I don't
     >      > see anything in particular wrong with the HTTP requests themselves, and
     >      > they seem to conform to the Avatica JSON spec
     >      > (http://calcite.apache.org/avatica/docs/json_reference.html).
     >      >
     >      > Here's the result (with some debug statements) that returns 0 rows.
     >      > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
     >      >
     >      > request body =  {"maxRowCount": -2, "connectionId":
     >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request": "prepareAndExecute",
     >      > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
     >      > request headers =  {'content-type': 'application/json'}
     >      > _post_request: got response {'fp': <socket._fileobject object at
     >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
     >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK', 'version': 11,
     >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
     >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
     >      > response.read(): body =
     >      > {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
     >      >
     >      >
     >      > The same query issued against a PQS started with PROTOBUF serialization
     >      > and using a newer phoenixdb adapter returns the correct number of rows.
     >      >
     >      > Has anyone had luck making this work?
     >      >
     >      > Thanks,
     >      > Manoj
     >      >
     >
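
The empty firstFrame in the response quoted above can also be checked programmatically rather than by eye. The following is a minimal Python sketch, assuming the response body has the executeResults shape shown in the original message; the helper name summarize_first_frames is hypothetical, and the sample body is a trimmed copy of the quoted response (signature and rpcMetadata omitted, emphasis asterisks removed), not a new capture.

```python
import json


def summarize_first_frames(body):
    """Return, per result set, the statement id, row count and 'done' flag.

    Assumes an executeResults response with a "results" list whose entries
    carry a "firstFrame" object, as in the response quoted in this thread.
    """
    response = json.loads(body)
    summaries = []
    for result in response.get("results", []):
        frame = result.get("firstFrame") or {}
        summaries.append(
            {
                "statementId": result.get("statementId"),
                "rows": len(frame.get("rows", [])),
                "done": frame.get("done"),
            }
        )
    return summaries


# Trimmed copy of the response body from the original message.
sample_body = (
    '{"response":"executeResults","missingStatement":false,'
    '"results":[{"response":"resultSet",'
    '"connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790",'
    '"statementId":3,"ownStatement":true,'
    '"firstFrame":{"offset":0,"done":true,"rows":[]},'
    '"updateCount":-1}]}'
)

summary = summarize_first_frames(sample_body)
print(summary)
```

Running the same check against the body returned to each client would show whether the server itself is reporting a finished, empty frame or whether rows are being lost on the client side.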
