Thanks for the pointers, Josh. I'm working on putting together a concise,
representative test that demonstrates the issue.

Meanwhile, I had one question regarding the following:

> You are right that the operations in PQS should be exactly the same,
> regardless of the client you're using -- that is how this architecture
> works.


IIUC, this means the following two methods should yield the same result:

   1. sqlline-thin.py -s JSON <query_file>
   2. a Python Avatica client script making JSON requests

I made the following change in hbase-site.xml on the PQS host:

<property>
    <name>phoenix.queryserver.serialization</name>
    <value>JSON</value>
</property>
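
After restarting PQS to pick up the property, a quick way to sanity-check
that the server is actually in JSON mode is to post a bare openConnection
request and confirm the response comes back as parseable JSON (I'd expect
a PROTOBUF-mode server to fail on a JSON body). A minimal sketch, with
host and port as placeholders:

import json
import httplib
import uuid

# Probe: a JSON-mode PQS should answer openConnection with a JSON body.
conn = httplib.HTTPConnection('localhost', 8765, timeout=5)
body = json.dumps({'request': 'openConnection',
                   'connectionId': str(uuid.uuid4())})
conn.request('POST', '/', body=body,
             headers={'content-type': 'application/json'})
resp = conn.getresponse()
print resp.status, resp.read()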

I notice that executing "sqlline-thin.py -s JSON <query_file>" returns
results just fine. However, when I use a simple script to issue the same
query, it returns 0 rows. I'm attaching the Python script here. The script
essentially makes HTTP calls following the Avatica JSON reference
<https://calcite.apache.org/avatica/docs/json_reference.html>. I assumed
that the sqlline-thin wrapper (when passed the -s JSON flag) also makes
HTTP calls based on the JSON reference; is that not correct?
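
(For what it's worth, my understanding is that sqlline-thin.py -s JSON
just sets the serialization option on the thin-driver JDBC URL, something
along the lines of the following, with the host as a placeholder:

jdbc:phoenix:thin:url=http://localhost:8765;serialization=JSON

so both paths should end up speaking the same JSON wire format.)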

I'll work on getting some test cases here soon to illustrate this as well
as the performance problem.

Thanks again!
Manoj

On Mon, Nov 5, 2018 at 10:43 AM Josh Elser <els...@apache.org> wrote:

> Is the OOME issue regardless of using the Java client (sqlline-thin) and
> the Python client? I would like to know more about this one. If you can
> share something that reproduces the problem for you, I'd like to look
> into it. The only suggestion I have at this point in time is to make
> sure you set a reasonable max-heap size in hbase-env.sh (e.g. -Xmx) via
> PHOENIX_QUERYSERVER_OPTS and have HBASE_CONF_DIR pointing to the right
> directory when you launch PQS.
>
> Regarding performance, as you've described it, it sounds like the Python
> driver is just slower than the Java driver. You are right that the
> operations in PQS should be exactly the same, regardless of the client
> you're using -- that is how this architecture works. Avatica is a wire
> protocol that all clients use to talk to PQS. More digging/information
> you can provide about the exact circumstances (and, again,
> steps/environment to reproduce what you see) would be extremely helpful.
>
> Thanks Manoj.
>
> - Josh
>
> On 11/2/18 7:16 PM, Manoj Ganesan wrote:
> > Thanks Josh for the response!
> >
> > I would definitely like to use protobuf serialization, but I'm observing
> > performance issues trying to run queries with a large number of results.
> > One problem is that I observe PQS runs out of memory when it's trying to
> > (what looks like to me) serialize the results in Avatica. The other is
> > that the phoenixdb python adapter itself spends a large amount of time
> > in the logic
> > <https://github.com/apache/phoenix/blob/master/python/phoenixdb/phoenixdb/cursor.py#L248>
> > where it's converting the protobuf rows to python objects.
> >
> > Interestingly when we use sqlline-thin.py instead of python phoenixdb,
> > the protobuf serialization works fine and responses are fast. It's not
> > clear to me why PQS would have problems when using the python adapter
> > and not when using sqlline-thin; do they follow different code paths
> > (especially around serialization)?
> >
> > Thanks again,
> > Manoj
> >
> > On Fri, Nov 2, 2018 at 4:05 PM Josh Elser <els...@apache.org> wrote:
> >
> >     I would strongly suggest you do not use the JSON serialization.
> >
> >     The JSON support is implemented via Jackson, which has no means to
> >     make backwards compatibility "easy". In contrast, protobuf makes this
> >     extremely easy, and we have multiple examples over the past years
> >     where we've been able to fix bugs in a backwards-compatible manner.
> >
> >     If you want the thin client to continue to work across versions,
> >     stick with protobuf.
> >
> >     On 11/2/18 5:27 PM, Manoj Ganesan wrote:
> >      > Hey everyone,
> >      >
> >      > I'm trying to use the Python phoenixdb adapter work with JSON
> >      > serialization on PQS.
> >      >
> >      > I'm using Phoenix 4.14 and the adapter works fine with protobuf,
> >      > but when I try making it work with an older version of phoenixdb
> >      > (before the JSON-to-protobuf switch was introduced), it just
> >      > returns 0 rows. I don't see anything in particular wrong with the
> >      > HTTP requests themselves, and they seem to conform to the Avatica
> >      > JSON spec (http://calcite.apache.org/avatica/docs/json_reference.html).
> >      >
> >      > Here's the result (with some debug statements) that returns 0 rows.
> >      > Notice the *"firstFrame":{"offset":0,"done":true,"rows":[]* below:
> >      >
> >      > request body =  {"maxRowCount": -2, "connectionId":
> >      > "68c05d12-5770-47d6-b3e4-dba556db4790", "request": "prepareAndExecute",
> >      > "statementId": 3, "sql": "SELECT col1, col2 from table limit 20"}
> >      > request headers =  {'content-type': 'application/json'}
> >      > _post_request: got response {'fp': <socket._fileobject object at
> >      > 0x7f858330b9d0>, 'status': 200, 'will_close': False, 'chunk_left':
> >      > 'UNKNOWN', 'length': 1395, 'strict': 0, 'reason': 'OK', 'version': 11,
> >      > 'debuglevel': 0, 'msg': <httplib.HTTPMessage instance at
> >      > 0x7f84fb50be18>, 'chunked': 0, '_method': 'POST'}
> >      > response.read(): body =
> >      > {"response":"executeResults","missingStatement":false,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"},"results":[{"response":"resultSet","connectionId":"68c05d12-5770-47d6-b3e4-dba556db4790","statementId":3,"ownStatement":true,"signature":{"columns":[{"ordinal":0,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL1","columnName":"COL1","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"},{"ordinal":1,"autoIncrement":false,"caseSensitive":false,"searchable":true,"currency":false,"nullable":0,"signed":true,"displaySize":40,"label":"COL2","columnName":"COL2","schemaName":"","precision":0,"scale":0,"tableName":"TABLE","catalogName":"","type":{"type":"scalar","id":4,"name":"INTEGER","rep":"PRIMITIVE_INT"},"readOnly":true,"writable":false,"definitelyWritable":false,"columnClassName":"java.lang.Integer"}],"sql":null,"parameters":[],"cursorFactory":{"style":"LIST","clazz":null,"fieldNames":null},"statementType":null},*"firstFrame":{"offset":0,"done":true,"rows":[]*},"updateCount":-1,"rpcMetadata":{"response":"rpcMetadata","serverAddress":"ip-10-55-6-247:8765"}}]}
> >      >
> >      > The same query issued against a PQS started with PROTOBUF
> >      > serialization and using a newer phoenixdb adapter returns the
> >      > correct number of rows.
> >      >
> >      > Has anyone had luck making this work?
> >      >
> >      > Thanks,
> >      > Manoj
> >      >
> >
>
# Attached test script (Python 2): issues Avatica JSON requests to PQS
# following https://calcite.apache.org/avatica/docs/json_reference.html
import json
import httplib
import uuid
from urlparse import urlparse
from time import time

# PQS endpoint; should include a scheme, e.g. 'http://localhost:8765/'
url = '<enter_url_here>'
parsed_url = urlparse(url)
port = 8765
conn_timeout = 5
# Note: settimeout() below takes seconds, so 5000 is ~83 minutes;
# adjust if a milliseconds-style timeout was intended.
query_timeout = 5000
sql = 'select ind_id, attr_id from ind_attr limit 100'

# Connect to the parsed hostname; passing the full URL string to
# HTTPConnection would fail to resolve.
connection = httplib.HTTPConnection(parsed_url.hostname, port,
                                    timeout=conn_timeout)
connection.connect()
connection.sock.settimeout(query_timeout)

conn_id = str(uuid.uuid4())

def post_request(connection, request):
    # POST a single Avatica JSON request and return the decoded response.
    body = json.dumps(request)
    headers = {'content-type': 'application/json'}
    connection.request('POST', parsed_url.path or '/', body=body, headers=headers)
    response = connection.getresponse()
    response_body = response.read()
    return json.loads(response_body)

# 1. open connection
request = {
    'request': 'openConnection',
    'connectionId': conn_id
}

print 'request = ', request
response_data = post_request(connection, request)
print 'response = ', response_data

# 2. create statement
request = {
    'request': 'createStatement',
    'connectionId': conn_id,
}

print 'request = ', request
response_data = post_request(connection, request)
print 'response = ', response_data

# 3. prepare and execute
statement_id = response_data['statementId']
request = {
    'request': 'prepareAndExecute',
    'connectionId': conn_id,
    'sql': sql,
    # The Avatica JSON reference documents -1 as "no limit"; -2 is the
    # value I had been sending and may be worth re-checking against the spec.
    'maxRowCount': -2,
    'statementId': statement_id
}

s = time()
print 'request = ', request
response_data = post_request(connection, request)
print 'received response in time = %s' % (time() - s)
#print 'response = ', response_data
print 'received %s rows' % len(response_data['results'][0]['firstFrame']['rows'])
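
# 3b. (Optional) A hedged sketch of frame fetching per the Avatica JSON
# reference: when the firstFrame's "done" is false, the remaining rows are
# pulled with "fetch" requests. Field names follow the JSON reference;
# fetchMaxRowCount=100 is an arbitrary batch size for illustration.
frame = response_data['results'][0]['firstFrame']
total_rows = len(frame['rows'])
while not frame['done']:
    request = {
        'request': 'fetch',
        'connectionId': conn_id,
        'statementId': statement_id,
        'offset': total_rows,  # rows received so far
        'fetchMaxRowCount': 100,
    }
    frame = post_request(connection, request)['frame']
    total_rows += len(frame['rows'])
print 'received %s rows across all frames' % total_rows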

# 4. close statement
request = {
    'request': 'closeStatement',
    'connectionId': conn_id,
    'statementId': statement_id,
}

print 'request = ', request
response_data = post_request(connection, request)
print 'response = ', response_data

# 5. close connection
request = {
    'request': 'closeConnection',
    'connectionId': conn_id,
}

print 'request = ', request
response_data = post_request(connection, request)
print 'response = ', response_data
