Hi,
I just started using the PyHS2 package to access Hive from my Python apps.

Question 1: How can I get the column headers from my result set?

I see that the __init__ method on the Connection class takes a configuration
arg. Can I set session variables with it, using something like
"hive.cli.print.header=true"?  I tried that briefly, but it did not work.
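
A minimal sketch of one way to get the headers, offered as an assumption rather than a confirmed answer. It relies on the cursor's getSchema() method shown in the PyHS2 README, and assumes each entry it returns is a dict with a 'columnName' key. The helper names fetch_with_headers and run_query are hypothetical, as are the port and database values. Note that hive.cli.print.header only controls what the Hive CLI prints, so it is not expected to change a HiveServer2 result set.

```python
def fetch_with_headers(cursor, sql):
    """Run sql on a PyHS2-style cursor and return (headers, rows)."""
    cursor.execute(sql)
    # Assumption: getSchema() returns one dict per result column,
    # each carrying the column's name under the 'columnName' key.
    headers = [col['columnName'] for col in cursor.getSchema()]
    rows = cursor.fetch()
    return headers, rows


def run_query(host, user, password, sql):
    """Hypothetical wrapper: connect with PyHS2 and fetch headers plus rows."""
    # Imported here so fetch_with_headers has no hard dependency on pyhs2.
    import pyhs2
    with pyhs2.connect(host=host,
                       port=10000,              # assumed default HiveServer2 port
                       authMechanism="PLAIN",   # assumed; depends on the cluster
                       user=user,
                       password=password,
                       database='default') as conn:
        with conn.cursor() as cur:
            return fetch_with_headers(cur, sql)
```

Calling run_query(host, user, password, "select * from some_table") would then return the column names alongside the rows, so the header line can be written out by the app itself instead of by Hive.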


Question 2 (if this is too broad a question please ignore, question 1 is
more important):
I'm switching over to PyHS2 after a long practice of running Hive CLI in a
subprocess from my Python apps.
After upgrading to Hive 0.13, though, I can't (or shouldn't) use Hive CLI
anymore. I explored running Beeline in a subprocess (as an easy swap from
Hive CLI), but the bugs in capturing output (i.e. CSV vs CSV2, and the
Header option not working until Hive 0.14) made me look elsewhere.

Can anyone recommend PyHS2?  My testing so far looks good.
The other option I see is using the Hive ODBC driver (I see Cloudera hosts
some versions), picking one of the many generic Python ODBC wrappers (none
of which I'm familiar with), and hoping they all fit together well enough.

I'm sure this problem of accessing HiveServer2 from a Python app has been
solved thousands of times already (by a better approach than my prior
choice of Hive CLI).
Is PyHS2 the popular approach, or is there something else?

Thanks!
