One thing we should fix to make this easier is to provide properly typed data through the REST API. The result listener linked below transforms the native Drill record format into a simple hashmap whose keys and values are both strings. This list of hashmaps is then serialized by Jackson into the result set that the REST API returns in response to a query request.
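Because every value comes back as a string, each client ends up guessing types on its own. Below is a minimal Python sketch of that kind of client-side workaround. It assumes a response body shaped like {"columns": [...], "rows": [{...}, ...]}. The helper names coerce_rows and _guess, and the int-then-float guessing rule, are hypothetical and for illustration only.

```python
import json
from typing import Any, Dict, List


def _guess(value: Any) -> Any:
    """Best-effort conversion of a single string cell to int or float."""
    if not isinstance(value, str):
        return value  # already typed, or None for a missing column
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value  # non-numeric strings are returned unchanged


def coerce_rows(body: str) -> List[Dict[str, Any]]:
    """Parse a string-typed REST result body and guess a type for every cell.

    Assumes the body looks like {"columns": [...], "rows": [{...}, ...]}.
    """
    result = json.loads(body)
    return [
        {column: _guess(row.get(column)) for column in result["columns"]}
        for row in result["rows"]
    ]


if __name__ == "__main__":
    sample = ('{"columns": ["id", "price", "name"], '
              '"rows": [{"id": "1", "price": "9.5", "name": "drill"}]}')
    print(coerce_rows(sample))  # [{'id': 1, 'price': 9.5, 'name': 'drill'}]
```

Guessing is fragile. A VARCHAR value such as "007" silently becomes the integer 7, and a numeric-looking string column cannot be told apart from a real number. This is why having the server send real types is the better fix.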
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/QueryWrapper.java#L122

I assume there are reasonable libraries in Python for parsing extended JSON. That would probably be the easiest way to get fully typed data back to your new client. I think it would be best to deprecate the current behavior of returning every value as a string and instead create two modes: extended JSON for full typing, and simple JSON with types like date, time and binary converted to strings appropriately. In the past we have had discussions on the list with users who had to work around the fact that numeric data was coming back as strings. We should make the default behavior intuitive for getting started, and allow the option to turn on full typing with extended JSON when needed.

- Jason

On Mon, Dec 28, 2015 at 10:47 AM, Wojciech Nowak wrote:
> Hello!
>
> Great to read such enthusiastic feedback.
>
> I have created a git repo: https://github.com/PythonicNinja/pydrill
>
> Enabled testing via Travis.
> Enabled creation of docs on http://pydrill.readthedocs.org/en/latest/readme.html
>
> Discussions related to the Python driver can move there.
>
> kind regards,
> Wojtek Nowak
>
> On Monday, 28 December 2015 at 17:12, Tomer Shiran wrote:
>
> > +1
> >
> > Having a Python client would be super valuable
> >
> > On Dec 28, 2015, at 9:45 AM, Peder Jakobsen | gmail wrote:
> >
> > > Two thumbs up for this project. An immediate benefit is the ability to take advantage of the enhanced interactive features of the IPython shell.
> > >
> > > Perhaps the next step is to model the design after a similar REST API wrapper, for example python-twitter: https://github.com/bear/python-twitter
> > >
> > > On Mon, Dec 28, 2015 at 8:45 AM, Charles Givre wrote:
> > >
> > > > I’d second that and be willing to help.
> > > >
> > > > On Dec 28, 2015, at 07:59, John Omernik wrote:
> > > >
> > > > > I think a Pythonic module for Drill could be a great contribution. Using the REST API makes the most sense: wrap it and interface with it using requests or something similar. Since everything in the REST API is done via JSON, there could be nice interaction with the API for things such as authentication (it's form based, so you have to use a requests session or similar), query submission, results, error handling, etc. You will want to determine what you want your driver to do. Do you want an interface to support submitting new storage plugins? Do you want to expose query-time settings (such as reading JSON numbers as double) via the driver, or just via a statement submitted by the user? (One requires much more work; the other requires an eye towards security.) Security is another thing. You want to ensure that if something is using your module, say a Python Flask app, there is validation of SQL and other such concerns.
> > > > > Drill seems to be pretty good about it, but any module you write should be explicit about what it is and isn't doing with regard to input sanitization and security.
> > > > >
> > > > > Another thing to think about would be something that allows result set objects in your Python driver to be easily moved into a pandas DataFrame. I think the data science folks out there would love this, and you would quickly have a core set of users and other contributions. The key to something like this would be making it as Pythonic as possible and bridging the gap between the Python language and the REST API. This gives you, the author, the most flexibility to focus on your code without having to worry much about the Drill code base, since everything goes through the REST API (which is really well designed; I have used it myself in Python scripts).
> > > > >
> > > > > This is a great idea and I would be happy to contribute/assist!
> > > > >
> > > > > John
> > > > >
> > > > > On Mon, Dec 28, 2015 at 2:07 AM, Wojciech Nowak wrote:
> > > > >
> > > > > > Dear Drill developers,
> > > > > >
> > > > > > Recently I tried to use Drill from Python through the ODBC interface, following the blog post at https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl. It worked as expected, but what struck me was how much hassle it is to configure.
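John's point about form-based authentication, and the contrast with the ODBC setup described above, can be sketched with the Python standard library alone. A cookie-aware urllib opener stands in for the requests session John mentions, which keeps the example dependency-free. The class name DrillRestSession is hypothetical. Two endpoint details are assumed from Drill's REST API documentation and should be checked against the Drill version in use: the /j_security_check endpoint with j_username and j_password form fields, and the /query.json payload with queryType and query. The login step only applies when web authentication is enabled.

```python
import json
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from typing import Any, Dict


class DrillRestSession:
    """Hypothetical helper: a cookie-aware client for Drill's REST endpoints."""

    def __init__(self, base_url: str = "http://localhost:8047") -> None:
        self.base_url = base_url.rstrip("/")
        # The cookie jar keeps the session cookie issued by the form login,
        # so later requests through this opener are sent as the logged-in user.
        self.cookies = CookieJar()
        self.opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(self.cookies))

    def login_request(self, user: str, password: str) -> urllib.request.Request:
        """Build the form-encoded POST for Drill's form-based login (assumed endpoint)."""
        form = urllib.parse.urlencode(
            {"j_username": user, "j_password": password}).encode("utf-8")
        return urllib.request.Request(
            self.base_url + "/j_security_check", data=form, method="POST",
            headers={"Content-Type": "application/x-www-form-urlencoded"})

    def query_request(self, sql: str) -> urllib.request.Request:
        """Build the JSON POST that submits a SQL query (assumed payload shape)."""
        payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
        return urllib.request.Request(
            self.base_url + "/query.json", data=payload, method="POST",
            headers={"Content-Type": "application/json"})

    def login(self, user: str, password: str) -> None:
        # Only the session cookie matters here; the response body is not needed.
        with self.opener.open(self.login_request(user, password)):
            pass

    def query(self, sql: str) -> Dict[str, Any]:
        """Send a query through the cookie-aware opener and decode the JSON reply."""
        with self.opener.open(self.query_request(sql)) as response:
            return json.loads(response.read().decode("utf-8"))
```

Against a running Drillbit, usage would look like s = DrillRestSession(); s.login("user", "secret"); s.query("SELECT 1"). Splitting request construction from sending keeps the request format testable without a server.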
> > > > > >
> > > > > > That’s why, based on the Contribution Ideas page on your site (https://drill.apache.org/docs/apache-drill-contribution-ideas/), I decided to create a simpler solution for the Python community.
> > > > > >
> > > > > > My contribution would have two phases:
> > > > > > - a client/driver for interacting with Drill;
> > > > > > - a DSL that provides an easier and more idiomatic way to write and manipulate queries using predefined query-set expressions.
> > > > > >
> > > > > > 1. Similar to the official Elasticsearch client (https://github.com/elastic/elasticsearch-py), I would like to use Drill's REST API, which is documented at https://drill.apache.org/docs/rest-api/
> > > > > > Sketch of usage: https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
> > > > > >
> > > > > > Questions:
> > > > > > 1.1 Could a Python driver for Drill be based on the REST API? Do you see any problems?
> > > > > > 1.2 Do you have any ideas or suggestions for that project?
> > > > > >
> > > > > > 2. The DSL would be a separate package from the driver, installable as an optional package via:
> > > > > > pip install pydrill-dsl
> > > > > > so that it could be released separately from the first package. It would enhance the way of interacting with Drill through query-set-like expressions.
> > > > > > Sketch of usage: https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
> > > > > >
> > > > > > Questions:
> > > > > > 2.1 Should it be separated from the Python Drill driver package?
> > > > > > 2.2 Do you have any ideas or suggestions for that project?
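The query-set idea in phase 2 resembles Django's QuerySet. Each call returns a new, refined query, and SQL is produced only at the end. The following is a hypothetical minimal version of that pattern, not the API in the linked gist. The class and method names (QuerySet, select, filter, limit, to_sql) are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuerySet:
    """Hypothetical immutable, chainable SQL builder in a query-set style.

    Every method returns a new QuerySet, so a base query can be reused.
    """
    source: str
    columns: Tuple[str, ...] = ("*",)
    conditions: Tuple[str, ...] = ()
    max_rows: Optional[int] = None

    def select(self, *columns: str) -> "QuerySet":
        # An empty select() falls back to SELECT *.
        return replace(self, columns=columns or ("*",))

    def filter(self, condition: str) -> "QuerySet":
        # Conditions accumulate and are joined with AND in to_sql().
        return replace(self, conditions=self.conditions + (condition,))

    def limit(self, n: int) -> "QuerySet":
        return replace(self, max_rows=n)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.source}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        if self.max_rows is not None:
            sql += f" LIMIT {int(self.max_rows)}"
        return sql
```

In this sketch the conditions are raw SQL fragments. Following John's point on security, untrusted input must never reach filter() unsanitized. A real DSL would need field and operator objects or proper quoting.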
> > > > > >
> > > > > > This contribution would be part of my Master’s thesis, so any ideas are welcome. My thesis supervisor suggested contacting you to get the Drill core developers’ perspective.
> > > > > >
> > > > > > I would be very grateful if you could provide me with your thoughts.
> > > > > >
> > > > > > kind regards,
> > > > > > Wojtek Nowak
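Jason's suggestion at the top of the thread was that the REST API could offer an extended JSON mode. If it did, a Python client could decode that mode with nothing more than the standard json module and an object_hook. Below is a minimal sketch. The tag names ($numberLong, $dateDay, $date, $binary) are assumptions modelled on MongoDB-style extended JSON, not a confirmed description of Drill's output. The names DECODERS, decode_tagged and loads_extended are hypothetical.

```python
import base64
import json
from datetime import date, datetime
from typing import Any, Callable, Dict

# Assumed tag names, modelled on MongoDB-style extended JSON; check them
# against Drill's extended JSON documentation before relying on them.
DECODERS: Dict[str, Callable[[Any], Any]] = {
    "$numberLong": int,
    "$dateDay": date.fromisoformat,
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on.
    "$date": lambda s: datetime.fromisoformat(s.replace("Z", "+00:00")),
    "$binary": base64.b64decode,
}


def decode_tagged(obj: Dict[str, Any]) -> Any:
    """json object_hook: unwrap single-key {"$tag": value} wrappers."""
    if len(obj) == 1:
        key, value = next(iter(obj.items()))
        decoder = DECODERS.get(key)
        if decoder is not None:
            return decoder(value)
    return obj  # ordinary objects and unknown tags pass through unchanged


def loads_extended(text: str) -> Any:
    """Parse extended JSON text into native Python values."""
    return json.loads(text, object_hook=decode_tagged)
```

json calls the object_hook innermost-first, so a tagged value nested inside a row is replaced before the row itself is examined.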
