I think a Pythonic module for Drill could be a great contribution. Wrapping the REST API makes the most sense, talking to it with requests or something similar. Since everything in the REST API is JSON, the module could offer a nice interface to authentication (it's form-based, so you have to use a requests session or similar to hold on to the session cookie), query submission, results, error handling, etc. You will want to decide what your driver should do. Do you want an interface for submitting new storage plugins? Do you want to expose query-time settings (such as reading JSON numbers as doubles) through the driver, or only through a statement the user submits? (The driver route requires much more work; the statement route requires an eye towards security.) Security is another thing: if something is using your module, say a Python Flask app, you want to ensure that the SQL is validated, along with other such concerns. Drill seems to be pretty good about it, but any module you write should be explicit about what it is and isn't doing with respect to input sanitization and security.
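A minimal sketch of the wrapper described above, using only the standard library (urllib with a cookie jar standing in for a requests session). The names DrillClient, login, query and DrillError are hypothetical. The endpoints follow Drill's REST API documentation (form login at /j_security_check, SQL submitted as JSON to /query.json on port 8047); the assumption that a failed query is reported in an errorMessage field should be checked against your Drill version.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


class DrillError(Exception):
    """Raised when a request to Drill fails or Drill reports a query error."""


class DrillClient:
    """Hypothetical minimal wrapper around Drill's REST API (a sketch)."""

    def __init__(self, base_url="http://localhost:8047", opener=None):
        self.base_url = base_url.rstrip("/")
        # Form-based auth hands back a session cookie; an opener with a
        # cookie jar resends it on later requests, like a requests.Session.
        self.opener = opener or urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar())
        )

    def _post(self, path, data, content_type):
        request = urllib.request.Request(
            self.base_url + path,
            data=data,
            headers={"Content-Type": content_type},
            method="POST",
        )
        try:
            with self.opener.open(request) as response:
                return response.read().decode("utf-8")
        except urllib.error.HTTPError as exc:
            # Surface the status code and whatever body Drill sent back.
            detail = exc.read().decode("utf-8", "replace")
            raise DrillError(f"HTTP {exc.code} from {path}: {detail}") from exc

    def login(self, username, password):
        # Same form fields as Drill's login page: j_username / j_password.
        form = urllib.parse.urlencode(
            {"j_username": username, "j_password": password}
        ).encode("utf-8")
        self._post("/j_security_check", form, "application/x-www-form-urlencoded")

    def query(self, sql):
        # Submit SQL as JSON; the parsed response carries "columns" and "rows".
        payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
        result = json.loads(self._post("/query.json", payload, "application/json"))
        # Assumption: a failed query comes back as JSON with "errorMessage".
        if "errorMessage" in result:
            raise DrillError(result["errorMessage"])
        return result
```

For example, DrillClient("http://localhost:8047").query("SELECT * FROM cp.`employee.json` LIMIT 5") would return the parsed JSON, with login() needed first only when authentication is enabled. Accepting a custom opener keeps the network layer swappable, which also makes the class testable without a running Drillbit.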
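One way for a module to be explicit about sanitization is to ship small, documented helpers and refuse anything they cannot handle. This sketch assumes Drill follows standard SQL literal rules (a single quote inside a string literal is escaped by doubling it). The names quote_literal, checked_identifier and select_where_equals are hypothetical, and the identifier check is deliberately strict: it rejects dots, so workspace-qualified names such as dfs.tmp would need their own handling.

```python
import re

# Plain identifiers only: a letter or underscore, then letters, digits, underscores.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_literal(value: str) -> str:
    """Render a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def checked_identifier(name: str) -> str:
    """Backtick-quote a plain identifier; refuse anything else outright."""
    if not _PLAIN_IDENTIFIER.fullmatch(name):
        raise ValueError(f"refusing unsafe identifier: {name!r}")
    return f"`{name}`"


def select_where_equals(table: str, column: str, value: str) -> str:
    """Hypothetical helper: build a single SELECT with one equality filter."""
    return (
        f"SELECT * FROM {checked_identifier(table)} "
        f"WHERE {checked_identifier(column)} = {quote_literal(value)}"
    )
```

Rejecting unusual identifiers instead of trying to escape them keeps the module's guarantees easy to state in its documentation; it does not replace validation in the calling application (the Flask app in the example above).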
Another thing to think about would be making result set objects in your Python driver easy to move into a pandas DataFrame. I think the data science folks out there would love this, and with that you would quickly gain a core set of users and contributors. The key to something like this would be making it as Pythonic as possible, bridging the gap between the Python language and the REST API. This gives you, the author, the most flexibility to focus on your own code without worrying much about the Drill code base, since everything goes through the REST API (which is really well designed; I've used it myself in Python scripts). This is a great idea and I would be happy to contribute/assist!

John

On Mon, Dec 28, 2015 at 2:07 AM, Wojciech Nowak wrote:
> Dear Drill developers,
>
> Recently I was trying to use Drill from Python through ODBC interface
> based on blog post from
> https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl
> It worked as expected, but what struck to me was that It’s a lot of hassle
> to configure it.
>
> That’s why based on Your site under Contribution Ideas (
> https://drill.apache.org/docs/apache-drill-contribution-ideas/) I decided
> to create simpler solution for Python community.
>
> My Contribution would have two phases:
> client/driver for interacting with Drill
> dsl which will provide a easier and idiomatic way to write and manipulate
> queries using defined query set expressions.
>
>
> 1.
> Similarly to official client for Elastic Search (
> https://github.com/elastic/elasticsearch-py) I would like to use Rest-Api
> of Drill for which i found documentation under
> https://drill.apache.org/docs/rest-api/
> sketch of usage:
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
>
> questions:
> 1.1 I was wondering if Python driver for Drill could be based on Rest-Api,
> do you see any problems?
> 1.2 Do you have any ideas or suggestions for that project?
>
> 2.
> It would be separate package from driver, you can install as an optional
> package via command:
> pip install pydrill-dsl
> so that it would have separate releases from 1 package.
> It would enhance way of interacting with Drill via query set like
> expressions.
> sketch of usage:
>
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
>
> questions:
> 2.1 Should it be separated from Python Drill Driver package?
> 2.2 Do you have any ideas or suggestions for that project?
>
> This contribution would be part of my Master Thesis, so any ideas are
> welcome. My thesis supervisor suggested to contact You to get Drill core
> developers perspective.
>
> I would be very grateful if You could provide me with your thoughts.
>
> kind regards,
> Wojtek Nowak
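John's pandas suggestion could start as small as this. It assumes result is the parsed JSON from Drill's /query.json endpoint, with a "columns" list and a "rows" list of dicts keyed by column name; to_columns and to_dataframe are hypothetical helper names. pandas is imported only inside to_dataframe so that it can stay an optional dependency of the driver.

```python
def to_columns(result):
    """Pivot Drill's row-oriented result (a list of dicts) into columns."""
    columns = result.get("columns") or []
    rows = result.get("rows") or []
    # Missing keys become None so every column has one entry per row.
    return {name: [row.get(name) for row in rows] for name in columns}


def to_dataframe(result):
    """Build a pandas DataFrame, keeping Drill's column order."""
    import pandas as pd  # imported lazily so pandas stays an optional extra

    return pd.DataFrame(to_columns(result), columns=result.get("columns") or [])
```

Depending on the Drill version, the REST API may return values as strings, so a caller may still want pandas.to_numeric (or similar) on numeric columns after the conversion.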
