I think a Pythonic module for Drill could be a great contribution.  Wrapping
the REST API and talking to it with requests or something similar makes the
most sense. Since everything in the REST API is done via JSON, the module
could handle authentication (it's form based, so you have to use a requests
session or similar), query submission, results, error handling, etc. You
will want to decide what you want your driver to do. Do you want an
interface that supports submitting new storage plugins?  Do you want to
expose query-time settings (such as reading JSON numbers as doubles) via
the driver, or just via a statement submitted by the user? (One requires
much more work; the other requires an eye towards security.)  Security is
another thing: if something is using your module, say a Python Flask app,
you want to ensure that there is validation of SQL, and other such
concerns. Drill seems to be pretty good about it, but any module you write
should be explicit about what it is and isn't doing related to input
sanitization/security.
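
As a rough sketch of the wrapper idea (not a finished design), something
like the following could sit at the core of the module. The names
DrillClient and DrillError are hypothetical. The /j_security_check and
/query.json paths and the default port 8047 are what I understand the
REST API to use, so check them against the REST API docs for your Drill
version. The session is passed in, so anything that behaves like a
requests.Session will work.

```python
class DrillError(Exception):
    """Raised when Drill answers with a non-200 status (hypothetical name)."""


class DrillClient:
    """Thin wrapper around Drill's REST API (illustrative sketch only)."""

    def __init__(self, session, base_url="http://localhost:8047"):
        # `session` is expected to behave like requests.Session, so that
        # the login cookie is kept and reused for later calls.
        self.session = session
        self.base_url = base_url.rstrip("/")

    def login(self, username, password):
        # Form-based login: credentials go in as form fields, not JSON.
        resp = self.session.post(
            self.base_url + "/j_security_check",
            data={"j_username": username, "j_password": password},
        )
        if resp.status_code != 200:
            raise DrillError("login failed (HTTP %d)" % resp.status_code)

    def query(self, sql):
        # The SQL is passed through untouched. This sketch does no
        # validation or sanitization; the caller is responsible for it.
        resp = self.session.post(
            self.base_url + "/query.json",
            json={"queryType": "SQL", "query": sql},
        )
        if resp.status_code != 200:
            raise DrillError(
                "query failed (HTTP %d): %s" % (resp.status_code, resp.text)
            )
        return resp.json()
```

With requests installed, usage would be along the lines of
DrillClient(requests.Session()).query("SELECT 1"). Keeping the
"no sanitization" comment in the docstring or README is the kind of
explicitness I mean above.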

Another thing to think about is making it easy to move result set objects
from your Python driver into a pandas DataFrame. I think the data science
folks out there would love this, and you would quickly have a core set of
users and other contributions with that.  The key to something like this
is keeping it as Pythonic as possible and bridging the gap between the
Python language and the REST API. This gives you, the author, the most
flexibility to focus on your code without having to worry much about the
Drill code base, since everything goes through the REST API (which is
really well designed; I have used it myself in Python scripts).
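
To make the pandas idea concrete, here is a minimal sketch of a result
set object. ResultSet and to_dataframe are hypothetical names, and the
sketch assumes the query response is a JSON object with a "columns" list
and a "rows" list of dicts. pandas is imported lazily so the driver
itself does not depend on it.

```python
class ResultSet:
    """Holds one query response and exposes it in Pythonic ways (sketch)."""

    def __init__(self, payload):
        # Assumed response shape: {"columns": [...], "rows": [{...}, ...]}
        self.columns = list(payload.get("columns", []))
        self.rows = list(payload.get("rows", []))

    def __len__(self):
        return len(self.rows)

    def __iter__(self):
        # Yield each row as a tuple in column order; missing keys become None.
        for row in self.rows:
            yield tuple(row.get(col) for col in self.columns)

    def to_dataframe(self):
        # Imported here so pandas stays an optional dependency.
        import pandas as pd

        # Passing `columns` keeps the column order reported by the server.
        return pd.DataFrame(self.rows, columns=self.columns)
```

Values may come back as strings depending on the Drill version, so a
real driver would probably want an optional type conversion step before
or after building the DataFrame.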

This is a great idea and I would be happy to contribute/assist!

John

On Mon, Dec 28, 2015 at 2:07 AM, Wojciech Nowak <[email protected]> wrote:

> Dear Drill developers,
>
> Recently I was trying to use Drill from Python through ODBC interface
> based on blog post from
> https://www.mapr.com/blog/using-drill-programmatically-python-r-and-perl
> It worked as expected, but what struck me was that it’s a lot of hassle
> to configure.
>
> That’s why, based on your site under Contribution Ideas (
> https://drill.apache.org/docs/apache-drill-contribution-ideas/), I decided
> to create a simpler solution for the Python community.
>
> My contribution would have two phases:
> 1. a client/driver for interacting with Drill
> 2. a DSL which will provide an easier and more idiomatic way to write and
> manipulate queries using defined query set expressions.
>
>
> 1.
> Similarly to official client for Elastic Search (
> https://github.com/elastic/elasticsearch-py) I would like to use Rest-Api
> of Drill for which i found documentation under
> https://drill.apache.org/docs/rest-api/
> sketch of usage:
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill-py
>
> questions:
> 1.1 I was wondering if Python driver for Drill could be based on Rest-Api,
> do you see any problems?
> 1.2 Do you have any ideas or suggestions for that project?
>
> 2.
> It would be separate package from driver, you can install as an optional
> package via command:
> pip install pydrill-dsl
> so that it would have separate releases from 1 package.
> It would enhance way of interacting with Drill via query set like
> expressions.
> sketch of usage:
>
> https://gist.github.com/PythonicNinja/9b4952b6cbc17572c7db#file-pydrill_dsl-py
>
> questions:
> 2.1 Should it be separated from Python Drill Driver package?
> 2.2 Do you have any ideas or suggestions for that project?
>
> This contribution would be part of my Master Thesis, so any ideas are
> welcome. My thesis supervisor suggested to contact You to get Drill core
> developers perspective.
>
> I would be very grateful if You could provide me with your thoughts.
>
> kind regards,
> Wojtek Nowak
>
