I was able to use Drill in that use case.  Basically, I took Drill's API and
fronted it with a Python Flask app that received my REST API calls,
sanitized the queries, and authenticated the people connecting.  (From a
security standpoint I wanted to ensure no funny business was coming through
the API, and I also wanted user-based authentication and logging.)  The
Python app then sent each request on to Drill's API and pushed the results
back to the client.  It worked well on all data sets, but particularly well
on small to medium-sized JSON/CSV data sets.  When I had larger CSV or JSON
files, I would load them into Parquet tables, and then I got really good
response times on those as well.  I was using MapR FS, and I felt its
performance also helped (although I didn't test with HDFS).  The key is
understanding what types of queries will be coming through and optimizing
from that perspective.  I was very happy with Drill in this use case.
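For illustration, here is a minimal sketch of that kind of front end. It is not the code described above. It assumes Drill's REST endpoint POST /query.json on the default web port 8047 with Drill's own authentication disabled, and Flask installed. The /query route, the X-API-Key header, the API_KEYS table and the keyword filter are all hypothetical choices. The sanitizing, authentication and forwarding helpers use only the standard library, and Flask is imported inside the app factory so the helpers can be used on their own.

```python
import json
import logging
import re
import urllib.error
import urllib.request

# Assumed Drill REST endpoint (8047 is Drill's default web port); Drill auth disabled.
DRILL_QUERY_URL = "http://localhost:8047/query.json"

# Hypothetical API-key -> user table; a real service would use a proper identity store.
API_KEYS = {"secret-key-alice": "alice"}

log = logging.getLogger("drill_proxy")

# Crude, deliberately conservative filter: one read-only SELECT/WITH statement only.
# It also rejects harmless queries that merely mention these words in a string literal.
_ALLOWED_START = re.compile(r"^(select|with)\b", re.IGNORECASE)
_FORBIDDEN = re.compile(r"\b(create|drop|alter|insert|refresh)\b", re.IGNORECASE)


def authenticate(api_key):
    """Return the user name for a known API key, or None."""
    return API_KEYS.get(api_key)


def sanitize(sql):
    """Return a single SELECT statement (trailing ';' removed) or raise ValueError."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not _ALLOWED_START.match(stmt):
        raise ValueError("only SELECT queries are allowed")
    if _FORBIDDEN.search(stmt):
        raise ValueError("statement contains a forbidden keyword")
    return stmt


def run_drill_query(stmt, timeout=30):
    """POST the statement to Drill's REST API and return the decoded JSON response."""
    body = json.dumps({"queryType": "SQL", "query": stmt}).encode("utf-8")
    req = urllib.request.Request(
        DRILL_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def create_app():
    """Flask app factory: authenticate, sanitize, log, forward to Drill, relay results."""
    from flask import Flask, abort, jsonify, request  # imported lazily on purpose

    app = Flask(__name__)

    @app.route("/query", methods=["POST"])
    def query():
        user = authenticate(request.headers.get("X-API-Key", ""))
        if user is None:
            abort(401)
        payload = request.get_json(silent=True) or {}
        try:
            stmt = sanitize(str(payload.get("sql", "")))
        except ValueError as exc:
            log.warning("rejected query from %s: %s", user, exc)
            return jsonify(error=str(exc)), 400
        log.info("user=%s sql=%s", user, stmt)
        try:
            result = run_drill_query(stmt)
        except urllib.error.URLError as exc:
            log.error("Drill request failed: %s", exc)
            return jsonify(error="query failed"), 502
        return jsonify(result)

    return app


# Usage (hypothetical): create_app().run(port=5000), then POST
#   {"sql": "SELECT t.id FROM dfs.`/data/records.json` t WHERE t.payload.status = 'active'"}
# to /query with an X-API-Key header. The table alias t is what lets Drill
# address nested JSON fields such as payload.status in the WHERE clause.
```

Keeping the checks as plain functions makes them easy to test and to tighten (for example, whitelisting workspaces) without touching the HTTP layer.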
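Loading a large JSON or CSV file into a Parquet table can be done in Drill with a CREATE TABLE AS (CTAS) statement. The sketch below builds and submits one, assuming Drill's default writable dfs.tmp workspace, the REST endpoint on port 8047, and that the store.format option is still at its default of parquet. The table name and file path are hypothetical. For CSV files without header extraction, Drill exposes each row as a columns array, so you would typically pass an explicit select list such as columns[0] AS id. Since this is a write, it would normally be run directly against Drill by an administrator rather than through a read-only query front end.

```python
import json
import urllib.request

# Assumed Drill REST endpoint (8047 is Drill's default web port).
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def ctas_to_parquet(target_table, source_path, select_list="*", workspace="dfs.tmp"):
    """Build a CTAS statement that rewrites a JSON/CSV file as a new table.

    CTAS writes in the format set by the store.format option, whose default
    is parquet, so no ALTER SESSION statement is included here.
    """
    return (
        f"CREATE TABLE {workspace}.`{target_table}` AS "
        f"SELECT {select_list} FROM dfs.`{source_path}`"
    )


def submit(stmt, timeout=600):
    """POST one SQL statement to Drill's REST API and return the decoded JSON."""
    body = json.dumps({"queryType": "SQL", "query": stmt}).encode("utf-8")
    req = urllib.request.Request(
        DRILL_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Hypothetical table name and path; uncomment to run against a live Drill:
# submit(ctas_to_parquet("events_parquet", "/data/events.json"))
```

Queries can then target dfs.tmp.`events_parquet` instead of the raw file.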

On Fri, Oct 16, 2015 at 3:48 AM, Margus Roo <[email protected]> wrote:

> Hi
>
> I read the documentation and FAQs about Drill and played a little with it.
> Now I have a question: can I use Drill to serve online APIs like a
> traditional RDBMS does?
> My use case is:
> I have loads of JSON records. I can hold them in any Hadoop component, like
> HBase or HDFS or wherever is best for my solution, to get fast responses.
> I'd like to use any element in the JSON as a filter in the WHERE clause.
> Are there any cases you can point to of who uses Drill as an online DB,
> and with what performance?
>
> --
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> +372 51 480
>
>
