I was able to use Drill in that use case. Basically, I took Drill's REST API and fronted it with a Python Flask app that took my REST API calls, sanitized them, and authenticated the people connecting. (I wanted to ensure no funny business was coming through the API from a security standpoint; I also wanted user-based authentication and logging.) This Python app then sent the request to Drill's API and pushed the results back to the client. It worked very well across all data sets, and particularly well for small to medium-sized JSON/CSV data sets. When I had larger data sets of CSV or JSON files, I would load them into Parquet tables, and then I got really good response times on those as well. I was using MapR FS, and I felt that also helped performance (although I didn't test with HDFS). The key is to understand what types of queries will be coming through and to optimize from that perspective. I was very happy with Drill in this use case.
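To make the sanitize-and-forward step a bit more concrete, here is a minimal sketch of what the proxy does between receiving a call and handing it to Drill. It is not my actual app: it uses only the Python standard library instead of Flask, and the URL (a local drillbit on the default web port 8047), the function names, and the "single SELECT only" policy are all assumptions made for illustration. Authentication and logging are left out.

```python
import json
import re
import urllib.request

# Assumption: a drillbit on localhost with Drill's REST endpoint on port 8047.
DRILL_URL = "http://localhost:8047/query.json"

# Hypothetical policy: accept exactly one read-only SELECT statement.
_SELECT_ONLY = re.compile(r"^select\s", re.IGNORECASE)


def is_allowed(sql):
    """Return True if sql looks like a single SELECT statement.

    This is a deliberately naive check; it rejects any embedded ';'
    (even inside string literals) and is no substitute for real parsing.
    """
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        return False
    return _SELECT_ONLY.match(stripped) is not None


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) the POST request for Drill's REST API."""
    if not is_allowed(sql):
        raise ValueError("only single SELECT statements are accepted")
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql):
    """Send the query to Drill and return the decoded JSON response."""
    with urllib.request.urlopen(build_drill_request(sql), timeout=30) as resp:
        return json.load(resp)
```

In a Flask app, the route handler would authenticate the caller, log the request, and then call something like `run_query` and return its result to the client.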
On Fri, Oct 16, 2015 at 3:48 AM, Margus Roo <[email protected]> wrote:

> Hi
>
> I read documentation and FAQ's about drill and played a little with drill.
> Now I have question - Can I use Drill to serve online API's like
> traditional RDB do?
> My use case is:
> I have loads of JSON records. I can hold them in any Hadoop component like
> HBase or HDFS or wherever best for my solution to get high respond.
> I'd like to use any element in JSON as filter in WHERE clause.
> Is there cases to bring up who and what performance uses Drill like online
> db?
>
> --
> Margus (margusja) Roo
> http://margus.roo.ee
> skype: margusja
> +372 51 480
