Sorry, bad typo:  I have 50GB of data, NOT 500GB  ;).  And I usually only
query a 1 GB subset of this data using Drill.



On Thu, Feb 4, 2016 at 1:04 PM, Peder Jakobsen | gmail <[email protected]>
wrote:

> On Thu, Feb 4, 2016 at 11:15 AM, Andries Engelbrecht <
> [email protected]> wrote:
>
>> Is there a reason to create a single file? Typically you may want more
>> files to improve parallel operation on distributed systems like drill.
>>
>
>  Good question.   I'm not actually using Drill for "big data".  In fact, I
> never deal with "big data", and I'm unlikely to ever do so.
>
> But I do have 500 GB of CSV files spread across about 100 directories.
> They are all part of the same dataset, but this is how it's been organized
> by the government department that released it as an Open Data dump.
>
> Drill saves me the hassle of having to stitch these files together using
> python or awk. I love being able to just query the files using SQL (so far
> it's slow though, I need to figure out why - 18 seconds for a simple query
> is too much).   Data eventually needs to end up on the web to share it with
> other people, and I use crossfilter.js and D3.js for presentation.  I need
> fine grained control over online data presentation, and all BI tools I've
> seen are terrible in this department, e.g. Tableau.
>
> So I need my data in a format that can be read by common web frameworks,
> and that usually implies dealing with a single file that can be uploaded to
> the web server.  No need for a database, since I'm just reading a few
> columns from a big flat file.
>
> I run my apps on a low cost virtual server. I don't have access to
> java/virtualbox/MongoDB etc.  Nor do I think these things are necessary:
> K.I.S.S
>
> So this use case may be quite different from many of the more "corporate"
> users, but Drill is so very useful regardless.
>
>
>
>
