Hi Divya,

There is no such thing as a naive question. Please feel free to post any
questions you have; someone in the community will help you out.

This is my opinion:
There are a variety of BI tools on the market that offer excellent data
visualization and interaction capabilities: Tableau, MicroStrategy, and
Qlik, to name a few. These are commercial products, but you could also
build your own web app that is highly customized for your users. The need
for such tools has arisen from the end BI (business intelligence) user, who
does not have the time or patience to type SQL queries. If you are slicing
and dicing data while following your intuition, imagine having to rewrite
the SQL queries each time and ensure that they are syntactically correct.

That is a lot to ask of the average user who wants to look at the data in
different ways and make a decision that hopefully results in some action
(and not just PowerPoint slides). Turning data into action matters more
than anything else inside a company.

Drill provides the SQL query layer for interacting with the underlying
data, which is scattered across various data sources.

So before you jump in and standardize on any BI tool, ask yourself: who are
the business users, what are their decision-making needs, and what kinds of
workflows do they envision? Then work backwards to find the tool that best
meets those needs. Be open to the idea of building your own web app if you
believe it will benefit your users in the long term.

As for your other questions, these relate to data retrieval (efficiency,
which translates into performance). I will tell you what I know, and others
can chime in with better info:

1. Metadata cache: This applies only to Parquet files. The idea is to store
the metadata for each file's Parquet row groups in a separate file, so that
Drill does not have to open and close every Parquet file to get that info.
The metadata includes basic statistics such as column minimums and
maximums, which make it possible to skip row groups or files that cannot
match your filter condition. Storing metadata this way is not a new idea;
other query engines do it as well.

Read more here:
https://drill.apache.org/docs/optimizing-parquet-metadata-reading/
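To make the min/max idea concrete, here is a small Python sketch of row
group pruning. It only illustrates the concept and is not Drill's
implementation: the file names, the order_id column, and the statistics are
all made up. In Drill itself you build the cache with REFRESH TABLE
METADATA (see the link above) and the planner does the skipping for you.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical min/max statistics for one column of one row group."""
    file: str
    row_group: int
    min_value: int
    max_value: int


def prune_row_groups(stats, low, high):
    """Keep only row groups whose [min, max] range can overlap [low, high].

    Any row group dropped here never has to be opened or read.
    """
    return [s for s in stats if s.max_value >= low and s.min_value <= high]


# A made-up cache covering two files and three row groups (column: order_id).
cache = [
    RowGroupStats("sales_0.parquet", 0, 1, 100),
    RowGroupStats("sales_0.parquet", 1, 101, 200),
    RowGroupStats("sales_1.parquet", 0, 201, 300),
]

# Roughly what a filter like WHERE order_id BETWEEN 150 AND 180 needs to scan.
survivors = prune_row_groups(cache, 150, 180)
for s in survivors:
    print(s.file, s.row_group)  # prints: sales_0.parquet 1
```

Only one of the three row groups has to be read; the other two are ruled
out by their statistics alone, without touching the data.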

2. Partitioning: Drill is "directory aware". Partitioning is an age-old
concept: you lay out your data so that Drill can skip directories that do
not match the filter condition. The layout of the data itself now helps
Drill. Partitioning schemes depend on query patterns. One rule of thumb
that I use is to look at the BI users and observe their workflows. If they
use a time range as the basis of every analysis, then I will partition by
time (say, month). If I know that about 60% of their analyses also filter
by location, then I will create a nested directory structure with time
(month) at the top of the hierarchy and location just below it.

Read more here:
https://drill.apache.org/docs/partition-pruning-introduction/
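Here is a rough Python sketch of directory pruning for a month/location
layout like the one described above. Drill exposes the directory levels
under the queried path as the implicit columns dir0, dir1, and so on; the
paths and values here are hypothetical, and the function only imitates what
the planner does when your WHERE clause filters on those columns.

```python
def prune_directories(paths, dir0=None, dir1=None):
    """Return the leaf directories whose levels match the given filters.

    dir0 and dir1 imitate Drill's implicit columns for the first and second
    directory levels; None means "no filter on this level".
    """
    kept = []
    for path in paths:
        level0, level1 = path.split("/")[:2]
        if dir0 is not None and level0 != dir0:
            continue  # whole month directory skipped
        if dir1 is not None and level1 != dir1:
            continue  # location subdirectory skipped
        kept.append(path)
    return kept


# Hypothetical layout: month at the top of the hierarchy, location below it.
layout = ["2017-05/east", "2017-05/west", "2017-06/east", "2017-06/west"]

# Roughly: SELECT ... FROM dfs.`/data/sales` WHERE dir0 = '2017-06'
june = prune_directories(layout, dir0="2017-06")
# Roughly: ... WHERE dir0 = '2017-06' AND dir1 = 'east'
june_east = prune_directories(layout, dir0="2017-06", dir1="east")
print(june)       # ['2017-06/east', '2017-06/west']
print(june_east)  # ['2017-06/east']
```

Putting the most commonly filtered dimension (time) at the top level means
the most frequent queries discard the largest chunks of data first.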

3. Generation of Parquet files:

https://drill.apache.org/docs/parquet-format/

Please pay attention to how you configure the writer:
https://drill.apache.org/docs/parquet-format/#configuring-the-parquet-storage-format
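Writing Parquet from Drill is typically done with CREATE TABLE AS (CTAS)
after setting a few session options. The Python sketch below only
assembles those statements as strings so the writer knobs are visible in
one place; the table name, source query, and partition column are
hypothetical, and you should check the option names and defaults against
the writer configuration page linked above before relying on them.

```python
def drill_parquet_ctas(table, source_query, block_size_bytes=512 * 1024 * 1024,
                       compression="snappy", partition_by=None):
    """Assemble Drill statements that write source_query's result as Parquet.

    table, source_query and partition_by are hypothetical inputs; the option
    names are taken from the Drill Parquet writer settings (verify them for
    your Drill version).
    """
    statements = [
        # Make CTAS write Parquet.
        "ALTER SESSION SET `store.format` = 'parquet'",
        # Target row group (block) size in bytes.
        f"ALTER SESSION SET `store.parquet.block-size` = {block_size_bytes}",
        # Compression codec for the written files.
        f"ALTER SESSION SET `store.parquet.compression` = '{compression}'",
    ]
    partition = f" PARTITION BY ({', '.join(partition_by)})" if partition_by else ""
    statements.append(f"CREATE TABLE {table}{partition} AS {source_query}")
    return statements


stmts = drill_parquet_ctas(
    "dfs.tmp.`sales_parquet`",
    "SELECT sale_month, region, amount FROM dfs.`/data/sales_json`",
    partition_by=["sale_month"],
)
for stmt in stmts:
    print(stmt + ";")
```

You would then run each statement through whichever client you use
(sqlline, the Web UI, or a JDBC/ODBC connection). Note that a PARTITION BY
column has to appear in the SELECT list.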

4. Custom column calculation:
There is some out-of-the-box support here (window functions):
https://drill.apache.org/docs/sql-window-functions-introduction/

You should also be aware of the limitations on nested data:
https://drill.apache.org/docs/nested-data-limitations/

And, of course, there are UDFs (user-defined functions):
https://drill.apache.org/docs/adding-custom-functions-to-drill-introduction/
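For a feel of what a window function computes, here is a Python sketch that
mimics SUM(amount) OVER (PARTITION BY region ORDER BY month), i.e. a running
total per region. The column names and data are hypothetical, and it
assumes month values are unique within each region, because SQL's default
window frame treats tied rows as peers and gives them the same total.

```python
from collections import defaultdict


def running_total(rows, partition_key, order_key, value_key):
    """Mimic SUM(value) OVER (PARTITION BY partition_key ORDER BY order_key).

    Returns copies of the rows with an added "running_total" field, grouped
    by partition and sorted by order_key within each partition.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    result = []
    for key in sorted(groups):
        total = 0
        for row in sorted(groups[key], key=lambda r: r[order_key]):
            total += row[value_key]
            result.append({**row, "running_total": total})
    return result


# Hypothetical monthly sales per region.
sales = [
    {"region": "east", "month": "2017-05", "amount": 10},
    {"region": "west", "month": "2017-05", "amount": 7},
    {"region": "east", "month": "2017-06", "amount": 5},
    {"region": "west", "month": "2017-06", "amount": 3},
]

out = running_total(sales, "region", "month", "amount")
for row in out:
    print(row["region"], row["month"], row["running_total"])
```

In Drill the same calculation is a single SQL expression, which is the
point: the BI user gets the derived column without writing the loop.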

Please let us know if you have any additional questions.

Happy drilling :)

Saurabh

On Tue, Jul 25, 2017 at 12:54 AM, Divya Gehlot <[email protected]>
wrote:

> Hi,
> As a naive user, I would like to know the benefits of using Apache Drill
> with Tableau.
>
> As per my understanding, to visualize the data we need to push it to
> Tableau for granular visualization.
>
> I would like to understand a few features of Drill in terms of
> visualization or data retrieval:
> 1. Metadata Caching
> 2. Partitioning
> 3. Generation of Parquet File
> 4. Custom column calculation
>
>
> Thanks,
> Divya
>
