Hi Divya,

There is no such thing as a naive question. Please feel free to post any questions you have. Someone in the community will help you out.

This is my opinion: There are a variety of BI tools in the market that offer excellent data visualization and interaction capabilities: Tableau, MicroStrategy and Qlik, to name a few. These are tools built by companies, but you could also build your own web app that is highly customized for your users.

The need for such tools has arisen from the end BI (business intelligence) user, who does not have the time or patience to type SQL queries. If you are slicing and dicing data while following your intuition, imagine having to rewrite the SQL queries each time and making sure that they are syntactically correct. That is a lot to ask of the average user who wants to look at the data in different ways and make a decision that hopefully results in some action (and not just PowerPoint slides). Getting to action is more important than anything else inside a company. Drill provides the SQL query layer for interacting with the underlying data, which may be scattered across various data sources.

So before you jump in to standardize on any BI tool, ask yourself: who are the business users, what are their needs in terms of decision making, and what kind of workflows do they envision? Then work backwards to find the tool that best meets your needs. Be open to the idea of building your own web app if you think that will benefit your users in the long term.

As for the other questions, these relate to data retrieval (efficiency, which results in performance). I will tell you what I know and others can chime in with better info:

1. Metadata cache: Only for Parquet files. The idea here is to store the metadata associated with the Parquet row groups of each file in a separate file, so that you avoid having to open and close every Parquet file to get that info. Metadata gives you basic statistics such as mins and maxes, so that you can skip row groups or files that do not match your filter condition. The idea of storing metadata is not new; other query engines do it as well.
Read more here: https://drill.apache.org/docs/optimizing-parquet-metadata-reading/

2. Partitioning: Drill is "directory aware". This is the age-old concept of partitioning your data so that Drill can skip directories that do not match the filter condition. The layout and structure of the data now help Drill. Partitioning schemes depend on query patterns. One rule of thumb that I use is to look at the BI users and observe their workflows. If they use a time range as the basis of every analysis, then I will partition by time (say, month). If I know that the likelihood of filtering by location is 60%, then I will create a nested directory structure with time (month) at the top of the hierarchy and location just below it.
Read more here: https://drill.apache.org/docs/partition-pruning-introduction/

3. Generation of Parquet files: https://drill.apache.org/docs/parquet-format/
Please pay attention to how you configure the writer: https://drill.apache.org/docs/parquet-format/#configuring-the-parquet-storage-format

4. Custom column calculation: There is some out-of-the-box functionality here: https://drill.apache.org/docs/sql-window-functions-introduction/
You should also be aware of the limitations on nested data operations: https://drill.apache.org/docs/nested-data-limitations/
And of course there are UDFs: https://drill.apache.org/docs/adding-custom-functions-to-drill-introduction/

Please let us know if you have any additional questions.

Happy drilling :)
Saurabh

On Tue, Jul 25, 2017 at 12:54 AM, Divya Gehlot <[email protected]> wrote:

> Hi,
> As a naive user would like to know the benefits of Apache Drill with Tableau?
>
> As per my understanding, to visualize we need to push the data to Tableau
> for granular visualization.
>
> Would like to understand a few features of Drill in terms of visualization or
> data retrieval:
> 1. Metadata Caching
> 2. Partitioning
> 3. Generation of Parquet File
> 4. Custom column calculation
>
> Thanks,
> Divya
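
P.S. To make the partitioning idea in point 2 concrete, here is a minimal Python sketch of directory pruning. It is only an illustration of the concept, not how Drill implements it: the `sales/<month>/<location>` layout, the file names and the `prune` helper are all made up for this example.

```python
from pathlib import PurePosixPath

# Hypothetical layout: <table>/<month>/<location>/<file>.parquet
# (month at the top of the hierarchy, location just below it).
FILES = [
    "sales/2017-06/SG/part-0.parquet",
    "sales/2017-06/US/part-0.parquet",
    "sales/2017-07/SG/part-0.parquet",
    "sales/2017-07/SG/part-1.parquet",
    "sales/2017-07/US/part-0.parquet",
]


def prune(files, month=None, location=None):
    """Return only the files whose directories match the filter.

    A value of None means "no filter on this level".
    """
    kept = []
    for name in files:
        # Split the path into its four levels; the directory names
        # carry the partition values, so no file has to be opened.
        _table, file_month, file_location, _file = PurePosixPath(name).parts
        if month is not None and file_month != month:
            continue  # everything under this month directory is skipped
        if location is not None and file_location != location:
            continue  # everything under this location directory is skipped
        kept.append(name)
    return kept


if __name__ == "__main__":
    # Filter on month only: 3 of the 5 files remain.
    print(prune(FILES, month="2017-07"))
    # Filter on month and location: 2 of the 5 files remain.
    print(prune(FILES, month="2017-07", location="SG"))
```

In Drill itself the directory levels show up as the implicit columns dir0, dir1, and so on, so with a layout like this one a filter on dir0 (month) and dir1 (location) is what lets Drill skip the other directories.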
