You are mostly correct.

Verify that the resources you are referencing exist, are in a readable
format, and that you have permission to access them (files, tables, views, etc.).
If the assets are considered strong-schema, verify that the references you
are using exist and have compatible data types.

Right now, schemaness falls into these two main categories:

strong-schemaed:
  views
  hive tables
  hbase column families
  text

weak-schemaed:
  json
  mongodb
  hbase column qualifiers
  parquet
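
To make the two checks and the two categories concrete, here is a rough Python sketch. It is only an illustration of the logic described in this message, not Drill's actual code: the `Source` class, the `check_view_reference` function, and the kind names are all hypothetical, and "compatible data types" is simplified to an exact type-name match.

```python
from dataclasses import dataclass, field

# Categories as listed in this message; Parquet is currently treated as weak.
STRONG_SCHEMA = {"view", "hive_table", "hbase_column_family", "text"}
WEAK_SCHEMA = {"json", "mongodb", "hbase_column_qualifier", "parquet"}


@dataclass
class Source:
    """Hypothetical stand-in for a resource referenced by a view."""
    kind: str
    exists: bool = True
    readable: bool = True
    columns: dict = field(default_factory=dict)  # column name -> type name


def check_view_reference(source, referenced):
    """Return a list of problems found for one referenced source.

    `referenced` maps each column name the view uses to the type it expects.
    """
    # Check 1: the resource must exist and be accessible, whatever its kind.
    if not (source.exists and source.readable):
        return ["resource missing or not accessible"]
    # Weak-schema sources: column references are not checked here.
    if source.kind in WEAK_SCHEMA:
        return []
    if source.kind not in STRONG_SCHEMA:
        raise ValueError(f"unknown source kind: {source.kind}")
    # Check 2: strong-schema sources must contain every referenced column
    # with a compatible type (simplified to equality: an assumption).
    problems = []
    for name, expected in referenced.items():
        actual = source.columns.get(name)
        if actual is None:
            problems.append(f"unknown column: {name}")
        elif actual != expected:
            problems.append(
                f"incompatible type for {name}: {actual} vs {expected}"
            )
    return problems
```

Under these assumptions, a Hive-table source with an unknown column yields a problem, while a JSON source with an unknown map passes as long as the directory itself is there, which matches what you observed.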


Note that we really need to move Parquet from the weak-schemaed list to the
strong-schemaed list, since the format itself is relatively strong-schemaed.
(I say relatively because Parquet doesn't require an application to record
logical data types, and many systems that generate Parquet today don't
generate logical type information.) We initially treated it as
weak-schemaed because that allows more liberal casting than SQL normally
permits, and thus a better user experience with Parquet data that doesn't
have logical type information.




On Thu, Jan 29, 2015 at 9:14 AM, Andries Engelbrecht <
[email protected]> wrote:

> Which steps and checks does Drill perform when creating a view?
>
> When creating a view on a directory structure with a large number of
> directories and JSON files in each directory, the view creation takes 5-7
> seconds on a small cluster.
>
> From a few tests it seems that Drill will verify Hive tables and columns
> being used in a view.
>
> For the JSON docs in the DFS it does verify the storage plugin and the
> directory it is being pointed at.
> If the directory is empty the view creation does fail.
> Drill does not seem to verify if the maps (specified in the view) in JSON
> files are present, likely due to the convention to assign null to
> nonexistent maps (still need to dig deeper on this topic on the conventions
> being used for complex data types).
>
> Thx
>
> —Andries
>
