Thanks, this is very helpful and explains the behavior.
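
To check my understanding, here is a rough Python sketch of the two checks as described in the quoted reply. It is only a model of the described behavior, not Drill's actual code: the function name, its parameters, and the source-type labels are all made up for illustration, and the data-type compatibility check is left out.

```python
# Hypothetical model of the checks described in this thread; not Drill's code.
# The type labels below are illustrative names, not Drill identifiers.
STRONG_SCHEMA = {"view", "hive", "hbase_column_family", "text"}
# Parquet is listed as weak here because the thread says it is currently
# treated that way.
WEAK_SCHEMA = {"json", "mongodb", "hbase_column_qualifier", "parquet"}


def validate_view_reference(source_type, resource_exists, known_columns,
                            referenced_columns):
    """Return the problems that would block view creation in this model."""
    problems = []
    # Check 1: the resource must exist and be readable, for every source type.
    if not resource_exists:
        problems.append("resource missing or unreadable")
        return problems
    # Check 2: only strong-schema sources have their references verified.
    if source_type in STRONG_SCHEMA:
        for col in referenced_columns:
            if col not in known_columns:
                problems.append(f"unknown column: {col}")
    return problems
```

In this model a Hive-backed view that names an unknown column is rejected, while a JSON-backed view that names a missing map is accepted as long as the directory is readable, which matches what I saw in my tests.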

On Jan 29, 2015, at 9:30 AM, Jacques Nadeau <[email protected]> wrote:

> You are mostly correct.
> 
> 1. Verify that resources you are referencing exist in a readable format that
>    you have permission to access (files, tables, views, etc.).
> 2. If the assets are considered strong-schema, verify that the references you
>    are using exist and have compatible data types.
> 
> Right now, schemaness falls into these two main categories:
> 
> strong-schemaed:
>   - views
>   - hive tables
>   - hbase column families
>   - text
> 
> weak-schemaed:
>   - json
>   - mongodb
>   - hbase column qualifiers
>   - parquet
> 
> 
> Note that we really need to move Parquet from the weak-schemaed to the
> strong-schemaed list, since the format itself is relatively strong-schemaed.
> (I say relatively because Parquet doesn't require an application to record
> logical data types, and many systems that generate Parquet today don't
> generate logical type information.) For that reason we initially treated
> it as weakly-schemaed: this allows more liberal casting than SQL normally
> permits, and thus a better user experience with Parquet data that doesn't
> have logical type information.
> 
> 
> 
> 
> On Thu, Jan 29, 2015 at 9:14 AM, Andries Engelbrecht <
> [email protected]> wrote:
> 
>> Which steps and checks does Drill perform when creating a view?
>> 
>> When creating a view on a directory structure with a large number of
>> directories and JSON files in each directory, the view creation takes 5-7
>> seconds on a small cluster.
>> 
>> From a few tests, it seems that Drill verifies the Hive tables and columns
>> used in a view.
>> 
>> For JSON documents in the DFS, it does verify the storage plugin and the
>> directory it points at.
>> If the directory is empty, view creation fails.
>> Drill does not seem to verify whether the maps specified in the view are
>> present in the JSON files, likely because of the convention of assigning
>> null to nonexistent maps. (I still need to dig deeper into the conventions
>> used for complex data types.)
>> 
>> Thx
>> 
>> —Andries
>> 
