Thanks, this is very helpful and explains the behavior.

On Jan 29, 2015, at 9:30 AM, Jacques Nadeau <[email protected]> wrote:

> You are mostly correct.
>
> - Verify that resources you are referencing exist in a readable format that
>   you have permission to access (files, tables, views, etc.)
> - If the assets are considered strong-schemaed, verify that the references
>   you are using exist and have compatible data types
>
> Right now, schemaness falls into these two main categories:
>
> strong-schemaed
>   - views
>   - hive tables
>   - hbase column families
>   - text
>
> weak-schemaed
>   - json
>   - mongodb
>   - hbase column qualifiers
>   - parquet
>
> Note that we really need to move Parquet from the weak-schemaed to the
> strong-schemaed list since the format itself is relatively strong-schemaed.
> (I say relative because Parquet doesn't require an application to record
> logical data types and many systems that generate Parquet today don't
> generate logical type information). This has caused us to initially treat
> it as weakly-schemaed since this allows more liberal casting capabilities
> than is normally allowed by SQL and thus a better user experience with
> Parquet data that doesn't have logical type information.
>
> On Thu, Jan 29, 2015 at 9:14 AM, Andries Engelbrecht <
> [email protected]> wrote:
>
>> Which steps and checks does Drill perform when creating a view?
>>
>> When creating a view on a directory structure with a large number of
>> directories and JSON files in each directory, the view creation takes 5-7
>> seconds on a small cluster.
>>
>> From a few tests it seems that Drill will verify Hive tables and columns
>> being used in a view.
>>
>> For the JSON docs in the DFS it does verify the storage plugin and the
>> directory it is being pointed at.
>> If the directory is empty the view creation does fail.
>> Drill does not seem to verify if the maps (specified in the view) in JSON
>> files are present, likely due to the convention to assign null to
>> nonexistent maps (still need to dig deeper on this topic on the conventions
>> being used for complex data types)
>>
>> Thx
>>
>> --Andries
