John,

Using a simple tool like squirrel and running a select * limit 0 against 
parquet it is able to retrieve the column names, and if you look at the 
returned metadata it identified the data type. Obviously schema change will be 
a challenge to deal with with limit 0 queries. This is where views can be used 
to force data types for the end tool, skip rows, substitute values, skip added 
columns, etc, which makes it much better for most tools to work with the data.

A function like typeof does require a row to be returned to identify the data 
type.


Ideally it will be great to do initial schema discovery with a tool, associate 
the metadata with the data sources and then make it available for wider use. A 
typeof histogram will be brilliant on data structures with schema changes, that 
way it will be possible to see how many records are affected of a certain 
type/change/etc. A endless list of possibilities. It should be feasible to 
utilize Drill as the execution engine with a smart tool on top of it to process.

--Andries


> On May 16, 2016, at 4:08 PM, John Omernik <[email protected]> wrote:
> 
> So how does a limit0 query return the type if I may ask?  I can use limit 0
> queries with SqlAlchemy and Caravel, but in looking at it, I wasn't sure
> how where the types were returned in the results (unless it's in the
> underlying metadata not displayed to the user).  I can dig more if you tell
> me it's there :)
> 
> On Mon, May 16, 2016 at 4:51 PM, Neeraja Rentachintala <
> [email protected]> wrote:
> 
>> Both are options (and thats how the bI tools work with Drill today)
>> 
>> - Views with explicit Casts - Will return schema definitions as part show
>> schemas/describe table/show columns queries.
>> - Limit 0 queries - This will be good as well if we can modify Caravel to
>> issue such queries (this is what Tableau does)
>> 
>> For now, I think returning metadata to Caravel using the above options will
>> be the solution. The ideal approach would be actually to have a  data
>> exploration experience on raw data (without curation) within Caravel itself
>> to create this metadata as needed.
>> 
>> -Neeraja
>> 
>> On Mon, May 16, 2016 at 2:45 PM, Ted Dunning <[email protected]>
>> wrote:
>> 
>>> As you suggest, views are a critical way to lock down that kind of
>>> information.
>>> 
>>> Select with limit 0 is often used for meta-data exploration. This is more
>>> robust than asking about tables since not everything is necessarily
>> really
>>> in a single table.
>>> 
>>> On Mon, May 16, 2016 at 2:12 PM, John Omernik <[email protected]> wrote:
>>> 
>>>> Hey all, as part of my exploration of Caravel,  I realized knowing the
>>>> types of columns can be valuable... I can say create a view of a
>>> directory
>>>> of parquet allowing the "show tables" to work well, however, the type
>> for
>>>> every column is "ANY" which may work (need to tweak some things) but I
>> am
>>>> guessing may make certain down stream things in Caravel more difficult.
>>>> 
>>>> So, just thinking aloud here, would it be possible to "cast" in Views
>> to
>>>> allow the view definition to pass along type information?  Even if it
>>> means
>>>> a more verbose view definition, it would be done once, and then down
>>> stream
>>>> tools like Caravel would know the types...
>>>> 
>>>> Thoughts?
>>>> 
>>>> John
>>>> 
>>> 
>> 

Reply via email to