@Divya, I’m aware of that, but there is no way as it stands to discover the schema of a file, CSV or otherwise. I’m not talking about attempting to infer data types, just the simple number of columns, name of columns and what Drill thinks they are. I know that you can do this via a series of queries, but for files, there is no equivalent of DESCRIBE. That’s what I was getting at and I think that would be really useful to have. Also it wouldn’t really alter the character of Drill. — C
> On Oct 24, 2017, at 02:02, Divya Gehlot <[email protected]> wrote: > > @Charles : when you read csv files can casting and get the data type as per > your requirements > > Hope I answer what you asked for :) > > HTH > Thanks, > Divya > > On 20 October 2017 at 01:42, Charles Givre <[email protected]> wrote: > >> Hi Alfredo, >> When I was trying to get Drill to work with various BI tools, all I really >> needed was a list of columns. Data types would be a big bonus, but Drill >> interprets CSV data as plain text anyway. It would be really useful for >> other file types where Drill does infer data types. >> >> >> — C >> >> >>> On Oct 19, 2017, at 6:13 AM, Alfredo Serafini <[email protected]> wrote: >>> >>> Hi thanks for the replies! >>> >>> @Chun yes using Views is an approach I considered, and I like it also >>> methodologically, in order to have some change to "prepare" the data >>> just a bit. I'm testing drill as a sort of data facade for tools which >>> handles mappings to other context, so this could be helpful for me. >>> >>> Anyway I have some concerns regardings metadata/catalog support for >>> views too: it seems that every view is saved on disk as a JSON file, >>> then experimenting the same issues. Are you suggesting saving views to >>> some kind of relational database storage for staging purposes? Is that >>> possible? >>> >>> Sorry for all the questions :-) >>> >>> >>> @Charles yes Metabase (or Tableau, Superset, and so on...) is another >>> use case in which it would be great to connect them to explore data >>> with the capabilities of drill, and even for an initial exploration of >>> data since sometimes reducing the initial analysis phase time could >>> help with development. >>> >>> For CSV it would be possible IMHO to guess types in a very basic way, >>> at least using basic types and map columns to a text/String when a >>> type can't be inferreed. It could be a starting point, and probably >>> the more confortable case where to start for the (partial) support of >>> catalog informations (JSON would be more complex, just to say). If >>> there are standard interfaces that can be extended/implemented for >>> filling them with those informations I'd like to do some >>> experimentation on that, if it's not too complex to follow, and if >>> someone can point me to a good place where to start for doing some >>> experiments of a possible implementation, for the CSV case. >>> >>> Thanks for the comments, I appreciate them >>> >>> Alfredo >>> >>> >>> >>> I’d like to second Alfredo’s request. I’ve been trying to get Drill >>> to work with some >>>> open source visualization tools such as SqlPad and Metabase and the >> issue I keep running into >>>> is that Drill doesn’t have a convenient way to describe how it >> interprets flat files. This >>>> is really frustrating for me since this is my main use of Drill! >>>> I wish the SELECT * FROM <data> LIMIT 0 worked in the RESTFul >> interface. In any event, >>>> would be very useful to have some way to get Drill to describe how it >> will interpret a flat >>>> file. >>>> — C >>> >>> >>> >>>> On Oct 18, 2017, at 15:20, Chun Chang <[email protected]> wrote: >>>> >>>> There were discussions on the need of building a catalog for drill. But >> I don't think >>> that's the focus right now. And I am not sure the community will ever >>> decide to go in that >>> direction. For now, you best bet is to create views on top of your >>> JSON/CSV data. >>>> >>>> ________________________________ >>>> From: Alfredo Serafini <[email protected]> >>>> Sent: Wednesday, October 18, 2017 8:31:15 AM >>>> To: [email protected] >>>> Subject: describe query support? (catalog metadata, etc) >>>> >>>> Hi I'm experimenting using Drill as a data virtualization component via >>>> JDBC and it generally works great for my needs. >>>> >>>> However some of the components connected via JDBC needs basic >>>> metadata/catalog informations, and they seems to be missing for JSON / >> CSV >>>> sources. >>>> >>>> For example the simple query >>>> >>>> DESCRIBE cp.`employee.json`; >>>> >>>> returns no results. >>>> >>>> Another possible example case could be when reading from an sqlite >> source >>>> containing the same data on an `employees` table >>>> DESCRIBE `emploees` >>>> >>>> and still get no information: while this command is not directly >> supported >>>> in SQLite, an equivalent one could be for instance: >>>> PRAGMA table_info(`employees`); >>>> >>>> but trying to execute it in Drill is not possible, as it is beyond the >>>> supported standard SQL dialect. >>>> >>>> Moreover using a query like: >>>> SELECT * >>>> FROM INFORMATION_SCHEMA.COLUMNS >>>> WHERE (TABLE_NAME='employees_view'); >>>> >>>> on a view from the same data, seems to return the informations, so I >>>> suppose there should be a way to pass those informations to an >>>> internal *DatabaseMetaData >>>> <https://docs.oracle.com/javase/8/docs/api/java/sql/ >> DatabaseMetaData.html>* >>>> implementation. >>>> I wonder if there is such a component designed to manage all the catalog >>>> informations for different sources? >>>> >>>> In this case it could adopt different strategies for retrieving >> metadata, >>>> depending on the case: for sqlite a different command / dialect could be >>>> used, for CSV types could be guessed using simple heuristics, and so on. >>>> Probably cases like JSON would be much more complex, anyway. >>>> Once the metadata have been retrieved for a source, I suppose the >> standard >>>> SQL dialect should work as expected. >>>> >>>> >>>> Are there any plans to add catalog metadata support for various sources? >>>> Does anybody have some workaround? for example using views or similar >>>> approaches? >>>> >>>> >>>> thanks in advance, sorry if the message is too long :-) >>>> Alfredo >> >>
