Hi thanks for the replies! @Chun yes using Views is an approach I considered, and I like it also methodologically, in order to have some change to "prepare" the data just a bit. I'm testing drill as a sort of data facade for tools which handles mappings to other context, so this could be helpful for me.
Anyway I have some concerns regardings metadata/catalog support for views too: it seems that every view is saved on disk as a JSON file, then experimenting the same issues. Are you suggesting saving views to some kind of relational database storage for staging purposes? Is that possible? Sorry for all the questions :-) @Charles yes Metabase (or Tableau, Superset, and so on...) is another use case in which it would be great to connect them to explore data with the capabilities of drill, and even for an initial exploration of data since sometimes reducing the initial analysis phase time could help with development. For CSV it would be possible IMHO to guess types in a very basic way, at least using basic types and map columns to a text/String when a type can't be inferreed. It could be a starting point, and probably the more confortable case where to start for the (partial) support of catalog informations (JSON would be more complex, just to say). If there are standard interfaces that can be extended/implemented for filling them with those informations I'd like to do some experimentation on that, if it's not too complex to follow, and if someone can point me to a good place where to start for doing some experiments of a possible implementation, for the CSV case. Thanks for the comments, I appreciate them Alfredo I’d like to second Alfredo’s request. I’ve been trying to get Drill to work with some > open source visualization tools such as SqlPad and Metabase and the issue I > keep running into > is that Drill doesn’t have a convenient way to describe how it interprets > flat files. This > is really frustrating for me since this is my main use of Drill! > I wish the SELECT * FROM <data> LIMIT 0 worked in the RESTFul interface. In > any event, > would be very useful to have some way to get Drill to describe how it will > interpret a flat > file. > — C > On Oct 18, 2017, at 15:20, Chun Chang <[email protected]> wrote: > > There were discussions on the need of building a catalog for drill. But I > don't think that's the focus right now. And I am not sure the community will ever decide to go in that direction. For now, you best bet is to create views on top of your JSON/CSV data. > > ________________________________ > From: Alfredo Serafini <[email protected]> > Sent: Wednesday, October 18, 2017 8:31:15 AM > To: [email protected] > Subject: describe query support? (catalog metadata, etc) > > Hi I'm experimenting using Drill as a data virtualization component via > JDBC and it generally works great for my needs. > > However some of the components connected via JDBC needs basic > metadata/catalog informations, and they seems to be missing for JSON / CSV > sources. > > For example the simple query > > DESCRIBE cp.`employee.json`; > > returns no results. > > Another possible example case could be when reading from an sqlite source > containing the same data on an `employees` table > DESCRIBE `emploees` > > and still get no information: while this command is not directly supported > in SQLite, an equivalent one could be for instance: > PRAGMA table_info(`employees`); > > but trying to execute it in Drill is not possible, as it is beyond the > supported standard SQL dialect. > > Moreover using a query like: > SELECT * > FROM INFORMATION_SCHEMA.COLUMNS > WHERE (TABLE_NAME='employees_view'); > > on a view from the same data, seems to return the informations, so I > suppose there should be a way to pass those informations to an > internal *DatabaseMetaData > <https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html>* > implementation. > I wonder if there is such a component designed to manage all the catalog > informations for different sources? > > In this case it could adopt different strategies for retrieving metadata, > depending on the case: for sqlite a different command / dialect could be > used, for CSV types could be guessed using simple heuristics, and so on. > Probably cases like JSON would be much more complex, anyway. > Once the metadata have been retrieved for a source, I suppose the standard > SQL dialect should work as expected. > > > Are there any plans to add catalog metadata support for various sources? > Does anybody have some workaround? for example using views or similar > approaches? > > > thanks in advance, sorry if the message is too long :-) > Alfredo
