Hi Alfredo,

When I was trying to get Drill to work with various BI tools, all I really needed was a list of columns. Data types would be a big bonus, but Drill interprets CSV data as plain text anyway. It would be really useful for other file types where Drill does infer data types.
— C

> On Oct 19, 2017, at 6:13 AM, Alfredo Serafini <[email protected]> wrote:
>
> Hi, thanks for the replies!
>
> @Chun yes, using views is an approach I considered, and I also like it
> methodologically, since it gives some chance to "prepare" the data
> just a bit. I'm testing Drill as a sort of data facade for tools that
> handle mappings to other contexts, so this could be helpful for me.
>
> Anyway, I have some concerns regarding metadata/catalog support for
> views too: it seems that every view is saved on disk as a JSON file,
> and so runs into the same issues. Are you suggesting saving views to
> some kind of relational database storage for staging purposes? Is that
> possible?
>
> Sorry for all the questions :-)
>
>
> @Charles yes, Metabase (or Tableau, Superset, and so on...) is another
> use case in which it would be great to connect them to explore data
> with the capabilities of Drill, even for an initial exploration of
> the data, since reducing the time spent in the initial analysis phase
> can help with development.
>
> For CSV it would be possible IMHO to guess types in a very basic way,
> at least using basic types and mapping columns to text/String when a
> type can't be inferred. It could be a starting point, and probably
> the most comfortable case in which to start on (partial) support for
> catalog information (JSON would be more complex, just to say). If
> there are standard interfaces that can be extended/implemented to
> fill in that information, I'd like to experiment with that for the
> CSV case, if it's not too complex to follow and if someone can point
> me to a good place to start.
>
> Thanks for the comments, I appreciate them
>
> Alfredo
>
>
>> I’d like to second Alfredo’s request.
>> I’ve been trying to get Drill to work with some open source
>> visualization tools such as SqlPad and Metabase, and the issue I keep
>> running into is that Drill doesn’t have a convenient way to describe
>> how it interprets flat files. This is really frustrating for me since
>> this is my main use of Drill!
>> I wish SELECT * FROM <data> LIMIT 0 worked in the RESTful interface. In
>> any event, it would be very useful to have some way to get Drill to
>> describe how it will interpret a flat file.
>> — C
>
>
>> On Oct 18, 2017, at 15:20, Chun Chang <[email protected]> wrote:
>>
>> There were discussions on the need to build a catalog for Drill, but I
>> don't think that's the focus right now, and I am not sure the community
>> will ever decide to go in that direction. For now, your best bet is to
>> create views on top of your JSON/CSV data.
>>
>> ________________________________
>> From: Alfredo Serafini <[email protected]>
>> Sent: Wednesday, October 18, 2017 8:31:15 AM
>> To: [email protected]
>> Subject: describe query support? (catalog metadata, etc)
>>
>> Hi, I'm experimenting with using Drill as a data virtualization
>> component via JDBC and it generally works great for my needs.
>>
>> However, some of the components connected via JDBC need basic
>> metadata/catalog information, and it seems to be missing for JSON / CSV
>> sources.
>>
>> For example the simple query
>>
>> DESCRIBE cp.`employee.json`;
>>
>> returns no results.
>>
>> Another example is reading from an SQLite source containing the same
>> data in an `employees` table:
>> DESCRIBE `employees`
>>
>> which still returns no information. While this command is not directly
>> supported in SQLite, an equivalent one could be for instance:
>> PRAGMA table_info(`employees`);
>>
>> but executing it in Drill is not possible, as it is beyond the
>> supported standard SQL dialect.
>>
>> Moreover, a query like:
>> SELECT *
>> FROM INFORMATION_SCHEMA.COLUMNS
>> WHERE (TABLE_NAME='employees_view');
>>
>> on a view over the same data does seem to return the information, so I
>> suppose there should be a way to pass that information to an
>> internal *DatabaseMetaData
>> <https://docs.oracle.com/javase/8/docs/api/java/sql/DatabaseMetaData.html>*
>> implementation.
>> I wonder if there is such a component designed to manage all the catalog
>> information for different sources?
>>
>> If so, it could adopt different strategies for retrieving metadata,
>> depending on the case: for SQLite a different command / dialect could be
>> used, for CSV types could be guessed using simple heuristics, and so on.
>> Cases like JSON would probably be much more complex, anyway.
>> Once the metadata has been retrieved for a source, I suppose the standard
>> SQL dialect should work as expected.
>>
>>
>> Are there any plans to add catalog metadata support for various sources?
>> Does anybody have a workaround, for example using views or similar
>> approaches?
>>
>>
>> Thanks in advance, sorry if the message is too long :-)
>> Alfredo
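
The basic CSV type guessing proposed in the thread (try a few basic types, fall back to text when nothing fits) could look roughly like the Python sketch below. It is not Drill code and does not use any Drill API: the function names `guess_type` and `describe_csv`, the sample size, and the SQL-style type labels are all assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Candidate types, tried from narrowest to widest. The labels are
# illustrative SQL-style names, not names taken from Drill.
CANDIDATES = [
    ("INTEGER", int),
    ("DOUBLE", float),
    ("DATE", date.fromisoformat),
]


def guess_type(values):
    """Return the first candidate type that parses every non-empty value.

    Falls back to VARCHAR when no candidate fits or the column is empty.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "VARCHAR"
    for name, parse in CANDIDATES:
        try:
            for v in non_empty:
                parse(v)
        except ValueError:
            continue  # at least one value did not fit; try the next type
        return name
    return "VARCHAR"


def describe_csv(text, sample_rows=100):
    """Guess (column name, type) pairs from the header and a sample of rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = [[] for _ in header]
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for col, value in zip(columns, row):
            col.append(value)
    return [(name, guess_type(col)) for name, col in zip(header, columns)]


if __name__ == "__main__":
    sample = (
        "id,name,salary,hired\n"
        "1,Ann,1000.5,2017-10-18\n"
        "2,Bob,,2017-10-19\n"
    )
    for column, type_name in describe_csv(sample):
        print(column, type_name)
```

Because only a sample of rows is inspected, a later row can still contradict the guess, which is one reason to keep text as the fallback type.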
