John,

See if CONVERT_FROM helps in this regard; I believe it is supposed to be faster than CAST to VARCHAR.
This is likely what will work on your data:

CONVERT_FROM(<column>, 'UTF8')

Hopefully someone with more in-depth knowledge of the Drill Parquet reader can comment.

--Andries

> On May 23, 2016, at 7:35 AM, John Omernik <[email protected]> wrote:
>
> I am learning more about my data here, the data was created in a CDH
> version of the apache parquet-mr library. (Not sure version yet, getting
> that soon). They used snappy and version 1.0 of the Parquet spec due to
> Impala needing it. They are also using setEnableDictionary on the write.
>
> Trying to figure things out right now
>
> If I make a view and cast all string fields to a VARCHAR drill shows the
> right result, but it's slow.
>
> (10 row select from raw = 1.9 seconds, 10 row select with CAST in a view =
> 25 seconds)
>
> I've resigned myself to converting the table once for performance, which
> isn't an issue however I am getting different issues on that front (I'll
> open a new thread for that)
>
> Other than the cast(field AS VARCHAR) as field is there any other (perhaps
> more performant) way to handle this situation?
>
> On Mon, May 23, 2016 at 8:31 AM, Todd <[email protected]> wrote:
>
>> Looks like Impala encoded string as binary data, I think there is some
>> configuration in Drill(I know spark has) that helps do the conversion.
>>
>> At 2016-05-23 21:25:17, "John Omernik" <[email protected]> wrote:
>>> Hey all, I have some Parquet files that I believe were made in a Map
>>> Reduce job and work well in Impala, however, when I read them in Drill,
>>> the fields that are strings come through as [B@25ddbb etc. The exact
>>> string represented as regex would be /\[B@[a-f0-9]{8}/ (Pointers maybe?)
>>>
>>> Well, I found I can cast those fields as Varchar... and get the right
>>> data... is this the right approach? Why is this happening? Performance
>>> wise am I hurting something by doing the cast to Varchar?
>>>
>>> Any thoughts would be helpful...
>>>
>>> John
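For reference, a minimal sketch of how the CONVERT_FROM suggestion at the top of this thread might be applied, either in a view or as the one-time conversion John mentions. Everything named here is an assumption for illustration: the dfs.tmp workspace, the /data/impala_table path, and the columns id, name and city are placeholders, not taken from John's data. It also assumes the string columns hold UTF-8 bytes and that the workspace is writable.

```sql
-- Hypothetical names throughout: the dfs.tmp workspace, the path
-- /data/impala_table, and the columns id, name, city are placeholders.

-- Option 1: a view that decodes the binary string columns on every read.
CREATE OR REPLACE VIEW dfs.tmp.`impala_strings_v` AS
SELECT
  t.`id`,
  CONVERT_FROM(t.`name`, 'UTF8') AS `name`,
  CONVERT_FROM(t.`city`, 'UTF8') AS `city`
FROM dfs.`/data/impala_table` t;

-- Option 2: convert once with CTAS, so the decode cost is paid a single time.
CREATE TABLE dfs.tmp.`impala_table_converted` AS
SELECT
  t.`id`,
  CONVERT_FROM(t.`name`, 'UTF8') AS `name`,
  CONVERT_FROM(t.`city`, 'UTF8') AS `city`
FROM dfs.`/data/impala_table` t;
```

Timing the same 10-row select against the CAST view and the CONVERT_FROM view would show whether the conversion function makes a difference on this data; no speedup is claimed here.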
