The schema has a StructType.

Justin

On Tue, Apr 7, 2015 at 6:58 PM, Yin Huai <yh...@databricks.com> wrote:

> Hi Justin,
>
> Does the schema of your data have any decimal, array, map, or struct type?
>
> Thanks,
>
> Yin
>
> On Tue, Apr 7, 2015 at 6:31 PM, Justin Yip <yipjus...@prediction.io> wrote:
>
>> Hello,
>>
>> I have a parquet file of around 55M rows (~1 GB on disk). Performing a
>> simple grouping operation is pretty efficient (I get results within 10
>> seconds). However, after calling DataFrame.cache, I observe a significant
>> performance degradation: the same operation now takes 3+ minutes.
>>
>> My hunch is that DataFrame cannot leverage its columnar format after
>> persisting in memory, but I cannot find anything in the docs mentioning this.
>>
>> Did I miss anything?
>>
>> Thanks!
>>
>> Justin