I forgot to mention that the imageId field is a custom Scala object. Do I need to implement any special methods (equals, hashCode) to make it work?
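To make the question concrete, here is a minimal sketch; ImageId is a hypothetical stand-in for my actual key type:

    // Hypothetical stand-in for the real key type. A case class gets
    // structural equals/hashCode generated by the compiler for free.
    case class ImageId(dataset: String, index: Long)

    // If the key has to stay a plain class, the manual overrides would
    // look like this:
    class ImageKey(val dataset: String, val index: Long) {
      override def equals(other: Any): Boolean = other match {
        case that: ImageKey => dataset == that.dataset && index == that.index
        case _              => false
      }
      override def hashCode: Int = (dataset, index).##
    }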
On Tue, Apr 14, 2015 at 5:00 PM, Jaonary Rabarisoa <jaon...@gmail.com> wrote:
> Dear all,
>
> The latest version of Spark has a feature called automatic partition
> discovery and schema migration for Parquet. As far as I know, this gives
> the ability to split a DataFrame into several Parquet files and, by
> loading just the parent directory, to get back the global schema of the
> parent DataFrame.
>
> I'm trying to use this feature for the following problem, but I'm running
> into trouble. I want to perform a series of feature extractions on a set
> of images. At the first step, my DataFrame has just two columns: imageId
> and imageRawData. I then transform the imageRawData column with different
> image feature extractors. The results can be of different types: for
> example, one feature could be an mllib.Vector and another an Array[Byte].
> Each feature extractor stores its output as a Parquet file with two
> columns, imageId and featureType. At the end, I have the following files:
>
> - features/rawData.parquet
> - features/feature1.parquet
> - features/feature2.parquet
>
> When I load all the features with:
>
> sqlContext.load("features")
>
> it seems to work, and in this example I get a DataFrame with 4 columns:
> imageId, imageRawData, feature1, feature2. But when I try to read the
> values, for example with show, some columns contain null fields and I
> just can't figure out what's going wrong.
>
> Any ideas?
>
> Best,
>
> Jao
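P.S. For concreteness, here is a self-contained sketch of the write/load pattern described above (Spark 1.3-style API; the column values and the local master are just illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object FeatureStoreSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("feature-store-sketch").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Each extractor writes its own Parquet file keyed by imageId
        // (toy values standing in for the real extracted features).
        val feature1 = Seq((1L, 0.5), (2L, 0.7)).toDF("imageId", "feature1")
        feature1.saveAsParquetFile("features/feature1.parquet")

        val feature2 = Seq((1L, Array[Byte](1, 2)), (2L, Array[Byte](3, 4)))
          .toDF("imageId", "feature2")
        feature2.saveAsParquetFile("features/feature2.parquet")

        // Loading the parent directory is expected to merge the schemas of
        // all Parquet files below it into one global schema.
        val merged = sqlContext.load("features")
        merged.printSchema()
        merged.show()
      }
    }

printSchema shows the merged schema with all the feature columns here, but show is where the null fields appear for me.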