Great, that helped a lot, the issue is fixed now. :)
Thank you very much!
On Sun, Oct 11, 2015 at 12:29 PM, Yana Kadiyska wrote:
In our case, we do not actually need partition inference, so the workaround
was easy -- instead of using the path as rootpath/batch_id=333/... we
changed the paths to rootpath/333/. This works for us because we compute
the set of HDFS paths manually and then register a DataFrame with a
SQLContext.
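For anyone following along: Spark's Hive-style partition discovery treats any `key=value` path segment as a partition column, which is why `rootpath/batch_id=333/` produces a `batch_id` column while `rootpath/333/` does not. A minimal plain-Python sketch of that inference (an illustration of the behavior, not Spark's actual implementation):

```python
def inferred_partitions(path):
    """Return {column: value} for every key=value segment in the path,
    mimicking how Hive-style partition discovery reads directory names."""
    parts = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

# A key=value directory becomes a partition column...
print(inferred_partitions("rootpath/batch_id=333/part-0000.parquet"))
# -> {'batch_id': '333'}

# ...while a plain directory name (the workaround above) does not.
print(inferred_partitions("rootpath/333/part-0000.parquet"))
# -> {}
```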
Here is what df.schema.toString() prints:
DF Schema is ::StructType(StructField(batch_id,StringType,true))
I think you nailed the problem: this field is part of our HDFS file
path. We have effectively partitioned our data on the basis of batch_id folders.
How did you get around it?
Thanks f
Can you show the output of df.printSchema? Just a guess, but I think I ran
into something similar with a column that was part of a path in Parquet.
E.g. we had an account_id in the Parquet file data itself, which was of type
string, but we also named the files in the following manner:
/somepath/account
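The clash described above can be sketched in plain Python: when the same column name exists both inside the Parquet data (as a string) and as a `key=value` path segment, partition discovery infers a type from the path value, which may not match the data's type. (This uses a deliberately simplified type-inference rule; Spark's actual partition-type inference is richer.)

```python
def infer_partition_type(value):
    """Simplified stand-in for partition-value type inference:
    numeric-looking path values become IntegerType, else StringType."""
    try:
        int(value)
        return "IntegerType"
    except ValueError:
        return "StringType"

# Hypothetical example: the column stored in the files is a string...
data_schema = {"account_id": "StringType"}

# ...but the directory name .../account_id=123/ carries a numeric-looking value.
path_value = "123"
partition_type = infer_partition_type(path_value)

if partition_type != data_schema["account_id"]:
    print(f"conflict: data says StringType, path infers {partition_type}")
```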