Re: SQLcontext changing String field to Long

2015-10-12 shobhit gupta
Great, that helped a lot, and the issue is fixed now. :) Thank you very much!

On Sun, Oct 11, 2015 at 12:29 PM, Yana Kadiyska wrote:
> In our case, we do not actually need partition inference so the
> workaround was easy -- instead of using the path as
> rootpath/batch_id=333/... we changed the paths

2015-10-11 Yana Kadiyska
In our case, we do not actually need partition inference, so the workaround was easy: instead of using paths like rootpath/batch_id=333/..., we changed them to rootpath/333/. This works for us because we compute the set of HDFS paths manually and then register a DataFrame into a SQLContex
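To make the workaround concrete, here is a rough Python sketch of Hive-style partition discovery. It is a simplified, hypothetical model, not Spark's code: the names `infer_value` and `discover_partitions` and the file name `part-00000.parquet` are made up for illustration, and the only typing rule modeled is that an all-digit directory value is read as an integer (matching the thread's symptom of batch_id coming back as Long), while anything else stays a string.

```python
import re
from typing import Dict, Union

# Hypothetical, simplified model of Hive-style partition discovery.
# This is NOT Spark's implementation; it only mimics the idea that a
# directory named key=value becomes a column whose type is guessed
# from the value text.


def infer_value(raw: str) -> Union[int, str]:
    """Guess a type for a partition value (assumed rule: digits -> int)."""
    if re.fullmatch(r"-?[0-9]+", raw):
        return int(raw)  # e.g. "333" -> 333, comparable to a Long column
    return raw  # everything else is kept as a string in this sketch


def discover_partitions(path: str) -> Dict[str, Union[int, str]]:
    """Collect the key=value directory segments of a path as typed columns."""
    columns: Dict[str, Union[int, str]] = {}
    for segment in path.strip("/").split("/"):
        key, sep, raw = segment.partition("=")
        if sep:  # only segments containing "=" look like partitions
            columns[key] = infer_value(raw)
    return columns


if __name__ == "__main__":
    # Original layout: batch_id is picked up from the directory name.
    print(discover_partitions("rootpath/batch_id=333/part-00000.parquet"))
    # Workaround layout: no key=value segment, so nothing is inferred.
    print(discover_partitions("rootpath/333/part-00000.parquet"))
```

With the `batch_id=333` layout the sketch yields `{'batch_id': 333}`, an integer; with the plain `333` layout it yields `{}`, so no column is derived from the directory name at all. The cost is that the reader can no longer recover the batch id from the path, which is acceptable when, as in this case, the set of paths is computed by hand.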

2015-10-10 shobhit gupta
Here is what df.schema.toString() prints:

DF Schema is ::StructType(StructField(batch_id,StringType,true))

I think you nailed the problem: this field is part of our HDFS file path. We have, in effect, partitioned our data into batch_id folders. How did you get around it? Thanks f

2015-10-10 Yana Kadiyska
Can you show the output of df.printSchema? Just a guess, but I think I ran into something similar with a column that was also part of a Parquet file path. For example, we had an account_id in the Parquet file data itself, which was of type string, but we also named the files in the following manner: /somepath/account
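The guess above can be phrased as a simple check: a column name that appears both in the files' own schema and as a key=value directory, with a different type on each side. The Python below is a hypothetical sketch, not a Spark API; the schema dictionary, the path `/data/account_id=12345/part-00000.parquet`, and the "all digits means long" rule are assumptions made for illustration only.

```python
import re
from typing import Dict, Tuple

# Hypothetical sketch: find columns whose type in the data files differs
# from the type a path-based inference rule would assign. Types are plain
# strings such as "string" or "long"; this is not a Spark API.


def path_column_types(path: str) -> Dict[str, str]:
    """Map key=value directory segments to an inferred type name.

    Assumed rule: all-digit values are "long", anything else is "string".
    """
    types: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        key, sep, raw = segment.partition("=")
        if sep:
            types[key] = "long" if re.fullmatch(r"-?[0-9]+", raw) else "string"
    return types


def type_conflicts(
    file_schema: Dict[str, str], path: str
) -> Dict[str, Tuple[str, str]]:
    """Return {column: (type in file, type inferred from path)} for mismatches."""
    inferred = path_column_types(path)
    return {
        name: (file_schema[name], inferred[name])
        for name in sorted(file_schema.keys() & inferred.keys())
        if file_schema[name] != inferred[name]
    }


if __name__ == "__main__":
    schema = {"account_id": "string", "amount": "double"}
    print(type_conflicts(schema, "/data/account_id=12345/part-00000.parquet"))
```

An empty result means nothing in the path competes with the file schema; a non-empty one, like `{'account_id': ('string', 'long')}` for the example above, is the kind of mismatch suspected here of turning a String field into a Long.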