Hi,
I am getting the following warning:

[WARN] org.apache.parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl

This seems to occur every time I read a nested data structure from parquet.

Example to reproduce:


import org.apache.spark.sql.DataFrame
import spark.implicits._  // required for .toDF (spark is the SparkSession in spark-shell)

case class A(v: Int)
case class B(v: A)  // nested structure: B wraps A

val filename = "test"
val a = A(1)
val b = B(a)
val df1: DataFrame = Seq(b).toDF
df1.write.parquet(filename)
val df2 = spark.read.parquet(filename)
df2.show()  // reading the nested data back triggers the warning


I also found https://issues.apache.org/jira/browse/SPARK-18660, but it has
no details and no resolution (it does note that the issue is fixed in Parquet
1.10, while Spark still bundles Parquet 1.8).
As a workaround, I have added the following to log4j.properties:
log4j.logger.org.apache.parquet.hadoop.ParquetRecordReader=ERROR
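
The same suppression can also be done programmatically from the driver instead of log4j.properties (a minimal sketch, assuming the log4j 1.x API that Spark 2.x ships with):

import org.apache.log4j.{Level, Logger}

// Raise this logger's threshold so the spurious WARN is dropped
Logger.getLogger("org.apache.parquet.hadoop.ParquetRecordReader").setLevel(Level.ERROR)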

However, silencing the logger only hides the warning rather than addressing it.
Is there a proper way to solve this?


Thanks,
Assaf.
