Hi, I'm trying to write a partitioned Parquet table and save it as a Hive table at a specific path. The code is in Java (column and table names are slightly different in my real code) and is run by Airflow, which calls spark-submit:

    aggregatedData.write()
        .format("parquet")
        .mode(SaveMode.Overwrite)
        .partitionBy("schema_partition", "colC")
        .option("path", "hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet")
        .saveAsTable("table_info");

However I'm getting the following exception:

[2016-07-13 10:18:53,490] {bash_operator.py:77} INFO - Exception in thread "main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
[2016-07-13 10:18:53,490] {bash_operator.py:77} INFO - TungstenAggregate(key=[colA#43,colB#44,colC#46], functions=[(min(colD#37L),mode=Final,isDistinct=false),(max(colE#42L),mode=Final,isDistinct=false),(max(colF#41L),mode=Final,isDistinct=false),(max(colG#38L),mode=Final,isDistinct=false),(max(colH#39L),mode=Final,isDistinct=false),(max(colI#40L),mode=Final,isDistinct=false)], output=[colA#43,colB#44,colD#51L,colE#52L,colC#46,colF#53L,colG#54L,colH#55L,colI#56L])
[2016-07-13 10:18:53,490] {bash_operator.py:77} INFO - +- TungstenExchange hashpartitioning(colA#43,colB#44,colC#46,200), None
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO - +- TungstenAggregate(key=[colA#43,colB#44,colC#46], functions=[(min(colD#37L),mode=Partial,isDistinct=false),(max(colE#42L),mode=Partial,isDistinct=false),(max(colF#41L),mode=Partial,isDistinct=false),(max(colG#38L),mode=Partial,isDistinct=false),(max(colH#39L),mode=Partial,isDistinct=false),(max(colI#40L),mode=Partial,isDistinct=false)], output=[colA#43,colB#44,colC#46,min#73L,max#74L,max#75L,max#76L,max#77L,max#78L])
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO - +- Scan ParquetRelation[colE#42L,colA#43,colG#38L,colD#37L,colH#39L,colF#41L,colI#40L,colC#46,colB#44] InputPaths: hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO - at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:109)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:274)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:251)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at project.common.tables.ParquetReaderWriter.saveTaskOutputPaths(ParquetReaderWriter.java:159)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at project.GlobalAggregationManager.main(GlobalAggregationManager.java:97)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at java.lang.reflect.Method.invoke(Method.java:497)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - TungstenExchange hashpartitioning(colA#43,colB#44,colC#46,200), None
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - +- TungstenAggregate(key=[colA#43,colB#44,colC#46], functions=[(min(colD#37L),mode=Partial,isDistinct=false),(max(colE#42L),mode=Partial,isDistinct=false),(max(colF#41L),mode=Partial,isDistinct=false),(max(colG#38L),mode=Partial,isDistinct=false),(max(colH#39L),mode=Partial,isDistinct=false),(max(colI#40L),mode=Partial,isDistinct=false)], output=[colA#43,colB#44,colC#46,min#73L,max#74L,max#75L,max#76L,max#77L,max#78L])
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - +- Scan ParquetRelation[colE#42L,colA#43,colG#38L,colD#37L,colH#39L,colF#41L,colI#40L,colC#46,colB#44] InputPaths: hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - ... 50 more
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - Caused by: java.io.FileNotFoundException: File does not exist: /BatchZone/table.parquet/schema_partition=2/colC=14/part-r-00003-92a58db2-cfdc-47f0-9d7d-3edb8f5fd7a1.gz.parquet
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:672)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
[2016-07-13 10:18:53,506] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at java.security.AccessController.doPrivileged(Native Method)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at javax.security.auth.Subject.doAs(Subject.java:422)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1285)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:222)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:218)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:218)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:210)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:397)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at org.apache.parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:294)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1.getPartitions(ParquetRelation.scala:363)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[2016-07-13 10:18:53,515] {bash_operator.py:77} INFO - at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
[2016-07-13 10:18:53,515] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - ... 58 more
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /BatchZone/table.parquet/schema_partition=2/colC=14/part-r-00003-92a58db2-cfdc-47f0-9d7d-3edb8f5fd7a1.gz.parquet
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:672)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at java.security.AccessController.doPrivileged(Native Method)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at javax.security.auth.Subject.doAs(Subject.java:422)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Client.call(Client.java:1426)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.Client.call(Client.java:1363)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:257)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at java.lang.reflect.Method.invoke(Method.java:497)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - ... 105 more

Any clue would be appreciated.

Nimrod

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Problem-saving-Hive-table-with-Overwrite-mode-tp27327.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org