Problem saving Hive table with Overwrite mode

nimrodo Wed, 13 Jul 2016 00:35:35 -0700

Hi,

I'm trying to write a partitioned Parquet table and save it as a Hive table
at a specific path.
The code is in Java (column and table names are slightly different in my
real code) and is run through Airflow, which calls spark-submit:


aggregatedData.write()
    .format("parquet")
    .mode(SaveMode.Overwrite)
    .partitionBy("schema_partition", "colC")
    .option("path", "hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet")
    .saveAsTable("table_info");

However, I'm getting the following exception:
[2016-07-13 10:18:53,490] {bash_operator.py:77} INFO - Exception in thread
"main" org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
execute, tree:
[2016-07-13 10:18:53,490] {bash_operator.py:77} INFO -
TungstenAggregate(key=[colA#43,colB#44,colC#46],
functions=[(min(colD#37L),mode=Final,isDistinct=false),(max(colE#42L),mode=Final,isDistinct=false),(max(colF#41L),mode=Final,isDistinct=false),(max(colG#38L),mode=Final,isDistinct=false),(max(colH#39L),mode=Final,isDistinct=false),(max(colI#40L),mode=Final,isDistinct=false)],
output=[colA#43,colB#44,colD#51L,colE#52L,colC#46,colF#53L,colG#54L,colH#55L,colI#56L])
[2016-07-13 10:18:53,490] {bash_operator.py:77} INFO - +- TungstenExchange
hashpartitioning(colA#43,colB#44,colC#46,200), None
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO - +-
TungstenAggregate(key=[colA#43,colB#44,colC#46],
functions=[(min(colD#37L),mode=Partial,isDistinct=false),(max(colE#42L),mode=Partial,isDistinct=false),(max(colF#41L),mode=Partial,isDistinct=false),(max(colG#38L),mode=Partial,isDistinct=false),(max(colH#39L),mode=Partial,isDistinct=false),(max(colI#40L),mode=Partial,isDistinct=false)],
output=[colA#43,colB#44,colC#46,min#73L,max#74L,max#75L,max#76L,max#77L,max#78L])
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO - +- Scan
ParquetRelation[colE#42L,colA#43,colG#38L,colD#37L,colH#39L,colF#41L,colI#40L,colC#46,colB#44]
InputPaths: hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,491] {bash_operator.py:77} INFO - at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,492] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.Project.doExecute(basicOperators.scala:46)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,493] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:109)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:108)
[2016-07-13 10:18:53,494] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
[2016-07-13 10:18:53,495] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,496] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:256)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at
org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:274)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
[2016-07-13 10:18:53,497] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
[2016-07-13 10:18:53,498] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:251)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at
org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:221)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at
project.common.tables.ParquetReaderWriter.saveTaskOutputPaths(ParquetReaderWriter.java:159)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at
project.GlobalAggregationManager.main(GlobalAggregationManager.java:97)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[2016-07-13 10:18:53,499] {bash_operator.py:77} INFO - at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at
java.lang.reflect.Method.invoke(Method.java:497)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
[2016-07-13 10:18:53,500] {bash_operator.py:77} INFO - at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - at
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - Caused by:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute,
tree:
[2016-07-13 10:18:53,501] {bash_operator.py:77} INFO - TungstenExchange
hashpartitioning(colA#43,colB#44,colC#46,200), None
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - +-
TungstenAggregate(key=[colA#43,colB#44,colC#46],
functions=[(min(colD#37L),mode=Partial,isDistinct=false),(max(colE#42L),mode=Partial,isDistinct=false),(max(colF#41L),mode=Partial,isDistinct=false),(max(colG#38L),mode=Partial,isDistinct=false),(max(colH#39L),mode=Partial,isDistinct=false),(max(colI#40L),mode=Partial,isDistinct=false)],
output=[colA#43,colB#44,colC#46,min#73L,max#74L,max#75L,max#76L,max#77L,max#78L])
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - +- Scan
ParquetRelation[colE#42L,colA#43,colG#38L,colD#37L,colH#39L,colF#41L,colI#40L,colC#46,colB#44]
InputPaths: hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
[2016-07-13 10:18:53,502] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
[2016-07-13 10:18:53,503] {bash_operator.py:77} INFO - ... 50 more
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - Caused by:
java.io.FileNotFoundException: File does not exist:
/BatchZone/table.parquet/schema_partition=2/colC=14/part-r-00003-92a58db2-cfdc-47f0-9d7d-3edb8f5fd7a1.gz.parquet
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
[2016-07-13 10:18:53,504] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:672)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
[2016-07-13 10:18:53,505] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
[2016-07-13 10:18:53,506] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at
java.security.AccessController.doPrivileged(Native Method)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at
javax.security.auth.Subject.doAs(Subject.java:422)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
[2016-07-13 10:18:53,509] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
java.lang.reflect.Constructor.newInstance(Constructor.java:422)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
[2016-07-13 10:18:53,510] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1242)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1227)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:1285)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:222)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DistributedFileSystem$1.doCall(DistributedFileSystem.java:218)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
[2016-07-13 10:18:53,511] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:218)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:210)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:397)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
org.apache.parquet.hadoop.ParquetInputFormat.getSplits(ParquetInputFormat.java:294)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$buildInternalScan$1$$anon$1.getPartitions(ParquetRelation.scala:363)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,512] {bash_operator.py:77} INFO - at
scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,513] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[2016-07-13 10:18:53,514] {bash_operator.py:77} INFO - at
scala.collection.AbstractTraversable.map(Traversable.scala:105)
[2016-07-13 10:18:53,515] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66)
[2016-07-13 10:18:53,515] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at
scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,517] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at
scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,518] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
scala.Option.getOrElse(Option.scala:120)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
[2016-07-13 10:18:53,519] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - ... 58 more
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
does not exist:
/BatchZone/table.parquet/schema_partition=2/colC=14/part-r-00003-92a58db2-cfdc-47f0-9d7d-3edb8f5fd7a1.gz.parquet
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
[2016-07-13 10:18:53,520] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:672)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:373)
[2016-07-13 10:18:53,521] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
[2016-07-13 10:18:53,522] {bash_operator.py:77} INFO - at
java.security.AccessController.doPrivileged(Native Method)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at
javax.security.auth.Subject.doAs(Subject.java:422)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO -
[2016-07-13 10:18:53,523] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Client.call(Client.java:1426)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.Client.call(Client.java:1363)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at
com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:257)
[2016-07-13 10:18:53,524] {bash_operator.py:77} INFO - at
sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at
java.lang.reflect.Method.invoke(Method.java:497)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at
com.sun.proxy.$Proxy14.getBlockLocations(Unknown Source)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - at
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
[2016-07-13 10:18:53,525] {bash_operator.py:77} INFO - ... 105 more
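
One detail in the trace may be relevant: the scan's InputPaths
(hdfs://sandbox.hortonworks.com:8020/BatchZone/table.parquet) is the same
location passed as the "path" option of the write, and the missing part file
sits under that directory. A possible reading, offered as an assumption and
not a confirmed diagnosis, is that the input files are listed first, Overwrite
mode then clears the target directory, and the deferred read looks for files
that are gone. The sketch below imitates that sequence with plain java.nio on
a local temp directory. It uses no Spark or HDFS API, and the class and method
names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OverwriteWhileReading {

    // "Plan" a scan: remember which files exist now, but do not read them yet.
    static List<Path> planScan(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.sorted().collect(Collectors.toList());
        }
    }

    // Delete a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Runs the read-plan / overwrite / deferred-read sequence and reports the outcome.
    static String run() {
        try {
            // Stand-in for the existing table directory with one part file.
            Path table = Files.createTempDirectory("table.parquet");
            Files.write(table.resolve("part-r-00000"), "old".getBytes(StandardCharsets.UTF_8));

            // 1. The input is listed, but reading is deferred.
            List<Path> planned = planScan(table);

            // 2. The overwrite clears the target, which is also the input.
            deleteRecursively(table);
            Files.createDirectories(table);

            // 3. The deferred read now looks for files that no longer exist.
            try {
                for (Path p : planned) {
                    Files.readAllBytes(p);
                }
                return "read ok";
            } catch (NoSuchFileException e) {
                return "file not found";
            } finally {
                deleteRecursively(table); // clean up the temp directory
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On a local filesystem this prints "file not found". If the same thing is
happening in the Spark job, writing the aggregated result to a different path
than the one being read would avoid it, but that is untested here.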

Any clue would be appreciated.

Nimrod








