Re: Overwriting files

2015-09-11 Thread Hitesh Shah
This is probably a question for the Hive dev mailing list: how is the
staging/output directory name, i.e.
".hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1",
determined? You may need to change this value in the config used to set up
the output of the vertex that writes to HDFS.
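Following that suggestion, here is a minimal Java sketch. It assumes the writing vertex's output configuration can be read and written as plain string key/value pairs (for example by iterating a Hadoop `Configuration`, which is `Iterable<Map.Entry<String, String>>`). Every value that mentions the old staging-directory name is given a new one before the DAG is resubmitted. The class `StagingRewrite`, its methods, the config keys and the replacement name are all hypothetical. Which keys actually carry the path is not established in this thread. If the path is also embedded in serialized plan payloads, a string rewrite like this would not reach it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: before re-submitting a deserialized DAG, swap the old
 * staging-directory name for a new one in every config value that mentions it.
 * The map stands in for whatever configuration the writing vertex's output uses.
 */
public class StagingRewrite {

    /** Returns a copy of conf with every occurrence of oldName replaced by newName. */
    public static Map<String, String> rewrite(Map<String, String> conf, String oldName, String newName) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String v = e.getValue();
            // String.replace is a literal replacement (no regex), so dots in the name are safe
            out.put(e.getKey(), v == null ? null : v.replace(oldName, newName));
        }
        return out;
    }

    /** Counts how many values still reference oldName (0 means the rewrite is complete). */
    public static long remaining(Map<String, String> conf, String oldName) {
        return conf.values().stream().filter(v -> v != null && v.contains(oldName)).count();
    }

    public static void main(String[] args) {
        String oldName = ".hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1";
        // Hypothetical replacement; in practice generate a fresh name per submission
        String newName = ".hive-staging_hive_2015-09-12_09-30-00_000_1234567890123456789-1";

        Map<String, String> conf = new LinkedHashMap<>();
        // Hypothetical keys; the real ones depend on the Hive plan
        conf.put("example.output.dir",
            "hdfs://10.10.1.2:8020/apps/hive/output_tab/" + oldName + "/_tmp.-ext-1");
        conf.put("example.unrelated", "value");

        Map<String, String> updated = rewrite(conf, oldName, newName);
        System.out.println(updated.get("example.output.dir"));
        System.out.println(remaining(updated, oldName)); // 0
    }
}
```

The original map is left untouched, so the rewritten copy can be checked (for example with `remaining`) before it replaces anything in the DAG.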

— Hitesh


On Sep 11, 2015, at 1:09 PM, Raajay  wrote:

> I am running DAGs generated by Hive using my custom Tez client: I
> serialize a DAG, load it back and submit it later. Everything works great the
> first time; however, on second runs I get a RuntimeException (snippet
> below).
> 
> My guess is that since the same DAG is run again, the output tables have the
> same id, and that prevents the overwrite.
> 
> Where should I introduce randomness in the file name? Should I change some
> name field in FileSinkDescriptor every time I re-run the DAG?
> 
> Thanks
> Raajay
> 
> 
>  Vertex failed, vertexName=Reducer 3, 
> vertexId=vertex_1441949856963_0011_1_04, diagnostics=[Task failed, 
> taskId=task_1441949856963_0011_1_04_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Failure while running task:java.lang.RuntimeException: 
> java.lang.RuntimeException: Hive Runtime Error while closing operators: 
> Unable to rename output from: 
> hdfs://10.10.1.2:8020/apps/hive/output_tab/.hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1/_task_tmp.-ext-1/_tmp.00_0
>  to: 
> hdfs://10.10.1.2:8020/apps/hive/output_tab/.hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1/_tmp.-ext-1/00_0
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
> at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
> at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
> at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171



Overwriting files

2015-09-11 Thread Raajay
I am running DAGs generated by Hive using my custom Tez client: I
serialize a DAG, load it back and submit it later. Everything works great
the first time; however, on second runs I get a RuntimeException
(snippet below).

My guess is that since the same DAG is run again, the output tables have
the same id, and that prevents the overwrite.

Where should I introduce randomness in the file name? Should I change some
name field in FileSinkDescriptor every time I re-run the DAG?
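The staging name in the error already carries a random part. Read from that one example (an inference, not Hive's documented format), it looks like `<prefix>_hive_<yyyy-MM-dd_HH-mm-ss_SSS>_<random non-negative long>-<counter>`. A serialized DAG presumably replays whatever string was baked in when Hive compiled the query, so reloading it reuses that name unchanged. The hypothetical `StagingDirNames` helper below parses that inferred layout and builds a new name of the same shape, which could serve as the replacement value.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper for names shaped like
 * ".hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1".
 * The layout is inferred from that example, not taken from Hive's source.
 */
public class StagingDirNames {
    // groups: 1 prefix, 2 timestamp, 3 random number, 4 trailing counter
    private static final Pattern NAME = Pattern.compile(
        "^(.+?)_hive_(\\d{4}-\\d{2}-\\d{2}_\\d{2}-\\d{2}-\\d{2}_\\d{3})_(\\d+)-(\\d+)$");

    /** True if the name has the inferred layout. */
    public static boolean matches(String name) {
        return NAME.matcher(name).matches();
    }

    /** Returns the random component, or null if the name does not match. */
    public static String randomPartOf(String name) {
        Matcher m = NAME.matcher(name);
        return m.matches() ? m.group(3) : null;
    }

    /** Builds a new name with the same layout, the given timestamp and a fresh random part. */
    public static String fresh(String prefix, Date when, int counter) {
        // Locale.ROOT keeps the digits ASCII so the pattern above still matches
        String ts = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS", Locale.ROOT).format(when);
        // nextLong(bound) returns a value in [0, bound), so it is never negative
        long rand = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
        return prefix + "_hive_" + ts + "_" + rand + "-" + counter;
    }

    public static void main(String[] args) {
        String old = ".hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1";
        System.out.println(randomPartOf(old)); // 6365145769624003668
        String next = fresh(".hive-staging", new Date(), 1);
        System.out.println(next + " matches=" + matches(next));
    }
}
```

Drawing the random part from `ThreadLocalRandom.nextLong(bound)` keeps it non-negative without the `Math.abs(Long.MIN_VALUE)` edge case.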

Thanks
Raajay


 Vertex failed, vertexName=Reducer 3,
vertexId=vertex_1441949856963_0011_1_04, diagnostics=[Task failed,
taskId=task_1441949856963_0011_1_04_00, diagnostics=[TaskAttempt 0
failed, info=[Error: Failure while running task:java.lang.RuntimeException:
java.lang.RuntimeException: Hive Runtime Error while closing operators:
Unable to rename output from:
hdfs://10.10.1.2:8020/apps/hive/output_tab/.hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1/_task_tmp.-ext-1/_tmp.00_0
to:
hdfs://10.10.1.2:8020/apps/hive/output_tab/.hive-staging_hive_2015-09-11_00-07-40_043_6365145769624003668-1/_tmp.-ext-1/00_0
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:171)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:345)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171
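The error says the task could not rename its temp file into the staging directory, but not why. If, as the message suggests, the rename went through Hadoop's `FileSystem.rename(Path, Path)`, that method reports most failures by returning `false` rather than throwing. The following is a local-filesystem analogue only (java.nio, not HDFS). It shows two common reasons a non-overwriting rename fails: the target already exists, for example left behind by an earlier run of the same DAG, or the target's parent directory does not exist. Which one applies here is not established in this thread. `RenameAnalogue` and `tryRename` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

/**
 * Local-filesystem analogue (java.nio, not HDFS) of the task-commit rename in the
 * trace: move "_task_tmp.-ext-1/_tmp.00_0" to "_tmp.-ext-1/00_0" and report why it fails.
 */
public class RenameAnalogue {

    /** Tries the move without overwriting; returns "ok" or the reason it failed. */
    public static String tryRename(Path src, Path dst) {
        try {
            Files.move(src, dst); // no REPLACE_EXISTING: an existing target is an error
            return "ok";
        } catch (FileAlreadyExistsException e) {
            return "destination exists";
        } catch (NoSuchFileException e) {
            return "missing source or destination parent";
        } catch (IOException e) {
            return "other: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path stage = Files.createTempDirectory("staging");
        Path src = Files.createDirectories(stage.resolve("_task_tmp.-ext-1")).resolve("_tmp.00_0");
        Path dst = stage.resolve("_tmp.-ext-1").resolve("00_0");

        Files.write(src, new byte[] {1});
        System.out.println(tryRename(src, dst)); // missing source or destination parent

        Files.createDirectories(dst.getParent());
        Files.write(dst, new byte[] {2});        // simulates a leftover from an earlier run
        System.out.println(tryRename(src, dst)); // destination exists

        Files.delete(dst);
        System.out.println(tryRename(src, dst)); // ok
    }
}
```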