Vineet Singh created HADOOP-15260:
-------------------------------------

             Summary: Hive queries on tez are overwriting records on azure wasb storage.
                 Key: HADOOP-15260
                 URL: https://issues.apache.org/jira/browse/HADOOP-15260
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/azure
    Affects Versions: 2.7.3
         Environment: This scenario occurs on HDP 2.5 (Hadoop 2.7.3) with HDFS on WASB on the Microsoft Azure platform. The same query yields correct results on regular HDFS on an on-premise HDP 2.5 (Hadoop 2.7.3) cluster.
            Reporter: Vineet Singh
         Attachments: On Premise Cluster.JPG, azure cloud.JPG, sample_query.txt

When running multiple Hive queries on Tez (see the attached example), the output of a given mapper task number gets overwritten by the next union query. As seen in the Azure snapshot, the directories /1, 2, ..., 100 are overwritten again and again, because mappers with the same task numbers keep writing to the same directories.

On the on-premise Hadoop 2.7.3 cluster, however, the directories are created as 1_copy_0, 1_copy_2 and so on. Creating copies does not overwrite the data.

A typical job consists of 600-1000 queries unioned together.
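
To make the difference between the two behaviours concrete, here is a minimal Python sketch. It is only an illustration, not the Hive, Tez or WASB code path: the function names, the in-memory dictionary standing in for the storage layer, and the exact numbering of the `_copy_N` suffix are all assumptions made for the example.

```python
def non_clobbering_name(existing, base):
    """Return `base` if it is unused, else the first free `base_copy_N`.

    Hypothetical helper: the suffix numbering is an assumption, not
    taken from Hive.
    """
    if base not in existing:
        return base
    n = 0
    while f"{base}_copy_{n}" in existing:
        n += 1
    return f"{base}_copy_{n}"


def write_outputs(task_ids, overwrite):
    """Simulate successive union branches writing under the given task ids.

    `store` is a plain dict standing in for the output location.
    """
    store = {}
    for branch, task_id in enumerate(task_ids):
        if overwrite:
            name = task_id  # same name reused: earlier data is lost
        else:
            name = non_clobbering_name(store, task_id)
        store[name] = f"rows from branch {branch}"
    return store


# Three union branches whose mapper all has task number 1.
azure_like = write_outputs(["1", "1", "1"], overwrite=True)
on_prem_like = write_outputs(["1", "1", "1"], overwrite=False)
```

In this sketch `azure_like` ends up with a single entry holding only the last branch's rows, while `on_prem_like` keeps one entry per branch, which mirrors the symptom described in this report.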