[
https://issues.apache.org/jira/browse/HIVE-17249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hans Zeller updated HIVE-17249:
---
Description:
We are running into a problem with data getting lost when loading data in
parallel into a partitioned Hive table. The data loader runs on multiple nodes
and dynamically creates partitions as it needs them, using the
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.appendPartition(String,
String, String) interface. We assume that if multiple processes try to create
the same partition at the same time, exactly one of them succeeds while the
others fail.
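The contract we rely on can be sketched with a small, self-contained stand-in. All names here (AppendContract, the key format) are illustrative and not Hive's actual internals; the real entry point is HiveMetaStoreClient.appendPartition(db, table, partName). The sketch models the expectation that of N concurrent creators of one partition, exactly one is told "success":

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (not Hive code) of the contract the loaders assume:
// concurrent appendPartition calls for the same partition serialize,
// with exactly one winner.
public class AppendContract {

    private final Set<String> partitions = ConcurrentHashMap.newKeySet();

    // Stand-in for HiveMetaStoreClient.appendPartition(db, tbl, part):
    // an atomic create-if-absent that returns whether we won.
    boolean appendPartition(String db, String tbl, String part) {
        return partitions.add(db + "." + tbl + "/" + part);
    }

    // Runs nLoaders concurrent creators of the same partition and
    // counts how many were told "success".
    int successes(int nLoaders) {
        ExecutorService pool = Executors.newFixedThreadPool(nLoaders);
        AtomicInteger ok = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(nLoaders);
        for (int i = 0; i < nLoaders; i++) {
            pool.execute(() -> {
                if (appendPartition("db", "t", "p=1")) ok.incrementAndGet();
                done.countDown();
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return ok.get();
    }

    public static void main(String[] args) {
        // Under the assumed contract this prints 1, no matter the
        // number of concurrent loaders.
        System.out.println(new AppendContract().successes(8));
    }
}
```

If the metastore honored this contract fully (including the HDFS side effects), the losing callers would simply retry or proceed, and no data would be lost.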
What we are seeing is that the partition gets created, but a few of the created
files end up in the .Trash folder in HDFS. From the metastore log, we assume
the following is happening in the threads of the metastore server:
- Thread 1: A first process tries to create a partition.
- Thread 1: The org.apache.hadoop.hive.metastore.HiveMetaStore.append_common()
method creates the HDFS directory.
- Thread 2: A second process tries to create the same partition.
- Thread 2: Notices that the directory already exists and skips the step of
creating it.
- Thread 2: Updates the metastore.
- Thread 2: Returns success to the caller.
- Caller 2: Creates a file in the partition directory and starts inserting.
- Thread 1: Tries to update the metastore, but this fails, since thread 2 has
already inserted the partition. Retries the operation, but it still fails.
- Thread 1: Aborts the transaction and moves the HDFS directory to the trash,
since it knows that it created the directory.
- Thread 1: Returns failure to the caller.
The second caller now continues to load data apparently successfully, but the
file it writes has actually already been moved to the trash. The load reports
success, yet the data is never inserted and is not visible in the table.
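The interleaving above can be replayed deterministically with a self-contained stand-in. To be clear, this is a model built from the metastore log, not Hive's actual append_common code: the maps stand in for HDFS and the metastore table, and the removeIf stands in for the move to .Trash.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Deterministic replay of the suspected interleaving. All names and
// structures are illustrative stand-ins, not Hive internals.
public class AppendPartitionRace {
    // Stand-ins for the HDFS namespace and the metastore partition table.
    static final Map<String, String> hdfs = new ConcurrentHashMap<>();
    static final Map<String, Boolean> metastore = new ConcurrentHashMap<>();

    /** Replays the reported sequence and returns whether caller 2's
     *  file is still visible after its "successful" insert. */
    static boolean replay() {
        String part = "/warehouse/t/p=1";

        // Thread 1: append_common creates the HDFS directory.
        boolean t1CreatedDir = hdfs.putIfAbsent(part, "dir") == null;

        // Thread 2: sees the directory, skips creating it, updates the
        // metastore, and returns success to its caller.
        metastore.put(part, true);

        // Caller 2: writes a data file into the partition directory.
        hdfs.put(part + "/data_0", "rows");

        // Thread 1: its metastore update fails (partition exists), so it
        // aborts and, because it created the directory, moves the whole
        // directory -- including caller 2's file -- to the trash.
        boolean t1MetastoreOk = metastore.putIfAbsent(part, true) == null;
        if (!t1MetastoreOk && t1CreatedDir) {
            hdfs.keySet().removeIf(p -> p.startsWith(part)); // "-> .Trash"
        }

        // Caller 2 was told "success", but its file is gone.
        return hdfs.containsKey(part + "/data_0");
    }

    public static void main(String[] args) {
        System.out.println("file visible after insert: " + replay());
    }
}
```

The replay returns false: the caller that received success is exactly the one whose data disappears, matching what we observe.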
Note that in our case, the caller(s) that got an error continue as well - they
ignore the error. I think they automatically recreate the HDFS partition
directory when they create their output files, so these processes can insert
data successfully. We believe the data that is lost belongs to the process
whose appendPartition call succeeded.
> Concurrent appendPartition calls lead to data loss
> --
>
> Key: HIVE-17249
> URL: https://issues.apache.org/jira/browse/HIVE-17249
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 1.2.1
> Environment: Hortonworks HDP 2.4.
> MySQL metastore.
> Reporter: Hans Zeller
> Attachments: partition1log.txt
>
>