[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-27 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350471#comment-15350471
 ] 

Bing Li commented on HIVE-13850:


Hi, [~ashutoshc]
Thanks a lot for your comment. 
Configuring Hive with ACID support worked for us.

I will close this defect as well.

> File name conflict when have multiple INSERT INTO queries running in parallel
> -
>
> Key: HIVE-13850
> URL: https://issues.apache.org/jira/browse/HIVE-13850
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Bing Li
>Assignee: Bing Li
> Attachments: HIVE-13850-1.2.1.patch
>
>
> We have an application which connects to HiveServer2 via JDBC.
> The application executes "INSERT INTO" queries against the same table.
> If many users run the application at the same time, some of the INSERTs 
> can fail.
> The root cause is that Hive.checkPaths() uses the following loop to check 
> whether the destination file already exists. When multiple inserts run in 
> parallel, this leads to a conflict.
> for (int counter = 1; fs.exists(itemDest) || destExists(result, itemDest); counter++) {
>   itemDest = new Path(destf, name + ("_copy_" + counter) + filetype);
> }
> The Error Message
> ===
> In hive log,
> org.apache.hadoop.hive.ql.metadata.HiveException: copyFiles: error while moving files!!! Cannot move hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to hdfs://node:8020/apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014
>   at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2719)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>
> In hadoop log,
> WARN  hdfs.StateChange (FSDirRenameOp.java:unprotectedRenameTo(174)) - DIR* FSDirectory.unprotectedRenameTo: failed to rename /apps/hive/warehouse/metadata.db/scalding_stats/.hive-staging_hive_2016-05-10_18-46-23_642_2056172497900766879-3321/-ext-1/00_0 to /apps/hive/warehouse/metadata.db/scalding_stats/00_0_copy_9014 because destination exists
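The race in the quoted loop can be reproduced without a cluster. Below is a hypothetical, single-threaded Java simulation, not Hive code: a `HashSet` stands in for the destination directory, `pickName` mirrors the `_copy_N` probing loop, and `rename` stands in for a NameNode rename that fails when the destination already exists. Two sessions that both probe before either renames choose the same name, so the second rename fails, which matches the "because destination exists" warning above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical simulation of the check-then-rename race; not Hive or HDFS code.
class CopyNameRace {
    // Stand-in for the contents of the destination directory.
    static final Set<String> namespace = new HashSet<>();

    // Mirrors the reported loop: probe name, name_copy_1, name_copy_2, ... until free.
    static String pickName(String name, String filetype) {
        String candidate = name + filetype;
        for (int counter = 1; namespace.contains(candidate); counter++) {
            candidate = name + "_copy_" + counter + filetype;
        }
        return candidate;
    }

    // Stand-in for a rename: returns false if the destination already exists.
    static boolean rename(String dest) {
        return namespace.add(dest);
    }

    public static void main(String[] args) {
        namespace.add("00_0"); // a file from an earlier insert
        // Two sessions both pick a name before either has renamed its file.
        String a = pickName("00_0", "");
        String b = pickName("00_0", "");
        System.out.println(a + " " + b); // both sessions chose the same name
        System.out.println(rename(a));   // first rename succeeds
        System.out.println(rename(b));   // second rename fails: destination exists
    }
}
```

Running `main` prints `00_0_copy_1 00_0_copy_1`, then `true`, then `false`.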



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320783#comment-15320783
 ] 

Ashutosh Chauhan commented on HIVE-13850:
-

DbTxnManager works only with ACID tables. For non-ACID tables, use 
ZooKeeperLockManager. However, if you are running in a highly concurrent 
environment, it's better to use ACID tables.
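For reference, a sketch of what using ACID tables involves in Hive 1.x, based on our reading of the Hive Transactions wiki; treat the values as examples and verify them against the documentation for your version. Setting `hive.support.concurrency` and `hive.txn.manager` alone is not enough, since DbTxnManager only manages tables created as transactional. The settings are expressed as plain Java data so they can be checked in isolation; in practice they belong in `hive-site.xml`, and the table and column names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of Hive 1.x ACID prerequisites; values are examples, not a definitive setup.
class AcidSetupSketch {
    // Client-side settings (HiveServer2 / CLI).
    static Map<String, String> clientSettings() {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("hive.support.concurrency", "true");
        s.put("hive.enforce.bucketing", "true"); // not required as of Hive 2.0
        s.put("hive.exec.dynamic.partition.mode", "nonstrict");
        s.put("hive.txn.manager", "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager");
        return s;
    }

    // Metastore-side settings: compaction must be enabled on at least one metastore.
    static Map<String, String> metastoreSettings() {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("hive.compactor.initiator.on", "true");
        s.put("hive.compactor.worker.threads", "1"); // any positive number
        return s;
    }

    // In Hive 1.x an ACID table must be bucketed, stored as ORC, and marked
    // transactional. Table and column names here are hypothetical.
    static String createTableDdl(String table) {
        return "CREATE TABLE " + table + " (id INT, val STRING) "
             + "CLUSTERED BY (id) INTO 4 BUCKETS STORED AS ORC "
             + "TBLPROPERTIES ('transactional'='true')";
    }

    public static void main(String[] args) {
        clientSettings().forEach((k, v) -> System.out.println(k + "=" + v));
        metastoreSettings().forEach((k, v) -> System.out.println(k + "=" + v));
        System.out.println(createTableDdl("scalding_stats_acid"));
    }
}
```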



[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-06-08 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320675#comment-15320675
 ] 

Bing Li commented on HIVE-13850:


Hi, [~ashutoshc]
Thank you for your comments. 
Yes, you're right. Naming the target file with a timestamp did not resolve 
the issue; we ran into it again...

We tried setting the following properties but still got the error:
hive.support.concurrency -> true
hive.txn.manager -> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

Are there any other properties required?

Thank you.



[jira] [Commented] (HIVE-13850) File name conflict when have multiple INSERT INTO queries running in parallel

2016-05-26 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302260#comment-15302260
 ] 

Ashutosh Chauhan commented on HIVE-13850:
-

Whatever name you choose, you will always be susceptible to a [TOCTTOU issue | 
https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use], since the name is 
chosen by a different process (the Hive CLI) than the one doing the renames (the NameNode). 
Until HDFS adds a merge API (HDFS-9763), the best way to handle this scenario is to 
turn on locking: https://cwiki.apache.org/confluence/display/Hive/Locking
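To illustrate the TOCTTOU point with a local-filesystem analogy (not Hive or HDFS code; `pickNameRacy` and `reserveName` are hypothetical helpers): a separate `exists()` check can be invalidated before the chosen name is used, whereas `java.nio.file.Files.createFile` checks for existence and creates the file as one atomic operation, so each name is claimed by exactly one caller. In Hive's case the check and the rename happen in different processes, which is why the comment recommends locking rather than a client-side fix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Local-filesystem analogy only; contrasts check-then-use with atomic create-if-absent.
class AtomicCopyName {
    // TOCTTOU-prone: another thread or process may take the name after exists() returns.
    static Path pickNameRacy(Path dir, String name) {
        Path candidate = dir.resolve(name);
        for (int counter = 1; Files.exists(candidate); counter++) {
            candidate = dir.resolve(name + "_copy_" + counter);
        }
        return candidate; // may already be taken by the time the caller uses it
    }

    // The existence check and the creation are a single atomic operation,
    // so concurrent callers can never be handed the same name.
    static Path reserveName(Path dir, String name) throws IOException {
        Path candidate = dir.resolve(name);
        for (int counter = 1; ; counter++) {
            try {
                return Files.createFile(candidate);
            } catch (FileAlreadyExistsException e) {
                candidate = dir.resolve(name + "_copy_" + counter);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("tocttou");
        Set<Path> claimed = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        // Eight threads race to reserve a name derived from the same base.
        for (int i = 0; i < 8; i++) {
            Thread t = new Thread(() -> {
                try {
                    claimed.add(reserveName(dir, "00_0"));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(claimed.size()); // one distinct path per thread
    }
}
```

With eight threads racing in `main`, `claimed` ends up holding eight distinct paths.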

