- hive.support.concurrency <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.support.concurrency> – true
- hive.enforce.bucketing <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.enforce.bucketing> – true (not required as of Hive 2.0 <https://issues.apache.org/jira/browse/HIVE-12331>)
- hive.exec.dynamic.partition.mode <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.dynamic.partition.mode> – nonstrict
- hive.txn.manager <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.txn.manager> – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
- hive.compactor.initiator.on <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.compactor.initiator.on> – true (for exactly one instance of the Thrift metastore service)
- hive.compactor.worker.threads <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.compactor.worker.threads> – a positive number on at least one instance of the Thrift metastore service
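As a quick reference, the settings above correspond to a hive-site.xml fragment along the following lines. This is only a sketch: hive.compactor.worker.threads is set to 1 purely as an illustrative positive value, and hive.enforce.bucketing can be dropped on Hive 2.0+.

```xml
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.enforce.bucketing</name>
  <value>true</value> <!-- not required as of Hive 2.0 (HIVE-12331) -->
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value> <!-- on exactly one metastore instance -->
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value> <!-- illustrative; any positive number on at least one metastore instance -->
</property>
```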
I've double-checked all the settings and restarted Hive, but the problem remained. I ran the same test again, with a small modification, and captured the console output:

asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 1);
Thread.sleep(3000);
asyncExecute("alter table mobile_connections drop if exists partition (dt=20151124, msisdn_last_digit=1) purge", 2);
Thread.sleep(3000);
asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 3);
Thread.sleep(3000);
asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=1", 4);

As in the previous test, four queries execute in parallel. The first, third, and fourth are simple selects; the second is a drop partition command. Here's the output:

Tue Jun 07 13:45:18 MSK 2016 Start thread 1
Tue Jun 07 13:45:21 MSK 2016 Start thread 2
Tue Jun 07 13:45:24 MSK 2016 Start thread 3
Tue Jun 07 13:45:27 MSK 2016 Start thread 4
Tue Jun 07 13:45:32 MSK 2016 Finish thread 1
Tue Jun 07 13:45:32 MSK 2016 Thread 1 result: '210802'
Tue Jun 07 13:45:32 MSK 2016 Thread 1 completed in 13810 ms
Tue Jun 07 13:45:35 MSK 2016 Finished thread 2
Tue Jun 07 13:45:35 MSK 2016 Thread 2 completed in 13568 ms
Tue Jun 07 13:45:39 MSK 2016 Finish thread 3
Tue Jun 07 13:45:39 MSK 2016 Thread 3 result: '210802'
Tue Jun 07 13:45:39 MSK 2016 Thread 3 completed in 14970 ms
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0256_1_00, diagnostics=[Vertex vertex_1461923723503_0256_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0256_1_00 [Map 1], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
    ... 15 more

So the second thread definitely waits until the first thread completes and only then drops the partition. Then, somehow, after the partition was dropped, the third query completes and shows a result. The fourth query doesn't complete at all and throws the exception above.

On Mon, Jun 6, 2016 at 8:30 PM, Alan Gates <alanfga...@gmail.com> wrote:
> Do you have the system configured to use the DbTxnManager? See
> https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration
> for details on how to set this up. The transaction manager is what manages
> locking and makes sure that your queries don’t stomp each other.
>
> Alan.
>
> > On Jun 6, 2016, at 06:01, Igor Kuzmenko <f1she...@gmail.com> wrote:
> >
> > Hello, I'm trying to find a safe way to delete a partition with all the data it includes.
> >
> > I'm using Hive 1.2.1 with Hive JDBC driver 1.2.1 and perform a simple test on a transactional table:
> >
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 1);
> > Thread.sleep(3000);
> > asyncExecute("alter table mobile_connections drop if exists partition (dt=20151124, msisdn_last_digit=2) purge", 2);
> > Thread.sleep(3000);
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 3);
> > Thread.sleep(3000);
> > asyncExecute("Select count(distinct in_info_msisdn) from mobile_connections where dt=20151124 and msisdn_last_digit=2", 4);
> > (full code here)
> >
> > I create several threads, each executing a query asynchronously. The first queries the partition, the second drops the partition, and the others are the same as the first. The first query takes about 10-15 seconds to complete, so the "alter table" query starts before the first query completes.
> > As a result I get:
> > • First query - successfully completes
> > • Second query - successfully completes
> > • Third query - successfully completes
> > • Fourth query - throws an exception:
> > java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1461923723503_0189_1_00, diagnostics=[Vertex vertex_1461923723503_0189_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: mobile_connections initializer failed, vertex=vertex_1461923723503_0189_1_00 [Map 1], java.lang.RuntimeException: serious problem
> >     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1059)
> >     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1086)
> >     at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:305)
> >     at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:407)
> >     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> >     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:255)
> >     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:248)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:422)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> >     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:248)
> >     at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:235)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> >     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> >     at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> >     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1036)
> >     ... 15 more
> > Caused by: java.io.FileNotFoundException: File hdfs://jupiter.bss:8020/apps/hive/warehouse/mobile_connections/dt=20151124/msisdn_last_digit=2 does not exist.
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:958)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:937)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:882)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:878)
> >     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:878)
> >     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1694)
> >     at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:690)
> >     at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:366)
> >     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:648)
> >     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
> >     ... 4 more
> > ]Vertex killed, vertexName=Reducer 3, vertexId=vertex_1461923723503_0189_1_02, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_02 [Reducer 3] killed/failed due to:OTHER_VERTEX_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1461923723503_0189_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1461923723503_0189_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:2
> >     at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
> >     at Test$MyRunnable.run(Test.java:54)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > Since I'm using a transactional table, I expect that all queries executed after the partition drop will complete successfully with no result. Am I doing something wrong? Is there another way to drop a partition with its data?
> >
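For readers without access to the "(full code here)" link, the asyncExecute(query, id) harness used in both tests can be sketched roughly as below. This is a hypothetical reconstruction, not the original code: the real JDBC call (HiveStatement.execute against HiveServer2, then reading the count) is replaced by a stubQuery stand-in so the snippet is self-contained and runnable without a cluster.

```java
import java.util.Date;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class AsyncDropTest {

    // Stand-in for the real Hive JDBC call (Statement.execute on a
    // HiveServer2 connection, then reading the single count cell).
    // It just sleeps briefly and returns a canned count for selects,
    // or null for DDL statements.
    static String stubQuery(String sql) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sql.trim().toLowerCase().startsWith("select") ? "210802" : null;
    }

    // Sketch of asyncExecute: run the statement on another thread and
    // log start/finish with timings, matching the console output above.
    static CompletableFuture<String> asyncExecute(Function<String, String> runQuery,
                                                  String sql, int id) {
        return CompletableFuture.supplyAsync(() -> {
            System.out.println(new Date() + " Start thread " + id);
            long start = System.currentTimeMillis();
            String result = runQuery.apply(sql);
            System.out.println(new Date() + " Finish thread " + id);
            if (result != null) {
                System.out.println(new Date() + " Thread " + id + " result: '" + result + "'");
            }
            System.out.println(new Date() + " Thread " + id + " completed in "
                    + (System.currentTimeMillis() - start) + " ms");
            return result;
        });
    }

    public static void main(String[] args) throws Exception {
        // Same shape as the test in the thread: a select, then a drop
        // partition, then another select, started a little apart so the
        // statements overlap in flight.
        CompletableFuture<String> q1 = asyncExecute(AsyncDropTest::stubQuery,
                "Select count(distinct in_info_msisdn) from mobile_connections", 1);
        Thread.sleep(30);
        CompletableFuture<String> q2 = asyncExecute(AsyncDropTest::stubQuery,
                "alter table mobile_connections drop if exists partition (dt=20151124) purge", 2);
        Thread.sleep(30);
        CompletableFuture<String> q3 = asyncExecute(AsyncDropTest::stubQuery,
                "Select count(distinct in_info_msisdn) from mobile_connections", 3);
        CompletableFuture.allOf(q1, q2, q3).join();
    }
}
```

With the real JDBC call substituted for stubQuery, whether the overlapping statements block each other is exactly what the DbTxnManager lock manager decides, which is the point under discussion in this thread.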