[ 
https://issues.apache.org/jira/browse/HIVE-12285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach reassigned HIVE-12285:
-------------------------------------

    Assignee: Carl Steinbach  (was: Elliot West)

> Add locking to HCatClient
> -------------------------
>
>                 Key: HIVE-12285
>                 URL: https://issues.apache.org/jira/browse/HIVE-12285
>             Project: Hive
>          Issue Type: Improvement
>          Components: HCatalog
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>            Assignee: Carl Steinbach
>              Labels: concurrency, hcatalog, lock, locking, locks
>
> With the introduction of a concurrency model (HIVE-1293) Hive uses locks to 
> coordinate access and updates to both table data and metadata. Within the
> Hive CLI such lock management is seamless. However, Hive provides additional 
> APIs that permit interaction with data repositories, namely the HCatalog 
> APIs. Currently, operations implemented by these APIs do not participate in
> Hive's locking scheme. Furthermore, access to the locking mechanisms is not
> exposed by the APIs (unlike the Metastore Thrift API, which does expose it),
> so users are not able to interact with locks explicitly either. This has
> created a less-than-ideal situation where users of the APIs have no choice
> but to manipulate these data repositories outside the control of Hive's lock
> management, potentially causing data inconsistencies both for external
> processes using the API and for queries executing within Hive.
> h3. Scope of work
> This ticket is concerned with the sections of the HCatalog API that deal with
> DDL-type operations using the metastore, not with those whose purpose is to
> read/write table data. A separate issue already exists for adding locking to 
> HCat readers and writers (HIVE-6207).
> h3. Proposed work
> The following work items would serve as a minimum deliverable that would
> allow API users to work effectively with locks:
> * Comprehensively document on the wiki the locks required for various Hive 
> operations. At a minimum this should cover all operations exposed by 
> {{HCatClient}}. The [Locking design 
> document|https://cwiki.apache.org/confluence/display/Hive/Locking] can be 
> used as a starting point or perhaps updated.
> * Implement methods and types in the {{HCatClient}} API that allow users to 
> manipulate Hive locks. For the most part I'd expect these to delegate to the 
> metastore API implementations:
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.lock(LockRequest)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.checkLock(long)}}
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.unlock(long)}}
> ** -{{org.apache.hadoop.hive.metastore.IMetaStoreClient.showLocks()}}-
> ** {{org.apache.hadoop.hive.metastore.IMetaStoreClient.heartbeat(long, long)}}
> ** {{org.apache.hadoop.hive.metastore.api.LockComponent}}
> ** {{org.apache.hadoop.hive.metastore.api.LockRequest}}
> ** {{org.apache.hadoop.hive.metastore.api.LockResponse}}
> ** {{org.apache.hadoop.hive.metastore.api.LockLevel}}
> ** {{org.apache.hadoop.hive.metastore.api.LockType}}
> ** {{org.apache.hadoop.hive.metastore.api.LockState}}
> ** -{{org.apache.hadoop.hive.metastore.api.ShowLocksResponse}}-
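
To make the explicit model concrete, here is a minimal sketch of the request, poll, and release cycle that such delegating methods would support. Everything in it is an assumption for illustration: `SketchLockClient`, `SketchLockState`, `InMemoryLockClient`, and `ExplicitLocking` are hypothetical stand-ins, not Hive classes, and the real metastore calls listed above take and return richer types such as `LockRequest` and `LockResponse`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a lock state; not the real Hive LockState. */
enum SketchLockState { ACQUIRED, WAITING, NOT_ACQUIRED }

/** Hypothetical subset of lock calls that an HCatClient might expose. */
interface SketchLockClient {
    long lock(String database, String table); // returns a lock id
    SketchLockState checkLock(long lockId);
    void unlock(long lockId);
}

/** In-memory fake so the sketch runs without a metastore. */
class InMemoryLockClient implements SketchLockClient {
    private final Map<Long, Integer> pollsUntilAcquired = new HashMap<>();
    private final int waitPolls;
    private long nextId = 1;

    InMemoryLockClient(int waitPolls) { this.waitPolls = waitPolls; }

    public long lock(String database, String table) {
        long id = nextId++;
        pollsUntilAcquired.put(id, waitPolls);
        return id;
    }

    public SketchLockState checkLock(long lockId) {
        Integer remaining = pollsUntilAcquired.get(lockId);
        if (remaining == null) return SketchLockState.NOT_ACQUIRED;
        if (remaining == 0) return SketchLockState.ACQUIRED;
        pollsUntilAcquired.put(lockId, remaining - 1);
        return SketchLockState.WAITING;
    }

    public void unlock(long lockId) { pollsUntilAcquired.remove(lockId); }

    boolean isHeld(long lockId) { return pollsUntilAcquired.containsKey(lockId); }
}

class ExplicitLocking {
    /** Requests a lock, polls until granted, runs the action, and always unlocks. */
    static boolean runWithLock(SketchLockClient client, String db, String table,
                               int maxPolls, Runnable action) {
        long lockId = client.lock(db, table);
        try {
            for (int i = 0; i < maxPolls; i++) {
                if (client.checkLock(lockId) == SketchLockState.ACQUIRED) {
                    action.run();
                    return true;
                }
            }
            return false; // gave up waiting; the action was not run
        } finally {
            client.unlock(lockId); // release (or cancel) the request in every case
        }
    }
}
```

The `finally` block is the part API users are most likely to get wrong when managing locks by hand, which is one motivation for the additional proposals.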
> h3. Additional proposals
> Explicit lock management should be fairly simple to add to {{HCatClient}};
> however, it puts the onus on the API user to correctly understand and
> implement code that uses locks in an appropriate manner. Failure to do so may
> have undesirable consequences. With a simpler user model, the operations
> exposed on the API would automatically acquire and release the locks that
> they need. This might work well for small numbers of operations, but perhaps
> not for large sequences of invocations. (Do we need to worry about this,
> though, as the API methods usually accept batches?) Additionally, tasks such
> as heartbeat management could also be handled implicitly for long-running
> sets of operations. With these concerns in mind, it may also be beneficial to
> deliver some of the following:
> * A means to automatically acquire/release appropriate locks for 
> {{HCatClient}} operations.
> * A component that maintains a lock heartbeat from the client.
> * A strategy for switching between manual/automatic lock management, 
> analogous to SQL's {{autocommit}} for transactions.
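
One possible shape for the manual/automatic switch is sketched below. It is only an illustration under stated assumptions: `AutoLockingClient`, `setAutoLock`, `acquireLock`, `releaseLock`, and `dropPartition` are hypothetical names, and the event list stands in for real metastore calls so that the ordering of lock, operation, and unlock is visible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client wrapper; none of these names exist in HCatalog. */
class AutoLockingClient {
    private final List<String> events = new ArrayList<>();
    private boolean autoLock = true;        // analogous to autocommit = true
    private boolean manualLockHeld = false;

    void setAutoLock(boolean autoLock) { this.autoLock = autoLock; }

    // Manual mode: the caller brackets a sequence of operations explicitly.
    void acquireLock() { manualLockHeld = true; events.add("lock"); }
    void releaseLock() { manualLockHeld = false; events.add("unlock"); }

    void dropPartition(String name) {
        if (autoLock) {
            // Automatic mode: each operation acquires and releases its own lock.
            events.add("lock");
            try {
                events.add("drop " + name);
            } finally {
                events.add("unlock");
            }
        } else {
            if (!manualLockHeld) {
                throw new IllegalStateException("no lock held in manual mode");
            }
            events.add("drop " + name);
        }
    }

    List<String> events() { return events; }
}
```

In automatic mode every call pays for its own lock round trip, whereas manual mode lets a long sequence of invocations share one lock, which is the trade-off raised in the paragraph above.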
> An API for lock and heartbeat management already exists in the HCatalog 
> Mutation API (see: 
> {{org.apache.hive.hcatalog.streaming.mutate.client.lock}}). It will likely
> make sense to refactor this code, the code that uses it, or both.
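
For discussion, a client-side heartbeat component could look roughly like the sketch below. It is not the Mutation API's implementation: `LockHeartbeat` and its methods are hypothetical, and the `Runnable` is assumed to wrap whatever heartbeat call the client ends up exposing.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that sends a heartbeat at a fixed rate until closed. */
class LockHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lock-heartbeat");
            t.setDaemon(true); // do not keep the JVM alive just for heartbeats
            return t;
        });
    private final AtomicInteger beatsSent = new AtomicInteger();

    LockHeartbeat(Runnable sendHeartbeat, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sendHeartbeat.run();
                beatsSent.incrementAndGet();
            } catch (RuntimeException e) {
                // Swallowed so one failed beat does not cancel the schedule.
                // A real client would need to surface this, since a failed
                // heartbeat may mean the lock has been lost.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    int beatsSent() { return beatsSent.get(); }

    boolean isStopped() { return scheduler.isShutdown(); }

    @Override
    public void close() { scheduler.shutdownNow(); }
}
```

Because the helper is `AutoCloseable`, a caller could hold it in a try-with-resources block around a long-running set of operations so that heartbeats stop when the block exits.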



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
