Re: Adding Hive Metastore functions to add and alter partitions for multiple tables
Thank you all for the help. I'm preparing the patch for review.

秦凯捷
Tel: +86-13810485829
E-mail: daniel...@gmail.com

On Tue, Dec 19, 2017 at 12:49 AM, Eugene Koifman wrote:
> +1 to Alex's comment
Re: Adding Hive Metastore functions to add and alter partitions for multiple tables
+1 to Alex's comment

On 12/14/17, 3:27 PM, "Alexander Kolbasov" wrote:
> Kaijie,
>
> Can you describe in more detail why you would need such functionality? What problem does it actually solve?
>
> I do not think that HMS should do more "atomic" compound operations than it does now - IMO it should do less instead. This is especially the case when operations involve a mix of metadata operations and filesystem operations which cannot always be reverted correctly. Such things make the semantics of HMS calls more and more complex and difficult to maintain. Existing bulk APIs are not a good example that we should follow.
>
> - Alex
Re: Adding Hive Metastore functions to add and alter partitions for multiple tables
Thanks Kaijie.

One concern is that the new functions effectively expand the size of the transactions and the work that must be undone if they fail. So the question is whether the benefit is large enough to justify the added complexity.
If no one else has comments then you should probably think about having people look at the code.

-Andrew

On Wed, Dec 13, 2017 at 6:54 PM, 秦凯捷 wrote:
> Hi Andrew,
>
> Thanks for your response. For your comments:
>
> - Functionality:
> Support adding and altering multiple partitions for multiple tables in one SQL and API request as one transaction.
>
> - What happens in the case of a failure part way through the operations:
> For altering and adding partitions, all the objectstore changes for partitions will be performed in one transaction, so the transaction will be rolled back in case of failure.
> For adding partitions, there may be additional steps to add directories on the filesystem for newly added partitions. They will be deleted in case of failure, just like what AddPartitions is doing now.
>
> - What impact on the system there will be if an operation takes a long time:
> Altering partitions for multiple tables is not much different from the current altering of partitions for one table. Both will take a long time if someone tries to alter too many partitions or too many tables. The transaction timeout will abort the operation.
> We are running performance tests on our system to see how long it takes for multiple scenarios but, after all, this should not be a blocker.
>
> Thanks,
> Kaijie
Re: Adding Hive Metastore functions to add and alter partitions for multiple tables
Kaijie,

Can you describe in more detail why you would need such functionality? What problem does it actually solve?

I do not think that HMS should do more "atomic" compound operations than it does now - IMO it should do less instead. This is especially the case when operations involve a mix of metadata operations and filesystem operations which cannot always be reverted correctly. Such things make the semantics of HMS calls more and more complex and difficult to maintain. Existing bulk APIs are not a good example that we should follow.

- Alex

On Wed, Dec 13, 2017 at 6:54 PM, 秦凯捷 wrote:
> Hi Andrew,
>
> Thanks for your response. For your comments:
>
> - Functionality:
> Support adding and altering multiple partitions for multiple tables in one SQL and API request as one transaction.
>
> - What happens in the case of a failure part way through the operations:
> For altering and adding partitions, all the objectstore changes for partitions will be performed in one transaction, so the transaction will be rolled back in case of failure.
> For adding partitions, there may be additional steps to add directories on the filesystem for newly added partitions. They will be deleted in case of failure, just like what AddPartitions is doing now.
>
> - What impact on the system there will be if an operation takes a long time:
> Altering partitions for multiple tables is not much different from the current altering of partitions for one table. Both will take a long time if someone tries to alter too many partitions or too many tables. The transaction timeout will abort the operation.
> We are running performance tests on our system to see how long it takes for multiple scenarios but, after all, this should not be a blocker.
>
> Thanks,
> Kaijie
Re: Adding Hive Metastore functions to add and alter partitions for multiple tables
Hi Andrew,

Thanks for your response. For your comments:

- Functionality:
Support adding and altering multiple partitions for multiple tables in one SQL and API request as one transaction.

- What happens in the case of a failure part way through the operations:
For altering and adding partitions, all the objectstore changes for partitions will be performed in one transaction, so the transaction will be rolled back in case of failure.
For adding partitions, there may be additional steps to add directories on the filesystem for newly added partitions. They will be deleted in case of failure, just like what AddPartitions is doing now.

- What impact on the system there will be if an operation takes a long time:
Altering partitions for multiple tables is not much different from the current altering of partitions for one table. Both will take a long time if someone tries to alter too many partitions or too many tables. The transaction timeout will abort the operation.
We are running performance tests on our system to see how long it takes for multiple scenarios but, after all, this should not be a blocker.

Thanks,
Kaijie

秦凯捷

On Thu, Dec 14, 2017 at 3:38 AM, Andrew Sherman wrote:
> Hi Kaijie,
>
> I think this is an area that others in the Hive community are interested in, so please do go ahead and describe your functionality.
> I think that it is important to describe
> - what happens in the case of a failure part way through the operations.
> - what impact on the system there will be if an operation takes a long time
>
> Thanks
>
> -Andrew
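The failure handling described in this message (stage all metadata changes, create partition directories, delete them again on failure) can be illustrated with a small sketch. It is an assumption-laden toy, not Hive code: `ToyMetastore`, its `partitions` set, and the `add_partitions_for_tables` method are hypothetical names, and metadata is held in memory rather than in a real transactional store. The cleanup is deliberately best-effort, which also shows the limitation raised elsewhere in the thread: a directory that is no longer empty cannot be reverted safely.

```python
import os
import tempfile


class ToyMetastore:
    """Hypothetical in-memory stand-in for metastore metadata; not the HMS API."""

    def __init__(self, warehouse_dir):
        self.warehouse_dir = warehouse_dir
        self.partitions = set()  # {(table, partition)}

    def add_partitions_for_tables(self, batch):
        """Add partitions for several tables; apply all of them or none."""
        staged = set(self.partitions)  # work on a copy so a failure leaves metadata untouched
        created_dirs = []              # only the directories this call created
        try:
            for table, parts in batch.items():
                # Assumption: table directories already exist and are never removed here.
                os.makedirs(os.path.join(self.warehouse_dir, table), exist_ok=True)
                for part in parts:
                    if (table, part) in staged:
                        raise ValueError(f"partition already exists: {table}/{part}")
                    path = os.path.join(self.warehouse_dir, table, part)
                    if not os.path.isdir(path):
                        os.mkdir(path)
                        created_dirs.append(path)
                    staged.add((table, part))
        except Exception:
            # Best-effort cleanup, newest first; pre-existing directories are left alone.
            for path in reversed(created_dirs):
                try:
                    os.rmdir(path)  # removes the directory only if it is empty
                except OSError:
                    pass  # e.g. another writer added files; this cannot be undone safely
            raise
        self.partitions = staged  # the "commit" step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as warehouse:
        store = ToyMetastore(warehouse)
        store.add_partitions_for_tables({"orders": ["ds=2017-12-12"]})
        try:
            # The last partition already exists, so the whole batch is rejected.
            store.add_partitions_for_tables(
                {"payments": ["ds=2017-12-13"], "orders": ["ds=2017-12-13", "ds=2017-12-12"]}
            )
        except ValueError as err:
            print("batch rejected:", err)
        print(sorted(store.partitions))
```

Tracking only the directories created by the current call keeps the cleanup from deleting data that belonged to an earlier, successful operation.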
Re: Adding Hive Metastore functions to add and alter partitions for multiple tables
Hi Kaijie,

I think this is an area that others in the Hive community are interested in, so please do go ahead and describe your functionality.
I think that it is important to describe
- what happens in the case of a failure part way through the operations.
- what impact on the system there will be if an operation takes a long time

Thanks

-Andrew

On Tue, Dec 12, 2017 at 1:31 AM, 秦凯捷 wrote:
> Hi dev,
>
> I'm wondering whether the Hive community has ever considered supporting adding and altering multiple partitions for multiple tables?
>
> I'm using Hive Metastore to manage the metadata for Presto querying. Our business requires that we publish some partitions of data for multiple tables at the same time in an atomic transaction to keep the data consistent. Currently Hive Metastore only supports adding and altering multiple partitions for one table.
>
> I drafted AddPartitionsForTables and AlterPartitionsForTables functions to achieve this, based on the existing AddPartition and AlterPartition logic, and we are testing them on our system.
> I'm wondering whether the community has considered this functionality. I would like to contribute it if there is interest.
>
> Thank you!
> -Kaijie
Adding Hive Metastore functions to add and alter partitions for multiple tables
Hi dev,

I'm wondering whether the Hive community has ever considered supporting adding and altering multiple partitions for multiple tables?

I'm using Hive Metastore to manage the metadata for Presto querying. Our business requires that we publish some partitions of data for multiple tables at the same time in an atomic transaction to keep the data consistent. Currently Hive Metastore only supports adding and altering multiple partitions for one table.

I drafted AddPartitionsForTables and AlterPartitionsForTables functions to achieve this, based on the existing AddPartition and AlterPartition logic, and we are testing them on our system.
I'm wondering whether the community has considered this functionality. I would like to contribute it if there is interest.

Thank you!
-Kaijie
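To make the consistency requirement concrete, here is a minimal sketch. It does not use the Hive Metastore API: Python's `sqlite3` module serves as a toy stand-in for a transactional metadata store, and the `partitions` table layout and the functions `add_partitions_per_table` and `add_partitions_for_tables` are hypothetical names chosen for illustration. It contrasts one transaction per table with a single transaction spanning all tables when a later insert fails.

```python
import sqlite3


def make_store():
    # Toy stand-in for the metastore's backing database (not the real HMS schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE partitions (tbl TEXT, part TEXT, PRIMARY KEY (tbl, part))")
    conn.commit()
    return conn


def add_partitions_per_table(conn, batch):
    """One transaction per table: earlier tables stay committed if a later one fails."""
    for tbl, parts in batch.items():
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO partitions VALUES (?, ?)", [(tbl, p) for p in parts]
            )


def add_partitions_for_tables(conn, batch):
    """One transaction for all tables: any failure rolls back every insert."""
    with conn:
        for tbl, parts in batch.items():
            conn.executemany(
                "INSERT INTO partitions VALUES (?, ?)", [(tbl, p) for p in parts]
            )


def list_partitions(conn):
    return sorted(conn.execute("SELECT tbl, part FROM partitions").fetchall())


# The duplicate partition for "payments" violates the primary key and forces a failure.
batch = {
    "orders": ["ds=2017-12-12"],
    "payments": ["ds=2017-12-12", "ds=2017-12-12"],
}

for add in (add_partitions_per_table, add_partitions_for_tables):
    conn = make_store()
    try:
        add(conn, batch)
    except sqlite3.IntegrityError:
        pass
    print(add.__name__, list_partitions(conn))
```

With per-table transactions a reader can observe the `orders` partition without the matching `payments` partition, which is the inconsistency the proposal aims to avoid; with a single transaction neither is visible after the failure.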