Re: When do the deltas of a transaction become observable?
Thanks Gopal, that was very helpful.

Granville

On Mon, 26 Nov 2018 at 08:14, Gopal Vijayaraghavan wrote:

> > release of the locks) but I can't seem to find it. As it's a transactional
> > system I'd expect we observe both deltas or none at all, at the point of
> > successful commit.
>
> In Hive's internals, "observe" is slightly different from "use". The Hive ACID
> system can see a file on HDFS and then ignore it, because it is from the
> "future".
>
> You can sort of start from this line
>
> https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java#L70
>
> and work backwards.
>
> > I had done some basic tests to determine if the observation semantics were
> > tied to the metadata in the database product for the transactional system
> > but I could only determine write IDs were influencing this, e.g. if write
> > ID = 7 for a given table, then the read would consist of all deltas with a
> > write ID < 7.
>
> Yes, you're on the right track. There's a mapping from txn_id -> write_id
> (per-table), maintained by the writers (i.e. if a txn commits, then the
> write_id is visible).
>
> For each table, in each query, there's a snapshot taken which has a
> min:max and list of exceptions.
>
> When a query starts it sees that all txns below 5 are all committed or
> cleaned, therefore all <=5 is good.
>
> It knows that highest known txn is 10, so all >10 is to be ignored.
>
> And between 5 & 10, it knows that 7 is aborted and 8 is still open (i.e.
> exceptions).
>
> So if it sees a delta_11 dir, it ignores it. If it sees a delta_8, it
> ignores it.
>
> The "ACID" implementation hides future updates in plain sight and doesn't
> need HDFS to be able to rename multiple dirs together.
>
> Most of that smarts is in the split-generation, not in the commit
> (however, the commit does something else to detect write-conflicts which is
> its own thing).
>
> > If someone could point me in the right direction, or correct my
> > understanding then I would greatly appreciate it.
>
> This implementation is built with the txn -> write_id indirection to
> support cross-replication between, say, an east-coast cluster and a
> west-coast cluster, each owning primary data-sets on their own coasts.
>
> Cheers,
> Gopal
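The snapshot Gopal describes (a min:max range plus a list of exceptions) can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not Hive's implementation: the class `WriteIdSnapshot`, the function `visible_deltas`, and the simplified `delta_<id>` directory names are hypothetical and merely mirror the numbers used in the reply above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteIdSnapshot:
    """Toy per-table snapshot: a range plus a set of exceptions (hypothetical names)."""
    low_watermark: int    # every ID <= this is committed or cleaned
    high_watermark: int   # highest ID known when the query started
    exceptions: frozenset = field(default_factory=frozenset)  # open or aborted IDs in between

    def is_visible(self, write_id: int) -> bool:
        if write_id <= self.low_watermark:
            return True
        if write_id > self.high_watermark:
            return False  # from the "future": seen on HDFS, then ignored
        return write_id not in self.exceptions


def visible_deltas(dir_names, snapshot):
    """Keep only the delta directories a reader holding this snapshot would use."""
    kept = []
    for name in dir_names:
        # Assumes the simplified "delta_<id>" form used in the email.
        write_id = int(name.split("_")[1])
        if snapshot.is_visible(write_id):
            kept.append(name)
    return kept


# Numbers from the reply: <=5 is good, 10 is the highest known, 7 aborted, 8 open.
snapshot = WriteIdSnapshot(low_watermark=5, high_watermark=10,
                           exceptions=frozenset({7, 8}))
listing = ["delta_4", "delta_6", "delta_7", "delta_8", "delta_9", "delta_11"]
print(visible_deltas(listing, snapshot))  # ['delta_4', 'delta_6', 'delta_9']
```

In this model nothing is renamed or moved at commit time: all six directories are present in the listing, and the reader's snapshot alone decides which ones count.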
When do the deltas of a transaction become observable?
Hi All,

I'm trying to figure out where in the Hive codebase all the deltas that are the side effect of a Hive 3.x transaction become observable. (My current investigation is for HDFS.)

For example:

    from table1
    insert into table2 select x
    insert into table3 select x;

This transaction generates two delta files: one that will appear under the location for table2 and another under the location for table3. I'm expecting that there's some logic that will make the deltas of this transaction appear in their respective HDFS locations upon commit (or release of the locks) but I can't seem to find it. As it's a transactional system I'd expect we observe both deltas or none at all, at the point of successful commit.

The only reference to a location I've managed to stumble across is that of the Hive scratch space: conceptually, I had thought that the intermediate result of a transaction would be located here and then a rename would occur to make the content visible to other readers.

I had done some basic tests to determine if the observation semantics were tied to the metadata in the database product for the transactional system but I could only determine write IDs were influencing this, e.g. if write ID = 7 for a given table, then the read would consist of all deltas with a write ID < 7.

If someone could point me in the right direction, or correct my understanding then I would greatly appreciate it.

Thanks,

Granville
Re: Why are TXN IDs not partitioned per database?
Thanks Alan.

On Tue, 20 Nov 2018, 17:23 Alan Gates wrote:

> History. Originally there were only transaction ids, which were global.
> Write ids for tables came later as a way to limit the amount of information
> each transaction needed to track and to make it easier to replicate table
> changes between Hive instances.
>
> But even if we had put them in from the start, we'd have them span
> databases, otherwise transactions couldn't span databases. Hive has no
> restrictions on queries spanning databases so we wouldn't want to restrict
> transactions from doing so.
>
> Alan.
>
> On Tue, Nov 20, 2018 at 7:32 AM Granville Barnett <
> granvillebarn...@gmail.com> wrote:
>
> > Hi,
> >
> > Reading the source code of Hive 3.x and I have a question regarding
> > transaction IDs which form the span of a transaction: its begin (TXN ID)
> > and commit ID (NEXT_TXN_ID at time of commit).
> >
> > Why is it that we have a global timeline for transactions rather than a
> > timeline partitioned at the granularity of a database, kind of similar to
> > how write IDs are partitioned per table but at the database scope?
> >
> > E.g.,
> >
> > NEXT_TXN_ID
> > +-------+-----------+
> > | DB    | NTXN_NEXT |
> > +-------+-----------+
> > | test1 | 23        |
> > | test2 | 4         |
> > +-------+-----------+
> >
> > Same question could also be applied to NEXT_LOCK_ID.
> >
> > I am just curious because it seems like partitioning the transaction (and
> > lock IDs) would reduce the granularity of locking in the various
> > transactional methods. For example, openTxn invocations are mutexed with
> > all other openTxn invocations even if they are for transactions running in
> > distinct database domains. Similarly for openTxn mutexing with respect to
> > commitTxn if there is a write-write conflict, which I would have thought
> > would only be the case if they are applicable to the same database. I'm
> > sure that this would have the side effect of increasing the complexity of
> > other subsystems but I had to ask what the rationale was behind this.
> >
> > (I'm new to Hive so please forgive me if the answer is obvious.)
> >
> > Regards,
> >
> > Granville
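Alan's point, that a single transaction may touch tables in several databases and therefore needs one global transaction timeline while write IDs stay per table, can be sketched in Python as follows. This is a toy model only: `ToyTxnManager`, `open_txn`, and `allocate_write_id` are hypothetical names for illustration and are not the metastore's API, and the table names `a` and `b` are made up.

```python
import itertools
from collections import defaultdict


class ToyTxnManager:
    """Toy model: one global txn counter, one write-ID counter per (db, table)."""

    def __init__(self):
        self._next_txn_id = itertools.count(1)  # single global timeline
        self._next_write_id = defaultdict(lambda: itertools.count(1))  # per table
        self.txn_to_write_ids = defaultdict(dict)  # txn_id -> {(db, table): write_id}

    def open_txn(self):
        return next(self._next_txn_id)

    def allocate_write_id(self, txn_id, db, table):
        """Give the transaction one write ID per table it writes to."""
        key = (db, table)
        per_txn = self.txn_to_write_ids[txn_id]
        if key not in per_txn:
            per_txn[key] = next(self._next_write_id[key])
        return per_txn[key]


mgr = ToyTxnManager()

t1 = mgr.open_txn()
mgr.allocate_write_id(t1, "test1", "a")

# A second transaction that spans two databases under a single txn ID.
t2 = mgr.open_txn()
mgr.allocate_write_id(t2, "test1", "a")
mgr.allocate_write_id(t2, "test2", "b")

print(t2, mgr.txn_to_write_ids[t2])
# 2 {('test1', 'a'): 2, ('test2', 'b'): 1}
```

With a per-database counter instead, the second transaction would have no single ID under which its writes to `test1` and `test2` could commit or abort together.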
Why are TXN IDs not partitioned per database?
Hi,

Reading the source code of Hive 3.x and I have a question regarding transaction IDs which form the span of a transaction: its begin (TXN ID) and commit ID (NEXT_TXN_ID at time of commit).

Why is it that we have a global timeline for transactions rather than a timeline partitioned at the granularity of a database, kind of similar to how write IDs are partitioned per table but at the database scope?

E.g.,

NEXT_TXN_ID
+-------+-----------+
| DB    | NTXN_NEXT |
+-------+-----------+
| test1 | 23        |
| test2 | 4         |
+-------+-----------+

Same question could also be applied to NEXT_LOCK_ID.

I am just curious because it seems like partitioning the transaction (and lock IDs) would reduce the granularity of locking in the various transactional methods. For example, openTxn invocations are mutexed with all other openTxn invocations even if they are for transactions running in distinct database domains. Similarly for openTxn mutexing with respect to commitTxn if there is a write-write conflict, which I would have thought would only be the case if they are applicable to the same database. I'm sure that this would have the side effect of increasing the complexity of other subsystems but I had to ask what the rationale was behind this.

(I'm new to Hive so please forgive me if the answer is obvious.)

Regards,

Granville