Re: When do the deltas of a transaction become observable?

2018-12-03 Thread Granville Barnett
Thanks Gopal, that was very helpful.

Granville

On Mon, 26 Nov 2018 at 08:14, Gopal Vijayaraghavan wrote:

>
> >release of the locks) but I can't seem to find it. As it's a transactional
> >system I'd expect we observe both deltas or none at all, at the point of
> >successful commit.
>
> In Hive's internals, "observe" is slightly different from "use". The Hive
> ACID system can see a file on HDFS and then ignore it, because it is from
> the "future".
>
> You can sort of start from this line
>
>
> https://github.com/apache/hive/blob/master/storage-api/src/java/org/apache/hadoop/hive/common/ValidReaderWriteIdList.java#L70
>
> and work backwards.
>
> >I had done some basic tests to determine if the observation semantics were
> >tied to the metadata in the database product for the transactional system
> >but I could only determine write IDs were influencing this, e.g. if write
> >ID = 7 for a given table, then the read would consist of all deltas with a
> >write ID < 7.
>
> Yes, you're on the right track. There's a mapping from txn_id -> write_id
> (per-table), maintained by the writers (i.e. if a txn commits, then the
> write_id is visible).
>
> For each table, in each query, there's a snapshot taken which has a
> min:max and a list of exceptions.
>
> When a query starts, it sees that all txns below 5 are either committed or
> cleaned, therefore all <=5 is good.
>
> It knows that highest known txn is 10, so all >10 is to be ignored.
>
> And between 5 & 10, it knows that 7 is aborted and 8 is still open (i.e.
> exceptions).
>
> So if it sees a delta_11 dir, it ignores it. If it sees a delta_8, it
> ignores it.
>
> The "ACID" implementation hides future updates in plain sight and doesn't
> need HDFS to be able to rename multiple dirs together.
>
> Most of those smarts are in the split-generation, not in the commit
> (however, the commit does something else to detect write-conflicts, which is
> its own thing).
>
> >If someone could point me in the right direction, or correct my
> >understanding then I would greatly appreciate it.
>
> This implementation is built with the txn -> write_id indirection to
> support cross-replication between, say, an east-coast cluster and a
> west-coast cluster, each owning primary data-sets on their own coasts.
>
> Cheers,
> Gopal
>
>
>
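
The snapshot rule Gopal describes above can be sketched in a few lines. This is only an illustrative model, not Hive's code: the class name WriteIdSnapshot, its fields and the helper visible_deltas are hypothetical, and the delta_N directory names follow the shorthand used in the message rather than the fuller names Hive writes on disk. It uses the numbers from the example: highest known ID 10, with 7 (aborted) and 8 (open) as exceptions.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified stand-in for a per-table reader snapshot.
@dataclass(frozen=True)
class WriteIdSnapshot:
    high_watermark: int                  # highest ID known when the query started
    exceptions: frozenset = frozenset()  # open or aborted IDs at or below it

    def is_visible(self, write_id: int) -> bool:
        # Anything above the high watermark is from the "future".
        if write_id > self.high_watermark:
            return False
        # Open and aborted IDs are skipped even though their files exist.
        return write_id not in self.exceptions


# Shorthand naming from the thread (delta_<id>); an assumption for illustration.
DELTA_RE = re.compile(r"^delta_(\d+)$")


def visible_deltas(dir_names, snapshot):
    """Return the delta directories a reader with this snapshot would use."""
    kept = []
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match and snapshot.is_visible(int(match.group(1))):
            kept.append(name)
    return kept


snapshot = WriteIdSnapshot(high_watermark=10, exceptions=frozenset({7, 8}))
listing = ["delta_5", "delta_6", "delta_7", "delta_8", "delta_9", "delta_11"]
print(visible_deltas(listing, snapshot))  # ['delta_5', 'delta_6', 'delta_9']
```

The point of the sketch is that the files for delta_8 and delta_11 are already present in the listing; the reader simply filters them out, so no atomic multi-directory rename is needed at commit time.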


When do the deltas of a transaction become observable?

2018-11-23 Thread Granville Barnett
Hi All,

I'm trying to figure out where in the Hive codebase the deltas that are
the side effects of a Hive 3.x transaction become observable. (My
current investigation is for HDFS.)

For example,

from table1
insert into table2 select x
insert into table3 select x;

This transaction generates two delta files: one that will appear under
the location for table2 and another under the location for table3.

I'm expecting that there's some logic that will make the deltas of this
transaction appear in their respective HDFS locations upon commit (or
release of the locks) but I can't seem to find it. As it's a transactional
system I'd expect we observe both deltas or none at all, at the point of
successful commit.

The only reference to a location I've managed to stumble across is that of
the Hive scratch space: conceptually, I had thought that the intermediate
result of a transaction would be located here and then a rename would occur
to make the content visible to other readers.

I had done some basic tests to determine if the observation semantics were
tied to the metadata in the database product for the transactional system
but I could only determine write IDs were influencing this, e.g. if write
ID = 7 for a given table, then the read would consist of all deltas with a
write ID < 7.

If someone could point me in the right direction, or correct my
understanding then I would greatly appreciate it.

Thanks,

Granville


Re: Why are TXN IDs not partitioned per database?

2018-11-20 Thread Granville Barnett
Thanks Alan.

On Tue, 20 Nov 2018, 17:23 Alan Gates wrote:

> History.  Originally there were only transaction ids, which were global.
> Write ids for tables came later as a way to limit the amount of information
> each transaction needed to track and to make it easier to replicate table
> changes between Hive instances.
>
> But even if we had put them in from the start, we'd have them span
> databases, otherwise transactions couldn't span databases.  Hive has no
> restrictions on queries spanning databases so we wouldn't want to restrict
> transactions from doing so.
>
> Alan.
>
> On Tue, Nov 20, 2018 at 7:32 AM Granville Barnett <
> granvillebarn...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm reading the source code of Hive 3.x and have a question regarding
> > transaction IDs, which form the span of a transaction: its begin ID (TXN ID)
> > and commit ID (NEXT_TXN_ID at time of commit).
> >
> > Why is it that we have a global timeline for transactions rather than a
> > timeline partitioned at the granularity of a database, kind of similar to
> > how write IDs are partitioned per table but at the database scope?
> >
> > E.g.,
> >
> > NEXT_TXN_ID
> > +-------+-----------+
> > | DB    | NTXN_NEXT |
> > +-------+-----------+
> > | test1 | 23        |
> > | test2 | 4         |
> > +-------+-----------+
> >
> > Same question could also be applied to NEXT_LOCK_ID.
> >
> > I am just curious because it seems like partitioning the transaction (and
> > lock) IDs would allow finer-grained locking in the various
> > transactional methods. For example, openTxn invocations are mutexed with
> > all other openTxn invocations even if they are for transactions running in
> > distinct database domains.  Similarly for openTxn mutexing with respect to
> > commitTxn if there is a write-write conflict, which I would have thought
> > would only be the case if they are applicable to the same database. I'm
> > sure that this would have the side effect of increasing the complexity of
> > other subsystems but I had to ask what the rationale was behind this.
> >
> > (I'm new to Hive so please forgive me if the answer is obvious.)
> >
> > Regards,
> >
> > Granville
> >
>
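
Alan's point, that transaction IDs are global while write IDs are per table, can be illustrated with a toy model. Everything here is hypothetical (the class ToyTxnManager, its methods, and the table names test1.table2 and test2.table3); it is a sketch of the idea under those assumptions, not the metastore's schema or API. It shows one transaction spanning two databases: a single global txn ID maps to an independent write ID in each table, and both become visible at the same commit.

```python
from collections import defaultdict


class ToyTxnManager:
    """Hypothetical toy: one global txn counter, one write-ID counter per table."""

    def __init__(self):
        self.next_txn_id = 1
        self.next_write_id = defaultdict(lambda: 1)  # table -> next write ID
        self.txn_to_write_id = {}                    # (txn_id, table) -> write ID
        self.committed = set()                       # committed txn IDs

    def open_txn(self):
        txn_id = self.next_txn_id
        self.next_txn_id += 1
        return txn_id

    def allocate_write_id(self, txn_id, table):
        # A txn gets at most one write ID per table it writes to.
        key = (txn_id, table)
        if key not in self.txn_to_write_id:
            self.txn_to_write_id[key] = self.next_write_id[table]
            self.next_write_id[table] += 1
        return self.txn_to_write_id[key]

    def commit(self, txn_id):
        self.committed.add(txn_id)

    def visible_write_ids(self, table):
        # A write ID is visible only once its owning txn has committed.
        return sorted(
            write_id
            for (txn_id, tbl), write_id in self.txn_to_write_id.items()
            if tbl == table and txn_id in self.committed
        )


mgr = ToyTxnManager()

# An earlier txn writes only to test1.table2 and commits.
t1 = mgr.open_txn()
mgr.allocate_write_id(t1, "test1.table2")
mgr.commit(t1)

# A second txn spans two databases.
t2 = mgr.open_txn()
w_table2 = mgr.allocate_write_id(t2, "test1.table2")
w_table3 = mgr.allocate_write_id(t2, "test2.table3")

before = (mgr.visible_write_ids("test1.table2"), mgr.visible_write_ids("test2.table3"))
mgr.commit(t2)
after = (mgr.visible_write_ids("test1.table2"), mgr.visible_write_ids("test2.table3"))

print(t2, w_table2, w_table3)  # 2 2 1
print(before)                  # ([1], [])
print(after)                   # ([1, 2], [1])
```

With a txn counter partitioned per database, the second transaction in this sketch would need two unrelated IDs and some extra mechanism to commit them together, which is the restriction Alan describes wanting to avoid.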


Why are TXN IDs not partitioned per database?

2018-11-20 Thread Granville Barnett
Hi,

I'm reading the source code of Hive 3.x and have a question regarding
transaction IDs, which form the span of a transaction: its begin ID (TXN ID)
and commit ID (NEXT_TXN_ID at time of commit).

Why is it that we have a global timeline for transactions rather than a
timeline partitioned at the granularity of a database, kind of similar to
how write IDs are partitioned per table but at the database scope?

E.g.,

NEXT_TXN_ID
+-------+-----------+
| DB    | NTXN_NEXT |
+-------+-----------+
| test1 | 23        |
| test2 | 4         |
+-------+-----------+

Same question could also be applied to NEXT_LOCK_ID.

I am just curious because it seems like partitioning the transaction (and
lock) IDs would allow finer-grained locking in the various
transactional methods. For example, openTxn invocations are mutexed with
all other openTxn invocations even if they are for transactions running in
distinct database domains.  Similarly for openTxn mutexing with respect to
commitTxn if there is a write-write conflict, which I would have thought
would only be the case if they are applicable to the same database. I'm
sure that this would have the side effect of increasing the complexity of
other subsystems but I had to ask what the rationale was behind this.

(I'm new to Hive so please forgive me if the answer is obvious.)

Regards,

Granville