Re: Consuming delta from Hive tables

2019-05-20 Thread Alan Gates
On Sun, May 19, 2019 at 11:21 PM Bhargav Bipinchandra Naik (Seller Platform-BLR) wrote:
> Hi Alan,
>
> Are write_ids monotonically increasing?

They are assigned monotonically, but the transactions they are a part of may commit at different times, so you can't use it as a low water mark. That
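The point about commit order can be illustrated with a small simulation. This is only a sketch, not Hive code: the `Txn` class, the `visible_rows` helper, and the read times t4 and t6 are assumptions made for illustration, with w1 and w2 taken from the scenario described elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    write_id: int  # assigned in increasing order as transactions start
    start: int     # start time (t1, t2, ...)
    commit: int    # commit time
    rows: tuple    # rows written by this transaction


def visible_rows(txns, read_time, after_write_id):
    """Rows from committed transactions whose write_id exceeds the watermark."""
    return [
        (t.write_id, row)
        for t in txns
        if t.commit <= read_time and t.write_id > after_write_id
        for row in t.rows
    ]


# w1 starts first (lower write_id) but commits after w2.
w1 = Txn(write_id=1, start=1, commit=5, rows=("a",))
w2 = Txn(write_id=2, start=2, commit=3, rows=("b",))
txns = [w1, w2]

# Hypothetical reader at t4: only w2 has committed.
first_read = visible_rows(txns, read_time=4, after_write_id=0)
watermark = max(write_id for write_id, _ in first_read)

# Hypothetical reader at t6, filtering on write_id > watermark.
second_read = visible_rows(txns, read_time=6, after_write_id=watermark)

seen = {row for _, row in first_read + second_read}
missed = [row for t in txns for row in t.rows if row not in seen]
print(first_read, watermark, second_read, missed)
```

In this toy run the row written by w1 is never returned, because w1 committed after the reader had already advanced its watermark past write_id 1.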

Re: Consuming delta from Hive tables

2019-05-20 Thread Bhargav Bipinchandra Naik (Seller Platform-BLR)
Hi Alan,

Are write_ids monotonically increasing? Are write_ids accessible in a Hive query? For example: select * from table_name where write_id > N;

Basically, I am trying to understand whether I can use write_id to consume only updated rows. Store the maximum write_id (X) seen in the result and next

Re: Consuming delta from Hive tables

2019-05-17 Thread Alan Gates
Sorry, looks like you sent this earlier and I missed it. A couple of things. One, write_id is per transaction per table. So for table T, all rows written in w1 will have the same write_id, though they will each have their own monotonically increasing row_ids. Row_ids are scoped by a write_id,
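The identifier scheme described here can be modeled roughly as follows. This is a toy sketch and not Hive's implementation: the `WriteIdAllocator` class, the starting values (write_ids from 1, row_ids from 0), and the reading of "scoped by a write_id" as a counter that restarts for each write_id are all assumptions for illustration.

```python
from collections import defaultdict
from itertools import count


class WriteIdAllocator:
    """Toy model: one write_id per (transaction, table); row_ids count up within a write_id."""

    def __init__(self):
        # Per-table counter for write_ids (assumed to start at 1).
        self._next_write_id = defaultdict(lambda: count(1))
        # Remembers the write_id already given to a (transaction, table) pair.
        self._assigned = {}
        # Per-(table, write_id) counter for row_ids (assumed to start at 0).
        self._next_row_id = defaultdict(lambda: count(0))

    def write_row(self, txn, table):
        """Return the (write_id, row_id) pair for one row written by txn into table."""
        key = (txn, table)
        if key not in self._assigned:
            self._assigned[key] = next(self._next_write_id[table])
        write_id = self._assigned[key]
        row_id = next(self._next_row_id[(table, write_id)])
        return (write_id, row_id)


alloc = WriteIdAllocator()
w1_rows = [alloc.write_row("w1", "T") for _ in range(3)]
w2_row = alloc.write_row("w2", "T")
w1_other_table = alloc.write_row("w1", "U")
print(w1_rows, w2_row, w1_other_table)
```

Under these assumptions, all rows written by w1 into T share one write_id and differ only in row_id, while the same transaction writing into a second table U gets a write_id from that table's own sequence.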

Re: Consuming delta from Hive tables

2019-05-17 Thread Bhargav Bipinchandra Naik (Seller Platform-BLR)
Is the following scenario supported?

timestamp: t1 < t2 < t3 < t4 < t5 < t6
w1 - transaction which updates subset of rows in table T {start_time: t1, end_time: t5}
w2 - transaction which updates subset of rows in table T {start_time: t2, end_time: t3}
r1 - job which reads rows from table

Re: Consuming delta from Hive tables

2019-05-07 Thread Bhargav Bipinchandra Naik (Seller Platform-BLR)
Hi Jesus and Alan,

Thanks for the prompt reply. I had a follow-up question.

timestamp: t1 < t2 < t3 < t4 < t5 < t6
w1 - transaction which updates subset of rows in table T {start_time: t1, end_time: t5}
w2 - transaction which updates subset of rows in table T {start_time: t2, end_time: t3}

Re: Consuming delta from Hive tables

2019-05-06 Thread Jesus Camacho Rodriguez
Hi Bhargav, We solve a similar problem for incremental maintenance for materialized views. row__id.writeid can be used for that scenario indeed.

Re: Consuming delta from Hive tables

2019-05-06 Thread Alan Gates
The other issue is that an external system has no ability to control when the compactor runs (it rewrites deltas into the base files and thus erases intermediate states that would interest you). The mapping of writeids (table specific) to transaction ids (system wide) is also cleaned intermittently,

Consuming delta from Hive tables

2019-05-06 Thread Bhargav Bipinchandra Naik (Seller Platform-BLR)
We have a scenario where we want to consume only delta updates from Hive tables.
- Multiple producers are updating data in the Hive table
- Multiple consumers are reading data from the Hive table

Consumption pattern:
- Get all data that has been updated since the last time I read.

Is there any mechanism in