Re: Review Request 73329: Correlating Deleted Entities with Lineage

2021-05-19 Thread Ashutosh Mestry via Review Board


> On May 18, 2021, 6:05 a.m., Sarath Subramanian wrote:
> > webapp/src/main/java/org/apache/atlas/notification/EntityCorrelationManager.java
> > Lines 93 (patched)
> > 
> >
> > This checks for the first entry that is less than the spooledTimestamp. 
> > We should instead fetch the entry whose timestamp is closest to 
> > spooledTimestamp.
> > 
> > CACHE:
> > ---
> > QName  | Guid | 
> > ---
> > T1@cl1 | [7:00: guid1], [7:40: guid2]
> > ---
> > 
> > 6:50 - CTAS (T5) FROM T1 (guid1)
> > 7:20 - CTAS (T6) FROM T1 (guid2)

This case is addressed by the current algorithm.


- Ashutosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73329/#review223005
---


On May 20, 2021, 4 a.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/73329/
> ---
> 
> (Updated May 20, 2021, 4 a.m.)
> 
> 
> Review request for atlas, Radhika Kundam and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-4152
> https://issues.apache.org/jira/browse/ATLAS-4152
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Background**
> As part of ATLAS-4204, HS2 notifications send entity-lineage only (provided 
> the property is enabled).
> 
> When spooling is enabled, the order of messages can potentially change. The 
> notification messages coming from HS2 and HMS may not be in the same order as 
> when they arrived with direct notification.
> 
> Problem: 
> Consider the sequence of arriving messages:
> 
> This is the sequence of messages for Entity 1 (C = create, U = update, D = 
> delete, L?x = Lineage of type 'x')
> No problem: C1, U1, L1x, L1y, D1
> Problem: C1, U1, D1, L1x, L1y
> 
> This implementation attempts to handle the problem mentioned above. If the 
> above case is not handled, it will end up creating shell entities, since 
> deleted entities are not looked up as part of entity creation.
> 
> **Approach**
> Uses a bounded-stream approach, where an incoming stream of messages is 
> bounded with an indicator that it originates from the spool. This helps make 
> localized decisions on the incoming stream of messages.
> 
> High-level approach:
> - Messages are tagged with a timestamp when they are written to the spool.
> - Deleted entities are maintained in a cache.
> - Lineage-only messages are checked to see whether they refer to a deleted 
> entity.
> - If they refer to a deleted entity, they are stitched to the one present in 
> the cache only if it falls within the threshold.
> - A step-climbing approach is used to locate the right entity to stitch 
> lineage to.
> 
> New: _EntityCorrelationManager_: Uses message timestamp and cached entity 
> qualifiedName-GUID map.
> Modified: _NotificationHookConsumer_: Uses the new class.
> New: _HiveDDLLineagePreprocess_: Uses entity-correlation to link to deleted 
> entities.
> Modified: _SpoolConfiguration_: Added new configuration to pause message 
> sending after destination is available: 
> _atlas.hook.spool.pause.before.send.sec_.
> In-memory lookup approach changed to persistent lookup.
> 
> 
> Diffs
> ---
> 
>   common/src/main/java/org/apache/atlas/repository/Constants.java ffcec9743
>   intg/src/main/java/org/apache/atlas/model/notification/AtlasNotificationMessage.java 810ba97c9
>   notification/src/main/java/org/apache/atlas/kafka/AtlasKafkaConsumer.java f7d9668ec
>   notification/src/main/java/org/apache/atlas/kafka/AtlasKafkaMessage.java 22bd79fdf
>   notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 3d1b3ccf1
>   notification/src/main/java/org/apache/atlas/notification/AtlasNotificationMessageDeserializer.java 3264e264c
>   notification/src/main/java/org/apache/atlas/notification/NotificationInterface.java edd8ed931
>   notification/src/main/java/org/apache/atlas/notification/spool/AtlasFileSpool.java 2d7d19595
>   notification/src/main/java/org/apache/atlas/notification/spool/Publisher.java 22242c933
>   notification/src/main/java/org/apache/atlas/notification/spool/SpoolConfiguration.java a9a3a78cc
>   notification/src/main/java/org/apache/atlas/notification/spool/Spooler.java 2cacaaadc
>   notification/src/test/java/org/apache/atlas/notification/AbstractNotificationTest.java d7e4959f7
>   notification/src/test/java/org/apache/atlas/notification/spool/AtlasFileSpoolTest.java 167efbecc
>   repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java cc727c6ba
> 

Re: Review Request 73329: Correlating Deleted Entities with Lineage

2021-05-19 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/73329/
---

(Updated May 20, 2021, 4 a.m.)


Review request for atlas, Radhika Kundam and Sarath Subramanian.


Changes
---

Updates include: 
- Addressed review comments.
- Added persistent caching approach. 
- Additional unit tests.


Bugs: ATLAS-4152
https://issues.apache.org/jira/browse/ATLAS-4152


Repository: atlas


Description (updated)
---

**Background**
As part of ATLAS-4204, HS2 notifications send entity-lineage only (provided the 
property is enabled).

When spooling is enabled, the order of messages can potentially change. The 
notification messages coming from HS2 and HMS may not be in the same order as 
when they arrived with direct notification.

Problem: 
Consider the sequence of arriving messages:

This is the sequence of messages for Entity 1 (C = create, U = update, D = 
delete, L?x = Lineage of type 'x')
No problem: C1, U1, L1x, L1y, D1
Problem: C1, U1, D1, L1x, L1y

This implementation attempts to handle the problem mentioned above. If the 
above case is not handled, it will end up creating shell entities, since 
deleted entities are not looked up as part of entity creation.

**Approach**
Uses a bounded-stream approach, where an incoming stream of messages is bounded 
with an indicator that it originates from the spool. This helps make localized 
decisions on the incoming stream of messages.

High-level approach:
- Messages are tagged with a timestamp when they are written to the spool.
- Deleted entities are maintained in a cache.
- Lineage-only messages are checked to see whether they refer to a deleted 
entity.
- If they refer to a deleted entity, they are stitched to the one present in the 
cache only if it falls within the threshold.
- A step-climbing approach is used to locate the right entity to stitch lineage 
to.
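
As an illustration of the timestamp-based lookup in the steps above, here is a 
minimal Java sketch. It is not the code in this patch: the class 
`DeletedEntityCache`, its methods, and the 30-minute threshold are hypothetical 
names and values chosen for the example. Picking the earliest deletion at or 
after the spool timestamp is an assumption, one plausible reading of the 
T1@cl1 example raised in review; the actual step-climbing logic may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; not the EntityCorrelationManager from the patch.
class DeletedEntityCache {
    private final long thresholdMs;
    // qualifiedName -> (deletion timestamp in ms -> GUID of the deleted entity)
    private final Map<String, TreeMap<Long, String>> deleted = new HashMap<>();

    DeletedEntityCache(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    // Helper for readable examples: time of day as milliseconds.
    static long at(int hour, int minute) {
        return (hour * 60L + minute) * 60_000L;
    }

    void recordDeletion(String qualifiedName, long deletedAtMs, String guid) {
        deleted.computeIfAbsent(qualifiedName, k -> new TreeMap<>()).put(deletedAtMs, guid);
    }

    // Assumption: a lineage message spooled at time t refers to the entity that
    // was deleted soonest at or after t. Returns null if there is no such
    // deletion, or if it lies further away than the threshold.
    String findGuid(String qualifiedName, long spooledAtMs) {
        TreeMap<Long, String> byTime = deleted.get(qualifiedName);
        if (byTime == null) {
            return null;
        }
        Map.Entry<Long, String> entry = byTime.ceilingEntry(spooledAtMs);
        if (entry == null || entry.getKey() - spooledAtMs > thresholdMs) {
            return null;
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        DeletedEntityCache cache = new DeletedEntityCache(30 * 60_000L);
        cache.recordDeletion("T1@cl1", at(7, 0), "guid1");
        cache.recordDeletion("T1@cl1", at(7, 40), "guid2");

        System.out.println(cache.findGuid("T1@cl1", at(6, 50))); // guid1
        System.out.println(cache.findGuid("T1@cl1", at(7, 20))); // guid2
        System.out.println(cache.findGuid("T1@cl1", at(7, 50))); // null
    }
}
```

A sorted map keeps the lookup logarithmic in the number of deletions recorded 
per qualifiedName. A lineage message with no match within the threshold gets 
`null`, so the caller can fall back to its normal handling.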

New: _EntityCorrelationManager_: Uses message timestamp and cached entity 
qualifiedName-GUID map.
Modified: _NotificationHookConsumer_: Uses the new class.
New: _HiveDDLLineagePreprocess_: Uses entity-correlation to link to deleted 
entities.
Modified: _SpoolConfiguration_: Added new configuration to pause message 
sending after destination is available: 
_atlas.hook.spool.pause.before.send.sec_.
In-memory lookup approach changed to persistent lookup.


Diffs (updated)
---

  common/src/main/java/org/apache/atlas/repository/Constants.java ffcec9743
  intg/src/main/java/org/apache/atlas/model/notification/AtlasNotificationMessage.java 810ba97c9
  notification/src/main/java/org/apache/atlas/kafka/AtlasKafkaConsumer.java f7d9668ec
  notification/src/main/java/org/apache/atlas/kafka/AtlasKafkaMessage.java 22bd79fdf
  notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 3d1b3ccf1
  notification/src/main/java/org/apache/atlas/notification/AtlasNotificationMessageDeserializer.java 3264e264c
  notification/src/main/java/org/apache/atlas/notification/NotificationInterface.java edd8ed931
  notification/src/main/java/org/apache/atlas/notification/spool/AtlasFileSpool.java 2d7d19595
  notification/src/main/java/org/apache/atlas/notification/spool/Publisher.java 22242c933
  notification/src/main/java/org/apache/atlas/notification/spool/SpoolConfiguration.java a9a3a78cc
  notification/src/main/java/org/apache/atlas/notification/spool/Spooler.java 2cacaaadc
  notification/src/test/java/org/apache/atlas/notification/AbstractNotificationTest.java d7e4959f7
  notification/src/test/java/org/apache/atlas/notification/spool/AtlasFileSpoolTest.java 167efbecc
  repository/src/main/java/org/apache/atlas/repository/graph/GraphBackedSearchIndexer.java cc727c6ba
  repository/src/main/java/org/apache/atlas/repository/store/graph/EntityCorrelationStore.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 0a9470815
  repository/src/test/java/org/apache/atlas/repository/store/graph/v2/EntityCorrelationStoreTest.java PRE-CREATION
  webapp/src/main/java/org/apache/atlas/notification/EntityCorrelationManager.java PRE-CREATION
  webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java 84cc8d813
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/EntityPreprocessor.java 89568e236
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/HiveDbDDLPreprocessor.java PRE-CREATION
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/HivePreprocessor.java 86e3384ee
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/HiveTableDDLPreprocessor.java PRE-CREATION
  webapp/src/main/java/org/apache/atlas/notification/preprocessor/PreprocessorContext.java 608b4a304
  webapp/src/test/java/org/apache/atlas/notification/NotificationHookConsumerKafkaTest.java 65e8b5001

[jira] [Assigned] (ATLAS-4289) [Atlas: Glossary Term Bulk Import] [regression]Bulk import broken for xls/xlsx input when it refers to the term created as a part of the same import

2021-05-19 Thread Dharshana M Krishnamoorthy (Jira)


     [ https://issues.apache.org/jira/browse/ATLAS-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dharshana M Krishnamoorthy reassigned ATLAS-4289:
-

Assignee: Dharshana M Krishnamoorthy

> [Atlas: Glossary Term Bulk Import] [regression]Bulk import broken for 
> xls/xlsx input when it refers to the term created as a part of the same import
> 
>
>                 Key: ATLAS-4289
>                 URL: https://issues.apache.org/jira/browse/ATLAS-4289
>             Project: Atlas
>          Issue Type: Bug
>          Components: atlas-core
>            Reporter: Dharshana M Krishnamoorthy
>            Assignee: Dharshana M Krishnamoorthy
>            Priority: Major
>         Attachments: xls_input.xls, xlsx_input.xlsx
>
>
> [^xls_input.xls] [^xlsx_input.xlsx]
> While performing an import with the provided input, there should be no 
> failures and all data should have the proper relationships. 
> This was working fine before.
> Currently, a failure is seen while importing xlsx:
> {code:java}
> {
>   "failedImportInfoList": [
>     {
>       "parentObjectName": "xlsx_input_glossary_2",
>       "childObjectName": "term_2",
>       "importStatus": "FAILED",
>       "remarks": "The provided Reference 0%@termAttribute does not exist at Atlas referred at record with TermName  : term_2 and GlossaryName : xlsx_input_glossary_2"
>     }
>   ],
>   "successImportInfoList": [
>     {
>       "parentObjectName": "xlsx_input_glossary_1",
>       "childObjectName": "term_1",
>       "importStatus": "SUCCESS",
>       "remarks": "{\"termGuid\":\"19b8fec2-9061-4234-9504-e716da735275\",\"qualifiedName\":\"term_1@xlsx_input_glossary_1\"}"
>     },
>     {
>       "parentObjectName": "xlsx_input_glossary_2",
>       "childObjectName": "term_2",
>       "importStatus": "SUCCESS",
>       "remarks": "{\"termGuid\":\"7bf0dfb2-15d8-49ab-9e4f-d07f4ecb8dc6\",\"qualifiedName\":\"term_2@xlsx_input_glossary_2\"}"
>     }
>   ]
> } {code}
> Importing xls:
> {code:java}
> {
>   "failedImportInfoList": [
>     {
>       "parentObjectName": "xls_input_glossary_2",
>       "childObjectName": "term_2",
>       "importStatus": "FAILED",
>       "remarks": "The provided Reference 0%@termAttribute does not exist at Atlas referred at record with TermName  : term_2 and GlossaryName : xls_input_glossary_2"
>     }
>   ],
>   "successImportInfoList": [
>     {
>       "parentObjectName": "xls_input_glossary_1",
>       "childObjectName": "term_1",
>       "importStatus": "SUCCESS",
>       "remarks": "{\"termGuid\":\"c6487c38-f45e-4897-9a86-ea84d3b706a6\",\"qualifiedName\":\"term_1@xls_input_glossary_1\"}"
>     },
>     {
>       "parentObjectName": "xls_input_glossary_2",
>       "childObjectName": "term_2",
>       "importStatus": "SUCCESS",
>       "remarks": "{\"termGuid\":\"44c6838b-cfe6-43ac-998f-c609bbec53ee\",\"qualifiedName\":\"term_2@xls_input_glossary_2\"}"
>     }
>   ]
> } {code}
> Also, please note the 0%0%@termAttribute in the remarks of the failed message


