Review Request 70463: ATLAS-3132: Improvements

2019-04-11 Thread Ashutosh Mestry

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70463/
---

Review request for atlas, Kapildeo Nayak, Madhan Neethiraj, Nikhil Bonte, Nixon 
Rodrigues, and Sarath Subramanian.


Bugs: ATLAS-3132
https://issues.apache.org/jira/browse/ATLAS-3132


Repository: atlas


Description
---

**Approach**
- Refactored existing implementation for new design.
- Renamed 'Java Patch Framework' to 'Data Patch Framework', the rationale being
that it essentially modifies the structure of existing data.
- New _DataPatchService_: Modified the order in which services are called.
_DataPatchService_ is called before other services are invoked, giving it a
chance to complete before new data is accepted.
- New _DataPatchRegistry_: Data access (CRUD) operations for data patches.
- New _UniqueAttributePatchHandler_: Current implementation for adding the new
property to data vertices. Implemented rudimentary caching to prevent
repetitive look-ups.
- New REST Endpoint to query status of patches.

**Performance**
Since data patching is a high-volume operation, it has been treated as a
priority.
- New _NewPropertyDataHandler_ uses the database in bulk-loading mode for rapid
processing. This scales with resources. Additional properties:
  - _atlas.processing.batchSize_: Size of each batch.
  - _atlas.processing.numWorkers_: Number of worker threads to be employed.
- Leverages the existing PC framework.
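
To illustrate how a batch size and a worker count interact, here is a minimal sketch. It does not use the Atlas PC classes (WorkItemManager, WorkItemConsumer) or any graph API; the class name `BatchedPatchSketch`, the methods `partition` and `process`, and the use of plain `Long` ids in place of vertices are all assumptions made for the example. A standard `ExecutorService` stands in for the worker threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class BatchedPatchSketch {
    // Splits ids into consecutive batches of at most batchSize elements.
    static List<List<Long>> partition(List<Long> ids, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ids.size());
            batches.add(new ArrayList<>(ids.subList(i, end)));
        }
        return batches;
    }

    // Runs every batch on a fixed pool of numWorkers threads and
    // returns how many ids were visited in total.
    static long process(List<Long> ids, int batchSize, int numWorkers)
            throws InterruptedException {
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(numWorkers);
        for (List<Long> batch : partition(ids, batchSize)) {
            pool.submit(() -> {
                for (int i = 0; i < batch.size(); i++) {
                    // Placeholder for "add the new property to this vertex".
                    processed.incrementAndGet();
                }
                // A real handler would commit once per batch at this point.
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < 1000; i++) {
            ids.add(i);
        }
        System.out.println(process(ids, 100, 4));
    }
}
```

Larger batches mean fewer commits per vertex, while more workers add parallelism up to whatever the backing store can absorb, which is one plausible reading of "scales with resources".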

Processing speed:
- 300K vertices: ~5 mins
- 4.2 M entities: ~45 mins (from: 2019-04-12 04:44:50 to 2019-04-12 05:29:04)
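
For reference, the two figures above can be turned into rough throughput numbers. The sketch below only does the arithmetic on the values quoted in this message (the timestamps and the 300K / 5 min figure); the class and method names are made up for the example.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class PatchThroughput {
    // Whole seconds elapsed between two local timestamps.
    static long elapsedSeconds(LocalDateTime start, LocalDateTime end) {
        return Duration.between(start, end).getSeconds();
    }

    // Average items handled per second.
    static double perSecond(long items, long seconds) {
        return (double) items / seconds;
    }

    public static void main(String[] args) {
        long secs = elapsedSeconds(
                LocalDateTime.parse("2019-04-12T04:44:50"),
                LocalDateTime.parse("2019-04-12T05:29:04"));
        // 44 min 14 s = 2654 s
        System.out.println(secs + " s");
        System.out.printf("%.0f entities/s%n", perSecond(4_200_000L, secs));
        System.out.printf("%.0f vertices/s%n", perSecond(300_000L, 5 * 60));
    }
}
```

That works out to roughly 1,580 entities per second for the 4.2 M run and about 1,000 vertices per second for the 300K run, treating "~5 mins" as exactly 300 seconds.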


Diffs
-

  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/AtlasGraph.java d282c9966
  graphdb/api/src/main/java/org/apache/atlas/repository/graphdb/DataPatchGraphDBHandler.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/VertexIterator.java PRE-CREATION
  graphdb/janus/src/main/java/org/apache/atlas/repository/graphdb/janus/patches/NewPropertyDataPatch.java PRE-CREATION
  intg/src/main/java/org/apache/atlas/pc/WorkItemConsumer.java b7eb4d89c
  intg/src/main/java/org/apache/atlas/pc/WorkItemManager.java 0e7d3f22d
  notification/src/main/java/org/apache/atlas/kafka/EmbeddedKafkaServer.java 32b597fb6
  notification/src/main/java/org/apache/atlas/kafka/KafkaNotification.java 1d0a2734b
  repository/src/main/java/org/apache/atlas/repository/patches/AtlasJavaPatchHandler.java 9153d497b
  repository/src/main/java/org/apache/atlas/repository/patches/DataPatchHandler.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/patches/DataPatchManager.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/patches/DataPatchRegistry.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/patches/DataPatchService.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/patches/PatchContext.java a60422b80
  repository/src/main/java/org/apache/atlas/repository/patches/TypeNameAttributeCache.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatch.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/patches/UniqueAttributePatchHandler.java f2238f1b0
  repository/src/main/java/org/apache/atlas/repository/store/bootstrap/AtlasTypeDefStoreInitializer.java 78f3faf99
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasGraphUtilsV2.java 80141b4f1
  repository/src/main/java/org/apache/atlas/repository/store/graph/v2/EntityGraphRetriever.java 5aa6c8f0e
  repository/src/test/java/org/apache/atlas/patches/DataPatchRegistryTest.java PRE-CREATION
  webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java ce2d76f11
  webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java c5ceb9d6d
  webapp/src/test/java/org/apache/atlas/web/resources/AdminResourceTest.java 223a90a9c


Diff: https://reviews.apache.org/r/70463/diff/1/


Testing
---

**Unit tests**
Additional tests added.

**Volume tests**
Verification with large datasets: 
- 4M entities
- 3.2M entities
- 16K entities.

**Performance tests**
CPU usage, memory usage and disk IO.

**Pre-commit build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1031/


Thanks,

Ashutosh Mestry



[jira] [Updated] (ATLAS-3134) Change Date.getTime() to System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Description: 
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

new Date() is a thin wrapper around System.currentTimeMillis(). If it is
invoked intensively, performance can suffer noticeably.
According to my local testing, when the two are each invoked 5,000,000 times in
the same environment, System.currentTimeMillis() achieves a speedup of nearly
5 times (435ms vs 2073ms).
Therefore, if only getTime() is called on the Date object, the lighter
System.currentTimeMillis() is recommended, which also avoids creating a
temporary Date object.
 

  was:
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

new Date() is a thin wrapper around System.currentTimeMillis(). If it is
invoked intensively, performance can suffer noticeably.
According to my local testing, when the two are each invoked 5,000,000 times in
the same environment, System.currentTimeMillis() achieves a speedup of up to
5 times (435ms vs 2073ms).
Therefore, if only getTime() is called on the Date object, the lighter
System.currentTimeMillis() is recommended, which also avoids creating a
temporary Date object.
 


> Change Date.getTime() to System.currentTimeMillis() to improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Location:
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527
>
> new Date() is a thin wrapper around System.currentTimeMillis(). If it is
> invoked intensively, performance can suffer noticeably.
> According to my local testing, when the two are each invoked 5,000,000 times
> in the same environment, System.currentTimeMillis() achieves a speedup of
> nearly 5 times (435ms vs 2073ms).
> Therefore, if only getTime() is called on the Date object, the lighter
> System.currentTimeMillis() is recommended, which also avoids creating a
> temporary Date object.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-3133) Capture metadata for different executions of the same process in Atlas

2019-04-11 Thread Srikanth Venkat (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Venkat updated ATLAS-3133:
---
Description: 
Background: The current Atlas process metadata model does not track multiple
instances of execution of the same process. For example, if we run the same
DDL (e.g. insert into table A select * from table B, C) multiple times, Atlas
does not capture the multiple instances of execution of the same logical
process.

User story: As a compliance admin or data steward, I need to be able to track 
multiple executions of the same process or pipeline that were done at different 
times and relate them to the logical process so that I can provide traceability 
and understand how different instances of my data pipelining operations 
performed over time.

Acceptance Criteria:

1) Every new instance of process execution is captured with the appropriate 
metadata for the process along with context (who, what, when)

2) One can navigate between process and process execution in Atlas UI and 
explore relevant metadata

3) Process nodes in lineage display high level information about number of 
process executions associated with a particular process node.

 

 

 

 

  was:
Background: The current Atlas process metadata model within Atlas does not 
track multiple instances of execution of the same process. For example if we 
run a CTAS command (e.g. insert into table A select * from table B, C) multiple 
times Atlas does not capture the multiple instances of execution of the same 
logical process. 

User story: As a compliance admin or data steward, I need to be able to track 
multiple executions of the same process or pipeline that were done at different 
times and relate them to the logical process so that I can provide traceability 
and understand how different instances of my data pipelining operations 
performed over time.

Acceptance Criteria:

1) Every new instance of process execution is captured with the appropriate 
metadata for the process along with context (who, what, when)

2) One can navigate between process and process execution in Atlas UI and 
explore relevant metadata

3) Process nodes in lineage display high level information about number of 
process executions associated with a particular process node.

 

 

 

 


> Capture metadata for different executions of the same process in Atlas
> --
>
> Key: ATLAS-3133
> URL: https://issues.apache.org/jira/browse/ATLAS-3133
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core, atlas-webui
>Reporter: Srikanth Venkat
>Assignee: Aadarsh Jajodia
>Priority: Critical
>
> Background: The current Atlas process metadata model does not track multiple
> instances of execution of the same process. For example, if we run the same
> DDL (e.g. insert into table A select * from table B, C) multiple times, Atlas
> does not capture the multiple instances of execution of the same logical
> process.
> User story: As a compliance admin or data steward, I need to be able to track 
> multiple executions of the same process or pipeline that were done at 
> different times and relate them to the logical process so that I can provide 
> traceability and understand how different instances of my data pipelining 
> operations performed over time.
> Acceptance Criteria:
> 1) Every new instance of process execution is captured with the appropriate 
> metadata for the process along with context (who, what, when)
> 2) One can navigate between process and process execution in Atlas UI and 
> explore relevant metadata
> 3) Process nodes in lineage display high level information about number of 
> process executions associated with a particular process node.
>  
>  
>  
>  





[jira] [Commented] (ATLAS-3071) Add Functionalities to Collect Notification Metrics/Entity Lifecyle

2019-04-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815947#comment-16815947
 ] 

ASF subversion and git services commented on ATLAS-3071:


Commit 0ca5033e72c03ab6bf941b627b41e747a9175033 in atlas's branch 
refs/heads/branch-2.0 from Madhan Neethiraj
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=0ca5033 ]

ATLAS-3071: renamed stats key for backend/index stores stats - #2

(cherry picked from commit e6deb3378dfb19371f42077c7520187f75e61564)


> Add Functionalities to Collect Notification Metrics/Entity Lifecyle
> ---
>
> Key: ATLAS-3071
> URL: https://issues.apache.org/jira/browse/ATLAS-3071
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Le Ma
>Assignee: Le Ma
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: ATLAS-3071-2.patch, ATLAS-3071.patch, 
> MetricsDataModel.json
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend api/atlas/admin/metrics to provide metrics for notifications/entity
> lifecycle. For each of today/thisHour/pastHour/total, report:
>   # notification processed
>   # notification failed
>   # entity created
>   # entity updated
>   # entity deleted





[jira] [Commented] (ATLAS-3071) Add Functionalities to Collect Notification Metrics/Entity Lifecyle

2019-04-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815945#comment-16815945
 ] 

ASF subversion and git services commented on ATLAS-3071:


Commit e6deb3378dfb19371f42077c7520187f75e61564 in atlas's branch 
refs/heads/master from Madhan Neethiraj
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=e6deb33 ]

ATLAS-3071: renamed stats key for backend/index stores stats - #2


> Add Functionalities to Collect Notification Metrics/Entity Lifecyle
> ---
>
> Key: ATLAS-3071
> URL: https://issues.apache.org/jira/browse/ATLAS-3071
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Le Ma
>Assignee: Le Ma
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: ATLAS-3071-2.patch, ATLAS-3071.patch, 
> MetricsDataModel.json
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend api/atlas/admin/metrics to provide metrics for notifications/entity
> lifecycle. For each of today/thisHour/pastHour/total, report:
>   # notification processed
>   # notification failed
>   # entity created
>   # entity updated
>   # entity deleted





[jira] [Updated] (ATLAS-3134) Change Date.getTime() to System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Description: 
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

new Date() is a thin wrapper around System.currentTimeMillis(). If it is
invoked intensively, performance can suffer noticeably.
According to my local testing, when the two are each invoked 5,000,000 times in
the same environment, System.currentTimeMillis() achieves a speedup of up to
5 times (435ms vs 2073ms).
Therefore, if only getTime() is called on the Date object, the lighter
System.currentTimeMillis() is recommended, which also avoids creating a
temporary Date object.
 

  was:
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

new Date() is just a thin wrapper around
System.currentTimeMillis().  It will improve performance by using
System.currentTimeMillis() which can get the same current time stamp with
Date.getTime().  For example, I have run both of them for 5,000,000 times and
the running time is 2073ms vs 435ms

 


> Change Date.getTime() to System.currentTimeMillis() to improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Location:
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527
>
> new Date() is a thin wrapper around System.currentTimeMillis(). If it is
> invoked intensively, performance can suffer noticeably.
> According to my local testing, when the two are each invoked 5,000,000 times
> in the same environment, System.currentTimeMillis() achieves a speedup of up
> to 5 times (435ms vs 2073ms).
> Therefore, if only getTime() is called on the Date object, the lighter
> System.currentTimeMillis() is recommended, which also avoids creating a
> temporary Date object.
>  





[GitHub] [atlas] bd2019us opened a new pull request #40: ATLAS-3134 System.currentTimeMillis() is better than Date.getTime()

2019-04-11 Thread GitBox
bd2019us opened a new pull request #40: ATLAS-3134 System.currentTimeMillis() 
is better than Date.getTime()
URL: https://github.com/apache/atlas/pull/40
 
 
   new Date() is just a thin wrapper around System.currentTimeMillis(). Using
System.currentTimeMillis() is better since we don't need to create a new object.
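
A minimal sketch of the comparison follows. It is not the code from the pull request or from AtlasTypeDefGraphStoreV2; the class `TimestampSketch` and its methods are hypothetical. The timing loop is deliberately naive, so the numbers it prints depend on the JVM, JIT warm-up and hardware and should not be expected to match the 435ms / 2073ms reported in the JIRA; a harness such as JMH would give more trustworthy measurements.

```java
import java.util.Date;

public class TimestampSketch {
    // Allocates a Date only to read the current time back out of it.
    static long viaDate() {
        return new Date().getTime();
    }

    // Reads the same clock directly, with no allocation.
    static long viaSystem() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        long sink = 0; // printed below so the loops are not optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += viaDate();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += viaSystem();
        }
        long t2 = System.nanoTime();

        System.out.println("new Date().getTime():       " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("System.currentTimeMillis(): " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("(checksum " + sink + ")");
    }
}
```

Both methods return milliseconds since the epoch, so swapping one for the other does not change the value that gets stored.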


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[jira] [Updated] (ATLAS-3134) Change Date.getTime() to System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Summary: Change Date.getTime() to System.currentTimeMillis() to improve 
performance  (was: Change java.util.Date.getTime() to 
java.lang.System.currentTimeMillis() to improve performance)

> Change Date.getTime() to System.currentTimeMillis() to improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>
> Location:
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527
> new Date() is just a thin wrapper around
> System.currentTimeMillis().  It will improve performance by using
> System.currentTimeMillis() which can get the same current time stamp with
> Date.getTime().  For example, I have run both of them for 5,000,000 times and
> the running time is 2073ms vs 435ms
>  





[jira] [Updated] (ATLAS-3134) Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Description: 
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

new Date() is just a thin wrapper around
System.currentTimeMillis().  It will improve performance by using
System.currentTimeMillis() which can get the same current time stamp with
Date.getTime().  For example, I have run both of them for 5,000,000 times and
the running time is 2073ms vs 435ms

 

  was:
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

java.util.Date.getTime() is a thin wrapper around
System.currentTimeMillis().  It will improve performance by using
System.currentTimeMillis().  For example, I have run both of them for 5,000,000
times and the running time is 2073ms vs 435ms

 


> Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to 
> improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>
> Location:
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527
> new Date() is just a thin wrapper around
> System.currentTimeMillis().  It will improve performance by using
> System.currentTimeMillis() which can get the same current time stamp with
> Date.getTime().  For example, I have run both of them for 5,000,000 times and
> the running time is 2073ms vs 435ms
>  





[jira] [Updated] (ATLAS-3134) Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Description: 
Location:
repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527

java.util.Date.getTime() is a thin wrapper around
System.currentTimeMillis().  It will improve performance by using
System.currentTimeMillis().  For example, I have run both of them for 5,000,000
times and the running time is 2073ms vs 435ms

 

  was:
java.util.Date.getTime() is a thin wrapper around
System.currentTimeMillis().  It will improve performance by using
System.currentTimeMillis().  For example, I have run both of them for 5,000,000
times and the running time is 2073ms vs 435ms

 


> Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to 
> improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>
> Location:
> repository/src/main/java/org/apache/atlas/repository/store/graph/v2/AtlasTypeDefGraphStoreV2.java:527
> java.util.Date.getTime() is a thin wrapper around
> System.currentTimeMillis().  It will improve performance by using
> System.currentTimeMillis().  For example, I have run both of them for
> 5,000,000 times and the running time is 2073ms vs 435ms
>  





[jira] [Updated] (ATLAS-3134) Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Attachment: 1.patch

> Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to 
> improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
> Attachments: 1.patch
>
>
> java.util.Date.getTime() is a thin wrapper around
> System.currentTimeMillis().  It will improve performance by using
> System.currentTimeMillis().  For example, I have run both of them for
> 5,000,000 times and the running time is 2073ms vs 435ms
>  





[jira] [Updated] (ATLAS-3134) Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Description: 
java.util.Date.getTime() is a thin wrapper around
System.currentTimeMillis().  It will improve performance by using
System.currentTimeMillis().  For example, I have run both of them for 5,000,000
times and the running time is 2073ms vs 435ms

 

  was: java.util.Date.getTime()


> Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to 
> improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
>
> java.util.Date.getTime() is a thin wrapper around
> System.currentTimeMillis().  It will improve performance by using
> System.currentTimeMillis().  For example, I have run both of them for
> 5,000,000 times and the running time is 2073ms vs 435ms
>  





[jira] [Updated] (ATLAS-3134) Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bd2019us updated ATLAS-3134:

Description: java.util.Date.getTime()

> Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to 
> improve performance
> --
>
> Key: ATLAS-3134
> URL: https://issues.apache.org/jira/browse/ATLAS-3134
> Project: Atlas
>  Issue Type: Bug
>Reporter: bd2019us
>Priority: Major
>
> java.util.Date.getTime()





[jira] [Created] (ATLAS-3134) Change java.util.Date.getTime() to java.lang.System.currentTimeMillis() to improve performance

2019-04-11 Thread bd2019us (JIRA)
bd2019us created ATLAS-3134:
---

 Summary: Change java.util.Date.getTime() to 
java.lang.System.currentTimeMillis() to improve performance
 Key: ATLAS-3134
 URL: https://issues.apache.org/jira/browse/ATLAS-3134
 Project: Atlas
  Issue Type: Bug
Reporter: bd2019us








Re: Review Request 70462: ATLAS-3133 : Adding support for Process Executions in Atlas

2019-04-11 Thread Sarath Subramanian

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70462/#review214615
---




addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java
Line 117 (original), 117 (patched)


for every process entity creation, we also create a process_execution 
entity, should we combine the method to return both entities?

something like:
List processes = getHiveProcessEntities(inputs, outputs);
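
One way to read this suggestion is sketched below. It is not the hook's actual code: `Entity` is a stand-in for whatever entity class the hook uses, the type name "hive_process_execution" is inferred from the relationship name in this review, and the qualified-name formats are invented purely so the example runs.

```java
import java.util.Arrays;
import java.util.List;

public class ProcessEntitiesSketch {
    // Hypothetical stand-in for an Atlas entity; not the real entity class.
    static final class Entity {
        final String typeName;
        final String qualifiedName;

        Entity(String typeName, String qualifiedName) {
            this.typeName = typeName;
            this.qualifiedName = qualifiedName;
        }
    }

    // Hypothetical combined method: element 0 is the process,
    // element 1 is the execution created alongside it.
    static List<Entity> getHiveProcessEntities(List<String> inputs,
                                               List<String> outputs,
                                               long endTime) {
        // Made-up qualified-name format, for illustration only.
        String qName = String.join(",", inputs) + "->" + String.join(",", outputs);
        Entity process = new Entity("hive_process", qName);
        Entity execution = new Entity("hive_process_execution", qName + ":" + endTime);
        return Arrays.asList(process, execution);
    }

    public static void main(String[] args) {
        List<Entity> processes = getHiveProcessEntities(
                Arrays.asList("db.b", "db.c"), Arrays.asList("db.a"), 1554999999000L);
        for (Entity e : processes) {
            System.out.println(e.typeName + " " + e.qualifiedName);
        }
    }
}
```

Returning both entities from one call keeps the pairing in a single place, at the cost of callers needing to know which list position holds which entity.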



addons/models/-Area0/0010-base_model.json
Lines 334 (patched)


typeVersion => 1.0



addons/models/1000-Hadoop/1030-hive_model.json
Lines 646 (patched)


hive_process_process_execution => hive_process_process_executions


- Sarath Subramanian


On April 11, 2019, 5:39 p.m., Aadarsh Jajodia wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70462/
> ---
> 
> (Updated April 11, 2019, 5:39 p.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Sridhar K, Le Ma, Madhan 
> Neethiraj, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3133
> https://issues.apache.org/jira/browse/ATLAS-3133
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Adding support for Process Executions in Atlas
> 
> 
> Diffs
> -
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java 31346d0954140cd8bda690dc9079e0913f7b9d7d
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java d61f1d74e3238e0a7474de67c0400c108d8919ea
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateTable.java 674a89f6e4852dc30c29c5681854ec3ba8611f35
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java 94010d0cb1a7a5c48b71f6d77c5e1a8f5cfcf013
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/HiveITBase.java 002b90839f78dc843b5aca56042c3decd299bed8
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java 142e35561fff42f7672c5b5136af1173049580b3
>   addons/models/-Area0/0010-base_model.json 8b41ee89f3a7288bb4cdad3df6887242b40b68d2
>   addons/models/1000-Hadoop/1030-hive_model.json e96443382a587411d1207d6e1157ea65350cbdba
> 
> 
> Diff: https://reviews.apache.org/r/70462/diff/2/
> 
> 
> Testing
> ---
> 
> We want to add support for Process Executions in Atlas. With process
> executions, attributes specific to each execution (like startTime, endTime,
> queryText) will be captured in the execution object. With the current Atlas
> model this information is lost, since each execution overrides the previous
> one. To solve this problem, we are creating two new entity definitions and
> one relationship definition. We create an entity definition called a process
> execution and another one called a hive process execution. The hive process
> execution has all the attributes that are specific to each execution. We
> also create a relationship definition between a hive process and a hive
> process execution as a many-to-one relation. Lineage is not affected, since
> the inputs and outputs are still attached to the hive process. A Hive
> Process can now be thought of as a grouping of multiple executions. No
> lineage is shown for a hive process execution. The criterion for grouping is
> the qualifiedName of a Hive Process: as long as the qualifiedName of a hive
> process remains the same, each execution gets mapped to the same hive process.
> 
> 
> Thanks,
> 
> Aadarsh Jajodia
> 
>
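
The grouping rule described in the quoted request (executions map to the same hive process as long as the process qualifiedName is unchanged) can be sketched as follows. This is an in-memory illustration only, not the Atlas type system or the hook code; the `Execution` class and `groupByProcess` method are hypothetical, with fields named after the attributes mentioned in the request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExecutionGroupingSketch {
    // Hypothetical per-execution record carrying the execution-specific attributes.
    static final class Execution {
        final String processQualifiedName;
        final long startTime;
        final long endTime;
        final String queryText;

        Execution(String processQualifiedName, long startTime, long endTime, String queryText) {
            this.processQualifiedName = processQualifiedName;
            this.startTime = startTime;
            this.endTime = endTime;
            this.queryText = queryText;
        }
    }

    // Many executions map to one process, keyed by the process qualifiedName.
    static Map<String, List<Execution>> groupByProcess(List<Execution> executions) {
        Map<String, List<Execution>> grouped = new LinkedHashMap<>();
        for (Execution e : executions) {
            grouped.computeIfAbsent(e.processQualifiedName, k -> new ArrayList<>()).add(e);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Execution> runs = Arrays.asList(
                new Execution("proc-1", 100L, 200L, "insert into a select * from b"),
                new Execution("proc-1", 300L, 400L, "insert into a select * from b"),
                new Execution("proc-2", 500L, 600L, "insert into c select * from d"));
        groupByProcess(runs).forEach((qName, list) ->
                System.out.println(qName + " has " + list.size() + " execution(s)"));
    }
}
```

In this model, re-running the same statement adds an execution under the existing process instead of overwriting its startTime, endTime and queryText.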



Re: Review Request 70462: ATLAS-3133 : Adding support for Process Executions in Atlas

2019-04-11 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70462/#review214611
---




addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java
Lines 616 (patched)


Instead of setting empty value (HIVE_PROCESS_REDUNDANT_FIELD_VALUE) to 
these attributes, it will be simpler to not set these attributes at all. i.e. 
remove lines #616 - #621.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java
Lines 636 (patched)


Please replace ":" with QNAME_SEP_PROCESS
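
As an illustration of the point, the sketch below builds a name through a named separator instead of a string literal. The constant's value here is an assumption (the review only names QNAME_SEP_PROCESS, it does not show its definition), and the method and its name layout are hypothetical.

```java
public class QualifiedNameSketch {
    // Assumed value for illustration; the review names the constant but not its definition.
    static final char QNAME_SEP_PROCESS = ':';

    // Hypothetical layout: process qualified name, start time, end time.
    static String executionQualifiedName(String processQualifiedName,
                                         long startTime, long endTime) {
        return processQualifiedName + QNAME_SEP_PROCESS + startTime
                + QNAME_SEP_PROCESS + endTime;
    }

    public static void main(String[] args) {
        System.out.println(executionQualifiedName("proc-1", 100L, 200L));
    }
}
```

Using the constant keeps every qualified-name builder in step if the separator ever changes.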



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java
Lines 638 (patched)


value for attribute 'name' doesn't need to be appended with 'endTime'. It 
can simply be the query-string. Please review and update.



addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java
Line 615 (original), 646 (patched)


Consider caching hostName, perhaps in AtlasHookContext, instead of calling 
InetAddress.getLocalHost().getHostName() for every process-execution.
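
A minimal sketch of such a cache is shown below. It is a standalone holder, not AtlasHookContext; the class name, the lookup counter and the "localhost" fallback are assumptions added for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameCacheSketch {
    private static volatile String cachedHostName;
    // Counts real lookups; present only to make the caching visible.
    static int lookups = 0;

    // Resolves the host name on first use and reuses it afterwards.
    static String getHostName() {
        String name = cachedHostName;
        if (name == null) {
            synchronized (HostNameCacheSketch.class) {
                name = cachedHostName;
                if (name == null) {
                    name = resolve();
                    cachedHostName = name;
                }
            }
        }
        return name;
    }

    private static String resolve() {
        lookups++;
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            // Assumed fallback for the sketch; a real hook may prefer to propagate.
            return "localhost";
        }
    }

    public static void main(String[] args) {
        System.out.println(getHostName());
        System.out.println(getHostName());
        System.out.println("lookups: " + lookups);
    }
}
```

The trade-off is that a host name change while the process is running would not be picked up, which is usually acceptable for a hook that lives as long as its host process.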



addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java
Lines 46 (patched)


Please add new enums to end of the list (..a practice from good-old C/C++ 
days!).
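
The review does not spell out the reasoning, but one common motivation for this convention is ordinal stability: inserting a constant in the middle shifts the ordinal of every constant after it, which matters wherever ordinals are persisted or compared. The enums below are illustrative only and are not the contents of HiveDataTypes.

```java
public class EnumOrderSketch {
    // Illustrative constants; not the real HiveDataTypes list.
    enum Before { HIVE_DB, HIVE_TABLE, HIVE_PROCESS }

    // New constant inserted in the middle: later ordinals shift by one.
    enum InsertedInMiddle { HIVE_DB, HIVE_PROCESS_EXECUTION, HIVE_TABLE, HIVE_PROCESS }

    // New constant appended: existing ordinals are unchanged.
    enum AppendedAtEnd { HIVE_DB, HIVE_TABLE, HIVE_PROCESS, HIVE_PROCESS_EXECUTION }

    public static void main(String[] args) {
        System.out.println("before:   HIVE_TABLE = " + Before.HIVE_TABLE.ordinal());
        System.out.println("inserted: HIVE_TABLE = " + InsertedInMiddle.HIVE_TABLE.ordinal());
        System.out.println("appended: HIVE_TABLE = " + AppendedAtEnd.HIVE_TABLE.ordinal());
    }
}
```

Code that looks constants up by name is unaffected either way; appending is simply the change that cannot break ordinal-based consumers.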



addons/models/-Area0/0010-base_model.json
Lines 329 (patched)


To be consistent with other base types (like AtlasServer, AtlasUserProfile, 
AtlasUserSavedSearch, ExportImportAuditEntry), consider renaming 
"Process_Execution" to "ProcessExecution".


- Madhan Neethiraj


On April 12, 2019, 12:39 a.m., Aadarsh Jajodia wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/70462/
> ---
> 
> (Updated April 12, 2019, 12:39 a.m.)
> 
> 
> Review request for atlas, Ashutosh Mestry, Sridhar K, Le Ma, Madhan 
> Neethiraj, and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3133
> https://issues.apache.org/jira/browse/ATLAS-3133
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> Adding support for Process Executions in Atlas
> 
> 
> Diffs
> -
> 
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java 31346d0954140cd8bda690dc9079e0913f7b9d7d
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java d61f1d74e3238e0a7474de67c0400c108d8919ea
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateTable.java 674a89f6e4852dc30c29c5681854ec3ba8611f35
>   addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java 94010d0cb1a7a5c48b71f6d77c5e1a8f5cfcf013
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/HiveITBase.java 002b90839f78dc843b5aca56042c3decd299bed8
>   addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java 142e35561fff42f7672c5b5136af1173049580b3
>   addons/models/-Area0/0010-base_model.json 8b41ee89f3a7288bb4cdad3df6887242b40b68d2
>   addons/models/1000-Hadoop/1030-hive_model.json e96443382a587411d1207d6e1157ea65350cbdba
> 
> 
> Diff: https://reviews.apache.org/r/70462/diff/2/
> 
> 
> Testing
> ---
> 
> We want to add support for Process Executions in Atlas. With process 
> executions, attributes specific to each execution (like startTime, endTime, 
> queryText) will be captured in the execution object. With the current Atlas 
> model this information is lost, since each execution overwrites the previous 
> one. To solve this problem, we are creating two new entity definitions and 
> one relationship definition. We create an entity definition called process 
> execution and another called hive process execution. The hive process 
> execution has all the attributes that are specific to each execution. We 
> also create a relationship definition between a hive process and a hive 
> process execution as a many-to-one relation. Lineage is not affected by 
> this change, since the inputs and outputs are still attached to the hive 
> process. A hive process can now be thought of as a grouping of multiple 
> executions. No lineage is shown for a hive process execution. The grouping 
> criterion is the qualifiedName of a hive process: as long as the 
> qualifiedName of a hive process remains the same, each execution gets mapped 
> to the same hive process.
> 
> 
> Thanks,
> 
> Aadarsh Jajodia
> 
>



[jira] [Resolved] (ATLAS-3071) Add Functionalities to Collect Notification Metrics/Entity Lifecyle

2019-04-11 Thread Madhan Neethiraj (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj resolved ATLAS-3071.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Add Functionalities to Collect Notification Metrics/Entity Lifecyle
> ---
>
> Key: ATLAS-3071
> URL: https://issues.apache.org/jira/browse/ATLAS-3071
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Le Ma
>Assignee: Le Ma
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: ATLAS-3071-2.patch, ATLAS-3071.patch, 
> MetricsDataModel.json
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend api/atlas/admin/metrics to provide metrics for notifications/entity 
> lifecycle. For each of today/thisHour/pastHour/total, report:
>   # notification processed
>   # notification failed
>   # entity created
>   # entity updated
>   # entity deleted
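
One way such per-window counts could be kept is sketched below. This is an 
illustration only, not the implementation in the attached patches: the class 
name HourlyCounter, the hourly bucketing, and the reading of "today" as the 
current UTC day are all assumptions.

```java
import java.util.TreeMap;

// Hypothetical counter for one metric (e.g. "notification processed").
// Events are bucketed by hour since the epoch, so the four windows named in
// the description can be derived from the same data.
final class HourlyCounter {
    private static final long HOUR_MS = 60L * 60 * 1000;

    private final TreeMap<Long, Long> countsByHour = new TreeMap<>();

    synchronized void record(long timestampMs) {
        countsByHour.merge(timestampMs / HOUR_MS, 1L, Long::sum);
    }

    synchronized long thisHour(long nowMs) {
        return countsByHour.getOrDefault(nowMs / HOUR_MS, 0L);
    }

    synchronized long pastHour(long nowMs) {
        return countsByHour.getOrDefault(nowMs / HOUR_MS - 1, 0L);
    }

    // Counts every bucket from the first hour of the current UTC day up to now.
    synchronized long today(long nowMs) {
        long currentHour = nowMs / HOUR_MS;
        long firstHour   = (currentHour / 24) * 24;
        long sum         = 0;

        for (long count : countsByHour.subMap(firstHour, true, currentHour, true).values()) {
            sum += count;
        }

        return sum;
    }

    synchronized long total() {
        long sum = 0;

        for (long count : countsByHour.values()) {
            sum += count;
        }

        return sum;
    }
}
```

One instance per metric (notifications processed, notifications failed, 
entities created, updated, deleted) would cover the list in the description.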



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-3071) Add Functionalities to Collect Notification Metrics/Entity Lifecyle

2019-04-11 Thread Madhan Neethiraj (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhan Neethiraj updated ATLAS-3071:

Attachment: ATLAS-3071-2.patch

> Add Functionalities to Collect Notification Metrics/Entity Lifecyle
> ---
>
> Key: ATLAS-3071
> URL: https://issues.apache.org/jira/browse/ATLAS-3071
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Le Ma
>Assignee: Le Ma
>Priority: Major
> Attachments: ATLAS-3071-2.patch, ATLAS-3071.patch, 
> MetricsDataModel.json
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend api/atlas/admin/metrics to provide metrics for notifications/entity 
> lifecycle. For each of today/thisHour/pastHour/total, report:
>   # notification processed
>   # notification failed
>   # entity created
>   # entity updated
>   # entity deleted





[jira] [Commented] (ATLAS-3071) Add Functionalities to Collect Notification Metrics/Entity Lifecyle

2019-04-11 Thread Madhan Neethiraj (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815874#comment-16815874
 ] 

Madhan Neethiraj commented on ATLAS-3071:
-

 [^ATLAS-3071-2.patch] - minor updates to rename a couple of stats.

> Add Functionalities to Collect Notification Metrics/Entity Lifecyle
> ---
>
> Key: ATLAS-3071
> URL: https://issues.apache.org/jira/browse/ATLAS-3071
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Le Ma
>Assignee: Le Ma
>Priority: Major
> Attachments: ATLAS-3071-2.patch, ATLAS-3071.patch, 
> MetricsDataModel.json
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Extend api/atlas/admin/metrics to provide metrics for notifications/entity 
> lifecycle. For each of today/thisHour/pastHour/total, report:
>   # notification processed
>   # notification failed
>   # entity created
>   # entity updated
>   # entity deleted





Re: Review Request 70462: ATLAS-3133 : Adding support for Process Executions in Atlas

2019-04-11 Thread Aadarsh Jajodia

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70462/
---

(Updated April 12, 2019, 12:39 a.m.)


Review request for atlas, Ashutosh Mestry, Sridhar K, Le Ma, Madhan Neethiraj, 
and Sarath Subramanian.


Changes
---

Adding the Atlas JIRA to the review


Summary (updated)
-

ATLAS-3133 : Adding support for Process Executions in Atlas


Bugs: ATLAS-3133
https://issues.apache.org/jira/browse/ATLAS-3133


Repository: atlas


Description
---

Adding support for Process Executions in Atlas


Diffs (updated)
-

  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/BaseHiveEvent.java
 31346d0954140cd8bda690dc9079e0913f7b9d7d 
  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateHiveProcess.java
 d61f1d74e3238e0a7474de67c0400c108d8919ea 
  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/events/CreateTable.java
 674a89f6e4852dc30c29c5681854ec3ba8611f35 
  
addons/hive-bridge/src/main/java/org/apache/atlas/hive/model/HiveDataTypes.java 
94010d0cb1a7a5c48b71f6d77c5e1a8f5cfcf013 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/HiveITBase.java 
002b90839f78dc843b5aca56042c3decd299bed8 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java 
142e35561fff42f7672c5b5136af1173049580b3 
  addons/models/-Area0/0010-base_model.json 
8b41ee89f3a7288bb4cdad3df6887242b40b68d2 
  addons/models/1000-Hadoop/1030-hive_model.json 
e96443382a587411d1207d6e1157ea65350cbdba 


Diff: https://reviews.apache.org/r/70462/diff/2/

Changes: https://reviews.apache.org/r/70462/diff/1-2/


Testing
---

We want to add support for Process Executions in Atlas. With process 
executions, attributes specific to each execution (like startTime, endTime, 
queryText) will be captured in the execution object. With the current Atlas 
model this information is lost, since each execution overwrites the previous 
one. To solve this problem, we are creating two new entity definitions and one 
relationship definition. We create an entity definition called process 
execution and another called hive process execution. The hive process 
execution has all the attributes that are specific to each execution. We also 
create a relationship definition between a hive process and a hive process 
execution as a many-to-one relation. Lineage is not affected by this change, 
since the inputs and outputs are still attached to the hive process. A hive 
process can now be thought of as a grouping of multiple executions. No lineage 
is shown for a hive process execution. The grouping criterion is the 
qualifiedName of a hive process: as long as the qualifiedName of a hive process 
remains the same, each execution gets mapped to the same hive process.
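
The grouping described above can be sketched in memory as follows. The class 
names are hypothetical stand-ins for the new entity definitions, not the 
types added by the patch; only startTime, endTime and queryText come from the 
description.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a hive process execution entity.
final class HiveProcessExecutionSketch {
    final long   startTime;
    final long   endTime;
    final String queryText;

    HiveProcessExecutionSketch(long startTime, long endTime, String queryText) {
        this.startTime = startTime;
        this.endTime   = endTime;
        this.queryText = queryText;
    }
}

// Groups executions under the qualifiedName of their hive process, so a
// repeated run adds an execution instead of overwriting the previous one.
final class HiveProcessGrouping {
    private final Map<String, List<HiveProcessExecutionSketch>> executionsByProcess = new LinkedHashMap<>();

    void record(String processQualifiedName, HiveProcessExecutionSketch execution) {
        executionsByProcess.computeIfAbsent(processQualifiedName, k -> new ArrayList<>()).add(execution);
    }

    int processCount() {
        return executionsByProcess.size();
    }

    List<HiveProcessExecutionSketch> executionsOf(String processQualifiedName) {
        return executionsByProcess.getOrDefault(processQualifiedName, Collections.emptyList());
    }
}
```

Recording two runs with the same qualifiedName yields one process with two 
executions, which mirrors the many-to-one relation between executions and 
their process.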


Thanks,

Aadarsh Jajodia



[jira] [Created] (ATLAS-3133) Capture metadata for different execution of the same process in Atlas

2019-04-11 Thread Srikanth Venkat (JIRA)
Srikanth Venkat created ATLAS-3133:
--

 Summary: Capture metadata for different execution of the same 
process in Atlas
 Key: ATLAS-3133
 URL: https://issues.apache.org/jira/browse/ATLAS-3133
 Project: Atlas
  Issue Type: New Feature
  Components:  atlas-core, atlas-webui
Reporter: Srikanth Venkat


Background: The current process metadata model within Atlas does not 
track multiple instances of execution of the same process. For example, if we 
run a CTAS command (e.g. insert into table A select * from table B, C) multiple 
times, Atlas does not capture the multiple instances of execution of the same 
logical process. 

User story: As a compliance admin or data steward, I need to be able to track 
multiple executions of the same process or pipeline that were done at different 
times and relate them to the logical process so that I can provide traceability 
and understand how different instances of my data pipelining operations 
performed over time.

Acceptance Criteria:

1) Every new instance of process execution is captured with the appropriate 
metadata for the process along with context (who, what, when)

2) One can navigate between process and process execution in Atlas UI and 
explore relevant metadata

3) Process nodes in lineage display high level information about number of 
process executions associated with a particular process node.




[jira] [Updated] (ATLAS-3133) Capture metadata for different executions of the same process in Atlas

2019-04-11 Thread Srikanth Venkat (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srikanth Venkat updated ATLAS-3133:
---
Summary: Capture metadata for different executions of the same process in 
Atlas  (was: Capture metadata for different execution of the same process in 
Atlas)

> Capture metadata for different executions of the same process in Atlas
> --
>
> Key: ATLAS-3133
> URL: https://issues.apache.org/jira/browse/ATLAS-3133
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core, atlas-webui
>Reporter: Srikanth Venkat
>Priority: Critical
>
> Background: The current process metadata model within Atlas does not 
> track multiple instances of execution of the same process. For example, if 
> we run a CTAS command (e.g. insert into table A select * from table B, C) 
> multiple times, Atlas does not capture the multiple instances of execution 
> of the same logical process. 
> User story: As a compliance admin or data steward, I need to be able to track 
> multiple executions of the same process or pipeline that were done at 
> different times and relate them to the logical process so that I can provide 
> traceability and understand how different instances of my data pipelining 
> operations performed over time.
> Acceptance Criteria:
> 1) Every new instance of process execution is captured with the appropriate 
> metadata for the process along with context (who, what, when)
> 2) One can navigate between process and process execution in Atlas UI and 
> explore relevant metadata
> 3) Process nodes in lineage display high level information about number of 
> process executions associated with a particular process node.





[jira] [Assigned] (ATLAS-3133) Capture metadata for different executions of the same process in Atlas

2019-04-11 Thread Aadarsh Jajodia (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aadarsh Jajodia reassigned ATLAS-3133:
--

Assignee: Aadarsh Jajodia

> Capture metadata for different executions of the same process in Atlas
> --
>
> Key: ATLAS-3133
> URL: https://issues.apache.org/jira/browse/ATLAS-3133
> Project: Atlas
>  Issue Type: New Feature
>  Components:  atlas-core, atlas-webui
>Reporter: Srikanth Venkat
>Assignee: Aadarsh Jajodia
>Priority: Critical
>
> Background: The current process metadata model within Atlas does not 
> track multiple instances of execution of the same process. For example, if 
> we run a CTAS command (e.g. insert into table A select * from table B, C) 
> multiple times, Atlas does not capture the multiple instances of execution 
> of the same logical process. 
> User story: As a compliance admin or data steward, I need to be able to track 
> multiple executions of the same process or pipeline that were done at 
> different times and relate them to the logical process so that I can provide 
> traceability and understand how different instances of my data pipelining 
> operations performed over time.
> Acceptance Criteria:
> 1) Every new instance of process execution is captured with the appropriate 
> metadata for the process along with context (who, what, when)
> 2) One can navigate between process and process execution in Atlas UI and 
> explore relevant metadata
> 3) Process nodes in lineage display high level information about number of 
> process executions associated with a particular process node.





[jira] [Updated] (ATLAS-3132) Data Patch Fx: Improve Data Patching Performance

2019-04-11 Thread Ashutosh Mestry (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-3132:
---
Description: 
*Background*

The Java patch framework (now called data patching framework) introduced 
recently performs patching at the rate of 1 million entities per 15 hrs. This 
can be improved.

*Proposed Solution*
 * Use the Producer-Consumer framework to spawn multiple workers to perform 
concurrent updates to entity vertices.
 * Use _AtlasGraph_ in bulk loading mode to further gain performance.
 * Perform duplicate data checks during processing.

*Projected Performance Improvement*
 * Based on various tests, these give increased throughput. New rate can be 
~300K entities per 5 mins.

  was:
*Background*

The Java patch framework (now called data patching framework) introduced 
recently performs patching at the rate of 1 million entities per 15 hrs. This 
can be improved.

*Proposed Solution***
 * Use the Producer-Consumer framework to spawn multiple workers to perform 
concurrent updates to entity vertices.
 * Use _AtlasGraph_ in bulk loading mode to further gain performance.
 * Perform duplicate data checks during processing.

*Projected Performance Improvement*
 * Based on various tests, these give increased throughput. New rate can be 
~300K entities per 5 mins.


> Data Patch Fx: Improve Data Patching Performance
> 
>
> Key: ATLAS-3132
> URL: https://issues.apache.org/jira/browse/ATLAS-3132
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: trunk
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Fix For: trunk
>
>
> *Background*
> The Java patch framework (now called data patching framework) introduced 
> recently performs patching at the rate of 1 million entities per 15 hrs. This 
> can be improved.
> *Proposed Solution*
>  * Use the Producer-Consumer framework to spawn multiple workers to perform 
> concurrent updates to entity vertices.
>  * Use _AtlasGraph_ in bulk loading mode to further gain performance.
>  * Perform duplicate data checks during processing.
> *Projected Performance Improvement*
>  * Based on various tests, these give increased throughput. New rate can be 
> ~300K entities per 5 mins.





[jira] [Created] (ATLAS-3132) Data Patch Fx: Improve Data Patching Performance

2019-04-11 Thread Ashutosh Mestry (JIRA)
Ashutosh Mestry created ATLAS-3132:
--

 Summary: Data Patch Fx: Improve Data Patching Performance
 Key: ATLAS-3132
 URL: https://issues.apache.org/jira/browse/ATLAS-3132
 Project: Atlas
  Issue Type: Improvement
  Components:  atlas-core
Affects Versions: trunk
Reporter: Ashutosh Mestry
Assignee: Ashutosh Mestry
 Fix For: trunk


*Background*

The Java patch framework (now called data patching framework) introduced 
recently performs patching at the rate of 1 million entities per 15 hrs. This 
can be improved.

*Proposed Solution***
 * Use the Producer-Consumer framework to spawn multiple workers to perform 
concurrent updates to entity vertices.
 * Use _AtlasGraph_ in bulk loading mode to further gain performance.
 * Perform duplicate data checks during processing.

*Projected Performance Improvement*
 * Based on various tests, these give increased throughput. New rate can be 
~300K entities per 5 mins.
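
The proposed approach of splitting vertex updates across concurrent workers 
can be sketched as below. This is not the Atlas Producer-Consumer framework: 
a plain ExecutorService stands in for it, the class and method names are 
hypothetical, and the per-vertex update is a placeholder. The batchSize and 
numWorkers parameters mirror the properties named in the review request 
(atlas.processing.batchSize, atlas.processing.numWorkers).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: the calling thread produces batches of vertex ids and
// a fixed pool of worker threads consumes them concurrently.
final class BatchPatchSketch {
    private BatchPatchSketch() { }

    static long patchAll(List<Long> vertexIds, int batchSize, int numWorkers) {
        AtomicLong      patched = new AtomicLong();
        ExecutorService workers = Executors.newFixedThreadPool(numWorkers);

        for (int start = 0; start < vertexIds.size(); start += batchSize) {
            int        end   = Math.min(start + batchSize, vertexIds.size());
            List<Long> batch = new ArrayList<>(vertexIds.subList(start, end));

            workers.execute(() -> {
                for (Long vertexId : batch) {
                    // Placeholder for the real update of one entity vertex.
                    patched.incrementAndGet();
                }
                // A real implementation would commit once per batch here.
            });
        }

        workers.shutdown();

        try {
            if (!workers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        return patched.get();
    }
}
```

Committing once per batch rather than once per vertex is the usual reason 
batching helps; the duplicate-data check from the proposal is left out of the 
sketch.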





Review Request 70457: ATLAS-3131 UI : Stats Popup Improvements

2019-04-11 Thread Binit Gutka

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/70457/
---

Review request for atlas, keval bhatt, Madhan Neethiraj, Nikhil Bonte, and 
Sarath Subramanian.


Bugs: ATLAS-3131
https://issues.apache.org/jira/browse/ATLAS-3131


Repository: atlas


Description
---

Separation of server stats, notification stats and connections


Diffs
-

  dashboardv2/public/css/scss/theme.scss 25ca86806 
  dashboardv2/public/js/templates/site/Statistics_tmpl.html 71643c7b0 
  dashboardv2/public/js/utils/Enums.js 8df22af1a 
  dashboardv2/public/js/utils/Utils.js 56bd844f1 
  dashboardv2/public/js/views/site/Statistics.js 9d3478483 


Diff: https://reviews.apache.org/r/70457/diff/1/


Testing
---

Completed one round of sanity testing.


Thanks,

Binit Gutka



[jira] [Updated] (ATLAS-3131) UI : Stats Popup Improvements

2019-04-11 Thread Binit Gutka (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binit Gutka updated ATLAS-3131:
---
Attachment: ATLAS-3131.patch

> UI : Stats Popup Improvements
> -
>
> Key: ATLAS-3131
> URL: https://issues.apache.org/jira/browse/ATLAS-3131
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Binit Gutka
>Assignee: Binit Gutka
>Priority: Major
> Attachments: ATLAS-3131.patch
>
>






[jira] [Updated] (ATLAS-3131) UI : Stats Popup Improvements

2019-04-11 Thread Binit Gutka (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binit Gutka updated ATLAS-3131:
---
Attachment: (was: ATLAS-3131.patch)

> UI : Stats Popup Improvements
> -
>
> Key: ATLAS-3131
> URL: https://issues.apache.org/jira/browse/ATLAS-3131
> Project: Atlas
>  Issue Type: Improvement
>Reporter: Binit Gutka
>Assignee: Binit Gutka
>Priority: Major
>






[jira] [Updated] (ATLAS-3128) UI: Create entity is not working after new relationship attribute introduce

2019-04-11 Thread Keval Bhatt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Keval Bhatt updated ATLAS-3128:
---
Attachment: ATLAS-3128-new-improvment.patch

> UI: Create entity is not working after new relationship attribute introduce
> ---
>
> Key: ATLAS-3128
> URL: https://issues.apache.org/jira/browse/ATLAS-3128
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Keval Bhatt
>Assignee: Keval Bhatt
>Priority: Major
> Attachments: ATLAS-3128-1.patch, ATLAS-3128-2.patch, 
> ATLAS-3128-new-improvment.patch, ATLAS-3128.patch
>
>






[jira] [Created] (ATLAS-3131) UI : Stats Popup Improvements

2019-04-11 Thread Binit Gutka (JIRA)
Binit Gutka created ATLAS-3131:
--

 Summary: UI : Stats Popup Improvements
 Key: ATLAS-3131
 URL: https://issues.apache.org/jira/browse/ATLAS-3131
 Project: Atlas
  Issue Type: Improvement
Reporter: Binit Gutka
Assignee: Binit Gutka








Atlas Entity for specific GUID

2019-04-11 Thread anshu shukla
Hi,

For a specific use case, I want to create an Atlas entity with a fixed,
caller-supplied GUID. Is this possible?

If not, please suggest which part of the code is responsible for this.
-- 
Thanks & Regards,
Anshu Shukla


[jira] [Commented] (ATLAS-3110) Add a bulk api to get entities by unique attributes

2019-04-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815176#comment-16815176
 ] 

ASF subversion and git services commented on ATLAS-3110:


Commit 07d5cc9fbdcee8d020f4a4d0dc59dc6f12ecf2ab in atlas's branch 
refs/heads/branch-1.0 from Ayush Nigam
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=07d5cc9 ]

ATLAS-3110: added REST API to get multiple entities given their unique 
attribute values

Signed-off-by: Madhan Neethiraj 
(cherry picked from commit 9d4380e0f32ab47ba51eba89d514be3968a3bd86)


> Add a bulk api to get entities by unique attributes
> ---
>
> Key: ATLAS-3110
> URL: https://issues.apache.org/jira/browse/ATLAS-3110
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Ayush Nigam
>Assignee: Ayush Nigam
>Priority: Minor
> Attachments: ATLAS-3110(1).patch
>
>
> Add a bulk API to get entities by unique attributes; currently there is a 
> bulk API for GUIDs only.
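
As an illustration of what a client request to such a bulk endpoint might 
look like, the sketch below builds a query URL. The path 
/entity/bulk/uniqueAttribute/type/{typeName} and the attr_N: parameter prefix 
are assumptions made for this example; check the committed REST resource for 
the actual contract. The class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Hypothetical client-side helper: one query parameter per requested entity,
// each carrying the unique attribute name and its value.
final class BulkUniqueAttrUrl {
    private BulkUniqueAttrUrl() { }

    static String build(String baseUrl, String typeName, String attrName, List<String> values) {
        StringBuilder sb = new StringBuilder(baseUrl)
                .append("/entity/bulk/uniqueAttribute/type/")  // assumed path
                .append(typeName);

        for (int i = 0; i < values.size(); i++) {
            sb.append(i == 0 ? '?' : '&')
              .append("attr_").append(i).append(':')           // assumed prefix
              .append(attrName)
              .append('=')
              .append(encode(values.get(i)));
        }

        return sb.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

A single request built this way would replace one lookup per entity, which 
is the gap the issue describes for entities identified by unique attributes 
rather than GUIDs.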





[jira] [Updated] (ATLAS-3110) Add a bulk api to get entities by unique attributes

2019-04-11 Thread Ayush Nigam (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Nigam updated ATLAS-3110:
---
Attachment: (was: ATLAS-3110.patch)

> Add a bulk api to get entities by unique attributes
> ---
>
> Key: ATLAS-3110
> URL: https://issues.apache.org/jira/browse/ATLAS-3110
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Ayush Nigam
>Assignee: Ayush Nigam
>Priority: Minor
> Attachments: ATLAS-3110(1).patch
>
>
> Add a bulk API to get entities by unique attributes; currently there is a 
> bulk API for GUIDs only.





[jira] [Updated] (ATLAS-3110) Add a bulk api to get entities by unique attributes

2019-04-11 Thread Ayush Nigam (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Nigam updated ATLAS-3110:
---
Attachment: ATLAS-3110(1).patch

> Add a bulk api to get entities by unique attributes
> ---
>
> Key: ATLAS-3110
> URL: https://issues.apache.org/jira/browse/ATLAS-3110
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Ayush Nigam
>Assignee: Ayush Nigam
>Priority: Minor
> Attachments: ATLAS-3110(1).patch
>
>
> Add a bulk API to get entities by unique attributes; currently there is a 
> bulk API for GUIDs only.





[jira] [Commented] (ATLAS-3110) Add a bulk api to get entities by unique attributes

2019-04-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815171#comment-16815171
 ] 

ASF subversion and git services commented on ATLAS-3110:


Commit 20977c69680ab1da4ea1aa67ee0d405a20bc1a62 in atlas's branch 
refs/heads/branch-2.0 from Ayush Nigam
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=20977c6 ]

ATLAS-3110: added REST API to get multiple entities given their unique 
attribute values

Signed-off-by: Madhan Neethiraj 
(cherry picked from commit 9d4380e0f32ab47ba51eba89d514be3968a3bd86)


> Add a bulk api to get entities by unique attributes
> ---
>
> Key: ATLAS-3110
> URL: https://issues.apache.org/jira/browse/ATLAS-3110
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Ayush Nigam
>Assignee: Ayush Nigam
>Priority: Minor
> Attachments: ATLAS-3110.patch
>
>
> Add a bulk API to get entities by unique attributes; currently there is a 
> bulk API for GUIDs only.





[jira] [Commented] (ATLAS-3110) Add a bulk api to get entities by unique attributes

2019-04-11 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815170#comment-16815170
 ] 

ASF subversion and git services commented on ATLAS-3110:


Commit 9d4380e0f32ab47ba51eba89d514be3968a3bd86 in atlas's branch 
refs/heads/master from Ayush Nigam
[ https://gitbox.apache.org/repos/asf?p=atlas.git;h=9d4380e ]

ATLAS-3110: added REST API to get multiple entities given their unique 
attribute values

Signed-off-by: Madhan Neethiraj 


> Add a bulk api to get entities by unique attributes
> ---
>
> Key: ATLAS-3110
> URL: https://issues.apache.org/jira/browse/ATLAS-3110
> Project: Atlas
>  Issue Type: New Feature
>Reporter: Ayush Nigam
>Assignee: Ayush Nigam
>Priority: Minor
> Attachments: ATLAS-3110.patch
>
>
> Add a bulk API to get entities by unique attributes; currently there is a 
> bulk API for GUIDs only.


