[jira] [Commented] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations

2021-10-18 Thread Eva Xiao (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17430291#comment-17430291
 ] 

Eva Xiao commented on ATLAS-3655:
-

Hi Vladislav, I see that this change was merged into Atlas 2.2, but I don't see
a corresponding change in the Spark Atlas Connector repository. How did you get
the screenshots with this new model? Did you change the Spark Atlas Connector
yourself, or is there ongoing work on that project that is not public yet?

> Create 'spark_application' type to avoid 'spark_process' from being updated 
> for multiple operations
> ---
>
> Key: ATLAS-3655
> URL: https://issues.apache.org/jira/browse/ATLAS-3655
> Project: Atlas
> Issue Type: Task
> Reporter: Vladislav Glinskiy
> Priority: Major
> Fix For: 2.1.0, 3.0.0
>
> Attachments: Screenshot from 2020-03-03 16-09-39.png
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> Create a 'spark_application' type to prevent 'spark_process' from being updated
> for multiple operations. Currently, the Spark Atlas Connector uses
> 'spark_process' as the top-level type for a Spark session, so the same entity
> is updated for every operation within that session.
> The following statements:
> {code:java}
> spark.sql("create table table_1(col1 int,col2 string)");
> spark.sql("create table table_2 as select * from table_1");
> {code}
> result in the following correct lineage:
> table_1 --> spark_process1 --> table_2
> but executing similar statements in the same Spark session:
> {code:java}
> spark.sql("create table table_3(col1 int,col2 string)"); 
> spark.sql("create table table_4 as select * from table_3");
> {code}
> results in the same 'spark_process' being updated, and the lineage now connects
> all 4 tables (see the screenshot in the attachments).
>
> The proposal is to create a 'spark_application' entity and associate all
> 'spark_process' entities (created within that session) with it.
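
To illustrate the difference described in the issue, here is a minimal in-memory sketch. It does not use the Atlas or Spark Atlas Connector APIs; the class and method names (LineageSketch, ProcessEntity, ApplicationEntity, newProcess, connects) are hypothetical and exist only for this example. It assumes that lineage treats every input of a process as upstream of every output, which is why a single session-wide process ends up linking unrelated tables.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only: these classes are hypothetical and are not the
// Atlas or Spark Atlas Connector API.
class LineageSketch {

    // A process entity with the input and output tables attached to it.
    static class ProcessEntity {
        final String name;
        final Set<String> inputs = new LinkedHashSet<>();
        final Set<String> outputs = new LinkedHashSet<>();

        ProcessEntity(String name) {
            this.name = name;
        }

        // Assumption: every input is considered upstream of every output.
        boolean connects(String source, String target) {
            return inputs.contains(source) && outputs.contains(target);
        }
    }

    // Stand-in for the proposed 'spark_application' entity: it only groups
    // the processes created within one session.
    static class ApplicationEntity {
        final String name;
        final List<ProcessEntity> processes = new ArrayList<>();

        ApplicationEntity(String name) {
            this.name = name;
        }

        // Creates one process per operation and associates it with this application.
        ProcessEntity newProcess(String source, String target) {
            ProcessEntity p = new ProcessEntity(name + "_process" + (processes.size() + 1));
            p.inputs.add(source);
            p.outputs.add(target);
            processes.add(p);
            return p;
        }

        // Two tables are linked only if a single process connects them.
        boolean connects(String source, String target) {
            return processes.stream().anyMatch(p -> p.connects(source, target));
        }
    }

    // Behavior described in the issue: one process per session, updated for
    // each operation, so its inputs and outputs accumulate.
    static ProcessEntity sharedSessionProcess() {
        ProcessEntity session = new ProcessEntity("spark_process1");
        session.inputs.add("table_1");
        session.outputs.add("table_2");
        session.inputs.add("table_3");
        session.outputs.add("table_4");
        return session;
    }

    // Proposed behavior: one process per operation, grouped under an application.
    static ApplicationEntity perOperationProcesses() {
        ApplicationEntity app = new ApplicationEntity("spark_application1");
        app.newProcess("table_1", "table_2");
        app.newProcess("table_3", "table_4");
        return app;
    }

    public static void main(String[] args) {
        System.out.println("shared process links table_1 to table_4: "
                + sharedSessionProcess().connects("table_1", "table_4"));
        System.out.println("per-operation processes link table_1 to table_4: "
                + perOperationProcesses().connects("table_1", "table_4"));
    }
}
```

In this sketch the shared process reports a link from table_1 to table_4, while the per-operation model does not, which mirrors the incorrect and the intended lineage described above.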



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ATLAS-3655) Create 'spark_application' type to avoid 'spark_process' from being updated for multiple operations

2020-03-05 Thread Vladislav Glinskiy (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052471#comment-17052471
 ] 

Vladislav Glinskiy commented on ATLAS-3655:
---

cc [~kabhwan] [~sarath] 

> Create 'spark_application' type to avoid 'spark_process' from being updated 
> for multiple operations
> ---
>
> Key: ATLAS-3655
> URL: https://issues.apache.org/jira/browse/ATLAS-3655
> Project: Atlas
> Issue Type: Task
> Reporter: Vladislav Glinskiy
> Priority: Major
> Fix For: 2.1.0, 3.0.0
>
> Attachments: Screenshot from 2020-03-03 16-09-39.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h


