[jira] [Updated] (ATLAS-4072) spark_column_lineage missing for insert into select * queries run via spark-shell

2020-12-10 Thread Umesh Padashetty (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Padashetty updated ATLAS-4072:

Attachment: Screenshot 2020-12-11 at 1.34.24 AM.png

> spark_column_lineage missing for insert into select * queries run via
> spark-shell
> ----------------------------------------------------------------------
>
> Key: ATLAS-4072
> URL: https://issues.apache.org/jira/browse/ATLAS-4072
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core
> Reporter: Umesh Padashetty
> Priority: Major
> Attachments: Screenshot 2020-12-11 at 1.29.39 AM.png, Screenshot
> 2020-12-11 at 1.34.24 AM.png
>
>
> From spark-shell, I ran the following queries:
>  * spark.sql("create table umesh(name string)");
>  * spark.sql("create table umesh_insert(name string)");
>  * spark.sql("insert into umesh_insert select * from umesh");
> A spark_process is created between the umesh and umesh_insert tables, but
> no spark_column_lineage is created between the umesh.name and
> umesh_insert.name columns.
> !Screenshot 2020-12-11 at 1.29.39 AM.png|width=438,height=435!
> To cross-check the behavior, I ran equivalent Hive queries via beeline.
> There, a hive_process is created between the umesh_hive and
> umesh_hive_insert tables, and hive_column_lineage is also created between
> the umesh_hive.name and umesh_hive_insert.name columns.
> Queries run via beeline:
>  * create table umesh_hive(name string);
>  * create table umesh_hive_insert(name string);
>  * insert into umesh_hive_insert select * from umesh_hive;
> !Screenshot 2020-12-11 at 1.34.24 AM.png|width=441,height=548!
>  
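To make the expected result concrete, here is a minimal Python sketch of the column-level edges one would expect for an "insert into ... select *" statement, assuming columns are matched by position. It is an illustration only, not Atlas or Spark connector code; the function name expected_column_lineage and the (source, target) edge representation are hypothetical.

```python
from typing import List, Tuple


def expected_column_lineage(
    source_table: str,
    source_columns: List[str],
    target_table: str,
    target_columns: List[str],
) -> List[Tuple[str, str]]:
    """Return (source column, target column) pairs for INSERT INTO ... SELECT *.

    Hypothetical helper: it assumes SELECT * feeds the target columns
    positionally, so the i-th source column maps to the i-th target column.
    """
    if len(source_columns) != len(target_columns):
        raise ValueError("source and target must have the same number of columns")
    return [
        (f"{source_table}.{src}", f"{target_table}.{dst}")
        for src, dst in zip(source_columns, target_columns)
    ]


# The Spark case from this report: one column on each side.
edges = expected_column_lineage("umesh", ["name"], "umesh_insert", ["name"])
print(edges)  # [('umesh.name', 'umesh_insert.name')]
```

For the tables in this report, the single expected edge is umesh.name to umesh_insert.name, which is the spark_column_lineage that is reported missing.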



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ATLAS-4072) spark_column_lineage missing for insert into select * queries run via spark-shell

2020-12-10 Thread Umesh Padashetty (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Padashetty updated ATLAS-4072:

Attachment: Screenshot 2020-12-11 at 1.29.39 AM.png



[jira] [Updated] (ATLAS-4072) spark_column_lineage missing for insert into select * queries run via spark-shell

2020-12-10 Thread Umesh Padashetty (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Umesh Padashetty updated ATLAS-4072:

Description: 
From spark-shell, I ran the following queries:
 * spark.sql("create table umesh(name string)");
 * spark.sql("create table umesh_insert(name string)");
 * spark.sql("insert into umesh_insert select * from umesh");

A spark_process is created between the umesh and umesh_insert tables, but no 
spark_column_lineage is created between the umesh.name and umesh_insert.name 
columns.

!Screenshot 2020-12-11 at 1.29.39 AM.png!

To cross-check the behavior, I ran equivalent Hive queries via beeline. There, 
a hive_process is created between the umesh_hive and umesh_hive_insert tables, 
and hive_column_lineage is also created between the umesh_hive.name and 
umesh_hive_insert.name columns.

Queries run via beeline:
 * create table umesh_hive(name string);
 * create table umesh_hive_insert(name string);
 * insert into umesh_hive_insert select * from umesh_hive;

!Screenshot 2020-12-11 at 1.34.24 AM.png!

 

  was:
From spark-shell, I ran the following queries:
 * spark.sql("create table umesh(name string)");
 * spark.sql("create table umesh_insert(name string)");
 * spark.sql("insert into umesh_insert select * from umesh");

A spark_process is created between the umesh and umesh_insert tables, but no 
spark_column_lineage is created between the umesh.name and umesh_insert.name 
columns.

!Screenshot 2020-12-11 at 1.29.39 AM.png|width=438,height=435!

To cross-check the behavior, I ran equivalent Hive queries via beeline. There, 
a hive_process is created between the umesh_hive and umesh_hive_insert tables, 
and hive_column_lineage is also created between the umesh_hive.name and 
umesh_hive_insert.name columns.

Queries run via beeline:
 * create table umesh_hive(name string);
 * create table umesh_hive_insert(name string);
 * insert into umesh_hive_insert select * from umesh_hive;

!Screenshot 2020-12-11 at 1.34.24 AM.png|width=441,height=548!

 

