vladhlinsky opened a new pull request #93: ATLAS-3661 Create 
'spark_column_lineage' type and relationship definition
URL: https://github.com/apache/atlas/pull/93
 
 
   ## What changes were proposed in this pull request?
   
   Create the `spark_column_lineage` type and relationship definition to add 
support for column-level lineage for `CREATE TABLE AS SELECT ...` statements and 
views. Column-level lineage is the lineage created between the input and 
output columns.
   For example:
   ```
   hive > create table employee_ctas as select id from employee;
   ```    
   For the above query, lineage is created from `employee` to `employee_ctas`, 
and also from `employee.id` to `employee_ctas.id`.
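
To make the two levels of lineage concrete, here is a minimal sketch in plain Scala. `LineageEdge` and `ctasLineage` are hypothetical names used only for illustration; they are not Atlas or Spark Atlas Connector classes, and the sketch assumes the CTAS copies columns by name without renaming them.

```scala
// Hypothetical illustration only: not part of Atlas or the Spark Atlas Connector.
final case class LineageEdge(from: String, to: String)

// For a CTAS that selects columns by name, derive one table-level edge
// followed by one column-level edge per selected column.
def ctasLineage(source: String, target: String, columns: Seq[String]): Seq[LineageEdge] = {
  val tableEdge = LineageEdge(source, target)
  val columnEdges = columns.map(c => LineageEdge(s"$source.$c", s"$target.$c"))
  tableEdge +: columnEdges
}

// The example query above: create table employee_ctas as select id from employee
val edges = ctasLineage("employee", "employee_ctas", Seq("id"))
edges.foreach(e => println(s"${e.from} -> ${e.to}"))
```

In Atlas itself these edges are represented by process and lineage entities rather than by a flat list; the sketch only shows which endpoints are connected.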
   
   ## How was this patch tested?
   
   Manually, using a modified version of the Spark Atlas Connector:
   - Installed and started Atlas.
   - Updated `1100-spark_model.json` with the proposed changes and restarted 
Atlas.
   - Executed the following statements using spark-shell:
   
   ```
   spark.sql("create table sparkemployee_1_2(id int,name string)");
   spark.sql("create table sparkemployee_ctas_1_2 as select id from sparkemployee_1_2");
   ```
   - Verified that each table has column entities and that a 
`spark_column_lineage` entity is created.
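
One possible way to repeat the last check from the same spark-shell session is to query the Atlas basic search REST endpoint for entities of the new type. This is a sketch under assumptions, not the procedure used for this patch: it assumes Atlas is reachable at `http://localhost:21000` with the default `admin`/`admin` account, and `searchUrl` and `fetch` are hypothetical helper names.

```scala
import java.net.{HttpURLConnection, URL, URLEncoder}
import java.nio.charset.StandardCharsets
import java.util.Base64
import scala.io.Source

// Assumption: a local Atlas instance on the default port.
val atlasBase = "http://localhost:21000"

// Build a basic-search URL that lists entities of the given type.
def searchUrl(base: String, typeName: String): String =
  s"$base/api/atlas/v2/search/basic?typeName=" + URLEncoder.encode(typeName, "UTF-8")

// Send a GET request with HTTP basic auth and return the raw JSON response body.
def fetch(url: String, user: String, password: String): String = {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  val token = Base64.getEncoder.encodeToString(
    s"$user:$password".getBytes(StandardCharsets.UTF_8))
  conn.setRequestProperty("Authorization", s"Basic $token")
  conn.setRequestProperty("Accept", "application/json")
  val src = Source.fromInputStream(conn.getInputStream, "UTF-8")
  try src.mkString finally { src.close(); conn.disconnect() }
}

// With Atlas running, uncomment to print the matching entities as JSON:
// println(fetch(searchUrl(atlasBase, "spark_column_lineage"), "admin", "admin"))
```

Passing `spark_column` or `spark_table` as the type name instead would list the column and table entities, assuming those types are defined by the installed Spark model.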

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
