[jira] [Updated] (ATLAS-844) Remove titan berkeley and elastic search jars if hbase/solr based profiles are chosen

2016-06-19 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated ATLAS-844:
---
Attachment: ATLAS-844.patch

The attached patch does the following:

* When the profile selected for packaging is anything other than 
{{berkeley-elasticsearch}}, it excludes the Berkeley DB Java Edition jar, the 
Elasticsearch jar, and the corresponding Titan adapters.
* When the profile selected is {{berkeley-elasticsearch}}, it excludes only the 
Berkeley DB Java Edition jar.
* I think the exclusion may be required because Berkeley DB Java Edition is 
under the Sleepycat license 
(http://www.oracle.com/technetwork/database/berkeleydb/downloads/jeoslicense-086837.html,
 https://opensource.org/licenses/Sleepycat), which seems to be incompatible for 
distribution with software under the Apache Software License. The Titan 
mailing list has some discussion of this: 
https://groups.google.com/d/msg/aureliusgraphs/5zF6zzGRFEs/igecqgkAOqkJ
* It adds documentation steps for obtaining the Berkeley DB jar if required. (I 
used the documentation steps from Apache Falcon, which I think has similar 
needs.)
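
The exclusion rule in the first two bullets can be sketched in Java as below. This only illustrates the decision logic and is not the {{maven-war-plugin}} configuration from the patch; the class name and the artifact names ({{je}}, {{elasticsearch}}, {{titan-berkeleyje}}, {{titan-es}}) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch of the packaging rule; not part of the Atlas build. */
final class PackagingExclusionsSketch {

    static final String BERKELEY_ES_PROFILE = "berkeley-elasticsearch";

    /**
     * Returns the (assumed) artifact names left out of the distribution
     * for the given packaging profile.
     */
    static Set<String> excludedArtifacts(String profile) {
        Set<String> excluded = new LinkedHashSet<>();
        // The Berkeley DB Java Edition jar is excluded for every profile.
        excluded.add("je");
        if (!BERKELEY_ES_PROFILE.equals(profile)) {
            // Any other profile also drops Elasticsearch and both Titan adapters.
            excluded.addAll(Arrays.asList("elasticsearch", "titan-berkeleyje", "titan-es"));
        }
        return Collections.unmodifiableSet(excluded);
    }

    public static void main(String[] args) {
        System.out.println(excludedArtifacts(BERKELEY_ES_PROFILE));
        System.out.println(excludedArtifacts("some-other-profile"));
    }
}
```

With the {{berkeley-elasticsearch}} profile only the Berkeley DB jar is left out, which matches the manual step of copying it into ${atlas_home}/extlib.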

Note the following:
* I have only removed the bundling of the jars in Atlas distribution. AFAIK, 
that's the only requirement. 
* The titan adapters are covered under Apache license, hence they can be 
redistributed.
* The involved jars were a part of the Atlas server war file, hence the changes 
to {{maven-war-plugin}}.

I tested this with:
* {{mvn clean install}} - runs all tests and passes
* {{mvn clean package -Pdist}} - default profile (pointing to external HBase 
and Solr, hence removes the other jars)
* {{mvn clean package -Pdist,berkeley-elasticsearch}} - includes the Berkeley 
Titan adapter, ES jar, and ES Titan adapter. I copied the Berkeley JE jar to 
${atlas_home}/extlib and the server starts up fine.



> Remove titan berkeley and elastic search jars if hbase/solr based profiles 
> are chosen
> -
>
> Key: ATLAS-844
> URL: https://issues.apache.org/jira/browse/ATLAS-844
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.7-incubating
> Reporter: Hemanth Yamijala
> Fix For: 0.7-incubating
>
> Attachments: ATLAS-844.patch
>
>
> With ATLAS-833, users of Atlas now have the option of using an external 
> HBase/Solr installation, a self-contained HBase/Solr installation (embedded 
> mode), or a BerkeleyDB/Elasticsearch installation.
> When choosing either of the first two modes, we can potentially remove the 
> Titan Berkeley DB and Elasticsearch jars. This helps distributions that have 
> contractual restrictions on using these jars.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ATLAS-904) Hive hook fails due to session state not being set

2016-06-19 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated ATLAS-904:
---
Attachment: ATLAS-904.2.patch

Changes to address [~yhemanth]'s review comments.

1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
2. HiveOperation.name doesn't provide identifiers for distinguishing INSERT, 
INSERT_OVERWRITE, UPDATE, DELETE, etc. Hence adding WriteEntity.WriteType as 
well, which exhibits the following behaviour:

a. If there are multiple outputs, adds the query type (WriteType) for each 
output.
b. If the query being run is of type INSERT [into/overwrite] TABLE [PARTITION], 
WriteType is INSERT/INSERT_OVERWRITE.
c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as 
PATH_WRITE.
d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE. [Note - 
lineage is not available for this, since it is a single-table operation.]
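
A minimal sketch of the naming scheme described above, assuming plain strings for entity names and write types. The class, the separators, and the Output type are hypothetical and are not taken from HiveHook.java; the point is only that sorting makes the name independent of the order in which Hive reports inputs and outputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; not the code in HiveHook.java. */
final class ProcessQualifiedNameSketch {

    /** Hypothetical stand-in for an output entity plus its write type. */
    static final class Output {
        final String name;
        final String writeType; // e.g. INSERT, INSERT_OVERWRITE, PATH_WRITE, UPDATE, DELETE

        Output(String name, String writeType) {
            this.name = name;
            this.writeType = writeType;
        }
    }

    static final String SEP = ":";     // assumed separator between parts
    static final String IO_SEP = "->"; // assumed marker between inputs and outputs

    /** Builds operation name + sorted inputs + sorted "output:writeType" parts. */
    static String build(String operationName, List<String> inputs, List<Output> outputs) {
        List<String> sortedInputs = new ArrayList<>(inputs);
        Collections.sort(sortedInputs);

        // Each output carries its own write type, then the combined strings are sorted.
        List<String> outputParts = new ArrayList<>();
        for (Output o : outputs) {
            outputParts.add(o.name + SEP + o.writeType);
        }
        Collections.sort(outputParts);

        StringBuilder sb = new StringBuilder(operationName);
        for (String in : sortedInputs) {
            sb.append(SEP).append(in);
        }
        sb.append(IO_SEP).append(String.join(SEP, outputParts));
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        inputs.add("default.t2");
        inputs.add("default.t1");
        List<Output> outputs = new ArrayList<>();
        outputs.add(new Output("default.t3", "INSERT"));
        // prints QUERY:default.t1:default.t2->default.t3:INSERT
        System.out.println(build("QUERY", inputs, outputs));
    }
}
```

Because the write type is part of each output entry, an INSERT and an INSERT_OVERWRITE into the same table produce different names even when HiveOperation.name is the same.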





> Hive hook fails due to session state not being set
> --
>
> Key: ATLAS-904
> URL: https://issues.apache.org/jira/browse/ATLAS-904
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.7-incubating
> Reporter: Suma Shivaprasad
> Assignee: Suma Shivaprasad
> Priority: Blocker
> Fix For: 0.7-incubating
>
> Attachments: ATLAS-904.1.patch, ATLAS-904.2.patch, ATLAS-904.patch
>
>
> {noformat}
> 2016-06-15 11:34:30,423 WARN  [Atlas Logger 0]: hook.HiveHook 
> (HiveHook.java:normalize(557)) - Could not rewrite query due to error. 
> Proceeding with original query EXPORT TABLE test_export_table to 
> 'hdfs://localhost:9000/hive_tables/test_path1'
> java.lang.NullPointerException: Conf non-local session path expected to be 
> non-null
>   at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
>   at org.apache.hadoop.hive.ql.session.SessionState.getHDFSSessionPath(SessionState.java:641)
>   at org.apache.hadoop.hive.ql.Context.<init>(Context.java:133)
>   at org.apache.hadoop.hive.ql.Context.<init>(Context.java:120)
>   at org.apache.atlas.hive.rewrite.HiveASTRewriter.<init>(HiveASTRewriter.java:44)
>   at org.apache.atlas.hive.hook.HiveHook.normalize(HiveHook.java:554)
>   at org.apache.atlas.hive.hook.HiveHook.getProcessReferenceable(HiveHook.java:702)
>   at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:596)
>   at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:222)
>   at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:77)
>   at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:182)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-06-15 11:34:30,423 ERROR [Atlas Logger 0]: hook.HiveHook 
> (HiveHook.java:run(184)) - Atlas hook failed due to error
> java.lang.NullPointerException
>   at java.lang.StringBuilder.<init>(StringBuilder.java:109)
>   at org.apache.atlas.hive.hook.HiveHook.getProcessQualifiedName(HiveHook.java:738)
>   at org.apache.atlas.hive.hook.HiveHook.getProcessReferenceable(HiveHook.java:703)
>   at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:596)
>   at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:222)
>   at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:77)
>   at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:182)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
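
The second trace ends inside a StringBuilder constructor, which is consistent with a null string reaching it after the rewrite failed: new StringBuilder((String) null) throws NullPointerException. The sketch below shows one defensive pattern for that situation. It is an illustration under that assumption, not the fix in the attached patches; the Rewriter interface and method names are hypothetical.

```java
/** Hypothetical sketch of a null-safe fallback; not the code in HiveHook.java. */
final class NormalizeFallbackSketch {

    /** Hypothetical stand-in for whatever rewrites the query text. */
    interface Rewriter {
        String rewrite(String query) throws Exception;
    }

    /**
     * Returns the rewritten query, or the original query when the rewrite
     * throws or yields null. Never returns null.
     */
    static String normalizeOrOriginal(String query, Rewriter rewriter) {
        if (query == null) {
            return "";
        }
        try {
            String rewritten = rewriter.rewrite(query);
            return rewritten != null ? rewritten : query;
        } catch (Exception e) {
            // Runtime failures (such as the NullPointerException in the first
            // trace) also land here, so the caller still gets a usable string.
            return query;
        }
    }

    public static void main(String[] args) {
        String query = "EXPORT TABLE test_export_table to 'hdfs://localhost:9000/hive_tables/test_path1'";
        String normalized = normalizeOrOriginal(query, q -> {
            throw new NullPointerException("Conf non-local session path expected to be non-null");
        });
        // Safe: the argument is guaranteed to be non-null.
        System.out.println(new StringBuilder(normalized));
    }
}
```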





[jira] [Comment Edited] (ATLAS-904) Hive hook fails due to session state not being set

2016-06-19 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338949#comment-15338949
 ] 

Suma Shivaprasad edited comment on ATLAS-904 at 6/20/16 3:49 AM:
-

Changes to address [~yhemanth]'s review comments.

1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
2. HiveOperation.name doesn't provide identifiers for distinguishing INSERT, 
INSERT_OVERWRITE, UPDATE, DELETE, etc. Hence adding WriteEntity.WriteType as 
well, which exhibits the following behaviour:

a. If there are multiple outputs, adds the query type (WriteType) for each 
output.
b. If the query being run is of type INSERT [into/overwrite] TABLE [PARTITION], 
WriteType is INSERT/INSERT_OVERWRITE.
c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as 
PATH_WRITE.
d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE. [Note - 
lineage is not available for this, since it is a single-table operation.]
3. When the input is a local dir or HDFS path, it currently isn't added to the 
qualified name. The reason is that partition-based paths cause a lot of 
processes to be created in this case, instead of updating the same process.


Pending:

Address [~shwethags]'s suggestion to add HDFS paths to the process qualified 
name only for non-partition-based queries. This needs to be done per 
HiveOperation type:

1. If HiveOperation = LOAD, IMPORT, EXPORT - detect whether the current query 
context is dealing with partitions, and do not add the path if it is 
partition based.
2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH, detect whether the 
query context is dealing with a partitioned table in inputs, and decide 
whether we need to add the path or not.
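
The pending rule could look roughly like the sketch below. It is only one possible reading of the suggestion: the Kind enum, the Input type, and the idea of detecting a partition-based query by looking for a PARTITION entity among the inputs are all assumptions made for illustration, and the per-HiveOperation handling is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the pending rule; not the code in HiveHook.java. */
final class PathInputFilterSketch {

    /** Assumed entity kinds, loosely modelled on what a Hive hook sees. */
    enum Kind { TABLE, PARTITION, DFS_DIR, LOCAL_DIR }

    static final class Input {
        final String name;
        final Kind kind;

        Input(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    static boolean isPath(Kind kind) {
        return kind == Kind.DFS_DIR || kind == Kind.LOCAL_DIR;
    }

    /**
     * Returns the input names that would contribute to the qualified name.
     * Path inputs are kept only when no partition is involved, so that each
     * partition path does not produce a separate process.
     */
    static List<String> namesForQualifiedName(List<Input> inputs) {
        boolean partitionBased = false;
        for (Input in : inputs) {
            if (in.kind == Kind.PARTITION) {
                partitionBased = true;
                break;
            }
        }
        List<String> names = new ArrayList<>();
        for (Input in : inputs) {
            if (isPath(in.kind) && partitionBased) {
                continue; // skip paths for partition-based queries
            }
            names.add(in.name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesForQualifiedName(Arrays.asList(
                new Input("default.t1", Kind.TABLE),
                new Input("/tmp/export", Kind.DFS_DIR))));
        System.out.println(namesForQualifiedName(Arrays.asList(
                new Input("default.t1@p=1", Kind.PARTITION),
                new Input("/warehouse/t1/p=1", Kind.DFS_DIR))));
    }
}
```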















Review Request 48939: ATLAS-904 Handle process qualified name per Hive Operation

2016-06-19 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48939/
---

Review request for atlas, Shwetha GS and Hemanth Yamijala.


Repository: atlas


Description
---

1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
2. HiveOperation.name doesn't provide identifiers for distinguishing INSERT, 
INSERT_OVERWRITE, UPDATE, DELETE, etc. Hence adding WriteEntity.WriteType as 
well, which exhibits the following behaviour:
a. If there are multiple outputs, adds the query type (WriteType) for each 
output.
b. If the query being run is of type INSERT [into/overwrite] TABLE [PARTITION], 
WriteType is INSERT/INSERT_OVERWRITE.
c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as 
PATH_WRITE.
d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE. [Note - 
lineage is not available for this, since it is a single-table operation.]
3. When the input is a local dir or HDFS path, it currently isn't added to the 
qualified name. The reason is that partition-based paths cause a lot of 
processes to be created in this case, instead of updating the same process.

Pending:
Address Shwetha G S's suggestion to add HDFS paths to the process qualified 
name only for non-partition-based queries. This needs to be done per 
HiveOperation type:
1. If HiveOperation = LOAD, IMPORT, EXPORT - detect whether the current query 
context is dealing with partitions, and do not add the path if it is 
partition based.
2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH, detect whether the 
query context is dealing with a partitioned table in inputs, and decide 
whether we need to add the path or not.


Diffs
-

  
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java c956a32 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 23c82df 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e7fbf71 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 0713d30 

Diff: https://reviews.apache.org/r/48939/diff/


Testing
---

Existing tests modified to query with new qualified name. Need to add tests for 
INSERT INTO TABLE


Thanks,

Suma Shivaprasad



[jira] [Commented] (ATLAS-904) Hive hook fails due to session state not being set

2016-06-19 Thread Suma Shivaprasad (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338961#comment-15338961
 ] 

Suma Shivaprasad commented on ATLAS-904:


https://reviews.apache.org/r/48939



Re: Review Request 48939: ATLAS-904 Handle process qualified name per Hive Operation

2016-06-19 Thread Suma Shivaprasad

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48939/
---

(Updated June 20, 2016, 4 a.m.)


Review request for atlas, Shwetha GS and Hemanth Yamijala.


Bugs: ATLAS-904
https://issues.apache.org/jira/browse/ATLAS-904


Repository: atlas


Description
---

1. Process qualified name = HiveOperation.name + sorted inputs + sorted outputs
2. HiveOperation.name doesn't provide identifiers for distinguishing INSERT, 
INSERT_OVERWRITE, UPDATE, DELETE, etc. Hence adding WriteEntity.WriteType as 
well, which exhibits the following behaviour:
a. If there are multiple outputs, adds the query type (WriteType) for each 
output.
b. If the query being run is of type INSERT [into/overwrite] TABLE [PARTITION], 
WriteType is INSERT/INSERT_OVERWRITE.
c. If the query is of type INSERT OVERWRITE hdfs_path, adds WriteType as 
PATH_WRITE.
d. If the query is of type UPDATE/DELETE, adds type as UPDATE/DELETE. [Note - 
lineage is not available for this, since it is a single-table operation.]
3. When the input is a local dir or HDFS path, it currently isn't added to the 
qualified name. The reason is that partition-based paths cause a lot of 
processes to be created in this case, instead of updating the same process.

Pending:
Address Shwetha G S's suggestion to add HDFS paths to the process qualified 
name only for non-partition-based queries. This needs to be done per 
HiveOperation type:
1. If HiveOperation = LOAD, IMPORT, EXPORT - detect whether the current query 
context is dealing with partitions, and do not add the path if it is 
partition based.
2. If HiveOperation = INSERT OVERWRITE DFS_PATH/LOCAL_PATH, detect whether the 
query context is dealing with a partitioned table in inputs, and decide 
whether we need to add the path or not.


Diffs
-

  
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java c956a32 
  addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java 23c82df 
  addons/hive-bridge/src/test/java/org/apache/atlas/hive/hook/HiveHookIT.java e7fbf71 
  webapp/src/main/java/org/apache/atlas/web/resources/EntityResource.java 0713d30 

Diff: https://reviews.apache.org/r/48939/diff/


Testing
---

Existing tests modified to query with new qualified name. Need to add tests for 
INSERT INTO TABLE


Thanks,

Suma Shivaprasad



[jira] [Updated] (ATLAS-904) Hive hook fails due to session state not being set

2016-06-19 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated ATLAS-904:
---
Attachment: (was: ATLAS-904.2.patch)

> Hive hook fails due to session state not being set
> --
>
> Key: ATLAS-904
> URL: https://issues.apache.org/jira/browse/ATLAS-904
> Project: Atlas
> Issue Type: Bug
> Affects Versions: 0.7-incubating
> Reporter: Suma Shivaprasad
> Assignee: Suma Shivaprasad
> Priority: Blocker
> Fix For: 0.7-incubating
>
> Attachments: ATLAS-904.1.patch, ATLAS-904.patch
>
>




[jira] [Updated] (ATLAS-904) Hive hook fails due to session state not being set

2016-06-19 Thread Suma Shivaprasad (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suma Shivaprasad updated ATLAS-904:
---
Attachment: ATLAS-904.2.patch



[jira] [Commented] (ATLAS-904) Hive hook fails due to session state not being set

2016-06-19 Thread ATLAS QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338991#comment-15338991
 ] 

ATLAS QA commented on ATLAS-904:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12811744/ATLAS-904.2.patch
  against master revision 436a524.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

+1 checkstyle.  The patch generated 0 code style errors.

{color:red}-1 findbugs{color}.  The patch appears to introduce 379 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.atlas.repository.typestore.GraphBackedTypeStoreTest

Test results: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningswebapp.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningsauthorization.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningscommon.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningssqoop-bridge.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningshdfs-model.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningsstorm-bridge.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningsfalcon-bridge.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningshive-bridge.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningsrepository.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningstypesystem.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningscatalog.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningsclient.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningsnotification.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ATLAS-Build/325//artifact/patchprocess/newPatchFindbugsWarningstitan.html
Console output: https://builds.apache.org/job/PreCommit-ATLAS-Build/325//console

This message is automatically generated.
