[jira] [Commented] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2

2018-08-10 Thread Chengbing Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575843#comment-16575843
 ] 

Chengbing Liu commented on ATLAS-2816:
--

Thanks [~apoorvnaik] for the review! I have uploaded a new patch.

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
> 
>
> Key: ATLAS-2816
> URL: https://issues.apache.org/jira/browse/ATLAS-2816
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chengbing Liu
>Assignee: Apoorv Naik
>Priority: Major
> Attachments: ATLAS-2816.01.patch, ATLAS-2816.02.patch
>
>
> We encountered a performance problem when using the Hive bridge in production.
> One database has 5000+ tables. Importing the first table takes only tens of
> milliseconds, but each import gets slower as more tables are added. By the
> end, importing a single table takes 1 to 2 seconds.
> After investigation, we realized that {{FullTextMapperV2}} does not need to
> retrieve all the relationships of the database each time a table is imported.
> Because it currently does, the time complexity of importing a whole database
> grows to O(n^2), where n is the number of tables.
> We propose adding a parameter to the constructor of {{EntityGraphRetriever}}:
> {{ignoreRelationship}}. When it is set to true, {{mapVertexToAtlasEntity}}
> skips the {{mapRelationshipAttributes}} call. Since {{FullTextMapperV2}} does
> not use the relationship attributes of the entity, this can save a lot of
> time when importing entities with a large number of relationships.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2

2018-08-10 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated ATLAS-2816:
-
Attachment: ATLAS-2816.02.patch






[jira] [Updated] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2

2018-08-09 Thread Chengbing Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated ATLAS-2816:
-
Attachment: ATLAS-2816.01.patch

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
> 
>
> Key: ATLAS-2816
> URL: https://issues.apache.org/jira/browse/ATLAS-2816
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chengbing Liu
>Priority: Major
> Attachments: ATLAS-2816.01.patch





[jira] [Commented] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2

2018-08-09 Thread Chengbing Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/ATLAS-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575686#comment-16575686
 ] 

Chengbing Liu commented on ATLAS-2816:
--

[~apoorvnaik], I just found that ATLAS-2815 removes
{{mapRelationshipAttributes(entityVertex, entity)}} and then adds it back.
Was that an accidental change?
I will provide a patch based on the latest code today.

> Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
> 
>
> Key: ATLAS-2816
> URL: https://issues.apache.org/jira/browse/ATLAS-2816
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Chengbing Liu
>Priority: Major





[jira] [Created] (ATLAS-2816) Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2

2018-08-08 Thread Chengbing Liu (JIRA)
Chengbing Liu created ATLAS-2816:


 Summary: Allow ignoring relationship in EntityGraphRetriever for FullTextMapperV2
 Key: ATLAS-2816
 URL: https://issues.apache.org/jira/browse/ATLAS-2816
 Project: Atlas
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Chengbing Liu


We encountered a performance problem when using the Hive bridge in production.
One database has 5000+ tables. Importing the first table takes only tens of
milliseconds, but each import gets slower as more tables are added. By the end,
importing a single table takes 1 to 2 seconds.

After investigation, we realized that {{FullTextMapperV2}} does not need to
retrieve all the relationships of the database each time a table is imported.
Because it currently does, the time complexity of importing a whole database
grows to O(n^2), where n is the number of tables.
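
As a rough illustration of the quadratic growth described above, the following
Java sketch counts relationship reads under a simplified model. It is not Atlas
code: the class {{ImportCostSketch}} and its method are hypothetical, and it
assumes that importing the k-th table causes the database's k table
relationships to be read once.

```java
// Simplified cost model, not Atlas code.
// Assumption: importing the k-th table re-reads the k table relationships
// that the database entity has at that point.
final class ImportCostSketch {

    /** Total relationship reads when importing tableCount tables one by one. */
    static long totalRelationshipReads(int tableCount) {
        long total = 0;
        for (int k = 1; k <= tableCount; k++) {
            total += k; // the k-th import touches k relationships
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 5000;
        // Equals n * (n + 1) / 2 under the assumption above.
        System.out.println(totalRelationshipReads(n));
    }
}
```

Under this assumption the total is n * (n + 1) / 2, which is 12,502,500 reads
for n = 5000.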

We propose adding a parameter to the constructor of {{EntityGraphRetriever}}:
{{ignoreRelationship}}. When it is set to true, {{mapVertexToAtlasEntity}} skips
the {{mapRelationshipAttributes}} call. Since {{FullTextMapperV2}} does not use
the relationship attributes of the entity, this can save a lot of time when
importing entities with a large number of relationships.
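
A minimal sketch of the proposed flag, using a hypothetical stand-in class
rather than the real Atlas one. Only the names {{ignoreRelationship}},
{{mapVertexToAtlasEntity}} and {{mapRelationshipAttributes}} come from the
description above; the {{RetrieverSketch}} class, the method signatures and the
string-based "entity" are assumptions made purely for illustration.

```java
// Hypothetical stand-in for the retriever; the real Atlas signatures differ.
final class RetrieverSketch {
    private final boolean ignoreRelationship;
    private int relationshipMappings = 0; // call counter, for illustration only

    RetrieverSketch(boolean ignoreRelationship) {
        this.ignoreRelationship = ignoreRelationship;
    }

    /** Always maps plain attributes; maps relationships only when not ignored. */
    String mapVertexToAtlasEntity(String vertexName) {
        StringBuilder entity = new StringBuilder(vertexName);
        entity.append("[attributes]");
        if (!ignoreRelationship) {
            mapRelationshipAttributes(entity);
        }
        return entity.toString();
    }

    /** Stands in for the expensive step that walks every relationship edge. */
    private void mapRelationshipAttributes(StringBuilder entity) {
        relationshipMappings++;
        entity.append("[relationships]");
    }

    int relationshipMappings() {
        return relationshipMappings;
    }

    public static void main(String[] args) {
        System.out.println(new RetrieverSketch(false).mapVertexToAtlasEntity("db"));
        System.out.println(new RetrieverSketch(true).mapVertexToAtlasEntity("db"));
    }
}
```

In this sketch, a caller that never reads relationship attributes (as the
description says of the full-text mapper) would pass true, while other callers
keep the existing behavior by passing false.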





[jira] [Updated] (ATLAS-2541) Add hbase-server jar for Hive hook packaging

2018-04-09 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated ATLAS-2541:
-
Attachment: ATLAS-2541.01.patch

> Add hbase-server jar for Hive hook packaging
> 
>
> Key: ATLAS-2541
> URL: https://issues.apache.org/jira/browse/ATLAS-2541
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.0.0-alpha
>Reporter: Chengbing Liu
>Priority: Major
> Attachments: ATLAS-2541.01.patch
>
>
> When importing Hive metadata using the Hive bridge, a NoClassDefFoundError
> was thrown:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
>     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 23 more
> {code}
> The cause is that {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}}
> is not on the classpath. Currently the Hive hook packaging includes only the
> hbase-common jar. Adding the hbase-server jar solves the problem.





[jira] [Resolved] (ATLAS-2540) Hive bridge should include hbase-server as runtime dependency

2018-04-08 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu resolved ATLAS-2540.
--
Resolution: Duplicate

This issue was created twice because of a network issue.

> Hive bridge should include hbase-server as runtime dependency
> -
>
> Key: ATLAS-2540
> URL: https://issues.apache.org/jira/browse/ATLAS-2540
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.0.0-alpha
>Reporter: Chengbing Liu
>Priority: Major
>
> When importing Hive metadata using the Hive bridge, the following exception
> was thrown:
> {code:java}
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:348)
>     at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
>     at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
>     at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     ... 23 more
> {code}
> The cause is that {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}}
> is not on the classpath. Currently only hbase-common is in the dependency
> list. Adding hbase-server to the dependency list solves the problem.





[jira] [Created] (ATLAS-2541) Add hbase-server jar for Hive hook packaging

2018-04-08 Thread Chengbing Liu (JIRA)
Chengbing Liu created ATLAS-2541:


 Summary: Add hbase-server jar for Hive hook packaging
 Key: ATLAS-2541
 URL: https://issues.apache.org/jira/browse/ATLAS-2541
 Project: Atlas
  Issue Type: Bug
Affects Versions: 1.0.0-alpha
Reporter: Chengbing Liu


When importing Hive metadata using the Hive bridge, a NoClassDefFoundError
was thrown:

{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more
{code}

The cause is that {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}}
is not on the classpath. Currently the Hive hook packaging includes only the
hbase-common jar. Adding the hbase-server jar solves the problem.
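
One way to check for this kind of failure before running an import is to probe
the classpath for the missing class. The sketch below is only an illustration:
{{ClasspathProbe}} and {{isOnClasspath}} are hypothetical names, not part of
Atlas or the Hive bridge, and the probe only covers the class loader that
loaded the probe itself.

```java
// Hypothetical helper, not part of Atlas or the Hive bridge.
final class ClasspathProbe {

    /** Returns true if the class can be located without initializing it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class that is present but cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.hbase.mapreduce.TableInputFormatBase";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Run with the same classpath as the import, this would be expected to report
false when only the hbase-common jar is present, given the cause described
above.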





[jira] [Created] (ATLAS-2540) Hive bridge should include hbase-server as runtime dependency

2018-04-08 Thread Chengbing Liu (JIRA)
Chengbing Liu created ATLAS-2540:


 Summary: Hive bridge should include hbase-server as runtime dependency
 Key: ATLAS-2540
 URL: https://issues.apache.org/jira/browse/ATLAS-2540
 Project: Atlas
  Issue Type: Bug
Affects Versions: 1.0.0-alpha
Reporter: Chengbing Liu


When importing Hive metadata using the Hive bridge, the following exception
was thrown:

{code:java}
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/TableInputFormatBase
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:760)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:321)
    at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:197)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1040)
    at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:973)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTable(HiveMetaStoreBridge.java:300)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importTables(HiveMetaStoreBridge.java:284)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importDatabases(HiveMetaStoreBridge.java:155)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.importHiveMetadata(HiveMetaStoreBridge.java:146)
    at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.main(HiveMetaStoreBridge.java:659)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 23 more
{code}

The cause is that {{org.apache.hadoop.hbase.mapreduce.TableInputFormatBase}}
is not on the classpath. Currently only hbase-common is in the dependency
list. Adding hbase-server to the dependency list solves the problem.





[jira] [Commented] (ATLAS-1175) Type updates should allow removal of optional attributes

2018-03-23 Thread Chengbing Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411307#comment-16411307
 ] 

Chengbing Liu commented on ATLAS-1175:
--

Hi folks, is there any plan to support this? Not being able to delete
attributes in a production environment can be really annoying.

The background cleanup thread sounds reasonable to me.

> Type updates should allow removal of optional attributes
> 
>
> Key: ATLAS-1175
> URL: https://issues.apache.org/jira/browse/ATLAS-1175
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 0.7-incubating, 0.8-incubating
>Reporter: Suma Shivaprasad
>Assignee: Sarath Subramanian
>Priority: Major
>
> Currently optional attributes are not allowed to be removed from a given 
> type. This should be allowed.


