Re: Please add me as a contributor to Atlas

2019-09-03 Thread Madhan Neethiraj
Bolke,

Thanks for your interest in contributing to Apache Atlas. You have been added 
as a contributor. Welcome to the Apache Atlas community!

Regards,
Madhan

On 9/3/19, 4:02 PM, "Bolke de Bruin"  wrote:

Hello!

Re subject. Can someone please add me as a contributor? Apache id “bolke”.

Cheers
Bolke.





[jira] [Commented] (ATLAS-3393) Fix inconsistency with attribute createtime in aws_s3_bucket

2019-09-03 Thread Madhan Neethiraj (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921890#comment-16921890
 ] 

Madhan Neethiraj commented on ATLAS-3393:
-

[~bolke] - as I mentioned earlier, changing the name of an 
attribute/entity-def/struct-def/enum-def/enum-element will likely break any 
application that uses the earlier name. It isn't difficult to see the impact of 
such a rename: consider renaming the other way, i.e. 'createTime' -> 'createtime'. 
This attribute is referenced in the following entity-defs, so the rename would 
break existing integrations including the Hive hook, HBase hook, Spark hook and 
other applications that deal with these types.

{noformat}
aws_s3_object
aws_s3_pseudo_dir
fs_path
hbase_column_family
hbase_namespace
hbase_table
hive_table
spark_table
rdbms_table
{noformat}

Hence I suggest not taking up this attribute rename. 

bq. we encountered this bug due to integration issues with our tooling 
expecting "createTime" as it is used everywhere else
If the tooling referenced above deals with all entity-defs (existing and future 
defs), consider an option to look for multiple attribute names for the concept 
of 'createTime'.
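
A minimal sketch of the "look for multiple attribute names" suggestion, assuming the tooling sees an entity's attributes as a plain mapping. The function name, the candidate list, and the sample values are hypothetical and only illustrate the idea; they are not part of Atlas.

```python
from typing import Any, Mapping, Optional, Sequence

# Hypothetical list of spellings to try, in order of preference.
CREATE_TIME_NAMES: Sequence[str] = ("createTime", "createtime")


def get_create_time(attributes: Mapping[str, Any],
                    names: Sequence[str] = CREATE_TIME_NAMES) -> Optional[Any]:
    """Return the first value found under any of the candidate names."""
    for name in names:
        if name in attributes:
            return attributes[name]
    return None


# Illustrative attribute maps: one type spells it 'createtime', the other
# 'createTime'. The timestamps are made-up sample values.
bucket_attrs = {"name": "logs", "createtime": 1567526400000}
table_attrs = {"name": "sales", "createTime": 1567526400000}

bucket_created = get_create_time(bucket_attrs)
table_created = get_create_time(table_attrs)
```

Keeping the candidate names in one list means a future type with yet another spelling needs only a new entry rather than a type-def rename.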


> Fix inconsistency with attribute createtime in aws_s3_bucket
> 
>
> Key: ATLAS-3393
> URL: https://issues.apache.org/jira/browse/ATLAS-3393
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Bolke de Bruin
>Priority: Critical
>
> Attributes are in camelCase; in aws_s3_bucket, createtime isn't.
>  
> (fix available in linked issue/pr)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Please add me as a contributor to Atlas

2019-09-03 Thread Bolke de Bruin
Hello!

Re subject. Can someone please add me as a contributor? Apache id “bolke”.

Cheers
Bolke.


[jira] [Commented] (ATLAS-3393) Fix inconsistency with attribute createtime in aws_s3_bucket

2019-09-03 Thread Bolke de Bruin (Jira)


[ 
https://issues.apache.org/jira/browse/ATLAS-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921622#comment-16921622
 ] 

Bolke de Bruin commented on ATLAS-3393:
---

ping [~madhan.neethiraj]

> Fix inconsistency with attribute createtime in aws_s3_bucket
> 
>
> Key: ATLAS-3393
> URL: https://issues.apache.org/jira/browse/ATLAS-3393
> Project: Atlas
>  Issue Type: Bug
>Affects Versions: 1.2.0, 2.0.0
>Reporter: Bolke de Bruin
>Priority: Critical
>
> Attributes are in camelCase; in aws_s3_bucket, createtime isn't.
>  
> (fix available in linked issue/pr)
>  
>  





[jira] [Updated] (ATLAS-3397) Remove Solr6Index and use upstream version

2019-09-03 Thread Bolke de Bruin (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated ATLAS-3397:
--
Summary: Remove Solr6Index and use upstream version  (was: Push Solr6Index 
improvements upstream and use Janus' version)

> Remove Solr6Index and use upstream version
> --
>
> Key: ATLAS-3397
> URL: https://issues.apache.org/jira/browse/ATLAS-3397
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0
>Reporter: Bolke de Bruin
>Priority: Critical
>
> Solr6Index has changes to support Kerberos and multiple Zookeeper clients. 
> There are several issues with Atlas' version:
>  * LICENSE for this file is incorrect as it is not licensed to the ASF. The 
> file says it is copied from Janus.
>  * It is outdated and is not benefitting from upstream improvements (like 
> indexed OrderBy of Janus 0.4)
>  * Kerberos has been integrated into Janus' version
>  * multiple Zookeeper clients have been implemented (still using 
> ZOOKEEPER_URL, but it supports multiple entries): 
> [https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-solr/src/main/java/org/janusgraph/diskstorage/solr/SolrIndex.java#L176]
>  





[jira] [Updated] (ATLAS-3397) Push Solr6Index improvements upstream and use Janus' version

2019-09-03 Thread Bolke de Bruin (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bolke de Bruin updated ATLAS-3397:
--
Description: 
Solr6Index has changes to support Kerberos and multiple Zookeeper clients. 
There are several issues with Atlas' version:
 * LICENSE for this file is incorrect as it is not licensed to the ASF. The 
file says it is copied from Janus.
 * It is outdated and is not benefitting from upstream improvements (like 
indexed OrderBy of Janus 0.4)
 * Kerberos has been integrated into Janus' version
 * multiple Zookeeper clients have been implemented (still using 
ZOOKEEPER_URL, but it supports multiple entries): 
[https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-solr/src/main/java/org/janusgraph/diskstorage/solr/SolrIndex.java#L176]

 

  was:
Solr6Index has changes to support Kerberos and multiple Zookeeper clients. 
There are several issues with Atlas' version:
 * LICENSE for this file is incorrect as it is not licensed to the ASF.
 * It is outdated and is not benefitting from upstream improvements (like 
indexed OrderBy of Janus 0.4)
 * Kerberos has been integrated into Janus' version
 * multiple Zookeeper clients have been implemented (still using 
ZOOKEEPER_URL, but it supports multiple entries): 
[https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-solr/src/main/java/org/janusgraph/diskstorage/solr/SolrIndex.java#L176]

 


> Push Solr6Index improvements upstream and use Janus' version
> 
>
> Key: ATLAS-3397
> URL: https://issues.apache.org/jira/browse/ATLAS-3397
> Project: Atlas
>  Issue Type: Improvement
>  Components:  atlas-core
>Affects Versions: 2.0.0
>Reporter: Bolke de Bruin
>Priority: Critical
>
> Solr6Index has changes to support Kerberos and multiple Zookeeper clients. 
> There are several issues with Atlas' version:
>  * LICENSE for this file is incorrect as it is not licensed to the ASF. The 
> file says it is copied from Janus.
>  * It is outdated and is not benefitting from upstream improvements (like 
> indexed OrderBy of Janus 0.4)
>  * Kerberos has been integrated into Janus' version
>  * multiple Zookeeper clients have been implemented (still using 
> ZOOKEEPER_URL, but it supports multiple entries): 
> [https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-solr/src/main/java/org/janusgraph/diskstorage/solr/SolrIndex.java#L176]
>  





Re: Review Request 71426: Import Service: Support for Backing Directory for Large Imports

2019-09-03 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71426/#review217553
---




docs/src/site/twiki/Import-API-Options.twiki
Lines 149 (patched)


- Please consider not letting the client specify a directory to write to on 
the server side. The location must be configured on the server side.
- If use of 'backingDirectory' benefits all imports, can this approach be 
used for all imports, instead of having the client request it? In that 
case, consider creating a directory, perhaps named with a generated GUID, under 
the configured location, to avoid potential reuse of contents across multiple 
requests. Also, care must be taken to delete the contents after the import 
completes, whether successfully or unsuccessfully.
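
The two suggestions above could be sketched roughly as follows. This is an illustration in Python rather than the Atlas Java code, and `run_with_scratch_dir` and its parameters are hypothetical names: each request gets a fresh GUID-named directory under a server-configured root, and that directory is removed whether the work succeeds or fails.

```python
import shutil
import tempfile
import uuid
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_scratch_dir(configured_root: Path, work: Callable[[Path], T]) -> T:
    """Run `work` in a fresh GUID-named directory under `configured_root`.

    The directory is removed afterwards whether `work` returns or raises.
    """
    scratch = configured_root / str(uuid.uuid4())
    # exist_ok=False: never reuse a directory left over from another request.
    scratch.mkdir(parents=True, exist_ok=False)
    try:
        return work(scratch)
    finally:
        shutil.rmtree(scratch, ignore_errors=True)


# Example: a throwaway root stands in for the server-configured location.
demo_root = Path(tempfile.mkdtemp())
created = run_with_scratch_dir(demo_root, lambda d: d.is_dir())
```

Because the GUID is generated per call, two concurrent imports never share contents, and the `finally` clause covers both the successful and the unsuccessful case.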


- Madhan Neethiraj


On Sept. 3, 2019, 5:24 p.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71426/
> ---
> 
> (Updated Sept. 3, 2019, 5:24 p.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, 
> and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-3396
> https://issues.apache.org/jira/browse/ATLAS-3396
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Background**
> The approach adds another option to _ImportService_ to be able to optionally 
> use a directory on the server for storing data to be imported.
> 
> **Approach**
> This is backward compatible.
> 
> - New: _ZipSourceWithBackingDirectory_: Uses a temporary directory for 
> storing contents of the zip file.
> - Modified: _ImportService_: Now takes _InputStream_ as a parameter instead of 
> _ZipSource_. 
> - Modified: _ImportRequest_: Takes _backingDirectory_ as an additional 
> option. Specifying this will use the new 
> _ZipSourceWithBackingDirectory_ during import.
>  
> 
> **CURL**
> 
> Use pre-configured temporary directory:
> ```
> {
>     "options": {
>         "backingDirectory": "/grid/0/temp"
>     }
> }
> ```
> 
> To use default temp directory:
> 
> ```
> {
>     "options": {
>         "backingDirectory": ""
>     }
> }
> ```
> 
> _CURL_
> ```
> curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" \
>   -H "Cache-Control: no-cache" -F request=@../docs/import-options.json \
>   -F data=@../docs/smalldb.zip http://localhost:21000/api/atlas/admin/import
> ```
> 
> **Documentation**
> Updated.
> 
> 
> Diffs
> -
> 
>   docs/src/site/twiki/Import-API-Options.twiki 4004e7013
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 6a9985642
>   repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java b5d8b7c39
>   repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceWithBackingDirectory.java PRE-CREATION
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v1/AtlasEntityStreamForImport.java 90ae15d1e
>   repository/src/main/java/org/apache/atlas/repository/store/graph/v1/EntityImportStream.java d4b6c5505
>   repository/src/test/java/org/apache/atlas/repository/impexp/ExportIncrementalTest.java a355297a8
>   repository/src/test/java/org/apache/atlas/repository/impexp/ExportSkipLineageTest.java eaf4602d5
>   repository/src/test/java/org/apache/atlas/repository/impexp/ImportServiceTest.java 693a163f8
>   repository/src/test/java/org/apache/atlas/repository/impexp/ReplicationEntityAttributeTest.java 868b732d3
>   repository/src/test/java/org/apache/atlas/repository/impexp/ZipFileResourceTestUtils.java a2a5f58dc
>   webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 8417e7eb0
> 
> 
> Diff: https://reviews.apache.org/r/71426/diff/2/
> 
> 
> Testing
> ---
> 
> **Unit tests**
> Updated to exercise the new implementation.
> 
> **Volume tests**
> Imported a 4 GB ZIP file; the import took 26 hours. Memory and CPU stayed in 
> the normal range during this operation.
> 
> **Functional tests**
> Accuracy tests performed on small data sets.
> 
> **Pre-commit Build**
> https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1379/console
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



Re: Review Request 71426: Import Service: Support for Backing Directory for Large Imports

2019-09-03 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71426/
---

(Updated Sept. 3, 2019, 5:24 p.m.)


Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and 
Sarath Subramanian.


Changes
---

Updates include: Minor refactoring.


Bugs: ATLAS-3396
https://issues.apache.org/jira/browse/ATLAS-3396


Repository: atlas


Description
---

**Background**
The approach adds another option to _ImportService_ to be able to optionally 
use a directory on the server for storing data to be imported.

**Approach**
This is backward compatible.

- New: _ZipSourceWithBackingDirectory_: Uses a temporary directory for 
storing contents of the zip file.
- Modified: _ImportService_: Now takes _InputStream_ as a parameter instead of 
_ZipSource_. 
- Modified: _ImportRequest_: Takes _backingDirectory_ as an additional 
option. Specifying this will use the new 
_ZipSourceWithBackingDirectory_ during import.
 

**CURL**

Use pre-configured temporary directory:
```
{
    "options": {
        "backingDirectory": "/grid/0/temp"
    }
}
```

To use default temp directory:

```
{
    "options": {
        "backingDirectory": ""
    }
}
```

_CURL_
```
curl -v -X POST -u admin:admin -H "Content-Type: multipart/form-data" \
  -H "Cache-Control: no-cache" -F request=@../docs/import-options.json \
  -F data=@../docs/smalldb.zip http://localhost:21000/api/atlas/admin/import
```
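
As a hedged illustration, the request file passed via `-F request=@...` could be generated as below. The helper `build_import_options` is hypothetical; the only option it knows about is `backingDirectory` as described above, and treating an omitted option as "keep the existing in-memory behaviour" is an assumption based on the backward-compatibility note.

```python
import json
from typing import Optional


def build_import_options(backing_directory: Optional[str] = None) -> str:
    """Build the JSON body for the import request.

    None -> option omitted (assumed to keep the existing behaviour)
    ""   -> ask for the default temp directory
    path -> ask for that specific directory
    """
    options = {}
    if backing_directory is not None:
        options["backingDirectory"] = backing_directory
    return json.dumps({"options": options}, indent=4)


explicit_dir = build_import_options("/grid/0/temp")
default_dir = build_import_options("")
no_backing_dir = build_import_options()
```

Writing `explicit_dir` to `import-options.json` would produce the first request body shown above.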

**Documentation**
Updated.


Diffs (updated)
-

  docs/src/site/twiki/Import-API-Options.twiki 4004e7013
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 6a9985642
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java b5d8b7c39
  repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceWithBackingDirectory.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/AtlasEntityStreamForImport.java 90ae15d1e
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/EntityImportStream.java d4b6c5505
  repository/src/test/java/org/apache/atlas/repository/impexp/ExportIncrementalTest.java a355297a8
  repository/src/test/java/org/apache/atlas/repository/impexp/ExportSkipLineageTest.java eaf4602d5
  repository/src/test/java/org/apache/atlas/repository/impexp/ImportServiceTest.java 693a163f8
  repository/src/test/java/org/apache/atlas/repository/impexp/ReplicationEntityAttributeTest.java 868b732d3
  repository/src/test/java/org/apache/atlas/repository/impexp/ZipFileResourceTestUtils.java a2a5f58dc
  webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 8417e7eb0


Diff: https://reviews.apache.org/r/71426/diff/2/

Changes: https://reviews.apache.org/r/71426/diff/1-2/


Testing
---

**Unit tests**
Updated to exercise the new implementation.

**Volume tests**
Imported a 4 GB ZIP file; the import took 26 hours. Memory and CPU stayed in 
the normal range during this operation.

**Functional tests**
Accuracy tests performed on small data sets.

**Pre-commit Build**
https://builds.apache.org/view/A/view/Atlas/job/PreCommit-ATLAS-Build-Test/1379/console


Thanks,

Ashutosh Mestry



[jira] [Updated] (ATLAS-3396) ZipSource: Usage of In-memory Maps Limits the Size of Imports Performed

2019-09-03 Thread Ashutosh Mestry (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-3396:
---
Attachment: 
ATLAS-3396-ZipSource-Memory-usage-improvement-for-la-branch-0.8.patch

> ZipSource: Usage of In-memory Maps Limits the Size of Imports Performed
> ---
>
> Key: ATLAS-3396
> URL: https://issues.apache.org/jira/browse/ATLAS-3396
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core
>Affects Versions: 0.8.4
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Fix For: 0.8.4, trunk
>
> Attachments: 
> ATLAS-3396-ZipSource-Memory-usage-improvement-for-la-branch-0.8.patch
>
>
> *Background*
> _ZipSource_ is the container for working with Atlas-exported ZIP files. The 
> current implementation loads the contents of the entire ZIP file into memory. 
> This greatly limits the size of ZIP files that can be imported.
> *Solution Guidance*
> A new class, _ZipSourceWithBackingDirectory_, will use a user-specified (or 
> default) temporary directory to hold the contents of the ZIP file. This 
> will reduce the memory burden so that large ZIP files can be imported.
> Minor refactoring of _ImportService_ will be necessary to support the new 
> variation of _EntityImportStream_.
>  
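
A rough sketch of the directory-backed idea described in the issue, written in Python for brevity rather than as the Atlas Java class. `DirectoryBackedZip` and its methods are hypothetical names, and the sketch assumes a flat archive of JSON entries: the ZIP is extracted once to disk and entries are then read one at a time instead of being held in in-memory maps.

```python
import json
import tempfile
import zipfile
from pathlib import Path
from typing import Any, List, Optional


class DirectoryBackedZip:
    """Illustrative stand-in for a ZIP source backed by a directory."""

    def __init__(self, zip_path: str, backing_directory: Optional[str] = None):
        # An empty or missing directory falls back to the system temp location.
        self._dir = Path(tempfile.mkdtemp(dir=backing_directory or None))
        with zipfile.ZipFile(zip_path) as archive:
            # Entries are written to disk rather than kept in a map.
            archive.extractall(self._dir)

    def names(self) -> List[str]:
        """List extracted top-level entries in sorted order."""
        return sorted(p.name for p in self._dir.iterdir() if p.is_file())

    def read_json(self, name: str) -> Any:
        """Load a single entry; only this entry is held in memory."""
        with open(self._dir / name, encoding="utf-8") as handle:
            return json.load(handle)


# Build a tiny sample archive so the sketch can be exercised end to end.
sample_dir = Path(tempfile.mkdtemp())
sample_zip = sample_dir / "export.zip"
with zipfile.ZipFile(sample_zip, "w") as out:
    out.writestr("a.json", json.dumps({"guid": "a"}))
    out.writestr("b.json", json.dumps({"guid": "b"}))

source = DirectoryBackedZip(str(sample_zip), "")
```

Memory use then depends on the largest single entry rather than on the size of the whole archive, at the cost of disk space and cleanup of the extracted files, which this sketch leaves out.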





Review Request 71426: Import Service: Support for Backing Directory for Large Imports

2019-09-03 Thread Ashutosh Mestry via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71426/
---

Review request for atlas, Madhan Neethiraj, Nikhil Bonte, Nixon Rodrigues, and 
Sarath Subramanian.


Bugs: ATLAS-3396
https://issues.apache.org/jira/browse/ATLAS-3396


Repository: atlas


Description
---

**Background**
The approach adds another option to _ImportService_ to be able to optionally 
use a directory on the server for storing data to be imported.

**Approach**
This is backward compatible.

- New: _ZipSourceWithBackingDirectory_: Uses a temporary directory for 
storing contents of the zip file.
- Modified: _ImportService_: Now takes _InputStream_ as a parameter instead of 
_ZipSource_. 
- Modified: _ImportRequest_: Takes _backingDirectory_ as an additional 
option. Specifying this will use the new 
_ZipSourceWithBackingDirectory_ during import.
 
**Documentation**
Updated.


Diffs
-

  docs/src/site/twiki/Import-API-Options.twiki 4004e7013
  intg/src/main/java/org/apache/atlas/model/impexp/AtlasImportRequest.java 6a9985642
  repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java b5d8b7c39
  repository/src/main/java/org/apache/atlas/repository/impexp/ZipSourceWithBackingDirectory.java PRE-CREATION
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/AtlasEntityStreamForImport.java 90ae15d1e
  repository/src/main/java/org/apache/atlas/repository/store/graph/v1/EntityImportStream.java d4b6c5505
  repository/src/test/java/org/apache/atlas/repository/impexp/ExportIncrementalTest.java a355297a8
  repository/src/test/java/org/apache/atlas/repository/impexp/ExportSkipLineageTest.java eaf4602d5
  repository/src/test/java/org/apache/atlas/repository/impexp/ImportServiceTest.java 693a163f8
  repository/src/test/java/org/apache/atlas/repository/impexp/ReplicationEntityAttributeTest.java 868b732d3
  repository/src/test/java/org/apache/atlas/repository/impexp/ZipFileResourceTestUtils.java a2a5f58dc
  webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 8417e7eb0


Diff: https://reviews.apache.org/r/71426/diff/1/


Testing
---

**Unit tests**
Updated to exercise the new implementation.

**Volume tests**
Imported a 4 GB ZIP file; the import took 26 hours. Memory and CPU stayed in 
the normal range during this operation.

**Functional tests**
Accuracy tests performed on small data sets.


Thanks,

Ashutosh Mestry



[jira] [Updated] (ATLAS-3396) ZipSource: Usage of In-memory Maps Limits the Size of Imports Performed

2019-09-03 Thread Ashutosh Mestry (Jira)


 [ 
https://issues.apache.org/jira/browse/ATLAS-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Mestry updated ATLAS-3396:
---
Attachment: (was: 
ATLAS-3396-ZipSource-Memory-usage-improvement-for-la.patch)

> ZipSource: Usage of In-memory Maps Limits the Size of Imports Performed
> ---
>
> Key: ATLAS-3396
> URL: https://issues.apache.org/jira/browse/ATLAS-3396
> Project: Atlas
>  Issue Type: Bug
>  Components:  atlas-core
>Affects Versions: 0.8.4
>Reporter: Ashutosh Mestry
>Assignee: Ashutosh Mestry
>Priority: Major
> Fix For: 0.8.4, trunk
>
>
> *Background*
> _ZipSource_ is the container for working with Atlas-exported ZIP files. The 
> current implementation loads the contents of the entire ZIP file into memory. 
> This greatly limits the size of ZIP files that can be imported.
> *Solution Guidance*
> A new class, _ZipSourceWithBackingDirectory_, will use a user-specified (or 
> default) temporary directory to hold the contents of the ZIP file. This 
> will reduce the memory burden so that large ZIP files can be imported.
> Minor refactoring of _ImportService_ will be necessary to support the new 
> variation of _EntityImportStream_.
>  


