Re: Review Request 57495: Export API: Memory usage optimization

2017-03-10 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57495/#review168686
---


Ship it!

- Madhan Neethiraj


On March 10, 2017, 4:09 a.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57495/
> ---
> 
> (Updated March 10, 2017, 4:09 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-1503
> https://issues.apache.org/jira/browse/ATLAS-1503
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Background**
> The existing implementation of the Export REST API uses *ByteArrayOutputStream*
> during creation of the output zip file. This puts pressure on memory when
> handling large data. Also, the data transfer does not start until the entire
> export is done, which is less than ideal for performance.
> 
> **Solution**
> - Pass *ServletOutputStream* to *ZipSink*.
>   - This improves memory usage, as the data is no longer held in a
> *ByteArrayOutputStream*.
>   - Eliminates the additional copy from *ByteArrayOutputStream* to
> *ServletOutputStream*.
>   - Simplifies *ZipSink*.
> - Clear internal data structures after the operation completes.
>   - This helps, though only modestly, in freeing up memory; there is some
> improvement in large transfers.
> - *ExportService.ExportContext.guidsToProcess* changed from *List* to *Set*,
> removing the sequential lookup.
> - Data transfer from server to client starts much sooner, and the client is
> able to interrupt the progress if needed.
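A minimal Java sketch of the streaming idea described above: entries are written through a *ZipOutputStream* that wraps whatever stream the caller passes in (in the servlet case, the response's *ServletOutputStream*), so no complete archive is ever buffered in memory. The class `StreamingZipSinkSketch` and its methods are hypothetical, for illustration only, and are not the actual *ZipSink* from the diff.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch (not Atlas's ZipSink): each entry is written straight
 * to the caller-supplied stream instead of an intermediate ByteArrayOutputStream.
 */
final class StreamingZipSinkSketch implements AutoCloseable {
    private final ZipOutputStream zip;

    StreamingZipSinkSketch(OutputStream target) {
        // Wrap the target directly; compressed bytes flow to it as they are produced.
        this.zip = new ZipOutputStream(target);
    }

    /** Writes one named entry containing the given JSON text. */
    void addEntry(String name, String json) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(json.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    /** Writes the zip central directory and closes the underlying stream. */
    @Override
    public void close() throws IOException {
        zip.close();
    }
}
```

In a servlet the target would be `response.getOutputStream()`, so the client starts receiving bytes once the zip and servlet response buffers fill, rather than after the whole export has finished.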
> 
> 
> Diffs
> -
> 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasExportResult.java 
> e6a967e 
>   webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 
> 31a4cf9 
>   webapp/src/main/java/org/apache/atlas/web/resources/ExportService.java 
> c1891e0 
>   webapp/src/main/java/org/apache/atlas/web/resources/ZipSink.java 2e4cb01 
> 
> 
> Diff: https://reviews.apache.org/r/57495/diff/2/
> 
> 
> Testing
> ---
> 
> Profiled using *jmap* & *Eclipse MAT*, verified using *YourKit*.
> 
> Verified both *FetchTypes*, viz. *full* and *connected*.
> 
> Memory usage: Stays constant on prolonged use. Verified ~3 hrs of continuous 
> runs using medium and large database exports.
> 
> Performance improvement:
> | Date | File Size | No. of Entities | Duration (mins) |
> |------|-----------|-----------------|-----------------|
> | 3/02 | 180 MB    | 202930          | 29              |
> | 3/08 | 180 MB    | 202930          | 22              |
> | 3/09 | 180 MB    | 202930          | 19              |
> 
> About 15% improvement with list & set combined data structures.
> About 30% improvement by eliminating use of *ByteArrayOutputStream*.
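The "list & set combined" structure mentioned above can be sketched as follows, assuming it pairs a *List* (to keep processing order) with a *HashSet* (so the "already queued?" check is constant-time on average instead of the linear scan done by *List.contains*). `GuidWorkQueue` is a hypothetical name; the actual *ExportContext* code may differ.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical GUID work queue combining a List (order) and a Set (fast lookup). */
final class GuidWorkQueue {
    private final List<String> ordered = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();
    private int next = 0; // index of the next GUID to hand out

    /** Queues the GUID unless it was queued before; returns true if it was added. */
    boolean add(String guid) {
        if (!seen.add(guid)) {   // O(1) average, no sequential scan
            return false;
        }
        ordered.add(guid);
        return true;
    }

    boolean hasNext() {
        return next < ordered.size();
    }

    /** Returns the next GUID in insertion order. */
    String next() {
        return ordered.get(next++);
    }

    /** Releases both structures once the export completes. */
    void clear() {
        ordered.clear();
        seen.clear();
        next = 0;
    }
}
```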
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



[jira] [Commented] (ATLAS-1410) V2 Glossary API

2017-03-10 Thread David Radley (JIRA)

[ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905259#comment-15905259
 ] 

David Radley commented on ATLAS-1410:
-

Responses to comments

Page numbers would help to tie these comments to the document.
Page 2 - Asset type - this is defined in terms of itself. How are asset types
used? Or is this not relevant to this paper?
Page 2 - Why do we need to know about V1 and V2? I think it is because the
current interface works with V1 and the new one will work with V2; it would
be helpful to state this explicitly.
Page 4 - bullets 4-5 - has-a and is-a relationships are semantic relationships.
Page 4 - missing from the list - the ability to associate a semantic meaning
with a classification (V2) or trait (V1)?
Page 4 - missing from the list - a "typed-by" relationship to associate terms
that include meaning in context with terms that describe more pure objects. For
example, Home Address is typed by Address.
Page 5 - Figure 1 - I am not comfortable with terms being owned by categories.
I think each term should be owned by a glossary and linked into 0, 1 or more
categories as appropriate. This creates a much simpler deletion rule for the
API/end user, particularly when you look at Figure 2, where terms are owned by
multiple categories. That is, delete a term from its glossary and it is
deleted. The proposed design raises questions such as "Is the term deleted
when it is unlinked from all categories, or from the first category it is
linked to?"
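A minimal Java sketch of the ownership rule suggested above, assuming terms and categories are identified by plain strings: the glossary owns each term, categories only hold links (0..n per term), and deleting a term from its glossary removes every link. `GlossarySketch` and its methods are hypothetical and not part of any Atlas API.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model: the glossary owns terms; categories only link to them. */
final class GlossarySketch {
    private final Set<String> terms = new LinkedHashSet<>();
    private final Map<String, Set<String>> categoryLinks = new HashMap<>();

    void addTerm(String term) {
        terms.add(term);
    }

    /** Links an owned term into a category; a term may be linked to many categories. */
    void link(String category, String term) {
        if (!terms.contains(term)) {
            throw new IllegalArgumentException("term not owned by this glossary: " + term);
        }
        categoryLinks.computeIfAbsent(category, c -> new LinkedHashSet<>()).add(term);
    }

    /** Unlinking removes only the link; the term itself is never deleted here. */
    void unlink(String category, String term) {
        Set<String> linked = categoryLinks.get(category);
        if (linked != null) {
            linked.remove(term);
        }
    }

    /** The single deletion rule: delete from the glossary and all links go with it. */
    void deleteTerm(String term) {
        terms.remove(term);
        categoryLinks.values().forEach(linked -> linked.remove(term));
    }

    boolean hasTerm(String term) {
        return terms.contains(term);
    }

    /** Returns a copy of the terms linked into the category. */
    Set<String> termsIn(String category) {
        return new LinkedHashSet<>(categoryLinks.getOrDefault(category, Set.of()));
    }
}
```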
Page 6 - Figure 3 - I need more detail to understand the "classifies"
relationship and how it relates to a classification. It seems redundant. Would
you not relate a term to a classification, which is itself semantically
classified by its definition term?
Page 6 - Bullet 6) - What is the alternative to using Gremlin queries?
Page 6 - Bullet 7) - Is this an incomplete sentence, or is the paragraph that
follows supposed to be a nested bullet list? I am assuming it is a follow-on
point. My confusion is that I do not understand why the term/category hierarchy
is relevant to the enhancement of classifications. Is the Classification object
defining the type of classification, with its meaning coming from the term?
Is this suggesting that the relationships between classifications come from
the term relationships, in the same way we do this in IGC today? If so, it may
help to show an example.
Page 7 - Figures 4 and 5 - what is the difference between "Classification" and
"Classification Relationship"?
Page 7 - These may be strange examples - the glossaries would be for different
subject areas; for example, there may be a marketing glossary, a customer care
glossary and a banking glossary. These may be used for associating meaning with
data assets. There may also be glossaries for different regulations or
standard governance approaches, and these may include terms that can be used to
describe classifications for data that drive operational governance?
Page 8 - I am not sure what the proposed enhancements are; the section just
seems to list the problems with the current model. All relationships in
metadata are bi-directional, and this should be the default. This mechanism
seems complicated. We really need to define relationships independently of
entities so that we can define attributes on these relationships. The
Classification is actually an example of an independently defined relationship
that includes the GUIDs of the 2 entities it connects. This should be the
common style of relationship.
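To illustrate the relationship style suggested above, here is a hypothetical Java sketch (none of these classes exist in Atlas) of a relationship defined independently of the entities it connects: it records the GUIDs of both ends, carries its own attributes, and is indexed from both ends so it is navigable in either direction by default. The "typed-by" example from the page 4 comment is used as test data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical first-class relationship: two end GUIDs plus its own attributes. */
final class RelationshipSketch {
    final String typeName;
    final String end1Guid;
    final String end2Guid;
    final Map<String, Object> attributes;

    RelationshipSketch(String typeName, String end1Guid, String end2Guid,
                       Map<String, Object> attributes) {
        this.typeName = typeName;
        this.end1Guid = end1Guid;
        this.end2Guid = end2Guid;
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }
}

/** Hypothetical store that indexes each relationship under both of its ends. */
final class RelationshipStore {
    private final Map<String, List<RelationshipSketch>> byEntity = new HashMap<>();

    /** Registers the relationship under both end GUIDs (bi-directional by default). */
    void add(RelationshipSketch r) {
        byEntity.computeIfAbsent(r.end1Guid, k -> new ArrayList<>()).add(r);
        if (!r.end2Guid.equals(r.end1Guid)) { // avoid double-indexing a self-relationship
            byEntity.computeIfAbsent(r.end2Guid, k -> new ArrayList<>()).add(r);
        }
    }

    /** Returns every relationship touching the entity, whichever end it is on. */
    List<RelationshipSketch> relationshipsOf(String guid) {
        return byEntity.getOrDefault(guid, Collections.emptyList());
    }
}
```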
Page 9 - on the discussion point - a Taxonomy is a hierarchy of categories that
the terms are placed in. I thought this was included in the proposal, and we do
need it for organising terms so that people can find them; the category
hierarchies (taxonomies) also help to provide context to terms. Also, the
semantic relationships discussed would mean we could support a simple ontology.
Page 9 - Fully-qualified name - What about a grandparent or parent term? What
does a fully qualified name mean and when is it used? The unique name is its
GUID. Its path name (there may be many) is the navigation to the term through
the category hierarchies.
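The distinction between the unique name (GUID) and the possibly many path names can be sketched in Java as below. This assumes each category has at most one parent and the hierarchy has no cycles; `TermPaths` is a hypothetical name, and the path segments use display names rather than GUIDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: derives a term's path names from the categories it is linked into. */
final class TermPaths {
    private final Map<String, String> parentOf = new HashMap<>();           // category -> parent
    private final Map<String, List<String>> categoriesOf = new HashMap<>(); // term -> categories

    void setParent(String category, String parent) {
        parentOf.put(category, parent);
    }

    void linkTerm(String term, String category) {
        categoriesOf.computeIfAbsent(term, k -> new ArrayList<>()).add(category);
    }

    /** One path name per linked category, built by walking up to the root category. */
    List<String> pathNames(String term) {
        List<String> paths = new ArrayList<>();
        for (String category : categoriesOf.getOrDefault(term, List.of())) {
            StringBuilder path = new StringBuilder(term);
            for (String c = category; c != null; c = parentOf.get(c)) {
                path.insert(0, c + "/"); // prepend each ancestor
            }
            paths.add(path.toString());
        }
        return paths;
    }
}
```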
Page 9 - Why do Atlas terms need to follow the schema defined at this link?
https://www.ibm.com/support/knowledgecenter/en/SSN364_8.8.0/com.ibm.ima.using/comp/vocab/terms_prop.html
It seems to imply a lifecycle that is not included in this proposal, and a very
specific modelling of the IBM industry models, which have mandatory fields that
are not always applicable to all glossaries. I think this doc should describe
the schema of the glossary term explicitly and explain the fields.
Page 10 - Figure 7 shows the navigation relationships as one-way. We need to be
able to navigate from the Hive table to its classification to support the GAF.
Page 11 - Figure 8 - Atlas entities box is hard to 

[jira] [Updated] (ATLAS-1410) V2 Glossary API

2017-03-10 Thread David Radley (JIRA)

 [ 
https://issues.apache.org/jira/browse/ATLAS-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Radley updated ATLAS-1410:

Attachment: Atlas Glossary V2 proposal v1.1.pdf

> V2 Glossary API
> ---
>
> Key: ATLAS-1410
> URL: https://issues.apache.org/jira/browse/ATLAS-1410
> Project: Atlas
>  Issue Type: Improvement
>Reporter: David Radley
>Assignee: David Radley
> Attachments: Atlas Glossary V2 proposal v1.0.pdf, Atlas Glossary V2 
> proposal v1.1.pdf
>
>
> The BaseResourceDefinition uses the AttributeDefinition class from typesystem.
> There are newer, more functional versions of this capability in the atlas-intg
> project. This Jira is changing over the glossary implementation to the newer
> entity / type classes.
> Instead of the instanceProperties and collectionProperties in the
> BaseResourceDefinitions, we should use something in this style:
> "
>     AtlasEntityDef deptTypeDef = AtlasTypeUtil.createClassTypeDef(DEPARTMENT_TYPE,
>             "Department" + _description, ImmutableSet.of(),
>             AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>             new AtlasAttributeDef("employees", String.format("array<%s>", "Person"), true,
>                     AtlasAttributeDef.Cardinality.SINGLE, 0, 1, false, false,
>                     Collections.emptyList()));
>
>     AtlasEntityDef personTypeDef = AtlasTypeUtil.createClassTypeDef("Person",
>             "Person" + _description, ImmutableSet.of(),
>             AtlasTypeUtil.createRequiredAttrDef("name", "string"),
>             AtlasTypeUtil.createOptionalAttrDef("address", "Address"),
>             AtlasTypeUtil.createOptionalAttrDef("birthday", "date"),
>             AtlasTypeUtil.createOptionalAttrDef("hasPets", "boolean"),
>             AtlasTypeUtil.createOptionalAttrDef("numberOfCars", "byte"),
>             AtlasTypeUtil.createOptionalAttrDef("houseNumber", "short"),
>             AtlasTypeUtil.createOptionalAttrDef("carMileage", "int"),
>             AtlasTypeUtil.createOptionalAttrDef("age", "float")
>             // ... remaining attribute definitions omitted in this excerpt
>             );
> "
> For the parent-child relationships between glossary categories and terms, we
> should be able to have the type system manage edge deletion. As part of this,
> we will need to investigate whether we could get rid of the disconnect and
> connect methods added in ATLAS-1186.
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] Release Apache Atlas 0.8 (incubating) - release candidate 0 (dev group vote)

2017-03-10 Thread Hemanth Yamijala
Folks - David / Venkat,

The latest release candidate is RC1. There is a separate thread on this where
there are lots of votes. Could you please transfer your votes there so there
is no confusion?

Madhan, I guess it will be better to [CLOSE] this vote as cancelled.

Thanks
Hemanth

From: Venkat Ranganathan 
Sent: Friday, March 10, 2017 7:16 PM
To: dev@atlas.incubator.apache.org
Subject: Re: [VOTE] Release Apache Atlas 0.8 (incubating) - release candidate 0 
(dev group vote)

+1

Built and ran tests.


Venkat

From: David Radley 
Sent: Friday, March 10, 2017 1:42 AM
To: dev@atlas.incubator.apache.org
Subject: Re: [VOTE] Release Apache Atlas 0.8 (incubating) - release candidate 0 
(dev group vote)

+1 from me.



From:   Madhan Neethiraj
To: "dev@atlas.incubator.apache.org"
Date:   07/03/2017 09:54
Subject: [VOTE] Release Apache Atlas 0.8 (incubating) - release candidate 0 (dev group vote)

Atlas team,

Apache Atlas 0.8 (incubating) release candidate #0 is now available for a
vote within the dev community. Links to the release artifacts are given below.
Can you please review and vote?

The vote will be open for at least 72 hours or until the necessary votes are
reached.

[ ] +1  approve
[ ] +0  no opinion
[ ] -1  disapprove (and reason why)

Here is my +1

Thanks,
Madhan

List of improvements and issues addressed in this release:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20Atlas%20AND%20status%20%3D%20Resolved%20AND%20resolution%20%3D%20Fixed%20AND%20fixVersion%20%3D%200.8-incubating%20ORDER%20BY%20key%20DESC

Git tag for the release:
https://github.com/apache/incubator-atlas/tree/release-0.8-rc0

Sources for the release:
https://dist.apache.org/repos/dist/dev/incubator/atlas/0.8-incubating-rc0/apache-atlas-0.8-incubating-sources.tar.gz

Source release verification:
  PGP Signature:
https://dist.apache.org/repos/dist/dev/incubator/atlas/0.8-incubating-rc0/apache-atlas-0.8-incubating-sources.tar.gz.asc
  MD5 Hash:
https://dist.apache.org/repos/dist/dev/incubator/atlas/0.8-incubating-rc0/apache-atlas-0.8-incubating-sources.tar.gz.md5

Keys to verify the signature of the release artifacts are available at:
https://dist.apache.org/repos/dist/dev/incubator/atlas/KEYS

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
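For anyone checking the source release programmatically, a minimal Java sketch of the MD5 comparison follows. It assumes the published .md5 file contains either gpg --print-md style output ("name: 90 01 50 ...") or md5sum style output ("<hex>  name"); `Md5Check` is a hypothetical helper. A matching MD5 only shows the download is intact; authenticity comes from verifying the PGP signature against the KEYS file (for example with `gpg --verify`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper comparing a file's MD5 with a published .md5 file's contents. */
final class Md5Check {
    /** Streams the file through MD5 and returns the lowercase hex digest. */
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares digests, accepting gpg --print-md or md5sum style contents. */
    static boolean matchesPublished(String computedHex, String md5FileContents) {
        String published;
        int colon = md5FileContents.lastIndexOf(':');
        if (colon >= 0) {
            // gpg --print-md style: "name: 90 01 50 ..." (hex split into groups)
            published = md5FileContents.substring(colon + 1).replaceAll("\\s+", "");
        } else {
            // md5sum style: "<hex>  name"
            published = md5FileContents.trim().split("\\s+")[0];
        }
        return published.equalsIgnoreCase(computedHex);
    }
}
```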



Re: [VOTE] Release Apache Atlas 0.8 (incubating) - release candidate 0 (dev group vote)

2017-03-10 Thread Venkat Ranganathan
+1

Built and ran tests.


Venkat



Re: [VOTE] Release Apache Atlas 0.8 (incubating) - release candidate 0 (dev group vote)

2017-03-10 Thread David Radley
+1 from me. 





Reg : Column Level Lineage

2017-03-10 Thread Karthik K
Team,

I am working on Hortonworks Sandbox 2.5, which has Atlas 0.7. I am not able
to see column-level lineage in it.

Is it available in Atlas 0.8? If so, is it available for download?

Thanks & Regards,
K.Karthikeyan


Re: question on the new V2 api...

2017-03-10 Thread David Radley
Hi Ernie,
This seems to work (I did a search in the UI and looked at the flows): 

curl --user admin:admin -H "Content-Type: application/json" -X GET \
  -d '{"typeName":"hbase_table"}' \
  http://127.0.0.1:21000/api/atlas/v2/search/basic

I am not sure whether this is a supported API, as I could not find it
documented: the search/basic endpoint does not appear in the Swagger docs at
http://atlas.incubator.apache.org/api/v2/index.html.

I notice that in QuickStartV2Client the first DSL example is a query containing
just the type name, so this should work.
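A Java equivalent of the request above, as a sketch. It assumes the endpoint also accepts `typeName` as a URL query parameter; a body on GET has no defined semantics in HTTP and some servers and proxies drop it, so a query parameter is the more portable form, but this has not been checked against the 0.8 REST code. `BasicSearchSketch` is a hypothetical name; it uses only the JDK 11+ `java.net.http` client.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicSearchSketch {
    /** Builds GET {base}/api/atlas/v2/search/basic?typeName=... with HTTP Basic auth. */
    static HttpRequest buildRequest(String baseUrl, String typeName, String user, String password) {
        String query = "typeName=" + URLEncoder.encode(typeName, StandardCharsets.UTF_8);
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/atlas/v2/search/basic?" + query))
                .header("Authorization", "Basic " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request to a local Atlas server and prints the raw JSON response. */
    public static void main(String[] args) throws Exception {
        HttpRequest request = buildRequest("http://127.0.0.1:21000", "hbase_table", "admin", "admin");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```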

all the best, David. 






From:   "Ernie Ostic" 
To: atlas 
Date:   09/03/2017 22:32
Subject:question on the new V2 api...





Hi all...

In the v2 API, is there an equivalent to the legacy API's "entities" call that
returns a list of instances of a particular type? Such as:

http://server:21000/api/atlas/entities?type=Table

This is an easy way to process whole listings of a particular type further
in the code. I suspect there may be a way to do this with the v2 search,
but I haven't been able to figure it out. Alternative question: is the
legacy API going to be formally deprecated and removed, or will it remain
available for applications already written?

Thank you.

Ernie




Ernie Ostic

Worldwide Technical Sales
InfoSphere Information Server
IBM Analytics

Cell: (617) 331 8238
---
Open IGC is here!

Extend the Catalog with custom objects and lineage definitions!
https://dsrealtime.wordpress.com/2015/07/29/open-igc-is-here/


