Re: Review Request 57495: Export API: Memory usage optimization

2017-03-10 Thread Madhan Neethiraj

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57495/#review168686
---


Ship it!





- Madhan Neethiraj


On March 10, 2017, 4:09 a.m., Ashutosh Mestry wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57495/
> ---
> 
> (Updated March 10, 2017, 4:09 a.m.)
> 
> 
> Review request for atlas, Madhan Neethiraj and Sarath Subramanian.
> 
> 
> Bugs: ATLAS-1503
> https://issues.apache.org/jira/browse/ATLAS-1503
> 
> 
> Repository: atlas
> 
> 
> Description
> ---
> 
> **Background**
> The existing implementation of the Export REST API uses 
> *ByteArrayOutputStream* to buffer the output while the zip file is created. 
> This puts pressure on memory when handling large data. Also, the data 
> transfer does not start until the entire export is done. Both effects hurt 
> performance.
> 
> **Solution**
> - Pass *ServletOutputStream* to *ZipSink*.
>   - This improves memory usage, as memory is no longer held by 
> *ByteArrayOutputStream*. 
>   - Removes the additional copy from *ByteArrayOutputStream* to 
> *ServletOutputStream*.
>   - Simplifies *ZipSink*.
> - Clear internal data structures after operation completion.
>   - This helps a little in freeing the memory used. There is some 
> improvement in large transfers.
> - *ExportService.ExportContext.guidsToProcess*: replaced sequential lookup in 
> a *List* with lookup in a *Set*.
> - Data transfer from server to client starts much sooner. The client is able 
> to interrupt the transfer if needed.
> 
> 
> Diffs
> ---
> 
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasExportResult.java 
> e6a967e 
>   webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 
> 31a4cf9 
>   webapp/src/main/java/org/apache/atlas/web/resources/ExportService.java 
> c1891e0 
>   webapp/src/main/java/org/apache/atlas/web/resources/ZipSink.java 2e4cb01 
> 
> 
> Diff: https://reviews.apache.org/r/57495/diff/2/
> 
> 
> Testing
> ---
> 
> Profiled using *jmap* & *Eclipse MAT*, verified using *YourKit*.
> 
> Verified: *FetchTypes* viz. *full* and *connected*.
> 
> Memory usage: Stays constant on prolonged use. Verified ~3 hrs of continuous 
> runs using medium and large database exports.
> 
> Performance improvement:
> | Date | File Size | No. of Entities | Duration (mins) |
> |---|---|---|---|
> | 3/02 | 180 MB | 202930 | 29 |
> | 3/08 | 180 MB | 202930 | 22 |
> | 3/09 | 180 MB | 202930 | 19 |
> 
> About 15% improvement with list & set combined data structures.
> About 30% improvement by eliminating use of *ByteArrayOutputStream*.
> 
> 
> Thanks,
> 
> Ashutosh Mestry
> 
>



Review Request 57495: Export API: Memory usage optimization

2017-03-09 Thread Ashutosh Mestry

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57495/
---

Review request for atlas, Madhan Neethiraj and Sarath Subramanian.


Bugs: ATLAS-1646
https://issues.apache.org/jira/browse/ATLAS-1646


Repository: atlas


Description
---

**Background**
The existing implementation of the Export REST API uses 
*ByteArrayOutputStream* to buffer the output while the zip file is created. 
This puts pressure on memory when handling large data. Also, the data 
transfer does not start until the entire export is done. Both effects hurt 
performance.

**Solution**
- Pass *ServletOutputStream* to *ZipSink*.
  - This improves memory usage, as memory is no longer held by 
*ByteArrayOutputStream*. 
  - Removes the additional copy from *ByteArrayOutputStream* to 
*ServletOutputStream*.
  - Simplifies *ZipSink*.
- Clear internal data structures after operation completion.
  - This helps a little in freeing the memory used. There is some 
improvement in large transfers.
- *ExportService.ExportContext.guidsToProcess*: replaced sequential lookup in 
a *List* with lookup in a *Set*.
- Data transfer from server to client starts much sooner. The client is able 
to interrupt the transfer if needed.
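
For readers who want a picture of the first Solution item, here is a minimal sketch of a sink that writes zip entries straight to whatever *OutputStream* it is handed. It is not the *ZipSink* from the diff: the class name `StreamingZipSink`, the `add` method and the entry names are assumptions made for illustration. In a servlet, the stream passed in would presumably be the one returned by `response.getOutputStream()`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-in for ZipSink: writes entries to the stream it is given. */
class StreamingZipSink implements AutoCloseable {
    private final ZipOutputStream zip;

    StreamingZipSink(OutputStream target) {
        // Assumption: in the servlet case, target is the response's output stream,
        // so no intermediate ByteArrayOutputStream holds the whole archive.
        this.zip = new ZipOutputStream(target);
    }

    /** Adds one named entry; compressed bytes are pushed on to the target stream. */
    void add(String name, String json) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(json.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    @Override
    public void close() throws IOException {
        zip.close(); // finishes the archive and closes the target stream
    }
}
```

Because the sink only depends on *OutputStream*, the same class can be pointed at a file or an in-memory buffer in tests, while the REST resource hands it the servlet stream so the client starts receiving data before the export finishes.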

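The *guidsToProcess* change can be pictured with a small sketch that pairs a *List* (processing order) with a *Set* (constant-time "already queued?" check) instead of scanning the list. The class `GuidWorkList` and its methods are hypothetical and are not taken from *ExportService*; the `clear` method mirrors the idea of releasing internal structures once the operation completes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical work list: the List keeps order, the Set answers membership checks. */
class GuidWorkList {
    private final List<String> ordered = new ArrayList<>();
    private final Set<String> seen = new HashSet<>();

    /** Queues a guid unless it was queued before; returns true if it was added. */
    boolean add(String guid) {
        if (!seen.add(guid)) { // HashSet.add returns false for a duplicate
            return false;
        }
        ordered.add(guid);
        return true;
    }

    boolean isEmpty() {
        return ordered.isEmpty();
    }

    /** Removes and returns the most recently queued guid. */
    String next() {
        return ordered.remove(ordered.size() - 1);
    }

    /** Drops all state once the export is complete. */
    void clear() {
        ordered.clear();
        seen.clear();
    }
}
```

With a plain *List*, each duplicate check is a linear scan, which adds up when hundreds of thousands of entities are exported; the *Set* keeps that check cheap at the cost of storing each guid twice.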

Diffs
---

  intg/src/main/java/org/apache/atlas/model/impexp/AtlasExportResult.java 
e6a967e 
  webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 
31a4cf9 
  webapp/src/main/java/org/apache/atlas/web/resources/ExportService.java 
c1891e0 
  webapp/src/main/java/org/apache/atlas/web/resources/ZipSink.java 2e4cb01 


Diff: https://reviews.apache.org/r/57495/diff/1/


Testing
---

Profiled using *jmap* & *Eclipse MAT*, verified using *YourKit*.

Verified: *FetchTypes* viz. *full* and *connected*.

Memory usage: Stays constant on prolonged use. Verified ~3 hrs of continuous 
runs using medium and large database exports.

Performance improvement:
| Date | File Size | No. of Entities | Duration (mins) |
|---|---|---|---|
| 3/08 | 180 MB | 202930 | 22 |
| 3/09 | 180 MB | 202930 | 19 |

About 15% improvement.


Thanks,

Ashutosh Mestry