-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57495/#review168686
-----------------------------------------------------------

Ship it!

- Madhan Neethiraj


On March 10, 2017, 4:09 a.m., Ashutosh Mestry wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57495/
> -----------------------------------------------------------
>
> (Updated March 10, 2017, 4:09 a.m.)
>
>
> Review request for atlas, Madhan Neethiraj and Sarath Subramanian.
>
>
> Bugs: ATLAS-1503
>     https://issues.apache.org/jira/browse/ATLAS-1503
>
>
> Repository: atlas
>
>
> Description
> -------
>
> **Background**
> The existing implementation of the Export REST API uses
> *ByteArrayOutputStream* while creating the output zip file. This puts
> pressure on memory when handling large data. Also, the data transfer does
> not start until the entire export is done. This is less than ideal for
> performance.
>
> **Solution**
> - Pass *ServletOutputStream* to *ZipSink*.
>     - This improves memory usage, as memory is no longer held up by
>       *ByteArrayOutputStream*.
>     - Eliminates the additional copy from *ByteArrayOutputStream* to
>       *ServletOutputStream*.
>     - Simplifies *ZipSink*.
> - Clear internal data structures after the operation completes.
>     - This helps, though not much, with freeing up used memory. There is
>       some improvement in large transfers.
> - *ExportService.ExportContext.guidsToProcess*: replaced the sequential
>   *List* lookup with a *Set* lookup.
> - Data transfer from server to client starts much sooner. The client is
>   able to interrupt the transfer if needed.
>
>
> Diffs
> -----
>
>   intg/src/main/java/org/apache/atlas/model/impexp/AtlasExportResult.java e6a967e
>   webapp/src/main/java/org/apache/atlas/web/resources/AdminResource.java 31a4cf9
>   webapp/src/main/java/org/apache/atlas/web/resources/ExportService.java c1891e0
>   webapp/src/main/java/org/apache/atlas/web/resources/ZipSink.java 2e4cb01
>
>
> Diff: https://reviews.apache.org/r/57495/diff/2/
>
>
> Testing
> -------
>
> Profiled using *jmap* & *Eclipse MAT*, verified using *YourKit*.
>
> Verified: *FetchTypes* viz. *full* and *connected*.
>
> Memory usage: Stays constant on prolonged use. Verified ~3 hrs of continuous
> runs using medium and large database exports.
>
> Performance improvement:
>
> Date | File Size | No. of Entities | Duration (mins) |
> -----|-----------|-----------------|-----------------|
> 3/02 | 180 MB    | 202930          | 29              |
> 3/08 | 180 MB    | 202930          | 22              |
> 3/09 | 180 MB    | 202930          | 19              |
>
> About 15% improvement with list & set combined data structures.
> About 30% improvement by eliminating use of *ByteArrayOutputStream*.
>
>
> Thanks,
>
> Ashutosh Mestry
>
>
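
For anyone reading the description without the diff open, the streaming idea can be sketched roughly as below. This is a minimal illustration and not the code under review: `StreamingZipSink`, its `add` method, and the entry names are hypothetical; only the `java.util.zip` and `java.io` classes are real JDK APIs. The point it shows is that each entry is written directly to the stream supplied by the caller, with no intermediate in-memory buffer.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical stand-in for a zip sink that streams to a caller-supplied stream. */
class StreamingZipSink implements AutoCloseable {
    private final ZipOutputStream zip;

    /** Wraps the target stream directly; nothing is buffered in a byte array first. */
    StreamingZipSink(OutputStream target) {
        this.zip = new ZipOutputStream(target);
    }

    /** Writes one JSON document as its own zip entry and pushes it downstream. */
    void add(String entryName, String json) throws IOException {
        zip.putNextEntry(new ZipEntry(entryName));
        zip.write(json.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
        zip.flush(); // hand the completed entry to the underlying stream right away
    }

    /** Finishes the archive (writes the central directory) and closes the target. */
    @Override
    public void close() throws IOException {
        zip.close();
    }
}
```

In a servlet, the `target` would typically be the stream returned by `HttpServletResponse.getOutputStream()`, so bytes can reach the client while later entries are still being produced.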