Chetan Mehrotra created OAK-6671:
------------------------------------

             Summary: Enable support for custom types in ExternalSort
                 Key: OAK-6671
                 URL: https://issues.apache.org/jira/browse/OAK-6671
             Project: Jackrabbit Oak
          Issue Type: Technical task
          Components: commons
            Reporter: Chetan Mehrotra
            Assignee: Chetan Mehrotra
             Fix For: 1.8


ExternalSort currently sorts the file content as strings. For some cases we need 
to sort the content in a custom way, which is currently facilitated via Comparator 
support. However, in this mode we need to deserialize each line into the required 
format to enable the custom comparison, which adds overhead.
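
To make the overhead concrete, here is a minimal sketch of a Comparator over raw lines. It is an illustration only and not the ExternalSort API: the class name LinePathComparator, the parseCount field, and the assumption that each line has the form path|payload are all hypothetical. Every call to compare() parses both lines again, so the same line gets parsed many times during one sort.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration; not part of Oak or ExternalSort.
final class LinePathComparator implements Comparator<String> {
    // Counts how often a line is parsed, to make the repeated work visible.
    int parseCount = 0;

    // Takes the text before the first '|' as the path and splits it into elements.
    String[] pathElements(String line) {
        parseCount++;
        int sep = line.indexOf('|');
        String path = sep < 0 ? line : line.substring(0, sep);
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    @Override
    public int compare(String a, String b) {
        // Both lines are parsed again on every single comparison.
        String[] x = pathElements(a);
        String[] y = pathElements(b);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = x[i].compareTo(y[i]);
            if (c != 0) {
                return c;
            }
        }
        // A path that is a prefix of another sorts first.
        return Integer.compare(x.length, y.length);
    }
}
```

Sorting a list with lines.sort(new LinePathComparator()) gives the per-element order, but parseCount ends up larger than the number of lines, since each line takes part in several comparisons.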

For example, consider a file with the following content:
{noformat}
/apps|{"8":"dat:2016-07-01T15:14:37.241+05:30","71":["nam:rep:AccessControllable"],"9":"admin","0":"nam:sling:Folder"}
/apps/assets|{"8":"dat:2016-07-01T15:37:38.598+05:30","9":"admin","0":"nam:nt:folder"}
{noformat}

This needs to be sorted by path, and the comparison has to be done per path 
element. Currently, sorting a 50 GB file with 130M lines takes 30 for a batch of 
8M. Most of the time is spent extracting the path structure. This can be avoided 
if ExternalSort supports mapping a line to a custom type and retains that type 
for the sorting phase.
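
A rough sketch of the idea, under stated assumptions: the type PathLine and the helper sortLines() are hypothetical names used for illustration and are not the proposed patch or an existing Oak class. Each line is mapped to the custom type once, the parsed path elements are retained for the whole sort, and the original line is kept so it can be written back out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder type; name and shape are assumptions, not Oak code.
final class PathLine implements Comparable<PathLine> {
    final String[] elements; // path elements, parsed exactly once
    final String line;       // original line, retained for output

    PathLine(String line) {
        this.line = line;
        int sep = line.indexOf('|');
        String path = sep < 0 ? line : line.substring(0, sep);
        List<String> parts = new ArrayList<>();
        for (String p : path.split("/")) {
            if (!p.isEmpty()) {
                parts.add(p);
            }
        }
        this.elements = parts.toArray(new String[0]);
    }

    @Override
    public int compareTo(PathLine other) {
        // Compare element by element; no parsing happens here.
        int n = Math.min(elements.length, other.elements.length);
        for (int i = 0; i < n; i++) {
            int c = elements[i].compareTo(other.elements[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(elements.length, other.elements.length);
    }

    // Maps each line to PathLine once, sorts, then maps back to the lines.
    static List<String> sortLines(List<String> lines) {
        List<PathLine> parsed = new ArrayList<>(lines.size());
        for (String l : lines) {
            parsed.add(new PathLine(l));
        }
        Collections.sort(parsed);
        List<String> out = new ArrayList<>(parsed.size());
        for (PathLine p : parsed) {
            out.add(p.line);
        }
        return out;
    }
}
```

Holding both the parsed elements and the original line for every entry in a batch is where the extra memory goes in this sketch; callers that sort plain strings would not pay for it.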

This would add a slight memory overhead for cases where this feature is used. 
For the normal case no overhead would be present.

I will come up with a patch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
