Chetan Mehrotra created OAK-6671:
------------------------------------
Summary: Enable support for custom types in ExternalSort
Key: OAK-6671
URL: https://issues.apache.org/jira/browse/OAK-6671
Project: Jackrabbit Oak
Issue Type: Technical task
Components: commons
Reporter: Chetan Mehrotra
Assignee: Chetan Mehrotra
Fix For: 1.8
ExternalSort currently sorts the file content as strings. In some cases we need
to sort the content in a custom way, which is currently facilitated via Comparator
support. However, in this mode each line has to be deserialized into the required
format on every comparison, which adds overhead.
For example, consider a file with the following content:
{noformat}
/apps|{"8":"dat:2016-07-01T15:14:37.241+05:30","71":["nam:rep:AccessControllable"],"9":"admin","0":"nam:sling:Folder"}
/apps/assets|{"8":"dat:2016-07-01T15:37:38.598+05:30","9":"admin","0":"nam:nt:folder"}
{noformat}
This needs to be sorted by path, with paths compared element by element.
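To illustrate the per-element ordering and where the overhead comes from, here is a minimal sketch of a line comparator. The class and helper names (PathLineComparator, pathElements, compareElements) are hypothetical and not part of ExternalSort; it assumes the path is everything before the first '|'.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PathLineComparator implements Comparator<String> {

    // Hypothetical helper: takes the text before the first '|' as the path
    // and splits it into elements, e.g. "/apps/assets" -> [apps, assets].
    static List<String> pathElements(String line) {
        int sep = line.indexOf('|');
        String path = sep < 0 ? line : line.substring(0, sep);
        List<String> elements = new ArrayList<>();
        for (String e : path.split("/")) {
            if (!e.isEmpty()) {
                elements.add(e);
            }
        }
        return elements;
    }

    // Compares element by element; a path sorts before its descendants.
    static int compareElements(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    @Override
    public int compare(String l1, String l2) {
        // Both lines are parsed again on every single comparison.
        return compareElements(pathElements(l1), pathElements(l2));
    }
}
```

Plain string comparison would put "/a-b" before "/a/b" because '-' sorts before '/', whereas the per-element comparison keeps "/a/b" next to its parent "/a". The cost is that every call to compare() re-parses both lines.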
Currently, sorting a 50 GB file with 130M lines takes 30 for a batch of 8M.
Most of the time is spent extracting the path structure. This can be avoided if
ExternalSort supports mapping each line to a custom type and retains that type for
the sorting phase.
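A rough sketch of the idea, not the eventual patch: each line is mapped to a custom type once, the mapped objects are sorted, and the original lines are emitted afterwards. MappedSortSketch, PathLine and sortBatch are hypothetical names used for illustration only and do not reflect the ExternalSort API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MappedSortSketch {

    // Hypothetical custom type: path elements parsed once, plus the original line.
    static final class PathLine implements Comparable<PathLine> {
        final String[] elements;
        final String line;

        PathLine(String line) {
            this.line = line;
            int sep = line.indexOf('|');
            String path = sep < 0 ? line : line.substring(0, sep);
            // Drop the leading '/' so split does not yield an empty first element.
            String trimmed = path.startsWith("/") ? path.substring(1) : path;
            this.elements = trimmed.isEmpty() ? new String[0] : trimmed.split("/");
        }

        @Override
        public int compareTo(PathLine o) {
            int n = Math.min(elements.length, o.elements.length);
            for (int i = 0; i < n; i++) {
                int c = elements[i].compareTo(o.elements[i]);
                if (c != 0) {
                    return c;
                }
            }
            return Integer.compare(elements.length, o.elements.length);
        }
    }

    // Maps each line once, sorts the mapped objects, then returns the original lines.
    static List<String> sortBatch(List<String> lines) {
        List<PathLine> mapped = new ArrayList<>(lines.size());
        for (String l : lines) {
            mapped.add(new PathLine(l));
        }
        Collections.sort(mapped); // comparisons reuse the already parsed elements
        List<String> sorted = new ArrayList<>(mapped.size());
        for (PathLine p : mapped) {
            sorted.add(p.line);
        }
        return sorted;
    }
}
```

With this shape, parsing happens once per line instead of twice per comparison. The trade-off is that the parsed elements are kept in memory alongside each line for the duration of the batch.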
This would add a slight memory overhead in cases where this feature is used. In
the normal case there would be no overhead.
I will come up with a patch.