[ https://issues.apache.org/jira/browse/OAK-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chetan Mehrotra resolved OAK-6671.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.7.8
Thanks, Amit, for the review. Applied the patch in 1808443.
> Enable support for custom types in ExternalSort
> -----------------------------------------------
>
> Key: OAK-6671
> URL: https://issues.apache.org/jira/browse/OAK-6671
> Project: Jackrabbit Oak
> Issue Type: Technical task
> Components: commons
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Fix For: 1.8, 1.7.8
>
> Attachments: OAK-6671-v1.patch
>
>
> ExternalSort currently sorts file content as strings. In some cases we need
> to sort the content in a custom way, which is currently facilitated via
> Comparator support. However, in this mode each line must be deserialized into
> the required format to enable the custom comparison, which adds overhead.
> For example, consider a file with the following content:
> {noformat}
> /apps|{"8":"dat:2016-07-01T15:14:37.241+05:30","71":["nam:rep:AccessControllable"],"9":"admin","0":"nam:sling:Folder"}
> /apps/assets|{"8":"dat:2016-07-01T15:37:38.598+05:30","9":"admin","0":"nam:nt:folder"}
> {noformat}
> This needs to be sorted by path, comparing paths element by element.
> Currently, sorting a 50 GB file with 130M lines takes 30 for a batch
> of 8M. Most of the time is spent extracting the path structure. This can be
> avoided if ExternalSort supports mapping a line to a custom type and retaining
> that type for the sorting phase.
> This would add a slight memory overhead for cases where this feature is used.
> For the normal case there would be no overhead.
> I will come up with a patch.
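
For readers following along, here is a minimal sketch of the idea described above: parse each line into a custom type once, keep that type for the sort, and write the original lines back out. It is not the applied patch and does not use the ExternalSort API. The names PathSortSketch, Entry, BY_PATH_ELEMENTS and sortLines are hypothetical, and it only covers an in-memory sort of a single batch, assuming each line has the form path|json.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: hypothetical names, not part of Oak's ExternalSort.
final class PathSortSketch {

    /** Holder that keeps the raw line plus its path, split into elements once. */
    static final class Entry {
        final String line;
        final String[] elements;

        Entry(String line) {
            this.line = line;
            // Assumed line layout: <path>|<json>
            int sep = line.indexOf('|');
            String path = sep < 0 ? line : line.substring(0, sep);
            // "/apps/assets" -> ["apps", "assets"]; "/" -> []
            this.elements = Arrays.stream(path.split("/"))
                    .filter(s -> !s.isEmpty())
                    .toArray(String[]::new);
        }
    }

    /** Compares paths element by element; a parent sorts before its children. */
    static final Comparator<Entry> BY_PATH_ELEMENTS = (a, b) -> {
        int n = Math.min(a.elements.length, b.elements.length);
        for (int i = 0; i < n; i++) {
            int c = a.elements[i].compareTo(b.elements[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.elements.length, b.elements.length);
    };

    /** Parses every line once, sorts the parsed entries, returns the raw lines. */
    static List<String> sortLines(List<String> lines) {
        List<Entry> entries = new ArrayList<>(lines.size());
        for (String line : lines) {
            entries.add(new Entry(line)); // the only place a line is parsed
        }
        entries.sort(BY_PATH_ELEMENTS);   // comparisons reuse the parsed elements
        List<String> sorted = new ArrayList<>(entries.size());
        for (Entry e : entries) {
            sorted.add(e.line);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("/a-b|{}", "/a/b|{}", "/a|{}");
        for (String line : sortLines(input)) {
            System.out.println(line);
        }
    }
}
```

A plain string sort would order these three lines differently, because '-' and '/' sort before '|'. The extra String[] per entry is the memory overhead mentioned in the description.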