[ 
https://issues.apache.org/jira/browse/OAK-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra resolved OAK-6671.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.7.8

Thanks Amit for the review. Applied the patch with 1808443

> Enable support for custom types in ExternalSort
> -----------------------------------------------
>
>                 Key: OAK-6671
>                 URL: https://issues.apache.org/jira/browse/OAK-6671
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: commons
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8, 1.7.8
>
>         Attachments: OAK-6671-v1.patch
>
>
> ExternalSort currently sorts the file content as strings. For some cases we 
> need to sort the content in a custom way, which is currently facilitated via 
> Comparator support. However, in this mode we need to deserialize each line into 
> the required format to enable custom comparison, which adds overhead.
> For example, consider a file with the following content:
> {noformat}
> /apps|{"8":"dat:2016-07-01T15:14:37.241+05:30","71":["nam:rep:AccessControllable"],"9":"admin","0":"nam:sling:Folder"}
> /apps/assets|{"8":"dat:2016-07-01T15:37:38.598+05:30","9":"admin","0":"nam:nt:folder"}
> {noformat}
> This needs to be sorted on the basis of path, and on a per-element 
> basis at that. Currently, sorting a 50 GB file having 130M lines takes 30 for a 
> batch of 8M. Most of the time is spent extracting the path structure. This can 
> be avoided if ExternalSort supports mapping a line to a custom type and retains 
> that type for the sorting phase.
> This would add a slight memory overhead for cases where this feature is used. 
> For the normal case no overhead would be present.
> Would come up with a patch.
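
The idea described above (map each line to a custom type once, then keep that type for the whole sorting phase) can be sketched roughly as follows. This is only an illustration and not the code from OAK-6671-v1.patch or the actual ExternalSort API: the names TypedSortSketch, PathLine, BY_PATH_ELEMENTS and sortBatch are hypothetical, and only a single in-memory batch is sorted here; spilling batches to disk and merging them is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class TypedSortSketch {

    /** Hypothetical holder: path elements are extracted once and kept next to the original line. */
    static final class PathLine {
        final String[] elements;
        final String line;

        PathLine(String line) {
            this.line = line;
            int sep = line.indexOf('|');
            String path = sep < 0 ? line : line.substring(0, sep);
            // "/apps/assets" -> ["apps", "assets"]; the root path "/" -> []
            this.elements = Arrays.stream(path.split("/"))
                    .filter(s -> !s.isEmpty())
                    .toArray(String[]::new);
        }
    }

    /** Compares paths element by element; a parent sorts before its children. */
    static final Comparator<PathLine> BY_PATH_ELEMENTS = (a, b) -> {
        int n = Math.min(a.elements.length, b.elements.length);
        for (int i = 0; i < n; i++) {
            int c = a.elements[i].compareTo(b.elements[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.elements.length, b.elements.length);
    };

    /** Sorts one in-memory batch: map each line once, sort the typed values, map back to lines. */
    static <T> List<String> sortBatch(List<String> lines,
                                      Function<String, T> toType,
                                      Function<T, String> toLine,
                                      Comparator<T> cmp) {
        List<T> typed = new ArrayList<>(lines.size());
        for (String l : lines) {
            typed.add(toType.apply(l)); // parsing cost is paid once per line, not once per comparison
        }
        typed.sort(cmp);
        List<String> out = new ArrayList<>(typed.size());
        for (T t : typed) {
            out.add(toLine.apply(t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList(
                "/apps/assets|{\"0\":\"nam:nt:folder\"}",
                "/apps-old|{}",
                "/apps|{\"0\":\"nam:sling:Folder\"}");
        sortBatch(batch, PathLine::new, p -> p.line, BY_PATH_ELEMENTS)
                .forEach(System.out::println);
    }
}
```

With the sample batch in main, the lines come out in the order /apps, /apps/assets, /apps-old, whereas a plain string sort would place /apps-old before /apps/assets. The extra String[] kept per line is the memory overhead mentioned in the description.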



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
