[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307710#comment-16307710 ] Lefty Leverenz commented on ORC-256: Should this be documented in the wiki? > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > Fix For: 1.5.0 > > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300252#comment-16300252 ] Sandeep More commented on ORC-256: -- Thanks [~owen.omalley] ! Your changes (improvements) look good, thanks for taking out a lot of redundancy and dead code from the patch. > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > Fix For: 1.5.0 > > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299277#comment-16299277 ] ASF GitHub Bot commented on ORC-256: Github user asfgit closed the pull request at: https://github.com/apache/orc/pull/184 > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277206#comment-16277206 ] Sandeep More commented on ORC-256: -- Hello [~owen.omalley], friendly nudge to review this Jira :) > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254276#comment-16254276 ] Sandeep More commented on ORC-256: -- Thanks for the review [~owen.omalley] ! bq. We'd do better with an explicit Range class that held the start/end and make an ArrayList of those. It would be much faster to iterate over. There isn't any need for a synchronized container class in this context. I am assuming you are talking about [unmaskIndexRanges|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR124] map The reason I did not use a separate Range class is because I wanted to keep the Map sorted based on the index (basically when I normalize the negative indexes). I use TreeMap to make sure that the map is always sorted (needed for string substitution operations). Using LinkedList and Range class would work but then I would have to call Collections.sort() every time I update the map. Let me know if you prefer LinkedList implementation over the TreeMap implementation, I can change it. I can probably lose the synchronization on unmaskIndexRanges. bq. You have some spurious whitespace changes at line 271 & 291. Will fix this ! bq. It probably is easier to reason about the numerics if we: 1. Use the original code if there is no unmasking. 2.Convert the original value to a string, use maskString, and then convert it back. I think I may not be following you completely here. The patch takes care of this, for e.g. the function to mask long [unmaskRangeLongValue|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR915] does the same, if no masking is needed (map unmaskIndexRanges is empty) return original else convert the original value to string (String.valueOf(value)) and mask it then convert it back. Let me know your thoughts, and thanks again for the review ! P.S. do you think we should keep the the unmasking option for decimal given the subtleties (precision, length) ? > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221147#comment-16221147 ] Sandeep More commented on ORC-256: -- I was going through the DecimalRedactConverter class in RedactMaskFactory, it appears that we are not using maskDecimal() function on line #361 {code} if (original.isRepeating) { target.isNull[0] = source.isNull[0]; if (target.noNulls || !target.isNull[0]) { target.vector[0].set(maskDecimal(source.vector[0])); } } else { for(int r = start; r < start + length; ++r) { target.isNull[r] = source.isNull[r]; if (target.noNulls || !target.isNull[r]) { target.vector[r].set(source.vector[r]); //#361 } {code} > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask
[ https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211804#comment-16211804 ] Sandeep More commented on ORC-256: -- Right, should have mentioned :, thanks for catching the error in the backwards range. About the parentheses, sounds good. > Add unmasked ranges option for redact mask > -- > > Key: ORC-256 > URL: https://issues.apache.org/jira/browse/ORC-256 > Project: ORC > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Sandeep More > > It would be good to extend the Redact DataMask so that you could leave > certain ranges of strings unmasked. -- This message was sent by Atlassian JIRA (v6.4.14#64029)