[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2018-01-01 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16307710#comment-16307710
 ] 

Lefty Leverenz commented on ORC-256:


Should this be documented in the wiki?

> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
> Fix For: 1.5.0
>
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2017-12-21 Thread Sandeep More (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16300252#comment-16300252
 ] 

Sandeep More commented on ORC-256:
--

Thanks [~owen.omalley] ! 
Your changes (improvements) look good, thanks for taking out a lot of 
redundancy and dead code from the patch.

> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
> Fix For: 1.5.0
>
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2017-12-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16299277#comment-16299277
 ] 

ASF GitHub Bot commented on ORC-256:


Github user asfgit closed the pull request at:

https://github.com/apache/orc/pull/184


> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2017-12-04 Thread Sandeep More (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16277206#comment-16277206
 ] 

Sandeep More commented on ORC-256:
--

Hello [~owen.omalley], friendly nudge to review this Jira :) 

> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2017-11-15 Thread Sandeep More (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254276#comment-16254276
 ] 

Sandeep More commented on ORC-256:
--

Thanks for the review [~owen.omalley] !

bq. We'd do better with an explicit Range class that held the start/end and 
make an ArrayList of those. It would be much faster to iterate over. There 
isn't any need for a synchronized container class in this context.

I am assuming you are talking about 
[unmaskIndexRanges|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR124]
 map 
The reason I did not use a separate Range class is because I wanted to keep the 
Map sorted based on the index (basically when I normalize the negative 
indexes). I use TreeMap to make sure that the map is always sorted (needed for 
string substitution operations). Using LinkedList and Range class would work 
but then I would have to call Collections.sort() every time I update the map. 
Let me know if you prefer LinkedList implementation over the TreeMap 
implementation, I can change it.

I can probably lose the synchronization on unmaskIndexRanges.

bq. You have some spurious whitespace changes at line 271 & 291.

Will fix this !

bq.  It probably is easier to reason about the numerics if we: 1. Use the 
original code if there is no unmasking. 2.Convert the original value to a 
string, use maskString, and then convert it back.

I think I may not be following you completely here. The patch takes care of 
this, for e.g. the function to mask long 
[unmaskRangeLongValue|https://github.com/apache/orc/pull/184/files#diff-620b37519a0f29b19e5f3a3c1ed34b5fR915]
 does the same, if no masking is needed (map unmaskIndexRanges is empty) return 
original else convert the original value to string (String.valueOf(value)) and 
mask it then convert it back.

Let me know your thoughts, and thanks again for the review !

P.S. do you think we should keep the the unmasking option for decimal given the 
subtleties (precision, length) ?







> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2017-10-26 Thread Sandeep More (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221147#comment-16221147
 ] 

Sandeep More commented on ORC-256:
--

I was going through the DecimalRedactConverter class in RedactMaskFactory, it 
appears that we are not using maskDecimal() function on line #361
{code}
if (original.isRepeating) {
target.isNull[0] = source.isNull[0];
if (target.noNulls || !target.isNull[0]) {
  target.vector[0].set(maskDecimal(source.vector[0]));
}
  } else {
for(int r = start; r < start + length; ++r) {
  target.isNull[r] = source.isNull[r];
  if (target.noNulls || !target.isNull[r]) {
target.vector[r].set(source.vector[r]);  //#361
  }
{code}

> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ORC-256) Add unmasked ranges option for redact mask

2017-10-19 Thread Sandeep More (JIRA)

[ 
https://issues.apache.org/jira/browse/ORC-256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16211804#comment-16211804
 ] 

Sandeep More commented on ORC-256:
--

Right, should have mentioned :, thanks for catching the error in 
the backwards range. 

About the parentheses, sounds good. 



> Add unmasked ranges option for redact mask
> --
>
> Key: ORC-256
> URL: https://issues.apache.org/jira/browse/ORC-256
> Project: ORC
>  Issue Type: Sub-task
>Reporter: Owen O'Malley
>Assignee: Sandeep More
>
> It would be good to extend the Redact DataMask so that you could leave 
> certain ranges of strings unmasked.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)