[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21246


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r189993896
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-22 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r189786071
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3626,6 +3626,125 @@ object functions {
*/
   def map_values(e: Column): Column = withExpr { MapValues(e.expr) }
 
+  
//
--- End diff --

Let's keep these for now and revisit again later. We'll need to discuss 
which one is here or not in a batch.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188199210
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3626,6 +3626,125 @@ object functions {
*/
   def map_values(e: Column): Column = withExpr { MapValues(e.expr) }
 
+  
//
--- End diff --

@gatorsmile @ueshin what do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188183259
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3626,6 +3626,125 @@ object functions {
*/
   def map_values(e: Column): Column = withExpr { MapValues(e.expr) }
 
+  
//
--- End diff --

Do we need to include these expressions here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188181885
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MaskExpressionsSuite.scala
 ---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.{IntegerType, StringType}
+
+class MaskExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper 
{
+
+  test("mask") {
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), "U", "l", "#"), 
"---")
+checkEvaluation(
+  new Mask(Literal("abcd-EFGH-8765-4321"), Literal("U"), Literal("l"), 
Literal("#")),
+  "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), Literal("U"), 
Literal("l")),
+  "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), 
Literal("U")), "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321")), 
"---")
+checkEvaluation(new Mask(Literal(null, StringType)), null)
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), null, "l", "#"), 
"---")
+checkEvaluation(new Mask(
+  Literal("abcd-EFGH-8765-4321"),
+  Literal(null, StringType),
+  Literal(null, StringType),
+  Literal(null, StringType)), "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), 
Literal("Upper")),
+  "---")
+checkEvaluation(new Mask(Literal("")), "")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), Literal("")), 
"---")
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), "", "", ""), 
"---")
+// scalastyle:off nonascii
+checkEvaluation(Mask(Literal("Ul9U"), "\u2200", null, null), 
"\u2200xn\u2200")
+checkEvaluation(new Mask(Literal("Hello World, こんにちは, ð 
€‹"), Literal("あ"), Literal("𡈽")),
+  "あ𡈽𡈽𡈽𡈽 あ𡈽𡈽𡈽𡈽, こんにちは, 𠀋")
+// scalastyle:on nonascii
+intercept[AnalysisException] {
+  checkEvaluation(new Mask(Literal(""), Literal(1)), "")
+}
+  }
+
+  test("mask_first_n") {
+checkEvaluation(MaskFirstN(Literal("abcd-EFGH-8765-4321"), 6, "U", 
"l", "#"),
--- End diff --

Can you do the same thing to the following tests?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188181134
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
--- End diff --

`appendMaskedToStringBuilder`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188181176
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: java.lang.StringBuilder,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
--- End diff --

`appendUnchangedToStringBuilder`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188181573
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-15 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r188179821
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
--- End diff --

I see, thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187941913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187940989
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
--- End diff --

Because I am invoking also in the Java code generated and I wanted to avoid 
using the match clause (instead of the switch java operation) for performance 
reasons.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187837123
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187835032
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187839186
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MaskExpressionsSuite.scala
 ---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.{IntegerType, StringType}
+
+class MaskExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper 
{
+
+  test("mask") {
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), "U", "l", "#"), 
"---")
+checkEvaluation(
+  new Mask(Literal("abcd-EFGH-8765-4321"), Literal("U"), Literal("l"), 
Literal("#")),
+  "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), Literal("U"), 
Literal("l")),
+  "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), 
Literal("U")), "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321")), 
"---")
+checkEvaluation(new Mask(Literal(null, StringType)), null)
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), null, "l", "#"), 
"---")
+checkEvaluation(new Mask(
+  Literal("abcd-EFGH-8765-4321"),
+  Literal(null, StringType),
+  Literal(null, StringType),
+  Literal(null, StringType)), "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), 
Literal("Upper")),
+  "---")
+checkEvaluation(new Mask(Literal("")), "")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), Literal("")), 
"---")
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), "", "", ""), 
"---")
+// scalastyle:off nonascii
+checkEvaluation(Mask(Literal("Ul9U"), "\u2200", null, null), 
"\u2200xn\u2200")
+checkEvaluation(new Mask(Literal("Hello World, こんにちは, ð 
€‹"), Literal("あ"), Literal("𡈽")),
+  "あ𡈽𡈽𡈽𡈽 あ𡈽𡈽𡈽𡈽, こんにちは, 𠀋")
+// scalastyle:on nonascii
+intercept[AnalysisException] {
+  checkEvaluation(new Mask(Literal(""), Literal(1)), "")
+}
+  }
+
+  test("mask_first_n") {
+checkEvaluation(MaskFirstN(Literal("abcd-EFGH-8765-4321"), 6, "U", 
"l", "#"),
--- End diff --

Can you include upper/lower/number/other letters in the first N letters to 
check the mask is working?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187835331
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
--- End diff --

Why is this implemented in Java?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187835834
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187839605
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MaskExpressionsSuite.scala
 ---
@@ -0,0 +1,236 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.types.{IntegerType, StringType}
+
+class MaskExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper 
{
+
+  test("mask") {
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), "U", "l", "#"), 
"---")
+checkEvaluation(
+  new Mask(Literal("abcd-EFGH-8765-4321"), Literal("U"), Literal("l"), 
Literal("#")),
+  "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), Literal("U"), 
Literal("l")),
+  "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), 
Literal("U")), "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321")), 
"---")
+checkEvaluation(new Mask(Literal(null, StringType)), null)
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), null, "l", "#"), 
"---")
+checkEvaluation(new Mask(
+  Literal("abcd-EFGH-8765-4321"),
+  Literal(null, StringType),
+  Literal(null, StringType),
+  Literal(null, StringType)), "---")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), 
Literal("Upper")),
+  "---")
+checkEvaluation(new Mask(Literal("")), "")
+checkEvaluation(new Mask(Literal("abcd-EFGH-8765-4321"), Literal("")), 
"---")
+checkEvaluation(Mask(Literal("abcd-EFGH-8765-4321"), "", "", ""), 
"---")
+// scalastyle:off nonascii
+checkEvaluation(Mask(Literal("Ul9U"), "\u2200", null, null), 
"\u2200xn\u2200")
+checkEvaluation(new Mask(Literal("Hello World, こんにちは, ð 
€‹"), Literal("あ"), Literal("𡈽")),
+  "あ𡈽𡈽𡈽𡈽 あ𡈽𡈽𡈽𡈽, こんにちは, 𠀋")
+// scalastyle:on nonascii
+intercept[AnalysisException] {
+  checkEvaluation(new Mask(Literal(""), Literal(1)), "")
+}
+  }
+
+  test("mask_first_n") {
+checkEvaluation(MaskFirstN(Literal("abcd-EFGH-8765-4321"), 6, "U", 
"l", "#"),
+  "-UFGH-8765-4321")
+checkEvaluation(new MaskFirstN(
+  Literal("abcd-EFGH-8765-4321"), Literal(6), Literal("U"), 
Literal("l"), Literal("#")),
+  "-UFGH-8765-4321")
+checkEvaluation(
+  new MaskFirstN(Literal("abcd-EFGH-8765-4321"), Literal(6), 
Literal("U"), Literal("l")),
+  "-UFGH-8765-4321")
+checkEvaluation(new MaskFirstN(Literal("abcd-EFGH-8765-4321"), 
Literal(6), Literal("U")),
+  "-UFGH-8765-4321")
+checkEvaluation(new MaskFirstN(Literal("abcd-EFGH-8765-4321"), 
Literal(6)),
+  "-XFGH-8765-4321")
+intercept[AnalysisException] {
+  checkEvaluation(new MaskFirstN(Literal("abcd-EFGH-8765-4321"), 
Literal("U")), "")
+}
+checkEvaluation(new MaskFirstN(Literal("abcd-EFGH-8765-4321")), 
"-EFGH-8765-4321")
+checkEvaluation(new MaskFirstN(Literal(null, StringType)), null)
+checkEvaluation(MaskFirstN(Literal("abcd-EFGH-8765-4321"), 4, "U", 
"l", null),
+  "-EFGH-8765-4321")
+checkEvaluation(new MaskFirstN(
+  Literal("abcd-EFGH-8765-4321"),
+  Literal(null, IntegerType),
+  Literal(null, StringType),
+  Literal(null, StringType),
+  Literal(null, StringType)), "-EFGH-8765-4321")
+checkEvaluation(new MaskFirstN(Literal("abcd-EFGH-8765-4321"), 
Literal(6), Literal("Upper")),
+  "-UFGH-8765-4321")
+checkEvaluation(new MaskFirstN(Literal("")), "")
+checkEvaluation(new 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187836985
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187835126
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
+  final static int UNMASKED_VAL = -1;
+
+  /**
+   *
--- End diff --

Can you add a description?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187769261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187759378
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-11 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187687668
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187671225
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-11 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187668256
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187667028
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = 
{
+s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, 
$inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  offset: String,
+  numChars: String): String = {
+val i = ctx.freshName("i")
+val codePoint = ctx.freshName("codePoint")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+   |  ${CodeGenerator.JAVA_INT} $codePoint = 
$inputString.codePointAt($offset);
+   |  $sb.appendCodePoint($codePoint);
+   |  $offset += Character.charCount($codePoint);
+   |}
+ """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(transformChar(
+codePoint,
+upperReplacement,
+lowerReplacement,
+digitReplacement,
+defaultMaskedOther))
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+
+  def appendUnchangedToStringBuffer(
+  sb: StringBuffer,
+  inputString: String,
+  startOffset: Int,
+  numChars: Int): Int = {
+var offset = startOffset
+(1 to numChars) foreach { _ =>
+  val codePoint = inputString.codePointAt(offset)
+  sb.appendCodePoint(codePoint)
+  offset += Character.charCount(codePoint)
+}
+offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187253468
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
--- End diff --

I think we don't need to follow Hive behavior perfectly and we should 
handle proper way.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187253483
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
+   |}
+ """.stripMargin
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask operations.
+ */
+object MaskLike {
+  val defaultCharCount = 4
+
+  def extractCharCount(e: Expression): Int = e match {
+case Literal(i, IntegerType|NullType) =>
+  if (i == null) defaultCharCount else i.asInstanceOf[Int]
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${IntegerType.simpleString}, but got literal of 
${dt.simpleString}")
+case _ => defaultCharCount
+  }
+
+  def extractReplacement(e: Expression): String = e match {
+case Literal(s, StringType|NullType) => if (s == null) null else 
s.toString
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${StringType.simpleString}, but got literal of ${dt.simpleString}")
+case _ => null
+  }
+}
+
+/**
+ * Masks the input string. Additional parameters can be set to change the 
masking chars for
+ * uppercase letters, lowercase letters and digits.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(str[, upper[, lower[, digit]]]) - Masks str. By default, 
upper case letters are converted to \"X\", lower case letters are converted to 
\"x\" and numbers are converted to \"n\". You can override the characters used 
in the mask by supplying additional arguments: the second argument controls the 
mask character for upper 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187025303
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
+   |}
+ """.stripMargin
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask operations.
+ */
+object MaskLike {
+  val defaultCharCount = 4
+
+  def extractCharCount(e: Expression): Int = e match {
+case Literal(i, IntegerType|NullType) =>
+  if (i == null) defaultCharCount else i.asInstanceOf[Int]
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${IntegerType.simpleString}, but got literal of 
${dt.simpleString}")
+case _ => defaultCharCount
+  }
+
+  def extractReplacement(e: Expression): String = e match {
+case Literal(s, StringType|NullType) => if (s == null) null else 
s.toString
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${StringType.simpleString}, but got literal of ${dt.simpleString}")
+case _ => null
+  }
+}
+
+/**
+ * Masks the input string. Additional parameters can be set to change the 
masking chars for
+ * uppercase letters, lowercase letters and digits.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(str[, upper[, lower[, digit]]]) - Masks str. By default, 
upper case letters are converted to \"X\", lower case letters are converted to 
\"x\" and numbers are converted to \"n\". You can override the characters used 
in the mask by supplying additional arguments: the second argument controls the 
mask character for upper 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187021056
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
+  final static int UNMASKED_VAL = -1;
+
+  /**
+   *
+   * @param c the character to transform
+   * @param maskedUpperChar the character to use instead of a uppercase 
letter
+   * @param maskedLowerChar the character to use instead of a lowercase 
letter
+   * @param maskedDigitChar the character to use instead of a digit
+   * @param maskedOtherChar the character to use instead of a any other 
character
+   * @return masking character for {@param c}
+   */
+  public static int transformChar(
+  final int c,
+  int maskedUpperChar,
+  int maskedLowerChar,
+  int maskedDigitChar,
+  int maskedOtherChar) {
+switch(Character.getType(c)) {
+  case Character.UPPERCASE_LETTER:
+if(maskedUpperChar != UNMASKED_VAL) {
+  return maskedUpperChar;
+}
+break;
+
+  case Character.LOWERCASE_LETTER:
+if(maskedLowerChar != UNMASKED_VAL) {
+  return maskedLowerChar;
+}
+break;
+
+  case Character.DECIMAL_DIGIT_NUMBER:
+if(maskedDigitChar != UNMASKED_VAL) {
+  return maskedDigitChar;
+}
+break;
+
+  default:
+if(maskedOtherChar != UNMASKED_VAL) {
+  return maskedOtherChar;
+}
+break;
+}
+
+return c;
+  }
+
+  /**
+   * Returns the replacement char to use according to the {@param rep} 
specified by the user and
+   * the {@param def} default.
+   */
+  public static int getReplacementChar(String rep, int def) {
+if (rep != null && rep.length() > 0) {
+  return rep.charAt(0);
--- End diff --

again, Hive does this...


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187019563
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
+   |}
+ """.stripMargin
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask operations.
+ */
+object MaskLike {
+  val defaultCharCount = 4
+
+  def extractCharCount(e: Expression): Int = e match {
+case Literal(i, IntegerType|NullType) =>
+  if (i == null) defaultCharCount else i.asInstanceOf[Int]
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${IntegerType.simpleString}, but got literal of 
${dt.simpleString}")
+case _ => defaultCharCount
+  }
+
+  def extractReplacement(e: Expression): String = e match {
+case Literal(s, StringType|NullType) => if (s == null) null else 
s.toString
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${StringType.simpleString}, but got literal of ${dt.simpleString}")
+case _ => null
+  }
+}
+
+/**
+ * Masks the input string. Additional parameters can be set to change the 
masking chars for
+ * uppercase letters, lowercase letters and digits.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(str[, upper[, lower[, digit]]]) - Masks str. By default, 
upper case letters are converted to \"X\", lower case letters are converted to 
\"x\" and numbers are converted to \"n\". You can override the characters used 
in the mask by supplying additional arguments: the second argument controls the 
mask character for upper 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187019117
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
--- End diff --

Hive uses `charAt`. So I kept its implementation in order to be consistent 
with it. I think this depends on our goal. If we want to reflect Hive's 
behavior (as I assumed), I think we should not change this. Otherwise we have 
to decide what to do, ie. how these functions are supposed to behave.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187016243
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
--- End diff --

I think this makes the code easier (more compact). But I can put them in 
the companion object and import all them if you prefer.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r186999333
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
--- End diff --

`codePointAt()` instead of `charAt()`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r186999228
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/MaskExpressionsUtils.java
 ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions;
+
+/**
+ * Contains all the Utils methods used in the masking expressions.
+ */
+public class MaskExpressionsUtils {
+  final static int UNMASKED_VAL = -1;
+
+  /**
+   *
+   * @param c the character to transform
+   * @param maskedUpperChar the character to use instead of a uppercase 
letter
+   * @param maskedLowerChar the character to use instead of a lowercase 
letter
+   * @param maskedDigitChar the character to use instead of a digit
+   * @param maskedOtherChar the character to use instead of a any other 
character
+   * @return masking character for {@param c}
+   */
+  public static int transformChar(
+  final int c,
+  int maskedUpperChar,
+  int maskedLowerChar,
+  int maskedDigitChar,
+  int maskedOtherChar) {
+switch(Character.getType(c)) {
+  case Character.UPPERCASE_LETTER:
+if(maskedUpperChar != UNMASKED_VAL) {
+  return maskedUpperChar;
+}
+break;
+
+  case Character.LOWERCASE_LETTER:
+if(maskedLowerChar != UNMASKED_VAL) {
+  return maskedLowerChar;
+}
+break;
+
+  case Character.DECIMAL_DIGIT_NUMBER:
+if(maskedDigitChar != UNMASKED_VAL) {
+  return maskedDigitChar;
+}
+break;
+
+  default:
+if(maskedOtherChar != UNMASKED_VAL) {
+  return maskedOtherChar;
+}
+break;
+}
+
+return c;
+  }
+
+  /**
+   * Returns the replacement char to use according to the {@param rep} 
specified by the user and
+   * the {@param def} default.
+   */
+  public static int getReplacementChar(String rep, int def) {
+if (rep != null && rep.length() > 0) {
+  return rep.charAt(0);
--- End diff --

`codePointAt()`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187002866
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
+   |}
+ """.stripMargin
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask operations.
+ */
+object MaskLike {
+  val defaultCharCount = 4
+
+  def extractCharCount(e: Expression): Int = e match {
+case Literal(i, IntegerType|NullType) =>
+  if (i == null) defaultCharCount else i.asInstanceOf[Int]
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${IntegerType.simpleString}, but got literal of 
${dt.simpleString}")
+case _ => defaultCharCount
+  }
+
+  def extractReplacement(e: Expression): String = e match {
+case Literal(s, StringType|NullType) => if (s == null) null else 
s.toString
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${StringType.simpleString}, but got literal of ${dt.simpleString}")
+case _ => null
+  }
+}
+
+/**
+ * Masks the input string. Additional parameters can be set to change the 
masking chars for
+ * uppercase letters, lowercase letters and digits.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(str[, upper[, lower[, digit]]]) - Masks str. By default, 
upper case letters are converted to \"X\", lower case letters are converted to 
\"x\" and numbers are converted to \"n\". You can override the characters used 
in the mask by supplying additional arguments: the second argument controls the 
mask character for upper 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187000675
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
+   |}
+ """.stripMargin
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask operations.
+ */
+object MaskLike {
+  val defaultCharCount = 4
+
+  def extractCharCount(e: Expression): Int = e match {
+case Literal(i, IntegerType|NullType) =>
+  if (i == null) defaultCharCount else i.asInstanceOf[Int]
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${IntegerType.simpleString}, but got literal of 
${dt.simpleString}")
+case _ => defaultCharCount
+  }
+
+  def extractReplacement(e: Expression): String = e match {
+case Literal(s, StringType|NullType) => if (s == null) null else 
s.toString
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${StringType.simpleString}, but got literal of ${dt.simpleString}")
+case _ => null
+  }
+}
+
+/**
+ * Masks the input string. Additional parameters can be set to change the 
masking chars for
+ * uppercase letters, lowercase letters and digits.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(str[, upper[, lower[, digit]]]) - Masks str. By default, 
upper case letters are converted to \"X\", lower case letters are converted to 
\"x\" and numbers are converted to \"n\". You can override the characters used 
in the mask by supplying additional arguments: the second argument controls the 
mask character for upper 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187000263
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
+   |}
+ """.stripMargin
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask operations.
+ */
+object MaskLike {
+  val defaultCharCount = 4
+
+  def extractCharCount(e: Expression): Int = e match {
+case Literal(i, IntegerType|NullType) =>
+  if (i == null) defaultCharCount else i.asInstanceOf[Int]
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${IntegerType.simpleString}, but got literal of 
${dt.simpleString}")
+case _ => defaultCharCount
+  }
+
+  def extractReplacement(e: Expression): String = e match {
+case Literal(s, StringType|NullType) => if (s == null) null else 
s.toString
+case Literal(_, dt) => throw new AnalysisException(s"Expected literal 
expression of type " +
+  s"${StringType.simpleString}, but got literal of ${dt.simpleString}")
+case _ => null
+  }
+}
+
+/**
+ * Masks the input string. Additional parameters can be set to change the 
masking chars for
+ * uppercase letters, lowercase letters and digits.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(str[, upper[, lower[, digit]]]) - Masks str. By default, 
upper case letters are converted to \"X\", lower case letters are converted to 
\"x\" and numbers are converted to \"n\". You can override the characters used 
in the mask by supplying additional arguments: the second argument controls the 
mask character for upper 

[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r186999359
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
+
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, 
defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, 
defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, 
defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = 
classOf[MaskExpressionsUtils].getName
+
+  def maskAndAppendToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  
$sb.appendCodePoint($maskUtilsClassName.transformChar($inputString.charAt($i),
+   |$upperReplacement, $lowerReplacement,
+   |$digitReplacement, $defaultMaskedOther));
+   |}
+ """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+  ctx: CodegenContext,
+  sb: String,
+  inputString: String,
+  start: String,
+  end: String): String = {
+val i = ctx.freshName("i")
+s"""
+   |for (${CodeGenerator.JAVA_INT} $i = $start; $i < $end; $i ++) {
+   |  $sb.appendCodePoint($inputString.charAt($i));
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-09 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r186993629
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala
 ---
@@ -0,0 +1,534 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, 
CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  val defaultMaskedUppercase: Int = 'X'
+  val defaultMaskedLowercase: Int = 'x'
+  val defaultMaskedDigit: Int = 'n'
+  val defaultMaskedOther: Int = MaskExpressionsUtils.UNMASKED_VAL
--- End diff --

We should move these into companion object?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-05 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/21246

[SPARK-23901][SQL] Add masking functions

## What changes were proposed in this pull request?

The PR adds the masking function as they are described in Hive's 
documentation: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DataMaskingFunctions.
This means that only `string`s are accepted as parameter for the masking 
functions.

## How was this patch tested?

added UTs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23901

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21246.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21246


commit 40cffbdd8e5079e86ba3a3b3838e305cf2a3a0fe
Author: Marco Gaido 
Date:   2018-04-13T12:37:56Z

[SPARK-23901][SQL] Add masking functions




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org