[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18023


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126753014
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceTest.scala ---
@@ -20,14 +20,17 @@ package org.apache.spark.sql.sources
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
 private[sql] abstract class DataSourceTest extends QueryTest {
 
-  protected def sqlTest(sqlString: String, expectedAnswer: Seq[Row]) {
+  protected def sqlTest(sqlString: String, expectedAnswer: Seq[Row], enableRegex: String = "true") {
--- End diff --

updated





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126751775
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
--- End diff --

added





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126749797
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+            UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name),
+              SQLConf.get.caseSensitiveAnalysis)
+          case _ =>
+            UnresolvedAttribute(nameParts :+ attr)
+        }
       case e =>
         UnresolvedExtractValue(e, Literal(attr))
     }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex
+   * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) {
-    UnresolvedAttribute.quoted(ctx.getText)
+    ctx.getStart.getText match {
+      case escapedIdentifier(columnNameRegex)
+          if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
--- End diff --

rolled back to conf





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126749650
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+            UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name),
+              SQLConf.get.caseSensitiveAnalysis)
+          case _ =>
+            UnresolvedAttribute(nameParts :+ attr)
+        }
       case e =>
         UnresolvedExtractValue(e, Literal(attr))
     }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex
+   * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) {
-    UnresolvedAttribute.quoted(ctx.getText)
+    ctx.getStart.getText match {
+      case escapedIdentifier(columnNameRegex)
+          if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+        UnresolvedRegex(columnNameRegex, None, SQLConf.get.caseSensitiveAnalysis)
--- End diff --

rolled back to conf.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126749474
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+            UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name),
+              SQLConf.get.caseSensitiveAnalysis)
--- End diff --

rolled back to conf.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126749439
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
--- End diff --

updated





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126609171
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
--- End diff --

Please add a comment above this function to explain why `regex` can be 
applied under `NamedExpression` only. Thanks!
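The rationale the reviewer is asking for can be sketched outside ANTLR: regex column expansion only makes sense where a list of output columns (NamedExpressions, i.e. a SELECT list) is expected, so the check walks the parse-tree ancestors looking for a `NamedExpressionContext`. A minimal self-contained sketch, where `Node` is a hypothetical stand-in for ANTLR's `ParserRuleContext` (not Spark code):

```scala
// Hypothetical stand-in for ANTLR's ParserRuleContext: just a name and a parent link.
final case class Node(name: String, parent: Node)

// Mirrors the canApplyRegex walk: climb toward the root and succeed only if
// some ancestor is a NamedExpressionContext (i.e. we are inside a SELECT item).
def canApplyRegex(ctx: Node): Boolean = {
  var parent = ctx.parent
  while (parent != null) {
    if (parent.name == "NamedExpressionContext") return true
    parent = parent.parent
  }
  false
}

val selectItem = Node("NamedExpressionContext", null)
val deref = Node("DereferenceContext", selectItem)
println(canApplyRegex(deref))                      // true: under a SELECT item
println(canApplyRegex(Node("WhereClause", null)))  // false: not a projection
```

Outside a projection (e.g. in a WHERE clause), a backticked token must keep its usual meaning as a quoted identifier, which is why the walk returns false there.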





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126607452
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/DataSourceTest.scala ---
@@ -20,14 +20,17 @@ package org.apache.spark.sql.sources
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
 private[sql] abstract class DataSourceTest extends QueryTest {
 
-  protected def sqlTest(sqlString: String, expectedAnswer: Seq[Row]) {
+  protected def sqlTest(sqlString: String, expectedAnswer: Seq[Row], enableRegex: String = "true") {
--- End diff --

`enableRegex: String = "true"` -> `enableRegex: Boolean = false`

Could you change the type to Boolean and call .toString below and set the 
default to `false`?
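The suggested shape can be sketched self-contained: `enableRegex` becomes a `Boolean` defaulting to `false`, and `.toString` is applied only at the point where the string-valued SQL conf is set. `Row` here is a stub for `org.apache.spark.sql.Row`, and the returned string merely illustrates the conf pair the real suite would pass to `withSQLConf`:

```scala
// Stub for org.apache.spark.sql.Row so the sketch runs without Spark.
final case class Row(values: Any*)

// Boolean parameter with a false default; conf values are stored as strings,
// so .toString is applied where the conf pair is built. In the real suite:
// withSQLConf(SQLConf.SUPPORT_QUOTED_REGEX_COLUMN_NAME.key -> enableRegex.toString) { ... }
def sqlTest(sqlString: String, expectedAnswer: Seq[Row], enableRegex: Boolean = false): String =
  s"spark.sql.parser.quotedRegexColumnNames=${enableRegex.toString}"

println(sqlTest("SELECT a FROM t", Seq(Row(1))))                          // ...=false
println(sqlTest("SELECT `a.*` FROM t", Seq(Row(1)), enableRegex = true))  // ...=true
```

A Boolean with a `false` default keeps existing call sites type-safe and opt-in, rather than passing magic strings around the test helpers.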





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126607036
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -922,59 +922,61 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
   }
 
   test("Applying schemas with MapType") {
-    val schemaWithSimpleMap = StructType(
-      StructField("map", MapType(StringType, IntegerType, true), false) :: Nil)
-    val jsonWithSimpleMap = spark.read.schema(schemaWithSimpleMap).json(mapType1)
+    withSQLConf(SQLConf.SUPPORT_QUOTED_REGEX_COLUMN_NAME.key -> "false") {
+      val schemaWithSimpleMap = StructType(
+        StructField("map", MapType(StringType, IntegerType, true), false) :: Nil)
+      val jsonWithSimpleMap = spark.read.schema(schemaWithSimpleMap).json(mapType1)
 
-    jsonWithSimpleMap.createOrReplaceTempView("jsonWithSimpleMap")
+      jsonWithSimpleMap.createOrReplaceTempView("jsonWithSimpleMap")
 
-    checkAnswer(
-      sql("select `map` from jsonWithSimpleMap"),
-      Row(Map("a" -> 1)) ::
-      Row(Map("b" -> 2)) ::
-      Row(Map("c" -> 3)) ::
-      Row(Map("c" -> 1, "d" -> 4)) ::
-      Row(Map("e" -> null)) :: Nil
-    )
+      checkAnswer(
+        sql("select `map` from jsonWithSimpleMap"),
+        Row(Map("a" -> 1)) ::
+          Row(Map("b" -> 2)) ::
+          Row(Map("c" -> 3)) ::
+          Row(Map("c" -> 1, "d" -> 4)) ::
+          Row(Map("e" -> null)) :: Nil
+      )
 
-    checkAnswer(
-      sql("select `map`['c'] from jsonWithSimpleMap"),
-      Row(null) ::
-      Row(null) ::
-      Row(3) ::
-      Row(1) ::
-      Row(null) :: Nil
-    )
+      checkAnswer(
+        sql("select `map`['c'] from jsonWithSimpleMap"),
+        Row(null) ::
+          Row(null) ::
+          Row(3) ::
+          Row(1) ::
+          Row(null) :: Nil
+      )
 
-    val innerStruct = StructType(
-      StructField("field1", ArrayType(IntegerType, true), true) ::
-      StructField("field2", IntegerType, true) :: Nil)
-    val schemaWithComplexMap = StructType(
-      StructField("map", MapType(StringType, innerStruct, true), false) :: Nil)
+      val innerStruct = StructType(
+        StructField("field1", ArrayType(IntegerType, true), true) ::
+          StructField("field2", IntegerType, true) :: Nil)
+      val schemaWithComplexMap = StructType(
+        StructField("map", MapType(StringType, innerStruct, true), false) :: Nil)
 
-    val jsonWithComplexMap = spark.read.schema(schemaWithComplexMap).json(mapType2)
+      val jsonWithComplexMap = spark.read.schema(schemaWithComplexMap).json(mapType2)
 
-    jsonWithComplexMap.createOrReplaceTempView("jsonWithComplexMap")
+      jsonWithComplexMap.createOrReplaceTempView("jsonWithComplexMap")
 
-    checkAnswer(
-      sql("select `map` from jsonWithComplexMap"),
-      Row(Map("a" -> Row(Seq(1, 2, 3, null), null))) ::
-      Row(Map("b" -> Row(null, 2))) ::
-      Row(Map("c" -> Row(Seq(), 4))) ::
-      Row(Map("c" -> Row(null, 3), "d" -> Row(Seq(null), null))) ::
-      Row(Map("e" -> null)) ::
-      Row(Map("f" -> Row(null, null))) :: Nil
-    )
+      checkAnswer(
+        sql("select `map` from jsonWithComplexMap"),
+        Row(Map("a" -> Row(Seq(1, 2, 3, null), null))) ::
+          Row(Map("b" -> Row(null, 2))) ::
+          Row(Map("c" -> Row(Seq(), 4))) ::
+          Row(Map("c" -> Row(null, 3), "d" -> Row(Seq(null), null))) ::
+          Row(Map("e" -> null)) ::
+          Row(Map("f" -> Row(null, null))) :: Nil
+      )
--- End diff --

Could you revert back all the unneeded changes?
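The only intended change in the hunk above is wrapping the test body in `withSQLConf(...)`; the surrounding indentation churn is what is being asked to revert. The save/set/restore pattern `withSQLConf` implements can be sketched with a plain map standing in for Spark's conf (a hedged stub, not the real `SQLTestUtils` helper):

```scala
// Stand-in for the session conf: a mutable map of string key/value pairs.
var conf = Map("spark.sql.parser.quotedRegexColumnNames" -> "true")

// Apply the overrides for the duration of `body`, then always restore the
// previous values, even if the body throws.
def withSQLConf(pairs: (String, String)*)(body: => Unit): Unit = {
  val saved = conf
  conf = conf ++ pairs
  try body finally conf = saved
}

withSQLConf("spark.sql.parser.quotedRegexColumnNames" -> "false") {
  println(conf("spark.sql.parser.quotedRegexColumnNames"))  // false inside the block
}
println(conf("spark.sql.parser.quotedRegexColumnNames"))    // true again afterwards
```

Because restoration happens in `finally`, one test's conf override can never leak into the next test, which is why the whole body gets wrapped rather than toggling the flag manually.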





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126606904
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+            UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name),
+              SQLConf.get.caseSensitiveAnalysis)
+          case _ =>
+            UnresolvedAttribute(nameParts :+ attr)
+        }
      case e =>
         UnresolvedExtractValue(e, Literal(attr))
     }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex
+   * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) {
-    UnresolvedAttribute.quoted(ctx.getText)
+    ctx.getStart.getText match {
+      case escapedIdentifier(columnNameRegex)
+          if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
--- End diff --

The same here





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126606843
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
--- End diff --

Sorry, we recently reverted a PR; as a result, we are unable to use `SQLConf.get` in the parser.

Could you please change `SQLConf.get` back to `conf`?
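The requested change can be sketched with stubs: the flag is read from the constructor-supplied `conf` (note `AstBuilder(conf: SQLConf)` in the hunk header) instead of the thread-local `SQLConf.get`. `SQLConf` below is a hypothetical stand-in for `org.apache.spark.sql.internal.SQLConf`, reduced to the one field this discussion touches:

```scala
// Hypothetical stub of SQLConf, reduced to the flag under discussion.
final case class SQLConf(supportQuotedRegexColumnName: Boolean)

// The parser receives its conf explicitly at construction time, so guards in
// visitDereference/visitColumnReference should consult `conf`, not a
// thread-local lookup that may not be populated when the parser runs.
class AstBuilder(conf: SQLConf) {
  def regexEnabled: Boolean = conf.supportQuotedRegexColumnName
}

println(new AstBuilder(SQLConf(true)).regexEnabled)   // true
println(new AstBuilder(SQLConf(false)).regexEnabled)  // false
```

Passing the conf explicitly also makes the parser's behavior deterministic per instance, rather than dependent on whichever session conf is active on the calling thread.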





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126606893
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+            UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name),
+              SQLConf.get.caseSensitiveAnalysis)
+          case _ =>
+            UnresolvedAttribute(nameParts :+ attr)
+        }
      case e =>
         UnresolvedExtractValue(e, Literal(attr))
     }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex
+   * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) {
-    UnresolvedAttribute.quoted(ctx.getText)
+    ctx.getStart.getText match {
+      case escapedIdentifier(columnNameRegex)
+          if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+        UnresolvedRegex(columnNameRegex, None, SQLConf.get.caseSensitiveAnalysis)
--- End diff --

The same here





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126606864
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1260,26 +1260,51 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
     CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def canApplyRegex(ctx: ParserRuleContext): Boolean = withOrigin(ctx) {
+    var parent = ctx.getParent
+    while (parent != null) {
+      if (parent.isInstanceOf[NamedExpressionContext]) return true
+      parent = parent.getParent
+    }
+    return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(columnNameRegex)
+              if SQLConf.get.supportQuotedRegexColumnName && canApplyRegex(ctx) =>
+            UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name),
+              SQLConf.get.caseSensitiveAnalysis)
--- End diff --

The same here.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-10 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126517322
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
       .intConf
       .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = buildConf("spark.sql.parser.quotedRegexColumnNames")
+    .doc("When true, quoted Identifiers (using backticks) in SELECT statement are interpreted" +
--- End diff --

yes, df.groupBy("a", "b").agg(df.col("`(a)?+.+`")).show works too.
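What the backtick-quoted pattern `(a)?+.+` selects can be reproduced with plain Java-regex matching, since Spark matches the pattern against output column names (the column list below is illustrative, not from the PR):

```scala
// Illustrative column names to match the quoted-regex pattern against.
val columns = Seq("a", "b", "ab", "count")
// Possessive `(a)?+` consumes a leading "a" without backtracking, then `.+`
// requires at least one more character, so the bare column "a" never matches.
val matched = columns.filter(_.matches("(a)?+.+"))
println(matched)  // List(b, ab, count)
```

This is the classic trick for "every column except exactly `a`": the possessive quantifier prevents the regex engine from giving the `a` back to `.+`.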





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126271044
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

@janewangfb You can do something like:

scala> val df = Seq((1, 2), (2, 3), (3, 4)).toDF("a", "b")
df: org.apache.spark.sql.DataFrame = [a: int, b: int]
scala> df.groupBy("a", "b").agg(df.col("*")).show
+---+---+---+---+
|  a|  b|  a|  b|
+---+---+---+---+
|  2|  3|  2|  3|
|  1|  2|  1|  2|
|  3|  4|  3|  4|
+---+---+---+---+

So I guess you can also do something like:

scala> df.groupBy("a", "b").agg(df.colRegex("`...`"))







[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126262758
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

@viirya I tried it out; e.g.,

val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS()
ds.groupByKey(_._1).agg(sum("*").as[Long])

is not supported.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126262446
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,8 +1188,29 @@ class Dataset[T] private[sql](
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
 case _ =>
-  val expr = resolve(colName)
-  Column(expr)
+  if (sqlContext.conf.supportQuotedRegexColumnName) {
+colRegex(colName)
+  } else {
+val expr = resolve(colName)
+Column(expr)
+  }
+  }
+
+  /**
+   * Selects column based on the column name specified as a regex and 
return it as [[Column]].
+   * @group untypedrel
+   * @since 2.3.0
+   */
+  def colRegex(colName: String): Column = {
--- End diff --

I have tested it out; it works for both cases, and I have added a test case in 
DatasetSuite.scala.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126256818
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1256,26 +1256,51 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
 CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def isContextNamedExpression(ctx: ParserRuleContext): Boolean = 
withOrigin(ctx) {
+var parent = ctx.getParent
+while (parent != null) {
+  if (parent.isInstanceOf[NamedExpressionContext]) return true
+  parent = parent.getParent
+}
+return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex)
+if SQLConf.get.supportQuotedRegexColumnName && 
isContextNamedExpression(ctx) =>
--- End diff --

I think we should limit it to named expressions; otherwise, other places such as 
a WHERE clause would expand the regex, which does not make sense (see my comment 
to @gatorsmile), e.g., WHERE `(a)?+.+` = 1





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126247287
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

@gatorsmile: quoted identifiers that are not in a SELECT list are not affected.
However, if one appears in a project/aggregation it will be affected: e.g., 
`(a)?+.+` was previously an invalid identifier, but now it will be expanded as a 
regex.
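
The expansion described above comes down to ordinary Java regex matching over the column names. A standalone sketch (no Spark needed; the column names are illustrative) of why `(a)?+.+` means "every column except a":

```scala
// (a)?+ is a possessive optional group: once it consumes "a" it never
// backtracks, so for the name "a" the trailing .+ has nothing left to
// match and the whole pattern fails. Every other name matches via .+ alone.
val columns = Seq("a", "b", "c", "d")
val expanded = columns.filter(_.matches("(a)?+.+"))
// expanded == Seq("b", "c", "d")
```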







[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126088785
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,8 +1188,29 @@ class Dataset[T] private[sql](
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
 case _ =>
-  val expr = resolve(colName)
-  Column(expr)
+  if (sqlContext.conf.supportQuotedRegexColumnName) {
+colRegex(colName)
+  } else {
+val expr = resolve(colName)
+Column(expr)
+  }
+  }
+
+  /**
+   * Selects column based on the column name specified as a regex and 
return it as [[Column]].
+   * @group untypedrel
+   * @since 2.3.0
+   */
+  def colRegex(colName: String): Column = {
--- End diff --

`col` returns a column resolved against the current `Dataset`. `colRegex` can 
now return an unresolved one. Seems OK, but is there any way this can go wrong?

For example, we can do:

val colRegex1 = df1.colRegex("`...`")  // colRegex1 is unresolved.
df2.select(colRegex1)

But you can't do the same thing with `col`.






[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126087546
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

Is it possible for users to use a regex column in an agg, such as 
`testData2.groupBy($"a", $"b").agg($"`...`")`? In the analyzer, it seems we 
process `Star` in both `Project` and `Aggregate`.







[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126084902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1256,26 +1256,51 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
 CaseWhen(branches, Option(ctx.elseExpression).map(expression))
   }
 
+  private def isContextNamedExpression(ctx: ParserRuleContext): Boolean = 
withOrigin(ctx) {
+var parent = ctx.getParent
+while (parent != null) {
+  if (parent.isInstanceOf[NamedExpressionContext]) return true
+  parent = parent.getParent
+}
+return false
+  }
+
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex)
+if SQLConf.get.supportQuotedRegexColumnName && 
isContextNamedExpression(ctx) =>
--- End diff --

Is it required to have a named expression context?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126079978
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

I agree. It only makes sense when we use it in a SELECT statement. However, our 
parser allows quoted identifiers (using backticks) in any part of a SQL 
statement. Below is just one example. If we turn on this conf flag, will it 
cause problems for other users who have quoted identifiers in a query outside 
the Project/SELECT list?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-07-05 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r125728877
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

We should only support SELECT. It does not make sense to do SELECT a FROM test 
WHERE `(a)?+.+` = 3.

Also, Hive 
(https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select) only 
supports it in SELECT statements:
"REGEX Column Specification
A SELECT statement can take regex-based column specification in Hive releases 
prior to 0.13.0, or in 0.13.0 and later releases if the configuration property 
hive.support.quoted.identifiers is set to none."





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r125155245
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, quoted Identifiers (using backticks) in SELECT 
statement are interpreted" +
--- End diff --

Not only in SELECT statements; it can appear in almost any query.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-30 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r125147853
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,11 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, a SELECT statement can take regex-based column 
specification.")
--- End diff --

updated





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r125144859
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,11 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, a SELECT statement can take regex-based column 
specification.")
--- End diff --

We also need to explain the impact in the description. For example,
> When true, quoted Identifiers (using backticks) are interpreted as 
regular expressions. 

Feel free to input the above text





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-30 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124973097
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -307,6 +311,28 @@ case class UnresolvedStar(target: Option[Seq[String]]) 
extends Star with Unevalu
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => 
input.output.filter(_.name.matches(s"(?i)$regexPattern"))
--- End diff --

Updated the code with conf caseSensitiveAnalysis





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124964462
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -307,6 +311,28 @@ case class UnresolvedStar(target: Option[Seq[String]]) 
extends Star with Unevalu
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => 
input.output.filter(_.name.matches(s"(?i)$regexPattern"))
--- End diff --

You need to check the conf 
`sparkSession.sessionState.conf.caseSensitiveAnalysis`
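
The check requested here amounts to prefixing the user's pattern with Java's inline case-insensitivity flag whenever the analyzer is case-insensitive. A hedged standalone sketch (the helper name is made up; the real patch reads SQLConf.caseSensitiveAnalysis inside `UnresolvedRegex.expand`):

```scala
// Illustrative only: filter attribute names by a regex, honoring a
// caseSensitive flag the way the updated expansion code does.
def matchingNames(names: Seq[String], pattern: String, caseSensitive: Boolean): Seq[String] = {
  // (?i) turns on Pattern.CASE_INSENSITIVE for the rest of the pattern.
  val effective = if (caseSensitive) pattern else s"(?i)$pattern"
  names.filter(_.matches(effective))
}

matchingNames(Seq("Key", "value1"), "key", caseSensitive = false)  // Seq("Key")
matchingNames(Seq("Key", "value1"), "key", caseSensitive = true)   // Seq()
```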





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124456075
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -244,6 +244,71 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   ("a", ClassData("a", 1)), ("b", ClassData("b", 2)), ("c", 
ClassData("c", 3)))
   }
 
+  test("REGEX column specification") {
+val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS()
+
+intercept[AnalysisException] {
+  ds.select(expr("`(_1)?+.+`").as[Int])
+}
+
+intercept[AnalysisException] {
+  ds.select(expr("`(_1|_2)`").as[Int])
+}
+
+intercept[AnalysisException] {
+  ds.select(ds("`(_1)?+.+`"))
+}
+
+intercept[AnalysisException] {
+  ds.select(ds("`(_1|_2)`"))
+}
--- End diff --

updated with message





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124456024
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -123,7 +124,14 @@ case class UnresolvedAttribute(nameParts: Seq[String]) 
extends Attribute with Un
 
   override def toString: String = s"'$name"
 
-  override def sql: String = quoteIdentifier(name)
+  override def sql: String = {
+name match {
+  case ParserUtils.escapedIdentifier(_) |
+   ParserUtils.qualifiedEscapedIdentifier(_, _) => name
--- End diff --

shortened





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124455888
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1189,8 +1189,24 @@ class Dataset[T] private[sql](
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
 case _ =>
-  val expr = resolve(colName)
-  Column(expr)
+  if (sqlContext.conf.supportQuotedRegexColumnName) {
+colRegex(colName)
+  } else {
+val expr = resolve(colName)
+Column(expr)
+  }
+  }
+
+  /**
+   * Selects column based on the column name specified as a regex and 
return it as [[Column]].
--- End diff --

added





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124455687
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql ---
@@ -0,0 +1,24 @@
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1", "11"), (2, "2", "22"), (3, "3", "33"), (4, "4", "44"), (5, "5", 
"55"), (6, "6", "66")
+AS testData(key, value1, value2);
+
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1, 1, 2), (1, 2, 1, 2), (2, 1, 2, 3), (2, 2, 2, 3), (3, 1, 3, 4), (3, 
2, 3, 4)
+AS testData2(a, b, c, d);
+
+-- AnalysisException
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+SELECT `(a|b)` FROM testData2 WHERE a = 2;
+SELECT `(a|b)?+.+` FROM testData2 WHERE a = 2;
+
+set spark.sql.parser.quotedRegexColumnNames=true;
+
+-- Regex columns
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+SELECT `(a|b)` FROM testData2 WHERE a = 2;
+SELECT `(a|b)?+.+` FROM testData2 WHERE a = 2;
+SELECT `(e|f)` FROM testData2;
+SELECT t.`(e|f)` FROM testData2 t;
+SELECT p.`(key)?+.+`, b, testdata2.`(b)?+.+` FROM testData p join 
testData2 ON p.key = testData2.a WHERE key < 3;
--- End diff --

added





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124415882
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -244,6 +244,71 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   ("a", ClassData("a", 1)), ("b", ClassData("b", 2)), ("c", 
ClassData("c", 3)))
   }
 
+  test("REGEX column specification") {
+val ds = Seq(("a", 1), ("b", 2), ("c", 3)).toDS()
+
+intercept[AnalysisException] {
+  ds.select(expr("`(_1)?+.+`").as[Int])
+}
+
+intercept[AnalysisException] {
+  ds.select(expr("`(_1|_2)`").as[Int])
+}
+
+intercept[AnalysisException] {
+  ds.select(ds("`(_1)?+.+`"))
+}
+
+intercept[AnalysisException] {
+  ds.select(ds("`(_1|_2)`"))
+}
--- End diff --

Could you capture the exception error messages? It can help reviewers 
ensure the error messages are correct. 

For example
```Scala
val e = intercept[AnalysisException] { 
ds.select(expr("`(_1)?+.+`").as[Int]) }.getMessage
assert(e.contains("xyz"))
```





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124414653
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -123,7 +124,14 @@ case class UnresolvedAttribute(nameParts: Seq[String]) 
extends Attribute with Un
 
   override def toString: String = s"'$name"
 
-  override def sql: String = quoteIdentifier(name)
+  override def sql: String = {
+name match {
+  case ParserUtils.escapedIdentifier(_) |
+   ParserUtils.qualifiedEscapedIdentifier(_, _) => name
--- End diff --

Nit: shorten it to one line





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124401692
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1189,8 +1189,24 @@ class Dataset[T] private[sql](
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
 case _ =>
-  val expr = resolve(colName)
-  Column(expr)
+  if (sqlContext.conf.supportQuotedRegexColumnName) {
+colRegex(colName)
+  } else {
+val expr = resolve(colName)
+Column(expr)
+  }
+  }
+
+  /**
+   * Selects column based on the column name specified as a regex and 
return it as [[Column]].
--- End diff --

Please add `@group untypedrel` and `@since 2.3.0`





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124401574
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql ---
@@ -0,0 +1,24 @@
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1", "11"), (2, "2", "22"), (3, "3", "33"), (4, "4", "44"), (5, "5", 
"55"), (6, "6", "66")
+AS testData(key, value1, value2);
+
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1, 1, 2), (1, 2, 1, 2), (2, 1, 2, 3), (2, 2, 2, 3), (3, 1, 3, 4), (3, 
2, 3, 4)
+AS testData2(a, b, c, d);
+
+-- AnalysisException
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+SELECT `(a|b)` FROM testData2 WHERE a = 2;
+SELECT `(a|b)?+.+` FROM testData2 WHERE a = 2;
+
+set spark.sql.parser.quotedRegexColumnNames=true;
+
+-- Regex columns
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+SELECT `(a|b)` FROM testData2 WHERE a = 2;
+SELECT `(a|b)?+.+` FROM testData2 WHERE a = 2;
+SELECT `(e|f)` FROM testData2;
+SELECT t.`(e|f)` FROM testData2 t;
+SELECT p.`(key)?+.+`, b, testdata2.`(b)?+.+` FROM testData p join 
testData2 ON p.key = testData2.a WHERE key < 3;
--- End diff --

Nit: Please add a new line.
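As background on the patterns in these queries: `(a)?+.+` uses a possessive 
quantifier, so once `(a)?+` consumes an initial `a` there is no backtracking 
and the bare column `a` can never match — which is how Hive expresses "all 
columns except a". A minimal plain-Scala sketch of the expansion (no Spark, 
just `String.matches`; the column list mirrors testData2):

```scala
// Column names from testData2 in the queries above.
val columns = Seq("a", "b", "c", "d")

// Expand a quoted regex over the column names, as the parser would.
def expand(pattern: String): Seq[String] = columns.filter(_.matches(pattern))

// (a)?+.+ excludes the bare column "a": the possessive group consumes "a",
// then .+ has nothing left to match and no backtracking is allowed.
val allButA = expand("(a)?+.+")   // List(b, c, d)
val aOrB    = expand("(a|b)")     // List(a, b)
```

With the config off, the same strings are treated as literal (unresolvable) 
column names, which is why the first four queries raise AnalysisException.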





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124099738
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/query_regex_column.sql.out ---
@@ -0,0 +1,113 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 12
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1", "11"), (2, "2", "22"), (3, "3", "33"), (4, "4", "44"), (5, "5", 
"55"), (6, "6", "66")
+AS testData(key, value1, value2)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1, 1, 2), (1, 2, 1, 2), (2, 1, 2, 3), (2, 2, 2, 3), (3, 1, 3, 4), (3, 
2, 3, 4)
+AS testData2(a, b, c, d)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1
+-- !query 2 schema
+struct<>
+-- !query 2 output
+org.apache.spark.sql.AnalysisException
+cannot resolve '```(a)?+.+```' given input columns: [a, b, c, d]; line 1 
pos 7
--- End diff --

fixed: changed the three backquotes to single quotes





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124099510
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,11 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, column names specified by quoted regex pattern will 
be expanded.")
--- End diff --

Updated the description.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r124099358
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1191,6 +1191,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

refactored the code with a new function colRegex().





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r123901447
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/query_regex_column.sql.out ---
@@ -0,0 +1,113 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 12
+
+
+-- !query 0
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1", "11"), (2, "2", "22"), (3, "3", "33"), (4, "4", "44"), (5, "5", 
"55"), (6, "6", "66")
+AS testData(key, value1, value2)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1, 1, 2), (1, 2, 1, 2), (2, 1, 2, 3), (2, 2, 2, 3), (3, 1, 3, 4), (3, 
2, 3, 4)
+AS testData2(a, b, c, d)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1
+-- !query 2 schema
+struct<>
+-- !query 2 output
+org.apache.spark.sql.AnalysisException
+cannot resolve '```(a)?+.+```' given input columns: [a, b, c, d]; line 1 
pos 7
--- End diff --

The error message is confusing. Three backquote marks are being used. Could 
you please improve it?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r123901251
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,11 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = 
buildConf("spark.sql.parser.quotedRegexColumnNames")
+.doc("When true, column names specified by quoted regex pattern will 
be expanded.")
--- End diff --

Please also update the description here.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-06-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r123901234
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1191,6 +1191,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

@janewangfb Based on the above comment from @cloud-fan and @hvanhovell , 
how about creating a new function `colRegex`? 

`spark.sql.parser.quotedRegexColumnNames` can be defined as a SQL Parser 
specific configuration. That configuration will not affect the new function 
`colRegex`.
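A rough sketch of the proposed split (plain Scala, with `Seq[String]` 
standing in for the Dataset's output attributes; these `col`/`colRegex` 
helpers are illustrative stand-ins, not the real Dataset methods): `col` 
takes the regex path only when the parser config is on, while `colRegex` 
always interprets a backquoted name as a regex.

```scala
// Mirrors ParserUtils.escapedIdentifier: a backquoted name like `(a|b)`.
val escapedIdentifier = "`(.+)`".r

// Always treat a backquoted name as a regex over the column names.
def colRegex(name: String, columns: Seq[String]): Seq[String] = name match {
  case escapedIdentifier(regex) => columns.filter(_.matches(regex))
  case _                        => columns.filter(_ == name)
}

// col() defers to the regex path only when the parser config is enabled;
// otherwise the name is resolved literally, as before this PR.
def col(name: String, columns: Seq[String], quotedRegexEnabled: Boolean): Seq[String] =
  if (quotedRegexEnabled) colRegex(name, columns)
  else columns.filter(_ == name)
```

The point of the split is that `colRegex` stays usable regardless of the 
`spark.sql.parser.quotedRegexColumnNames` setting.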





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-30 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r119194989
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

@cloud-fan, the code path is shared by `select a`, `select a.b`, and the 
WHERE clause. For `select a.b`, the table part also goes down this path, and 
the result would not be right. I rolled back to the code from before last 
Friday (June 23rd, 2017).

Do you have any suggestions? Currently Hive only supports regex column 
expansion in SELECT, and this PR matches the Hive behavior.







[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118806698
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

@cloud-fan I might have misunderstood your last comment "I don't think sql 
a-like syntax is really useful here. How about we create a special cased col 
function that takes a regex?". Can you clarify with some examples? Thanks.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118805375
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

@cloud-fan, I updated the code so that the column field part is always 
treated as a regex when supportQuotedRegexColumnName is enabled





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118794998
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

yes, ctx.fieldName.getText will trim the backquotes





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118749165
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

We cannot avoid detecting the regex string: the name passed in is `xyz` 
(still carrying its backquotes), so we need to match it against the pattern 
to extract the xyz part.
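To illustrate the point, a standalone sketch mirroring the 
`escapedIdentifier` pattern from ParserUtils (not the actual Spark object): 
the incoming name still carries its backquotes, and the regex extractor is 
what strips them.

```scala
// Mirrors ParserUtils.escapedIdentifier from the diff above.
val escapedIdentifier = "`(.+)`".r

// A backquoted name such as `xyz` arrives with its backquotes; the
// extractor pulls out the body so it can be used as a regex.
def extractRegex(name: String): Option[String] = name match {
  case escapedIdentifier(body) => Some(body)  // body is the text inside ``
  case _                       => None        // not a quoted identifier
}
```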





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-26 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118748843
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
--- End diff --

Hive supports regex column specification; see 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select.
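The Hive-style expansion quoted in the diff boils down to a name filter; a 
minimal standalone sketch (plain `(table, column)` tuples in place of 
Spark's NamedExpression, and a case-sensitive `matches` in place of the 
resolver):

```scala
// (table, column) pairs stand in for the input attributes.
val attributes = Seq(("t1", "id"), ("t1", "name"), ("t2", "id"))

// Mirrors UnresolvedRegex.expand: no table -> match against all input
// attributes; with a table -> restrict to that table's attributes first.
def expand(regexPattern: String, table: Option[String]): Seq[(String, String)] =
  table match {
    case None    => attributes.filter(_._2.matches(regexPattern))
    case Some(t) => attributes.filter(a => a._1 == t && a._2.matches(regexPattern))
  }
```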





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-24 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118197805
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

oh sorry I made a mistake, `ctx.fieldName.getText` will trim the backquote?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118160405
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

This won't work. In your first "case", ctx.fieldName.getStart.getText is 
`XYZ`, nameParts is XYZ, and the table part should come from ctx.base.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118159701
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
 ---
@@ -177,6 +177,12 @@ object ParserUtils {
 sb.toString()
   }
 
+  /** the column name pattern in quoted regex without qualifier */
+  val escapedIdentifier = "`(.+)`".r
+
+  /** the column name pattern in quoted regex with qualifier */
+  val qualifiedEscapedIdentifier = ("(.+)" + """.""" + "`(.+)`").r
--- End diff --

When the config is on, we need to extract XYZ from the `XYZ` pattern; that's 
why we need these patterns.
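For the qualified form, a sketch of how the two extractors split a name like 
t.`xyz` (illustrative, mirroring but not identical to the ParserUtils 
definitions — note this sketch escapes the dot, whereas the diff above uses 
a bare `.`):

```scala
// Unqualified: `xyz`; qualified: table.`xyz` (dot escaped here).
val escapedIdentifier = "`(.+)`".r
val qualifiedEscapedIdentifier = ("(.+)" + """\.""" + "`(.+)`").r

// Split "t.`xyz`" into qualifier and regex body; fall back to the
// unqualified form for "`xyz`"; otherwise return the name unchanged.
def parse(name: String): (Option[String], String) = name match {
  case qualifiedEscapedIdentifier(table, body) => (Some(table), body)
  case escapedIdentifier(body)                 => (None, body)
  case _                                       => (None, name)
}
```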





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118145019
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
 ---
@@ -177,6 +177,12 @@ object ParserUtils {
 sb.toString()
   }
 
+  /** the column name pattern in quoted regex without qualifier */
+  val escapedIdentifier = "`(.+)`".r
+
+  /** the column name pattern in quoted regex with qualifier */
+  val qualifiedEscapedIdentifier = ("(.+)" + """.""" + "`(.+)`").r
--- End diff --

These two seem hacky to me; we can always create `UnresolvedRegex` if the 
config is on, and `UnresolvedAttribute` otherwise.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118145607
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
--- End diff --

does hive support it?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118145159
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

I'm not talking about algorithm complexity; I'm saying that we can simplify the logic by avoiding having to detect the regex string.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118145333
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

how about
```
case u @ UnresolvedAttribute(nameParts) if nameParts.length == 1 && 
conf.supportQuotedRegexColumnName =>
  UnresolvedRegex(ctx.fieldName.getStart.getText, Some(nameParts.head))

// If there are more dereferences, turn `UnresolvedRegex` back to 
`UnresolvedAttribute`
case UnresolvedRegex(regex, table) =>
  UnresolvedAttribute(table.toSeq :+ regex)
```
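
A self-contained sketch of this rewrite, with simplified case classes standing in for Spark's expressions (assumptions, not the real classes); note that appending to the `Seq` needs `:+` rather than `+`:

```scala
sealed trait Expr
case class UnresolvedAttribute(nameParts: Seq[String]) extends Expr
case class UnresolvedRegex(regexPattern: String, table: Option[String]) extends Expr

// Single-part base with regex support on: the field becomes a table-qualified
// regex. A further dereference turns the regex back into a plain attribute
// path (`:+` appends to the Seq; `+` would string-concatenate).
def deref(base: Expr, field: String, regexEnabled: Boolean): Expr = base match {
  case UnresolvedAttribute(nameParts) if nameParts.length == 1 && regexEnabled =>
    UnresolvedRegex(field, Some(nameParts.head))
  case UnresolvedRegex(regex, table) =>
    UnresolvedAttribute((table.toSeq :+ regex) :+ field)
  case UnresolvedAttribute(nameParts) =>
    UnresolvedAttribute(nameParts :+ field)
}
```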





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118133059
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
--- End diff --

for this diff, we support column only.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118130193
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

I am not sure I understand what you said. No matter what, colName is a string; you need to figure out whether it is a regex or not.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-23 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r118129729
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

the code complexity will be similar, because if the column is ``, we need 
to extract the pattern; 
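
A quick sketch of what the extraction amounts to: a Scala `Regex` extractor strips the backquotes, and the feature flag (`conf.supportQuotedRegexColumnName` in this PR) decides whether the body is treated as a pattern at all. The pattern below is an assumption in the spirit of `ParserUtils.escapedIdentifier`, not necessarily the exact one Spark uses:

```scala
// Assumed shape of the backquote extractor (cf. ParserUtils.escapedIdentifier).
val escapedIdentifier = "`(.+)`".r

// Left(pattern) when the name is backquoted and regex support is enabled,
// Right(name) when it should be treated as a plain attribute name.
def classify(fieldName: String, regexEnabled: Boolean): Either[String, String] =
  fieldName match {
    case escapedIdentifier(body) if regexEnabled => Left(body)
    case _                                       => Right(fieldName)
  }

classify("`(id)?+.+`", regexEnabled = true)   // Left("(id)?+.+")
classify("id", regexEnabled = true)           // Right("id")
classify("`(id)?+.+`", regexEnabled = false)  // Right("`(id)?+.+`")
```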





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117813681
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

I don't think SQL-like syntax is really useful here. How about we create a special-cased `col` function that takes a regex?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117808657
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

There seems to be no problem if we always go with `UnresolvedRegex`; then we can simplify the code and remove the logic to detect the regex string.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117807843
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
--- End diff --

```
* @param target an optional name that should be the target of the 
expansion.  If omitted all
  *  targets' columns are produced. This can either be a table 
name or struct name. This
  *  is a list of identifiers that is the path of the expansion.
```

shall we support ```record.`(id)?+.+` ```?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117797804
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

No, this improves the current behavior. When it is `a` and a is the column name, it will expand to just column a. If it is `(a)?+.+`, it will be treated as a regular expression and expanded. (The current behavior is to throw an AnalysisException.)
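
The rule described here can be sketched with a plain list of column names standing in for the Dataset's schema (a simplification for illustration, not Spark's actual `Dataset.col`):

```scala
// Backquoted name: prefer an exact column match, else expand as a regex.
// Unquoted name: always a plain column reference.
def resolve(colName: String, columns: Seq[String]): Seq[String] = colName match {
  case "*" => columns
  case q if q.length > 2 && q.startsWith("`") && q.endsWith("`") =>
    val body = q.substring(1, q.length - 1)
    if (columns.contains(body)) Seq(body)   // `a` and a column named a exists
    else columns.filter(_.matches(body))    // otherwise treat the body as a regex
  case plain => Seq(plain)
}

val names = Seq("a", "ab", "b")
resolve("`a`", names)        // Seq("a")
resolve("`(a)?+.+`", names)  // Seq("ab", "b"): every column except "a"
```

The possessive `(a)?+.+` pattern is the Hive idiom for "every column except a": once `a` is consumed, no backtracking lets the bare column `a` satisfy the trailing `.+`.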





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117797457
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

we should only create UnresolvedRegex when necessary.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117797005
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
--- End diff --

this regex is only for column names. 





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117795801
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
--- End diff --

moved





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117795667
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
+  case Some(t) => input.output.filter(_.qualifier.exists(resolver(_, 
t)))
+.filter(_.name.matches(regexPattern))
+}
+  }
+
+  override def toString: String = table.map(_ + ".").getOrElse("") + 
regexPattern
--- End diff --

updated.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117691835
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,12 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
 case "*" =>
   Column(ResolvedStar(queryExecution.analyzed.output))
+case ParserUtils.escapedIdentifier(columnNameRegex)
--- End diff --

this is a breaking change: previously we always treated the input string as a column name (except for star), even if it's quoted.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117690933
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends 
SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type 
of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an 
[[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type 
of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a 
[[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some 
other expression,
+   * it can be [[UnresolvedExtractValue]].
*/
   override def visitDereference(ctx: DereferenceContext): Expression = 
withOrigin(ctx) {
 val attr = ctx.fieldName.getText
 expression(ctx.base) match {
-  case UnresolvedAttribute(nameParts) =>
-UnresolvedAttribute(nameParts :+ attr)
+  case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+ctx.fieldName.getStart.getText match {
+  case escapedIdentifier(columnNameRegex) if 
conf.supportQuotedRegexColumnName =>
+UnresolvedRegex(columnNameRegex, Some(unresolved_attr.name))
+  case _ =>
+UnresolvedAttribute(nameParts :+ attr)
+}
   case e =>
 UnresolvedExtractValue(e, Literal(attr))
 }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] 
if it is a regex
--- End diff --

what if we always create `UnresolvedRegex`?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117690219
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
--- End diff --

shall we consider "struct expansion" like what we did in `UnresolvedStar`?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117690015
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
--- End diff --

can we move it below `UnresolvedStar`?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117689813
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,28 @@ case class UnresolvedTableValuedFunction(
 }
 
 /**
+ * Represents all of the input attributes to a given relational operator, 
for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the 
expansion.  If omitted all
+ *  tables' columns are produced.
+ */
+case class UnresolvedRegex(regexPattern: String, table: Option[String])
+  extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): 
Seq[NamedExpression] = {
+table match {
+  // If there is no table specified, use all input attributes that 
match expr
+  case None => input.output.filter(_.name.matches(regexPattern))
+  // If there is a table, pick out attributes that are part of this 
table that match expr
+  case Some(t) => input.output.filter(_.qualifier.exists(resolver(_, 
t)))
+.filter(_.name.matches(regexPattern))
+}
+  }
+
+  override def toString: String = table.map(_ + ".").getOrElse("") + 
regexPattern
--- End diff --

nit: `table.map(_ + "." + regexPattern).getOrElse(regexPattern)`





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117591864
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -244,6 +244,71 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
   ("a", ClassData("a", 1)), ("b", ClassData("b", 2)), ("c", 
ClassData("c", 3)))
   }
 
+  test("select 3, regex") {
--- End diff --

updated.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117591611
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -244,6 +244,71 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
   ("a", ClassData("a", 1)), ("b", ClassData("b", 2)), ("c", 
ClassData("c", 3)))
   }
 
+  test("select 3, regex") {
--- End diff --

-> `test("REGEX column specification")`





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117589317
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(i) if conf.supportQuotedRegexColumnName =>
--- End diff --

updated





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117589166
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(i) if conf.supportQuotedRegexColumnName =>
+            UnresolvedRegex(i, Some(unresolved_attr.name))
+          case _ =>
+            UnresolvedAttribute(nameParts :+ attr)
+        }
       case e =>
         UnresolvedExtractValue(e, Literal(attr))
     }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex
+   * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) {
-    UnresolvedAttribute.quoted(ctx.getText)
+    ctx.getStart.getText match {
+      case escapedIdentifier(i) if conf.supportQuotedRegexColumnName =>
--- End diff --

updated.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117588977
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,11 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
     case "*" =>
       Column(ResolvedStar(queryExecution.analyzed.output))
+    case ParserUtils.escapedIdentifier(i) if sqlContext.conf.supportQuotedRegexColumnName =>
+      Column(UnresolvedRegex(i, None))
+    case ParserUtils.qualifiedEscapedIdentifier(i, j)
+      if sqlContext.conf.supportQuotedRegexColumnName =>
--- End diff --

updated.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117588953
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,11 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
     case "*" =>
       Column(ResolvedStar(queryExecution.analyzed.output))
+    case ParserUtils.escapedIdentifier(i) if sqlContext.conf.supportQuotedRegexColumnName =>
+      Column(UnresolvedRegex(i, None))
+    case ParserUtils.qualifiedEscapedIdentifier(i, j)
--- End diff --

updated.
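The `escapedIdentifier` / `qualifiedEscapedIdentifier` extractors being discussed can be illustrated with plain Scala regexes. The two patterns below are illustrative guesses at the shape of the `ParserUtils` definitions (a bare backtick-quoted regex, and a `table.`-qualified one); the exact regexes in Spark may differ:

```scala
// Hypothetical stand-ins for ParserUtils.escapedIdentifier and
// ParserUtils.qualifiedEscapedIdentifier; Spark's actual patterns may differ.
val escapedIdentifier = "`(.+)`".r
val qualifiedEscapedIdentifier = "(.+)\\.`(.+)`".r

// Mirrors the dispatch in Dataset.col: a bare `regex`, a table.`regex`,
// or a plain column name.
def classify(colName: String): String = colName match {
  case escapedIdentifier(pattern)                 => s"regex over all columns: $pattern"
  case qualifiedEscapedIdentifier(table, pattern) => s"regex over $table: $pattern"
  case _                                          => s"plain column: $colName"
}
```

Using regexes as match extractors like this requires a full-string match, which is why ``"t.`(a)?+.+`"`` falls through the first case and binds both the table and the pattern in the second.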





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117588878
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,11 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
     case "*" =>
       Column(ResolvedStar(queryExecution.analyzed.output))
+    case ParserUtils.escapedIdentifier(i) if sqlContext.conf.supportQuotedRegexColumnName =>
--- End diff --

updated





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117588000
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql ---
@@ -0,0 +1,24 @@
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1"), (2, "2"), (3, "3"), (4, "4"), (5, "5"), (6, "6")
+AS testData(key, value);
+
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)
+AS testData2(a, b);
--- End diff --

sure. added two more columns





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117587910
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql ---
@@ -0,0 +1,24 @@
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1"), (2, "2"), (3, "3"), (4, "4"), (5, "5"), (6, "6")
+AS testData(key, value);
+
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)
+AS testData2(a, b);
+
+-- AnalysisException
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+
+-- AnalysisException
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+
+set spark.sql.parser.quotedRegexColumnNames=true;
+
+-- Regex columns
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+SELECT p.`(key)?+.+`, b, testdata2.`(b)?+.+` FROM testData p join testData2 ON p.key = testData2.a WHERE key < 3;
+
+-- Clean-up
+DROP VIEW IF EXISTS testData;
+DROP VIEW IF EXISTS testData2;
--- End diff --

removed





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117587285
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(i) if conf.supportQuotedRegexColumnName =>
--- End diff --

The same here





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117587254
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,25 +1230,37 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }
 
   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
   override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
     val attr = ctx.fieldName.getText
     expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
-        UnresolvedAttribute(nameParts :+ attr)
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        ctx.fieldName.getStart.getText match {
+          case escapedIdentifier(i) if conf.supportQuotedRegexColumnName =>
+            UnresolvedRegex(i, Some(unresolved_attr.name))
+          case _ =>
+            UnresolvedAttribute(nameParts :+ attr)
+        }
       case e =>
         UnresolvedExtractValue(e, Literal(attr))
     }
   }
 
   /**
-   * Create an [[UnresolvedAttribute]] expression.
+   * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex
+   * quoted in ``
    */
   override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) {
-    UnresolvedAttribute.quoted(ctx.getText)
+    ctx.getStart.getText match {
+      case escapedIdentifier(i) if conf.supportQuotedRegexColumnName =>
--- End diff --

The same here





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117587207
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,11 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
     case "*" =>
       Column(ResolvedStar(queryExecution.analyzed.output))
+    case ParserUtils.escapedIdentifier(i) if sqlContext.conf.supportQuotedRegexColumnName =>
+      Column(UnresolvedRegex(i, None))
+    case ParserUtils.qualifiedEscapedIdentifier(i, j)
+      if sqlContext.conf.supportQuotedRegexColumnName =>
--- End diff --

Nit: style issue. We prefer to add the extra two space before `if` in this 
case.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117587110
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,11 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
     case "*" =>
       Column(ResolvedStar(queryExecution.analyzed.output))
+    case ParserUtils.escapedIdentifier(i) if sqlContext.conf.supportQuotedRegexColumnName =>
+      Column(UnresolvedRegex(i, None))
+    case ParserUtils.qualifiedEscapedIdentifier(i, j)
--- End diff --

The same here.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117587075
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1188,6 +1188,11 @@ class Dataset[T] private[sql](
   def col(colName: String): Column = colName match {
     case "*" =>
       Column(ResolvedStar(queryExecution.analyzed.output))
+    case ParserUtils.escapedIdentifier(i) if sqlContext.conf.supportQuotedRegexColumnName =>
--- End diff --

Please avoid using `i` or `j`. Instead, using some meaningful variable 
names.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117586747
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql ---
@@ -0,0 +1,24 @@
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1"), (2, "2"), (3, "3"), (4, "4"), (5, "5"), (6, "6")
+AS testData(key, value);
+
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)
+AS testData2(a, b);
--- End diff --

Since the test cases are testing the regex pattern matching in column 
names, could you add more names and let the regex pattern match more columns?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117586530
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/query_regex_column.sql ---
@@ -0,0 +1,24 @@
+CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
+(1, "1"), (2, "2"), (3, "3"), (4, "4"), (5, "5"), (6, "6")
+AS testData(key, value);
+
+CREATE OR REPLACE TEMPORARY VIEW testData2 AS SELECT * FROM VALUES
+(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)
+AS testData2(a, b);
+
+-- AnalysisException
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+
+-- AnalysisException
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+
+set spark.sql.parser.quotedRegexColumnNames=true;
+
+-- Regex columns
+SELECT `(a)?+.+` FROM testData2 WHERE a = 1;
+SELECT t.`(a)?+.+` FROM testData2 t WHERE a = 1;
+SELECT p.`(key)?+.+`, b, testdata2.`(b)?+.+` FROM testData p join testData2 ON p.key = testData2.a WHERE key < 3;
+
+-- Clean-up
+DROP VIEW IF EXISTS testData;
+DROP VIEW IF EXISTS testData2;
--- End diff --

No need to drop the temp views.
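As a side note on these test queries: `(a)?+.+` is the Hive idiom for "all columns except `a`". The possessive quantifier `?+` consumes a leading `a` and never backtracks, so a name that is exactly `a` leaves nothing for `.+` to match. A minimal Scala check of that behavior (Java regex semantics, as used by `String.matches`):

```scala
// "(a)?+.+" — possessively consume an optional leading "a", then require at
// least one more character. A column named exactly "a" therefore fails to match.
val exceptA = "(a)?+.+"

def matchesColumn(name: String): Boolean = name.matches(exceptA)
```

`matchesColumn("a")` is false while `matchesColumn("b")` and `matchesColumn("ab")` are true, which is why the regex selects every column except the one named exactly `a`.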





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117545031
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("SPARK-12139: REGEX Column Specification for Hive Queries") {
--- End diff --

ok. moved the test to sql/core/src/test/resources/sql-tests/inputs





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117540055
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("SPARK-12139: REGEX Column Specification for Hive Queries") {
--- End diff --

Yes, let's use those rather than adding more files to SQLQuerySuite. I'd love to get rid of SQLQuerySuite.






[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117539904
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,12 @@ object SQLConf {
       .intConf
       .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = buildConf("spark.sql.parser.quotedRegexColumnNames")
+    .internal()
--- End diff --

should be public





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117533359
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("SPARK-12139: REGEX Column Specification for Hive Queries") {
+// hive.support.quoted.identifiers is turned off by default
+checkAnswer(
+  sql(
+"""
+  |SELECT b
+  |FROM testData2
+  |WHERE a = 1
+""".stripMargin),
+  Row(1) :: Row(2) :: Nil)
+
+checkAnswer(
+  sql(
+"""
+  |SELECT t.b
+  |FROM testData2 t
+  |WHERE a = 1
+""".stripMargin),
+  Row(1) :: Row(2) :: Nil)
--- End diff --

removed. I was trying to make sure that the existing behaviors are not 
broken.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117533040
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,12 @@ object SQLConf {
       .intConf
       .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = buildConf("spark.sql.parser.quotedRegexColumnNames")
+    .internal()
--- End diff --

I think it should be public. I didn't realize that I put it under internal.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117522441
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("SPARK-12139: REGEX Column Specification for Hive Queries") {
+// hive.support.quoted.identifiers is turned off by default
+checkAnswer(
+  sql(
+"""
+  |SELECT b
+  |FROM testData2
+  |WHERE a = 1
+""".stripMargin),
+  Row(1) :: Row(2) :: Nil)
+
+checkAnswer(
+  sql(
+"""
+  |SELECT t.b
+  |FROM testData2 t
+  |WHERE a = 1
+""".stripMargin),
+  Row(1) :: Row(2) :: Nil)
--- End diff --

The above two test queries are not needed in the new suite. 





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117522248
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 val e = intercept[AnalysisException](sql("SELECT nvl(1, 2, 3)"))
 assert(e.message.contains("Invalid number of arguments"))
   }
+
+  test("SPARK-12139: REGEX Column Specification for Hive Queries") {
--- End diff --

Could you create a file in 
`https://github.com/apache/spark/tree/master/sql/core/src/test/resources/sql-tests/inputs`?
 Now, all the new SQL test cases need to be moved there.

You can run `SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite"` to generate the result files. Thanks!





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117521173
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = buildConf("spark.sql.parser.quotedRegexColumnNames")
+.internal()
--- End diff --

@rxin @hvanhovell @cloud-fan Should we keep it internal?





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117403590
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala ---
@@ -177,6 +177,18 @@ object ParserUtils {
 sb.toString()
   }
 
+  val escapedIdentifier = "`(.+)`".r
--- End diff --

added.
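The pattern under discussion can be exercised on its own. A minimal sketch follows; the `unquote` helper is invented here for illustration and is not part of `ParserUtils` — only the `escapedIdentifier` regex comes from the diff above.

```scala
// The regex from the diff: a backquote-wrapped identifier, capturing the body.
val escapedIdentifier = "`(.+)`".r

// Hypothetical helper: strip backquotes when present. Scala's regex
// extractor only matches when the entire string fits the pattern.
def unquote(ident: String): Option[String] = ident match {
  case escapedIdentifier(body) => Some(body) // backquoted: return the body
  case _                       => None       // plain identifier: no match
}

assert(unquote("`col_.*`") == Some("col_.*")) // backquoted regex pattern
assert(unquote("plain_name") == None)         // unquoted identifier
```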





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117403303
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers")
--- End diff --

renamed.





[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-18 Thread janewangfb
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117403331
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,12 @@ object SQLConf {
   .intConf
   
.createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)
 
+  val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers")
+.internal()
+.doc("When true, identifiers specified by regex patterns will be 
expanded.")
--- End diff --

yes. this only applies to column names. updated the doc.
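Putting the rename and the doc update together, the SQLConf entry presumably ends up looking something like the following. This is a sketch reconstructed from the diff hunks in this thread, not the final patch — the exact doc wording and default may differ.

```scala
// Sketch of the config entry after the rename and doc update discussed
// above (config fragment; depends on Spark's internal ConfigBuilder).
val SUPPORT_QUOTED_REGEX_COLUMN_NAME =
  buildConf("spark.sql.parser.quotedRegexColumnNames")
    .internal()
    .doc("When true, quoted identifiers (using backticks) in SELECT " +
      "statements are interpreted as regular expressions and expanded " +
      "to the matching column names.")
    .booleanConf
    .createWithDefault(false)
```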

