[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-09-15 Thread chuxi
Github user chuxi closed the pull request at:

https://github.com/apache/spark/pull/2082





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-09-02 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2082#issuecomment-54152206
  
@marmbrus, is there anything else I need to check, or can the code be merged?

Besides, I see another PR: https://github.com/apache/spark/pull/2230

:) It is from a friend of mine. I suggested he take a look at the nested-Parquet SQL parser in the SQL Parquet test suite, which parses the dot as a delimiter rather than as an identChar.

The open problem is that we don't know whether the dot should be an identChar or a delimiter; the two choices lead to different SQL parsing results. If only structs appear in a JSON or Parquet schema, treating the dot as an identChar is apparently better. Are there other considerations behind the dot parsing?
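To make the difference concrete, here is a minimal, self-contained sketch using Scala's parser-combinator `StdLexical` (the two lexer subclasses and the demo object are hypothetical, not Spark code): with the dot as an identChar, a dotted path scans as one identifier token; with the dot as a delimiter, it scans as three tokens.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.Reader

// Treat '.' as part of identifiers: "arrayOfStruct.field1" is ONE token.
class DotAsIdentChar extends StdLexical {
  override def identChar = letter | elem('_') | elem('.')
}

// Treat '.' as a delimiter: "arrayOfStruct.field1" is THREE tokens.
class DotAsDelimiter extends StdLexical {
  delimiters += "."
}

object DotTokenizingDemo extends App {
  def tokens(lexical: StdLexical)(s: String): List[String] = {
    var in: Reader[lexical.Token] = new lexical.Scanner(s)
    val buf = scala.collection.mutable.ListBuffer[String]()
    while (!in.atEnd) { buf += in.first.toString; in = in.rest }
    buf.toList
  }
  println(tokens(new DotAsIdentChar)("arrayOfStruct.field1"))
  // List(identifier arrayOfStruct.field1)
  println(tokens(new DotAsDelimiter)("arrayOfStruct.field1"))
  // List(identifier arrayOfStruct, `.`, identifier field1)
}
```

The first scan can only mean a single qualified name, while the second leaves it to the grammar to decide what `a.b` means — which is exactly the ambiguity described above.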





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-09-01 Thread chuxi
Github user chuxi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2082#discussion_r16969910
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypes.scala ---
@@ -101,3 +101,41 @@ case class GetField(child: Expression, fieldName: String) extends UnaryExpression
 
   override def toString = s"$child.$fieldName"
 }
+
+/**
+ * Returns an array containing the value of fieldName
+ * for each element in the input array of type struct.
+ */
+case class GetArrayField(child: Expression, fieldName: String) extends UnaryExpression {
+  type EvaluatedType = Any
+
+  def dataType = field.dataType
+  override def nullable = child.nullable || field.nullable
+  override def foldable = child.foldable
+
+  protected def arrayType = child.dataType match {
+    case ArrayType(s: StructType, _) => s
+    case otherType => sys.error(s"GetArrayField is not valid on fields of type $otherType")
+  }
+
+  lazy val field = if (arrayType.isInstanceOf[StructType]) {
+    arrayType.fields
+      .find(_.name == fieldName)
+      .getOrElse(sys.error(s"No such field $fieldName in ${child.dataType}"))
+  } else null
+
+  lazy val ordinal = arrayType.fields.indexOf(field)
+
+  override lazy val resolved = childrenResolved && child.dataType.isInstanceOf[ArrayType]
+
+  override def eval(input: Row): Any = {
+    val value: Seq[Row] = child.eval(input).asInstanceOf[Seq[Row]]
+    val v = value.map { t =>
+      if (t == null) null else t(ordinal)
+    }
+    v
--- End diff --

= =
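For context on the `eval` above, a standalone sketch of what it computes, with plain Seqs standing in for Catalyst rows (the helper name and data are illustrative, not Spark API):

```scala
// arrayOfStruct.field1 over an array of structs: pick `ordinal` from each
// element, mapping a null struct to null, and return the results as a Seq.
def getArrayField(rows: Seq[Seq[Any]], ordinal: Int): Seq[Any] =
  rows.map(r => if (r == null) null else r(ordinal))

val arrayOfStruct = Seq(Seq(true, "str1"), null, Seq(false, null))
assert(getArrayField(arrayOfStruct, 0) == Seq(true, null, false))
```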





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-27 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2082#issuecomment-53673090
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...

2014-08-26 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2084#issuecomment-53387954
  
@marmbrus, I agree with you. Using CAST lets us avoid some tough design decisions. I know little about Hive; do you mean there is a CAST problem in HiveTypeCoercion? I will try to follow the code.

 





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-26 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2082#issuecomment-53386890
  
Thank you, @marmbrus, you are very kind. I am new here and have never submitted a PR to an open-source project before. I will take your suggestions and update my code to follow the Scala style guide.





[GitHub] spark pull request: [WIP][SPARK-3194][SQL] Add AttributeSet to fix...

2014-08-24 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2109#issuecomment-53225175
  
I think `references` is used by `transformExpression` in QueryPlan to make sure the same attribute is traversed only once while the Analyzer resolves the Attributes in a TreeNode (especially when the recursion starts from the root tree node).

Failed two times = = ..






[GitHub] spark pull request: [SPARK-3065][SQL] Timestamp support in the par...

2014-08-24 Thread chuxi
Github user chuxi commented on the pull request:

https://github.com/apache/spark/pull/2084#issuecomment-53189498
  
I think CAST is the better choice (compared with the no-CAST approach). It is already implemented in

case class Cast(child: Expression, dataType: DataType) extends UnaryExpression

which handles many data types, including TimestampType.

Besides, if you want to implement "modify Comparable expression evaluation so that the explicit casting is not necessary", you need a way to tell whether a String is in timestamp format, and then modify the last alternative of this parser:

```
protected lazy val literal: Parser[Literal] =
  numericLit ^^ {
    case i if i.toLong > Int.MaxValue => Literal(i.toLong)
    case i => Literal(i.toInt)
  } |
  NULL ^^^ Literal(null, NullType) |
  floatLit ^^ { case f => Literal(f.toDouble) } |
  stringLit ^^ { case s => Literal(s, StringType) }
```

Literal supports Timestamp too, so the last alternative could become something like:

stringLit ^^ { case s => try Literal(Timestamp.valueOf(s), TimestampType) catch { case _: IllegalArgumentException => Literal(s, StringType) } }
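A minimal standalone sketch of that try/catch choice (plain Scala; the Lit types below are stand-ins for Catalyst's Literal, just to keep the sketch self-contained):

```scala
import java.sql.Timestamp

// Stand-ins for Catalyst's Literal expression, for illustration only.
sealed trait Lit
case class StringLit(s: String) extends Lit
case class TimestampLit(t: Timestamp) extends Lit

// Try to read the string as a timestamp; fall back to a plain string literal.
def toLiteral(s: String): Lit =
  try TimestampLit(Timestamp.valueOf(s))
  catch { case _: IllegalArgumentException => StringLit(s) }

// toLiteral("2014-08-21 12:47:25") == TimestampLit(2014-08-21 12:47:25.0)
// toLiteral("hello")               == StringLit("hello")
```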





[GitHub] spark pull request: [SPARK-3065][SQL] Timestamp support in the par...

2014-08-21 Thread chuxi
Github user chuxi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2084#discussion_r16580536
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala ---
@@ -357,7 +358,7 @@ class SqlParser extends StandardTokenParsers with PackratParsers {
     literal
 
   protected lazy val dataType: Parser[DataType] =
-    STRING ^^^ StringType
+    STRING ^^^ StringType | TIMESTAMP ^^^ TimestampType
--- End diff --

Why don't you use String as Timestamp? I don't think adding another TimestampType here is a good idea; it makes the data types more complex.

Besides, if you want to support it in the Filter part (the WHERE clause), it will lead to a big problem, such as the logic of EqualTo and the c2 functions in predicates.scala.
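For comparison, once TIMESTAMP is accepted by the dataType parser, the CAST route suggested in this thread stays entirely in SQL; a sketch (the sqlContext, table, and column names are hypothetical):

```scala
// String literals stay strings; the comparison happens on timestamps via
// explicit CASTs, so the literal parser needs no timestamp detection.
val recent = sqlContext.sql(
  """SELECT *
    |FROM events
    |WHERE CAST(ts AS timestamp) >= CAST('2014-08-21 00:00:00' AS timestamp)
    |""".stripMargin)
```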





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-21 Thread chuxi
Github user chuxi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2082#discussion_r16540937
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -108,6 +109,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
         a.dataType match {
           case StructType(fields) =>
             Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
+          case fields :ArrayType =>
--- End diff --

case ArrayType(fields) leads to a compile error: ArrayType is a two-field case class (element type and containsNull), so its extractor needs two patterns.
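A minimal reproduction with stand-in definitions (not the Catalyst classes) of why the one-argument pattern fails:

```scala
// Stand-ins mirroring the shape of Catalyst's types, for illustration only.
sealed trait DataType
case class StructType(fields: Seq[String]) extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType

def describe(dt: DataType): String = dt match {
  // case ArrayType(fields) =>   // does NOT compile: the extractor takes two patterns
  case ArrayType(elementType, _) => s"array of $elementType" // two-pattern form
  case StructType(fields)        => s"struct with ${fields.size} fields"
}
```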





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-21 Thread chuxi
Github user chuxi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2082#discussion_r16541208
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala ---
@@ -292,24 +292,29 @@ class JsonSuite extends QueryTest {
       sql("select structWithArrayFields.field1[1], structWithArrayFields.field2[3] from jsonTable"),
       (5, null) :: Nil
     )
-  }
 
-  ignore("Complex field and type inferring (Ignored)") {
-    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
-    jsonSchemaRDD.registerTempTable("jsonTable")
+    checkAnswer(
+      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
+      (Seq(true, false, null), Seq("str1", null, null)) :: Nil
+    )
 
-    // Right now, "field1" and "field2" are treated as aliases. We should fix it.
     checkAnswer(
       sql("select arrayOfStruct[0].field1, arrayOfStruct[0].field2 from jsonTable"),
       (true, "str1") :: Nil
     )
 
-    // Right now, the analyzer cannot resolve arrayOfStruct.field1 and arrayOfStruct.field2.
-    // Getting all values of a specific field from an array of structs.
+  }
+
+  ignore("Complex field and type inferring (Ignored)") {
+    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
+    jsonSchemaRDD.registerTempTable("jsonTable")
+
+    // still need add filter??? I am not sure whether this function is necessary. quite complex
     checkAnswer(
-      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
-      (Seq(true, false), Seq("str1", null)) :: Nil
+      sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true"),
--- End diff --

I add sql("select arrayOfStruct.field1 from jsonTable where 
arrayOfStruct.field1 = true") this test case in ignored part. It does not work 
because I came up with it but did not solve it. Or it makes no sense to solve 
it.
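The underlying difficulty is that the predicate becomes ambiguous once the field access yields a whole array per row; a stand-in illustration in plain Scala (names are illustrative):

```scala
// One row's arrayOfStruct.field1 is a Seq, so "= true" has at least two
// plausible readings, and the parser would have to commit to one of them:
val field1: Seq[Any] = Seq(true, false, null)

val existsReading = field1.contains(true)    // "some element is true"  -> true
val forallReading = field1.forall(_ == true) // "every element is true" -> false
```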





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-21 Thread chuxi
Github user chuxi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2082#discussion_r16540895
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -108,6 +109,8 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] {
         a.dataType match {
           case StructType(fields) =>
             Some(Alias(nestedFields.foldLeft(a: Expression)(GetField), nestedFields.last)())
+          case fields :ArrayType =>
--- End diff --

In fact, ArrayType(fields, _) would be better 





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-21 Thread chuxi
Github user chuxi commented on a diff in the pull request:

https://github.com/apache/spark/pull/2082#discussion_r16540807
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/json/JsonSuite.scala ---
@@ -292,24 +292,29 @@ class JsonSuite extends QueryTest {
       sql("select structWithArrayFields.field1[1], structWithArrayFields.field2[3] from jsonTable"),
       (5, null) :: Nil
     )
-  }
 
-  ignore("Complex field and type inferring (Ignored)") {
-    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
-    jsonSchemaRDD.registerTempTable("jsonTable")
+    checkAnswer(
+      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
+      (Seq(true, false, null), Seq("str1", null, null)) :: Nil
+    )
 
-    // Right now, "field1" and "field2" are treated as aliases. We should fix it.
     checkAnswer(
       sql("select arrayOfStruct[0].field1, arrayOfStruct[0].field2 from jsonTable"),
       (true, "str1") :: Nil
     )
 
-    // Right now, the analyzer cannot resolve arrayOfStruct.field1 and arrayOfStruct.field2.
-    // Getting all values of a specific field from an array of structs.
+  }
+
+  ignore("Complex field and type inferring (Ignored)") {
+    val jsonSchemaRDD = jsonRDD(complexFieldAndType)
+    jsonSchemaRDD.registerTempTable("jsonTable")
+
+    // still need add filter??? I am not sure whether this function is necessary. quite complex
     checkAnswer(
-      sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
-      (Seq(true, false), Seq("str1", null)) :: Nil
+      sql("select arrayOfStruct.field1 from jsonTable where arrayOfStruct.field1 = true"),
--- End diff --

wang pangzi, someone added field3 to the arrayOfStruct test data, so the expected answer requires another null.





[GitHub] spark pull request: SPARK-2096 [SQL]: Correctly parse dot notation...

2014-08-21 Thread chuxi
GitHub user chuxi opened a pull request:

https://github.com/apache/spark/pull/2082

SPARK-2096 [SQL]: Correctly parse dot notations for accessing an array of structs

For example, "arrayOfStruct" is an array of structs, and every element of this array has a field called "field1". "arrayOfStruct[0].field1" means accessing the value of "field1" for the first element of "arrayOfStruct", but the SQL parser (in sql-core) treats "field1" as an alias. Likewise, "arrayOfStruct.field1" means accessing all values of "field1" in this array of structs and returning those values as an array, but the SQL parser cannot resolve it.

I have passed the previously ignored test case in JsonSuite ("Complex field and type inferring (Ignored)"), slightly modified. The modified test part:

checkAnswer(
  sql("select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable"),
  (Seq(true, false, null), Seq("str1", null, null)) :: Nil
)

However, repeatedly nested structures remain a problem, like arrayOfStruct.field1.arrayOfStruct.field1 or arrayOfStruct[0].field1.arrayOfStruct[0].field1. I plan to ignore that problem and try to add "select arrayOfStruct.field1, arrayOfStruct.field2 from jsonTable where arrayOfStruct.field1 == true". Besides, my friend anyweil (Wei Li) solved the problem of resolving arrayOfStruct.field1 and its Filter part (i.e., WHERE parsing). I am new here but will continue working on Spark :)

I looked into the "where arrayOfStruct.field1 == true" problem: it would require modifying every kind of comparison expression, and I think adding that feature makes no sense, so I discarded it.
Over.
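To illustrate the two access forms this PR is about (data as in the jsonTable test fixture above; results shown as comments):

```scala
// arrayOfStruct = [{field1: true,  field2: "str1"},
//                  {field1: false, field2: null},
//                  {field1: null,  field2: null}]

// Index first, then access the field: one scalar per row.
sql("select arrayOfStruct[0].field1 from jsonTable")  // true

// Access the field across all elements: one array per row.
sql("select arrayOfStruct.field1 from jsonTable")     // Seq(true, false, null)
```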



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chuxi/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2082.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2082


commit b1cb4fb4e3da7ed54ac875afc20a81f25310fa87
Author: chuxi 
Date:   2014-08-21T12:47:25Z

Correctly parse dot notations for accessing an array of structs



