[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17770 **[Test build #77082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77082/testReport)** for PR 17770 at commit [`6a7204c`](https://github.com/apache/spark/commit/6a7204c0fc00dbe2e43d6d65e722b3b13c3b35d0). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17770 It seems to me that we don't want to show `AnalysisBarrier` in the analyzed plan, unlike `SubqueryAlias`.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77078/ Test FAILed.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404393 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -187,6 +187,9 @@ class Dataset[T] private[sql]( } } + // Wrap analyzed logical plan with an analysis barrier so we won't traverse/resolve it again. + @transient private val planWithBarrier: LogicalPlan = AnalysisBarrier(logicalPlan) --- End diff -- `CacheManager` uses `Dataset.logicalPlan` as the key to look up identical plans already cached. If we always wrap `logicalPlan` with a barrier, we need to strip it when looking up caches.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77078/testReport)** for PR 18016 at commit [`6d51c07`](https://github.com/apache/spark/commit/6d51c07e464c81f7d0337d7f632d3d9552a50cec). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404229 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1741,7 +1744,7 @@ class Dataset[T] private[sql]( def union(other: Dataset[T]): Dataset[T] = withSetOperator { // This breaks caching, but it's usually ok because it addresses a very specific use case: // using union to union many files or partitions. -CombineUnions(Union(logicalPlan, other.logicalPlan)) +CombineUnions(Union(logicalPlan, other.logicalPlan)).mapChildren(AnalysisBarrier(_)) --- End diff -- Sure, and also all the ones above.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404204 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2470,6 +2480,13 @@ object CleanupAliases extends Rule[LogicalPlan] { } } +/** Remove the barrier nodes of analysis */ +object CleanupBarriers extends Rule[LogicalPlan] { --- End diff -- Sure.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117404192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -166,14 +166,15 @@ class Analyzer( Batch("Subquery", Once, UpdateOuterReferences), Batch("Cleanup", fixedPoint, - CleanupAliases) + CleanupAliases, + CleanupBarriers) --- End diff -- We clean up the barriers at the end of analysis because we don't want to show them in the analyzed plan. If we move this to the "Finish Analysis" batch, they will show up.
[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r117403987 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -912,3 +913,10 @@ case class Deduplicate( override def output: Seq[Attribute] = child.output } + +/** A logical plan for setting a barrier of analysis */ +case class AnalysisBarrier(child: LogicalPlan) extends LeafNode { + override def output: Seq[Attribute] = child.output + override def analyzed: Boolean = true + override def isStreaming: Boolean = child.isStreaming --- End diff -- It should be fine to use the default `canonicalized`.
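The barrier in the diff above works because it extends `LeafNode`: analyzer rules descend through `children`, and a leaf reports none, so the wrapped subtree is never re-traversed. A minimal self-contained sketch of that idea (`Node`, `Barrier`, and `countVisited` are hypothetical stand-ins, not Spark's actual `TreeNode`/`AnalysisBarrier` classes):

```scala
// Hypothetical stand-ins illustrating why a barrier defined as a leaf node
// shields its child from re-traversal: traversals walk `children`, and a
// leaf reports an empty Seq, so the hidden subtree is never visited again.
sealed trait Node { def children: Seq[Node] }
case class Leaf(name: String) extends Node { val children = Seq.empty[Node] }
case class Branch(children: Seq[Node]) extends Node
case class Barrier(hidden: Node) extends Node { val children = Seq.empty[Node] }

// A simplified top-down rule runner that counts how many nodes it visits.
def countVisited(n: Node): Int = 1 + n.children.map(countVisited).sum
```

Wrapping an already-analyzed subtree in `Barrier` means the runner sees a single node: `countVisited(Barrier(Branch(Seq(Leaf("a"), Leaf("b")))))` is 1, versus 3 for the unwrapped tree.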
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77077/ Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77077/testReport)** for PR 18016 at commit [`8b346e6`](https://github.com/apache/spark/commit/8b346e6f6e211a8945e9d3fc9db489ce4c27ba87). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18011 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77076/ Test PASSed.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18011 Merged build finished. Test PASSed.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117403590 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala --- @@ -177,6 +177,18 @@ object ParserUtils { sb.toString() } + val escapedIdentifier = "`(.+)`".r --- End diff -- added.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18011 **[Test build #77076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77076/testReport)** for PR 18011 at commit [`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117403303 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") --- End diff -- renamed.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117403331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") +.internal() +.doc("When true, identifiers specified by regex patterns will be expanded.") --- End diff -- Yes, this only applies to column names. Updated the doc.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402527 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala --- @@ -177,6 +177,18 @@ object ParserUtils { sb.toString() } + val escapedIdentifier = "`(.+)`".r --- End diff -- Please add a comment for this.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402461 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") +.internal() +.doc("When true, identifiers specified by regex patterns will be expanded.") --- End diff -- It must be quoted. Thus, we also need to mention it in the description.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402094 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") +.internal() +.doc("When true, identifiers specified by regex patterns will be expanded.") --- End diff -- We only do it for the column names, right?
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117402025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt) + val SUPPORT_QUOTED_IDENTIFIERS = buildConf("spark.sql.support.quoted.identifiers") --- End diff -- How about renaming it to `spark.sql.parser.regexColumnNames`?
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18023 Like what we did for `*` in `Column.scala`, we also need to handle the Dataset APIs. You can follow the way we handle star there. ```Scala df.select(df("(a|b)?+.+")) ```
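The Dataset-side handling being requested can be pictured as expanding the quoted pattern against the schema's column names, the way `*` expands to all columns. A hedged sketch, not the PR's actual code path through `Column.scala` (the helper name `expandRegexColumns` and the plain-`String` schema are illustrative assumptions):

```scala
// Illustrative sketch: resolve a regex column specification by filtering
// the schema's column names, analogous to how star expansion selects all
// columns. Uses Java regex semantics via String.matches (full match).
def expandRegexColumns(pattern: String, columnNames: Seq[String]): Seq[String] =
  columnNames.filter(_.matches(pattern))
```

For a schema with columns `a1`, `ab`, `b2`, the pattern `a.*` would select `a1` and `ab`, preserving schema order.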
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18023 **[Test build #77081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77081/testReport)** for PR 18023 at commit [`6e37517`](https://github.com/apache/spark/commit/6e375177e68a216cdd53de1e5d600d898b2b59d5).
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77080/testReport)** for PR 18029 at commit [`9944da8`](https://github.com/apache/spark/commit/9944da82b0b07642f0489c597d9b63176a361f0e).
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117399885 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression.
+ * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.getStart).map(_.getText match { --- End diff -- removed the option
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117399877 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- Add API in ParserUtils.
[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/18025 @felixcheung Thanks for your feedback. - This does not affect discoverability: the name of the method is still on the index list - No problem with help either, e.g., one can use `?avg`. ![image](https://cloud.githubusercontent.com/assets/11082368/26232656/945b3afe-3c0c-11e7-8c17-fa8df5e4ee2e.png) Another benefit is that we can get rid of most warnings on no examples since we now document all the tiny functions together. I think it is important and the change is straightforward. However, this is a pretty manual (and big) change. I would like to get a `Yes` from you for doing this. Thanks.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117399811 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- Add API in ParserUtils. I think in the parser, it can still get ``; after that, the `` are stripped off.
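The `escapedIdentifier` value being moved into `ParserUtils` is a regex extractor: used in a pattern match it both checks for surrounding backticks and strips them in one step. A small self-contained sketch (the `stripBackticks` wrapper is a hypothetical name, not part of the PR):

```scala
// The escaped-identifier pattern under discussion: one capture group,
// used as an extractor so a successful match binds the text between the
// backticks. Scala's Regex extractor requires a full-string match.
val escapedIdentifier = "`(.+)`".r

// Hypothetical helper: returns the inner text of a backtick-quoted
// identifier, or None when the input is not quoted.
def stripBackticks(ident: String): Option[String] = ident match {
  case escapedIdentifier(inner) => Some(inner)
  case _                        => None
}
```

For an input like `` `(a|b)?+.+` `` the match binds the inner regex text, which the analyzer can then treat as a column pattern; an unquoted name does not match and falls back to an ordinary `UnresolvedAttribute`.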
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117399718

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }

   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
  override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
    val attr = ctx.fieldName.getText
    expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
+        if (conf.supportQuotedIdentifiers) {
+          val escapedIdentifier = "`(.+)`".r
+          val ret = Option(ctx.fieldName.getStart).map(_.getText match {
--- End diff --

Removed the option.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117399399

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   }

   /**
-   * Create a dereference expression. The return type depends on the type of the parent, this can
-   * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an
-   * [[UnresolvedExtractValue]] if the parent is some expression.
+   * Create a dereference expression. The return type depends on the type of the parent.
+   * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or
+   * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression,
+   * it can be [[UnresolvedExtractValue]].
    */
  override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) {
    val attr = ctx.fieldName.getText
    expression(ctx.base) match {
-      case UnresolvedAttribute(nameParts) =>
+      case unresolved_attr @ UnresolvedAttribute(nameParts) =>
--- End diff --

Updated.
[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test PASSed.
[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77072/ Test PASSed.
[GitHub] spark issue #18031: [SPARK-20801] Record accurate size of blocks in MapStatu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18031

**[Test build #77072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77072/testReport)** for PR 18031 at commit [`bfea9f5`](https://github.com/apache/spark/commit/bfea9f59fd7587b87de0ddb4601f76786671f38a).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117398452

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+    val expandedAttributes: Seq[Attribute] = table match {
+      // If there is no table specified, use all input attributes that match expr
+      case None => input.output.filter(_.name.matches(expr))
+      // If there is a table, pick out attributes that are part of this table that match expr
+      case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty)
+        .filter(_.name.matches(expr))
+    }
+
+    expandedAttributes.zip(input.output).map {
--- End diff --

You are right, we don't need it any more. Removed.
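Stripped of Catalyst types, the `expand` logic in the quoted diff — keep every input attribute whose name matches the pattern, optionally restricted to one table — amounts to the following sketch. `Attribute` here is a simplified stand-in (a name plus an optional qualifier, not Spark's class), and a case-insensitive string comparison stands in for Spark's `Resolver`:

```scala
// Simplified stand-in for Catalyst's Attribute: a column name plus an optional table qualifier.
case class Attribute(name: String, qualifier: Option[String])

def expandRegex(input: Seq[Attribute], pattern: String, table: Option[String]): Seq[Attribute] =
  table match {
    // No table specified: use all input attributes whose name matches the pattern.
    case None => input.filter(_.name.matches(pattern))
    // Table specified: keep attributes qualified by that table, then match on name.
    case Some(t) =>
      input
        .filter(_.qualifier.exists(_.equalsIgnoreCase(t)))
        .filter(_.name.matches(pattern))
  }
```

For example, against columns `id` and `name` of table `t`, the Hive-style pattern `(id)?+.+` selects `name` but not `id`: the possessive `(id)?+` consumes `id` without backtracking, and `.+` then demands at least one more character.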
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #77079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77079/testReport)** for PR 16677 at commit [`55ee6b0`](https://github.com/apache/spark/commit/55ee6b0fb3bc9e6998b4098a369c54a15824e414).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistices to improve...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 retest this please.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77078/testReport)** for PR 18016 at commit [`6d51c07`](https://github.com/apache/spark/commit/6d51c07e464c81f7d0337d7f632d3d9552a50cec).
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117398110

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable {
+  override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = {
+    val expandedAttributes: Seq[Attribute] = table match {
+      // If there is no table specified, use all input attributes that match expr
+      case None => input.output.filter(_.name.matches(expr))
+      // If there is a table, pick out attributes that are part of this table that match expr
+      case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty)
--- End diff --

Updated.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user janewangfb commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r117397712

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala ---
@@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction(
 }

 /**
+ * Represents all of the input attributes to a given relational operator, for example in
+ * "SELECT `(id)?+.+` FROM ...".
+ *
+ * @param table an optional table that should be the target of the expansion. If omitted all
+ *              tables' columns are produced.
+ */
+case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable {
--- End diff --

Renamed to `pattern`.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77074/ Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18016

**[Test build #77074 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77074/testReport)** for PR 18016 at commit [`1f771bd`](https://github.com/apache/spark/commit/1f771bd9bdee15b4a2c2d829f5f60404044ba9af).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77077/testReport)** for PR 18016 at commit [`8b346e6`](https://github.com/apache/spark/commit/8b346e6f6e211a8945e9d3fc9db489ce4c27ba87).
[GitHub] spark pull request #18016: [SPARK-20786][SQL]Improve ceil and floor handle t...
Github user heary-cao commented on a diff in the pull request:

https://github.com/apache/spark/pull/18016#discussion_r117397093

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala ---
@@ -173,6 +173,14 @@ class MathFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(
       sql("SELECT ceiling(0), ceiling(1), ceiling(1.5)"),
       Row(0L, 1L, 2L))
+
+    checkAnswer(
+      sql("SELECT ceil(1234567890123456), ceil(12345678901234567)"),
+      Row(1234567890123456L, 12345678901234567L))
+
+    checkAnswer(
+      sql("SELECT ceiling(1234567890123456), ceiling(12345678901234567)"),
+      Row(1234567890123456L, 12345678901234567L))
--- End diff --

OK, added new tests to the end of operators.sql. Please review it again. Thanks.
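For context on why these large constants are interesting: 12345678901234567 exceeds 2^53, so it is not exactly representable as a Double, and a ceil/floor implementation that round-trips BIGINT values through Double silently changes it. A quick plain-Scala check (an illustration of the underlying IEEE-754 issue, not Spark code):

```scala
val big = 12345678901234567L            // > 2^53: not exactly representable as a Double
val viaDouble = math.ceil(big.toDouble).toLong

// The Double round-trip snaps to the nearest representable value instead of the input.
println(viaDouble == big)               // false: precision was lost
println(viaDouble)                      // 12345678901234568

// A value below 2^53 survives the round-trip unchanged.
val small = 1234567890123456L
println(math.ceil(small.toDouble).toLong == small)  // true
```

Hence the tests expect `ceil(12345678901234567)` to return the value itself, which requires handling integral inputs without converting them to Double.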
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77073/ Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18016 Merged build finished. Test FAILed.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18016

**[Test build #77073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77073/testReport)** for PR 18016 at commit [`68ecf5e`](https://github.com/apache/spark/commit/68ecf5e129eaba5830c439e1196bd4f1ee22ae42).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18015 Thank you, I will keep working to improve the Spark web UI. Jenkins, test this please.
[GitHub] spark issue #17936: [SPARK-20638][Core]Optimize the CartesianRDD to reduce r...
Github user ConeyLiu commented on the issue:

https://github.com/apache/spark/pull/17936

@srowen Sorry for the late reply. I updated the code. Because we want to reduce the number of remote fetches, the second partition should be cached locally. There are two ways: first, cache it in the `TaskConsumer`, which is controlled by the execution memory (this approach seems similar to #9969); second, cache it in the `BlockManager`, which is controlled by the storage memory. Experiments showed that the first way has a serious GC problem. `CartesianRDD` is only used by `ALS` and `UnsafeCartesianRDD`. However, the latter implements its own compute, as you can see:

```
class UnsafeCartesianRDD(
    left: RDD[UnsafeRow],
    right: RDD[UnsafeRow],
    numFieldsOfRight: Int,
    spillThreshold: Int)
  extends CartesianRDD[UnsafeRow, UnsafeRow](left.sparkContext, left, right) {

  override def compute(split: Partition, context: TaskContext): Iterator[(UnsafeRow, UnsafeRow)] = {
    val rowArray = new ExternalAppendOnlyUnsafeRowArray(spillThreshold)
    val partition = split.asInstanceOf[CartesianPartition]
    rdd2.iterator(partition.s2, context).foreach(rowArray.add)

    // Create an iterator from rowArray
    def createIter(): Iterator[UnsafeRow] = rowArray.generateIterator()

    val resultIter =
      for (x <- rdd1.iterator(partition.s1, context);
           y <- createIter()) yield (x, y)
    CompletionIterator[(UnsafeRow, UnsafeRow), Iterator[(UnsafeRow, UnsafeRow)]](
      resultIter, rowArray.clear())
  }
}
```

So I think there should be no other impact.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18011 **[Test build #77076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77076/testReport)** for PR 18011 at commit [`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b).
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77071/ Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test PASSed.
[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16986#discussion_r117394141

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -329,35 +329,19 @@ object ScalaReflection extends ScalaReflection {
         }
         UnresolvedMapObjects(mapFunction, getPath, Some(cls))

-      case t if t <:< localTypeOf[Map[_, _]] =>
+      case t if t <:< localTypeOf[Map[_, _]] || t <:< localTypeOf[java.util.Map[_, _]] =>
--- End diff --

Let's remove them and the related Java map tests in this PR, and add them in the next PR.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971

**[Test build #77071 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77071/testReport)** for PR 14971 at commit [`1e4182d`](https://github.com/apache/spark/commit/1e4182d1e03622cdcc84f6cd951b2c534289e78f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18011 retest this please
[GitHub] spark pull request #17936: [SPARK-20638][Core]Optimize the CartesianRDD to r...
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17936#discussion_r117393923

--- Diff: core/src/test/scala/org/apache/spark/metrics/InputOutputMetricsSuite.scala ---
@@ -198,8 +198,12 @@ class InputOutputMetricsSuite extends SparkFunSuite with SharedSparkContext
     // write files to disk so we can read them later.
     sc.parallelize(cartVector).saveAsTextFile(cartFilePath)
     val aRdd = sc.textFile(cartFilePath, numPartitions)
+    aRdd.cache()
+    aRdd.count()
--- End diff --

There is a very strange failure here. If we cache both `aRdd` and `tmpRdd`, both this PR and the master branch pass the test. But if we cache only `tmpRdd`, both branches fail. So I am temporarily caching both here. I will look into the details; it may be a bug. If I have misunderstood something, please point it out.
[GitHub] spark pull request #17936: [SPARK-20638][Core]Optimize the CartesianRDD to r...
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17936#discussion_r117393634

--- Diff: core/src/test/scala/org/apache/spark/metrics/InputOutputMetricsSuite.scala ---
@@ -198,8 +198,12 @@ class InputOutputMetricsSuite extends SparkFunSuite with SharedSparkContext
     // write files to disk so we can read them later.
     sc.parallelize(cartVector).saveAsTextFile(cartFilePath)
     val aRdd = sc.textFile(cartFilePath, numPartitions)
+    aRdd.cache()
+    aRdd.count()
     val tmpRdd = sc.textFile(tmpFilePath, numPartitions)
+    tmpRdd.cache()
+    tmpRdd.count()
--- End diff --

Because we cache the RDD in the `CartesianRDD` compute method, we should count the bytes read from memory here.
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117393090

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -215,6 +218,215 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleton
     }
   }

+  private def createNonPartitionedTable(
+      tabName: String,
+      analyzedBySpark: Boolean = true,
+      analyzedByHive: Boolean = true): Unit = {
+    val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+    sql(
+      s"""
+         |CREATE TABLE $tabName (key STRING, value STRING)
+         |STORED AS TEXTFILE
+         |TBLPROPERTIES ('prop1' = 'val1', 'prop2' = 'val2')
+       """.stripMargin)
+    sql(s"INSERT INTO TABLE $tabName SELECT * FROM src")
+    if (analyzedBySpark) sql(s"ANALYZE TABLE $tabName COMPUTE STATISTICS")
+    // This is to mimic the scenario in which Hive generates statistics before we read them
+    if (analyzedByHive) hiveClient.runSqlHive(s"ANALYZE TABLE $tabName COMPUTE STATISTICS")
+    val describeResult1 = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $tabName")
+
+    val tableMetadata =
+      spark.sessionState.catalog.getTableMetadata(TableIdentifier(tabName)).properties
+    // Statistics info is not contained in the metadata of the original table
+    assert(Seq(StatsSetupConst.COLUMN_STATS_ACCURATE,
+      StatsSetupConst.NUM_FILES,
+      StatsSetupConst.NUM_PARTITIONS,
+      StatsSetupConst.ROW_COUNT,
+      StatsSetupConst.RAW_DATA_SIZE,
+      StatsSetupConst.TOTAL_SIZE).forall(!tableMetadata.contains(_)))
+
+    if (analyzedByHive) {
+      assert(StringUtils.filterPattern(describeResult1, "*numRows\\s+500*").nonEmpty)
+    } else {
+      assert(StringUtils.filterPattern(describeResult1, "*numRows\\s+500*").isEmpty)
+    }
+  }
+
+  private def extractStatsPropValues(
+      descOutput: Seq[String],
+      propKey: String): Option[BigInt] = {
+    val str = descOutput
+      .filterNot(_.contains(HiveExternalCatalog.STATISTICS_PREFIX))
+      .filter(_.contains(propKey))
+    if (str.isEmpty) {
+      None
+    } else {
+      assert(str.length == 1, "found more than one match")
+      val pattern = new Regex(s"""$propKey\\s+(-?\\d+)""")
+      val pattern(value) = str.head.trim
+      Option(BigInt(value))
+    }
+  }
+
+  test("get statistics when not analyzed in both Hive and Spark") {
+    val tabName = "tab1"
+    withTable(tabName) {
+      createNonPartitionedTable(tabName, analyzedByHive = false, analyzedBySpark = false)
+      checkTableStats(
+        tabName, hasSizeInBytes = true, expectedRowCounts = None)
+
+      // ALTER TABLE SET TBLPROPERTIES invalidates some contents of Hive-specific statistics.
+      // This is triggered by the Hive alterTable API.
+      val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+      val describeResult = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $tabName")
+
+      val rawDataSize = extractStatsPropValues(describeResult, "rawDataSize")
+      val numRows = extractStatsPropValues(describeResult, "numRows")
+      val totalSize = extractStatsPropValues(describeResult, "totalSize")
+      assert(rawDataSize.isEmpty, "rawDataSize should not be shown without table analysis")
+      assert(numRows.isEmpty, "numRows should not be shown without table analysis")
+      assert(totalSize.isDefined && totalSize.get > 0, "totalSize is lost")
+    }
+  }
+
+  test("alter table rename after analyze table") {
+    Seq(true, false).foreach { analyzedBySpark =>
+      val oldName = "tab1"
+      val newName = "tab2"
+      withTable(oldName, newName) {
+        createNonPartitionedTable(oldName, analyzedByHive = true, analyzedBySpark = analyzedBySpark)
+        val fetchedStats1 = checkTableStats(
+          oldName, hasSizeInBytes = true, expectedRowCounts = Some(500))
+        sql(s"ALTER TABLE $oldName RENAME TO $newName")
+        val fetchedStats2 = checkTableStats(
+          newName, hasSizeInBytes = true, expectedRowCounts = Some(500))
+        assert(fetchedStats1 == fetchedStats2)
+
+        // ALTER TABLE RENAME does not affect the contents of Hive-specific statistics
+        val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
+        val describeResult = hiveClient.runSqlHive(s"DESCRIBE FORMATTED $newName")
+
+        val rawDataSize = extractStatsPropValues(describeResult, "rawDataSize")
+        val numRows = extractStatsPropValues(describeResult, "numRows")
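The extraction idiom used by `extractStatsPropValues` in the diff above can be sketched standalone. The helper name and sample input below are illustrative, not part of the PR; `findFirstMatchIn` is used instead of the extractor pattern so the sketch tolerates surrounding whitespace without trimming:

```scala
import scala.util.matching.Regex

// Pull a (possibly negative) integer statistic such as "numRows 500" out of
// DESCRIBE FORMATTED output lines. Name and inputs are hypothetical.
def extractProp(descOutput: Seq[String], propKey: String): Option[BigInt] = {
  val matching = descOutput.filter(_.contains(propKey))
  matching.headOption.flatMap { line =>
    val pattern = new Regex(s"""$propKey\\s+(-?\\d+)""")
    // findFirstMatchIn tolerates the tab-separated DESCRIBE column layout
    pattern.findFirstMatchIn(line).map(m => BigInt(m.group(1)))
  }
}
```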
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117392683

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
@@ -414,6 +415,50 @@ private[hive] class HiveClientImpl(
     val properties = Option(h.getParameters).map(_.asScala.toMap).orNull

+    // Hive-generated statistics are also recorded in ignoredProperties
+    val ignoredProperties = scala.collection.mutable.Map.empty[String, String]
+    for (key <- HiveStatisticsProperties; value <- properties.get(key)) {
+      ignoredProperties += key -> value
+    }
+
+    val excludedTableProperties = HiveStatisticsProperties ++ Set(
+      // The property value of "comment" is moved to the dedicated field "comment"
+      "comment",
+      // For EXTERNAL_TABLE, the table properties have a particular field "EXTERNAL". This is
+      // added in the function toHiveTable.
+      "EXTERNAL"
+    )
+
+    val filteredProperties = properties.filterNot {
+      case (key, _) => excludedTableProperties.contains(key)
+    }
+    val comment = properties.get("comment")
+
+    val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
+    val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
+    lazy val rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) match {

--- End diff --

1. I think we can just use `val`; no need to worry about performance here.
2. This can be simplified to `xxx.filter(_ >= 0)`.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
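The `filter(_ >= 0)` simplification cloud-fan suggests can be sketched standalone. The property map below is an illustrative stand-in for the Hive table properties, not the actual HiveClientImpl code:

```scala
// Hypothetical stand-in for the Hive table property map
val properties = Map("numRows" -> "500", "rawDataSize" -> "-1")

// Shape of the original code: a pattern match that drops negative counts
val rowCountViaMatch = properties.get("numRows").map(BigInt(_)) match {
  case Some(c) if c >= 0 => Some(c)
  case _ => None
}

// Suggested shape: the same semantics in one combinator
val rowCountViaFilter = properties.get("numRows").map(BigInt(_)).filter(_ >= 0)

// A negative placeholder value (e.g. -1 for "unknown") becomes None either way
val rawDataSize = properties.get("rawDataSize").map(BigInt(_)).filter(_ >= 0)
```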
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77075/ Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77075/testReport)** for PR 18029 at commit [`75d8523`](https://github.com/apache/spark/commit/75d852384f12554c3171513f11d31604ff206dac). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #77075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77075/testReport)** for PR 18029 at commit [`75d8523`](https://github.com/apache/spark/commit/75d852384f12554c3171513f11d31604ff206dac).
[GitHub] spark issue #18029: [SPARK-20168][WIP][DStream] Add changes to use kinesis f...
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 @budde @brkyvz would love to hear your thoughts if this is the best way to add this functionality
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77074 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77074/testReport)** for PR 18016 at commit [`1f771bd`](https://github.com/apache/spark/commit/1f771bd9bdee15b4a2c2d829f5f60404044ba9af).
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117388593

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala ---
@@ -42,13 +41,12 @@ private[spark] class ShuffleMapStage(
     parents: List[Stage],
     firstJobId: Int,
     callSite: CallSite,
-    val shuffleDep: ShuffleDependency[_, _, _])
+    val shuffleDep: ShuffleDependency[_, _, _],

--- End diff --

Good catch. I agree, but with the caveat that we can only clean this up if this isn't functioning as the last strong reference which keeps the dependency from being garbage-collected.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18016 **[Test build #77073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77073/testReport)** for PR 18016 at commit [`68ecf5e`](https://github.com/apache/spark/commit/68ecf5e129eaba5830c439e1196bd4f1ee22ae42).
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117388447

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1233,17 +1223,6 @@ class DAGScheduler(
     logInfo("waiting: " + waitingStages)
     logInfo("failed: " + failedStages)

-    // We supply true to increment the epoch number here in case this is a
-    // recomputation of the map outputs. In that case, some nodes may have cached
-    // locations with holes (from when we detected the error) and will need the
-    // epoch incremented to refetch them.
-    // TODO: Only increment the epoch number if this is not the first time
-    // we registered these map outputs.
-    mapOutputTracker.registerMapOutputs(
-      shuffleStage.shuffleDep.shuffleId,
-      shuffleStage.outputLocInMapOutputTrackerFormat(),
-      changeEpoch = true)

--- End diff --

I need to think about this carefully and maybe make a matrix of possible cases to be sure. My original thought process was something like this:

- The old code comment says `TODO: Only increment the epoch number if this is not the first time we registered these map outputs`, which implies that at least some of the epoch increments here were unnecessary.
- If we assume that a new, never-before-computed map output won't be requested by executors before it is complete, then we don't need to worry about executors caching incomplete map outputs.
- I believe that any FetchFailure should end up incrementing the epoch.

That said, the increment here is only occurring once per stage completion. It probably doesn't _hurt_ to bump the epoch here because in a single-stage-at-a-time case we'd only be invalidating map outputs which we'll never fetch again anyways. Even if we were unnecessarily invalidating the map output statuses of other concurrent stages, I think the impact of this is going to be relatively small (if we did find that this had an impact, then a sane approach would be to implement an e-tag-like mechanism where bumping the epoch doesn't purge the executor-side caches but instead has them verify a per-stage epoch / counter). Finally, the existing code might be giving us nice eager cleanup of map statuses after stages complete (vs. the cleanup which occurs later when stages or shuffles are fully cleaned up). I think you're right that this change carries unnecessary / not-fully-understood risks for now, so let me go ahead and put in an explicit increment here (with an updated comment / ref. to this discussion) in my next push to this PR.
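The epoch mechanism being debated can be illustrated with a minimal, self-contained sketch. These classes are hypothetical stand-ins, not Spark's actual `MapOutputTracker`; the point is only that a driver-side epoch bump forces executor caches to discard possibly stale map output locations and refetch:

```scala
// Driver-side counter: bumped whenever cached locations may be stale
class EpochTracker {
  private var epoch: Long = 0L
  def incrementEpoch(): Unit = synchronized { epoch += 1 }
  def getEpoch: Long = synchronized { epoch }
}

// Executor-side cache: compares its recorded epoch against the tracker's
// before trusting anything it holds
class ExecutorSideCache(tracker: EpochTracker, fetch: Int => String) {
  private var cachedEpoch: Long = tracker.getEpoch
  private var cache: Map[Int, String] = Map.empty // shuffleId -> statuses
  var fetchCount: Int = 0

  def get(shuffleId: Int): String = {
    val current = tracker.getEpoch
    if (current > cachedEpoch) { // an epoch bump invalidates everything we hold
      cache = Map.empty
      cachedEpoch = current
    }
    cache.getOrElse(shuffleId, {
      fetchCount += 1
      val statuses = fetch(shuffleId)
      cache += shuffleId -> statuses
      statuses
    })
  }
}
```

This also shows why an unnecessary bump is cheap but not free: it costs one extra round of refetches for statuses that were still valid, which is the "relatively small impact" referred to above.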
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18023 **[Test build #77068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77068/testReport)** for PR 18023 at commit [`7699e87`](https://github.com/apache/spark/commit/7699e871a31e37755b35c88b893faf9df8f7664f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18023 Merged build finished. Test PASSed.
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18023 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77068/ Test PASSed.
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17999 Merged build finished. Test PASSed.
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117385925

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapStage.scala ---
@@ -42,13 +41,12 @@ private[spark] class ShuffleMapStage(
     parents: List[Stage],
     firstJobId: Int,
     callSite: CallSite,
-    val shuffleDep: ShuffleDependency[_, _, _])
+    val shuffleDep: ShuffleDependency[_, _, _],

--- End diff --

Seems we can pass the `shuffleId` instead of the `ShuffleDependency` here.
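The two constructor shapes under discussion can be sketched with simplified stand-ins (these are not the real `ShuffleMapStage` or `ShuffleDependency` signatures). The point of the suggestion is that a stage holding only the id does not pin the full dependency object in memory, with the caveat raised elsewhere in the thread that this is only safe if the stage is not the last strong reference keeping the dependency alive:

```scala
// Hypothetical stand-in for a shuffle dependency carrying heavyweight state
class ShuffleDepSketch(val shuffleId: Int, val payload: Array[Byte])

// Current shape: the stage holds a strong reference to the full dependency
class StageHoldingDep(val shuffleDep: ShuffleDepSketch) {
  def shuffleId: Int = shuffleDep.shuffleId
}

// Suggested shape: the stage records only the id it actually needs
class StageHoldingId(val shuffleId: Int)
```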
[GitHub] spark pull request #17955: [SPARK-20715] Store MapStatuses only in MapOutput...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/17955#discussion_r117385673

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1233,17 +1223,6 @@ class DAGScheduler(
     logInfo("waiting: " + waitingStages)
     logInfo("failed: " + failedStages)

-    // We supply true to increment the epoch number here in case this is a
-    // recomputation of the map outputs. In that case, some nodes may have cached
-    // locations with holes (from when we detected the error) and will need the
-    // epoch incremented to refetch them.
-    // TODO: Only increment the epoch number if this is not the first time
-    // we registered these map outputs.
-    mapOutputTracker.registerMapOutputs(
-      shuffleStage.shuffleDep.shuffleId,
-      shuffleStage.outputLocInMapOutputTrackerFormat(),
-      changeEpoch = true)

--- End diff --

Is it safer if we increment the epoch number here?
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17999 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77067/ Test PASSed.
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17999 **[Test build #77067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77067/testReport)** for PR 17999 at commit [`ea10dee`](https://github.com/apache/spark/commit/ea10dee343671e3d9c79eb0bcddc55a2ee3d1d71). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17985: Add "full_outer" name to join types
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17985 @BartekH Yes, I think we can add that to the exception message. Please also add a test case for checking supported join types.
[GitHub] spark issue #17992: [SPARK-20759] SCALA_VERSION in _config.yml should be con...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/17992 @srowen Hello, do you know how to finish the test?
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77072 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77072/testReport)** for PR 18031 at commit [`bfea9f5`](https://github.com/apache/spark/commit/bfea9f59fd7587b87de0ddb4601f76786671f38a).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18031 @HyukjinKwon Thank you so much! Really helpful.
[GitHub] spark pull request #18031: Record accurate size of blocks in MapStatus when ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18031#discussion_r117385321

--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus(
 }

 /**
- * A [[MapStatus]] implementation that only stores the average size of non-empty blocks,
- * plus a bitmap for tracking which blocks are empty.
+ * A [[MapStatus]] implementation that stores the accurate size of huge blocks, which are larger
+ * than both [[config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD]] and
+ * [[config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE]] * averageSize. It stores the

--- End diff --

It looks like the documentation generation for Javadoc 8 is failing due to these links:

```
[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:4: error: reference not found
[error]  * than both {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD} and
[error]  ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:5: error: reference not found
[error]  * {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE} * averageSize. It stores the
[error]  ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/core/target/java/org/apache/spark/sql/functions.java:2996: error: invalid uri: "http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html Customizing Formats"
[error]    * @see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html Customizing Formats"/>
[error]  ^
```

Probably, we should wrap it in `` `...` `` as I did before - https://github.com/apache/spark/pull/16013 - or find a way to make this link work properly. The other errors seem spurious. Please refer to my observation - https://github.com/apache/spark/pull/17389#issuecomment-288438704 (I think we should fix it or document ^ somewhere at least).
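The fix HyukjinKwon suggests can be sketched as a doc comment. The class and wording below are illustrative, not the actual MapStatus.scala text; the idea is that scaladoc `[[...]]` links to members genjavadoc cannot resolve become broken `{@link ...}` references in the generated Java sources, while backtick-quoted spans are rendered as plain inline code and produce no link at all:

```scala
/**
 * Stores the accurate size of blocks larger than both
 * `config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD` and
 * `config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE` * averageSize.
 * (Hypothetical doc-comment sketch: backticks instead of [[...]] links.)
 */
class HighlyCompressedMapStatusDocSketch(val averageSize: Long)
```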
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77071/testReport)** for PR 14971 at commit [`1e4182d`](https://github.com/apache/spark/commit/1e4182d1e03622cdcc84f6cd951b2c534289e78f).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77070 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/testReport)** for PR 18031 at commit [`970421b`](https://github.com/apache/spark/commit/970421b2a5cb2278d60403f72dc165418e4faf87). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77070/ Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77066/ Test FAILed.
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test FAILed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77066/testReport)** for PR 14971 at commit [`aa9a36e`](https://github.com/apache/spark/commit/aa9a36e1c5bff881a053a139f49344be0ad62452). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367232 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { + override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = { +val expandedAttributes: Seq[Attribute] = table match { + // If there is no table specified, use all input attributes that match expr + case None => input.output.filter(_.name.matches(expr)) + // If there is a table, pick out attributes that are part of this table that match expr + case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty) --- End diff -- `input.output.filter(_.qualifier.exists(resolver(_, t)))` is a bit more concise.
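For illustration only, a minimal sketch of the reviewer's point that `exists` replaces `filter(...).nonEmpty` on an `Option`; the `resolver` and `qualifier` values below are hypothetical stand-ins for Spark's `Resolver` and attribute qualifier, not the actual code:

```scala
// Hypothetical stand-ins: a case-insensitive resolver and an Option qualifier.
val resolver: (String, String) => Boolean = _.equalsIgnoreCase(_)
val qualifier: Option[String] = Some("T1")
val t = "t1"

// Form in the diff: builds an intermediate Option, then tests emptiness.
val verbose = qualifier.filter(resolver(_, t)).nonEmpty
// Suggested form: tests the predicate directly, no intermediate value.
val concise = qualifier.exists(resolver(_, t))

assert(verbose == concise)
```

Both forms are semantically identical on `Option`; `exists` simply states the intent (is there an element satisfying the predicate?) in one step.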
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117379878 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- We don't need to compile the same regex over and over. Can you move this to the ParserUtils... 
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { --- End diff -- `expr` is the pattern right? Maybe we should give it a better name.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117380037 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { --- End diff -- Using an option here does not add a thing.
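A small sketch of the reviewer's point, with hypothetical names rather than the actual parser code: wrapping a value in `Option` only to `map` over it and then unconditionally call `.get` is a detour that a direct call avoids:

```scala
// Hypothetical classifier standing in for the match on the token text.
def classify(token: String): String =
  if (token.startsWith("`")) "regex" else "attribute"

val token = "`abc`"

// Shape used in the diff: wrap, map, then unconditionally unwrap.
val viaOption = Option(token).map(classify).get
// Equivalent direct call; clearer and no risk of .get on None.
val direct = classify(token)

assert(viaOption == direct)
```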
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117366828 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -84,6 +84,33 @@ case class UnresolvedTableValuedFunction( } /** + * Represents all of the input attributes to a given relational operator, for example in + * "SELECT `(id)?+.+` FROM ...". + * + * @param table an optional table that should be the target of the expansion. If omitted all + * tables' columns are produced. + */ +case class UnresolvedRegex(expr: String, table: Option[String]) extends Star with Unevaluable { + override def expand(input: LogicalPlan, resolver: Resolver): Seq[NamedExpression] = { +val expandedAttributes: Seq[Attribute] = table match { + // If there is no table specified, use all input attributes that match expr + case None => input.output.filter(_.name.matches(expr)) + // If there is a table, pick out attributes that are part of this table that match expr + case Some(t) => input.output.filter(_.qualifier.filter(resolver(_, t)).nonEmpty) +.filter(_.name.matches(expr)) +} + +expandedAttributes.zip(input.output).map { --- End diff -- An `Attribute` is always a `NamedExpression`, why do we need this?
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117368022 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => --- End diff -- Please use a guard, e.g.: `case unresolved_attr @ UnresolvedAttribute(nameParts) if conf.supportQuotedIdentifiers => `. That makes the logic down the line much simpler.
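To illustrate the guard suggestion in isolation (the case classes below are simplified stand-ins, not the actual Catalyst types): moving the config check into the pattern lets each case body stay linear, with no `if`/`else` or early `return`:

```scala
// Simplified, hypothetical stand-ins for the parser types involved.
case class UnresolvedAttribute(nameParts: Seq[String])
case class Conf(supportQuotedIdentifiers: Boolean)

def describe(e: Any, conf: Conf): String = e match {
  // The guard restricts this case to when quoted identifiers are enabled,
  // so the body needs no nested conditional.
  case UnresolvedAttribute(parts) if conf.supportQuotedIdentifiers =>
    s"maybe-regex attribute: ${parts.mkString(".")}"
  case UnresolvedAttribute(parts) =>
    s"plain attribute: ${parts.mkString(".")}"
  case other =>
    s"extract value from $other"
}

assert(describe(UnresolvedAttribute(Seq("a", "b")), Conf(true)).startsWith("maybe-regex"))
assert(describe(UnresolvedAttribute(Seq("a", "b")), Conf(false)).startsWith("plain"))
```

Guards are evaluated after the pattern matches; when a guard fails, matching falls through to the next case, which is exactly the disabled-config path here.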
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117380055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.fieldName.getStart).map(_.getText match { +case r@escapedIdentifier(i) => + UnresolvedRegex(i, Some(unresolved_attr.name)) +case _ => + UnresolvedAttribute(nameParts :+ attr) + }) + return ret.get +} + UnresolvedAttribute(nameParts :+ attr) case e => UnresolvedExtractValue(e, Literal(attr)) } } /** - * Create an [[UnresolvedAttribute]] expression. + * Create an [[UnresolvedAttribute]] expression or a [[UnresolvedRegex]] if it is a regex + * quoted in `` */ override def visitColumnReference(ctx: ColumnReferenceContext): Expression = withOrigin(ctx) { +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r + val ret = Option(ctx.getStart).map(_.getText match { --- End diff -- Using an option here does not add a thing. 
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117367722 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1230,24 +1230,49 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a dereference expression. The return type depends on the type of the parent, this can - * either be a [[UnresolvedAttribute]] (if the parent is an [[UnresolvedAttribute]]), or an - * [[UnresolvedExtractValue]] if the parent is some expression. + * Create a dereference expression. The return type depends on the type of the parent. + * If the parent is an [[UnresolvedAttribute]], it can be a [[UnresolvedAttribute]] or + * a [[UnresolvedRegex]] for regex quoted in ``; if the parent is some other expression, + * it can be [[UnresolvedExtractValue]]. */ override def visitDereference(ctx: DereferenceContext): Expression = withOrigin(ctx) { val attr = ctx.fieldName.getText expression(ctx.base) match { - case UnresolvedAttribute(nameParts) => + case unresolved_attr @ UnresolvedAttribute(nameParts) => +if (conf.supportQuotedIdentifiers) { + val escapedIdentifier = "`(.+)`".r --- End diff -- We don't need to compile the same regex over and over. Can you move this to the ParserUtils... I am also wondering if we shouldn't do the match in the parser itself.
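A sketch of the hoisting the reviewer suggests, under the assumption that a shared utility object is a suitable home (the `ParserUtilsSketch` object and `unquote` helper below are hypothetical illustrations, not Spark's actual ParserUtils):

```scala
// Hypothetical utility object: the pattern is compiled once at
// initialization and reused on every call, instead of recompiling
// "`(.+)`".r inside each visitDereference/visitColumnReference.
object ParserUtilsSketch {
  val EscapedIdentifier = "`(.+)`".r
}

// Hypothetical helper: strip surrounding backticks if present.
def unquote(token: String): Option[String] = token match {
  // Regex extractors require a whole-string match, so plain tokens fall through.
  case ParserUtilsSketch.EscapedIdentifier(inner) => Some(inner)
  case _ => None
}

assert(unquote("`(id)?+.+`") == Some("(id)?+.+"))
assert(unquote("plain") == None)
```

Because `scala.util.matching.Regex` compiles its `java.util.regex.Pattern` eagerly, a `val` in a shared object pays the compilation cost exactly once rather than on every parsed column reference.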
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18031 **[Test build #77070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/testReport)** for PR 18031 at commit [`970421b`](https://github.com/apache/spark/commit/970421b2a5cb2278d60403f72dc165418e4faf87).
[GitHub] spark issue #18031: Record accurate size of blocks in MapStatus when it's ab...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18031 Merged build finished. Test FAILed.