[GitHub] spark issue #18025: [WIP][SparkR] Update doc and examples for sql functions

2017-05-17 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/18025
  
@felixcheung @HyukjinKwon 

Per this 
[suggestion](https://github.com/apache/spark/pull/18003#discussion-diff-116853922L57),
 I'm creating more meaningful examples for the SQL functions. 

Since these functions can be grouped, we can create a single doc page for 
each group of functions and construct concrete, useful examples for each 
group. The benefits are obvious:
-  Centralized documentation of related functions. This makes it easier for 
users to navigate. Right now there are TOO many items in the `see also` section. 
-  Examples can share the same data. This avoids creating a data frame for 
each function, as happens when they are documented separately.
- Cleaner structure and far fewer Rd files.

Indeed, this is part of what was discussed in #17161. I have explored this 
for a few functions to illustrate the idea. Since this is a big effort, I would 
like to get folks' opinions before extending this to all functions. 

In this commit, I created docs for some sample functions in three groups (more to follow):
-  'column_datetime_functions' to document all datetime functions
-  'column_aggregate_functions' to document all aggregate functions
-  'column_math_functions' to document all math functions
-  ...

Below is what 'column_datetime_functions.Rd' looks like:


![image](https://cloud.githubusercontent.com/assets/11082368/26189797/426029f0-3b5b-11e7-9175-c63b0e5c0014.png)

![image](https://cloud.githubusercontent.com/assets/11082368/26189810/56630954-3b5b-11e7-9d70-3e74b6d3b032.png)




[GitHub] spark issue #17997: [SPARK-20763][SQL]The function of `month` and `day` retu...

2017-05-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/17997
  
ok to test



[GitHub] spark pull request #18025: [WIP][SparkR] Update doc and examples for sql fun...

2017-05-17 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/18025

[WIP][SparkR] Update doc and examples for sql functions

## What changes were proposed in this pull request?
Create better examples for sql functions. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark sparkRDoc4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18025.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18025


commit 5c8cd1e5da896d78ea3cb4fcf5e046d22090dc2a
Author: Wayne Zhang 
Date:   2017-05-18T06:32:42Z

sql function examples prototype





[GitHub] spark pull request #18020: [SPARK-20700][SQL] InferFiltersFromConstraints st...

2017-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18020



[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18020
  
Thanks! Merging to master/2.2.



[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #77039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77039/testReport)**
 for PR 16989 at commit 
[`4ece142`](https://github.com/apache/spark/commit/4ece142d2a3c4b46a712539e3aa7f7ee0d4e6b5b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18011
  
**[Test build #77040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77040/testReport)**
 for PR 18011 at commit 
[`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread JoshRosen
Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/16989
  
I think that the current use of `MemoryMode.OFF_HEAP` allocation will cause 
problems in out-of-the-box deployments using the default configuration. In 
Spark's current memory manager implementation, the total amount of Spark-managed 
off-heap memory is controlled by `spark.memory.offHeap.size`, whose default 
value is 0. In this PR, the comment on `spark.reducer.maxReqSizeShuffleToMem` 
says that it should be smaller than `spark.memory.offHeap.size`, and yet its 
default is 200 megabytes, so the default configuration is invalid.

Because `preferDirectBufs()` is `true` by default, it looks like the code 
here will always try to reserve memory using `MemoryMode.OFF_HEAP`, and these 
reservations will always fail in the default configuration because the off-heap 
size will be zero. So I think the net effect of this patch will be to always 
spill to disk.

One way to address this problem is to configure the default value of 
`spark.memory.offHeap.size` to match the JVM's internal limit on the amount of 
direct buffers that it can allocate minus some percentage or fixed overhead. 
Basically the problem is that Spark's off-heap memory manager was originally 
designed to only manage off-heap memory explicitly allocated by Spark itself 
when creating its own buffers / pages or caching blocks, not to account for 
off-heap memory used by lower-level code or third-party libraries. I'll see if 
I can think of a clean way to fix this, which I think will need to be done 
before the defaults used here can work as intended.
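
To make the failure mode concrete, here is a minimal, self-contained Scala sketch of the interaction described above. The names and defaults mirror `spark.memory.offHeap.size`, `spark.reducer.maxReqSizeShuffleToMem`, and `preferDirectBufs()`, but it only models the logic, not Spark's actual memory manager:

```scala
object OffHeapDefaultsSketch {
  sealed trait MemoryMode
  case object ON_HEAP extends MemoryMode
  case object OFF_HEAP extends MemoryMode

  val offHeapSize: Long = 0L                            // spark.memory.offHeap.size default
  val maxReqSizeShuffleToMem: Long = 200L * 1024 * 1024 // 200 MB default
  val preferDirectBufs: Boolean = true                  // Netty's default

  // A reservation succeeds only if the requested bytes fit in the pool.
  def tryReserve(mode: MemoryMode, bytes: Long): Boolean = mode match {
    case OFF_HEAP => bytes <= offHeapSize // always fails: the pool is 0 by default
    case ON_HEAP  => true                 // simplified
  }

  def main(args: Array[String]): Unit = {
    val mode = if (preferDirectBufs) OFF_HEAP else ON_HEAP
    val ok = tryReserve(mode, maxReqSizeShuffleToMem)
    // With the defaults, the reservation fails, so the block spills to disk.
    println(s"mode=$mode reserved=$ok => ${if (ok) "kept in memory" else "spilled to disk"}")
  }
}
```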



[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add a Bucketizer that can bin mul...

2017-05-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
ping @MLnick Do you have more comments on this? Thanks. 



[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18000
  
LGTM



[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117168737
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -538,6 +538,21 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
       // scalastyle:on nonascii
     }
   }
+
+  test("SPARK-20364: Disable Parquet predicate pushdown for fields having dots in the names") {
--- End diff --

Looks much better now.



[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117168546
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -47,39 +49,47 @@ import org.apache.spark.util.{AccumulatorContext, AccumulatorV2}
  *    data type is nullable.
  */
 class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext {
--- End diff --

Sure, I just reverted it back and made a simple test.



[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117168094
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends Serializable {
   /**
    * Gathers statistics information from `row(ordinal)`.
    */
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-    if (row.isNullAt(ordinal)) {
-      nullCount += 1
-      // 4 bytes for null position
-      sizeInBytes += 4
-    }
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+    nullCount += 1
+    // 4 bytes for null position
+    sizeInBytes += 4
     count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including closed lower bound, closed
+   * Column statistics represented as an array, currently including closed lower bound, closed
    * upper bound and null count.
    */
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+    if (!row.isNullAt(ordinal)) {
+      count += 1
+    } else {
+      gatherNullStats
+    }
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-    new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-    super.gatherStats(row, ordinal)
     if (!row.isNullAt(ordinal)) {
       val value = row.getBoolean(ordinal)
-      if (value > upper) upper = value
-      if (value < lower) lower = value
-      sizeInBytes += BOOLEAN.defaultSize
+      gatherValueStats(value)
+    } else {
+      gatherNullStats
     }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-    new GenericInternalRow(Array[Any](lower, upper, nullCount, count, sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+    if (value > upper) upper = value
+    if (value < lower) lower = value
+    sizeInBytes += BOOLEAN.defaultSize
+    count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+    Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-    super.gatherStats(row, ordinal)
     if (!row.isNullAt(ordinal)) {
       val value = row.getByte(ordinal)
-      if (value > upper) upper = value
-      if (value < lower) lower = value
-      sizeInBytes += BYTE.defaultSize
+      gatherValueStats(value)
+    } else {
+      gatherNullStats
     }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-    new GenericInternalRow(Array[Any](lower, upper, nullCount, count, sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+    if (value > upper) upper = value
+    if (value < lower) lower = value
+    sizeInBytes += BYTE.defaultSize
+    count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+    Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-    super.gatherStats(row, ordinal)
     if (!row.isNullAt(ordinal)) {
       val value = row.getShort(ordinal)
-      if (value > upper) upper = value
-      if (value < lower) lower = value
-      sizeInBytes += SHORT.defaultSize
+      gatherValueStats(va
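
The quoted hunk is cut off above, but the pattern it introduces is clear: null handling moves into a shared `gatherNullStats()` on the trait, and each concrete stats class collects value statistics through a monomorphic helper. A condensed, self-contained sketch of that pattern (simplified standalone types, not the actual Spark classes):

```scala
sealed trait ColumnStatsSketch {
  protected var nullCount = 0
  protected var count = 0
  protected var sizeInBytes = 0L

  def gatherStats(value: Any): Unit

  // Shared null path, as in the diff: count the null and its 4-byte position.
  protected def gatherNullStats(): Unit = {
    nullCount += 1
    sizeInBytes += 4
    count += 1
  }

  def collectedStatistics: Array[Any]
}

final class ByteColumnStatsSketch extends ColumnStatsSketch {
  private var upper = Byte.MinValue
  private var lower = Byte.MaxValue

  override def gatherStats(value: Any): Unit = value match {
    case b: Byte => gatherValueStats(b)
    case null    => gatherNullStats()
  }

  // Monomorphic value path, mirroring gatherValueStats in the diff.
  private def gatherValueStats(value: Byte): Unit = {
    if (value > upper) upper = value
    if (value < lower) lower = value
    sizeInBytes += 1 // BYTE.defaultSize
    count += 1
  }

  override def collectedStatistics: Array[Any] =
    Array[Any](lower, upper, nullCount, count, sizeInBytes)
}

object ColumnStatsSketchDemo extends App {
  val stats = new ByteColumnStatsSketch
  Seq[Any](1.toByte, null, 5.toByte).foreach(stats.gatherStats)
  println(stats.collectedStatistics.mkString(", ")) // 1, 5, 1, 3, 6
}
```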

[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117168074
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
[quotes the same ColumnStats.scala hunk reproduced in full in the first message above]

[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117167259
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
[quotes the same ColumnStats.scala hunk reproduced in full in the first message above]

[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117167238
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java 
---
@@ -730,4 +726,49 @@ public void testToLong() throws IOException {
       assertFalse(negativeInput, UTF8String.fromString(negativeInput).toLong(wrapper));
     }
   }
+  @Test
+  public void trimsChar() {
--- End diff --

sure



[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117167072
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
[quotes the same ColumnStats.scala hunk reproduced in full in the first message above]

[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117166546
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

please see

![scala](https://cloud.githubusercontent.com/assets/26266482/26188588/a9682798-3bd2-11e7-99b0-31587235f9a3.png)



[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117166463
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -510,6 +510,67 @@ public UTF8String trim() {
     }
   }
 
+  /**
+   * Removes all specified trim character string either from the beginning or the ending of a string
+   * @param trimString the trim character string
+   */
+  public UTF8String trim(UTF8String trimString) {
+    // this method do the trimLeft first, then trimRight
+    int s = 0; // the searching byte position of the input string
+    int i = 0; // the first beginning byte position of a non-matching character
+    int e = 0; // the last byte position
+    int numChars = 0; // number of characters from the input string
+    int[] stringCharLen = new int[numBytes]; // array of character length for the input string
+    int[] stringCharPos = new int[numBytes]; // array of the first byte position for each character in the input string
+    int searchCharBytes;
+
+    while (s < this.numBytes) {
+      UTF8String searchChar = copyUTF8String(s, s + numBytesForFirstByte(this.getByte(s)) - 1);
+      searchCharBytes = searchChar.numBytes;
+      // try to find the matching for the searchChar in the trimString set
+      if (trimString.find(searchChar, 0) >= 0) {
--- End diff --

I described the behavior in the comments. thanks
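
For readers following the thread, here is a minimal sketch of the trim-by-character-set semantics under review, written against plain `String` rather than `UTF8String` (so it skips the multi-byte bookkeeping the real patch handles):

```scala
// Drop from both ends every character contained in the trim set.
// Not the Spark implementation; a plain-String analogue for illustration.
def trimBoth(str: String, trimString: String): String = {
  val trimSet = trimString.toSet
  val from = str.indexWhere(c => !trimSet.contains(c))
  if (from < 0) return "" // every character was in the trim set
  val until = str.lastIndexWhere(c => !trimSet.contains(c)) + 1
  str.substring(from, until)
}

// Mirrors the SQL example: TRIM(BOTH 'SL' FROM 'SSparkSQLS') => 'parkSQ'
assert(trimBoth("SSparkSQLS", "SL") == "parkSQ")
```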



[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117166353
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -1069,6 +1069,8 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging
   override def visitFunctionCall(ctx: FunctionCallContext): Expression = withOrigin(ctx) {
     // Create the function call.
     val name = ctx.qualifiedName.getText
+    val trimFuncName = Option(ctx.trimOperator).map { o => visitTrimFuncName(ctx, o) }
--- End diff --

changed



[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117166374
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -510,6 +510,67 @@ public UTF8String trim() {
     }
   }
 
+  /**
+   * Removes all specified trim character string either from the beginning or the ending of a string
--- End diff --

changed



[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117166341
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -461,68 +462,246 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi
 }
 
 /**
- * A function that trim the spaces from both ends for the specified string.
+ * A function that trims leading or trailing characters (or both) from the specified string.
--- End diff --

added



[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread kevinyu98
Github user kevinyu98 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117166332
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -461,68 +462,246 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi
 }
 
 /**
- * A function that trim the spaces from both ends for the specified string.
+ * A function that trims leading or trailing characters (or both) from the specified string.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(str) - Removes the leading and trailing space characters from `str`.",
+  usage = """
+    _FUNC_(str) - Removes the leading and trailing space characters from `str`.
+    _FUNC_(BOTH trimString FROM str) - Remove the leading and trailing trimString from `str`
+    _FUNC_(LEADING trimChar FROM str) - Remove the leading trimString from `str`
+    _FUNC_(TRAILING trimChar FROM str) - Remove the trailing trimString from `str`
+  """,
   extended = """
+    Arguments:
+      str - a string expression
+      trimString - the trim string
+      BOTH, FROM - these are keyword to specify for trim string from both ends of the string
+      LEADING, FROM - these are keyword to specify for trim string from left end of the string
+      TRAILING, FROM - these are keyword to specify for trim string from right end of the string
     Examples:
       > SELECT _FUNC_('SparkSQL   ');
        SparkSQL
+      > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS');
+       parkSQ
+      > SELECT _FUNC_(LEADING 'paS' FROM 'SSparkSQLS');
+       rkSQLS
+      > SELECT _FUNC_(TRAILING 'SLQ' FROM 'SSparkSQLS');
+       SSparkS
   """)
-case class StringTrim(child: Expression)
-  extends UnaryExpression with String2StringExpression {
+case class StringTrim(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes {
+
+  require(children.size <= 2 && children.nonEmpty,
+    s"$prettyName requires at least one argument and no more than two.")
+
+  override def dataType: DataType = StringType
+  override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType)
 
-  def convert(v: UTF8String): UTF8String = v.trim()
+  override def nullable: Boolean = children.exists(_.nullable)
+  override def foldable: Boolean = children.forall(_.foldable)
 
   override def prettyName: String = "trim"
 
+  override def eval(input: InternalRow): Any = {
+    val inputs = children.map(_.eval(input).asInstanceOf[UTF8String])
+    if (inputs(0) != null) {
--- End diff --

sure. 



[GitHub] spark issue #17992: [SPARK-20759] SCALA_VERSION in _config.yml should be con...

2017-05-17 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue:

https://github.com/apache/spark/pull/17992
  
@srowen 
The test hasn't finished. Do I need to do anything?



[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117163859
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

I can't remember what flags/options run the style check with mvn, but you 
can always run it directly with `dev/scalastyle`



[GitHub] spark pull request #18024: [SPARK-20792][SS] Support same timeout operations...

2017-05-17 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/18024

[SPARK-20792][SS] Support same timeout operations in mapGroupsWithState 
function in batch queries as in streaming queries

## What changes were proposed in this pull request?

Currently, in batch queries, timeout is disabled (i.e. 
`GroupStateTimeout.NoTimeout`), which means any `GroupState.setTimeout***` 
operation throws UnsupportedOperationException. This makes it awkward to 
convert a streaming query into a batch query by changing the input DF from 
streaming to batch: if the timeout was enabled and used, the batch query will 
start throwing UnsupportedOperationException.

This PR creates the dummy state in batch queries with the provided timeoutConf 
so that it behaves in the same way as in streaming.
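
A sketch of the query shape this affects, using the public `mapGroupsWithState` API (the session setup and data are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

object BatchTimeoutSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    // A *batch* Dataset, as if a streaming source had been swapped out.
    val events = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()

    val counts = events
      .groupByKey(_._1)
      .mapGroupsWithState(GroupStateTimeout.ProcessingTimeTimeout) {
        (key: String, values: Iterator[(String, Int)], state: GroupState[Int]) =>
          // Before this PR, calling a setTimeout*** method in batch mode
          // threw UnsupportedOperationException; with the dummy state it is
          // accepted (and is effectively a no-op for a batch query).
          state.setTimeoutDuration("10 seconds")
          (key, values.size)
      }

    counts.show()
    spark.stop()
  }
}
```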

## How was this patch tested?
Additional tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-20792

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18024


commit eef789fe1fd04a98b4d82da6864ca4f4b23c2bfb
Author: Tathagata Das 
Date:   2017-05-18T05:31:44Z

Fixed bug





[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117163563
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

How do I run the style checker? I built the code successfully with Maven.



[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117163321
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

have you run the style checker? I think this may be in the wrong order



[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117162950
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -47,39 +49,47 @@ import org.apache.spark.util.{AccumulatorContext, AccumulatorV2}
  *    data type is nullable.
  */
 class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext {
--- End diff --

can we just have a simple end-to-end test? The fix is actually very simple 
and doesn't seem worth such complex tests to verify it.



[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

2017-05-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18014
  
I thought that idea was for Apache Arrow.
We could use the binary type for `UnsafeArrayData`; however, it involves some 
complexity to use 
[`ColumnVector.Array`](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L1015-L1017).

Is it better to use the existing code?
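
As a rough illustration of the "treat arrays as binary" idea under discussion (generic byte packing, not how `ColumnVector` or `UnsafeArrayData` are actually wired up):

```scala
import java.nio.ByteBuffer

// Pack an Int array into a single byte blob, as a binary column value
// might hold it, and unpack it again. Illustrative only.
def packInts(values: Array[Int]): Array[Byte] = {
  val buf = ByteBuffer.allocate(4 + 4 * values.length)
  buf.putInt(values.length) // element-count header
  values.foreach(v => buf.putInt(v))
  buf.array()
}

def unpackInts(bytes: Array[Byte]): Array[Int] = {
  val buf = ByteBuffer.wrap(bytes)
  Array.fill(buf.getInt())(buf.getInt())
}

assert(unpackInts(packInts(Array(1, 2, 3))).sameElements(Array(1, 2, 3)))
```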



[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17995
  
ping @yanboliang 



[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17999
  
**[Test build #77041 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77041/testReport)**
 for PR 17999 at commit 
[`c80c184`](https://github.com/apache/spark/commit/c80c184d5a9f85e2bff740e8cf96bd9a97d0f8a7).



[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117162403
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
    * Converts data sources filters to Parquet filter predicates.
    */
   def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = {
-    val dataTypeOf = getFieldMap(schema)
+    val nameTypeMap = getFieldMap(schema)
--- End diff --

nit: `nameToType`



[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18011
  
LGTM



[GitHub] spark pull request #18011: [SPARK-19089][SQL] Add support for nested sequenc...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18011#discussion_r117161759
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala ---
@@ -258,6 +258,10 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext {
       ListClass(List(1)) -> Queue("test" -> SeqClass(Seq(2
   }
 
+  test("nested sequences") {
+    checkDataset(Seq(Seq(Seq(1))).toDS(), Seq(Seq(1)))
--- End diff --

let's also add test for specific collection type, e.g. `List(Queue(1))`
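
A hypothetical shape for that addition, following the existing `checkDataset` pattern in `DatasetPrimitiveSuite` (the test name is illustrative):

```scala
import scala.collection.immutable.Queue

// Hypothetical test body in the style of the suite above: a specific
// collection type (Queue) nested inside another (List).
test("nested sequences with specific collection types") {
  checkDataset(Seq(List(Queue(1))).toDS(), List(Queue(1)))
}
```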



[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18011
  
**[Test build #77040 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77040/testReport)**
 for PR 18011 at commit 
[`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b).



[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18011
  
ok to test



[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16986#discussion_r117160501
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -329,35 +329,19 @@ object ScalaReflection extends ScalaReflection {
         }
         UnresolvedMapObjects(mapFunction, getPath, Some(cls))
 
-      case t if t <:< localTypeOf[Map[_, _]] =>
+      case t if t <:< localTypeOf[Map[_, _]] || t <:< localTypeOf[java.util.Map[_, _]] =>
--- End diff --

we should handle java map in `JavaTypeInference`, but I think it's better 
to do it in another PR and focus on scala map in this PR.



[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18000
  
I would rather say it is a limitation of the Parquet API. It looks like there is no way to properly set column names containing dots in Parquet filters. https://github.com/apache/spark/pull/17680 suggests a hacky workaround for setting this.
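
A rough illustration of the limitation (assuming parquet-mr's `FilterApi`; the printed form of the path may differ):

```scala
import org.apache.parquet.filter2.predicate.FilterApi

// FilterApi treats dots in a column name as struct nesting: "a.b" becomes
// the path a -> b, so a top-level column literally named "a.b" cannot be
// referenced by a pushed-down filter.
val col = FilterApi.intColumn("a.b")
println(col.getColumnPath)  // two segments: [a, b], not one column "a.b"
```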





[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18000
  
A high-level question: is this a Parquet bug, or is Spark not using the Parquet reader correctly?





[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18014
  
I may be missing something, but can we just treat the array type as a binary type and put it in `ColumnVector`?





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117158817
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -603,7 +603,13 @@ object DateTimeUtils {
*/
   private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, 
Int) = {
 // add the difference (in days) between 1.1.1970 and the artificial 
year 0 (-17999)
-val daysNormalized = daysSince1970 + toYearZero
+var  daysSince1970Tmp = daysSince1970
+// In history,the period(5.10.1582 ~ 14.10.1582) is not exist
--- End diff --

OK, I will do that, thanks @kiszk @cloud-fan





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14971#discussion_r117158766
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -175,7 +178,7 @@ class StatisticsSuite extends 
StatisticsCollectionTestBase with TestHiveSingleto
   sql(s"INSERT INTO TABLE $textTable SELECT * FROM src")
   checkTableStats(
 textTable,
-hasSizeInBytes = false,
+hasSizeInBytes = true,
--- End diff --

why is the behavior changed?





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14971#discussion_r117158738
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ShowCreateTableSuite.scala ---
@@ -325,26 +325,24 @@ class ShowCreateTableSuite extends QueryTest with 
SQLTestUtils with TestHiveSing
 "last_modified_by",
 "last_modified_time",
 "Owner:",
-"COLUMN_STATS_ACCURATE",
 // The following are hive specific schema parameters which we do 
not need to match exactly.
-"numFiles",
-"numRows",
-"rawDataSize",
-"totalSize",
 "totalNumberFiles",
 "maxFileSize",
-"minFileSize",
-// EXTERNAL is not non-deterministic, but it is filtered out for 
external tables.
-"EXTERNAL"
+"minFileSize"
   )
 
   table.copy(
 createTime = 0L,
 lastAccessTime = 0L,
-properties = 
table.properties.filterKeys(!nondeterministicProps.contains(_))
+properties = 
table.properties.filterKeys(!nondeterministicProps.contains(_)),
+stats = None,
+ignoredProperties = Map.empty
   )
 }
 
+val e = normalize(actual)
+val m = normalize(expected)
--- End diff --

remove this?





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14971#discussion_r117158531
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -414,6 +415,50 @@ private[hive] class HiveClientImpl(
 
   val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
 
+  // Hive-generated Statistics are also recorded in ignoredProperties
+  val ignoredProperties = scala.collection.mutable.Map.empty[String, 
String]
+  for (key <- HiveStatisticsProperties; value <- properties.get(key)) {
+ignoredProperties += key -> value
+  }
+
+  val excludedTableProperties = HiveStatisticsProperties ++ Set(
+// The property value of "comment" is moved to the dedicated field 
"comment"
+"comment",
+// For EXTERNAL_TABLE, the table properties has a particular field 
"EXTERNAL". This is added
+// in the function toHiveTable.
+"EXTERNAL"
+  )
+
+  val filteredProperties = properties.filterNot {
+case (key, _) => excludedTableProperties.contains(key)
+  }
+  val comment = properties.get("comment")
+
+  val totalSize = 
properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
+  val rawDataSize = 
properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
+  def rowCount = 
properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) match {
+case Some(c) if c >= 0 => Some(c)
+case _ => None
+  }
+  // TODO: check if this estimate is valid for tables after partition 
pruning.
+  // NOTE: getting `totalSize` directly from params is kind of hacky, 
but this should be
+  // relatively cheap if parameters for the table are populated into 
the metastore.
+  // Currently, only totalSize, rawDataSize, and row_count are used to 
build the field `stats`
+  // TODO: stats should include all the other two fields (`numFiles` 
and `numPartitions`).
+  // (see StatsSetupConst in Hive)
+  val stats =
+  // When table is external, `totalSize` is always zero, which will 
influence join strategy
+  // so when `totalSize` is zero, use `rawDataSize` instead. When 
`rawDataSize` is also zero,
+  // return None. Later, we will use the other ways to estimate the 
statistics.
+  if (totalSize.isDefined && totalSize.get > 0L) {
--- End diff --

the indentation is wrong





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117158477
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

OK, thanks @cloud-fan 





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117158402
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -490,6 +516,42 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 }
   }
 
+  test("SPARK-20364 Do not push down filters when column names have dots") 
{
+implicit class StringToAttribute(str: String) {
+  // Implicits for attr, $ and symbol do not handle backticks.
+  def attribute: Attribute = UnresolvedAttribute.quotedString(str)
--- End diff --

Yea, actually my initial local version included the change for `symbol` and `$` to match them to `Column`. It also looks like it makes sense per https://github.com/apache/spark/pull/7969. I believe this is an internal API - 
https://github.com/apache/spark/blob/e9c91badce64731ffd3e53cbcd9f044a7593e6b8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala#L24
 - so I guess it would be fine even if it introduces a behaviour change.

Nevertheless, I believe some folks don't like this change much, so I wanted to avoid such changes here for now (it is the single place that needs it for now ...).
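
For context, a small sketch of the difference (assuming catalyst's `UnresolvedAttribute`):

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

// quotedString honors backticks, so `a.b` stays a single attribute name,
// while the plain apply splits on the dot into a nested reference.
UnresolvedAttribute.quotedString("`a.b`").nameParts  // Seq("a.b")
UnresolvedAttribute("a.b").nameParts                 // Seq("a", "b")
```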






[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117157965
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -490,6 +516,42 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 }
   }
 
+  test("SPARK-20364 Do not push down filters when column names have dots") 
{
+implicit class StringToAttribute(str: String) {
+  // Implicits for attr, $ and symbol do not handle backticks.
+  def attribute: Attribute = UnresolvedAttribute.quotedString(str)
--- End diff --

Shall we make `$` use `UnresolvedAttribute.quotedString`?





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17995
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77038/
Test PASSed.





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117157765
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

let's follow mysql





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17995
  
**[Test build #77038 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77038/testReport)**
 for PR 17995 at commit 
[`bed4c41`](https://github.com/apache/spark/commit/bed4c4183fa94b20d978ac9e61d225ea989c8a73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17994: [SPARK-20505][ML] Add docs and examples for ml.st...

2017-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17994





[GitHub] spark issue #17994: [SPARK-20505][ML] Add docs and examples for ml.stat.Corr...

2017-05-17 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/17994
  
Merged into master and branch-2.2. Thanks for reviewing.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #77039 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77039/testReport)**
 for PR 16989 at commit 
[`4ece142`](https://github.com/apache/spark/commit/4ece142d2a3c4b46a712539e3aa7f7ee0d4e6b5b).





[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17996#discussion_r117155950
  
--- Diff: docs/ml-guide.md ---
@@ -72,35 +72,26 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
 and the migration guide below will explain all changes between releases.
 
-## From 2.0 to 2.1
+## From 2.1 to 2.2
 
 ### Breaking changes
- 
-**Deprecated methods removed**
 
-* `setLabelCol` in `feature.ChiSqSelectorModel`
-* `numTrees` in `classification.RandomForestClassificationModel` (This now 
refers to the Param called `numTrees`)
-* `numTrees` in `regression.RandomForestRegressionModel` (This now refers 
to the Param called `numTrees`)
-* `model` in `regression.LinearRegressionSummary`
-* `validateParams` in `PipelineStage`
-* `validateParams` in `Evaluator`
+There are no breaking changes.
 
 ### Deprecations and changes of behavior
 
 **Deprecations**
 
-* [SPARK-18592](https://issues.apache.org/jira/browse/SPARK-18592):
-  Deprecate all Param setter methods except for input/output column Params 
for `DecisionTreeClassificationModel`, `GBTClassificationModel`, 
`RandomForestClassificationModel`, `DecisionTreeRegressionModel`, 
`GBTRegressionModel` and `RandomForestRegressionModel`
+There are no deprecations.
 
 **Changes of behavior**
--- End diff --

Should we include #17233 in this section?





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117155497
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

@cloud-fan Because historically the period 1582-10-05 ~ 1582-10-14 does not exist: those ten days were skipped when the Gregorian calendar was adopted.
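
For what it's worth, the JDK's default hybrid Julian/Gregorian calendar reflects this cutover, which is where 278 comes from (a small sketch, assuming default `GregorianCalendar` settings):

```scala
import java.util.{Calendar, GregorianCalendar}

// In the hybrid calendar, 1582-10-04 is followed directly by 1582-10-15;
// the ten days in between were skipped by the Gregorian reform.
val cal = new GregorianCalendar()
cal.clear()
cal.set(1582, Calendar.OCTOBER, 15)
println(cal.get(Calendar.DAY_OF_YEAR))  // 278, since Oct 5-14 do not exist
// A calendar without the cutover counts 273 + 15 = 288, the value the
// test currently asserts.
```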





[GitHub] spark issue #18017: [INFRA] Close stale PRs

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18017
  
(#16654 was taken out as it was closed).





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117155315
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

why is `278` better?





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16989
  
Checking the code:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala#L59
`SparkConfigProvider` just checks whether the key is in the JMap and, if not, returns the default value. It doesn't check the alternatives.
I think this is the reason `org.apache.spark.memory.TaskMemoryManagerSuite.offHeapConfigurationBackwardsCompatibility` fails.
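
A minimal sketch of the kind of alternative-aware lookup that seems to be missing (hypothetical helper, not the actual Spark code):

```scala
import java.util.{Map => JMap}

// Try the primary key first, then any deprecated alternative keys,
// before falling back to the default -- mirroring what SparkConf.get does.
def getWithAlternatives(
    settings: JMap[String, String],
    key: String,
    alternatives: Seq[String]): Option[String] = {
  (key +: alternatives).iterator
    .map(k => Option(settings.get(k)))
    .collectFirst { case Some(v) => v }
}
```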





[GitHub] spark issue #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit tests h...

2017-05-17 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/17869
  
@srowen ,
I pushed a commit to modify the PR.
Can you help me run the `test build` again?
Thanks.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16989
  
That seems impossible; can you give an example? BTW, if this blocks you, just revert the off-heap config changes.





[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...

2017-05-17 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18016
  
@hvanhovell @srowen 
I have modified it again, and `floor` has the same problem.
Please review.
Thanks.





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17995
  
**[Test build #77038 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77038/testReport)**
 for PR 17995 at commit 
[`bed4c41`](https://github.com/apache/spark/commit/bed4c4183fa94b20d978ac9e61d225ea989c8a73).





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17995
  
Jenkins, retest this please





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117153595
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

In mysql, the result is:

mysql> select dayofyear("1982-10-04");
+-------------------------+
| dayofyear("1982-10-04") |
+-------------------------+
|                     277 |
+-------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear("1982-10-015");
+--------------------------+
| dayofyear("1982-10-015") |
+--------------------------+
|                      288 |
+--------------------------+





[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117153570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends 
Serializable {
   /**
* Gathers statistics information from `row(ordinal)`.
*/
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-if (row.isNullAt(ordinal)) {
-  nullCount += 1
-  // 4 bytes for null position
-  sizeInBytes += 4
-}
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+nullCount += 1
+// 4 bytes for null position
+sizeInBytes += 4
 count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including 
closed lower bound, closed
+   * Column statistics represented as an array, currently including closed 
lower bound, closed
* upper bound and null count.
*/
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = 
super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+if (!row.isNullAt(ordinal)) {
+  count += 1
+} else {
+  gatherNullStats
+}
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, 
nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getBoolean(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BOOLEAN.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BOOLEAN.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getByte(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BYTE.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BYTE.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getShort(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += SHORT.defaultSize
+  gatherValueStats(value)

[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117153480
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends 
Serializable {
   /**
* Gathers statistics information from `row(ordinal)`.
*/
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-if (row.isNullAt(ordinal)) {
-  nullCount += 1
-  // 4 bytes for null position
-  sizeInBytes += 4
-}
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+nullCount += 1
+// 4 bytes for null position
+sizeInBytes += 4
 count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including 
closed lower bound, closed
+   * Column statistics represented as an array, currently including closed 
lower bound, closed
* upper bound and null count.
*/
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = 
super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+if (!row.isNullAt(ordinal)) {
+  count += 1
+} else {
+  gatherNullStats
+}
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, 
nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getBoolean(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BOOLEAN.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BOOLEAN.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getByte(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BYTE.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BYTE.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getShort(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += SHORT.defaultSize
+  gatherValueStats(value)

[GitHub] spark pull request #16654: [SPARK-19303][ML][WIP] Add evaluate method in clu...

2017-05-17 Thread zhengruifeng
Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/16654





[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117153431
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends 
Serializable {
   /**
* Gathers statistics information from `row(ordinal)`.
*/
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-if (row.isNullAt(ordinal)) {
-  nullCount += 1
-  // 4 bytes for null position
-  sizeInBytes += 4
-}
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+nullCount += 1
+// 4 bytes for null position
+sizeInBytes += 4
 count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including 
closed lower bound, closed
+   * Column statistics represented as an array, currently including closed 
lower bound, closed
* upper bound and null count.
*/
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = 
super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+if (!row.isNullAt(ordinal)) {
+  count += 1
+} else {
+  gatherNullStats
+}
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, 
nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getBoolean(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BOOLEAN.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BOOLEAN.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getByte(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BYTE.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BYTE.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getShort(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += SHORT.defaultSize
+  gatherValueStats(value)

[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117153106
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

can we check with other databases?





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117153080
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -603,7 +603,13 @@ object DateTimeUtils {
*/
   private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, 
Int) = {
 // add the difference (in days) between 1.1.1970 and the artificial 
year 0 (-17999)
-val daysNormalized = daysSince1970 + toYearZero
+var  daysSince1970Tmp = daysSince1970
+// In history,the period(5.10.1582 ~ 14.10.1582) is not exist
--- End diff --

It's only about the comment, and I think 1582-10-5 or Oct. 5, 1582 is more human-readable.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16989
  
It seems like `SparkConfigProvider` is not checking alternatives in `SparkConf`. That's why `spark.memory.offHeap.enabled` is not set (it's still the default value), even though we've already set `spark.unsafe.offHeap`.





[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r117152091
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,39 @@ package object config {
 "spark.io.compression.codec.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_ACCURATE_BLOCK_THRESHOLD =
+ConfigBuilder("spark.shuffle.accurateBlkThreshold")
+  .doc("When we compress the size of shuffle blocks in 
HighlyCompressedMapStatus, we will " +
+"record the size accurately if it's above the threshold specified 
by this config. This " +
+"helps to prevent OOM by avoiding underestimating shuffle block 
size when fetch shuffle " +
+"blocks.")
+  .longConf
+  .createWithDefault(100 * 1024 * 1024)
+
+  private[spark] val MEMORY_OFF_HEAP_ENABLED =
+ConfigBuilder("spark.memory.offHeap.enabled")
+  .doc("If true, Spark will attempt to use off-heap memory for certain 
operations(e.g. sort, " +
+"aggregate, etc. However, the buffer used for fetching shuffle 
blocks is always " +
+"off-heap). If off-heap memory use is enabled, then 
spark.memory.offHeap.size must be " +
+"positive.")
+  .booleanConf
+  .createWithDefault(false)
+
+  private[spark] val MEMORY_OFF_HEAP_SIZE =
+ConfigBuilder("spark.memory.offHeap.size")
+  .doc("The absolute amount of memory in bytes which can be used for 
off-heap allocation." +
+" This setting has no impact on heap memory usage, so if your 
executors' total memory" +
+" consumption must fit within some hard limit then be sure to 
shrink your JVM heap size" +
+" accordingly. This must be set to a positive value when " +
+"spark.memory.offHeap.enabled=true.")
+  .longConf
--- End diff --

Yes, I should refine it.





[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r117151567
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,39 @@ package object config {
 "spark.io.compression.codec.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_ACCURATE_BLOCK_THRESHOLD =
+ConfigBuilder("spark.shuffle.accurateBlkThreshold")
+  .doc("When we compress the size of shuffle blocks in 
HighlyCompressedMapStatus, we will " +
+"record the size accurately if it's above the threshold specified 
by this config. This " +
+"helps to prevent OOM by avoiding underestimating shuffle block 
size when fetch shuffle " +
+"blocks.")
+  .longConf
+  .createWithDefault(100 * 1024 * 1024)
+
+  private[spark] val MEMORY_OFF_HEAP_ENABLED =
+ConfigBuilder("spark.memory.offHeap.enabled")
+  .doc("If true, Spark will attempt to use off-heap memory for certain 
operations(e.g. sort, " +
+"aggregate, etc. However, the buffer used for fetching shuffle 
blocks is always " +
+"off-heap). If off-heap memory use is enabled, then 
spark.memory.offHeap.size must be " +
+"positive.")
+  .booleanConf
+  .createWithDefault(false)
+
+  private[spark] val MEMORY_OFF_HEAP_SIZE =
+ConfigBuilder("spark.memory.offHeap.size")
+  .doc("The absolute amount of memory in bytes which can be used for 
off-heap allocation." +
+" This setting has no impact on heap memory usage, so if your 
executors' total memory" +
+" consumption must fit within some hard limit then be sure to 
shrink your JVM heap size" +
+" accordingly. This must be set to a positive value when " +
+"spark.memory.offHeap.enabled=true.")
+  .longConf
--- End diff --

we should use `.bytesConf(ByteUnit.BYTE)`; see `SQLConf.SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE` as an example.
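
i.e. something along these lines (a sketch with the doc text shortened; `bytesConf` parses suffixed values like "500m" or "2g" into the given unit):

```scala
import org.apache.spark.network.util.ByteUnit

// Sketch of the suggested declaration: a byte quantity instead of a raw
// longConf, so suffixed values are parsed while the code still reads bytes.
private[spark] val MEMORY_OFF_HEAP_SIZE =
  ConfigBuilder("spark.memory.offHeap.size")
    .doc("The absolute amount of memory in bytes for off-heap allocation.")
    .bytesConf(ByteUnit.BYTE)
    .createWithDefault(0)
```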





[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18015
  
@ajbozarth 
Thank you very much for the suggestion; I have made the modification.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77037/
Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #77037 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77037/testReport)**
 for PR 14971 at commit 
[`cce31db`](https://github.com/apache/spark/commit/cce31db80cdc66516e3e537f33a3611b07186b6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77036/
Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #77036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77036/testReport)** for PR 14971 at commit [`22a2c00`](https://github.com/apache/spark/commit/22a2c00333ffc39458f45d629c1b3199f73f1f3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17435
  
I think we need a test and @holdenk's review.


[GitHub] spark issue #18017: [INFRA] Close stale PRs

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18017
  
(Actually, let me take #17435 out. It was recently updated, and I believe it has a point.)


[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148652
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala ---
@@ -33,24 +33,24 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L
 
   override def render(request: HttpServletRequest): Seq[Node] = {
     val currentTime = System.currentTimeMillis()
-    val content = listener.synchronized {
+    var content : NodeSeq = listener.synchronized {
--- End diff --

I'd rather not switch to a `var` (it's very un-Scala); see below for an alternative suggestion.
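
In sketch form (a self-contained illustration; the names are generic, not from this file), the difference being flagged:

    import scala.xml.NodeSeq

    val base: NodeSeq = <p>body</p>
    val extra: NodeSeq = <p>summary</p>

    // The reassignment pattern the review calls un-Scala:
    var content: NodeSeq = base
    content = extra ++ content

    // Preferred: a single immutable binding, concatenated where it is used.
    val page: NodeSeq = extra ++ base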


[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148750
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala ---
@@ -61,6 +61,36 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L
           details.parentNode.querySelector('.stage-details').classList.toggle('collapsed')
         }}
 
+    content =
+      <div>
+        <ul class="unstyled">
+          {
+            if (listener.getRunningExecutions.nonEmpty) {
+              <li>
+                <a href="#running-execution-table"><strong>Running Queries:</strong></a>
+                {listener.getRunningExecutions.size}
+              </li>
+            }
+          }
+          {
+            if (listener.getCompletedExecutions.nonEmpty) {
+              <li>
+                <a href="#completed-execution-table"><strong>Completed Queries:</strong></a>
+                {listener.getCompletedExecutions.size}
+              </li>
+            }
+          }
+          {
+            if (listener.getFailedExecutions.nonEmpty) {
+              <li>
+                <a href="#failed-execution-table"><strong>Failed Queries:</strong></a>
+                {listener.getFailedExecutions.size}
+              </li>
+            }
+          }
+        </ul>
+      </div> ++ content
+
     UIUtils.headerSparkPage("SQL", content, parent, Some(5000))
--- End diff --

then you could replace `content` here with `summary ++ content`


[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148693
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala ---
@@ -61,6 +61,36 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L
           details.parentNode.querySelector('.stage-details').classList.toggle('collapsed')
         }}
 
+    content =
--- End diff --

Perhaps make this a `summary` val instead, without the `++ content` at the end.
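
Taken together, the three review comments on this diff amount to something like the following sketch (the element structure is assumed from the quoted diff; `listener`, `content`, `parent`, and `UIUtils.headerSparkPage` come from the file under review):

    val summary: NodeSeq =
      <div>
        <ul class="unstyled">
          {
            if (listener.getRunningExecutions.nonEmpty) {
              <li>
                <a href="#running-execution-table"><strong>Running Queries:</strong></a>
                {listener.getRunningExecutions.size}
              </li>
            }
          }
          <!-- completed and failed entries follow the same pattern -->
        </ul>
      </div>

    UIUtils.headerSparkPage("SQL", summary ++ content, parent, Some(5000))

This keeps `content` an immutable `val` and moves the concatenation into the single render call, which is the idiomatic alternative to reassigning a `var`.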


[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18020
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77035/
Test PASSed.


[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18020
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18020
  
**[Test build #77035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77035/testReport)** for PR 18020 at commit [`aa16ab3`](https://github.com/apache/spark/commit/aa16ab38fc0e0c80b179a5860f477c3650f64609).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117148664
  
--- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java ---
@@ -730,4 +726,49 @@ public void testToLong() throws IOException {
       assertFalse(negativeInput, UTF8String.fromString(negativeInput).toLong(wrapper));
     }
   }
+  @Test
+  public void trimsChar() {
--- End diff --

Could you split this test case into three test cases, one each for trim, trimLeft, and trimRight?
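
For illustration, a minimal sketch of the suggested split, written in Scala for brevity (the actual suite is JUnit/Java) and assuming the trim-string overloads this PR proposes:

    import org.apache.spark.unsafe.types.UTF8String.fromString

    // trim: strip the trim character from both ends
    assert(fromString("xxhelloxx").trim(fromString("x")) == fromString("hello"))
    // trimLeft: strip leading occurrences only
    assert(fromString("xxhelloxx").trimLeft(fromString("x")) == fromString("helloxx"))
    // trimRight: strip trailing occurrences only
    assert(fromString("xxhelloxx").trimRight(fromString("x")) == fromString("xxhello"))

Splitting them means each behavior fails independently, which makes regressions easier to localize.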


[GitHub] spark issue #18015: [SPARK-20785][WEB-UI][SQL]Spark should provide jump link...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18015
  
@ajbozarth
Rebuilt and renamed the variable as suggested. I added two screenshots. Thanks.


[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148012
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala ---
@@ -61,7 +61,37 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L
           details.parentNode.querySelector('.stage-details').classList.toggle('collapsed')
         }}
 
-    UIUtils.headerSparkPage("SQL", content, parent, Some(5000))
+
+    val summary: NodeSeq =
--- End diff --

Rebuilt and renamed the variable as suggested.
I added two screenshots.


[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18000
  
Thank you @viirya.


[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117145159
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
    * Converts data sources filters to Parquet filter predicates.
    */
   def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = {
-    val dataTypeOf = getFieldMap(schema)
+    val nameTypeMap = getFieldMap(schema)
+
+    // Parquet does not allow dots in the column name because dots are used as a column path
--- End diff --

Not just for speed; also for the amount of code that needs to change. But it is fine with me.
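
For reference, a hedged sketch of the check being discussed (the helper name is illustrative, not necessarily what the PR uses):

    // Parquet parses "a.b" as field "b" nested inside "a", so a pushed-down
    // filter cannot address a top-level column whose name contains a dot;
    // such filters have to be evaluated by Spark instead.
    def canPushDownOn(pushableColumns: Set[String])(name: String): Boolean =
      pushableColumns.contains(name) && !name.contains(".")

    val columns = Set("plain", "col.with.dots")
    assert(canPushDownOn(columns)("plain"))           // eligible for pushdown
    assert(!canPushDownOn(columns)("col.with.dots"))  // kept as a Spark-side filter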


[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18000
  
Sounds OK to me.


[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77032/
Test FAILed.


[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)** for PR 15821 at commit [`b4eebc2`](https://github.com/apache/spark/commit/b4eebc27e261eddb4d8b0b829245fa3c187dade1).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.


[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18000
  
Just to make sure: I don't feel strongly about either comment, @viirya. I am willing to fix them if you feel strongly. Please let me know.

