[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117163859
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

I can't remember which flags/options run the style check with mvn, but you 
can always run it directly with `dev/scalastyle`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18024: [SPARK-20792][SS] Support same timeout operations...

2017-05-17 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/18024

[SPARK-20792][SS] Support same timeout operations in mapGroupsWithState 
function in batch queries as in streaming queries

## What changes were proposed in this pull request?

Currently, timeout is disabled in batch queries (i.e. 
GroupStateTimeout.NoTimeout), which means any GroupState.setTimeout*** operation 
would throw UnsupportedOperationException. This makes it awkward to convert 
a streaming query into a batch query by changing the input DF from a streaming 
DF to a batch DF: if the timeout was enabled and used, the batch query will 
start throwing UnsupportedOperationException.

This patch creates the dummy state in batch queries with the provided timeoutConf 
so that it behaves in the same way as in streaming queries.
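
The mismatch described above can be modeled with a toy sketch (these are not
Spark's actual classes; all names below are illustrative): with `NoTimeout`,
`setTimeoutDuration` throws, while a timeout-aware dummy state accepts the call.

```scala
// Toy model of the batch/streaming mismatch; not Spark's real GroupState.
sealed trait GroupStateTimeout
case object NoTimeout extends GroupStateTimeout
case object ProcessingTimeTimeout extends GroupStateTimeout

class ToyGroupState(timeoutConf: GroupStateTimeout) {
  var timeoutMs: Option[Long] = None
  def setTimeoutDuration(ms: Long): Unit = timeoutConf match {
    case NoTimeout =>
      // batch queries before this patch: timeout fixed to NoTimeout
      throw new UnsupportedOperationException("timeout not enabled")
    case _ =>
      // batch queries after this patch: the dummy state honors the call
      timeoutMs = Some(ms)
  }
}

val before = new ToyGroupState(NoTimeout)
val threw =
  try { before.setTimeoutDuration(10000L); false }
  catch { case _: UnsupportedOperationException => true }

val after = new ToyGroupState(ProcessingTimeTimeout)
after.setTimeoutDuration(10000L)

println(s"NoTimeout throws: $threw; dummy state stores: ${after.timeoutMs}")
```

With a timeout-aware state, the same user function body can run unchanged
whether the input DataFrame is streaming or batch.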

## How was this patch tested?
Additional tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-20792

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18024.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18024


commit eef789fe1fd04a98b4d82da6864ca4f4b23c2bfb
Author: Tathagata Das 
Date:   2017-05-18T05:31:44Z

Fixed bug







[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117163563
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

How do I run the style checker? I did build the code successfully with Maven.





[GitHub] spark pull request #18015: [SPARK-20785][WEB-UI][SQL] Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117163321
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui
 import javax.servlet.http.HttpServletRequest
 
 import scala.collection.mutable
-import scala.xml.Node
+import scala.xml.{NodeSeq, Node}
--- End diff --

Have you run the style checker? I think these imports may be in the wrong order.





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117162950
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -47,39 +49,47 @@ import org.apache.spark.util.{AccumulatorContext, 
AccumulatorV2}
  *data type is nullable.
  */
 class ParquetFilterSuite extends QueryTest with ParquetTest with 
SharedSQLContext {
--- End diff --

Can we just have a simple end-to-end test? The fix is actually very simple 
and doesn't seem worth such complex tests to verify it.





[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

2017-05-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18014
  
I thought that idea was for Apache Arrow.
We could use the binary type for `UnsafeArrayData`. However, it involves some 
complexity to use 
[`ColumnVector.Array`](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L1015-L1017).

Is it better to use the existing code?





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17995
  
ping @yanboliang 





[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17999
  
**[Test build #77041 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77041/testReport)**
 for PR 17999 at commit 
[`c80c184`](https://github.com/apache/spark/commit/c80c184d5a9f85e2bff740e8cf96bd9a97d0f8a7).





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117162403
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
* Converts data sources filters to Parquet filter predicates.
*/
   def createFilter(schema: StructType, predicate: sources.Filter): 
Option[FilterPredicate] = {
-val dataTypeOf = getFieldMap(schema)
+val nameTypeMap = getFieldMap(schema)
--- End diff --

nit: `nameToType`





[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18011
  
LGTM





[GitHub] spark pull request #18011: [SPARK-19089][SQL] Add support for nested sequenc...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18011#discussion_r117161759
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala ---
@@ -258,6 +258,10 @@ class DatasetPrimitiveSuite extends QueryTest with 
SharedSQLContext {
   ListClass(List(1)) -> Queue("test" -> SeqClass(Seq(2
   }
 
+  test("nested sequences") {
+checkDataset(Seq(Seq(Seq(1))).toDS(), Seq(Seq(1)))
--- End diff --

let's also add a test for a specific collection type, e.g. `List(Queue(1))`
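
In plain Scala, the point of a `List(Queue(1))` case is that the outer and
inner concrete collection types differ, so a deserializer cannot assume a
uniform `Seq` at every nesting level; a minimal standalone illustration
(outside Spark):

```scala
import scala.collection.immutable.Queue

// Nested sequences with distinct concrete collection types.
val nested: List[Queue[Int]] = List(Queue(1))

// A correct encoder/decoder round-trip must rebuild exactly these
// concrete types, not just some generic Seq.
println(nested.head.getClass.getSimpleName)
```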





[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18011
  
**[Test build #77040 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77040/testReport)**
 for PR 18011 at commit 
[`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b).





[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18011
  
ok to test





[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16986#discussion_r117160501
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -329,35 +329,19 @@ object ScalaReflection extends ScalaReflection {
 }
 UnresolvedMapObjects(mapFunction, getPath, Some(cls))
 
-  case t if t <:< localTypeOf[Map[_, _]] =>
+  case t if t <:< localTypeOf[Map[_, _]] || t <:< 
localTypeOf[java.util.Map[_, _]] =>
--- End diff --

we should handle java map in `JavaTypeInference`, but I think it's better 
to do it in another PR and focus on scala map in this PR.





[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18000
  
I would rather say it is a limitation of the Parquet API. It looks like 
there is no way to properly set column names containing dots in Parquet filters. 
https://github.com/apache/spark/pull/17680 suggests a hacky workaround to set 
this.
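
The ambiguity is easy to see outside Parquet: a path API that splits on dots
cannot distinguish a nested field from a flat column whose name contains a
literal dot. The helper below is illustrative only, not Parquet's actual code.

```scala
// Illustrative helper: a filter path built by splitting a column name on dots.
def toPath(name: String): List[String] = name.split("\\.").toList

// A flat column literally named "a.b" and the nested field b under struct a
// both collapse to the same two-segment path, so a dotted flat column cannot
// be addressed unambiguously -- hence pushdown is disabled for such names.
val path = toPath("a.b")
println(path) // List(a, b)
```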





[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18000
  
A high-level question: is this a Parquet bug, or is Spark using the Parquet 
reader incorrectly?





[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18014
  
I may be missing something, but can we just treat the array type as a binary 
type and put it in `ColumnVector`?





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117158817
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -603,7 +603,13 @@ object DateTimeUtils {
*/
   private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, 
Int) = {
 // add the difference (in days) between 1.1.1970 and the artificial 
year 0 (-17999)
-val daysNormalized = daysSince1970 + toYearZero
+var  daysSince1970Tmp = daysSince1970
+// In history, the period (5.10.1582 ~ 14.10.1582) does not exist
--- End diff --

OK, I will do that, thanks @kiszk @cloud-fan





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14971#discussion_r117158766
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -175,7 +178,7 @@ class StatisticsSuite extends 
StatisticsCollectionTestBase with TestHiveSingleto
   sql(s"INSERT INTO TABLE $textTable SELECT * FROM src")
   checkTableStats(
 textTable,
-hasSizeInBytes = false,
+hasSizeInBytes = true,
--- End diff --

Why is the behavior changed?





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14971#discussion_r117158738
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ShowCreateTableSuite.scala ---
@@ -325,26 +325,24 @@ class ShowCreateTableSuite extends QueryTest with 
SQLTestUtils with TestHiveSing
 "last_modified_by",
 "last_modified_time",
 "Owner:",
-"COLUMN_STATS_ACCURATE",
 // The following are hive specific schema parameters which we do 
not need to match exactly.
-"numFiles",
-"numRows",
-"rawDataSize",
-"totalSize",
 "totalNumberFiles",
 "maxFileSize",
-"minFileSize",
-// EXTERNAL is not non-deterministic, but it is filtered out for 
external tables.
-"EXTERNAL"
+"minFileSize"
   )
 
   table.copy(
 createTime = 0L,
 lastAccessTime = 0L,
-properties = 
table.properties.filterKeys(!nondeterministicProps.contains(_))
+properties = 
table.properties.filterKeys(!nondeterministicProps.contains(_)),
+stats = None,
+ignoredProperties = Map.empty
   )
 }
 
+val e = normalize(actual)
+val m = normalize(expected)
--- End diff --

remove this?





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14971#discussion_r117158531
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -414,6 +415,50 @@ private[hive] class HiveClientImpl(
 
   val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
 
+  // Hive-generated Statistics are also recorded in ignoredProperties
+  val ignoredProperties = scala.collection.mutable.Map.empty[String, 
String]
+  for (key <- HiveStatisticsProperties; value <- properties.get(key)) {
+ignoredProperties += key -> value
+  }
+
+  val excludedTableProperties = HiveStatisticsProperties ++ Set(
+// The property value of "comment" is moved to the dedicated field 
"comment"
+"comment",
+// For EXTERNAL_TABLE, the table properties has a particular field 
"EXTERNAL". This is added
+// in the function toHiveTable.
+"EXTERNAL"
+  )
+
+  val filteredProperties = properties.filterNot {
+case (key, _) => excludedTableProperties.contains(key)
+  }
+  val comment = properties.get("comment")
+
+  val totalSize = 
properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_))
+  val rawDataSize = 
properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_))
+  def rowCount = 
properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) match {
+case Some(c) if c >= 0 => Some(c)
+case _ => None
+  }
+  // TODO: check if this estimate is valid for tables after partition 
pruning.
+  // NOTE: getting `totalSize` directly from params is kind of hacky, 
but this should be
+  // relatively cheap if parameters for the table are populated into 
the metastore.
+  // Currently, only totalSize, rawDataSize, and row_count are used to 
build the field `stats`
+  // TODO: stats should include all the other two fields (`numFiles` 
and `numPartitions`).
+  // (see StatsSetupConst in Hive)
+  val stats =
+  // When table is external, `totalSize` is always zero, which will 
influence join strategy
+  // so when `totalSize` is zero, use `rawDataSize` instead. When 
`rawDataSize` is also zero,
+  // return None. Later, we will use the other ways to estimate the 
statistics.
+  if (totalSize.isDefined && totalSize.get > 0L) {
--- End diff --

the indentation is wrong
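
Indentation aside, the fallback the quoted diff encodes (prefer `totalSize`,
fall back to `rawDataSize`, otherwise record no stats) can be sketched
standalone; `pickSize` is a hypothetical name, not Spark's API.

```scala
// Hypothetical standalone version of the quoted fallback: external tables
// report totalSize = 0, so rawDataSize is tried next; if both are zero or
// absent, no statistics are recorded and other estimators take over later.
def pickSize(totalSize: Option[BigInt], rawDataSize: Option[BigInt]): Option[BigInt] =
  if (totalSize.exists(_ > 0)) totalSize
  else if (rawDataSize.exists(_ > 0)) rawDataSize
  else None

println(pickSize(Some(BigInt(0)), Some(BigInt(1024)))) // Some(1024)
```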





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117158477
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

OK, thanks @cloud-fan 





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117158402
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -490,6 +516,42 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 }
   }
 
+  test("SPARK-20364 Do not push down filters when column names have dots") 
{
+implicit class StringToAttribute(str: String) {
+  // Implicits for attr, $ and symbol do not handle backticks.
+  def attribute: Attribute = UnresolvedAttribute.quotedString(str)
--- End diff --

Yea, actually my initial version in my local branch included the change for 
`symbol` and `$` to match them to `Column`. It also looks like it makes sense per 
https://github.com/apache/spark/pull/7969. I believe this is an internal API - 
https://github.com/apache/spark/blob/e9c91badce64731ffd3e53cbcd9f044a7593e6b8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala#L24
 so I guess it would be fine even if it introduces a behaviour change.

Nevertheless, I believe some reviewers don't like this change much, and I wanted 
to avoid such changes here for now (it is needed in only a single place for now 
anyway ... ).






[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117157965
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -490,6 +516,42 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 }
   }
 
+  test("SPARK-20364 Do not push down filters when column names have dots") 
{
+implicit class StringToAttribute(str: String) {
+  // Implicits for attr, $ and symbol do not handle backticks.
+  def attribute: Attribute = UnresolvedAttribute.quotedString(str)
--- End diff --

Shall we make `$` use `UnresolvedAttribute.quotedString`?





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17995
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77038/
Test PASSed.





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117157765
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

Let's follow MySQL.
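
For reference, the expected value 288 in the quoted test is the
proleptic-Gregorian day-of-year of 1582-10-15 (the ten skipped days
1582-10-05..14 are not subtracted). `java.time.LocalDate`, which uses the ISO
proleptic calendar, produces the same value; the arithmetic below is only a
sanity check, not Spark's implementation.

```scala
import java.time.LocalDate

// 1582 is not a Gregorian leap year, so Jan..Sep total 273 days.
val janThroughSep = Seq(31, 28, 31, 30, 31, 30, 31, 31, 30).sum
val dayOfYear = janThroughSep + 15

println(dayOfYear) // 288
println(LocalDate.of(1582, 10, 15).getDayOfYear) // 288
```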





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17995
  
**[Test build #77038 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77038/testReport)**
 for PR 17995 at commit 
[`bed4c41`](https://github.com/apache/spark/commit/bed4c4183fa94b20d978ac9e61d225ea989c8a73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17994: [SPARK-20505][ML] Add docs and examples for ml.st...

2017-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17994





[GitHub] spark issue #17994: [SPARK-20505][ML] Add docs and examples for ml.stat.Corr...

2017-05-17 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/17994
  
Merged into master and branch-2.2. Thanks for reviewing.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16989
  
**[Test build #77039 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77039/testReport)**
 for PR 16989 at commit 
[`4ece142`](https://github.com/apache/spark/commit/4ece142d2a3c4b46a712539e3aa7f7ee0d4e6b5b).





[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide

2017-05-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17996#discussion_r117155950
  
--- Diff: docs/ml-guide.md ---
@@ -72,35 +72,26 @@ MLlib is under active development.
 The APIs marked `Experimental`/`DeveloperApi` may change in future 
releases,
 and the migration guide below will explain all changes between releases.
 
-## From 2.0 to 2.1
+## From 2.1 to 2.2
 
 ### Breaking changes
- 
-**Deprecated methods removed**
 
-* `setLabelCol` in `feature.ChiSqSelectorModel`
-* `numTrees` in `classification.RandomForestClassificationModel` (This now 
refers to the Param called `numTrees`)
-* `numTrees` in `regression.RandomForestRegressionModel` (This now refers 
to the Param called `numTrees`)
-* `model` in `regression.LinearRegressionSummary`
-* `validateParams` in `PipelineStage`
-* `validateParams` in `Evaluator`
+There are no breaking changes.
 
 ### Deprecations and changes of behavior
 
 **Deprecations**
 
-* [SPARK-18592](https://issues.apache.org/jira/browse/SPARK-18592):
-  Deprecate all Param setter methods except for input/output column Params 
for `DecisionTreeClassificationModel`, `GBTClassificationModel`, 
`RandomForestClassificationModel`, `DecisionTreeRegressionModel`, 
`GBTRegressionModel` and `RandomForestRegressionModel`
+There are no deprecations.
 
 **Changes of behavior**
--- End diff --

Should we include #17233 in this section?





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117155497
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

@cloud-fan Because, historically, the period Oct. 5, 1582 ~ Oct. 14, 1582 does 
not exist (those ten dates were skipped in the Julian-to-Gregorian transition).





[GitHub] spark issue #18017: [INFRA] Close stale PRs

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18017
  
(#16654 was taken out as it was closed).





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117155315
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

Why is `278` better?





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16989
  
Checking the code:

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala#L59
`SparkConfigProvider` just checks whether the key is in the JMap; if not, it 
returns the default value. It doesn't check the alternatives.
I think this is the reason 
`org.apache.spark.memory.TaskMemoryManagerSuite.offHeapConfigurationBackwardsCompatibility` 
fails.
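
The lookup behavior described above can be sketched in plain Java. This is a 
hypothetical illustration (the class name `ConfigLookup` and its methods are 
mine, not Spark's actual `SparkConfigProvider` code): a provider that only 
checks the primary key misses a value set under a deprecated alternative key.

```java
import java.util.Map;

// Hypothetical sketch, not Spark's actual implementation: without the
// alternative-key fallback below, a value stored under the deprecated
// name is ignored and the default is returned instead.
class ConfigLookup {
    private static final Map<String, String> ALTERNATIVES =
        Map.of("spark.memory.offHeap.enabled", "spark.unsafe.offHeap");

    static String get(Map<String, String> conf, String key, String dflt) {
        if (conf.containsKey(key)) {
            return conf.get(key);
        }
        // Fallback to the deprecated alternative key, if one exists.
        String alt = ALTERNATIVES.get(key);
        if (alt != null && conf.containsKey(alt)) {
            return conf.get(alt);
        }
        return dflt;
    }
}
```

With the fallback, setting only `spark.unsafe.offHeap=true` still yields 
`true` for `spark.memory.offHeap.enabled`; without it, the default `false` 
would win, which matches the failure mode described above.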





[GitHub] spark issue #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit tests h...

2017-05-17 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/17869
  
@srowen,
I have committed changes to the PR.
Can you help me run the `test build` again?
Thanks.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16989
  
that seems impossible, can you give an example? BTW if this blocks you, 
just revert the off-heap config changes.





[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...

2017-05-17 Thread heary-cao
Github user heary-cao commented on the issue:

https://github.com/apache/spark/pull/18016
  
@hvanhovell @srowen
I have modified it again, and `floor` has the same problem.
Please review.
Thanks.





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17995
  
**[Test build #77038 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77038/testReport)**
 for PR 17995 at commit 
[`bed4c41`](https://github.com/apache/spark/commit/bed4c4183fa94b20d978ac9e61d225ea989c8a73).





[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive

2017-05-17 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/17995
  
Jenkins, retest this please





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117153595
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

In MySQL, the result is:
mysql> select dayofyear("1982-10-04");
+-------------------------+
| dayofyear("1982-10-04") |
+-------------------------+
|                     277 |
+-------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear("1982-10-015");
+--------------------------+
| dayofyear("1982-10-015") |
+--------------------------+
|                      288 |
+--------------------------+
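
The 288-vs-278 question in this thread comes down to proleptic versus 
historical calendars. A small JVM sketch (the class name `CutoverDayOfYear` 
is mine) shows how `java.util.GregorianCalendar`, which models the historical 
cutover, counts the day of year for the disputed date:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// java.util.GregorianCalendar models the historical Julian-to-Gregorian
// cutover: Oct 4, 1582 is immediately followed by Oct 15, 1582, so the
// ten skipped dates never existed in that calendar.
public class CutoverDayOfYear {
    static int dayOfYear(int year, int month1Based, int day) {
        return new GregorianCalendar(year, month1Based - 1, day)
            .get(Calendar.DAY_OF_YEAR);
    }

    public static void main(String[] args) {
        System.out.println(dayOfYear(1582, 10, 4));   // 277
        // Oct 15 directly follows Oct 4, so the skipped days are not counted.
        System.out.println(dayOfYear(1582, 10, 15));  // 278
        // A proleptic Gregorian calendar (as MySQL's DAYOFYEAR uses) reports
        // 288 for 1582-10-15 instead, pretending the gap never happened.
    }
}
```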





[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117153570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends 
Serializable {
   /**
* Gathers statistics information from `row(ordinal)`.
*/
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-if (row.isNullAt(ordinal)) {
-  nullCount += 1
-  // 4 bytes for null position
-  sizeInBytes += 4
-}
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+nullCount += 1
+// 4 bytes for null position
+sizeInBytes += 4
 count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including 
closed lower bound, closed
+   * Column statistics represented as an array, currently including closed 
lower bound, closed
* upper bound and null count.
*/
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = 
super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+if (!row.isNullAt(ordinal)) {
+  count += 1
+} else {
+  gatherNullStats
+}
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, 
nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getBoolean(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BOOLEAN.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BOOLEAN.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getByte(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BYTE.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BYTE.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getShort(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += SHORT.defaultSize
+  

[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117153480
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends 
Serializable {
   /**
* Gathers statistics information from `row(ordinal)`.
*/
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-if (row.isNullAt(ordinal)) {
-  nullCount += 1
-  // 4 bytes for null position
-  sizeInBytes += 4
-}
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+nullCount += 1
+// 4 bytes for null position
+sizeInBytes += 4
 count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including 
closed lower bound, closed
+   * Column statistics represented as an array, currently including closed 
lower bound, closed
* upper bound and null count.
*/
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = 
super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+if (!row.isNullAt(ordinal)) {
+  count += 1
+} else {
+  gatherNullStats
+}
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, 
nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getBoolean(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BOOLEAN.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BOOLEAN.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getByte(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BYTE.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BYTE.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getShort(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += SHORT.defaultSize
+  

[GitHub] spark pull request #16654: [SPARK-19303][ML][WIP] Add evaluate method in clu...

2017-05-17 Thread zhengruifeng
Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/16654





[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18002#discussion_r117153431
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala
 ---
@@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends 
Serializable {
   /**
* Gathers statistics information from `row(ordinal)`.
*/
-  def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-if (row.isNullAt(ordinal)) {
-  nullCount += 1
-  // 4 bytes for null position
-  sizeInBytes += 4
-}
+  def gatherStats(row: InternalRow, ordinal: Int): Unit
+
+  /**
+   * Gathers statistics information on `null`.
+   */
+  def gatherNullStats(): Unit = {
+nullCount += 1
+// 4 bytes for null position
+sizeInBytes += 4
 count += 1
   }
 
   /**
-   * Column statistics represented as a single row, currently including 
closed lower bound, closed
+   * Column statistics represented as an array, currently including closed 
lower bound, closed
* upper bound and null count.
*/
-  def collectedStatistics: GenericInternalRow
+  def collectedStatistics: Array[Any]
 }
 
 /**
  * A no-op ColumnStats only used for testing purposes.
  */
-private[columnar] class NoopColumnStats extends ColumnStats {
-  override def gatherStats(row: InternalRow, ordinal: Int): Unit = 
super.gatherStats(row, ordinal)
+private[columnar] final class NoopColumnStats extends ColumnStats {
+  override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
+if (!row.isNullAt(ordinal)) {
+  count += 1
+} else {
+  gatherNullStats
+}
+  }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L))
+  override def collectedStatistics: Array[Any] = Array[Any](null, null, 
nullCount, count, 0L)
 }
 
-private[columnar] class BooleanColumnStats extends ColumnStats {
+private[columnar] final class BooleanColumnStats extends ColumnStats {
   protected var upper = false
   protected var lower = true
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getBoolean(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BOOLEAN.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Boolean): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BOOLEAN.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ByteColumnStats extends ColumnStats {
+private[columnar] final class ByteColumnStats extends ColumnStats {
   protected var upper = Byte.MinValue
   protected var lower = Byte.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getByte(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += BYTE.defaultSize
+  gatherValueStats(value)
+} else {
+  gatherNullStats
 }
   }
 
-  override def collectedStatistics: GenericInternalRow =
-new GenericInternalRow(Array[Any](lower, upper, nullCount, count, 
sizeInBytes))
+  def gatherValueStats(value: Byte): Unit = {
+if (value > upper) upper = value
+if (value < lower) lower = value
+sizeInBytes += BYTE.defaultSize
+count += 1
+  }
+
+  override def collectedStatistics: Array[Any] =
+Array[Any](lower, upper, nullCount, count, sizeInBytes)
 }
 
-private[columnar] class ShortColumnStats extends ColumnStats {
+private[columnar] final class ShortColumnStats extends ColumnStats {
   protected var upper = Short.MinValue
   protected var lower = Short.MaxValue
 
   override def gatherStats(row: InternalRow, ordinal: Int): Unit = {
-super.gatherStats(row, ordinal)
 if (!row.isNullAt(ordinal)) {
   val value = row.getShort(ordinal)
-  if (value > upper) upper = value
-  if (value < lower) lower = value
-  sizeInBytes += SHORT.defaultSize
+  
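
The refactor quoted in the reviews above moves null handling and value 
handling into separate methods so concrete stats classes no longer funnel 
through a `super.gatherStats` call. A minimal standalone sketch of the 
pattern (simplified to one hypothetical `IntColumnStats` class, not Spark's 
actual code):

```java
// Simplified sketch of the ColumnStats refactor: null statistics and value
// statistics are gathered by separate methods, each of which also maintains
// the shared counters (count, sizeInBytes).
class IntColumnStats {
    int lower = Integer.MAX_VALUE;
    int upper = Integer.MIN_VALUE;
    int nullCount = 0;
    int count = 0;
    long sizeInBytes = 0;

    void gatherNullStats() {
        nullCount += 1;
        sizeInBytes += 4;  // 4 bytes recorded per null position
        count += 1;
    }

    void gatherValueStats(int value) {
        if (value > upper) upper = value;
        if (value < lower) lower = value;
        sizeInBytes += 4;  // default size of an int column value
        count += 1;
    }

    Object[] collectedStatistics() {
        return new Object[] {lower, upper, nullCount, count, sizeInBytes};
    }
}
```

The caller checks `row.isNullAt(ordinal)` once and dispatches to exactly one 
of the two methods, mirroring the diff above.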

[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117153106
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

can we check with other databases?





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117153080
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -603,7 +603,13 @@ object DateTimeUtils {
*/
   private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, 
Int) = {
 // add the difference (in days) between 1.1.1970 and the artificial 
year 0 (-17999)
-val daysNormalized = daysSince1970 + toYearZero
+var  daysSince1970Tmp = daysSince1970
+// In history,the period(5.10.1582 ~ 14.10.1582) is not exist
--- End diff --

It's only about the comment, and I think 1582-10-5 or Oct. 5, 1582 is more 
human-readable.





[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16989
  
It seems like `SparkConfigProvider` is not checking alternatives in 
`SparkConf`. That's why `spark.memory.offHeap.enabled` is not set (it still 
has the default value), though we've already set `spark.unsafe.offHeap`.





[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r117152091
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,39 @@ package object config {
 "spark.io.compression.codec.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_ACCURATE_BLOCK_THRESHOLD =
+ConfigBuilder("spark.shuffle.accurateBlkThreshold")
+  .doc("When we compress the size of shuffle blocks in 
HighlyCompressedMapStatus, we will " +
+"record the size accurately if it's above the threshold specified 
by this config. This " +
+"helps to prevent OOM by avoiding underestimating shuffle block 
size when fetch shuffle " +
+"blocks.")
+  .longConf
+  .createWithDefault(100 * 1024 * 1024)
+
+  private[spark] val MEMORY_OFF_HEAP_ENABLED =
+ConfigBuilder("spark.memory.offHeap.enabled")
+  .doc("If true, Spark will attempt to use off-heap memory for certain 
operations(e.g. sort, " +
+"aggregate, etc. However, the buffer used for fetching shuffle 
blocks is always " +
+"off-heap). If off-heap memory use is enabled, then 
spark.memory.offHeap.size must be " +
+"positive.")
+  .booleanConf
+  .createWithDefault(false)
+
+  private[spark] val MEMORY_OFF_HEAP_SIZE =
+ConfigBuilder("spark.memory.offHeap.size")
+  .doc("The absolute amount of memory in bytes which can be used for 
off-heap allocation." +
+" This setting has no impact on heap memory usage, so if your 
executors' total memory" +
+" consumption must fit within some hard limit then be sure to 
shrink your JVM heap size" +
+" accordingly. This must be set to a positive value when " +
+"spark.memory.offHeap.enabled=true.")
+  .longConf
--- End diff --

Yes, I should refine it.





[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...

2017-05-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16989#discussion_r117151567
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,39 @@ package object config {
 "spark.io.compression.codec.")
   .booleanConf
   .createWithDefault(false)
+
+  private[spark] val SHUFFLE_ACCURATE_BLOCK_THRESHOLD =
+ConfigBuilder("spark.shuffle.accurateBlkThreshold")
+  .doc("When we compress the size of shuffle blocks in 
HighlyCompressedMapStatus, we will " +
+"record the size accurately if it's above the threshold specified 
by this config. This " +
+"helps to prevent OOM by avoiding underestimating shuffle block 
size when fetch shuffle " +
+"blocks.")
+  .longConf
+  .createWithDefault(100 * 1024 * 1024)
+
+  private[spark] val MEMORY_OFF_HEAP_ENABLED =
+ConfigBuilder("spark.memory.offHeap.enabled")
+  .doc("If true, Spark will attempt to use off-heap memory for certain 
operations(e.g. sort, " +
+"aggregate, etc. However, the buffer used for fetching shuffle 
blocks is always " +
+"off-heap). If off-heap memory use is enabled, then 
spark.memory.offHeap.size must be " +
+"positive.")
+  .booleanConf
+  .createWithDefault(false)
+
+  private[spark] val MEMORY_OFF_HEAP_SIZE =
+ConfigBuilder("spark.memory.offHeap.size")
+  .doc("The absolute amount of memory in bytes which can be used for 
off-heap allocation." +
+" This setting has no impact on heap memory usage, so if your 
executors' total memory" +
+" consumption must fit within some hard limit then be sure to 
shrink your JVM heap size" +
+" accordingly. This must be set to a positive value when " +
+"spark.memory.offHeap.enabled=true.")
+  .longConf
--- End diff --

we should use `.bytesConf(ByteUnit.BYTE)`, see 
`SQLConf.SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE` as an example
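For reference, `bytesConf` lets users write the value with a size suffix instead of a raw byte count. A minimal sketch of what the suggested declaration could look like (my illustration, not the PR's final code; it assumes Spark's internal `ConfigBuilder` API and `org.apache.spark.network.util.ByteUnit`):

```scala
import org.apache.spark.network.util.ByteUnit

// Sketch: declare the threshold as a byte-size config so users can write
// values like "100m" or "1g" instead of 100 * 1024 * 1024.
private[spark] val SHUFFLE_ACCURATE_BLOCK_THRESHOLD =
  ConfigBuilder("spark.shuffle.accurateBlkThreshold")
    .doc("Threshold above which shuffle block sizes are recorded accurately " +
      "in HighlyCompressedMapStatus, to avoid underestimating block sizes.")
    .bytesConf(ByteUnit.BYTE)
    .createWithDefaultString("100m")
```

With `bytesConf`, the parsed value is still a `Long` number of bytes at the call site; only the accepted input format changes.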





[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18015
  
@ajbozarth 
Thank you very much for the suggestion; I have made the changes.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77037/
Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #77037 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77037/testReport)**
 for PR 14971 at commit 
[`cce31db`](https://github.com/apache/spark/commit/cce31db80cdc66516e3e537f33a3611b07186b6b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77036/
Test PASSed.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #77036 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77036/testReport)**
 for PR 14971 at commit 
[`22a2c00`](https://github.com/apache/spark/commit/22a2c00333ffc39458f45d629c1b3199f73f1f3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17435
  
I think we need a test and @holdenk's review.





[GitHub] spark issue #18017: [INFRA] Close stale PRs

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18017
  
(Actually, let me take out #17435. It looks recently updated and I believe 
it has a point there).





[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148652
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -33,24 +33,24 @@ private[ui] class AllExecutionsPage(parent: SQLTab) 
extends WebUIPage("") with L
 
   override def render(request: HttpServletRequest): Seq[Node] = {
 val currentTime = System.currentTimeMillis()
-val content = listener.synchronized {
+var content : NodeSeq = listener.synchronized {
--- End diff --

I'd rather not switch to a `var` (it's very un-Scala); see below for an alternative suggestion





[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148750
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -61,6 +61,36 @@ private[ui] class AllExecutionsPage(parent: SQLTab) 
extends WebUIPage("") with L
   
details.parentNode.querySelector('.stage-details').classList.toggle('collapsed')
 }}
   
+content =
+  
+
+  {
+  if (listener.getRunningExecutions.nonEmpty) {
+
+  Running 
Queries:
+  {listener.getRunningExecutions.size}
+
+  }
+  }
+  {
+  if (listener.getCompletedExecutions.nonEmpty) {
+
+  Completed 
Queries:
+  {listener.getCompletedExecutions.size}
+
+  }
+  }
+  {
+  if (listener.getFailedExecutions.nonEmpty) {
+
+  Failed 
Queries:
+  {listener.getFailedExecutions.size}
+
+  }
+  }
+
+   ++ content
+
 UIUtils.headerSparkPage("SQL", content, parent, Some(5000))
--- End diff --

then you could replace `content` here with `summary ++ content`
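Putting the two review suggestions together, the shape being proposed is roughly the following (a sketch with assumed names, not the PR's actual diff; it relies on Spark's `UIUtils` and the `scala.xml` library):

```scala
import scala.xml.{Node, NodeSeq}

// Sketch: build the summary as a separate immutable val rather than
// reassigning a var, then concatenate at the call site.
val summary: NodeSeq =
  <div>
    <span class="description-input">Running Queries: 3</span>
  </div>

// `content` stays whatever the page already computed.
val content: Seq[Node] = <table><tr><td>execution rows</td></tr></table>

// Concatenate when rendering the page:
// UIUtils.headerSparkPage("SQL", summary ++ content, parent, Some(5000))
```

`NodeSeq ++ Seq[Node]` yields a combined node sequence, so no mutation is needed.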





[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148693
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -61,6 +61,36 @@ private[ui] class AllExecutionsPage(parent: SQLTab) 
extends WebUIPage("") with L
   
details.parentNode.querySelector('.stage-details').classList.toggle('collapsed')
 }}
   
+content =
--- End diff --

perhaps leave this as `summary`, but not `++ content` at the end





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18020
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77035/
Test PASSed.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18020
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18020
  
**[Test build #77035 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77035/testReport)**
 for PR 18020 at commit 
[`aa16ab3`](https://github.com/apache/spark/commit/aa16ab38fc0e0c80b179a5860f477c3650f64609).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...

2017-05-17 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/12646#discussion_r117148664
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java 
---
@@ -730,4 +726,49 @@ public void testToLong() throws IOException {
   assertFalse(negativeInput, 
UTF8String.fromString(negativeInput).toLong(wrapper));
 }
   }
+  @Test
+  public void trimsChar() {
--- End diff --

Could you split this test case into three test cases for trim, trimLeft, 
trimRight?





[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18015
  
@ajbozarth 
Rebuilt and renamed the variable for clarity.
I added two screenshots. Thanks.





[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...

2017-05-17 Thread guoxiaolongzte
Github user guoxiaolongzte commented on a diff in the pull request:

https://github.com/apache/spark/pull/18015#discussion_r117148012
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala
 ---
@@ -61,7 +61,37 @@ private[ui] class AllExecutionsPage(parent: SQLTab) 
extends WebUIPage("") with L
   
details.parentNode.querySelector('.stage-details').classList.toggle('collapsed')
 }}
   
-UIUtils.headerSparkPage("SQL", content, parent, Some(5000))
+
+val summary: NodeSeq =
--- End diff --

Rebuilt and renamed the variable for clarity.
I added two screenshots.





[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18000
  
Thank you @viirya.





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117145159
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
* Converts data sources filters to Parquet filter predicates.
*/
   def createFilter(schema: StructType, predicate: sources.Filter): 
Option[FilterPredicate] = {
-val dataTypeOf = getFieldMap(schema)
+val nameTypeMap = getFieldMap(schema)
+
+// Parquet does not allow dots in the column name because dots are 
used as a column path
--- End diff --

Not just for speed, but also for the amount of code that needs to change. But it is OK with me.





[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18000
  
Sounds OK to me.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77032/
Test FAILed.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15821
  
**[Test build #77032 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)**
 for PR 15821 at commit 
[`b4eebc2`](https://github.com/apache/spark/commit/b4eebc27e261eddb4d8b0b829245fa3c187dade1).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15821
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18000
  
Just to make sure, I don't feel strongly about either comment, @viirya. I am 
willing to fix them if you feel strongly. Please let me know.





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117143908
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
* Converts data sources filters to Parquet filter predicates.
*/
   def createFilter(schema: StructType, predicate: sources.Filter): 
Option[FilterPredicate] = {
-val dataTypeOf = getFieldMap(schema)
+val nameTypeMap = getFieldMap(schema)
+
+// Parquet does not allow dots in the column name because dots are 
used as a column path
--- End diff --

Hm, I expect this is a non-critical path and not executed multiple times. 
Also, it does not look particularly faster to call `Filter.references` -> 
`Filter.findReferences` -> `Filter.references` repeatedly. Another downside (maybe 
nitpicking) is that this introduces another small code path that returns `None` 
for filter creation failure.
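For context, the guard being discussed amounts to a single check at the top of `createFilter`, roughly as follows (my simplified sketch of the idea in the thread, not the PR's code; the real method goes on to build a Parquet `FilterPredicate`):

```scala
import org.apache.spark.sql.sources

// Sketch: Parquet treats '.' as a column-path delimiter, so pushdown is
// skipped entirely for any filter referencing a dotted column name.
def createFilterSketch(predicate: sources.Filter): Option[String] = {
  if (predicate.references.exists(_.contains("."))) {
    None // returning None means "do not push this filter down"
  } else {
    Some(predicate.toString) // placeholder for the real predicate conversion
  }
}
```

`Filter.references` returns the column names the filter touches, so one `exists` scan covers nested predicates without a recursive walk at the call site.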







[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117143600
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
* Converts data sources filters to Parquet filter predicates.
*/
   def createFilter(schema: StructType, predicate: sources.Filter): 
Option[FilterPredicate] = {
-val dataTypeOf = getFieldMap(schema)
+val nameTypeMap = getFieldMap(schema)
+
+// Parquet does not allow dots in the column name because dots are 
used as a column path
--- End diff --

OK, that makes sense.





[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...

2017-05-17 Thread saulshanabrook
Github user saulshanabrook commented on the issue:

https://github.com/apache/spark/pull/14957
  
@xuanyuanking Have you determined if the functionality provided here is 
superseded by #16578? I am trying to figure out which PR to help out on since I 
need this feature as well.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18020
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18020
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77031/
Test PASSed.





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77034/
Test PASSed.





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #77034 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77034/testReport)**
 for PR 17094 at commit 
[`9461c45`](https://github.com/apache/spark/commit/9461c45b39ac4c61d0faf6830ee9da0ed8d7015b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18020
  
**[Test build #77031 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77031/testReport)**
 for PR 18020 at commit 
[`3890b91`](https://github.com/apache/spark/commit/3890b91f42205d4db19349af459c6511ab81daf1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18023: Fix SPARK-12139: REGEX Column Specification for Hive Que...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18023
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...

2017-05-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18000#discussion_r117140137
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -166,7 +166,14 @@ private[parquet] object ParquetFilters {
* Converts data sources filters to Parquet filter predicates.
*/
   def createFilter(schema: StructType, predicate: sources.Filter): 
Option[FilterPredicate] = {
-val dataTypeOf = getFieldMap(schema)
+val nameTypeMap = getFieldMap(schema)
+
+// Parquet does not allow dots in the column name because dots are 
used as a column path
--- End diff --

Yes, it is. However, AFAIK we don't currently log pushed filters that fail to be created, e.g., `In`. We should probably log in those cases across all the sources. If you don't feel strongly about this, I would rather not log here for now.
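The ambiguity behind this change can be sketched outside Spark (a minimal illustration, not Spark's actual implementation): Parquet-style filter APIs treat a dot in a column name as a path separator into nested groups, so a flat column literally named `a.b` collides with field `b` nested inside group `a`, which is why pushdown is skipped for such names.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch (not Spark's actual code): why dots in column names are
// ambiguous for Parquet predicate pushdown. A dotted name is interpreted
// as a path into nested groups, so a flat column literally named "a.b"
// cannot be distinguished from field b inside group a.
public class DottedColumnPath {
    static List<String> columnPath(String name) {
        // Split the way a Parquet-style filter API resolves a column name.
        return Arrays.asList(name.split("\\."));
    }

    public static void main(String[] args) {
        System.out.println(columnPath("a.b"));   // [a, b]: read as nested path a -> b
        System.out.println(columnPath("plain")); // [plain]: one flat column
    }
}
```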





[GitHub] spark pull request #18023: Fix SPARK-12139: REGEX Column Specification for H...

2017-05-17 Thread janewangfb
GitHub user janewangfb opened a pull request:

https://github.com/apache/spark/pull/18023

Fix SPARK-12139: REGEX Column Specification for Hive Queries

## What changes were proposed in this pull request?
Hive interprets regular expressions, e.g., `(a)?+.+`, in the column specification of a query. This PR enables Spark to support this feature when `hive.support.quoted.identifiers` is set to `true`.

## How was this patch tested?

- Added unit tests in SQLQuerySuite.scala
- In spark-shell, tested the originally failing query:
scala> hc.sql("SELECT `(appid|ds|host|instance|offset|ts)?+.+`, 
IF(FB_IS_VALID_HIVE_PARTITION_VALUE(appid), appid, 'BAD_APPID'), 
IF(FB_IS_VALID_HIVE_PARTITION_VALUE(ts), ts, 'BAD_TS') FROM 
time_spent_bit_array_mobile_current WHERE ds='2017-05-14' AND 
instance='cc_deterministic_loader' AND ts='2017-05-14+15:00:99' limit 
100").collect.foreach(println)

result:
[1.4947744605006E9,Map(delta -> 803, ip -> 84.16.234.63, ig_id -> 
1928710114, hces_extra -> 
{"radio_type":"wifi-none","auth_flag":"unable_to_verify"}),0.0,1494774434,1.494774459676E9,WrappedArray(517867,
 
0),26,0,lncny1,e46e8616-9763-475a-b80f-a46094b263a6,9,188,10.20.0,4C0175EC-B421-4676-ACFF-8E1E353D53E5,,57944460,null,6f72336f74c9f85c6e1b7b16c64e9dec,,567067343352427,2017-05-14+15:00:99]
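The exclusion semantics of such a pattern can be sketched with plain Java regexes (a hypothetical helper for illustration, not the code in this PR): the possessive quantifier `?+` in `(a|b)?+.+` cannot backtrack, so a column whose full name is exactly `a` or `b` leaves nothing for the trailing `.+` and is excluded, while every other column matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating Hive-style regex column selection.
// With "(appid|ds|host)?+.+", the possessive group consumes an exact
// match of appid/ds/host and refuses to backtrack, so those exact names
// fail the trailing .+ and are excluded from the selection.
public class RegexColumns {
    static List<String> select(List<String> columns, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String c : columns) {
            if (p.matcher(c).matches()) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("appid", "ds", "host", "payload", "ts_extra");
        // Select everything except the exact names appid, ds and host.
        System.out.println(select(cols, "(appid|ds|host)?+.+")); // [payload, ts_extra]
    }
}
```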



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/janewangfb/spark support_select_regex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18023.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18023


commit af55afd8d6839e38337f67e19a614ea3eae9a2cf
Author: Jane Wang 
Date:   2017-05-18T00:21:14Z

Fix SPARK-12139: REGEX Column Specification for Hive Queries







[GitHub] spark pull request #17763: [SPARK-13747][Core]Add ThreadUtils.awaitReady and...

2017-05-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17763





[GitHub] spark issue #17763: [SPARK-13747][Core]Add ThreadUtils.awaitReady and disall...

2017-05-17 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/17763
  
Thanks! Merging to master and 2.2.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18020
  
LGTM





[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-17 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/17997#discussion_r117137648
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
 ---
@@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
   }
 }
 checkEvaluation(DayOfYear(Literal.create(null, DateType)), null)
+
+checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 
13:10:15").getTime))), 288)
--- End diff --

cc @cloud-fan @gatorsmile
Do you have any ideas?
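As an aside on why 1582-10-15 is a tricky date (background illustration, not part of the patch): `java.time` uses the proleptic Gregorian calendar, under which that date has day-of-year 288, while legacy `java.util.GregorianCalendar` models the ten days skipped at the Julian-to-Gregorian cutover and can disagree for dates in that window.

```java
import java.time.LocalDate;

// Hedged illustration: 1582-10-15 is the first Gregorian day after the
// Julian calendar cutover. java.time is proleptic Gregorian, so its
// day-of-year is simply 31+28+31+30+31+30+31+31+30+15 = 288
// (1582 is not a leap year).
public class CutoverDayOfYear {
    public static void main(String[] args) {
        System.out.println(LocalDate.of(1582, 10, 15).getDayOfYear()); // 288
    }
}
```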





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #77037 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77037/testReport)**
 for PR 14971 at commit 
[`cce31db`](https://github.com/apache/spark/commit/cce31db80cdc66516e3e537f33a3611b07186b6b).





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #77036 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77036/testReport)**
 for PR 14971 at commit 
[`22a2c00`](https://github.com/apache/spark/commit/22a2c00333ffc39458f45d629c1b3199f73f1f3e).





[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...

2017-05-17 Thread gatorsmile
GitHub user gatorsmile reopened a pull request:

https://github.com/apache/spark/pull/14971

[SPARK-17410] [SPARK-17284] Move Hive-generated Stats Info to HiveClientImpl

### What changes were proposed in this pull request?

After adding the new field `stats` to `CatalogTable`, we should not expose Hive-specific stats metadata to `MetastoreRelation`; doing so complicates all the related code. It also introduces a bug in `SHOW CREATE TABLE`: statistics-related table properties should be skipped by `SHOW CREATE TABLE`, since they could be incorrect for the newly created table. See the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-13792

This PR handles Hive-specific stats metadata in `HiveClientImpl`.

### How was this patch tested?

Added a few test cases.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark showCreateTableNew

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14971.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14971


commit 92474c5a142fb9db2c86549c8347f910fc01fcbd
Author: gatorsmile 
Date:   2016-08-28T22:28:15Z

remove stats-related props

commit ce8e8b89a5b61648daaa59578e2b6a99ec2f6d74
Author: gatorsmile 
Date:   2016-08-29T05:24:05Z

address comments

commit 9ce526b7729c4111292d6abb69bd81aec0ecf5de
Author: gatorsmile 
Date:   2016-09-06T00:25:56Z

Merge remote-tracking branch 'upstream/master' into showCreateTable

commit efd879dbda12e235d00d9b6bc4891a591832912b
Author: gatorsmile 
Date:   2016-09-06T07:07:12Z

move stats from metastorerelation to hiveclientimpl

commit 491c0cd2580cfd358b2fd4d94a6f41b4063f172f
Author: gatorsmile 
Date:   2016-09-06T07:34:51Z

improve the comments.

commit c9cdf44b561c5e88a108cd09ad17842332d14162
Author: gatorsmile 
Date:   2016-09-06T07:35:26Z

improve the comments.

commit 4b0aed54b0aef6675f4f9fac82f6647563afb5cc
Author: gatorsmile 
Date:   2016-09-08T03:21:28Z

Merge remote-tracking branch 'upstream/master' into showCreateTable

commit 552101af418e4a30febf7d09938022ecc4c08da9
Author: gatorsmile 
Date:   2016-09-08T03:25:00Z

merge

commit d3dcb564509fd2a32a3fadefb811495affaaa466
Author: gatorsmile 
Date:   2016-09-11T05:55:20Z

Merge remote-tracking branch 'upstream/master' into showCreateTable

commit 9e18ba104527d2bb14331f4b51194002dabb2556
Author: gatorsmile 
Date:   2016-09-11T21:48:36Z

fix and add more test cases

commit c6a85bcd4b6b58c46787d1ca1510418cef79a8d5
Author: gatorsmile 
Date:   2016-09-16T04:50:18Z

Merge remote-tracking branch 'upstream/master' into showCreateTable

commit 3ed68e0b0aa6aff19a5e31c89fed7e5c814e83f3
Author: gatorsmile 
Date:   2016-09-16T06:16:40Z

improve the test case

commit 2e4d398388cd64f3e1d130af81d5e7ddc23a2a19
Author: gatorsmile 
Date:   2016-09-17T06:52:32Z

also utilizes Hive-generated row counts when not analyzed in Spark

commit 5dfa17efa84ed180e68b4922cfaf85e3d50f14ad
Author: gatorsmile 
Date:   2016-09-17T07:30:57Z

more comments

commit 2f40c7f5532c8b6e66c786f3b1506bd4efdcf711
Author: gatorsmile 
Date:   2016-09-18T00:08:48Z

address comments.

commit 3376bd6a57a65fa004abd43237f8f3c87f07064a
Author: gatorsmile 
Date:   2016-09-18T03:31:11Z

fix test cases

commit 90cd18e9d7bad6462fb0254d7981e23341795c11
Author: gatorsmile 
Date:   2016-09-21T04:58:42Z

Merge remote-tracking branch 'upstream/master' into showCreateTable

commit 7ad08fe2a488fa759b4abf4e99a7206e031379d9
Author: gatorsmile 
Date:   2016-09-21T05:23:58Z

test case fix

commit f4c0ebb0901216ea09eaf3f77e4fdcd431b15d37
Author: gatorsmile 
Date:   2016-09-22T23:08:55Z

address comments

commit 4c89d92ab65d7f4f061e32aa22780fd6e4b7c798
Author: gatorsmile 
Date:   2016-09-22T23:12:57Z

address comments

commit 699b5d8aa4d9370009c73f45d1618f1e5bb92210
Author: gatorsmile 
Date:   2016-09-24T01:10:04Z

fix.

commit 8c90d0b7364c46de5a4a59fa89457000bb283dd9
Author: gatorsmile 
Date:   2016-09-25T05:33:44Z

Merge remote-tracking branch 'upstream/master' into showCreateTable

commit 50ce04e51aebd5f68f7e50ec7c3bbe72275bf629
Author: gatorsmile 
Date:   2016-09-25T07:10:30Z

address comments.





[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18020
  
**[Test build #77035 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77035/testReport)**
 for PR 18020 at commit 
[`aa16ab3`](https://github.com/apache/spark/commit/aa16ab38fc0e0c80b179a5860f477c3650f64609).





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #77034 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77034/testReport)**
 for PR 17094 at commit 
[`9461c45`](https://github.com/apache/spark/commit/9461c45b39ac4c61d0faf6830ee9da0ed8d7015b).





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77033/
Test FAILed.





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #77033 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77033/testReport)**
 for PR 17094 at commit 
[`b55b7fe`](https://github.com/apache/spark/commit/b55b7fe0c6af2a744e193f36090845773253ef97).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-05-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Merged build finished. Test FAILed.




