[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117163859 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala --- @@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui import javax.servlet.http.HttpServletRequest import scala.collection.mutable -import scala.xml.Node +import scala.xml.{NodeSeq, Node} --- End diff -- I can't remember what flags/options run the style check with mvn, but you can always run it directly with `dev/scalastyle` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18024: [SPARK-20792][SS] Support same timeout operations...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/18024 [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries

## What changes were proposed in this pull request?

Currently, in batch queries, timeouts are disabled (i.e. GroupStateTimeout.NoTimeout), which means any GroupState.setTimeout*** operation throws UnsupportedOperationException. This makes it awkward to convert a streaming query into a batch query by changing the input DF from a streaming to a batch DF: if the timeout was enabled and used, the batch query starts throwing UnsupportedOperationException. This PR creates the dummy state in batch queries with the provided timeoutConf so that it behaves in the same way.

## How was this patch tested?

Additional tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-20792 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18024.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18024

commit eef789fe1fd04a98b4d82da6864ca4f4b23c2bfb Author: Tathagata Das Date: 2017-05-18T05:31:44Z Fixed bug
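The PR's approach can be sketched in miniature: instead of throwing on timeout operations in batch mode, a dummy state object accepts them as no-ops. The following is a hypothetical toy model in Python, not Spark's actual implementation; the class and method names merely mirror the GroupState API.

```python
class BatchGroupState:
    """Toy stand-in for a GroupState in batch mode (hypothetical).

    In batch execution there is no real timeout machinery, so timeout
    setters are accepted and recorded instead of raising an
    UnsupportedOperationException-style error, mirroring the PR's
    "dummy state" idea.
    """

    def __init__(self, timeout_conf):
        self.timeout_conf = timeout_conf  # e.g. "ProcessingTimeTimeout"
        self.value = None
        self.timeout_duration_ms = None

    def update(self, new_value):
        self.value = new_value

    def set_timeout_duration(self, duration_ms):
        if self.timeout_conf != "ProcessingTimeTimeout":
            raise ValueError("timeout duration requires ProcessingTimeTimeout")
        # Accepted as a no-op in batch mode: the timeout can never fire,
        # because a batch query processes each group exactly once.
        self.timeout_duration_ms = duration_ms


state = BatchGroupState("ProcessingTimeTimeout")
state.update(42)
state.set_timeout_duration(60000)  # no longer throws in "batch mode"
print(state.value)
```

The point is only that the same user function can now run unchanged against batch input, with the timeout calls silently absorbed.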
[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user guoxiaolongzte commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117163563 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala --- @@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui import javax.servlet.http.HttpServletRequest import scala.collection.mutable -import scala.xml.Node +import scala.xml.{NodeSeq, Node} --- End diff -- How do I run the style checker? I can build the code successfully with Maven.
[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117163321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala --- @@ -20,7 +20,7 @@ package org.apache.spark.sql.execution.ui import javax.servlet.http.HttpServletRequest import scala.collection.mutable -import scala.xml.Node +import scala.xml.{NodeSeq, Node} --- End diff -- Have you run the style checker? I think this may be in the wrong order.
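The ordering the reviewer is pointing at can be checked mechanically. A minimal sketch, assuming the style rule simply requires import selectors inside braces to be in ASCII order (the actual scalastyle configuration may be stricter):

```python
def selectors_in_order(selectors):
    """Return True if grouped-import selectors are in ASCII order.

    Toy model of an import-order style check: the diff's
    `{NodeSeq, Node}` fails, while `{Node, NodeSeq}` passes.
    """
    return selectors == sorted(selectors)


print(selectors_in_order(["NodeSeq", "Node"]))  # False: wrong order
print(selectors_in_order(["Node", "NodeSeq"]))  # True: alphabetical
```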
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117162950 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -47,39 +49,47 @@ import org.apache.spark.util.{AccumulatorContext, AccumulatorV2} *data type is nullable. */ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext { --- End diff -- can we just have a simple end-to-end test? The fix is actually very simple and seems not worth such complex tests to verify it.
[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18014 I thought that idea was for Apache Arrow. We could use binary type for `UnsafeArrayData`. However, it involves some complexity to use [`ColumnVector.Array`](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L1015-L1017). Is it better to use the existing code?
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17995 ping @yanboliang
[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17999 **[Test build #77041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77041/testReport)** for PR 17999 at commit [`c80c184`](https://github.com/apache/spark/commit/c80c184d5a9f85e2bff740e8cf96bd9a97d0f8a7).
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117162403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -166,7 +166,14 @@ private[parquet] object ParquetFilters { * Converts data sources filters to Parquet filter predicates. */ def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = { -val dataTypeOf = getFieldMap(schema) +val nameTypeMap = getFieldMap(schema) --- End diff -- nit: `nameToType`
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18011 LGTM
[GitHub] spark pull request #18011: [SPARK-19089][SQL] Add support for nested sequenc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18011#discussion_r117161759 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala --- @@ -258,6 +258,10 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext { ListClass(List(1)) -> Queue("test" -> SeqClass(Seq(2 } + test("nested sequences") { +checkDataset(Seq(Seq(Seq(1))).toDS(), Seq(Seq(1))) --- End diff -- let's also add a test for a specific collection type, e.g. `List(Queue(1))`
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18011 **[Test build #77040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77040/testReport)** for PR 18011 at commit [`dd3bf01`](https://github.com/apache/spark/commit/dd3bf0113cbf66ebf784f68d7f602c39f4a46b8b).
[GitHub] spark issue #18011: [SPARK-19089][SQL] Add support for nested sequences
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18011 ok to test
[GitHub] spark pull request #16986: [SPARK-18891][SQL] Support for Map collection typ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16986#discussion_r117160501 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -329,35 +329,19 @@ object ScalaReflection extends ScalaReflection { } UnresolvedMapObjects(mapFunction, getPath, Some(cls)) - case t if t <:< localTypeOf[Map[_, _]] => + case t if t <:< localTypeOf[Map[_, _]] || t <:< localTypeOf[java.util.Map[_, _]] => --- End diff -- we should handle java map in `JavaTypeInference`, but I think it's better to do it in another PR and focus on scala map in this PR.
[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18000 I would rather say it is a limitation of the Parquet API. It looks like there is no way to properly set column names containing dots in Parquet filters. https://github.com/apache/spark/pull/17680 suggests a hacky workaround for setting this.
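The dot problem being discussed is easy to see with a toy resolver. This is a hypothetical illustration, not Parquet's actual API: a dotted string like "a.b" is conventionally parsed as field b nested inside group a, so a flat column literally named "a.b" becomes unreachable.

```python
def resolve_dotted(schema, name):
    """Resolve a dotted path against a nested dict schema (toy model).

    Splitting on '.' means a flat column whose name contains a dot can
    never be addressed: the ambiguity behind SPARK-20364.
    """
    node = schema
    for part in name.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# A nested field a.b and a flat column named "a.b" collide:
schema = {"a": {"b": "int32"}, "a.b": "int64"}
print(resolve_dotted(schema, "a.b"))  # finds the nested "int32", never the flat "int64"
```

Since the filter API offers no way to say "this dot is literal", the PR's fix is simply to skip pushdown for such columns.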
[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18000 a high-level question: is this a Parquet bug, or is Spark not using the Parquet reader correctly?
[GitHub] spark issue #18014: [SPARK-20783][SQL] Enhance ColumnVector to keep UnsafeAr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18014 I may be missing something; can we just treat the array type as binary type and put it in `ColumnVector`?
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117158817 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -603,7 +603,13 @@ object DateTimeUtils { */ private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int) = { // add the difference (in days) between 1.1.1970 and the artificial year 0 (-17999) -val daysNormalized = daysSince1970 + toYearZero +var daysSince1970Tmp = daysSince1970 +// In history,the period(5.10.1582 ~ 14.10.1582) is not exist --- End diff -- OK, I will do that, thanks @kiszk @cloud-fan
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117158766 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala --- @@ -175,7 +178,7 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto sql(s"INSERT INTO TABLE $textTable SELECT * FROM src") checkTableStats( textTable, -hasSizeInBytes = false, +hasSizeInBytes = true, --- End diff -- why is the behavior changed?
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117158738 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/ShowCreateTableSuite.scala --- @@ -325,26 +325,24 @@ class ShowCreateTableSuite extends QueryTest with SQLTestUtils with TestHiveSing "last_modified_by", "last_modified_time", "Owner:", -"COLUMN_STATS_ACCURATE", // The following are hive specific schema parameters which we do not need to match exactly. -"numFiles", -"numRows", -"rawDataSize", -"totalSize", "totalNumberFiles", "maxFileSize", -"minFileSize", -// EXTERNAL is not non-deterministic, but it is filtered out for external tables. -"EXTERNAL" +"minFileSize" ) table.copy( createTime = 0L, lastAccessTime = 0L, -properties = table.properties.filterKeys(!nondeterministicProps.contains(_)) +properties = table.properties.filterKeys(!nondeterministicProps.contains(_)), +stats = None, +ignoredProperties = Map.empty ) } +val e = normalize(actual) +val m = normalize(expected) --- End diff -- remove this?
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14971#discussion_r117158531 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -414,6 +415,50 @@ private[hive] class HiveClientImpl( val properties = Option(h.getParameters).map(_.asScala.toMap).orNull + // Hive-generated Statistics are also recorded in ignoredProperties + val ignoredProperties = scala.collection.mutable.Map.empty[String, String] + for (key <- HiveStatisticsProperties; value <- properties.get(key)) { +ignoredProperties += key -> value + } + + val excludedTableProperties = HiveStatisticsProperties ++ Set( +// The property value of "comment" is moved to the dedicated field "comment" +"comment", +// For EXTERNAL_TABLE, the table properties has a particular field "EXTERNAL". This is added +// in the function toHiveTable. +"EXTERNAL" + ) + + val filteredProperties = properties.filterNot { +case (key, _) => excludedTableProperties.contains(key) + } + val comment = properties.get("comment") + + val totalSize = properties.get(StatsSetupConst.TOTAL_SIZE).map(BigInt(_)) + val rawDataSize = properties.get(StatsSetupConst.RAW_DATA_SIZE).map(BigInt(_)) + def rowCount = properties.get(StatsSetupConst.ROW_COUNT).map(BigInt(_)) match { +case Some(c) if c >= 0 => Some(c) +case _ => None + } + // TODO: check if this estimate is valid for tables after partition pruning. + // NOTE: getting `totalSize` directly from params is kind of hacky, but this should be + // relatively cheap if parameters for the table are populated into the metastore. + // Currently, only totalSize, rawDataSize, and row_count are used to build the field `stats` + // TODO: stats should include all the other two fields (`numFiles` and `numPartitions`). + // (see StatsSetupConst in Hive) + val stats = + // When table is external, `totalSize` is always zero, which will influence join strategy + // so when `totalSize` is zero, use `rawDataSize` instead. When `rawDataSize` is also zero, + // return None. Later, we will use the other ways to estimate the statistics. + if (totalSize.isDefined && totalSize.get > 0L) { --- End diff -- the indentation is wrong
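The selection logic in the diff above reduces to a small fallback chain: prefer `totalSize` when positive, else `rawDataSize` when positive, else no stats. A sketch in Python (a paraphrase of the Scala under review, not the exact code):

```python
def choose_size(total_size, raw_data_size):
    """Pick a table size estimate, mirroring the fallback in the diff.

    For external tables `totalSize` is always zero, which would skew the
    join strategy, so fall back to `rawDataSize`; if that is also zero,
    return None and let other estimation paths run later.
    """
    if total_size is not None and total_size > 0:
        return total_size
    if raw_data_size is not None and raw_data_size > 0:
        return raw_data_size
    return None


print(choose_size(0, 1024))    # external table: rawDataSize wins
print(choose_size(2048, 1024)) # totalSize takes priority when positive
print(choose_size(0, 0))       # neither usable: defer to other estimators
```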
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117158477 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- OK, thanks @cloud-fan
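The expected value 288 in the test above matches the proleptic Gregorian calendar, which is MySQL's behavior and what the reviewer suggests following. Python's datetime module is also proleptic Gregorian, so it reproduces the number directly (in the historical hybrid calendar, by contrast, the days 5-14 October 1582 do not exist):

```python
from datetime import date

# Proleptic Gregorian: 1582 is not a leap year, so 1582-10-15 is
# simply the 273 days through September plus 15 = day 288.
d = date(1582, 10, 15)
print(d.timetuple().tm_yday)  # 288
```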
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117158402 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -490,6 +516,42 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex } } + test("SPARK-20364 Do not push down filters when column names have dots") { +implicit class StringToAttribute(str: String) { + // Implicits for attr, $ and symbol do not handle backticks. + def attribute: Attribute = UnresolvedAttribute.quotedString(str) --- End diff -- Yeah, my initial local version actually included the change for `symbol` and `$` to match them to `Column`. It also seems to make sense per https://github.com/apache/spark/pull/7969. I believe this is an internal API - https://github.com/apache/spark/blob/e9c91badce64731ffd3e53cbcd9f044a7593e6b8/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/package.scala#L24 - so I guess it would be fine even if it introduces a behaviour change. Nevertheless, I believe some people don't like this change much, and I wanted to avoid such changes here for now (it is the single place that needs it anyway for now ... ).
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117157965 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -490,6 +516,42 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex } } + test("SPARK-20364 Do not push down filters when column names have dots") { +implicit class StringToAttribute(str: String) { + // Implicits for attr, $ and symbol do not handle backticks. + def attribute: Attribute = UnresolvedAttribute.quotedString(str) --- End diff -- Shall we make `$` use `UnresolvedAttribute.quotedString`?
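The difference under discussion between the plain attribute implicits and `UnresolvedAttribute.quotedString` is essentially whether dots inside backticks are treated as separators. A hypothetical toy parser (names and behavior are illustrative, not Spark's exact implementation, which also handles escaped backticks):

```python
def parse_attribute(name):
    """Split a dotted attribute path, honoring backtick quoting (toy model)."""
    parts, buf, quoted = [], [], False
    for ch in name:
        if ch == "`":
            quoted = not quoted         # toggle quoting; backticks are dropped
        elif ch == "." and not quoted:
            parts.append("".join(buf))  # an unquoted dot separates name parts
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))
    return parts


print(parse_attribute("a.b"))    # ['a', 'b'] -- a nested reference
print(parse_attribute("`a.b`"))  # ['a.b']    -- one column with a dot in its name
```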
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17995 Merged build finished. Test PASSed.
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17995 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77038/
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117157765 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- let's follow MySQL
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17995 **[Test build #77038 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77038/testReport)** for PR 17995 at commit [`bed4c41`](https://github.com/apache/spark/commit/bed4c4183fa94b20d978ac9e61d225ea989c8a73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17994: [SPARK-20505][ML] Add docs and examples for ml.st...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17994
[GitHub] spark issue #17994: [SPARK-20505][ML] Add docs and examples for ml.stat.Corr...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/17994 Merged into master and branch-2.2. Thanks for reviewing.
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16989 **[Test build #77039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77039/testReport)** for PR 16989 at commit [`4ece142`](https://github.com/apache/spark/commit/4ece142d2a3c4b46a712539e3aa7f7ee0d4e6b5b).
[GitHub] spark pull request #17996: [SPARK-20506][DOCS] 2.2 migration guide
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17996#discussion_r117155950 --- Diff: docs/ml-guide.md --- @@ -72,35 +72,26 @@ MLlib is under active development. The APIs marked `Experimental`/`DeveloperApi` may change in future releases, and the migration guide below will explain all changes between releases. -## From 2.0 to 2.1 +## From 2.1 to 2.2 ### Breaking changes - -**Deprecated methods removed** -* `setLabelCol` in `feature.ChiSqSelectorModel` -* `numTrees` in `classification.RandomForestClassificationModel` (This now refers to the Param called `numTrees`) -* `numTrees` in `regression.RandomForestRegressionModel` (This now refers to the Param called `numTrees`) -* `model` in `regression.LinearRegressionSummary` -* `validateParams` in `PipelineStage` -* `validateParams` in `Evaluator` +There are no breaking changes. ### Deprecations and changes of behavior **Deprecations** -* [SPARK-18592](https://issues.apache.org/jira/browse/SPARK-18592): - Deprecate all Param setter methods except for input/output column Params for `DecisionTreeClassificationModel`, `GBTClassificationModel`, `RandomForestClassificationModel`, `DecisionTreeRegressionModel`, `GBTRegressionModel` and `RandomForestRegressionModel` +There are no deprecations. **Changes of behavior** --- End diff -- Should we include #17233 in this section?
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117155497 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- @cloud-fan Because, historically, the period (5.10.1582 ~ 14.10.1582) does not exist.
[GitHub] spark issue #18017: [INFRA] Close stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18017 (#16654 was taken out as it was closed).
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117155315 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- why is `278` better?
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 Checking the code: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala#L59 `SparkConfigProvider` just checks whether the key is in the JMap and returns the default value if it is not; it doesn't check the alternatives. I think this is why `org.apache.spark.memory.TaskMemoryManagerSuite.offHeapConfigurationBackwardsCompatibility` fails.
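The missing-alternative lookup described above can be sketched as follows. This is a hypothetical, simplified model, not Spark's actual `ConfigProvider`/`SparkConf` code: `SimpleConfigProvider`, the `alternatives` map, and the map-based storage are illustrative assumptions.

```scala
// Hypothetical sketch of the alternative-key fallback that the comment says
// is missing: looking up a new key (e.g. "spark.memory.offHeap.enabled")
// should fall back to its deprecated alternative ("spark.unsafe.offHeap")
// when only the old key was set.
class SimpleConfigProvider(
    settings: Map[String, String],
    alternatives: Map[String, String]) {

  def get(key: String): Option[String] =
    settings.get(key)                                       // primary key first
      .orElse(alternatives.get(key).flatMap(settings.get))  // then alternative

  def getWithDefault(key: String, default: String): String =
    get(key).getOrElse(default)
}
```

Without the `orElse` fallback, a lookup of the new key would return the default even though the deprecated key was set, which matches the failure mode described in the comment.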
[GitHub] spark issue #17869: [SPARK-20609][CORE]Run the SortShuffleSuite unit tests h...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/17869 @srowen, I have committed the changes to the PR. Could you trigger the test build again? Thanks.
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16989 That seems impossible; can you give an example? BTW, if this blocks you, just revert the off-heap config changes.
[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/18016 @hvanhovell @srowen I have modified it again; `floor` has the same problem. Please review. Thanks.
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17995 **[Test build #77038 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77038/testReport)** for PR 17995 at commit [`bed4c41`](https://github.com/apache/spark/commit/bed4c4183fa94b20d978ac9e61d225ea989c8a73).
[GitHub] spark issue #17995: [SPARK-20762][ML]Make String Params Case-Insensitive
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/17995 Jenkins, retest this please
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user 10110346 commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117153595 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- In MySQL, the result is:

mysql> select dayofyear("1982-10-04");
+-------------------------+
| dayofyear("1982-10-04") |
+-------------------------+
|                     277 |
+-------------------------+
1 row in set (0.00 sec)

mysql> select dayofyear("1982-10-015");
+--------------------------+
| dayofyear("1982-10-015") |
+--------------------------+
|                      288 |
+--------------------------+
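The 278-vs-288 discrepancy discussed in this thread can be reproduced with the JDK's two calendar systems; the snippet below is an illustrative sketch, not code from the PR. `java.time` uses the proleptic Gregorian calendar (as MySQL does), while `java.util.GregorianCalendar` models the historical Julian-to-Gregorian cutover that skips Oct 5-14, 1582:

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar}

// Proleptic Gregorian (ISO): Oct 5-14, 1582 are treated as ordinary days,
// so Oct 15, 1582 is day 288 of the year.
val proleptic = LocalDate.of(1582, 10, 15).getDayOfYear

// Hybrid Julian/Gregorian: the ten skipped days are absent from 1582,
// so the same date is only day 278 of the year.
val cal = new GregorianCalendar(1582, Calendar.OCTOBER, 15)
val hybrid = cal.get(Calendar.DAY_OF_YEAR)
```

Which convention Spark's `DayOfYear` should follow is exactly the question the reviewers are debating here.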
[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18002#discussion_r117153570 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala --- @@ -53,219 +53,299 @@ private[columnar] sealed trait ColumnStats extends Serializable { /** * Gathers statistics information from `row(ordinal)`. */ - def gatherStats(row: InternalRow, ordinal: Int): Unit = { -if (row.isNullAt(ordinal)) { - nullCount += 1 - // 4 bytes for null position - sizeInBytes += 4 -} + def gatherStats(row: InternalRow, ordinal: Int): Unit + + /** + * Gathers statistics information on `null`. + */ + def gatherNullStats(): Unit = { +nullCount += 1 +// 4 bytes for null position +sizeInBytes += 4 count += 1 } /** - * Column statistics represented as a single row, currently including closed lower bound, closed + * Column statistics represented as an array, currently including closed lower bound, closed * upper bound and null count. */ - def collectedStatistics: GenericInternalRow + def collectedStatistics: Array[Any] } /** * A no-op ColumnStats only used for testing purposes. 
*/ -private[columnar] class NoopColumnStats extends ColumnStats { - override def gatherStats(row: InternalRow, ordinal: Int): Unit = super.gatherStats(row, ordinal) +private[columnar] final class NoopColumnStats extends ColumnStats { + override def gatherStats(row: InternalRow, ordinal: Int): Unit = { +if (!row.isNullAt(ordinal)) { + count += 1 +} else { + gatherNullStats +} + } - override def collectedStatistics: GenericInternalRow = -new GenericInternalRow(Array[Any](null, null, nullCount, count, 0L)) + override def collectedStatistics: Array[Any] = Array[Any](null, null, nullCount, count, 0L) } -private[columnar] class BooleanColumnStats extends ColumnStats { +private[columnar] final class BooleanColumnStats extends ColumnStats { protected var upper = false protected var lower = true override def gatherStats(row: InternalRow, ordinal: Int): Unit = { -super.gatherStats(row, ordinal) if (!row.isNullAt(ordinal)) { val value = row.getBoolean(ordinal) - if (value > upper) upper = value - if (value < lower) lower = value - sizeInBytes += BOOLEAN.defaultSize + gatherValueStats(value) +} else { + gatherNullStats } } - override def collectedStatistics: GenericInternalRow = -new GenericInternalRow(Array[Any](lower, upper, nullCount, count, sizeInBytes)) + def gatherValueStats(value: Boolean): Unit = { +if (value > upper) upper = value +if (value < lower) lower = value +sizeInBytes += BOOLEAN.defaultSize +count += 1 + } + + override def collectedStatistics: Array[Any] = +Array[Any](lower, upper, nullCount, count, sizeInBytes) } -private[columnar] class ByteColumnStats extends ColumnStats { +private[columnar] final class ByteColumnStats extends ColumnStats { protected var upper = Byte.MinValue protected var lower = Byte.MaxValue override def gatherStats(row: InternalRow, ordinal: Int): Unit = { -super.gatherStats(row, ordinal) if (!row.isNullAt(ordinal)) { val value = row.getByte(ordinal) - if (value > upper) upper = value - if (value < lower) lower = value - sizeInBytes += 
BYTE.defaultSize + gatherValueStats(value) +} else { + gatherNullStats } } - override def collectedStatistics: GenericInternalRow = -new GenericInternalRow(Array[Any](lower, upper, nullCount, count, sizeInBytes)) + def gatherValueStats(value: Byte): Unit = { +if (value > upper) upper = value +if (value < lower) lower = value +sizeInBytes += BYTE.defaultSize +count += 1 + } + + override def collectedStatistics: Array[Any] = +Array[Any](lower, upper, nullCount, count, sizeInBytes) } -private[columnar] class ShortColumnStats extends ColumnStats { +private[columnar] final class ShortColumnStats extends ColumnStats { protected var upper = Short.MinValue protected var lower = Short.MaxValue override def gatherStats(row: InternalRow, ordinal: Int): Unit = { -super.gatherStats(row, ordinal) if (!row.isNullAt(ordinal)) { val value = row.getShort(ordinal) - if (value > upper) upper = value - if (value < lower) lower = value - sizeInBytes += SHORT.defaultSize +
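The refactoring in the diff above (moving the shared null bookkeeping into a `gatherNullStats` helper and returning the collected statistics as an `Array[Any]`) can be condensed into a small self-contained sketch. `IntColumnStats` below is an illustrative stand-in for the per-type classes in the diff, not Spark's actual code:

```scala
// Simplified model of the ColumnStats refactoring: the trait owns the shared
// null bookkeeping, and each concrete class gathers stats for non-null values.
trait ColumnStats extends Serializable {
  protected var nullCount = 0
  protected var count = 0
  protected var sizeInBytes = 0L

  // Shared null handling, factored out of gatherStats in the diff.
  def gatherNullStats(): Unit = {
    nullCount += 1
    sizeInBytes += 4 // 4 bytes for the null position
    count += 1
  }

  // Stats as an array: lower bound, upper bound, null count, count, size.
  def collectedStatistics: Array[Any]
}

final class IntColumnStats extends ColumnStats {
  private var upper = Int.MinValue
  private var lower = Int.MaxValue

  def gatherValueStats(value: Int): Unit = {
    if (value > upper) upper = value
    if (value < lower) lower = value
    sizeInBytes += 4 // size of an Int
    count += 1
  }

  override def collectedStatistics: Array[Any] =
    Array[Any](lower, upper, nullCount, count, sizeInBytes)
}
```

The design choice under review: pushing null handling into one helper avoids a virtual `super.gatherStats` call on every row, and `Array[Any]` avoids allocating a `GenericInternalRow` wrapper per batch.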
[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18002#discussion_r117153480 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala --- (same diff as quoted above)
[GitHub] spark pull request #16654: [SPARK-19303][ML][WIP] Add evaluate method in clu...
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/16654
[GitHub] spark pull request #18002: [SPARK-20770][SQL] Improve ColumnStats
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18002#discussion_r117153431 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnStats.scala --- (same diff as quoted above)
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117153106 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- can we check with other databases?
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117153080 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -603,7 +603,13 @@ object DateTimeUtils { */ private[this] def getYearAndDayInYear(daysSince1970: SQLDate): (Int, Int) = { // add the difference (in days) between 1.1.1970 and the artificial year 0 (-17999) -val daysNormalized = daysSince1970 + toYearZero +var daysSince1970Tmp = daysSince1970 +// In history,the period(5.10.1582 ~ 14.10.1582) is not exist --- End diff -- It's only about the comment, and I think 1582-10-5 or Oct. 5, 1582 is more human-readable.
[GitHub] spark issue #16989: [SPARK-19659] Fetch big blocks to disk when shuffle-read...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16989 It seems like `SparkConfigProvider` is not checking alternatives in `SparkConf`. That's why `spark.memory.offHeap.enabled` is not set (it still has the default value), even though we've already set `spark.unsafe.offHeap`.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r117152091 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -278,4 +278,39 @@ package object config { "spark.io.compression.codec.") .booleanConf .createWithDefault(false) + + private[spark] val SHUFFLE_ACCURATE_BLOCK_THRESHOLD = +ConfigBuilder("spark.shuffle.accurateBlkThreshold") + .doc("When we compress the size of shuffle blocks in HighlyCompressedMapStatus, we will " + +"record the size accurately if it's above the threshold specified by this config. This " + +"helps to prevent OOM by avoiding underestimating shuffle block size when fetch shuffle " + +"blocks.") + .longConf + .createWithDefault(100 * 1024 * 1024) + + private[spark] val MEMORY_OFF_HEAP_ENABLED = +ConfigBuilder("spark.memory.offHeap.enabled") + .doc("If true, Spark will attempt to use off-heap memory for certain operations(e.g. sort, " + +"aggregate, etc. However, the buffer used for fetching shuffle blocks is always " + +"off-heap). If off-heap memory use is enabled, then spark.memory.offHeap.size must be " + +"positive.") + .booleanConf + .createWithDefault(false) + + private[spark] val MEMORY_OFF_HEAP_SIZE = +ConfigBuilder("spark.memory.offHeap.size") + .doc("The absolute amount of memory in bytes which can be used for off-heap allocation." + +" This setting has no impact on heap memory usage, so if your executors' total memory" + +" consumption must fit within some hard limit then be sure to shrink your JVM heap size" + +" accordingly. This must be set to a positive value when " + +"spark.memory.offHeap.enabled=true.") + .longConf --- End diff -- Yes, I should refine it.
[GitHub] spark pull request #16989: [SPARK-19659] Fetch big blocks to disk when shuff...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16989#discussion_r117151567 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- (same diff as quoted above) --- End diff -- we should use `.bytesConf(ByteUnit.BYTE)`, see `SQLConf.SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE` as an example
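The `bytesConf` suggestion matters because a size config should accept human-readable suffixes rather than only raw longs. The following is a minimal stand-alone sketch of such parsing; `parseBytes` is purely illustrative and is not Spark's implementation (Spark's real `bytesConf` delegates to its own byte-string utilities):

```scala
// Illustrative byte-string parser: accepts "4096", "512m", "1g", etc.
// A sketch of why bytesConf(ByteUnit.BYTE) is friendlier than a plain
// longConf for a setting like spark.memory.offHeap.size.
def parseBytes(s: String): Long = {
  val units = Map('b' -> 1L, 'k' -> 1024L, 'm' -> 1024L * 1024,
    'g' -> 1024L * 1024 * 1024)
  val t = s.trim.toLowerCase
  units.get(t.last) match {
    case Some(mult) => t.dropRight(1).toLong * mult // suffixed value
    case None       => t.toLong                     // no suffix: plain bytes
  }
}
```

With suffix-aware parsing, a user-supplied value such as `512m` resolves directly to a byte count instead of failing a plain `toLong` conversion.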
[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18015 @ajbozarth Thank you very much for the suggestion; I have made the modification.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77037/
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77037 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77037/testReport)** for PR 14971 at commit [`cce31db`](https://github.com/apache/spark/commit/cce31db80cdc66516e3e537f33a3611b07186b6b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77036/
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77036/testReport)** for PR 14971 at commit [`22a2c00`](https://github.com/apache/spark/commit/22a2c00333ffc39458f45d629c1b3199f73f1f3e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17435 I think we need a test and @holdenk's review.
[GitHub] spark issue #18017: [INFRA] Close stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18017 (Actually, let me take out #17435. It looks recently updated and I believe it has a point there).
[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117148652

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala ---
@@ -33,24 +33,24 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L
   override def render(request: HttpServletRequest): Seq[Node] = {
     val currentTime = System.currentTimeMillis()
-    val content = listener.synchronized {
+    var content : NodeSeq = listener.synchronized {
--- End diff --

I'd rather not switch to a `var` (it's very un-scala), see below for alt suggestion
[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117148750

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala ---
@@ -61,6 +61,36 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L details.parentNode.querySelector('.stage-details').classList.toggle('collapsed') }}
+content =
+  {
+    if (listener.getRunningExecutions.nonEmpty) {
+      Running Queries: {listener.getRunningExecutions.size}
+    }
+  }
+  {
+    if (listener.getCompletedExecutions.nonEmpty) {
+      Completed Queries: {listener.getCompletedExecutions.size}
+    }
+  }
+  {
+    if (listener.getFailedExecutions.nonEmpty) {
+      Failed Queries: {listener.getFailedExecutions.size}
+    }
+  }
+  ++ content
 UIUtils.headerSparkPage("SQL", content, parent, Some(5000))
--- End diff --

then you could replace `content` here with `summary ++ content`
[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user ajbozarth commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117148693 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala --- @@ -61,6 +61,36 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L details.parentNode.querySelector('.stage-details').classList.toggle('collapsed') }} +content = --- End diff -- perhaps leave this as `summary`, but without the `++ content` at the end
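The pattern being suggested in this review thread — keep `content` a `val`, build the conditional header as its own immutable `summary`, and concatenate `summary ++ content` — can be sketched without the Spark UI classes. The names and the string-based rendering below are illustrative stand-ins, not the actual `AllExecutionsPage` code:

```scala
object SummaryDemo {
  // Each (label, count) pair becomes a summary line only when the count is
  // non-zero, mirroring the "Running/Completed/Failed Queries" links.
  def renderSummary(sections: Seq[(String, Int)]): Seq[String] =
    sections.collect { case (label, n) if n > 0 => s"$label: $n" }

  // summary ++ content: no var reassignment needed.
  def renderPage(sections: Seq[(String, Int)], content: Seq[String]): Seq[String] =
    renderSummary(sections) ++ content
}
```

`renderPage(Seq(("Running Queries", 2), ("Failed Queries", 0)), pageBody)` keeps only the non-empty sections and prepends them to the existing content.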
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77035/
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18020 Merged build finished. Test PASSed.
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18020 **[Test build #77035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77035/testReport)** for PR 18020 at commit [`aa16ab3`](https://github.com/apache/spark/commit/aa16ab38fc0e0c80b179a5860f477c3650f64609).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r117148664 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java --- @@ -730,4 +726,49 @@ public void testToLong() throws IOException { assertFalse(negativeInput, UTF8String.fromString(negativeInput).toLong(wrapper)); } } + @Test + public void trimsChar() { --- End diff -- Could you split this test case into three test cases for trim, trimLeft, trimRight?
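Splitting the combined `trimsChar` test keeps each behavior's failure isolated. A plain-Scala sketch of the three operations under test — stand-ins for `UTF8String.trim`/`trimLeft`/`trimRight` with a custom trim character, not the actual `UTF8String` implementation:

```scala
object TrimCharsDemo {
  // Drop the given character from the left, the right, or both ends.
  def trimLeft(s: String, c: Char): String  = s.dropWhile(_ == c)
  def trimRight(s: String, c: Char): String = s.reverse.dropWhile(_ == c).reverse
  def trim(s: String, c: Char): String      = trimRight(trimLeft(s, c), c)
}
```

Each of the three methods then gets its own focused test case, so a regression in, say, `trimRight` fails exactly one test.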
[GitHub] spark issue #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide jump link...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18015 @ajbozarth Rebuilt and optimized the variable name. I added two screenshots. Thanks.
[GitHub] spark pull request #18015: [SAPRK-20785][WEB-UI][SQL]Spark should provide ju...
Github user guoxiaolongzte commented on a diff in the pull request: https://github.com/apache/spark/pull/18015#discussion_r117148012 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala --- @@ -61,7 +61,37 @@ private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with L details.parentNode.querySelector('.stage-details').classList.toggle('collapsed') }} -UIUtils.headerSparkPage("SQL", content, parent, Some(5000)) + +val summary: NodeSeq = --- End diff -- Rebuilt and optimized the variable name. I added two screenshots.
[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18000 Thank you @viirya.
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117145159 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -166,7 +166,14 @@ private[parquet] object ParquetFilters { * Converts data sources filters to Parquet filter predicates. */ def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = { -val dataTypeOf = getFieldMap(schema) +val nameTypeMap = getFieldMap(schema) + +// Parquet does not allow dots in the column name because dots are used as a column path --- End diff -- Not just for speed, but also for the amount of code that needs to change. That said, it is ok with me.
[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18000 Sounds ok to me.
[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77032/
[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15821 **[Test build #77032 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77032/testReport)** for PR 15821 at commit [`b4eebc2`](https://github.com/apache/spark/commit/b4eebc27e261eddb4d8b0b829245fa3c187dade1).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15821 Merged build finished. Test FAILed.
[GitHub] spark issue #18000: [SPARK-20364][SQL] Disable Parquet predicate pushdown fo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18000 Just to make sure, I don't feel strongly about either comment, @viirya. I am willing to fix them if you feel strongly. Please let me know.
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117143908 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -166,7 +166,14 @@ private[parquet] object ParquetFilters { * Converts data sources filters to Parquet filter predicates. */ def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = { -val dataTypeOf = getFieldMap(schema) +val nameTypeMap = getFieldMap(schema) + +// Parquet does not allow dots in the column name because dots are used as a column path --- End diff -- Hm, I expect this is a non-critical path and not executed multiple times. Also, it does not look particularly faster to call, `Filter.references` -> `Filter.findReferences` -> `Filter.references` ... . Another downside (maybe nitpicking) is, this will introduce another small code path that returns `None` for filter creation failure.
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117143600 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -166,7 +166,14 @@ private[parquet] object ParquetFilters { * Converts data sources filters to Parquet filter predicates. */ def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = { -val dataTypeOf = getFieldMap(schema) +val nameTypeMap = getFieldMap(schema) + +// Parquet does not allow dots in the column name because dots are used as a column path --- End diff -- Ok. That makes sense.
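The check this thread discusses — skip push-down when any referenced column name contains a dot, because Parquet interprets dots as a column-path separator — can be sketched in isolation. The `Filter` case class and the string result below are simplified stand-ins for Spark's `sources.Filter` and Parquet's `FilterPredicate`, not the actual `ParquetFilters` code:

```scala
object DotColumnDemo {
  // Simplified stand-in: a filter is just its referenced column names.
  final case class Filter(references: Seq[String])

  // Returns None (push-down silently skipped) when any referenced column
  // contains a dot; otherwise builds a placeholder predicate string.
  def createFilter(f: Filter): Option[String] =
    if (f.references.exists(_.contains("."))) None
    else Some(f.references.mkString(" AND "))
}
```

Returning `None` is exactly the "small code path for filter creation failure" mentioned above: the query still runs correctly, it just reads without the pushed-down predicate.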
[GitHub] spark issue #14957: [SPARK-4502][SQL]Support parquet nested struct pruning a...
Github user saulshanabrook commented on the issue: https://github.com/apache/spark/pull/14957 @xuanyuanking Have you determined if the functionality provided here is superseded by #16578? I am trying to figure out which PR to help out on since I need this feature as well.
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18020 Merged build finished. Test PASSed.
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77031/
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17094 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77034/
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17094 Merged build finished. Test PASSed.
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17094 **[Test build #77034 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77034/testReport)** for PR 17094 at commit [`9461c45`](https://github.com/apache/spark/commit/9461c45b39ac4c61d0faf6830ee9da0ed8d7015b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18020 **[Test build #77031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77031/testReport)** for PR 18020 at commit [`3890b91`](https://github.com/apache/spark/commit/3890b91f42205d4db19349af459c6511ab81daf1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18023: Fix SPARK-12139: REGEX Column Specification for Hive Que...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18023 Can one of the admins verify this patch?
[GitHub] spark pull request #18000: [SPARK-20364][SQL] Disable Parquet predicate push...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18000#discussion_r117140137 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala --- @@ -166,7 +166,14 @@ private[parquet] object ParquetFilters { * Converts data sources filters to Parquet filter predicates. */ def createFilter(schema: StructType, predicate: sources.Filter): Option[FilterPredicate] = { -val dataTypeOf = getFieldMap(schema) +val nameTypeMap = getFieldMap(schema) + +// Parquet does not allow dots in the column name because dots are used as a column path --- End diff -- Yes, it is. However, we don't already log pushed filters that fail to create, e.g., `In` AFAIK. Probably, we should log in those cases across all the sources. If you don't feel strongly about this, I would rather not log here for now.
[GitHub] spark pull request #18023: Fix SPARK-12139: REGEX Column Specification for H...
GitHub user janewangfb opened a pull request: https://github.com/apache/spark/pull/18023 Fix SPARK-12139: REGEX Column Specification for Hive Queries ## What changes were proposed in this pull request? Hive interprets regular expression, e.g., `(a)?+.+` in query specification. This PR enables spark to support this feature when hive.support.quoted.identifiers is set to true. ## How was this patch tested? - Add unittests in SQLQuerySuite.scala - Iin spark-shell tested the original failed query: scala> hc.sql("SELECT `(appid|ds|host|instance|offset|ts)?+.+`, IF(FB_IS_VALID_HIVE_PARTITION_VALUE(appid), appid, 'BAD_APPID'), IF(FB_IS_VALID_HIVE_PARTITION_VALUE(ts), ts, 'BAD_TS') FROM time_spent_bit_array_mobile_current WHERE ds='2017-05-14' AND instance='cc_deterministic_loader' AND ts='2017-05-14+15:00:99' limit 100").collect.foreach(println) result: [1.4947744605006E9,Map(delta -> 803, ip -> 84.16.234.63, ig_id -> 1928710114, hces_extra -> {"radio_type":"wifi-none","auth_flag":"unable_to_verify"}),0.0,1494774434,1.494774459676E9,WrappedArray(517867, 0),26,0,lncny1,e46e8616-9763-475a-b80f-a46094b263a6,9,188,10.20.0,4C0175EC-B421-4676-ACFF-8E1E353D53E5,,57944460,null,6f72336f74c9f85c6e1b7b16c64e9dec,,567067343352427,2017-05-14+15:00:99] You can merge this pull request into a Git repository by running: $ git pull https://github.com/janewangfb/spark support_select_regex Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18023.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18023 commit af55afd8d6839e38337f67e19a614ea3eae9a2cf Author: Jane WangDate: 2017-05-18T00:21:14Z Fix SPARK-12139: REGEX Column Specification for Hive Queries --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
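The column-selection idea behind Hive's regex column specification can be sketched independently of Spark. A minimal Python sketch (the helper name is hypothetical; the real implementation lives in Spark's parser/analyzer): a regex pattern is matched against every column name, and the matching columns are selected. Hive's idiom uses Java possessive quantifiers such as `(appid|ds)?+.+` to exclude the listed columns; Python's `re` module only gained possessive quantifiers in 3.11, so an equivalent negative lookahead is used here instead.

```python
import re

def select_regex_columns(pattern, columns):
    """Return the columns whose full name matches the regex pattern."""
    rx = re.compile(pattern)
    return [c for c in columns if rx.fullmatch(c)]

cols = ["appid", "ds", "host", "offset", "value", "ts2"]
# Select every column EXCEPT the listed partition columns, mirroring
# the exclusion behavior of Hive's possessive `(appid|ds|host|offset)?+.+`.
print(select_regex_columns(r"(?!(appid|ds|host|offset)$).+", cols))
```

Running this prints `['value', 'ts2']`: the lookahead rejects any name that exactly equals one of the excluded columns, and `.+` accepts everything else.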
[GitHub] spark pull request #17763: [SPARK-13747][Core]Add ThreadUtils.awaitReady and...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17763
[GitHub] spark issue #17763: [SPARK-13747][Core]Add ThreadUtils.awaitReady and disall...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/17763 Thanks! Merging to master and 2.2.
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18020 LGTM
[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r117137648 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala --- @@ -76,6 +76,9 @@ class DateExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { } } checkEvaluation(DayOfYear(Literal.create(null, DateType)), null) + +checkEvaluation(DayOfYear(Literal(new Date(sdf.parse("1582-10-15 13:10:15").getTime))), 288) --- End diff -- cc @cloud-fan @gatorsmile Do you have any ideas?
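For context on the value 288 asserted in the diff above: 1582-10-15 is the first day of the Gregorian calendar, and under a proleptic Gregorian reckoning its ordinal day of the year is 31+28+31+30+31+30+31+31+30+15 = 288 (1582 is not a leap year). Python's `datetime` module uses the proleptic Gregorian calendar, so it can confirm this independently of Spark (whether Spark's date arithmetic should use proleptic Gregorian or a hybrid Julian/Gregorian calendar is exactly the question raised in the review):

```python
from datetime import date

# Python's datetime uses the proleptic Gregorian calendar, so the ten
# days dropped at the 1582 cutover (Oct 5-14) still count toward the
# ordinal day of year.
d = date(1582, 10, 15)
print(d.timetuple().tm_yday)  # → 288
```

Under a hybrid calendar that skips Oct 5-14, 1582, the same date would instead be day 278, which is the discrepancy being discussed.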
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77037 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77037/testReport)** for PR 14971 at commit [`cce31db`](https://github.com/apache/spark/commit/cce31db80cdc66516e3e537f33a3611b07186b6b).
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #77036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77036/testReport)** for PR 14971 at commit [`22a2c00`](https://github.com/apache/spark/commit/22a2c00333ffc39458f45d629c1b3199f73f1f3e).
[GitHub] spark pull request #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated S...
GitHub user gatorsmile reopened a pull request: https://github.com/apache/spark/pull/14971 [SPARK-17410] [SPARK-17284] Move Hive-generated Stats Info to HiveClientImpl ### What changes were proposed in this pull request? After adding a new field `stats` to `CatalogTable`, we should not expose Hive-specific Stats metadata to `MetastoreRelation`. It complicates all the related code. It also introduces a bug in `SHOW CREATE TABLE`. The statistics-related table properties should be skipped by `SHOW CREATE TABLE`, since they could be incorrect in the newly created table. See the Hive JIRA: https://issues.apache.org/jira/browse/HIVE-13792 This PR is to handle Hive-specific Stats metadata in `HiveClientImpl`. ### How was this patch tested? Added a few test cases. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark showCreateTableNew Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14971.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14971 commit 92474c5a142fb9db2c86549c8347f910fc01fcbd Author: gatorsmile Date: 2016-08-28T22:28:15Z remove stats-related props commit ce8e8b89a5b61648daaa59578e2b6a99ec2f6d74 Author: gatorsmile Date: 2016-08-29T05:24:05Z address comments commit 9ce526b7729c4111292d6abb69bd81aec0ecf5de Author: gatorsmile Date: 2016-09-06T00:25:56Z Merge remote-tracking branch 'upstream/master' into showCreateTable commit efd879dbda12e235d00d9b6bc4891a591832912b Author: gatorsmile Date: 2016-09-06T07:07:12Z move stats from metastorerelation to hiveclientimpl commit 491c0cd2580cfd358b2fd4d94a6f41b4063f172f Author: gatorsmile Date: 2016-09-06T07:34:51Z improve the comments. commit c9cdf44b561c5e88a108cd09ad17842332d14162 Author: gatorsmile Date: 2016-09-06T07:35:26Z improve the comments. 
commit 4b0aed54b0aef6675f4f9fac82f6647563afb5cc Author: gatorsmile Date: 2016-09-08T03:21:28Z Merge remote-tracking branch 'upstream/master' into showCreateTable commit 552101af418e4a30febf7d09938022ecc4c08da9 Author: gatorsmile Date: 2016-09-08T03:25:00Z merge commit d3dcb564509fd2a32a3fadefb811495affaaa466 Author: gatorsmile Date: 2016-09-11T05:55:20Z Merge remote-tracking branch 'upstream/master' into showCreateTable commit 9e18ba104527d2bb14331f4b51194002dabb2556 Author: gatorsmile Date: 2016-09-11T21:48:36Z fix and add more test cases commit c6a85bcd4b6b58c46787d1ca1510418cef79a8d5 Author: gatorsmile Date: 2016-09-16T04:50:18Z Merge remote-tracking branch 'upstream/master' into showCreateTable commit 3ed68e0b0aa6aff19a5e31c89fed7e5c814e83f3 Author: gatorsmile Date: 2016-09-16T06:16:40Z improve the test case commit 2e4d398388cd64f3e1d130af81d5e7ddc23a2a19 Author: gatorsmile Date: 2016-09-17T06:52:32Z also utilizes Hive-generated row counts when not analyzed in Spark commit 5dfa17efa84ed180e68b4922cfaf85e3d50f14ad Author: gatorsmile Date: 2016-09-17T07:30:57Z more comments commit 2f40c7f5532c8b6e66c786f3b1506bd4efdcf711 Author: gatorsmile Date: 2016-09-18T00:08:48Z address comments. commit 3376bd6a57a65fa004abd43237f8f3c87f07064a Author: gatorsmile Date: 2016-09-18T03:31:11Z fix test cases commit 90cd18e9d7bad6462fb0254d7981e23341795c11 Author: gatorsmile Date: 2016-09-21T04:58:42Z Merge remote-tracking branch 'upstream/master' into showCreateTable commit 7ad08fe2a488fa759b4abf4e99a7206e031379d9 Author: gatorsmile Date: 2016-09-21T05:23:58Z test case fix commit f4c0ebb0901216ea09eaf3f77e4fdcd431b15d37 Author: gatorsmile Date: 2016-09-22T23:08:55Z address comments commit 4c89d92ab65d7f4f061e32aa22780fd6e4b7c798 Author: gatorsmile Date: 2016-09-22T23:12:57Z address comments commit 699b5d8aa4d9370009c73f45d1618f1e5bb92210 Author: gatorsmile Date: 2016-09-24T01:10:04Z fix. 
commit 8c90d0b7364c46de5a4a59fa89457000bb283dd9 Author: gatorsmile Date: 2016-09-25T05:33:44Z Merge remote-tracking branch 'upstream/master' into showCreateTable commit 50ce04e51aebd5f68f7e50ec7c3bbe72275bf629 Author: gatorsmile Date: 2016-09-25T07:10:30Z address comments.
[GitHub] spark issue #18020: [SPARK-20700][SQL] InferFiltersFromConstraints stackover...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18020 **[Test build #77035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77035/testReport)** for PR 18020 at commit [`aa16ab3`](https://github.com/apache/spark/commit/aa16ab38fc0e0c80b179a5860f477c3650f64609).
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17094 **[Test build #77034 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77034/testReport)** for PR 17094 at commit [`9461c45`](https://github.com/apache/spark/commit/9461c45b39ac4c61d0faf6830ee9da0ed8d7015b).
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17094 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77033/ Test FAILed.
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17094 **[Test build #77033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77033/testReport)** for PR 17094 at commit [`b55b7fe`](https://github.com/apache/spark/commit/b55b7fe0c6af2a744e193f36090845773253ef97). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17094 Merged build finished. Test FAILed.