[GitHub] [spark] AmplabJenkins commented on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24382: [SPARK-27330][SS] support task abort 
in foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-521608879
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24382: [SPARK-27330][SS] support task abort 
in foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-521608885
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109152/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24382: [SPARK-27330][SS] support task 
abort in foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-521608885
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109152/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24382: [SPARK-27330][SS] support task 
abort in foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-521608879
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data 
source tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#discussion_r314292322
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
 ##
 @@ -650,4 +650,44 @@ class StatisticsCollectionSuite extends 
StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Data source tables support fallback to HDFS for size estimation") {
+// Non-partitioned table
+withTempDir { dir =>
+  Seq(false, true).foreach { fallBackToHDFSForStats =>
+withSQLConf(SQLConf.ENABLE_FALL_BACK_TO_HDFS_FOR_STATS.key -> 
s"$fallBackToHDFSForStats") {
+  withTable("spark_25474") {
+sql(s"CREATE TABLE spark_25474 (c1 BIGINT) USING PARQUET LOCATION 
'${dir.toURI}'")
+
spark.range(5).write.mode(SaveMode.Overwrite).parquet(dir.getCanonicalPath)
+
+assert(getCatalogTable("spark_25474").stats.isEmpty)
+val relation = 
spark.table("spark_25474").queryExecution.analyzed.children.head
+assert(relation.stats.sizeInBytes === getDataSize(dir))
+  }
+}
+  }
+}
+
+// Partitioned table
+Seq(false, true).foreach { fallBackToHDFSForStats =>
 
 Review comment:
   Please create a test case for it, e.g.
   ```
   test("partitioned data source tables support fallback to HDFS for size estimation")
   ```





[GitHub] [spark] srowen commented on a change in pull request #25430: [SPARK-28540][WEBUI] Document Environment page

2019-08-15 Thread GitBox
srowen commented on a change in pull request #25430: [SPARK-28540][WEBUI] 
Document Environment page
URL: https://github.com/apache/spark/pull/25430#discussion_r314323676
 
 

 ##
 File path: docs/web-ui.md
 ##
 @@ -49,6 +49,50 @@ sizes and using executors for all partitions in an RDD or 
DataFrame.
 The Environment tab displays the values for the different environment and 
configuration variables,
 including JVM, Spark, and system properties.
 
+
+  
+  
+
+
+We can see that this environment page has five parts, it is a useful place to 
check whether your properties have
+been set correctly. The first part 'Runtime Information' simply contains the
+[runtime properties](configuration.html#runtime-environment) like versions of 
Java and Scala.
+The second part 'Spark Properties' lists the [application 
properties](configuration.html#runtime-environment) like
+'spark.app.name' and 'spark.driver.memory'.
+
+
+  
+  
+
+Clicking the 'Hadoop Properties' link displays properties relative to Hadoop 
and Yarn. Note that properties like
 
 Review comment:
   Yarn -> YARN
   tick-quote `spark.hadoop.*`





[GitHub] [spark] SparkQA removed a comment on issue #25449: [PYSPARK] Simpler countByValue using collections' Counter

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #25449: [PYSPARK] Simpler countByValue using 
collections' Counter
URL: https://github.com/apache/spark/pull/25449#issuecomment-521653459
 
 
   **[Test build #4830 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4830/testReport)**
 for PR 25449 at commit 
[`08ea5a0`](https://github.com/apache/spark/commit/08ea5a00cb889077a5dc389fcfe185dea389f574).





[GitHub] [spark] shahidki31 commented on a change in pull request #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
shahidki31 commented on a change in pull request #25460: 
[SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not 
available
URL: https://github.com/apache/spark/pull/25460#discussion_r314349801
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
 ##
 @@ -72,7 +72,8 @@ case class HadoopFsRelation(
 val compressionFactor = sqlContext.conf.fileCompressionFactor
 val defaultSize = (location.sizeInBytes * compressionFactor).toLong
 location match {
-  case cfi: CatalogFileIndex if 
sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled =>
+  case cfi: CatalogFileIndex if 
sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled
+&& defaultSize == sqlContext.conf.defaultSizeInBytes =>
 
 Review comment:
   @wangyum `(location.sizeInBytes * compressionFactor).toLong` is always 8.0 EB, even after PR #24715.
   I am not sure I understand your comment. If the statistics don't exist, it has to fall back to HDFS, right? From the next time onwards, it will read from the stats cache.
   
   The number of times it falls back to HDFS is also the same after this PR and #24715, right?
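
The guard discussed in this comment can be sketched without any Spark dependency. This is only an illustration under stated assumptions: `SizeFallbackSketch` and `estimateSize` are hypothetical names, and the by-name `hdfsSize` parameter stands in for whatever performs the file-system scan. The idea is that the expensive lookup runs only when the fallback is enabled and the relation still reports the default size.

```scala
object SizeFallbackSketch {
  // Hypothetical helper, not Spark's code.
  // relationSize: the size the relation currently reports.
  // defaultSize:  the placeholder used when no statistics are known.
  // hdfsSize:     by-name, so the scan only happens if the guard passes.
  def estimateSize(
      relationSize: Long,
      defaultSize: Long,
      fallbackEnabled: Boolean)(hdfsSize: => Long): Long =
    if (fallbackEnabled && relationSize == defaultSize) hdfsSize else relationSize
}
```

With `defaultSize = Long.MaxValue`, a relation that already reports a real size never triggers the scan, because the by-name argument is not evaluated.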
   





[GitHub] [spark] SparkQA commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
SparkQA commented on issue #25299: [SPARK-27651][Core] Avoid the network when 
shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521674863
 
 
   **[Test build #4831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4831/testReport)**
 for PR 25299 at commit 
[`40b7339`](https://github.com/apache/spark/commit/40b73398e2651bb2289ed24ccec26e29d2c9e877).





[GitHub] [spark] dongjoon-hyun commented on issue #25463: [SPARK-28744][SQL][TEST] rename SharedSQLContext to SharedSparkSession

2019-08-15 Thread GitBox
dongjoon-hyun commented on issue #25463: [SPARK-28744][SQL][TEST] rename 
SharedSQLContext to SharedSparkSession
URL: https://github.com/apache/spark/pull/25463#issuecomment-521674923
 
 
   Retest this please.





[GitHub] [spark] MaxGekk commented on a change in pull request #25410: [SPARK-28690][SQL] Add `date_part` function for timestamps/dates

2019-08-15 Thread GitBox
MaxGekk commented on a change in pull request #25410: [SPARK-28690][SQL] Add 
`date_part` function for timestamps/dates
URL: https://github.com/apache/spark/pull/25410#discussion_r314189408
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ##
 @@ -1963,3 +1963,64 @@ case class Epoch(child: Expression, timeZoneId: 
Option[String] = None)
 defineCodeGen(ctx, ev, c => s"$dtu.getEpoch($c, $zid)")
   }
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(field, source) - Extracts a part of the date/timestamp.",
+  arguments = """
+Arguments:
+  * field - selects which part of the source should be extracted. 
Supported string values are:
+["MILLENNIUM", "CENTURY", "DECADE", "YEAR", "QUARTER", "MONTH",
+ "WEEK", "DAY", "DAYOFWEEK", "DOW", "ISODOW", "DOY",
+ "HOUR", "MINUTE", "SECOND"]
+  * source - a date (or timestamp) column from where `field` should be 
extracted
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_('YEAR', TIMESTAMP '2019-08-12 01:00:00.123456');
+   2019
+  > SELECT _FUNC_('week', timestamp'2019-08-12 01:00:00.123456');
+   33
+  > SELECT _FUNC_('doy', DATE'2019-08-12');
+   224
+  """,
+  since = "3.0.0")
+case class DatePart(field: Expression, source: Expression, child: Expression)
+  extends RuntimeReplaceable {
+
+  def this(field: Expression, source: Expression) {
+this(field, source, {
+  if (!field.foldable) {
+throw new AnalysisException("The field parameter needs to be a 
foldable string value.")
 
 Review comment:
   According to PostgreSQL docs 
https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT:
   
   >_source_ must be a value expression ...  the _**field**_ parameter 
needs to be **a string value**
   
   Accepting _field_ as an expression is an undocumented feature. We could support that separately if it is needed.
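
To make the "field is a plain string" contract concrete, here is a minimal sketch that uses only `java.time`. It is not the implementation in this pull request: `DatePartSketch` and `datePart` are hypothetical names, and only a subset of the listed fields is covered. The field name is matched case-insensitively, and anything unrecognized is rejected.

```scala
import java.time.LocalDateTime
import java.time.temporal.IsoFields
import java.util.Locale

object DatePartSketch {
  // Hypothetical helper: the field is an ordinary string, not an expression.
  def datePart(field: String, source: LocalDateTime): Int =
    field.toUpperCase(Locale.ROOT) match {
      case "YEAR"    => source.getYear
      case "QUARTER" => source.get(IsoFields.QUARTER_OF_YEAR)
      case "MONTH"   => source.getMonthValue
      case "WEEK"    => source.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR) // ISO week number
      case "DAY"     => source.getDayOfMonth
      case "DOY"     => source.getDayOfYear
      case "HOUR"    => source.getHour
      case "MINUTE"  => source.getMinute
      case "SECOND"  => source.getSecond
      case other     => throw new IllegalArgumentException(s"Unsupported field: $other")
    }
}
```

For `LocalDateTime.parse("2019-08-12T01:00:00.123456")`, the fields `"YEAR"`, `"week"`, and `"doy"` give 2019, 33, and 224, which matches the examples in the diff above.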





[GitHub] [spark] 19kka commented on issue #25450: [SPARK-23793][SQL]Handle database names in spark.udf.register()

2019-08-15 Thread GitBox
19kka commented on issue #25450: [SPARK-23793][SQL]Handle database names in 
spark.udf.register()
URL: https://github.com/apache/spark/pull/25450#issuecomment-521537674
 
 
   > Thank you for your first contribution, @19kka . Could you run 
`dev/scalastyle` and fix the errors? I saw some violation like 
[this](https://github.com/apache/spark/pull/25450/files#diff-85fdb913077429ac8e211a3c68375994L24)
 here.
   
   I'm very sorry that I forgot to check the style. I have now fixed the style errors and added a test to UDFSuite.
   
   I read the related registration code again and realized that `spark.udf.register()` is responsible for **creating a temporary function**, so I changed the code: calling `spark.udf.register()` with a function name that includes a **database** name now throws an AnalysisException.
   
   e.g.
   
   ```scala 
   spark.udf.register("db.fun1", (x: Long) => x + 1)
   // throw new AnalysisException
   ```
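
The kind of check described here can be sketched in plain Scala. This is an assumption-laden illustration, not the code in the pull request: `TempFunctionNameCheck`, `requireUnqualified`, and `NameException` are hypothetical, and `NameException` merely stands in for Spark's `AnalysisException` so that the sketch compiles without Spark on the classpath.

```scala
object TempFunctionNameCheck {
  // Hypothetical stand-in for AnalysisException.
  final class NameException(msg: String) extends Exception(msg)

  // Returns the name unchanged if it is unqualified; rejects names such as "db.fun1".
  def requireUnqualified(name: String): String = {
    if (name.contains(".")) {
      throw new NameException(
        s"Temporary function name '$name' must not include a database name")
    }
    name
  }
}
```

Under this sketch, `requireUnqualified("fun1")` returns `"fun1"`, while `requireUnqualified("db.fun1")` throws.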





[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521541158
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14217/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24486: [SPARK-27592][SQL] Set the bucketed 
data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521541147
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the 
bucketed data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521541147
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521541158
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14217/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24486: [SPARK-27592][SQL] Set the bucketed 
data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521541153
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14218/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521541148
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521541148
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the 
bucketed data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521541153
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14218/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25461: [SPARK-28741][SQL]Throw exceptions when casting to integers causes overflow

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25461: [SPARK-28741][SQL]Throw 
exceptions when casting to integers causes overflow
URL: https://github.com/apache/spark/pull/25461#issuecomment-521550690
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14220/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25461: [SPARK-28741][SQL]Throw exceptions when casting to integers causes overflow

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25461: [SPARK-28741][SQL]Throw 
exceptions when casting to integers causes overflow
URL: https://github.com/apache/spark/pull/25461#issuecomment-521550680
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25461: [SPARK-28741][SQL]Throw exceptions when casting to integers causes overflow

2019-08-15 Thread GitBox
SparkQA commented on issue #25461: [SPARK-28741][SQL]Throw exceptions when 
casting to integers causes overflow
URL: https://github.com/apache/spark/pull/25461#issuecomment-521551264
 
 
   **[Test build #109151 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109151/testReport)**
 for PR 25461 at commit 
[`ce1e1b5`](https://github.com/apache/spark/commit/ce1e1b52e3a8a4b4ffdcb00195119a7ee1b8db41).





[GitHub] [spark] cloud-fan commented on issue #24405: [SPARK-27506][SQL] Allow deserialization of Avro data using compatible schemas

2019-08-15 Thread GitBox
cloud-fan commented on issue #24405: [SPARK-27506][SQL] Allow deserialization 
of Avro data using compatible schemas
URL: https://github.com/apache/spark/pull/24405#issuecomment-521576807
 
 
   I mean the Spark Avro data source.





[GitHub] [spark] Fokko commented on issue #25451: [SPARK-28728][BUILD] Bump Jackson Databind to 2.9.9.3

2019-08-15 Thread GitBox
Fokko commented on issue #25451: [SPARK-28728][BUILD] Bump Jackson Databind to 
2.9.9.3
URL: https://github.com/apache/spark/pull/25451#issuecomment-521586416
 
 
   Sorry for the late reply, I'm in a different timezone. I've `git cherry-pick 
-x`'d your commit onto the branch.





[GitHub] [spark] SparkQA commented on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-08-15 Thread GitBox
SparkQA commented on issue #24382: [SPARK-27330][SS] support task abort in 
foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-521608476
 
 
   **[Test build #109152 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109152/testReport)**
 for PR 24382 at commit 
[`b8d6aee`](https://github.com/apache/spark/commit/b8d6aee3b39a57c1b7469a9c3fca53949ea65342).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on issue #24382: [SPARK-27330][SS] support task abort in foreach writer

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #24382: [SPARK-27330][SS] support task abort 
in foreach writer
URL: https://github.com/apache/spark/pull/24382#issuecomment-521553230
 
 
   **[Test build #109152 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109152/testReport)**
 for PR 24382 at commit 
[`b8d6aee`](https://github.com/apache/spark/commit/b8d6aee3b39a57c1b7469a9c3fca53949ea65342).





[GitHub] [spark] wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
wangyum commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521611355
 
 
   I ran some benchmarks.
   
   Prepare data:
   ```scala
   spark.range(1).repartition(1).write.saveAsTable("test_non_partition_1")
   spark.range(1).repartition(30).write.saveAsTable("test_non_partition_30")
   spark.range(1).selectExpr("id", "id % 5000 as c2", "id as c3").repartition(org.apache.spark.sql.functions.col("c2")).write.partitionBy("c2").saveAsTable("test_partition_5000")
   spark.range(1).selectExpr("id", "id % 1 as c2", "id as c3").repartition(org.apache.spark.sql.functions.col("c2")).write.partitionBy("c2").saveAsTable("test_partition_1")
   ```
   Add these lines to 
[LogicalRelation.computeStats](https://github.com/apache/spark/blob/950d407f2b22f1ae088c55cba3d0081c3c1ecff9/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala#L44):
   ```scala
   val time1 = System.currentTimeMillis()
   val relationSize = relation.sizeInBytes
   val time2 = System.currentTimeMillis()
   val fallBackToHdfsSize =
     CommandUtils.getSizeInBytesFallBackToHdfs(relation.sqlContext.sparkSession, catalogTable.get)
   val time3 = System.currentTimeMillis()
   // scalastyle:off
   println(s"Get size from relation: $relationSize, time: ${time2 - time1}")
   println(s"Get size fall back to HDFS: $fallBackToHdfsSize, time: ${time3 - time2}")
   // scalastyle:on
   ```
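
The same measure-around-a-call pattern can be factored into a small reusable helper. The following is a sketch only: `TimingSketch` and `timed` are hypothetical names, not part of Spark, and `System.nanoTime` is swapped in for `System.currentTimeMillis` because it is monotonic and better suited to measuring elapsed time.

```scala
object TimingSketch {
  // Runs `body` once and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }
}
```

A call such as `val (size, ms) = TimingSketch.timed { relation.sizeInBytes }` would then replace each pair of timestamp variables. A single run like this includes JIT and cache warm-up effects, so the numbers are indicative rather than precise.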
   
   Non-partitioned table benchmark result:
   ```
   scala> spark.sql("explain cost select * from test_non_partition_1 limit 1").show
   Get size from relation: 576588171, time: 22
   Get size fall back to HDFS: 576588171, time: 41
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   
   
   scala> spark.sql("explain cost select * from test_non_partition_1 limit 1").show
   Get size from relation: 576588171, time: 3
   Get size fall back to HDFS: 576588171, time: 28
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   
   
   scala>
   
   scala> spark.sql("explain cost select * from test_non_partition_30 limit 1").show
   Get size from relation: 706507984, time: 135
   Get size fall back to HDFS: 706507984, time: 2038
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   
   
   scala> spark.sql("explain cost select * from test_non_partition_30 limit 1").show
   Get size from relation: 706507984, time: 168
   Get size fall back to HDFS: 706507984, time: 3629
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   ```
   
   Partitioned table benchmark result:
   ```
   scala> spark.sql("explain cost select * from test_partition_5000 limit 1").show
   Get size from relation: 9223372036854775807, time: 0
   Get size fall back to HDFS: 1018560794, time: 46
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   
   
   scala> spark.sql("explain cost select * from test_partition_1 limit 1").show
   Get size from relation: 9223372036854775807, time: 0
   Get size fall back to HDFS: 1036799332, time: 43
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   ```
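   
   The `9223372036854775807` reported for the partitioned tables is `Long.MaxValue`, i.e. the relation did not report a usable size. A minimal sketch of the fallback decision this benchmark motivates, written against plain values rather than Spark classes (`SizeFallbackSketch`, `estimateSize` and `UnknownSize` are hypothetical names, not part of the PR):
   
   ```scala
   object SizeFallbackSketch {
     // Sentinel meaning "size unknown"; equals the value printed in the output above.
     val UnknownSize: Long = Long.MaxValue

     // hdfsSize is passed by name so the file system scan only runs when it is needed.
     def estimateSize(relationSize: Long, fallBackEnabled: Boolean, hdfsSize: => Long): Long =
       if (fallBackEnabled && relationSize == UnknownSize) hdfsSize else relationSize

     def main(args: Array[String]): Unit = {
       // Partitioned table without stats: use the scanned size instead of the sentinel.
       println(estimateSize(UnknownSize, fallBackEnabled = true, hdfsSize = 1018560794L))
       // Relation already knows its size: the by-name argument is never evaluated.
       println(estimateSize(576588171L, fallBackEnabled = true, hdfsSize = sys.error("not evaluated")))
     }
   }
   ```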
   
   Partitioned table benchmark result with `spark.sql.hive.manageFilesourcePartitions=false` (set via `--conf`):
   ```
   scala> spark.sql("set spark.sql.hive.manageFilesourcePartitions").show
   +--------------------+-----+
   |                 key|value|
   +--------------------+-----+
   |spark.sql.hive.ma...|false|
   +--------------------+-----+
   
   
   scala> spark.sql("explain cost select * from test_partition_5000 limit 1").show
   Get size from relation: 1018560794, time: 3
   Get size fall back to HDFS: 1018560794, time: 45
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   
   
   scala> spark.sql("explain cost select * from test_partition_1 limit 1").show
   Get size from relation: 1036799332, time: 865
   Get size fall back to HDFS: 1036799332, time: 69
   +--------------------+
   |                plan|
   +--------------------+
   |== Optimized Logi...|
   +--------------------+
   ```
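   
   The manual `System.currentTimeMillis()` bookkeeping above can be wrapped in a small helper so that every measurement is taken the same way. A minimal sketch (`TimedSketch` and `timed` are hypothetical names, not part of Spark or the PR, and the summed range is only a stand-in workload):
   
   ```scala
   object TimedSketch {
     // Evaluates `body` once and returns its result together with the elapsed milliseconds.
     def timed[A](body: => A): (A, Long) = {
       val start = System.nanoTime()
       val result = body
       (result, (System.nanoTime() - start) / 1000000L)
     }

     def main(args: Array[String]): Unit = {
       // Stand-in workload; in the benchmark this would be `relation.sizeInBytes`.
       val (size, millis) = timed { (1L to 1000000L).sum }
       println(s"Get size: $size, time: $millis")
     }
   }
   ```
   
   Since the first call in the output above is noticeably slower than the repeat (22 ms vs. 3 ms), it is probably worth timing each query more than once.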



[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521616829
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109148/
   Test FAILed.





[GitHub] [spark] shahidki31 commented on a change in pull request #22502: [SPARK-25474][SQL] Support `spark.sql.statistics.fallBackToHdfs` in data source tables

2019-08-15 Thread GitBox
shahidki31 commented on a change in pull request #22502: [SPARK-25474][SQL] 
Support `spark.sql.statistics.fallBackToHdfs` in data source tables
URL: https://github.com/apache/spark/pull/22502#discussion_r314297097
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
 ##
 @@ -71,7 +70,13 @@ case class HadoopFsRelation(
 
   override def sizeInBytes: Long = {
     val compressionFactor = sqlContext.conf.fileCompressionFactor
-    (location.sizeInBytes * compressionFactor).toLong
+    val defaultSize = (location.sizeInBytes * compressionFactor).toLong
+    location match {
+      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled =>
 
 Review comment:
   > Yes. I have prepared some tests to illustrate this issue. These tests can 
be passed before this commit:
   
   @wangyum The 1st and 3rd tests will pass after the PR. The 2nd test hits a bug that is fixed in the commit.
   
   Btw, the 1st and 3rd tests don't use CatalogFileIndex, so they won't reach this code path anyway.





[GitHub] [spark] srowen commented on a change in pull request #25430: [SPARK-28540][WEBUI] Document Environment page

2019-08-15 Thread GitBox
srowen commented on a change in pull request #25430: [SPARK-28540][WEBUI] 
Document Environment page
URL: https://github.com/apache/spark/pull/25430#discussion_r314324097
 
 

 ##
 File path: docs/web-ui.md
 ##
 @@ -49,6 +49,50 @@ sizes and using executors for all partitions in an RDD or DataFrame.
 The Environment tab displays the values for the different environment and configuration variables,
 including JVM, Spark, and system properties.
 
+
+  
+  
+
+
+We can see that this environment page has five parts, it is a useful place to check whether your properties have
 
 Review comment:
   I might use a different paragraph for each of the five sections you mention.
   "it is" should start a new sentence.
   You can remove wording like "We can see that"





[GitHub] [spark] tgravescs commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-15 Thread GitBox
tgravescs commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show 
resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-521669113
 
 
   Just looking at the screenshots, it would be nice for the worker detail page to show the free and used resources, like the top-level page does.





[GitHub] [spark] gaborgsomogyi commented on a change in pull request #25412: [SPARK-28691][EXAMPLES] Add Java/Scala DirectKerberizedKafkaWordCount examples

2019-08-15 Thread GitBox
gaborgsomogyi commented on a change in pull request #25412: 
[SPARK-28691][EXAMPLES] Add Java/Scala DirectKerberizedKafkaWordCount examples
URL: https://github.com/apache/spark/pull/25412#discussion_r314269808
 
 

 ##
 File path: 
examples/src/main/java/org/apache/spark/examples/streaming/JavaDirectKerberizedKafkaWordCount.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.streaming;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.Set;
+import java.util.regex.Pattern;
+
+import scala.Tuple2;
+
+import org.apache.kafka.clients.CommonClientConfigs;
+import org.apache.kafka.common.security.auth.SecurityProtocol;
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.common.serialization.StringDeserializer;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.streaming.api.java.*;
+import org.apache.spark.streaming.kafka010.ConsumerStrategies;
+import org.apache.spark.streaming.kafka010.KafkaUtils;
+import org.apache.spark.streaming.kafka010.LocationStrategies;
+import org.apache.spark.streaming.Durations;
+
+/**
+ * Consumes messages from one or more topics in Kafka and does wordcount.
+ * Usage: JavaDirectKerberizedKafkaWordCount <brokers> <groupId> <topics>
+ *   <brokers> is a list of one or more Kafka brokers
+ *   <groupId> is a consumer group name to consume from topics
+ *   <topics> is a list of one or more kafka topics to consume from
+ *
+ * Example:
+ *    $ bin/run-example --files ${path}/kafka_jaas.conf \
+ *      --driver-java-options "-Djava.security.auth.login.config=${path}/kafka_jaas.conf" \
+ *      --conf \
+ *      "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_jaas.conf" \
+ *      streaming.JavaDirectKerberizedKafkaWordCount broker1-host:port,broker2-host:port \
+ *      consumer-group topic1,topic2
+ *
+ * kafka_jaas.conf can manually create, template as:
+ *   KafkaClient {
+ * com.sun.security.auth.module.Krb5LoginModule required
+ * keyTab="${path_of_keytab}/kafka.service.keytab"
 
 Review comment:
   I'm fine with adding the keytab file to the `--files` section, but then the keytab path in the JAAS file has to be changed to `./kafka.service.keytab`, since `--files` doesn't preserve the path.
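   
   To illustrate the path change: files passed with `--files` end up in the working directory of each executor under their base name, so the JAAS entry has to refer to the file relative to that directory. A small sketch (`ShippedPathSketch` and `shippedPath` are hypothetical names, and the keytab directory is made up):
   
   ```scala
   import java.nio.file.Paths

   object ShippedPathSketch {
     // Only the file name survives; the original directory is dropped.
     def shippedPath(original: String): String =
       "./" + Paths.get(original).getFileName.toString

     def main(args: Array[String]): Unit =
       println(shippedPath("/etc/security/keytabs/kafka.service.keytab"))
   }
   ```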





[GitHub] [spark] DylanGuedes commented on issue #25331: [SPARK-27768][SQL] Support Infinity/NaN-related float/double literals case-insensitively

2019-08-15 Thread GitBox
DylanGuedes commented on issue #25331: [SPARK-27768][SQL] Support 
Infinity/NaN-related float/double literals case-insensitively
URL: https://github.com/apache/spark/pull/25331#issuecomment-521617976
 
 
   Thank you for your answer, @dilipbiswal.
   `numerics` has the following schema:
   ```sql
   create table numerics (
     id int,
     f_float4 float,
     f_float8 float,
     f_numeric int
   ) using parquet;
   ```





[GitHub] [spark] cloud-fan commented on issue #25463: [SPARK-28744][SQL][TEST] rename SharedSQLContext to SharedSparkSession

2019-08-15 Thread GitBox
cloud-fan commented on issue #25463: [SPARK-28744][SQL][TEST] rename 
SharedSQLContext to SharedSparkSession
URL: https://github.com/apache/spark/pull/25463#issuecomment-521625368
 
 
   ok to test





[GitHub] [spark] cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data 
source tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#discussion_r314289988
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##
 @@ -619,3 +620,35 @@ object DataSourceStrategy {
     (nonconvertiblePredicates ++ unhandledPredicates, pushedFilters, handledFilters)
   }
 }
+
+
+/**
+ * Support for recalculating table statistics if table statistics are not available.
+ */
+class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
+
+  private val sessionConf = session.sessionState.conf
+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+    // For the data source table, we only recalculate the table statistics when it creates
+    // the CatalogFileIndex using defaultSizeInBytes. See SPARK-25474 for more details.
 
 Review comment:
   `when it creates the CatalogFileIndex using defaultSizeInBytes` -> `when the table stats are not available`





[GitHub] [spark] cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data 
source tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#discussion_r314290254
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##
 @@ -619,3 +620,35 @@ object DataSourceStrategy {
     (nonconvertiblePredicates ++ unhandledPredicates, pushedFilters, handledFilters)
   }
 }
+
+
+/**
+ * Support for recalculating table statistics if table statistics are not available.
+ */
+class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
+
+  private val sessionConf = session.sessionState.conf
 
 Review comment:
   nit: just call it `conf`





[GitHub] [spark] cloud-fan commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test

2019-08-15 Thread GitBox
cloud-fan commented on issue #25243: [SPARK-28498][SQL][TEST] clear the states 
of SparkSession after each test
URL: https://github.com/apache/spark/pull/25243#issuecomment-521631419
 
 
   There are tests that create temp views or set some configs in `beforeAll`. It's not that easy to clear session state after each test, so I'm closing this PR for now.
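   
   A toy model of the problem, using a plain mutable set as a stand-in for the shared session's temp views (`SessionStateSketch` and its methods are hypothetical names, not Spark or ScalaTest APIs):
   
   ```scala
   import scala.collection.mutable

   object SessionStateSketch {
     // Stand-in for the temp views registered in the shared session.
     val tempViews = mutable.Set.empty[String]

     def beforeAll(): Unit = tempViews += "testData" // set up once per suite
     def clearState(): Unit = tempViews.clear()      // what clearing after each test would do
     def testNeedsView(): Boolean = tempViews.contains("testData")

     def main(args: Array[String]): Unit = {
       beforeAll()
       val first = testNeedsView()  // the view is still registered
       clearState()
       val second = testNeedsView() // the view is gone for every later test
       println(s"first=$first second=$second")
     }
   }
   ```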





[GitHub] [spark] cloud-fan closed pull request #25243: [SPARK-28498][SQL][TEST] clear the states of SparkSession after each test

2019-08-15 Thread GitBox
cloud-fan closed pull request #25243: [SPARK-28498][SQL][TEST] clear the states 
of SparkSession after each test
URL: https://github.com/apache/spark/pull/25243
 
 
   





[GitHub] [spark] cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a shuffle map stage and re-generate the shuffle files

2019-08-15 Thread GitBox
cloud-fan commented on issue #24892: [SPARK-25341][Core] Support rolling back a 
shuffle map stage and re-generate the shuffle files
URL: https://github.com/apache/spark/pull/24892#issuecomment-521637405
 
 
   @vanzin I checked the code:
   https://github.com/apache/spark/blob/1b416a0c77706ba352b72841d8b6ca3f459593fa/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1355-L1365
   
   Spark ignores late speculative tasks that complete after the stage has completed. So at least the worst case won't happen: a speculative task replacing the output of another task after the stage has finished (and thus after the next stage has started running).
   
   But we still have a contradiction: on the executor side the first shuffle write wins, while on the driver side the last shuffle write wins. When we run an indeterminate stage, the downstream stages are always fresh (when we rerun an indeterminate stage, the scheduler rolls back all downstream stages). So it doesn't matter which shuffle write wins, as long as the shuffle write is atomic. We'd better fix this contradiction, but it doesn't cause any real problems.
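   
   The two policies can be pictured with a toy model. This is only an illustration of "first write wins" versus "last write wins" on plain maps, not the actual shuffle code (`ShuffleWriteSketch` and its methods are hypothetical names):
   
   ```scala
   import scala.collection.mutable

   object ShuffleWriteSketch {
     // Executor side (toy): an already committed output is kept, later attempts are dropped.
     def commitFirstWins(store: mutable.Map[Int, String], mapId: Int, attempt: String): Unit =
       if (!store.contains(mapId)) store(mapId) = attempt

     // Driver side (toy): the most recently registered output replaces the previous one.
     def registerLastWins(store: mutable.Map[Int, String], mapId: Int, attempt: String): Unit =
       store(mapId) = attempt

     def main(args: Array[String]): Unit = {
       val executor = mutable.Map.empty[Int, String]
       val driver = mutable.Map.empty[Int, String]
       Seq("attempt-0", "attempt-1").foreach { attempt =>
         commitFirstWins(executor, 0, attempt)
         registerLastWins(driver, 0, attempt)
       }
       println(s"executor keeps ${executor(0)}, driver records ${driver(0)}")
     }
   }
   ```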





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521650177
 
 
   jenkins retest this please
   
   





[GitHub] [spark] SparkQA commented on issue #25449: [PYSPARK] Simpler countByValue using collections' Counter

2019-08-15 Thread GitBox
SparkQA commented on issue #25449: [PYSPARK] Simpler countByValue using 
collections' Counter
URL: https://github.com/apache/spark/pull/25449#issuecomment-521665292
 
 
   **[Test build #4830 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4830/testReport)** for PR 25449 at commit [`08ea5a0`](https://github.com/apache/spark/commit/08ea5a00cb889077a5dc389fcfe185dea389f574).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] gaborgsomogyi commented on a change in pull request #25412: [SPARK-28691][EXAMPLES] Add Java/Scala DirectKerberizedKafkaWordCount examples

2019-08-15 Thread GitBox
gaborgsomogyi commented on a change in pull request #25412: 
[SPARK-28691][EXAMPLES] Add Java/Scala DirectKerberizedKafkaWordCount examples
URL: https://github.com/apache/spark/pull/25412#discussion_r314268725
 
 

 ##
 File path: 
examples/src/main/java/org/apache/spark/examples/streaming/JavaDirectKerberizedKafkaWordCount.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.streaming;
+
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Arrays;
+import java.util.Map;
+import java.util.Set;
+import java.util.regex.Pattern;
+
+import scala.Tuple2;
+
+import org.apache.kafka.clients.CommonClientConfigs;
+import org.apache.kafka.common.security.auth.SecurityProtocol;
+import org.apache.kafka.clients.consumer.ConsumerConfig;
+import org.apache.kafka.clients.consumer.ConsumerRecord;
+import org.apache.kafka.common.serialization.StringDeserializer;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.streaming.api.java.*;
+import org.apache.spark.streaming.kafka010.ConsumerStrategies;
+import org.apache.spark.streaming.kafka010.KafkaUtils;
+import org.apache.spark.streaming.kafka010.LocationStrategies;
+import org.apache.spark.streaming.Durations;
+
+/**
+ * Consumes messages from one or more topics in Kafka and does wordcount.
+ * Usage: JavaDirectKerberizedKafkaWordCount <brokers> <groupId> <topics>
+ *   <brokers> is a list of one or more Kafka brokers
+ *   <groupId> is a consumer group name to consume from topics
+ *   <topics> is a list of one or more kafka topics to consume from
+ *
+ * Example:
+ *    $ bin/run-example --files ${path}/kafka_jaas.conf \
+ *      --driver-java-options "-Djava.security.auth.login.config=${path}/kafka_jaas.conf" \
+ *      --conf \
+ *      "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./kafka_jaas.conf" \
+ *      streaming.JavaDirectKerberizedKafkaWordCount broker1-host:port,broker2-host:port \
+ *      consumer-group topic1,topic2
+ *
+ * kafka_jaas.conf can manually create, template as:
+ *   KafkaClient {
+ * com.sun.security.auth.module.Krb5LoginModule required
+ * keyTab="${path_of_keytab}/kafka.service.keytab"
+ * useKeyTab=true
+ * storeKey=true
+ * useTicketCache=false
+ * serviceName="kafka"
+ * principal="kafka/server@example";
 
 Review comment:
   The principal doesn't contain the `org.domain` parameter, which makes this example fail consistently in my setup; the rest looks good. I'm using Hadoop's MiniKDC with default settings.
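   
   For reference, a sketch of how the realm part could be derived, assuming MiniKDC builds its realm as `<org.name>.<org.domain>` in upper case and that the defaults are `EXAMPLE` and `COM` (both are assumptions to verify against your Hadoop version; `PrincipalSketch` and its methods are hypothetical names):
   
   ```scala
   import java.util.Locale

   object PrincipalSketch {
     // Assumption: the realm is "<ORG_NAME>.<ORG_DOMAIN>", both upper-cased.
     def realm(orgName: String, orgDomain: String): String =
       orgName.toUpperCase(Locale.ROOT) + "." + orgDomain.toUpperCase(Locale.ROOT)

     // Builds "service/host@REALM" from its parts.
     def principal(service: String, host: String, orgName: String, orgDomain: String): String =
       s"$service/$host@${realm(orgName, orgDomain)}"

     def main(args: Array[String]): Unit =
       println(principal("kafka", "server", "example", "com"))
   }
   ```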





[GitHub] [spark] SparkQA removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521541634
 
 
   **[Test build #109148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109148/testReport)** for PR 25443 at commit [`0ac0b30`](https://github.com/apache/spark/commit/0ac0b30947dc1da33d0ac5ed0c8201a4c3d54c8a).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521616825
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521616829
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109148/
   Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521616825
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data 
source tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#discussion_r314291750
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ##
 @@ -619,3 +620,35 @@ object DataSourceStrategy {
     (nonconvertiblePredicates ++ unhandledPredicates, pushedFilters, handledFilters)
   }
 }
+
+
+/**
+ * Support for recalculating table statistics if table statistics are not available.
+ */
+class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
+
+  private val sessionConf = session.sessionState.conf
+
+  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
+    // For the data source table, we only recalculate the table statistics when it creates
+    // the CatalogFileIndex using defaultSizeInBytes. See SPARK-25474 for more details.
+    case logical @ LogicalRelation(_, _, Some(table), _)
+      if sessionConf.fallBackToHdfsForStatsEnabled && table.stats.isEmpty &&
+        sessionConf.manageFilesourcePartitions &&
+        table.tracksPartitionsInCatalog && table.partitionColumnNames.nonEmpty =>
+      val sizeInBytes = CommandUtils.getSizeInBytesFallBackToHdfs(session, table)
+      val withStats = table.copy(stats = Some(CatalogStatistics(sizeInBytes = BigInt(sizeInBytes))))
+      logical.copy(catalogTable = Some(withStats))
+
+    case relation: HiveTableRelation
 
 Review comment:
   shall we catch `InsertIntoTable(HiveTableRelation)` as well?
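   
   One possible reason for asking, sketched with stand-in plan nodes rather than Spark's classes: if the insert target is held as a field and is not one of the node's children, a rule that only matches the relation never sees it during traversal. All names below are hypothetical, and the "target is not a child" behaviour is an assumption of this sketch:
   
   ```scala
   object InsertMatchSketch {
     sealed trait Plan { def children: Seq[Plan] }
     case class HiveTable(name: String) extends Plan { def children: Seq[Plan] = Nil }
     case class Project(child: Plan) extends Plan { def children: Seq[Plan] = Seq(child) }
     // Assumption: only the query is a child; the target table is a plain field.
     case class InsertInto(table: Plan, query: Plan) extends Plan {
       def children: Seq[Plan] = Seq(query)
     }

     // Nodes reached by a children-based traversal.
     def visited(plan: Plan): Seq[Plan] = plan +: plan.children.flatMap(visited)

     // Tables a rule would handle, with or without an explicit case for the insert target.
     def tablesSeen(plan: Plan, matchInsertTarget: Boolean): Seq[String] =
       visited(plan).flatMap {
         case HiveTable(name) => Seq(name)
         case InsertInto(HiveTable(name), _) if matchInsertTarget => Seq(name)
         case _ => Nil
       }

     def main(args: Array[String]): Unit = {
       val plan = InsertInto(HiveTable("target"), Project(HiveTable("source")))
       println(tablesSeen(plan, matchInsertTarget = false))
       println(tablesSeen(plan, matchInsertTarget = true))
     }
   }
   ```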





[GitHub] [spark] cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on a change in pull request #24715: [SPARK-25474][SQL] Data 
source tables support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#discussion_r314292087
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
 ##
 @@ -650,4 +650,44 @@ class StatisticsCollectionSuite extends 
StatisticsCollectionTestBase with Shared
   }
 }
   }
+
+  test("Data source tables support fallback to HDFS for size estimation") {
+    // Non-partitioned table
+    withTempDir { dir =>
+      Seq(false, true).foreach { fallBackToHDFSForStats =>
+        withSQLConf(SQLConf.ENABLE_FALL_BACK_TO_HDFS_FOR_STATS.key -> s"$fallBackToHDFSForStats") {
 
 Review comment:
   Why does this config have no effect in this test?





[GitHub] [spark] AmplabJenkins removed a comment on issue #25449: [PYSPARK] Simpler countByValue using collections' Counter

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25449: [PYSPARK] Simpler countByValue 
using collections' Counter
URL: https://github.com/apache/spark/pull/25449#issuecomment-521135734
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA removed a comment on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] [FOLLOW-UP] Update the TODO with actual blocking JIRA IDs

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] 
[FOLLOW-UP] Update the TODO with  actual blocking JIRA IDs
URL: https://github.com/apache/spark/pull/25415#issuecomment-521653151
 
 
   **[Test build #4829 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4829/testReport)**
 for PR 25415 at commit 
[`dd41b26`](https://github.com/apache/spark/commit/dd41b268a773c262af6d7631c5ae7d94d543de8a).





[GitHub] [spark] srowen commented on a change in pull request #25344: [WIP][SPARK-28151][SQL] Mapped ByteType to TinyINT for MsSQLServerDialect

2019-08-15 Thread GitBox
srowen commented on a change in pull request #25344: [WIP][SPARK-28151][SQL] 
Mapped ByteType to TinyINT for MsSQLServerDialect
URL: https://github.com/apache/spark/pull/25344#discussion_r314336492
 
 

 ##
 File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala
 ##
 @@ -202,4 +204,25 @@ class MsSqlServerIntegrationSuite extends 
DockerJDBCIntegrationSuite {
     df2.write.jdbc(jdbcUrl, "datescopy", new Properties)
     df3.write.jdbc(jdbcUrl, "stringscopy", new Properties)
   }
+
+  test("SPARK-28151 Test write table with BYTETYPE") {
+    val tableSchema = StructType(Seq(StructField("serialNum", ByteType, true)))
+    val tableData = Seq(Row(10))
+    val df1 = spark.createDataFrame(
+      spark.sparkContext.parallelize(tableData),
+      tableSchema)
+
+    df1.write
+      .format("jdbc")
+      .mode("overwrite")
+      .option("url", jdbcUrl)
+      .option("dbtable", "testTable")
+      .save()
+    val df2 = spark.read
+      .format("jdbc")
+      .option("url", jdbcUrl)
+      .option("dbtable", "byteTable")
+      .load()
+    df2.show()
 
 Review comment:
   @shivsood if you update the test a bit I think we can add this.





[GitHub] [spark] srowen commented on a change in pull request #24779: [SPARK-27929][SQL] Make percentile function receive frq of double

2019-08-15 Thread GitBox
srowen commented on a change in pull request #24779: [SPARK-27929][SQL] Make 
percentile function receive frq of double
URL: https://github.com/apache/spark/pull/24779#discussion_r314346114
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Percentile.scala
 ##
 @@ -67,16 +67,23 @@ case class Percentile(
 child: Expression,
 percentageExpression: Expression,
 frequencyExpression : Expression,
+isIntFreqExpression: Expression,
 
 Review comment:
   At the least, I think you can switch logic based on the type of the column, 
rather than add a new parameter.
   
   I don't know the logic here, but I'd imagine that anything that works for 
integer weights (1, 2) should work identically for continuous ones (1.0, 2.0). 
It's possible that the current tests are actually expecting the 'wrong' value 
if the code is using an unsuitable approximation for integer values or 
something.
   
   I'd be interested in knowing what fails, as I don't expect Hive-related 
tests to fail - it doesn't implement weighted percentiles, I thought? 
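
   To illustrate the point that integer weights (1, 2) and continuous weights (1.0, 2.0) should give the same answer, here is a minimal sketch. It is not Spark's `Percentile` implementation: it uses a simple nearest-rank rule with no interpolation, and the names `WeightedPercentile`, `toWeight`, and `percentile` are hypothetical. It also shows how the weight type can be dispatched on at runtime instead of passing an extra parameter.

```scala
object WeightedPercentile {
  // Hypothetical helper: normalise a frequency of any supported numeric type
  // to Double, so the caller does not need a separate "is integer" flag.
  def toWeight(freq: Any): Double = freq match {
    case n: Int    => n.toDouble
    case n: Long   => n.toDouble
    case n: Double => n
    case other =>
      throw new IllegalArgumentException(s"Unsupported frequency type: $other")
  }

  // Nearest-rank weighted percentile: the smallest value whose cumulative
  // weight reaches p * totalWeight. This is an assumption made for the
  // sketch, not the interpolating algorithm used by Spark.
  def percentile(data: Seq[(Double, Double)], p: Double): Double = {
    require(data.nonEmpty, "data must not be empty")
    require(p >= 0.0 && p <= 1.0, "p must be in [0, 1]")
    require(data.forall(_._2 >= 0.0), "weights must be non-negative")

    val sorted = data.sortBy(_._1)
    val total = sorted.map(_._2).sum
    require(total > 0.0, "total weight must be positive")

    val target = p * total
    // Running sum of weights, aligned with the sorted values.
    val cumulative = sorted.scanLeft(0.0)((acc, vw) => acc + vw._2).tail
    sorted.zip(cumulative)
      .collectFirst { case ((value, _), cum) if cum >= target => value }
      .getOrElse(sorted.last._1)
  }
}
```

   Under this rule, weights given as `Int` and the same weights given as `Double` go through the same code path after `toWeight`, so a result that differs between the two would point at the test expectation rather than at the weight type.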





[GitHub] [spark] wangyum commented on a change in pull request #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
wangyum commented on a change in pull request #25460: 
[SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not 
available
URL: https://github.com/apache/spark/pull/25460#discussion_r314282305
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
 ##
 @@ -72,7 +72,8 @@ case class HadoopFsRelation(
     val compressionFactor = sqlContext.conf.fileCompressionFactor
     val defaultSize = (location.sizeInBytes * compressionFactor).toLong
     location match {
-      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled =>
+      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled
+        && defaultSize == sqlContext.conf.defaultSizeInBytes =>
 
 Review comment:
   Please see benchmark here: 
https://github.com/apache/spark/pull/24715#issuecomment-521611355





[GitHub] [spark] cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables support fallback to HDFS for size estimation

2019-08-15 Thread GitBox
cloud-fan commented on issue #24715: [SPARK-25474][SQL] Data source tables 
support fallback to HDFS for size estimation
URL: https://github.com/apache/spark/pull/24715#issuecomment-521626600
 
 
   @wangyum do you mean `CommandUtils.getSizeInBytesFallBackToHdfs` is very 
slow if there are many files?





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521633221
 
 
   jenkins retest this please





[GitHub] [spark] abellina commented on a change in pull request #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling

2019-08-15 Thread GitBox
abellina commented on a change in pull request #25403: [SPARK-28679][YARN] 
changes to setResourceInformation to handle empty resources and reflection 
error handling
URL: https://github.com/apache/spark/pull/25403#discussion_r314310736
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ##
 @@ -143,17 +143,29 @@ private object ResourceRequestHelper extends Logging {
     require(resource != null, "Resource parameter should not be null!")
 
     logDebug(s"Custom resources requested: $resources")
+    if (resources.isEmpty) {
+      // no point in going forward, as we don't have anything to set
+      return
+    }
+
     if (!isYarnResourceTypesAvailable()) {
-      if (resources.nonEmpty) {
-        logWarning("Ignoring custom resource requests because " +
-            "the version of YARN does not support it!")
-      }
+      logWarning("Ignoring custom resource requests because " +
+          "the version of YARN does not support it!")
       return
     }
 
     val resInfoClass = Utils.classForName(RESOURCE_INFO_CLASS)
     val setResourceInformationMethod =
-      resource.getClass.getMethod("setResourceInformation", classOf[String], resInfoClass)
+      try {
+        resource.getClass.getMethod("setResourceInformation", classOf[String], resInfoClass)
+      } catch {
+        case e: NoSuchMethodException =>
+          throw new SparkException(
+            s"""Cannot find $RESOURCE_INFO_CLASS.setResourceInformation.
+                |This is likely due to a jar conflict between different yarn versions."""
+                .stripMargin.replace("\n", " "), e)
 
 Review comment:
   @dongjoon-hyun I changed this, sorry for the delay.





[GitHub] [spark] SparkQA commented on issue #25451: [SPARK-28728][BUILD] Bump Jackson Databind to 2.9.9.3

2019-08-15 Thread GitBox
SparkQA commented on issue #25451: [SPARK-28728][BUILD] Bump Jackson Databind 
to 2.9.9.3
URL: https://github.com/apache/spark/pull/25451#issuecomment-521649568
 
 
   **[Test build #4828 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4828/testReport)**
 for PR 25451 at commit 
[`f6c2f4a`](https://github.com/apache/spark/commit/f6c2f4acb6f9eb57eae23190b2be93002e1c0f6a).





[GitHub] [spark] attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host

2019-08-15 Thread GitBox
attilapiros commented on issue #25299: [SPARK-27651][Core] Avoid the network 
when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#issuecomment-521670791
 
 
   retest this please.
   
   





[GitHub] [spark] srowen commented on a change in pull request #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling

2019-08-15 Thread GitBox
srowen commented on a change in pull request #25403: [SPARK-28679][YARN] 
changes to setResourceInformation to handle empty resources and reflection 
error handling
URL: https://github.com/apache/spark/pull/25403#discussion_r314346740
 
 

 ##
 File path: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ResourceRequestHelper.scala
 ##
 @@ -153,7 +153,16 @@ private object ResourceRequestHelper extends Logging {
 
     val resInfoClass = Utils.classForName(RESOURCE_INFO_CLASS)
     val setResourceInformationMethod =
-      resource.getClass.getMethod("setResourceInformation", classOf[String], resInfoClass)
+      try {
+        resource.getClass.getMethod("setResourceInformation", classOf[String], resInfoClass)
+      } catch {
+        case e: NoSuchMethodException =>
 
 Review comment:
   OK, so all this is adding is an extra message. Hm, OK, maybe worth it.
   Nit: jar -> JAR, yarn -> YARN
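
   The pattern under discussion (look up a method reflectively and turn `NoSuchMethodException` into an error with a more actionable message) can be sketched without any Spark or YARN dependency. This is only an illustration: `ReflectionLookup` and `findMethod` are hypothetical names, and `IllegalStateException` stands in for `SparkException`. The message uses the JAR/YARN capitalisation from the nit above.

```scala
import java.lang.reflect.Method

object ReflectionLookup {
  // Looks up a public method by name and parameter types. On failure, the
  // original NoSuchMethodException is kept as the cause so the stack trace
  // is not lost; only the message is made more descriptive.
  def findMethod(target: AnyRef, name: String, paramTypes: Class[_]*): Method =
    try {
      target.getClass.getMethod(name, paramTypes: _*)
    } catch {
      case e: NoSuchMethodException =>
        throw new IllegalStateException(
          s"Cannot find ${target.getClass.getName}.$name. " +
            "This is likely due to a JAR conflict between different YARN versions.",
          e)
    }
}
```

   For example, `ReflectionLookup.findMethod("abc", "length")` resolves `String.length`, while a lookup for a method that does not exist fails with the wrapped exception and the original cause attached.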





[GitHub] [spark] shahidki31 commented on a change in pull request #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
shahidki31 commented on a change in pull request #25460: 
[SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not 
available
URL: https://github.com/apache/spark/pull/25460#discussion_r314349801
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
 ##
 @@ -72,7 +72,8 @@ case class HadoopFsRelation(
     val compressionFactor = sqlContext.conf.fileCompressionFactor
     val defaultSize = (location.sizeInBytes * compressionFactor).toLong
     location match {
-      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled =>
+      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled
+        && defaultSize == sqlContext.conf.defaultSizeInBytes =>
 
 Review comment:
   @wangyum 
   `(location.sizeInBytes * compressionFactor).toLong` is always `8.0EB`, even 
after PR #24715.
   I am not sure I understand your comment. If the statistics don't exist, 
it has to fall back to HDFS, right? From then on, it will read from the stats 
cache.
   
   The number of fallbacks to HDFS is also the same after this PR as after 
#24715, right?
   





[GitHub] [spark] shahidki31 commented on a change in pull request #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
shahidki31 commented on a change in pull request #25460: 
[SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not 
available
URL: https://github.com/apache/spark/pull/25460#discussion_r314274591
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala
 ##
 @@ -72,7 +72,8 @@ case class HadoopFsRelation(
     val compressionFactor = sqlContext.conf.fileCompressionFactor
     val defaultSize = (location.sizeInBytes * compressionFactor).toLong
     location match {
-      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled =>
+      case cfi: CatalogFileIndex if sparkSession.sessionState.conf.fallBackToHdfsForStatsEnabled
+        && defaultSize == sqlContext.conf.defaultSizeInBytes =>
 
 Review comment:
   @wangyum 
   The default size comes out as `Long.MaxValue`. If the size is correct, it will 
not fall back to HDFS. Even if the size is correct, falling back to HDFS gives 
the same result. Also, I don't think this is a performance-sensitive path, 
because the flow reaches it only when statistics need to be computed, e.g. 
during a join. If the table already has statistics, the flow does not reach 
this code.
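
   The decision described in this comment can be sketched in plain Scala. This is a simplified model, not the code in `HadoopFsRelation`: the object `SizeEstimation`, its parameters, and the by-name `hdfsSize` argument are assumptions made for illustration, and the compression factor is left out. `DefaultSizeInBytes` stands in for `spark.sql.defaultSizeInBytes`, whose default is `Long.MaxValue`.

```scala
object SizeEstimation {
  // Stand-in for spark.sql.defaultSizeInBytes (defaults to Long.MaxValue).
  val DefaultSizeInBytes: Long = Long.MaxValue

  // hdfsSize is passed by name, so the potentially expensive file-system
  // scan runs only when the guard below actually selects it.
  def sizeInBytes(
      catalogStatsSize: Option[Long],
      fileIndexSize: Long,
      fallBackToHdfsEnabled: Boolean,
      hdfsSize: => Long): Long =
    catalogStatsSize.getOrElse {
      // Fall back only when the reported size is the "unknown" placeholder.
      if (fallBackToHdfsEnabled && fileIndexSize == DefaultSizeInBytes) hdfsSize
      else fileIndexSize
    }
}
```

   In this model a table that already has catalog statistics never triggers the scan, and neither does a table whose file index reports a real size; the scan runs only for the placeholder size with the fallback flag enabled.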





[GitHub] [spark] shahidki31 commented on issue #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
shahidki31 commented on issue #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to 
hdfs when relation table stats is not available
URL: https://github.com/apache/spark/pull/25460#issuecomment-521616299
 
 
   retest this please





[GitHub] [spark] SparkQA commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
SparkQA commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521616576
 
 
   **[Test build #109148 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109148/testReport)**
 for PR 25443 at commit 
[`0ac0b30`](https://github.com/apache/spark/commit/0ac0b30947dc1da33d0ac5ed0c8201a4c3d54c8a).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] srowen commented on issue #25424: [SPARK-28543][DOCS][WebUI] Document Spark Jobs page

2019-08-15 Thread GitBox
srowen commented on issue #25424: [SPARK-28543][DOCS][WebUI] Document Spark 
Jobs page
URL: https://github.com/apache/spark/pull/25424#issuecomment-521648673
 
 
   Merged to master





[GitHub] [spark] srowen closed pull request #25424: [SPARK-28543][DOCS][WebUI] Document Spark Jobs page

2019-08-15 Thread GitBox
srowen closed pull request #25424: [SPARK-28543][DOCS][WebUI] Document Spark 
Jobs page
URL: https://github.com/apache/spark/pull/25424
 
 
   





[GitHub] [spark] SparkQA commented on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] [FOLLOW-UP] Update the TODO with actual blocking JIRA IDs

2019-08-15 Thread GitBox
SparkQA commented on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] 
[FOLLOW-UP] Update the TODO with  actual blocking JIRA IDs
URL: https://github.com/apache/spark/pull/25415#issuecomment-521653151
 
 
   **[Test build #4829 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4829/testReport)**
 for PR 25415 at commit 
[`dd41b26`](https://github.com/apache/spark/commit/dd41b268a773c262af6d7631c5ae7d94d543de8a).





[GitHub] [spark] srowen commented on issue #25449: [PYSPARK] Simpler countByValue using collections' Counter

2019-08-15 Thread GitBox
srowen commented on issue #25449: [PYSPARK] Simpler countByValue using 
collections' Counter
URL: https://github.com/apache/spark/pull/25449#issuecomment-521653337
 
 
   This also needs a JIRA. https://github.com/apache/spark/pull/25429





[GitHub] [spark] AmplabJenkins removed a comment on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] [FOLLOW-UP] Update the TODO with actual blocking JIRA IDs

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25415: 
[SPARK-28390][SQL][PYTHON][TESTS] [FOLLOW-UP] Update the TODO with  actual 
blocking JIRA IDs
URL: https://github.com/apache/spark/pull/25415#issuecomment-520309524
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on issue #25449: [PYSPARK] Simpler countByValue using collections' Counter

2019-08-15 Thread GitBox
SparkQA commented on issue #25449: [PYSPARK] Simpler countByValue using 
collections' Counter
URL: https://github.com/apache/spark/pull/25449#issuecomment-521653459
 
 
   **[Test build #4830 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4830/testReport)**
 for PR 25449 at commit 
[`08ea5a0`](https://github.com/apache/spark/commit/08ea5a00cb889077a5dc389fcfe185dea389f574).





[GitHub] [spark] SparkQA commented on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] [FOLLOW-UP] Update the TODO with actual blocking JIRA IDs

2019-08-15 Thread GitBox
SparkQA commented on issue #25415: [SPARK-28390][SQL][PYTHON][TESTS] 
[FOLLOW-UP] Update the TODO with  actual blocking JIRA IDs
URL: https://github.com/apache/spark/pull/25415#issuecomment-521660122
 
 
   **[Test build #4829 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4829/testReport)**
 for PR 25415 at commit 
[`dd41b26`](https://github.com/apache/spark/commit/dd41b268a773c262af6d7631c5ae7d94d543de8a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on issue #25448: [SPARK-28697][SQL] Invalidate Database/Table names starting with underscore

2019-08-15 Thread GitBox
cloud-fan commented on issue #25448: [SPARK-28697][SQL] Invalidate 
Database/Table names starting with underscore
URL: https://github.com/apache/spark/pull/25448#issuecomment-521527717
 
 
   Wait, does a table name starting with `_` work in Spark currently? From 
SPARK-19059 it seems to be supported, but from SPARK-28697 it seems not.





[GitHub] [spark] SparkQA commented on issue #25410: [SPARK-28690][SQL] Add `date_part` function for timestamps/dates

2019-08-15 Thread GitBox
SparkQA commented on issue #25410: [SPARK-28690][SQL] Add `date_part` function 
for timestamps/dates
URL: https://github.com/apache/spark/pull/25410#issuecomment-521532339
 
 
   **[Test build #109145 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109145/testReport)**
 for PR 25410 at commit 
[`1b2c8d4`](https://github.com/apache/spark/commit/1b2c8d4d72394cfd27e8d4e6b0a9291706cd62e5).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25460: [SPARK-25474][SQL][Followup] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25460: [SPARK-25474][SQL][Followup] 
fallback to hdfs when relation table stats is not available
URL: https://github.com/apache/spark/pull/25460#issuecomment-521535225
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] AmplabJenkins removed a comment on issue #24440: [SPARK-27545] [SQL] Uncache table needs to delete the temporary view …

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24440: [SPARK-27545] [SQL] Uncache 
table needs to delete the temporary view …
URL: https://github.com/apache/spark/pull/24440#issuecomment-521539522
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521539498
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109140/
   Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the 
bucketed data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521539445
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the 
bucketed data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521539223
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14216/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521539487
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25456: [SPARK-28739][SQL] Add a simple cost check for Adaptive Query Execution

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25456: [SPARK-28739][SQL] Add a 
simple cost check for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/25456#issuecomment-521539566
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed 
data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521537900
 
 
   **[Test build #109147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109147/testReport)** for PR 24486 at commit [`842bd3e`](https://github.com/apache/spark/commit/842bd3ec57a33093a5f47ceb38016ebabf9503e1).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-521539472
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #24440: [SPARK-27545] [SQL] Uncache table needs to delete the temporary view …

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24440: [SPARK-27545] [SQL] Uncache table 
needs to delete the temporary view …
URL: https://github.com/apache/spark/pull/24440#issuecomment-521539522
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #25456: [SPARK-28739][SQL] Add a simple cost check for Adaptive Query Execution

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #25456: [SPARK-28739][SQL] Add a simple cost 
check for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/25456#issuecomment-521516999
 
 
   **[Test build #109142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109142/testReport)** for PR 25456 at commit [`74dd386`](https://github.com/apache/spark/commit/74dd3865e0fe3287d73a7b6aa954cc63bf17e9fd).





[GitHub] [spark] AmplabJenkins commented on issue #25443: [WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 2.3.6 on jenkins

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25443: 
[WIP][SPARK-28723][test-hadoop3.2][test-maven] Test JDK 11 with Hadoop-3.2/Hive 
2.3.6 on jenkins
URL: https://github.com/apache/spark/pull/25443#issuecomment-521539487
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #25410: [SPARK-28690][SQL] Add `date_part` function for timestamps/dates

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #25410: [SPARK-28690][SQL] Add `date_part` 
function for timestamps/dates
URL: https://github.com/apache/spark/pull/25410#issuecomment-521532339
 
 
   **[Test build #109145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109145/testReport)** for PR 25410 at commit [`1b2c8d4`](https://github.com/apache/spark/commit/1b2c8d4d72394cfd27e8d4e6b0a9291706cd62e5).





[GitHub] [spark] SparkQA removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-521525967
 
 
   **[Test build #109144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109144/testReport)** for PR 25368 at commit [`45cbbd0`](https://github.com/apache/spark/commit/45cbbd04408251e14a9157d1a5b93ae6a8e91401).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25460: [SPARK-25474][SQL][FOLLOW-UP] fallback to hdfs when relation table stats is not available

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25460: [SPARK-25474][SQL][FOLLOW-UP] 
fallback to hdfs when relation table stats is not available
URL: https://github.com/apache/spark/pull/25460#issuecomment-521539449
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #24440: [SPARK-27545] [SQL] Uncache table needs to delete the temporary view …

2019-08-15 Thread GitBox
SparkQA removed a comment on issue #24440: [SPARK-27545] [SQL] Uncache table 
needs to delete the temporary view …
URL: https://github.com/apache/spark/pull/24440#issuecomment-521519898
 
 
   **[Test build #109143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109143/testReport)** for PR 24440 at commit [`770ee42`](https://github.com/apache/spark/commit/770ee4261335635fafe79afebb1ce7302db96d92).





[GitHub] [spark] AmplabJenkins commented on issue #25456: [SPARK-28739][SQL] Add a simple cost check for Adaptive Query Execution

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25456: [SPARK-28739][SQL] Add a simple cost 
check for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/25456#issuecomment-521539566
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25456: [SPARK-28739][SQL] Add a simple cost check for Adaptive Query Execution

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25456: [SPARK-28739][SQL] Add a simple cost 
check for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/25456#issuecomment-521539570
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109142/
   Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #24486: [SPARK-27592][SQL] Set the 
bucketed data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521539217
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25410: [SPARK-28690][SQL] Add `date_part` function for timestamps/dates

2019-08-15 Thread GitBox
AmplabJenkins removed a comment on issue #25410: [SPARK-28690][SQL] Add 
`date_part` function for timestamps/dates
URL: https://github.com/apache/spark/pull/25410#issuecomment-521539429
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #24440: [SPARK-27545] [SQL] Uncache table needs to delete the temporary view …

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24440: [SPARK-27545] [SQL] Uncache table 
needs to delete the temporary view …
URL: https://github.com/apache/spark/pull/24440#issuecomment-521539525
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109143/
   Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-521539472
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-15 Thread GitBox
SparkQA commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to 
track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-521539389
 
 
   **[Test build #109144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109144/testReport)** for PR 25368 at commit [`45cbbd0`](https://github.com/apache/spark/commit/45cbbd04408251e14a9157d1a5b93ae6a8e91401).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
SparkQA commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data 
source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521539394
 
 
   **[Test build #109147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109147/testReport)** for PR 24486 at commit [`842bd3e`](https://github.com/apache/spark/commit/842bd3ec57a33093a5f47ceb38016ebabf9503e1).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on issue #24440: [SPARK-27545] [SQL] Uncache table needs to delete the temporary view …

2019-08-15 Thread GitBox
SparkQA commented on issue #24440: [SPARK-27545] [SQL] Uncache table needs to 
delete the temporary view …
URL: https://github.com/apache/spark/pull/24440#issuecomment-521539388
 
 
   **[Test build #109143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109143/testReport)** for PR 24440 at commit [`770ee42`](https://github.com/apache/spark/commit/770ee4261335635fafe79afebb1ce7302db96d92).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on issue #24486: [SPARK-27592][SQL] Set the bucketed data source table SerDe correctly

2019-08-15 Thread GitBox
AmplabJenkins commented on issue #24486: [SPARK-27592][SQL] Set the bucketed 
data source table SerDe correctly
URL: https://github.com/apache/spark/pull/24486#issuecomment-521539448
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109147/
   Test FAILed.




