[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20511 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/920/ Test PASSed.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20057 ok to test
[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20511 Sure, I rebased this.
[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listeners f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20619 **[Test build #87482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87482/testReport)** for PR 20619 at commit [`43f809f`](https://github.com/apache/spark/commit/43f809fd2ff619c901e05bc062ab70aa65371a46).
[GitHub] spark issue #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20620 Can one of the admins verify this patch?
[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listerners f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20619 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listeners f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20619 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/919/ Test PASSed.
[GitHub] spark issue #20619: [SPARK-23390][SQL] Register task completion listeners f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20619 Hi, @cloud-fan and @gatorsmile. This PR fixes the same kind of opened-file leak, this time in ParquetFileFormat. Could you review this?
[GitHub] spark pull request #20620: [SPARK-23438][DSTREAMS] Fix DStreams data loss wi...
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20620 [SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes

## What changes were proposed in this pull request?

There is a race condition, introduced in SPARK-11141, that can cause data loss. The problem is that the ReceivedBlockTracker.insertAllocatedBatch function assumes that all blocks currently in streamIdToUnallocatedBlockQueues were allocated to the batch, and clears the whole queue. With this PR, only the blocks actually allocated to the batch are removed from the queue, which prevents the data loss.

## How was this patch tested?

Additional unit test + manual testing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborgsomogyi/spark SPARK-23438

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20620.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20620

commit 152fec431218161e538c377a6cb82753100dc70b
Author: Gabor Somogyi
Date: 2018-02-09T08:30:19Z
[SPARK-23438][DSTREAMS] Fix DStreams data loss with WAL when driver crashes
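The race described in the PR text can be sketched outside Spark (simplified, hypothetical names — not the real ReceivedBlockTracker API): a batch allocation snapshots the queue, but a new block may be enqueued before the tracker updates its state. Clearing the whole queue would silently drop that late block; removing only the allocated blocks keeps it.

```scala
import scala.collection.mutable

// Hypothetical sketch of the fix: remove only the blocks that were actually
// allocated to the batch instead of clearing the whole unallocated queue.
object AllocateOnlySketch {
  def main(args: Array[String]): Unit = {
    val unallocated = mutable.Queue("block-1", "block-2")
    val allocatedToBatch = unallocated.toList // snapshot used for the batch
    unallocated.enqueue("block-3")            // block received concurrently
    // Buggy behavior: unallocated.clear() would also discard "block-3".
    allocatedToBatch.foreach(b => unallocated.dequeueFirst(_ == b))
    println(unallocated.toList)               // the late block survives
  }
}
```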
[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20610 Thank you, @gatorsmile , @cloud-fan , and @viirya .
[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20610
[GitHub] spark pull request #20619: [SPARK-23390][SQL] Register task completion liste...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20619 [SPARK-23390][SQL] Register task completion listeners first in ParquetFileFormat

## What changes were proposed in this pull request?

ParquetFileFormat leaks opened files in some cases. This PR prevents that by registering the task completion listeners first, before initialization.

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/205/testReport/org.apache.spark.sql/FileBasedDataSourceSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/

```
Caused by: sbt.ForkMain$ForkError: java.lang.Throwable: null
at org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36)
at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:538)
at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase.initialize(SpecificParquetRecordReaderBase.java:149)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initialize(VectorizedParquetRecordReader.java:133)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:400)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:356)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:125)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:179)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:106)
```

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-23390

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20619.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20619

commit 43f809fd2ff619c901e05bc062ab70aa65371a46
Author: Dongjoon Hyun
Date: 2018-02-15T16:55:43Z
[SPARK-23390][SQL] Register task completion listeners first in ParquetFileFormat
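The ordering fix above can be sketched generically (hypothetical helper names — in the real patch the listener is added on the TaskContext before the Parquet reader's initialize() call): if the cleanup callback is registered before a fallible initialization step, the resource is released even when initialization throws.

```scala
// Hypothetical sketch, not the actual Spark TaskContext API.
object ListenerFirstSketch {
  private var listeners = List.empty[() => Unit]
  def addTaskCompletionListener(f: () => Unit): Unit = listeners ::= f

  // Completion listeners run whether the task body succeeds or throws.
  def runTask(body: => Unit): Unit =
    try body finally listeners.foreach(_.apply())

  def main(args: Array[String]): Unit = {
    var fileClosed = false
    try {
      runTask {
        addTaskCompletionListener(() => fileClosed = true) // registered first
        sys.error("initialize() failed")                   // init throws afterwards
      }
    } catch { case _: RuntimeException => () }
    println(fileClosed) // true: the opened file still gets closed
  }
}
```

Registering the listener after initialization would leave the file open whenever initialize() throws, which matches the leak shown in the stack trace above.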
[GitHub] spark issue #20511: [SPARK-23340][SQL] Upgrade Apache ORC to 1.4.3
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20511 @dongjoon-hyun Could you rebase this PR? We want to merge it to master. Thanks!
[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20610 LGTM Thanks! Merged to master/2.3
[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20617 Thank you for review and approval, @HyukjinKwon !
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20616 Hi, @cloud-fan and @gatorsmile . Could you review this PR?
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20601 **[Test build #87481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87481/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20610#discussion_r168531711 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -33,6 +33,19 @@ import org.apache.spark.util.Utils class FileStreamSinkSuite extends StreamTest { import testImplicits._ + override def beforeAll(): Unit = { --- End diff -- Hi, @cloud-fan . I tested it, but that doesn't work in this `FileStreamSinkSuite`.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20295 **[Test build #87480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87480/testReport)** for PR 20295 at commit [`9ed3779`](https://github.com/apache/spark/commit/9ed3779b665c90e5bb25bc6636997a4b080c3d34).
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Merged build finished. Test PASSed.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user misutoth commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168531352 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -770,7 +837,14 @@ case class Unhex(child: Expression) extends UnaryExpression with ImplicitCastInp // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(expr1, expr2) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`expr1`, `expr2`).", + usage = "_FUNC_(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`exprX`, `exprY`), " + +"as if computed by `java.lang.Math._FUNC_`.", + arguments = +""" +Arguments: + * exprY - the ordinate coordinate + * exprX - the abscissa coordinate --- End diff -- Sure, I will fix that. I borrowed that from java.lang.Math ... which sometimes is more complicated than it should be.
[GitHub] spark issue #20295: [SPARK-23011] Support alternative function form with gro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/918/ Test PASSed.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20601 jenkins retest please
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user misutoth commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168526385 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1313,131 +1313,178 @@ object functions { // /** - * Computes the cosine inverse of the given value; the returned angle is in the range - * 0.0 through pi. + * @param e the value whose arc cosine is to be returned + * @return cosine inverse of the given value in the range of 0.0 through pi, + * as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ def acos(e: Column): Column = withExpr { Acos(e.expr) } /** - * Computes the cosine inverse of the given column; the returned angle is in the range - * 0.0 through pi. + * @param colName the value whose arc cosine is to be returned + * @returncosine inverse of the given value in the range of 0.0 through pi, + *as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ - def acos(columnName: String): Column = acos(Column(columnName)) + def acos(colName: String): Column = acos(Column(colName)) --- End diff -- columnName was too long and it ran into the description in the generated doc.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user misutoth commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168525504 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1313,131 +1313,178 @@ object functions { // /** - * Computes the cosine inverse of the given value; the returned angle is in the range - * 0.0 through pi. + * @param e the value whose arc cosine is to be returned + * @return cosine inverse of the given value in the range of 0.0 through pi, --- End diff -- Ok. Then I will cut the part about the domains and ranges.
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user Fokko commented on the issue: https://github.com/apache/spark/pull/20057 We're in the process of integrating Spark in Airflow, and support for the `cascadeTruncate` is required to make this succeed. First steps are here: https://github.com/apache/incubator-airflow/pull/3021. Would be great if we can get this merged asap so we can continue testing. Cheers
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user misutoth commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168521316 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -262,6 +285,11 @@ case class Cos(child: Expression) extends UnaryMathExpression(math.cos, "COS") @ExpressionDescription( usage = "_FUNC_(expr) - Returns the hyperbolic cosine of `expr`.", + arguments = +""" +Arguments: + * expr - number whose hyperbolic consine is to be returned. --- End diff -- I was not very familiar with hyperbolic functions so I followed the way this is described in java.lang.Math. But hyperbolic angle is really what this parameter actually is.
[GitHub] spark issue #20607: Don't block on cleanup tasks by default
Github user rkrzr commented on the issue: https://github.com/apache/spark/pull/20607 I'll close this for now. (I am running into a problem related to this, but I'll better open a new issue about that when I know more)
[GitHub] spark pull request #20607: Don't block on cleanup tasks by default
Github user rkrzr closed the pull request at: https://github.com/apache/spark/pull/20607
[GitHub] spark issue #20057: [SPARK-22880][SQL] Add cascadeTruncate option to JDBC da...
Github user danielvdende commented on the issue: https://github.com/apache/spark/pull/20057 @dongjoon-hyun @gatorsmile sorry to keep asking, but could you let me know when we can get this merged?
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87477/ Test FAILed.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20601 Merged build finished. Test FAILed.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20601 **[Test build #87477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87477/testReport)** for PR 20601 at commit [`c8ef968`](https://github.com/apache/spark/commit/c8ef96839d0ec79eb397757736f9a9df0b876a11). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20587: Branch 2.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20587
[GitHub] spark pull request #20586: Branch 2.1
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20586
[GitHub] spark pull request #17619: [SPARK-19755][Mesos] Blacklist is always active f...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/17619#discussion_r168509141 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -484,7 +481,6 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( cpus + totalCoresAcquired <= maxCores && mem <= offerMem && numExecutors() < executorLimit && - slaves.get(slaveId).map(_.taskFailures).getOrElse(0) < MAX_SLAVE_FAILURES && --- End diff -- rather than just deleting this, we should replace it with a check to `scheduler.nodeBlacklist()`, like the YarnScheduler is doing here: https://github.com/apache/spark/blob/44e20c42254bc6591b594f54cd94ced5fcfadae3/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala#L128
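The suggested replacement can be sketched generically (hypothetical names, not the actual MesosCoarseGrainedSchedulerBackend code): the per-offer check consults a scheduler-level node blacklist instead of a per-slave failure counter.

```scala
// Hypothetical sketch of an offer filter backed by a node blacklist.
object OfferFilterSketch {
  // Accept an offer only if its host is not currently blacklisted.
  def acceptOffer(host: String, nodeBlacklist: Set[String]): Boolean =
    !nodeBlacklist.contains(host)

  def main(args: Array[String]): Unit = {
    val blacklist = Set("bad-node.example.com")
    println(acceptOffer("good-node.example.com", blacklist)) // true
    println(acceptOffer("bad-node.example.com", blacklist))  // false
  }
}
```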
[GitHub] spark issue #20575: [SPARK-23386][DEPLOY] enable direct application links in...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20575 If doing this, it would be cleaner to do it as part of parsing the logs. e.g., if you make `AppListingListener` write the app info to the store when interesting events happen, that would be much better and less race-prone.
[GitHub] spark issue #20607: Don't block on cleanup tasks by default
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20607 @rkrzr please close the PR, unless you plan to actually test that this is not an issue anymore.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168511361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1313,131 +1313,178 @@ object functions { // /** - * Computes the cosine inverse of the given value; the returned angle is in the range - * 0.0 through pi. + * @param e the value whose arc cosine is to be returned + * @return cosine inverse of the given value in the range of 0.0 through pi, + * as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ def acos(e: Column): Column = withExpr { Acos(e.expr) } /** - * Computes the cosine inverse of the given column; the returned angle is in the range - * 0.0 through pi. + * @param colName the value whose arc cosine is to be returned + * @returncosine inverse of the given value in the range of 0.0 through pi, + *as if computed by [[java.lang.Math#acos]] * * @group math_funcs * @since 1.4.0 */ - def acos(columnName: String): Column = acos(Column(columnName)) + def acos(colName: String): Column = acos(Column(colName)) --- End diff -- Why change columnName -> colName? doesn't seem to matter, so I'd avoid changing it if so.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168511774 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -770,7 +837,14 @@ case class Unhex(child: Expression) extends UnaryExpression with ImplicitCastInp // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(expr1, expr2) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`expr1`, `expr2`).", + usage = "_FUNC_(exprY, exprX) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`exprX`, `exprY`), " + +"as if computed by `java.lang.Math._FUNC_`.", + arguments = +""" +Arguments: + * exprY - the ordinate coordinate + * exprX - the abscissa coordinate --- End diff -- Here and below -- it's not clear how ordinate and abscissa relate to the function's description. Above it's described more simply as the coordinates of a point in the plane, and I think that's more recognizable than these terms.
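The terminology under discussion maps directly onto the argument order: `atan2` takes the y value (the ordinate) first and the x value (the abscissa) second, exactly as `java.lang.Math.atan2` does. A quick sketch of that convention:

```scala
// atan2 argument order: y (ordinate) first, then x (abscissa).
object Atan2Demo {
  def main(args: Array[String]): Unit = {
    // Point (x = 1, y = 1) lies at 45 degrees, i.e. pi/4 radians.
    val angle = math.atan2(1.0, 1.0)
    println(math.abs(angle - math.Pi / 4) < 1e-12) // true
    // Point (x = -1, y = 1) is in the second quadrant, at 3*pi/4 radians.
    println(math.abs(math.atan2(1.0, -1.0) - 3 * math.Pi / 4) < 1e-12) // true
  }
}
```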
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168504378 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -262,6 +285,11 @@ case class Cos(child: Expression) extends UnaryMathExpression(math.cos, "COS") @ExpressionDescription( usage = "_FUNC_(expr) - Returns the hyperbolic cosine of `expr`.", + arguments = +""" +Arguments: + * expr - number whose hyperbolic consine is to be returned. --- End diff -- consine -> cosine For the hyperbolic functions, you can just describe the argument as a "hyperbolic angle". This description is redundant with the "Returns ..." description above.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168511869 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2873,7 +2945,7 @@ object functions { * or equal to the `windowDuration`. Check * `org.apache.spark.unsafe.types.CalendarInterval` for valid duration * identifiers. This duration is likewise absolute, and does not vary -* according to a calendar. + * according to a calendar. --- End diff -- This should be reverted.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168511172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -1313,131 +1313,178 @@ object functions { // /** - * Computes the cosine inverse of the given value; the returned angle is in the range - * 0.0 through pi. + * @param e the value whose arc cosine is to be returned + * @return cosine inverse of the given value in the range of 0.0 through pi, --- End diff -- I'd use the same description here as above, for all of these functions. The one above looks better to me. Up to you whether you want to talk about the range of the return value or leave that to the Math.acos docs; it should just be consistent. Also "cosine inverse" -> "inverse cosine", and make it clear as you do above that it's a synonym for arc cosine.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168510272 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -770,7 +837,14 @@ case class Unhex(child: Expression) extends UnaryExpression with ImplicitCastInp // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(expr1, expr2) - Returns the angle in radians between the positive x-axis of a plane and the point given by the coordinates (`expr1`, `expr2`).", --- End diff -- It looks like the declaration of atan2 should group with the trig functions. I think that's OK to fix here.
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20618#discussion_r168510579 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -972,6 +1045,7 @@ case class Logarithm(left: Expression, right: Expression) } } + --- End diff -- I'd revert the whitespace changes here, or at least only make changes that make the spacing consistent.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20601 LGTM pending tests.
[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20424 **[Test build #87479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87479/testReport)** for PR 20424 at commit [`eceb24e`](https://github.com/apache/spark/commit/eceb24e61798f9e5da0ed3c4dfb94d677d08b10e).
[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20424 Sure, you can. retest this please
[GitHub] spark issue #20424: [Spark-23240][python] Better error message when extraneo...
Github user bersprockets commented on the issue: https://github.com/apache/spark/pull/20424 The test that finished last succeeded, but the one that started last had a spurious error. Can I get a retest?
[GitHub] spark pull request #18176: [SPARK-20952] ParquetFileFormat should forward Ta...
Github user robert3005 closed the pull request at: https://github.com/apache/spark/pull/18176
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20601 **[Test build #87478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87478/testReport)** for PR 20601 at commit [`22179e8`](https://github.com/apache/spark/commit/22179e84f6cf601be18e9b060246c54bd0cede8d).
[GitHub] spark issue #20464: [SPARK-23291][SQL][R] R's substr should not reduce start...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20464 ping @felixcheung
[GitHub] spark issue #20362: [Spark-22886][ML][TESTS] ML test for structured streamin...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20362 gentle ping @jkbradley @WeichenXu123
[GitHub] spark issue #20505: [SPARK-23251][SQL] Add checks for collection element Enc...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/20505

Yes, that is the idea. Frankly, I am not familiar enough with how the compiler resolves all the implicit parameters to say confidently what is going on. But here's my take: I did a little more research and found out that the "diverging implicit expansion" error means the compiler is able to follow a resolution route that can be potentially infinite. I think it might be possible that the compiler is following my other implicit methods, trying to fit various collections inside V in the hope of eventually satisfying the `<:<` condition, before finally giving up.

Just to be sure multi-level collections work with my change, I successfully tried this:

```
scala> implicitly[Encoder[Map[Seq[Map[String, Seq[Long]]], List[Array[Map[String, Int]]]]]]
res5: org.apache.spark.sql.Encoder[Map[Seq[Map[String,Seq[Long]]],List[Array[Map[String,Int]]]]] = class[value[0]: map<array<map<string,array<bigint>>>,array<array<map<string,int>>>>]
```

One thing that doesn't make sense to me, however, is the information the compiler gives me when enabling implicit resolution logging via `-Xlog-implicits`:

```
scala> implicitly[Encoder[Map[String, Any]]]
<console>:25: newCheckedSetEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[Map[String,Any]] because:
hasMatchingSymbol reported error: polymorphic expression cannot be instantiated to expected type;
 found   : [T[_], E]org.apache.spark.sql.Encoder[T[E]]
 required: org.apache.spark.sql.Encoder[Map[String,Any]]
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedSetEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[String] because:
hasMatchingSymbol reported error: polymorphic expression cannot be instantiated to expected type;
 found   : [T[_], E]org.apache.spark.sql.Encoder[T[E]]
 required: org.apache.spark.sql.Encoder[String]
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedMapEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[String] because:
hasMatchingSymbol reported error: polymorphic expression cannot be instantiated to expected type;
 found   : [T[_, _], K, V]org.apache.spark.sql.Encoder[T[K,V]]
 required: org.apache.spark.sql.Encoder[String]
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedSequenceEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[String] because:
hasMatchingSymbol reported error: polymorphic expression cannot be instantiated to expected type;
 found   : [T[_], E]org.apache.spark.sql.Encoder[T[E]]
 required: org.apache.spark.sql.Encoder[String]
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: materializing requested reflect.runtime.universe.type.TypeTag[A] using `package`.this.materializeTypeTag[A](scala.reflect.runtime.`package`.universe)
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedMapEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[E] because:
hasMatchingSymbol reported error: diverging implicit expansion for type org.apache.spark.sql.Encoder[K]
starting with method newStringEncoder in class SQLImplicits
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedSetEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[Any] because:
hasMatchingSymbol reported error: ambiguous implicit values:
 both method newIntEncoder in class SQLImplicits of type => org.apache.spark.sql.Encoder[Int]
 and method newLongEncoder in class SQLImplicits of type => org.apache.spark.sql.Encoder[Long]
 match expected type org.apache.spark.sql.Encoder[E]
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedMapEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[Any] because:
hasMatchingSymbol reported error: diverging implicit expansion for type org.apache.spark.sql.Encoder[K]
starting with method newStringEncoder in class SQLImplicits
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: materializing requested reflect.runtime.universe.type.TypeTag[A] using `package`.this.materializeTypeTag[A](scala.reflect.runtime.`package`.universe)
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedMapEncoder is not a valid implicit value for org.apache.spark.sql.Encoder[E] because:
hasMatchingSymbol reported error: diverging implicit expansion for type org.apache.spark.sql.Encoder[K]
starting with method newStringEncoder in class SQLImplicits
       implicitly[Encoder[Map[String, Any]]]
                 ^
<console>:25: newCheckedSequenceEncoder is not a valid implicit value for org.apache.spark
```
[GitHub] spark issue #20568: [SPARK-23381][CORE] Murmur3 hash generates a different v...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20568 Retest this please
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user attilapiros commented on the issue: https://github.com/apache/spark/pull/20601 Yes, of course. @zsxwing's test is perfect for avoiding similar problems in the future.
[GitHub] spark issue #20512: [SPARK-23182][CORE] Allow enabling TCP keep alive on the...
Github user peshopetrov commented on the issue: https://github.com/apache/spark/pull/20512 Any update? We have rolled out our Spark clusters with this change and it seems to be working great. We see no lingering connections on the masters.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20601 @attilapiros could you take a look at the test case Ryan added in #20615 and add something like that to your patch? It'd be nice to catch these things in unit tests.
[GitHub] spark issue #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Executor I...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20601 **[Test build #87477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87477/testReport)** for PR 20601 at commit [`c8ef968`](https://github.com/apache/spark/commit/c8ef96839d0ec79eb397757736f9a9df0b876a11).
[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...
Github user attilapiros commented on a diff in the pull request: https://github.com/apache/spark/pull/20601#discussion_r168458891

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
 private object ApiHelper {
+  val HEADER_ID = "ID"
+  val HEADER_TASK_INDEX = "Index"
+  val HEADER_ATTEMPT = "Attempt"
+  val HEADER_STATUS = "Status"
+  val HEADER_LOCALITY = "Locality Level"
+  val HEADER_EXECUTOR = "Executor ID"
+  val HEADER_HOST = "Host"
+  val HEADER_LAUNCH_TIME = "Launch Time"
+  val HEADER_DURATION = "Duration"
+  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
+  val HEADER_DESER_TIME = "Task Deserialization Time"
+  val HEADER_GC_TIME = "GC Time"
+  val HEADER_SER_TIME = "Result Serialization Time"
+  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
+  val HEADER_PEAK_MEM = "Peak Execution Memory"
+  val HEADER_ACCUMULATORS = "Accumulators"
+  val HEADER_INPUT_SIZE = "Input Size / Records"
+  val HEADER_OUTPUT_SIZE = "Output Size / Records"
+  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
+  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
--- End diff --

In the header constants naming I have followed the existing task index names:
```scala
HEADER_SHUFFLE_TOTAL_READS -> TaskIndexNames.SHUFFLE_TOTAL_READS,
```
[GitHub] spark issue #20554: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20554 @zsxwing can you look at it once again? Some more changes. @jose-torres PTAL.
[GitHub] spark pull request #20608: [SPARK-23422][Core] YarnShuffleIntegrationSuite f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20608
[GitHub] spark issue #20608: [SPARK-23422][Core] YarnShuffleIntegrationSuite fix when...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20608 Merging to master / 2.3.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20616 Merged build finished. Test PASSed.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20616 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87476/ Test PASSed.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20616 **[Test build #87476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87476/testReport)** for PR 20616 at commit [`a14ff69`](https://github.com/apache/spark/commit/a14ff6974446b8e692b03c3e3f1cab52693cc6c4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20610 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87475/ Test PASSed.
[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20610 Merged build finished. Test PASSed.
[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20610 **[Test build #87475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87475/testReport)** for PR 20610 at commit [`19b50b1`](https://github.com/apache/spark/commit/19b50b1eb5dcdf02ecd515b5d27d0256c7f4a3ab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17619: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user hantuzun commented on the issue: https://github.com/apache/spark/pull/17619 Even though we only run normal Spark jobs, this PR is going to fix a case for us as well.
[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20617 Merged build finished. Test PASSed.
[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87474/ Test PASSed.
[GitHub] spark issue #20617: [MINOR][SQL] Fix an error message about inserting into b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20617 **[Test build #87474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87474/testReport)** for PR 20617 at commit [`180413a`](https://github.com/apache/spark/commit/180413a23eeb8a13a3d4d1783aa249413176cf20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14431 Build finished. Test PASSed.
[GitHub] spark issue #14431: [SPARK-16258][SparkR] Automatically append the grouping ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14431 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/917/ Test PASSed.
[GitHub] spark pull request #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disabl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20610#discussion_r168416253

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala ---
@@ -33,6 +33,19 @@ import org.apache.spark.util.Utils
 class FileStreamSinkSuite extends StreamTest {
   import testImplicits._
+  override def beforeAll(): Unit = {
--- End diff --

nit: a simpler way to fix this
```
override val conf = new SQLConf().copy(SQLConf.ORC_IMPLEMENTATION -> "native")
```
[GitHub] spark pull request #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fiel...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20545
[GitHub] spark pull request #20568: [SPARK-23381][CORE] Murmur3 hash generates a diff...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20568#discussion_r168415065

--- Diff: common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java ---
@@ -71,6 +73,20 @@ public static int hashUnsafeBytes(Object base, long offset, int lengthInBytes, i
     return fmix(h1, lengthInBytes);
   }
+  public static int hashUnsafeBytes2(Object base, long offset, int lengthInBytes, int seed) {
+    // This is compatible with original and another implementations.
+    // Use this method after 2.3.0.
--- End diff --

nit: `Use this method for new components after Spark 2.3`
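[Editor's illustration of the compatibility issue behind `hashUnsafeBytes2`.] Standard Murmur3 x86_32 folds the trailing (non-4-byte-aligned) bytes into a single tail word that is only partially mixed, while Spark's pre-existing `hashUnsafeBytes` reportedly runs each trailing byte through the full mix, producing a different value from other Murmur3 implementations. A rough Python sketch of the two tail treatments, assuming the standard reference algorithm (this is not Spark's actual Java code, which additionally sign-extends each byte):

```python
# Murmur3 x86_32 constants.
C1, C2, M32 = 0xcc9e2d51, 0x1b873593, 0xffffffff

def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & M32

def _mix_k1(k1):
    return (_rotl32((k1 * C1) & M32, 15) * C2) & M32

def _mix_h1(h1, k1):
    return (_rotl32(h1 ^ k1, 13) * 5 + 0xe6546b64) & M32

def _fmix(h):
    h ^= h >> 16
    h = (h * 0x85ebca6b) & M32
    h ^= h >> 13
    h = (h * 0xc2b2ae35) & M32
    return h ^ (h >> 16)

def murmur3_32(data, seed, standard_tail=True):
    """Murmur3 x86_32 over `data`. standard_tail=False mimics the
    per-byte tail mixing that made the old hashUnsafeBytes diverge."""
    h1 = seed & M32
    aligned = len(data) - len(data) % 4
    # Body: full mix of each little-endian 4-byte word.
    for i in range(0, aligned, 4):
        h1 = _mix_h1(h1, _mix_k1(int.from_bytes(data[i:i + 4], "little")))
    if standard_tail:
        # Standard: pack the tail into one word; it is XORed in
        # without the rotate-multiply-add step of the body mix.
        k1 = 0
        for i, b in enumerate(data[aligned:]):
            k1 ^= b << (8 * i)
        h1 ^= _mix_k1(k1)
    else:
        # Old-Spark-style: every trailing byte gets the full mix.
        for b in data[aligned:]:
            h1 = _mix_h1(h1, _mix_k1(b))
    return _fmix(h1 ^ len(data))
```

The two variants agree whenever the input length is a multiple of 4 (the tail is empty), which is why the incompatibility only shows up for byte arrays of other lengths.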
[GitHub] spark pull request #20555: [SPARK-23366] Improve hot reading path in ReadAhe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20555
[GitHub] spark issue #20545: [SPARK-23359][SQL] Adds an alias 'names' of 'fieldNames'...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20545 thanks, merging to master!
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 Sure.
[GitHub] spark issue #20555: [SPARK-23366] Improve hot reading path in ReadAheadInput...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20555 thanks, merging to master!
[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20618 Can one of the admins verify this patch?
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20567 ^ this change LGTM. Can we make a PR for this change only and leave the fallback part for Spark 2.4?
[GitHub] spark pull request #20618: [SPARK-23329][SQL] Fix documentation of trigonome...
GitHub user misutoth opened a pull request: https://github.com/apache/spark/pull/20618 [SPARK-23329][SQL] Fix documentation of trigonometric functions

## What changes were proposed in this pull request?

Provide more details in trigonometric function documentations. Referenced `java.lang.Math` for further details in the descriptions.

## How was this patch tested?

Ran full build, checked generated documentation manually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/misutoth/spark trigonometric-doc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20618.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20618

commit 25c329b4f93b407f53d87e8199444e83b6a1be15
Author: Mihaly Toth
Date: 2018-02-13T14:34:26Z
[SPARK-23329][SQL] Fix documentation of trigonometric functions
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567

> The binary type bug sounds like a blocker, can we just fix it surgically by checking the supported data types before going to the arrow optimization path? For now we can stick with that the current behavior is, i.e. throw exception.

That's basically this (https://github.com/apache/spark/pull/20567#issuecomment-365064243):

```python
if  # 'spark.sql.execution.arrow.enabled' true?
    require_minimum_pyarrow_version()
    try:
        to_arrow_schema(self.schema)
        # return the one with Arrow
    except Exception as e:
        raise Exception("'spark.sql.execution.arrow.enabled' blah blah ...")
else:
    # return the one without Arrow
```

because `to_arrow_schema(self.schema)` checks the supported types like other Pandas/Arrow functionalities.
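[Editor's note.] For readers following the thread, here is a minimal, self-contained sketch of that check-then-raise flow. `to_arrow_schema_stub`, `UnsupportedSchemaError`, and `to_pandas` below are hypothetical stand-ins for the real PySpark/PyArrow helpers; only the control flow mirrors the pseudocode in the comment above:

```python
class UnsupportedSchemaError(Exception):
    """Stand-in for the error PyArrow raises on an unsupported type."""


def to_arrow_schema_stub(schema):
    # Hypothetical schema check: treat 'binary' columns as the
    # unsupported case (the bug discussed in this thread involved
    # binary-type handling under Arrow).
    if "binary" in schema:
        raise UnsupportedSchemaError("Unsupported type in conversion: binary")
    return schema


def to_pandas(schema, arrow_enabled):
    """Sketch of the config-gated conversion path."""
    if arrow_enabled:
        try:
            # Validate the schema up front, before any Arrow conversion work.
            to_arrow_schema_stub(schema)
        except Exception as e:
            # Surface a message that points at the config knob instead of
            # letting a raw Arrow error escape mid-conversion.
            raise RuntimeError(
                "'spark.sql.execution.arrow.enabled' is set but the schema "
                "is not supported by Arrow: %s" % e)
        return "arrow path"
    return "plain path"
```

The design point is that the schema check happens once, up front, so the user either gets the Arrow path or a clear error naming the config, never a half-converted result.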
[GitHub] spark pull request #20605: [SPARK-23419][SPARK-23416][SS] data source v2 wri...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20605
[GitHub] spark issue #20605: [SPARK-23419][SPARK-23416][SS] data source v2 write path...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20605 thanks, merging to master/2.3! Since it's a bug fix, I included it in branch 2.3; please let me know if you have other concerns. cc @sameeragarwal @marmbrus
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20567 The binary type bug sounds like a blocker; can we just fix it surgically by checking the supported data types before going to the arrow optimization path? For now we can keep the current behavior, i.e. throw an exception. The inconsistent behavior between `toPandas` and `createDataFrame` is confusing but may not be a blocker. We can fix it in Spark 2.4 and add a note in the migration guide.
[GitHub] spark issue #20615: [SPARK-23430][WebUI]ApiHelper.COLUMN_TO_INDEX should mat...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/20615 This looks the same as SPARK-23413 / #20601
[GitHub] spark pull request #20601: [SPARK-23413][UI] Fix sorting tasks by Host / Exe...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/20601#discussion_r168408617

--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -963,33 +965,60 @@ private[ui] class TaskPagedTable(
 private object ApiHelper {
+  val HEADER_ID = "ID"
+  val HEADER_TASK_INDEX = "Index"
+  val HEADER_ATTEMPT = "Attempt"
+  val HEADER_STATUS = "Status"
+  val HEADER_LOCALITY = "Locality Level"
+  val HEADER_EXECUTOR = "Executor ID"
+  val HEADER_HOST = "Host"
+  val HEADER_LAUNCH_TIME = "Launch Time"
+  val HEADER_DURATION = "Duration"
+  val HEADER_SCHEDULER_DELAY = "Scheduler Delay"
+  val HEADER_DESER_TIME = "Task Deserialization Time"
+  val HEADER_GC_TIME = "GC Time"
+  val HEADER_SER_TIME = "Result Serialization Time"
+  val HEADER_GETTING_RESULT_TIME = "Getting Result Time"
+  val HEADER_PEAK_MEM = "Peak Execution Memory"
+  val HEADER_ACCUMULATORS = "Accumulators"
+  val HEADER_INPUT_SIZE = "Input Size / Records"
+  val HEADER_OUTPUT_SIZE = "Output Size / Records"
+  val HEADER_SHUFFLE_READ_TIME = "Shuffle Read Blocked Time"
+  val HEADER_SHUFFLE_TOTAL_READS = "Shuffle Read Size / Records"
+  val HEADER_SHUFFLE_REMOTE_READS = "Shuffle Remote Reads"
+  val HEADER_SHUFFLE_WRITE_TIME = "Write Time"
+  val HEADER_SHUFFLE_WRITE_SIZE = "Shuffle Write Size / Records"
+  val HEADER_MEM_SPILL = "Shuffle Spill (Memory)"
+  val HEADER_DISK_SPILL = "Shuffle Spill (Disk)"
+  val HEADER_ERROR = "Errors"
   private val COLUMN_TO_INDEX = Map(
-    "ID" -> null.asInstanceOf[String],
-    "Index" -> TaskIndexNames.TASK_INDEX,
-    "Attempt" -> TaskIndexNames.ATTEMPT,
-    "Status" -> TaskIndexNames.STATUS,
-    "Locality Level" -> TaskIndexNames.LOCALITY,
-    "Executor ID / Host" -> TaskIndexNames.EXECUTOR,
-    "Launch Time" -> TaskIndexNames.LAUNCH_TIME,
-    "Duration" -> TaskIndexNames.DURATION,
-    "Scheduler Delay" -> TaskIndexNames.SCHEDULER_DELAY,
-    "Task Deserialization Time" -> TaskIndexNames.DESER_TIME,
-    "GC Time" -> TaskIndexNames.GC_TIME,
-    "Result Serialization Time" -> TaskIndexNames.SER_TIME,
-    "Getting Result Time" -> TaskIndexNames.GETTING_RESULT_TIME,
-    "Peak Execution Memory" -> TaskIndexNames.PEAK_MEM,
-    "Accumulators" -> TaskIndexNames.ACCUMULATORS,
-    "Input Size / Records" -> TaskIndexNames.INPUT_SIZE,
-    "Output Size / Records" -> TaskIndexNames.OUTPUT_SIZE,
-    "Shuffle Read Blocked Time" -> TaskIndexNames.SHUFFLE_READ_TIME,
-    "Shuffle Read Size / Records" -> TaskIndexNames.SHUFFLE_TOTAL_READS,
-    "Shuffle Remote Reads" -> TaskIndexNames.SHUFFLE_REMOTE_READS,
-    "Write Time" -> TaskIndexNames.SHUFFLE_WRITE_TIME,
-    "Shuffle Write Size / Records" -> TaskIndexNames.SHUFFLE_WRITE_SIZE,
-    "Shuffle Spill (Memory)" -> TaskIndexNames.MEM_SPILL,
-    "Shuffle Spill (Disk)" -> TaskIndexNames.DISK_SPILL,
-    "Errors" -> TaskIndexNames.ERROR)
+    HEADER_ID -> null.asInstanceOf[String],
+    HEADER_TASK_INDEX -> TaskIndexNames.TASK_INDEX,
+    HEADER_ATTEMPT -> TaskIndexNames.ATTEMPT,
+    HEADER_STATUS -> TaskIndexNames.STATUS,
+    HEADER_LOCALITY -> TaskIndexNames.LOCALITY,
+    HEADER_EXECUTOR -> TaskIndexNames.EXECUTOR,
+    HEADER_HOST -> TaskIndexNames.EXECUTOR,
--- End diff --

Looks like we'll have a new RC, so I'll jump on the bandwagon and mark this one a blocker too. We can then add the new index in 2.3.0.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20616 Merged build finished. Test PASSed.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20616 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/916/ Test PASSed.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20616 **[Test build #87476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87476/testReport)** for PR 20616 at commit [`a14ff69`](https://github.com/apache/spark/commit/a14ff6974446b8e692b03c3e3f1cab52693cc6c4).
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 The root cause is that the Arrow conversion on the Python side interprets binaries as `str`, and here I avoided this by checking whether the type is one we support. This is the most trivial fix; I made it as safe and small as possible. I could fix only the error message, but the size of the change and the diff would be virtually the same - https://github.com/apache/spark/pull/20567#issuecomment-365064243.
[GitHub] spark issue #20616: [SPARK-23434][SQL] Spark should not warn `metadata direc...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20616 Retest this please.
[GitHub] spark issue #20610: [SPARK-23426][SQL] Use `hive` ORC impl and disable PPD f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20610 Merged build finished. Test PASSed.