[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21701 **[Test build #92553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92553/testReport)** for PR 21701 at commit [`c0d1c6e`](https://github.com/apache/spark/commit/c0d1c6e0a5532eeab0848834d2dc348808e54069). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `sealed trait MultipleWatermarkPolicy ` * `case class WatermarkTracker(policy: MultipleWatermarkPolicy) extends Logging ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21701 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92553/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r199701953 --- Diff: python/pyspark/sql/conf.py --- @@ -59,7 +62,7 @@ def unset(self, key): def _checkType(self, obj, identifier): """Assert that an object is of type str.""" -if not isinstance(obj, str) and not isinstance(obj, unicode): +if not isinstance(obj, basestring): --- End diff -- Is there an issue here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
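For context on the question above: `basestring` exists only in Python 2, so the new check resolves on Python 3 only if the name is bound somewhere first. A minimal sketch of one way to keep such a check valid on both major versions, assuming a module-level alias; the function name `check_type` and the error message are illustrative and not taken from the PR:

```python
import sys

# Assumption for illustration: on Python 3, where `basestring` is undefined,
# bind the name to `str` so the isinstance check below still resolves.
if sys.version_info[0] >= 3:
    basestring = str


def check_type(obj, identifier):
    """Raise TypeError unless obj is a string (hypothetical stand-in for _checkType)."""
    if not isinstance(obj, basestring):
        raise TypeError("expected %s '%s' to be a string (was '%s')" %
                        (identifier, obj, type(obj).__name__))
```

On Python 2 the alias is skipped and the builtin `basestring` covers both `str` and `unicode`, which is what the two original isinstance calls tested.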
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r199702001 --- Diff: dev/create-release/releaseutils.py --- @@ -49,6 +49,9 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- It creates a new function called __raw_input()__ that is identical to the builtin __input()__. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21701 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r199701965 --- Diff: dev/merge_spark_pr.py --- @@ -39,6 +39,9 @@ except ImportError: JIRA_IMPORTED = False +if sys.version_info[0] >= 3: +raw_input = input --- End diff -- It creates a new function called __raw_input()__ that is identical to the builtin __input()__. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
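To make the comment above concrete: the assignment does not copy anything, it binds the name `raw_input` to the same function object as the builtin `input`. A small sketch of the shim in use; the helper `ask` and its `reader` parameter are hypothetical and exist only so the example can run without a terminal:

```python
import sys

# On Python 3 the builtin raw_input() no longer exists, so bind the old name
# to input(), which behaves the way raw_input() did on Python 2.
if sys.version_info[0] >= 3:
    raw_input = input


def ask(prompt, reader=None):
    """Prompt for one line of text; `reader` is a hypothetical hook for tests."""
    reader = reader or raw_input
    return reader(prompt).strip()
```

Code written against `raw_input()` then keeps working unchanged on both major versions.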
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21596 **[Test build #92554 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92554/testReport)** for PR 21596 at commit [`5006467`](https://github.com/apache/spark/commit/50064675706f7ac46f2665da752e0f410ad84183). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92554/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21633 Jenkins, retest this please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/632/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21633 **[Test build #92555 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92555/testReport)** for PR 21633 at commit [`4419f52`](https://github.com/apache/spark/commit/4419f52bf0104cc44fc6b27183030876778bbdc4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21633 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2
GitHub user cclauss opened a pull request: https://github.com/apache/spark/pull/21702 [SPARK-23698] Remove raw_input() from Python 2 Signed-off-by: cclauss ## What changes were proposed in this pull request? Humans will be able to enter text at Python 3 prompts, which they cannot do today. The Python builtin __raw_input()__ was removed in Python 3 in favor of __input()__. This PR does the same thing in Python 2. ## How was this patch tested? flake8 testing You can merge this pull request into a Git repository by running: $ git pull https://github.com/cclauss/spark python-fix-raw_input Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21702.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21702 commit 960769a735933d58d136ea068954ad83d4731b10 Author: cclauss Date: 2018-07-03T07:10:46Z [SPARK-23698] Remove raw_input() from Python 2 Signed-off-by: cclauss
[GitHub] spark pull request #21667: [SPARK-24691][SQL]Add new API `supportDataType` i...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r199705148 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala --- @@ -42,63 +38,27 @@ object DataSourceUtils { /** * Verify if the schema is supported in datasource. This verification should be done - * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues` - * in `FileFormat`. - * - * Unsupported data types of csv, json, orc, and parquet are as follows; - * csv -> R/W: Interval, Null, Array, Map, Struct - * json -> W: Interval - * orc -> W: Interval, Null - * parquet -> R/W: Interval, Null + * in a driver side. */ private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = { -def throwUnsupportedException(dataType: DataType): Unit = { - throw new UnsupportedOperationException( -s"$format data source does not support ${dataType.simpleString} data type.") -} - -def verifyType(dataType: DataType): Unit = dataType match { - case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType | - StringType | BinaryType | DateType | TimestampType | _: DecimalType => - - // All the unsupported types for CSV - case _: NullType | _: CalendarIntervalType | _: StructType | _: ArrayType | _: MapType - if format.isInstanceOf[CSVFileFormat] => -throwUnsupportedException(dataType) - - case st: StructType => st.foreach { f => verifyType(f.dataType) } - - case ArrayType(elementType, _) => verifyType(elementType) - - case MapType(keyType, valueType, _) => -verifyType(keyType) -verifyType(valueType) - - case udt: UserDefinedType[_] => verifyType(udt.sqlType) - - // Interval type not supported in all the write path - case _: CalendarIntervalType if !isReadPath => -throwUnsupportedException(dataType) - - // JSON and ORC don't support an Interval type, but we pass it in read pass - // for back-compatibility. 
- case _: CalendarIntervalType if format.isInstanceOf[JsonFileFormat] || -format.isInstanceOf[OrcFileFormat] => +def verifyType(dataType: DataType): Unit = { + if (!format.supportDataType(dataType, isReadPath)) { +throw new UnsupportedOperationException( + s"$format data source does not support ${dataType.simpleString} data type.") + } + dataType match { --- End diff -- I see. I will update it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21702: [SPARK-23698] Remove raw_input() from Python 2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21702 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21649: [SPARK-23648][R][SQL]Adds more types for hint in ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21649#discussion_r199706822 --- Diff: R/pkg/R/DataFrame.R --- @@ -3905,6 +3905,16 @@ setMethod("rollup", groupedData(sgd) }) +isTypeAllowedForSqlHint <- function(x) { + if (is.character(x) | is.numeric(x)) { +TRUE + } else if (is.list(x)) { +all (sapply(x, (function (y) is.character(y) | is.numeric(y)))) + } else { +FALSE + } +} + #' hint #' #' Specifies execution plan hint and return a new SparkDataFrame. --- End diff -- The concern would be whether other types in Python or R are going to be translated/mapped properly to Java/Scala types, so this is probably OK.
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21678 ok then, thanks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21678: [SPARK-23461][R]vignettes should include model predictio...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21678 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21666: [SPARK-24535][SPARKR] fix tests on java check err...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21666#discussion_r199713355 --- Diff: R/pkg/R/client.R --- @@ -61,6 +61,11 @@ generateSparkSubmitArgs <- function(args, sparkHome, jars, sparkSubmitOpts, pack } checkJavaVersion <- function() { + if (is_windows()) { +# See SPARK-24535 --- End diff -- updated --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/633/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21666 **[Test build #92556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92556/testReport)** for PR 21666 at commit [`e1d1a64`](https://github.com/apache/spark/commit/e1d1a64f5bf38710560c6c83b46d8562bb53dd35). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21666 **[Test build #92557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92557/testReport)** for PR 21666 at commit [`8d9ef83`](https://github.com/apache/spark/commit/8d9ef83deaecc9a0c0c193b7a56c6c4177cbb952). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/634/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21596 **[Test build #92558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92558/testReport)** for PR 21596 at commit [`4b78651`](https://github.com/apache/spark/commit/4b786518095c7ed2fd034f74e5b4bd83a3062c29). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21702#discussion_r199718062 --- Diff: dev/create-release/releaseutils.py --- @@ -49,13 +49,16 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +if sys.version < '3': +input = raw_input --- End diff -- If we can do the opposite, the diff should be only 4 lines though --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/21703 [SPARK-24732][SQL] Type coercion between MapTypes. ## What changes were proposed in this pull request? Currently we don't allow type coercion between maps. We can support type coercion between MapTypes where both the key types and the value types are compatible. ## How was this patch tested? Added tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-24732/maptypecoercion Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21703.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21703 commit 928501a63f2ae90b4d95949e6fc505b762d03ac7 Author: Takuya UESHIN Date: 2018-07-03T08:08:25Z Type coercion between MapTypes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
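The rule described above (two map types can be coerced when their key types and their value types are each compatible) can be sketched outside Spark. The following is a toy model, not Spark's `TypeCoercion` code: types are plain strings, a map type is a hypothetical `("map", key, value)` tuple, and "compatible" is simplified to widening along a numeric precedence list.

```python
# Toy widening order, lowest to highest. Assumption: only these six numeric
# types take part; real coercion rules cover many more cases.
NUMERIC_PRECEDENCE = ["byte", "short", "int", "long", "float", "double"]


def is_map(t):
    """A map type is modeled as the tuple ("map", key_type, value_type)."""
    return isinstance(t, tuple) and len(t) == 3 and t[0] == "map"


def wider_type(a, b):
    """Return a common type for a and b, or None if they are incompatible."""
    if a == b:
        return a
    if a in NUMERIC_PRECEDENCE and b in NUMERIC_PRECEDENCE:
        return max(a, b, key=NUMERIC_PRECEDENCE.index)
    if is_map(a) and is_map(b):
        # Coerce keys and values independently; both must succeed.
        key = wider_type(a[1], b[1])
        value = wider_type(a[2], b[2])
        if key is None or value is None:
            return None
        return ("map", key, value)
    return None
```

For instance, `wider_type(("map", "int", "float"), ("map", "long", "double"))` gives `("map", "long", "double")`, while maps whose value types are `"string"` and `"int"` have no common type in this model.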
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21703 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/635/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21703 cc @gatorsmile @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21703 **[Test build #92559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92559/testReport)** for PR 21703 at commit [`928501a`](https://github.com/apache/spark/commit/928501a63f2ae90b4d95949e6fc505b762d03ac7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21073: [SPARK-23936][SQL] Implement map_concat
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21073#discussion_r199722217 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -551,6 +551,36 @@ object TypeCoercion { case None => s } + case m @ MapConcat(children) if children.forall(c => MapType.acceptsType(c.dataType)) && +!haveSameType(children) => +val keyTypes = children.map(_.dataType.asInstanceOf[MapType].keyType) --- End diff -- As for 1), I submitted PR #21703. I'm not sure we can merge it yet, but it will help you improve this PR. As for 2), adding casts to the same type should not be a problem because the extra casts will be removed during the optimization phase.
[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2
Github user cclauss commented on a diff in the pull request: https://github.com/apache/spark/pull/21702#discussion_r199723478 --- Diff: dev/create-release/releaseutils.py --- @@ -49,13 +49,16 @@ print("Install using 'sudo pip install unidecode'") sys.exit(-1) +if sys.version < '3': +input = raw_input --- End diff -- Two downsides to that approach: 1. We stick with the legacy Python syntax which will unnecessarily complicate our lives (and our diffs) [in 18 months](http://pythonclock.org) 2. This approach reduces the linting errors from 10 down to just 2. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
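A side note on the condition in the diff under discussion: `sys.version < '3'` is a lexicographic string comparison, whereas other diffs in this thread test `sys.version_info[0] >= 3`. Both separate 2.x from 3.x. The sketch below contrasts the two forms; the function names are illustrative, and the version `"10.0.1"` is a made-up value used only to show where string comparison would go wrong.

```python
def is_python2_by_string(version):
    """Mirror of `sys.version < '3'`: compares strings character by character."""
    return version < '3'


def is_python2_by_tuple(version_info):
    """Mirror of `sys.version_info[0] < 3`: compares the major version as an integer."""
    return version_info[0] < 3


# Both forms agree for the versions that exist today.
agree_on_2 = is_python2_by_string("2.7.15") and is_python2_by_tuple((2, 7, 15))
agree_on_3 = not is_python2_by_string("3.6.5") and not is_python2_by_tuple((3, 6, 5))

# Hypothetical two-digit major version: "1" sorts before "3", so the string
# form misclassifies it, while the integer form does not.
string_form_misfires = is_python2_by_string("10.0.1")
```

The integer form is the more defensive of the two, though either one is enough to decide which name to alias today.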
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21666 **[Test build #92556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92556/testReport)** for PR 21666 at commit [`e1d1a64`](https://github.com/apache/spark/commit/e1d1a64f5bf38710560c6c83b46d8562bb53dd35). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92556/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21666 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92557/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21666: [SPARK-24535][SPARKR] fix tests on java check error
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21666 **[Test build #92557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92557/testReport)** for PR 21666 at commit [`8d9ef83`](https://github.com/apache/spark/commit/8d9ef83deaecc9a0c0c193b7a56c6c4177cbb952). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21703 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21699#discussion_r199742580 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -340,36 +340,52 @@ class RelationalGroupedDataset protected[sql]( /** * Pivots a column of the current `DataFrame` and performs the specified aggregation. - * There are two versions of pivot function: one that requires the caller to specify the list - * of distinct values to pivot on, and one that does not. The latter is more concise but less - * efficient, because Spark needs to first compute the list of distinct values internally. * * {{{ * // Compute the sum of earnings for each year by course with each course as a separate column - * df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings") - * - * // Or without specifying column values (less efficient) - * df.groupBy("year").pivot("course").sum("earnings") + * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings") * }}} * - * @param pivotColumn Name of the column to pivot. + * @param pivotColumn the column to pivot. * @param values List of values that will be translated to columns in the output DataFrame. - * @since 1.6.0 + * @since 2.4.0 */ - def pivot(pivotColumn: String, values: Seq[Any]): RelationalGroupedDataset = { + def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = { --- End diff -- To make diffs smaller, can you move this under the signature `def pivot(pivotColumn: String, values: Seq[Any])`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21699 cc: @rxin @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21699 `def pivot(pivotColumn: String)`, too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21693: [SPARK-24673][SQL] scala sql function from_utc_ti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21693#discussion_r199744502 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2945,6 +2956,17 @@ object functions { ToUTCTimestamp(ts.expr, Literal(tz)) } + /** + * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time + * zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield + * '2017-07-14 01:40:00.0'. + * @group datetime_funcs + * @since 1.5.0 --- End diff -- `@since 2.4.0` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21693: [SPARK-24673][SQL] scala sql function from_utc_ti...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21693#discussion_r199744569 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2934,6 +2934,17 @@ object functions { FromUTCTimestamp(ts.expr, Literal(tz)) } + /** + * Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders + * that time as a timestamp in the given time zone. For example, 'GMT+1' would yield + * '2017-07-14 03:40:00.0'. + * @group datetime_funcs + * @since 1.5.0 --- End diff -- `@since 2.4.0` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
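The two doc comments quoted in this thread describe opposite conversions, and the examples in them can be checked with plain `datetime` arithmetic. This is a sketch of the semantics only, not of Spark's implementation: the helper names are made up, and the time zone is reduced to a fixed hour offset (so `'GMT+1'` becomes `1`), which ignores named zones and daylight saving.

```python
from datetime import datetime, timedelta, timezone


def from_utc(ts, offset_hours):
    """Treat naive `ts` as UTC and return the wall-clock time in the given zone."""
    zone = timezone(timedelta(hours=offset_hours))
    return ts.replace(tzinfo=timezone.utc).astimezone(zone).replace(tzinfo=None)


def to_utc(ts, offset_hours):
    """Treat naive `ts` as a time in the given zone and return the UTC wall-clock time."""
    zone = timezone(timedelta(hours=offset_hours))
    return ts.replace(tzinfo=zone).astimezone(timezone.utc).replace(tzinfo=None)


ts = datetime(2017, 7, 14, 2, 40)
print(from_utc(ts, 1))  # 2017-07-14 03:40:00
print(to_utc(ts, 1))    # 2017-07-14 01:40:00
```

The printed values match the '2017-07-14 03:40:00.0' and '2017-07-14 01:40:00.0' examples in the quoted Scaladoc.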
[GitHub] spark issue #21693: [SPARK-24673][SQL] scala sql function from_utc_timestamp...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21693 cc: @ueshin @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21260 **[Test build #92560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92560/testReport)** for PR 21260 at commit [`45eb477`](https://github.com/apache/spark/commit/45eb477623d89fb9352bf38b75c0a27e228f291f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21260 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21260 **[Test build #92560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92560/testReport)** for PR 21260 at commit [`45eb477`](https://github.com/apache/spark/commit/45eb477623d89fb9352bf38b75c0a27e228f291f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21260: [SPARK-23529][K8s] Support mounting volumes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21260 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92560/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21633 **[Test build #92555 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92555/testReport)** for PR 21633 at commit [`4419f52`](https://github.com/apache/spark/commit/4419f52bf0104cc44fc6b27183030876778bbdc4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21633 Merged build finished. Test FAILed.
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21633 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92555/ Test FAILed.
[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21699 **[Test build #92561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92561/testReport)** for PR 21699 at commit [`d62b7e7`](https://github.com/apache/spark/commit/d62b7e789f38219b62fb5b010fb2cacc0324fe29).
[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix containsNull of Concat for...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/21704
[SPARK-24734][SQL] Fix containsNull of Concat for array type.
## What changes were proposed in this pull request?
Currently `Concat` for array type uses the data type of the first child as its own data type, but the other children might include an array containing nulls. We should take the nullability of all children into account.
## How was this patch tested?
Modified and added some tests.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-24734/concat_containsnull
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21704.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21704
commit d87a8c6c0d1a4db5c9444781160a65562f8ea738 Author: Takuya UESHIN Date: 2018-07-03T11:21:06Z Fix containsNull of Concat for array type.
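The nullability rule this pull request describes can be sketched in a few lines. What follows is a minimal, self-contained Python model and not Spark's Catalyst code: `ArrayType` is a hypothetical stand-in dataclass, `concat_array_type` is an illustrative helper name, and the sketch assumes all children already share one element type.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical stand-in for an array data type (not pyspark's class)."""
    element_type: str
    contains_null: bool


def concat_array_type(children: Sequence[ArrayType]) -> ArrayType:
    """Result type of concatenating arrays, under the assumptions above."""
    if not children:
        raise ValueError("concat needs at least one child")
    if len({c.element_type for c in children}) != 1:
        raise TypeError("children must share one element type")
    # Using only children[0].contains_null would miss nulls that can
    # appear in any later child, so combine the flags of all children.
    return ArrayType(
        element_type=children[0].element_type,
        contains_null=any(c.contains_null for c in children),
    )
```

With this rule, concatenating a non-nullable array with a nullable one yields a type whose `contains_null` is true, regardless of the order of the children.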
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Merged build finished. Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/636/ Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21704 **[Test build #92562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92562/testReport)** for PR 21704 at commit [`d87a8c6`](https://github.com/apache/spark/commit/d87a8c6c0d1a4db5c9444781160a65562f8ea738).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/13599 Is there any work being done on this PR at this point in time?
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21704 cc @mn-mikke @gatorsmile @cloud-fan
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21703 **[Test build #92559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92559/testReport)** for PR 21703 at commit [`928501a`](https://github.com/apache/spark/commit/928501a63f2ae90b4d95949e6fc505b762d03ac7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21703 Merged build finished. Test PASSed.
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92559/ Test PASSed.
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21704 @ueshin Thanks for bringing up this topic! This problem with different `nullable`/`containsNull` flags seems to be more generic. In [21687](https://github.com/apache/spark/pull/21687), we've touched on a similar problem with the `CaseWhen` and `If` expressions. So I think it would be nice if we could think together about a generic and consistent solution for all expressions. WDYT?
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21596 **[Test build #92558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92558/testReport)** for PR 21596 at commit [`4b78651`](https://github.com/apache/spark/commit/4b786518095c7ed2fd034f74e5b4bd83a3062c29).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92558/ Test PASSed.
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21596 Merged build finished. Test PASSed.
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21633 Jenkins, retest this please.
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21633 Merged build finished. Test PASSed.
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/637/ Test PASSed.
[GitHub] spark issue #21633: [SPARK-24646][CORE] Minor change to spark.yarn.dist.forc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21633 **[Test build #92563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92563/testReport)** for PR 21633 at commit [`4419f52`](https://github.com/apache/spark/commit/4419f52bf0104cc44fc6b27183030876778bbdc4).
[GitHub] spark issue #21704: [SPARK-24734][SQL] Fix containsNull of Concat for array ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/21704 @mn-mikke Thanks! I'll take a look and join the discussion later.
[GitHub] spark issue #21702: [SPARK-23698] Remove raw_input() from Python 2
Github user cclauss commented on the issue: https://github.com/apache/spark/pull/21702
Tested data entry...
In __./dev/merge_spark_pr.py__ just after __clean_up()__, I added the lines:
```
while not continue_maybe('y to continue'):
    print('loop')
sys.exit()
```
Test: 'y' or 'Y' caused a loop, while all others, including "", "yes", "n", "N", "0", "1", ".", caused an exit().
Identical results on both Python 2 and Python 3.
---
In __./dev/create-release/releaseutils.py__ just after __yesOrNoPrompt()__, I added the lines:
```
while not yesOrNoPrompt('y to quit'):
    print("got an 'n'")
sys.exit()
```
Test: 'y' caused an exit(), while all others, including "", "Y", "yes", "n", "N", "0", "1", ".", caused a loop.
Identical results on both Python 2 and Python 3.
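The behaviour observed in these tests can be reproduced with a small sketch that runs on both Python 2 and Python 3. This is not the code from the pull request: `read_line`, `yes_or_no`, and `continue_or_exit` are hypothetical names, and the `reader` parameter is an assumption added so the prompts can be exercised without a terminal.

```python
import sys

# raw_input() exists only on Python 2; on Python 3 the equivalent is input().
if sys.version_info[0] < 3:
    read_line = raw_input  # noqa: F821 (defined on Python 2 only)
else:
    read_line = input


def yes_or_no(prompt, reader=read_line):
    """Return True only for an exact lowercase 'y' (hypothetical helper)."""
    return reader("%s (y/n): " % prompt) == "y"


def continue_or_exit(prompt, reader=read_line):
    """Exit unless the answer is 'y' or 'Y' (hypothetical helper)."""
    if reader("%s (y/n): " % prompt).lower() != "y":
        sys.exit("Okay, exiting")
```

Passing a stub such as `reader=lambda _: "Y"` makes it possible to check the case-sensitivity difference between the two prompts without typing answers by hand.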
[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21596 LGTM
[GitHub] spark issue #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spark on K8...
Github user skonto commented on the issue: https://github.com/apache/spark/pull/21669 @ifilonenko The build fails due to the tags issue, I guess. I fixed it in the other PR ;)
[GitHub] spark issue #20446: [SPARK-23254][ML] Add user guide entry and example for D...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20446 @WeichenXu123 It looks like there was one more outstanding comment, about using `.show()`?
[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r199801815
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -336,7 +336,7 @@ private[spark] class SparkSubmit extends Logging {
 val targetDir = Utils.createTempDir()
 // assure a keytab is available from any place in a JVM
-if (clusterManager == YARN || clusterManager == LOCAL || isMesosClient) {
+if (clusterManager == YARN || clusterManager == LOCAL || isMesosClient || isKubernetesCluster) {
--- End diff --
This check has been restrictive for customers in the past. There are cases where spark-submit should not have the file locally and the keytab should be mounted as a secret within the cluster.
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21433 **[Test build #4203 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4203/testReport)** for PR 21433 at commit [`245181a`](https://github.com/apache/spark/commit/245181a6ebb03b4f394097297ae245705aaf9b0f).
[GitHub] spark pull request #21596: [SPARK-24601] Bump Jackson version
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/21596#discussion_r199803209 --- Diff: pom.xml --- @@ -158,8 +158,8 @@ 2.11.12 2.11 1.9.13 -2.6.7 - 2.6.7.1 +2.9.6 + 2.9.6 --- End diff -- I suspect we can collapse these two versions; they were broken out to handle the fact that a few 2.6.x Jackson releases didn't publish all artifacts.
[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r199803715
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala ---
@@ -211,6 +211,51 @@ private[spark] object Config extends Logging {
 "Ensure that major Python version is either Python2 or Python3")
 .createWithDefault("2")
+
+ val KUBERNETES_KERBEROS_SUPPORT =
+ConfigBuilder("spark.kubernetes.kerberos.enabled")
+ .doc("Specify whether your job is a job that will require a Delegation Token to access HDFS")
--- End diff --
I think Kerberos goes beyond DTs, so it shouldn't be specific to that. Also, I think you don't need the user to pass that. You just need to call: UserGroupInformation.isSecurityEnabled
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21705 cc: @cloud-fan
[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/21705
[SPARK-24727][SQL] Add a static config to control cache size for generated classes
## What changes were proposed in this pull request?
Since SPARK-24250 has been resolved, executors correctly reference user-defined configurations. So, this PR adds a static config to control the cache size for generated classes in `CodeGenerator`.
## How was this patch tested?
Manually checked that executors referenced `spark.sql.cacheSize` correctly.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-24727
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21705.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21705
commit 8ee8e00f156e577b32b01d015b8bd24f72ae7340 Author: Takeshi Yamamuro Date: 2018-07-03T13:14:35Z Fix
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/638/ Test PASSed.
[GitHub] spark issue #21693: [SPARK-24673][SQL] scala sql function from_utc_timestamp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21693 **[Test build #4204 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4204/testReport)** for PR 21693 at commit [`d4ebc8f`](https://github.com/apache/spark/commit/d4ebc8f45aa78eae13cb6166204f0f5de9de4bd8).
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21657 @HyukjinKwon kindly ping
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92564/testReport)** for PR 21705 at commit [`8ee8e00`](https://github.com/apache/spark/commit/8ee8e00f156e577b32b01d015b8bd24f72ae7340).
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21705 LGTM
[GitHub] spark pull request #21669: [SPARK-23257][K8S][WIP] Kerberos Support for Spar...
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/21669#discussion_r199806583
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Constants.scala ---
@@ -81,4 +83,35 @@ private[spark] object Constants {
 val KUBERNETES_MASTER_INTERNAL_URL = "https://kubernetes.default.svc";
 val DRIVER_CONTAINER_NAME = "spark-kubernetes-driver"
 val MEMORY_OVERHEAD_MIN_MIB = 384L
+
+ // Hadoop Configuration
+ val HADOOP_FILE_VOLUME = "hadoop-properties"
+ val HADOOP_CONF_DIR_PATH = "/etc/hadoop/conf"
+ val ENV_HADOOP_CONF_DIR = "HADOOP_CONF_DIR"
+ val HADOOP_CONF_DIR_LOC = "spark.kubernetes.hadoop.conf.dir"
+ val HADOOP_CONFIG_MAP_SPARK_CONF_NAME =
+"spark.kubernetes.hadoop.executor.hadoopConfigMapName"
+
+ // Kerberos Configuration
+ val KERBEROS_DELEGEGATION_TOKEN_SECRET_NAME =
+"spark.kubernetes.kerberos.delegation-token-secret-name"
+ val KERBEROS_KEYTAB_SECRET_NAME =
+"spark.kubernetes.kerberos.key-tab-secret-name"
+ val KERBEROS_KEYTAB_SECRET_KEY =
+"spark.kubernetes.kerberos.key-tab-secret-key"
+ val KERBEROS_SPARK_USER_NAME =
+"spark.kubernetes.kerberos.spark-user-name"
+ val KERBEROS_SECRET_LABEL_PREFIX =
+"hadoop-tokens"
+ val SPARK_HADOOP_PREFIX = "spark.hadoop."
+ val HADOOP_SECURITY_AUTHENTICATION =
+SPARK_HADOOP_PREFIX + "hadoop.security.authentication"
+
+ // Kerberos Token-Refresh Server
+ val KERBEROS_REFRESH_LABEL_KEY = "refresh-hadoop-tokens"
--- End diff --
I also left a comment in the design doc: can we also provide the option of using an existing renewal service, for example when integrating with an external Hadoop cluster where people already have one? This is how it has worked for Mesos so far.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21705 Can we add a test case in `ExecutorSideSQLConfSuite` to prove that static conf also works?
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21705 ok, will do.
[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19410 @mallman Just my two cents regarding built-in solutions: the periodic checkpointer deletes checkpoint files so as not to pollute the hard drive. Although disk storage is cheap, it's not free. For example, in my case (a graph with >1B vertices and about the same number of edges) a checkpoint directory with a single checkpoint took about 150-200GB. The checkpoint interval was set to 5, and the job was then able to complete in about 100 iterations. So without cleaning up unnecessary checkpoints, the checkpoint directory could grow up to 6TB (which is quite a lot) in my case.
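The clean-up behaviour described in this comment can be illustrated with a small sketch. It is not Spark's periodic checkpointer: the class below is a hypothetical, file-based stand-in that writes a checkpoint every `interval` iterations and removes older ones so that at most `keep` remain on disk. The names `base_dir`, `keep`, and `data.txt` are assumptions made for the example.

```python
import os
import shutil
import tempfile
from collections import deque


class PeriodicCheckpointer:
    """Hypothetical sketch: checkpoint every `interval` iterations and
    delete older checkpoints so that at most `keep` remain on disk."""

    def __init__(self, base_dir, interval, keep=1):
        self.base_dir = base_dir
        self.interval = interval
        self.keep = keep
        self._live = deque()  # checkpoint directories, oldest first

    def update(self, iteration, data):
        """Write a checkpoint if this iteration is due; return True if written."""
        if iteration % self.interval != 0:
            return False
        path = os.path.join(self.base_dir, "checkpoint-%d" % iteration)
        os.makedirs(path)
        with open(os.path.join(path, "data.txt"), "w") as f:
            f.write(data)
        self._live.append(path)
        # Remove the oldest checkpoints only after the new one is written.
        while len(self._live) > self.keep:
            shutil.rmtree(self._live.popleft())
        return True


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    checkpointer = PeriodicCheckpointer(base, interval=5)
    for i in range(1, 101):
        checkpointer.update(i, "state at iteration %d" % i)
    print(sorted(os.listdir(base)))
```

With an interval of 5 over 100 iterations, 20 checkpoints are written in total, but only the most recent one is left on disk at the end of the run.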
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/639/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92565/testReport)** for PR 21705 at commit [`0a9eaa2`](https://github.com/apache/spark/commit/0a9eaa26356e6c0adef53b07c47ed19265aa9383).
[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21705#discussion_r199817343
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala ---
@@ -40,16 +40,24 @@ class ExecutorSideSQLConfSuite extends SparkFunSuite with SQLTestUtils {
 spark = null
 }
+ private def withStaticSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
--- End diff --
Ah, sorry, I was wrong. A static conf is no different from a normal conf; it's just immutable at runtime. Maybe just call this method `withSQLConf`?
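The body of the helper is not shown in the diff, so the following is only a sketch of the general set, run, restore pattern that a method with this signature suggests. It is written in Python rather than Scala, uses a plain dict as a stand-in for the session configuration, and `with_conf` is a hypothetical name, not a Spark API.

```python
from contextlib import contextmanager

_MISSING = object()  # marks keys that were absent before the override


@contextmanager
def with_conf(conf, pairs):
    """Temporarily apply `pairs` to `conf` (a dict), restoring it afterwards."""
    previous = {key: conf.get(key, _MISSING) for key in pairs}
    conf.update(pairs)
    try:
        yield conf
    finally:
        # Restore old values even if the body raised an exception.
        for key, old in previous.items():
            if old is _MISSING:
                conf.pop(key, None)
            else:
                conf[key] = old
```

A test body would then run inside `with with_conf(conf, {"some.key": "value"}):`, and the original settings come back as soon as the block exits.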