[GitHub] spark issue #19709: [SPARK-22483][CORE]. Exposing java.nio bufferedPool memo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19709 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19709: [SPARK-22483][CORE]. Exposing java.nio bufferedPool memo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19709 **[Test build #83649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83649/testReport)** for PR 19709 at commit [`2a0b281`](https://github.com/apache/spark/commit/2a0b2816b70f5c1f83a0da3f8dd81913c5e90051). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19709: [SPARK-22483][CORE]. Exposing java.nio bufferedPool memo...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19709 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19709: [SPARK-22483][CORE]. Exposing java.nio bufferedPool memo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19709 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19709: [SPARK-22483][CORE]. Exposing java.nio bufferedPo...
GitHub user vundela opened a pull request: https://github.com/apache/spark/pull/19709 [SPARK-22483][CORE]. Exposing java.nio bufferedPool memory metrics to Metric System

## What changes were proposed in this pull request?
Adds java.nio bufferedPool memory metrics to the metrics system, covering both direct and mapped memory.

## How was this patch tested?
Manually tested and checked that the direct and mapped memory metrics are available in the metrics system using the Console sink. Here is the sample console output:

application_1509655862825_0016.2.jvm.direct.capacity value = 19497
application_1509655862825_0016.2.jvm.direct.count value = 6
application_1509655862825_0016.2.jvm.direct.used value = 19498
application_1509655862825_0016.2.jvm.mapped.capacity value = 0
application_1509655862825_0016.2.jvm.mapped.count value = 0
application_1509655862825_0016.2.jvm.mapped.used value = 0

You can merge this pull request into a Git repository by running: $ git pull https://github.com/vundela/spark SPARK-22483 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19709.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19709

commit 2a0b2816b70f5c1f83a0da3f8dd81913c5e90051 Author: Srinivasa Reddy Vundela Date: 2017-11-09T18:16:26Z [SPARK-22483][CORE]. Exposing java.nio bufferedPool memory metrics to metrics system

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
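For readers wanting the mechanics behind a change like this, here is a minimal sketch (an editor illustration, not the PR's code; the object and method names are made up) of how the JDK's `BufferPoolMXBean`s can be turned into Dropwizard gauges of the kind Spark's metrics system sinks. The JDK exposes one such bean per pool, named "direct" and "mapped", which is where the metric names in the console output above come from.

```scala
import java.lang.management.{BufferPoolMXBean, ManagementFactory}

import scala.collection.JavaConverters._

import com.codahale.metrics.{Gauge, MetricRegistry}

object BufferPoolMetricsSketch {
  // One gauge per pool ("direct", "mapped") and per quantity, mirroring the
  // capacity/count/used names shown in the console output above.
  def registerBufferPoolMetrics(registry: MetricRegistry): Unit = {
    ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala.foreach { pool =>
      registry.register(s"${pool.getName}.capacity", new Gauge[Long] {
        override def getValue: Long = pool.getTotalCapacity
      })
      registry.register(s"${pool.getName}.count", new Gauge[Long] {
        override def getValue: Long = pool.getCount
      })
      registry.register(s"${pool.getName}.used", new Gauge[Long] {
        override def getValue: Long = pool.getMemoryUsed
      })
    }
  }
}
```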
[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150041775 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging { private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000) + private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000) --- End diff -- BTW, the name should be `spark.sql.ui.retainedStages` instead of `spark.ui.retainedStages`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19700: [SPARK-22471][SQL] SQLListener consumes much memo...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19700#discussion_r150041314 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -101,6 +101,8 @@ class SQLListener(conf: SparkConf) extends SparkListener with Logging { private val retainedExecutions = conf.getInt("spark.sql.ui.retainedExecutions", 1000) + private val retainedStages = conf.getInt("spark.ui.retainedStages", 1000) --- End diff -- @tashoyan . Could you add a doc for this like `spark.sql.ui.retainedExecutions` here? Please refer #9052. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19701 thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19699: [MINOR][Core] Fix nits in MetricsSystemSuite
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19699 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19515 @pmackles perhaps you could email this to d...@spark.apache.org to get some visibility to this and hopefully someone else on the mesos side can review? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19699: [MINOR][Core] Fix nits in MetricsSystemSuite
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19699 Merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19515 **[Test build #83648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83648/testReport)** for PR 19515 at commit [`33a8e68`](https://github.com/apache/spark/commit/33a8e6880a468335330a7cb6507493de8b125faa). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19515 @susanxhuynh or anyone from the mesos side would you please review? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19543 @susanxhuynh or anyone from the mesos side would you please review? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19515 Jenkins, test this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...
GitHub user felixcheung reopened a pull request: https://github.com/apache/spark/pull/19657 [SPARK-22344][SPARKR] clean up install dir if running test as source package

## What changes were proposed in this pull request?
remove spark if spark downloaded & installed

## How was this patch tested?
manually by building package Jenkins, AppVeyor

You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rinstalldir Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19657.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19657

commit d4433e13565e9e3d41928e1d2262696204476341 Author: Felix Cheung Date: 2017-11-04T08:14:33Z add flag to cleanup
commit 0ea7c9b1c26c604296c35bc1588a6a5606a10cb2 Author: Felix Cheung Date: 2017-11-05T03:21:26Z no get0
commit d0064ca24339143aeac9f1ef78b924361f908248 Author: Felix Cheung Date: 2017-11-07T10:27:13Z make into function
commit 31f3bd06cc7d2b7bf482eddfe2f2738244cfbca7 Author: Felix Cheung Date: 2017-11-07T10:50:55Z fix lint
commit ca5349bfc0dae03c2402b104e51c78a841541b09 Author: Felix Cheung Date: 2017-11-07T10:55:27Z comment
commit f2aa5b7e12ed36e7b56610e695615260643f952f Author: Felix Cheung Date: 2017-11-07T17:31:16Z fix windows
commit 90d36c9ee3b0aed60ac9343e05b44366d1d2bf43 Author: Felix Cheung Date: 2017-11-07T17:38:12Z more test
commit f21a90bef2a08c9d4cfdcc6588fb2da64679b4ec Author: Felix Cheung Date: 2017-11-07T17:39:05Z fix
commit 18e238a62d53de5a73283a741c1a9bb8230f4484 Author: Felix Cheung Date: 2017-11-08T04:54:53Z fix 2

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19657 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83647/testReport)** for PR 19459 at commit [`0ad736b`](https://github.com/apache/spark/commit/0ad736b352eacd394ea6ea684aa851853769e7d1). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19703: [SPARK-22403][SS] Add optional checkpointLocation argume...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19703 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19703: [SPARK-22403][SS] Add optional checkpointLocation argume...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19703 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83646/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19703: [SPARK-22403][SS] Add optional checkpointLocation argume...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19703 **[Test build #83646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83646/testReport)** for PR 19703 at commit [`171496a`](https://github.com/apache/spark/commit/171496a424ed23ebadafe29ff74de72f3db5a49f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for te...
Github user henryr closed the pull request at: https://github.com/apache/spark/pull/19701 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19704 Thank you, @ueshin ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19703: [SPARK-22403][SS] Add optional checkpointLocation...
Github user wypoon commented on a diff in the pull request: https://github.com/apache/spark/pull/19703#discussion_r150030572 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala --- @@ -46,11 +51,13 @@ object StructuredKafkaWordCount { def main(args: Array[String]): Unit = { if (args.length < 3) { System.err.println("Usage: StructuredKafkaWordCount " + -" ") +" []") System.exit(1) } -val Array(bootstrapServers, subscribeType, topics) = args +val Array(bootstrapServers, subscribeType, topics, _*) = args +val checkpointLocation = + if (args.length > 3) args(3) else "/tmp/temporary-" + UUID.randomUUID.toString --- End diff -- This is what the internal default would be if java.io.tmpdir is "/tmp", but in case of YARN cluster mode, java.io.tmpdir is something else (the underlying problem). Supplying this default here is just to ease the user experience. They would get the same result running in YARN cluster mode or client mode, without supplying an explicit checkpoint location. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83645/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479 **[Test build #83645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83645/testReport)** for PR 19479 at commit [`8af3868`](https://github.com/apache/spark/commit/8af38687d638ae2d94d9f76955b182df02404cce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19703: [SPARK-22403][SS] Add optional checkpointLocation argume...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19703 **[Test build #83646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83646/testReport)** for PR 19703 at commit [`171496a`](https://github.com/apache/spark/commit/171496a424ed23ebadafe29ff74de72f3db5a49f). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19703: [SPARK-22403][SS] Add optional checkpointLocation...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/19703#discussion_r150029549 --- Diff: examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala --- @@ -46,11 +51,13 @@ object StructuredKafkaWordCount { def main(args: Array[String]): Unit = { if (args.length < 3) { System.err.println("Usage: StructuredKafkaWordCount " + -" ") +" []") System.exit(1) } -val Array(bootstrapServers, subscribeType, topics) = args +val Array(bootstrapServers, subscribeType, topics, _*) = args +val checkpointLocation = + if (args.length > 3) args(3) else "/tmp/temporary-" + UUID.randomUUID.toString --- End diff -- why bother supplying a default? will this be any better than spark's internal default? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19703: [SPARK-22403][SS] Add optional checkpointLocation argume...
Github user squito commented on the issue: https://github.com/apache/spark/pull/19703 Jenkins, add to whitelist --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19707: [SPARK-22472][SQL] add null check for top-level p...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19707#discussion_r150027756 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1408,6 +1409,23 @@ class DatasetSuite extends QueryTest with SharedSQLContext { checkDataset(ds, SpecialCharClass("1", "2")) } } + + test("SPARK-22472: add null check for top-level primitive values") { +// If the primitive values are from Option, we need to do runtime null check. +val ds = Seq(Some(1), None).toDS().as[Int] +intercept[NullPointerException](ds.collect()) +val e = intercept[SparkException](ds.map(_ * 2).collect()) +assert(e.getCause.isInstanceOf[NullPointerException]) + +withTempPath { path => + Seq(new Integer(1), null).toDF("i").write.parquet(path.getCanonicalPath) --- End diff -- Is this PR orthogonal to data source format? Could you test more data source like `JSON`, here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
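A hedged sketch of what that extra coverage might look like (an editor illustration written in the style of the test quoted above, assuming the DatasetSuite helpers `withTempPath`, the test implicits and a `spark` session; JSON infers integer columns as LongType, hence `as[Long]`):

```scala
withTempPath { path =>
  Seq(new java.lang.Integer(1), null).toDF("i").write.json(path.getCanonicalPath)
  // JSON reads the column back as LongType; Long is still a top-level primitive,
  // so the new null check should fail fast instead of producing a default 0.
  val ds = spark.read.json(path.getCanonicalPath).select("i").as[Long]
  intercept[NullPointerException](ds.collect())
}
```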
[GitHub] spark issue #19703: [SPARK-22403][SS] Add optional checkpointLocation argume...
Github user wypoon commented on the issue: https://github.com/apache/spark/pull/19703 @srowen This change is indeed just a workaround for an underlying problem, as explained in the JIRA. @zsxwing suggested improving the StructuredKafkaWordCount example as a workaround. He did not have a proposal on how best to address the underlying problem. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19707: [SPARK-22472][SQL] add null check for top-level p...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19707#discussion_r150026840 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala --- @@ -134,7 +134,13 @@ object ScalaReflection extends ScalaReflection { val tpe = localTypeOf[T] val clsName = getClassNameFromType(tpe) val walkedTypePath = s"""- root class: "$clsName :: Nil -deserializerFor(tpe, None, walkedTypePath) +val expr = deserializerFor(tpe, None, walkedTypePath) +val Schema(_, nullable) = schemaFor(tpe) +if (nullable) { + expr +} else { + AssertNotNull(expr, walkedTypePath) +} --- End diff -- Hi, @cloud-fan . It looks great. Can we add a test case in `ScalaReflectionSuite`, too? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19707: [SPARK-22472][SQL] add null check for top-level p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19707#discussion_r150024246 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1408,6 +1409,23 @@ class DatasetSuite extends QueryTest with SharedSQLContext { checkDataset(ds, SpecialCharClass("1", "2")) } } + + test("SPARK-22472: add null check for top-level primitive values") { +// If the primitive values are from Option, we need to do runtime null check. +val ds = Seq(Some(1), None).toDS().as[Int] +intercept[NullPointerException](ds.collect()) +val e = intercept[SparkException](ds.map(_ * 2).collect()) +assert(e.getCause.isInstanceOf[NullPointerException]) + +withTempPath { path => + Seq(new Integer(1), null).toDF("i").write.parquet(path.getCanonicalPath) --- End diff -- not a big deal, but `toDF("i")` is more explicit about column name. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19651 Hi, @cloud-fan and @gatorsmile . Could you review this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19697: [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redundant an...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19697 Thank you, @HyukjinKwon , @srowen , and @jiangxb1987 . --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19250 ok @squito can we send a new PR to do it? basically in parquet read task, get the writer info from the footer. If the writer is impala, and a config is set, we treat the seconds as seconds from epoch of session local time zone, and adjust the seconds to seconds from Unix epoch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
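To make the described adjustment concrete, here is a rough sketch (editor illustration only; the helper name, the `convertImpala` flag and the use of microseconds are assumptions, not Spark's or Impala's actual code):

```scala
import java.util.TimeZone

// `wallClockMicros` is interpreted as microseconds since the epoch measured in the
// session time zone's wall clock; shift it to microseconds since the Unix epoch (UTC).
def toUnixEpochMicros(wallClockMicros: Long, createdBy: String,
                      convertImpala: Boolean, sessionTz: TimeZone): Long = {
  if (convertImpala && createdBy != null && createdBy.toLowerCase.contains("impala")) {
    // zone offset (including DST) at roughly that instant, converted to microseconds
    val offsetMicros = sessionTz.getOffset(wallClockMicros / 1000L).toLong * 1000L
    wallClockMicros - offsetMicros
  } else {
    wallClockMicros
  }
}
```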
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19701 Please close this PR, @henryr . `branch-2.2` PR is not closed automatically. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19701 Thank you, @gatorsmile and @henryr ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19707 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19707 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83644/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19707 **[Test build #83644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83644/testReport)** for PR 19707 at commit [`dad5080`](https://github.com/apache/spark/commit/dad50806b27a40ed1112d8ee29b3bd5c60164170). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 ping @cloud-fan for review --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19707 LGTM except one minor comment --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19707: [SPARK-22472][SQL] add null check for top-level p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19707#discussion_r150015018 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -1408,6 +1409,23 @@ class DatasetSuite extends QueryTest with SharedSQLContext { checkDataset(ds, SpecialCharClass("1", "2")) } } + + test("SPARK-22472: add null check for top-level primitive values") { +// If the primitive values are from Option, we need to do runtime null check. +val ds = Seq(Some(1), None).toDS().as[Int] +intercept[NullPointerException](ds.collect()) +val e = intercept[SparkException](ds.map(_ * 2).collect()) +assert(e.getCause.isInstanceOf[NullPointerException]) + +withTempPath { path => + Seq(new Integer(1), null).toDF("i").write.parquet(path.getCanonicalPath) --- End diff -- nit: `toDF()` also works. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r150011624 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -1034,11 +1034,18 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat schema.fields.map(f => (f.name, f.dataType)).toMap stats.colStats.foreach { case (colName, colStat) => colStat.toMap(colName, colNameTypeMap(colName)).foreach { case (k, v) => -statsProperties += (columnStatKeyPropName(colName, k) -> v) +val statKey = columnStatKeyPropName(colName, k) +val threshold = conf.get(SCHEMA_STRING_LENGTH_THRESHOLD) +if (v.length > threshold) { + throw new AnalysisException(s"Cannot persist '$statKey' into hive metastore as " + --- End diff -- what if we don't do it? will hive give us an exception? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19702 Is it available in parquet 1.8.2? That's the version Spark currently uses. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user squito commented on the issue: https://github.com/apache/spark/pull/19702 hey thanks for doing this @cloud-fan but I have a small request -- can we get another day to review how this works, especially in connection with somewhat recent changes in parquet to include a [`isAdjustedToUTC`](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L271)? Just want to make sure this doesn't cause problems with resolving with / without time zone in parquet data later on. (don't think it should, just want to take a bit closer look) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19705 ok, now I question my own testing... does maven not run scalastyle tests? Or did I not run the tests properly somehow? I just ran mvn test from root, and it all seemed to work on my machine --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19666 @facaiy Your idea also looks reasonable. So we can use the condition "exclude the first bin" to do the pruning (filter out the other half of the symmetric splits). This condition looks simpler than `1 <= combNumber <= numSplits`. Good idea! And your code uses another traverse order; my current PR is also backtracking with a different traverse order, but I think both of them work, and both of their complexities will be `O(2^n)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
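A minimal sketch of the pruning idea above (editor illustration, names made up, not the PR's code): enumerate only left splits that exclude the first bin, which drops the mirror-image duplicates and leaves 2^(n-1) - 1 candidate splits for n bins.

```scala
// All (left, right) partitions of `bins` whose left side excludes the first bin.
def candidateSplits[T](bins: Seq[T]): Seq[(Seq[T], Seq[T])] = {
  val rest = bins.drop(1)
  (1 until (1 << rest.length)).map { mask =>
    val left = rest.zipWithIndex.collect { case (b, i) if (mask & (1 << i)) != 0 => b }
    (left, bins.diff(left))
  }
}

// candidateSplits(Seq("a", "b", "c")) yields the three distinct splits:
//   (List(b), List(a, c)), (List(c), List(a, b)), (List(b, c), List(a))
```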
[GitHub] spark issue #19708: [SPARK-22479][SQL] Exclude credentials from SaveintoData...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19708 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83643/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19702 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19702 **[Test build #83643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83643/testReport)** for PR 19702 at commit [`e10c806`](https://github.com/apache/spark/commit/e10c8062e3df5b5caa784b0c10ccd92cf56099d2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19708: [SPARK-22479][SQL] Exclude credentials from Savei...
GitHub user onursatici opened a pull request: https://github.com/apache/spark/pull/19708 [SPARK-22479][SQL] Exclude credentials from SaveintoDataSourceCommand.simpleString

## What changes were proposed in this pull request?
Do not include JDBC properties, which may contain credentials, when logging a logical plan that contains a `SaveIntoDataSourceCommand`.

## How was this patch tested?
Building locally and trying to reproduce (per the steps in https://issues.apache.org/jira/browse/SPARK-22479):

```
== Parsed Logical Plan ==
SaveIntoDataSourceCommand org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider@10ffe32f, ErrorIfExists
+- Range (0, 100, step=1, splits=Some(8))

== Analyzed Logical Plan ==
SaveIntoDataSourceCommand org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider@10ffe32f, ErrorIfExists
+- Range (0, 100, step=1, splits=Some(8))

== Optimized Logical Plan ==
SaveIntoDataSourceCommand org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider@10ffe32f, ErrorIfExists
+- Range (0, 100, step=1, splits=Some(8))

== Physical Plan ==
Execute SaveIntoDataSourceCommand
   +- SaveIntoDataSourceCommand org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider@10ffe32f, ErrorIfExists
         +- Range (0, 100, step=1, splits=Some(8))
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/onursatici/spark os/redact-jdbc-creds Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19708.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19708

commit 04aa9f0363f6202a5358e41587415da4fa5f425e Author: osatici Date: 2017-11-09T14:06:05Z do not log properties on SaveintoDataSourceCommand.simpleString

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
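A hedged sketch of the general redaction idea (editor illustration only; the PR itself simply stops printing the options map in `simpleString`, and the key names below are made up):

```scala
// Mask values of option keys that commonly carry credentials before a
// connection-properties map is rendered into a plan string or a log line.
def redactOptions(options: Map[String, String]): Map[String, String] = {
  val sensitive = Set("password", "user", "url")
  options.map { case (key, value) =>
    if (sensitive.contains(key.toLowerCase)) key -> "*********(redacted)" else key -> value
  }
}

// redactOptions(Map("url" -> "jdbc:postgresql://db/x", "password" -> "secret"))
// returns Map(url -> *********(redacted), password -> *********(redacted))
```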
[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17819 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17819 Merged to master. Thanks @viirya and all the reviewers! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479 **[Test build #83645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83645/testReport)** for PR 19479 at commit [`8af3868`](https://github.com/apache/spark/commit/8af38687d638ae2d94d9f76955b182df02404cce). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19661 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19661 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83642/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19661 **[Test build #83642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83642/testReport)** for PR 19661 at commit [`2eb1b62`](https://github.com/apache/spark/commit/2eb1b62c6fb281f89f05aa8a3c0fcd923ed62cf4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19707 **[Test build #83644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83644/testReport)** for PR 19707 at commit [`dad5080`](https://github.com/apache/spark/commit/dad50806b27a40ed1112d8ee29b3bd5c60164170). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19707: [SPARK-22472][SQL] add null check for top-level primitiv...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19707 cc @gatorsmile @kiszk @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19707: [SPARK-22472][SQL] add null check for top-level p...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/19707 [SPARK-22472][SQL] add null check for top-level primitive values

## What changes were proposed in this pull request?
One powerful feature of `Dataset` is that we can easily map SQL rows to Scala/Java objects and do runtime null checks automatically. For example, let's say we have a parquet file with schema ``, and we have a `case class Data(a: Int, b: String)`. Users can easily read this parquet file into `Data` objects, and Spark will throw NPE if column `a` has null values.

However the null checking is left behind for top-level primitive values. For example, let's say we have a parquet file with schema ``, and we read it into a Scala `Int`. If column `a` has null values, we get some weird results:

```
scala> val ds = spark.read.parquet(...).as[Int]

scala> ds.show()
+----+
|   v|
+----+
|null|
|   1|
+----+

scala> ds.collect
res0: Array[Long] = Array(0, 1)

scala> ds.map(_ * 2).show
+-----+
|value|
+-----+
|   -2|
|    2|
+-----+
```

This is because internally Spark uses some special default values for primitive types, but never expects users to see or operate on these default values directly. This PR adds a null check for top-level primitive values.

## How was this patch tested?
new test

You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark bug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19707.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19707

commit dad50806b27a40ed1112d8ee29b3bd5c60164170 Author: Wenchen Fan Date: 2017-11-09T13:39:10Z add null check for top-level primitive values

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
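As a usage-side illustration of the weird results described above (editor example, assuming a SparkSession named `spark` with its implicits imported), keeping the value as `Option[Int]` avoids relying on the hidden default values, while `as[Int]` is exactly the case this PR now makes fail fast:

```scala
import spark.implicits._

val ds = Seq(Some(1), None).toDS()      // Dataset[Option[Int]]
ds.map(_.getOrElse(0) * 2).collect()    // Array(2, 0): the null row surfaces as None
// ds.as[Int].collect() is the pattern that now throws a NullPointerException
// instead of silently returning the internal default value.
```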
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/19666 In fact, I'm not sure whether the idea is right, so don't hesitate to correct me. I assume the algorithm requires O(N^2) complexity. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19702 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83641/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19702 **[Test build #83641 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83641/testReport)** for PR 19702 at commit [`af62d30`](https://github.com/apache/spark/commit/af62d301ee9d2f3f9ed0a5797110b6388b78f3e6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19543: [SPARK-19606][MESOS] Support constraints in spark-dispat...
Github user pmackles commented on the issue: https://github.com/apache/spark/pull/19543 @felixcheung - any chance of getting this merged into the upcoming 2.2.1 release? I cleaned up the merge conflict --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19666: [SPARK-22451][ML] Reduce decision tree aggregate size fo...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/19666 Hi, I wrote a demo in Python. I'll be happy if it could be useful. For N bins, say `[x_1, x_2, ..., x_N]`, since every split either contains `x_1` or not, we can choose the half of the splits which don't contain x_1 as the left splits. If I understand it correctly, the left splits are exactly all combinations of the remaining bins, `[x_2, x_3, ..., x_N]`. The problem can be solved by the [backtracking algorithm](https://en.wikipedia.org/wiki/Backtracking). Please correct me if I'm wrong. Thanks very much.

```python
#!/usr/bin/env python

def gen_splits(bins):
    if len(bins) == 1:
        return bins
    results = []
    partial_res = []
    gen_splits_iter(1, bins, partial_res, results)
    return results

def gen_splits_iter(dep, bins, partial_res, results):
    if partial_res:
        left_splits = partial_res[:]
        right_splits = [x for x in bins if x not in left_splits]
        results.append("left: {:20}, right: {}".format(str(left_splits), right_splits))
    for m in range(dep, len(bins)):
        partial_res.append(bins[m])
        gen_splits_iter(m+1, bins, partial_res, results)
        partial_res.pop()

if __name__ == "__main__":
    print("first example:")
    bins = ["a", "b", "c"]
    print("bins: {}\n-".format(bins))
    splits = gen_splits(bins)
    for s in splits:
        print(s)

    print("\n\n=")
    print("second example:")
    bins = ["a", "b", "c", "d", "e"]
    print("bins: {}\n-".format(bins))
    splits = gen_splits(bins)
    for s in splits:
        print(s)
```

logs:

```bash
~/Downloads ❯❯❯ python test.py
first example:
bins: ['a', 'b', 'c']
-
left: ['b']               , right: ['a', 'c']
left: ['b', 'c']          , right: ['a']
left: ['c']               , right: ['a', 'b']


=
second example:
bins: ['a', 'b', 'c', 'd', 'e']
-
left: ['b']               , right: ['a', 'c', 'd', 'e']
left: ['b', 'c']          , right: ['a', 'd', 'e']
left: ['b', 'c', 'd']     , right: ['a', 'e']
left: ['b', 'c', 'd', 'e'], right: ['a']
left: ['b', 'c', 'e']     , right: ['a', 'd']
left: ['b', 'd']          , right: ['a', 'c', 'e']
left: ['b', 'd', 'e']     , right: ['a', 'c']
left: ['b', 'e']          , right: ['a', 'c', 'd']
left: ['c']               , right: ['a', 'b', 'd', 'e']
left: ['c', 'd']          , right: ['a', 'b', 'e']
left: ['c', 'd', 'e']     , right: ['a', 'b']
left: ['c', 'e']          , right: ['a', 'b', 'd']
left: ['d']               , right: ['a', 'b', 'c', 'e']
left: ['d', 'e']          , right: ['a', 'b', 'c']
left: ['e']               , right: ['a', 'b', 'c', 'd']
```

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149956415 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -527,27 +570,28 @@ private[ml] object SummaryBuilderImpl extends Logging { weightExpr: Expression, mutableAggBufferOffset: Int, inputAggBufferOffset: Int) -extends TypedImperativeAggregate[SummarizerBuffer] { +extends TypedImperativeAggregate[SummarizerBuffer] with ImplicitCastInputTypes { -override def eval(state: SummarizerBuffer): InternalRow = { +override def eval(state: SummarizerBuffer): Any = { --- End diff -- Both of them work, but other similar aggregate functions also use `Any`. Will it cause any issues? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19515: [SPARK-22287][MESOS] SPARK_DAEMON_MEMORY not honored by ...
Github user pmackles commented on the issue: https://github.com/apache/spark/pull/19515 @felixcheung - any chance of getting this tiny change merged and included in the upcoming 2.2.1 release? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps
Github user zivanfi commented on the issue: https://github.com/apache/spark/pull/19250 Yes, that is correct. We introduced the table property to address the 2nd problem I mentioned above: "The adjustment depends on the local timezone." (details in my [previous comment](https://github.com/apache/spark/pull/19250#issuecomment-342787956)). But I think that a simpler workaround similar to what already exists in Hive would already be a big step forward for interoperability of existing data. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user ManchesterUnited16 commented on the issue: https://github.com/apache/spark/pull/19687 Can you show me your maven dependency from when you ran the program? Thank you very much! At 2017-11-09 13:37:46, "Shixiong Zhu" wrote: @ManchesterUnited16 I ran your codes and didn't see NotSerializableException. How did you patch Spark with my PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19702 **[Test build #83643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83643/testReport)** for PR 19702 at commit [`e10c806`](https://github.com/apache/spark/commit/e10c8062e3df5b5caa784b0c10ccd92cf56099d2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149943555 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -94,46 +98,87 @@ object Summarizer extends Logging { * - min: the minimum for each coefficient. * - normL2: the Euclidian norm for each coefficient. * - normL1: the L1 norm of each coefficient (sum of the absolute values). - * @param firstMetric the metric being provided - * @param metrics additional metrics that can be provided. + * @param metrics metrics that can be provided. * @return a builder. * @throws IllegalArgumentException if one of the metric names is not understood. * * Note: Currently, the performance of this interface is about 2x~3x slower then using the RDD * interface. */ @Since("2.3.0") - def metrics(firstMetric: String, metrics: String*): SummaryBuilder = { -val (typedMetrics, computeMetrics) = getRelevantMetrics(Seq(firstMetric) ++ metrics) + @scala.annotation.varargs + def metrics(metrics: String*): SummaryBuilder = { --- End diff -- ah then it doesn't matter --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83638/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19695 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19695 **[Test build #83638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83638/testReport)** for PR 19695 at commit [`a6642fa`](https://github.com/apache/spark/commit/a6642fa41795cff82ec30c38e3c909d8025f358f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149941345 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -94,46 +98,87 @@ object Summarizer extends Logging { * - min: the minimum for each coefficient. * - normL2: the Euclidian norm for each coefficient. * - normL1: the L1 norm of each coefficient (sum of the absolute values). - * @param firstMetric the metric being provided - * @param metrics additional metrics that can be provided. + * @param metrics metrics that can be provided. * @return a builder. * @throws IllegalArgumentException if one of the metric names is not understood. * * Note: Currently, the performance of this interface is about 2x~3x slower then using the RDD * interface. */ @Since("2.3.0") - def metrics(firstMetric: String, metrics: String*): SummaryBuilder = { -val (typedMetrics, computeMetrics) = getRelevantMetrics(Seq(firstMetric) ++ metrics) + @scala.annotation.varargs + def metrics(metrics: String*): SummaryBuilder = { --- End diff -- This class was added after 2.2, does it matter? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19702: [SPARK-10365][SQL] Support Parquet logical type T...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19702#discussion_r149940418 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala --- @@ -982,7 +941,7 @@ class ParquetSchemaSuite extends ParquetSchemaTest { binaryAsString = true, int96AsTimestamp = false, writeLegacyParquetFormat = true, -int64AsTimestampMillis = true) +outputTimestampType = SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS) --- End diff -- Should we add a test for `TIMESTAMP_MICROS` just in case? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19702 LGTM pending tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19649: [SPARK-22405][SQL] Add new alter table and alter ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19649 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19649 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19664: [SPARK-22442][SQL] ScalaReflection should produce...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19664 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19664 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19661 **[Test build #83642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83642/testReport)** for PR 19661 at commit [`2eb1b62`](https://github.com/apache/spark/commit/2eb1b62c6fb281f89f05aa8a3c0fcd923ed62cf4). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interfa...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19156#discussion_r149928022 --- Diff: mllib/src/main/scala/org/apache/spark/ml/stat/Summarizer.scala --- @@ -94,46 +98,87 @@ object Summarizer extends Logging { * - min: the minimum for each coefficient. * - normL2: the Euclidian norm for each coefficient. * - normL1: the L1 norm of each coefficient (sum of the absolute values). - * @param firstMetric the metric being provided - * @param metrics additional metrics that can be provided. + * @param metrics metrics that can be provided. * @return a builder. * @throws IllegalArgumentException if one of the metric names is not understood. * * Note: Currently, the performance of this interface is about 2x~3x slower then using the RDD * interface. */ @Since("2.3.0") - def metrics(firstMetric: String, metrics: String*): SummaryBuilder = { -val (typedMetrics, computeMetrics) = getRelevantMetrics(Seq(firstMetric) ++ metrics) + @scala.annotation.varargs + def metrics(metrics: String*): SummaryBuilder = { --- End diff -- How about binary compatibility? e.g. spark jobs built with old spark versions, can they run on new Spark without re-compile? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
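For readers unfamiliar with the trade-off being discussed, a small hedged sketch (illustrative names, not the real `Summarizer`): collapsing the two parameters into a single varargs parameter changes the method's erased signature, so jars compiled against the old `metrics(first, rest*)` shape would need a recompile, and `@varargs` is what keeps the method callable from Java:

```scala
import scala.annotation.varargs

object MetricsApiSketch {
  // old shape (for comparison): def metrics(firstMetric: String, metrics: String*): Seq[String]
  @varargs
  def metrics(metrics: String*): Seq[String] = {
    require(metrics.nonEmpty, "at least one metric is required")
    metrics
  }
}

// Scala callers: MetricsApiSketch.metrics("mean", "max")
// Java callers get an extra varargs (String...) overload thanks to @varargs.
```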
[GitHub] spark pull request #19661: [SPARK-22450][Core][Mllib]safely register class f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19661#discussion_r149927241 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -178,6 +179,28 @@ class KryoSerializer(conf: SparkConf) kryo.register(Utils.classForName("scala.collection.immutable.Map$EmptyMap$")) kryo.register(classOf[ArrayBuffer[Any]]) +// We can't load those class directly in order to avoid unnecessary jar dependencies. +// We load them safely, ignore it if the class not found. +Seq("org.apache.spark.mllib.linalg.Vector", + "org.apache.spark.mllib.linalg.DenseVector", + "org.apache.spark.mllib.linalg.SparseVector", + "org.apache.spark.mllib.linalg.Matrix", + "org.apache.spark.mllib.linalg.DenseMatrix", + "org.apache.spark.mllib.linalg.SparseMatrix", + "org.apache.spark.ml.linalg.Vector", + "org.apache.spark.ml.linalg.DenseVector", + "org.apache.spark.ml.linalg.SparseVector", + "org.apache.spark.ml.linalg.Matrix", + "org.apache.spark.ml.linalg.DenseMatrix", + "org.apache.spark.ml.linalg.SparseMatrix", + "org.apache.spark.ml.feature.Instance", + "org.apache.spark.ml.feature.OffsetInstance" +).map(name => Try(Utils.classForName(name))).foreach { t => --- End diff -- a bit curious, can't we do

```
Seq(
  ...
).foreach { clsName =>
  try {
    val cls = Utils.classForName(clsName)
    kryo.register(cls)
  } catch {
    case NonFatal(_) => // do nothing
  }
}
```

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19532: [DOC]update the API doc and modify the stage API ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19532 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19661: [SPARK-22450][Core][Mllib]safely register class for mlli...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19661 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15332: [SPARK-10364][SQL] Support Parquet logical type TIMESTAM...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15332 great, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19532: [DOC]update the API doc and modify the stage API descrip...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19532 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19706: [SPARK-22476][R] Add dayofweek function to R
GitHub user HyukjinKwon reopened a pull request: https://github.com/apache/spark/pull/19706 [SPARK-22476][R] Add dayofweek function to R

## What changes were proposed in this pull request?
This PR adds `dayofweek` to R API:

```r
data <- list(list(d = as.Date("2012-12-13")),
             list(d = as.Date("2013-12-14")),
             list(d = as.Date("2014-12-15")))
df <- createDataFrame(data)
collect(select(df, dayofweek(df$d)))
```

```
  dayofweek(d)
1            5
2            7
3            2
```

## How was this patch tested?
Manual tests and unit tests in `R/pkg/tests/fulltests/test_sparkSQL.R`

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark add-dayofweek Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19706.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19706

commit d24a89b6a756457c651d0c208ccbe59b979e9ecc Author: hyukjinkwon Date: 2017-11-08T11:31:35Z Add support for dayofweek function in R

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
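For cross-reference, the same computation through the Scala API (hedged: an editor example, assuming a SparkSession named `spark` and a Spark build that already contains the Scala/SQL `dayofweek` added for the parent JIRA, i.e. 2.3.0+):

```scala
import java.sql.Date

import org.apache.spark.sql.functions.dayofweek
import spark.implicits._

val df = Seq(
  Date.valueOf("2012-12-13"),
  Date.valueOf("2013-12-14"),
  Date.valueOf("2014-12-15")
).toDF("d")

df.select(dayofweek($"d")).show()
// 1 = Sunday ... 7 = Saturday, so the three dates print 5, 7 and 2, matching the R output above.
```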
[GitHub] spark pull request #19706: [SPARK-22476][R] Add dayofweek function to R
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/19706 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19702 **[Test build #83641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83641/testReport)** for PR 19702 at commit [`af62d30`](https://github.com/apache/spark/commit/af62d301ee9d2f3f9ed0a5797110b6388b78f3e6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19702: [SPARK-10365][SQL] Support Parquet logical type T...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19702#discussion_r149924096 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -428,15 +417,9 @@ object ParquetFileFormat extends Logging { private[parquet] def readSchema( footers: Seq[Footer], sparkSession: SparkSession): Option[StructType] = { -def parseParquetSchema(schema: MessageType): StructType = { - val converter = new ParquetSchemaConverter( -sparkSession.sessionState.conf.isParquetBinaryAsString, -sparkSession.sessionState.conf.isParquetBinaryAsString, -sparkSession.sessionState.conf.writeLegacyParquetFormat, -sparkSession.sessionState.conf.isParquetINT64AsTimestampMillis) - - converter.convert(schema) -} +val converter = new ParquetToSparkSchemaConverter( + sparkSession.sessionState.conf.isParquetBinaryAsString, + sparkSession.sessionState.conf.isParquetBinaryAsString) --- End diff -- good catch! It's an existing type ... --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19156 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83640/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19156 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19156: [SPARK-19634][SQL][ML][FOLLOW-UP] Improve interface of d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19156 **[Test build #83640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83640/testReport)** for PR 19156 at commit [`2e4b232`](https://github.com/apache/spark/commit/2e4b232adabe45e9dcafad72ca9c1d3ba5b34dce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org