[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93311/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r203945646

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --

Or we might not need the batch size in the plan.
[GitHub] spark pull request #21805: [SPARK-24850][SQL] fix str representation of Cach...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21805#discussion_r203945605

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetCacheSuite.scala ---
@@ -206,4 +206,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
     // first time use, load cache
     checkDataset(df5, Row(10))
   }
+
+  test("SPARK-24850 InMemoryRelation string representation does not include cached plan") {
+    val dummyQueryExecution = spark.range(0, 1).toDF().queryExecution
+    val inMemoryRelation = InMemoryRelation(
+      true,
+      1000,
+      StorageLevel.MEMORY_ONLY,
+      dummyQueryExecution.sparkPlan,
+      Some("test-relation"),
+      dummyQueryExecution.logical)
+
+    assert(!inMemoryRelation.simpleString.contains(dummyQueryExecution.sparkPlan.toString))
+    assert(inMemoryRelation.simpleString.contains(
+      "CachedRDDBuilder(true, 1000, StorageLevel(memory, deserialized, 1 replicas))"))
--- End diff --

`true` and `1000` look confusing to end users. Can we improve it?
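One way to read the request above is to label each value in the string form, so a bare `true` and `1000` no longer need explaining. The snippet below is only a sketch with no Spark dependency: `CachedRDDBuilderSketch` is a hypothetical stand-in, not the class in the PR, and the field name `useCompression` is an assumption about what `true` means (`batchSize` follows the earlier comment about the batch size).

```scala
// Hypothetical stand-in for the builder under review; NOT Spark's actual class.
// Field names are assumptions made for illustration.
final case class CachedRDDBuilderSketch(
    useCompression: Boolean, // assumed meaning of the bare `true`
    batchSize: Int,          // the bare `1000`
    storageLevel: String) {  // kept as a plain string to stay self-contained

  // Label every value so the plan string explains itself to end users.
  override def toString: String =
    s"CachedRDDBuilder(useCompression=$useCompression, " +
      s"batchSize=$batchSize, storageLevel=$storageLevel)"
}

object CachedRDDBuilderSketchDemo {
  def main(args: Array[String]): Unit = {
    val builder = CachedRDDBuilderSketch(
      useCompression = true,
      batchSize = 1000,
      storageLevel = "StorageLevel(memory, deserialized, 1 replicas)")
    println(builder)
  }
}
```

Dropping `batchSize` from `toString` altogether, as suggested in the other comment, would be an equally small change to the same method.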
[GitHub] spark issue #21775: [SPARK-24812][SQL] Last Access Time in the table descrip...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21775

**[Test build #93311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93311/testReport)** for PR 21775 at commit [`b527fdc`](https://github.com/apache/spark/commit/b527fdc5919296ffa12e1be54367b9132ecee61e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21774#discussion_r203945436

--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -36,4 +40,27 @@ package object avro {
     @scala.annotation.varargs
     def avro(sources: String*): DataFrame = reader.format("avro").load(sources: _*)
   }
+
+  /**
+   * Converts a binary column of avro format into its corresponding catalyst value. The specified
+   * schema must match the read data, otherwise the behavior is undefined: it may fail or return
+   * arbitrary result.
+   *
+   * @param data the binary column.
+   * @param avroType the avro type.
+   */
+  @Experimental
+  def from_avro(data: Column, avroType: Schema): Column = {
--- End diff --

Ah, sorry, I thought you were talking about the `data` parameter. Yes, for the `avroType` parameter we should have a string version.
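A rough sketch of what a "string version" could look like: accept the schema as a JSON string, parse it with Avro's `Schema.Parser`, and delegate to the `Schema`-based function. The names `FromAvroStringSketch`, `parseAvroType`, and `withStringSchema` are hypothetical and only illustrate the shape; the snippet assumes the Avro library is on the classpath and deliberately avoids Spark's `Column` so it stays self-contained.

```scala
import org.apache.avro.Schema

// Hypothetical helper object; none of these names come from the PR.
object FromAvroStringSketch {

  // Parse a JSON-format Avro schema string into an Avro Schema.
  def parseAvroType(jsonFormatSchema: String): Schema =
    new Schema.Parser().parse(jsonFormatSchema)

  // Wrap any Schema-based function so that it accepts a schema string instead.
  def withStringSchema[A, B](fromAvro: (A, Schema) => B): (A, String) => B =
    (data: A, jsonFormatSchema: String) =>
      fromAvro(data, parseAvroType(jsonFormatSchema))

  def main(args: Array[String]): Unit = {
    val json =
      """{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}"""
    // Stand-in for a Schema-based decoder; it only reports what it was given.
    val describe = withStringSchema[Array[Byte], String] { (bytes, schema) =>
      s"${bytes.length} bytes as ${schema.getName}"
    }
    println(describe(Array[Byte](1, 2), json))
  }
}
```

A string parameter is easier to pass from non-JVM language bindings, which is presumably part of the motivation for asking for it.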
[GitHub] spark issue #21805: [SPARK-24850][SQL] fix str representation of CachedRDDBu...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21805 ok to test