[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72622/testReport)** for PR 16787 at commit [`df3597e`](https://github.com/apache/spark/commit/df3597ec28e71dc82c56f87464e6c12f3862ca95). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16854: [WIP][SPARK-15463][SQL] Add an API to load DataFrame fro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16854 **[Test build #72623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72623/testReport)** for PR 16854 at commit [`a7e8c2b`](https://github.com/apache/spark/commit/a7e8c2bfaf98c27885907caa21cce7e93d4afd1b).
[GitHub] spark issue #16858: [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16858 Thanks, but the R test removal is unnecessary, and the test should ideally be added back at some point. It was just parsing a sample URL; we should have one to make sure we don't break on the release share URL, where the subdirectory structure is different from nightly snapshot builds. It's not urgent.
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16837 Merged build finished. Test PASSed.
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16837 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72617/ Test PASSed.
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16837 **[Test build #72617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72617/testReport)** for PR 16837 at commit [`2c4d3c7`](https://github.com/apache/spark/commit/2c4d3c71e971bd039936c43143718ba7091d6113). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100225911

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
@@ -357,30 +361,70 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val jsonData = """{"a" 1}"""
     val schema = StructType(StructField("a", IntegerType) :: Nil)
     checkEvaluation(
-      JsonToStruct(schema, Map.empty, Literal(jsonData)),
+      JsonToStruct(schema, Map.empty, Literal(jsonData), gmtId),
       null
     )

     // Other modes should still return `null`.
     checkEvaluation(
-      JsonToStruct(schema, Map("mode" -> ParseModes.PERMISSIVE_MODE), Literal(jsonData)),
+      JsonToStruct(schema, Map("mode" -> ParseModes.PERMISSIVE_MODE), Literal(jsonData), gmtId),
       null
     )
   }

   test("from_json null input column") {
     val schema = StructType(StructField("a", IntegerType) :: Nil)
     checkEvaluation(
-      JsonToStruct(schema, Map.empty, Literal.create(null, StringType)),
+      JsonToStruct(schema, Map.empty, Literal.create(null, StringType), gmtId),
       null
     )
   }

+  test("from_json with timestamp") {
+    val schema = StructType(StructField("t", TimestampType) :: Nil)
+
+    val jsonData1 = """{"t": "2016-01-01T00:00:00.123Z"}"""
+    var c = Calendar.getInstance(DateTimeUtils.TimeZoneGMT)
+    c.set(2016, 0, 1, 0, 0, 0)
+    c.set(Calendar.MILLISECOND, 123)
+    checkEvaluation(
+      JsonToStruct(schema, Map.empty, Literal(jsonData1), gmtId),
+      InternalRow.fromSeq(c.getTimeInMillis * 1000L :: Nil)
+    )
+    checkEvaluation(
+      JsonToStruct(schema, Map.empty, Literal(jsonData1), Option("PST")),
+      InternalRow.fromSeq(c.getTimeInMillis * 1000L :: Nil)
--- End diff --

FYI, it's because the input json string includes timezone string `"Z"`, which means GMT.
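The point of the comment above, that a trailing `Z` pins the timestamp to GMT no matter which time zone is passed to the expression, can be illustrated with a minimal sketch in plain `java.time`. This is not Spark's parser: `ZuluTimestampSketch` and `toEpochMicros` are hypothetical names, and the zone ID `America/Los_Angeles` is an assumption standing in for the `PST` used in the test.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object ZuluTimestampSketch {
  // Hypothetical helper (not Spark code): the session time zone is consulted
  // only when the text itself carries no zone marker.
  def toEpochMicros(text: String, sessionZone: ZoneId): Long = {
    val instant =
      if (text.endsWith("Z")) Instant.parse(text) // explicit "Z" means UTC/GMT
      else LocalDateTime.parse(text).atZone(sessionZone).toInstant
    // Convert seconds + nanoseconds to microseconds since the epoch.
    instant.getEpochSecond * 1000000L + instant.getNano / 1000L
  }
}

val utc = ZoneId.of("UTC")
val losAngeles = ZoneId.of("America/Los_Angeles")

// With "Z", both zones yield the same value.
val withZuluUtc = ZuluTimestampSketch.toEpochMicros("2016-01-01T00:00:00.123Z", utc)
val withZuluLa = ZuluTimestampSketch.toEpochMicros("2016-01-01T00:00:00.123Z", losAngeles)

// Without "Z", the session zone shifts the result by the zone offset.
val noZoneLa = ZuluTimestampSketch.toEpochMicros("2016-01-01T00:00:00.123", losAngeles)
```

In this sketch `withZuluUtc` and `withZuluLa` are equal, while `noZoneLa` is eight hours later, which is why both `checkEvaluation` calls in the diff expect the same row.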
[GitHub] spark issue #16415: [SPARK-19063][ML]Speedup and optimize the GradientBooste...
Github user zdh2292390 commented on the issue: https://github.com/apache/spark/pull/16415 @jkbradley @srowen Hi, sorry to bother you; it's been a long time since the last discussion. Could you please tell me if there is a problem with my latest commit? Thank you very much! I look forward to your reply.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100225669

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
@@ -357,30 +361,70 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val jsonData = """{"a" 1}"""
     val schema = StructType(StructField("a", IntegerType) :: Nil)
     checkEvaluation(
-      JsonToStruct(schema, Map.empty, Literal(jsonData)),
+      JsonToStruct(schema, Map.empty, Literal(jsonData), gmtId),
       null
     )

     // Other modes should still return `null`.
     checkEvaluation(
-      JsonToStruct(schema, Map("mode" -> ParseModes.PERMISSIVE_MODE), Literal(jsonData)),
+      JsonToStruct(schema, Map("mode" -> ParseModes.PERMISSIVE_MODE), Literal(jsonData), gmtId),
       null
     )
   }

   test("from_json null input column") {
     val schema = StructType(StructField("a", IntegerType) :: Nil)
     checkEvaluation(
-      JsonToStruct(schema, Map.empty, Literal.create(null, StringType)),
+      JsonToStruct(schema, Map.empty, Literal.create(null, StringType), gmtId),
       null
     )
   }

+  test("from_json with timestamp") {
+    val schema = StructType(StructField("t", TimestampType) :: Nil)
+
+    val jsonData1 = """{"t": "2016-01-01T00:00:00.123Z"}"""
+    var c = Calendar.getInstance(DateTimeUtils.TimeZoneGMT)
+    c.set(2016, 0, 1, 0, 0, 0)
+    c.set(Calendar.MILLISECOND, 123)
+    checkEvaluation(
+      JsonToStruct(schema, Map.empty, Literal(jsonData1), gmtId),
+      InternalRow.fromSeq(c.getTimeInMillis * 1000L :: Nil)
+    )
+    checkEvaluation(
+      JsonToStruct(schema, Map.empty, Literal(jsonData1), Option("PST")),
+      InternalRow.fromSeq(c.getTimeInMillis * 1000L :: Nil)
--- End diff --

I'm sorry, I should have added a comment. I'll add one soon.
[GitHub] spark pull request #16750: [SPARK-18937][SQL] Timezone support in CSV/JSON p...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16750#discussion_r100225659

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala ---
@@ -357,30 +361,70 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     val jsonData = """{"a" 1}"""
     val schema = StructType(StructField("a", IntegerType) :: Nil)
     checkEvaluation(
-      JsonToStruct(schema, Map.empty, Literal(jsonData)),
+      JsonToStruct(schema, Map.empty, Literal(jsonData), gmtId),
       null
     )

     // Other modes should still return `null`.
     checkEvaluation(
-      JsonToStruct(schema, Map("mode" -> ParseModes.PERMISSIVE_MODE), Literal(jsonData)),
+      JsonToStruct(schema, Map("mode" -> ParseModes.PERMISSIVE_MODE), Literal(jsonData), gmtId),
       null
     )
   }

   test("from_json null input column") {
     val schema = StructType(StructField("a", IntegerType) :: Nil)
     checkEvaluation(
-      JsonToStruct(schema, Map.empty, Literal.create(null, StringType)),
+      JsonToStruct(schema, Map.empty, Literal.create(null, StringType), gmtId),
       null
     )
   }

+  test("from_json with timestamp") {
+    val schema = StructType(StructField("t", TimestampType) :: Nil)
+
+    val jsonData1 = """{"t": "2016-01-01T00:00:00.123Z"}"""
+    var c = Calendar.getInstance(DateTimeUtils.TimeZoneGMT)
+    c.set(2016, 0, 1, 0, 0, 0)
+    c.set(Calendar.MILLISECOND, 123)
+    checkEvaluation(
+      JsonToStruct(schema, Map.empty, Literal(jsonData1), gmtId),
+      InternalRow.fromSeq(c.getTimeInMillis * 1000L :: Nil)
--- End diff --

Thanks, I'll use it.
[GitHub] spark issue #16859: [SPARK-17714][Core][test-maven][test-hadoop2.6]Avoid usi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16859 **[Test build #3570 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3570/testReport)** for PR 16859 at commit [`1c88474`](https://github.com/apache/spark/commit/1c8847494c29d4b51182ecfeebb5cc85e000e7a1).
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100225242

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       bucketSpec = getBucketSpec,
       options = extraOptions.toMap)

-    dataSource.write(mode, df)
+    val destination = source match {
+      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
+      case _ => extraOptions.get("path")
--- End diff --

But not all of the keys are exposed as public APIs as far as I can see. e.g. calling the `save` method adds a "path" key to the option map, but is that key name a public API? I don't consider the key name a public API in this case (you can change it and existing applications will keep working). Similarly for the JDBC URL and table names. The public method is the API, not necessarily the keys used internally.
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100224399

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
       bucketSpec = getBucketSpec,
       options = extraOptions.toMap)

-    dataSource.write(mode, df)
+    val destination = source match {
+      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
+      case _ => extraOptions.get("path")
--- End diff --

As we already use `Map` as the user-facing API for `DataFrameWriter`, I think we can't change it. To keep things consistent, I think it's reasonable to still pass a `Map` to listeners. I agree it's bad that listeners need to know these magic keys, but that's already the case for `DataFrameWriter`/
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72622/testReport)** for PR 16787 at commit [`df3597e`](https://github.com/apache/spark/commit/df3597ec28e71dc82c56f87464e6c12f3862ca95).
[GitHub] spark issue #16638: [SPARK-19115] [SQL] Supporting Create External Table Lik...
Github user ouyangxiaochen commented on the issue: https://github.com/apache/spark/pull/16638 Should I delete my local master repository first, and then fork a new one again? @cloud-fan
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16736 LGTM, pending test
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16862 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72614/ Test FAILed.
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16862 Merged build finished. Test FAILed.
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16862 **[Test build #72614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72614/testReport)** for PR 16862 at commit [`bdbe267`](https://github.com/apache/spark/commit/bdbe267617f14392455881862571561724f7d5de). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16797: [SPARK-19455][SQL] Add option for case-insensitive Parqu...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16797

@budde Spark does support mixed-case-schema tables, and always has. That's because we write the table schema to the metastore in a case-preserving way, via table properties. When we read a table, we get the schema from the metastore and assume it is the schema of the table's data files. So the data file schema must match the table schema, or Spark will fail; this has always been the case.

However, there is one exception. There are two kinds of tables in Spark: data source tables and Hive serde tables (we have different SQL syntax to create them). Data source tables are managed entirely by Spark: we read and write the data files directly and use the Hive metastore only as a persistence layer, which means data source tables are not compatible with Hive, and Hive can't read or write them. Hive serde tables should be compatible with Hive, and we use the Hive API to read and write them. For any table, as long as Hive can read it, Spark can read it. The exception is that, for the Parquet and ORC formats, we read the data files directly as an optimization (reading through the Hive API is slow).

Before Spark 2.1, we saved the schema to the Hive metastore directly, which means the schema was lowercased. Given this, ideally we should not have supported mixed-case-schema Parquet/ORC data files for this kind of table, because the data schema would not match the table schema. But we did support it, at the cost of runtime schema inference.

This problem was solved in Spark 2.1 by writing the table schema to the metastore in a case-preserving way for Hive serde tables. Now we can say that the data schema must match the table schema, or Spark should fail.

That brings us to this problem: for Parquet/ORC Hive serde tables created by Spark prior to 2.1, the data file schema may not match the table schema, but we still need to support them for compatibility. That's why I prefer the migration command approach: it keeps the concept clean, namely that the data schema must match the table schema.

As you said, users can still create a Hive table with mixed-case-schema Parquet/ORC files, via Hive or other systems such as Presto. Such a table is readable by Hive, and by Spark prior to 2.1 because of the runtime schema inference. But this is not intentional, and Spark should not support it, because the data file schema and the table schema do not match. We can make the migration command cover this case too.
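To make the mismatch described above concrete, here is a rough sketch of what reconciling a lowercased metastore schema with mixed-case data file field names could look like. It is plain Scala with no Spark dependency, and it is not the inference code Spark uses; `SchemaCaseSketch`, `resolveFields`, and the column names are hypothetical.

```scala
import java.util.Locale

object SchemaCaseSketch {
  // Hypothetical helper: maps each (lowercased) metastore column to the
  // actual field name found in the data file, matching case-insensitively.
  // Returns Left with a message if a column is missing or ambiguous.
  def resolveFields(
      metastoreCols: Seq[String],
      fileFields: Seq[String]): Either[String, Map[String, String]] = {
    val byLower: Map[String, Seq[String]] =
      fileFields.groupBy(_.toLowerCase(Locale.ROOT))

    val resolved: Seq[Either[String, (String, String)]] = metastoreCols.map { col =>
      byLower.getOrElse(col.toLowerCase(Locale.ROOT), Nil) match {
        case Seq(single) => Right(col -> single)
        case Seq()       => Left(s"column '$col' not found in data file")
        case many        => Left(s"column '$col' is ambiguous: ${many.mkString(", ")}")
      }
    }

    // Report the first problem, otherwise return the full mapping.
    resolved.collectFirst { case Left(err) => err } match {
      case Some(err) => Left(err)
      case None      => Right(resolved.collect { case Right(pair) => pair }.toMap)
    }
  }
}

val ok = SchemaCaseSketch.resolveFields(
  Seq("userid", "eventtime"), Seq("userId", "eventTime"))
val ambiguous = SchemaCaseSketch.resolveFields(Seq("userid"), Seq("userId", "USERID"))
val missing = SchemaCaseSketch.resolveFields(Seq("missing"), Seq("userId"))
```

A lookup like this has to inspect the data files at read time, which is the runtime cost the comment refers to; a one-time migration would instead store the case-preserving schema so that the two schemas match exactly.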
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16760 **[Test build #72621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72621/testReport)** for PR 16760 at commit [`2473e0c`](https://github.com/apache/spark/commit/2473e0c440a9d1cd761ae6d704d0aa02c63afd83).
[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16760 retest this please
[GitHub] spark issue #16859: [SPARK-17714][Core][maven][test-hadoop2.6]Avoid using Ex...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16859 btw I think you want `[test-maven]` to run the builder on maven. (last one ran on sbt.)
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #72620 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72620/testReport)** for PR 16386 at commit [`f71a465`](https://github.com/apache/spark/commit/f71a465cf07fb9c043b2ccd86fa57e8e8ea9dc00).
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user NathanHowell commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100219153

--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -194,5 +195,8 @@ class PortableDataStream(
   }

   def getPath(): String = path
+
+  @Since("2.2.0")
--- End diff --

Done, pushed in f71a465cf07fb9c043b2ccd86fa57e8e8ea9dc00
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72619 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72619/testReport)** for PR 16736 at commit [`f29c9d7`](https://github.com/apache/spark/commit/f29c9d77a683c1a63abac92f19210eadcb68682e).
[GitHub] spark issue #16864: [SPARK-19527][Core] Approximate Size of Intersection of ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16864 Can one of the admins verify this patch?
[GitHub] spark pull request #16859: [SPARK-17714][Core][maven][test-hadoop2.6]Avoid u...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16859#discussion_r100218630

--- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageDecoder.java ---
@@ -35,6 +35,12 @@
   private static final Logger logger = LoggerFactory.getLogger(MessageDecoder.class);

+  public static final MessageDecoder INSTANCE = new MessageDecoder();
+
+  private MessageDecoder() {
+    super();
--- End diff --

nit: not really necessary (also in the other class)
[GitHub] spark pull request #16864: [SPARK-19527][Core] Approximate Size of Intersect...
GitHub user Bcpoole opened a pull request: https://github.com/apache/spark/pull/16864 [SPARK-19527][Core] Approximate Size of Intersection of Bloom Filters **What changes were proposed in this pull request?** Added functions to get the Swamidass & Baldi (2007) approximation for number of items in a Bloom filter and the intersections of two filters. Added an exception type IncompatibleUnionException mimicing IncompatibleMergeException. As needed for the intersection approximation, there is a function that create the union of two Bloom filters (no mutations). **How was this patch tested?** Manual Tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/Bcpoole/spark approxItemsInBloomFilterIntersection Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16864.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16864 commit 7a3ad46ff86bd3d2d47f6a56bace1a0c4fd171c8 Author: BcpooleDate: 2017-02-09T01:11:07Z Swamidass & Baldi approx. items in intersection of two Bloom filters. Also function to create union (non-mutation) of two Bloom filters. commit b9680c57b2f8b1d93c28884de9a7ebbe52505f6c Author: Bcpoole Date: 2017-02-09T01:42:36Z Changed createUnionBloomFilter & approxItemsInIntersection to be instance instead of static functions commit 501ad7e22101b00862c0c77ef8c38e1b166d33a4 Author: Bcpoole Date: 2017-02-09T01:53:50Z Updated abstract class to reflect changes in previous commit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. 
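For readers unfamiliar with the estimate named in this pull request, the following is a minimal sketch of the Swamidass & Baldi (2007) formulas, not the code from the patch. For a filter with m bits, k hash functions and X bits set, the item count is approximated as n ≈ -(m/k) · ln(1 - X/m), and the intersection size as n(A) + n(B) - n(A ∪ B), where the union filter is the bitwise OR of the two. The class and method names below are hypothetical and do not come from Spark's `BloomFilter` API.

```java
// Hypothetical helper; names are illustrative and not part of Spark.
final class BloomCardinalitySketch {
    private BloomCardinalitySketch() {}

    /**
     * Swamidass & Baldi estimate: n ~= -(m / k) * ln(1 - X / m),
     * with m = bitSize, k = numHashFunctions, X = bitsSet.
     */
    public static double approxItems(long bitSize, int numHashFunctions, long bitsSet) {
        if (bitsSet >= bitSize) {
            // A saturated filter carries no usable information about its size.
            return Double.POSITIVE_INFINITY;
        }
        double fractionSet = (double) bitsSet / bitSize;
        return -((double) bitSize / numHashFunctions) * Math.log1p(-fractionSet);
    }

    /**
     * Inclusion-exclusion on the estimates: |A ∩ B| ~= n(A) + n(B) - n(A ∪ B).
     * bitsSetUnion is the number of set bits in the bitwise OR of both filters,
     * which must share the same bitSize and hash functions.
     */
    public static double approxItemsInIntersection(
            long bitSize, int numHashFunctions,
            long bitsSetA, long bitsSetB, long bitsSetUnion) {
        return approxItems(bitSize, numHashFunctions, bitsSetA)
             + approxItems(bitSize, numHashFunctions, bitsSetB)
             - approxItems(bitSize, numHashFunctions, bitsSetUnion);
    }
}
```

The union has to be computed without mutating either input, which is presumably why the patch adds a non-mutating union function alongside the estimate.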
[GitHub] spark pull request #16664: [SPARK-18120 ][SQL] Call QueryExecutionListener c...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16664#discussion_r100218129
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
    @@ -218,7 +247,14 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
           bucketSpec = getBucketSpec,
           options = extraOptions.toMap)
    -    dataSource.write(mode, df)
    +    val destination = source match {
    +      case "jdbc" => extraOptions.get(JDBCOptions.JDBC_TABLE_NAME)
    +      case _ => extraOptions.get("path")
--- End diff --
So that comment is about different types of queries that might have different extra information. I still think a generic map is the wrong idea for fixing that. You could, for example, have this parameter be `Any` (or some tagging interface, e.g. `trait QueryExecutionParams` or some such), and the listener can then match to find the right type. That is extensible (new types can be added without breaking existing ones) and wouldn't require this API to change in the future.
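The tagging-interface idea in the comment above can be sketched as follows. This is an illustration in Java rather than Scala, and only the name `QueryExecutionParams` comes from the comment; `JdbcWriteParams`, `PathWriteParams` and `ListenerSketch.describe` are hypothetical names invented for the example and are not part of Spark.

```java
// Marker ("tagging") interface: carries no methods, only a type.
interface QueryExecutionParams {}

// Hypothetical parameter type for a JDBC write.
final class JdbcWriteParams implements QueryExecutionParams {
    final String table;
    JdbcWriteParams(String table) { this.table = table; }
}

// Hypothetical parameter type for a path-based write.
final class PathWriteParams implements QueryExecutionParams {
    final String path;
    PathWriteParams(String path) { this.path = path; }
}

// Hypothetical listener-side code that matches on the concrete type.
final class ListenerSketch {
    private ListenerSketch() {}

    public static String describe(QueryExecutionParams params) {
        if (params instanceof JdbcWriteParams) {
            return "jdbc table " + ((JdbcWriteParams) params).table;
        }
        if (params instanceof PathWriteParams) {
            return "path " + ((PathWriteParams) params).path;
        }
        // Types added later fall through here without breaking old listeners.
        return "unknown";
    }
}
```

A listener written against this shape keeps compiling when a new parameter type is introduced, which is the extensibility argument made in the comment.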
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16856 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72618/ Test PASSed.
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16856 Merged build finished. Test PASSed.
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16856 **[Test build #72618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72618/testReport)** for PR 16856 at commit [`a5b91fd`](https://github.com/apache/spark/commit/a5b91fd5e261f27df9e088f5ee16929a7489be6d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16386: [SPARK-18352][SQL] Support parsing multiline json...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16386#discussion_r100217653
--- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
    @@ -194,5 +195,8 @@ class PortableDataStream(
       }
       def getPath(): String = path
    +
    +  @Since("2.2.0")
--- End diff --
SGTM. Can you take a look at the other public methods in this class and add a since tag for them? Otherwise it looks weird that only one method has a since tag...
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15009 Merged build finished. Test PASSed.
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15009 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72610/ Test PASSed.
[GitHub] spark issue #15009: [SPARK-17443][SPARK-11035] Stop Spark Application if lau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15009 **[Test build #72610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72610/testReport)** for PR 15009 at commit [`2707d21`](https://github.com/apache/spark/commit/2707d219f3f2c3fa1e89553809cf3a8d118fc084). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72615/ Test FAILed.
[GitHub] spark pull request #16863: Swamidass & Baldi Approximations
Github user Bcpoole closed the pull request at: https://github.com/apache/spark/pull/16863
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72615 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72615/testReport)** for PR 16715 at commit [`4bc670c`](https://github.com/apache/spark/commit/4bc670cf4e512953019d97cfd57413158f31377a). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16863: Swamidass & Baldi Approximations
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16863 Please review http://spark.apache.org/contributing.html before opening a pull request.
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16856 **[Test build #72618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72618/testReport)** for PR 16856 at commit [`a5b91fd`](https://github.com/apache/spark/commit/a5b91fd5e261f27df9e088f5ee16929a7489be6d).
[GitHub] spark issue #16863: Swamidass & Baldi Approximations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16863 Can one of the admins verify this patch?
[GitHub] spark issue #16674: [SPARK-19331][SQL][TESTS] Improve the test coverage of S...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16674 @gatorsmile @cloud-fan Could you please look at this when you have time? Thanks!
[GitHub] spark pull request #16863: Swamidass & Baldi Approximations
GitHub user Bcpoole reopened a pull request: https://github.com/apache/spark/pull/16863
Swamidass & Baldi Approximations
## What changes were proposed in this pull request?
Added functions to compute the Swamidass & Baldi (2007) approximation for the number of items in a Bloom filter and in the intersection of two filters. Added an exception type IncompatibleUnionException mimicking IncompatibleMergeException. As needed for the intersection approximation, there is a function that creates the union of two Bloom filters (no mutations).
## How was this patch tested?
Manual Tests
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/Bcpoole/spark approxItemsInBloomFilterIntersection
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16863.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16863
commit 7a3ad46ff86bd3d2d47f6a56bace1a0c4fd171c8 Author: Bcpoole Date: 2017-02-09T01:11:07Z Swamidass & Baldi approx. items in intersection of two Bloom filters. Also function to create union (non-mutation) of two Bloom filters.
commit b9680c57b2f8b1d93c28884de9a7ebbe52505f6c Author: Bcpoole Date: 2017-02-09T01:42:36Z Changed createUnionBloomFilter & approxItemsInIntersection to be instance instead of static functions
[GitHub] spark issue #16856: [SPARK-19516][DOC] update public doc to use SparkSession...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16856
Several questions:
1. As `SparkSession` becomes the new entry point, we now require users to include the spark-sql module in their sbt file or pom.xml, which seems weird if users only want to write a very simple Spark application without SQL functionality. Shall we rename spark-sql to a more general name like `spark-structure`?
2. For a Java application using RDDs, it's inconvenient to build a `SparkSession` and then create a `JavaSparkContext`; users can create a `JavaSparkContext` directly.
cc @rxin @yhuai
[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16856#discussion_r100213922
--- Diff: docs/programming-guide.md ---
    @@ -244,13 +239,13 @@ use IPython, set the `PYSPARK_DRIVER_PYTHON` variable to `ipython` when running
     $ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
     {% endhighlight %}
    -To use the Jupyter notebook (previously known as the IPython notebook),
--- End diff --
Sorry, my editor removes unnecessary whitespace automatically...
[GitHub] spark pull request #16856: [SPARK-19516][DOC] update public doc to use Spark...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16856#discussion_r100213837
--- Diff: docs/programming-guide.md ---
    @@ -77,9 +76,9 @@ In addition, if you wish to access an HDFS cluster, you need to add a dependency
     Finally, you need to import some Spark classes into your program. Add the following lines:
     {% highlight scala %}
    -import org.apache.spark.api.java.JavaSparkContext
    -import org.apache.spark.api.java.JavaRDD
    -import org.apache.spark.SparkConf
    +import org.apache.spark.api.java.JavaSparkContext;
--- End diff --
Hmm, shouldn't it be Java code?
[GitHub] spark pull request #16804: [SPARK-19459][SQL] Add Hive datatype (char/varcha...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16804#discussion_r100213581
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcSourceSuite.scala ---
    @@ -152,14 +152,39 @@ abstract class OrcSuite extends QueryTest with TestHiveSingleton with BeforeAndA
         assert(new OrcOptions(Map("Orc.Compress" -> "NONE")).compressionCodec == "NONE")
       }
    -  test("SPARK-18220: read Hive orc table with varchar column") {
    +  test("SPARK-19459/SPARK-18220: read char/varchar column written by Hive") {
         val hiveClient = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog].client
    +    val location = Utils.createTempDir().toURI
--- End diff --
Shall we remove this temp dir in the finally block?
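The cleanup pattern asked about above, creating a temporary directory and removing it in a `finally` block so it disappears even when the test body throws, can be sketched with plain JDK calls. This is an illustration in Java only: it does not use Spark's `Utils.createTempDir`, and the class name, method names and the `orc-test-` prefix are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper showing try/finally cleanup of a temp directory.
final class TempDirCleanupSketch {
    private TempDirCleanupSketch() {}

    // Deletes children before parents by visiting paths in reverse order.
    public static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temp dir, uses it, and always removes it afterwards.
    public static Path runWithTempDir() {
        Path location;
        try {
            location = Files.createTempDirectory("orc-test-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            // Stand-in for the test body that writes data under the location.
            Files.write(location.resolve("data.txt"), "a,b".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deleteRecursively(location);
        }
        return location;
    }
}
```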
[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435
I read jkbradley's thoughts here, so I will modify this as follows. First we need 4 traits, with the following hierarchy:
LogisticRegressionSummary
LogisticRegressionTrainingSummary: LogisticRegressionSummary
BinaryLogisticRegressionSummary: LogisticRegressionSummary
BinaryLogisticRegressionTrainingSummary: LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary
Public methods such as `def summary` only return the trait types listed above. Then we implement 4 concrete classes:
`LogisticRegressionSummaryImpl` (multiclass case)
`LogisticRegressionTrainingSummaryImpl` (multiclass case)
`BinaryLogisticRegressionSummaryImpl` (binary case)
`BinaryLogisticRegressionTrainingSummaryImpl` (binary case)
Is that right? @jkbradley @sethah
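The hierarchy proposed above can be sketched with Java interfaces standing in for Scala traits. The eight type names are taken from the comment; everything else is an assumption made for illustration, in particular the empty bodies, the `SummaryFactorySketch` class and its `trainingSummary(int numClasses)` method, none of which are part of Spark ML.

```java
// Interfaces stand in for the four proposed Scala traits.
interface LogisticRegressionSummary {}
interface LogisticRegressionTrainingSummary extends LogisticRegressionSummary {}
interface BinaryLogisticRegressionSummary extends LogisticRegressionSummary {}
interface BinaryLogisticRegressionTrainingSummary
        extends LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary {}

// Concrete classes for the multiclass case.
class LogisticRegressionSummaryImpl implements LogisticRegressionSummary {}
class LogisticRegressionTrainingSummaryImpl implements LogisticRegressionTrainingSummary {}

// Concrete classes for the binary case.
class BinaryLogisticRegressionSummaryImpl implements BinaryLogisticRegressionSummary {}
class BinaryLogisticRegressionTrainingSummaryImpl implements BinaryLogisticRegressionTrainingSummary {}

// Hypothetical factory: the public return type is the interface, never an Impl class.
final class SummaryFactorySketch {
    private SummaryFactorySketch() {}

    public static LogisticRegressionTrainingSummary trainingSummary(int numClasses) {
        return numClasses == 2
            ? new BinaryLogisticRegressionTrainingSummaryImpl()
            : new LogisticRegressionTrainingSummaryImpl();
    }
}
```

Callers that need binary-only metrics would check whether the returned value is a `BinaryLogisticRegressionSummary`, while multiclass callers rely on the base interface alone.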
[GitHub] spark pull request #16863: Swamidass & Baldi Approximations
Github user Bcpoole closed the pull request at: https://github.com/apache/spark/pull/16863
[GitHub] spark issue #16837: [SPARK-19359][SQL] renaming partition should not leave u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16837 **[Test build #72617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72617/testReport)** for PR 16837 at commit [`2c4d3c7`](https://github.com/apache/spark/commit/2c4d3c71e971bd039936c43143718ba7091d6113).
[GitHub] spark issue #16826: Fork SparkSession with option to inherit a copy of the S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72606/ Test PASSed.
[GitHub] spark issue #16863: Swamidass & Baldi Approximations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16863 Can one of the admins verify this patch?
[GitHub] spark issue #16826: Fork SparkSession with option to inherit a copy of the S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Merged build finished. Test PASSed.
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72613/ Test PASSed.
[GitHub] spark issue #16826: Fork SparkSession with option to inherit a copy of the S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16826 **[Test build #72606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72606/testReport)** for PR 16826 at commit [`a343d8a`](https://github.com/apache/spark/commit/a343d8af9c577158042e4af9f8832f46aeecd509). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16863: Swamidass & Baldi Approximations
GitHub user Bcpoole opened a pull request: https://github.com/apache/spark/pull/16863
Swamidass & Baldi Approximations
## What changes were proposed in this pull request?
Added functions to compute the Swamidass & Baldi (2007) approximation for the number of items in a Bloom filter and in the intersection of two filters. Added an exception type IncompatibleUnionException mimicking IncompatibleMergeException. As needed for the intersection approximation, there is a function that creates the union of two Bloom filters (no mutations).
## How was this patch tested?
Manual Tests
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/Bcpoole/spark approxItemsInBloomFilterIntersection
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16863.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16863
commit 7a3ad46ff86bd3d2d47f6a56bace1a0c4fd171c8 Author: Bcpoole Date: 2017-02-09T01:11:07Z Swamidass & Baldi approx. items in intersection of two Bloom filters. Also function to create union (non-mutation) of two Bloom filters.
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16800 Merged build finished. Test PASSed.
[GitHub] spark pull request #16736: [SPARK-19265][SQL][Follow-up] Configurable `table...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16736#discussion_r100211955
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfEntrySuite.scala ---
    @@ -171,4 +171,14 @@ class SQLConfEntrySuite extends SparkFunSuite {
           buildConf(key).stringConf.createOptional
         }
       }
    +
    +  test("StaticSQLConf.FILESOURCE_TABLE_RELATION_CACHE_SIZE") {
    +    val confEntry = StaticSQLConf.FILESOURCE_TABLE_RELATION_CACHE_SIZE
    +    assert(conf.getConf(confEntry) === 1000)
    +    conf.setConf(confEntry, -1)
--- End diff --
Please also test `setConfString`, which should check the value.
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16800 **[Test build #72613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72613/testReport)** for PR 16800 at commit [`bbc72e1`](https://github.com/apache/spark/commit/bbc72e1b02d6b1125cd125986beb5236a30b9d7d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16787 **[Test build #72616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72616/testReport)** for PR 16787 at commit [`bf09f15`](https://github.com/apache/spark/commit/bf09f15ca7c90138312eb73b819131adf16ac040).
[GitHub] spark issue #16787: [SPARK-19448][SQL]optimize some duplication functions in...
Github user windpiger commented on the issue: https://github.com/apache/spark/pull/16787 retest this please
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72615 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72615/testReport)** for PR 16715 at commit [`4bc670c`](https://github.com/apache/spark/commit/4bc670cf4e512953019d97cfd57413158f31377a).
[GitHub] spark pull request #16861: Added function to get union of 2 Bloom filters (n...
Github user Bcpoole closed the pull request at: https://github.com/apache/spark/pull/16861
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16862 **[Test build #72614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72614/testReport)** for PR 16862 at commit [`bdbe267`](https://github.com/apache/spark/commit/bdbe267617f14392455881862571561724f7d5de).
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72605/ Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16736 Merged build finished. Test PASSed.
[GitHub] spark issue #16736: [SPARK-19265][SQL][Follow-up] Configurable `tableRelatio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16736 **[Test build #72605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72605/testReport)** for PR 16736 at commit [`314f6f8`](https://github.com/apache/spark/commit/314f6f8de6990b1c3bfddea503490a1797e25117). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16862: [SPARK-19520][streaming] Do not encrypt data writ...
GitHub user vanzin opened a pull request: https://github.com/apache/spark/pull/16862 [SPARK-19520][streaming] Do not encrypt data written to the WAL. Spark's I/O encryption uses an ephemeral key for each driver instance, so driver B cannot decrypt data written by driver A because it does not have the correct key. The write-ahead log is used for recovery and therefore needs to be readable by a different driver, so it cannot be encrypted by Spark's I/O encryption code. The BlockManager APIs used by the WAL code to write the data automatically encrypt data, so changes are needed to let callers opt out of encryption. Aside from that, the "putBytes" API in the BlockManager does not do encryption, so a separate problem arose where the WAL would write unencrypted data to the BM and, when those blocks were read, decryption would fail. So the WAL code needs to ask the BM to encrypt that data when encryption is enabled; this code is not optimal since it results in a (temporary) second copy of the data block in memory, but should be OK for now until a more performant solution is added. The non-encryption case should not be affected. Tested with new unit tests, and by running streaming apps that do recovery using the WAL data with I/O encryption turned on. You can merge this pull request into a Git repository by running: $ git pull https://github.com/vanzin/spark SPARK-19520 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16862.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16862 commit bdbe267617f14392455881862571561724f7d5de Author: Marcelo Vanzin Date: 2017-02-08T22:18:14Z [SPARK-19520][streaming] Do not encrypt data written to the WAL.
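The core argument of this pull request, that data encrypted under one driver's ephemeral key is unreadable to a replacement driver, can be reproduced in a few lines with the JDK's `javax.crypto` classes. This is only a sketch under stated assumptions: `EphemeralDriver` is a hypothetical class, and the choice of AES-GCM is made for the illustration and is not taken from Spark's I/O encryption code.

```scala
import java.security.SecureRandom
import javax.crypto.{AEADBadTagException, Cipher, KeyGenerator, SecretKey}
import javax.crypto.spec.GCMParameterSpec

// Hypothetical model of a driver that generates its own key at startup.
// The cipher (AES-GCM) is an assumption for this sketch, not Spark's implementation.
final class EphemeralDriver {
  private val key: SecretKey = {
    val generator = KeyGenerator.getInstance("AES")
    generator.init(128)
    generator.generateKey() // lives only as long as this instance
  }

  // Returns the random IV together with the ciphertext.
  def encrypt(plain: Array[Byte]): (Array[Byte], Array[Byte]) = {
    val iv = new Array[Byte](12)
    new SecureRandom().nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
    (iv, cipher.doFinal(plain))
  }

  // Returns None when the authentication tag does not verify,
  // which is what happens when the data was written under a different key.
  def decrypt(iv: Array[Byte], data: Array[Byte]): Option[Array[Byte]] = {
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv))
    try Some(cipher.doFinal(data))
    catch { case _: AEADBadTagException => None }
  }
}
```

Creating two `EphemeralDriver` instances and passing the output of one's `encrypt` to the other's `decrypt` yields `None`, which is the recovery failure the description is guarding against.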
[GitHub] spark issue #16862: [SPARK-19520][streaming] Do not encrypt data written to ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16862 @tdas @zsxwing
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16804 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72604/ Test PASSed.
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16804 Merged build finished. Test PASSed.
[GitHub] spark issue #16804: [SPARK-19459][SQL] Add Hive datatype (char/varchar) to S...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16804 **[Test build #72604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72604/testReport)** for PR 16804 at commit [`e7ca0ea`](https://github.com/apache/spark/commit/e7ca0ead843f2c9650e690fe649be18fa6389e48). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Merged build finished. Test PASSed.
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15415 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72611/ Test PASSed.
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15415 Merged build finished. Test PASSed.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16386 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72603/ Test PASSed.
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15415 **[Test build #72611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72611/testReport)** for PR 15415 at commit [`049e1a3`](https://github.com/apache/spark/commit/049e1a326daee4c55edb6d65090fafd229b93b6a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16386 **[Test build #72603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72603/testReport)** for PR 16386 at commit [`30fb509`](https://github.com/apache/spark/commit/30fb509c71ed1f919e0198f2366aa817d96fc0ca). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16825: [SPARK-19481][REPL][maven]Avoid to leak SparkCont...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16825#discussion_r100206338 --- Diff: repl/src/main/scala/org/apache/spark/repl/Signaling.scala --- @@ -28,15 +28,17 @@ private[repl] object Signaling extends Logging { * when no jobs are currently running. * This makes it possible to interrupt a running shell job by pressing Ctrl+C. */ - def cancelOnInterrupt(ctx: SparkContext): Unit = SignalUtils.register("INT") { -if (!ctx.statusTracker.getActiveJobIds().isEmpty) { - logWarning("Cancelling all active jobs, this can take a while. " + -"Press Ctrl+C again to exit now.") - ctx.cancelAllJobs() - true -} else { - false -} + def cancelOnInterrupt(): Unit = SignalUtils.register("INT") { --- End diff -- It's used by REPL to cancel the running job if any.
[GitHub] spark pull request #16825: [SPARK-19481][REPL][maven]Avoid to leak SparkCont...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/16825#discussion_r100206125 --- Diff: repl/src/main/scala/org/apache/spark/repl/Signaling.scala --- @@ -28,15 +28,17 @@ private[repl] object Signaling extends Logging { * when no jobs are currently running. * This makes it possible to interrupt a running shell job by pressing Ctrl+C. */ - def cancelOnInterrupt(ctx: SparkContext): Unit = SignalUtils.register("INT") { -if (!ctx.statusTracker.getActiveJobIds().isEmpty) { - logWarning("Cancelling all active jobs, this can take a while. " + -"Press Ctrl+C again to exit now.") - ctx.cancelAllJobs() - true -} else { - false -} + def cancelOnInterrupt(): Unit = SignalUtils.register("INT") { --- End diff -- Who is using this one? Is this a breaking change?
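The removed lines in the quoted diff follow a simple pattern: on interrupt, cancel jobs if any are active and report whether anything was cancelled. The sketch below restates that pattern without Spark on the classpath. `JobControl` and `InterruptSketch` are hypothetical names, and looking the context up through a function at signal time is only one way to drop the `ctx` parameter; the quoted diff does not show how the pull request actually does it.

```scala
// Hypothetical stand-in for the parts of a context the handler needs; not a Spark API.
trait JobControl {
  def activeJobIds: Seq[Int]
  def cancelAllJobs(): Unit
}

object InterruptSketch {
  // Returns true after cancelling jobs, false when there was nothing to cancel
  // (no context available, or no active jobs), mirroring the removed lines above.
  // The context is looked up when the signal arrives, so the handler itself
  // does not keep a reference to any particular context.
  def handleInterrupt(lookup: () => Option[JobControl], warn: String => Unit): Boolean =
    lookup() match {
      case Some(ctx) if ctx.activeJobIds.nonEmpty =>
        warn("Cancelling all active jobs, this can take a while. " +
          "Press Ctrl+C again to exit now.")
        ctx.cancelAllJobs()
        true
      case _ =>
        false
    }
}
```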
[GitHub] spark issue #16830: [MINOR][CORE] Fix incorrect documentation of WritableCon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16830 **[Test build #3569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3569/testReport)** for PR 16830 at commit [`c300ff6`](https://github.com/apache/spark/commit/c300ff6e0a2d802c474b2af5e1bea9afb8101a2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16852: [SPARK-19512][SQL] codegen for compare structs fails
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16852 **[Test build #3568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3568/testReport)** for PR 16852 at commit [`9a8d853`](https://github.com/apache/spark/commit/9a8d8537748f38a4276188b3f60f6852010e3387). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16861: Added function to get union of 2 Bloom filters (no mutat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16861 Can one of the admins verify this patch?
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16800 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72608/ Test FAILed.
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16800 Merged build finished. Test FAILed.
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16800 **[Test build #72608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72608/testReport)** for PR 16800 at commit [`a180126`](https://github.com/apache/spark/commit/a1801261335cf08fc87f17d76979fda6fdcc1969). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16861: Added function to get union of 2 Bloom filters (n...
GitHub user Bcpoole opened a pull request: https://github.com/apache/spark/pull/16861 Added function to get union of 2 Bloom filters (no mutation). Added Swamidass & Baldi approximations for number of items in a Bloom filter and for the intersection of 2 Bloom filters given those Bloom filters. ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Bcpoole/spark intersectionOfBloomFilters Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16861.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16861 commit 934766b6527f512d8d81d2533899583b98d0219c Author: Bcpoole Date: 2017-02-08T23:59:41Z Added function to get union of 2 Bloom filters (no mutation). Added Swamidass & Baldi approximations for number of items in a Bloom filter and for the intersection of 2 Bloom filters given those Bloom filters.
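For readers unfamiliar with the approximations named in this pull request, here is a minimal sketch of the commonly cited Swamidass & Baldi estimates, written against a plain `java.util.BitSet` rather than Spark's `BloomFilter` class. The object and method names are hypothetical and are not taken from the pull request. With m bits, k hash functions and X bits set, the item count is estimated as n ≈ -(m / k) · ln(1 - X / m); the intersection estimate is n(A) + n(B) - n(A ∪ B), where the union filter is the bitwise OR of the two inputs (both built with the same m and k).

```scala
import java.util.BitSet

// Hypothetical helpers for illustration; not the code from the pull request.
object BloomEstimates {
  // Estimated number of inserted items: -(m / k) * ln(1 - X / m).
  // Evaluates to positive infinity when every bit is set.
  def approxItems(bits: BitSet, numBits: Int, numHashes: Int): Double = {
    val setBits = bits.cardinality().toDouble
    -(numBits.toDouble / numHashes) * math.log(1.0 - setBits / numBits)
  }

  // Union without mutating either input: OR the second filter into a copy of the first.
  def union(a: BitSet, b: BitSet): BitSet = {
    val merged = a.clone().asInstanceOf[BitSet]
    merged.or(b)
    merged
  }

  // Inclusion-exclusion over the three estimates. The result is an
  // approximation and can come out slightly negative for disjoint filters.
  def approxIntersection(a: BitSet, b: BitSet, numBits: Int, numHashes: Int): Double =
    approxItems(a, numBits, numHashes) +
      approxItems(b, numBits, numHashes) -
      approxItems(union(a, b), numBits, numHashes)
}
```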
[GitHub] spark issue #16800: [SPARK-19456][SparkR]:Add LinearSVC R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16800 **[Test build #72613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72613/testReport)** for PR 16800 at commit [`bbc72e1`](https://github.com/apache/spark/commit/bbc72e1b02d6b1125cd125986beb5236a30b9d7d).
[GitHub] spark issue #16860: [SPARK-18613][ML] make spark.mllib LDA dependencies in s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16860 Merged build finished. Test PASSed.
[GitHub] spark issue #16860: [SPARK-18613][ML] make spark.mllib LDA dependencies in s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16860 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72607/ Test PASSed.
[GitHub] spark issue #16860: [SPARK-18613][ML] make spark.mllib LDA dependencies in s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16860 **[Test build #72607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72607/testReport)** for PR 16860 at commit [`ce8abb3`](https://github.com/apache/spark/commit/ce8abb308c0b555754d332712d922a7aa9a8b220). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72609/ Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72609/testReport)** for PR 16715 at commit [`8e5468f`](https://github.com/apache/spark/commit/8e5468f6946a8f2c051746ddb0c0e65586bd1eed). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72612/ Test FAILed.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16715 **[Test build #72612 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72612/testReport)** for PR 16715 at commit [`6e85e1a`](https://github.com/apache/spark/commit/6e85e1a04b02dea26e82c8bc77151b8e389f4fe5). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16715: [Spark-18080][ML] Python API & Examples for Locality Sen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16715 Merged build finished. Test FAILed.