[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14803 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65812/ Test PASSed.
[GitHub] spark pull request #15205: [SPARK-16240][ML] ML persistence backward compati...
Github user jkbradley closed the pull request at: https://github.com/apache/spark/pull/15205
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14803 **[Test build #65812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65812/consoleFull)** for PR 14803 at commit [`e21536e`](https://github.com/apache/spark/commit/e21536e7c20253cf2c04f80041592d8b095dbff4). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class DeleteFile(file: File) extends ExternalAction ` * ` trait ExternalAction extends StreamAction `
[GitHub] spark issue #10212: [SPARK-12221] add cpu time to metrics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/10212 **[Test build #65815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65815/consoleFull)** for PR 10212 at commit [`f0ef503`](https://github.com/apache/spark/commit/f0ef503f9e732b91d405d1e15dade58e78999052).
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15041 **[Test build #65814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65814/consoleFull)** for PR 15041 at commit [`4b29ded`](https://github.com/apache/spark/commit/4b29ded0e678a50c53a38bcac5d0b6906141558e).
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80184799 --- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala --- @@ -1097,7 +1100,9 @@ private[spark] object JsonProtocolSuite extends Assertions { | }, | "Task Metrics": { |"Executor Deserialize Time": 300, + |"Executor Deserialize CPU Time": 0, --- End diff -- Yeah, I tested it on my testing cluster, but this makes sense. I will add non-zero CPU times by setting the CPU times to the same values as the given wall times.
[GitHub] spark pull request #10212: [SPARK-12221] add cpu time to metrics
Github user jisookim0513 commented on a diff in the pull request: https://github.com/apache/spark/pull/10212#discussion_r80184744 --- Diff: core/src/test/resources/HistoryServerExpectations/complete_stage_list_json_expectation.json --- @@ -6,6 +6,7 @@ "numCompleteTasks" : 8, "numFailedTasks" : 0, "executorRunTime" : 162, + "executorCpuTime" : 0, --- End diff -- Oh no, these are expected outputs. I think the inputs are stored under `src/test/resources/spark-events`.
[GitHub] spark issue #15205: [SPARK-16240][ML] ML persistence backward compatibility ...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15205 Local testing worked, so merging with branch-2.0 now
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user jwbear commented on the issue: https://github.com/apache/spark/pull/15102 Just curious looking at this: if you are comparing "sequential" offsets across partitions, a rebalance would definitely affect this and, unless something has changed, it is probably not a good idea to compare Kafka offsets across partitions. You could simply add an id/timestamp in the producer and send it with the message rather than using this methodology or, if you must use offsets, query the broker for the full list and compare what you consumed to that list (a small increase in latency between consumption and processing). This is from the Kafka paper, which makes me question your scheme: "...Note that our message ids are increasing but not consecutive. To compute the id of the next message, we have to add the length of the current message to its id." This means that simply comparing which offsets are larger will not necessarily yield the most recent message across partitions, and it definitely won't hold during a rebalance, when some broker logs will be on hold and not consumed. In my own implementation, the offsets are great for message guarantees (e.g. delivery/consumption checks), because the broker has a full ordered list, but not for cross-partition ordering.
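The point about cross-partition ordering can be illustrated with a small sketch. It uses plain Python rather than a Kafka client, and the `Record` type, its `produced_at` field, and the sample values are all hypothetical, chosen only to show why a larger offset on one partition says nothing about recency relative to another partition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    partition: int
    offset: int          # ordered only within its own partition
    produced_at: float   # hypothetical producer-side timestamp, in seconds
    payload: str


def latest_by_offset(records):
    """Naive approach: treat the largest offset as the newest message."""
    return max(records, key=lambda r: r.offset)


def latest_by_timestamp(records):
    """Order across partitions by the timestamp the producer attached."""
    return max(records, key=lambda r: r.produced_at)


def high_water_marks(records):
    """Track the largest consumed offset separately for each partition."""
    marks = {}
    for r in records:
        marks[r.partition] = max(marks.get(r.partition, -1), r.offset)
    return marks


# Made-up sample data: a busy partition with an old message and a
# quiet partition with a newer one.
records = [
    Record(partition=0, offset=1500, produced_at=10.0, payload="older, busy partition"),
    Record(partition=1, offset=3, produced_at=42.0, payload="newer, quiet partition"),
]
```

Here `latest_by_offset(records)` picks the record on partition 0 even though the record on partition 1 was produced later, while `high_water_marks(records)` keeps offsets where they are meaningful: per partition.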
[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14897 We also need to update [the Analyzer rule `ResolveRelations`](https://github.com/apache/spark/blob/248922fd4fb7c11a40304431e8cc667a8911a906/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L461-L462). Otherwise, the following query will fail:
```Scala
sql(s"SELECT * from $globalTempDB.src").show()
```
[GitHub] spark pull request #14359: [SPARK-16719][ML] Random Forests should communica...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14359
[GitHub] spark issue #14359: [SPARK-16719][ML] Random Forests should communicate fewe...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/14359 Merging with master
[GitHub] spark pull request #14897: [SPARK-17338][SQL] add global temp view
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14897#discussion_r80183259 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -188,6 +199,10 @@ class SessionCatalog( def setCurrentDatabase(db: String): Unit = { val dbName = formatDatabaseName(db) +if (dbName == globalTempDB) { --- End diff -- When `globalTempDB` is set to a name that is not in lower case, this comparison is incorrect. Thus, `formatDatabaseName` needs to be applied to both sides.
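The failure mode described in this review comment can be sketched in a language-neutral way. The Python below is only an illustration, not the Scala code under review: `format_database_name` is a hypothetical stand-in that assumes the formatting step lower-cases names, and both helper functions are invented for this example.

```python
def format_database_name(name: str) -> str:
    # Hypothetical stand-in: assumes formatting means lower-casing the name.
    return name.lower()


def is_global_temp_db_one_sided(db: str, global_temp_db: str) -> bool:
    # Mirrors the pattern in the diff: only the left-hand side is formatted.
    return format_database_name(db) == global_temp_db


def is_global_temp_db(db: str, global_temp_db: str) -> bool:
    # Suggested fix: format both sides before comparing.
    return format_database_name(db) == format_database_name(global_temp_db)
```

With a configured name such as `"Global_Temp"`, the one-sided check never matches, not even against the identical string, because the left side has been lower-cased and the right side has not.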
[GitHub] spark issue #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged file e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65811/ Test PASSed.
[GitHub] spark issue #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged file e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15210 Merged build finished. Test PASSed.
[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14698#discussion_r80182811 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala --- @@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks { // some expression is reusing variable names across different instances. // This behavior is tested in ExpressionEvalHelperSuite. val plan = generateProject( - GenerateUnsafeProjection.generate( + UnsafeProjection.create( --- End diff -- @lw-lin without this patch's changes to ExpressionEvalHelper.scala, this test still passes.
[GitHub] spark issue #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged file e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15210 **[Test build #65811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65811/consoleFull)** for PR 15210 at commit [`20a6c4b`](https://github.com/apache/spark/commit/20a6c4b2116c8b41bf675e40c0bb9a5297225051). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15172 **[Test build #65813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65813/consoleFull)** for PR 15172 at commit [`da8aee6`](https://github.com/apache/spark/commit/da8aee619310cbe3525626bb652dcaf53beed42d).
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Merged build finished. Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65810/ Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65810/consoleFull)** for PR 15090 at commit [`bb19f72`](https://github.com/apache/spark/commit/bb19f72789abc960efb937712512c0716fecd800). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14897: [SPARK-17338][SQL] add global temp view
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14897#discussion_r80180856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -36,6 +36,9 @@ import org.apache.spark.sql.catalyst.util.StringUtils object SessionCatalog { val DEFAULT_DATABASE = "default" + + val GLOBAL_TEMP_DB_CONF_KEY = "spark.sql.database.globalTemp" --- End diff -- Should we follow `spark.sql.catalogImplementation` and define it [here](https://github.com/apache/spark/blob/2cd1bfa4f0c6625b0ab1dbeba2b9586b9a6a9f42/core/src/main/scala/org/apache/spark/internal/config/package.scala#L95-L99)?
[GitHub] spark pull request #15209: replace function type with function isinstance
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15209#discussion_r80180841 --- Diff: python/pyspark/ml/linalg/__init__.py --- @@ -101,7 +101,7 @@ def _vector_size(v): return len(v) elif type(v) in (array.array, list, tuple, xrange): --- End diff -- If this change is legitimate, we should change this to `isinstance(v, (array.array, list, tuple, xrange))`
[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15172 retest this please
[GitHub] spark pull request #15204: [SPARK-17639][build] Add jce.jar to buildclasspat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15204
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15204 Merging to master.
[GitHub] spark issue #15209: replace function type with function isinstance
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15209 I think we need a JIRA because `type` and `isinstance` are not exactly the same. Also, it would be better if the PR description explained the bug and how this PR tries to resolve it. BTW, it seems you intend to support sub-classes via `isinstance` consistently across the API, right? If so, there are some instances similar to this. Maybe we should check those as well.
```
$ grep -r "type(.*) [=|\!]" . | grep .python | grep -v "tests.py"
./python/pyspark/ml/linalg/__init__.py:elif type(v) == np.ndarray:
./python/pyspark/ml/linalg/__init__.py:if type(other) == np.ndarray:
./python/pyspark/ml/linalg/__init__.py:if type(pairs) == dict:
./python/pyspark/ml/param/__init__.py:if type(value) == list:
./python/pyspark/ml/param/__init__.py:elif type(value) == np.unicode_:
./python/pyspark/ml/param/__init__.py:if type(value) == bool:
./python/pyspark/mllib/linalg/__init__.py:elif type(v) == np.ndarray:
./python/pyspark/mllib/linalg/__init__.py:if type(other) == np.ndarray:
./python/pyspark/mllib/linalg/__init__.py:if type(pairs) == dict:
./python/pyspark/mllib/stat/_statistics.py:if type(y) == str:
./python/pyspark/sql/column.py:if type(startPos) != type(length):
./python/pyspark/sql/readwriter.py:if type(path) != list:
./python/pyspark/sql/readwriter.py:if type(path) == list:
./python/pyspark/sql/streaming.py:if type(interval) != str or len(interval.strip()) == 0:
./python/pyspark/sql/streaming.py:if type(path) != str or len(path.strip()) == 0:
./python/pyspark/sql/streaming.py:if not outputMode or type(outputMode) != str or len(outputMode.strip()) == 0:
./python/pyspark/sql/streaming.py:if not queryName or type(queryName) != str or len(queryName.strip()) == 0:
./python/pyspark/sql/streaming.py:if type(processingTime) != str or len(processingTime.strip()) == 0:
./python/pyspark/sql/types.py:return type(self) == type(other)
```
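To make the difference between `type` and `isinstance` concrete, here is a minimal Python 3 sketch. `TaggedList` and the two `accepts_*` helpers are hypothetical names invented for this illustration, and `xrange` is left out of the accepted tuple because it is not a builtin in Python 3.

```python
import array


class TaggedList(list):
    """Hypothetical user-defined subclass of list."""


ACCEPTED = (array.array, list, tuple)


def accepts_by_type(v):
    # Exact-type check: subclasses of the accepted types are rejected.
    return type(v) in ACCEPTED


def accepts_by_isinstance(v):
    # Subclass-aware check: any subclass of an accepted type passes.
    return isinstance(v, ACCEPTED)


plain = [1.0, 2.0]
tagged = TaggedList([1.0, 2.0])

# The same distinction applies to builtins: bool is a subclass of int,
# so an isinstance check against int also accepts True and False.
bool_is_int_by_type = type(True) == int
bool_is_int_by_isinstance = isinstance(True, int)
```

Both checks accept `plain`, but only `accepts_by_isinstance` accepts `tagged`. The `bool`/`int` case shows why a blanket switch to `isinstance` can change behaviour in places like those in the grep output above, so each site probably deserves an individual look.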
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15204 Merged build finished. Test PASSed.
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65808/ Test PASSed.
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15204 **[Test build #65808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65808/consoleFull)** for PR 15204 at commit [`33fed28`](https://github.com/apache/spark/commit/33fed28341d387def52a915c762c3db8f5c01abd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15200: Skip building R vignettes if Spark is not built
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15200 We could hand-write the result instead of running test/Spark to generate the vignettes? That seems problematic if the outputs get out of sync, and there is a similar problem if we build the docs without the jar and then just skip the vignettes. Maybe we should add the vignettes to the `-Psparkr` profile in Maven?
[GitHub] spark issue #14818: [SPARK-17157][SPARKR][WIP]: Add multiclass logistic regr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14818 `glm` has a `link=logit` parameter? not sure if it maps to this http://www.statmethods.net/advstats/glm.html
[GitHub] spark issue #15082: [SPARK-17528][SQL] MutableProjection should not cache co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65807/ Test PASSed.
[GitHub] spark issue #15082: [SPARK-17528][SQL] MutableProjection should not cache co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15082 Merged build finished. Test PASSed.
[GitHub] spark issue #15082: [SPARK-17528][SQL] MutableProjection should not cache co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15082 **[Test build #65807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65807/consoleFull)** for PR 15082 at commit [`c56de6d`](https://github.com/apache/spark/commit/c56de6da72c18b2cd1f65eed956cdee89371b075). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15207: [SPARK-17643] Remove comparable requirement from Offset
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15207 LGTM. You probably already checked this, but FWIW I verified that the Kafka topic deletion test does pass once this is merged: https://github.com/koeninger/spark-1/tree/kafka-source-deletion
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14803 **[Test build #65812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65812/consoleFull)** for PR 14803 at commit [`e21536e`](https://github.com/apache/spark/commit/e21536e7c20253cf2c04f80041592d8b095dbff4).
[GitHub] spark issue #15208: [SPARK-17641][SQL] Collect_list/Collect_set should not c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65806/ Test PASSed.
[GitHub] spark issue #15208: [SPARK-17641][SQL] Collect_list/Collect_set should not c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15208 Merged build finished. Test PASSed.
[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14803 > * What error is printed (if any) if an invalid partition directory is created midstream. The error is: [info] org.apache.spark.sql.streaming.StreamingQueryException: Query query-14 terminated with exception: assertion failed: Conflicting partition column names detected: [info] [info] Partition column name list #0: partition2 [info] Partition column name list #1: partition [info] [info] For partitioned table directories, data files should only live in leaf directories. [info] And directories at the same level should have the same partition column name. [info] Please check the following directories for unexpected files or inconsistent partition column names: [info] [info] file:/root/repos/spark-1/target/tmp/streaming.src-c3a9895d-7be1-4ded-9154-7a24026513d7/partition2=bar [info] file:/root/repos/spark-1/target/tmp/streaming.src-c3a9895d-7be1-4ded-9154-7a24026513d7/partition=bar [info] file:/root/repos/spark-1/target/tmp/streaming.src-c3a9895d-7be1-4ded-9154-7a24026513d7/partition=foo [info] at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:211) [info] at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:124) [info] Cause: java.lang.AssertionError: assertion failed: Conflicting partition column names detected: [info] [info] Partition column name list #0: partition2 [info] Partition column name list #1: partition > * Are we okay if all of the data disappears (that has already been processed) and then new data arrives? I enhanced the added test to cover this case. It is okay, if I understand your point correctly.
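For readers who want to see what the "conflicting partition column names" check amounts to, here is a minimal sketch. It is not Spark's partition discovery code; `PartitionNameCheck`, `columnNames`, and `hasConflict` are hypothetical names, and the sketch assumes partition directories are the path segments of the form `name=value`.

```scala
// Hypothetical sketch, not Spark's implementation: it only illustrates the
// rule that directories at the same level must share partition column names.
object PartitionNameCheck {
  /** Extracts column names from path segments of the form `name=value`. */
  def columnNames(path: String): Seq[String] =
    path.split("/").toSeq.filter(_.contains("=")).map(_.takeWhile(_ != '='))

  /** Distinct column-name lists across all paths; more than one is a conflict. */
  def distinctNameLists(paths: Seq[String]): Seq[Seq[String]] =
    paths.map(columnNames).distinct

  def hasConflict(paths: Seq[String]): Boolean =
    distinctNameLists(paths).size > 1
}
```

With the three directories from the error above (`partition2=bar`, `partition=bar`, `partition=foo`), the sketch yields two distinct name lists, `partition2` and `partition`, which is the situation the assertion reports.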
[GitHub] spark issue #15208: [SPARK-17641][SQL] Collect_list/Collect_set should not c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15208 **[Test build #65806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65806/consoleFull)** for PR 15208 at commit [`37c4539`](https://github.com/apache/spark/commit/37c4539978f4e92fef9055dfae292b22392a0bf8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15195: [SPARK-17632][SQL]make console sink and other sin...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/15195#discussion_r80178129 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -290,8 +284,8 @@ final class DataStreamWriter[T] private[sql](ds: Dataset[T]) { df, dataSource.createSink(outputMode), outputMode, -useTempCheckpointLocation = useTempCheckpointLocation, -recoverFromCheckpointLocation = recoverFromCheckpointLocation, +useTempCheckpointLocation = true, --- End diff -- AFAIK, it is not suitable to use a temporary checkpoint location for sinks other than "console": the temporary directory is deleted after the process finishes, so the ability to recover from a checkpoint is lost. Also, `ConsoleSink` provides no consistency or failure recovery guarantee, so `recoverFromCheckpointLocation` should not be set to `true` for it. Correct me if I'm wrong.
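The policy argued for in this review comment can be sketched as a small lookup from sink name to checkpoint flags. This is only an illustration of the reviewer's point, not the code in `DataStreamWriter`; `CheckpointPolicy`, `CheckpointFlags`, and `flagsFor` are hypothetical names, and only the two flag names come from the diff.

```scala
// Hypothetical sketch of the reviewer's suggestion; not Spark's actual API.
object CheckpointPolicy {
  /** The two flags discussed in the diff, grouped for illustration. */
  final case class CheckpointFlags(
      useTempCheckpointLocation: Boolean,
      recoverFromCheckpointLocation: Boolean)

  def flagsFor(sinkName: String): CheckpointFlags = sinkName match {
    // The console sink has no recovery guarantee, so a throwaway temporary
    // directory is acceptable and recovery is disabled.
    case "console" => CheckpointFlags(true, false)
    // Other sinks need a durable, user-supplied location to recover from.
    case _ => CheckpointFlags(false, true)
  }
}
```

Under this sketch, hardcoding `useTempCheckpointLocation = true` for every sink corresponds to returning the "console" flags unconditionally, which is what the comment objects to.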
[GitHub] spark issue #14644: [SPARK-14082][MESOS] Enable GPU support with Mesos
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/14644 @klueska Just updated the patch and I think it's using the right semantics now, where it has a global gpus max just like cores. Can you try it out?
[GitHub] spark issue #15207: [SPARK-17643] Remove comparable requirement from Offset
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65805/ Test PASSed.
[GitHub] spark issue #15207: [SPARK-17643] Remove comparable requirement from Offset
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15207 Merged build finished. Test PASSed.
[GitHub] spark issue #15207: [SPARK-17643] Remove comparable requirement from Offset
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15207 **[Test build #65805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65805/consoleFull)** for PR 15207 at commit [`76ae1ba`](https://github.com/apache/spark/commit/76ae1ba1d899b02195e6008337d38739a81f6874). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait Offset extends Serializable `
[GitHub] spark issue #15159: [SPARK-17605][SPARK_SUBMIT] Add option spark.usePython a...
Github user zjffdu commented on the issue: https://github.com/apache/spark/pull/15159 Add @rxin @davies @JoshRosen @shivaram for more feedback.
[GitHub] spark issue #15205: [SPARK-16240][ML] ML persistence backward compatibility ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15205 Merged build finished. Test PASSed.
[GitHub] spark issue #15205: [SPARK-16240][ML] ML persistence backward compatibility ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15205 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65804/ Test PASSed.
[GitHub] spark issue #15205: [SPARK-16240][ML] ML persistence backward compatibility ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15205 **[Test build #65804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65804/consoleFull)** for PR 15205 at commit [`a3d02ce`](https://github.com/apache/spark/commit/a3d02ce8ce8ccadd59c3df0c9748367a379f1b1d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged file e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15210 **[Test build #65811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65811/consoleFull)** for PR 15210 at commit [`20a6c4b`](https://github.com/apache/spark/commit/20a6c4b2116c8b41bf675e40c0bb9a5297225051).
[GitHub] spark pull request #15210: [SPARK-17604][SQL][Streaming] Supprt purging aged...
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/15210 [SPARK-17604][SQL][Streaming] Supprt purging aged file entries in FileStreamSourceLog ## What changes were proposed in this pull request? Currently, with [SPARK-15698](https://issues.apache.org/jira/browse/SPARK-15698), the FileStreamSource metadata log is compacted periodically (every 10 batches by default), which means each compacted batch file contains every file entry that has been processed so far. Over time, the compacted batch file grows very large. With [SPARK-17165](https://issues.apache.org/jira/browse/SPARK-17165), FileStreamSource no longer tracks aged file entries in memory, but the log still keeps the full history, which is unnecessary and makes recovery quite time-consuming. This PR therefore proposes adding the ability to purge aged file entries from the log as well. ## How was this patch tested? Unit test added. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jerryshao/apache-spark SPARK-17604 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15210.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15210 commit 20a6c4b2116c8b41bf675e40c0bb9a5297225051 Author: jerryshao Date: 2016-09-23T02:20:12Z Supprt purging aged file entries in FileStreamSourceLog
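The purging idea in this PR description can be sketched as a filter applied when the log is compacted. This is a simplified illustration, not the patch itself: `PurgeSketch`, `LogEntry`, `compact`, and the `maxAgeMs` parameter are hypothetical, and the sketch assumes an entry is "aged" when its timestamp trails the newest entry by more than `maxAgeMs`.

```scala
// Hypothetical sketch of purging aged entries at compaction time; the names
// and the age rule are assumptions, not the FileStreamSourceLog code.
object PurgeSketch {
  final case class LogEntry(path: String, timestamp: Long, batchId: Long)

  /** Keeps only entries within `maxAgeMs` of the newest entry's timestamp. */
  def compact(entries: Seq[LogEntry], maxAgeMs: Long): Seq[LogEntry] =
    if (entries.isEmpty) entries
    else {
      val latest = entries.map(_.timestamp).max
      entries.filter(e => latest - e.timestamp <= maxAgeMs)
    }
}
```

Because the filter runs on every compaction, the compacted file stays bounded by the number of entries inside the age window instead of growing with the full history.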
[GitHub] spark issue #15209: replace function type with function isinstance
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15209 Can one of the admins verify this patch?
[GitHub] spark pull request #15209: replace function type with function isinstance
GitHub user frankfqchen opened a pull request: https://github.com/apache/spark/pull/15209 replace function type with function isinstance ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/frankfqchen/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15209.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15209 commit a216d2d26d0a9275f0da5e97a0ceb6fb40ec1a29 Author: frankfqchen Date: 2016-09-23T03:07:31Z replace function type with function isinstance
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15204 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65802/ Test PASSed.
[GitHub] spark pull request #15089: [SPARK-15621] [SQL] Support spilling for Python U...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15089#discussion_r80174300 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/RowQueue.scala --- @@ -0,0 +1,278 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.python + +import java.io._ + +import com.google.common.io.Closeables + +import org.apache.spark.SparkException +import org.apache.spark.memory.{MemoryConsumer, TaskMemoryManager} +import org.apache.spark.sql.catalyst.expressions.UnsafeRow +import org.apache.spark.unsafe.Platform +import org.apache.spark.unsafe.memory.MemoryBlock + +/** + * A RowQueue is an FIFO queue for UnsafeRow. + */ +private[python] trait RowQueue { + /** + * Add a row to the end of it, returns true iff the row has added into it. + */ + def add(row: UnsafeRow): Boolean + + /** + * Retrieve and remove the first row, returns null if it's empty. + * + * It can only be called after add is called. + */ + def remove(): UnsafeRow + + /** + * Cleanup all the resources. + */ + def close(): Unit +} + +/** + * A RowQueue that is based on in-memory page. UnsafeRows are appended into it until it's full. 
+ * Another thread could read from it at the same time (behind the writer). + * + * The format of UnsafeRow in page: + * [4 bytes to hold length of record (N)] [N bytes to hold record] [...] + */ +private[python] abstract class InMemoryRowQueue(val page: MemoryBlock, numFields: Int) + extends RowQueue { + private val base: AnyRef = page.getBaseObject + private val endOfPage: Long = page.getBaseOffset + page.size + // the first location where a new row would be written + private var writeOffset = page.getBaseOffset + // points to the start of the next row to read + private var readOffset = page.getBaseOffset + private val resultRow = new UnsafeRow(numFields) + + def add(row: UnsafeRow): Boolean = { +val size = row.getSizeInBytes +if (writeOffset + 4 + size > endOfPage) { + // if there is not enough space in this page to hold the new record + if (writeOffset + 4 <= endOfPage) { +// if there's extra space at the end of the page, store a special "end-of-page" length (-1) +Platform.putInt(base, writeOffset, -1) + } + false +} else { + Platform.putInt(base, writeOffset, size) + Platform.copyMemory(row.getBaseObject, row.getBaseOffset, base, writeOffset + 4, size) + writeOffset += 4 + size + true +} + } + + def remove(): UnsafeRow = { +if (readOffset + 4 > endOfPage || Platform.getInt(base, readOffset) < 0) { + null +} else { + val size = Platform.getInt(base, readOffset) + resultRow.pointTo(base, readOffset + 4, size) + readOffset += 4 + size + resultRow +} + } +} + +/** + * A RowQueue that is backed by a file on disk. This queue will stop accepting new rows once any + * reader has begun reading from the queue. 
+ */ +private[python] case class DiskRowQueue(file: File, fields: Int) extends RowQueue { + private var fout = new FileOutputStream(file.toString) + private var out = new DataOutputStream(new BufferedOutputStream(fout)) + private var unreadBytes = 0L + + private var fin: FileInputStream = _ + private var in: DataInputStream = _ + private val resultRow = new UnsafeRow(fields) + + def add(row: UnsafeRow): Boolean = synchronized { +if (out == null) { + // Another thread is reading, stop writing this one + return false +} +out.writeInt(row.getSizeInBytes) +out.write(row.getBytes) +unreadBytes += 4 + row.getSizeInBytes +true + } + + def remove(): UnsafeRow = synchronized { +if (out != null) { +
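The record layout described in the quoted diff ("[4 bytes to hold length of record (N)] [N bytes to hold record]", with -1 as an end-of-page marker) can be tried out in isolation with the sketch below. It is a simplified, single-threaded stand-in, not the `InMemoryRowQueue` from the PR: `LengthPrefixedQueue` is a hypothetical name, a `java.nio.ByteBuffer` replaces the `MemoryBlock`/`Platform` calls, and records are plain byte arrays rather than `UnsafeRow`s.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for the in-memory page queue: a ByteBuffer replaces the
// MemoryBlock/Platform calls and records are plain byte arrays, not UnsafeRows.
final class LengthPrefixedQueue(capacity: Int) {
  private val page = ByteBuffer.allocate(capacity)
  private var writeOffset = 0 // where the next record would be written
  private var readOffset = 0  // start of the next record to read

  /** Appends a record; returns false when the page cannot hold it. */
  def add(record: Array[Byte]): Boolean = {
    if (writeOffset + 4 + record.length > capacity) {
      // Not enough room: leave an end-of-page marker (-1) if 4 bytes remain.
      if (writeOffset + 4 <= capacity) page.putInt(writeOffset, -1)
      false
    } else {
      page.putInt(writeOffset, record.length)
      var i = 0
      while (i < record.length) {
        page.put(writeOffset + 4 + i, record(i))
        i += 1
      }
      writeOffset += 4 + record.length
      true
    }
  }

  /** Removes and returns the oldest record, or None when nothing is unread. */
  def remove(): Option[Array[Byte]] = {
    // This sketch compares offsets instead of reading the -1 marker, which is
    // written above only to mirror the on-page format.
    if (readOffset >= writeOffset) {
      None
    } else {
      val size = page.getInt(readOffset)
      val record = Array.tabulate[Byte](size)(i => page.get(readOffset + 4 + i))
      readOffset += 4 + size
      Some(record)
    }
  }
}
```

Unlike the quoted code, `remove()` here returns an `Option` with a fresh copy of the bytes instead of `null` or a reused row object; that choice keeps the sketch easy to test but gives up the zero-copy behaviour of the original.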
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15204 Merged build finished. Test PASSed.
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15204 **[Test build #65802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65802/consoleFull)** for PR 15204 at commit [`d0136f5`](https://github.com/apache/spark/commit/d0136f585b10a7b9583a1e79417120b2d3219db2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65810/consoleFull)** for PR 15090 at commit [`bb19f72`](https://github.com/apache/spark/commit/bb19f72789abc960efb937712512c0716fecd800).
[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15206 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65803/ Test PASSed.
[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15206 Merged build finished. Test PASSed.
[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15206 **[Test build #65803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65803/consoleFull)** for PR 15206 at commit [`f506a43`](https://github.com/apache/spark/commit/f506a43b5401844e568708cee6a354c4212e3dea). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class FileEntry(path: String, timestamp: Timestamp, batchId: Long) extends Serializable`
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > I agree that if/when we add that ability to add existing partitions midstream we'd probably need to add two offsets into the SQL offset for new partitions. It's not just existing partitions. If you have a low-value, high-volume stream (which is the kind of situation where you'd want auto offset reset latest to begin with), you may not even want your first batch to include however many messages arrived between creation and subscription rebalance. I dunno, I just don't want to assume too much. > I'd also support JSON here, but I would not mandate it (i.e. try json parsing and fall back to comma separation). It's not ambiguous, supports consistent usage, and doesn't penalize the simple use cases. Cool, seems reasonable.
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15041 Merged build finished. Test FAILed.
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15041 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65809/ Test FAILed.
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15041 **[Test build #65809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65809/consoleFull)** for PR 15041 at commit [`09e0740`](https://github.com/apache/spark/commit/09e0740bead1a4b2fd888abdbbdfdd404dff1ead). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15199: [SPARK-17635][SQL] Remove hardcode "agg_plan" in HashAgg...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/15199 Thanks all, it should be fixed in master only, my mistake.
[GitHub] spark issue #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when the data ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15041 **[Test build #65809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65809/consoleFull)** for PR 15041 at commit [`09e0740`](https://github.com/apache/spark/commit/09e0740bead1a4b2fd888abdbbdfdd404dff1ead).
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15204 **[Test build #65808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65808/consoleFull)** for PR 15204 at commit [`33fed28`](https://github.com/apache/spark/commit/33fed28341d387def52a915c762c3db8f5c01abd).
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102

> "I want to be able to add a topicpartition mid stream, but I don't want to start it from the beginning."

I see, I was thinking only of new topics that appear that match your pattern. I agree that if/when we add the ability to add existing partitions midstream, we'd probably need to add two offsets into the SQL offset for new partitions.

> I think consistency in using json for any non-scalar values is worth 2 extra characters per topic and 4 at the ends.

I'd also support JSON here, but I would not mandate it (i.e. try JSON parsing and fall back to comma separation). It's not ambiguous, supports consistent usage, and doesn't penalize the simple use cases.
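The "try JSON parsing and fall back to comma separation" idea could look roughly like the sketch below. This is only an illustration in Python, not code from the PR: the function name `parse_topics` is hypothetical, and treating only a JSON array of strings as valid JSON input is an assumption.

```python
import json
from typing import List


def parse_topics(value: str) -> List[str]:
    """Hypothetical helper: accept a JSON array of topic names or a comma-separated string."""
    text = value.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: only a JSON array of strings is treated as the JSON form.
    if isinstance(parsed, list) and all(isinstance(t, str) for t in parsed):
        return parsed
    # Fallback: the simple comma-separated form, ignoring empty entries.
    return [t.strip() for t in text.split(",") if t.strip()]
```

With this shape, `"topic1,topic2"` and `'["topic1", "topic2"]'` yield the same list, so the simple case stays easy to type while JSON remains available.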
[GitHub] spark issue #15082: [SPARK-17528][SQL] MutableProjection should not cache co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15082 **[Test build #65807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65807/consoleFull)** for PR 15082 at commit [`c56de6d`](https://github.com/apache/spark/commit/c56de6da72c18b2cd1f65eed956cdee89371b075).
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14659 Merged build finished. Test PASSed.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65799/ Test PASSed.
[GitHub] spark issue #14659: [SPARK-16757] Set up Spark caller context to HDFS and YA...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14659 **[Test build #65799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65799/consoleFull)** for PR 14659 at commit [`47de8a2`](https://github.com/apache/spark/commit/47de8a2a9e1640e0ea942d1a689150d7b7a66c10). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15174: [SPARK-17502] [17609] [SQL] [Backport] [2.0] Fix ...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/15174
[GitHub] spark issue #15174: [SPARK-17502] [17609] [SQL] [Backport] [2.0] Fix Multipl...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15174 Let me close it. Thanks!
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102

@tdas I think as long as marmbrus' PR to remove comparable from the interface works for sane variations of subscription changes, it's the best way to go. I'm honestly fine with someone getting what they deserve if they delete and recreate a topic in the space of a single batch or while a stream is down.

@marmbrus

> Why do you care when it acquired it?

This isn't so much a temporal thing as a let-the-consumer-do-its-job thing. This sort of configuration should ideally be handled by auto.offset.reset, and we shouldn't bake in too much second-guessing about it. There are plenty of use cases for "I want to be able to add a topicpartition mid stream, but I don't want to start it from the beginning."

> Are you proposing users have to type

I'm saying that you guys proposed json as a workaround for the string->string thing. Given that, yeah, I think consistency in using json for any non-scalar values is worth 2 extra characters per topic and 4 at the ends.
[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15206 Merged build finished. Test PASSed.
[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15206 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65801/ Test PASSed.
[GitHub] spark issue #15174: [SPARK-17502] [17609] [SQL] [Backport] [2.0] Fix Multipl...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15174 thanks, merging to 2.0!
[GitHub] spark issue #15206: [SPARK-17640][SQL]Avoid using -1 as the default batchId ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15206 **[Test build #65801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65801/consoleFull)** for PR 15206 at commit [`d9177e5`](https://github.com/apache/spark/commit/d9177e5bd9cd89f70e4f5080587311d58a3a12f8). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class FileEntry(path: String, timestamp: Timestamp, batchId: Long) extends Serializable`
[GitHub] spark issue #15208: [SPARK-17641][SQL] Collect_list/Collect_set should not c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15208 **[Test build #65806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65806/consoleFull)** for PR 15208 at commit [`37c4539`](https://github.com/apache/spark/commit/37c4539978f4e92fef9055dfae292b22392a0bf8).
[GitHub] spark issue #15208: [SPARK-17641][SQL] Collect_list/Collect_set should not c...
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/15208 cc @mengxr
[GitHub] spark pull request #15208: [SPARK-17641][SQL] Collect_list/Collect_set shoul...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/15208

[SPARK-17641][SQL] Collect_list/Collect_set should not collect null values.

## What changes were proposed in this pull request?
We added native versions of `collect_set` and `collect_list` in Spark 2.0. These currently also (try to) collect null values, which differs from the original Hive implementation. This PR fixes this by adding a null check to the `Collect.update` method.

## How was this patch tested?
Added a regression test to `DataFrameAggregateSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-17641

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15208.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15208

commit 37c4539978f4e92fef9055dfae292b22392a0bf8
Author: Herman van Hovell
Date: 2016-09-23T01:45:38Z

    Do not collect null values.
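A minimal sketch of the behaviour described in this PR, written in Python rather than Spark's Scala. The `CollectList` class and its `update`/`result` methods are hypothetical stand-ins and not the actual `Collect` implementation; only the null check mirrors what the description says.

```python
from typing import Any, List


class CollectList:
    """Hypothetical aggregate buffer that mimics collect_list while skipping nulls."""

    def __init__(self) -> None:
        self.buffer: List[Any] = []

    def update(self, value: Any) -> None:
        # The null check: None (SQL NULL) inputs are ignored instead of collected.
        if value is not None:
            self.buffer.append(value)

    def result(self) -> List[Any]:
        # Return a copy so callers cannot mutate the internal buffer.
        return list(self.buffer)


def collect_list(values: List[Any]) -> List[Any]:
    """Run the hypothetical aggregate over a column of values."""
    agg = CollectList()
    for v in values:
        agg.update(v)
    return agg.result()
```

A regression test in this style would feed a column containing nulls and check that only the non-null values come back.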
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/15102

Comparable requirement removed in #15207.

> I think in the absence of prior information about the position in a topicpartition, you start a new batch on topic B starting from wherever the consumer's position was at the time it acquired the subscription, which might not be 0. I.e. you call position() before seekToEnd().

Why do you care when it acquired it? If it appeared in between the last batch and now, don't you want to consume all of the available data from it? Otherwise the answer is going to depend on the specifics of when you see the topic, which seems counter to the model of Structured Streaming.

> I think the main thing that would be confusing is to specify topics in one way (custom-delimited string) for one configuration, and in another way (structured json) for another configuration.

Are you proposing users have to type `"[\"topic1\", \"topic2\"]"` (or pull in a json library) instead of `"topic1,topic2"`? Seems we could pretty seamlessly add support for JSON in the future, while still making the common case easy to type.
[GitHub] spark issue #15207: [SPARK-17643] Remove comparable requirement from Offset
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15207 **[Test build #65805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65805/consoleFull)** for PR 15207 at commit [`76ae1ba`](https://github.com/apache/spark/commit/76ae1ba1d899b02195e6008337d38739a81f6874).
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65796/ Test PASSed.
[GitHub] spark pull request #15207: [SPARK-17643] Remove comparable requirement from ...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/15207

[SPARK-17643] Remove comparable requirement from Offset

For some sources, it is difficult to provide a global ordering based only on the data in the offset. Since we don't use comparison for correctness, let's remove it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark removeComparable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15207.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15207

commit 76ae1ba1d899b02195e6008337d38739a81f6874
Author: Michael Armbrust
Date: 2016-09-23T01:34:38Z

    [SPARK-17643] Remove comparable requirement from Offset
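To illustrate why a global ordering can be hard to derive from the offset data alone, here is a small Python sketch. It assumes, purely for illustration, that an offset is a map from a topic-partition name to a position; `compare_offsets` is a hypothetical helper, not part of Spark. Two such maps are only partially ordered: some pairs simply cannot be compared.

```python
from typing import Dict, Optional


def compare_offsets(a: Dict[str, int], b: Dict[str, int]) -> Optional[int]:
    """Return -1, 0, or 1 when a and b are comparable, or None when they are not."""
    # Different sets of partitions (e.g. a topic was added or deleted): no ordering.
    if a.keys() != b.keys():
        return None
    less = any(a[k] < b[k] for k in a)
    greater = any(a[k] > b[k] for k in a)
    # One partition moved forward while another moved backward: no ordering.
    if less and greater:
        return None
    if less:
        return -1
    if greater:
        return 1
    return 0
```

Because `None` is a legitimate outcome here, a `compareTo`-style contract that must always return an answer does not fit this kind of offset well.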
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65796/consoleFull)** for PR 14971 at commit [`4c89d92`](https://github.com/apache/spark/commit/4c89d92ab65d7f4f061e32aa22780fd6e4b7c798). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15102

@koeninger I did some independent brainstorming with @zsxwing on topic deletion, and yeah, I agree with you that attempting to account for deleted topics in the KafkaSourceOffset such that compareTo is satisfied is more complicated than just eliminating compareTo.

That said, there are still a few corner cases, such as the same topic being deleted and recreated. I am not familiar with how often this can happen (let us know your thoughts). But the general idea we can implement is that we attach a unique id to the topic in the KafkaSourceOffset. Whenever a new topic is detected (while running or across query restarts), generate a unique id so that it is considered a new topic. Here are the options:

**Option 1: When getOffset detects new topic, if the topic existed in previous offset, create new (topic, unique id)**
- Pro: Simple
- Con: Cannot detect if topic gets deleted+recreated between triggers (possibly, across query restarts)

**Option 2: Use RebalanceListener to know when topic has been deleted**
- Pro: Handles topic deletion+recreation between triggers while query is active
- Con: Misses deletion+recreation during query restarts
- Con: Listener called on different thread, so possible race conditions

**Option 3: Use the creation time / cZxid of topic info stored in ZK to disambiguate**
- Pro: Zookeeper maintains uniqueness across any component restarts
- Con: Requires depending on full Kafka + ZK
- Con: Requires knowing the exact ZK path where topics are saved, but this can be tested and made sure that it never fails when we upgrade Kafka

I feel that we should just keep it simple for now, and go for Option 1. What do you think?
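One possible reading of Option 1, sketched in Python. Everything here is an assumption made for illustration: the offset is modelled as a plain map from topic name to unique id, and `assign_topic_ids` is a hypothetical helper, not code from the PR. A topic that is visible now and was also present in the previous offset keeps its id; any topic not present in the previous offset gets a fresh one.

```python
import uuid
from typing import Dict, Iterable


def assign_topic_ids(previous: Dict[str, str], visible_topics: Iterable[str]) -> Dict[str, str]:
    """Carry over ids for known topics and mint new ids for newly detected ones."""
    current: Dict[str, str] = {}
    for topic in visible_topics:
        # Reuse the id if the topic was in the previous offset, else generate one.
        current[topic] = previous.get(topic) or uuid.uuid4().hex
    return current
```

This also shows the stated drawback: if a topic is deleted and recreated between two calls, it is still present in both snapshots, so it keeps its old id and the recreation goes unnoticed.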
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65795/ Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14971 Merged build finished. Test PASSed.
[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14971 **[Test build #65795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65795/consoleFull)** for PR 14971 at commit [`f4c0ebb`](https://github.com/apache/spark/commit/f4c0ebb0901216ea09eaf3f77e4fdcd431b15d37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15204: [SPARK-17639][build] Add jce.jar to buildclasspath when ...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/15204 If tests pass I'll merge this to unblock #15172.
[GitHub] spark issue #15205: [SPARK-16240][ML] ML persistence backward compatibility ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15205 **[Test build #65804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65804/consoleFull)** for PR 15205 at commit [`a3d02ce`](https://github.com/apache/spark/commit/a3d02ce8ce8ccadd59c3df0c9748367a379f1b1d).
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65797/ Test PASSed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9766 Merged build finished. Test PASSed.
[GitHub] spark issue #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to register Jav...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9766 **[Test build #65797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65797/consoleFull)** for PR 9766 at commit [`dc31d78`](https://github.com/apache/spark/commit/dc31d78381e325e2b9af406bd1701594941866c9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.