[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15359 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15359 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66377/ Test PASSed.
[GitHub] spark pull request #14828: [SPARK-17258][SQL] Parse scientific decimal liter...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14828
[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15359 **[Test build #66377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66377/consoleFull)** for PR 15359 at commit [`0d9a9c7`](https://github.com/apache/spark/commit/0d9a9c74ca714b1df3dde50f2c0386a4a974fa73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15355 Actually we do have the infra to not run these tests if it is just an unrelated module change. Were those not set up for Kafka?
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #14828: [SPARK-17258][SQL] Parse scientific decimal literals as ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14828 LGTM - merging in master.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66369/ Test PASSed.
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15318 Yea @gatorsmile you should try.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66369/consoleFull)** for PR 15307 at commit [`e708b3b`](https://github.com/apache/spark/commit/e708b3b86a69833169962713ce8bef88bcbdc2f7). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait PeriodicWarning ` * ` case class CallLocation(className: String, lineNum: Int, threadId: Long)`
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15356 Ah, sure. I just wanted to include as many as I could as I already opened this but I will keep it in mind. Thank you for your tip.
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81908785
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -185,25 +203,36 @@ class StreamExecution(
       SparkSession.setActiveSession(sparkSession)
       triggerExecutor.execute(() => {
-        if (isActive) {
-          if (currentBatchId < 0) {
-            // We'll do this initialization only once
-            populateStartOffsets()
-            logDebug(s"Stream running from $committedOffsets to $availableOffsets")
-          } else {
-            constructNextBatch()
-          }
-          if (dataAvailable) {
-            runBatch()
-            // We'll increase currentBatchId after we complete processing current batch's data
-            currentBatchId += 1
+        streamMetrics.reportTriggerStarted(currentBatchId)
+        streamMetrics.reportTriggerInfo(STATUS_MESSAGE, "Finding new data from sources")
+        val isTerminated = timeIt(TRIGGER_LATENCY) {
+          if (isActive) {
+            if (currentBatchId < 0) {
+              // We'll do this initialization only once
+              populateStartOffsets()
+              logDebug(s"Stream running from $committedOffsets to $availableOffsets")
+            } else {
+              constructNextBatch()
+            }
+            if (dataAvailable) {
+              streamMetrics.reportTriggerInfo(STATUS_MESSAGE, "Processing new data")
+              streamMetrics.reportTriggerInfo(DATA_AVAILABLE, true)
+              runBatch()
+              // We'll increase currentBatchId after we complete processing current batch's data
+              currentBatchId += 1
+            } else {
+              streamMetrics.reportTriggerInfo(STATUS_MESSAGE, "No new data")
+              streamMetrics.reportTriggerInfo(DATA_AVAILABLE, false)
+              Thread.sleep(100)
--- End diff --
Fixed it. I goofed up while merging with master.
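The diff above wraps the trigger body in `timeIt(TRIGGER_LATENCY) { ... }` and keeps the block's result as `isTerminated`. The helper's implementation is not shown in this thread, so the following is only a minimal sketch of how a by-name timing wrapper of that shape could look. The object name `TimeItSketch`, the `latenciesMs` map, and the `"triggerLatency"` key are hypothetical and are not taken from the PR.

```scala
import scala.collection.mutable

object TimeItSketch {
  // Hypothetical metric store: metric name -> elapsed wall-clock milliseconds.
  val latenciesMs: mutable.Map[String, Long] = mutable.Map.empty

  // Runs `body`, records how long it took under `name`, and returns body's result.
  // The latency is recorded in `finally`, so it is stored even if `body` throws.
  def timeIt[T](name: String)(body: => T): T = {
    val start = System.nanoTime()
    try {
      body
    } finally {
      latenciesMs(name) = (System.nanoTime() - start) / 1000000L
    }
  }

  def main(args: Array[String]): Unit = {
    val isTerminated = timeIt("triggerLatency") {
      Thread.sleep(10) // stand-in for finding and processing new data
      false
    }
    println(s"terminated=$isTerminated, latencies=$latenciesMs")
  }
}
```

Taking the body as a by-name parameter lets the caller keep the timed code inline as a block while still getting the block's value back, which matches the way the diff assigns the result of `timeIt` to a `val`.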
[GitHub] spark issue #15318: [SPARK-17750][SQL] Fix CREATE VIEW with INTERVAL arithme...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/15318 Hi, @gatorsmile . Could you merge this PR? :)
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66367/ Test PASSed.
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15356 Same thing, I think. I agree it should _probably_ be closed but given that there's not a big hurry, I like to do the courtesy of giving at least days if not a week for someone to reply. I just put it on my big to-do list for later (FWIW I manage all this stuff with Gmail Inbox snooze / reminders and it works quite well for me, to easily "check on X and act in Y days")
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15249 Merged build finished. Test PASSed.
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15249 **[Test build #66367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66367/consoleFull)** for PR 15249 at commit [`a6c863f`](https://github.com/apache/spark/commit/a6c863f2462986b66a93f0beac3bb1f163afa50d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15357 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66368/ Test PASSed.
[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15357 Merged build finished. Test PASSed.
[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15357 **[Test build #66368 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66368/consoleFull)** for PR 15357 at commit [`4b60195`](https://github.com/apache/spark/commit/4b601951e4b3311501363ac4de864c4bf9a1a756). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15356 Do you mind if I ask to confirm https://github.com/apache/spark/pull/12355 too? If you are uncertain, I'd like to not add it here.
[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh
Github user weiqingy commented on the issue: https://github.com/apache/spark/pull/15246 Hi, @srowen Could you please re-trigger jenkins to retest this? I think the failure of `org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite.pattern based subscription` is not caused by the change of this PR. Thank you very much.
[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15351 **[Test build #66379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66379/consoleFull)** for PR 15351 at commit [`5ced339`](https://github.com/apache/spark/commit/5ced339c1d6fd64c3bdfcb2af3522dc88ede8d85).
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15356 Merged build finished. Test PASSed.
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15356 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66366/ Test PASSed.
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15356 **[Test build #66366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66366/consoleFull)** for PR 15356 at commit [`a307b5e`](https://github.com/apache/spark/commit/a307b5e40d59e5ce40a0c3986a6db1553acea50a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15359: [Minor][ML] Avoid 2D array flatten in NB training...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15359#discussion_r81907248
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/NaiveBayes.scala ---
@@ -177,7 +177,7 @@ class NaiveBayes @Since("1.5.0") (
     val numDocuments = aggregated.map(_._2._1).sum
     val piArray = Array.fill[Double](numLabels)(0.0)
-    val thetaArrays = Array.fill[Double](numLabels, numFeatures)(0.0)
+    val thetaArray = Array.fill[Double](numLabels * numFeatures)(0.0)
--- End diff --
LGTM. While we're here, is it simpler / more efficient to just call `new Array[Double](...)` rather than also fill it with 0, which it's already initialized to?
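The review question above rests on the fact that a freshly allocated JVM `double` array already holds `0.0` in every slot, so `new Array[Double](n)` and `Array.fill[Double](n)(0.0)` produce the same contents. The sketch below illustrates that point only; the object name, the sizes, and the `index` helper are hypothetical, and the row-major layout is an assumption made for illustration rather than something this thread confirms about the patch.

```scala
object ZeroInitDemo {
  // Hypothetical sizes, chosen only for illustration.
  val numLabels = 3
  val numFeatures = 4

  // Allocates the array, then evaluates the fill element (0.0) for every slot.
  val filled: Array[Double] = Array.fill[Double](numLabels * numFeatures)(0.0)

  // A new primitive double array on the JVM is already zero-initialized.
  val allocated: Array[Double] = new Array[Double](numLabels * numFeatures)

  // Assumed row-major index into the flat array, standing in for
  // the two-level access thetaArrays(label)(feature).
  def index(label: Int, feature: Int): Int = label * numFeatures + feature

  def main(args: Array[String]): Unit = {
    // Both construction styles yield identical contents.
    assert(filled.sameElements(allocated))
    allocated(index(1, 2)) = 0.5
    println(allocated.mkString(", "))
  }
}
```

Keeping a single flat array means no later flatten step is needed, at the cost of computing the offset by hand on each access.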
[GitHub] spark issue #15239: [SPARK-17665][SPARKR] Support options/mode all for read/...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15239 Sure, I will do it by tomorrow.
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15360 cc @cloud-fan @gatorsmile
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15090 @gatorsmile Hive tables don't support case sensitive column names, so I use data source tables in the added test cases. See [the followup pr](https://github.com/apache/spark/pull/15360)
[GitHub] spark issue #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-level sta...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15360 **[Test build #66378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66378/consoleFull)** for PR 15360 at commit [`0ad7c88`](https://github.com/apache/spark/commit/0ad7c8837d0ef860e398349652f7589870358c14).
[GitHub] spark pull request #15360: [SPARK-17073] [SQL] [FOLLOWUP] generate column-le...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/15360
[SPARK-17073] [SQL] [FOLLOWUP] generate column-level statistics
## What changes were proposed in this pull request?
This pr adds some test cases for statistics: case sensitive column names, non ascii column names, refresh table, and also improves some documentation.
## How was this patch tested?
add test cases
You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark colStats2
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15360.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15360
commit 0ad7c8837d0ef860e398349652f7589870358c14 Author: Zhenhua Wang Date: 2016-10-05T06:06:56Z add test cases and improve documentation
[GitHub] spark issue #15239: [SPARK-17665][SPARKR] Support options/mode all for read/...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15239 I think this has a conflict now that SPARK-17658 is merged. Could you bring this up to date, please?
[GitHub] spark issue #15231: [SPARK-17658][SPARKR] read.df/write.df API taking path o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15231 Thanks again for your close review. Will keep in mind the comments.
[GitHub] spark pull request #15231: [SPARK-17658][SPARKR] read.df/write.df API taking...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15231
[GitHub] spark issue #15231: [SPARK-17658][SPARKR] read.df/write.df API taking path o...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15231 I've merged this to master. We could take some example improvements along with SPARK-17665
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15292 Merged build finished. Test PASSed.
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15292 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66365/ Test PASSed.
[GitHub] spark issue #15292: [SPARK-17719][SPARK-17776][SQL] Unify and tie up options...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15292 **[Test build #66365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66365/consoleFull)** for PR 15292 at commit [`3fa9f43`](https://github.com/apache/spark/commit/3fa9f43686f1195a9f86ab1bcda054119c332a20). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15231: [SPARK-17658][SPARKR] read.df/write.df API taking path o...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15231 I also think it would be great to add two examples, one for read.df and one for write.df, that omit the path parameter (like a jdbc one).
[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15359 **[Test build #66377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66377/consoleFull)** for PR 15359 at commit [`0d9a9c7`](https://github.com/apache/spark/commit/0d9a9c74ca714b1df3dde50f2c0386a4a974fa73).
[GitHub] spark pull request #15231: [SPARK-17658][SPARKR] read.df/write.df API taking...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15231#discussion_r81904325

    --- Diff: R/pkg/inst/tests/testthat/test_utils.R ---
    @@ -167,10 +167,13 @@ test_that("convertToJSaveMode", {
     })

     test_that("captureJVMException", {
    -  expect_error(tryCatch(callJStatic("org.apache.spark.sql.api.r.SQLUtils", "getSQLDataType",
    +  method <- "getSQLDataType"
    +  expect_error(tryCatch(callJStatic("org.apache.spark.sql.api.r.SQLUtils", method,
    --- End diff --

    Let's change this test to `handledCallJStatic` too in a follow-up?
[GitHub] spark issue #15359: [Minor][ML] Avoid 2D array flatten in NB training.
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15359 cc @zhengruifeng @sethah
[GitHub] spark pull request #15359: [Minor][ML] Avoid 2D array flatten in NB training...
GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/15359

    [Minor][ML] Avoid 2D array flatten in NB training.

    ## What changes were proposed in this pull request?
    Avoid flattening a 2D array in `NaiveBayes` training, since the flatten method might be expensive (it creates another array and copies the data into it).

    ## How was this patch tested?
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark nb-theta

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15359.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15359

commit 0d9a9c74ca714b1df3dde50f2c0386a4a974fa73
Author: Yanbo Liang
Date:   2016-10-05T05:39:13Z

    Avoid 2D array flatten in NB training.
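To illustrate the cost the pull request description refers to, here is a minimal sketch in plain Scala. It is not the code from the patch: `FlattenSketch`, `viaFlatten`, and `direct` are hypothetical names, and the row-major layout is an assumption made for the example. The first method builds a 2D array and then flattens it, which allocates a second array and copies every element; the second writes straight into a preallocated flat array.

```scala
object FlattenSketch {
  // Hypothetical helper: materialise a 2D array first, then flatten it.
  // `flatten` allocates a new array and copies every element into it.
  def viaFlatten(numRows: Int, numCols: Int)(f: (Int, Int) => Double): Array[Double] = {
    val rows: Array[Array[Double]] = Array.tabulate(numRows, numCols)(f)
    rows.flatten
  }

  // Hypothetical helper: write each value directly into a preallocated
  // flat array (row-major order), so no intermediate 2D array is needed.
  def direct(numRows: Int, numCols: Int)(f: (Int, Int) => Double): Array[Double] = {
    val flat = new Array[Double](numRows * numCols)
    var i = 0
    while (i < numRows) {
      var j = 0
      while (j < numCols) {
        flat(i * numCols + j) = f(i, j)
        j += 1
      }
      i += 1
    }
    flat
  }
}
```

Both methods return the same values in the same order; the difference is only in how many arrays are allocated along the way.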
[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15258 **[Test build #66374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66374/consoleFull)** for PR 15258 at commit [`01cb666`](https://github.com/apache/spark/commit/01cb6664ea9ea2da7bc861432c19e3ac14ede524).
[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15262 **[Test build #66373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66373/consoleFull)** for PR 15262 at commit [`3a1cd22`](https://github.com/apache/spark/commit/3a1cd221402f4ade6b496996b81665ad19ce3e86).
[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15355 Merged build finished. Test PASSed.
[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15355 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66361/ Test PASSed.
[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14151 **[Test build #66375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66375/consoleFull)** for PR 14151 at commit [`e263b15`](https://github.com/apache/spark/commit/e263b1508a77424b371a0796ea4f9c05bc1c0121).
[GitHub] spark issue #14087: [SPARK-16411][SQL][STREAMING] Add textFile to Structured...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14087 **[Test build #66376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66376/consoleFull)** for PR 14087 at commit [`ecdf653`](https://github.com/apache/spark/commit/ecdf6539c8c19da3f019601309993fde634d6c22).
[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15355

**[Test build #66361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66361/consoleFull)** for PR 15355 at commit [`b7074d4`](https://github.com/apache/spark/commit/b7074d48159804035eaf00e1abed35e408684b42).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15354 **[Test build #66372 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66372/consoleFull)** for PR 15354 at commit [`5f185e3`](https://github.com/apache/spark/commit/5f185e36aba86865e2cae772351e90fb8bec6492).
[GitHub] spark issue #12355: [SPARK-14344][SQL] Not creating meta files when summary-...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12355

It seems to be working fine now, so this no longer appears to be a problem.

```scala
test("SPARK-14344 - write metadata") {
  // With job summaries enabled, the summary metadata file should be written.
  withSQLConf(ParquetOutputFormat.ENABLE_JOB_SUMMARY -> "true") {
    withTempPath { dir =>
      val path = s"${dir.getCanonicalPath}/part-r-0.parquet"
      spark.range(10).write.parquet(path)
      val compressedFiles = new File(path).listFiles()
      assert(compressedFiles.exists(_.getName.endsWith("_common_metadata")))
    }
  }

  // With job summaries disabled, no summary metadata file should be written.
  withSQLConf(ParquetOutputFormat.ENABLE_JOB_SUMMARY -> "false") {
    withTempPath { dir =>
      val path = s"${dir.getCanonicalPath}/part-r-0.parquet"
      spark.range(10).write.parquet(path)
      val compressedFiles = new File(path).listFiles()
      assert(!compressedFiles.exists(_.getName.endsWith("_common_metadata")))
    }
  }
}
```
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15249 Merged build finished. Test PASSed.
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66358/ Test PASSed.
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15249

**[Test build #66358 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66358/consoleFull)** for PR 15249 at commit [`89d3c5e`](https://github.com/apache/spark/commit/89d3c5eb44939c38b0be14a6fc10c2139d0126ab).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66360/ Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14452 Merged build finished. Test PASSed.
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14452

**[Test build #66360 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66360/consoleFull)** for PR 14452 at commit [`cebfbf5`](https://github.com/apache/spark/commit/cebfbf5e3dd7b2d2365e5152991ab7ff2c63dd90).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test FAILed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66362/ Test FAILed.
[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15357#discussion_r81901737

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -265,7 +265,9 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         }

         val statement = plan(ctx.statement)
    -    if (isExplainableStatement(statement)) {
    +    if (statement == null) {
    +      null  // This is enough since ParseException will raise later.
    --- End diff --

I added a test case that intercepts that. If it returns `null`, `ParseDriver.scala` treats it as a parsing error and raises `Unsupported SQL statement`.

```
override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
  astBuilder.visitSingleStatement(parser.singleStatement()) match {
    case plan: LogicalPlan => plan
    case _ =>
      val position = Origin(None, None)
      throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
  }
}
```
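The reason a `null` result ends up in the `case _` branch of the match quoted above is that a Scala type pattern never matches `null`. A minimal, Spark-free sketch of that behaviour follows; `NullMatchSketch`, `Plan`, `UnsupportedStatement`, and `requirePlan` are hypothetical stand-ins for illustration only, not Spark's actual classes.

```scala
object NullMatchSketch {
  // Hypothetical stand-ins for Spark's LogicalPlan and ParseException.
  final case class Plan(description: String)
  final class UnsupportedStatement(message: String) extends Exception(message)

  // Same shape as the quoted match: the type pattern `plan: Plan` does not
  // match null, so a null result falls through to the default case and throws.
  def requirePlan(result: AnyRef): Plan = result match {
    case plan: Plan => plan
    case _ => throw new UnsupportedStatement("Unsupported SQL statement")
  }
}
```

Under these assumptions, `NullMatchSketch.requirePlan(null)` throws `UnsupportedStatement`, while passing a `Plan` instance returns it unchanged.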
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307

**[Test build #66362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66362/consoleFull)** for PR 15307 at commit [`f5732a5`](https://github.com/apache/spark/commit/f5732a50da7f0df326f52ad9b85da3876ecfafbc).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #12693: [SPARK-14914] Fix Resource not closed after using, mostl...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/12693 @srowen I am willing to proceed with this too if you approve and @taoli91 is not responding.
[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15357 **[Test build #66371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66371/consoleFull)** for PR 15357 at commit [`45e46a9`](https://github.com/apache/spark/commit/45e46a969919c3fb184a3678764fa094054d223a).
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12135 Merged build finished. Test PASSed.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66359/ Test PASSed.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135

**[Test build #66359 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66359/consoleFull)** for PR 12135 at commit [`89ed0cc`](https://github.com/apache/spark/commit/89ed0ccfe22e345655f33fb77b670c4d2309ecd7).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12135 Merged build finished. Test FAILed.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12135 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66363/ Test FAILed.
[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135

**[Test build #66363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66363/consoleFull)** for PR 12135 at commit [`a475090`](https://github.com/apache/spark/commit/a475090f5424752a1cfe04983d964f6fb85181b0).

 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15357#discussion_r81900796

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -265,7 +265,9 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         }

         val statement = plan(ctx.statement)
    -    if (isExplainableStatement(statement)) {
    +    if (statement == null) {
    +      null  // This is enough since ParseException will raise later.
    --- End diff --

    Sure!
[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15357#discussion_r81900460

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
    @@ -265,7 +265,9 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         }

         val statement = plan(ctx.statement)
    -    if (isExplainableStatement(statement)) {
    +    if (statement == null) {
    +      null  // This is enough since ParseException will raise later.
    --- End diff --

    Where will the ParseException be raised? Could you be a bit more specific in your comment? Maybe add a small test for this? Could you also add a few unit tests (not end-to-end) to SparkSqlParserSuite?
[GitHub] spark issue #15358: [SPARK-17783] [SQL] Hide Credentials in CREATE and DESC ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15358 **[Test build #66370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66370/consoleFull)** for PR 15358 at commit [`d3cc470`](https://github.com/apache/spark/commit/d3cc47025df10012940f281af5db94c90fc83917).
[GitHub] spark pull request #15358: Hide Credentials in CREATE and DESC FORMATTED/EXT...
GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/15358

    Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC

    ### What changes were proposed in this pull request?
    (Please fill in changes proposed in this fix)

    ### How was this patch tested?
    Added test cases

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark maskCredentials

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15358.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15358

commit d3cc47025df10012940f281af5db94c90fc83917
Author: gatorsmile
Date:   2016-10-05T04:37:35Z

    fix.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15354 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66356/ Test FAILed.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15354 Merged build finished. Test FAILed.
[GitHub] spark issue #15354: [SPARK-17764][SQL] Add `to_json` supporting to convert n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15354 **[Test build #66356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66356/consoleFull)** for PR 15354 at commit [`eec0cd3`](https://github.com/apache/spark/commit/eec0cd32bde8564a080da425be48986055523e8c). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Merged build finished. Test PASSed.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66364/ Test PASSed.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double datatypes
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #66364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66364/consoleFull)** for PR 15314 at commit [`423fd51`](https://github.com/apache/spark/commit/423fd5117e32e971e47a02728d6a863a726fc539). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15307#discussion_r81899064

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -525,8 +645,62 @@ class StreamExecution(
       case object TERMINATED extends State
     }

    -object StreamExecution {
    +object StreamExecution extends Logging {
       private val _nextId = new AtomicLong(0)

    +  /**
    +   * Get the number of input rows from the executed plan of the trigger
    +   * @param triggerExecutionPlan Execution plan of the trigger
    +   * @param triggerLogicalPlan Logical plan of the trigger, generated from the query logical plan
    +   * @param sourceToDataframe Source to DataFrame returned by the source.getBatch in this trigger
    +   */
    +  def getNumInputRowsFromTrigger(
    --- End diff --

    I managed to improve the test code, so I removed this static method.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66369/consoleFull)** for PR 15307 at commit [`e708b3b`](https://github.com/apache/spark/commit/e708b3b86a69833169962713ce8bef88bcbdc2f7).
[GitHub] spark issue #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15357 **[Test build #66368 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66368/consoleFull)** for PR 15357 at commit [`4b60195`](https://github.com/apache/spark/commit/4b601951e4b3311501363ac4de864c4bf9a1a756).
[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15307#discussion_r81899011

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -525,8 +645,62 @@ class StreamExecution(
       case object TERMINATED extends State
     }

    -object StreamExecution {
    +object StreamExecution extends Logging {
       private val _nextId = new AtomicLong(0)

    +  /**
    +   * Get the number of input rows from the executed plan of the trigger
    +   * @param triggerExecutionPlan Execution plan of the trigger
    +   * @param triggerLogicalPlan Logical plan of the trigger, generated from the query logical plan
    +   * @param sourceToDataframe Source to DataFrame returned by the source.getBatch in this trigger
    +   */
    +  def getNumInputRowsFromTrigger(
    +      triggerExecutionPlan: SparkPlan,
    +      triggerLogicalPlan: LogicalPlan,
    +      sourceToDataframe: Map[Source, DataFrame]): Map[Source, Long] = {
    +
    +    // We want to associate execution plan leaves to sources that generate them, so that we match
    +    // the their metrics (e.g. numOutputRows) to the sources. To do this we do the following.
    +    // Consider the translation from the streaming logical plan to the final executed plan.
    +    //
    +    //  streaming logical plan (with sources) <==> trigger's logical plan <==> executed plan
    +    //
    +    // 1. We keep track of streaming sources associated with each leaf in the trigger's logical plan
    +    //    - Each logical plan leaf will be associated with a single streaming source.
    +    //    - There can be multiple logical plan leaves associated a streaming source.
    +    //    - There can be leaves not associated with any streaming source, because they were
    +    //      generated from a batch source (e.g. stream-batch joins)
    +    //
    +    // 2. Assuming that the executed plan has same number of leaves in the same order as that of
    +    //    the trigger logical plan, we associate executed plan leaves with corresponding
    +    //    streaming sources.
    +    //
    +    // 3. For each source, we sum the metrics of the associated execution plan leaves.
    +    //
    +    val logicalPlanLeafToSource = sourceToDataframe.flatMap { case (source, df) =>
    +      df.logicalPlan.collectLeaves().map { leaf => leaf -> source }
    +    }
    +    val allLogicalPlanLeaves = triggerLogicalPlan.collectLeaves() // includes non-streaming sources
    +    val allExecPlanLeaves = triggerExecutionPlan.collectLeaves()
    +    if (allLogicalPlanLeaves.size == allExecPlanLeaves.size) {
    +      val execLeafToSource = allLogicalPlanLeaves.zip(allExecPlanLeaves).flatMap {
    +        case (lp, ep) => logicalPlanLeafToSource.get(lp).map { source => ep -> source }
    +      }
    +      val sourceToNumInputRows = execLeafToSource.map { case (execLeaf, source) =>
    +        val numRows = execLeaf.metrics.get("numOutputRows").map(_.value).getOrElse(0L)
    +        source -> numRows
    +      }
    +      sourceToNumInputRows.groupBy(_._1).mapValues(_.map(_._2).sum) // sum up rows for each source
    +    } else {
    +      def toString[T](seq: Seq[T]): String = s"(size = ${seq.size}), ${seq.mkString(", ")}"
    +      logWarning(
    +        "Could not report metrics as number leaves in trigger logical plan did not match that" +
    --- End diff --

    I added [`logPeriodicWarning`](https://github.com/apache/spark/pull/15307/commits/e708b3b86a69833169962713ce8bef88bcbdc2f7)
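The three steps described in the code comment of the quoted diff can be tried out without a Spark dependency. The following is only a minimal sketch under stated assumptions: `LogicalLeaf`, `ExecLeaf`, `LeafMetricsSketch` and `numInputRows` are hypothetical stand-ins invented for illustration (the PR itself works on `LogicalPlan` and `SparkPlan` leaves and on `Source` objects), and sources are represented as plain strings.

```scala
// Hypothetical stand-ins for plan leaves; not Spark classes.
final case class LogicalLeaf(name: String)
final case class ExecLeaf(name: String, numOutputRows: Long)

object LeafMetricsSketch {
  /**
   * Returns the summed row count per source, or None when the two plans
   * do not have the same number of leaves (the positional match is then unsafe).
   */
  def numInputRows(
      logicalLeafToSource: Map[LogicalLeaf, String],
      logicalLeaves: Seq[LogicalLeaf],
      execLeaves: Seq[ExecLeaf]): Option[Map[String, Long]] = {
    if (logicalLeaves.size != execLeaves.size) {
      None
    } else {
      // Step 2: pair leaves by position; leaves without a streaming source are dropped.
      val sourceAndRows = logicalLeaves.zip(execLeaves).flatMap { case (lp, ep) =>
        logicalLeafToSource.get(lp).map(source => source -> ep.numOutputRows)
      }
      // Step 3: sum the row counts of all leaves that belong to the same source.
      Some(sourceAndRows.groupBy(_._1).map { case (source, rows) =>
        source -> rows.map(_._2).sum
      })
    }
  }
}
```

In this sketch a leaf that came from a batch source simply has no entry in `logicalLeafToSource`, so it contributes nothing to any total, and a mismatch in leaf counts yields `None` instead of a logged warning.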
[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15044 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66357/ Test FAILed.
[GitHub] spark pull request #15357: [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE ...
GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/15357

[SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE

## What changes were proposed in this pull request?

This PR fixes the following NPE scenario.

**Reported Error Scenario**
```
scala> sql("EXPLAIN DESCRIBE TABLE x").show(truncate = false)
INFO SparkSqlParser: Parsing command: EXPLAIN DESCRIBE TABLE x
java.lang.NullPointerException
```

## How was this patch tested?

Pass the Jenkins test with a new test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-17328

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15357.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15357

commit 4b601951e4b3311501363ac4de864c4bf9a1a756
Author: Dongjoon Hyun
Date: 2016-10-05T04:24:27Z

    [SPARK-17328][SQL] Fix NPE with EXPLAIN DESCRIBE TABLE
[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15044 Merged build finished. Test FAILed.
[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15044 **[Test build #66357 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66357/consoleFull)** for PR 15044 at commit [`2b22d12`](https://github.com/apache/spark/commit/2b22d128ef4c51643cd4dcdbe17a1f3d28362a90). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15249 **[Test build #66367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66367/consoleFull)** for PR 15249 at commit [`a6c863f`](https://github.com/apache/spark/commit/a6c863f2462986b66a93f0beac3bb1f163afa50d).
[GitHub] spark issue #15044: [WIP][SQL][SPARK-17490] Optimize SerializeFromObject() f...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/15044 Thanks. I still need to update one file, and I will do so later today (Japan time). PR #13758 can also solve this issue without allocating UnsafeArrayData. I think that PR #13758 is a generic solution and involves only a small amount of changes. Which PR is preferable, #15044 or #13758?
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user squito commented on the issue: https://github.com/apache/spark/pull/15249 @kayousterhout @mridulm thanks for the feedback. obviously still need to figure out the timeout thing but otherwise think I've addressed things. will do another pass in the morning.
[GitHub] spark pull request #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSet...
Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15249#discussion_r81898588

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala ---
    @@ -0,0 +1,128 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.scheduler
    +
    +import org.apache.spark.SparkConf
    +import org.apache.spark.internal.config
    +import org.apache.spark.internal.Logging
    +import org.apache.spark.util.Utils
    +
    +private[scheduler] object BlacklistTracker extends Logging {
    +
    +  private val DEFAULT_TIMEOUT = "1h"
    --- End diff --

    (longer top-level comment responding to this)
[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets
Github user squito commented on the issue: https://github.com/apache/spark/pull/15249

@mridulm on the questions about expiry from blacklists, you are not missing anything -- this explicitly does not do any timeouts at the taskset level (this is mentioned in the design doc). The timeout code you see is mostly just incremental stuff as a step towards https://github.com/apache/spark/pull/14079, but doesn't actually add any value here.

The primary motivation for blacklisting that I've seen is actually quite different from the use case you are describing -- it's not to help deal with resource contention, but to deal with truly broken resources (a bad disk in all the cases I can think of). In fact, in these cases, 1 hour is really short -- users probably want something more like 6-12 hours. But 1 hour really isn't so bad; it just means that the bad resources need to be "rediscovered" that often, with a scheduling hiccup while that happens.

This is really different from the use case you are describing, which is a form of backoff to deal with resource contention. I have actually talked to a couple of different folks about doing something like this recently and think it would be great, though I see problems with this approach, since it allows other tasks to still be scheduled on those executors, and also the time isn't relative to the task runtime etc.

Nonetheless, an issue here might be that the old option serves some purpose which is no longer supported. Do we need to add it back in? Just adding the logic for the timeouts again is pretty easy, though (a) I need to figure out the right place to do it so that it doesn't impact scheduling performance and, more importantly, (b) I really worry about being able to configure things so that blacklisting can actually handle totally broken resources. E.g., say that you set the timeout to 10s. If your tasks take 1 minute each, then your one bad executor might cycle through the leftover tasks, fail them all, pass the timeout, and repeat that cycle a few times till you go over spark.task.maxFailures. I don't see a good way to deal with that while setting a sensible timeout for the entire application.

Two other workarounds:

(2) Just enable the timeout per-task when the legacy configuration is used. Leave it undocumented. We don't change behavior then, but configuration is kind of a mess, and it'll be a headache to continue to maintain this.

(3) Add a timeout just to *taskset* level blacklisting. So it's a behavior change from the existing blacklisting, which has a timeout per *task*. This removes the interaction with spark.task.maxFailures that we've always got to tiptoe around. I also think it might satisfy your use case even better. I still don't think it's a great solution to the problem, and we need something else for handling this sort of backoff better, so I don't feel great about it getting shoved into this feature.

I'm thinking (3) is the best but will give it a bit more thought. Also @kayousterhout @tgravescs @markhamstra for opinions as well since this is a bigger design point to consider.
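The 10s-timeout scenario above can be made concrete with a toy model. This is a rough sketch under loudly stated assumptions, not Spark's scheduler: it assumes a single pending task, a bad executor that fails the task instantly every time the blacklist entry expires, and a healthy slot that only frees up once a running task finishes. `BlacklistTimeoutSketch`, `Outcome`, `Aborted`, `Rescued` and `run` are hypothetical names, and `maxFailures` plays the role of `spark.task.maxFailures`.

```scala
object BlacklistTimeoutSketch {
  sealed trait Outcome
  /** The task hit maxFailures on the bad executor before a healthy slot was free. */
  final case class Aborted(atSec: Int) extends Outcome
  /** A healthy slot became free first; `failures` counts the earlier bad attempts. */
  final case class Rescued(atSec: Int, failures: Int) extends Outcome

  def run(timeoutSec: Int, healthySlotFreeAtSec: Int, maxFailures: Int): Outcome = {
    require(timeoutSec > 0 && maxFailures > 0)
    var now = 0
    var failures = 0
    while (now < healthySlotFreeAtSec) {
      failures += 1                           // bad executor takes the task and fails it at once
      if (failures >= maxFailures) return Aborted(now)
      now += timeoutSec                       // task stays blacklisted there until the timeout expires
    }
    Rescued(healthySlotFreeAtSec, failures)
  }
}
```

Under these assumptions, `run(10, 60, 4)` gives `Aborted(30)`: the fourth failure arrives 30 seconds in, well before the one-minute tasks on healthy executors finish. With a one-hour timeout, `run(3600, 60, 4)` gives `Rescued(60, 1)`, because the single failure keeps the task off the bad executor long enough for a healthy slot to open.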
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Merged build finished. Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66352/ Test PASSed.
[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66352/consoleFull)** for PR 15307 at commit [`02603c7`](https://github.com/apache/spark/commit/02603c7f56c8722d9003d09e40889084122ba40d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15356 FYI, for 15294, the JIRA is set as `Won't fix`.
[GitHub] spark issue #15356: [BUILD] Closing some stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15356 Just to defend myself, it seems that a PR such as 15339, opened from one branch against another branch, leaves a failure mark on each commit in those branches ([branch-2.0](https://github.com/apache/spark/commits/branch-2.0)). Could you please take a look, @srowen?
[GitHub] spark issue #15351: [SPARK-17612][SQL][branch-2.0] Support `DESCRIBE table P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15351 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66354/ Test FAILed.