[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201993844 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11870#issuecomment-201992876 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54273/ Test PASSed.
[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11870#issuecomment-201992875 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11870#issuecomment-201992845
**[Test build #54273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54273/consoleFull)** for PR 11870 at commit [`11b2364`](https://github.com/apache/spark/commit/11b2364d714869b92a98fd863d341f743d2d302c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201991966 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201991965
**[Test build #54276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54276/consoleFull)** for PR 11984 at commit [`e56406e`](https://github.com/apache/spark/commit/e56406ed5173ae0c196705abcb8f7f28f0be0387).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201991969 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54276/ Test FAILed.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201991876 **[Test build #54276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54276/consoleFull)** for PR 11984 at commit [`e56406e`](https://github.com/apache/spark/commit/e56406ed5173ae0c196705abcb8f7f28f0be0387).
[GitHub] spark pull request: [SPARK-13844][SQL] Generate better code for fi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11684#issuecomment-201991658 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54275/ Test FAILed.
[GitHub] spark pull request: [SPARK-13844][SQL] Generate better code for fi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11684#issuecomment-201991655
**[Test build #54275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54275/consoleFull)** for PR 11684 at commit [`9fd7773`](https://github.com/apache/spark/commit/9fd7773c6cb3856d1d4a2cb893c50361d829b01f).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13844][SQL] Generate better code for fi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11684#issuecomment-201991657 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13844][SQL] Generate better code for fi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11684#issuecomment-201991629 **[Test build #54275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54275/consoleFull)** for PR 11684 at commit [`9fd7773`](https://github.com/apache/spark/commit/9fd7773c6cb3856d1d4a2cb893c50361d829b01f).
[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11957#discussion_r57523346

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
           val executions = queryList.map(new TestHive.QueryExecution(_))
           executions.foreach(_.toRdd)
           val tablesGenerated = queryList.zip(executions).flatMap {
    -        case (q, e) => e.sparkPlan.collect {
    +        case (q, e) => e.executedPlan.collect {
    --- End diff --

We do extra processing on `sparkPlan` in `prepareForExecution`. It currently includes four rules:

    override val batches: Seq[Batch] = Seq(
      Batch("Subquery", Once, PlanSubqueries(SessionState.this)),
      Batch("Add exchange", Once, EnsureRequirements(conf)),
      Batch("Whole stage codegen", Once, CollapseCodegenStages(conf)),
      Batch("Reuse duplicated exchanges", Once, ReuseExchange(conf))
    )

The error shown in the PR description happens because `sparkPlan` has not been processed by `EnsureRequirements` (i.e., the "Add exchange" rule). That processing adds additional nodes to `sparkPlan` to enable exchanging.
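As a rough illustration of the comment above (not Spark's actual implementation), the gap between `sparkPlan` and `executedPlan` can be pictured as a list of preparation rules applied once each, in order, to the physical plan. In this Java sketch the class `PlanPreparationSketch`, its methods, and the string-based "plan" are hypothetical stand-ins; only the four rule names are taken from the comment, and the two no-op rules are a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class PlanPreparationSketch {
    // Hypothetical model: a "plan" is just a string describing its node tree.
    static String prepareForExecution(String sparkPlan, List<UnaryOperator<String>> rules) {
        String plan = sparkPlan;
        // Apply each rule exactly once, feeding its output to the next rule.
        for (UnaryOperator<String> rule : rules) {
            plan = rule.apply(plan);
        }
        return plan;
    }

    // Toy versions of the four rule batches named in the comment.
    static List<UnaryOperator<String>> defaultRules() {
        List<UnaryOperator<String>> rules = new ArrayList<>();
        rules.add(p -> p);                              // "Subquery": no-op in this toy model
        rules.add(p -> "Exchange(" + p + ")");          // "Add exchange"
        rules.add(p -> "WholeStageCodegen(" + p + ")"); // "Whole stage codegen"
        rules.add(p -> p);                              // "Reuse duplicated exchanges": no-op here
        return rules;
    }

    public static void main(String[] args) {
        String sparkPlan = "Scan(t)";
        String executedPlan = prepareForExecution(sparkPlan, defaultRules());
        System.out.println("sparkPlan:    " + sparkPlan);
        System.out.println("executedPlan: " + executedPlan);
    }
}
```

In this toy model, collecting nodes from the unprepared plan would never see the `Exchange` wrapper, which is the kind of difference the review question is asking about.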
[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11957#discussion_r57523270

    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
           val executions = queryList.map(new TestHive.QueryExecution(_))
           executions.foreach(_.toRdd)
           val tablesGenerated = queryList.zip(executions).flatMap {
    -        case (q, e) => e.sparkPlan.collect {
    +        case (q, e) => e.executedPlan.collect {
    --- End diff --

Here, we were using `sparkPlan` to do the collect. Does using `executedPlan` make any difference?
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201989670 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54274/ Test FAILed.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201989658
**[Test build #54274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54274/consoleFull)** for PR 11984 at commit [`fea2a52`](https://github.com/apache/spark/commit/fea2a524bbd5b1d0d285e02e6eda590d1f7d67e3).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201989665 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11984#issuecomment-201988732 **[Test build #54274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54274/consoleFull)** for PR 11984 at commit [`fea2a52`](https://github.com/apache/spark/commit/fea2a524bbd5b1d0d285e02e6eda590d1f7d67e3).
[GitHub] spark pull request: [Spark-14138][SQL] Fix generated SpecificColum...
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/11984

[Spark-14138][SQL] Fix generated SpecificColumnarIterator code can exceed JVM size limit for cached DataFrames

## What changes were proposed in this pull request?

This PR reduces the Java bytecode size of methods in `SpecificColumnarIterator` using two approaches:

1. Generate and call `getTYPEColumnAccessor()` for each type that is actually used, to instantiate the accessors
2. Group a large number of method calls (more than 4000) into a method

## How was this patch tested?

Added a new unit test to `InMemoryColumnarQuerySuite`.

Here is the generated code:

```java
private org.apache.spark.sql.execution.columnar.CachedBatch batch = null;

private org.apache.spark.sql.execution.columnar.IntColumnAccessor accessor;
private org.apache.spark.sql.execution.columnar.IntColumnAccessor accessor1;

public SpecificColumnarIterator() {
  this.nativeOrder = ByteOrder.nativeOrder();
  this.mutableRow = new MutableUnsafeRow(rowWriter);
}

public void initialize(Iterator input, DataType[] columnTypes, int[] columnIndexes,
                       boolean columnNullables[]) {
  this.input = input;
  this.columnTypes = columnTypes;
  this.columnIndexes = columnIndexes;
}

// One accessor factory is generated per column type that is actually used.
private org.apache.spark.sql.execution.columnar.IntColumnAccessor getIntColumnAccessor(int idx) {
  byte[] buffer = batch.buffers()[columnIndexes[idx]];
  return new org.apache.spark.sql.execution.columnar.IntColumnAccessor(ByteBuffer.wrap(buffer).order(nativeOrder));
}

public boolean hasNext() {
  if (currentRow < numRowsInBatch) {
    return true;
  }
  if (!input.hasNext()) {
    return false;
  }

  batch = (org.apache.spark.sql.execution.columnar.CachedBatch) input.next();
  currentRow = 0;
  numRowsInBatch = batch.numRows();
  accessor = getIntColumnAccessor(0);
  accessor1 = getIntColumnAccessor(1);

  return hasNext();
}

public InternalRow next() {
  currentRow += 1;
  bufferHolder.reset();
  rowWriter.zeroOutNullBytes();
  accessor.extractTo(mutableRow, 0);
  accessor1.extractTo(mutableRow, 1);
  unsafeRow.setTotalSize(bufferHolder.totalSize());
  return unsafeRow;
}
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-14138

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11984.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11984

commit ab67d33787e568245c9e2ab30e51b471f21fa2ed
Author: Kazuaki Ishizaki
Date: 2016-03-27T04:15:06Z

    make code size of hasNext() smaller by preparing get*Acceessor() methods
    group a lot of calls into a method

commit fea2a524bbd5b1d0d285e02e6eda590d1f7d67e3
Author: Kazuaki Ishizaki
Date: 2016-03-27T04:15:38Z

    add test case
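The second approach in the PR description (grouping many calls into separate methods) can be sketched roughly as follows. The motivation is that the JVM limits the bytecode of a single method to less than 64 KB, so a generator that emits one call per column into one method can overflow for very wide schemas. This Java sketch is not the code from the PR: the class `SplitCallsSketch`, the emitted method names `extractorsN` and `extractAll`, and the `accessorN` field naming are all hypothetical.

```java
public class SplitCallsSketch {
    // Emits Java source text in which `numColumns` extractTo calls are spread
    // over helper methods holding at most `groupSize` calls each, plus one
    // small driver method that invokes every helper in order.
    static String generate(int numColumns, int groupSize) {
        StringBuilder helpers = new StringBuilder();
        StringBuilder driver = new StringBuilder("public void extractAll() {\n");
        int group = 0;
        for (int start = 0; start < numColumns; start += groupSize) {
            int end = Math.min(start + groupSize, numColumns);
            helpers.append("private void extractors").append(group).append("() {\n");
            for (int i = start; i < end; i++) {
                // One call per column; names here are illustrative only.
                helpers.append("  accessor").append(i)
                       .append(".extractTo(mutableRow, ").append(i).append(");\n");
            }
            helpers.append("}\n");
            driver.append("  extractors").append(group).append("();\n");
            group++;
        }
        driver.append("}\n");
        return helpers.toString() + driver.toString();
    }

    public static void main(String[] args) {
        // Five columns, at most two calls per helper method.
        System.out.print(generate(5, 2));
    }
}
```

With this layout no single emitted method grows with the column count beyond `groupSize` calls; only the driver grows, and it does so by one short call per group.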
[GitHub] spark pull request: [SPARK-13992][WIP] Add support for off-heap ca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-201984448 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13992][WIP] Add support for off-heap ca...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-201984454 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54272/ Test PASSed.
[GitHub] spark pull request: [SPARK-13992][WIP] Add support for off-heap ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-201984028
**[Test build #54272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54272/consoleFull)** for PR 11805 at commit [`df8be62`](https://github.com/apache/spark/commit/df8be62b3107a3fe6d01f000721c5e953452cd84).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11957#issuecomment-201982155 @yhuai I added it to the PR description. Please let me know if it is clear now.
[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11957#issuecomment-201981360 Can you attach an example showing the message before and the message after the change? Thanks!
[GitHub] spark pull request: [SPARK-14052] [SQL] build a BytesToBytesMap di...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11870#issuecomment-201980413 **[Test build #54273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54273/consoleFull)** for PR 11870 at commit [`11b2364`](https://github.com/apache/spark/commit/11b2364d714869b92a98fd863d341f743d2d302c).
[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11957#issuecomment-201979883 cc @yhuai @liancheng
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11977
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201979696 I have fixed the conflict.
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11959
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201979613 Merging to master. Thanks!
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11959#issuecomment-201979487 LGTM. Merging to master. Thanks!
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-201971518

@MLnick This bug has been fixed without changing existing interfaces. I have tested it with your test script, using the Lee corpus from Gensim. I am not sure whether you need an additional function `getNormalizedVectors` as follows:

```
/**
 * Returns a map of words to their normalized vector representations.
 */
def getNormalizedVectors: Map[String, Array[Float]] = {
  wordIndex.map { case (word, ind) =>
    // slice copies this word's segment out of the flat wordVectors array
    val vec = wordVectors.slice(vectorSize * ind, vectorSize * ind + vectorSize)
    // skip zero-norm vectors to avoid dividing by zero
    if (wordVecNorms(ind) != 0.0) {
      blas.sscal(vectorSize, 1 / wordVecNorms(ind).toFloat, vec, 0, 1)
    }
    (word, vec)
  }
}
```
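The zero-norm guard in the proposed function is the detail that matters for this fix: dividing by a zero norm would otherwise produce NaN or infinite components. A minimal standalone Java sketch of the same idea follows. It is an assumption-laden illustration, not the MLlib code: the class `NormalizeSketch` and method `normalize` are hypothetical, and the norm is computed inline rather than read from a precomputed array such as `wordVecNorms`.

```java
public class NormalizeSketch {
    // Returns a copy of `vec` scaled to unit Euclidean norm.
    // A zero vector is returned unchanged instead of being divided by zero.
    static float[] normalize(float[] vec) {
        double sumSq = 0.0;
        for (float v : vec) {
            sumSq += (double) v * v;
        }
        double norm = Math.sqrt(sumSq);
        float[] out = vec.clone();
        if (norm != 0.0) {
            for (int i = 0; i < out.length; i++) {
                out[i] = (float) (out[i] / norm);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] unit = normalize(new float[] {3f, 4f});
        System.out.println(unit[0] + ", " + unit[1]);
        float[] zero = normalize(new float[] {0f, 0f});
        System.out.println(zero[0] + ", " + zero[1]);
    }
}
```

Returning a copy mirrors the Scala snippet, where `slice` produces a fresh array so the stored vectors are not modified in place.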
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11959#issuecomment-201971431 ping @yhuai
[GitHub] spark pull request: [SPARK-13992][WIP] Add support for off-heap ca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-201971414 **[Test build #54272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54272/consoleFull)** for PR 11805 at commit [`df8be62`](https://github.com/apache/spark/commit/df8be62b3107a3fe6d01f000721c5e953452cd84).
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11959#issuecomment-201970947 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54268/ Test PASSed.
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11959#issuecomment-201970946 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201970903
**[Test build #54271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54271/consoleFull)** for PR 11983 at commit [`bf14a24`](https://github.com/apache/spark/commit/bf14a24f9ba91fdb7719c98027e5486ccbe79854).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201970906 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54271/ Test FAILed.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201970905 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11959#issuecomment-201970895
**[Test build #54268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54268/consoleFull)** for PR 11959 at commit [`db23480`](https://github.com/apache/spark/commit/db23480f37df7f8ec89e53990fc246c9239bda03).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201968589 **[Test build #54271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54271/consoleFull)** for PR 11983 at commit [`bf14a24`](https://github.com/apache/spark/commit/bf14a24f9ba91fdb7719c98027e5486ccbe79854).
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201967263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54270/ Test FAILed.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201967255
**[Test build #54270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54270/consoleFull)** for PR 11983 at commit [`aa4b408`](https://github.com/apache/spark/commit/aa4b408a077bd3905693c8f6428eb682b5dc47b4).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201967262 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201966705 **[Test build #54270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54270/consoleFull)** for PR 11983 at commit [`aa4b408`](https://github.com/apache/spark/commit/aa4b408a077bd3905693c8f6428eb682b5dc47b4).
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user tejasapatil commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-201966523 @liancheng: I have made all the changes requested in the review and rebased. Can you please take a look?
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201964442 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201964443 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54269/ Test FAILed.
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201964439
**[Test build #54269 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54269/consoleFull)** for PR 11983 at commit [`bc06166`](https://github.com/apache/spark/commit/bc0616605091f77d6c9621fc55f5d3561ba5a05d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `abstract class Model[M <: Model[M]] extends Transformer `
   * `abstract class MutableEstimator[T <: MutableEstimator[T]] extends Transformer `
   * `class MutableEstimator(Transformer):`
   * `class StringIndexer(JavaMutableEstimator, HasInputCol, HasOutputCol, HasHandleInvalid,`
   * `class JavaMutableEstimator(MutableEstimator, JavaTransformer):`
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201964119 **[Test build #54269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54269/consoleFull)** for PR 11983 at commit [`bc06166`](https://github.com/apache/spark/commit/bc0616605091f77d6c9621fc55f5d3561ba5a05d).
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/11983#issuecomment-201963929 CC: @mengxr Here's the prototype
[GitHub] spark pull request: [SPARK-14100][ML] Merging Estimator and Model:...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/11983

[SPARK-14100][ML] Merging Estimator and Model: prototype for StringIndexer

## What changes were proposed in this pull request?

This is a *prototype*. It will be used to decide whether or not to proceed with [https://issues.apache.org/jira/browse/SPARK-14100].

Main changes
* Created new abstraction MutableEstimator which will eventually replace Estimator for Spark 2.0.
* MutableEstimator inherits from Transformer, and it contains method `fit()`.
* It does not contain fit() methods taking ParamMaps. The expected behavior of such methods becomes more ambiguous since it is unclear if they modify the current instance.
* Merged StringIndexer and StringIndexerModel, where the merged abstraction now inherits from MutableEstimator.
* Did the same for the Python API. Also added JavaMutableEstimator for Python wrappers.

Other required changes
* Modified Pipeline to handle MutableEstimator.

Other proposed changes
* Deprecated transform() methods in Transformer taking Param settings.
* Added `copy()` without arguments to PipelineStage since this will be a more common operation after the merge.
* Added more `set()` methods to Params to facilitate setting Param values, now that fit() and transform() methods taking ParamMaps are going to be removed.

## How was this patch tested?

Existing unit tests. Note that the required changes were minimal. More changes to meta-algorithms such as CrossValidator may be needed as we merge other Estimator-Model pairs.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark thunterdb-14100

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11983.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11983

commit 0585e3fbc6dc7316ae2d8dd425648c9f6b45e041
Author: Timothy Hunter
Date: 2016-03-15T21:54:20Z

    passing tests

commit 6aa439bb78aea37476e7a12209a9f902a7be9871
Author: Timothy Hunter
Date: 2016-03-15T21:56:36Z

    cleanups

commit 317df204c049a08c3e230c4d3ca61ea6f122c864
Author: Timothy Hunter
Date: 2016-03-23T21:01:59Z

    wokr

commit bc0616605091f77d6c9621fc55f5d3561ba5a05d
Author: Joseph K. Bradley
Date: 2016-03-26T22:24:28Z

    Made StringIndexer extend MutableEstimator in Python and Scala.
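The merged Estimator/Model idea described in this pull request can be illustrated with a minimal, dependency-free sketch. Everything below is a hypothetical stand-in rather than the code in the PR: a "dataset" is just a `Seq[String]`, `Transformer` is reduced to a single method, and the alphabetical tie-break between equally frequent labels is a choice made only for this sketch. The one point it aims to show is that `fit()` stores the learned state on the same instance and returns it, so no separate model object exists.

```scala
// Hypothetical, simplified stand-ins for the Spark ML types named in the PR.
trait Transformer {
  def transform(data: Seq[String]): Seq[Double]
}

// Assumed shape of the proposed abstraction: an estimator that is also a
// transformer, where fit() mutates and returns this same instance.
trait MutableEstimator[T <: MutableEstimator[T]] extends Transformer { self: T =>
  def fit(data: Seq[String]): T
}

// Merged indexer and model: the most frequent label receives index 0.0.
class StringIndexer extends MutableEstimator[StringIndexer] {
  // Learned state lives on the estimator itself; empty until fit() is called.
  private var labels: Option[Map[String, Double]] = None

  override def fit(data: Seq[String]): StringIndexer = {
    val byFrequency = data.groupBy(identity).toSeq
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .map(_._1)
    labels = Some(byFrequency.zipWithIndex.map { case (l, i) => l -> i.toDouble }.toMap)
    this
  }

  override def transform(data: Seq[String]): Seq[Double] = {
    val index = labels.getOrElse(
      throw new IllegalStateException("fit() must be called before transform()"))
    data.map(index)
  }
}
```

Because `fit()` changes the instance in place, an overload taking a ParamMap would leave it unclear whether the overrides stick to the instance afterwards, which is the ambiguity the description gives as the reason for dropping those overloads.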
[GitHub] spark pull request: [SPARK-14169][Core]Add UninterruptibleThread
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/11971#discussion_r57521413
--- Diff: core/src/main/scala/org/apache/spark/util/UninterruptibleThread.scala ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import javax.annotation.concurrent.GuardedBy
+
+/**
+ * A special Thread that provides "runUninterruptibly" to allow running codes without being
+ * interrupted by `Thread.interrupt()`. If `Thread.interrupt()` is called during runUninterruptibly
+ * is running, it won't set the interrupted status. Instead, setting the interrupted status will be
+ * deferred until it's returning from "runUninterruptibly".
+ *
+ * Note: this method should be called only in `this` thread.
+ */
+private[spark] class UninterruptibleThread(name: String) extends Thread(name) {
+
+  /** A monitor to protect "uninterruptible" and "interrupted" */
+  private val uninterruptibleLock = new Object
+
+  /**
+   * Indicates if `this` thread are in the uninterruptible status. If so, interrupting
+   * "this" will be deferred until `this` enters into the interruptible status.
+   */
+  @GuardedBy("uninterruptibleLock")
+  private var uninterruptible = false
+
+  /**
+   * Indicates if we should interrupt `this` when we are leaving the uninterruptible zone.
+   */
+  @GuardedBy("uninterruptibleLock")
+  private var shouldInterruptThread = false
+
+  /**
+   * Run `f` uninterruptibly in `this` thread. The thread won't be interrupted before returning
+   * from `f`.
+   *
+   * Note: this method should be called only in `this` thread.
+   */
+  def runUninterruptibly[T](f: => T): T = {
+    if (Thread.currentThread() != this) {
+      throw new IllegalStateException(s"Call runUninterruptibly in a wrong thread. " +
+        s"Expected: $this but was ${Thread.currentThread()}")
+    }
+
--- End diff --
minor: should you bail out early if `shouldInterruptThread` has already been set somehow?
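The quoted diff stops before the body of `runUninterruptibly`, so the following is only a sketch of how the deferral described in the class comment could work, not the code under review. The class name `DeferringThread` and the field names are made up for illustration, and nested calls to `runUninterruptibly` are deliberately not handled. The sketch overrides `interrupt()` so that a request arriving inside the protected block is recorded and replayed when the block exits.

```scala
// Sketch only: a simplified thread that defers interrupts while a block runs.
class DeferringThread(name: String) extends Thread(name) {
  private val lock = new Object
  private var uninterruptible = false   // guarded by lock
  private var shouldInterrupt = false   // guarded by lock

  def runUninterruptibly[T](f: => T): T = {
    if (Thread.currentThread() != this) {
      throw new IllegalStateException(
        s"Expected: $this but was ${Thread.currentThread()}")
    }
    lock.synchronized {
      // Thread.interrupted() reads and clears a pending interrupt so it
      // cannot fire inside f; remember it so it can be restored afterwards.
      shouldInterrupt = Thread.interrupted() || shouldInterrupt
      uninterruptible = true
    }
    try f finally {
      lock.synchronized {
        uninterruptible = false
        if (shouldInterrupt) {
          // Replay the deferred interrupt now that f has returned.
          super.interrupt()
          shouldInterrupt = false
        }
      }
    }
  }

  // Called from other threads: record the request instead of delivering it
  // while the owner thread is inside runUninterruptibly.
  override def interrupt(): Unit = lock.synchronized {
    if (uninterruptible) shouldInterrupt = true else super.interrupt()
  }
}
```

In this sketch an interrupt that was already pending on entry is cleared and replayed on exit rather than causing an early return, which is one possible answer to the review question above; bailing out early would be the alternative.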
[GitHub] spark pull request: [SPARK-14169][Core]Add UninterruptibleThread
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/11971#issuecomment-201960142 Yes, my suggestion would make the whole thread uninterruptible. But from the only use case, it seems that would be ok - there are no calls I see that can be interrupted outside of the calls to `runUninterruptibly`. In any case, not a huge deal.
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-201959124 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-201959127 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54267/ Test PASSed.
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-201958767
**[Test build #54267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54267/consoleFull)** for PR 11891 at commit [`3c25e7e`](https://github.com/apache/spark/commit/3c25e7ec503f3bc705d2635a13d235bf6e0ec5a1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user mtustin-handy commented on a diff in the pull request: https://github.com/apache/spark/pull/11982#discussion_r57521152
--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -56,9 +56,12 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       val confFactor = {
         if (counter.count > 100) {
           new NormalDistribution().inverseCumulativeProbability(1 - (1 - confidence) / 2)
-        } else {
+        } else if (counter.count > 1) {
           val degreesOfFreedom = (counter.count - 1).toInt
           new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 - (1 - confidence) / 2)
+        } else {
+          // this may not be statistically meaningful
+          confidence
--- End diff --
Updated.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user yongtang commented on a diff in the pull request: https://github.com/apache/spark/pull/11981#discussion_r57521139
--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -42,6 +42,14 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       new BoundedDouble(counter.sum, 1.0, counter.sum, counter.sum)
     } else if (outputsMerged == 0) {
       new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
+    } else if (counter.count == 0) {
+      new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
+    } else if (counter.count == 1) {
+      val p = outputsMerged.toDouble / totalOutputs
--- End diff --
Thanks @srowen, I just updated the pull request.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user yongtang commented on a diff in the pull request: https://github.com/apache/spark/pull/11981#discussion_r57521128
--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -42,6 +42,14 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       new BoundedDouble(counter.sum, 1.0, counter.sum, counter.sum)
     } else if (outputsMerged == 0) {
       new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
+    } else if (counter.count == 0) {
+      new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
--- End diff --
Thanks. Just updated the pull request.
[GitHub] spark pull request: [SPARK-14157][SQL] Parse Drop Function DDL com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11959#issuecomment-201952198 **[Test build #54268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54268/consoleFull)** for PR 11959 at commit [`db23480`](https://github.com/apache/spark/commit/db23480f37df7f8ec89e53990fc246c9239bda03).
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user mtustin-handy commented on a diff in the pull request: https://github.com/apache/spark/pull/11982#discussion_r57521048
--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -56,9 +56,12 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       val confFactor = {
         if (counter.count > 100) {
           new NormalDistribution().inverseCumulativeProbability(1 - (1 - confidence) / 2)
-        } else {
+        } else if (counter.count > 1) {
           val degreesOfFreedom = (counter.count - 1).toInt
           new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 - (1 - confidence) / 2)
+        } else {
+          // this may not be statistically meaningful
+          confidence
--- End diff --
Right, Double.PositiveInfinity it is then.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/11982#discussion_r57521027
--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -56,9 +56,12 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       val confFactor = {
         if (counter.count > 100) {
           new NormalDistribution().inverseCumulativeProbability(1 - (1 - confidence) / 2)
-        } else {
+        } else if (counter.count > 1) {
           val degreesOfFreedom = (counter.count - 1).toInt
           new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 - (1 - confidence) / 2)
+        } else {
+          // this may not be statistically meaningful
+          confidence
--- End diff --
This isn't valid; confidence is really a probability and confFactor is a number of standard deviations.
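The distinction raised in this comment can be made concrete with a small sketch. It is an illustration under stated assumptions, not the patch: to stay free of external libraries it replaces the commons-math `NormalDistribution` with a hand-rolled normal quantile (Abramowitz and Stegun 7.1.26 approximation of erf, inverted by bisection), and it uses that normal quantile for every count above 1, whereas the code under review switches to a t-distribution for counts of 100 or fewer. The names `ConfFactorSketch`, `normalCdf`, `normalQuantile` and `confFactor` are hypothetical.

```scala
object ConfFactorSketch {
  // Standard normal CDF using the Abramowitz-Stegun 7.1.26 approximation
  // of erf (absolute error around 1.5e-7).
  def normalCdf(x: Double): Double = {
    val z = math.abs(x) / math.sqrt(2.0)
    val t = 1.0 / (1.0 + 0.3275911 * z)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    val erf = 1.0 - poly * math.exp(-z * z)
    if (x >= 0) 0.5 * (1.0 + erf) else 0.5 * (1.0 - erf)
  }

  // Inverse CDF by bisection; p must lie strictly between 0 and 1.
  def normalQuantile(p: Double): Double = {
    require(p > 0.0 && p < 1.0, s"p must be in (0, 1), got $p")
    var lo = -10.0
    var hi = 10.0
    for (_ <- 0 until 100) {
      val mid = (lo + hi) / 2
      if (normalCdf(mid) < p) lo = mid else hi = mid
    }
    (lo + hi) / 2
  }

  // confidence is a probability; the result is a number of standard
  // deviations. With at most one sample no finite factor is justified.
  def confFactor(count: Long, confidence: Double): Double =
    if (count > 1) normalQuantile(1 - (1 - confidence) / 2)
    else Double.PositiveInfinity
}
```

For a confidence of 0.95 the factor comes out close to 1.96 standard deviations, so returning `confidence` itself would silently shrink the interval to 0.95 standard deviations. Returning `Double.PositiveInfinity` for a count of 0 or 1 instead yields an unbounded interval.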
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11982#issuecomment-201951571 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user marcintustin commented on the pull request: https://github.com/apache/spark/pull/11981#issuecomment-201951574 FYI I have a more parsimonious change here: #11982
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
GitHub user mtustin-handy opened a pull request: https://github.com/apache/spark/pull/11982

[SPARK-14163][CORE] SumEvaluator and countApprox cannot reliably handle RDDs of size 1

## What changes were proposed in this pull request?

This special cases 0 and 1 counts to avoid passing 0 degrees of freedom.

## How was this patch tested?

Tests run successfully.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mtustin-handy/spark SPARK-14163

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11982.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11982

commit 3b26826f32b95f661991d1dc08b9a087a8779f63
Author: Marcin Tustin
Date: 2016-03-26T23:32:23Z

    Use confidence when count is 1.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/11981#discussion_r57520944

--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -42,6 +42,14 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       new BoundedDouble(counter.sum, 1.0, counter.sum, counter.sum)
     } else if (outputsMerged == 0) {
       new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
+    } else if (counter.count == 0) {
+      new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
+    } else if (counter.count == 1) {
+      val p = outputsMerged.toDouble / totalOutputs
--- End diff --

This duplicates code in the following block. I think you can compute the common values in one shared block and then branch afterwards to handle count == 1.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user marcintustin commented on a diff in the pull request: https://github.com/apache/spark/pull/11981#discussion_r57520937

--- Diff: core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala ---
@@ -42,6 +42,14 @@ private[spark] class SumEvaluator(totalOutputs: Int, confidence: Double)
       new BoundedDouble(counter.sum, 1.0, counter.sum, counter.sum)
     } else if (outputsMerged == 0) {
       new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
+    } else if (counter.count == 0) {
+      new BoundedDouble(0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
--- End diff --

Why not just use an || to avoid repeating code?
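The two review suggestions on this diff (merging the identical branches with `||`, and computing shared values once before branching on `count == 1`) could be combined roughly as in the sketch below. This is an illustration only, not the code from either PR: `Bounded` is a hypothetical stand-in for `BoundedDouble`, the `general` callback stands in for the existing interval computation, and both the `sum / p` scale-up and the unbounded result for `count == 1` are simplifying assumptions.

```scala
object SharedBlockSketch {
  // Hypothetical stand-in for Spark's BoundedDouble(mean, confidence, low, high).
  final case class Bounded(mean: Double, confidence: Double, low: Double, high: Double)

  // `general` is a hypothetical hook for the regular (count > 1) interval computation.
  def currentResult(sum: Double, count: Long, outputsMerged: Int, totalOutputs: Int)(
      general: Double => Bounded): Bounded = {
    if (outputsMerged == totalOutputs) {
      // Every partition has reported, so the sum is exact.
      Bounded(sum, 1.0, sum, sum)
    } else if (outputsMerged == 0 || count == 0) {
      // One branch instead of two identical ones.
      Bounded(0.0, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
    } else {
      // Shared values are computed once, before branching on count.
      val p = outputsMerged.toDouble / totalOutputs
      val sumEstimate = sum / p // assumption: naive scale-up by the fraction of outputs seen
      if (count == 1) {
        // Assumption: with a single sample no interval is derived, so the bounds stay open.
        Bounded(sumEstimate, 0.0, Double.NegativeInfinity, Double.PositiveInfinity)
      } else {
        general(sumEstimate)
      }
    }
  }
}
```

Passing the general case in as a function keeps the sketch independent of any statistics library; real code would compute the interval inline.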
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11981#issuecomment-201949785 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201949680 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54266/ Test PASSed.
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201949678 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14163][CORE] SumEvaluator and countAppr...
GitHub user yongtang opened a pull request: https://github.com/apache/spark/pull/11981

[SPARK-14163][CORE] SumEvaluator and countApprox cannot reliably handle RDDs of size 1.

## What changes were proposed in this pull request?

This fix addresses SPARK-14163, where SumEvaluator could not handle `counter.count <= 1` because `degreesOfFreedom` requires `counter.count > 1`. With this fix, `counter.count <= 1` is handled separately.

## How was this patch tested?

A manual test was done to make sure that no exception is thrown for `degreesOfFreedom`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yongtang/spark SPARK-14163

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11981.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11981

commit f739dcd0bbb84bf5429ae75997b7d0c54c95ac22
Author: Yong Tang
Date: 2016-03-26T23:19:54Z

[SPARK-14163][CORE] SumEvaluator and countApprox cannot reliably handle RDDs of size 1. This fix fixes issues in SPARK-14163 where SumEvaluator could not handle counter.count of `<=` 1 as degreesOfFreedom requires counter.count > 1.
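The constraint described in this pull request can be illustrated with a small guard. The sketch below is not the patch itself: `Bounded` is a hypothetical stand-in for `BoundedDouble`, `plan` is an invented helper name, and returning an unbounded interval with zero confidence for `count <= 1` is an assumption modelled on the existing `outputsMerged == 0` branch.

```scala
object DegreesOfFreedomSketch {
  // Hypothetical stand-in for Spark's BoundedDouble(mean, confidence, low, high).
  final case class Bounded(mean: Double, confidence: Double, low: Double, high: Double)

  // A Student t distribution needs at least one degree of freedom,
  // so count - 1 is only usable when count > 1.
  def degreesOfFreedom(count: Long): Option[Int] =
    if (count > 1) Some((count - 1).toInt) else None

  // Left: fallback result when no t-based interval can be built.
  // Right: degrees of freedom for the regular estimate.
  def plan(sum: Double, count: Long): Either[Bounded, Int] =
    degreesOfFreedom(count) match {
      case Some(df) => Right(df)
      case None =>
        // Assumption: report the observed sum with no confidence and open bounds.
        Left(Bounded(sum, 0.0, Double.NegativeInfinity, Double.PositiveInfinity))
    }
}
```

Returning an `Option` makes the `count > 1` precondition explicit at the call site instead of surfacing as an exception from the distribution constructor.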
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201949626

**[Test build #54266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54266/consoleFull)** for PR 11977 at commit [`6517f1f`](https://github.com/apache/spark/commit/6517f1fdc4aefd4ae7629456bbbfc4eb01e17825).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class AlterDatabaseProperties(`
  * `case class DescribeDatabase(`
[GitHub] spark pull request: [SPARK-14116][SQL] Implements buildReader() fo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11936
[GitHub] spark pull request: [SPARK-14116][SQL] Implements buildReader() fo...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11936#issuecomment-201948284 Thanks. I am merging this to master. @liancheng @cloud-fan Let's address https://github.com/apache/spark/pull/11936/files#r57520723 in either of your PRs for the other formats.
[GitHub] spark pull request: [SPARK-14116][SQL] Implements buildReader() fo...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11936#discussion_r57520723

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ---
@@ -56,9 +55,10 @@ import org.apache.spark.sql.types._
  */
 private[sql] object FileSourceStrategy extends Strategy with Logging {
   def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-    case PhysicalOperation(projects, filters, l@LogicalRelation(files: HadoopFsRelation, _, _))
+    case PhysicalOperation(projects, filters, l @ LogicalRelation(files: HadoopFsRelation, _, _))
       if (files.fileFormat.toString == "TestFileFormat" ||
-         files.fileFormat.isInstanceOf[parquet.DefaultSource]) &&
+         files.fileFormat.isInstanceOf[parquet.DefaultSource] ||
+         files.fileFormat.toString == "ORC") &&
          files.sqlContext.conf.parquetFileScan =>
--- End diff --

Let's rename this conf.
[GitHub] spark pull request: [SPARK-14071][PySpark][ML]Change MLWritable.wr...
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11945#issuecomment-201947768 @jkbradley I am not sure whether the property tag will change the appearance of the members in the doc. I can do a quick check by rolling back the change to see whether the doc includes write. In addition, do you mean the since tag should be included regardless of whether the member is private or not?
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-201945963 **[Test build #54267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54267/consoleFull)** for PR 11891 at commit [`3c25e7e`](https://github.com/apache/spark/commit/3c25e7ec503f3bc705d2635a13d235bf6e0ec5a1).
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user tejasapatil commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-201945359 ok to test
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201943853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54265/ Test FAILed.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201943850 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201943320

**[Test build #54265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54265/consoleFull)** for PR 11980 at commit [`3e32b6a`](https://github.com/apache/spark/commit/3e32b6aa4dbc0adcdd892ee838ccaed77a67dc58).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14013][SQL] Proper temp function suppor...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11972#issuecomment-201941448 Overall looks good.
[GitHub] spark pull request: [SPARK-14013][SQL] Proper temp function suppor...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11972#discussion_r57520314

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala ---
@@ -141,6 +141,16 @@ private[hive] class HiveFunctionRegistry(
       }
     }.getOrElse(None))
   }
+
+  override def lookupFunctionBuilder(name: String): Option[FunctionBuilder] = {
+    underlying.lookupFunctionBuilder(name)
+  }
+
+  // Note: This only does not drop functions stored in the metastore
--- End diff --

You mean "this does not", right?
[GitHub] spark pull request: [SPARK-14013][SQL] Proper temp function suppor...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11972#discussion_r57520291

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -476,33 +497,29 @@ class SessionCatalog(externalCatalog: ExternalCatalog, conf: CatalystConf) {
       throw new AnalysisException("rename does not support moving functions across databases")
     }
     val db = oldName.database.getOrElse(currentDb)
-    if (oldName.database.isDefined || !tempFunctions.containsKey(oldName.funcName)) {
+    lazy val oldBuilder = functionRegistry.lookupFunctionBuilder(oldName.funcName)
--- End diff --

Why use lazy val?
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201936414 **[Test build #54266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54266/consoleFull)** for PR 11977 at commit [`6517f1f`](https://github.com/apache/spark/commit/6517f1fdc4aefd4ae7629456bbbfc4eb01e17825).
[GitHub] spark pull request: [SPARK-14119][SPARK-14120][SPARK-14122][SQL] T...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11948#issuecomment-201934254 @hvanhovell The thought is that it is better to handle all commands in Spark, because asking Hive to execute some commands may introduce inconsistent behavior or failures with bad error messages. Also, we can reduce the dependence on Hive on the execution side. Since we do not support the commands shown in this PR right now, we just throw exceptions.
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201934044 sure, let me do it now. Thanks!
[GitHub] spark pull request: [SPARK-14177] [SQL] Native Parsing for DDL Com...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11977#issuecomment-201931827 LGTM. Let's rebase the PR. Thanks!
[GitHub] spark pull request: [SPARK-14161] [SQL] Native Parsing for DDL Com...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11962
[GitHub] spark pull request: [SPARK-14161] [SQL] Native Parsing for DDL Com...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11962#issuecomment-201930088 Merging to master. Thanks!
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201929929 **[Test build #54265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54265/consoleFull)** for PR 11980 at commit [`3e32b6a`](https://github.com/apache/spark/commit/3e32b6aa4dbc0adcdd892ee838ccaed77a67dc58).
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201929262 Merged build finished. Test FAILed.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201929263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54264/ Test FAILed.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201929261

**[Test build #54264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54264/consoleFull)** for PR 11980 at commit [`b500c8b`](https://github.com/apache/spark/commit/b500c8bb1d3d8f1ea1d88a2f761056b946598871).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11980#issuecomment-201929188 **[Test build #54264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54264/consoleFull)** for PR 11980 at commit [`b500c8b`](https://github.com/apache/spark/commit/b500c8bb1d3d8f1ea1d88a2f761056b946598871).
[GitHub] spark pull request: SPARK-14139 Dataset loses nullability in opera...
GitHub user koertkuipers opened a pull request: https://github.com/apache/spark/pull/11980

SPARK-14139 Dataset loses nullability in operations with RowEncoder

## What changes were proposed in this pull request?

RowEncoder now respects nullability for struct fields when creating extractor expressions in the extractorsFor method. Note that, to get the correct value of nullable for the returned expression, I chose to drop the If statement that checks for nulls when the field has nullable=false. If this is undesired because we should defensively check for nulls with the If statement anyway, that can be achieved as well by modifying the If class; however, to me that solution seems less clear and elegant.

## How was this patch tested?

Added a new unit test in DataFrameSuite for the bug described in the JIRA issue.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tresata/spark feat-rowencoder-nullable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11980.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11980

commit 847d7c7cdfe6626ea1f73656f9eaf868d641ae1c
Author: Koert Kuipers
Date: 2016-03-26T20:14:18Z

change RowEncoder to respect nullable for struct fields when generating extractors

commit b500c8bb1d3d8f1ea1d88a2f761056b946598871
Author: Koert Kuipers
Date: 2016-03-26T20:46:54Z

merge from master
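The reasoning about dropping the null check can be seen in a toy model. The sketch below uses a hypothetical miniature expression tree, not Catalyst's real classes, and assumes that a conditional is nullable whenever either branch is nullable. Under that assumption, wrapping a non-nullable field in a null-checking `If` with a null literal branch would always report `nullable = true`, which is why the wrapper is skipped for non-nullable fields here.

```scala
object NullabilitySketch {
  // Hypothetical miniature expression tree; these are not Catalyst's classes.
  sealed trait Expr { def nullable: Boolean }
  final case class FieldValue(name: String, nullable: Boolean) extends Expr
  case object NullLiteral extends Expr { val nullable = true }
  final case class IsNull(child: Expr) extends Expr { val nullable = false }
  final case class If(predicate: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr {
    // Assumption: a conditional may yield null if either branch may.
    def nullable: Boolean = whenTrue.nullable || whenFalse.nullable
  }

  // Only wrap the field in a null check when the schema says it can be null.
  def extractorFor(name: String, fieldNullable: Boolean): Expr = {
    val value = FieldValue(name, fieldNullable)
    if (fieldNullable) If(IsNull(value), NullLiteral, value) else value
  }
}
```

In this model `extractorFor("a", fieldNullable = false)` stays non-nullable, while the always-wrapped form would not.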
[GitHub] spark pull request: [MINOR] Fix newly added java-lint errors
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/11968#issuecomment-201917668 Thank you, @srowen .
[GitHub] spark pull request: [SPARK-14135] Add off-heap storage memory book...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11942
[GitHub] spark pull request: [SPARK-14175] [SQL] whole stage codegen interf...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11975