[GitHub] spark pull request: [SPARK-11627] Add initial input rate limit for...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9593#issuecomment-166178142 **[Test build #48091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48091/consoleFull)** for PR 9593 at commit [`2d750c4`](https://github.com/apache/spark/commit/2d750c4c1cedaff9849137710b58242bcd15bef9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10283#issuecomment-166184502 **[Test build #48094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48094/consoleFull)** for PR 10283 at commit [`2500de3`](https://github.com/apache/spark/commit/2500de3ba716ad93dca8001f5fde6c670c898416).
[GitHub] spark pull request: [SPARK-12443][SQL] encoderFor should support D...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10399#discussion_r48114340 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ScalaReflectionRelationSuite.scala --- @@ -138,4 +144,16 @@ class ScalaReflectionRelationSuite extends SparkFunSuite with SharedSQLContext { Map(10 -> 100L, 20 -> 200L, 30 -> null), Row(null, "abc"))) } + + test("decimal type with ScalaReflection") { --- End diff -- Can we write the test in `ExpressionEncoderSuite`? Just add a line: `encodeDecodeTest(Decimal("32131413.211321313"), "catalyst decimal")`
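The suggested `encodeDecodeTest` helper boils down to asserting that a value survives an encode/decode round trip. A minimal sketch of that pattern, using a hypothetical `Codec` stand-in rather than Spark's actual `ExpressionEncoder` API:

```scala
// Sketch of the round-trip assertion pattern behind encodeDecodeTest.
// Codec and BigDecimalCodec are illustrative stand-ins, not Spark APIs.
trait Codec[T] {
  def encode(value: T): Array[Byte]
  def decode(bytes: Array[Byte]): T
}

object BigDecimalCodec extends Codec[BigDecimal] {
  // Encode via the string form so arbitrary precision is preserved exactly.
  def encode(value: BigDecimal): Array[Byte] = value.toString.getBytes("UTF-8")
  def decode(bytes: Array[Byte]): BigDecimal = BigDecimal(new String(bytes, "UTF-8"))
}

// Returns true when the value survives an encode/decode round trip.
def encodeDecodeOk[T](value: T, codec: Codec[T]): Boolean =
  codec.decode(codec.encode(value)) == value
```

Centralizing tests this way means each new type needs only one line naming the value to round-trip, as in the suggestion above.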
[GitHub] spark pull request: [SPARK-11627] Add initial input rate limit for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9593#issuecomment-166185197 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10393#discussion_r48115321 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -131,6 +131,7 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { df.explode('letters) { case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq } +assert(!df2.queryExecution.toString.contains("!")) --- End diff -- how about `assert(df2.queryExecution.executedPlan.resolved)`?
[GitHub] spark pull request: [SPARK-12437][SQL] [WIP] Encapsulate the table...
Github user naveenminchu commented on the pull request: https://github.com/apache/spark/pull/10403#issuecomment-166203008 @rxin Agree 100%
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10393#discussion_r48117154 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -131,6 +131,7 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { df.explode('letters) { case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq } +assert(!df2.queryExecution.toString.contains("!")) --- End diff -- @cloud-fan I like your suggestion! However, `resolved` is not defined in `SparkPlan`; it is only available in `LogicalPlan`. Are you suggesting that I define it in `SparkPlan` and override it where necessary, like what we did in `LogicalPlan`? Thank you!
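For context, `resolved` on `LogicalPlan` is essentially a recursive flag over the plan tree: a node is resolved when its own state is valid and all of its children are resolved. A toy sketch of that shape (the `Plan` classes here are simplified stand-ins, not Spark's actual plan classes):

```scala
// Toy model of the `resolved` flag on a plan tree; not Spark's real classes.
sealed trait Plan {
  def children: Seq[Plan]
  def selfResolved: Boolean
  // A node is resolved only if it and all of its children are resolved.
  lazy val resolved: Boolean = selfResolved && children.forall(_.resolved)
}

final case class Leaf(selfResolved: Boolean) extends Plan {
  def children: Seq[Plan] = Nil
}

final case class Node(selfResolved: Boolean, children: Seq[Plan]) extends Plan
```

Defining the flag on the base trait with a `lazy val` means subclasses only override the node-local condition, which is the pattern being discussed for `SparkPlan`.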
[GitHub] spark pull request: [SPARK-12287] [SQL] Support UnsafeRow in MapPa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10398#issuecomment-166212601 **[Test build #48095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48095/consoleFull)** for PR 10398 at commit [`4c745f5`](https://github.com/apache/spark/commit/4c745f5256700b160a10f0077be49e77a10e758b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6624][SQL]Convert filters into CNF for ...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/8200#issuecomment-166212614 It sounds like multiple PRs are blocked by this PR. I will submit a PR for fixing it tomorrow. Thanks!
[GitHub] spark pull request: [SPARK-12287] [SQL] Support UnsafeRow in MapPa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10398#issuecomment-166212830 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48095/ Test PASSed.
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-166215753 Pinging @jkbradley @mengxr @MechCoder again for a final review - could you give this a look and confirm you're in agreement with my comments above? Also, any thoughts on whether this should target `1.6.1`, as it is actually a fairly major yet subtle bug in the implementation? Or even be backported to `1.5.3`?
[GitHub] spark pull request: [SPARK-12371][SQL] Dataset nullability check
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10331#issuecomment-166215771 Yeah, let's use this PR for the runtime check.
[GitHub] spark pull request: [SPARK-12232] New R API for read.table to avoi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10406#issuecomment-166169310 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48090/ Test PASSed.
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166170888 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48089/ Test PASSed.
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166170840 **[Test build #48089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48089/consoleFull)** for PR 10405 at commit [`0a46559`](https://github.com/apache/spark/commit/0a4655999772eed9296de438a61319765389e588). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12396][Core]Once driver connect to a ma...
GitHub user echoTomei opened a pull request: https://github.com/apache/spark/pull/10407 [SPARK-12396][Core]Once driver connect to a master successfully, stop it connect to master again. You can merge this pull request into a Git repository by running: $ git pull https://github.com/echoTomei/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10407.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10407 commit 5bee290e357d8f855bcf22393fd076a8301f1001 Author: echo2mei <534384...@qq.com> Date: 2015-12-17T08:28:31Z Once driver register successfully, stop it to connect master again. commit 7959c1f75cd34e46ceda011ec11ce56e8e166fd1 Author: echo2mei <534384...@qq.com> Date: 2015-12-21T01:57:25Z [SPARK-12396][Core] Cancel the driver retry thread once it register successfull.
[GitHub] spark pull request: [SPARK-12443][SQL] encoderFor should support D...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10399#issuecomment-166181467 cc @cloud-fan @marmbrus @davies
[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10391#issuecomment-166181451 cc @cloud-fan @marmbrus @davies
[GitHub] spark pull request: [SPARK-12438][SQL] Add SQLUserDefinedType supp...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10390#discussion_r48113653 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala --- @@ -89,6 +94,23 @@ class UserDefinedTypeSuite extends QueryTest with SharedSQLContext with ParquetT assert(featuresArrays.contains(new MyDenseVector(Array(0.2, 2.0)))) } + test("user type with ScalaReflection") { +val points = Seq( + MyLabeledPoint(1.0, new MyDenseVector(Array(0.1, 1.0))), + MyLabeledPoint(0.0, new MyDenseVector(Array(0.2, 2.0)))) + +val schema = ScalaReflection.schemaFor[MyLabeledPoint].dataType.asInstanceOf[StructType] +val attributeSeq = schema.toAttributes + +val pointEncoder = encoderFor[MyLabeledPoint] +val unsafeRows = points.map(pointEncoder.toRow(_).copy()) --- End diff -- can we also test `encoder.fromRow`? We can just create a `MyLabeledPoint`, encode it to an `InternalRow`, decode it back with the encoder, and check that the decoded `MyLabeledPoint` is the same as the original one.
[GitHub] spark pull request: [SPARK-12327][SPARKR] fix code for lintr warni...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10408#issuecomment-166184144 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10400#issuecomment-166195414 I'm with @markhamstra here. It is unclear what `!` or `?` mean. They are unintuitive, and are not general symbols for schema construction or nullability.
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10400#issuecomment-166195514 hi @markhamstra, defining a schema is very common when writing tests; do you think it's a good idea to put this functionality only in test scope?
[GitHub] spark pull request: [SPARK-12287] [SQL] Support UnsafeRow in MapPa...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10398#issuecomment-166201463 Thank you! @cloud-fan
[GitHub] spark pull request: [SPARK-12371][SQL] Dataset nullability check
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/10331#discussion_r48116699 --- Diff: sql/core/src/test/resources/log4j.properties --- @@ -33,7 +33,7 @@ log4j.appender.FA.layout=org.apache.log4j.PatternLayout log4j.appender.FA.layout.ConversionPattern=%d{HH:mm:ss.SSS} %t %p %c{1}: %m%n # Set the logger level of File Appender to WARN -log4j.appender.FA.Threshold = INFO +log4j.appender.FA.Threshold = TRACE --- End diff -- Revert these?
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10393#discussion_r48117166 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala --- @@ -51,9 +52,12 @@ case class Generate( join: Boolean, outer: Boolean, output: Seq[Attribute], +generatorOutput: Seq[Attribute], child: SparkPlan) extends UnaryNode { + override def missingInput: AttributeSet = super.missingInput -- generatorOutput + --- End diff -- Thank you, @viirya and @cloud-fan ! Just did the change.
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10393#issuecomment-166203846 **[Test build #48096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48096/consoleFull)** for PR 10393 at commit [`6b4ba74`](https://github.com/apache/spark/commit/6b4ba7458398ecd74c394fba0b062b2d8bfa8752).
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10393#discussion_r48119148 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -131,6 +131,7 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { df.explode('letters) { case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq } +assert(!df2.queryExecution.toString.contains("!")) --- End diff -- Thank you! @cloud-fan I did the change as you suggested.
[GitHub] spark pull request: [SPARK-12085] [SQL] The join condition hidden ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10087#issuecomment-166212178 @flyson Great work, though you'd better coordinate with #8200 .
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/10152#discussion_r48119953 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -77,6 +77,20 @@ class Word2Vec extends Serializable with Logging { private var numIterations = 1 private var seed = Utils.random.nextLong() private var minCount = 5 + private var maxSentenceLength = 1000 + + /** + * sets the maxSentenceLength, maxSentenceLength is used as the threshold for cutting sentence --- End diff -- One final thing - can you address the comment above? And I think we can actually remove the `@param` and `@return` to match the comments for the other setters in this class.
[GitHub] spark pull request: [SPARK-12153][SPARK-7617][MLlib]add support of...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-166215472 @ygcao just one final comment on the `setMaxSentenceLength` setter to address, thanks!
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
GitHub user ajbozarth opened a pull request: https://github.com/apache/spark/pull/10405 [SPARK-12339] [WebUI] Added a null check that was removed in SPARK-11206 Updates made in SPARK-11206 missed an edge case which causes a NullPointerException when a task is killed. In some cases when a task ends in failure, taskMetrics is initialized as null (see JobProgressListener.onTaskEnd()). To address this, a null check was added. Before the changes in SPARK-11206, this null check was called at the start of the updateTaskAccumulatorValues() function. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ajbozarth/spark spark12339 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10405 commit 0a4655999772eed9296de438a61319765389e588 Author: Alex Bozarth Date: 2015-12-20T02:54:53Z Added null check that was removed in SPARK-11206
[GitHub] spark pull request: [SPARK-12232] New R API for read.table to avoi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10406#issuecomment-166169309 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12327][SPARKR] fix code for lintr warni...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/10408 [SPARK-12327][SPARKR] fix code for lintr warning for commented code @shivaram You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rcodecomment Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10408.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10408 commit a4f47a2e31d908a1214e3a680cbe34b28e5f6049 Author: felixcheung Date: 2015-12-21T02:13:44Z fix code for lintr warning for commented code
[GitHub] spark pull request: [SPARK-12327][SPARKR] fix code for lintr warni...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10408#issuecomment-166184145 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48092/ Test PASSed.
[GitHub] spark pull request: [SPARK-12327][SPARKR] fix code for lintr warni...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10408#issuecomment-166184083 **[Test build #48092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48092/consoleFull)** for PR 10408 at commit [`a4f47a2`](https://github.com/apache/spark/commit/a4f47a2e31d908a1214e3a680cbe34b28e5f6049). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10400#issuecomment-166195733 Note that we also have the "ColumnName" implicit. Using that you can already define a struct field using: `'fieldName.int`.
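For readers unfamiliar with the pattern rxin refers to, the symbol-based field syntax comes from enriching `Symbol` with type-named methods. The following is a hedged, self-contained sketch of that idea only — it is NOT Spark's actual `ColumnName` implementation, and `Field`/`FieldBuilder` are hypothetical names:

```scala
// Minimal sketch of a ColumnName-style DSL: an implicit class enriches Symbol
// so that 'fieldName.int (i.e. Symbol("fieldName").int) yields a typed field.
// Field and FieldBuilder are illustrative names, not Spark API.
case class Field(name: String, dataType: String)

implicit class FieldBuilder(sym: Symbol) {
  def int: Field  = Field(sym.name, "int")
  def long: Field = Field(sym.name, "long")
  // Nested fields compose into a struct type description.
  def struct(fields: Field*): Field =
    Field(sym.name,
      fields.map(f => s"${f.name}: ${f.dataType}").mkString("struct<", ", ", ">"))
}

// Usage in the style discussed in this thread:
val f = Symbol("b").struct(Symbol("a").int, Symbol("b").long)
```

With Scala 2's symbol literals this reads as `'b.struct('a.int, 'b.long)`, which is the shorthand the thread is debating.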
[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10346#issuecomment-166199376

**[Test build #48093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48093/consoleFull)** for PR 10346 at commit [`d962f15`](https://github.com/apache/spark/commit/d962f15e186bfe77d3fb3e5e4ec44d10b5523c0f).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10346#issuecomment-166199432 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48093/ Test PASSed.
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/10400#issuecomment-166197805 @cloud-fan You're kind of hinting at my point: To me, this DSL seems to make life easier for Spark developers, not Spark users. In that kind of trade-off, we should always opt for making things easier for users. Putting this functionality in test scope or otherwise hiding it from the public API isn't as troubling, but having secret or unintuitive shortcuts that only Spark developers use will make getting up to speed more difficult for people looking to contribute to Spark.
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10393#discussion_r48118001

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -131,6 +131,7 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
           df.explode('letters) {
             case Row(letters: String) => letters.split(" ").map(Tuple1(_)).toSeq
           }
    +    assert(!df2.queryExecution.toString.contains("!"))
    --- End diff --

ah, I think we should not add `resolved` to `SparkPlan` for this purpose, how about `assert(df2.queryExecution.executedPlan.missingInput.isEmpty)`?
[GitHub] spark pull request: [SPARK-12446][SQL] Add unit tests for JDBCRDD ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10409#issuecomment-166211388 **[Test build #48097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48097/consoleFull)** for PR 10409 at commit [`ed94623`](https://github.com/apache/spark/commit/ed94623cb01e36e790824903b9e937495cae3942).
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10393#issuecomment-166213409 **[Test build #48098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48098/consoleFull)** for PR 10393 at commit [`63058e3`](https://github.com/apache/spark/commit/63058e32ebe178616af54702852a9e83fa025df9).
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user ajbozarth commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166165475 FYI the line in JobProgressListener.onTaskEnd that initializes the null value is 387.
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user ajbozarth commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166165392 Looping in those involved with SPARK-11206: @carsonwang @JoshRosen @vanzin
[GitHub] spark pull request: [SPARK-11627] Add initial input rate limit for...
Github user junhaoMg commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9593#discussion_r48112803

    --- Diff: docs/configuration.md ---
    @@ -1523,6 +1523,15 @@ Apart from these, the following properties are also available, and may be useful
    + spark.streaming.backpressure.initialRate
    + not set
    +
    + Initial rate for backpressure mechanism (since 1.5). This provides maximum receiving rate of
    + receivers in the first batch when enables the backpressure mechanism, then the maximum receiving
    --- End diff --

thank you, I have modified it.
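For context, the property under review in this diff would sit alongside the existing backpressure switch in a Spark configuration file. A hedged sketch — the values below are purely illustrative, not recommendations from this PR:

```
# spark-defaults.conf (illustrative values)
spark.streaming.backpressure.enabled      true
# Cap on the receiving rate for the first batch, before the backpressure
# algorithm has any processing-rate feedback to work with:
spark.streaming.backpressure.initialRate  1000
```

After the first batch completes, the backpressure mechanism takes over and adjusts the rate from observed batch processing times, so `initialRate` only matters at startup.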
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user carsonwang commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166184298 Thanks for catching this. I think the null check here is necessary, and it seems the code that actually passes a null taskMetrics is in `TaskSetManager` at line 796, when a task is resubmitted because an executor was lost.
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10400#issuecomment-166195988 @rxin you are right, an example is [`'b.struct('a.int, 'b.long)`](https://github.com/apache/spark/blob/master/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/EncoderResolutionSuite.scala#L67)
[GitHub] spark pull request: [SPARK-12232] New R API for read.table to avoi...
GitHub user felixcheung opened a pull request:

    https://github.com/apache/spark/pull/10406

[SPARK-12232] New R API for read.table to avoid name conflict

@shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark readtable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10406.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10406

commit 042da7403c8cd289f0c9881b014cdacf84705421
Author: felixcheung
Date:   2015-12-09T01:54:08Z

    read.table

commit 85d54790e63266dc5390ffc73631236e48d91ba8
Author: felixcheung
Date:   2015-12-09T05:21:24Z

    test and revert change

commit 86a12f607ab1a3f843e777c03f508fc29fccf8a5
Author: felixcheung
Date:   2015-12-14T23:28:24Z

    update name as per suggestion

commit f1cd057ac8988607334db84f7d712c16c8133d28
Author: felixcheung
Date:   2015-12-15T01:49:31Z

    update test

commit 2e5c46bc9e4a45fd7662ea9924c62fed6207dbf9
Author: felixcheung
Date:   2015-12-21T00:17:41Z

    fix test

commit 2e4b0908f5fc6d46aa41d64047a347fa62fbf0e7
Author: felixcheung
Date:   2015-12-21T00:21:06Z

    fix export in namespace
[GitHub] spark pull request: [SPARK-12232] New R API for read.table to avoi...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10406#issuecomment-166169270

**[Test build #48090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48090/consoleFull)** for PR 10406 at commit [`2e4b090`](https://github.com/apache/spark/commit/2e4b0908f5fc6d46aa41d64047a347fa62fbf0e7).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12287] [SQL] Support UnsafeRow in MapPa...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10398#issuecomment-166191466 LGTM except a minor comment.
[GitHub] spark pull request: [SPARK-12287] [SQL] Support UnsafeRow in MapPa...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10398#discussion_r48115141

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -253,6 +254,18 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
           (1, 1))
       }
    +
    +  test("MapPartitions can process unsafe rows") {
    +    // InMemoryColumnarTableScan's outputsUnsafeRows is unsafe
    +    val ds = sparkContext.makeRDD(Seq("a", "b", "c"), 3).toDS().cache()
    +    val dsMapPartitions = ds.mapPartitions(_ => Iterator(1))
    +    val preparedPlan = dsMapPartitions.queryExecution.executedPlan
    +    // unsafe->safe convertor is not inserted between Generate and InMemoryColumnarTableScan
    +    assert(preparedPlan.children.head.isInstanceOf[InMemoryColumnarTableScan])
    --- End diff --

how about `assert(preparedPlan.find(_.isInstanceOf[InMemoryColumnarTableScan]).isEmpty)`?
[GitHub] spark pull request: [SPARK-12371][SQL] Dataset nullability check
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10331#issuecomment-166201679 Should we just keep the runtime part of changes?
[GitHub] spark pull request: [SPARK-12371][SQL] Dataset nullability check
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10331#issuecomment-166213937 @yhuai Do you think we should move the analysis-phase checking into another PR, or just drop that part? This check does find other nullability bugs (revealed by the Jenkins build failure), and I think the nullability of the Dataset schema should conform to the underlying logical plan.
[GitHub] spark pull request: [SPARK-12371][SQL] Dataset nullability check
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10331#discussion_r48119682

    --- Diff: sql/catalyst/src/test/resources/log4j.properties ---
    @@ -16,9 +16,9 @@
     #
     
     # Set everything to be logged to the file target/unit-tests.log
    -log4j.rootCategory=INFO, file
    +log4j.rootCategory=TRACE, file
     log4j.appender.file=org.apache.log4j.FileAppender
    -log4j.appender.file.append=true
    +log4j.appender.file.append=false
    --- End diff --

Oh yeah, thanks!
[GitHub] spark pull request: [SPARK-12400][Shuffle] Avoid generating temp s...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/10376#issuecomment-166170360 Hi @JoshRosen, from a performance point of view I don't think there's a big difference with this patch, since at most we will only open `200 * Cores` files simultaneously. But at least we can avoid generating a file when the related partition is empty.
[GitHub] spark pull request: [SPARK-12168][SPARKR] Add automated tests for ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10171#issuecomment-166189995 @shivaram what do you think about adding `--vanilla` to `RRunner` [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/RRunner.scala#L81)? It'd be consistent, since the worker/daemon is already running with `--vanilla` [here](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRDD.scala#L401), and users would still be able to have their desired environment/profile/init file/workspace when starting SparkR programmatically (i.e. with `sparkR.init()`, but not with `sparkR` or `spark-submit something.R`).
[GitHub] spark pull request: [SPARK-11807] Remove support for Hadoop < 2.2
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10404#issuecomment-166193507 **[Test build #2242 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2242/consoleFull)** for PR 10404 at commit [`6c9fb80`](https://github.com/apache/spark/commit/6c9fb800ea5d3ed2dcaba8cbbdb24bd4d32f0b65).
[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10346#issuecomment-166199431 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11807] Remove support for Hadoop < 2.2
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10404#issuecomment-166210297

**[Test build #2242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2242/consoleFull)** for PR 10404 at commit [`6c9fb80`](https://github.com/apache/spark/commit/6c9fb800ea5d3ed2dcaba8cbbdb24bd4d32f0b65).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12232] New R API for read.table to avoi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10406#issuecomment-166168381 **[Test build #48090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48090/consoleFull)** for PR 10406 at commit [`2e4b090`](https://github.com/apache/spark/commit/2e4b0908f5fc6d46aa41d64047a347fa62fbf0e7).
[GitHub] spark pull request: [SPARK-12396][Core]Once driver connect to a ma...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10407#issuecomment-166175075 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-12232][SPARKR] New R API for read.table...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/10406#issuecomment-166180732 How about "tableToDF"? There are some API methods with "table" in their names, like "createExternalTable", "saveAsTable", and "tables". "tableToDF" is shorter and consistent.
[GitHub] spark pull request: [SPARK-12327][SPARKR] fix code for lintr warni...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10408#issuecomment-166181204 **[Test build #48092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48092/consoleFull)** for PR 10408 at commit [`a4f47a2`](https://github.com/apache/spark/commit/a4f47a2e31d908a1214e3a680cbe34b28e5f6049).
[GitHub] spark pull request: [SPARK-11627] Add initial input rate limit for...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9593#issuecomment-166185198 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48091/ Test PASSed.
[GitHub] spark pull request: [SPARK-11627] Add initial input rate limit for...
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9593#issuecomment-166185146

**[Test build #48091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48091/consoleFull)** for PR 9593 at commit [`2d750c4`](https://github.com/apache/spark/commit/2d750c4c1cedaff9849137710b58242bcd15bef9).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12443][SQL] encoderFor should support D...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10399#discussion_r48114262

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
    @@ -61,6 +61,7 @@ object ScalaReflection extends ScalaReflection {
         case t if t <:< definitions.ByteTpe => ByteType
         case t if t <:< definitions.BooleanTpe => BooleanType
         case t if t <:< localTypeOf[Array[Byte]] => BinaryType
    +    case t if t <:< localTypeOf[Decimal] => DecimalType.SYSTEM_DEFAULT
    --- End diff --

Should we add a TODO to say that we can remove this line after we hide `Decimal`? Logically `Decimal` is an internal concept and we should not expose it to users. cc @marmbrus @rxin
[GitHub] spark pull request: [SPARK-12371][SQL] Dataset nullability check
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10331#discussion_r48116475

    --- Diff: sql/catalyst/src/test/resources/log4j.properties ---
    @@ -16,9 +16,9 @@
     #
     
     # Set everything to be logged to the file target/unit-tests.log
    -log4j.rootCategory=INFO, file
    +log4j.rootCategory=TRACE, file
     log4j.appender.file=org.apache.log4j.FileAppender
    -log4j.appender.file.append=true
    +log4j.appender.file.append=false
    --- End diff --

remove these?
[GitHub] spark pull request: [SPARK-6624][SQL]Convert filters into CNF for ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/8200#issuecomment-166213204 @gatorsmile +1 and great work :))
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166165926 **[Test build #48089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48089/consoleFull)** for PR 10405 at commit [`0a46559`](https://github.com/apache/spark/commit/0a4655999772eed9296de438a61319765389e588).
[GitHub] spark pull request: [SPARK-11807] Remove support for Hadoop < 2.2
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10404#issuecomment-166168817

**[Test build #48088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48088/consoleFull)** for PR 10404 at commit [`6c9fb80`](https://github.com/apache/spark/commit/6c9fb800ea5d3ed2dcaba8cbbdb24bd4d32f0b65).

 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11807] Remove support for Hadoop < 2.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10404#issuecomment-166168831 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48088/ Test FAILed.
[GitHub] spark pull request: [SPARK-11807] Remove support for Hadoop < 2.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10404#issuecomment-166168830 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12339] [WebUI] Added a null check that ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10405#issuecomment-166170887 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10346#issuecomment-166179434 @andrewor14 Fixed.
[GitHub] spark pull request: [SPARK-12392][Core] Optimize a location order ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10346#issuecomment-166181392 **[Test build #48093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48093/consoleFull)** for PR 10346 at commit [`d962f15`](https://github.com/apache/spark/commit/d962f15e186bfe77d3fb3e5e4ec44d10b5523c0f).
[GitHub] spark pull request: [SPARK-12438][SQL] Add SQLUserDefinedType supp...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10390#issuecomment-166181434 cc @cloud-fan @marmbrus @davies
[GitHub] spark pull request: [SPARK-12293][SQL] Support UnsafeRow in LocalT...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10283#issuecomment-166184035 @cloud-fan I think I have addressed all your comments. The bugs found while implementing UnsafeRow support in LocalTableScan have been submitted as separate PRs with their own tests, so they are easier to review.
[GitHub] spark pull request: [SPARK-12399] Display correct error message wh...
Github user carsonwang commented on a diff in the pull request: https://github.com/apache/spark/pull/10352#discussion_r48114578 --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala --- @@ -115,7 +117,17 @@ class HistoryServer( } def getSparkUI(appKey: String): Option[SparkUI] = { -Option(appCache.get(appKey)) --- End diff -- `appCache.getIfPresent` returns null if there is no cached value for the appKey. But `appCache.get` will try to obtain that value from a `CacheLoader`, cache it, and return it. So I think we still need to use `appCache.get` here and handle the exception.
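For readers unfamiliar with the Guava cache API being discussed, a minimal sketch of the `get` vs. `getIfPresent` distinction (hypothetical example; the `loadAppUi` helper and string value type stand in for HistoryServer's actual loader and `SparkUI`):

```scala
import java.util.concurrent.ExecutionException
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}

// Hypothetical loader standing in for HistoryServer's application loader.
def loadAppUi(key: String): String = s"ui-for-$key" // may throw in real code

val appCache: LoadingCache[String, String] = CacheBuilder.newBuilder()
  .maximumSize(50)
  .build(new CacheLoader[String, String] {
    override def load(key: String): String = loadAppUi(key)
  })

// getIfPresent: returns null on a cache miss and never invokes the loader.
val cached = Option(appCache.getIfPresent("app-1")) // None here: nothing cached yet

// get: invokes the CacheLoader on a miss, caches the result, and wraps any
// loader failure in an ExecutionException -- hence the need to handle it.
val ui = try {
  Some(appCache.get("app-1"))
} catch {
  case _: ExecutionException => None // loader failed
}
```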
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10393#discussion_r48114994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala --- @@ -51,9 +52,12 @@ case class Generate( join: Boolean, outer: Boolean, output: Seq[Attribute], +generatorOutput: Seq[Attribute], child: SparkPlan) extends UnaryNode { + override def missingInput: AttributeSet = super.missingInput -- generatorOutput + --- End diff -- You can use the same approach as in `logical.Generate`, i.e., `override def expressions: Seq[Expression] = generator :: Nil`, to solve this issue. Then you don't need to modify `SparkStrategies`.
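A minimal sketch of the suggested alternative (hypothetical placement inside the physical `Generate` node; signatures follow the Catalyst API as described in the comment, not the final merged change):

```scala
case class Generate(
    generator: Generator,
    join: Boolean,
    outer: Boolean,
    output: Seq[Attribute],
    child: SparkPlan) extends UnaryNode {

  // Registering the generator as an expression of this node lets Catalyst's
  // attribute bookkeeping see the generator's output as produced here, so
  // missingInput no longer reports those attributes -- no extra
  // generatorOutput field or SparkStrategies change needed.
  override def expressions: Seq[Expression] = generator :: Nil
}
```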
[GitHub] spark pull request: [SPARK-12439][SQL] Fix toCatalystArray and Map...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10391#issuecomment-166189382 Good catch! One minor comment: can we write the test in `ExpressionEncoderSuite`?
[GitHub] spark pull request: [SPARK-12441] [SQL] Fixing missingInput in Gen...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10393#discussion_r48115249 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/Generate.scala --- @@ -51,9 +52,12 @@ case class Generate( join: Boolean, outer: Boolean, output: Seq[Attribute], +generatorOutput: Seq[Attribute], child: SparkPlan) extends UnaryNode { + override def missingInput: AttributeSet = super.missingInput -- generatorOutput + --- End diff -- +1
[GitHub] spark pull request: [SPARK-12287] [SQL] Support UnsafeRow in MapPa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10398#issuecomment-166202288 **[Test build #48095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48095/consoleFull)** for PR 10398 at commit [`4c745f5`](https://github.com/apache/spark/commit/4c745f5256700b160a10f0077be49e77a10e758b).
[GitHub] spark pull request: [SPARK-12292] [SQL] Support UnsafeRow in Gener...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10396#issuecomment-166203268 After a few tries, I am unable to create a test case that triggers the issue. I think I am not the right person to fix it. Thus, to avoid wasting the reviewers' time, I am closing this PR. Thank you!
[GitHub] spark pull request: [SPARK-12292] [SQL] Support UnsafeRow in Gener...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/10396
[GitHub] spark pull request: [SPARK-12446][SQL] Add unit tests for JDBCRDD ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/10409 [SPARK-12446][SQL] Add unit tests for JDBCRDD internal functions No tests exist for JDBCRDD#compileFilter. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark AddTestsInJdbcRdd Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10409.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10409 commit 30a01c9ec2f44339511cf3fb816d91650e0f7ebb Author: Takeshi YAMAMURO Date: 2015-12-18T05:21:57Z Add tests in JDBCSuite commit ed94623cb01e36e790824903b9e937495cae3942 Author: Takeshi YAMAMURO Date: 2015-12-21T05:34:45Z fix minor bugs
[GitHub] spark pull request: [SPARK-12446][SQL] Add unit tests for JDBCRDD ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10409#issuecomment-166211512 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12446][SQL] Add unit tests for JDBCRDD ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10409#issuecomment-166211511 **[Test build #48097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48097/consoleFull)** for PR 10409 at commit [`ed94623`](https://github.com/apache/spark/commit/ed94623cb01e36e790824903b9e937495cae3942). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12446][SQL] Add unit tests for JDBCRDD ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10409#issuecomment-166211513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48097/ Test FAILed.
[GitHub] spark pull request: [SPARK-5479] [yarn] Handle --py-files correctl...
Github user zjffdu commented on the pull request: https://github.com/apache/spark/pull/6360#issuecomment-166224830 @vanzin I am reading the YARN-related code, especially org.apache.spark.deploy.yarn.Client.scala. Do you know where LOCAL_SCHEME ("local") comes from? As far as I know, we use file:// to represent a local resource, so I am not sure where "local" comes from. Another question: if I specify spark.yarn.jar as an HDFS location, the YARN client will still copy it to the staging directory, and I don't know why we do this. Wouldn't it be easier to just use the HDFS file as a LocalResource without copying?
[GitHub] spark pull request: [SPARK-12371][SQL] Runtime nullability check f...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10331#issuecomment-166227395 @yhuai Narrowed down the scope of this PR. As we discussed offline, will open another one for the analysis phase check.
[GitHub] spark pull request: [SPARK-12102][SQL] Cast a non-nullable struct ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10156#discussion_r48123275 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -274,4 +274,12 @@ class AnalysisSuite extends AnalysisTest { assert(lits(1) >= min && lits(1) <= max) assert(lits(0) == lits(1)) } + + test("SPARK-12102: Ignore nullablity when comparing two sides of case") { +val caseBranches = Seq((Literal(1) > Literal(0)), + CreateStruct(Seq(Cast(Floor(Literal(10)), IntegerType))), + CreateStruct(Seq(Literal(10)))) +val plan = OneRowRelation.select(Alias(CaseWhen(caseBranches), "val")()) +assertAnalysisSuccess(plan) --- End diff -- we can simplify this test to:

```scala
val relation = LocalRelation('a.struct('x.int), 'b.struct('x.int.withNullability(false)))
val plan = relation.select(CaseWhen(Seq(Literal(true), 'a, 'b)).as("val"))
assertAnalysisSuccess(plan)
```
[GitHub] spark pull request: [SPARK-2331] SparkContext.emptyRDD should retu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10394#issuecomment-166158720 **[Test build #2241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2241/consoleFull)** for PR 10394 at commit [`6c3df28`](https://github.com/apache/spark/commit/6c3df287eec016df93df02f2f8715fe24355cc65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12349] [ML] Make spark.ml PCAModel load...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/10327#discussion_r48101296 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala --- @@ -167,14 +167,37 @@ object PCAModel extends MLReadable[PCAModel] { private val className = classOf[PCAModel].getName +/** + * Loads a [[PCAModel]] from data the input path. Note that the model includes an --- End diff -- Oops, will fix
[GitHub] spark pull request: [SPARK-10158] [PySpark] [MLlib] ALS better err...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9361
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/10400 [SPARK-12444][SQL] A lightweight Scala DSL for schema construction

This PR introduces a lightweight Scala DSL for constructing complex Spark SQL schemas without introducing implicit conversions or any new types. Two DSL methods, `!` and `?`, are added to `DataType` to help indicate the nullability of struct fields, array element types, and map value types:

- `!` means non-nullable (or required), while
- `?` means nullable (or optional).

With the help of these two methods, and three more constructors, we can now construct a schema like this:

```scala
StructType(
  "f0" -> IntegerType.!,
  "f1" -> ArrayType(IntegerType.?).!,
  "f2" -> MapType(
    IntegerType,
    StructType(
      "f20" -> DoubleType.!,
      "f21" -> StringType.?
    ).!
  ).?
)
```

which is more concise and arguably more readable than the equivalent existing approaches:

```scala
StructType(Seq(
  StructField("f0", IntegerType, nullable = false),
  StructField("f1", ArrayType(IntegerType, containsNull = true), nullable = false),
  StructField("f2", MapType(
    IntegerType,
    StructType(Seq(
      StructField("f20", DoubleType, nullable = false),
      StructField("f21", StringType, nullable = true)
    )),
    valueContainsNull = false
  ), nullable = true)
))

new StructType()
  .add("f0", IntegerType, nullable = false)
  .add("f1", ArrayType(IntegerType, containsNull = true), nullable = false)
  .add("f2", MapType(
    IntegerType,
    new StructType()
      .add("f20", DoubleType, nullable = false)
      .add("f21", StringType, nullable = true),
    valueContainsNull = false
  ), nullable = true)
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark schema-dsl Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10400.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10400 commit a6ea8e7ef7a8a30ebe6bc7bc931649f32e1bb7f0
Author: Cheng Lian Date: 2015-12-20T09:14:18Z A lightweight DSL for schema construction
[GitHub] spark pull request: [SPARK-12292] [SQL] Support UnsafeRow in Gener...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10396#issuecomment-16611 @gatorsmile I think this needs a bit more work. Generate produces new rows, this means we also need to add a code path for generating ```UnsafeRow```s. I think we need to add/change code in ```Generate.execute``` and also to the ```UserDefinedGenerator``` and ```Explode``` generators.
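One shape the missing code path could take, sketched with Catalyst's standard row-conversion utility (hypothetical placement inside `Generate.execute`; the actual change discussed above may differ):

```scala
// Hypothetical sketch: Generate builds new (generic) rows, so before handing
// them downstream we could convert each one to an UnsafeRow with an
// UnsafeProjection over the plan's output attributes.
val toUnsafe = UnsafeProjection.create(output, output)
iter.map(row => toUnsafe(row))
```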
[GitHub] spark pull request: [SPARK-12369][SQL]DataFrameReader fails on glo...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10379#issuecomment-166088383 @yanakad Thanks for your explanation, now I understand your use case. I agree that this is somewhat inconvenient for that use case. But I still tend to say this shouldn't be an issue, because:

1. At the application level, this issue can be worked around by globbing the lowest directories first, and then passing the resulting path(s) to the `DataFrameReader.parquet()` method.
2. Changes made in this PR negatively impact the public API:
   - As mentioned above, the behavior becomes more error-prone and dangerous.
   - The behavior becomes inconsistent with other data sources. For example, ORC, JSON, and JDBC all throw an exception when the input path/JDBC URL is invalid or doesn't exist.
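The application-level workaround in point 1 might look like the following sketch (assuming an active `SQLContext`; the directory layout and glob pattern are made up for illustration):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Expand the glob ourselves first, instead of passing it to the reader...
val fs = FileSystem.get(sqlContext.sparkContext.hadoopConfiguration)
val matched = Option(fs.globStatus(new Path("/data/logs/*/parquet")))
  .getOrElse(Array.empty)
  .map(_.getPath.toString)

// ...and only invoke the reader when at least one path actually exists,
// so an empty glob never reaches DataFrameReader.parquet().
if (matched.nonEmpty) {
  val df = sqlContext.read.parquet(matched: _*)
}
```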
[GitHub] spark pull request: [SPARK-12010][SQL] Spark JDBC requires support...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/10380#discussion_r48101417 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -60,20 +60,6 @@ object JdbcUtils extends Logging { } /** - * Returns a PreparedStatement that inserts a row into table via conn. --- End diff -- Hm, the only problem here is that this is a public method, and while it feels like it was intended to be a Spark-only utility method, I'm not sure it's marked as such. It's not a big deal to retain it and implement in terms of the new method. However it's now a function of a dialect, which is not an argument here. I suppose any dialect will do since they all behave the same now. This method could then be deprecated. However: yeah, the behavior is actually the same for all dialects now. Really, this has come full circle and can just be a modification to this method, which was already the same for all dialects. Is there reason to believe the insert statement might vary later? Then I could see keeping the current structure here and just deprecating this method.
[GitHub] spark pull request: [SPARK-12444][SQL] A lightweight Scala DSL for...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10400#issuecomment-166112698 test this please
[GitHub] spark pull request: [SPARK-12371][SQL] Checks Dataset nullability ...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10331#issuecomment-166142301 @yhuai Thanks a lot for the explanation, I misunderstood the scope of the JIRA ticket. Updated this PR according to @marmbrus's [comment][1] in #10296. A new expression `AssertNotNull` is added to assert non-nullable constructor arguments are indeed non-null.